Ignacio Rojas
Héctor Pomares Editors
Time Series
Analysis and
Forecasting
Selected Contributions from the ITISE
Conference
Contributions to Statistics
More information about this series at http://www.springer.com/series/2912
Editors
Ignacio Rojas, CITIC-UGR, University of Granada, Granada, Spain
Héctor Pomares, CITIC-UGR, University of Granada, Granada, Spain
ISSN 1431-1968
Contributions to Statistics
ISBN 978-3-319-28723-2 ISBN 978-3-319-28725-6 (eBook)
DOI 10.1007/978-3-319-28725-6
Mathematics Subject Classification (2010): 37M10, 62M10, 62-XX, 68-XX, 60-XX, 58-XX, 37-XX
Preface
organization of ITISE 2015 was to create a friendly discussion forum for scientists,
engineers, educators, and students about the latest ideas and realizations in the
foundations, theory, models, and applications for interdisciplinary and multidis-
ciplinary research encompassing disciplines of statistics, mathematical models,
econometrics, engineering, and computer science in the field of time series analysis
and forecasting.
The list of topics in the successive Call for Papers has also evolved, resulting in
the following list for the last edition:
1. Time Series Analysis and Forecasting
• Nonparametric and functional methods
• Vector processes
• Probabilistic approach to modeling macroeconomic uncertainties
• Uncertainties in forecasting processes
• Nonstationarity
• Forecasting with many models. Model integration
• Forecasting theory and adjustment
• Ensemble forecasting
• Forecasting performance evaluation
• Interval forecasting
• Econometric models
• Econometric forecasting
• Data preprocessing methods: data decomposition, seasonal adjustment, singu-
lar spectrum analysis, and detrending methods
2. Advanced Methods and Online Learning in Time Series
• Adaptivity for stochastic models
• Online machine learning for forecasting
• Aggregation of predictors
• Hierarchical forecasting
• Forecasting with computational intelligence
• Time series analysis with computational intelligence
• Integration of system dynamics and forecasting models
3. High Dimension and Complex/Big Data
• Local vs. global forecast
• Techniques for dimension reduction
• Multiscaling
• Forecasting from Complex/Big Data
4. Forecasting in Real Problems
• Health forecasting
• Telecommunication forecasting
• Modeling and forecasting in power markets
• Energy forecasting
Contributors
Phil. J. Watson School of Civil and Environmental Engineering, University of
New South Wales, Sydney, NSW, Australia
Part I
Advanced Analysis and Forecasting
Methods
A Direct Method for the Langevin-Analysis
of Multidimensional Stochastic Processes
with Strong Correlated Measurement Noise
Abstract This paper addresses the problem of finding a direct operational method
to disentangle the sum of two continuous Markovian stochastic processes, a more
general case of the so-called measurement noise concept, given only a measured
time series of the sum process. The presented method is based on a recently
published approach for the analysis of multidimensional Langevin-type stochastic
processes in the presence of strong correlated measurement noise (Lehle, J Stat Phys
152(6):1145–1169, 2013). The method extracts from noisy data the respective drift
and diffusion coefficients corresponding to the Itô–Langevin equation describing
each stochastic process. The method presented here imposes neither constraints nor
parameters, but all coefficients are directly extracted from the multidimensional
data. The method is introduced within the framework of existing reconstruction
T. Scholz ()
Center for Theoretical and Computational Physics, University of Lisbon, Lisbon, Portugal
e-mail: [email protected]
F. Raischel
Instituto Dom Luiz (IDL), University of Lisbon, Lisbon, Portugal
Closer Consulting, Avenida Engenheiro Duarte Pacheco, Torre 1, 15º Andar, 1070-101 Lisboa,
Portugal
e-mail: [email protected]
M. Wächter
ForWind Center for Wind Energy Research, Carl von Ossietzky University Oldenburg,
Oldenburg, Germany
P.G. Lind
ForWind Center for Wind Energy Research, Carl von Ossietzky University Oldenburg,
Oldenburg, Germany
Institut für Physik, Universität Osnabrück, Barbarastrasse 7, 49076 Osnabrück, Germany
V.V. Lopes
Universidad de las Fuerzas Armadas - ESPE, Sangolquí, Ecuador
CMAF-CIO, University of Lisbon, Lisbon, Portugal
B. Lehle
Institute of Physics, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
1 Introduction
the Fokker–Planck equation [3] describes the temporal evolution of the correspond-
ing probability density function (pdf) in a PDE of the form
\frac{\partial P(x)}{\partial t} = -\frac{\partial}{\partial X}\left[ D^{(1)} P(x) \right] + \frac{1}{2}\frac{\partial^2}{\partial X^2}\left[ D^{(2)} P(x) \right].   (2)
A topic of increasing interest in recent years has been the question how—given a
measured time series of a stochastic variable—it can be confirmed that a stochastic
process with Markov properties generates the time series, and if the process can
be reconstructed by means of the SDE or Fokker–Planck description. Friedrich and
Peinke have shown how the Kramers–Moyal coefficients, also called the drift (D^{(1)}) and diffusion (D^{(2)}) coefficients, can be directly derived from observed or generated
data [4], a method that has found widespread application [1], for example for the
description of turbulence [4, 5], financial markets [6], wind energy generation [7, 8],
and biological systems [9]. A better estimation of transition probabilities can be
achieved by using nonparametric kernel [10] or maximum likelihood estimators [11].
Given an unknown stochastic time series, it is not clear a priori whether it
follows a Langevin equation driven by Gaussian white noise. In order to apply the
reconstruction process, it is therefore advisable to consider the following approach:
first, it should be confirmed that the time series has (approximately) Markovian
properties, e.g., by performing a Wilcoxon test [5]. If the latter is positive, one
can bin the data and calculate marginal and conditional pdfs [5], or evaluate the
conditional moments directly [12]. From these, if the limit in Eq. (13) below exists,
one can calculate the drift and diffusion coefficients. Finally, it should be verified
whether the noise is actually Gaussian: by inverting the discretized Langevin
equation one can solve for the noise increments in each time step and consider their
distribution [13].
In N dimensions, the evolution of a set of stochastic variables can be described by an equivalent system of Itô–Langevin equations, where the stochastic equations defined by a deterministic contribution (drift) and fluctuations from stochastic sources (diffusion) show quite complex behavior [14]. For the general case of an N-dimensional stochastic process X(t) the equation is given by

dX = D^{(1)}(X)\,dt + \sqrt{D^{(2)}(X)}\,dW(t),   (3)
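As an illustration of Eq. (3), the following minimal Python sketch integrates an N-dimensional Itô–Langevin equation with the Euler–Maruyama scheme; the linear drift matrix and constant diffusion matrix are made-up assumptions for demonstration and are not the coefficients of the example in Sect. 3.

```python
import numpy as np

def euler_maruyama(drift, diff_sqrt, x0, dt, n_steps, rng):
    """Integrate dX = D1(X) dt + sqrt(D2(X)) dW with an Euler-Maruyama scheme.

    drift(x)     -> D1(x), shape (N,)
    diff_sqrt(x) -> a matrix square root of D2(x), shape (N, N)
    """
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for t in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=len(x0))   # Wiener increments
        x[t + 1] = x[t] + drift(x[t]) * dt + diff_sqrt(x[t]) @ dw
    return x

# Illustrative two-dimensional example: linear (Ornstein-Uhlenbeck type) drift
# and a constant diffusion matrix; both are assumptions for demonstration only.
drift_matrix = np.array([[-1.0, 0.5], [0.0, -2.0]])
diff_root = np.linalg.cholesky(np.array([[0.5, 0.1], [0.1, 0.3]]))
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: drift_matrix @ x, lambda x: diff_root,
                      np.zeros(2), dt=1e-3, n_steps=100_000, rng=rng)
```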
2 Methodology
The method relies on the direct extraction of the noisy joint moments m from the measured time series, which are defined as

m^{(0)}(x) = \int_{x'} \rho(x, x', \tau)\, dx'_1 \cdots dx'_N   (5a)

m^{(1)}_i(x, \tau) = \int_{x'} \left( x'_i(t+\tau) - x_i(t) \right) \rho(x, x', \tau)\, dx'_1 \cdots dx'_N   (5b)

m^{(2)}_{ij}(x, \tau) = \int_{x'} \left( x'_i(t+\tau) - x_i(t) \right)\left( x'_j(t+\tau) - x_j(t) \right) \rho(x, x', \tau)\, dx'_1 \cdots dx'_N,   (5c)
h^{(1)}_i(x, \tau) = \langle X_i(t+\tau) - X_i(t) \,|\, X(t) = x \rangle   (6a)

h^{(2)}_{ij}(x, \tau) = \langle [X_i(t+\tau) - X_i(t)]\,[X_j(t+\tau) - X_j(t)] \,|\, X(t) = x \rangle   (6b)
of the unperturbed process [1] and with the marginal and joint density of the
compound process
These quantities can be computed directly from the time series X*. In the next step, from the first noisy joint moments m^{(1)}(x, τ), an N × N matrix Z is computed by

Z_{ij}(\tau) = \int_{x} m^{(1)}_i(x, \tau)\, x_j\, dx_1 \cdots dx_N.   (8)
Here, P(τ_1), …, P(τ_max) are auxiliary matrices that describe an expansion in temporal increments τ, which are integer multiples of the temporal sampling Δt of the measured time series. Their definition as well as the derivation of equation system (9) are described in [2]. The matrices M(τ) = M(kΔt) and V are defined by the measurement noise parameters A and B through the following equations:
where the Einstein summation convention is used [2]. The Q_{ij} are coefficients related to the measurement noise, and h^{(1)}_i, h^{(2)}_{ij} are moments of the conditional increments of X,

h^{(1)}_i(x, \tau) = \langle X_i(t+\tau) - X_i(t) \,|\, X(t) = x \rangle,   (12a)

h^{(2)}_{ij}(x, \tau) = \langle [X_i(t+\tau) - X_i(t)]\,[X_j(t+\tau) - X_j(t)] \,|\, X(t) = x \rangle.   (12b)
Equations (11) are solved in the least-squares sense through nonlinear optimization within the aforementioned CasADi/Ipopt/HSL framework [18–20]. Details of this approach will be outlined elsewhere [21]. In a final step, the drift and diffusion coefficients are computed from the conditional moments:

D^{(1)}_i(x) = \lim_{\tau \to 0} \frac{1}{\tau} \frac{m^{(1)}_i(x, \tau)}{m^{(0)}(x)},   (13a)

D^{(2)}_{ij}(x) = \lim_{\tau \to 0} \frac{1}{\tau} \frac{m^{(2)}_{ij}(x, \tau)}{m^{(0)}(x)},   (13b)
which completely describe the underlying process for the evolution of X, Eq. (3).
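A minimal sketch of the direct estimation in Eq. (13) for a one-dimensional series is given below. It bins the state space, computes conditional increment moments for a few delays τ = sΔt, and approximates the τ → 0 limit by linear extrapolation; it assumes a noise-free series (i.e., it does not include the measurement-noise correction that is the subject of this paper), and the bin number, minimum bin occupancy, and set of delays are illustrative choices.

```python
import numpy as np

def drift_diffusion_estimate(x, dt, n_bins=40, tau_steps=(1, 2, 3)):
    """Estimate D1(x) and D2(x) of a 1-d series from conditional increment
    moments, approximating the small-tau limit of Eq. (13) by extrapolation."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_of = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    m1 = np.full((len(tau_steps), n_bins), np.nan)
    m2 = np.full((len(tau_steps), n_bins), np.nan)
    for k, s in enumerate(tau_steps):
        inc = x[s:] - x[:-s]                       # increments over tau = s * dt
        start_bins = bin_of[:-s]
        for b in range(n_bins):
            sel = inc[start_bins == b]
            if sel.size > 10:                      # require a minimally filled bin
                m1[k, b] = sel.mean() / (s * dt)
                m2[k, b] = np.mean(sel ** 2) / (s * dt)
    taus = dt * np.asarray(tau_steps, dtype=float)
    def intercept(row):                            # linear extrapolation to tau = 0
        return np.polyfit(taus, row, 1)[1] if np.isfinite(row).all() else np.nan
    D1 = np.array([intercept(m1[:, b]) for b in range(n_bins)])
    D2 = np.array([intercept(m2[:, b]) for b in range(n_bins)])
    return centers, D1, D2
```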
3 Results
To illustrate the usefulness of this method, we present the same numerical example
as [2], namely the stochastic process
dX = D^{(1)}(X)\,dt + \sqrt{D^{(2)}(X)}\,dW(t),   (14)

with coefficients

A = \begin{pmatrix} 200 & 200/3 \\ 0 & 200/3 \end{pmatrix}, \qquad B = \begin{pmatrix} 75 & 425/12 \\ 425/12 & 125/6 \end{pmatrix}.   (17)
Fig. 1 Sample of original two-dimensional time series X (left), measurement noise Y (middle), and resulting time series X* = X + Y (right)
Fig. 2 Zeroth (left), first (middle), and second (right) moment estimated directly from a synthetic
time series X (top) and reconstructed by our parameter-free method (bottom)
Fig. 3 The behavior (points) of one of the first noisy moments h^{(1)}_1 (left) and one of the second noisy moments h^{(2)}_{11} (right) as a function of the delay τ. A crossover between time scales can be clearly seen. Straight lines: faster (Y) and slower (X) moments of the measurement noise and the general process for comparison
observe these two different time scales in the moment plots, with a crossover
behavior, see Fig. 3.
References
1. Friedrich, R., Peinke, J., Sahimi, M., Tabar, M.R.R.: Approaching complexity by stochastic
methods: from biological systems to turbulence. Phys. Rep. 506(5), 87–162 (2011)
2. Lehle, B.: Stochastic time series with strong, correlated measurement noise: Markov analysis
in n dimensions. J. Stat. Phys. 152(6), 1145–1169 (2013)
3. Risken, H., Frank, T.: The Fokker-Planck Equation. Springer Series in Synergetics, vol. 18.
Springer, Berlin, Heidelberg (1996)
4. Friedrich, R., Peinke, J.: Description of a turbulent cascade by a Fokker-Planck equation. Phys.
Rev. Lett. 78, 863–866 (1997)
5. Renner, C., Peinke, J., Friedrich, R.: Experimental indications for Markov properties of small-
scale turbulence. J. Fluid Mech. 433, 383–409 (2001)
6. Ghashghaie, S., Breymann, W., Peinke, J., Talkner, P., Dodge, Y.: Turbulent cascades in
foreign exchange markets. Nature 381, 767–770 (1996)
7. Milan, P., Wächter, M., Peinke, J.: Turbulent character of wind energy. Phys. Rev. Lett., 110,
138701 (2013)
8. Raischel, F., Scholz, T., Lopes, V.V., Lind, P.G.: Uncovering wind turbine properties through
two-dimensional stochastic modeling of wind dynamics. Phys. Rev. E 88, 042146 (2013)
9. Zaburdaev, V., Uppaluri, S., Pfohl, T., Engstler, M., Friedrich, R., Stark, H.: Langevin dynamics
deciphers the motility pattern of swimming parasites. Phys. Rev. Lett. 106, 208103 (2011)
10. Lamouroux, D., Lehnertz, K.: Kernel-based regression of drift and diffusion coefficients of
stochastic processes. Phys. Lett. A 373(39), 3507–3512 (2009)
11. Kleinhans, D.: Estimation of drift and diffusion functions from time series data: a maximum
likelihood framework. Phys. Rev. E 85, 026705 (2012)
12. Lehle, B.: Analysis of stochastic time series in the presence of strong measurement noise.
Phys. Rev. E 83, 021113 (2011)
13. Raischel, F., Russo, A., Haase, M., Kleinhans, D., Lind, P.G.: Optimal variables for describing
evolution of NO2 concentration. Phys. Lett. A 376, 2081–2089 (2012)
14. Vasconcelos, V.V., Raischel, F., Haase, M., Peinke, J., Wächter, M., Lind, P.G., Kleinhans, D.:
Principal axes for stochastic dynamics. Phys. Rev. E 84, 031103 (2011)
15. Böttcher, F., Peinke, J., Kleinhans, D., Friedrich, R., Lind, P.G., Haase, M.: Reconstruction
of complex dynamical systems affected by strong measurement noise. Phys. Rev. Lett. 97,
090603 (2006)
16. Lind, P.G., Haase, M., Böttcher, F., Peinke, J., Kleinhans, D., Friedrich, R.: Extracting strong
measurement noise from stochastic time series: applications to empirical data. Phys. Rev. E
81, 041125 (2010)
17. Carvalho, J., Raischel, F., Haase, M., Lind, P.G.: Evaluating strong measurement noise in data
series with simulated annealing method. J. Phys. Conf. Ser. 285, 012007 (2011)
18. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search
algorithm for large-scale nonlinear programming. Math. Program. 106, 25–57 (2006)
19. Andersson, J., Houska, B., Diehl, M.: Towards a computer algebra system with automatic
differentiation for use with object-oriented modelling languages. In: Proceedings of the 3rd
International Workshop on Equation-Based Object-Oriented Modeling Languages and Tools,
Oslo, pp. 99–105 (2010)
20. HSL: A collection of Fortran codes for large scale scientific computation. http://www.hsl.rl.ac.uk/ (2013)
21. Scholz, T., Raischel, F., Wächter, M., Lehle, B., Lopes, V.V., Lind, P.G., Peinke, J.: Parameter-
free resolution of the superposition of stochastic signals. Phys. Rev. E (2016, submitted)
Threshold Autoregressive Models for Directional
Time Series
Abstract Many time series show directionality as plots against time and against
time-to-go are qualitatively different. A stationary linear model with Gaussian
noise is non-directional (reversible). Directionality can be emulated by introducing
non-Gaussian errors or by using a nonlinear model. Established measures of
directionality are reviewed and modified for time series that are symmetrical
about the time axis. The sunspot time series is shown to be directional with
relatively sharp increases. A threshold autoregressive model of order 2, TAR(2), is fitted to the sunspot series by (nonlinear) least squares and is shown to give an improved fit over autoregressive models. However, this model does not closely reproduce the directionality, so a penalized least squares procedure was implemented. The
penalty function included a squared difference of the discrepancy between observed
and simulated directionality. The TAR(2) fitted by penalized least squares gave
improved out-of-sample forecasts and more realistic simulations of extreme values.
1 Introduction
Fig. 1 Graphical inspection of directionality shows the sunspot observations rise more quickly
than they fall in time order (above) and rise more slowly than they fall in reverse time order (below)
2 Detecting Directionality
gradual returns to the median value. Such time series are asymmetric with respect to
time but symmetric with respect to the median. Different statistics are appropriate
for detecting directionality in series that are asymmetric or symmetric with respect
to the median.
In this paper, we employ relatively simple and well-established tests [3] to
detect directionality in time series: difference in linear quadratic lagged correlations;
proportion of positive differences; skewness of differences; and tests based on
comparisons of time from threshold to peak against time from peak to threshold.
More recent tests are based on properties of Markov chains [1]; spectral estimation
of kurtosis of differences in financial time series [9]; and the FeedbackTS package
in R to detect time directionality occurring in specific fragments of time series [8].
The rationale behind the DLQC statistic is as follows. Consider, for example, a series in which sharp increases are followed by slow recessions. Suppose a sharp increase occurs between x_t and x_{t+1}; then (x_t − x̄) could be negative or positive, but (x_{t+1} − x̄) is very likely to be positive. It follows that (x_t − x̄)(x_{t+1} − x̄)² can be negative or positive whereas (x_t − x̄)²(x_{t+1} − x̄) is positive, and hence DLQC will tend to be negative. Both terms in DLQC are correlations, so bounds for the DLQC are [−2, 2], but typical values in directional time series are smaller by two orders of magnitude. For the sunspot series, which exhibits clear directionality with relatively sharp increases, DLQC is −0.06.
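As a sketch, the DLQC can be computed as follows, under the assumption that it equals corr(x_t − x̄, (x_{t+1} − x̄)²) − corr((x_t − x̄)², x_{t+1} − x̄), which is consistent with the rationale and the [−2, 2] bounds described above.

```python
import numpy as np

def dlqc(x):
    """Difference in linear-quadratic lagged correlations (assumed form:
    corr(x_t - xbar, (x_{t+1} - xbar)^2) - corr((x_t - xbar)^2, x_{t+1} - xbar))."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    lag, lead = z[:-1], z[1:]
    return np.corrcoef(lag, lead ** 2)[0, 1] - np.corrcoef(lag ** 2, lead)[0, 1]
```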
P_{abm} = \frac{ P^{+}_{above} + P^{-}_{below} }{ P^{+}_{above} + P^{-}_{above} + P^{+}_{below} + P^{-}_{below} } \times 100,   (5)
A potentially more sensitive test for directionality is to consider the skewness of the
distribution of differences [3] given by
\hat{\gamma} = \frac{ \sum_{t=1}^{n} (y_t - \bar{y})^3 / (n-1) }{ \left[ \sum_{t=1}^{n} (y_t - \bar{y})^2 / (n-1) \right]^{3/2} }.   (6)
¹ In the following, "differences" refers to these lag one differences.
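A direct implementation of Eq. (6) applied to the lag-one differences, as a small sketch:

```python
import numpy as np

def diff_skewness(x):
    """Skewness of the lag-one differences, Eq. (6), using (n - 1) divisors."""
    y = np.diff(np.asarray(x, dtype=float))
    d, n = y - y.mean(), len(y)
    return (np.sum(d ** 3) / (n - 1)) / (np.sum(d ** 2) / (n - 1)) ** 1.5
```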
If a time series has both sharp increases and sharp decreases and is symmetric about
the median we adapt the definition to be
Consider a threshold (H) set, in this investigation, at the upper quintile of the marginal distribution of the time series x_t. Suppose that x_{j-1} < H, x_j > H, and that x_t remains above H until x_{j+k+1} < H. Denote the time when x_t is greatest (peak value) for j ≤ t ≤ (j + k) as (j + p). Define the difference between time from threshold to peak and time from the peak to the threshold as

DHPPH_j = (k - p) - p.   (8)
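A sketch of computing the DHPPH_j values of Eq. (8) for every excursion above the threshold H, here taken as the upper quintile of the marginal distribution; the treatment of excursions that run to the start or end of the series is a simplifying assumption.

```python
import numpy as np

def dhpph(x, quantile=0.8):
    """DHPPH_j = (k - p) - p, Eq. (8), for each excursion of x above the
    threshold H (here the upper quintile of the marginal distribution)."""
    x = np.asarray(x, dtype=float)
    h = np.quantile(x, quantile)
    above = x > h
    stats = []
    j = 0
    while j < len(x):
        if above[j] and (j == 0 or not above[j - 1]):        # excursion starts at j
            k = 0
            while j + k + 1 < len(x) and above[j + k + 1]:   # x_j ... x_{j+k} above H
                k += 1
            p = int(np.argmax(x[j:j + k + 1]))               # peak at time j + p
            stats.append((k - p) - p)
            j += k + 1
        else:
            j += 1
    return np.array(stats)
```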
In general, a directional time series will not show directionality on all of these statistics. Any one test being statistically significant at the α level, where α < 0.05 say, is evidence of directionality. A conservative allowance for multiple testing would be to use a Bonferroni inequality and claim overall significance at less than an mα level, where m is the number of tests.
3 Modeling Directionality
Mansor et al. [5] considered first order autoregressive processes AR(1) of the form
X_t = α X_{t-1} + ε_t,   (9)
and showed that the choice of non-Gaussian error distributions could lead to a
variety of significant directional features. Furthermore, they demonstrated that
realizations from first order threshold autoregressive models TAR(1) with two thresholds (T_L, T_U) of the form

X_t = α_U X_{t-1} + ε_t   if X_{t-1} > T_U
X_t = α_M X_{t-1} + ε_t   if T_L ≤ X_{t-1} ≤ T_U   (10)
X_t = α_L X_{t-1} + ε_t   if X_{t-1} < T_L
show substantial directionality even with Gaussian errors. They also found that the product moment skewness of differences (γ̂) was generally the most effective statistic for detecting directionality. They subsequently fitted a second order threshold autoregressive model TAR(2) with one threshold (T) of the form

X_t = α_{1U} X_{t-1} + α_{2U} X_{t-2} + ε_t   if X_{t-1} > T
X_t = α_{1L} X_{t-1} + α_{2L} X_{t-2} + ε_t   if X_{t-1} ≤ T   (11)
to the first 200 values in the sunspot series, by nonlinear least squares. The TAR(2)
gave some improvement over an AR(2) model for one-step ahead predictions of
the remaining 115 values in the sunspot series that were not used in the fitting
procedure. However they noted that there was scope for more realistic modeling
of directionality.
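For concreteness, the following sketch simulates from the TAR(2) specification in Eq. (11); the regime coefficients, threshold, and Gaussian error scale are made-up illustrative values and not the fitted parameters reported in Tables 5 and 6.

```python
import numpy as np

def simulate_tar2(n, alpha_upper, alpha_lower, threshold, noise, burn=500):
    """Simulate the TAR(2) model of Eq. (11): the AR(2) coefficients switch
    according to whether X_{t-1} exceeds the threshold T.

    alpha_upper, alpha_lower : (alpha_1, alpha_2) for the two regimes
    noise                    : callable returning a single error draw
    """
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        a1, a2 = alpha_upper if x[t - 1] > threshold else alpha_lower
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + noise()
    return x[burn:]

# Illustrative use with Gaussian errors and made-up, stationary coefficients.
rng = np.random.default_rng(1)
sim = simulate_tar2(200_000, alpha_upper=(1.2, -0.5), alpha_lower=(1.5, -0.7),
                    threshold=0.0, noise=lambda: rng.normal(scale=15.0))
```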
Here we consider the strategy of using a penalized least squares procedure for the fitting of the TAR(2) model to the sunspot series. Initially, the objective function to be minimized was

ω = \sum_{t=3}^{n} r_t^2 + (γ̂_{simulated} - γ̂_{observed})^2,   (12)
where {x_t} for t = 1, …, n is the mean adjusted (to 0) time series, and γ̂_observed is the directionality calculated for the 200 year sunspot series.
For any candidate set of parameter values, the optimization routine has to determine not only the sum of squared errors, but also the directionality (γ̂). The directionality is not known as an algebraic function of the model parameters, so it is estimated by simulation of the TAR(2) model with the candidate parameters and resampled residuals. A simulation of length 2 × 10^5 was used to establish γ̂ to a reasonable precision and this is referred to as γ̂_simulated.
The R function optim() which uses the Nelder–Mead algorithm [6] was used
to optimize the parameter values. The long simulation for every set of candidate
parameters makes this a challenging optimization problem, but convergence was
typically achieved within 30 min on a standard desktop computer. The sum of
where λ_1 and λ_2 are the weights given to mitigate the discrepancies in the two directionality statistics, respectively. The modification does not noticeably increase the run time. Detailed results are given in Sect. 4.
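A sketch of this penalized fitting strategy is given below, using scipy's Nelder–Mead routine in place of R's optim() and reusing the simulate_tar2 and diff_skewness helpers from the sketches above. Fixing the threshold at 0 for the mean-adjusted series, the shortened simulation length, and the starting values are assumptions made for illustration and do not reproduce the authors' settings.

```python
import numpy as np
from scipy.optimize import minimize

def tar2_residuals(params, x, threshold=0.0):
    """One-step residuals of the TAR(2) model (11) on a mean-adjusted series x."""
    a1u, a2u, a1l, a2l = params
    pred = np.where(x[1:-1] > threshold,
                    a1u * x[1:-1] + a2u * x[:-2],
                    a1l * x[1:-1] + a2l * x[:-2])
    return x[2:] - pred

def penalized_objective(params, x, gamma_obs, rng, sim_len=50_000):
    """Sum of squared residuals plus the squared skewness discrepancy, Eq. (12);
    gamma for the candidate model is estimated by simulating it with
    resampled residuals."""
    res = tar2_residuals(params, x)
    sim = simulate_tar2(sim_len, params[:2], params[2:], 0.0,
                        lambda: rng.choice(res))
    return np.sum(res ** 2) + (diff_skewness(sim) - gamma_obs) ** 2

# x: mean-adjusted first 200 sunspot values (not shown); starting values made up.
# fit = minimize(penalized_objective, x0=np.array([1.2, -0.5, 1.4, -0.6]),
#                args=(x, diff_skewness(x), np.random.default_rng(2)),
#                method="Nelder-Mead")
```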
4 Results

In Table 1 the DLQC, P^+ and γ̂ all indicate directionality in the series and all have P-values of 0.00 (two-sided P-values calculated from a parametric bootstrap procedure) [4]. The P-values for P_abm and γ̂_abm are 0.28 and 0.79, respectively.
Table 1 Summary table of test statistics of directionality for the sunspot series (1700–2014)

Series | Length | Mean | sd | DLQC | P^+ | P_abm | γ̂ | γ̂_abm
Sunspot numbers | 315 | 49.68 | 40.24 | −0.0598 | 42.49 % | 51.78 % | 0.8555 | 1.5014
Table 2 Time series models for the sunspot series (1700–1900) compared by estimated error variance
Fig. 2 Fitting TAR(2)[LSP] to the observed sunspots (1700–1900): trade-off between minimizing the sum of squared residuals (error variance), and minimizing the skewness discrepancy
The details of fitting AR(2), TAR(2)[LS], and TAR(2)[LSP] models are given in Tables 4, 5, and 6. The upper and the lower regimes of the TAR(2)[LS] and the TAR(2)[LSP] in Tables 5 and 6, respectively, are stable AR(2) processes which satisfy the requirements of the stationary triangular region α_2 > −1, α_1 + α_2 < 1 and α_1 − α_2 > −1 [2].
Table 7 Test statistics of directionality in the simulated sunspots and the sunspot numbers (1700–1900)

Series | Length | Mean | sd | DLQC | P^+ (%) | P_abm (%) | γ̂ | γ̂_abm
Sunspot numbers | 200 | 44.11 | 34.76 | −0.0609 | 43.94 | 48.48 | 0.8344 | 1.0646
TAR(2)[LS_G] | 2 × 10^5 | 39.95 | 41.38 | −0.0088 | 48.90 | 50.15 | 0.1444 | 0.2939
TAR(2)[LS_R] | 2 × 10^5 | 44.38 | 39.25 | −0.0173 | 48.26 | 49.05 | 0.2721 | 0.4878
TAR(2)[LSP_R] | 2 × 10^5 | 45.25 | 35.18 | −0.0266 | 45.50 | 49.18 | 0.6293 | 0.9501
A simulation of 2 × 10^5 points from the TAR(2)[LS] model with Gaussian errors (TAR(2)[LS_G]) and with the resampled residuals (R) for TAR(2)[LS_R] and TAR(2)[LSP_R] gave the statistics shown in Table 7.
The TAR(2)[LSP_R] gives the best fit to the first 200 observations in the sunspot
series in terms of the statistics that were not included as criteria for fitting.
[Fig. 3: predicted sunspots plotted against observed sunspots, three panels; axis labels: "Predicted sunspots by AR(2)" (vertical) and "Observed sunspots" (horizontal)]
Table 8 Forecasting measures of predicted sunspots to the sunspot series from 1901 to 2014

Model | Mean(E_rel) | sd(E_rel) | Mean(|E_rel|) | sd(|E_rel|)
AR(2) | 0.6812 | 2.4777 | 0.8713 | 2.4169
TAR(2)[LS] | 0.4886 | 2.3014 | 0.7512 | 2.2289
TAR(2)[LSP] | 0.5015 | 2.1897 | 0.7394 | 2.1206
We simulate 2 × 10^5 values using an AR(2) model with Gaussian errors (AR(2)_G), TAR(2)[LS_G], TAR(2)[LS_R], and TAR(2)[LSP_R] models. The upper and lower coefficients for TAR(2)[LS] and TAR(2)[LSP] are the optimized parameters in Tables 5 and 6, respectively. We include another two error distributions for TAR(2)[LSP]: the back-to-back Weibull distribution (WD), that is, one Weibull distribution fitted to the positive residuals and another to the absolute values of the negative residuals; and an Extreme Value Type 1 (Gumbel) distribution of minima (EV).
We refer to TAR(2)[LSP] with WD and EV errors as TAR(2)[LSP_WD] and TAR(2)[LSP_EV], respectively. We calculate the extreme values for every 15 consecutive years in the simulated series of length 2 × 10^5, illustrate these with boxplots (Fig. 4), and provide descriptive statistics (Table 9).
In general, TAR(2)[LSP] models simulate greater extreme values than TAR(2)[LS] and AR(2) models, as shown by the inter-quartile range (IQR) and standard deviation (sd) in Table 9. Furthermore, the 15-year extreme values from TAR(2)[LSP_WD] have the closest sd, and IQR, to the extreme values from the observed time series.
Table 9 Descriptive statistics of 15-year extreme values in the sunspot series (1700–2014) and the simulated series for each model with different residuals

Series | n | Median | Mean | Max | IQR | sd | Skewness
Sunspot numbers | 21 | 111.0 | 112.20 | 190.20 | 53.90 | 37.89 | 0.1158
AR(2)_R | 13,333 | 91.60 | 94.54 | 221.40 | 34.53 | 26.10 | 0.5903
TAR(2)[LS_G] | 13,333 | 101.10 | 99.30 | 175.40 | 26.24 | 20.65 | 0.2523
TAR(2)[LS_R] | 13,333 | 102.70 | 102.00 | 195.10 | 27.58 | 21.94 | 0.0025
TAR(2)[LSP_R] | 13,333 | 91.45 | 91.46 | 236.60 | 40.25 | 29.82 | 0.2110
TAR(2)[LSP_WD] | 13,333 | 97.39 | 100.20 | 358.80 | 50.06 | 40.18 | 0.7913
TAR(2)[LSP_EV] | 13,333 | 83.77 | 84.61 | 255.30 | 41.35 | 30.60 | 0.2973
Fig. 4 Boxplots of 15-year extreme values in the simulated series for AR(2) and TAR(2) models. (a) AR(2)_R; (b) TAR(2)[LS_G]; (c) TAR(2)[LS_R]; (d) TAR(2)[LSP_R]; (e) TAR(2)[LSP_WD]; (f) TAR(2)[LSP_EV]
5 Conclusion
There are many ways in which a time series can exhibit directionality, and different
measures are needed to identify these different characteristics. TAR models provide
a piecewise linear approximation to a wide range of nonlinear processes, and offer a
versatile modeling strategy. The sunspot series shows clear directionality, a physical
interpretation of which is given in [5]. We have shown that a nonlinear TAR(2)[LS]
References
1. Beare, B.K., Seo, J.: Time irreversible copula-based Markov models. Econ. Theory 30, 1–38
(2012)
2. Chatfield, C.: The Analysis of Time Series: An Introduction, 6th edn., pp. 218–219, 223–224,
44–45. Chapman and Hall/CRC, London/Boca Raton (2004)
3. Lawrance, A.: Directionality and reversibility in time series. Int. Stat. Rev./Revue Internationale
de Statistique 59(1), 67–79 (1991)
4. Mansor, M.M., Green, D.A., Metcalfe, A.V.: Modelling and simulation of directional financial
time series. In: Proceedings of the 21st International Congress on Modelling and Simulation
(MODSIM 2015), pp. 1022–1028 (2015)
5. Mansor, M.M., Glonek, M.E., Green, D.A., Metcalfe, A.V.: Modelling directionality in sta-
tionary geophysical time series. In: Proceedings of the International Work-Conference on Time
Series (ITISE 2015), pp. 755–766 (2015)
6. Nash, J.C.: On best practice optimization methods in R. J. Stat. Softw. 60(2), 1–14 (2014)
7. Solar Influences Data Analysis Center, Sunspot Index and Long-term Solar Observations. http://www.sidc.be/silso (last accessed 17 October 2015)
8. Soubeyrand, S., Morris, C.E., Bigg, E.K.: Analysis of fragmented time directionality in time
series to elucidate feedbacks in climate data. Environ. Model Softw. 61, 78–86 (2014)
9. Wild, P., Foster, J., Hinich, M.: Testing for non-linear and time irreversible probabilistic structure
in high frequency financial time series data. J. R. Stat. Soc. A. Stat. Soc. 177(3), 643–659 (2014)
Simultaneous Statistical Inference in Dynamic
Factor Models
Dynamic factor models (DFMs) are multivariate time series models of the form

X(t) = \sum_{s=-\infty}^{\infty} \Lambda(s)\, f(t-s) + \varepsilon(t), \quad 1 ≤ t ≤ T.   (1)
T. Dickhaus ()
Institute for Statistics, University of Bremen, P.O. Box 330 440, 28344 Bremen, Germany
e-mail: [email protected]; http://www.math.uni-bremen.de/~dickhaus
M. Pauly
Institute of Statistics, University of Ulm, Helmholtzstr. 20, 89081 Ulm, Germany
In [39], methods for the determination of the (number of) common factors in a
factor model of the form (2) and a canonical transformation allowing a parsimonious
representation of X(t) in (2) in terms of the common factors were derived. Statistical
inference in static factor models for longitudinal data has been studied, for instance,
in [28], where an algorithm for computing maximum likelihood estimators (MLEs)
in models with factorial structure of the covariance matrix of the observables was
developed. For further references and developments regarding the theory and the
interrelations of different types of (dynamic) factor models we refer to [5, 21], and
references therein.
Statistical inference methods for DFMs typically consider the time series in the
frequency domain, cf., among others, [17, 18] and references therein, and analyze
decompositions of the spectral density matrix of X. Nonparametric estimators
of the latter matrix by kernel smoothing have been discussed in [41]. In a
parametric setting, a likelihood-based framework for statistical inference in DFMs
was developed in [19] by making use of central limit theorems for time series
regression in the frequency domain, see [23]. The inferential considerations in
[19] rely on the asymptotic normality of the MLE ϑ̂ of the (possibly very high-dimensional) parameter vector ϑ in the frequency-domain representation of the
model. We will provide more details in Sect. 3. To this end, it is essential that
the time series model (1) is identified in the sense of [19], which we will assume
throughout the paper. If the model is not identified, the individual contributions of
the common factors cannot be expressed unambiguously and, consequently, testing
central limit theorem for empirical Fourier transforms of the observable time series.
The asymptotic normality of these Fourier transforms leads to the asymptotic
multivariate chi-square distribution of the considered vector of Wald statistics. In
Sect. 4, we propose a model-based resampling scheme for approximating the finite-
sample distribution of this vector of test statistics. We conclude with a discussion in
Sect. 5.
2 Multiple Testing
The general setup of multiple testing theory assumes a statistical model (Ω, F, (P_ϑ)_{ϑ∈Θ}) parametrized by ϑ ∈ Θ and is concerned with testing a family H = (H_i, i ∈ I) of hypotheses regarding the parameter ϑ, with corresponding alternatives K_i = Θ \ H_i, where I denotes an arbitrary index set. We identify hypotheses with subsets of the parameter space throughout the paper. Let φ = (φ_i, i ∈ I) be a multiple test procedure for H, meaning that each component φ_i, i ∈ I, is a (marginal) test for the test problem H_i versus K_i in the classical sense. Moreover, let I_0 ≡ I_0(ϑ) ⊆ I denote the index set of true hypotheses in H and V(φ) the number of false rejections (type I errors) of φ, i.e., V(φ) = Σ_{i∈I_0} φ_i. The classical multiple type I error measure in multiple hypothesis testing is the family-wise error rate, FWER for short, and can (for a given ϑ ∈ Θ) be expressed as FWER_ϑ(φ) = P_ϑ(V(φ) > 0). The multiple test φ is said to control the FWER at a predefined significance level α if sup_{ϑ∈Θ} FWER_ϑ(φ) ≤ α. A simple, but often conservative method for FWER control is based on the union bound and is referred to as Bonferroni correction in the multiple testing literature. Assuming that |I| = m, the Bonferroni correction carries out each individual test φ_i, i ∈ I, at (local) level α/m. The "Bonferroni test" φ = (φ_i, i ∈ I) then controls the FWER. In case that joint independence of all m marginal test statistics can be assumed, the Bonferroni-corrected level α/m can be enlarged to the "Šidák-corrected" level 1 − (1 − α)^{1/m} > α/m, leading to slightly more powerful (marginal) tests. Both the Bonferroni and the Šidák test are single-step procedures, meaning that the same local significance level is used for all m marginal tests.
An interesting further class of multiple test procedures are stepwise rejective tests, in particular step-up-down (SUD) tests, introduced in [50]. They are most conveniently described in terms of p-values p_1, …, p_m corresponding to test statistics T_1, …, T_m. It goes beyond the scope of this paper to discuss the notion of p-values in depth. Therefore, we will restrict attention to the case that every individual null hypothesis is simple, the distribution of every T_i, 1 ≤ i ≤ m, under H_i is continuous, and each T_i tends to larger values under alternatives. The test statistics considered in Sect. 3 fulfill these requirements, at least asymptotically. Then, we can calculate (observed) p-values by p_i = 1 − F_i(t_i), 1 ≤ i ≤ m, where F_i is the cumulative distribution function (cdf) of T_i under H_i and t_i denotes the observed value of T_i. The transformation with the upper tail cdf brings all test statistics to a common scale, because each p-value is supported on [0, 1]. Small p-values are in favor of the corresponding alternatives.
Definition 1 (SUD Test of Order λ in Terms of p-Values, cf. [15]) Let p_{1:m} < p_{2:m} < … < p_{m:m} denote the ordered p-values for a multiple test problem. For a tuning parameter λ ∈ {1, …, m} an SUD test φ = (φ_1, …, φ_m) (say) of order λ based on some critical values α_{1:m} ≤ … ≤ α_{m:m} is defined as follows. If p_{λ:m} ≤ α_{λ:m}, set j* = max{ j ∈ {λ, …, m} : p_{i:m} ≤ α_{i:m} for all i ∈ {λ, …, j} }, whereas for p_{λ:m} > α_{λ:m}, put j* = sup{ j ∈ {1, …, λ − 1} : p_{j:m} ≤ α_{j:m} } (sup ∅ = −∞). Define φ_i = 1 if p_i ≤ α_{j*:m} and φ_i = 0 otherwise (α_{−∞:m} = −∞).
An SUD test of order λ = 1 or λ = m, respectively, is called a step-down (SD) or step-up (SU) test, respectively. If all critical values are identical, we obtain a single-step test.
In connection with control of the FWER, SD tests play a pivotal role, because they can often be considered a shortcut of a closed test procedure, cf. [33]. For example, the famous SD procedure of Holm [24] employing critical values α_{i:m} = α/(m − i + 1), 1 ≤ i ≤ m, is, under the assumption of a complete system of hypotheses, a shortcut of the closed Bonferroni test (see, for instance, [49]) and hence controls the FWER at level α.
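For concreteness, a compact sketch of Holm's step-down procedure with the critical values α_{i:m} = α/(m − i + 1):

```python
import numpy as np

def holm(p_values, alpha=0.05):
    """Holm's step-down test: step through the ordered p-values, comparing the
    i-th smallest with alpha/(m - i + 1), and stop at the first failure."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    crit = alpha / (m - np.arange(m))            # alpha/m, alpha/(m-1), ..., alpha
    passed = p[order] <= crit
    n_reject = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

# Example: holm([0.001, 0.02, 0.04, 0.30]) rejects only the first hypothesis at alpha = 0.05.
```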
In order to compare concurring multiple test procedures, also a type II error measure or, equivalently, a notion of power is required under the multiple testing framework. To this end, following Definition 1.4 of [8], we define I_1 ≡ I_1(ϑ) = I \ I_0, m_1 = |I_1|, S(φ) = Σ_{i∈I_1} φ_i and refer to the expected proportion of correctly detected alternatives, i.e., power_ϑ(φ) = E_ϑ[S(φ)/max(m_1, 1)], as the multiple power of φ under ϑ, see also [34]. If the structure of φ is such that φ_i = 1_{p_i ≤ t*} for a common, possibly data-dependent threshold t*, then the multiple power of φ is increasing in t*. For SUD tests, this entails that index-wise larger critical values lead to higher multiple power.
Gain in multiple power under the constraint of FWER control is only possible if certain structural assumptions for the joint distribution of (p_1, …, p_m)^⊤ or, equivalently, (T_1, …, T_m)^⊤ can be established, cf. Example 3.1 in [10]. In particular, positive dependency among p_1, …, p_m in the sense of MTP_2 (see [29]) or PRDS (see [2]) allows for enlarging the critical values (α_{i:m})_{1≤i≤m}. To give a specific example, it was proved in [45] that the critical values α_{i:m} = iα/m, 1 ≤ i ≤ m, can be used as the basis for an FWER-controlling closed test procedure, provided that the joint distribution of p-values is MTP_2. These critical values have originally been proposed in [48] in connection with a global test for the intersection hypothesis H_0 = ∩_{i=1}^{m} H_i and are therefore often referred to as Simes' critical values. In [25] a shortcut for the aforementioned closed test procedure based on Simes' critical values was worked out; we will refer to this multiple test as φ^Hommel in the remainder of this work.
Simes’ critical values also play an important role in connection with control of
the FDR. The FDR is a relaxed type I error measure suitable for large systems of
hypotheses. Formally, it is defined as FDR# .'/ D E# ŒFDP.'/, where FDP.'/ D
V.'/= max.R.'/; 1/ with R.'/ D V.'/ C S.'/ denoting the total number of
rejections of ' under #. The random variable FDP.'/ is called the false discovery
proportion. The meanwhile classical linear step-up test from [1], ' LSU (say), is an
SU test with Simes’ critical values. Under joint independence of all p-values, it
provides FDR-control at (exact) level m0 ˛=m, where m0 D mm1 , see, for instance,
[14]. In [2, 46] it was independently proved that FDR# .' LSU / m0 .#/˛=m for
all # 2 if the joint distribution of . p1 ; : : : ; pm /> is PRDS on I0 (notice that
MTP2 implies PRDS on any subset). The multiple test ' LSU is the by far most
popular multiple test for FDR control and is occasionally even referred to as the
FDR procedure in the literature.
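A matching sketch of the linear step-up test φ^LSU with Simes' critical values iα/m:

```python
import numpy as np

def linear_step_up(p_values, alpha=0.05):
    """Benjamini-Hochberg linear step-up test: find the largest k with
    p_(k) <= k*alpha/m and reject the hypotheses with the k smallest p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= np.arange(1, m + 1) * alpha / m)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[:passed[-1] + 1]] = True
    return reject
```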
Asymptotically, the vectors of test statistics that are appropriate for testing the hypotheses we are considering in the present work follow under H_0 a multivariate chi-square distribution in the sense of the following definition.
Definition 2 Let m ≥ 2 and ν = (ν_1, …, ν_m)^⊤ be a vector of positive integers. Let (Z_{1,1}, …, Z_{1,ν_1}, Z_{2,1}, …, Z_{2,ν_2}, …, Z_{m,1}, …, Z_{m,ν_m}) denote Σ_{k=1}^{m} ν_k jointly normally distributed random variables with joint correlation matrix R = (ρ(Z_{k_1,ℓ_1}, Z_{k_2,ℓ_2}) : 1 ≤ k_1, k_2 ≤ m, 1 ≤ ℓ_1 ≤ ν_{k_1}, 1 ≤ ℓ_2 ≤ ν_{k_2}) such that for any 1 ≤ k ≤ m the random vector Z_k = (Z_{k,1}, …, Z_{k,ν_k})^⊤ has a standard normal distribution on R^{ν_k}. Let Q = (Q_1, …, Q_m)^⊤, where

Q_k = \sum_{ℓ=1}^{ν_k} Z_{k,ℓ}^2 \quad \text{for all } 1 ≤ k ≤ m.   (3)
= \sum_{i=1}^{ν_1} \sum_{j=1}^{ν_2} \mathrm{Cov}(Z_{1,i}^2, Z_{2,j}^2) = 2 \sum_{i=1}^{ν_1} \sum_{j=1}^{ν_2} ρ^2(Z_{1,i}, Z_{2,j}) ≥ 0.

The upper bound in (4) follows directly from the Cauchy–Schwarz inequality, because the variance of a chi-square distributed random variable with ν degrees of freedom equals 2ν.
In view of the applicability of multiple test procedures for positively dependent test statistics that we have discussed in Sect. 2.1, Lemma 1 points into the right direction. However, as outlined in the introduction, the MTP_2 property for multivariate chi-square or, more generally, multivariate gamma distributions could up to now only be proved for special cases as, for example, exchangeable gamma variates (cf. Example 3.5 in [29], see also [47] for applications of this type of multivariate gamma distributions in multiple hypothesis testing). Therefore and especially in view of the immense popularity of φ^LSU we conducted an extensive simulation study of FWER and FDR control of multiple tests suitable under MTP_2 (or PRDS) in the case that the vector of test statistics follows a multivariate chi-square distribution in the sense of Definition 2. Specifically, we investigated the shortcut test φ^Hommel for control of the FWER and the linear step-up test φ^LSU for control of the FDR and considered the following correlation structures among the variates (Z_{k,ℓ} : 1 ≤ k ≤ m) for any given 1 ≤ ℓ ≤ max{ν_k : 1 ≤ k ≤ m}. (Since only the coefficients of determination enter the correlation structure of the resulting chi-square variates, we restricted our attention to positive correlation coefficients among the Z_{k,ℓ}.)
1. Autoregressive, AR(1): ρ_{ij} = ρ^{|i−j|}, ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}.
2. Compound symmetry (CS): ρ_{ij} = ρ + (1 − ρ) 1{i = j}, ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}.
3. Toeplitz: ρ_{ij} = ρ_{|i−j|+1}, with ρ_1 ≡ 1 and ρ_2, …, ρ_m randomly drawn from the interval [0.1, 0.9].
4. Unstructured (UN): The ρ_{ij} are elements of a normalized realization of a Wishart-distributed random matrix with m degrees of freedom and diagonal expectation. The diagonal elements were randomly drawn from [0.1, 0.9]^m.
In all four cases, we have ρ_{ij} = Cov(Z_{i,ℓ}, Z_{j,ℓ}), 1 ≤ i, j ≤ m_ℓ, where m_ℓ = |{1 ≤ k ≤ m : ν_k ≥ ℓ}|. The marginal degrees of freedom (ν_k : 1 ≤ k ≤ m) have been drawn randomly from the set {1, 2, …, 100} for every simulation setup. In this, we chose decreasing sampling probabilities of the form c/(ν(ν + 1)), 1 ≤ ν ≤ 100, where c denotes the norming constant, because we were most interested in the small-scale behavior of φ^Hommel and φ^LSU under dependency. For the number of marginal test statistics, we considered m ∈ {2, 5, 10, 50, 100}, and for each such m several values for the number m_0 of true hypotheses. For all false hypotheses, we set the corresponding p-values to zero, because the resulting so-called Dirac-uniform configurations are assumed to be least favorable for φ^Hommel and φ^LSU, see, for instance, [4, 14]. For every simulation setup, we performed M = 1000 Monte Carlo repetitions of the respective multiple test procedures and estimated the FWER or FDR, respectively, by relative frequencies or means, respectively. Our simulation results are provided in the appendix of Dickhaus [7].
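For illustration, the following sketch generates vectors Q whose components are marginally chi-square and whose underlying normal variates carry the AR(1) correlation, in the spirit of Definition 2; treating the layers ℓ as mutually independent, as well as the particular m, ν, and ρ, are assumptions made for this demonstration.

```python
import numpy as np

def multivariate_chi_square(m, nu, rho, n_rep, rng):
    """Draw n_rep replicates of Q = (Q_1, ..., Q_m), where Q_k is a sum of nu_k
    squared standard normals and, within each layer l, the variates
    Z_{1,l}, ..., Z_{m,l} have AR(1) correlation rho^{|i-j|} (Definition 2).
    Different layers are generated independently (an assumption)."""
    nu = np.asarray(nu)
    corr = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    chol = np.linalg.cholesky(corr)
    q = np.zeros((n_rep, m))
    for layer in range(nu.max()):
        z = rng.standard_normal((n_rep, m)) @ chol.T     # correlated N(0,1) rows
        active = nu > layer                              # components still needing summands
        q[:, active] += z[:, active] ** 2
    return q

rng = np.random.default_rng(0)
Q = multivariate_chi_square(m=5, nu=[3, 7, 2, 10, 4], rho=0.5, n_rep=10_000, rng=rng)
# Marginal p-values p_k = 1 - F_k(Q_k) (chi-square cdf with nu_k degrees of freedom,
# e.g. scipy.stats.chi2.cdf) can then be fed into the multiple tests sketched above.
```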
To summarize the findings, φ^Hommel behaved remarkably well over the entire range of simulation setups. Only in a few cases did it violate the target FWER level slightly, but one has to keep in mind that Dirac-uniform configurations correspond to extreme deviations from the null hypotheses which are not expected to be encountered in practical applications. In line with the results in [2, 46], φ^LSU controlled the FDR well at level m_0 α/m (compare with the bound reported at the end of Sect. 2). One could try to diminish the resulting conservativity for small values of m_0 either by pre-estimating m_0 and plugging the estimated value m̂_0 into the nominal level, i.e., replacing α by mα/m̂_0, or by employing other sets of critical values. For instance, in [14, 15] nonlinear critical values were developed, with the aim of full exhaustion of the FDR level for any value of m_0 under Dirac-uniform configurations. However, both strategies are up to now only guaranteed to work well under the assumption of independent p-values and it would need deeper investigations of their validity under positive dependence. Here, we can at least report that we have no indications that φ^LSU may not keep the FDR level under our framework, militating in favor of applying this test for FDR control in the applications that we will consider in Sect. 3.
Remark 1 A different way to tackle the aforementioned problem of lacking higher-order dependency properties is not to rely on the asymptotic distribution Q ∼ χ²(m, ν, R) (where R is unspecified), but to approximate the finite-sample distribution of the test statistics, for example by means of appropriate resampling schemes. Resampling-based SD tests for FWER control have been worked out in [42, 43, 52]. Resampling-based FDR control can be achieved by applying the methods from [53, 56] or [44], among others. We will return to resampling-based multiple testing in the context of DFMs in Sect. 4.
X̃(ω_j) = (2πT)^{-1/2} \sum_{t=1}^{T} X(t) \exp(-i t ω_j), \qquad ω_j = 2πj/T, \quad -T/2 < j ≤ ⌊T/2⌋.
where A(j) ∈ R^{p×p} and the process (e_t)_t is uncorrelated white noise, see [22]. The representations of X in (6) and (7) justify the term "white noise factor score model" (WNFS) which has been used, for instance, in [35].
Throughout the remainder, we denote convergence in distribution by →^D.
Theorem 1 Suppose that Assumption 1 and one of the following two conditions hold true:
(a) Assumption 2 is fulfilled.
(b) Assumption 3 holds and the A(j) in the representation (7) fulfill

\sum_{j=0}^{\infty} \| A(j) \| < \infty.   (8)
((X̃(ω_{j,b}))_{1≤j≤n_b}, 0, 0, …) →^D (Z_{j,b})_{j∈N}, \quad as min(n_b(T), T) → ∞,   (9)

where the left-hand side of (9) denotes the natural embedding of (X̃(ω_{j,b}))_{1≤j≤n_b} into (R^p)^N and (Z_{j,b})_{j∈N} is a sequence of independent random vectors, each of which follows a complex normal distribution with mean zero and covariance matrix S_X(ω^{(b)}).
Proof Following [3], p. 29 f., it suffices to show convergence of finite-dimensional margins. Recall that the indices j_u, 1 ≤ u ≤ n_b, are chosen in successive order of closeness of ω_{j,b} = 2πj_u/T to the center ω^{(b)}. Hence, under Assumptions 1 and 2, this convergence follows from Theorem 4.13 in [22] together with the continuous mapping theorem. In the other case, the convergence in (9) is a consequence of Theorem 3 in [23], again applied together with the continuous mapping theorem.
Remark 2
1. It is well known that (8) entails ergodicity of X.
2. Actually, Theorem 1 holds under slightly weaker conditions; see [23] for details. Moreover, in [38] the weak convergence of the finite Fourier transform X̃ has recently been studied under different assumptions.
3. While (7) or (6) may appear structurally simpler than (1), notice that the involved coefficient matrices A(j) have (potentially much) higher dimensionality than Λ(s) in (1).
4. In practice, it seems that the bands Ω_b as well as the numbers n_b have to be chosen adaptively. To avoid frequencies at the boundary of Ω_b, choosing n_b = o(T) seems appropriate.
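A sketch of computing the finite Fourier transform ordinates and of averaging the periodogram over the n_b ordinates closest to a band centre ω^(b) follows; the normalization matches the definition before Theorem 1, while the concrete band layout is left to the user, in line with point 4 of Remark 2.

```python
import numpy as np

def fourier_ordinates(x):
    """Finite Fourier transform X~(omega_j) of a (T x p) series, with
    omega_j = 2*pi*j/T for -T/2 < j <= floor(T/2)."""
    T = x.shape[0]
    j = np.arange(-(T // 2) + (T % 2 == 0), T // 2 + 1)
    omega = 2.0 * np.pi * j / T
    t = np.arange(1, T + 1)
    basis = np.exp(-1j * np.outer(omega, t))              # shape (len(j), T)
    return omega, (basis @ x) / np.sqrt(2.0 * np.pi * T)

def band_spectral_matrix(omega, xf, center, n_b):
    """Average the periodogram over the n_b ordinates closest to omega^(b)."""
    idx = np.argsort(np.abs(omega - center))[:n_b]
    band = xf[idx]                                        # (n_b, p) complex
    return np.einsum('jp,jq->pq', band, band.conj()) / n_b
```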
Let the parameter vector ϑ_b contain all d = 2pk + k² + p distinct parameters in Λ̃(ω^{(b)}), S_f(ω^{(b)}) and S_ε(ω^{(b)}), where each of the (in general) complex elements in Λ̃(ω^{(b)}) and S_f(ω^{(b)}) is represented by a pair of real components in ϑ_b, corresponding to its real part and its imaginary part. The full model dimension is consequently equal to Bd. For convenience and in view of Lemma 2, we write with slight abuse of notation ϑ_b = vech(S_X(ω^{(b)})), and ivech(ϑ_b) = S_X(ω^{(b)}). The above results motivate to study the (local) likelihood function of the parameter ϑ_b for a given realization X = x of the process (from which we calculate X̃ = x̃). In frequency band Ω_b, it is given by

ℓ_b(ϑ_b; x) = π^{-p n_b} |ivech(ϑ_b)|^{-n_b} \exp\left( - \sum_{j=1}^{n_b} x̃(ω_{j,b})' \, ivech(ϑ_b)^{-1} \, x̃(ω_{j,b}) \right);
see [20]. Optimization of the B local (log-)likelihood functions requires to solve a system of d nonlinear (in the parameters contained in ϑ_b) equations involving the averaged periodogram matrix

S = (n_b)^{-1} \sum_{j=1}^{n_b} x̃(ω_{j,b})\, x̃(ω_{j,b})'.
To this end, the algorithm originally developed in [28] for static factor models can be used (where formally covariance matrices are replaced by spectral density matrices, cf. [19], and complex numbers are represented by two-dimensional vectors in each optimization step). The algorithm delivers not only the numerical value of the MLE ϑ̂_b, but additionally an estimate V̂_b of the covariance matrix V_b (say) of √n_b ϑ̂_b. In view of Theorem 1 and standard results from likelihood theory (cf., e.g., Sect. 12.4 in [32]) concerning asymptotic normality of MLEs, it appears reasonable to assume that

√n_b (ϑ̂_b − ϑ_b) →^D T_b ∼ N_d(0, V_b), \quad 1 ≤ b ≤ B,   (10)

as min(n_b(T), T) → ∞, where the multivariate normal limit random vectors T_b are independent for 1 ≤ b ≤ B, and that V̂_b is a consistent estimator of V_b, which we will assume throughout the remainder. This, in connection with the fact that the vectors ϑ̂_b, 1 ≤ b ≤ B, are asymptotically jointly uncorrelated with each other, is very helpful for testing linear (point) hypotheses. Such hypotheses are of the form H: Cϑ = ξ with a contrast matrix C ∈ R^{r×Bd}, ξ ∈ R^r and ϑ consisting of all elements of all the vectors ϑ_b. In [19] the usage of Wald statistics has been proposed in this context. The Wald statistic for testing H is given by
3. For all 1 ≤ b ≤ B, calculate ϑ̂*_b = n_b^{-1} Σ_{j=1}^{n_b} Z*_{j,b} and V̂*_b = n_b^{-1} Σ_{j=1}^{n_b} (Z*_{j,b} − ϑ̂*_b)(Z*_{j,b} − ϑ̂*_b)^⊤.
4. Calculate W* = N (ϑ̂* − ϑ̂)^⊤ C^⊤ (C V̂* C^⊤)^+ C (ϑ̂* − ϑ̂), where ϑ̂* and V̂* are constructed in analogy to ϑ̂ and V̂.
5. Repeat steps 2–4 M times to obtain M pseudo replicates of W* and approximate the distribution of W by the empirical distribution of these pseudo replicates.
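Steps 1 and 2 of the algorithm are not reproduced in this excerpt. The sketch below therefore assumes that step 2 draws, for each band b, n_b i.i.d. pseudo-observations Z*_{j,b} from N_d(ϑ̂_b, V̂_b), consistent with the conditional mean and covariance used in the proof of Theorem 3, and it takes N in step 4 to be the total number of Fourier ordinates; both are assumptions made for illustration.

```python
import numpy as np

def resample_wald(theta_hat, V_hat, n_b, C, M, rng):
    """Model-based resampling of the Wald statistic (steps 2-5 of the algorithm).

    theta_hat : list of the B band-wise MLEs theta_hat_b (each of length d)
    V_hat     : list of the B estimated covariance matrices V_hat_b (d x d)
    n_b       : list of the numbers of Fourier ordinates per band
    C         : contrast matrix acting on the stacked parameter vector
    Assumed step 2: Z*_{j,b} drawn i.i.d. from N_d(theta_hat_b, V_hat_b).
    """
    B, d = len(theta_hat), len(theta_hat[0])
    theta = np.concatenate(theta_hat)
    N = sum(n_b)                                   # assumption: total number of ordinates
    w_star = np.empty(M)
    for r in range(M):
        theta_star = np.empty(B * d)
        V_star = np.zeros((B * d, B * d))          # block diagonal: bands treated as independent
        for b in range(B):
            z = rng.multivariate_normal(theta_hat[b], V_hat[b], size=n_b[b])
            theta_star[b * d:(b + 1) * d] = z.mean(axis=0)                        # step 3
            V_star[b * d:(b + 1) * d, b * d:(b + 1) * d] = np.cov(z, rowvar=False, bias=True)
        diff = C @ (theta_star - theta)
        w_star[r] = N * diff @ np.linalg.pinv(C @ V_star @ C.T) @ diff            # step 4
    return w_star                                   # step 5: empirical distribution of W*
```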
The heuristic justification for this algorithm is as follows. Due to Theorem 1 and the discussion around (10), it is appropriate to approximate the distribution of the MLE in band Ω_b by means of Z*_{1,b}, …, Z*_{n_b,b}. Moreover, to capture the structure of W, we build the MLEs ϑ̂* and V̂* of the mean and the covariance matrix, respectively, also in this resampling model. Furthermore, for finite sample sizes it seems more suitable to approximate the distribution of the quadratic form W by a statistic of the same structure. Throughout the remainder, we denote convergence in probability by →^p.
Theorem 3 Under the assumptions of Theorem 2, it holds

sup_{w∈R} | Prob(W* ≤ w | X) − Prob(W ≤ w | H) | →^p 0.   (12)

E(Z*_{1,b} | X) = ϑ̂_b →^p ϑ_b \quad and \quad Var(Z*_{1,b} | X) = V̂_b →^p V_b.
Moreover, for each fixed 1 ≤ b ≤ B and fixed data X, the sequence of random vectors (Z*_{j,b})_j is row-wise i.i.d. with lim sup E(‖Z*_{1,b}‖⁴ | X) < ∞ almost surely. Hence an application of Lyapunov's multivariate central limit theorem together with Slutzky's theorem implies conditional convergence in distribution given the data X in the sense that

d_d( L(√n_b (ϑ̂*_b − ϑ̂_b) | X), L(T_b) ) →^p 0

for all 1 ≤ b ≤ B, where L(T_b) = N_d(0, V_b). Note that, as usual for resampling mechanisms, the weak convergence originates from the randomness of the bootstrap procedure given X, whereas the convergence in probability arises from the sample X. We can now proceed similarly to the proof of Theorem 1 in [37]. Since the
random vectors √n_b (ϑ̂*_b − ϑ̂_b) are also independent within 1 ≤ b ≤ B given the data, the appearing multivariate normal limit vectors T_b, 1 ≤ b ≤ B, are independent as well. Together with the continuous mapping theorem this shows that the conditional distribution of √N C(ϑ̂* − ϑ̂) given X converges weakly to a multivariate normal distribution with mean zero and covariance matrix CVC^⊤ in probability:

d_r( L(√N C(ϑ̂* − ϑ̂) | X), N_r(0, CVC^⊤) ) →^p 0.

Furthermore, the weak law of large numbers for triangular arrays implies V̂*_b − V̂_b →^p 0. Since all V_b, 1 ≤ b ≤ B, are positive definite, we finally have det(V̂_b) > 0 almost surely and therefore also det(V̂*_b) > 0 finally almost surely. This, together with the continuous mapping theorem, implies convergence in probability of the Moore–Penrose inverses, i.e.,

(C V̂* C^⊤)^+ →^p (C V C^⊤)^+.

Thus another application of the continuous mapping theorem together with Theorem 9.2.2 in [40] shows conditional weak convergence of W* given X to L(W | H), the distribution of W under H: Cϑ = ξ, in probability, i.e.,

d_1( L(W* | X), L(W | H) ) →^p 0.

The final result is then a consequence of the Helly–Bray theorem and Polya's uniform convergence theorem, since the cdf of W is continuous.
Remark 3
1. Notice that the conditional distribution of W* always approximates the null distribution of W, even if H does not hold true.
2. In view of applications to multiple test problems involving a vector W = (W_1, …, W_m)^⊤ as in Problem 1 (m = p) and Problem 2 (m = pk), our resampling approach can be applied as follows. The vector W can be written as a continuous function g (say) of Cϑ̂ and V̂. Note that the proof of Theorem 3 shows that C(ϑ̂* − ϑ̂) always approximates the distribution of C(ϑ̂ − ϑ) and V̂* − V̂ converges to zero in probability. Thus, we can approximate the distribution of W = g(Cϑ̂, V̂) under H_0 by W* = g(C(ϑ̂* − ϑ̂), V̂*). Slutzky's theorem, together with the continuous mapping theorem, ensures that an analogous result to Theorem 3 applies for W*. This immediately implies that multiple test procedures for weak FWER control can be calibrated by the conditional distribution of W*. For strong control of the FWER and for FDR control, the resampling approach is valid under the so-called subset pivotality condition (SPC) introduced in [55]. Validity of the SPC heavily relies on the structure of the function g. For Problems 1 and 2, the SPC is fulfilled, because every W_i depends on mutually different coordinates of ϑ̂.
5 Discussion

First of all, we would like to mention that the multiple testing results with
respect to FWER control achieved in Sects. 2 and 3 also imply (approximate)
simultaneous confidence regions for the parameters of model (1) by virtue of the
extended correspondence theorem, see Section 4.1 of [12]. In such cases (in which
focus is on FWER control), a promising alternative method for constructing a
multiple test procedure is to deduce the limiting joint distribution of the vector
(Q_1, …, Q_m)^⊤ (say) of likelihood ratio statistics. For instance, one may follow the
derivations in [30] for the case of likelihood ratio statistics stemming from models
with independent and identically distributed observations. Once this limiting joint
distribution is obtained, simultaneous test procedures like the ones developed in [26]
are applicable.
Second, it may be interesting to assess the variance of the FDP in DFMs, too. For
example in [4, 13] it has been shown that this variance can be large in models with
dependent test statistics. Consequently, it has been questioned if it is appropriate
only to control the first moment of the FDP, because this does not imply a type I
error control guarantee for the actual experiment at hand. A maybe more convincing
concept in such cases is given by control of the false discovery exceedance, see [11]
for a good survey.
A topic relevant for economic applications is a numerical comparison of the
asymptotic multiple tests discussed in Sect. 2 and the bootstrap-based method
derived in Sect. 4. We will provide such a comparison in a companion paper.
Furthermore, one may ask to which extent the results in the present paper can be
transferred to more complicated models where factor loadings are modeled as a
function of covariates like in [36]. To this end, stochastic process techniques way
beyond the scope of our setup are required. A first step may be the consideration
of parametric models in which conditioning on the design matrix will lead to our
framework.
Finally, another relevant multiple test problem in DFMs is to test for cross-
sectional correlations between specific factors. While the respective test problems
can be formalized by linear contrasts in analogy to Lemmas 3 and 4, they cannot
straightforwardly be addressed under our likelihood-based framework, because the
computation of the MLE by means of the system of normal equations discussed in
Sect. 3 heavily relies on the general assumption of cross-sectionally uncorrelated
error terms. Addressing this multiple test problem is therefore devoted to future
research.
Acknowledgements The authors are grateful to Prof. Manfred Deistler for valuable comments
regarding Problem 1. Special thanks are due to the organizers of the International work-conference
on Time Series (ITISE 2015) for the successful meeting.
References
1. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful
approach to multiple testing. J. R. Stat. Soc. Ser. B 57(1), 289–300 (1995)
2. Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under
dependency. Ann. Stat. 29(4), 1165–1188 (2001)
3. Billingsley, P.: Convergence of Probability Measures. Wiley, New York, London, Sydney,
Toronto (1968)
4. Blanchard, G., Dickhaus, T., Roquain, E., Villers, F.: On least favorable configurations for
step-up-down tests. Stat. Sin. 24(1), 1–23 (2014)
5. Breitung, J., Eickmeier, S.: Dynamic factor models. Discussion Paper Series 1: Economic
Studies 38/2005. Deutsche Bundesbank (2005)
6. Chiba, M.: Likelihood-based specification tests for dynamic factor models. J. Jpn. Stat. Soc.
43(2), 91–125 (2013)
7. Dickhaus, T.: Simultaneous statistical inference in dynamic factor models. SFB 649 Discussion
Paper 2012-033, Sonderforschungsbereich 649, Humboldt Universität zu Berlin, Germany
(2012). Available at http://sfb649.wiwi.hu-berlin.de/papers/pdf/SFB649DP2012-033.pdf
8. Dickhaus, T.: Simultaneous Statistical Inference with Applications in the Life Sciences.
Springer, Berlin, Heidelberg (2014)
9. Dickhaus, T., Royen, T.: A survey on multivariate chi-square distributions and their applica-
tions in testing multiple hypotheses. Statistics 49(2), 427–454 (2015)
10. Dickhaus, T., Stange, J.: Multiple point hypothesis test problems and effective numbers of tests
for control of the family-wise error rate. Calcutta Stat. Assoc. Bull. 65(257–260), 123–144
(2013)
11. Farcomeni, A.: Generalized augmentation to control false discovery exceedance in multiple
testing. Scand. J. Stat. 36(3), 501–517 (2009)
12. Finner, H.: Testing Multiple Hypotheses: General Theory, Specific Problems, and Relation-
ships to Other Multiple Decision Procedures. Habilitationsschrift. Fachbereich IV, Universität
Trier, Germany (1994)
13. Finner, H., Dickhaus, T., Roters, M.: Dependency and false discovery rate: asymptotics. Ann.
Stat. 35(4), 1432–1455 (2007)
14. Finner, H., Dickhaus, T., Roters, M.: On the false discovery rate and an asymptotically optimal
rejection curve. Ann. Stat. 37(2), 596–618 (2009)
15. Finner, H., Gontscharuk, V., Dickhaus, T.: False discovery rate control of step-up-down tests
with special emphasis on the asymptotically optimal rejection curve. Scand. J. Stat. 39(2),
382–397 (2012)
16. Fiorentini, G., Sentana, E.: Dynamic specification tests for dynamic factor models. CEMFI
Working Paper No. 1306, Center for Monetary and Financial Studies (CEMFI), Madrid (2013)
17. Forni, M., Hallin, M., Lippi, M., Reichlin, L.: The generalized dynamic-factor model:
identification and estimation. Rev. Econ. Stat. 82(4), 540–554 (2000)
18. Forni, M., Hallin, M., Lippi, M., Reichlin, L.: Opening the black box: structural factor models
with large cross sections. Econ. Theory 25, 1319–1347 (2009)
19. Geweke, J.F., Singleton, K.J.: Maximum likelihood “confirmatory” factor analysis of economic
time series. Int. Econ. Rev. 22, 37–54 (1981)
20. Goodman, N.: Statistical analysis based on a certain multivariate complex Gaussian distribu-
tion. An introduction. Ann. Math. Stat. 34, 152–177 (1963)
21. Hallin, M., Lippi, M.: Factor models in high-dimensional time series - a time-domain approach.
Stoch. Process. Appl. 123(7), 2678–2695 (2013)
22. Hannan, E.J.: Multiple Time Series. Wiley, New York, London, Sydney (1970)
23. Hannan, E.J.: Central limit theorems for time series regression. Z. Wahrscheinlichkeitstheor.
Verw. Geb. 26, 157–170 (1973)
24. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. Theory Appl.
6, 65–70 (1979)
44 T. Dickhaus and M. Pauly
25. Hommel, G.: A stagewise rejective multiple test procedure based on a modified Bonferroni
test. Biometrika 75(2), 383–386 (1988)
26. Hothorn, T., Bretz, F., Westfall, P.: Simultaneous inference in general parametric models. Biom.
J. 50(3), 346–363 (2008)
27. Jensen, D.R.: A generalization of the multivariate Rayleigh distribution. Sankhyā, Ser. A 32(2),
193–208 (1970)
28. Jöreskog, K.G.: A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika 34(2), 183–202 (1969)
29. Karlin, S., Rinott, Y.: Classes of orderings of measures and related correlation inequalities. I.
Multivariate totally positive distributions. J. Multivar. Anal. 10, 467–498 (1980)
30. Katayama, N.: Portmanteau likelihood ratio tests for model selection. Discussion Paper Series
2008–1. Faculty of Economics, Kyushu University (2008)
31. Konietschke, F., Bathke, A.C., Harrar, S.W., Pauly, M.: Parametric and nonparametric bootstrap
methods for general MANOVA. J. Multivar. Anal. 140, 291–301 (2015)
32. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer Texts in Statistics, 3rd
edn. Springer, New York (2005)
33. Marcus, R., Peritz, E., Gabriel, K.: On closed testing procedures with special reference to
ordered analysis of variance. Biometrika 63, 655–660 (1976)
34. Maurer, W., Mellein, B.: On new multiple tests based on independent p-values and the
assessment of their power. In: Bauer, P., Hommel, G., Sonnemann, E. (eds.) Multiple
Hypothesenprüfung - Multiple Hypotheses Testing. Symposium Gerolstein 1987, pp. 48–66.
Springer, Berlin (1988). Medizinische Informatik und Statistik 70
35. Nesselroade, J.R., McArdle, J.J., Aggen, S.H., Meyers, J.M.: Dynamic factor analysis models
for representing process in multivariate time-series. In: Moskowitz, D.S., Hershberger, S.L.
(eds.) Modeling Intraindividual Variability with Repeated Measures Data: Methods and
Applications, Chap. 9. Lawrence Erlbaum Associates, New Jersey (2009)
36. Park, B.U., Mammen, E., Härdle, W., Borak, S.: Time series modelling with semiparametric
factor dynamics. J. Am. Stat. Assoc. 104(485), 284–298 (2009)
37. Pauly, M., Brunner, E., Konietschke, F.: Asymptotic permutation tests in general factorial
designs. J. R. Stat. Soc., Ser. B. 77(2), 461–473 (2015)
38. Peligrad, M., Wu, W.B.: Central limit theorem for Fourier transforms of stationary processes.
Ann. Probab. 38(5), 2009–2022 (2010)
39. Peña, D., Box, G.E.: Identifying a simplifying structure in time series. J. Am. Stat. Assoc. 82,
836–843 (1987)
40. Rao, C.R., Mitra, S.K.: Generalized Inverse of Matrices and Its Applications. Wiley, New York,
London, Sydney (1971)
41. Robinson, P.: Automatic frequency domain inference on semiparametric and nonparametric
models. Econometrica 59(5), 1329–1363 (1991)
42. Romano, J.P., Wolf, M.: Exact and approximate stepdown methods for multiple hypothesis
testing. J. Am. Stat. Assoc. 100(469), 94–108 (2005)
43. Romano, J.P., Wolf, M.: Stepwise multiple testing as formalized data snooping. Econometrica
73(4), 1237–1282 (2005)
44. Romano, J.P., Shaikh, A.M., Wolf, M.: Control of the false discovery rate under dependence
using the bootstrap and subsampling. TEST 17(3), 417–442 (2008)
45. Sarkar, S.K.: Some probability inequalities for ordered MTP2 random variables: a proof of the
Simes conjecture. Ann. Stat. 26(2), 494–504 (1998)
46. Sarkar, S.K.: Some results on false discovery rate in stepwise multiple testing procedures. Ann.
Stat. 30(1), 239–257 (2002)
47. Sarkar, S.K., Chang, C.K.: The Simes method for multiple hypothesis testing with positively
dependent test statistics. J. Am. Stat. Assoc. 92(440), 1601–1608 (1997)
48. Simes, R.: An improved Bonferroni procedure for multiple tests of significance. Biometrika
73, 751–754 (1986)
SSI in DFMs 45
49. Sonnemann, E.: General solutions to multiple testing problems. Translation of “Sonnemann, E.
(1982). Allgemeine Lösungen multipler Testprobleme. EDV in Medizin und Biologie 13(4),
120–128”. Biom. J. 50, 641–656 (2008)
50. Tamhane, A.C., Liu, W., Dunnett, C.W.: A generalized step-up-down multiple test procedure.
Can. J. Stat. 26(2), 353–363 (1998)
51. Timm, N.H.: Applied Multivariate Analysis. Springer, New York (2002)
52. Troendle, J.F.: A stepwise resampling method of multiple hypothesis testing. J. Am. Stat.
Assoc. 90(429), 370–378 (1995)
53. Troendle, J.F.: Stepwise normal theory multiple test procedures controlling the false discovery
rate. J. Stat. Plann. Inference 84(1–2), 139–158 (2000)
54. van Bömmel, A., Song, S., Majer, P., Mohr, P.N.C., Heekeren, H.R., Härdle, W.K.: Risk
patterns and correlated brain activities. Multidimensional statistical analysis of fMRI data in
economic decision making study. Psychometrika 79(3), 489–514 (2014)
55. Westfall, P.H., Young, S.S.: Resampling-Based Multiple Testing: Examples and Methods
for p-Value Adjustment. Wiley Series in Probability and Mathematical Statistics. Applied
Probability and Statistics. Wiley, New York (1993)
56. Yekutieli, D., Benjamini, Y.: Resampling-based false discovery rate controlling multiple test
procedures for correlated test statistics. J. Stat. Plann. Inference 82(1–2), 171–196 (1999)
The Relationship Between the Beveridge–Nelson
Decomposition and Exponential Smoothing
Víctor Gómez
1 Introduction
In this chapter, two partial fraction expansions of an ARIMA model are described.
They are based on what is known in electrical engineering as parallel decom-
positions of rational transfer functions of digital filters. The first decomposition
coincides with the one proposed by Beveridge and Nelson [2], henceforth BN, that
has attracted considerable attention in the applied macroeconomics literature, or
is a generalization of it to seasonal models. The second one corresponds to the
innovations form of the BN decomposition.
The two decompositions are analyzed using both state space and polynomial
methods. It is shown that most of the usual additive exponential smoothing models
are in fact BN decompositions of ARIMA models expressed in innovations form.
This fact seems to have passed unnoticed in the literature, although the link between
single source of error (SSOE) state space models and exponential smoothing has
been recognized [6] and used for some time (see, for example, [3]). It is also shown
that these SSOE models are in fact innovations state space models corresponding to
the BN decomposition that defines the model.
The remainder of the chapter is organized as follows. In Sect. 2, the two parallel
decompositions of an ARIMA model are presented. These two decompositions
V. Gómez ()
Ministry of Finance and P.A., Dirección Gral. de Presupuestos, Subdirección Gral. de Análisis y
P.E., Alberto Alcocer 2, 1-P, D-34, 28046 Madrid, Spain
e-mail: [email protected]
© Springer International Publishing Switzerland 2016 47
I. Rojas, H. Pomares (eds.), Time Series Analysis and Forecasting, Contributions
to Statistics, DOI 10.1007/978-3-319-28725-6_4
48 V. Gómez
are analyzed using polynomial and state space methods. The connection with
exponential smoothing is studied.
where is the mean of the differenced series, B is the backshift operator, Byt =
yt1 , n is the number of seasons, d = 0; 1; 2, D = 0; 1, r = 1 B is a regular
difference, and rn = 1 Bn is a seasonal difference. Our aim in this section is
to decompose model (1) using two partial fraction expansions. This can be done
using both polynomial and state space methods. However, it is to be emphasized
that the developments of this and the following section are valid for any kind of
ARIMA model, multiplicative seasonal or not. This means that we may consider
general models with complex patterns of seasonality, like for example
" #
Y
N
2 ni 1
.B/ r d
.1 C B C B C C B /yt D .B/at ; (2)
iD1
where n1 ; : : : ; nN denote the seasonal periods and the polynomials .z/ and .z/
have all their roots outside the unit circle but are otherwise unrestricted, or even
models with seasonal patterns with non-integer periods.
The following lemma, which we give without proof, is an immediate conse-
quence of the partial fraction expansion studied in algebra and will be useful later.
See also Lemma 1 of [4, p. 528] and the results contained therein.
Lemma 1 Let the general ARIMA model r d .B/yt = .B/at , where the roots of
.z/ are simple and on or outside of the unit circle and .z/ and .z/ have degrees
p and q, respectively. Then, the following partial fraction decomposition holds
.z/ B1 Bd Xp
Ak
D C0 CC1 zC CCqpd zqpd
C C C C :
.1 z/ .z/
d 1z .1 z/ kD1 1 pk z
d
(3)
Relationship Between the BN Decomposition and Exponential Smoothing 49
.z/ X Bk d X 1
Ak
m
D C0 C C1 z C C Cqpd zqpd C C
.z/ kD1
.1 z/ k
kD1
1 pk z
X
m2
Dk C E k z
C ;
kD1
1 C Fk z C Gk z2
It is shown in [4] that a partial fraction expansion of model (1) leads to the BN
decomposition in all cases usually considered in the literature and that, therefore, we
can take this expansion as the basis to define the BN decomposition for any ARIMA
model. To further justify this approach, consider that in the usual BN decomposition,
yt = pt C ct , models for the components are obtained that are driven by the same
innovations of the series. Thus, if the model for the series is .B/yt = .B/at , the
models for the components are of the form p .B/pt = p at and c .B/ct = c at . But
this implies the decomposition
and, since the denominator polynomials on the right-hand side have no roots in com-
mon, the previous decomposition coincides with the partial fraction decomposition
that is unique.
Assuming then that the parallel decomposition of the ARIMA model (1) is the
basis of the BN decomposition, suppose in (1) that p and P are the degrees of the
autoregressive polynomials, .B/ and ˚.Bn /, and q and Q are those of the moving
average polynomials, .B/ and .Bn /. Then, letting .B/ = .B/˚.Bn /, .B/ =
r d rnD , and .B/ = .B/.Bn /, supposing for simplicity that there is no mean in (1)
and using Lemma 1, the partial fraction expansion corresponding to model (1) is
where S.z/ = 1CzC Czn1 and we have used in (4) the fact that rn = .1B/S.B/.
Here, we have grouped for simplicity several terms in the expansion so that we are
only left with the components in (4). For example,
X
dCD
Bk ˛p .z/
D ;
kD1
.1 z/k .1 z/dCD
etc. Note that the third term on the right of (4) exists only if D > 0. The degrees
of the .z/, ˛p .z/, ˛s .z/, and ˛c .z/ polynomials in (4) are, respectively, maxf0; q
p d nDg, d 1, n 2, and p 1, where p =p C P, q = q C Q, and d =
d C D.
Based on the previous decomposition, we can define several components that
are driven by the same innovations, fat g. The assignment of the terms in (4) to the
different components depends on the roots of the autoregressive polynomials in (1).
For example, the factor .1 z/d , containing the root one, should be assigned to the
trend component, pt , since it corresponds to an infinite peak in the pseudospectrum
of the series at the zero frequency. Since all the roots of the polynomial S.z/
correspond to infinite peaks in the pseudospectrum at the seasonal frequencies, the
factor S.z/ should be assigned to the seasonal component, st .
The situation is not so clear-cut; however, as regards the roots of the autoregres-
sive polynomial, .z/˚.zn /, and in this case the assignment is more subjective. We
will consider for simplicity in the rest of the chapter only a third component, which
will be referred to as “stationary component,” ct . All the roots of .z/˚.zn / will
be assigned to this stationary component. Therefore, this component may include
cyclical and stationary trend and seasonal components.
According to the aforementioned considerations, the SSOE components model
yt D pt C st C ct (5)
can be defined, where pt is the trend, st is the seasonal, and ct is the stationary
component. The models for these components are given by
r d pt D ˛p .B/at ; S.B/st D ˛s .B/at ; .B/ct D .B/at ; (6)
where .z1 / = zp .z/ and the degrees of the polynomials ı.z1 /, ˇp .z1 /,
ˇs .z1 /, and ˇc .z1 / are, respectively, maxf0; r 1g, d 1, n 2, and p 1.
Transforming each of the terms of the right-hand side of (7) back to the z variable
yields
yt D at C ytjt1
D at C ptjt1 C stjt1 C ctjt1 ; (9)
where, given a random variable xt , xtjt1 denotes the orthogonal projection of xt onto
fys W s D t 1; t 2; : : :g. If the series is nonstationary, the orthogonal projection
is done onto the finite past of the series plus the initial conditions. The relationship
among the components pt , st , and ct and their predictors, ptjt1 , stjt1 , and ctjt1 ,
can be obtained by computing the decomposition of each component in the forward
operator. For example, if we take the model followed by the trend component given
in (6), r d pt = ˛p .B/at , we can write
zd ˛p .z/ ˇp .z1 /
D k p C ;
zd .1 z/d .z1 1/d
pt D kp at C ptjt1 : (11)
Therefore, ptjt1 follows the model r d ptjt1 D ˇp .B/at1 : In a similar way, we can
prove that there exist constants, k , ks , and kc such that
1 D k C kp C ks C kc : (12)
1
r4 yt D 1 B5 at : (13)
2
1 12 z5 1 1 1 3 1 1 1 12 z
D z C C C :
1 z4 2 81z 81Cz 2 1 C z2
Thus, defining
1 1 1 3 1 12 B 1 1
pt D at ; s1;t D at ; s2;t D at ; ct D at1 ; (14)
1B8 1CB8 1 C B2 2 2
and st = s1;t Cs2;t , the BN decomposition, yt = pt Cst Cct , is obtained. The innovations
form is given by the partial fraction decomposition of the model using the forward
operator, that is
z5 12 1 1 1 1 3 1 1 z1 C 2
D 1 C C
z1 .z4 1/ 2 z1 8 z1 1 8 z1 C 1 4 z2 C 1
1 1 z 3 z 1 z C 2z2
D 1C zC : (15)
2 8 1 z 8 1 C z 4 1 C z2
It follows from this that the innovations form is yt = at C ptjt1 C s1;tjt1 C s2;tjt1 C
ctjt1 , where
1 1 1 3
ptjt1 D at1 ; s1;tjt1 D at1 ;
1B8 1CB8
1 C 2B 1 1
s2;tjt1 D at1 ; ctjt1 D at1 :
1 C B2 4 2
In addition, the following relations hold
1 3 1
pt D at C ptjt1 ; s1;t D at C s1;tjt1 ; s2;t D at C s2;tjt1 ; ct D ctjt1 :
8 8 2
Relationship Between the BN Decomposition and Exponential Smoothing 53
There are many ways to put an ARIMA model into state space form. We will use
in this chapter the one proposed by Akaike [1]. If fyt g follows the ARIMA model
.B/yt = .B/at , where .z/ = 1 C 1 z C CP p zp and .z/ = 0 C 1 z C C q zq ,
let r = max. p; q C 1/, .z/ = 1 .z/.z/ = 1 jD0 j z and xt;1 = yt , xt;i = ytCi1
j
Pi2
jD0 j atCi1j , 2 i r. Then, the following state space representation holds
xt D Fxt1 C Kf at (16)
yt D Hxt ; (17)
where
2 3 2 3
0 1 0 0 0
6 0 0 1 0 7 6 1 7
6 7 6 7
6 :: :: :: : : :: 7 6 7
FD6 : : : : : 7 ; Kf D 6 ::: 7 ; (18)
6 7 6 7
4 0 0 0 1 5 4 r2 5
r r1 r2 1 r1
i D 0 if i > p, xt = Œxt;1 ; : : : ; xt;r 0 and H = Œ1; 0; : : : ; 0. Note that we are assuming
that 0 can be different from one, something that happens with the models for the
components in the BN decomposition. The representation (16)–(18) is not minimal
if q > p, but has the advantage that the first element of the state vector is yt (the other
elements of the state vector are the one to r 1 periods ahead forecasts of yt ). This
is particularly useful if yt is not observed, so that this representation is adequate
to put the BN decomposition into state space form. To see this, suppose that the
BN decomposition is yt = pt C st C ct , where fyt g follows the model (1) and the
components follow the models given by (6). Then, we can set up for each component
a state space representation of the form (16)–(18) so that, with an obvious notation,
we get the following representation for fyt g
2 3 2 32 32 3
xp;t Fp 0 0 xp;t1 Kf ;p
4 xs;t 5 D 4 0 Fs 0 5 4 xs;t1 5 4 Kf ;s 5 at (19)
xc;t 0 0 Fc xc;t1 Kf ;c
2 3
xp;t
yt D Hp Hs Hc 4 xs;t 5 ; (20)
xc;t
To obtain the innovations state space model corresponding to (16) and (17),
where xt = Œx0p;t ; x0s;t , x0c;t 0 , F = diag.Fp ; Fs ; Fc /, Kf = ŒKf0;p ; Kf0;s ; Kf0;c 0 and H = ŒHp ,
Hs , Hc satisfy (19) and (20), consider first that in terms of the matrices in (16)
and (17) the transfer function, .z/, of model (1) can be expressed as
.z/ D H .I Fz/1 Kf
D 1 C zH .I Fz/1 FKf (21)
HKf D 1 (22)
This relation is the state space equivalent to the polynomial relation (12). We also
get from (21) that
K D FKf ; (23)
and
D xtjt1 C Kf at :
where Fp D F KH has all its eigenvalues inside the unit circle. In fact, it can
be shown that the eigenvalues of Fp coincide with the inverses of the roots of the
moving average polynomial of the model, .z/, see, for example, [5, pp. 97–98].
As an example, we will use again model (13). According to the models (14), the
state space form (19) and (20) is
2 3 2 32 3 2 3
pt 1 0 0 0 00 pt1 1=8
6 7 6 0 1 0 00 07 6 7 6 7
6 s1;t 7 6 7 6 s1;t1 7 6 3=8 7
6 7 6 76 7 6 7
6 s2;t 7 60 0 0 1 0 0 7 6 s2;t1 7 6 1=2 7
6 7 D6 76 7C6 7 at (27)
6 s2;tC1jt 7 60 0 1 00 0 7 6 s2;tjt1 7 6 1=4 7
6 7 6 76 7 6 7
4 ct 5 40 0 0 0 0 1 5 4 ct1 5 4 0 5
ctC1jt 0 0 0 0 00 ctjt1 1=2
2 3
pt
6 s 7
6 1;t 7
6
6 s 7
7
yt D 1 1 1 0 1 0 6 2;t 7 (28)
6 s2;tC1jt 7
6 7
4 ct 5
ctC1jt
Note that the relation HKf = 1 holds and that the matrix Fp = F KH has all
its eigenvalues inside the unit circle. Note also that the last row in the transition
equation is zero and that, therefore, the last state can be eliminated from the state
space form. In this way, we would obtain a minimal innovations state space form.
Suppose we are given an innovations state space representation (24) and (25),
minimal or not, where xtjt1 = Œx0p;tjt1 ; x0s;tjt1 ; x0c;tjt1 0 , F = diag.Fp , Fs , Fc /, K =
ŒKp0 ; Ks0 ; Kc0 0 and H = ŒHp , Hs , Hc , and we want to obtain the BN decomposition,
56 V. Gómez
FKf D K
to get (16) and (17), where pt = Hp xp;t , st = Hs xs;t , and ct = Hc xc;t . If Fc is singular,
then defining
ct D ctjt1 C kc at ;
ct 0 Hc ct1 k
D C c at :
xc;tC1jt 0 Fc xc;tjt1 Kc
F a Kfa D K a ; H a Kfa D 1;
0
where F a = diag.Fp , Fs , Fca /, K a = ŒKp0 ; Ks0 , Kca 0 , H a = ŒHp , Hs , Hca , Kfa = ŒKf0;p ; Kf0;s ,
0
Kfa;c 0 and Kfa;c = Œkc ; Kf0;c 0 , we get (16), (17), where pt = Hp xp;t , st = Hs xs;t , xc;t = xac;t ,
and ct = Hca xc;t . Note that kc = 1 Hp Kf ;p Hs Kf ;s and that (16), (17) is not minimal
in this case.
Relationship Between the BN Decomposition and Exponential Smoothing 57
As an example, the reader can verify that if we start with (29) and (30), we
eliminate the last state in those equations, and we follow the previous procedure,
then we get (27) and (28).
There has been lately some interest in using generalized exponential smoothing
models for forecasting, see [3]. These models are SSOE models that once they
are put into state space form they become innovations model of the type we have
considered in earlier sections. The question then arises as to whether these models
have any connection with the models given by a parallel decomposition of an
ARIMA model. It turns out that many of the basic exponential smoothing models
coincide with those corresponding to the BN decomposition and in those cases,
mostly seasonal, where they do not coincide, the exponential smoothing models
have been shown to have some kind of problem that is solved if the models given by
the parallel decomposition are used instead. To see this, suppose first Holt’s linear
model, yt = pt1 C bt1 C at , where
and k1 and k2 are constants. If we substitute (32) into the expression for yt , it is
obtained that yt = pt C .1 k1 / at . In addition, it follows from (32) and (33) that
r 2 pt D k2 at1 C k1 rat :
˛p .B/
yt D C kc a t ; (34)
.1 B/2
k2 z C k1 .1 z/ k1 k2 k2
2
C kc D C C kc ;
.1 z/ 1z .1 z/2
r 2 yt D .1 C 1 B C 2 B2 /at ; (35)
58 V. Gómez
and k1 , k2 , and k3 are constants. There are apparently three unit roots in the model.
However, a closer look will reveal that there are in fact only two unit roots. To
see this, substitute (36) and (38) into the expression for yt to give yt = pt C st C
.1 k1 k3 / at . In addition, it follows from (36)–(38) that
r 2 pt D k2 at1 C k1 rat ; rn st D k3 at :
˛p .B/ ˛s .B/
yt D 2
C C kc a t ; (39)
.1 B/ .1 B/S.B/
where S.z/ = 1CzC Czn1 , ˛p .z/ = k2 zCk1 .1z/, ˛s .z/ = k3 , and kc = 1k1 k3 .
The partial fraction expansion of the polynomial in the backshift operator on the
right-hand side of (39) is
k2 z C k1 .1 z/ k3 k1 k2 k2 k3 =n ˇ.z/
2
C C kc D C 2
C C C kc ;
.1 z/ .1 z/S.z/ 1z .1 z/ 1z S.z/
where ˇ.z/ = .n 1/ C .n 2/z C C 2zn3 C zn2 k3 =n. Thus, if we define ct
= kc at , then yt = pt C st C ct , ptjt1 = pt1 C bt1 , stjt1 = stn , ctjt1 = 0, and
yt = ptjt1 C stjt1 C at . Therefore, Holt–Winters’ model is the innovations form
corresponding to the BN decomposition of an ARIMA model of the form
where .z/ is a polynomial of degree n C 1. However, the components are not well
defined because the seasonal component can be further decomposed as
k3 =n ˇ.B/
st D C at ;
1B S.B/
and we see that the first subcomponent should be assigned to the trend because
the denominator has a unit root. To remedy this problem, the seasonal component
Relationship Between the BN Decomposition and Exponential Smoothing 59
X
n1
st D sti C ˇ.B/at ;
iD1
Pn1
ptjt1 = pt1 C bt1 and stjt1 = iD1 sti . The model can be simplified if we
assume ˇ.B/ = k3 . Another possibility is to decompose the seasonal component
further according to the partial fraction expansion of its model,
Œn=2
ˇ.z/ X ki;1 C ki;2 z
D (41)
S.z/ iD1
1 2˛i z C z2
where Œx denotes the greatest integer less than or equal to x, ˛i = cos !i and !i =
2i=n is the ith seasonal frequency. If n is even, !n=2 = 2Œn=2=n = and the
corresponding term in the sum on the right-hand side of (41) collapses to kn=2 =
.1 C z/. This would lead us to a seasonal component of the form
Œn=2
X
st D si;t (42)
iD1
where we can assume ki;1 = k1 and ki;2 = k2 for parsimony. It is a consequence of the
partial fraction decomposition that there is a bijection between ARIMA models of
the form (40) and exponential smoothing models, yt = ptjt1 C stjt1 C at , where pt
and st are given by (36), (37), (42), and (43). A solution similar to (42) and (43) has
been suggested by De Livera et al. [3], where they propose for each component, sit ,
the model
It can be shown that both solutions are in fact equivalent. However, in [3,
pp. 1516, 1520] the expression yt = pt1 C bt1 C st1 C at is used. This implies
si;tjt1 = si;t1 in model (44), something that is incorrect. The correct expression
can be obtained using the method described in Sect. 2.2 and, more specifically,
formula (23).
60 V. Gómez
References
1. Akaike, H.: Markovian representation of stochastic processes and its application to the analysis
of autoregressive moving average processes. Ann. Inst. Stat. Math. 26, 363–387 (1974)
2. Beveridge, S., Nelson, C.R.: A new approach to decomposition of economic time series into
permanent and transitory components with particular attention to measurement of the ‘business
cycle’. J. Monet. Econ. 7, 151–174 (1981)
3. De Livera, A.M., Hyndman, R.J., Snyder, R.D.: Forecasting time series with complex seasonal
patterns using exponential smoothing. J. Am. Stat. Assoc. 106, 1513–1527 (2011)
4. Gómez, V., Breitung, J.: The Beveridge-Nelson decomposition: a different perspective with new
results. J. Time Ser. Anal. 20, 527–535 (1999)
5. Hannan, E.J., Deistler, M.: The Statistical Theory of Linear Systems. Wiley, New York (1988)
6. Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S.: A state space framework for automatic
forecasting using exponential smoothing methods. Int. J. Forecast. 18(3), 439–454 (2002)
7. Oppenheim, A.V., Schafer, R.W.: Discrete-Time Signal Processing, 3rd edn. Prentice Hall,
New Jersey (2010)
Permutation Entropy and Order Patterns
in Long Time Series
Christoph Bandt
Abstract While ordinal techniques are commonplace in statistics, they have been
introduced to time series fairly recently by Hallin and coauthors. Permutation
entropy, an average of frequencies of order patterns, was suggested by Bandt and
Pompe in 2002 and used by many authors as a complexity measure in physics,
medicine, engineering, and economy. Here a modified version is introduced, the
“distance to white noise.”
For datasets with tens of thousands or even millions of values, which are
becoming standard in many fields, it is possible to study order patterns separately,
determine certain differences of their frequencies, and define corresponding auto-
correlation type functions. In contrast to classical autocorrelation, these functions
are invariant with respect to nonlinear monotonic transformations of the data.
For order three patterns, a variance-analytic “Pythagoras formula” combines the
different autocorrelation functions with our new version of permutation entropy.
We demonstrate the use of such correlation type functions in sliding window
analysis of biomedical and environmental data.
1 Introduction
We live in the era of Big Data. To produce time series, we now have cheap electronic
sensors which can measure fast, several thousand times per second, for weeks or
even years, without getting exhausted. A sensor evaluating light intensity, together
with a source periodically emitting light, can measure blood circulation and oxygen
saturation if it is fixed at your fingertip. In nature it can measure air pollution by
particulates, or water levels and wave heights, and many other quantities of interest.
Classical time series, obtained for instance from quarterly reports of companies,
monthly unemployment figures, or daily statistics of accidents, consist of 20 up to
a few thousand values. Sensor data are more comprehensive. A song of 3 min on
C. Bandt ()
Institute of Mathematics, University of Greifswald, Greifswald, Germany
e-mail: [email protected]
CD comprises 16 million values. On the other hand, quarterly economic reports are
usually prepared with great scrutiny, while a mass of machine-generated data may
contain errors, outliers, and missing values. And there is also a difference in scale.
The intensity which a sensor determines is usually not proportional to the effect
which it intends to measure. There is a monotonic correspondence between light
intensity and oxygen-saturated blood, which is unlikely to be linear, and careful
calibration of the equipment is necessary if measurements on metric scale are
required. Raw sensor data usually come on ordinal scale. It is this ordinal aspect,
introduced to time series by Hallin and coauthors [8, 12], which we shall address.
We present autocorrelation type functions which remain unchanged when values are
transformed by a nonlinear monotone map like f .x/ D log x:
Permutation entropy, introduced in 2002 as a complexity measure [5], has been
applied to optical experiments [3, 18, 22], brain data [9, 16, 17, 19], river flow data
[14], control of rotating machines [15, 24], and other problems. For recent surveys,
see [2, 25]. A typical application is the automatic classification of sleep stages from
EEG data, a change point detection problem illustrated in Fig. 1b. We use a new
version of permutation entropy defined in Sect. 6.
Permutation entropy is just the Shannon entropy of the distribution of order
patterns:
X
HD p log p :
Fig. 1 Biomedical data of healthy person n3 from [20], taken from the CAP sleep database at
physionet [11]. (a) Ten seconds sample from EEG channel Fp2-F4, ECG, and plethysmogram.
(b) Expert annotation of sleep depth from [20] agrees with our version 2 of permutation entropy
almost completely. (c) Function Q̌ of the plethysmogram gives an overview of 9 h of high-frequency
circulation data. See Sects. 7 and 8
Order Patterns 63
Order patterns p will be defined below. There are two parameters. One parameter
is n; the length of the order pattern—the number of values to be compared with
each other. There are nŠ order patterns of length n: It could be proved that for many
dynamical systems, the limit of Hnn is the famous Kolmogorov–Sinai entropy [1, 23].
This provides a theoretical justification of permutation entropy. For real-world time
series, however, n > 10 is not meaningful because of the fast growth of possible
patterns. We recommend using n for which nŠ is smaller than the length of the series,
even though the averaging effect of the entropy formula allows to work with larger n:
The other parameter is d; the delay between neighboring equidistant time points
which are to be compared. For d D 1 we consider patterns of n consecutive points.
For d D 2 we compare xt with xtC2 ; xtC4 ; : : : It turns out that for big data series,
we have a lot of choices for d: Actually, d can be considered as delay parameter
in the same way as for autocorrelation. Moreover, one can consider single pattern
frequencies p .d/ instead of their average H; as done in [14, 17]. It turns out
that certain differences of pattern frequencies provide interpretable functions with
better properties than the frequencies themselves [6]. Figure 1c illustrates how
such functions can be used to survey large data series, like one night of high-
frequency circulation data. The purpose of the present note is to demonstrate that
such autocorrelation type functions form an appropriate tool for the study of big
ordinal data series. We focus on patterns of length 3 which seem most appropriate
from a practical viewpoint.
n12 .d/
ˇ.d/ D p12 .d/ p21 .d/ with p12 .d/ D and p21 .d/ D 1 p12 .d/
n12 .d/ C n21 .d/
for d D 1; 2; : : : Ties are disregarded. ˇ.d/ is called up–down balance and is a kind
of autocorrelation function. It reflects the dependence structure of the underlying
process and has nothing to do with the size of the data.
64 C. Bandt
Fig. 2 Example time series for calculation of frequencies p .d/: For d D 2; one pattern D 123
is indicated
For the short time series of Fig. 2, we get ˇ.1/ D 0; ˇ.2/ D 15 ; ˇ.3/ D
24 ; ˇ.4/ D 13 ; ˇ.5/ D 1, and ˇ.6/ D 1 which could be drawn as a function.
To get reliable estimates we need of course longer time series, and we shall always
take d T=2: Let us briefly discuss the statistical accuracy of ˇ.d/:
If the time series comes from Brownian motion (cf. [6]), there are no ties and
n12 .d/ follows a binomial distribution with p D 12 and n D T d: The radius of the
95 % confidence interval for p12 .d/ then equals p1n : The error of the estimate will be
larger for more correlated time series, depending on the strength of correlation. For
our applications, we are on the safe side with a factor 2. Since ˇ.d/ D 2p12 .d/ 1
4
this gives an error of ˙ pTd : Thus to estimate ˇ.d/ for small d with accuracy
˙0:01 we could need T D 160;000 values. Fortunately, the values of jˇ.d/j in our
applications are often greater than 0.1, and T D 2000 is enough to obtain significant
estimates. Nevertheless, ˇ is definitely a parameter for large time series.
One could think that usually ˇ.d/ D 0 and significant deviations from zero are
exceptional. This is not true. We found that ˇ can describe and classify objects
in different contexts. The data for Fig. 3 are water levels from the database of the
National Ocean Service [21]. Since tides depend on moon and sun, we studied
time series for the full month of September in 18 consecutive years. For intervals
of 6 min, 1 month gives T 7000: We have two tides per day, and the basic
frequency of 25 h can be observed in the functions ˇ.d/: What is more important,
ˇ.d/ characterizes sites, and this has nothing to do with amplitudes, which vary
tremendously between sites, but do not influence ˇ at all.
Since water tends to come fast and disappear more slowly, we could expect ˇ.d/
to be negative, at least for small d: This is mostly true for water levels from lakes,
like Milwaukee in Fig. 2. For sea stations, tidal influence makes the data almost
periodic, with a period slightly smaller than 25 h. The data have 6 min intervals and
we measure d in hours, writing d D 250 as d D 25 h: A strictly periodic time
series with period L fulfils ˇ.L d/ D ˇ.d/ which in the case L D 25 implies
ˇ.12:5/ D 0; visible at all sea stations in Fig. 2. Otherwise there are big differences:
at Honolulu and Baltimore the water level is more likely to fall within the next few
hours, at San Francisco it is more likely to increase, and at Anchorage there is a
Order Patterns 65
Fig. 3 Water levels at 6 min intervals from [21]. Original data shown for Sept 1–2, 2014. Functions
ˇ are given for September of 18 years 1997–2014, and also for January in case of Anchorage and
San Francisco. d runs from 6 min to 27 h (270 values)
change at 6 h. Each station has its specific ˇ-profile, almost unchanged during 18
years, which characterizes its coastal shape. ˇ can also change with the season, but
these differences are smaller than those between stations.
Thus Fig. 3 indicates that ˇ; as well as related functions below, can solve basic
problems of statistics: describe, distinguish, and classify objects.
3 Patterns of Length 3
Three equidistant values xt ; xtCd ; xtC2d without ties can realize six order patterns.
213 denotes the case xtCd < xt < xtC2d :
66 C. Bandt
For each pattern and d D 1; 2; : : : ; T=3 we count the number n .d/ of appear-
ances in the same way as n12 .d/: In case D 312 we count all t D 1; : : : ; T 2d
with xtCd < xtC2d < xt : Let S be the sum of the six numbers. Patterns with ties are
not counted. Next, we compute the relative frequencies p .d/ D n .d/=S: In Fig. 2
we have twice 132 and 312 and once 321 for d D 1; and once 123, 231, 321 for
d D 2: This gives the estimates p321 .1/ D 0:2 and p321 .2/ D 0:33: Accuracy of
estimates p .d/ is similar to accuracy of p12 .d/ discussed above. For white noise it
is known from theory that all p .d/ are 16 [5].
As autocorrelation type functions, we now define certain sums and differences of
the p : The function
1
.d/ D p123 .d/ C p321 .d/
3
is called persistence [6]. This function indicates the probability that the sign of
xtCd xt persists when we go d time steps ahead. The largest possible value of .d/
is 23 ; assumed for monotone time series. The minimal value is 13 : The constant 13
was chosen so that white noise has persistence zero. The letter indicates that this
is one way to transfer Kendall’s tau to an autocorrelation function. Another version
was studied in [8].
It should be mentioned that ˇ can be calculated also by order patterns of length
3, with a negligible boundary error (see [4]):
Beside and ˇ we define two other autocorrelation type functions. For convenience,
we drop the argument d:
describes up–down scaling since it approximately fulfils ı.d/ D ˇ.2d/ ˇ.d/ [4].
Like ˇ; these functions measure certain symmetry breaks in the distribution of the
time series.
Figure 4 shows how these functions behave for the water data of Fig. 3 and for
much more noisy hourly measurements of particulate matter which also contain a
lot of ties and missing data. Like ties, missing values are just disregarded in the
calculation. Although there is more variation, it can be seen that curves for 11
successive years are quite similar. Autocorrelation was added for comparison,
the function 2 is defined below.
All four ordinal functions remain unchanged when a nonlinear monotone
transformation is applied to the data. They are not influenced by low frequency
Order Patterns 67
Fig. 4 Autocorrelation and ordinal functions 2 ; ; ˇ; ; ı (a) for the almost periodic series of
water levels at Los Angeles (September 1997–2014, data from [21], cf. Fig. 3), (b) for the noisy
series of hourly particulate values at nearby San Bernardino 2000–2011 from [7], with weak daily
rhythm. The functions on the left are about three times larger, for 2 nine times, but fluctuations
are of the same size
68 C. Bandt
components with wavelength much larger than d which often appear as artifacts
in data. Since ; ˇ; ; and ı are defined by assertions like XtCd Xt > 0; they
do not require full stationarity of the underlying process. Stationary increments
suffice—Brownian motion for instance has .d/ D 16 for all d [6]. Finally, the
ordinal functions are not influenced much by a few outliers. One wrong value, no
matter how large, can change .d/ only by ˙2=S while it can completely spoil
autocorrelation.
Fig. 5 Autocorrelation and persistence for an AR2 process with various perturbations. In each
case ten samples of 2000 points were processed, to determine statistical variation. On the left,
300 points of the respective time series are sketched. (a) Original signal. (b) Additive Gaussian
white noise. (c) One percent large outliers added. (d) Low-frequency function added. (e) Monotone
transformation applied to the data
Order Patterns 69
clear maxima while the maxima of have a bump. Additive Gaussian white noise
with signal-to-noise ratio 1 increases this effect: Autocorrelation is diminished by
a factor of 2, and persistence becomes very flat. In practice, there are also other
disturbances. In Fig. 5c we took 1 % outliers with average amplitude 20 times of
the original signal. In d we added the low-frequency function sin t=300: In e the
time series was transformed by the nonlinear monotone transformation y D ex=7
which does not change at all. In the presence of outliers, nonlinear scale, and
low-frequency correlated noise, persistence can behave better than autocorrelation.
5 Sliding Windows
We said that for an autocorrelation type function like ˇ or we need a time series
of length T 103 : There are many databases, however, which contain series of
length 10 up to 107 : For example, the German Weather Service [10] offers hourly
5
temperature values from many stations for more than 40 years. To treat such data,
we divide the series into subseries of length 103 up to 104 ; so-called windows. We
determine or ˇ for each window which gives a matrix. Each column represents a
window, each row corresponds to a value of d: This matrix is drawn as a color-coded
image. This is the technique known from a spectrogram. To increase resolution,
overlapping windows can be used. The size of windows can be taken even smaller
than 103 if only a global impression is needed.
As example, we consider temperature values of the soil (depth 5 cm) at the
author’s town. There is a daily cycle which becomes weak in winter. Persistence
with windows of 1 month length (only 720 values) shows the daily cycle and yearly
cycle. Measurements started in 1978 but Fig. 6 shows that during the first years (up
Fig. 6 Hourly temperature of earth in Greifswald, Germany. Data from German Weather Service
[10]. (a) The last 2000 values of 2013. The daily cycle is weak in winter. (b) Persistence for
d D 1; : : : ; 50 h shows irregularities in the data
70 C. Bandt
to 1991) temperature was taken every 3 h—the basic period is d D 8: Between 1991
and 2000, measurements were taken only three times a day. For screening of large
data series, this technique is very useful.
where the sum runs over all patterns of length n; will Pbe called the distance of
1
the data from white noise. For n D 3 we have 2 D 2 2
p 6 : Thus, is
the squared Euclidean distance between the observed order pattern distribution and
the order pattern distribution of white noise. Considering white noise as complete
disorder, 2 measures the amount of rule and order in the data. The minimal value
0 is obtained for white noise, and the maximum 1 nŠ1 for a monotone time series.
Remark The Taylor expansion of H. p/ near white noise p is
nŠ 2 .nŠ/2 X 1
H. p/ D log nŠ C . p /3 :::
2 6 nŠ
Since H is a sum of one-dimensional terms, this is fairly easy to check. For f .q/ D
q log q we have f 0 .q/ D 1 log q , f 00 .q/ D q1 ; and f 000 .q/ D q2 : We insert
q D nŠ1 for all coordinates to get derivatives at white noise, and see that the linear
term of the Taylor expansion vanishes.
Thus for signals near to white noise, 2 is just a rescaling of H: For patterns of
length n D 3 we have
H log 6 32 :
Order Patterns 71
42 D 3 2 C 2ˇ 2 C 2 C ı 2 :
This holds for each d D 1; 2; : : : The equation is exact for random processes with
stationary increments as well as for cyclic time series. The latter means that we
calculate p .d/ from the series .x1 ; x2 ; : : : ; xT ; x1 ; x2 ; : : : ; x2d / where t runs from 1
to T: For real data we go only to T 2d and have a boundary effect which causes
the equation to be only approximately fulfilled. The difference is negligible. For the
proof and checks with various data, see [4].
This partition is related to orthogonal contrasts in the analysis of variance. When
2 .d/ is significantly different from zero, we can define new functions of d:
3 2 ˇ2 2 ı2
Q D ; ˇQ D ; Q D ; ıQ D :
42 22 42 42
By taking squares, we lose the sign of the values, but we gain a natural scale. ; Q Q
Q ˇ;
Q
and ı lie between 0 and 1 D 100 %; and they sum up to 1. For each d; they describe
the percentage of order in the data which is due to the corresponding difference of
patterns.
For Gaussian and elliptical symmetric processes, the functions ˇ; ; and ı are
all zero, and Q is 1, for every d [6]. Thus the map of Q shows to which extent the
data come from a Gaussian process and where the Gaussian symmetry is broken.
This is a visual test for elliptical symmetry. It does not provide p-values but it is a
strong tool in the sense of data mining, with 300,000 results contained in the pixels
of Fig. 7.
In data with characteristic periods, for example heartbeat, respiration, and
speech, we rarely find Q > 80 %: In Fig. 7 this is only true for d near the full or the
half heartbeat period which varies around 1 s during the night. Q is small throughout,
but ˇ is large for small d and around the full period while ıQ is large around half of
the period. This is not a Gaussian process.
8 Biomedical Data
As an example, we studied data from the CAP sleep database by Terzano et al.
[20] available at physionet [11]. For 9 h of ECG data, measured with 512 Hz, Fig. 7
shows ; Q Q ; and ıQ for 540 1-min non-overlapping sliding windows as graycode
Q ˇ;
on vertical lines. The delay d runs from 1 up to 600, which corresponds to 1.2 s. The
last row shows the difference in the above equation on a smaller scale, so this figure
is a practical check of the equation for more than 300,000 instances. It can be seen
72 C. Bandt
sec %
1 80
0.5 40
0
11PM 12AM 1AM 2AM 3AM 4AM 5AM 6AM 7AM
sec %
1 80
0.5 40
0.5 0.5
0
11PM 12AM 1AM 2AM 3AM 4AM 5AM 6AM 7AM
Fig. 7 Partition of 2 for ECG data of healthy subject n3 from the CAP sleep data of Terzano et al.
[20] at physionet [11]. From above, Q ; Q̌; ;
Q andıQ are shown as functions of time (1 min windows)
and delay (0.002 up to 1.2 s). The last row shows the error in the partition of 2 on a much smaller
scale, indicating artifacts
that the differences smaller than 1 % are rare—in fact they mainly include the times
with movement artifacts.
When 2 .d/ does not differ significantly from zero, it makes no sense to divide
by 2 : As a rule of thumb, we should not consider quantities like Q when 2 .d/ <
15=n where n is the window length. In Fig. 7 this would exclude the windows with
movement artifacts which cause black lines in the last row.
Sleep stages S1–S4 and R for REM sleep were annotated in Terzano et al. [20]
by experts, mainly using the EEG channel Fp2-F4 and the oculogram. Figure 1
demonstrates that permutation entropy 2 of that EEG channel, averaged over d D
2; : : : ; 20 gives an almost identical estimate of sleep depth. Permutation entropy
was already recommended as indicator of sleep stages, see [13, 16, 25], and our
calculations gave an almost magic coincidence for various patients and different
EEG channels. REM phases are difficult to detect with EEG alone. Usually, eye
movements are evaluated. Figure 1 indicates that information on dream phases are
contained in ˇQ of the plethysmogram which is measured with an optical sensor at
the fingertip. With certainty we can see here, and in Fig. 7, all interruptions of sleep
and a lot of breakpoints which coincide with changes detected by the annotator.
Evaluation of order patterns and their spectrogram-like visualization seem to be
a robust alternative to Fourier techniques in time series analysis for big data in many
fields.
Order Patterns 73
References
1 Introduction
process that generates the data consists of repeated Bernoulli trials and we have
the number of successes (e.g., clicks) and the number of overall observations
(e.g., impressions) to our disposal. To create a generative model for this process
we employ methods from Bayesian statistics. Starting with a prior distribution
that constitutes an estimate of the parameters of the underlying Bernoulli process
up to some point in time we update that model using Bayes’ equation to first
obtain a posterior distribution. From that posterior distribution a point estimate
of the parameters of the underlying Bernoulli process is derived using the mode
of the posterior distribution. In addition to the Bayesian estimation procedure
our generative models integrate either exponential smoothing (ES) techniques or
autoregressive moving averages (ARMA) of past values of the rate to counteract
noise and outliers in the data. This chapter is a major extension of a conference
contribution [2] in which only exponential smoothing based models were discussed
and evaluated. Here, we generalize the type of linear model our generative models
are based on much further by using AR and ARMA techniques as a basis.
The main advantage of our generative models is that due to utilizing not only
past values of the rate but also past values of the number of success and failure
observations more information is available which leads to better forecasts of future
values. Also due to the probabilistic nature of our models it is straightforward
to extract uncertainty estimates for forecast values. Those uncertainty estimates
describe how (un)certain the model is that its forecast value will actually be equal or
very similar to the value of the rate that will be observed in the future. Uncertainty
estimates can be used in an application, e.g., to alert a human expert in case forecasts
become too uncertain.
Apart from some data of the aforementioned application field of online advertis-
ing we apply our generative models to several further artificial and real-world data
sets to evaluate their forecasting performance and compare them to the respective
non-generative models.
The remainder of this chapter is structured as follows: Sect. 2 gives a brief
overview of related work. In Sect. 3 we present our new generative forecasting
models. In Sect. 4 we apply our models to several artificial and real-world data
sets to evaluate and compare their forecasting performance. Finally, in Sect. 5 we
summarize our findings and briefly sketch planned future research.
2 Related Work
the values change over time. Their goal is to detect change points and outliers in a
sequence of bits in the context of an adaptive variant of the context tree weighting
compression algorithm (cf. [10, 11]). The authors either use a window of fixed
length of previous observations or exponentially decreasing weights for previous
values. However, no trend is considered.
Yamanashi et al. have integrated discounting factors into both, an EM algorithm
for Gaussian mixtures (cf. [12]) and an autoregressive (AR) model which is used
to detect change points and outliers in a time series (cf. [13]). Again, no trends
are considered and the focus is on detecting change points and outliers and not on
accurately forecasting future values of the time series.
In this section we first explain the basics of our generative models for rates and
then present several generative models based on exponential smoothing and ARMA
techniques.
In the following we denote the most recently observed value of a rate we are
interested in with the time index T 2 N and a future rate we would like to forecast
with time index t0 > T. We then have rates x1 ; : : : ; xT 2 Œ0; 1 observed at points in
time 1 ; : : : ; T 2 R which we can use to forecast the value xt0 . Each rate xt can be
subdivided into a numerator nt 2 N and a nonzero denominator dt 2 N n f0g, i.e.,
xt D ndtt for 1 t T. In this chapter we assume that each observed rate value was
created by a Bernoulli trial with some success parameter p. This parameter is usually
unknown and has to be estimated from observed data. An additional assumption we
make in this chapter is that p changes over time and, thus, estimates of the success
parameter have to be constantly kept up-to-date. Basically, with a Bernoulli trial at
each point in time either a success or a failure is observed. Our approach, however,
is more general and allows for multiple observations at each time step.
All generative models presented in this chapter use a Bayesian approach to
estimate p from observed data. The basic idea is that using Bayes’ equation
new observations can easily be integrated into a model that already describes all
observations up to a certain point in time. For the initial update, a prior distribution
has to be chosen which expresses any prior knowledge about the process a user
may have. In case no prior knowledge is available an uninformative prior can be
chosen [14]. Usually, in Bayesian statistics a conjugate prior distribution [1] is used
because then the derived posterior distribution has the same functional form as the
prior. This enables us to use the posterior as a prior in subsequent updates.
78 E. Kalkowski and B. Sick
For our problem consisting of rates the conjugate prior distribution is a beta
distribution B.xj˛; ˇ/ defined by
.˛ C ˇ/ ˛1
B.xj˛; ˇ/ D x .1 x/ˇ1 (1)
.˛/ .ˇ/
for x 2 Œ0; 1 and ˛; ˇ 2 R with ˛; ˇ > 0. Also, ./ is the gamma function defined
by
Z 1
.t/ D xt1 exp.x/ dx: (2)
0
In case the beta distribution is associated with a specific point in time, that point’s
time index is used for the distribution and its parameters as well, e.g., BT .xj˛T ; ˇT /
indicating a beta distribution at time point T with time-specific parameters ˛T and
ˇT .
Using Bayes’ equation
it can easily be shown that the parameters ˛T and ˇT of a posterior beta distribution
can be computed from the parameters ˛T1 and ˇT1 of the respective prior
distribution as
˛T D ˛T1 C nT ; (4)
ˇT D ˇT1 C .dT nT / (5)
using the latest observation xT D ndTT . The beta distribution obtained this way is used
as a prior distribution in the next update step. If a point estimate pT of the success
rate at time point T is required it can be obtained from the posterior distribution.
Possible choices to extract a point estimate are the expected value or the mode
of the distribution. This choice should be made in conjunction with the initial and
possibly uninformative prior distribution especially in case initial forecasts shall be
made based solely on the initial distribution since not for every prior distribution
both mode and expected value of the distribution are well defined. Also, depending
on the combination of initial prior and the way in which to extract point estimates,
not every forecast accurately represents the number of made success and failure
observations. In this chapter, we use a Bayesian prior with ˛ D ˇ D 1 as suggested
in [15] and extract point estimates based on the mode of the distribution:
(
0:5; ˛T D ˇT D 1
pT D MT .x/ D ˛T 1
: (6)
˛T CˇT 2
; otherwise
Generative Exponential Smoothing and Generative ARMA Models to Forecast. . . 79
Here, MT .x/ indicates the mode of the posterior beta distribution at time point T
with parameters ˛T and ˇT . The mode has been slightly extended for the case of
˛T D ˇT D 1 to allow derivation of forecasts even for initial prior distributions.
In addition to forecasts of future values of the rate we want to assess the
uncertainty of those forecasts. An estimate of how uncertain a generative model is
about a forecast can be derived from the variance of the underlying distribution. Due
to our choice of the prior distribution, the variance of a posterior beta distribution lies
1
within Œ0; 12 . To scale our uncertainty estimates to the interval Œ0; 1 we multiply
the variance by 12 before taking the square root to get a scaled standard deviation
as uncertainty estimate of a forecast. For the beta distribution we thus get an
uncertainty estimate at time point T of
s
p 12˛T ˇT
uT D 12 VT .x/ D (7)
.˛T C ˇT /2 .˛T C ˇT C 1/
where VT .x/ is the variance of the posterior beta distribution at time point T .
The key to integrating exponential smoothing and ARMA techniques into this
generative modeling framework lies in modifying the update procedure of the beta
distribution shown in (4) and (5). The following sections explain how different
generative models achieve this and how those models derive forecasts of future
values of the success parameter of the underlying Bernoulli process.
new observations:
This does not change the mode of the distribution but it influences its variance
and, thus, the uncertainty estimates: The further a forecast lies into the future the
more uncertain the model becomes about that forecast. The forecasts themselves
are independent of the amount of time into the future (forecasting horizon) for
which they are made. Point estimates and uncertainty estimates are derived using (6)
with the parameters ˛t0 and ˇt0 of the projected posterior distribution. More details
regarding the GES model can be found in [2].
.1 T /.xT xT1 /
#T D T #T1 C : (12)
T T1
The forecast has to be limited to the interval Œ0; 1 of valid rates since due
to the consideration of the local trends invalid forecasts outside of Œ0; 1 could
occur. Forecasts made by this model explicitly consider the forecasting horizon.
Uncertainty estimates are derived according to (7) using ˛t0 and ˇt0 . More details on
the GEST model can be found in [2].
To take the approach of Sect. 3.3 one step further we can not only consider the dif-
ferences of subsequent rate values but instead take the differences or smoothed local
Generative Exponential Smoothing and Generative ARMA Models to Forecast. . . 81
trends of subsequent success and failure observations separately into account. This
model is called generative double exponential smoothing (GDES). The parameters
of the beta distribution are updated according to
.1 T /.nT nT1 /
#˛;T D T #˛;T1 C ; (16)
T T1
1 T
#ˇ;T D T #ˇ;T1 C .dT nT dT1 C nT1 /: (17)
T T1
Since we consider the smoothed local trends when updating the parameters it may
happen that invalid parameter combinations are generated. For our choice of the
prior distribution (cf. Sect. 3.1) all values smaller than 1 would be invalid and thus
we limit our parameter updates accordingly. To derive a forecast we directly use the
mode of the resulting distribution as in (6). However, in contrast to the GEST model
the forecasts of the GDES model depend on the forecasting horizon. This is due to
the smoothed local trend being considered when deriving the forecasts. More details
about the GDES model can be found in [2].
In this section we present a more general approach where the weights of values
observed in the past are not decreasing exponentially but can be arbitrary real
numbers. This kind of model is called generative autoregressive model (GAR).
Basically, a non-generative autoregressive model creates a forecast by taking the
weighted sum of p values observed in the past. For our GAR model, we apply this
82 E. Kalkowski and B. Sick
approach to the two parameters of the beta distribution, i.e., the value of a parameter
at a specific point in time T depends on the weighted sum of the previous p success
or failure observations. Due to the data-dependent weights, the result may become
negative depending on the choice of parameters. To make sure that the resulting
beta distribution is well defined and to be consistent with our exponential smoothing
based generative models, we limit the parameter values to be greater than or equal
to 1. This results in
( )
Xp
˛T D max 1; t nTtC1 ; (20)
tD1
( )
X
p
ˇT D max 1; t .dTtC1 nTtC1 / : (21)
tD1
. 1; : : : ; p/
T
D arg minp fkA x bkg; (22)
x2R
with
0 1 0 1
n1 np npCh
B C B C
ADB :: :: : C bDB : C
@ : : :: A ; @ :: A ; (24)
nTphC1 nTh nT
0 1 0 1
d1 n1 dp np dpCh npCh
B C B C
BDB :: :: :: C; cDB :: C: (25)
@ : : : A @ : A
dTphC1 nTphC1 dTh nTh dT nT
A forecast of a future value of the rate is made by first computing the parameters
˛T and ˇT of the beta distribution and then taking the mode of this distribution
as defined in (6). In contrast to our exponential smoothing based models which
either do not consider the forecasting horizon at all (GES model, cf. Sect. 3.2) or
can dynamically adapt to different forecasting horizons (GEST and GDES models,
cf. Sects. 3.3 and 3.4), the weights of an AR model either have to be recomputed in
case the forecasting horizon changes or multiple sets of weights have to be prepared
Generative Exponential Smoothing and Generative ARMA Models to Forecast. . . 83
one for each horizon. An uncertainty estimate can be derived according to (7) using
the most recent beta distribution.
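To make the estimation and forecasting steps concrete, the following sketch sets up the least squares problem of (22)/(24) with NumPy and turns the fitted weights into a mode forecast as in (6). It is only an illustration under assumed names (`n` = success counts, `d` = totals, horizon `h`, order `p`); the clipping of the beta parameters at 1 follows the description above, and this is not the authors' original code.

```python
import numpy as np

def fit_gar_weights(n, h, p):
    """Least-squares weights for the success counts, cf. (22)/(24).

    n : NumPy array of success counts, h : forecasting horizon, p : model order.
    """
    T = len(n)
    rows = T - p - h + 1
    A = np.array([n[i:i + p] for i in range(rows)])   # lagged success counts
    b = n[p + h - 1:T]                                # targets shifted by the horizon
    phi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return phi                                        # weights in "oldest lag first" order

def gar_forecast(n, d, phi, psi, p):
    """Mode of the beta distribution with parameters as in (20)/(21)."""
    alpha_T = max(1.0, float(np.dot(phi, n[-p:])))            # weights on the last p success counts
    beta_T = max(1.0, float(np.dot(psi, (d[-p:] - n[-p:]))))  # weights on the last p failure counts
    denom = alpha_T + beta_T - 2.0
    return 0.5 if denom <= 0 else (alpha_T - 1.0) / denom     # beta mode for alpha, beta >= 1
```

The same fitting routine applied to the failure counts $d_t - n_t$ yields the second set of weights, corresponding to (23)/(25).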
In addition to values observed in the past we can also explicitly include noise
terms into our model. This type of model is called generative autoregressive moving
average (GARMA). When including noise terms into (20) and (21) we get
$$\alpha_T = \max\Bigl\{1,\; \sum_{t=1}^{p} \phi_t\, n_{T-t+1} + \sum_{t=1}^{q} \theta_t\, \varepsilon_{T-t+1}\Bigr\}, \qquad (26)$$
$$\beta_T = \max\Bigl\{1,\; \sum_{t=1}^{p} \psi_t\, (d_{T-t+1} - n_{T-t+1}) + \sum_{t=1}^{q} \xi_t\, \varepsilon_{T-t+1}\Bigr\}. \qquad (27)$$
Here, the $\phi_t$ and $\psi_t$ are weights identical to those used in (20) and (21). Additionally, $\theta_1, \ldots, \theta_q \in \mathbb{R}$ and $\xi_1, \ldots, \xi_q \in \mathbb{R}$ are weights for realizations of random noise $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$. Both values have a hard lower limit of 1 to be consistent with our
exponential smoothing based models and to make sure the resulting beta distribution
is well defined.
Due to not being able to directly observe the random noise variables t , training
a GARMA model is slightly more difficult than training an AR model. For the
simulations in this chapter we used a variation of the Hannan–Rissanen algorithm
[16]. In the first step of this algorithm, a high order GAR model is fitted to available
training data. Then, the forecasting errors of the trained GAR model are used as
estimates for the t . Finally, using the estimated values for t we can estimate the
parameters of a full GARMA model by solving the least squares problems
$$(\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^{T} = \operatorname*{arg\,min}_{x \in \mathbb{R}^{p+q}} \{\lVert C x - d \rVert\}, \qquad (28)$$
$$(\psi_1, \ldots, \psi_p, \xi_1, \ldots, \xi_q)^{T} = \operatorname*{arg\,min}_{x \in \mathbb{R}^{p+q}} \{\lVert D x - e \rVert\}, \qquad (29)$$
with
$$C = \begin{pmatrix} n_{m-p+1} & \cdots & n_m & \varepsilon_{m-q+1} & \cdots & \varepsilon_m \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ n_{T-h-p+1} & \cdots & n_{T-h} & \varepsilon_{T-h-q+1} & \cdots & \varepsilon_{T-h} \end{pmatrix}, \qquad (30)$$
$$D = \begin{pmatrix} d_{m-p+1} - n_{m-p+1} & \cdots & d_m - n_m & \varepsilon_{m-q+1} & \cdots & \varepsilon_m \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ d_{T-h-p+1} - n_{T-h-p+1} & \cdots & d_{T-h} - n_{T-h} & \varepsilon_{T-h-q+1} & \cdots & \varepsilon_{T-h} \end{pmatrix}, \qquad (31)$$
$$d = \begin{pmatrix} n_{m+h} \\ \vdots \\ n_T \end{pmatrix}, \qquad e = \begin{pmatrix} d_{m+h} - n_{m+h} \\ \vdots \\ d_T - n_T \end{pmatrix}. \qquad (32)$$
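A minimal sketch of this two-step estimation is given below (NumPy only). The helper names, the high-order choice `p_high`, and the definition of the first usable index `m` are illustrative assumptions of ours; the authors use the variation of the Hannan–Rissanen algorithm described in [16].

```python
import numpy as np

def lagged_matrix(x, order, start, stop):
    """Rows (x[i-order], ..., x[i-1]) for i in [start, stop)."""
    return np.array([x[i - order:i] for i in range(start, stop)])

def fit_garma_success_weights(n, h, p, q, p_high=20):
    """Two-step (Hannan-Rissanen style) fit of the success-count weights, cf. (28)/(30)."""
    T = len(n)
    # Step 1: high-order GAR fit; its in-sample forecasting errors serve as noise estimates.
    A = lagged_matrix(n, p_high, p_high, T - h + 1)
    b = n[p_high + h - 1:T]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    eps = np.zeros(T)
    eps[p_high + h - 1:T] = b - A @ w                 # estimated noise realizations
    # Step 2: regress targets on lagged counts and lagged noise estimates.
    m = p_high + h - 1 + q                            # first row with q noise estimates available
    C = np.hstack([lagged_matrix(n, p, m, T - h + 1),
                   lagged_matrix(eps, q, m, T - h + 1)])
    d = n[m + h - 1:T]
    theta, *_ = np.linalg.lstsq(C, d, rcond=None)
    return theta[:p], theta[p:]                       # count weights and noise weights
```

The analogous regression on the failure counts $d_t - n_t$ with the matrices of (29)/(31) yields the second parameter set.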
4 Simulation Results
In this section we apply our generative models to several artificial benchmark data
sets and real-world data sets.
The first data sets we use for the evaluation and comparison of our models are
artificially generated. For that, we used seven different base signals: square wave,
sine, triangular, and mixed 1 through 4. Those base signals span an overall time
of 300 time steps. The signals are generated by taking 200 base observations and
varying the number of successes according to the respective type of signal. For the
square wave the number of successes alternates between 40 and 160 every 25 time steps, which results in the success rate varying between 0.2 and 0.8. The sine signal uses a wavelength of 50 time steps and oscillates between 10 and 190 successes, which yields a rate varying between 0.05 and 0.95. The triangular signal is similar
to the sine signal except it has a triangular shaped rate. The four mixed data sets
contain rates whose characteristics change midway from a square wave signal to a
sine signal and vice versa and from a sine with low frequency to a sine with high
frequency and vice versa. For each base signal a number of variants were generated
by adding white noise of variances 0, 5, 10, 15, 20, 25, and 30 to the number of
success observations. This yields a total of 49 artificial data sets.
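As an illustration of how such benchmark rates can be generated, the sketch below builds the square-wave and sine variants with additive white noise on the success counts. The exact starting phase and the clipping to the valid range are assumptions of ours guided by the description above.

```python
import numpy as np

def square_wave_successes(T=300, base=200, low=40, high=160, period=25):
    """Success counts alternating between `low` and `high` every `period` steps."""
    blocks = (np.arange(T) // period) % 2
    return np.where(blocks == 0, low, high), np.full(T, base)

def sine_successes(T=300, base=200, lo=10, hi=190, wavelength=50):
    """Success counts oscillating sinusoidally between `lo` and `hi`."""
    mid, amp = (hi + lo) / 2.0, (hi - lo) / 2.0
    n = mid + amp * np.sin(2 * np.pi * np.arange(T) / wavelength)
    return np.round(n).astype(int), np.full(T, base)

def add_noise(n, d, variance, rng=None):
    """Add white noise of the given variance to the success counts, kept inside [0, d]."""
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = n + rng.normal(0.0, np.sqrt(variance), size=len(n))
    return np.clip(np.round(noisy), 0, d).astype(int)

# Example: square wave with noise variance 10 -> success rates around 0.2 and 0.8.
n, d = square_wave_successes()
rates = add_noise(n, d, variance=10) / d
```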
In addition to the artificial data sets some real-world data sets are considered.
The first is taken from the iPinYou challenge [17] held in 2013 to find a good
real time bidding algorithm for online advertising. While we have not created such
an algorithm we nevertheless use the data from the competition and forecast click
through rates. The data do not have a fixed time resolution but we aggregated them
to contain one click through rate value per hour.
The second real-world data set was used in [18, 19] and is concerned with the
number of actively used computers in a computer pool at the University of Passau,
Germany. The data were recorded between April 15th and July 16th 2002 in 30-min
intervals.
A third real-world data set is taken from the PROBEN1 benchmark data suite
[20, 21]. The data set contains recordings of the energy consumption of a building
with a time resolution of 1 h.
The next 17 real-world data sets are concerned with search engine marketing.
Each of these data sets contains conversion rates for one search keyword over a
varying amount of time and with a varying time resolution.
The last 20 real-world data sets contain the citation rates of scientific articles.
The citation rates were aggregated to a time resolution of 1 year.
The next section presents the results of applying our forecasting models to the
data sets described in this section.
All of our models have one or more parameters that need to be adjusted before
the model can actually be applied to data. We use an iterative grid search to
automatically find good parameters. During the parameter search we only use the
first quarter of each data set. For each value in the remaining three quarters of each
data set a forecast is made using forecasting horizons from 1 to 6 time steps. For
each horizon the mean squared forecasting error (MSE) is computed which can
then be compared to the errors achieved by other models.
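The evaluation protocol can be sketched as follows. The exhaustive (rather than iterative) search, the parameter grids, and the `update`/`forecast` model interface are placeholders of ours, not the authors' implementation.

```python
import itertools
import numpy as np

def evaluate(model, data, horizon):
    """Rolling-origin evaluation: forecast each value `horizon` steps ahead and average the squared errors."""
    errors = []
    for t in range(len(data) - horizon):
        model.update(data[t])                         # assumed incremental update interface
        errors.append(model.forecast(horizon) - data[t + horizon])
    return float(np.mean(np.square(errors)))

def grid_search(model_factory, grid, train):
    """Pick the parameter combination with the lowest one-step MSE on the training quarter."""
    best, best_err = None, np.inf
    for params in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), params))
        err = evaluate(model_factory(**cfg), train, horizon=1)
        if err < best_err:
            best, best_err = cfg, err
    return best
```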
For an overall evaluation of our models we applied the Friedman test [22] using a significance level of 0.01, followed by a Nemenyi test [23]. The Friedman test is a non-parametric statistical test which compares a set of multiple forecasting models on multiple data sets. In order to do that, ranks are assigned to the models. For each data set the model with the best performance gets the lowest (best) rank. The null hypothesis of the Friedman test is that there are no significant differences between the forecasting performances of all models averaged over all used data sets. For our nine models and 534 combinations of data set and forecasting horizon, Friedman's $\chi^2_F$ is distributed according to a $\chi^2$ distribution with 8 degrees of freedom. The critical value of $\chi^2(8)$ for a significance level of 0.01 is 20.1, which is lower than Friedman's $\chi^2_F = 153.9$, which means the null hypothesis is rejected.
In case there are significant differences, a post hoc test such as the Nemenyi test is employed to find out which of the models performed better or worse. The result of the Nemenyi test is a critical difference (CD). For our results the Nemenyi test yields a critical difference of CD = 0.807. This means that any two models whose average ranks differ by more than 0.807 are significantly different from each other. The
results of the Nemenyi test can be visualized by means of a critical difference plot
(cf. Fig. 1). For comparison we also included results of a non-generative exponential smoothing (ES) model, non-generative AR and ARMA models, and a sliding window (SW) average.
[Fig. 1 here: rank axis from 9 to 1 with CD = 0.807; models labelled AR, GAR, ARMA, GES, ES on one side and GARMA, GDES, GEST, SW on the other.]
Fig. 1 Critical difference plot of the ranked MSE values achieved by each model for a significance level of 0.01. Smaller ranks are better than greater ranks. If the ranks of two models differ by more than the critical difference, their performance is significantly different on the used data sets
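For reference, the sketch below computes the Friedman statistic and the Nemenyi critical difference from a matrix of MSE values (rows = data-set/horizon combinations, columns = models). The critical value `q_alpha` of the Studentized range statistic (divided by the square root of two) for the chosen significance level has to be supplied from a table; it is not computed here.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_nemenyi(errors, q_alpha):
    """errors: (N, k) array of MSE values, N data sets x k models."""
    N, k = errors.shape
    stat, p_value = friedmanchisquare(*[errors[:, j] for j in range(k)])
    ranks = np.vstack([rankdata(row) for row in errors])   # rank 1 = best model per data set
    avg_ranks = ranks.mean(axis=0)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))        # Nemenyi critical difference
    return stat, p_value, avg_ranks, cd
```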
4.3 Run-Time
[Fig. 2 here: rank axis from 9 to 1 with CD = 0.642; models labelled GARMA, GAR, AR, ARMA, GEST on one side and ES, SW, GDES, GES on the other.]
Fig. 2 Critical difference plot of the ranked run-times required to evaluate the MSE of each model for all used data sets using a significance level of 0.01. Smaller ranks are better since they result from smaller run-times. If the ranks of two models differ by less than the critical difference, there is no significant difference between the run-times of the respective models
In this chapter we presented several novel generative models for rates which are
based on either exponential smoothing or ARMA techniques. We evaluated the
forecasting performance of our models using several artificial and real-world data
sets. A statistical test showed that the GARMA model has the best forecasting performance, followed by the GDES model. A brief analysis of the run-time of our
models revealed that, as expected, models such as GARMA and GAR, which require
more operations to derive forecasts, also require more run-time to evaluate. The
exponential smoothing based models GDES and GES are nearly as fast as a non-
generative exponential smoothing model and a sliding window average.
This chapter focused on describing our novel modeling approaches and giving an
impression of their forecasting performance by comparing their performance using
several data sets. For the sake of brevity detailed results for individual data sets
were not given. In the future we would like to further explore how our different
kinds of models perform on data sets with specific characteristics. Also we would
like to compare our models’ forecasting performance to completely different kinds
of models, e.g., nonlinear models based on support vector regression.
For all our generative models it is possible to derive uncertainty estimates that
express how certain the model is that its predictions will actually fit to future
observations of the rate. We did not explore the possibilities of uncertainty estimates
further in this chapter; however, in the future we would like to make use of this
information to, e.g., automatically notify a human expert in case the uncertainty of
forecasts becomes too high.
Acknowledgements The search engine marketing data sets were kindly provided by crealytics.
References
1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
2. Kalkowski, E., Sick, B.: Generative exponential smoothing models to forecast time-variant
rates or probabilities. In: Proceedings of the 2015 International Work-Conference on Time
Series (ITISE 2015), pp. 806–817 (2015)
3. Gardner Jr., E.S.: Exponential smoothing: the state of the art. J. Forecast. 4(1), 1–28 (1985)
4. Gardner Jr., E.S.: Exponential smoothing: the state of the art—part II. Int. J. Forecast. 22(4),
637–666 (2006)
5. Chatfield, C.: Time-Series Forecasting. Chapman and Hall/CRC, Boca Raton (2000)
6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting, 2nd edn. Springer,
New York (2002)
7. Chatfield, C.: The Analysis of Time Series: An Introduction, 6th edn. Chapman and Hall/CRC,
Boca Raton (2003)
8. Granger, C.W.J., Newbold, P.: Forecasting Economic Time Series, 2nd edn. Academic, San
Diego (1986)
9. Sunehag, P., Shao, W., Hutter, M.: Coding of non-stationary sources as a foundation for
detecting change points and outliers in binary time-series. In: Proceedings of the 10th
Australasian Data Mining Conference (AusDM ’12), vol. 134, pp. 79–84 (2012)
10. O’Neill, A., Hutter, M., Shao, W., Sunehag, P.: Adaptive context tree weighting. In:
Proceedings of the 2012 Data Compression Conference (DCC), pp. 317–326 (2012)
11. Willems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context-tree weighting method: basic
properties. IEEE Trans. Inf. Theory 41(3), 653–664 (1995)
12. Yamanishi, K., Takeuchi, J., Williams, G., Milne, P.: On-line unsupervised outlier detection
using finite mixtures with discounting learning algorithms. In: Proceedings of the 6th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000)
(2000)
13. Yamanishi, K., Takeuchi, J.: A unifying framework for detecting outliers and change points
from non-stationary time series data. In: Proceedings of the 8th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD 2002), pp. 676–681 (2002)
14. Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4, 227–241 (1968)
15. Bayes, T., Price, R.: An essay towards solving a problem in the doctrine of chances. By the
late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F.
R. S. Philos. Trans. 53, 370–418 (1763)
16. Hannan, E.J., Rissanen, J.: Recursive estimation of mixed autoregressive-moving average
order. Biometrika 69, 81–94 (1982)
17. Beijing Pinyou Interactive Information Technology Co., Ltd.: iPinYou: global bidding
algorithm competition. http://contest.ipinyou.com/data.shtml [Online]. Last accessed 18 Mar
2015
18. Gruber, C., Sick, B.: Processing short-term and long-term information with a combination of
hard- and soft-computing techniques. In: Proceedings of the IEEE International Conference
on Systems, Man & Cybernetics (SMC 2003), vol. 1, pp. 126–133 (2003)
19. Fuchs, E., Gruber, C., Reitmaier, T., Sick, B.: Processing short-term and long-term information
with a combination of polynomial approximation techniques and time-delay neural networks.
IEEE Trans. Neural Netw. 20(9), 1450–1462 (2009)
20. Kreider, J.F., Haberl, J.S.: The great energy predictor shootout—Overview and discussion of
results, ASHRAE Transactions 100(2), 1104–1118 (1994)
21. Prechelt, L.: PROBEN 1 – a set of neural network benchmark problems and benchmarking rules.
Technical report, University of Karlsruhe (1994)
22. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
23. Nemenyi, P.B.: Distribution-free multiple comparisons. Ph.D. thesis, Princeton University
(1963)
First-Passage Time Properties of Correlated
Time Series with Scale-Invariant Behavior
and with Crossovers in the Scaling
1 Introduction
The observable output signals of many complex dynamical systems are long-range
correlated time series with long memory and scale invariance properties, which are
usually studied in the framework of one-dimensional generalized random walks.
Fundamental characteristics of random walks, and of the corresponding time series,
are represented by the statistical properties of the first-passage time (FPT) [1], i.e.,
the time required for the time series to return to a certain value (usually zero).
The main statistical properties of the FPT are the functional form of its probability
distribution $p(\ell)$ and its average length $\langle\ell\rangle$. Empirical studies have reported a variety
of forms for the probability distribution of FPT, including (1) pure exponential for
random uncorrelated processes [2], (2) stretched exponential forms for a diverse
group of natural and social complex systems ranging from neuron firing [3], climate
fluctuations [4], or heartbeat dynamics [5], to Internet traffic [6, 7] and stock
market activity [8, 9]; and (3) power-law form for certain on–off intermittency
processes related to nonlinear electronic circuits [10] and anomalous diffusion [11–
14]. Such diverse behavior is traditionally attributed to the specifics of the individual
system. Identifying common factors responsible for similar behaviors of FPT across
different systems has not been a focus of investigations. Indeed, these systems
exhibit different scale invariant long-range correlated behaviors and how the degree
of correlations embedded in the system dynamics relates to the statistical properties
of FPT is not known. In a previous work [15] we hypothesized that correlations are
the unifying factor behind a class of complex systems of diverse nature exhibiting
similar statistical properties for FPT, and conversely, systems that belong to the same
class of FPT properties possess a comparable degree of correlations. We investigated in [15] how the degree of correlations in the system dynamics affects key properties of FPT: the shape of the probability density $p(\ell)$ and the FPT average length $\langle\ell\rangle$. A summary of the results found in [15] is presented in Sect. 2.
However, instead of a single scaling exponent, many complex systems exhibit
two different scaling regimes characterized by different correlation exponents at
short and large scales with a crossover separating both regimes. This happens in
complex dynamical systems that are regulated by several competing mechanisms
acting at different time scales, thus producing fractal long-range correlations with
scaling crossovers. Examples of such scaling crossovers can be found in heartbeat
dynamics [16, 17], stock market trade [18], or in the proprioceptive system responsible for human balance [19, 20]. Here, we hypothesize that correlations are also
the unifying factor behind the FPT properties of such complex systems. To probe
so, first we present an algorithm able to generate artificial, long-range correlated
time series with scaling crossovers (Sect. 3). Then, by a systematic analysis of such
time series, we present numerical results on how the different correlation exponents
at short and large scales and the scale of the crossover affect drastically the key
properties of FPT, p.`/, and h`i (Sects. 4 and 5). Finally, we compare in Sect. 6 the
theoretical and numerical results presented in Sects. 4 and 5 with those obtained in
a real time series with scaling crossover, the trajectory of the Center of Pressure of
the human postural control system in quiet standing. We show the general validity
of our results and conclude that long-range correlations can be seen, indeed, as the
common factor explaining the FPT properties in such complex systems.
[Fig. 1 here: two example signals x(t) versus time t, with α = 0.9 and α = 1.3.]
Fig. 1 Examples of two time series with different degree of correlations as quantified by the scaling exponent $\alpha$. The piecewise constant lines illustrate the FPTs for both signals
Correlations in the final time series are quantified by the single scaling exponent $\alpha$, which is an input of the algorithm and corresponds to the detrended fluctuation analysis (DFA) scaling exponent (Fig. 1) by construction.¹ The DFA algorithm [22] is a scaling analysis method which calculates the fluctuations $F(n)$ of the analyzed signal at different time scales $n$, removing the local trends. The analyzed time series presents long-range correlations and scale-invariant properties if
$$F(n) \sim n^{\alpha} \qquad (1)$$
and the exponent $\alpha$ quantifies the strength of the long-range correlations. Although other scaling exponents could have been used, we prefer to use as our reference the DFA exponent $\alpha$, since the DFA method has become the standard when studying such long-range correlated time series [18–20, 23, 24], and can also be applied to real-world nonstationary time series. For uncorrelated random signals, $\alpha = 0.5$; for anticorrelated signals, $\alpha < 0.5$; and for positively correlated signals, $\alpha > 0.5$. Processes with $0 < \alpha < 1$ are fractional Gaussian noises (fGns) and processes with $1 < \alpha < 2$ are fractional Brownian motions (fBms). In particular, $\alpha = 1.5$ corresponds to the classical random walk.
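A compact DFA implementation along the lines of [22] (first-order detrending of the integrated profile) is sketched below; it is a generic illustration rather than the exact code used for the analyses in this chapter.

```python
import numpy as np

def dfa(x, scales):
    """Return the DFA fluctuation function F(n) for the given window sizes."""
    y = np.cumsum(x - np.mean(x))                      # integrated profile
    F = []
    for n in scales:
        rms = []
        for i in range(len(y) // n):
            seg = y[i * n:(i + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear trend
            rms.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(rms)))
    return np.array(F)

# The scaling exponent alpha is the slope of log F(n) versus log n.
scales = np.unique(np.logspace(2, 4, 20).astype(int))
x = np.random.default_rng(1).standard_normal(2 ** 16)     # white noise: alpha ~ 0.5
alpha = np.polyfit(np.log(scales), np.log(dfa(x, scales)), 1)[0]
```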
In [15] we systematically investigated how the degree of correlations in the system dynamics affects key properties of FPT: the shape of the probability density $p(\ell)$ and the FPT average length $\langle\ell\rangle$. We include here our main results for completeness, and also because they will serve us as a reference when we analyze the results obtained for time series with scaling crossovers in the next sections. Depending on the range of correlations considered, we obtained two different regimes for the FPT probability density $p(\ell)$:
$$p(\ell) \sim \begin{cases} \exp\!\bigl[-(\ell/\ell_0)^{\gamma}\bigr] & \text{if } 0 < \alpha < 1 \\[4pt] \dfrac{f(\ell)}{\ell^{\,3-\alpha}} & \text{if } 1 < \alpha < 2 \end{cases} \qquad (2)$$
The first case, which we call the stretched-exponential regime, is obtained when the correlations are in the range $0 < \alpha < 1$, i.e., for fGns. $p(\ell)$ behaves as a stretched exponential and the stretching parameter $\gamma$ depends on $\alpha$: for the well-known case $\alpha = 0.5$ (white noise), we find that $\gamma = 1$, corresponding to a pure exponential behavior. For $\alpha < 0.5$, we find that $\gamma > 1$, and $\gamma$ increases as $\alpha$ decreases. In this case, $p(\ell)$ decays faster than exponentially. For $\alpha > 0.5$, we find that $\gamma < 1$, and $\gamma$ decreases as $\alpha$ increases. In this case, $p(\ell)$ is a real stretched exponential and its tail becomes fatter as $\alpha$ increases. This result matches experimental observations for a great variety of phenomena [4, 5, 8].
The second case, which we call the power-law tail regime, is obtained when the correlations are in the range $1 < \alpha < 2$, i.e., for fBms. In this case, the tail of the FPT probability density decays as a power law with exponent $3 - \alpha$, as given in (2).
¹ When the power spectrum of a time series is of the type $S(f) \sim f^{-\beta}$, and the DFA fluctuation function behaves as $F(n) \sim n^{\alpha}$, then the two exponents are related via $\beta = 2\alpha - 1$.
Regarding the average FPT length, we found in [15] the following dependence on the time series length $N$:
$$\langle\ell\rangle \simeq \begin{cases} \langle\ell\rangle_\infty - a\,N^{-b} & \text{if } 0 < \alpha < 1 \\ N^{\alpha-1} & \text{if } 1 < \alpha < 2 \end{cases} \qquad (3)$$
In the stretched-exponential regime ($0 < \alpha < 1$), $a$ and $b$ are positive constants, and $\langle\ell\rangle_\infty$ is the finite constant asymptotic value in the limit of large $N$. For increasing $\alpha$, the exponent $b$ decreases, so the convergence to the asymptotic value $\langle\ell\rangle_\infty$ with the time series length $N$ is slower, and the values of $\langle\ell\rangle_\infty$ also increase with $\alpha$. In the power-law tail regime ($1 < \alpha < 2$) we find a power-law dependence of $\langle\ell\rangle$ on the time series length $N$, with exponent $\alpha - 1$.
The case $\alpha = 1$ ($1/f$ noise) corresponds to a phase transition between both regimes, where $p(\ell)$ decays faster than a power law and slower than a stretched exponential, and the mean value $\langle\ell\rangle$ increases logarithmically with the time series length $N$.
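To make the quantities $p(\ell)$ and $\langle\ell\rangle$ operational, the following sketch extracts first-passage (return-to-zero) times from a zero-mean series and estimates their distribution and mean. The sign-change definition of a return and the log-spaced binning are our implementation choices, not a prescription from the text.

```python
import numpy as np

def first_passage_times(x):
    """Lengths of the intervals between consecutive zero crossings of x."""
    x = np.asarray(x) - np.mean(x)
    signs = np.sign(x)
    signs[signs == 0] = 1                           # treat exact zeros as positive
    crossings = np.where(np.diff(signs) != 0)[0]    # indices where the sign changes
    return np.diff(crossings)                       # FPT lengths l

def fpt_statistics(x, bins=50):
    """Empirical density p(l) on log-spaced bins and the mean FPT <l>."""
    l = first_passage_times(x)
    edges = np.logspace(0, np.log10(l.max() + 1), bins)
    p, edges = np.histogram(l, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, p, float(l.mean())
```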
In order to systematically study the statistical properties of FPTs in time series with scaling crossovers, we need a model to artificially generate such kind of time series. We propose to use a modified version of the Fourier filtering method, as follows:
(1) Generate a random time series in the time domain, then Fourier-transform it to the frequency ($f$) domain to obtain a white noise. (2) Choose the desired DFA scaling exponents governing short ($\alpha_s$) and large ($\alpha_l$) time scales, and the time scale where the crossover between both scaling regimes happens, $t_c$. Then, multiply the white noise by a function of the type
$$Q(f) = \begin{cases} f^{-(2\alpha_l - 1)/2} & \text{if } f \le f_c \\ f_c^{\,\alpha_s - \alpha_l}\, f^{-(2\alpha_s - 1)/2} & \text{if } f > f_c \end{cases} \qquad (4)$$
where $f_c = t_c^{-1}$.
Fig. 2 (a) Power spectrum of a time series with $N = 2^{14}$ data points obtained by multiplying a white noise in the frequency domain by the function $Q(f)$ in Eq. (4) with exponents $\alpha_{\mathrm{large}} \equiv \alpha_l = 1.7$ and $\alpha_{\mathrm{small}} \equiv \alpha_s = 0.6$, and with a crossover frequency $f_c = t_c^{-1} = 0.005$. (b) Time series $x(t)$ obtained by inverse Fourier transform of the frequency-domain signal shown in (a). (c) DFA fluctuation function $F(n)$ of the time series shown in (b) versus the time scale $n$ (measured in number of data points)
In this way, the power spectrum of the resulting signal behaves as $f^{-(2\alpha_l - 1)}$ for low frequencies ($f \le f_c$) and as $f^{-(2\alpha_s - 1)}$ for high frequencies ($f > f_c$), matching at $f_c$ (Fig. 2a).
(3) Fourier-transform the signal back into the time domain (Fig. 2b). Now, the resulting time series presents a scaling crossover at a time scale $t_c$, and the DFA scaling exponents are $\alpha_s$ and $\alpha_l$ for short and large time scales, respectively (Fig. 2c).
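A sketch of this modified Fourier filtering procedure is given below. The rFFT frequency grid, the treatment of the zero frequency, and the final normalization are implementation choices of ours, while the filter is the $Q(f)$ of (4).

```python
import numpy as np

def crossover_series(N, alpha_s, alpha_l, tc, rng=None):
    """Long-range correlated series with DFA exponents alpha_s (short scales)
    and alpha_l (large scales) and a crossover around the time scale tc."""
    if rng is None:
        rng = np.random.default_rng(0)
    white = np.fft.rfft(rng.standard_normal(N))      # step 1: white noise in the frequency domain
    f = np.fft.rfftfreq(N)
    f[0] = f[1]                                      # avoid dividing by zero at f = 0
    fc = 1.0 / tc
    Q = np.where(f <= fc,
                 f ** (-(2 * alpha_l - 1) / 2),
                 fc ** (alpha_s - alpha_l) * f ** (-(2 * alpha_s - 1) / 2))  # Eq. (4)
    x = np.fft.irfft(white * Q, n=N)                 # step 3: back to the time domain
    return x / np.std(x)

# Example close to the setting of Fig. 2: N = 2**14, alpha_s = 0.6, alpha_l = 1.7, tc = 200.
x = crossover_series(2 ** 14, alpha_s=0.6, alpha_l=1.7, tc=200)
```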
With the generation method described above, our aim is to study the functional form of the FPT probability density $p(\ell)$ as a function of the crossover scale $t_c$, and also as a function of the values of the scaling exponents $\alpha_s$ and $\alpha_l$ for short and large time scales, respectively.
Fig. 3 FPT probability density $p(\ell)$ numerically obtained for time series generated with different combinations of scaling exponents $\alpha_s$ and $\alpha_l$, and for different $t_c$ values, following the algorithm introduced in Sect. 3. Every probability density $p(\ell)$ is obtained from 1000 realizations of time series of length $N = 2^{23}$ data points
Our results (see Fig. 3) show that, in general, $p(\ell)$ shows a mixed behavior of two functional forms: at short scales, $p(\ell)$ exhibits the profile corresponding to the expected functional form of the FPT probability density obtained from a single-scaling time series characterized by the exponent $\alpha_s$. Conversely, at large scales, $p(\ell)$ behaves as the FPT probability density expected for a single-scaling time series
with exponent $\alpha_l$. Both functional forms depend on the numerical values of $\alpha_s$ and $\alpha_l$, as expressed in Eq. (2). Mathematically, we find
$$p(\ell) = \begin{cases} p_{\alpha_s}(\ell) & \text{if } \ell < g(t_c) \\ p_{\alpha_l}(\ell) & \text{if } \ell > g(t_c) \end{cases} \qquad (5)$$
In this equation, the symbol '$=$' means equality in the sense of functional form, and $p_{\alpha_s}(\ell)$ and $p_{\alpha_l}(\ell)$ represent the functional forms expected for a single-scaling time series with $\alpha = \alpha_s$ and $\alpha = \alpha_l$, respectively. The function $g(t_c)$, which controls the transition between both functional forms, is a monotonic function of $t_c$, although its particular value depends on the range of values of $\alpha_s$ and $\alpha_l$, for which we consider three different cases:
(i) Case $\alpha_s, \alpha_l < 1$. According to (5), and noting that for this range of $\alpha_s$ and $\alpha_l$ values stretched-exponential forms are expected (2), we observe (Fig. 3a, b) such double stretched-exponential behavior, and the transition scale $g(t_c)$ between them depends on $t_c$ as $g(t_c) \sim t_c^{\,c}$ with $|c| < 1$. When $\alpha_s < \alpha_l$ the exponent $c$ is negative and then the transition displaces to the left as $t_c$ increases, as in the case shown in Fig. 3a, where $\alpha_s = 0.1 < \alpha_l = 0.9$. In the opposite case $\alpha_s > \alpha_l$, the exponent $c$ is positive and then the transition displaces to the right as $t_c$ increases, as in Fig. 3b, where $\alpha_s = 0.9 > \alpha_l = 0.1$. Then, as $t_c$ increases, the range of validity of $p_{\alpha_s}(\ell)$ also increases and $p(\ell)$ evolves from near the pure $\alpha = 0.1$ case (faster decay than exponential) for very low $t_c$ toward the pure $\alpha = 0.9$ case (slower decay than exponential) for increasing $t_c$. Note that a perfect exponential decay (expected for $\alpha = 0.5$) would appear as a perfect straight line in Fig. 3b.
(ii) Case $\alpha_s, \alpha_l > 1$. In this case, both $p_{\alpha_s}(\ell)$ and $p_{\alpha_l}(\ell)$ are decaying power laws (2) with exponents $3 - \alpha_s$ and $3 - \alpha_l$, respectively, and then $p(\ell)$ resembles this mixed behavior (Fig. 3c, d). The transition scale $g(t_c)$ between both power laws is particularly simple in this case, since we obtain $g(t_c) \sim t_c$, as can be checked in Fig. 3c, d, corresponding respectively to $\alpha_s < \alpha_l$ and $\alpha_s > \alpha_l$.
(iii) Case $\alpha_s > 1$, $\alpha_l < 1$ (or vice versa). In this case, a mixed stretched-exponential and power-law behavior is expected for $p(\ell)$. In the particular case $\alpha_s < 1$ and $\alpha_l > 1$ (Fig. 3e), $p(\ell)$ behaves as a stretched exponential for short $\ell$ values and as a decaying power law of exponent $3 - \alpha_l$ for large $\ell$ values. The transition scale between both functional forms behaves as $g(t_c) \sim t_c$, as can be observed in Fig. 3e. For the opposite case $\alpha_s > 1$ and $\alpha_l < 1$ (Fig. 3f), we observe that $p(\ell)$ behaves as a decaying power law of exponent $3 - \alpha_s$ for low $\ell$ values and as a stretched exponential in the tail, as expected. The transition scale between both functional forms behaves again as $g(t_c) \sim t_c$, as can be observed in Fig. 3f, where any time $t_c$ is doubled, $p(\ell)$ increases its range by the same amount in log scale.
Fig. 4 Mean FPT value $\langle\ell\rangle$ as a function of the crossover time scale $t_c$ for the three different cases discussed in Sect. 4: (a) $\alpha_s, \alpha_l < 1$. (b) $\alpha_s, \alpha_l > 1$. (c) $\alpha_s > 1$, $\alpha_l < 1$ (and vice versa). Every curve is obtained from 1000 realizations of time series of length $N = 2^{23}$ data points generated by our algorithm described in Sect. 3
The behavior of $\langle\ell\rangle$ properly reflects the functional form of $p(\ell)$ in time series with scaling crossovers, (5). When $t_c \to 1$, the scaling exponent of the time series is essentially $\alpha_l$, and then $\langle\ell\rangle = \langle\ell\rangle_{\alpha_l}$. In contrast, when $t_c \to N$, the scaling exponent of the time series is solely $\alpha_s$, and thus $\langle\ell\rangle = \langle\ell\rangle_{\alpha_s}$. In between both extreme cases, $\langle\ell\rangle$ changes from one limiting value to the other (see Fig. 4) as a function of $t_c$, i.e., $\langle\ell\rangle = \langle\ell\rangle(t_c)$. The functional form of $\langle\ell\rangle(t_c)$ depends on the particular $\alpha_s$ and $\alpha_l$ values, as can be seen in the numerical results shown in Fig. 4. Nevertheless, $\langle\ell\rangle(t_c)$ can also be calculated analytically by assuming a $p(\ell)$ form as the one given in (5), and the analytical results are in perfect agreement with the numerical ones. The calculations are in general rather cumbersome, and in the Appendix we include the derivation of the case $\alpha_s, \alpha_l > 1$.
Our working hypothesis is that correlations are the unifying factor behind the statis-
tical properties of FPTs in time series obtained as the output of complex dynamical
systems, independently of their different nature and their specific properties. Thus,
the theoretical and numerical results we have shown in the two precedent sections
obtained from artificial time series with scaling crossovers should be valid for real
time series exhibiting such scaling behavior. To show this, we choose as our working
example the human postural control system, and in particular, the properties of the
trajectory of the Center of Pressure (CoP) of the postural sway in quiet standing.
This system is known to have a scaling crossover, since there exist two competing dynamical mechanisms acting at different time scales [19, 20].
The data are obtained using a platform equipped with accelerometers which can
record the in-plane trajectory of the CoP of a person placed in quiet standing over the
platform. A typical xy trajectory of the CoP is shown in Fig. 5a. This trajectory can
be decomposed into the $x(t)$ and $y(t)$ time series, to study respectively the medio-lateral and the antero-posterior motions independently. The $x(t)$ time series of the CoP trajectory plotted in Fig. 5a is shown in Fig. 5b, and it is the time series we
choose to analyze in the rest of this section.
If we apply DFA to this time series, we obtain a fluctuation function $F(n)$ with two different power-law scalings at short and large time scales, $\alpha_s = 1.9$ and $\alpha_l = 1.2$, with a clear scaling crossover at a time scale of about $t_c \approx 1$ s (Fig. 5c). Note the similarity between this result and the $F(n)$ function obtained from the artificial time series generated with our model (Fig. 2c).
In this case, we observe that both $\alpha_s$ and $\alpha_l$ are larger than 1, and then the results should correspond to case (ii) discussed in Sect. 4. For this range of $\alpha_s$ and $\alpha_l$, the FPT probability density $p(\ell)$ should behave at short scales as a power law with exponent $(3 - \alpha_s) = 1.1$, and at large scales as a power law with exponent $(3 - \alpha_l) = 1.8$.
Fig. 5 (a) A typical CoP trajectory recorded from a healthy subject during 7 min of quiet standing. (b) $x(t)$ time series extracted from (a). (c) DFA fluctuation function $F(n)$ obtained for the $x(t)$ time series shown in (b). $F(n)$ indicates power-law scaling behavior with a crossover at a time scale $\approx 1$ s from exponent $\alpha_s = 1.9$ to $\alpha_l = 1.2$. (d) FPT cumulative distribution function $1 - P(\ell)$ obtained from $x(t)$ (circles). The pronounced crossover in $1 - P(\ell)$ results from the crossover in the scaling of the fluctuation function $F(n)$ in (c). Solid lines in (d) represent fits to the two power-law regimes in $1 - P(\ell)$, corresponding to the theoretically expected exponents ($2 - \alpha_s$ and $2 - \alpha_l$)
7 Conclusions
Acknowledgements We kindly thank Prof. Antonio M. Lallena, from the University of Granada
(Spain), for providing us with CoP data. We thank the Spanish Government (Grant FIS2012-36282)
and the Spanish Junta de Andalucía (Grant FQM-7964) for financial support. P.Ch.I. acknowledges
support from NIH–NHLBI (Grant no. 1R01HL098437-01A1) and from BSF (Grant No. 2012219).
Appendix
For brevity, we only include here the derivation of the behavior of $\langle\ell\rangle$ as a function of $t_c$ for the case $\alpha_s, \alpha_l > 1$, shown graphically in Fig. 4b. For such a case, the functional forms corresponding to both exponents are power laws, and the transition between them occurs at $g(t_c) \sim t_c$. Thus, $p(\ell)$ is of the form:
$$p(\ell) = k \begin{cases} \ell^{-(3-\alpha_s)} & \text{if } \ell \le t_c \\[4pt] \dfrac{t_c^{\,3-\alpha_l}}{t_c^{\,3-\alpha_s}}\, \ell^{-(3-\alpha_l)} & \text{if } t_c < \ell < N \end{cases} \qquad (7)$$
where $k$ is a normalization constant that can be obtained from $\int_1^N p(\ell)\, d\ell = 1$, and the factor $t_c^{\,3-\alpha_l}/t_c^{\,3-\alpha_s}$ ensures continuity at $t_c$. The mean FPT value is then
$$\langle\ell\rangle = \int_1^N \ell\, p(\ell)\, d\ell = \frac{\dfrac{t_c^{\,\alpha_s-1}-1}{\alpha_s-1} + \dfrac{t_c^{\,3-\alpha_l}}{t_c^{\,3-\alpha_s}}\,\dfrac{N^{\alpha_l-1}-t_c^{\,\alpha_l-1}}{\alpha_l-1}}{\dfrac{t_c^{\,\alpha_s-2}-1}{\alpha_s-2} + \dfrac{t_c^{\,3-\alpha_l}}{t_c^{\,3-\alpha_s}}\,\dfrac{N^{\alpha_l-2}-t_c^{\,\alpha_l-2}}{\alpha_l-2}} \qquad (8)$$
This expression is complicated, but noting that $1 < \alpha_s, \alpha_l < 2$, we can consider the limit of large time series length $N$ by keeping only the largest powers of $N$ in the numerator and denominator of (8). Similarly, and as we are interested in the behavior of $\langle\ell\rangle$ as $t_c$ increases, we can keep only the highest powers of $t_c$ in the numerator and denominator of the result. Altogether, we obtain:
$$\langle\ell\rangle \simeq \frac{2-\alpha_s}{\alpha_l-1}\, N^{\alpha_l-1}\, t_c^{\,\alpha_s-\alpha_l} \qquad (9)$$
This equation is in perfect agreement with the numerical results shown in Fig. 4b: for increasing $t_c$ values, $\langle\ell\rangle$ changes between its two extreme values as a power law of $t_c$ with exponent $\alpha_s - \alpha_l$ (dotted lines in Fig. 4b). Also, Eq. (9) shows the dependence of $\langle\ell\rangle$ on the time series length $N$ in the limit of large $N$ for fixed $t_c$. In this case, and as $t_c$ is finite, the function $p(\ell)$ (7) is governed in the range $(t_c, N)$ by the scaling exponent $\alpha_l$, and this range controls the mean value ($\langle\ell\rangle \sim N^{\alpha_l-1}$) exactly in the same way as in the case of time series with single scaling exponents with $1 < \alpha < 2$ [see Eq. (3)].
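As a quick numerical check of (8) and (9), the snippet below evaluates the exact moment ratio of (8) and compares it with the asymptotic expression (9); the parameter values are illustrative choices of ours.

```python
import numpy as np

def mean_fpt(alpha_s, alpha_l, tc, N):
    """Exact <l> for the piecewise power-law density (7), i.e. Eq. (8)."""
    C = tc ** (3 - alpha_l) / tc ** (3 - alpha_s)          # continuity factor at tc
    m1 = ((tc ** (alpha_s - 1) - 1) / (alpha_s - 1)
          + C * (N ** (alpha_l - 1) - tc ** (alpha_l - 1)) / (alpha_l - 1))
    m0 = ((tc ** (alpha_s - 2) - 1) / (alpha_s - 2)
          + C * (N ** (alpha_l - 2) - tc ** (alpha_l - 2)) / (alpha_l - 2))
    return m1 / m0

alpha_s, alpha_l, tc, N = 1.5, 1.2, 1000, 2 ** 28
exact = mean_fpt(alpha_s, alpha_l, tc, N)
approx = (2 - alpha_s) / (alpha_l - 1) * N ** (alpha_l - 1) * tc ** (alpha_s - alpha_l)  # Eq. (9)
# For these (large) values of N and tc the two expressions agree to within a few percent.
```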
References
1. Condamin, S., et al.: First-passage times in complex scale-invariant media. Nature 450, 77–80
(2007)
2. Bunde, A., Havlin, S. (eds.): Fractals in Science, Springer, Heidelberg (1995)
3. Schindler, M., Talkner, P., Hänggi, P.: Firing time statistics for driven neuron models: analytic
expressions versus numerics. Phys. Rev. Lett. 93, 048102 (2004)
4. Bunde, A., et al.: Long-term memory: a natural mechanism for the clustering of extreme events
and anomalous residual times in climate records. Phys. Rev. Lett. 94, 048701 (2005)
5. Reyes-Ramírez, I., Guzmán-Vargas, L.: Scaling properties of excursions in heartbeat dynamics.
Europhys. Lett. 89, 38008 (2010)
6. Leland, W.E., et al.: On the self-similar nature of Ethernet traffic. IEEE ACM Trans. Netw. 2,
1–15 (1994)
7. Cai, S.M., et al.: Scaling and memory in recurrence intervals of Internet traffic. Europhys. Lett.
87, 68001 (2009)
8. Ivanov, P.Ch., et al.: Common scaling patterns in intertrade times of U. S. stocks. Phys. Rev. E
69, 056107 (2004)
9. Wang, F.Z., et al.: Multifactor analysis of multiscaling in volatility return intervals. Phys. Rev.
E 79, 016103 (2009)
10. Ding, M.Z., Yang, W.M.: Distribution of the first return time in fractional Brownian motion
and its application to the study of on-off intermittency. Phys. Rev. E 52, 207–213 (1995)
11. Shlesinger, M.F., Zaslavsky, G.M., Klafter, J.: Strange kinetics. Nature 363, 31–37 (1993)
12. Rangarajan, G., Ding, M.Z.: First passage time distribution for anomalous diffusion. Phys. Lett.
A 273, 322–330 (2000)
13. Khoury, M., et al.: Weak disorder: anomalous transport and diffusion are normal yet again.
Phys. Rev. Lett. 106, 090602 (2011)
14. Eliazar, I., Klafter, J.: From Ornstein-Uhlenbeck dynamics to long-memory processes and
fractional Brownian motion. Phys. Rev. E. 79, 021115 (2009)
15. Carretero-Campos, C., et al.: Phase transitions in the first-passage time of scale-invariant
correlated processes. Phys. Rev. E 85, 011139 (2012)
16. Ivanov, P.Ch.: Scale-invariant aspects of cardiac dynamics. Observing sleep stages and
circadian phases. IEEE Eng. Med. Biol. Mag. 26, 33 (2007)
17. Ivanov, P.Ch., et al.: Levels of complexity in scale-invariant neural signals. Phys. Rev. E 79,
041920 (2009)
18. Ivanov, P.Ch., Yuen, A., Perakakis, P.: Impact of stock market structure on intertrade time and
price dynamics. PLOS One 9, e92885 (2014)
19. Blázquez, M.T., et al.: Study of the human postural control system during quiet standing using
detrended fluctuation analysis. Physica A 388, 1857–1866 (2009)
20. Blázquez, M.T., et al.: On the length of stabilograms: a study performed with detrended
fluctuation analysis. Physica A 391, 4933–4942 (2012)
21. Makse, S., et al.: Method for generating long-range correlations for large systems. Phys. Rev.
E. 53, 5445 (1996)
22. Peng, C.-K., et al.: Mosaic organization of DNA nucleotides. Phys. Rev. E. 49, 1685 (1994)
23. Hu, K., et al: Effect of trends on detrended fluctuation analysis. Phys. Rev. E 64, 011114 (2001)
24. Coronado, A.V., Carpena, P.: Size effects on correlation measures. J. Biol. Phys. 31, 121 (2005)
Part II
Theoretical and Applied Econometrics
The Environmental Impact of Economic Activity
on the Planet
Abstract As the United Nations is currently discussing the post-2015 agenda, this
paper provides an updated quantification of the environmental impact index and its
evolution during the last 50 years. An updated and global environmental impact
index estimate, based on the theoretical model of consumption equations initiated
in the 1970s by Paul Ehrlich and John Holdren, was carried out. Included in the
geographic scope of the study are all countries for which data are published in the
World Bank’s database for the period 1961–2012.
Once the growing evolution of this index was noted, the secondary objectives of
the study were focused on the analysis of the relationship between CO2 emissions,
mortality rate, and green investments, the latter being estimated by the volume
of investment in Research and Development (R&D). In both cases our estimation
showed a positive and statistically significant relationship between CO2 emissions
and the mortality and R&D investments variables.
“Climate change is considered one of the most complex challenges for this century.
No country is immune nor can, by itself, address either the interconnected challenges or the impressive technological change needed” [1]. In addition, international
agencies make it clear that it is developing countries that will bear the brunt;
countries which must simultaneously cope with efforts to overcome poverty and
promote economic growth. Therefore, a high degree of creativity and cooperation,
a “climate-smart approach”, is needed.
2 Methodology
A first estimate was made to test what is actually happening in the world with regard
to the economic and social impact caused by pollutant emissions. The Ehrlich and
Holdren index was calculated for each country in the world, using data provided by
the World Bank (1961–2012). This index was then compared, country by country,
with mortality rates and R&D investments.
The Ehrlich and Holdren index [6, 7] offers a way of connecting environmental
problems and their root causes. The equation describes the relationship between
population, consumption, and environmental impact in approximate terms as the
following:
$$\text{Impact} = \text{Population} \times \text{Affluence} \times \text{Technology}.$$
Thus:
$$\text{CO}_2\ \text{emissions} = \text{Population} \times \frac{\text{GDP}}{\text{Population}} \times \frac{\text{CO}_2}{\text{GDP}}.$$
When we use these variables to calculate the impact factor, after eliminating
common elements in the numerators and denominators of the factors, we note that the result of the index is none other than the total CO2 emissions, although it is broken down into various determinants. Therefore, in developing countries, the size
of the population and the resulting degradation of potentially renewable resources
are often the most decisive factors. However, in developed countries, the main
components are the high level of resource utilisation and the pollution generated.
Thus, by presenting the index in this way comparisons can be made between more
and less developed countries, in order to discover the root causes of environmental
problems [8, 9].
On the other hand, according to Amartya Sen [10], mortality rates give the
best picture of health and disease levels in a population. We have also employed
R&D investments in our comparison to reflect the efforts made by economic agents
to improve the situation, by putting technological innovations at the service of
sustainability.
We are conscious that other variables could have been included to study this
phenomenon in more detail. Further research will involve an enlargement of our model.
In our research, a first econometric estimation was made through panel data
techniques, because this technique allows us to deal with two-dimensional (cross-sectional/time) series.
As Baltagi [11] explains, these models have some advantages over cross-
sectional models or time series, because there is no limit to the heterogeneity of the
data within them. More informative data is provided and there is a greater suitability
for studying the dynamics of change, which is better for detecting and measuring
certain effects. This allows more complex models of behaviour to be studied and
minimises the bias resulting from the use of total aggregate data. The model utilised
in this case would be the following:
The estimate would depend on the assumptions made about the error term.
Firstly, we might consider that the slope coefficients $\beta$ of the explanatory variables are constant for all the regressions calculated for each country, but the independent coefficients, or intercepts, vary for each of these populations, with the subscript $i$ being variable. The model would be the following:
$$Y_{it} = \beta_{1i} + \beta_2 X_{it} + u_{it}.$$
This regression model is called fixed effects or least squares dummy variable.
The error term $\omega_{it}$ would consist of two components: $\varepsilon_i$, the individual-specific error component, and $u_{it}$, the component already discussed, which combines the time series and cross-section error components. Hence the name error components model, because the model error term has two components (more components can also be considered, for example, a temporal component).
Of these two methods, in accordance with the data we have, the most suitable
for the objectives sought is the fixed effects method. In this, as we have said, the
coefficients of the regression equation remain fixed for all countries, but the constant
terms are different for each. These different constant terms represent the difference
in each country, at the time of addressing the issue studied. The reason for using
this calculation procedure, and not the random effects model, is to ensure statistical goodness of fit without biasing the tests performed: we work with all countries, not just a sample.
The data have been collected over time (1961–2012) and for the same individuals
(205 countries). In summary, regarding our approach, we chose the fixed effect
model because, in our opinion, this is the most adequate when interest is only
focussed on drawing inferences about the examined individuals (countries). This
approach assumes that there are unique attributes of individuals (countries) that are
not the results of random variation and that do not vary over time.
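A minimal sketch of such a fixed-effects (within) estimation is given below, using only pandas and NumPy. The column names `country`, `impact_index`, and `mortality_rate` are assumptions for illustration; demeaning by country is numerically equivalent to the least squares dummy variable formulation mentioned above, and this is not the authors' estimation code.

```python
import numpy as np
import pandas as pd

def fixed_effects_slope(df, y_col, x_col, entity_col="country"):
    """Within (fixed-effects) estimator: slope of y on x after removing country means."""
    y = df[y_col] - df.groupby(entity_col)[y_col].transform("mean")
    x = df[x_col] - df.groupby(entity_col)[x_col].transform("mean")
    beta = float((x * y).sum() / (x ** 2).sum())
    residuals = y - beta * x
    return beta, residuals

# Assumed layout: one row per country-year with the impact index and the mortality rate.
# df = pd.read_csv("panel.csv")   # columns: country, year, impact_index, mortality_rate
# beta, resid = fixed_effects_slope(df, "mortality_rate", "impact_index")
```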
3 Results
We can see that the difference in the index values is due to the high level of emissions per
unit of GDP in Russia (Fig. 1).
The above graph shows the global evolution of each of the components of the
index calculated. In this case the key element for explaining the increase of CO2
emissions is the growth of per capita GDP. The population has grown but well
below the index pace. The ratio of intensity of emissions (CO2 /GDP) has declined,
showing greater energy efficiency. Each additional unit of GDP generates less CO2 .
Finally, the ratio that grows at the same rate as the index is the GDP/PC. This implies
that higher income levels mean more consumption and more production, with the
resulting increase in emissions.
We made two fixed effects econometric estimates, into sections and years. In
the first case (Table 1), we estimate the effect caused by the environmental impact
index on the mortality rate, which, as previously stated, is the variable that gives
the best view of health and disease levels in a population, as indicators of human
development. In the second estimate (Table 2), we analyse the effect of the impact
index on the sum total of R&D investments (without detailing, for the moment, what type of investments these are).
Fig. 1 Evolution of each of the components of the index calculated. Note: Total CO2 emis-
sions was obtained in http://data.worldbank.org/indicator/EN.ATM.CO2E.PC/countries?display=
default. GDP was obtained in http://data.worldbank.org/indicator/NY.GDP.PCAP.CD. POPULA-
TION was obtained in http://data.worldbank.org/indicator/SP.POP.TOTL
4 Conclusions
The main conclusion reached in this study is that the rate of environmental
impact has maintained an upward trend from the 1960s until today. Although this
development is decreasing in relation to GDP, in absolute terms its increase has
not stopped for over 50 years. This growth is particularly strong in the last decade
and has peaked in recent years, coinciding with the period of economic crisis and
stronger growth in emerging economies.
However, some progress is evident. During the study period CO2 emissions per
unit of production have been reduced, and the relationship between these emissions
and the total population has remained constant. At the same time, it is important to
note that the growth of investment in R&D has remained similar to the growth of
the impact index.
The influence of environmental deterioration on mortality rates has been positive
and statistically significant. However, although the statistical relationship has proved
to be meaningful and not spurious, it is obvious that this relationship does not
have a high intensity (small coefficient) and the coefficients of countries show high
variability depending on their level of development. Regardless of the statistical
consistency of this relationship, epidemiological studies show clear evidence of the
importance of this relationship. Nevertheless, further research is needed to break
down this relationship according to the level of development of countries.
Finally, according to international agencies, two lines of action should be highlighted. We believe that they will be crucial for a successful strategy for combatting
climate change and environmental degradation. These lines are green investments
and socially responsible policies in the business world. There are significant
opportunities for businesses in green investments, in areas such as infrastructure,
education, health and new technologies related to renewable energies.
References
5. UNDP. Human Development Report 2013. The Rise of the South: Human Progress in a Diverse
World. United Nations Development Programme. UN Plaza, New York. http://hdr.undp.org/
sites/default/files/reports/14/hdr2013_en_complete.pdf (2013)
6. Miller, G.T.: Introducción a la ciencia ambiental: desarrollo sostenible de la tierra [Introduction
to environmental science: sustainable development of the earth]. Thomson, Madrid (2002)
7. Ehrlich, P.R., Holdren, J.P.: Impact of population growth. Science 171(3977), 1212–1217
(1971). doi:10.1126/science.171.3977.1212
8. Chertow, M.R.: The IPAT equation and its variants. J. Ind. Ecol. 4(4), 13–29 (2000).
doi:10.1162/10881980052541927
9. European Commission. Environmental Impact of Products (EIPRO). Analysis of the life cycle
environmental impacts related to the final consumption of the EU-25. European Commission
Joint Research Centre (DG JRC) Institute for Prospective Technological Studies. http://ftp.jrc.
es/EURdoc/eur22284en.pdf (2006)
10. Sen, A., Kliksberg, B.: Primero la Gente [People First]. Deusto (2007)
11. Baltagi, B.H.: Econometric Analysis of Panel Data, 4th edn. Wiley, Chichester (2009)
Stock Indices in Emerging and Consolidated
Economies from a Fractal Perspective
1 Introduction
A stock index is a mathematical weighted sum of some values that are listed in
the same market. Since their creation in 1884, stock indices have been used as a measurement of the economic and financial activity of a particular trade sector or a country.
In this paper we wish to inquire about the statistical and fractal structure of the
daily closing prices of five international stock indices (Eurostoxx 50, Ibovespa,
Nikkei 225, Sensex, and Standard & Poor’s 500, of Europe, Brazil, Japan, India,
and USA, respectively) through the years 2000–2014, using two different types of
procedures. The stock data are freely available on the Yahoo Finance website [10]. We chose these indices because they are a good representation of the trends of the
market, since they collect and reflect the behaviors and performances of new and
old economies. We wish to check if they have a similar statistical and self-affine
structure (Figs. 1 and 2).
Since the founding work of L. Bachelier in his thesis “La théorie de la spéculation,” the random walk model has been present in almost all the mathematical approaches to the economic series.
Our goal is to check the hypothesis stating that these time data are Brownian
motions. To this end we compute the following parameters: Hurst scalar, fractal
dimension, and exponent of colored noise.
In general it is said that economic series are well represented by colored noises, and we wished to test this hypothesis. A variable of this type satisfies a power law for its spectrum:
$$S(f) \simeq k f^{-\alpha},$$
where $f$ is the frequency and $S(f)$ is the spectral power. In our case we compute discrete powers corresponding to discrete frequencies ($m f_0$), where $f_0 = 2\pi/T$ is the fundamental frequency and $m = 1, 2, \ldots$ ($T$ is the length of the recording). A logarithmic regression of the variables provides the exponent as the slope of the fit.
To this end, we first construct a truncated trigonometric series to fit the data [5]. Since Fourier methods are suitable for variables of stationary type, we previously subtracted the values on the regression line of the record. We analyzed the index
year by year and obtained an analytical formula for every period, as sum of the
linear part and the spectral series. We compute in this way a discrete power spectrum
which describes numerically the great cycles of the index. We performed a graphical
test in order to choose the number of terms for the truncate sum. We find that 52
summands are enough for the representation of the yearly data, that corresponds to
the inclusion of the cycles of weekly length. The formula is almost interpolatory.
The harmonics of the series allow us to obtain the spectral powers. These quantities
enable a numerical comparison between different indicators, for instance. In this
case we used the method to inquire into the mathematical structure of the quoted
indices. The numerical procedure is described in the reference [5] (Fig. 2).
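A simplified version of this spectral-exponent estimate is sketched below: detrend the yearly record, compute the discrete power spectrum, and regress log power on log frequency. The truncated trigonometric (quasi-interpolatory) fit of [5] is replaced here by a plain periodogram, so this is only an approximation of the authors' procedure.

```python
import numpy as np

def spectral_exponent(x, n_harmonics=52):
    """Slope alpha of the fit S(f) ~ k f^(-alpha) over the first harmonics."""
    t = np.arange(len(x))
    detrended = x - np.polyval(np.polyfit(t, x, 1), t)   # remove the regression line
    spectrum = np.abs(np.fft.rfft(detrended)) ** 2        # discrete spectral powers
    m = np.arange(1, n_harmonics + 1)                     # harmonics of the fundamental frequency
    slope, _ = np.polyfit(np.log(m), np.log(spectrum[m]), 1)
    return -slope                                         # exponent alpha in S(f) ~ f^(-alpha)

# Example with a simulated random walk (red noise): the estimate should be roughly 2.
prices = np.cumsum(np.random.default_rng(2).standard_normal(256))
alpha_hat = spectral_exponent(prices)
```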
In Table 1 the exponents computed for the years listed are presented. The mean values obtained over the period are 1.95, 1.94, 1.99, 1.93, and 1.97 for Eurostoxx 50, Ibovespa, Nikkei 225, Standard & Poor's 500, and Sensex, respectively, with standard deviations 0.16, 0.22, 0.20, 0.13, and 0.17. The results suggest a structure fairly close to a red noise, whose exponent is 2. The correlations obtained in their computation are about 0.8.
For stock indices, the Hurst exponent $H$ is interpreted as a measure of the trend of the index. A value such that $0 < H < 0.5$ suggests anti-persistence, and the values $0.5 < H < 1$ give evidence of a persistent series. Thus, the economical interpretation is that a value lower than 0.5 points to high volatility, that is to say, changes more frequent and intense, whereas $H > 0.5$ shows a more defined tendency. By means of the Hurst parameter, one can deduce whether the record admits a model of fractional Brownian motion [3].
It is said that the Brownian motion is a good model for experimental series and, in
particular, for economic historical data. The fractional (or fractal) Brownian motions
(fBm) were studied by Mandelbrot (see for instance [2, 3]). They are random
functions containing both independence and asymptotic dependence and admit the
possibility of a long-term autocorrelation. Another characteristic feature of fBm’s is
the self-similarity [3]. In words of Mandelbrot and Van Ness [3]: “fBm falls outside
the usual dichotomy between causal trends and random perturbation.” The fractional
Brownian motion is generally associated with a spectral density proportional to
$1/f^{2H+1}$, where $f$ is the frequency. For $H = 1/2$ one has a $1/f^2$ noise (Brownian or red).
A fractional Brownian motion with Hurst exponent $H$, $B_H(t, \omega)$, is characterized by the following properties:
1. $B_H(t, \omega)$ has almost all sample paths continuous (when $t$ varies in a compact interval $I$).
2. Almost all trajectories are Hölder continuous for any exponent $\beta < H$, that is to say, for each such path there exists a constant $c$ such that $|B_H(t, \omega) - B_H(s, \omega)| \le c\,|t - s|^{\beta}$.
3. With probability one, the graph of $B_H(t, \omega)$ has both Hausdorff and box dimension equal to $2 - H$.
4. If $H = 1/2$, $B_H(t, \omega)$ is an ordinary Brownian function (or Wiener process). In this case the increments in disjoint intervals are independent.
5. The increments of $B_H(t, \omega)$ are stationary and self-similar: $\{B_H(t_0 + T, \omega) - B_H(t_0, \omega)\}$ and $\{h^{-H}(B_H(t_0 + hT, \omega) - B_H(t_0, \omega))\}$ have the same finite-dimensional distributions.
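One standard way to estimate $H$ is the rescaled-range (R/S) analysis going back to Hurst [1]; a compact sketch applied to daily returns is given below. It is a generic illustration only; the fractality tests actually used for the indices are those described in [4, 6].

```python
import numpy as np

def rescaled_range(x):
    """R/S statistic of a single window."""
    z = np.cumsum(x - np.mean(x))
    r = z.max() - z.min()                    # range of the cumulative deviations
    s = np.std(x)
    return r / s if s > 0 else np.nan

def hurst_rs(x, window_sizes):
    """Hurst exponent as the slope of log(R/S) versus log(window size)."""
    rs = []
    for w in window_sizes:
        chunks = [x[i:i + w] for i in range(0, len(x) - w + 1, w)]
        rs.append(np.nanmean([rescaled_range(c) for c in chunks]))
    H, _ = np.polyfit(np.log(window_sizes), np.log(rs), 1)
    return H

# Uncorrelated returns should give H around 0.5 (small-sample bias pushes it slightly higher).
returns = np.random.default_rng(3).standard_normal(2000)
H_hat = hurst_rs(returns, window_sizes=[16, 32, 64, 128, 256])
```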
Regarding Sensex, the maximum is reached in 2003 with value 0.575, which represents the highest exponent recorded globally. A minimum of the Hurst exponent is found in 2009 with value 0.455. The range of absolute variation is 0.12, i.e., 20.87 % of the peak value of the period.
In 2008, beginning of the financial crisis, a general drop occurs in the Hurst exponent of the indices, reaching the absolute minimum of the period in the Western countries. The fall in the Eastern countries is smaller, pointing to less influence of the financial crisis on these indices.
Table 3 displays the average Hurst parameters of the indices in the periods 2000–2007 (pre-crisis) and 2008–2014 (crisis). These exponents increase very slightly in the second term, except in the Indian case, but the number of samples is insufficient (and the difference too small) to perform statistical tests in order to check the increment. The variability is higher in the second period too, with the exception of Sensex.
Table 4 shows the annual fractal dimensions computed for the years considered. The average values obtained are: 1.55, 1.54, 1.52, 1.56, and 1.50, for Eurostoxx 50, Ibovespa, Nikkei 225, Standard & Poor's 500, and Sensex respectively, with
Table 3 Mean and standard deviation of the Hurst parameter of the indices in the periods pre-crisis
and crisis
Period      Statistic    Eurostoxx  Ibovespa  Nikkei  S&P    Sensex
2000–2007   Mean         0.445      0.462     0.483   0.441  0.508
2000–2007   Stand. dev.  0.029      0.034     0.033   0.035  0.039
2008–2014   Mean         0.452      0.466     0.485   0.448  0.493
2008–2014   Stand. dev.  0.038      0.037     0.053   0.053  0.032
standard deviations 0.03, 0.03, 0.04, 0.04, and 0.03. The typical fractal dimension of a Brownian motion is 1.5. The highest fractal dimension (1.648) is recorded in the American index during the year 2008 (outbreak of the crisis). The second maximum is European (1.610).
4 Conclusions
The fractal tests in the stock records analyzed in the period 2000–2014 provide a
variety of outcomes which we summarize below.
– The numerical results present a great uniformity. Nevertheless, the p-values provided by the statistical test support, at the 95 % confidence level, the numerical differences of the Indian index with respect to S&P, Europe, and Brazil, and those of Japan with respect to Eurostoxx and S&P.
Fig. 3 Fractal dimensions of Eurostoxx (squares) and Sensex (dots) over the period considered
– The fractal dimensions of India and Japan are slightly lower than those of the rest of the indices, pointing to a slightly less complex pattern of the records (see Fig. 3).
– The year 2008 (beginning of the crisis) records a global minimum of the Hurst
exponent in the Western countries. The drop of this oscillator is less evident in
the Eastern economies, pointing to a more reduced (or delayed) influence of the
financial crisis. The location of the maxima is not so uniform and moves over the
period.
– The mean average of the Hurst exponents is 0.47. The standard deviations are around 0.03.
– The results obtained from the exponent $\alpha$ are around 1.95, very close to the characteristic value of a red noise or Brownian motion ($\alpha = 2$), with a typical deviation of 0.02.
– The correlograms of the different indices, whose tendency to zero is slow, pointed to a type of random variable very different from a white noise. The computations performed confirmed this fact. The stock records may admit a representation by means of colored noises, in particular red noise, refined by a rather strict model of fractional Brownian motion. The numerical results suggest that the Hurst exponent is a good predictor of changes in the market.
– The Hurst scalar is suitable for the numerical description of this type of econom-
ical signals. The exponent gives a measure of the self-similarity (fractality) of the
data.
In general, we observe a mild anti-persistent behavior in the markets ($H < 0.5$) that is slightly weaker in the Eastern economies (mainly in India during the first
years of the period). Nevertheless, it is likely that the globalization process will lead
to a greater uniformity.
Concerning the methodology, we think that the first test performed is merely
exploratory. The necessary truncation of the series defined to compute the powers
collects only the macroscopic behavior of the variables and omits the fine self-affine
oscillations. The fractal test is, however, robust. There is absolutely no doubt that the
tools provided by Fractal Theory allow us to perform a more precise stochastic
study of the long-term trends of the stocks and provide greater consistency than the
classical hypotheses.
References
1. Hurst, H.E.: Long-term storage of reservoirs: an experimental study. Trans. Am. Soc. Civ. Eng.
116, 770–799 (1951)
2. Mandelbrot, B.B., Hudson, R.L.: The (Mis)Behavior of Markets: A Fractal View of Risk, Ruin
and Reward. Basic Books, New York (2004)
3. Mandelbrot, B.B., Ness, J.V.: Fractional Brownian motions, fractional noises and applications.
SIAM Rev. 10, 422–437 (1968)
4. Navascués, M.A., Sebastián, M.V., Blasco, N.: Fractality tests for the reference index of the
Spanish stock market (IBEX 35). Monog. Sem. Mat. García de Galdeano 39, 207–214 (2014)
5. Navascués, M.A., Sebastián, M.V., Ruiz, C., Iso, J.M.: A numerical power spectrum for
electroencephalographic processing. Math. Methods Appl. Sci. (2015) (Published on-line
doi:10.1002/mma.3343)
6. Navascués, M.A., Sebastián, M.V., Campos, C., Latorre, M., Iso, J.M., Ruiz, C.: Random and
fractal models for IBEX 35. In: Proceedings of International Conference on Stochastic and
Computational Finance: From Academia to Industry, pp. 129–134 (2015)
7. Peters, E.E.: Chaos and Order in the Capital Market: A New View of Cycles, Prices and Market
Volatility. Wiley, New York (1994)
8. Peters, E.E., Peters, D.: Fractal Market Analysis: Applying Chaos Theory to Investment and
Economics. Wiley, New York (1994)
9. Rasheed, K., Qian, B.: Hurst exponent and financial market predictability. In: IASTED
Conference on Financial Engineering and Applications (FEA 2004), pp. 203–209 (2004)
10. Yahoo Finance. http://finance.yahoo.com/ (2015)
Value at Risk with Filtered Historical Simulation
Abstract In this paper we study the properties of estimates of the Value at Risk
(VaR) obtained using the historical simulation method. The historical simulation (HS)
method is a widely used non-parametric approach for computing VaR in many large
financial institutions. This paper theoretically and empirically examines
the filtered historical simulation (FHS) method for computing VaR, which combines
the non-parametric and parametric approaches. We use parametric dynamic models
of return volatility such as GARCH and A-GARCH. We compare FHS VaR with the VaR
obtained using historical simulation and with parametric VaR.
1 Introduction
The quantification of the potential size of losses and assessing risk levels for
financial instruments (assets, FX rates, interest rates, commodities) or portfolios
composed of them is fundamental in designing risk management and portfolio
strategies [12]. Current methods of evaluation of this risk are based on value-at-
risk (VaR) methodology. VaR determines the maximum expected loss which can be
generated by an asset or a portfolio over a certain holding period with a predeter-
mined probability value. The VaR model can be used to evaluate the performance of a
portfolio by providing portfolio managers with a tool to determine the most effective
risk management strategy for a given situation [12].
There exist many approaches to estimating VaR. An introduction to the VaR
methodology is given in [15–18]. A good overview of VaR methods is given in [1,
2, 8, 9], among others. The commonly used techniques include analytic and simulation
techniques, either parametric or non-parametric [1, 6, 7, 21].
choice of λ can make the VaR estimates much more responsive to a large loss in
the returns. Exponentially weighted HS makes risk estimates more efficient and
effectively eliminates any ghost effects [9]. For this reason this method is widely
used in commercial banks.
Volatility-weighted historical simulation, introduced by Hull and White [1, 9, 14],
is another generalization of the historical simulation method. This approach was
designed to weight the returns by rescaling them from the volatility prevailing when
they were observed to the current volatility. Briefly, we can write this approach as follows.
Let the time series of unadjusted historical returns be {r_t}_{t=1}^{T}, where T is the time
at the end of the sample, when the VaR is estimated. Then the volatility-adjusted return
series at every time t < T is

  \tilde{r}_t = \frac{\hat{\sigma}_T}{\hat{\sigma}_t}\, r_t ,   (1)

where T is fixed and t varies over the sample (t = 1, 2, ..., T), and σ̂_T, σ̂_t are the
estimates of the standard deviation of the return series at times T and t, respectively
[1, 9]. The actual return in any period t is therefore increased (or decreased), depending
on whether the current forecast of volatility σ̂_T is greater (or less) than the estimated
volatility σ̂_t for period t.
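As a small illustration of Eq. (1), the adjustment is a one-liner once a series of volatility
estimates is available. The sketch below is written in Wolfram Mathematica (the language
the authors use for their own code later in the paper); the function name is ours and the
choice of volatility estimator is left open:

(* Volatility-weighted returns as in Eq. (1): r is the return series and
   sigma the corresponding series of volatility estimates sigma-hat_t;
   every return is rescaled by sigma-hat_T / sigma-hat_t. *)
volAdjustReturns[r_List, sigma_List] := (Last[sigma]/sigma) r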
The last approach is the filtered historical simulation (FHS). The FHS approach was
proposed in a series of papers by Barone-Adesi et al. [4, 5]. Barone-Adesi assumed
that the historical returns need to be filtered, that is, adjusted to
reflect current information about security risk. In the non-parametric case, FHS
combines the benefits of HS with the power and flexibility of conditional
volatility models such as GARCH and A-GARCH [9]. FHS uses a parametric dynamic
model of return volatility, such as the GARCH or A-GARCH model, to simulate log
returns on each day over the risk horizon [1].
The estimate of the 100α % 1-day VaR of a single asset using FHS can be
obtained as follows [9]:
1. In the first step we fit the log return data with an appropriate model (for example,
EWMA, GARCH, A-GARCH).
2. In the second step we use the fitted model to forecast the volatility for each day
in the sample period. The realized returns are then divided by these volatility forecasts
to produce a set of standardized returns that are approximately i.i.d. The following
EWMA, GARCH, and A-GARCH recursive formulas can be used for the variance estimate
at time t of a returns series r_t:

   – EWMA variance:

     \hat{\sigma}_t^2 = (1 - \lambda)\, r_{t-1}^2 + \lambda\, \hat{\sigma}_{t-1}^2 , \qquad 0 < \lambda < 1   (2)

   – GARCH variance:

     \hat{\sigma}_t^2 = \hat{\omega} + \hat{\alpha}\, r_{t-1}^2 + \hat{\beta}\, \hat{\sigma}_{t-1}^2 , \qquad \omega \ge 0,\ \alpha \ge 0,\ \beta \ge 0,\ \alpha + \beta \le 1   (3)

   – A-GARCH variance:

     \hat{\sigma}_t^2 = \hat{\omega} + \hat{\alpha}\, (r_{t-1} - \hat{\lambda})^2 + \hat{\beta}\, \hat{\sigma}_{t-1}^2 , \qquad \omega \ge 0,\ \alpha \ge 0,\ \beta \ge 0,\ \lambda \ge 0,\ \alpha + \beta \le 1 ,   (4)

   where t = 2, ..., T and

     \hat{\sigma}_1^2 = \frac{1}{T} \sum_{i=1}^{T} r_i^2 .   (5)
3. In the third step we bootstrap from the data set of standardized returns. We
obtain simulated returns, each of which reflects current market conditions due to
scaling by today's forecast of tomorrow's volatility using Eq. (1).
4. In the last step we take the VaR as the 100α % quantile of these simulated returns.
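The four steps can be condensed into a short sketch (again in Wolfram Mathematica). The
use of the EWMA filter (2), the RiskMetrics default λ = 0.94 and the function names are
illustrative assumptions, not the exact procedure behind the results reported below:

(* A minimal sketch of 1-day FHS VaR following steps 1-4, using the EWMA
   filter (2) with decay factor lambda; the GARCH or A-GARCH recursions (3)-(4)
   could be used instead.  alpha is the VaR level (e.g. 0.05), m the number of
   bootstrap draws. *)
ewmaVariance[r_List, lambda_] :=
  FoldList[lambda #1 + (1 - lambda) #2^2 &, Mean[r^2], Most[r]];

fhsVaR[r_List, alpha_, lambda_: 0.94, m_: 10000] :=
 Module[{s, z, sNext, simulated},
  s = Sqrt[ewmaVariance[r, lambda]];      (* steps 1-2: fitted volatilities *)
  z = r/s;                                (* standardized (filtered) returns *)
  sNext = Sqrt[lambda Last[s]^2 + (1 - lambda) Last[r]^2];  (* forecast of tomorrow's volatility *)
  simulated = sNext RandomChoice[z, m];   (* step 3: bootstrap and rescale *)
  Quantile[simulated, alpha]]             (* step 4: the 100 alpha % quantile *)

For example, fhsVaR[logReturns, 0.05] would give the 5 % 1-day FHS VaR under these
assumptions.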
The problem is how to estimate the coefficients of each model. The RiskMetrics
methodology recommends setting the decay factor λ to 0.94 in the EWMA volatility
model. GARCH or A-GARCH coefficients can be estimated using maximum-
likelihood estimation (MLE). It is inappropriate to use the recursive formulas (3), (4) to
estimate the GARCH coefficients directly because the computational complexity increases
exponentially with the amount of data. If the mathematical software does not have the
capability to estimate GARCH or A-GARCH models, it is possible to use our
proposed algorithm. The GARCH model is obtained when λ = 0. The program code is
written in Wolfram Mathematica.
Program code for the estimation of the A-GARCH coefficients Alpha, Beta, Omega,
and Lambda (GcondVarianceTable, GcondVariance, and error are the authors' helper
routines for the conditional variances (4) and the residuals; their definitions are
not reproduced here):

(* logarithmic returns are in variable LV *)
Mu = Mean[LV];
Rtn = Flatten[{Mu, LV}];
NumD = Length[Rtn];
(* Gaussian log-likelihood of the A-GARCH model (up to an additive constant),
   to be maximized over Alpha, Beta, Omega, Lambda *)
Module[{tab =
   GcondVarianceTable[Length[Rtn], Alpha, Beta, Omega, Rtn, Mu, Lambda]},
 -0.5 Sum[
   Module[{gcr = GcondVariance[tab, t, Beta, Mu, Lambda]},
    Log[gcr] + error[t, Rtn, Mu, Lambda]^2/gcr],
   {t, 2, Length[Rtn]}]]
3 Empirical Investigation
In this section we analyze the risk connected with investing in Bitcoin denominated
in US Dollar. Bitcoin as an online payment system was introduced in 2009 for
worldwide users. Bitcoin is a decentralized virtual currency, and no institution (e.g.,
a central bank) takes control of its value [10]. It is publicly designed and everyone
can participate. Bitcoin is based on peer-to-peer technology; managing transactions
and the issuing of bitcoins are performed collectively by the network. Thanks to
its unique properties, Bitcoin allows fascinating uses that no previous payment
system could offer.
The subject of this paper is to analyze the risk of the Bitcoin currency denominated
in USD using value at risk. The Bitcoin fluctuations are measured in terms of
logarithmic returns without dividends (the log returns are obtained by the formula
r_t = ln(P_t / P_{t-1}), t = 1, ..., T, where P_t is the 24 h average price recorded at time t).
The sample period for BTC/USD began on July 17, 2010, continues up to September
2, 2015, and comprises 1854 daily prices, including weekends and holidays, because
Bitcoin allows any bank, business, or individual to securely send and
receive payments anywhere at any time.
Our sample period covers the entire daily history accessible from the global
financial portal Quandl Ltd., (n. d.) [20]. All results were obtained using Wolfram
Mathematica program code written by the authors. The virtual currency Bitcoin
recorded several major changes during the analyzed period. The time series of the
24 h average prices and their logarithmic returns are shown in Fig. 1. The historical
Fig. 1 BTC/USD 24 h average prices (left) and log returns (right). Sample Range: 17/07/2010–
02/09/2015
Fig. 2 BTC/USD 24 h average prices and log returns. Sample Range: 11/06/2013–02/09/2015
minimum of the log return was recorded on December 6, 2013, and historical
maximum of the log return was recorded on February 1, 2011. Last significant
decrease happened on January 14, 2015. We see a small peak in the very right part
of the graph on the left. This part of the graph approximately corresponds to June,
July, and August 2015. This peak highly probably relates to the recent crisis in
Greece, and this peak may be caused by the fact that more and more Greeks may
have started to use Bitcoins as their banks were closed. This may be especially true
about the maximum part of the peak, which occurred in July, because as we know,
this is exactly the time when the Greek banks were closed (see Fig. 1). When the
banks opened, 24 h average prices of Bitcoin started to fall, which is quite natural
as Greeks may have started to believe more in their bank system, and as a result,
they probably purchased fewer bitcoins than during the time when their banks were
closed. Volatility clustering, indicating the presence of heteroscedasticity, is visible in
the log returns in Fig. 1 (right). For a better understanding of the recent history we have
selected another sample, which begins on June 11, 2013, and ends on September
2, 2015 (see Fig. 2).
Table 1 presents descriptive statistics on the log returns on Bitcoin currency in
USD. All statistics except the skewness are highly significantly different from zero at
the 5 % significance level (the absolute values of the test statistics are greater than the
quantile z_{0.975} = 1.96 of the standard normal distribution). The mean daily log return
of the Bitcoin was 0.45 %. The standard deviation was 5.93 % per day.
The squared log returns show significantly positive serial autocorrelation over
long lags, both for the full sample and for the sample covering the last 2 years (see Fig. 3).
This means that heteroscedasticity is present in the analyzed samples.
The Anderson–Darling p-value was used for testing the type of the distribution. We
selected a set of distributions: normal, Student's t, normal mixture, and Johnson SU.
Table 2 shows that the full-sample data follow the Johnson SU distribution
and that the sample of the last 800 days does not follow the normal distribution.
Based on a T = 800 rolling sample, we have generated 1054 out-of-sample
forecasts of the VaR. The parameters of the fitted models are re-estimated each
trading day to calculate the 95 % and 99 % 1-day VaR.

Fig. 4 5 % and 1 % 1-day FHS VaR for GARCH adjusted log returns

Fig. 5 5 % and 1 % 1-day Normal VaR for GARCH adjusted log returns

We have used GARCH and
EWMA models as fitting models, which were estimated in Wolfram Mathematica
software using our procedure. The A-GARCH model is not suitable because there is no
skewness in our data. We have mainly used GARCH-filtered returns, for which we
have chosen several methods: FHS VaR, normal VaR, Student's t VaR, and Johnson
SU VaR. We compare all of these methods with the real log returns and with historical
simulation VaR. Figure 4 compares FHS VaR with HS 5 % and 1 % 1-day VaR. We
can see that historical VaR does not respond very much to volatility distortions.
FHS predicted more precisely the risks connected with investing in bitcoins.
Comparing Normal VaR (Fig. 5), Student’s t VaR (Fig. 6), and Johnson SU VaR
(Fig. 7), the Johnson SU VaR is the best at recognizing the most extreme distortions,
and Student’s t VaR gives the lowest estimates of the risks. It was not possible to
calculate 1 % 1-day Johnson SU VaR because there were too few overruns of real
losses. Figure 8 shows the estimation of the VaR for EWMA adjusted returns. We
can say that this VaR estimation is the worst because it does not recognize the finest
distortions well.
Fig. 6 5 % and 1 % 1-day Student’s VaR for GARCH adjusted log returns
Fig. 7 5 % 1-day Johnson SU VaR for GARCH adjusted log returns
Fig. 8 5 % and 1 % 1-day historical VaR for EWMA adjusted log returns
4 Conclusion
We have introduced the FHS method for computing VaR in this paper and
applied it to historical data on Bitcoin denominated in USD. We have
found that the GARCH model was suitable for our data. We have applied historical
simulation and the most widely used parametric VaR methods to the adjusted
log returns. The most suitable method was Johnson SU VaR, and the second most
suitable was FHS VaR.
References
1. Alexander, C.: Market Risk Analysis. Chichester, Wiley, New York (2008)
2. Allen, S.: Financial Risk Management. A Practitioner’s Guide to Managing market and Credit
Risk. Wiley, New Jersey (2003)
3. Andersen, T.G., Davis, R.A., Kreiss, J.P., Mikosch, T.: Handbook of Financial Time Series.
Springer, Heidelberg (2009)
4. Barone-Adesi, G., Bourgoin, F., Giannopoulos, K.: Don’t look back. Risk 11, 100–103 (1998)
5. Barone-Adesi, G., Giannopoulos, K., Vosper, L.: VaR without correlations for portfolios of
derivative securities. J. Futur. Mark. 19, 583–602 (1999). Available at http://www.research
gate.net/profile/Kostas_Giannopoulos/publication/230278913_VaR_without_correlations_for_
portfolios_of_derivative_securities/links/0deec529c80e0b8302000000.pdf
6. Barone-Adesi, G., Giannopoulos, K., Vosper, L.: Filtering historical simulation. Backtest anal-
ysis. Mimeo. Universita della Svizzera Italiana, City University Business School, Westminster
Business School and London Clearing House (2000)
7. Bohdalová, M.: A comparison of value-at-risk methods for measurement of the financial risk.
In: E-Leader Prague, June 11–13, 2007, pp. 1–6. CASA, New York (2007). http://www.g-casa.
com/PDF/Bohdalova.pdf
8. Christoffersen, P.: Value-at-Risk Models. In: Andersen, T.G., Davis, R.A., Kreiss, J.P.,
Mikosch, T. (eds.) Handbook of Financial Time Series. Springer, Heidelberg (2009)
9. Dowd, K.: An Introduction to Market Risk Measurement. Wiley, Chichester (2002)
10. Easwaran, S., Dixit, M., Sinha, S.: Bitcoin dynamics: the inverse square law of price
fluctuations and other stylized facts. In: Econophysics and Data Driven Modelling of Market
Dynamics New Economic Windows 2015, pp. 121–128 (2015). http://dx.doi.org/10.1007/978-
3-319-08473-2_4
11. Escanciano, J.C., Pei, P.: Pitfalls in backtesting historical simulation VaR models. J. Bank.
Financ. 36, 2233–2244 (2012). http://dx.doi.org/10.1016/j.jbankfin.2012.04.004
12. Hammoudeh, S., Santos, P.A., Al-Hassan, A.: Downside risk management and VaR-based
optimal portfolios for precious metals, oil and stocks. N. Am. J. Econ. Finance 25, 318–334
(2013). http://dx.doi.org/10.1016/j.najef.2012.06.012
13. Hendricks, D.: Evaluation of value at risk models using historical data. FRBNY Economic
Policy Review, April 1996, New York (1996)
14. Hull, J., White, A.: Value at risk when daily changes in market variables are not normally
distributed. J. Derivatives 5, 9–19 (1998)
15. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill,
New York (2001)
16. Jorion, P.: Financial Risk Manager Handbook. Wiley, New Jersey (2003)
17. Kuester, K., Mittnik, S., Paolella, M.S.: Value-at-risk prediction: a comparison of alternative
strategies. J. Financ. Econ. 4, 53–89 (2006)
18. Lu, Z., Huang, H., Gerlach, R.: Estimating value at risk: from JP Morgan’s standard-EWMA to
skewed-EWMA forecasting. OME Working Paper No: 01/2010 (2010). http://www.econ.usyd.
edu.au/ome/research/working_papers
19. McNeil, A.J., Frey, R.: Estimation of Tail-Related Risk Measures for Heteroscedastic Financial
Time Series: An Extreme Value Approach. ETH, Zurich (1999). http://www.macs.hw.ac.uk/~
mcneil/ftp/dynamic.pdf
20. Quandl Ltd.: Bitcoin-data. Retrieved from (n. d.). https://www.quandl.com/collections/
markets/bitcoin-data (2015)
21. Valenzuela, O., Márquez, L., Pasadas, M., Rojas, I.: Automatic identification of ARIMA
time series by expert systems using paradigms of artificial intelligence. In: Monografias del
Seminario Matemático García de Galdeano, vol. 31, pp. 425–435 (2004)
A SVEC Model to Forecast and Perform
Structural Analysis (Shocks) for the Mexican
Economy, 1985Q1–2014Q4
1 Introduction
Studying the factors that generate fluctuations in economic aggregates is one of the
most important directions of recent macroeconomic literature.
The economy of Mexico is rather unique in that, on the one hand, it has a
natural market with the USA, depending heavily on its neighbor’s total activity and
industrial cycle; on the other, it is an emerging market that suffers from the structural
problems characteristic of such economies.
The objective is to perform structural analysis of the shocks to Mexican GDP for
1985.Q1–2014.Q4. A structural vector error correction (SVEC) model is estimated,
This article is part of the research project Mexico: growth, cycles and labor precariousness, 1980–
2020 (IN302514), DGAPA, UNAM.
The usual disclaimer applies.
E. Loría () • E. Salas
Center for Modeling and Economic Forecasting, School of Economics, UNAM, Mexico City,
Mexico
e-mail: [email protected]; [email protected]
2 Literature Review
Before Nelson and Plosser’s [1] seminal article, macroeconomic variables were
modeled as stochastic processes that evolve around deterministic trends. Today,
it is generally accepted that an economy can be knocked away from its previous
trajectory by a relevant shock, and it may never return to it afterwards [2].
Macroeconomic series are often of order I(1), requiring the use of cointegration
methodologies [3]. SVECs not only enable an adequate estimation, but they also
allow for the incorporation of long- and short-term restrictions, the analysis of
shocks between variables, and the determination of their temporary or permanent
nature.
Two general examples of this are the work of Rudzkis and Kvedaras [4] for
Lithuania, which applies the results of the model to make forecasts, and that of Lanteri
[5], which studies the structural shocks of the Argentine economy as a whole.
The literature review revealed a strong debate regarding the efficiency and the
temporal nature of the real effects of monetary policy. Assenmacher-Wesche [6] studies the
transmission mechanisms of monetary policy in Switzerland after the modifications
introduced in the year 2000. Bernanke et al. [7] find that shocks generated by
monetary policy in the USA can explain around 20 % of the variation of output.
2.1.2 Unemployment
Rodrik [10], Rapetti et al. [12], and Razmi et al. [13] demonstrate empirically
that a depreciated real exchange rate is fundamental to explaining the growth of
developing countries.
Ibarra [14] demonstrates that real appreciation of the Mexican Peso has weak-
ened the country’s economic growth in the long term by reducing profit margins,
and thus diminishing investment in the tradable goods sector.
Several authors with different methodologies and periods of study such as Torres
and Vela [15] and Chiquiar and Ramos-Francia [16] have demonstrated empirically
that the industrial structure of Mexico has linked itself systemically and progres-
sively to its US counterpart.
All of the above suggests that the range of effects and reactions is very wide,
and reflects the existence of a considerable volume of literature that supports our
system. Here we can conclude that, although the information system appears at first
to be relatively small (five variables), the range of effects that might be studied and
of hypotheses to be proved is very broad.
3 Stylized Facts
1
Variables in lower case represent logarithms.
(Figure: time series panels for the variables Q, U, Y, YUS, and M2)
and the high partial correlation (Table 1). Multicollinearity reduces the efficiency
of the estimators and generates several inference problems, such as signs that are
opposite to the theoretical relationship, and perturbations in Granger causality and
in the detection of weak exogeneity. The above translates into obstacles for correct
estimation of the shocks.
First, the complete model was estimated and simulated in order to test its
historical replication capability. Then, it was divided into two submodels, which
are nested to free them of multicollinearity and to subsequently extract the relevant
macroeconomic effects of the shocks correctly.
4 Econometric Issues
4.1 Estimation
Considering the statistical significance measures and the values of the error
correction mechanisms, the unrestricted model suggests that the weak exogeneity
criterion for q is not met, which indicates that it should be modeled specifically
[19]. Table 2 shows that α_15 is significant and falls in the correct range (−1, 0). This
puts into question the weak exogeneity criterion for q. However, both by theory and
by the above-mentioned economic policy considerations, it is very difficult to argue
that q is endogenous.
Lastly, the historical simulation is very satisfactory for all five variables, par-
ticularly for y, which validates the normalization on this variable and is a further
indicator of correct specification (Fig. 1).
The Chow tests in Table 3 also reject the structural break hypothesis. In summary,
even though the unrestricted (i.e., complete) model has issues due to overparameter-
ization, which yields multicollinearity problems,2 it can be considered an adequate
2
This is not a problem for forecasting purposes [17, 20, 21] given the perpetuation of a
stable dependency relationship between variables, and the perpetuation of stable interdependence
relationships within Z [25].
statistical model insofar as it reports the existence of weak and strong exogeneity3
and super exogeneity (i.e., stability of the model). The model thus fulfills the
objectives of econometric analysis: (a) elasticity analysis, (b) forecast, and (c) policy
analysis [18, 22].
  \Xi = \beta_{\perp}\Big[\alpha_{\perp}'\Big(I_K - \sum_{i=1}^{p-1}\Gamma_i\Big)\beta_{\perp}\Big]^{-1}\alpha_{\perp}' .   (3)
The SVEC model can be used to identify the shocks to be traced in an impulse-
response analysis by imposing restrictions on the matrix ΞB of long-run effects of the
shocks and the matrix B of contemporaneous effects of the shocks.
The long-run effects of the ε shocks are given by ΞB. Since rk(Ξ) = K − r, the matrix ΞB
has rank K − r. Hence, there can be at most r shocks with transitory effects (zero
long-run impact) and at least k* = K − r shocks have permanent effects. Due to the
reduced rank of the matrix, each column of zeros stands for only k* independent
restrictions. k*(k* − 1)/2 additional restrictions are needed to exactly identify
the permanent shocks and r(r − 1)/2 additional contemporaneous restrictions are
needed to exactly identify the transitory shocks. Estimation is done by maximum
likelihood using the reduced form.
3
Which follows from Granger causality.
4
What follows is based on Lütkepohl and Krätzig [23, p. 168].
To fulfill the second objective of the article, which is to analyze the structural shocks
that underlie the information system, it was necessary to develop two submodels
from the original set, eliminating the above-mentioned collinearity issues. Both
submodels are exactly identified with three restrictions: one short-term restriction,
given that there is only one cointegration vector in each submodel, and two long-
term restrictions. All confidence intervals were constructed using Hall’s percentile
method at 95 % with 100 replications.
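Applying the counting rules of the previous section to the dimensions used here (each
submodel has K = 3 variables and r = 1 cointegration vector) gives the following small
worked instance; the explicit numbers are our own illustration of the bookkeeping rather
than a statement taken from the text:

  K = 3, \qquad r = 1, \qquad k^{*} = K - r = 2, \qquad
  \frac{k^{*}(k^{*}-1)}{2} = 1, \qquad \frac{r(r-1)}{2} = 0 ,

so each submodel admits two permanent shocks and at most one transitory shock, consistent
with the three identifying restrictions (one contemporaneous and two long-run) imposed in
Tables 4 and 5.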
The first effect to be proved is the non-neutrality of monetary policy. New Keynesian
theory [7] has accepted from its birth that there are temporary, short-term effects.
This is the foundation of the unconventional monetary policy that has been intensely
applied since the crisis of 2009 to prevent further drops in output and prices. For this
reason, the following restrictions were applied to matrices B and ΞB (Table 4).
Figure 2 corroborates the above hypothesis, in that it shows a positive, intermit-
tently limited effect of m2 on y, which fades away after quarter number 16. Variance
analysis in Fig. 2 shows that, after the second period, the effect of the shock of
m2 is negligible, as it accounts for no more than 5 % of the variance, and quickly
dissipates afterwards.
Fig. 2 Impulse response: M2 ! y (left panel) and the effect of M2 in variance decomposition of
y (right panel)
The second effect to be proved within the submodel Z1 is Okun’s Law [24] and
the hysteresis hypothesis of unemployment. The former suggests a negative and
bidirectional relationship between unemployment and the growth of output.
Figure 3 shows clearly the permanent effect of Okun’s Law. Finally, Fig. 4
reflects the presence of hysteresis: once unemployment increases (is shocked), it
does not return to its original level. After ten quarters, it stabilizes at twice the
original level.
Since the introduction of NAFTA, Mexico has been an open and integrated
economy, particularly to US industries, as mentioned above. As was done in the
previous section, three restrictions were defined, yielding an exactly identified
model (Table 5).
Figure 5 (left panel) noticeably reflects the immediate, positive, significant, and
permanent effect of the US industrial output shock. The effect of the real exchange
rate on output is clearly expansionary and permanent (Fig. 5, right panel).
5 Conclusion
Given that forecasting was one of the initial and main objectives of the model,
multicollinearity issues arose. In order to perform structural analysis, this problem
was solved by dividing the cointegration space into two submodels with different
variable sets, and therefore two groups of domestic and external shocks that follow
from what the literature considers to be relevant variables.
This last characteristic of the SVEC technique (i.e., the ability to evaluate shocks)
is widely used to corroborate disputed hypotheses, such as real monetary effects,
hysteresis of unemployment, linkage to the US industrial product, and the exchange
rate as a growth factor.
The summary of the results of this model can be structured based on Rodrik [11]
who, on the basis of New Development theory, proposes the widespread use of an
industrial policy for Mexico, in view of its linkage to the USA, and a high and stable
exchange rate, as a way to increase productivity and thus grow through the external
sector.
Hysteresis of unemployment is one of the most pressing reasons to seek high
growth rates. This is because, according to Okun’s [24] analysis of unemployment,
the unemployment generated by a crisis has long-term effects and, in terms of the
bidirectional relationship, hysteresis of unemployment could generate a drop in the
dynamics of growth, potentially creating a poverty trap.
The crisis of 2009 left profound lessons. One of them was the use of uncon-
ventional, active, and countercyclical monetary policy, mainly from the theoretical
point of view of New Keynesian consensus. The above empirical analysis provides
solid evidence for the use of this tool, particularly in periods of recession or crisis.
6 Statistical Appendix
References
1. Nelson, C., Plosser, C.: Trends and random walks in macroeconomic time series: some
evidence and implications. J. Monet. Econ. 10(2), 139–162 (1982)
2. Durlauf, S., Romer, D., Sims, C.: Output persistence, economic structure, and the choice of
stabilization policy. Brookings Papers on Economic Activity, pp. 69–136 (1989)
3. Johansen, S.: Determination of cointegration rank in the presence of a linear trend. Oxf. Bull.
Econ. Stat. 54(3), 383–397 (1992)
4. Rudzkis, R., Kvedaras, V.: A small macroeconometric model of the Lithuanian economy.
Austrian J. Stat. 34(2), 185–197 (2005)
5. Lanteri, L.: Choques externos y fuentes de fluctuaciones macroeconómicas. Una propuesta con
modelos de SVEC para la economía Argentina, Economía Mexicana Nueva Época. Núm. 1.
Primer semestre. CIDE. México (2011)
6. Assenmacher-Wesche, K.: Modeling monetary transmission in Switzerland with a structural
cointegrated VAR model. Swiss J. Econ. Stat. 144(2), 197–246 (2008)
7. Bernanke, B., Gertler, M., Watson, M., Sims, C., Friedman, B.: Systematic monetary policy and
the effects of oil price shocks. Brookings Papers on Economic Activity, pp. 91–157 (1997)
8. Brüggemann, R.: Sources of German unemployment: a structural vector error correction
analysis. Empir. Econ. 31(2), 409–431 (2006)
9. Bukowski, M., Koloch, G., Lewandowski, P.: Shocks and rigidities as determinants of CEE
labor markets’ performance. A panel SVECM approach. http://mpra.ub.uni-muenchen.de/
12429/1/MPRA_paper_12429.pdf (2008)
10. Rodrik, D.: Growth after the crisis. Working Paper 65, Commission on Growth and Develop-
ment, Washington, DC (2005)
11. Rodrik, D.: The real exchange rate and economic growth. Brook. Pap. Econ. Act. 2, 365–412
(2008)
12. Rapetti, M., Skott P., Razmi A.: The real exchange rate and economic growth: are developing
countries different? Working Paper 2011-07. University of Massachusetts Amherst (2011)
13. Razmi, A., Rapetti, M., Scott, P.: The Real Exchange Rate and Economic Development. Struct.
Chang. Econ. Dyn. 23(2), 151–169 (2012)
14. Ibarra, C.: México: la maquila, el desajuste monetario y el crecimiento impulsado por las
exportaciones. Revista Cepal. 104 Agosto (2011)
15. Torres, A., Vela, O.: Trade integration and synchronization between the business cycles of
Mexico and the United States. N. Am. J. Econ. Financ. 14(3), 319–342 (2003)
16. Chiquiar, D., Ramos-Francia, M.: Trade and business-cycle synchronization: evidence from
Mexican and US manufacturing industries. N. Am. J. Econ. Financ. 16(2), 187–216 (2005)
17. Kennedy, P.: A Guide to Econometrics, 6th edn. Blackwell, Oxford (2008)
18. Hendry, D.: The econometrics of macroeconomic forecasting. Econ. J. 107(444), 1330–1357
(1997)
19. Johansen, S.: Testing weak exogeneity and the order of cointegration in UK money demand
data. J. Policy Model 14(3), 313–334 (1992)
20. Conlisk, J.: When collinearity is desirable. West. Econ. J. 9, 393–407 (1971)
21. Blanchard, O.: Comment. J. Bus. Econ. Stat. 5, 449–451 (1987)
22. Charemza, W., Deadman, D.: New Directions in Econometric Practice: General to Specific
Modelling, Cointegration and Vector Autoregression. E. Elgar, Aldershot (1992)
23. Lütkepohl, H., Krätzig, M.: Applied Time Series Econometrics. Cambridge University Press,
Cambridge (2004)
24. Okun, A.: Potential GNP: its measurement and significance. In: Proceedings of the Business
and Economic Statistics Section, pp. 98–104. American Statistical Association, Alexandria
(1962)
25. Farrar, D., Glauber, R.: Multicollinearity in regression analysis: the problem revisited. Rev.
Econ. Stat. 49, 92–107 (1967)
Intraday Data vs Daily Data to Forecast
Volatility in Financial Markets
Keywords Bayesian estimation • Big data • Intraday data • Markov chain Monte
Carlo • Particle filter • Realized volatility • Stochastic volatility
1 Introduction
has been trying to establish a link between the realized volatility measure and the
stochastic volatility model [10, 11].
The measures of volatility are important in the decision-making process; examples
include decisions associated with the portfolio allocation of assets and the
pricing of derivative assets. Different kinds of measures of volatility
have been proposed, and one is the realized volatility. This measure is seen as an
estimator of the integrated volatility defined within a theoretical framework that
accommodates the evolution of asset prices through a continuous stochastic process.
We sought to establish if volatility measures obtained through intraday data are
compatible with the ones obtained using data at lower frequencies. They must be
linked because the daily observations correspond to the end points of each day's
intraday data. However, the difference in the amount of information is
immense. With daily data, less information is available and more structure is needed.
The aim is to verify if the measures of volatility obtained with less information, but
with more structure supplied by a model, are compatible with the ones obtained
using intraday data. The model used is the stochastic volatility (SV) model. In the
case of compatible approaches, it opens space for the development of more robust
models, where different observations at different frequencies can be considered.
The realized volatility measure is compatible with the forecasts obtained through
the SV model, if it is within the bounds defined by the quantiles associated with the
predictive distribution of the states defined by the model.
The next sections present the results as follows. Section 2 presents the basic
results associated with the definition of the realized volatility measure, an
estimator of the integrated volatility calculated using intraday data. The aim is to
compare this measure with the forecasts obtained through the SV model. Section 3
presents a series of results associated with the estimation of the parameters through
a Bayesian framework with Markov chain Monte Carlo (MCMC) simulations; the
parameters are needed to define the filter distributions. Section 4 presents
the results for defining the approximation to the filter distribution using particle
filter methods. In Sect. 5, intraday observations for five stocks are used to
demonstrate the evolution of the volatility through the two approaches, and the results
are compared to establish the compatibility of both approaches. In Sect. 6 some
conclusions are presented.
Assuming that the evolution of the price of a given asset P(t) follows a diffusion
process given by dP(t) = μ(t)P(t)dt + σ(t)P(t)dW(t), where μ(t) is a mean
evolution function, σ²(t) the variance evolution function, and W(t) the standard
Brownian motion, a measure of interest from this representation is the integrated
volatility given by

  IV_t = \int_0^t \sigma^2(s)\, ds .   (1)
If the evolution of the price is given by the aforementioned diffusion process, then for a
period of length t the volatility for the period is given by (1). As this quantity is not
observable, an estimator was proposed, which is known as the realized volatility.
Considering y_t the return at t, which represents the return associated with one period
(a day), and the partition 0 < t_1 < t_2 < ... < t_n < 1, the intraday returns are given
by y_{t_i} = p_{t_i} − p_{t_{i-1}}, i = 1, ..., n, where p_{t_i} is the log-price at t_i. The realized volatility
is given by the sum of the squares of the intraday returns,

  RV_t = \sum_{i=1}^{n} y_{t_i}^2 .   (2)

Important research has been conducted trying to establish whether RV_t is a consistent
estimator of IV_t, i.e., whether RV_t → IV_t in probability. The statistical
properties of the estimator RV_t have been extensively analyzed and some of the main
references are [9, 12–14].
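As a minimal illustration of Eq. (2), written here in Wolfram Mathematica purely for
concreteness (the chapter itself does not prescribe a language for this computation), the
daily realized volatility is the sum of the squared first differences of the intraday
log-prices:

(* Realized volatility (2) for one day: p is the list of intraday log-prices,
   so Differences[p] gives the intraday returns y_{t_i} = p_{t_i} - p_{t_{i-1}}. *)
realizedVolatility[p_List] := Total[Differences[p]^2]

(* Example: map over a list of days, each day being a list of log-prices. *)
rvSeries[days_List] := realizedVolatility /@ days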
The diffusion process considered above for the price can be extended with
a diffusion process for the variance, for example through a Cox–Ingersoll–
Ross type of process d log(σ²(t)) = (α − β log(σ²(t)))dt + γ log(σ²(t))dZ(t),
where Z(t) is a Brownian motion independent of W(t). The bivariate diffusion
process can be seen as the continuous version of the stochastic volatility model
in discrete time that we use hereafter, where μ = α, φ = e^{−β}, and
η_t = ∫_{t−1}^{t} e^{−β(t−s)} γ log(σ²(s)) dZ(s).
The volatility evolution of financial returns is not directly observable, and different
kinds of models have been proposed to estimate the aforementioned evolution. An
example is the SV model proposed by Taylor [3]. A common representation is
given by
  y_t = \exp\!\left(\frac{\alpha_t}{2}\right)\varepsilon_t , \qquad \varepsilon_t \sim N(0,1)   (3)

  \alpha_{t+1} = \mu + \phi(\alpha_t - \mu) + \sigma_\eta\, \eta_{t+1} , \qquad \eta_{t+1} \sim N(0,1) ,   (4)
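To make the notation concrete, the following sketch (Wolfram Mathematica, illustration
only; the stationary initialization of α_1 is an assumption of the sketch, not stated in
the text) simulates a path of length n from the model (3)–(4):

(* Simulate n observations from the SV model (3)-(4) with parameters mu, phi,
   sigmaEta; alpha is the latent log-volatility state.  alpha_1 is drawn from
   the stationary distribution N(mu, sigmaEta^2/(1 - phi^2)). *)
simulateSV[n_Integer, mu_, phi_, sigmaEta_] :=
 Module[{alpha = ConstantArray[0., n], y},
  alpha[[1]] = mu + sigmaEta/Sqrt[1 - phi^2] RandomVariate[NormalDistribution[]];
  Do[alpha[[t]] =
     mu + phi (alpha[[t - 1]] - mu) + sigmaEta RandomVariate[NormalDistribution[]],
   {t, 2, n}];
  y = Exp[alpha/2] RandomVariate[NormalDistribution[], n];  (* observation equation (3) *)
  {y, alpha}]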
method of moments. Later, in the seminal paper of Jacquier et al. [15], the authors
show that Bayesian estimation procedures associated with MCMC can be used to
obtain more efficient estimates for the parameters.
The article [15] initiated a series of important research efforts on the best way to estimate
the model; for the most cited contributions, see [16–20]. Different approaches have been
proposed, which are associated with different algorithms to obtain samples for
the vector of states. Here, we develop a gaussian approximation which gives high
acceptance rates in the MCMC simulations used in the estimation process.
To estimate the model the marginal posterior distribution of the parameters is
approximated. This can be done by simulating from the distribution of α_{1:n}, θ | y_{1:n},
where α_{1:n} = (α_1, ..., α_n) and y_{1:n} = (y_1, ..., y_n). Using Gibbs sampling,
the parameters are sampled conditional on the states, θ | α_{1:n}, y_{1:n}, and the states
conditional on the parameters, α_{1:n} | θ, y_{1:n}. We develop a single-move sampler to
simulate from α_t | α_{−t}, θ, y_{1:n}, where α_{−t} = (α_1, ..., α_{t−1}, α_{t+1}, ..., α_n). A
second order Taylor approximation to the target density gives a gaussian density
as the approximating density. Assuming that at iteration k the sampled elements are
θ^(k) = (μ^(k), φ^(k), σ_η^(k)) and α^(k) = (α_1^(k), ..., α_n^(k)), at iteration k + 1 the algorithm
proceeds as
1. Sample from α_t | α_{t−1}^(k+1), α_{t+1}^(k), y_t, θ^(k), for t = 1, ..., n.
2. Sample from μ, φ | σ_η^(k), α^(k+1), y_{1:n}.
3. Sample from σ_η | μ^(k+1), φ^(k+1), α^(k+1).
To obtain samples for the states in step 1, the algorithm proceeds as follows. The
logarithm of the full conditional density of α_t, ℓ(α_t), is concave, and its maximizer α_t*
can be written in closed form in terms of the Lambert W function (the derivation
parallels the one leading to Eq. (8) below). The second order Taylor approximation of ℓ(α_t)
around α_t* is the log-kernel of a gaussian density with mean α_t* and variance

  s_t^2 = -\frac{1}{\ell''(\alpha_t^{*})}
        = \frac{2\sigma_\eta^2\, e^{\alpha_t^{*}}}{y_t^2 \sigma_\eta^2 + 2(1+\phi^2)\, e^{\alpha_t^{*}}} .   (7)
This is the approximating density used to obtain samples for the vector of states.
The approximation is very good and the acceptance rates are very high. However,
even with chains that always move, sometimes they move slowly, and high levels of
autocorrelation are obtained. Due to the simplicity of the sampler, several strategies
may be considered to reduce the levels of autocorrelation and to define more efficient
estimation procedures. The main point to highlight is that, with SV models, gaussian
approximations are straightforward to implement.
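The mechanics of the single-move step can be illustrated with the short sketch below
(Wolfram Mathematica, illustration only). For an interior index 1 < t < n, the log of the
full conditional of α_t given its neighbours is written out directly; its mode is located
numerically with FindMaximum rather than through the closed-form Lambert W expression
used in the paper, the curvature at the mode gives the proposal variance (7), and an
independence Metropolis–Hastings step accepts or rejects the proposed value:

(* One single-move update of alpha_t (1 < t < n) given its neighbours, y_t and
   theta = (mu, phi, sigmaEta), using a gaussian approximation to the full
   conditional as an independence Metropolis-Hastings proposal.  Endpoints
   t = 1, n need the obvious modification. *)
logCond[a_?NumericQ, aPrev_, aNext_, y_, mu_, phi_, sigmaEta_] :=
 -a/2 - y^2 Exp[-a]/2 -
  (a - mu - phi (aPrev - mu))^2/(2 sigmaEta^2) -
  (aNext - mu - phi (a - mu))^2/(2 sigmaEta^2)

singleMoveUpdate[aCur_, aPrev_, aNext_, y_, mu_, phi_, sigmaEta_] :=
 Module[{a, mode, s2, q, prop, logAccept},
  mode = a /. Last[FindMaximum[
      logCond[a, aPrev, aNext, y, mu, phi, sigmaEta], {a, aCur}]];
  s2 = 1/(y^2 Exp[-mode]/2 + (1 + phi^2)/sigmaEta^2);  (* -1 / l''(mode), as in (7) *)
  q = NormalDistribution[mode, Sqrt[s2]];
  prop = RandomVariate[q];
  logAccept = logCond[prop, aPrev, aNext, y, mu, phi, sigmaEta] -
    logCond[aCur, aPrev, aNext, y, mu, phi, sigmaEta] -
    (LogLikelihood[q, {prop}] - LogLikelihood[q, {aCur}]);
  If[Log[RandomReal[]] < logAccept, prop, aCur]]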
The SV model is a state space model where the evolution of the states defines the
evolution of the volatility. Forecasts for the evolution of the states in this setting
require the development of simulation techniques known as Sequential Monte Carlo
(SMC), also referred to as particle filter methods [21–27]. The aim is to update the filter
distribution for the states when new information arrives.
When using a model that depends on a set of parameters, all forecasts are conditioned
on the parameters. It is not realistic to assume that the parameters are known, and
the parameters are estimated through Bayesian estimation methods. This constitutes
an approximation because, even if model uncertainty is not taken into account, the
parameters can be assumed to vary over time.
The quantities of interest are the values of the states governing the evolution
of the volatility, which are propagated to define the predictive density of the
returns, defined here as f(y_{t+1} | y_{1:t}). However, essential to the definition of this
distribution is the filter density associated with the states, f(α_t | y_{1:t}). Bayes' rule
allows us to assert that the posterior density f(α_t | y_{1:t}) of the states is related to
the density f(α_t | y_{1:t−1}) prior to y_t, and to the density f(y_t | α_t) of y_t given α_t, by
f(α_t | y_{1:t}) ∝ f(y_t | α_t) f(α_t | y_{1:t−1}). The predictive density of y_{t+1} given y_{1:t} is defined
by f(y_{t+1} | y_{1:t}) = ∫ f(y_{t+1} | α_{t+1}) f(α_{t+1} | y_{1:t}) dα_{t+1}.
Particle filters approximate the posterior density of interest, f(α_t | y_{1:t}), through
a set of m "particles" {α_{t,1}, ..., α_{t,m}} and their respective weights {π_{t,1}, ..., π_{t,m}},
where π_{t,j} ≥ 0 and Σ_{j=1}^{m} π_{t,j} = 1. This procedure must be implemented sequentially,
with the states evolving over time to accommodate new information as it arrives. It
is difficult to obtain samples from the target density, so an approximating density
is used instead, and afterwards the particles are resampled to better approximate the
target density. This is known as the sampling importance resampling (SIR) algorithm.
A possible approximating density is given by f(α_t | α_{t−1}); however, [27, 28] pointed
out that, as a density to approximate f(α_t | y_{1:t}), it is not generally efficient, because
it constitutes a blind proposal that does not take into account the information
contained in y_t.
Through SMC with SIR the aim is to update sequentially the filter density for
the states. The optimal importance density is given by f(α_t | α_{t−1}, y_t) ∝ f(y_t | α_t) f(α_t | α_{t−1}),
which induces importance weights with zero variance. Usually it is not possible to obtain
samples from this density, and an importance density g(α_t), different from the
optimal density, is used to approximate the target density.
To approximate the filter densities associated with the SV model, [27] considered
the same kind of approximations used to sample the states in a static MCMC setting.
However, the approximations were based on a first order Taylor approximation,
and it was demonstrated by Smith and Santos [29] that they are not robust when
information contained in more extreme observations needs to be updated (so-called
very informative observations). In [29], a second order Taylor approximation for the
likelihood, combined with the predictive density for the states, leads to improvements
in the particle filter algorithm. Like the auxiliary particle filter in [27], it avoids blind
proposals like the ones proposed in [30], takes into account the information in y_t,
and defines a robust approximation for the target density, which also avoids the
degeneracy of the weights.
Here we develop the aforementioned results using a more robust approximation for
the importance density. The logarithm of the density f(y_t | α_t) f(α_t | α_{t−1}), ℓ(α_t), is
concave in α_t, and to maximize it with respect to α_t we set the first
derivative equal to zero, ℓ'(α_t) = 0. Solving for α_t, the solution is

  \alpha_t^{*} = W\!\left(\frac{y_t^2 \sigma_\eta^2}{2}\, e^{\sigma_\eta^2/2 - \mu^{*}}\right) + \mu^{*} - \frac{\sigma_\eta^2}{2} ,
  \qquad \text{with } \mu^{*} = (1-\phi)\mu + \phi\,\alpha_{t-1} ,   (8)
The second derivative is given by ℓ''(α_t) = −(2e^{α_t} + σ_η² y_t²)/(2σ_η² e^{α_t}), which is
strictly negative for all α_t, so α_t* maximizes the function ℓ(α_t), defining a global
maximum. The second order Taylor expansion of ℓ(α_t) around α_t* defines the log-
kernel of a gaussian density with mean m_t = α_t* and variance

  s_t^2 = \frac{2\sigma_\eta^2\, e^{m_t}}{2 e^{m_t} + \sigma_\eta^2 y_t^2} .   (9)
This gaussian density will be used as the importance density in the SIR algorithm.
In the procedures implemented, the estimates of interest were approximated
using particles with equal weights, which means that a resampling step is performed.
Assuming at t − 1 a set of m particles α_{t−1}^m = {α_{t−1,1}, ..., α_{t−1,m}} with associated
weights 1/m, which approximate the density f(α_{t−1} | y_{1:t−1}), the algorithm proceeds
as follows:
1. Resample m particles from α_{t−1}^m, obtaining the resampled set {α_{t−1,1}, ..., α_{t−1,m}}.
2. For each element of the set, i = 1, ..., m, sample a value α_{t,i}* from a
gaussian distribution with mean and variance defined by (8) and (9), respectively,
obtaining the set {α_{t,1}*, ..., α_{t,m}*}.
3. Calculate the weights,

   w_i = \frac{f(y_t \mid \alpha_{t,i}^{*})\, f(\alpha_{t,i}^{*} \mid \alpha_{t-1,i})}{g(\alpha_{t,i}^{*} \mid m_t, s_t^2)} ,
   \qquad \pi_i = \frac{w_i}{\sum_{i=1}^{m} w_i} .   (10)

4. Resample from the set {α_{t,1}*, ..., α_{t,m}*} using the set of weights {π_1, ..., π_m},
obtaining a sample {α_{t|1:t,1}, ..., α_{t|1:t,m}}, where a weight of 1/m is associated
with each particle.
For the one-step-ahead volatility forecast, having the approximation to the density
f(α_t | y_{1:t}), and due to the structure of the system equation in the SV model, an AR(1)
with gaussian noise, it is easy to sample from f(α_{t+1} | y_{1:t}), the predictive density for
the states.
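The four steps above translate almost directly into code. The following sketch (Wolfram
Mathematica, illustration only; the function names are ours and constants of the densities
in (10) that cancel in the normalization are dropped) performs one filtering update, using
ProductLog, the built-in Lambert W function, to evaluate the mean (8) of the importance
density:

(* One update (steps 1-4) of the SIR particle filter for the SV model.
   particles is the equally weighted sample approximating f(alpha_{t-1}|y_{1:t-1});
   y is the new observation y_t; theta = {mu, phi, sigmaEta}. *)
logNormPDF[x_, m_, s2_] := -0.5 Log[2 Pi s2] - (x - m)^2/(2 s2)

sirUpdate[particles_List, y_, {mu_, phi_, sigmaEta_}] :=
 Module[{aPrev, muStar, m, s2, prop, logw, w},
  (* step 1: resample from the previous particle set *)
  aPrev = RandomChoice[particles, Length[particles]];
  (* step 2: importance density N(m_t, s_t^2) from Eqs. (8) and (9) *)
  muStar = (1 - phi) mu + phi aPrev;
  m = ProductLog[(y^2 sigmaEta^2/2) Exp[sigmaEta^2/2 - muStar]] +
      muStar - sigmaEta^2/2;
  s2 = 2 sigmaEta^2 Exp[m]/(2 Exp[m] + sigmaEta^2 y^2);
  prop = m + Sqrt[s2] RandomVariate[NormalDistribution[], Length[particles]];
  (* step 3: importance weights (10), computed on the log scale *)
  logw = (-prop/2 - y^2 Exp[-prop]/2) +
         logNormPDF[prop, muStar, sigmaEta^2] -
         logNormPDF[prop, m, s2];
  w = Exp[logw - Max[logw]]; w = w/Total[w];
  (* step 4: resample according to the normalized weights *)
  RandomChoice[w -> prop, Length[particles]]]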
5 An Empirical Demonstration
We describe the data and compute some statistics that summarize the
information. We highlight the differences in the amount of information within
each dataset. Intraday data are used to calculate the measures of realized volatility.
Forecasts of daily volatility are obtained using daily observations through filter
distributions for the states within the SV model. Filter distributions depend on the
parameters’ values, which are estimated before the forecasting. Data observed in
time intervals of equal length are the ones mostly used in econometric models.
However, intraday observations are becoming largely available. Considering only
the prices that correspond to transactions, observations not equally spaced in time
have to be considered. All observations available were used to compute the RV
measures and a comparison is established with the SV forecasts.
Intraday observations were used for five stocks traded in US stock markets. The
companies are The Boeing Company (BA), Bank of America (BAC), General
Electric (GE), International Business Machines Corporation (IBM), and Intel
(INTC). The observations were collected from the publicly available web pages
of the company BATS Global Markets, Inc. from January 30, 2014 to August 27,
2015. The web page address is https://www.batstrading.com, and software written
in Objective-C was used to record the intraday data for the period considered.
The data correspond to more than two million observations per stock. The second
kind of observations are end-of-day prices from January 29, 2012 to August 27,
2015 obtained through the publicly available dataset supplied by Yahoo! Inc. The
web page address is http://finance.yahoo.com, and the data can be obtained using
publicly available software, written in R or Matlab, or even using the Yahoo Query
Language (YQL).
Through the analysis of Table 1, we can assess some of the main characteristics
associated with the intraday data. The mean of the price is related to the mean of
the volume, and the mean time-span between consecutive transactions is around 5 s.
However in our dataset, we find that simultaneous transactions are very common.
Regarding the RV measure, the mean values are not significantly different between
the stocks, but standard deviations and values for the skewness are more variable.
The distributions of the RVs have positive skewness, and in some days the realized
volatility assumes very high values. The maximum values for all the assets were
obtained on August 24, 2015, when in a few seconds during the day the Dow
Jones Index fell more than 1000 points due to the effects of the crisis in the Chinese
stock exchanges.
Before the application of the particle filter methods, which are used to approximate
the filter distributions for the states, the parameters of the model need to be defined,
but it is unrealistic to assume that the parameters are constant over time. We use
a moving window of financial returns as a way of estimating the time-varying
parameters.
The parameters were estimated using the observations prior to the ones used
to update the filter distributions. The first sample corresponds to observations
from January 29, 2012 to January 28, 2014 (500 observations). The first filter
distribution corresponds to January 29, 2014, which will be used to forecast the
volatility for the next day. After being used to update the filter distribution, the
observation is incorporated in the sample, and the parameters are estimated again.
The first observation of the sample is deleted to maintain the number of observations
constant. Regarding the estimation of the parameters, three choices were made.
First, a sample of 500 observations was used, corresponding approximately to 2
years of daily data. Second, it was assumed that parameters could vary over time.
The parameters are estimated at each iteration of the particle filter. Finally, we are
aware that 500 observations can be too few to estimate the parameters of a nonlinear
state space model, and strong prior distributions for the parameters were assumed.
In the estimation of the SV models using Bayesian methods and MCMC, most
results are associated with the design of efficient samplers for the states. Given the
states, it is fairly straightforward to obtain samples for the parameters. Here we
adopt the prior distributions commonly found in the literature, a gaussian to , a
beta to . C1/=2, and a scaled chi-square to 2 . This is in line with the literature and
used in one of the articles where a very efficient way for estimating the SV model is
presented, see [20]. Due to the samples with a relative small number of observations,
using the same form for the prior densities, we have considered somehow stronger
prior distributions. In Table 2, we present the estimates for the parameters using the
entire sample (January 29, 2012–August 27, 2015) for all the companies considered
in this section. The estimates are in line with the ones commonly found in the
literature, especially the values for the persistence parameter φ and the values for
the effective sample size (ESS). However, it is not realistic to assume the values of
the parameters are constant over time, which is confirmed by the estimations using
a moving window. Due to space restrictions, we present only the evolution of the
parameters’ estimates associated with the BAC in Fig. 1. It is apparent that it is
reasonable to assume that the parameters vary over time, and at each iteration of the
particle filter algorithm, a renewed set of parameters are used based on the updated
sample.
Fig. 1 The figure illustrates the evolution of the mean of the chains for the parameters of the SV
model associated with the BAC. With a moving window of 500 observations, 398 estimation
processes were applied
The comparisons are made between the RV measures and the evolution of the
predictive distribution of the states associated with the SV model. A Bayesian
approach is adopted to approximate through simulations (particle filter) the filter
distribution for the states. The RV measures are obtained using ultra-high-frequency
data; they are model-free and represent point estimates through time. On the other
hand, the volatility evolution obtained through the SV model is represented by the
evolution of the predictive distribution of the states. This uses less information
(daily returns) and is model-dependent. Because of the differences in the amount
of information and the dependence on a model, there is the possibility that the
volatility evolution may be different for each approach. In the experiments with the
data, we compare the evolution of the RV measure through time with the evolution
of the quantiles associated with the predictive distribution of the states. To assess
the compatibility, we check if the RV measures are within the bounds defined by the
respective quantiles of the predictive distributions of the states.
When the filter distributions of the states are approximated through the particle
filter, we found that the results are very sensitive to the parameters used in the
SV model. As described above, we considered time-varying parameters obtained
from a moving-window sample, and the main adjustments
performed were related to the prior distributions associated with the parameters. For
the parameter μ, the prior considered was μ ∼ N(0, 0.1) (BA, BAC, GE, IBM) and
μ ∼ N(0.5, 0.1) (INTC). For the parameter φ, beta distributions were considered for
(φ + 1)/2, B(a, b), where b = 2.5 for all stocks, a = 50 (BA, BAC, GE), and a = 80
(IBM, INTC). Finally, for σ_η² the prior distribution σ_η² ∼ δ χ²₁ is considered, with
δ = 0.1.
The main results are presented in Table 3 and Fig. 2 (just for the GE due to space
constraints) where the evolution of the RV is compared with the evolution of the
quantiles. In general the results are satisfactory: the predictive distribution of the
states accommodates the evolution of the RV measure. There are some cases where
Fig. 2 This figure depicts, for the stock GE, the evolution of the logarithm of the RV (solid line),
and the respective quantiles, q0:05 and q0:95 (dashed line), and q0:5 (plus sign), associated with the
predictive distribution of the states in the SV model
the adjustment is more pronounced and there is some divergence in others. The most
evident divergence is with the INTC, which is in line with the adjustments that were
needed for the prior distributions of the parameters. In the other cases the adjustment
is better, especially with the BA, BAC, and GE.
6 Conclusion
Acknowledgements I would like to thank the editors of this volume and organizers of the ITISE
2015, Granada, professors Ignacio Rojas and Héctor Pomares, as well as all the participants in the
conference who contributed with their feedback to improve the results presented here.
References
22. Carpenter, J., Clifford, P., Fearnhead, P.: Improved particle filter for nonlinear problems. IEE
Proc. Radar Sonar Navig. 146(1), 2–7 (1999)
23. Del Moral, P., Doucet, A., Jasra, A.: Sequential monte carlo samplers. J. R. Stat. Soc. Ser. B
(Stat. Methodol.) 68(3), 411–436 (2006)
24. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for
Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000)
25. Fearnhead, P., Wyncoll, D., Tawn, J.: A sequential smoothing algorithm with linear computa-
tional cost. Biometrika 97(2), 447–464 (2010)
26. Godsill, S., Clapp, T.: Improvement strategies for Monte Carlo particle filters. In: Sequential
Monte Carlo Methods in Practice, pp. 139–158. Springer, Berlin (2001)
27. Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc.
94(446), 590–599 (1999)
28. Pitt, M.K., Shephard, N.: Auxiliary variable based particle filters. In: Sequential Monte Carlo
Methods in Practice, pp. 273–293. Springer, Berlin (2001)
29. Smith, J., Santos, A.A.F.: Second-order filter distribution approximations for financial time
series with extreme outliers. J. Bus. Econ. Stat. 24(3), 329–337 (2006)
30. Gordon, N.J., Salmond, D.J., Smith, A.F.: Novel approach to nonlinear/non-gaussian Bayesian
state estimation. In: IEE Proceedings F (Radar and Signal Processing), IET, vol. 140, pp.
107–113. (1993)
Predictive and Descriptive Qualities of Different
Classes of Models for Parallel Economic
Development of Selected EU-Countries
Abstract In this paper, we extend and modify our modelling [presented in the con-
ference paper (Komorník and Komorníková, Predictive and descriptive models of
mutual development of economic growth of Germany and selected non-traditional
EU countries. In: ITISE 2015, International Work-Conference on Time Series,
pp. 55–64. Copicentro Granada S.L, 2015)] of the parallel development of GDP
of Germany (as the strongest EU economy), the so-called V4 countries (Poland,
the Czech Republic, Hungary, Slovakia) and Greece (as the most problematic
EU economy). Unlike in Komorník and Komorníková (Predictive and descriptive
models of mutual development of economic growth of Germany and selected non-
traditional EU countries. In: ITISE 2015, International Work-Conference on Time
Series, pp. 55–64. Copicentro Granada S.L, 2015), we analyse the data provided
by OECD (freely available from http://stats.oecd.org/index.aspx?queryid=218) that
are expressed in USD (using the expenditure approach) and covering a longer time
interval than our former data from EUROSTAT (http://appsso.eurostat.ec.europa.eu/nui/show.do?wai=true&data-set=namq_10_gdp) (expressed in EUR using the
output approach). The best predictive quality models were found in the class of
multivariate TAR (Threshold Autoregressive) models with thresholds based on
aggregation functions. On the other hand, the best descriptive quality models were found
in the competing classes of one-dimensional MSW (Markov Switching) and STAR
(Smooth Transition Autoregressive) models.
J. Komorník ()
Faculty of Management, Comenius University, Odbojárov 10, P.O. Box 95, 820 05 Bratislava,
Slovakia
e-mail: [email protected]
M. Komorníková
Faculty of Civil Engineering, Slovak University of Technology, Radlinského 11, 810 05
Bratislava, Slovakia
e-mail: [email protected]
1 Introduction
It is well known that shortly after the dramatic political changes two and a half
decades ago, the foreign trade of the so-called Visegrad group (V4) countries, namely
Poland (Pl), the Czech Republic (Cz), Hungary (Hu) and Slovakia (Sk), was largely
reoriented from the former Soviet Union to the EU (and predominantly to Germany).
These intensive trade relations have greatly influenced the economic development
of the V4 countries. In Figs. 1, 2 and 3 we present the parallel development of
the growth of GDP (in %) in the period 2000Q1–2015Q1 for Germany (De), V4
countries and Greece (Gr). These graphs have been calculated from the seasonally
adjusted quarterly data of GDP provided by the OECD [12] (expressed in USD
applying the expenditure approach).
Since the first three of the above V4 countries still use their national currencies,
using USD (as a neutral currency) provides a more balanced approach to the data of
the individual considered countries.
We can observe a parallel dramatic drop of GDP during the short period of the
global financial markets’ crisis around 2009 (that has been the most severe in case
of Slovakia) followed by a subsequent moderate recovery in all considered countries
(except for Greece, where the period of negative GDP growth lasted for more than
Fig. 1 The development of the GDP growth (in %) for Germany (black) and (gray) Czech
Republic—left, Slovakia—right
Fig. 2 The development of the GDP growth (in %) for Germany (black) and (gray) Poland—left,
Hungary—right
Fig. 3 The development of the GDP growth (in %) for Germany (black) and Greece (gray)
4 years starting in 2010). It is also noticeable that slight problems with the German
GDP growth around 2013 are accompanied by similar problems in the last three of
the V4 countries (Cz, Hu, Sk). The Polish economy is the largest in the V4 group
and seems to be more robust than the other three.
As we can see in Table 1, the German data exhibit high correlations with the last
three of the V4 countries (Cz, Hu, Sk) with the maximum reached by the Czech
Republic. Low values of correlations of the Polish data with the remaining ones
correspond to a stronger robustness of the Polish economy (which is by far the
largest in the V4 group). The correlations between the pairs from the triple Cz,
Hu, Sk are the greatest among all considered pairs of countries, but this slight
numerical paradox cannot weaken the significance of the clear German dominance
in the foreign trade of those three countries.
The rest of the paper has the following structure. The second part briefly presents
theoretical foundations of the used modelling methodology and the third part
contains the results of our calculations. Finally, related conclusions are outlined.
2 Theoretical Background
Aggregation functions have been a genuine part of mathematics for years. Recall,
e.g., several types of means (e.g. Heronian means) considered in ancient Greece.
The theory of aggregation functions, however, has only been established recently,
and its overview can be found in [2, 5], among others. We will deal with inputs and
outputs from the real unit interval [0, 1] only, though, in general, any real interval
I could be considered. Formally, an aggregation function assigns to an m-tuple of
inputs one representative value (this value need not belong to the set of inputs;
compare the arithmetic mean, for example).
Recall that an aggregation function is defined for a fixed m ∈ N, m ≥ 2, as a
mapping A: [0, 1]^m → [0, 1] (see [2, 5]). A basic example is the weighted average

WA(x) = \sum_{i=1}^{m} w_i x_i,   (1)

for given non-negative weights w_1, ..., w_m with \sum_{i=1}^{m} w_i = 1.
Yager [17, 18] introduced the OWA function as a symmetrization of a weighted
arithmetic mean, i.e. of an additive aggregation function.

Definition 2 (See Definition 2 in [11]) Let A: [0, 1]^m → [0, 1] be an aggregation
function. The symmetrized aggregation function S_A: [0, 1]^m → [0, 1] is given by

S_A(x) = A(x_{σ(1)}, ..., x_{σ(m)}),   (2)

where σ is a permutation of (1, ..., m) that orders the components of x. If A is a
weighted average, then

S_A(x) = OWA(x) = \sum_{i=1}^{m} w_i x_{σ(i)},   (3)

while for a modular aggregation function A with summands f_i,

S_A(x) = OMA(x) = \sum_{i=1}^{m} f_i(x_{σ(i)}).   (4)
In this paper the copula function C^{(M)}(u, v) = Min(u, v) = min(u, v) has been
used. The corresponding functions f_i^{(M)} are

f_i^{(M)}(x) = min(max(x − v_{i−1}, 0), w_i).   (6)

Note that all OWA functions form a specific subclass of OMA functions and
can be expressed in the form of (4) and (5) for the product copula C(u, v) =
Π(u, v) = u·v, corresponding to all couples of random vectors (X, Y) with
independent components and continuous marginal distribution functions.
Remark Observe that the above-mentioned aggregation functions can be seen as
particular integrals. Indeed, weighted averages WA coincide with the Lebesgue
integral based on a probability measure p: 2^{{1,...,m}} → [0, 1], p(E) = \sum_{i ∈ E} w_i.
Similarly, the OWA function is the Choquet integral based on a symmetric capacity
μ: 2^{{1,...,m}} → [0, 1], μ(E) = \sum_{j=1}^{card E} w_j = v_{card E}. OMA functions are copula-based
integrals based on the capacity μ defined above. When C = Min, see (6), then the Sugeno
integral based on μ is recovered. For more details see [7].
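As a numerical illustration of (1), (3), (4), and (6), the sketch below evaluates a weighted average, an OWA, and a Min-copula OMA for a given weight vector. The descending ordering of the inputs and the cumulative weights v_i = w_1 + ... + w_i are assumptions about details not fully spelled out above, so this is a sketch rather than the exact construction of the paper.

```python
import numpy as np

def wa(x, w):
    """Weighted average (1)."""
    return float(np.dot(w, x))

def owa(x, w):
    """OWA (3): the weights are applied to the ordered inputs (descending order assumed)."""
    return float(np.dot(w, np.sort(x)[::-1]))

def oma_min(x, w):
    """OMA (4) with the Min copula, using f_i(x) = min(max(x - v_{i-1}, 0), w_i) from (6)."""
    v = np.concatenate(([0.0], np.cumsum(w)))     # v_0 = 0, v_i = w_1 + ... + w_i (assumed)
    xs = np.sort(x)[::-1]
    return float(sum(min(max(xs[i] - v[i], 0.0), w[i]) for i in range(len(w))))

x = np.array([0.2, 0.7, 0.4])
w = np.array([0.5, 0.3, 0.2])                     # non-negative weights summing to 1
print(wa(x, w), owa(x, w), oma_min(x, w))
```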
TAR models were first proposed by Tong [14] and discussed in detail in Tong [15].
The TAR models are simple and easy to understand, but rich enough to generate
complex non-linear dynamics.
A standard form of a k-regime TAR model for an m-dimensional time series y_t =
(y_{1,t}, ..., y_{m,t}), t = 1, ..., n, is given by (see, e.g., [16])

y_t = Φ_0^{(j)} + \sum_{i=1}^{p} Φ_i^{(j)} y_{t−i} + ε_t^{(j)}   if c_{j−1} < q_t ≤ c_j,   t = p+1, ..., n.   (7)
will be based on the Akaike information criterion (see, e.g., [1, 8, 16])

AIC = \sum_{j=1}^{k} n_j ln(det(Σ̂_j)) + 2m(mp + 1),   (8)
In the TAR models, a regime switch happens when the threshold variable crosses
a certain threshold. If the discontinuity of the threshold is replaced by a smooth
transition function 0 < F(q_t; γ, c) < 1 (where the parameter c can be interpreted
as the threshold, as in TAR models, and the parameter γ determines the speed and
smoothness of the change in the value of the transition function), TAR models can be
generalized to STAR models [4, 13]. The k-regime one-dimensional STAR model
is given by (see, e.g., [4, 13])

y_t = Φ_1 X_t + \sum_{i=1}^{k−1} (Φ_{i+1} − Φ_i) X_t F(q_t; γ_i, c_i) + ε_t,   t = p+1, ..., n,   (9)
As γ → ∞, the logistic function approaches the indicator function I(q_t > c) and the LSTAR
model reduces to a TAR model, while the ESTAR model reduces to the linear
model. Since

lim_{q_t → −∞} F_L(q_t; γ, c) = 0,   lim_{q_t → +∞} F_L(q_t; γ, c) = 1,   lim_{q_t → c} F_L(q_t; γ, c) = 1/2,

the logistic function is monotonic and the LSTAR model switches between two
regimes smoothly, depending on how much the threshold variable q_t is smaller
than or greater than the threshold c. The exponential function, in contrast, is symmetric,
and the ESTAR model switches between two regimes smoothly depending on how far
the threshold variable q_t is from the threshold c.
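The two transition functions discussed above can be written down directly. The sketch below uses the standard logistic (LSTAR) and exponential (ESTAR) forms, which are consistent with the stated limits but are assumptions rather than formulas quoted from the text.

```python
import numpy as np

def F_logistic(q, gamma, c):
    """LSTAR transition: monotonic in q, with limits 0 and 1 and value 1/2 at q = c."""
    return 1.0 / (1.0 + np.exp(-gamma * (q - c)))

def F_exponential(q, gamma, c):
    """ESTAR transition: symmetric around c, growing with the distance |q - c|."""
    return 1.0 - np.exp(-gamma * (q - c) ** 2)

q = np.linspace(-2, 2, 5)
print(F_logistic(q, gamma=5.0, c=0.0))
print(F_exponential(q, gamma=5.0, c=0.0))
```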
In this paper, for one-dimensional TAR and STAR models, two alternative
approaches to the construction of the threshold variable q_t will be used. The first
one is the traditional self-exciting approach, where q_t = y_{t−d} for a suitable time delay d > 0.
We denote this class of TAR models as SETAR (Self-Exciting TAR) models. The
second approach uses an exogenous threshold variable, namely the delayed values
of the German data.
For m-dimensional MTAR models (m = 6), again two alternative approaches to
the construction of the threshold variable q_t will be used. The first one uses the
delayed values of the German data. The second approach is to use the outputs of
aggregation functions in the role of threshold variables. For a suitable aggregation
function A, the value of the threshold variable is obtained by applying A to the
delayed values of the m component series.
From the class of OWA operators we used the MIN function (with w_m = 1 and w_i = 0
otherwise) and the MAX function (with w_1 = 1 and w_i = 0 otherwise), corresponding
to the extremal cases. Finally, the Ordered Modular Average (OMA) function with
Sierpinski carpet S and functions f_i^{(M)} from (6) and (11) was also used.
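One possible reading of this construction, reusing the aggregation functions sketched earlier, is that the MTAR threshold variable at time t is an aggregate of the delayed growth values of the six countries; the delay d and the placeholder data below are assumptions for illustration only.

```python
import numpy as np

def threshold_series(Y, agg, d=1):
    """Y is an (n, m) array of the m component series; q_t = agg(Y[t-d, :])."""
    n = Y.shape[0]
    return np.array([agg(Y[t - d]) for t in range(d, n)])

rng = np.random.default_rng(2)
Y = rng.normal(size=(61, 6))            # placeholder for the six GDP growth series
q_min = threshold_series(Y, np.min)     # MIN (OWA with w_m = 1)
q_max = threshold_series(Y, np.max)     # MAX (OWA with w_1 = 1)
print(q_min[:3], q_max[:3])
```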
Discrete state Markov processes are very popular choices for modelling state-dependent
behaviour in natural phenomena, and are natural candidates for modelling
the hidden state variables in Markov switching models.
Assuming that the random variable s_t can attain only values from the set
{1, 2, ..., k}, the basic k-regime autoregressive Markov-switching model takes the
form given in [6], where s_t is a discrete ergodic first-order Markov process, the model
is described in each regime by an autoregressive model AR(p), and t = p+1, ..., n,
with n the length of the time series. We assume that ε_t ~ i.i.d. N(0, σ_ε²).
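As a hedged illustration of the k-regime Markov-switching autoregression described above (the exact equation and notation are those of [6]; the AR(1) coefficients and the transition matrix below are arbitrary), the model can be simulated as follows.

```python
import numpy as np

def simulate_msw_ar1(n, P, phi0, phi1, sigma, seed=3):
    """Markov-switching AR(1): y_t = phi0[s_t] + phi1[s_t] * y_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    k = len(phi0)
    s = np.zeros(n, dtype=int)
    y = np.zeros(n)
    for t in range(1, n):
        s[t] = rng.choice(k, p=P[s[t - 1]])          # first-order Markov regime chain
        y[t] = phi0[s[t]] + phi1[s[t]] * y[t - 1] + sigma * rng.standard_normal()
    return y, s

P = np.array([[0.95, 0.05],
              [0.10, 0.90]])                          # regime transition matrix
y, s = simulate_msw_ar1(300, P, phi0=[0.4, -0.2], phi1=[0.5, 0.8], sigma=0.1)
print(y[:5], s[:10])
```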
The corresponding Akaike information criterion (AIC) takes the form given in [10].
For the evaluation of the predictive qualities of the investigated models (measured
by the Root Mean Square Error, RMSE) for one-step-ahead predictions
(see, e.g., [4]), the most recent nine observations (2013Q1–2015Q1) were left out from
all individual time series, and the remaining earlier data (2000Q1–2012Q4) were used
for the construction of the models. All calculations were performed using the system
Mathematica, version 10.2.
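The out-of-sample exercise described above amounts to refitting (or reusing) a model on the data up to time t and scoring the one-step-ahead forecast of the next observation. A minimal sketch of this RMSE computation is given below, using a simple AR(1) fitted by least squares as a stand-in for the actual TAR/STAR/MSW models.

```python
import numpy as np

def one_step_rmse(y, n_test):
    """RMSE of one-step-ahead AR(1) forecasts for the last n_test observations."""
    errors = []
    for t in range(len(y) - n_test, len(y)):
        train = y[:t]
        x, z = train[:-1], train[1:]
        X = np.column_stack([np.ones(len(x)), x])
        beta = np.linalg.lstsq(X, z, rcond=None)[0]   # OLS fit of the AR(1)
        forecast = beta[0] + beta[1] * y[t - 1]
        errors.append(y[t] - forecast)
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(4)
y = rng.normal(size=61)                               # placeholder quarterly series
print(one_step_rmse(y, n_test=9))
```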
Inspired by the recommendation from Franses and Dijk [4] the following steps
have been applied:
1. specification of an appropriate linear AR(p) model for the time series under
investigation,
3 Modelling Results
Table 2 The descriptive errors (σ_ε) of selected models (for individual countries)
Standard deviation of residuals σ_ε
Model Threshold De Cz Pl Hu Sk Gr
SETAR  0.118 0.114 0.204 0.123 0.047 0.148
TAR De 0.118 0.115 0.206 0.124 0.052 0.118
MTAR De 0.139 0.119 0.280 0.140 0.156 0.215
MTAR WA_SC 0.143 0.125 0.279 0.151 0.164 0.227
MTAR OWA_SC 0.146 0.132 0.278 0.149 0.169 0.232
MTAR OMA_SC,Min 0.138 0.126 0.226 0.148 0.164 0.224
LSTAR  0.091 0.073 0.121 0.082 0.064 0.143
LSTAR De 0.091 0.074 0.114 0.085 0.094 0.098
ESTAR  0.089 0.068 0.120 0.070 0.048 0.143
ESTAR De 0.089 0.067 0.114 0.074 0.090 0.136
MSW  0.047 0.082 0.107 0.052 0.028 0.087
Values in bold represent the optimal models
Table 3 The predictive errors (RMSE) of selected models (for individual countries)
Prediction errors RMSE for 8 one-step-ahead forecasts
Model Threshold De Cz Pl Hu Sk Gr
SETAR  0.037 0.035 0.064 0.053 0.004 0.088
TAR De 0.037 0.034 0.064 0.035 0.034 0.087
MTAR De 0.057 0.099 0.079 0.140 0.037 0.074
MTAR WA_SC 0.020 0.104 0.054 0.053 0.026 0.054
MTAR OWA_SC 0.060 0.083 0.095 0.060 0.023 0.067
MTAR OMA_SC,Min 0.074 0.048 0.136 0.028 0.002 0.039
LSTAR  0.072 0.164 0.262 0.215 0.063 0.292
LSTAR De 0.072 0.139 0.369 0.262 0.036 0.130
ESTAR  0.067 0.432 0.263 0.214 0.415 0.294
ESTAR De 0.067 0.217 0.315 0.237 0.138 0.087
MSW  0.261 0.374 0.307 0.180 0.187 0.257
Values in bold represent the optimal models
4 Conclusions
The main results of our investigations are two-fold. From the economical point of
view, we provided another evidence of the dominance of the German economy with
respect to other economies of the considered group of countries (especially to the
three smaller V4 economies, mainly the Czech one). On the other hand, a relatively
more robust position of the (largest V4) Polish economy has been demonstrated.
Moreover, our analysis provides further empirical justification of applications
of the OMA type of aggregation functions in construction of threshold variables
for MTAR models with promising results concerning their potential out-of-sample
predictive performance.
Acknowledgements This work was supported by the grants VEGA 1/0420/15 and APVV-14-
0013.
References
1. Bacigál, T.: Multivariate threshold autoregressive models in Geodesy. J. Electr. Eng. 55(12/s),
91–94 (2004)
2. Calvo, T., Kolesárová, A., Komorníková, M., Mesiar, R.: Aggregation operators: properties,
classes and construction methods. In: Calvo, T., Mayor, G., Mesiar, R. (eds.) Aggregation
Operators: New Trends and Applications. Studies in Fuzziness and Soft Computing, vol. 97,
pp. 3–106. Physica-Verlag, Heidelberg (2002)
3. EUROSTAT. http://appsso.eurostat.ec.europa.eu/nui/show.do?wai=true&data-set=namq_10_gdp (2015)
4. Franses, P.H., Dijk, D.: Non-linear Time Series Models in Empirical Finance. Cambridge
University Press, Cambridge (2000)
5. Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E.: Aggregation Functions. Encyclopedia of
Mathematics and its Applications, vol. 127. Cambridge University Press, Cambridge (2009)
6. Hamilton, J.D.: A new approach to the economic analysis of nonstationary time series and the
business cycle. Econometrica 57(2), 357–384 (1989)
7. Klement, E.P., Mesiar R., Pap, E.: A universal integral as common frame for Choquet and
Sugeno integral. IEEE Trans. Fuzzy Syst. 18(1), art. no. 5361437, 178–187 (2010)
8. Komorník, J., Komorníková, M.: Applications of regime-switching models based on aggrega-
tion operators. Kybernetika 43(4), 431–442 (2007)
9. Komorník, J., Komorníková, M.: Predictive and descriptive models of mutual development
of economic growth of Germany and selected non-traditional EU countries. In: ITISE 2015,
International Work-Conference on Time Series, pp. 55–64. Copicentro Granada S.L (2015)
10. Linhart, H., Zucchini, W.: Model Selection. Wiley Series in Probability and Mathematical
Statistics. Wiley, New York (1986)
11. Mesiar, R., Mesiarová-Zemánková, A.: The ordered modular averages. IEEE Trans. Fuzzy
Syst. 19(1), 42–50 (2011)
12. OECD.Stat. http://stats.oecd.org/index.aspx?queryid=218 (2015)
13. Teräsvirta, T.: Specification, estimation, and evaluation of smooth transition autoregressive
models. J. Am. Stat. Assoc. 89, 208–218 (1994)
14. Tong, H.: On a threshold model. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing.
Sijthoff & Noordhoff, Amsterdam (1978)
15. Tong, H.: Non-linear Time Series: A Dynamical System Approach. Oxford University Press,
Oxford (1990)
16. Tsay, R.S.: Testing and modeling multivariate threshold models. J. Am. Stat. Assoc. 93, 1188–
1202 (1998)
17. Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria decision
making. IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988)
18. Yager, R.R., Kacprzyk, J.: The Ordered Weighted Averaging Operators Theory and Applica-
tions. Kluwer, Boston, MA (1997)
Search and Evaluation of Stock Ranking Rules
Using Internet Activity Time Series
and Multiobjective Genetic Programming
Abstract Hundreds of millions of people are daily active on the internet. They view
webpages, search for different terms, post their thoughts, or write blogs. Time series
can be built from the popularity of different terms on webpages, search engines, or
social networks. It was already shown in multiple publications that popularity of
some terms on Google, Wikipedia, Twitter, or Facebook can predict moves on the
stock market. We are trying to find relations between internet popularity of company
names and the rank of the company’s stock. Popularity is represented by time
series of Google Trends data and Wikipedia view count data. We use multiobjective
genetic programming (MOGP) to find these relations. MOGP uses evolutionary
operators to find tree-like solutions to multiobjective problems and has become popular in
financial investing in recent years. In our implementation, the stock rank is used in an
investment strategy to build stock portfolios; revenue and standard deviation are used
as objectives. Solutions found by the MOGP algorithm show the relation between
the internet popularity and stock rank. It is also shown that such data can help to
achieve higher revenue with lower risk. Evaluation is done by comparing the results
with different investment strategies, not only the market index.
1 Introduction
future moves on the financial markets. Recent studies show a lot of forecasting
capabilities. Sources of internet activity data are, for example:
• Google Trends—they contain the search popularity of different terms in the
Google search engine. Preis et al. [4] used popularity of 98 search terms to build
investment strategies.
• Wikipedia page views—Moat et al. [5] found out that market falls are preceded
by increased number of page views of financial terms and companies.
• Twitter posts—Ruiz et al. [6] found correlation of posts about companies with
trade volume and also smaller correlation with stock price, which was used in a
trading strategy.
We are using Google Trends and Wikipedia page views of the traded companies
to rank their stock. This rank is then used to select stocks in an investment
strategy. We use multiobjective genetic programming (MOGP) to find the model.
The advantage of this algorithm is that it is able to search the space of possible models,
represented by a tree structure, while covering both revenue and risk goals. This
representation is useful for complex nonlinear models with multiple inputs.
Genetic programming is an evolutionary optimization algorithm that searches for
problem solutions. A solution is a program represented by a tree structure. The
tree-based solutions are formed from two different sets of vertices. The first group
are terminal symbols, for example, inputs, constants, or any method calls which
do not accept any parameters. These are the leaves of the tree structure. The second
set are nonterminals, or functions, that accept parameters, for example, arithmetic
operators, logical operators, conditions, etc. They are expected to be type and run
safe, so that the solutions can be executed to transform inputs to outputs. The first
vertex in the tree is called the root, and the depth of every vertex is defined as its
distance from the root. The first generation of solutions is created randomly. Every
subsequent generation is created by a stochastic transformation of the previous generation.
The transformation is done by applying operators inspired by evolution
theory, mostly selection, mutation, and crossover [7]. Each new generation is expected to be better.
The quality of the solutions is evaluated by the fitness function. When dealing
with multiobjective optimization, multiple fitness functions are required, one
for every objective. There are many algorithms for handling multiple objectives in
evolutionary algorithms; an overview can be found in [8].
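To make the tree representation concrete, the following sketch defines a tiny expression-tree type with terminals and functions and grows a random tree. It mirrors the description above, but the node set and growth rule are illustrative assumptions, not the authors' C# implementation.

```python
import math
import random

FUNCTIONS = {"add": (2, lambda a, b: a + b),
             "mul": (2, lambda a, b: a * b),
             "sin": (1, math.sin)}
TERMINALS = ["lastGoogle", "lastWiki", 0.5]           # inputs and a constant

def random_tree(depth):
    """Grow a random expression tree: leaves are terminals, inner nodes are functions."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    name = random.choice(list(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name, [random_tree(depth - 1) for _ in range(arity)])

def evaluate(node, inputs):
    """Execute the tree on a dict of terminal values."""
    if isinstance(node, str):
        return inputs[node]
    if isinstance(node, (int, float)):
        return node
    name, children = node
    _, fn = FUNCTIONS[name]
    return fn(*(evaluate(ch, inputs) for ch in children))

random.seed(5)
tree = random_tree(depth=3)
print(tree, evaluate(tree, {"lastGoogle": 0.8, "lastWiki": 0.4}))
```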
2 Related Research
stocks from S&P300 index with three objectives. Huang et al. [11] outperformed
the market with their single objective genetic algorithm, which was used to tune the
parameters of their fuzzy model. Then they improved the results by implementing a
two-objective NSGA II algorithm [12] and adding domain knowledge [13].
Another area of research deals with generating trading rules. These are tree
structures, which return a logical value, which decides whether to enter or leave
the market. Allen and Karjalainen [14] were first to experiment on S&P 500 index,
but failed to beat the market. Neely [15] and other researchers added risk as a second
objective, but still failed to beat the market. Potvin et al. [16] were only able to beat
a stable or falling market. Becker and Seshadri [17] made some modifications to
the typical approach and beat the market. They used monthly data instead of daily,
reduced the number of functions, increased the number of precalculated indicators
(so in fact increasing domain knowledge), coevolved separate rules for selling and
buying, penalized more complex solutions, and took into account the number of
profitable periods instead of the total revenue. Lohpetch and Corne [18, 19]
analyzed the differences between the approaches and found that longer
trading periods (monthly instead of daily) and the introduction of a validation period
lead to better results. Their NSGA II implementation was able to beat the
single objective solution and also market and US bonds [20]. Briza and Naval [21]
used the multiobjective particle swarm algorithm with revenue and Sharpe ratio as
objectives and outperformed market and five indicators (moving averages, moving
average convergence—divergence, linear regression, parabolic stop, and reverse and
directional movement index) on training period, but failed to outperform market on
testing period.
A lot of research was done in the typical problem of portfolio optimization,
where the optimal weights of stocks in a portfolio are searched. Many single and
multiobjective evolutionary algorithms were used and compared, an overview was
done, for example, by [22] and [23].
Chen and Navet [24] criticize the research in the area of genetic programming
usage in investment strategies and suggest more pretesting. They compare strategies
with random strategies and lottery training without getting good results. Genetic
programming should prove its purpose by performing better than these random
strategies, according to them.
Most of the research in this area compares the results only with the simplest
buy-and-hold strategy, although a wider evaluation should be done. We evaluate the
found rules by comparing them to different trading strategies, for example, random
solutions, bonds, and strategies based on technical or fundamental analysis.
We also evaluate the genetic programming itself by comparing it to a random
search algorithm. This is necessary to show that the algorithm is useful.
3 Goal
Our goal is to find a model for stock ranking based on its previous popularity
on the internet (based on Google Trends and Wikipedia page views). To evaluate
this ranking, we implement an investment strategy and compare its performance to
different strategies. We expect that our strategy can achieve the highest profit.
4 Methods
less than N then fill new archive with dominated individuals in population and
archive.
4. Termination: If t ≥ T or another stopping criterion is satisfied, then set A to the set
of decision vectors represented by the nondominated individuals in the archive.
Stop.
5. Mating selection: Perform binary tournament selection with replacement on the
new archive in order to fill the mating pool (a sketch of this selection step is given after these steps).
6. Variation: Apply recombination and mutation operators to the mating pool and
set the new population to the resulting population. Increment the generation counter
(t = t + 1) and go to Step 2.
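Step 5 relies on binary tournament selection with replacement. A minimal sketch of that selection step is given below, using Pareto dominance on the two objectives as the comparison criterion, which is one reasonable choice rather than the exact SPEA2 fitness assignment.

```python
import random

def dominates(a, b):
    """a, b are (return, risk) pairs; higher return and lower risk are better."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def binary_tournament(archive, pool_size, seed=6):
    """Fill the mating pool by repeatedly comparing two randomly drawn individuals."""
    random.seed(seed)
    pool = []
    for _ in range(pool_size):
        x, y = random.choice(archive), random.choice(archive)
        pool.append(x if dominates(x[0], y[0]) else y)
    return pool

# Each individual is (objectives, genome); the genomes here are placeholders.
archive = [((0.10, 0.05), "rule A"), ((0.08, 0.02), "rule B"), ((0.12, 0.09), "rule C")]
print(binary_tournament(archive, pool_size=4))
```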
We used only internet popularity data as input, compared to our previous efforts
[25, 26], where we combined these data with historical prices. Transaction fees are
ignored. The portfolio is updated daily: the ten companies with the highest rank are bought and
the ten companies with the lowest rank are sold. Results are compared with the buy and
hold strategy on the DJIA, as a representation of the market.
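The daily portfolio update described above can be sketched as follows; the ranking values would come from whatever tree the MOGP run produced, and the bookkeeping is deliberately simplified (equal weights, no transaction fees).

```python
def daily_positions(ranks, n_long=10, n_short=10):
    """ranks: dict mapping ticker -> rank for one day.
    Returns the tickers to hold long (highest rank) and short (lowest rank)."""
    ordered = sorted(ranks, key=ranks.get, reverse=True)
    return ordered[:n_long], ordered[-n_short:]

# Toy example with four tickers and two positions per side.
ranks = {"AAA": 0.7, "BBB": -0.1, "CCC": 0.3, "DDD": -0.5}
longs, shorts = daily_positions(ranks, n_long=2, n_short=2)
print(longs, shorts)
```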
Implementation was done in the C# language, which has high performance but
is still easy to use. The language integrates many features from dynamic programming,
for example, expression trees, which allow working with an algorithm as
a data structure. This is important for the genetic programming algorithm, because it
allows modifications of the solutions and application of the evolutionary operators.
The MetaLinq library was used to simplify these modifications (http://metalinq.codeplex.com/).
We used the Microsoft Automatic Graph Layout library for tree
visualization (http://research.microsoft.com/en-us/projects/msagl/).
We compared the results with a set of investment strategies:
• Lottery trading is doing decisions randomly. That means that it always gives a
random evaluation of a stock.
• Random strategy is a randomly created strategy. Such strategies are created also
in the first generation of the genetic programming simulation.
• Risk-free investment is represented by 3-year US treasury bonds.
• Buy and hold strategy means that the asset is bought on the beginning of the
period and sold at the end. It is the most basic strategy and it was applied to the
DJI index.
• Dogs of the Dow strategy is investing to ten companies from the DJI index with
the highest dividend yield [27].
• Simple moving average (SMA) is calculated as an average of the previous days' prices;
when the price rises above the moving average, the stock should be bought, and when it
falls below the moving average, it should be sold [28].
• Exponential moving averages (EMA) is similar to the SMA, but with decreasing
effect of the older days in the calculation [28].
• Moving average convergence divergence (MACD) is calculated as the difference
between the 12-period EMA and the 26-period EMA; when it crosses the signal line
(EMA of the MACD) from below, it is a buy signal [28] (a sketch of these indicator rules is given after this list).
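A compact sketch of the three technical indicators used as benchmarks is given below, based on the standard textbook definitions [28]; the window lengths are the usual defaults and may differ from the authors' settings.

```python
import numpy as np

def sma(prices, window):
    return np.convolve(prices, np.ones(window) / window, mode="valid")

def ema(prices, window):
    alpha = 2.0 / (window + 1)
    out = np.empty_like(prices, dtype=float)
    out[0] = prices[0]
    for t in range(1, len(prices)):
        out[t] = alpha * prices[t] + (1 - alpha) * out[t - 1]
    return out

def macd(prices, fast=12, slow=26, signal=9):
    line = ema(prices, fast) - ema(prices, slow)
    return line, ema(line, signal)          # buy when `line` crosses above the signal

prices = np.cumsum(np.random.default_rng(7).normal(size=100)) + 100
line, sig = macd(prices)
print(sma(prices, 20)[-1], ema(prices, 20)[-1], line[-1] - sig[-1])
```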
5 Results
We ran the genetic programming multiple times to find Pareto fronts with different
solutions. An example of such a solution in tree representation can be seen in
Fig. 1. It can be written as Multiply(Multiply(lastGoogle, Cos(Multiply(lastGoogle,
Sin(Lag(google, 49))))), Multiply(lastWiki, 0.38)). The meaning of some of the nodes is as follows (a small numerical sketch is given after this list):
• lastWiki and lastGoogle are the popularity values from the previous day,
• google and wiki are the popularity time series from the last 50 days, and
• Lag is a shift in the time series.
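Numerically, the solution above can be evaluated as in the following sketch, where google and wiki are the (normalized) 50-day popularity series of one company; reading Lag(google, 49) as the value 49 trading days before the last one is an interpretation, not a statement from the paper.

```python
import math

def rank_example(google, wiki):
    """Rank = lastGoogle * cos(lastGoogle * sin(google lagged 49 days)) * (lastWiki * 0.38)."""
    last_google, last_wiki = google[-1], wiki[-1]
    lagged = google[-50]                      # Lag(google, 49): 49 days before the last value
    return last_google * math.cos(last_google * math.sin(lagged)) * (last_wiki * 0.38)

google = [0.5 + 0.004 * i for i in range(50)]   # placeholder normalized popularity
wiki = [0.4 + 0.002 * i for i in range(50)]
print(rank_example(google, wiki))
```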
This shows a linear relation between the rank and the values of the Google and Wikipedia
popularity from the last day. We can also see a negative correlation with the popularity
from the day before. This negative relation is caused by the cosine operator in the
normalized range of values. We can interpret this as a rule to buy a stock while
its popularity is rising and to sell it when it decreases. This is the case for almost all
solutions (92 %) from the Pareto fronts of multiple simulation runs.
Although a similar relation is present in most of the models, some (7 %) contain
even an exponential relation. An example can be seen in Fig. 2.
We used the ranking rule in an investment strategy and compared its RoR and
StdDev with those of different other strategies. We used the training period to find Pareto
fronts, filtered solutions in a validation period as suggested by [19], and evaluated the
results in two different evaluation periods. Results are shown as averages over ten
different runs.
First we compared the performance with the random search algorithm. Figure 3
shows that genetic programming is able to find better Pareto fronts. Table 1
shows that genetic programming outperforms the random search in both return and
deviation. It also found larger Pareto fronts, which is useful for more diverse trading
options, with different revenue and risk combinations.
Next we compared the results with different trading strategies in two time settings,
both without a transaction fee and with a 0.5 % transaction fee. Results for the first
time setting with the transaction fee are shown in Figs. 4, 5, 6, and 7, where RoR
is shown on the x axis and StdDev on the y axis. This setting consists of a training
period of 2010–2012, a validation period of 2013, a first evaluation period covering the first
half of 2014, and a second evaluation period of 2008–2009. The periods were chosen to contain a
rising market and also the falling market of the financial crisis.
Fig. 4 Pareto front of genetic programming and other investment strategies in the training period
of 2010–2012, including a transaction fee
Fig. 5 Pareto front of genetic programming and other investment strategies in the validation period
of 2013, including a transaction fee
Fig. 6 Results of genetic programming and other investment strategies in the evaluation period of
2014, including a transaction fee
Fig. 7 Results of genetic programming and other investment strategies in the evaluation period of
2008–2009, including a transaction fee
Figure 4 shows the results in the training period. The line represents the
different ranking rules found by genetic programming and the dots represent the
other investment strategies, one of them is the genetic programming average.
Genetic programming is able to achieve higher revenues while maintaining a lower
deviation.
The same information is shown in Fig. 5, but for the validation period. These
data were not used for training, so the results are slightly worse, but genetic
programming still has the highest revenue while having almost the lowest deviation.
The most interesting results are those in the evaluation periods. Figure 6 shows the results
in 2014: genetic programming has the highest revenue with the lowest deviation, so it
clearly outperforms all the other strategies.
The second evaluation period is shown in Fig. 7; this is in fact the falling market
of the financial crisis in the years 2008–2009. It can be seen that the market
index and fundamental analysis are at a loss, outperformed even by the bonds.
Technical analysis strategies have the lowest deviation with a small profit. Genetic
programming again has the highest profit.
The results are summarized in Fig. 8 and it is obvious that genetic programming
outperformed all the other strategies. It includes the information for both time
settings and both transaction fee settings. Technical analysis, Dogs of the Dow, and
DJIA index performed quite well too, but were weak in the crisis periods.
Fig. 8 Average daily rate of return of genetic programming and other strategies
It’s obvious that a strategy based on internet popularity can be highly profitable.
More research and evaluation in this area is needed in the future.
6 Conclusion
We found that data about internet activity can be used to rank stocks and to achieve
interesting revenues when using this rank for investing. We used the MOGP algorithm
to find the relation between internet activity and stock rank.
We found that there is a positive correlation between stock rank and term
popularity and also a negative correlation between stock rank and older popularity.
This can be interpreted as a high rank for stocks with rising popularity and a low rank
for falling popularity. This partly contradicts the findings of [4], whose strategy was
based on selling when popularity was rising. However, they used weekly trading and
we used daily trading, and this lag could cause the difference.
We compared the rules used in an investment strategy with other strategies, for
example, random algorithms, market index, technical, and fundamental strategies.
Rules found by genetic programming are able to compete with them and even
outperform them in most of the cases.
We suggest that more research is needed in this area. Strategies should be
compared to more trading strategies and evaluation on more data periods is needed.
Acknowledgment This research has been supported by a VUB grant no. 2015-3-02/5.
References
1. Bohdalová, M.: A comparison of value-at-risk methods for measurement of the financial risk.
In: The Proceedings of the E-Leader, pp. 1–6. CASA, New York, (2007)
2. Bohdalová, M., Šlahor, L.: Modeling of the risk factors in correlated markets using a
multivariate t-distributions. Appl. Nat. Sci. 2007, 162–172 (2007)
3. Bohdalová, M., Šlahor, L.: Simulations of the correlated financial risk factors. J. Appl. Math.
Stat. Inf. 4(1), 89–97 (2008)
4. Preis, T., Moat, S.H., Stanley, H.E.: Quantifying trading behavior in financial markets using
Google Trends. Sci. Rep. 3, 1684 (2013)
5. Moat, H.S., Curme, CH., Avakian, A., Kenett, D.Y., Stanley, H.E., Preis, T.: Quantifying
Wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013)
6. Ruiz, J.E., Hristidis, V., Castillo, C., Gionis, A., Jaimes, A.: Correlating financial time series
with micro-blogging activity. In: Proceedings of the Fifth ACM International Conference on
Web Search and Data Mining, ACM, New York, NY, USA, pp. 513–522 (2012)
7. Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Genetic Programming. http://www.gp-field-guide.org.uk (2008)
8. Ghosh, A., Dehuri, S.: Evolutionary algorithms for multicriterion optimization: a survey. Int.
J. Comput. Inform. Sci. 2(1), 38–57 (2005)
9. Mullei, S., Beling, P.: Hybrid evolutionary algorithms for a multiobjective financial problem.
In: Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics,
11–14 October 1998, San Diego, CA, USA, vol. 4, pp. 3925–3930 (1998)
186 M. Jakubéci and M. Greguš
10. Becker, Y.L., Fei, P., Lester, A.: Stock selection—an innovative application of genetic
programming methodology. In: Genetic Programming Theory and Practice IV, pp. 315–334.
Springer, New York (2007)
11. Huang, C.F., Chang, C.H., Chang, B.R., Cheng, D.W.: A study of a hybrid evolutionary fuzzy
model for stock selection. In: Proceeding of the 2011 IEEE International Conference on Fuzzy
Systems, 27–30 June 2011, pp. 210–217. Taipei (2011)
12. Chen, S.S., Huang, C.F., Hong, T.P.: A multi-objective genetic model for stock selection.
Proceedings of The 27th Annual Conference of the Japanese Society for Artificial Intelligence,
Toyama, Japan, 4–7 June 2013
13. Chen, S.S., Huang, C.F., Hong, T.P.: An improved multi-objective genetic model for stock
selection with domain knowledge. In: Technologies and Applications of Artificial Intelligence,
Lecture Notes in Computer Science, vol. 8916, pp. 66–73. Springer (2014)
14. Allen, F., Karjalainen, R.: Using genetic algorithms to find technical trading rules. J. Financ.
Econ. 51, 245–271 (1999)
15. Neely, C.H.: Risk-adjusted, ex-ante, optimal technical trading rules in equity markets. Int. Rev.
Econ. Financ. 12, 69–87 (1999)
16. Potvin, J.Y., Soriano, P., Vallée, M.: Generating trading rules on the stock markets with genetic
programming. Comput. Oper. Res. 31(7), 1033–1047 (2004)
17. Becker, L.A., Seshadri, M.: GP-evolved technical rules can outperform buy and hold. In:
Proceedings of the 6th International Conference on Computational Intelligence and Natural
Computing, pp. 26–30 (2003)
18. Lohpetch, D., Corne, D.: Discovering effective technical trading rules with genetic pro-
gramming: towards robustly outperforming buy-and-hold. World Congress on Nature and
Biologically Inspired Computing, pp. 431–467 (2009)
19. Lohpetch, D., Corne, D.: Outperforming buy-and-hold with evolved technical trading rules:
daily, weekly and monthly trading. In: Proceedings of the 2010 International Conference on
Applications of Evolutionary Computation, vol. 6025, pp. 171–181. Valencia, Spain (2010)
20. Lohpetch, D., Corne, D.: Multiobjective algorithms for financial trading: Multiobjective
out-trades single-objective. IEEE Congress on Evolutionary Computation, 5–8 June 2011,
New Orleans, LA, USA, pp. 192–199 (2011)
21. Briza, A.C., Naval, P.C.: Design of stock trading system for historical market data using
multiobjective particle swarm optimization of technical indicators. In: Proceedings of the 2008
GECCO Conference Companion on Genetic and Evolutionary Computation, pp. 1871–1878.
Atlanta, Georgia, USA (2008)
22. Hassan, G.N.A.: Multiobjective genetic programming for financial portfolio management in
dynamic environments. Doctoral Dissertation, University College London (2010)
23. Tapia, G.C., Coello, C.A.: Applications of multi-objective evolutionary algorithms in eco-
nomics and finance: a survey. IEEE Congress on Evolutionary Computation, pp. 532–539
(2007)
24. Chen, S.H., Navet, N.: Failure of genetic-programming induced trading strategies: distin-
guishing between efficient markets and inefficient algorithms. Comput. Intell. Econ. Finan.
2, 169–182 (2007)
25. Jakubéci, M.: Výber portfólia akcií s využitím genetického programovania a údajov o pop-
ularite na internete. VII. mezinárodní vědecká konference doktorandů a mladých vědeckých
pracovníků, pp. 47–56. Opava, Czech Republic (2014)
26. Jakubéci, M.: Evaluation of investment strategies created by multiobjective genetic program-
ming. In: Proceedings of the 7th International Scientific Conference Finance and Performance
of Firms in Science, Education and Practice, 23–24 April 2015, pp. 498–509. Zlin, Czech
Republic (2015)
27. Domian, D.L., Louton, D.A., Mossman, C.E.: The rise and fall of the Dogs of the Dow.
Financ. Serv. Rev. 7(3), 145–159 (1998)
28. Kirkpatrick, C., Dahlquist, J.: Technical analysis. FT Press, Upper Saddle River, NJ (2010)
29. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary
algorithm. In: Evolutionary Methods for Design, Optimisation and Control with Application
to Industrial Problems (EUROGEN 2001), pp. 95–100. International Center for Numerical
Methods in Engineering, Barcelona
Integer-Valued APARCH Processes
Abstract The Asymmetric Power ARCH representation for the volatility was
introduced by Ding et al. (J Empir Financ 1:83–106, 1993) in order to account for
asymmetric responses in the volatility in the analysis of continuous-valued financial
time series like, for instance, the log-return series of foreign exchange rates, stock
indices, or share prices. As reported by Brännäs and Quoreshi (Appl Financ Econ
20:1429–1440, 2010), asymmetric responses in volatility are also observed in time
series of counts such as the number of intra-day transactions in stocks. In this work,
an asymmetric power autoregressive conditional Poisson model is introduced for
the analysis of time series of counts exhibiting asymmetric overdispersion. Basic
probabilistic and statistical properties are summarized and parameter estimation is
discussed. A simulation study is presented to illustrate the proposed model. Finally,
an empirical application to a set of data concerning the daily number of stock
transactions is also presented to attest for its practical applicability in data analysis.
1 Introduction
In this work, focus is put on models in which the count variable is assumed to
be Poisson distributed conditioned on the past, which is to say that the conditional
distribution of the count variable, given the past, is assumed to be Poisson with a time-varying
mean λ_t satisfying some autoregressive mechanism. An important family of
such observation-driven models that is able to handle overdispersion is the class of
Fig. 1 Time series plots for Glaxosmithkline (top) and Astrazeneca (bottom)
Autoregressive Conditional Poisson (ACP) models, first introduced in [9], but also referred
to as the INGARCH model due to its analogy to the conventional GARCH model
(see Ferland et al. [5]).
An INteger-valued GARCH process of orders (p, q), INGARCH(p, q) in short,
is defined to be an integer-valued process (Y_t) such that, conditioned on the past
experience, Y_t is Poisson distributed with mean λ_t, and λ_t is obtained recursively
from the past values of the observable process (Y_t) and of (λ_t) itself, that is

Y_t | F_{t−1} ~ Po(λ_t),   λ_t = α_0 + \sum_{i=1}^{p} α_i Y_{t−i} + \sum_{j=1}^{q} δ_j λ_{t−j},   t ∈ Z,

where F_{t−1} := σ(Y_s, s ≤ t−1), α_0 > 0, α_i ≥ 0, and δ_j ≥ 0. In [5] it was shown
that the process (Y_t) is strictly stationary with finite first and second order moments
provided that \sum_{i=1}^{p} α_i + \sum_{j=1}^{q} δ_j < 1. The particular case p = q = 1 was analyzed
by Fokianos and Tjøstheim [6] and Fokianos et al. [7] under the designation of
Poisson Autoregression. The authors considered linear and nonlinear models for λ_t.
In the linear case the intensity satisfies

λ_t = d + a λ_{t−1} + b Y_{t−1},   t ∈ N,   (2)

where it is assumed that the parameters d, a, b are positive, and λ_0 and Y_0 are
fixed. This representation corresponds exactly to the INGARCH(1, 1) model in [5];
nevertheless, the approach followed by Fokianos et al. [7] is slightly different in the
sense that the linear model is rephrased as Y_t = N_t(λ_t), t ∈ N, with λ_t defined as
in (2), and λ_0 and Y_0 fixed. For each time point t, the authors introduce a Poisson
process of unit intensity, N_t(·), so that N_t(λ_t) represents the number of such events
in the time interval [0, λ_t]. Following this rephrasing, a perturbation is introduced
in order to demonstrate ψ-irreducibility, and as a consequence geometric ergodicity
follows. The nonlinear case is considered a generalization of the previous situation
in which the conditional mean, E[Y_t | F_{t−1}^Y] = λ_t, is a nonlinear function of both the
past values of λ_t and the past values of the observations. Sufficient conditions to
prove geometric ergodicity can be found in [7].
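A minimal sketch of simulating the linear Poisson autoregression / INGARCH(1, 1) model in (2) is given below; the parameter values are arbitrary but satisfy a + b < 1.

```python
import numpy as np

def simulate_ingarch11(n, d=0.5, a=0.4, b=0.3, lam0=1.0, y0=0, seed=8):
    """Y_t | past ~ Poisson(lam_t), lam_t = d + a*lam_{t-1} + b*Y_{t-1}."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    y = np.empty(n, dtype=int)
    lam_prev, y_prev = lam0, y0
    for t in range(n):
        lam[t] = d + a * lam_prev + b * y_prev
        y[t] = rng.poisson(lam[t])
        lam_prev, y_prev = lam[t], y[t]
    return y, lam

y, lam = simulate_ingarch11(500)
print(y.mean(), y.var())    # the sample variance typically exceeds the mean (overdispersion)
```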
It is worth mentioning that the models above cannot cope with the presence of
asymmetric overdispersion. This paper aims at giving a contribution in this
direction by introducing the INteger-valued APARCH process.
Definition 1 (INAPARCH(p, q) Model) An INteger-valued APARCH(p, q) process is
defined to be an integer-valued process (Y_t) such that, conditioned on the past, the
distribution of Y_t is Poisson with mean value λ_t satisfying the recursive equation

λ_t^δ = ω + \sum_{i=1}^{p} α_i (|Y_{t−i} − λ_{t−i}| − γ_i (Y_{t−i} − λ_{t−i}))^δ + \sum_{j=1}^{q} β_j λ_{t−j}^δ,   t ∈ Z.

For p = q = 1 the model becomes

Y_t | F_{t−1} ~ Po(λ_t),   λ_t^δ = ω + α(|Y_{t−1} − λ_{t−1}| − γ(Y_{t−1} − λ_{t−1}))^δ + β λ_{t−1}^δ.   (3)
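Before turning to the probabilistic properties, a minimal simulation sketch of the INAPARCH(1, 1) recursion (3) may help fix ideas; the parameter values below are arbitrary illustrations, not estimates from the paper.

```python
import numpy as np

def simulate_inaparch11(n, omega=0.4, alpha=0.2, gamma=-0.3, beta=0.7, delta=2.0,
                        lam0=1.0, seed=9):
    """Y_t | past ~ Poisson(lam_t),
    lam_t**delta = omega + alpha*(|Y_{t-1}-lam_{t-1}| - gamma*(Y_{t-1}-lam_{t-1}))**delta
                   + beta*lam_{t-1}**delta."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    y = np.empty(n, dtype=int)
    lam[0] = lam0
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        dev = y[t - 1] - lam[t - 1]
        lam_delta = (omega + alpha * (abs(dev) - gamma * dev) ** delta
                     + beta * lam[t - 1] ** delta)
        lam[t] = lam_delta ** (1.0 / delta)
        y[t] = rng.poisson(lam[t])
    return y, lam

y, lam = simulate_inaparch11(1000)
print(y.mean(), y.var())
```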
First note that the chain is weak Feller (cf. [2]). Define C := [−c, c] and consider
P(λ, C^c), the probability of leaving C in one step, starting from λ_{t−1} = λ. Since
α, β, δ, ω > 0, and in view of the fact that |γ| < 1 and |Y_{t−1} − λ_{t−1}| − γ(Y_{t−1} − λ_{t−1}) ≥ 0,
this probability can be bounded as

P(λ, C^c) ≤ (ω + βλ^δ)/c + (α/c) E[(|Y_{t−1} − λ_{t−1}| − γ(Y_{t−1} − λ_{t−1}))^δ | λ_{t−1} = λ].

Writing out the conditional expectation E[(|Y_{t−1} − λ_{t−1}| − γ(Y_{t−1} − λ_{t−1}))^δ | λ_{t−1} = λ] explicitly,

P(λ, C^c) ≤ (ω + βλ^δ)/c + (α/c) e^{−λ} \sum_{y_{t−1}=0}^{+∞} (λ^{y_{t−1}}/y_{t−1}!) (|y_{t−1} − λ| − γ(y_{t−1} − λ))^δ.

By d'Alembert's criterion, the series \sum_{y_{t−1}=0}^{+∞} (λ^{y_{t−1}}/y_{t−1}!) (|y_{t−1} − λ| − γ(y_{t−1} − λ))^δ is
absolutely convergent. Being convergent, the series has a finite sum, so the right-hand side above
is finite, leading to the conclusion that the process has at least one stationary solution.
Finally, in proving uniqueness we proceed as follows. First note that the
INAPARCH(1, 1) model belongs to the class of observation-driven Poisson count
processes considered in Neumann [11], Y_t | F_{t−1}^Y ~ Po(λ_t), λ_t = f(λ_{t−1}, Y_{t−1}), t ∈ N,
with

f(λ_{t−1}, Y_{t−1}) = (ω + α(|Y_{t−1} − λ_{t−1}| − γ(Y_{t−1} − λ_{t−1}))^δ + β λ_{t−1}^δ)^{1/δ}.

Thus, the result follows if the function f above satisfies the contractive condition

|f(λ, y) − f(λ', y')| ≤ k_1 |λ − λ'| + k_2 |y − y'|,   (4)

where k_1 and k_2 are nonnegative constants such that k_1 + k_2 < 1. For the
INAPARCH(1, 1) model the contractive condition simplifies to

|f(λ_{t−1}, Y_{t−1}) − f(λ'_{t−1}, Y'_{t−1})| ≤ ||∂f/∂λ_{t−1}||_∞ |λ_{t−1} − λ'_{t−1}| + ||∂f/∂Y_{t−1}||_∞ |Y_{t−1} − Y'_{t−1}|,

and it can be shown that, under a suitable restriction on the parameters (condition (5)) and
for δ ≥ 2, the contractive condition holds. This concludes the proof.
Neumann [11] proved that the contractive condition in (4) is, indeed, sufficient
to ensure uniqueness of the stationary distribution and ergodicity of (Y_t, λ_t). The
results are quoted below.
Proposition 2 Suppose that the bivariate process (Y_t, λ_t) satisfies (3) and (5) for
δ ≥ 2. Then the stationary distribution is unique and E[λ_1] < ∞.
Proposition 3 Suppose that the bivariate process (Y_t, λ_t) is in its stationary
regime and satisfies (3) and (5) for δ ≥ 2. Then the bivariate process (Y_t, λ_t) is
ergodic and E[λ_1^2] < ∞.
Furthermore, following Theorem 2.1 in [4], it can be shown that if the process
(Y_t, λ_t) satisfies (3) and (5) for δ ≥ 2, then there exists a solution of (3) which
is a τ-weakly dependent strictly stationary process with finite moments up to any
positive order and is ergodic.
3 Parameter Estimation
In this section, we consider the estimation of the parameters of the model (3).
The conditional maximum likelihood (CML) method can be applied in a very
straightforward manner. Note that by the fact that the conditional distribution is
Poisson the conditional likelihood function, given the starting value 0 and the
observations y1 ; : : : ; yn , takes the form
Y
n
et ./ t t ./
y
L./ WD (6)
tD1
yt Š
so that the conditional log-likelihood function is

ln(L(θ)) = \sum_{t=1}^{n} [ y_t ln(λ_t(θ)) − λ_t(θ) − ln(y_t!) ] =: \sum_{t=1}^{n} ℓ_t(θ).   (7)
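In practice the CML estimates can be obtained by evaluating ℓ_t(θ) recursively through (3) and maximizing (7) numerically; the sketch below does this with a generic optimizer rather than the analytical score derived next, and the starting values and placeholder data are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik(theta, y, lam0=None):
    """Negative conditional log-likelihood (7) for the INAPARCH(1,1) model (3)."""
    omega, alpha, gamma, beta, delta = theta
    if omega <= 0 or alpha < 0 or beta < 0 or delta <= 0 or abs(gamma) >= 1:
        return np.inf
    lam = np.mean(y) if lam0 is None else lam0          # starting intensity
    ll = 0.0
    for t in range(1, len(y)):
        dev = y[t - 1] - lam
        lam = (omega + alpha * (abs(dev) - gamma * dev) ** delta
               + beta * lam ** delta) ** (1.0 / delta)
        ll += y[t] * np.log(lam) - lam - gammaln(y[t] + 1.0)
    return -ll

# y would be the observed count series; here a placeholder Poisson sample is used.
y = np.random.default_rng(10).poisson(3.0, size=400)
start = np.array([0.5, 0.2, -0.1, 0.6, 2.0])            # omega, alpha, gamma, beta, delta
res = minimize(neg_loglik, start, args=(y,), method="Nelder-Mead")
print(res.x)
```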
For the calculation of the first order derivatives of the general INAPARCH(p, q)
model, the auxiliary calculations presented below are needed:

∂ℓ_t/∂θ_i = (∂λ_t/∂θ_i)(y_t/λ_t − 1),   i = 1, ..., 2 + 2p + q,

where ∂(λ_t^δ)/∂θ_i = δ λ_t^{δ−1} ∂λ_t/∂θ_i, i = 1, ..., 2 + 2p + q. Thus, for i = 1, ..., p and
j = 1, ..., q,
∂λ_t/∂β_j = (λ_t/(δλ_t^δ)) [ \sum_{i=1}^{p} α_i δ g_{t−i}^{δ−1} (−I_{t−i} + γ_i) ∂λ_{t−i}/∂β_j + \sum_{k=1}^{q} β_k ∂λ_{t−k}^δ/∂β_j + λ_{t−j}^δ ],
∂λ_t/∂δ = (λ_t/(δλ_t^δ)) { \sum_{i=1}^{p} α_i g_{t−i}^δ [ (δ/g_{t−i}) (−I_{t−i} + γ_i) ∂λ_{t−i}/∂δ + ln(g_{t−i}) ] + \sum_{j=1}^{q} β_j ∂λ_{t−j}^δ/∂δ − (λ_t^δ/δ) ln(λ_t^δ) },
where g_{t−i} = |y_{t−i} − λ_{t−i}| − γ_i (y_{t−i} − λ_{t−i}) and I_t = 1 if y_t > λ_t, I_t = −1 if y_t < λ_t.
Thus, for the INAPARCH(1, 1) model the score function can then be explicitly written as

S_n(θ) = [ \sum_{t=1}^{n} (y_t/λ_t − 1) ∂λ_t/∂θ_1, ..., \sum_{t=1}^{n} (y_t/λ_t − 1) ∂λ_t/∂θ_5 ]'

with

∂ℓ_t/∂θ_i = (y_t/λ_t − 1) ∂λ_t/∂θ_i.
It follows that, at θ = θ_0,

E[ ∂ℓ_t/∂θ |_{θ_0} | F_{t−1} ] = 0,

since E[y_t/λ_t − 1 | F_{t−1}] = 0 and E[(y_t/λ_t − 1)^2 | F_{t−1}] = V[y_t/λ_t − 1 | F_{t−1}] = 1/λ_t. It
can also easily be shown that, for δ ≥ 2,

E[λ_t^{2−2δ} | F_{t−1}] < +∞,   E[λ_t^{1−δ} | F_{t−1}] < +∞,
E[λ_t^{2−δ} ln(λ_t) | F_{t−1}] < E[ln(λ_t) | F_{t−1}] < E[λ_t | F_{t−1}] < +∞,
E[λ_t^2 ln^2(λ_t) | F_{t−1}] < +∞,   E[λ_t ln(λ_t) | F_{t−1}] < +∞.
Thus, it can be concluded that V[∂ℓ_t/∂θ | F_{t−1}] < +∞ and that ∂ℓ_t/∂θ is a martingale
difference sequence with respect to F_{t−1}. The application of a central limit theorem
for martingales guarantees the desired asymptotic normality.
It is worth mentioning here that in Sect. 2 it was concluded that the process has
finite moments up to any positive order and is τ-weakly dependent, which implies
ergodicity. This is sufficient to state that the Hessian matrix converges in probability
to a finite limit. Finally, all third derivatives are bounded by a sequence that
converges in probability. Given these three conditions, it is then concluded that the
CML estimator θ̂ is consistent and asymptotically normal,

√n (θ̂ − θ_0) →_d N(0, G^{−1}(θ_0)),
where

G_n(θ) = \sum_{t=1}^{n} V[ ∂ℓ_t(θ)/∂θ | F_{t−1} ] = \sum_{t=1}^{n} (1/λ_t(θ)) (∂λ_t(θ)/∂θ)(∂λ_t(θ)/∂θ)'.
4 Simulation
Fig. 2 Bias of the conditional ML estimates, for cases C2 (top left), C4 (top right), C5 (bottom
left), and C6 (bottom right) (numbers 1–5 below the boxplots refer to the estimated parameters, in
the order appearing in Table 1)
and C6. From this part of the simulation study a few conclusions can be drawn: it is
clear that as the theoretical values of the α and β parameters rise, the point estimates
obtained are much closer to what was expected, in particular for the α parameter;
the γ and δ parameters are fairly well estimated, but there is a certain difficulty in the estimation
of the ω parameter, which tends to be underestimated, with the exception of the C5 case;
there is also a very high degree of variability, in particular for the ω and δ parameters.
An important conclusion is that condition (5) does not seem to interfere with the
quality of the point estimates for this model. In fact, the best overall estimates were
obtained for cases C6 and C7, clearly not obeying condition (5).
For the C2 and C4 cases, 300 samples were simulated considering values of δ varying
from 2.0 to 3.0 (i.e., six different situations for each case). After a preliminary data
analysis with the construction of boxplots and histograms, which confirm the presence
of overdispersion, the log-likelihood was studied in the following manner: for each
set of 300 samples the log-likelihood was calculated, varying the δ parameter in the
range 2.0–3.0. It was expected that the log-likelihood would be maximum for the δ value
used to simulate that particular set of 300 samples. Results are presented in Table 2
for Case 2. Case 2 was chosen for presentation here just because, for this case,
the first three values of the δ parameter lie inside the region that obeys condition (5)
and the last three lie outside this region. Nevertheless, the same behavior was observed
for both Case 2 and Case 4, and the δ value for which the calculated log-likelihood
was maximum was exactly the expected one for both cases and all six different
situations. In Table 2, it can be observed that the mean log-likelihood is maximum
for the δ value corresponding to the δ value used for the simulation of the respective
set of samples.
In this section, the results above are applied in the analysis of the motivating
examples presented in Fig. 1, Sect. 1. As already described, the data consist of the
number of transactions per minute during one trading day for Glaxosmithkline and
Astrazeneca. The CML estimation method was applied and the results are shown in
Table 3. Note that the estimated value of γ is negative for both time series, meaning
that there is evidence that positive shocks have a stronger impact on overdispersion
than negative shocks. Another important feature exhibited by both time series is that
the estimated value of δ fails the condition δ ≥ 2. It is worth mentioning that this is
not a surprising result since, in the estimation of the Standard and Poor 500 stock
market daily closing price index in [3], the δ estimate obtained also did not satisfy
such a sufficient condition for the process to be covariance stationary.
A short simulation study was also carried out in this section. The CML point
estimates of both real-data series in Table 3 were used to simulate 300 independent
replicates of length 500 from the INAPARCH(1, 1) model, namely the GSK and AZN
cases, referring, respectively, to the samples based on the point estimates for the
Glaxosmithkline and Astrazeneca time series. CML estimates were then obtained
for these samples and the results are presented in Table 4, with the corresponding
bias in Fig. 3. Regarding Fig. 3, it can be seen that the variability and the tendency to
underestimate the ω parameter are maintained (taking into consideration the median
value), but in relation to the δ parameter the variability has decreased significantly. From
inspection of Table 4, it can be said that, in general, the CML point estimates are not
very far from what was expected in both cases, although better overall estimates
were obtained for the AZN case. Considering that condition (5) was not fulfilled for
either the AZN or the GSK case (2^δ(2αδ + β²) equals 2.0297 for the AZN case and 1.4091
for the GSK case), as was already mentioned in Sect. 4, it seems that violating the
sufficient condition for ergodicity has no effect on the behavior of the estimation
procedure. The impact of violating necessary instead of sufficient conditions for
ergodicity remains a topic of future work.
Table 3 Maximum likelihood estimation results for the Glaxosmithkline and Astrazeneca time series
(standard errors in parentheses)
Time series ω̂ α̂ γ̂ β̂ δ̂
Glaxosmithkline 0.378 0.139 −0.326 0.879 0.982
 (0.068) (0.007) (0.084) (0.007) (0.0005)
Astrazeneca 2.486 0.282 −0.278 0.750 1.059
 (0.108) (0.006) (0.036) (0.004) (0.0008)
Table 4 Maximum likelihood estimation results for the GSK and AZN cases (standard deviations in
parentheses)
Samples ω̂ α̂ γ̂ β̂ δ̂
GSK case 0.739 0.182 0.241 0.871 1.220
 (0.577) (0.094) (0.259) (0.025) (0.388)
AZN case 2.495 0.309 0.147 0.699 0.998
 (0.665) (0.101) (0.258) (0.107) (0.161)
Fig. 3 Bias of the CML estimates for the AZN case (numbers 1–5 below the boxplots refer to the
estimated parameters, in the order appearing in Table 4)
Acknowledgements This research was partially supported by Portuguese funds through the Cen-
ter for Research and Development in Mathematics and Applications, CIDMA, and the Portuguese
Foundation for Science and Technology,“FCT—Fundação para a Ciência e a Tecnologia,” project
UID/MAT/04106/2013.
References
1. Brännäs, K., Quoreshi, A.M.M.S.: Integer-valued moving average modelling of the number of
transactions in stocks. Appl. Financ. Econ. 20, 1429–1440 (2010)
2. Davis, R.A., Dunsmuir, W.T.M., Streett, S.B.: Observation driven models for Poisson counts.
Biometrika 90, 777–790 (2003)
3. Ding, Z., Granger, C.W., Engle, R.F.: A long memory property of stock market returns and a
new model. J. Empir. Financ. 1, 83–106 (1993)
4. Doukhan, P., Fokianos, K., Tjøstheim, D.: On weak dependence conditions for Poisson
autoregressions. Stat. Probab. Lett. 82, 942–948 (2012)
5. Ferland, R., Latour, A., Oraichi, D.: Integer-valued GARCH process. J. Time Ser. Anal. 6,
923–942 (2006)
6. Fokianos, K., Tjøstheim, D.: Nonlinear Poisson autoregression. Ann. Inst. Stat. Math. 64,
1205–1225 (2012)
7. Fokianos, K., Rahbek, A., Tjøstheim, D.: Poisson autoregression. J. Am. Stat. Assoc. 104,
1430–1439 (2009)
8. Franke, J.: Weak dependence of functional INGARCH processes. Technical Report 126,
Technische Universität Kaiserslautern (2010)
9. Heinen, A.: Modelling time series count data: an autoregressive conditional Poisson model.
Center for Operations Research and Econometrics (CORE) Discussion Paper No. 2003-63,
University of Louvain (2003)
10. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, New York (1994)
11. Neumann, M.H.: Absolute regularity and ergodicity of Poisson count processes. Bernoulli 17,
1268–1284 (2011)
12. Turkman, K.F., Scotto, M.G., de Zea Bermudez, P.: Non-linear Time Series: Extreme Events
and Integer Value Problems. Springer, Cham (2014)
Part III
Applications in Time Series Analysis
and Forecasting
Emergency-Related, Social Network Time
Series: Description and Analysis
Horia-Nicolai Teodorescu
Abstract Emergencies and disasters produce a vast traffic on the social networks
(SNs). Monitoring this traffic for detection of emergency situations reported by
the public, for rapid intervention and for emergency consequences mitigation, is
possible and useful. This article summarizes the results presented in a series of
previous papers and brings new data evidence. We describe the specific traffic on
social networks produced by several emergency situations and analyze the related
time series. These time series are found to present a good picture of the situation
and have specific properties. We suggest a method for the analysis of SN-related
time series with the aim of establishing correlations between characteristics of the
disaster and the SN response. The method may help the prediction of the dimension
of the emergency situations and for forecasting needs and measures for relief and
mitigation. Time series of real data (situation time series) are exemplified. The take-
away from this article is that the SN content must be deeper statistically analyzed
to extract more significant and reliable information about the social and external
processes in disasters and use learned correlations for prediction and forecasting.
This paper heavily relies on previous works of the author and reproduces several of
the ideas, sometimes verbatim, from those papers.
1 Introduction
In recent years, academics, several agencies and organizations, and several com-
panies proposed the use of social networks (SNs) for detecting and evaluating
emergency situations and improving rescue operations. For this goal, it was
suggested that messages must be analyzed and relevant data extracted. Various
specific analytics and similar applications have been developed and presented in
the literature. Moreover, some commercial, even freely available apps, such as
Banjo allow today almost everyone to follow in almost real-time major events
developing as reflected in a variety of media and SNs. We recall that Banjo collects
messages and posts from all major SNs and social media and lets users see the most
recent ones; the newly introduced feature “rewind” also lets users see older events.
Therefore, this application, supplemented with appropriate means for statistics and analysis, may help researchers perform extended studies on the reflection on SNs and SMs of developing and older emergencies. However, these tools (standard and adapted analytics and search tools) are not enough for a substantial understanding and evaluation of the emergencies, in view of decision making for rescue and mitigation of the effects of disasters. In the first place, the assumption that the response to emergencies on SNs correlates linearly with the event, as stated on or implied by the sites of several analytics providers, has never been tested, and the mechanisms relating the manner and number of responses to dramatic events remain largely unquestioned and unknown. Thus, finding statistically validated ways to measure and to represent the correlations between dramatic events in the real world and the response on SNs requires further studies. This article aims to contribute to the advancement of the topic and to help elucidate some of these issues.
In [1] we addressed the basic issues of the correlational analysis of the response on SNs/SMs and the events that produce the related traffic, with the applicative concern of how disasters are reflected in the SMs/SNs media. This article provides further supporting evidence and details on the correlations between event amplitude and traffic amplitude on SNs/SMs.
We briefly recall several projects regarding the use of SN analytics for disaster
situations; the TweetTracker, proposed as “a powerful tool from Arizona State
University that can help you track, analyze, and understand activity on Twitter,”
extensively described in Kumar et al. [2] and in the papers by Morstatter et al.
[3] and Kumar et al. [4]. TweetTracker has been reportedly used by Humanity
Road, an organization “delivering disaster preparedness and response information
to the global mobile public before, during, and after a disaster,” see (http://
tweettracker.fulton.asu.edu/). But not only charities and universities turned toward
SNs for monitoring disasters. Federal agencies in the USA are requested to monitor
the spread and quality of their e-services using analytics (DAP: Digital Metrics
Guidance and Best Practices); these agencies are expressly required to use analytics
[5], so, it is not surprising that several of these agencies used monitoring and
predictive analytics, for event detection, including early detection of earthquakes
and flu outbreaks [6], among others. Also, federal agencies in the USA have already applied SN analysis in law enforcement [7]. There is, however, no independent analysis of the results.
There are numerous academic studies, among others [8–14], advocating the application of SN content analysis in disasters. However, there have been few proven and convincing uses so far.
The organization of this article is linear. Section 2 discusses the incentives for and the limits of the use of analytics in disasters. The third Section introduces issues related to the description of the SN responses to events and of the related time series. Examples of SN-related time series are presented and analyzed in the fourth Section. The last section draws conclusions.
Notice that throughout the paper we name "post" any type of object on SNs/SMs (e.g., tweet, message, posted picture, blog, etc.). Also notice that this article is a review of the work performed mainly by the author and colleagues on the use of SNs for disaster detection, improving rescue timeliness, and increasing the mitigation of disaster consequences. The papers summarized here include mainly [1, 15–17].
There are several reasons for advocating the use of SNs/SMs analysis in view
of detecting emergencies. The most obvious are the large amount and the low-
granularity of the information one can gather, the sometimes fast response of the
SNs/SMs to events, and the lower cost of the gathered information. The obvious
drawbacks are the questionable respect of the privacy of people suffering hardship,
the large amount of noise in the information collected, first analyzed in [17], the lack
of studies on the reliability of the collected information, the issue of interpretation
of the information (DAP: Digital Metrics Guidance and Best Practices) and Herman
[5], and the questionable advantage in the speed of information propagation on SNs
compared to sensor networks, in some cases. Related to the last issue, Teodorescu [1] draws attention to the fact that the optimism expressed in [18] on the speed advantage
of SNs should be moderated because “while it is true that the earthquake waves
may travel slowly, allowing tweets to arrive at distant enough locations before the
earthquake, radio and wire communications between seismic monitoring stations
send warnings on a regular basis and much faster than the analysis of tweets, as
used in [18], who argues that ‘tweets give USGS early warning on earthquakes’.”
Nevertheless, there is little doubt that the monitoring and analysis of SNs/SMs
for the detection of emergencies and for assessing their implications is a valuable
addition to the tools of law enforcers, relief agencies, and decision makers directing
the mitigation of disaster effects [19–21]. The question is not if to use SNs and SMs
analysis, but how to best use it. The answer to the previous question requires deeper
understanding of the SNs/SMs response to events. This and previous articles [1,
15–17] address this particular issue.
Because the infancy era of SNs and analytics has ended and tools are already available for collecting various sorts of data, we suggest that statisticians can, in the near future, bring an essential contribution to a deeper understanding of SN behavior. What is needed in the first place are extensive correlation studies to determine the mixture of factors influencing the SN response to events and the roles and weights of those factors. Studies by Schultz et al. [22], Utz et al. [23], and Teodorescu [1, 15] have already emphasized that there are numerous factors that influence SN communications, including various emotions, the type of the crisis and the SN itself.
Reflecting concerns, systematically validated evidence, and empirical, anecdotal evidence gathered during a project and partly presented in [1, 15–17], as well as concerns expressed by other studies [24–27], we summarized [1, 15] the following reservations:
1. The usefulness of the information collected on SNs in rescuing and in disaster management is limited and case-dependent; a large number of posts represent noise that, when not filtered, may confuse the efforts of disaster mitigation. In more detail:
2. Posts may help produce better assessment of the situation when the event
involves a large part of the population, but not when the posts refer to localized,
distant events not affecting their authors [1]. Stefanidis et al. [28] specifically
address the issue of gathering both content and location for improving the data
relevance.
3. Posts triggered by disasters frequently represent only a carrier of sentiments and
frustrations unrelated to the event—although triggered by it—or the result of the
chatting desire. When not filtered, these posts clutter the data and may modify
the results of the analysis in unexpected ways [17]. The effect of the noise on
the correlation between the SNs response to events and number of the affected
population was addressed, for example, in [16].
4. Filtering all the above described noise is beyond the capabilities of current
analytics.
The literature on the relevance of the messages on SNs for disaster mitigation includes various useful points of view; see, for example, [19, 24–27, 29–34]. Castillo et al. [30] proposed a more technical approach, defining post relevance in terms of the number of retransmissions and the audience. Unfortunately, it is easy to see that there is no reason why the importance of a post for rescuers should equal the interest the audience takes in that message; the person receiving the post is key in this respect. While post retransmissions and audience may help filter part of the unessential data, they do not solve the problem [1]. These
issues are exemplified in the subsequent sections of this article.
In the generation of time series discussed in this article, two processes contribute
[1, 15]: the external event (disaster) and the societal process represented by the SN-
reflection of the event. The time series reflect both of them, yet the interest of the
rescuers and emergency agencies may differ in scope; some decision makers may
be interested in the event itself (how large is the flooded area; how big was the
earthquake and how many buildings are destroyed); others may wish to know only about the affected population in terms of numbers of injured or dead; relief agencies may wish to know about all aspects, including the spreading of panic. The posts on SNs may not answer all the related questions; we do not yet know well what the SN response correlates with. Subsequently, the interest is restricted to what the time evolution of the number of posts, seen as a time series, may tell us. Specifically, we aim to determine whether the future number of posts can be predicted from short segments of the related time series. This question is essential because, on the one hand, we need to make predictions for better planning and mitigation of the disaster consequences; on the other hand, the ability to make correct predictions validates the model and frees the rescuers from part of the task of monitoring the SN response.
We distinguish between “instantaneous” (short duration) events (e.g., explosions,
earthquakes, landslides, tornados, crashes, and some attacks), and those (such as
storms and wildfires) that we name long-duration events (also characterized by a slow onset). Short-duration events have an abrupt onset. In technical terms, short-
duration events act as Dirac impulse excitations on the system represented by
the SN. From another point of view [1], events may be predictable, enabling us
to issue warnings (in this category are storms, hurricanes, and most floods), and
unpredictable, such as earthquakes and crashes. Notice that actually there is a third
category, including the events that allow us a window for warning of the order of
minutes (e.g., tsunamis and tornados). For brevity, we are concerned with the main
two categories.
The subsequent paragraphs, on notations and main parameters of the SNs/SMs response to events, are almost verbatim from [15] and [1]. Conveniently assume that one can determine the moment of occurrence (onset) of the event, $t_{\text{on-ev}}$, and the moment of its vanishing, $t_{\text{off-ev}}$. The duration of the event is $T_{ev} = t_{\text{off-ev}} - t_{\text{on-ev}}$. The "intensity" (amplitude) of the event, E(t), is the number of messages or posts per unit time. Then, several features of the event might be defined, for example, its maximal peak amplitude, the onset time, the decrease time, the number of peaks, the (main) frequency of its fluctuations, etc. The elementary parameters of the SN response and their notations are listed subsequently; a small computational sketch follows the list.
• $n_{ev}(t_1, t_2)$: the number of posts in a specified time interval $(t_1, t_2)$; when the interval length $t_2 - t_1$ is the unity, this parameter is the same as the one below and is the variable in the time series.
• $n_{ev}(t)$: the temporal density distribution (i.e., the "instantaneous" number of relevant messages) of the SN/SM response (relevant posts) for the event $ev$.
• Onset delay of the SN response, $\tau_{on} = t(n > 0) - t_0$; here $t_0$ stands for $t_{\text{on-ev}}$, the onset time of the event.
• $N_{max}$: maximal amplitude of the SN response (maximum number of posts per time unit; the specified unit of time may be an hour, a day, or a minute).
• $N_{tot}$: total number of posts (messages); $N_{tot}$ may correlate with the duration of the disaster, with the total number of victims, or with effects of panic, effects of "spectacularity," etc.
• $t_{max}$ (since $t_0$) $= t(n = N_{max}) - t_0$: time to the peak of the SN response.
• $\tau_{plt}$: plateau duration (if any) of the SN response.
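To make these notations concrete, the following minimal sketch (not from the cited papers) computes the listed parameters from a series of post counts per time unit; the function name `response_features`, the plateau tolerance and the example counts are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): computing the elementary
# SN-response parameters listed above from a series of post counts per unit time.
# counts[k] is assumed to hold the number of posts in the k-th time unit after t0.

def response_features(counts, plateau_tol=0.10):
    """Return N_max, N_tot, onset delay, time-to-peak and a crude plateau length."""
    n_max = max(counts)                                      # maximal amplitude
    n_tot = sum(counts)                                      # total number of posts
    onset = next(k for k, n in enumerate(counts) if n > 0)   # first non-zero bin
    t_max = counts.index(n_max)                              # time (in units) to the peak
    # crude plateau estimate: longest run of values within plateau_tol of the peak
    plateau, run = 0, 0
    for n in counts:
        run = run + 1 if n >= (1 - plateau_tol) * n_max else 0
        plateau = max(plateau, run)
    return {"N_max": n_max, "N_tot": n_tot, "onset_delay": onset,
            "t_max": t_max, "plateau": plateau}

# Example with invented daily counts:
print(response_features([0, 0, 120, 800, 760, 730, 300, 90, 20]))
```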
Further parameters have been introduced and explained in Teodorescu [1]. Also,
two examples of time series were given in that paper, related to the SNs responses to
the weather event (winter storm) in New England, January–February 2015, and to
the small earthquake series in Romania, during 2014—first months of 2015. Further
examples are given in this article, using time series represented by $n(t_k, t_k + \Delta t) = n_k$, where $\Delta t$ is typically one day. The main interest is to find a good empirical model
for the time series and to try to explain the process leading to the empirical model
in the frame of a theoretical one.
The method of gathering data (posts) on several SNs and SMs was presented in
Teodorescu [1, 15–17]. A first example of time series is shown in Fig. 1, which
represents the response in number of tweets per day to a long-duration event, the
series of winter storms in New England, January–February 2015 (only January
shown). Data in Fig. 1 was collected by the author daily using both Topsy Social
Analytics (http://topsy.com/) and Social Mention search (http://socialmention.com)
with the search condition “blizzard OR storm AND snow OR ice,” manually
assembled and processed by the author; the figures in the graph represent averaged
numbers. Because of the several relapses of the storm, there are, as expected, several peaks. Also, an almost-plateau is present, due to the multi-day sustained storm. The times of the peaks coincided with the days of the storm recurrences, but there was no good correlation of the number of posts with the quantity of snow or the temperature in the region as announced by major meteorological stations. The manual analysis of the posts showed that a large number (the majority) of the posts were triggered anecdotally by the weather conditions, but were not relevant with respect to the persons in need. Worryingly, very few persons in need were found to post messages and virtually no posts were from aged or sick persons in need, although a few messages written by
Fig. 1 Time series for the number of messages for the ‘blizzard OR storm AND snow OR ice’
event, measured as tweets per day, for the period 1 February–2 March 2015 (horizontal axis: day
count)
[Fig. 2 plots the daily tweet counts together with two exponential decay fits: y = 36010e^(-1.592x), R² = 0.9418, and y = 10226e^(-x), R² = 0.8408]
Fig. 2 Response of Twitter to the Czech event that occurred on February 24, 2015 (see text). Also shown is the effect of the noise on the quality of the model
[Fig. 3 plots the daily tweet counts together with the exponential fit y = 32316e^(-0.761x), R² = 0.9535]
Fig. 3 Approximation of the decay of the response to Copenhagen attack on February 14, 2015
Fig. 4 Forgetting (decrease) model for SN response: polynomial and exponential approximations
for the train derailment event in California, February 24, 2015
work well enough. The decrease of the response is also best described (among simple models) by the exponential model, which also has a natural explanation and can be extended to cover larger time intervals. Although polynomial models may work better for short durations, they behave incorrectly as time increases (see the negative values of the response and the fluctuations) and have no reasonable basis as models; see Fig. 4. In Teodorescu [35], an approach was presented for optimizing the model based on adaptive fuzzy coordinate transformations.
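As an illustration of the exponential decrease model suggested by Figs. 2–4, the following sketch fits y = a·e^(-bx) to daily counts by least squares on the logarithms; the counts are invented, and the log-scale R² is only a rough analogue of the values reported in the figures.

```python
import numpy as np

# Sketch (assumption: strictly positive daily counts): fit y = a * exp(-b * x)
# by linear regression of log(y) on x, and report R^2 on the log scale.
x = np.arange(1, 7, dtype=float)                     # day index after the event
y = np.array([11800., 2500., 640., 150., 40., 12.])  # invented post counts

slope, intercept = np.polyfit(x, np.log(y), 1)       # log y = intercept + slope * x
a, b = np.exp(intercept), -slope                     # so y ~ a * exp(-b * x)

log_fit = intercept + slope * x
ss_res = np.sum((np.log(y) - log_fit) ** 2)
ss_tot = np.sum((np.log(y) - np.log(y).mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"y ~ {a:.0f} * exp(-{b:.3f} x),  R^2 (log scale) = {r2:.4f}")
```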
Converting the activity on the SNs and SMs into time series seems a natural idea; moreover, it is relatively easy to perform, as explained in Section 3. This
representation makes the SNs’ response to events easier to represent and analyze
and provides a large number of study avenues. Importantly, once a type of model
for the response is established, it is easy to determine from the initial data the
parameters of the model. This, in turn, allows us to make reliable enough predictions
on the amplitude of the response in the hours and days to come. The ability to make
predictions is, as it is well-known, crucial in the process of planning for rescues and
making decisions for mitigation of the effects of the emergencies.
However, the predictions of the actual situation in emergencies, based on the
number of posts referring to a specific emergency, should be considered with
care, because many, possibly most of the posts may come from people outside or
unrelated to the emergency. Also, as proved by the analysis of posts related to the
events of snow storms in New England in 2015, even when the posts come from
people in the involved region and even from people involved in the event, the vast majority of those people may not be in need. Therefore, a detailed analysis of the
posts has to be made before they are included in the count of significant posts and
used to build the time series for representing the situation. The tasks of perfecting
the search for posts of interest related to emergencies and of validating the relevance
of the posts from the points of view of rescuers and decision makers remain to be
accomplished, as currently existing systems are unable to perform these tasks with
enough accuracy.
Additional Material Additional material regarding methods of data collection and analysis, on
the limits of these methods and on the uncertainty in the data can be found on the web address of
the project, at http://iit.academiaromana-is.ro/sps/ or can be asked directly from the author.
Acknowledgments The research and the participation in this conference are partly supported
by the multi-annual NATO SPS grant 984877 “Modelling and Mitigation of Public Response
to Catastrophes and Terrorism." The content from various freely available analytics, including those from the SocialMention™ real-time search platform, Topsy Inc.™, Google Analytics™, etc., was only used as a resource (as permitted). Data generated by these analytics were compared, and only derivatives of the data are given in this article.
Conflict of Interest In the above-mentioned grant, the author is the PI. To the best of our knowledge,
neither the author nor his employers or the institutions and organizations named in this article in
relation to the author have any interest in the tools available free on the Internet and used in this
research, or in the companies or organizations that produced those tools.
References
1. Teodorescu, H.N.: Emergency situational time series analysis of SN traffic. In: Proceedings of
the ITISE 2015 International Work-Conference on Time Series, Granada, July 1–3, 2015, pp.
1–12 (also, additional material at http://iit.academiaromana-is.ro/sps/)
2. Kumar, S., Morstatter, F., Liu, H.: Twitter Data Analytics. Springer, New York. http://
tweettracker.fulton.asu.edu/tda/TwitterDataAnalytics.pdf (2013)
3. Morstatter, F., Kumar, S., Liu, H., Maciejewski, R.: Understanding Twitter data with Tweet-
Xplorer (Demo). In: Proceedings of the 19th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD 2013, pp. 1482–1485. ACM, New York (2013)
4. Kumar, S., Barbier, G., Abbasi, M.A., Liu, H.: TweetTracker: an analysis tool for humanitarian
and disaster relief. ICWSM, http://tweettracker.fulton.asu.edu/Kumar-etal_TweetTracker.pdf
(2011)
5. Herman, J.: Social media metrics for Federal Agencies. http://www.digitalgov.gov/2013/04/19/
social-media-metrics-for-federal-agencies/ (2013)
6. Konkel, F.: Predictive analytics allows feds to track outbreaks in real time. http://fcw.com/
articles/2013/01/25/flu-social-media.aspx (2013)
7. Lyngaas, S.: Faster data, better law enforcement. http://fcw.com/articles/2015/02/03/faster-
data-better-enforcement.aspx (2015)
8. Abbasi, M.-A., Kumar, S., Filho, J.A.A., Liu, H.: Lessons learned in using social media
for disaster relief—ASU crisis response game. In: Social Computing, Behavioral—Cultural
Modeling and Prediction. LNCS 7227, pp. 282–289. Springer, Berlin/Heidelberg (2012)
9. Anderson, K.M., Schram, A.: Design and implementation of a data analytics infrastructure in
support of crisis informatics research (NIER track). In: Proceedings of the 33rd International
Conference on Software Engineering (ICSE’11), May 21–28, 2011, Waikiki, Honolulu, HI,
USA. pp. 844–847. ACM, New York (2011)
10. Boulos, M.N.K., Sanfilippo, A.P., Corley, C.D., Wheeler, S.: Social Web mining and exploita-
tion for serious applications: technosocial predictive analytics and related technologies for
public health, environmental and national security surveillance. Comput. Methods Programs
Biomed. 100, 16–23 (2010)
11. Liu, B.F., Austin, L., Jin, Y.: How publics respond to crisis communication strategies: the
interplay of information form and source. Public Relat. Rev. 37(4), 345–353 (2011)
12. Houston, J.B., Spialek, M.L., Cox, J., Greenwood, M.M., First, J.: The centrality of com-
munication and media in fostering community resilience. A framework for assessment and
intervention. Am. Behav. Sci. 59(2), 270–283 (2015)
13. Merchant, R.M., Elmer, S., Lurie, N.: Integrating social media into emergency-preparedness
efforts. N. Engl. J. Med. 365, 289–291 (2011)
14. Teodorescu, H.N.: SN voice and text analysis as a tool for disaster effects estimation—a
preliminary exploration. In: Burileanu, C., Teodorescu, H.N., Rusu, C. (eds.) Proceedings of
the 7th Conference on Speech Technology and Human—Computer Dialogue (SpeD), Oct 16–
19, 2013, pp. 1–8. IEEE, Cluj-Napoca (2013) doi:10.1109/SpeD.2013.6682650
15. Teodorescu, H.N.: Using analytics and social media for monitoring and mitigation of social
disasters. Procedia Eng. 107C, 325–334 (2015). doi:10.1016/j.proeng.2015.06.088
16. Teodorescu, H.N.L.: On the responses of social networks to external events. In: Proceedings of
the 7th IEEE International Conference on Electronics, Computers and Artificial Intelligence
(ECAI 2015), 25 June –27 June, 2015, Bucharest (2015)
17. Teodorescu, H.N.L.: Social signals and the ENR index—noise of searches on SN with
keyword-based logic conditions. In: Proceedings of the IEEE Symposium ISSCS 2015, Iasi
(2015) (978-1-4673-7488-0/15, 2015 IEEE)
18. Konkel, F.: Tweets give USGS early warning on earthquakes. (2013). http://fcw.com/articles/
2013/02/06/twitter-earthquake.aspx
19. Bruns, A., Burgess, J.E.: Local and global responses to disaster: #eqnz and the Christchurch
earthquake. In: Sugg, P. (ed.) Disaster and Emergency Management Conference Proceedings,
pp. 86–103. AST Management Pty Ltd, Brisbane (2012)
20. Pirolli, P., Preece, J., Shneiderman, B.: Cyberinfrastructure for social action on national
priorities. Computer (IEEE Computer Society), pp. 20–21 (2010)
21. Zin, T.T., Tin, P., Hama, H., Toriu, T.: Knowledge based social network applications to
disaster event analysis. In: Proceedings of the International MultiConference of Engineers and
Computer Scientists, 2013 (IMECS 2013), vol. I, Mar 13–15, 2013, pp. 279–284. Hong Kong
(2013)
22. Schultz, F., Utz, S., Göritz, A.: Is the medium the message? Perceptions of and reactions to
crisis communication via twitter, blogs and traditional media. Public Relat. Rev. 37(1), 20–27
(2011)
23. Utz, S., Schultz, F., Glocka, S.: Crisis communication online: how medium, crisis type and
emotions affected public reactions in the Fukushima Daiichi nuclear disaster. Public Relat.
Rev. 39(1), 40–46 (2013)
24. Chae, J., Thom, D., Jang, Y., Kim, S.Y., Ertl, T., Ebert, D.S.: Public behavior response analysis
in disaster events utilizing visual analytics of microblog data. Comput. Graph. 38, 51–60 (2014)
25. Murakami, A., Nasukawa, T.: Tweeting about the tsunami? Mining twitter for information
on the Tohoku earthquake and tsunami. In: Proceedings of the 21st International Confer-
ence Companion on World Wide Web, WWW’12, pp. 709–710. ACM, New York (2012)
doi:10.1145/2187980.2188187
26. Potts, L., Seitzinger, J., Jones, D., Harrison, A.: Tweeting disaster: hashtag constructions
and collisions. In: Proceedings of the 29th ACM International Conference on Design of
Communication, SIGDOC’11, pp. 235–240. ACM, New York (2011)
27. Toriumi, F., Sakaki, T., Shinoda, K., Kazama, K., Kurihara, S., Noda, I. Information sharing
on Twitter during the 2011 catastrophic earthquake. In: Proceedings of the 22nd International
Conference on World Wide Web companion, pp. 1025–1028. Geneva (2013)
28. Stefanidis, A., Crooks, A., Radzikowski, J.: Harvesting ambient geospatial information from
social media feeds. GeoJournal 78, 319–338 (2013)
29. Acar, A., Muraki, Y.: Twitter for crisis communication: lessons learned from Japan’s tsunami
disaster. Int. J. Web Based Communities 7(3), 392–402 (2011)
30. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: WWW 2011—
Session: Information Credibility, World Wide Web Conference (IW3C2), Mar 28–April 1,
2011, pp. 675–684. ACM, Hyderabad (2011)
31. Cheong, F., Cheong, C.: Social media data mining: a social network analysis of tweets during
the 2010–2011 Australian floods. In: Proceedings of the PACIS, Paper 46. http://aisel.aisnet.
org/pacis2011/46 (2011)
32. Gil, Y., Artz, D.: Towards content trust of web resources. Web Semant. Sci. Serv. Agents World
Wide Web 5, 227–239 (2007)
33. Kent, M.L., Carr, B.J., Husted, R.A., Pop, R.A.: Learning web analytics: a tool for strategic
communication. Public Relat. Rev. 37, 536–543 (2011)
34. Qu, Y., Huang, C., Zhang, P., Zhang, J.: Microblogging after a major disaster in China: a case
study of the 2010 Yushu Earthquake. In: CSCW 2011, March 19–23, 2011. ACM, Hangzhou
(2011)
35. Teodorescu, H.N.L.: Coordinate fuzzy transforms and fuzzy tent maps—properties and
applications. Studies Inform. Control 24(3), 243–250 (2015)
Competitive Models for the Spanish Short-Term
Electricity Demand Forecasting
Abstract The control and scheduling of the demand for electricity using time
series forecasting is a powerful methodology used in power distribution systems
worldwide. Red Eléctrica de España, S.A. (REE) is the operator of the Spanish
electricity system. Its mission is to ensure the continuity and security of the
electricity supply. The goal of this paper is to improve the forecasting of very
short-term electricity demand using multiple seasonal Holt–Winters models without
exogenous variables, such as temperature, calendar effects or day type, for the
Spanish national electricity market. We implemented 30 different models and
evaluated them using software developed in MATLAB. The performance of the
methodology is validated via out-of-sample comparisons using real data from the
operator of the Spanish electricity system. A comparison study between the REE
models and the multiple seasonal Holt–Winters models is conducted. The method
provides forecast accuracy comparable to the best methods in the competitions.
1 Introduction
Electric power markets have become competitive due to the deregulation carried
out in recent years that allows the participation of producers, investors, traders and
qualified buyers. Thus, the price of electricity is determined on the basis of a buying–
selling system. Short-term load forecasting is an essential instrument in power
system planning, operation and control. Many operating decisions are based on load
forecasts, such as dispatch scheduling of generating capacity, reliability analysis and
maintenance planning for the generators. Therefore, demand forecasting plays an
important role for electricity power suppliers, as both excess and insufficient energy
production may lead to high costs and significant reductions in profits. Forecasting
accuracy has a significant impact on electric utilities and regulators.
A market operator, a distribution network operator, electricity producers, con-
sumers and retailers make up the Spanish electricity system. This market is based
on either a pool framework or bilateral contracts. Market clearing is conducted
once a day, providing hourly electricity prices. The market operator in Spain is
OMEL (Operador del Mercado Ibérico de Energía, Polo Español, S. A.), which is
responsible for managing the bidding system for the purchase and sale of electricity
according to the legal duties established, as well as the arrangement of settlements,
payments and collections, incorporating the results of the daily and intra-day
electricity markets. Red Eléctrica de España, S.A. (REE) is the distribution network
operator within the Spanish electricity system, and its mission is to ensure the
continuity and security of the electricity supply. The role of REE as system operator
consists of maintaining a balance. For this purpose, it forecasts consumption,
operating and overseeing the generation and transmission installations in real time,
thus ensuring that the planned production at the power stations coincides at all times
with the actual consumer demand. A good description of the functioning of Spain’s
electricity production market can be found in [1–4].
A state-of-the-art description of different methods of short-term electricity
demand forecasting can be found in [5, 6]. These methods are based on the Holt–
Winters exponential smoothing model, ARIMA time series models, and electricity
demand regression models with exogenous variables, such as temperature, artificial
neural networks (ANNs) and hybrid forecast techniques, among others. ANN mod-
els are an alternative approach to load forecast modelling [7, 8]. Some researchers
consider related factors such as temperature, e.g. [9–11], and calendar effects or day
type [12] in load forecasting models.
Exponential smoothing methods are especially suitable for short-term forecasting
[6, 13–16]. Seasonal ARIMA [17] and Holt–Winters exponential smoothing [18]
are widely used, as they require only the quantity-demanded variable, and they
are relatively simple and robust in their forecasting. Recent papers have stimulated
renewed interest in Holt–Winters exponential smoothing for short-term electricity
demand forecasting, due to its simple model formulation and good forecasting
results [16, 19–23]. In a recent paper, Bermudez [24, 25] analyses an extension
of the exponential smoothing formulation, which allows the use of covariates
to introduce extra information in the forecasting process. Calendar effects, such
as national and local holidays and vacation periods, are also introduced using
covariates. This hybrid approach can provide accurate forecasts for weekdays,
public holidays, and days before and after public holidays.
In recent years, new techniques have been used in electricity demand forecasting.
For example, non-linear regression models have been used for load forecasting
[26–28]. Functional time series modelling and forecasting techniques have been
extensively studied for electricity demand forecasting, and [29–32] have also studied
functional nonparametric time series modelling. Hyndman and Shang [27] propose
forecasting functional time series using weighted functional principal component
regression and weighted functional partial least squares regression. Some authors
have studied the possibility of using state-space models [33–36] to improve load
forecasting performance.
It is well known that some exogenous variables, particularly temperature and
calendar effects (including annual, weekly and daily seasonal patterns, as well as
public holidays, and different hours on work days and non-work days), have an
influence on the electricity demand [37]. There are some considerations to take into
account, as the complexity of the model would increase if exogenous variables such
as temperature were introduced. This solution is certainly not a parsimonious way
of modelling. Several authors have considered this problem. The interested reader
is referred to Soares and Medeiros [12] and Valor et al. [38] for more detailed
descriptions. Temperature is not considered for two reasons:
• Available data on Spanish electricity consumption is not disaggregated regionally
into different climatic sub regions. Taking into account that the available
electricity consumption data used in the study correspond to the whole Spanish
state, while the meteorological parameters vary in different geographical regions
(especially between northern and southern Spain, and between the islands
(Canary and Balearic Islands) and the Iberian Peninsula), it is necessary to
calculate weighted indices for the representative meteorological parameters for
the entire geographical region under consideration. Soares and Medeiros [12]
also discuss this problem.
• The hourly temperatures in Spain are not easily accessible. The selected variable is the mean daily air temperature (°C), as it can better capture the thermal oscillation within a day [9]. Taylor and Buizza [11] present an analysis based on daily electricity demand data.
Moreover, forecasting models based on linear models now tend to approach this
task by using separate models for each hourly data point (some researchers propose
forecasting models that include 168 separate regression equations, one for each
hour in the week). However, as the problem is intrinsically a multivariate one, the
interrelated information is lost when the models treat each hourly value separately. It
is also possible to use the grouping of work days and weekends, but the problem of
information loss would be the same, due to disruption in the correlation structure
present in the time series. In modelling time series, it is simply not possible to
perform groupings of weekdays and weekends, as this would change the structure
of temporal dependence.
The models discussed in this paper only consider time as an explanatory
variable to create simple, flexible models. In this paper, an automatic forecasting
procedure based on univariate time series models is implemented. The hourly Spanish electricity demand forecasting performance of multiple seasonal univariate Holt–Winters models is examined, based on multi-step-ahead forecast mean squared errors; the demand series is greatly dominated by daily, weekly and annual seasonal cycles.
REE’s model for electricity demand forecasting consists of one daily model and
24 hourly models. About 150 parameters are readjusted each hour to enhance the
next hour’s forecasts. The mean forecast accuracy of REE models is reported as a
MAPE (mean absolute percentage error) of around 2 % [2].
The data set used in this paper covers the period from 1 July 2007 to 9 April 2014. It
is provided by REE (www.ree.es). REE also provides forecasts for the next hours, as
well as other operational values. The data set comprises all the transactions carried
out in mainland Spain, not including the Canary Islands or the Balearic Islands. This
kind of series is highly dominated by three seasonal patterns: the daily cycles (intra-day), with a period length of $s_1 = 24$; the weekly cycles (intra-week), with a length of $s_2 = 168$; and the yearly cycles (intra-year), where $s_3 = 8766$. Figure 1, on the left, shows a random week from this series, in 2010, 2011, 2012 and 2013, where the
intra-day and intra-week seasonal patterns are recognisable. The right panel depicts
two years on the same axes, and the overlapping of the two series denotes the intra-
year seasonal pattern.
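The seasonal patterns just described can also be made visible through the autocorrelation at the seasonal lags; the sketch below uses a synthetic two-seasonal hourly series as a stand-in for the REE data, which are not reproduced here, and the function `acf` is a simple illustrative helper, not part of any library used in the chapter.

```python
import numpy as np

# Sketch: the intra-day and intra-week patterns show up as strong autocorrelation
# at lags s1 = 24 and s2 = 168. A synthetic two-seasonal series stands in for the
# hourly demand series provided by REE.
rng = np.random.default_rng(0)
hours = np.arange(24 * 7 * 52)                         # one year of hourly data
demand = (25_000
          + 4_000 * np.sin(2 * np.pi * hours / 24)     # intra-day cycle
          + 2_000 * np.sin(2 * np.pi * hours / 168)    # intra-week cycle
          + rng.normal(0, 500, hours.size))

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

for lag in (1, 24, 168):
    print(f"lag {lag:3d}: autocorrelation = {acf(demand, lag):.2f}")
```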
The methodology used by REE to provide forecasts for the next day consists of a combination of 24 hourly models plus 1 daily model. All these models share the same structure, which considers the special days, the weather and some distortions.
Fig. 1 Seasonal patterns obtained from the hourly Spanish electricity demand time series. On the
left, four representations of week 25 in different years are depicted, where the intra-day cycles
from Monday to Friday and the intra-week cycles on the weekend can be seen. On the right, the
hourly demand for years 2010 and 2011 is depicted. The overlap of the two series denotes the third
seasonality: intra-year
The demand is decomposed into the sum $p_t + s_t + CSD_t + CWEA_t + u_t$, where $p_t$ refers to the trend, while $s_t$ refers to the seasonality. The factor $CSD_t$ is the contribution of the special days, and $CWEA_t$ represents the weather conditions; $u_t$ refers to a stationary disturbance. The component $p_t + s_t + u_t$ is denoted as the base consumption, modelled using the ARIMA methodology. The rest of the factors are modelled as dummy variables. More detailed descriptions of the Spanish electricity system and REE forecasting methods can be found in [1–4].
In this paper we used the forecasts provided directly by REE, employing the previously mentioned methodology.
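A hypothetical sketch of this type of structure (not REE's actual model) is shown below: the base consumption is handled by an ARIMA-type model and a special-day indicator enters as a dummy regressor. The data, the ARIMA order and the dummy variable are invented for illustration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Invented data: a trending demand series plus a dummy marking special days.
rng = np.random.default_rng(0)
n = 500
demand = 25_000 + 10.0 * np.arange(n) + rng.normal(0.0, 300.0, n)
special_day = np.zeros(n)
special_day[::50] = 1.0            # hypothetical special-day indicator

# Base consumption via an ARIMA-type model, special days as an exogenous dummy.
model = ARIMA(demand, exog=special_day, order=(1, 1, 1))
result = model.fit()
print(result.params)
```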
Double and triple seasonal Holt–Winters models (HWT) were introduced by Taylor
[16, 20] as an evolution of Holt–Winters methods [18]. Here, we propose a
generalisation to n-seasonality of the HWT models, formed by transition equations:
level smoothing (2); trend smoothing (3); and as many seasonality smoothing
equations (4) as seasonal patterns are taken into account. A final forecast equation
(5) uses the previous information to provide a k-step-ahead forecast. We call this
model nHWT, and the smoothing equations for an additive trend-multiplicative
seasonality model are defined in Eqs. (2), (3), (4), and (5).
$$S_t = \alpha \left( \frac{X_t}{\prod_i I^{(i)}_{t-s_i}} \right) + (1-\alpha)\left( S_{t-1} + T_{t-1} \right), \qquad (2)$$

$$T_t = \gamma \left( S_t - S_{t-1} \right) + (1-\gamma)\, T_{t-1}, \qquad (3)$$

$$I^{(i)}_t = \delta^{(i)} \left( \frac{X_t}{S_t \prod_{j \neq i} I^{(j)}_{t-s_j}} \right) + \left( 1 - \delta^{(i)} \right) I^{(i)}_{t-s_i}, \qquad (4)$$

$$\hat{X}_t(k) = \left( S_t + k T_t \right) \prod_i I^{(i)}_{t-s_i+k} + \varphi_{AR}^{k} \left( X_t - \left( S_{t-1} + T_{t-1} \right) \prod_i I^{(i)}_{t-s_i} \right), \qquad (5)$$
where $X_t$ are the observed values, $S_t$ is the level, $T_t$ is the additive trend smoothing and $I^{(i)}_t$ are the seasonal smoothing equations for the seasonal patterns. The parameters $\alpha$ and $\gamma$ are the smoothing parameters for level and trend, while $\delta^{(i)}$ is the smoothing parameter for each seasonal pattern. Here $i$ denotes the seasonal component, with a seasonal cycle length of $s_i$. $\hat{X}_t(k)$ is the k-step-ahead forecast, and $\varphi_{AR}$ is the adjustment for the first autocorrelation error. Additionally, Hyndman et al. [35] and Taylor [39] proposed damped trend versions of (2), and we gathered all models following Pegels' classification, shown in Table 1.
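For concreteness, the following is a minimal sketch of the recursions (2)–(5) for the double seasonal case (additive trend, multiplicative seasonality, AR(1) adjustment). It is not the authors' MATLAB implementation; the initialisation, the default parameter values and the synthetic usage data are simplifying assumptions.

```python
import numpy as np

def dshw_forecast(x, s1=24, s2=168, alpha=0.1, gamma=0.05,
                  delta1=0.2, delta2=0.2, phi_ar=0.3, horizon=24):
    """Double seasonal Holt-Winters (additive trend, multiplicative seasonality)
    with the AR(1) error adjustment, following the structure of Eqs. (2)-(5).
    Initial values are crude first-cycle ratios, NOT the chapter's procedure."""
    x = np.asarray(x, dtype=float)
    level = x[:s2].mean()                       # initial level: mean of week 1
    trend = 0.0
    week = x[:s2] / level                       # combined seasonal profile of week 1
    I1 = np.array([week[i::s1].mean() for i in range(s1)])   # intra-day indices
    I2 = week / np.tile(I1, s2 // s1)                        # intra-week indices
    err = 0.0
    for t in range(s2, len(x)):
        i1, i2 = t % s1, t % s2                 # slots holding I(t - s1), I(t - s2)
        old_I1, old_I2 = I1[i1], I2[i2]
        fit = (level + trend) * old_I1 * old_I2 + phi_ar * err   # one-step fit
        err = x[t] - fit
        prev_level = level
        level = alpha * x[t] / (old_I1 * old_I2) + (1 - alpha) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
        I1[i1] = delta1 * x[t] / (level * old_I2) + (1 - delta1) * old_I1
        I2[i2] = delta2 * x[t] / (level * old_I1) + (1 - delta2) * old_I2
    n = len(x)
    # k-step-ahead forecasts from the last observation, as in Eq. (5)
    return np.array([(level + k * trend)
                     * I1[(n - 1 + k) % s1] * I2[(n - 1 + k) % s2]
                     + phi_ar ** k * err
                     for k in range(1, horizon + 1)])

# Usage with a synthetic stand-in for the hourly demand series:
rng = np.random.default_rng(0)
hours = np.arange(24 * 7 * 8)
demand = (25_000 + 4_000 * np.sin(2 * np.pi * hours / 24)
          + 2_000 * np.sin(2 * np.pi * hours / 168)
          + rng.normal(0, 300, hours.size))
print(dshw_forecast(demand)[:4])
```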
These models are expounded in Table 2 for better understanding. This table shows the complete formulae for the adjusted models using the first autoregressive error; to obtain the normal models, $\varphi_{AR} = 0$ annuls this component.

Table 2 Formulae of the nHWT models adjusted with the first autoregressive error (rows sorted by trend method, columns by seasonality: none, additive, multiplicative)

Multiplicative trend:
• Level: $S_t = \alpha X_t + (1-\alpha) S_{t-1} R_{t-1}$ (no seasonality); $S_t = \alpha \left( X_t - \sum_i I^{(i)}_{t-s_i} \right) + (1-\alpha) S_{t-1} R_{t-1}$ (additive seasonality); $S_t = \alpha \left( X_t / \prod_i I^{(i)}_{t-s_i} \right) + (1-\alpha) S_{t-1} R_{t-1}$ (multiplicative seasonality)
• Trend: $R_t = \gamma \left( S_t / S_{t-1} \right) + (1-\gamma) R_{t-1}$
• Seasonality: $I^{(i)}_t = \delta^{(i)} \left( X_t - S_t - \sum_{j \neq i} I^{(j)}_{t-s_j} \right) + \left( 1 - \delta^{(i)} \right) I^{(i)}_{t-s_i}$ (additive); $I^{(i)}_t = \delta^{(i)} \left( X_t / \left( S_t \prod_{j \neq i} I^{(j)}_{t-s_j} \right) \right) + \left( 1 - \delta^{(i)} \right) I^{(i)}_{t-s_i}$ (multiplicative)
• Forecast: $\hat{X}_t(k) = S_t R_t^k + \varphi_{AR}^k \varepsilon_t$ (none); $\hat{X}_t(k) = S_t R_t^k + \sum_i I^{(i)}_{t-s_i+k} + \varphi_{AR}^k \varepsilon_t$ (additive); $\hat{X}_t(k) = S_t R_t^k \prod_i I^{(i)}_{t-s_i+k} + \varphi_{AR}^k \varepsilon_t$ (multiplicative)

Damped multiplicative trend: the level, trend and seasonal equations take the same form, and the damping factor $\phi$ enters the forecasts, which replace $R_t^k$ by $R_t^{\sum_{i=1}^k \phi^i}$, e.g. $\hat{X}_t(k) = S_t R_t^{\sum_{i=1}^k \phi^i} \prod_i I^{(i)}_{t-s_i+k} + \varphi_{AR}^k \varepsilon_t$ for multiplicative seasonality.

Damped additive trend: the forecasts replace $k T_t$ by $\left( \sum_{i=1}^k \phi^i \right) T_t$, e.g. $\hat{X}_t(k) = \left( S_t + \left( \sum_{i=1}^k \phi^i \right) T_t \right) \prod_i I^{(i)}_{t-s_i+k} + \varphi_{AR}^k \varepsilon_t$ for multiplicative seasonality.

Here $X_t$ are the observed values, $S_t$ is the level, $T_t$ is the additive trend, $R_t$ is the multiplicative trend, $I^{(i)}_t$ is the seasonal index of the $i$-th seasonality (of cycle length $s_i$) and $\hat{X}_t(k)$ is the k-step-ahead forecast. $\alpha$, $\gamma$ and $\delta^{(i)}$ are the smoothing parameters, $\phi$ is the damping factor, and $\varphi_{AR}$ is the adjustment with the AR(1) error; $\varepsilon_t$ is the forecast error.
The model evaluation is based on the one-step-ahead prediction errors for obser-
vations within the evaluation period. One-step-ahead predictions are generated
from the model specifications and parameter estimates. Smoothing parameters are
determined in order to minimise the root of the mean of squared one-step-ahead
prediction errors (RMSE), as defined in (6).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_t \left( \hat{X}_t - X_t \right)^2}. \qquad (6)$$
The parameters were constrained to [0;1], and they were all optimised simulta-
neously. The main problem is that these optimisation algorithms strongly depend
on the starting values used for the algorithm. Therefore, the optimisation was performed by first evaluating 10,000 initial vectors of smoothing parameters $v = \left( \alpha, \gamma, \delta^{(i)}, \phi, \varphi_{AR} \right)$, obtained randomly from a grid whose dimension equals the number of parameters to optimise, with values ranging from 0 to 1. The RMSE of each vector was
evaluated, and the ten vectors with the lowest RMSE were used as starting values
to perform an optimisation algorithm. The vector with the lowest RMSE provides
the smoothing parameters for the model. Since nHWT equations are recursive, the
model must be initialised. This procedure is explained in [20]. For the additive
trend models, the initialisation method proposed by Taylor [16] is used. For the
multiplicative trend models, we adapted the Holt–Winters [18] method to a multiple
seasonal pattern. Level is initialised following [40]. The seasonal indices for each
seasonality are computed as the average of dividing each cycle by its mean. Finally,
the indices are weighted by dividing the longer seasonal period indices by the shorter
ones.
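The multi-start scheme can be sketched as follows; to keep the example self-contained, a simple exponential smoothing RMSE stands in for the nHWT objective actually optimised in the chapter, and the series is synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def one_step_rmse(params, x):
    """RMSE of one-step-ahead errors of simple exponential smoothing
    (a stand-in objective; the chapter optimises the full nHWT models)."""
    (alpha,) = params
    level, sq_err = x[0], []
    for obs in x[1:]:
        sq_err.append((obs - level) ** 2)        # one-step forecast = current level
        level = alpha * obs + (1 - alpha) * level
    return np.sqrt(np.mean(sq_err))

rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0, 1, 400))       # synthetic series

starts = rng.uniform(0, 1, size=(10_000, 1))     # 10,000 random starting vectors
scores = np.array([one_step_rmse(v, x) for v in starts])
best_starts = starts[np.argsort(scores)[:10]]    # keep the ten lowest-RMSE vectors

results = [minimize(one_step_rmse, v, args=(x,), bounds=[(0.0, 1.0)])
           for v in best_starts]                 # local refinement of each start
best = min(results, key=lambda r: r.fun)
print("alpha =", best.x[0], " one-step RMSE =", best.fun)
```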
In order to find the best predictive model, the work is divided into two stages: first,
a search is performed for the in-sample forecasting model, and in a second step, a
validation of out-of-sample forecasting is carried out.
The optimisation problem will be solved using a methodology specifically developed and implemented in the software program running on the MATLAB® environment. In this Section, an extensive set of computational results and com-
parative studies are presented to study the performance of the proposed nHWT
models for short-term demand forecasting implemented in the software package.
The results are compared numerically with those obtained by the REE model in the
same periods.
MAPE and symmetrical MAPE (sMAPE), defined in Eqs. (7) and (8), respec-
tively, were used to compare the prediction accuracy.
$$\mathrm{MAPE} = \frac{1}{N} \sum_t \left| \frac{X_t - \hat{X}_t}{X_t} \right| \cdot 100, \qquad (7)$$

$$\mathrm{sMAPE} = \frac{1}{N} \sum_t \left| \frac{X_t - \hat{X}_t}{X_t + \hat{X}_t} \right| \cdot 100. \qquad (8)$$
Some authors, e.g. [41], have proposed the sMAPE as the best performance
measure to select among models. It is an average measure of the forecast accuracy
across a given forecast horizon, and it provides a global measurement of the
goodness of fit, while MAPE only provides point measures. Thus, in order to select
the best model to forecast future demands, a competition is carried out with all the
models present in Table 1, where the model with the lowest sMAPE for the forecasts
provided is chosen.
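The two measures can be computed directly from Eqs. (7) and (8); the sketch below follows Eq. (8) for sMAPE, with $X_t + \hat{X}_t$ in the denominator, and the demand values are invented.

```python
import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))              # Eq. (7)

def smape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / (actual + forecast))) # Eq. (8)

demand    = np.array([26500.0, 27100.0, 25800.0, 24900.0])   # invented demand values
predicted = np.array([26200.0, 27400.0, 25500.0, 25300.0])
print(f"MAPE = {mape(demand, predicted):.2f} %,  sMAPE = {smape(demand, predicted):.2f} %")
```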
The following scenario was established to conduct the competition: four time windows from 2013 were randomly selected, one for each season (spring, summer, autumn and winter), in order to address different climate conditions, and three alternatives were considered, according to the methodology:
• Double seasonal (DS) and triple seasonal (TS) models using the full data set from
July 2007 until the day immediately before the forecasting period.
• Double seasonal models using only the seven weeks immediately before the
forecasting period, in order to check the influence of the data length on
predictions.
The REE model was also included as a reference. Because REE updates the
forecasts each hour, the analysis uses an hourly re-estimation for all models, in
order to establish the same comparison scenario. Table 3 summarises the results
obtained by comparing the sMAPE of the best models on four random days, each
corresponding to a different season.
DS methods using the full data set outperformed the other alternatives, even
though the TS method was expected to be the most accurate, mainly due to the
influence of national holidays (in Spain, Christmas ends on 6 January, and 12
October is Hispanic Day). Other authors [23] smooth the series in order to avoid these unusual day effects, but in this case the series has not been reworked or
Table 3 Symmetrical MAPE comparisons of the most accurate models, each alternative studied,
and the REE model
Alternative   Models   14 January 2013   22 April 2013   8 July 2013   13 October 2013
Double 7 Weeks AMC24,168 0.741 0.689 0.351 0.611
dMC24,168 0.761 0.744 0.343 0.588
MMC24,168 0.830 1.265 0.982 0.850
Double Full Data set AMC24,168 0.508 0.576 0.303 0.634
DMC24,168 0.505 0.577 0.304 0.633
MMC24,168 0.503 0.578 0.302 0.634
Triple Full Data set NAC24,168,8760 0.590 0.579 0.355 0.591
AAC24,168,8760 0.750 0.645 0.417 0.619
DAC24,168,8760 0.730 0.634 0.426 0.624
REE 0.449 0.534 0.416 0.532
Results are shown for four different days, one in each season
adapted. Figure 2 shows the evolution of the MAPE within the forecast horizon for all the selected models compared to the REE model. We use the MAPE graphically, as it makes it feasible to understand how the accuracy evolves with the lead time. The models that show the best performance are the AMC24,168, the MMC24,168 and the DMC24,168. The results reveal that our models perform similarly to the REE model and, in some cases, outperform it.
The Diebold–Mariano test [42] compares the forecast accuracy of two competing
forecasting methods. We proposed and evaluated explicit tests to check whether
there were significant differences among the selected models, and no significant
differences were found. Thus, the AMC24,168 model is chosen, as the DMC24,168
uses more parameters, and the MMC24,168 requires more computational effort.
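A simplified sketch of the Diebold–Mariano test for one-step-ahead forecasts is given below (squared-error loss, no autocovariance correction, which is adequate for a one-step horizon); the error series are invented and the implementation is not the one used in the chapter.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(e1, e2):
    """DM statistic for one-step-ahead forecasts under squared-error loss."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2    # loss differential series
    dm = d.mean() / np.sqrt(d.var(ddof=1) / d.size)  # asymptotically N(0, 1)
    p_value = 2.0 * (1.0 - norm.cdf(abs(dm)))        # two-sided p-value
    return dm, p_value

rng = np.random.default_rng(1)
errors_a = rng.normal(0.0, 1.00, 200)   # invented error series of model A
errors_b = rng.normal(0.0, 1.05, 200)   # invented error series of model B
print(diebold_mariano(errors_a, errors_b))
```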
4.2 Validation
The validation process was carried out by comparing real time forecasts proposed
by REE and forecasts obtained with the selected model. The validation took place
during the weeks from 3 April 2014 to 9 April 2014, the last dates available at
the time of the study. In order to provide forecasts for these dates, models were
optimised using the full data set until the forecasting period, and 24-h forecasts were
provided. Figure 3 depicts the evolution of the MAPE obtained during the validation
process by the AMC24,168 model and REE. In this case, we use the MAPE because
we want to observe the short-term accuracy evolution compared to the forecast
horizon. The left graph shows the first case, in which 12-h forecast horizon results
are compared, as REE only forecasted until 22:50 h. The accuracy obtained with our
model is around 0.6 %, whereas the REE model reaches 0.9 %. On the graph on the
Fig. 2 MAPE evolution for the selected double seasonal models, compared to the REE model,
in a forecast horizon of 24 h in four seasons. Top-left panel depicts comparisons for 14 January
2013, and the right 22 April, while the bottom-left panel shows 8 July, and the right 13 October.
Differences among the forecasts provided by AMC, DMC and MMC are not significant; therefore,
it looks like the graphs overlap
Fig. 3 Forecasting accuracy evolution of the AMC24,168 model compared to the REE, measured
using MAPE along the forecast horizon. The left panel shows the results for 3 April 2014, and the
right panel for 9 April 2014. The middle panel depicts the MAPE for 7 April 2014 provided by
the AMC24,168 model forecast at 0:00 h, compared to the REE provided at 10:30, as well as the
re-forecast with the AMC24,168 model at 13:00, compared to the REE at 17:00
right, the 9 April 2014 results are analysed. The AMC24,168 has an average MAPE
of around 1 %, outperforming the REE, which reaches 1.5 %.
In order to check the horizon in which our model could provide accurate
forecasts, a forecast was made 24 h ahead, from 01:00 to 24:00 h on 7 April
2014, with AMC24,168 . The model outperforms the forecasts provided by REE at
10:30, which asymptotically reaches the performance obtained by the AMC24,168 .
A re-forecast made at 13:00 with AMC24,168 enhances the results, whereas the re-estimation by REE at 17:00 h enhances its performance but decreases drastically after only a few hours, reaching the MAPE obtained by the original estimation with AMC24,168.
5 Conclusions
This paper presents an alternative methodology for forecasting the Spanish short-
term electricity demand, using models that parsimoniously provide forecasts com-
parable to the REE model.
The paper proposes a generalisation of the HWT models to n-multiple seasonal
models, and new initialisation criteria are developed for multiple seasonal com-
ponents. The methodology was implemented in a software application using the MATLAB® environment to predict hourly electricity demand in Spain by selecting
from 30 multiple Holt–Winters models with improved optimisation of smoothing
parameter methods.
An analysis using the Spanish data is conducted in two stages. The purpose of
the first stage is to select the best model for the forecasts. Three alternatives, using
double and triple seasonal methods for all models, were analysed by computing
forecasts using the implemented models. As a result, the AMC24,168 model is
selected. In a second stage, a validation analysis is conducted by making real time
forecasts and comparing them to REE.
The REE model has more than 100 parameters that are estimated hourly, and
it has a performance of around 2 % in terms of MAPE, whereas the methodology
presented here shows results similar to those obtained by the REE, obtaining MAPE
between 0.6 and 2 % in the time period considered. However, a maximum of
five parameters are optimised in the proposed models, significantly reducing the
computational effort.
Additional conclusions are drawn from comparing the models. The double
seasonal method obtains better forecasts than the triple seasonal, due to the fact
that the series is neither adapted nor reworked when triple seasonal methods are
used.
References
1. Pino, R., Parreño, J., Gomez, A., Priore, P.: Forecasting next-day price of electricity in the
Spanish energy market using artificial neural networks. Eng. Appl. Artif. Intell. 21, 53–62
(2008)
2. Cancelo, J.R., Espasa, A., Grafe, R.: Forecasting the electricity load from one day to one week
ahead for the Spanish system operator. Int. J. Forecast. 24(4), 588–602 (2008)
3. Nogales, F.J., Conejo, A.J.: Electricity price forecasting through transfer function models. J.
Oper. Res. Soc. 57(4), 350–356 (2006)
4. Conejo, A.J., Contreras, J., Espínola, R., Plazas, M.A.: Forecasting electricity prices for a day-
ahead pool-based electric energy market. Int. J. Forecast. 21, 435–462 (2005)
5. Muñoz, A., Sánchez-Úbeda, E., Cruz, A., Marín, J.: Short-term forecasting in power systems:
a guided tour. In: Rebennack, S., Pardalos, P.M., Pereira, M.V.F., Iliadis, N.A. (eds.) Handbook
of Power Systems II, pp. 129–160. Springer, Berlin/Heidelberg (2010)
6. Weron, R.: Electricity price forecasting: a review of the state-of-the-art with a look into the
future. Int. J. Forecast. 30(4), 1030–1081 (2014)
7. Carpinteiro, O., Reis, A., Silva, A.: A hierarchical neural model in short-term load forecasting.
Appl. Soft Comput. 4, 405–412 (2004)
8. Darbellay, G.A., Slama, M.: Forecasting the short-term demand for electricity: do neural
networks stand a better chance? Int. J. Forecast. 1(16), 71–83 (2000)
9. Pardo, A., Meneu, V., Valor, E.: Temperature and seasonality influences on Spanish electricity
load. Energy Econ. 24(1), 55–70 (2002)
10. Pedregal, D.J., Trapero, J.R.: Mid-term hourly electricity forecasting based on a multi-rate
approach. Energy Convers. Manag. 51, 105–111 (2010)
11. Taylor, J.W., Buizza, R.: Using weather ensemble predictions in electricity demand forecasting.
Int. J. Forecast 19(1), 57–70 (2003)
12. Soares, L.J., Medeiros, M.C.: Modeling and forecasting short-term electricity load: a compar-
ison of methods with an application to Brazilian data. Int. J. Forecast 24(4), 630–644 (2008)
13. Chatfield, C., Yar, M.: Holt-Winters forecasting: some practical issues. The Statistician 37,
129–140 (1988)
14. Gardner, E.S.: Exponential smoothing: the state of the art. J. Forecast. 4, 1–28 (1985)
15. Gardner Jr., E.S.: Exponential smoothing: the state of the art, part II. Int. J. Forecast. 22, 637–
666 (2006)
16. Taylor, J.W.: Short-term electricity demand forecasting using double seasonal exponential
smoothing. J. Oper. Res. Soc. 54(8), 799–805 (2003)
17. Box, G., Jenkins, G.M., Reinsel, G.: Time Series Analysis: Forecasting & Control. Prentice-
Hall, Englewood Cliffs, NJ (1994)
18. Winters, P.R.: Forecasting sales by exponentially weighted moving averages. Manag. Sci. 6,
324–342 (1960)
19. Taylor, J.W.: An evaluation of methods for very short-term load forecasting using minute-by-
minute British data. Int. J. Forecast. 24(4), 645–658 (2008)
20. Taylor, J.W.: Triple seasonal methods for short-term electricity demand forecasting. Eur. J.
Oper. Res. 204(1), 139–152 (2010)
21. Arora, S., Taylor, J.W.: Short-term forecasting of anomalous load using rule-based triple
seasonal methods. Power Syst. IEEE Trans. 28(3), 3235–3242 (2013)
22. Taylor, J.W., McSharry, P.E.: Short-term load forecasting methods: an evaluation based on
European data. Power Syst. IEEE Trans. 22(4), 2213–2219 (2007)
23. Taylor, J.W., de Menezes, L.M., McSharry, P.E.: A comparison of univariate methods for
forecasting electricity demand up to a day ahead. Int. J. Forecast. 22(1), 1–16 (2006)
24. Corberán-Vallet, A., Bermúdez, J.D., Vercher, E.: Forecasting correlated time series with
exponential smoothing models. Int. J. Forecast. 27(2), 252–265 (2011)
25. Bermúdez, J.D.: Exponential smoothing with covariates applied to electricity demand forecast.
Eur. J. Ind. Eng. 7(3), 333–349 (2013)
26. Pierrot, A., Goude, Y.: Short term electricity load forecasting with generalized additive models.
In: Proceedings of 16th International Conference Intelligent System Applications to Power
Systems, pp. 410–415. New York: Institute of Electrical and Electronics Engineers (2011)
27. Fan, S., Hyndman, R.J.: Short-term load forecasting based on a semi-parametric additive
model. Power Syst. IEEE Trans. 27(1), 134–141 (2012)
28. Ba, A., Sinn, M., Goude, Y., Pompey, P.: Adaptive learning of smoothing functions: application
to electricity load forecasting. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger,
K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2519–2527. MIT
Press, Cambridge (2012)
29. Antoch J., Prchal, L., DeRosa, M., Sarda, P.: Functional linear regression with functional
response: application to prediction of electricity consumption. In: Proceedings of the Func-
tional and Operatorial Statistics, IWFOS 2008. Springer, Heidelberg (2008)
30. Cho, H., Goude, Y., Brossat, X., Yao, Q.: Modeling and forecasting daily electricity load
curves: a hybrid approach. J. Am. Stat. Assoc. 108, 7–21 (2013)
31. Vilar, J.M., Cao, R., Aneiros, G.: Forecasting next-day electricity demand and price using
nonparametric functional methods. Int. J. Electr. Power Energy Syst. 39(1), 48–55 (2012)
32. Aneiros-Pérez, G., Vieu, P.: Nonparametric time series prediction: a semifunctional partial linear modeling. J. Multivar. Anal. 99, 834–857 (2008)
33. Harvey, A., Koopman, S.: Forecasting hourly electricity demand using time-varying splines. J.
Am. Stat. Assoc. 88, 1228–1253 (1993)
34. Dordonnat, V., Koopman, S.J., Ooms, M., Dessertaine, A., Collet, J.: An hourly periodic state
space model for modelling French national electricity load. Int. J. Forecast. 24(4), 566–587
(2008)
35. Hyndman, R., Koehler, A.B., Ord, J.K., Snyder, R.D.: Forecasting with Exponential Smooth-
ing: The State Space Approach. Springer, Heidelberg (2008)
36. So, M.K.P., Chung, R.S.W.: Dynamic seasonality in time series. Comput. Stat. Data Anal. 70,
212–226 (2014)
37. Weron, R.: Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach.
Wiley, Chichester (2006)
38. Valor, E., Meneu, V., Caselles, V.: Daily air temperature and electricity load in Spain. J. Appl.
Meteorol. 40(8), 1413–1421 (2001)
39. Taylor, J.W.: Exponential smoothing with a damped multiplicative trend. Int. J. Forecast. 19(4),
715–725 (2003)
40. Chatfield, C.: The Holt-Winters forecasting procedure. Appl. Stat. 27, 264–279 (1978)
41. Makridakis, S., Hibon, M.: The M3-competition: results, conclusions and implications. Int. J.
Forecast. 16(4), 451–476 (2000)
42. Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 13(3), 253–
263 (1995)
Age-Specific Death Rates Smoothed
by the Gompertz–Makeham Function
and Their Application in Projections
by Lee–Carter Model
Abstract The aim of this paper is to apply a stochastic modelling approach (the Lee–Carter model) to the age-specific death rates of the Czech population. We use annual empirical data from the Czech Statistical Office (CZSO) database for the period 1920–2012. We compare two modelling approaches: the first is based on the empirical time series of age-specific death rates, and the second on time series smoothed by the Gompertz–Makeham function, which is currently the most frequently used tool for smoothing the mortality curve at higher ages. (Our review also includes a description of other advanced models in common use.) Based on the results of these approaches we compare two aspects of time series forecasting: variability and stability. A stable development of the time series is sometimes exactly what ensures a significant and realistic prediction, and sometimes it is not. In the case of mortality it is necessary to consider both unexpected or stochastic changes and the long-term stable deterministic trend, and to find a compromise between them.
The trend of mortality is one of the most important indicators of the standard of living. Mortality is an important component of population reproduction, and its development is a very interesting topic for demographers and actuaries. If mortality improves, people live longer. One reason for the improvement in mortality, and also for the increase in life expectancy (labelled e_{x,t}), could be better health care; another is a greater interest in a healthy lifestyle. On the other hand, the increase of e_{x,t} implies population ageing [19]. More people live to the highest ages, so it is very important to have a better picture of mortality at these ages. In the past this was less important, because only a few people lived to the highest ages.
The level of mortality affects the length of human life. When we analyse the development of mortality, it is important to know that the biggest changes occur at higher ages (approximately 60 years and above), where mortality has a different character than at lower ages. This is caused not only by the small numbers of deaths (D_{x,t}), but also by the small numbers of persons living at the highest ages (E_{x,t}). It is also necessary to realise that these data can be affected by systematic and random errors. If we want to capture the mortality of the oldest people as accurately as possible, it is a good idea to make minor adjustments to the data matrix. This is mainly related to the smoothing of the mortality curve and the possibility of its extrapolation up to the highest ages. Several existing models can be used for smoothing. The oldest one (but still very often used) is the Gompertz–Makeham function [13, 20]. It is suitable for eliminating fluctuations in age-specific death rates (labelled m_{x,t}) and also for their subsequent extrapolation up to the highest ages. Its disadvantage is that it cannot be used for the prediction of mortality and therefore not for the calculation of demographic projections either ([1] or [25]).
Demographic projections of the possible future evolution of a population are an essential information channel, providing key information about the potential evolution of mortality, birth rates, immigration and emigration, or other demographic statistics [24]. Each projection is based on assumptions which may or may not come true. Stochastic demographic projections are based on the main components [17], explaining the trend contained in the development of the time series of age-specific demographic rates. The length of the time series has a major influence on the results (see, e.g., Coale and Kisker [8], or the comparison of results for several populations in the study by Booth et al. [4]).
In this paper we focus on the evolution of mortality data for the Czech Republic, provided by the Czech Statistical Office [9]. The length of the time series is sufficient for statistically significant projections, but the empirical data contain high variability at the highest ages. We use two approaches in our analysis (see also Simpach et al. [26]). The first uses the empirical m_{x,t} for the period 1920–2012 and the second uses values of m_{x,t} smoothed by the Gompertz–Makeham function. The first model contains unexpected variability in the time series, while the second is characterised by stability and the absence of unexpected changes. These models will be evaluated, and the final projections of m_{x,t} (and the estimated e_{x,t}) for the Czech population until 2050 will be compared with each other.
For the purposes of the mortality analysis in the Czech Republic we use the mortality data from the Czech Statistical Office [9]: the numbers of deaths at completed age x, D_{x,t}, and the exposure to risk E_{x,t}, which is estimated as the mid-year population at age x (males and females separately in all cases). We use annual data for the reporting period 1920–2012. The age-specific death rates (see, e.g., Erbas et al. [10]) are calculated as

m_{x,t} = D_{x,t} / E_{x,t},     (1)

and these empirical rates [in logarithms, ln(m_{x,t})] can be seen in the 3D perspective charts ([7] or [14]) in Fig. 1.
Fig. 1 Empirical data of m_{x,t} in logs for Czech males (top left) and females (top right). Bottom left and right show these rates smoothed by the Gompertz–Makeham function. Source: Data CZSO [9], authors' calculations

It is important to know that mortality is influenced by systematic and random errors (especially at higher ages). That is the reason why a modelling approach is applied. Before introducing any particular function, it is useful to note the relation between m_{x,t} and the intensity (force) of mortality (labelled μ_{x,t}). The relationship can be written as

m_x = μ(x + 0.5).     (2)
The Gompertz–Makeham function is very often used for modelling mortality at higher ages; it can be written as

μ_x = a + b·c^x,     (3)

where a, b and c are unknown parameters (more, e.g., in [12] or [23]). The formula is based on an exponential increase of mortality with increasing age. It is a well-known fact that the development of mortality differs between populations, and for this reason several other functions exist for mortality modelling. Let us mention two logistic functions: the Kannisto model

μ_x = a·e^{bx} / (1 + a·e^{bx}),     (4)

and

μ_x = c + a·e^{bx} / (1 + a·e^{bx}),     (5)

which is enriched by one more parameter c (more in [12]). Both functions can be counted among the more optimistic ones, because they assume a slow increase of mortality with increasing age; they are suitable for populations with a long-term low level of mortality. On the other hand, we can mention functions suitable for populations with a higher level of mortality, i.e. the Gompertz–Makeham function written above (3) and the Coale–Kisker model, defined as

μ_x = e^{ax² + bx + c},     (6)

where a, b and c are again unknown parameters. The third group includes models which are useful somewhere between the two previous groups. Let us mention, e.g., the Heligman–Pollard model

q_x = G·H^x / (1 + G·H^x),     (7)

where G and H are unknown parameters and q_x is in this case the probability of dying; the model is designed for estimation in this form. (From the life-table algorithm we need the relationship between m_{x,t} and q_{x,t}: the probability of surviving (p_{x,t}) follows the Gompertz law of mortality (see Gompertz [13]), p_{x,t} = e^{−m_{x,t}}, and the probability of dying is then q_{x,t} = 1 − p_{x,t}.) A further model of this group is

m_x = b·x^a,     (8)

where a and b are unknown parameters; this model is designed for estimation directly in the form of age-specific death rates.
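The parameters of (3) are estimated by the authors in the DeRaS software (see the next paragraph); purely as an illustration of how a, b and c can be obtained from empirical rates, a minimal R sketch using nonlinear least squares (not the authors' implementation; the input vector mx is synthetic) might look as follows:

```r
# Minimal sketch: fitting the Gompertz-Makeham function mu_x = a + b*c^x
# to empirical age-specific death rates by nonlinear least squares.
# 'ages' and 'mx' are hypothetical inputs (here: synthetic example data).
set.seed(1)
ages <- 60:100
mx   <- 0.0005 + 0.00004 * 1.1^ages + rnorm(length(ages), 0, 1e-4)
fit  <- nls(mx ~ a + b * c^ages,
            start = list(a = 1e-3, b = 1e-5, c = 1.1),
            control = nls.control(maxiter = 200))
coef(fit)                  # estimated parameters a, b, c
mx_smooth <- predict(fit)  # smoothed death rates for ages 60-100
```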
We now return to the original Gompertz–Makeham function (3): we estimate its parameters in the DeRaS software (see Burcin et al. [6]) and then calculate the smoothed values of m_{x,t} for males and females (see also Fig. 1). It is known that instability of a time series reduces its predictive capability ([2] or [11]), and the distant history carries the lowest weight in a prediction model. But for the modelling of mortality, which is a long-term process with its own long-term trend in each population, even history with a small weight can be quite important [4].
An interesting idea is to use our two data matrices (one empirical and the other smoothed) for the calculation of a mortality forecast up to the year 2050. The logs of m_{x,t} can be decomposed ([17] or [18]) as

M = A + BK^T + E,     (10)

where A collects the age profiles a_x, B the age-specific sensitivities b_x, K the mortality indices k_t and E the errors, with the usual identification constraints

Σ_x b_x = 1  and  Σ_{t=1}^{T} k_t = 0.     (11)

It can be concluded that if we find a suitable intercept (b⁰_x) and slope (b¹_x) of a linear regression, we can easily make a linear extrapolation into the future of the form ln m_{x,t} = b⁰_x + b¹_x·t, where t is the time variable. Figure 2 (top left and right) shows the logs of m_{x,t} for males and females in a "rainbow" chart over time (see the study by Hyndman [14]), while the bottom charts show the male and female logs of m_{x,t} smoothed by the Gompertz–Makeham function. There are ages at which the development of the time series is approximately linear and indeed decreasing. But especially at advanced ages in the case of the empirical data (top charts), we can see greater variability, which cannot be explained by linear models alone.
Fig. 2 Empirical logs of m_{x,t} of Czech males (top left) and females (top right) over time. Bottom left and right show the development of these rates smoothed by the Gompertz–Makeham function. Source: Data CZSO [9], authors' calculations
where l_{x,t} is the number of survivors at exact age x out of the default cohort of 100,000 live births in the table population. The number of deaths in the table population is given by d_{x,t} = l_{x,t} − l_{x+1,t}. After that we calculate the number of person-years lived (L_{x,t}) and the number of years of remaining life (T_{x,t}) as

L_{x,t} = (l_{x,t} + l_{x+1,t}) / 2  for x > 0,     (17)

and

T_{x,t} = Σ_{y≥x} L_{y,t},     (18)

so that life expectancy is

e_{x,t} = T_{x,t} / l_{x,t}.     (19)
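As a purely illustrative companion to the life-table relations above, a minimal R sketch (a simplified single-year life table, not the authors' exact algorithm; the input death rates are hypothetical) could be:

```r
# Minimal sketch of the life-table step described above: from a vector of
# age-specific death rates m_x to life expectancy e_x.
life_expectancy <- function(mx) {
  px <- exp(-mx)                                     # probability of surviving, p_x = e^{-m_x}
  lx <- 100000 * cumprod(c(1, px[-length(px)]))      # survivors out of 100,000 live births
  dx <- c(lx[-length(lx)] - lx[-1], lx[length(lx)])  # deaths in the table population
  Lx <- c((lx[-length(lx)] + lx[-1]) / 2, lx[length(lx)])  # person-years lived
  Tx <- rev(cumsum(rev(Lx)))                         # remaining person-years, T_x
  Tx / lx                                            # life expectancy e_x = T_x / l_x
}
mx_example <- 0.0005 + 0.00004 * 1.1^(0:100)         # hypothetical death rates, ages 0-100
e0 <- life_expectancy(mx_example)[1]                 # e_0, life expectancy at birth
```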
Using the SVD method implemented in the package "demography" [14], developed for R [22], we estimate the parameters â_x (age-specific mortality profiles independent of time) and b̂_x (additional age-specific components determining how much each age group changes when k_t changes) for both Lee–Carter models. They are shown in Fig. 3, which also makes clear the comparison between the different evolutions of these parameters depending on the input variability.
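The SVD step itself can be illustrated with a minimal base-R sketch (this is not the "demography" package implementation used by the authors; the log-mortality matrix logm is synthetic):

```r
# Minimal sketch of the SVD estimation of the Lee-Carter parameters
# a_x, b_x and k_t from a matrix of ln(m_{x,t}) (ages in rows, years in columns).
set.seed(1)
ages  <- 0:100
years <- 1920:2012
logm  <- outer(-4.5 + 0.05 * ages, rep(1, length(years))) +                        # age profile
         outer(rep(1, length(ages)), seq(0, -1, length.out = length(years))) +     # time trend
         matrix(rnorm(length(ages) * length(years), 0, 0.05), length(ages))        # noise

ax  <- rowMeans(logm)                            # a_x: average log rate per age
Z   <- sweep(logm, 1, ax)                        # centred matrix (each row sums to zero)
dec <- svd(Z)                                    # singular value decomposition
bx  <- dec$u[, 1] / sum(dec$u[, 1])              # normalisation: sum(bx) = 1
kt  <- dec$d[1] * dec$v[, 1] * sum(dec$u[, 1])   # then sum(kt) = 0 automatically
fitted_logm <- ax + outer(bx, kt)                # ln(m_{x,t}) = a_x + b_x * k_t
```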
The mortality indices k̂_t (the time-varying parameters) were estimated for both models (empirical and smoothed) and the results were found to be almost identical. This is due to the fact that these indices are almost independent of the input variability of m_{x,t}. The estimates are shown in Fig. 4. For these indices we calculated predictions up to the year 2050 based on the ARIMA methodology [5], run with the "forecast" package in R [15, 16]. The results are four ARIMA(1,1,0) models with drift (see Table 1). The AR(1) parameters marked in Table 1 are equal to zero at the 5 % significance level. From these predictions with 95 % confidence intervals (which can also be seen in Fig. 4) it is clear that the models for females provide slightly lower values of these estimates.
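A minimal sketch of this forecasting step with the "forecast" package (not the authors' exact script; kt is the index from the previous sketch) might be:

```r
# Forecasting the mortality index k_t to 2050 with an ARIMA(1,1,0) model
# with drift, as reported in Table 1 of the paper.
library(forecast)
kt_ts  <- ts(kt, start = 1920)                         # annual index 1920-2012
fit_kt <- Arima(kt_ts, order = c(1, 1, 0), include.drift = TRUE)
fc_kt  <- forecast(fit_kt, h = 2050 - 2012, level = 95)
plot(fc_kt)                                            # point forecasts with 95 % intervals
```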
Now we evaluate all the Lee–Carter models on the basis of the approach presented by Charpentier and Dutang [7]. Using RStudio we display the Pearson residuals in "rainbow" bubble charts, first for the empirical males' model, second for the empirical females' model, third for the smoothed males' model and last for the smoothed females' model. Each model is evaluated on the basis of the residuals by age x (left charts) and of the residuals over time t (right charts). Most residuals are concentrated around 0; the higher variability is explained by the estimated model. The Pearson residuals for the empirical and smoothed models (males and females) are shown in Fig. 5. Because the evaluation results are good, we proceed to fit and then to estimate the future values of ln(m_{x,t}) using the parameters â_x, b̂_x and k̂_t of both Lee–Carter models for males and for females.

Fig. 3 Comparison of the two Lee–Carter models for males and females: the estimates of the parameters â_x and b̂_x. Circles represent the model based on the empirical data matrix, lines represent the model based on the smoothed data matrix. Source: authors' calculations
Let us use the Lee–Carter model (9) in the following form:

ln m_{x,t} = â_x + b̂_x·k̂_t,     (20)

where x = 0, 1, ..., 100 and t = 1920, ..., 2050. The fitted values with the attached forecasts of ln(m_{x,t}), based on the empirical and the smoothed data matrix, are displayed in Fig. 6. It is evident that the empirical model provides more variable values of ln(m_{x,t}) than the smoothed model, especially at the highest ages (60+).
We use the life-table algorithm to estimate life expectancy at birth (e_{0,t}), which is one of the most important statistics resulting from demographic forecasts. Table 2 shows these estimates in 5-year steps, based on the empirical and the smoothed model. We believe that the model based on data smoothed by the Gompertz–Makeham function provides a prediction closer to reality. Mortality is explained and predicted through its main components, which is a much more sophisticated approach than assuming, e.g., a simple linear decrease (see Fig. 2: there is a risk that the trend would not be sufficiently explained and unexplained variability would remain in the residuals). Another study for the Czech Republic [1] also predicts death rates, but the approach used there was based on a shortened data matrix of m_{x,t} (from 1950 onwards) and without smoothing.

Fig. 5 Diagnostic control of the two Lee–Carter models: Pearson residuals for males and females. Source: authors' calculations

Fig. 6 Fitted values of ln(m_{x,t}) for Czech males (left charts) and females (right charts) for the period 1920–2012, with the attached forecasts of these rates. The top charts are constructed from the model based on empirical data, the bottom charts from the model based on data smoothed by the Gompertz–Makeham function. Source: authors' calculations
4 Conclusion
In our paper we examined whether the Lee–Carter model provides better predictions of future ln(m_{x,t}) when based on the empirical data matrix or on the data matrix smoothed by the Gompertz–Makeham function (currently the best-known function for modelling and extrapolating mortality curves). The advantage of the empirical model was that we analysed the data without any modification. The residuals of both models appear favourable, so on that basis neither of the models can be ruled out. But if we look at the results in Fig. 6, we can see that ln(m_{x,t}) declines across all age groups in the smoothed model only, which is correctly related to the Gompertz law of mortality. This leads to one important conclusion: from our comparison we can claim that the model based on smoothed data fits reality better, because it corresponds to the expected future development of the analysed population.
Acknowledgements This paper was supported by the Czech Science Foundation project No.
P402/12/G097 DYME—Dynamic Models in Economics.
References
8. Coale, A.J., Kisker, E.E.: Mortality crossovers: reality or bad data? Popul. Stud. 40, 389–401
(1986)
9. CZSO: Life tables for the CR since 1920. Czech Statistical Office, Prague. https://www.czso.
cz/csu/czso/life_tables (2015)
10. Erbas, B., et al.: Forecasts of COPD mortality in Australia: 2006–2025. BMC Med. Res.
Methodol. 2012, 12–17 (2012)
11. Gardner Jr., E.S., McKenzie, E.: Forecasting trends in time series. Manag. Sci. 31(10), 1237–
1246 (1985)
12. Gavrilov, L.A., Gavrilova, N.S.: Mortality measurement at advanced ages: a study of social
security administration death master file. N. Am. Actuar. J. 15(3), 432–447 (2011)
13. Gompertz, B.: On the nature of the function expressive of the law of human mortality, and on
a new mode of determining the value of life contingencies. Philos. Trans. R. Soc. Lond. 115,
513–585 (1825)
14. Hyndman, R.J.: Demography: forecasting mortality, fertility, migration and population data. R
package v. 1.16. http://robjhyndman.com/software/demography/ (2012)
15. Hyndman, R.J., Shang, H.L.: Forecasting functional time series. J. Korean Stat. Soc. 38(3),
199–221 (with discussion) (2009)
16. Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S.: A state space framework for automatic
forecasting using exponential smoothing methods. Int. J. Forecast. 18(3), 439–454 (2002)
17. Lee, R.D., Carter, L.R.: Modeling and forecasting U.S. mortality. J. Am. Stat. Assoc. 87, 659–
675 (1992)
18. Lee, R.D., Tuljapurkar, S.: Stochastic population forecasts for the United States: beyond high,
medium, and low. J. Am. Stat. Assoc. 89, 1175–1189 (1994)
19. Lundstrom, H., Qvist, J.: Mortality forecasting and trend shifts: an application of the Lee-
Carter model to Swedish mortality data. Int. Stat. Rev. (Revue Internationale de Statistique)
72(1), 37–50 (2004)
20. Makeham, W.M.: On the law of mortality and the construction of annuity tables. Assur. Mag.
and J. Inst. Actuar. 8(1860), 301–310 (1860)
21. Melard, G., Pasteels, J.M.: Automatic ARIMA modeling including intervention, using time
series expert software. Int. J. Forecast. 16, 497–508 (2000)
22. R Development Core Team: R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna (2008)
23. Simpach, O.: Faster convergence for estimates of parameters of Gompertz-Makeham function
using available methods in solver MS Excel 2010. In: Proceedings of 30th International
Conference on Mathematical Methods in Economics, Part II, pp. 870–874 (2012)
24. Simpach, O.: Detection of outlier age-specific mortality rates by principal component method
in R software: the case of visegrad four cluster. In: International Days of Statistics and
Economics, pp. 1505–1515. Melandrium, Slany (2014)
25. Simpach, O., Pechrova, M.: The impact of population development on the sustainability of
the rural regions. In: Agrarian Perspectives XXIII – The Community-Led Rural Development,
pp. 129—136. Czech University of Life Sciences, Prague (2014)
26. Simpach, O., Dotlacilova, P., Langhamrova, J.: Effect of the length and stability of the time
series on the results of stochastic mortality projection: an application of the Lee-Carter model.
In: Proceedings ITISE 2014, pp. 1375–1386 (2014)
An Application of Time Series Analysis
in Judging the Working State of Ground-Based
Microwave Radiometers and Data Calibration
Z. Wang (✉) • Q. Li
CMA Key Laboratory for Aerosol-Cloud-Precipitation, Collaborative Innovation Center
on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information
Science and Technology, Nanjing 210044, China
School of Atmospheric Physics, Nanjing University of Information Science and Technology,
Nanjing 210044, China
e-mail: [email protected]
J. Huang
CMA Key Laboratory for Aerosol-Cloud-Precipitation, Collaborative Innovation Center
on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information
Science and Technology, Nanjing 210044, China
Y. Chu
Institute of Urban Meteorological Research, CMA, Beijing 100089, People’s Republic of China
to early summer (late May). The results show that the first five channels worked well in this period while the last seven channels did not, implying that the radiometer needed to be maintained or repaired by the manufacturer. The methodology suggested by this paper has been applied to similar radiometers at Wuhan and Beijing for quality control and calibration of the observed TB data.
1 Introduction
where

τ(0, z) = exp{ −∫_0^z k_a(z′) sec θ dz′ }     (2)

is the transmittance of the air from height z down to the antenna (z = 0), τ(0, ∞) is the transmittance of the whole atmosphere, and T_B(∞) is the cosmic brightness temperature, taken as 2.9 K [16] for the computations in this paper. T(z) is the temperature profile and k_a(z) is the absorption coefficient, due mainly to oxygen and water vapor in the clear-sky case, which depends mainly on pressure, temperature, humidity, and wave frequency [19, 20].
A software package for brightness temperature simulation based on Eq. (1) was designed and has been used for many years [16]; the absorption coefficient k_a(z) is calculated according to Liebe's model [19, 20]. The atmospheric profiles needed for the computation in Eq. (1) can be obtained either from radiosondes or from NCEP model outputs (http://rda.ucar.edu/datasets/ds083.2/, doi:10.5065/D6M043C6).
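Equation (1) itself is not reproduced above; assuming the standard downwelling radiative-transfer form T_B(0) = T_B(∞)·τ(0, ∞) + ∫_0^∞ T(z)·k_a(z)·sec θ·τ(0, z) dz, consistent with the terms just described, a minimal numerical sketch (not the authors' software [16]; all profiles are hypothetical) is:

```r
# Minimal sketch: brightness temperature from hypothetical clear-sky profiles
# by simple layer-by-layer integration of the assumed form of Eq. (1),
# using the transmittance tau(0, z) of Eq. (2).
brightness_temp <- function(z, Tz, ka, theta = 0, TB_cosmic = 2.9) {
  sec_th <- 1 / cos(theta)
  dz     <- diff(z)
  ka_mid <- 0.5 * (ka[-1] + ka[-length(ka)])        # layer-mean absorption
  T_mid  <- 0.5 * (Tz[-1] + Tz[-length(Tz)])        # layer-mean temperature
  tau    <- exp(-cumsum(ka_mid * sec_th * dz))      # tau(0, z) at layer tops
  atm    <- sum(T_mid * ka_mid * sec_th * tau * dz) # atmospheric emission term
  atm + TB_cosmic * tau[length(tau)]                # plus attenuated cosmic term
}
z  <- seq(0, 10000, by = 100)        # height grid [m]
Tz <- 288 - 0.0065 * z               # temperature profile [K]
ka <- 2e-5 * exp(-z / 2000)          # absorption coefficient [1/m], illustrative
brightness_temp(z, Tz, ka)           # simulated TB [K] for this channel
```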
The radiometer at Nanjing has 12 channels: five channels at 22.235, 23.035, 23.835, 26.235, and 30 GHz for sensing air humidity and liquid water content, and seven channels at 51.25, 52.28, 53.85, 54.94, 56.66, 57.29, and 58.80 GHz for sensing air temperature profiles. A ground-based radiometer should be calibrated with liquid nitrogen (LN) once every few months, and the radiometer at Nanjing had been LN-calibrated just a few days before Nov. 27, 2010. Therefore data collection for the experiment started on Nov. 27, 2010. Observations in cloudy conditions must be deleted from the sample because of the uncertainty in the radiance of cloud. Applying the clear-sky criterion "relative humidity < 85 %" [21], a sample of 77 "clear-sky" events, as listed in Table 1, was obtained for the Nanjing station at 08:00 BST for the experiment period from Nov. 27, 2010 to May 29, 2011.
Table 1 Clear-sky dates in the experiment period from Nov. 27, 2010 to May 29, 2011 at Nanjing
Date  Event index  No. of days    Date  Event index  No. of days    Date  Event index  No. of days
2010-11-27 1 0 2011-01-12 27 46 2011-03-31 53 124
2010-11-28 2 1 2011-01-13 28 47 2011-04-04 54 128
2010-12-01 3 4 2011-01-16 29 50 2011-04-09 55 133
2010-12-03 4 6 2011-01-24 30 58 2011-04-10 56 134
2010-12-04 5 7 2011-01-25 31 59 2011-04-12 57 136
2010-12-07 6 10 2011-01-30 32 64 2011-04-17 58 141
2010-12-08 7 11 2011-01-31 33 65 2011-04-18 59 142
2010-12-09 8 12 2011-02-02 34 67 2011-04-19 60 143
2010-12-10 9 13 2011-02-03 35 68 2011-04-20 61 144
2010-12-11 10 14 2011-02-04 36 69 2011-04-23 62 147
2010-12-16 11 19 2011-02-05 37 70 2011-04-25 63 149
2010-12-17 12 20 2011-02-11 38 76 2011-04-26 64 150
2010-12-20 13 23 2011-02-15 39 80 2011-04-27 65 151
2010-12-23 14 26 2011-03-03 40 96 2011-04-28 66 152
2010-12-26 15 29 2011-03-04 41 97 2011-04-29 67 153
2010-12-28 16 31 2011-03-07 42 100 2011-05-01 68 155
2010-12-29 17 32 2011-03-08 43 101 2011-05-02 69 156
2010-12-30 18 33 2011-03-11 44 104 2011-05-07 70 161
2010-12-31 19 34 2011-03-12 45 105 2011-05-13 71 167
2011-01-01 20 35 2011-03-16 46 109 2011-05-17 72 171
2011-01-04 21 38 2011-03-17 47 110 2011-05-19 73 173
2011-01-06 22 40 2011-03-23 48 116 2011-05-20 74 174
2011-01-07 23 41 2011-03-25 49 118 2011-05-25 75 179
2011-01-08 24 42 2011-03-27 50 120 2011-05-28 76 182
2011-01-10 25 44 2011-03-28 51 121 2011-05-29 77 183
2011-01-11 26 45 2011-03-30 52 123
Figure 1 shows the simulated brightness temperature series together with the observed brightness temperature series for the 77 clear-sky events. In this figure, the curve TBM is the observation and the curve TBC is the simulation.
It can be seen from Fig. 1 that the consistency between the simulation and the observation is good for each of the five channels, though significant biases exist. Results from the statistical analyses are given in Table 2. The 30 GHz channel is slightly worse compared with the other four channels.
Fig. 1 The calculated and observed brightness temperature variations at 08:00BT on 77 clear days
for the five channels at the frequencies of (a) 22.235, (b) 23.035, (c) 23.835, (d) 26.235, and
(e) 30.000 GHz
Table 2 Statistics for the calculated and observed brightness temperatures for the five channels sensing humidity and liquid water content (sample size N = 77)

Channel Frq./GHz | 22.235 | 23.035 | 23.835 | 26.235 | 30.000
Mean TBM/K       | 26.59  | 24.97  | 20.68  | 18.25  | 1.93
Mean TBC/K       | 18.49  | 18.15  | 16.55  | 13.08  | 12.57
Bias(a)/K        | 8.10   | 6.82   | 4.13   | 5.17   | −10.64
R                | 0.9821 | 0.9832 | 0.9821 | 0.9816 | 0.9168
(a) Bias = mean(TBM) − mean(TBC)
Some of the observed brightness temperature values at 30 GHz are even negative (see Fig. 1e), which is obviously wrong. The correlation coefficients are all better than 0.91, which implies that all the observed data after bias correction are good for further applications and that all five channels were working well in the given period.
The simulated and the observed brightness temperature series at the seven frequencies for atmospheric temperature remote sensing are given in Fig. 2.
As time goes from winter through spring to summer, the air temperature at Nanjing increases and, accordingly, the brightness temperature increases. The brightness temperature at the higher frequencies reflects the temperature in the lower troposphere, while the brightness temperature at the lower frequencies reflects the temperature of the whole atmosphere including the upper troposphere [16]. Therefore, the brightness temperature at the higher frequencies increases more quickly than at the lower frequencies (as shown by the TBC series in Fig. 2a–g), since the temperature increase in the lower troposphere, driven by the land surface, is faster than that in the upper troposphere.
However, the observed brightness temperatures at the lower frequencies such as 51.25, 52.28, and 53.85 GHz gradually go down and drift away from the simulated values, as shown by the TBM series in Fig. 2, especially at 51.25 GHz as shown in Fig. 2a. The same phenomenon can also be seen at the higher frequencies, though not as clearly as at the lower frequencies. This implies that the radiometer was not working properly at these channels. (The manufacturer later found that this was because of a failed diode.)
If one’s purpose is to make use of the obtained TBM as shown in Fig. 2, from
which the air temperature information can be retrieved, the correction for the time-
shift must be performed in advance as an important further processing. It can be seen
that the influence of the radiometer malfunction on the data quality was gradually
increasing as the time was going. Therefore suppose the divergence as a function of
time can be expressed by f (D), where D is the number of days from the first day of
the experiment (see Table 1) so that TBM after the time-shift correction becomes
The obtained TBO is expected to be more consistent with TBC than TBM. It has
been found that a polynomial like
f .D/ D a0 C a1 D C a2 D2 C a3 D3 ; (4)
can give quite good results as shown by TBO series in Fig. 2, in this case in terms
of the correlation given in Table 3. The values of a0 a3 in the polynomial above
were obtained with regression analysis on the data. It can be seen from Table 3 that
the correlation coefficients between TBC and TBO after correction are obviously
improved as compared with those between TBC and TBM before correction.
Figure 2 shows that the TBO series keep both the shorter-scale temperature
fluctuation features and the long-scale variation from winter to early summer, and
would be better than TBM for air temperature retrievals.
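A minimal sketch of this correction step (not the authors' processing chain; TBM, TBC and the day numbers D are hypothetical vectors for one channel) could be:

```r
# Minimal sketch: estimating the drift polynomial f(D) of Eq. (4) from the
# observed-minus-simulated differences and applying the time-shift correction.
set.seed(1)
D   <- c(0, 1, 4, 6, 7, 10, 12, 14, 19, 23, 29, 33, 40, 46, 58, 67, 80,
         100, 120, 141, 155, 171, 183)                  # days since start
TBC <- 120 + 0.15 * D                                   # simulated TB [K]
TBM <- TBC - 1e-5 * D^3 + rnorm(length(D), 0, 1)        # drifting observation
diff_fit <- lm(I(TBM - TBC) ~ poly(D, 3, raw = TRUE))   # f(D) = a0 + a1*D + a2*D^2 + a3*D^3
TBO <- TBM - predict(diff_fit)                          # corrected series, Eq. (3)
cor(TBC, TBM); cor(TBC, TBO)                            # correlation before/after correction
```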
Fig. 2 The calculated and observed brightness temperature variations at 08:00 on 77 clear days
for the seven temperature-sensing channels at the frequencies of (a) 51.25, (b) 52.28, (c) 53.85,
(d) 54.94, (e) 56.66, (f) 57.29, and (g) 58.80 GHz
Table 3 The correlation coefficients between TBC and TBM before and after correction for the seven channels sensing air temperature profiles (sample size N = 77)

Channel Frq./GHz    | 51.25  | 52.28  | 53.85  | 54.94  | 56.66  | 57.29  | 58.80
R before correction | 0.6017 | 0.6761 | 0.5817 | 0.2356 | 0.6384 | 0.8387 | 0.9420
R after correction  | 0.5728 | 0.7575 | 0.7627 | 0.8339 | 0.9321 | 0.9521 | 0.9784
The method described above for judging the working state of a ground-based
microwave radiometer based on time series analysis has been applied successfully
to other radiometers.
There are two radiometers in Wuhan. Table 4 gives the statistics for the comparison between the calculated and observed brightness temperatures for these two radiometers. It can be seen that Radiometer A (one of the two) worked quite well during the two inspection summers of 2009 and 2010, with the only exception of the channel at 54.940 GHz according to its correlation coefficient (R² = 0.7022). The data obtained with Radiometer A were used directly for precipitation weather analysis [22]. Radiometer B (the other of the two) was worse: all of its channels for water vapor remote sensing failed, as their biases were very large and their correlation coefficients very low, as shown in Table 4. The channels for upper-air temperature remote sensing (working in the lower frequency range) did not work well either, judging by their low correlation coefficients. Therefore, the data obtained with Radiometer B in the inspection period of Feb. 2010 need further processing before application, or it is suggested that the radiometer needs LN calibration and, possibly, hardware inspection as well.
Figure 3a, b shows the time series of observed clear-sky brightness temperatures (TBM) compared with the simulated ones (TBC) for two typical channels of a Beijing radiometer during the period from Jan. 1, 2010 to Dec. 31, 2011. It can be seen that the seasonal variation shown by TBC is very clear, in Fig. 3a for air humidity and in Fig. 3b for air temperature, but that there are two abrupt changes (mutations) in the TBM series.
Table 4 Statistics for comparison between the calculated and observed brightness temperatures for Wuhan Radiometer A in the summers of both 2009 and 2010 and Wuhan Radiometer B in Feb. 2010

Channel | Frq./GHz | Radiometer A: TBC/K | Bias/K | R² | Radiometer B: TBC/K | Bias/K | R²
1  | 22.234 | 53.17  | 3.76 | 0.9392 | 10.79  | 13.90 | 0.3747
2  | 22.500 | 53.68  | 1.14 | 0.9331 | 10.59  | 13.93 | 0.7467
3  | 23.034 | 51.93  | 3.52 | 0.9399 | 10.48  | 16.09 | 0.7402
4  | 23.834 | 45.78  | 1.18 | 0.9436 | 10.08  | 14.83 | 0.3189
5  | 25.000 | 36.94  | 0.47 | 0.9399 | 9.57   | 10.92 | 0.5953
6  | 26.234 | 30.88  | 0.53 | 0.9369 | 9.37   | 8.42  | 0.3073
7  | 28.000 | 26.64  | 0.32 | 0.8974 | 9.53   | 13.53 | 0.1019
8  | 30.000 | 25.14  | 2.20 | 0.8950 | 10.12  | 17.34 | 0.1138
9  | 51.248 | 121.76 | 4.99 | 0.8535 | 103.71 | 4.39  | 0.0902
10 | 51.760 | 140.20 | 5.64 | 0.8283 | 120.97 | 5.96  | 0.1204
11 | 52.280 | 164.90 | 5.08 | 0.8528 | 143.70 | 1.59  | 0.0163
12 | 52.804 | 196.01 | 4.40 | 0.9039 | 172.39 | 0.07  | 0.3651
13 | 53.336 | 230.62 | 1.78 | 0.9250 | 205.11 | 0.67  | 0.0357
14 | 53.848 | 259.92 | 0.37 | 0.9722 | 233.95 | 0.83  | 0.0950
15 | 54.400 | 279.91 | 0.52 | 0.9800 | 254.57 | 0.39  | 0.6646
16 | 54.940 | 288.25 | 0.87 | 0.7022 | 263.38 | 0.94  | 0.9688
17 | 55.500 | 291.19 | 0.91 | 0.9895 | 266.37 | 1.30  | 0.7966
18 | 56.020 | 292.36 | 0.93 | 0.9850 | 267.44 | 0.39  | 0.9935
19 | 56.660 | 293.15 | 0.97 | 0.9857 | 268.10 | 1.68  | 0.9662
20 | 57.288 | 293.61 | 0.93 | 0.9761 | 268.43 | 1.38  | 0.9889
21 | 57.964 | 293.88 | 0.58 | 0.9737 | 268.60 | 1.34  | 0.9645
22 | 58.800 | 294.08 | 0.85 | 0.9652 | 268.68 | 1.27  | 0.9655
The first abrupt change was found to be associated with an LN calibration, at index 294 as shown in Fig. 3a, and the second was due to the movement of the instrument from Shunyi (40°07′ N, 116°36′ E, 34.0 m ASL) to Shangdianzi (40°39′ N, 117°07′ E, 293.3 m ASL), at index 520 in Fig. 3b. For the humidity channels, TBM before the LN calibration differs considerably from TBM after the LN calibration, while for the temperature channels TBM decreases noticeably after the instrument movement. Therefore, the whole sample should be sliced into three sub-samples for any further comparison and application. The authors [18] performed a linear regression analysis on the three sub-samples to set up the relationship between TBM and TBC for each channel, so that TBM can be transformed into TBO according to

TBO = a·TBM + b.     (5)
Fig. 3 The time series of observed clear-sky brightness temperatures before and after transforma-
tion (TBM and TBO) as compared with model simulated results (TBC) for the two typical channels
of the radiometer at Beijing during the period from Jan. 1, 2010 to Dec. 31, 2011. (a) Channel 4
(23.834 GHz), (b) channel 14 (53.848 GHz), (c) channel 4, and (d) channel 14
The coefficients a and b are obtained from a linear regression analysis under the condition that the squared differences between TBO and TBC are minimised.
Table 5 gives the coefficients a and b for the data transformation and the correlation coefficients before and after the transformation. Figure 3c, d are the same as Fig. 3a, b but with TBM replaced by TBO. The comparison between Fig. 3c and a, and between Fig. 3d and b, respectively, shows that the discrepancy between TBM and TBC has been reduced. It can be seen that for most channels, except channels 7 and 8, the linear transformation has improved the correlation; channels 7 and 8 need further inspection.
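A minimal sketch of this per-sub-sample calibration (not the processing of [18]; the data frame tb with columns TBM, TBC and a sub-sample label is hypothetical) could be:

```r
# Minimal sketch: per-sub-sample linear calibration TBO = a*TBM + b of Eq. (5),
# with a and b obtained by ordinary least squares against the simulated TBC.
set.seed(2)
tb <- data.frame(sub = rep(c(1, 2, 3), c(294, 226, 83)))      # three sub-samples
tb$TBC <- 200 + 30 * sin(seq(0, 4 * pi, length.out = nrow(tb)))
tb$TBM <- with(tb, c(0.9, 1.05, 0.95)[sub] * TBC + c(10, -5, 20)[sub] +
               rnorm(nrow(tb), 0, 2))                         # offset/gain differ per sub-sample
tb$TBO <- NA
for (s in unique(tb$sub)) {
  idx <- tb$sub == s
  fit <- lm(TBC ~ TBM, data = tb[idx, ])       # regress simulated on observed
  tb$TBO[idx] <- predict(fit, tb[idx, ])       # a*TBM + b for this sub-sample
}
c(before = cor(tb$TBM, tb$TBC), after = cor(tb$TBO, tb$TBC))   # full-sample correlation
```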
Table 5 The coefficients a and b in the equation TBO = a·TBM + b and the comparison of the correlation coefficients of the full sample before and after transformation

Channel | a, b (sub-sample 1, N = 294) | a, b (sub-sample 2, N = 226) | a, b (sub-sample 3, N = 83) | R² before transformation (N = 603) | R² after transformation (N = 603)
1  | 1.2103, 15.8 | 1.1058, 1.9  | 1.1042, 3.3  | 0.8673 | 0.9408
2  | 1.2170, 15.9 | 1.1138, 1.2  | 1.0872, 0.6  | 0.8525 | 0.9398
3  | 1.2930, 19.8 | 1.2048, 1.7  | 1.0544, 2.1  | 0.7373 | 0.8904
4  | 1.2614, 18.9 | 1.2538, 2.1  | 1.0223, 3.4  | 0.6828 | 0.8991
5  | 1.2635, 14.2 | 1.3039, 2.6  | 0.8770, 4.7  | 0.5987 | 0.8080
6  | 1.1073, 7.9  | 1.3270, 3.4  | 0.7857, 3.7  | 0.5217 | 0.6843
7  | 0.6872, 1.4  | 1.1621, 0.4  | 0.1642, 10.3 | 0.1528 | 0.3066
8  | 0.0828, 17.9 | 0.1271, 12.9 | 0.0948, 13.6 | 0.0165 | 0.0354
9  | 0.9044, 10.9 | 1.0356, 1.0  | 0.7305, 34.4 | 0.8242 | 0.9162
10 | 0.9705, 2.5  | 1.1330, 14.0 | 0.9021, 18.5 | 0.7635 | 0.9239
11 | 0.8240, 29.7 | 0.9642, 9.4  | 0.9942, 9.7  | 0.9036 | 0.9553
12 | 0.8485, 31.1 | 0.9542, 12.2 | 1.0048, 8.6  | 0.9290 | 0.9722
13 | 0.8642, 32.4 | 0.9609, 11.3 | 0.9596, 17.1 | 0.9527 | 0.9887
14 | 0.9177, 21.4 | 0.9626, 10.0 | 0.9463, 18.6 | 0.9738 | 0.9887
15 | 0.9599, 10.5 | 0.9865, 3.3  | 0.9535, 15.3 | 0.9846 | 0.9959
16 | 0.9780, 5.6  | 0.9966, 0.3  | 0.9476, 16.2 | 0.9893 | 0.9960
17 | 0.9929, 1.5  | 1.0178, 5.8  | 0.9466, 16.4 | 0.9844 | 0.9909
18 | 0.9973, 0.2  | 1.0144, 4.9  | 0.9292, 21.1 | 0.9869 | 0.9931
19 | 1.0030, 1.6  | 1.0161, 5.3  | 0.9252, 22.4 | 0.9856 | 0.9922
20 | 1.0043, 1.9  | 1.0221, 7.1  | 0.9193, 24.0 | 0.9834 | 0.9903
21 | 1.0101, 3.6  | 1.0244, 7.7  | 0.9156, 25.1 | 0.9825 | 0.9899
22 | 1.0159, 5.2  | 1.0318, 9.8  | 0.9122, 26.2 | 0.9802 | 0.9886
5 Conclusion
In this case, the radiometer working state must be checked and the TB readouts
obtained must be calibrated in advance for further possible applications in retrieving
meteorological information.
References
1. Westwater, E., Crewell, S., Matzler, C.: A review of surface-based microwave and millimeter-
wave radiometric remote sensing of the troposphere. Radio Sci. Bull. 2004(310), 59–80 (2004)
2. Cimini, D., Westwater, E., Gasiewski, A., Klein M., Leuski V., Liljegren J.: Ground-based
millimeter- and submillimiter-wave observations of low vapor and liquid water contents. IEEE
Trans. Geosci. Remote Sens. 45(7), Part II, 2169–2180 (2007)
3. Cimini, D., Campos, E., Ware, R., Albers, S., Giuliani, G., Oreamuno, J., Joe, P., Koch,
S.E., Cober, S., Westwater, E.: Thermodynamic atmospheric profiling during the 2010 Winter
Olympics using ground-based microwave radiometry. IEEE Trans. Geosci. Remote Sens.
49(12), 4959–4969 (2011)
4. Cimini, D., Nelson, M., Güldner, J., Ware, R.: Forecast indices from ground-based
microwave radiometer for operational meteorology. Atmos. Meas. Tech. 8(1), 315–333 (2015).
doi:10.5194/amt-8-315-2015
5. Güldner, J., Spänkuch, D.: Remote sensing of the thermodynamic state of the atmospheric
boundary layer by ground-based microwave radiometry. J. Atmos. Oceanic Tech. 18, 925–933
(2001)
6. Hewison, T.: 1D-VAR retrievals of temperature and humidity profiles from a ground-based
microwave radiometer. IEEE Trans. Geosci. Remote Sens. 45(7), 2163–2168 (2007)
7. Maschwitz, G., Löhnert, U., Crewell, S., Rose, T., Turner, D.D.: Investigation of ground-based
microwave radiometer calibration techniques at 530 hPa. Atmos. Meas. Tech. 6, 2641–2658
(2013). doi:10.5194/amt-6-2641-2013
8. Stähli, O., Murk, A., Kämpfer, N., Mätzler, C., Eriksson, P.: Microwave radiometer to retrieve
temperature profiles from the surface to the stratopause. Atmos. Meas. Tech. 6, 2477–2494
(2013). doi:10.5194/amt-6-2477-2013
9. Xu, G., Ware, R., Zhang, W., Feng, G., Liao, K., Liu, Y.: Effect of off-zenith observation
on reducing the impact of precipitation on ground-based microwave radiometer measurement
accuracy in Wuhan. Atmos. Res. 140–141, 85–94 (2014)
10. Campos, E., Ware, R., Joe, P., Hudak, D.: Monitoring water phase dynamics in winter clouds.
Atmos. Res. 147–148, 86–100 (2014)
11. Serke, D., Hall, E., Bognar, J., Jordan, A., Abdo, S., Baker, K., Seitel, T., Nelson, M., Reehorst,
A., Ware, R., McDonough, F., Politovich, M.: Supercooled liquid water content profiling case
studies with a new vibrating wire sonde compared to a ground-based microwave radiometer.
Atmos. Res. 149, 77–87 (2014)
12. Ware, R., Cimini, D., Campos, E., Giuliani, G., Albers, S., Nelson, M., Koch, S.E., Joe, P.,
Cober, S.: Thermodynamic and liquid profiling during the 2010 Winter Olympics. Atmos. Res.
132, 278–290 (2013)
13. Ware, R., Solheim, F., Carpenter, R., Güldner, J., Liljegren, J., Nehrkorn, T., Vandenberghe, F.:
A multi-channel radiometric profiler of temperature, humidity and cloud liquid. Rad. Sci. 38,
8079–8032 (2003)
14. Wang, Z., Li, Q., Hu, F., Cao, X., Chu, Y.: Remote sensing of lightning by a ground-based
microwave radiometer. Atmos. Res. 150, 143–150 (2014)
15. Solheim, F., Godwin, J., Westwater, E., Han, Y., Keihm, S., Marsh, K., Ware, R.: Radiometric
profiling of temperature, water vapor, and cloud liquid water using various inversion methods.
Radio Sci. 33(2), 393–404 (1998)
16. Westwater, E., Wang, Z., Grody, N.C., et al.: Remote sensing of temperature profiles from a
combination of observations from the satellite-based microwave sounding unit and the ground-
based profiler. J. Atmos. Oceanic Tech. 2, 97–109 (1985)
17. Wang, Z., Cao, X., Huang, J., et al.: Analysis on the working state of a ground based microwave
radiometer based on radiative transfer model and meteorological data variation features. Trans.
Atmos. Sci. 37(1), 1–8 (2014) (in Chinese with English abstract)
18. Li, Q., Hu, F., Chu, Y., Wang, Z., Huang, J., Wang, Y., Zhu, Y.: A consistency analysis
and correction of the brightness temperature data observed with a ground based microwave
radiometer in Beijing. Remote. Sens. Technol. Appl. 29(4), 547–556 (2014) (in Chinese with
English abstract)
19. Liebe, H.J.: An updated model for millimeter wave propagation in moist air. Radio Sci. 20(5),
1069–1089 (1985)
20. Liebe, H.J., Rosenkranz, P.W., Hufford, G.A.: Atmospheric 60 GHz oxygen spectrum: new
laboratory measurements and line parameters. J. Quant. Spectrosc. Radiat. Transfer 48, 629–
643 (1992)
21. Decker, M.T., Westwater, E.R., Guiraud, F.O.: Experimental evaluation of ground-based
microwave radiometric sensing of atmospheric temperature and water vapor profiles. J. Appl.
Meteorol. 17(12), 1788–1789 (1978)
22. Ao, X., Wang, Z., Xu, G., Zhai, Q.i., Pan, X.: Application of Wuhan ground-based radiometer
observations in precipitation weather analysis. Torrential Rain Disasters 30(4), 358–365 (2011)
(in Chinese with English abstract)
Identifying the Best Performing Time Series
Analytics for Sea Level Research
Phil. J. Watson
1 Introduction
Sea level rise is one of the key artefacts of climate change that will have profound
impacts on global coastal populations [1, 2]. Understanding how and when impacts
will occur and change are critical to developing robust strategies to adapt and
minimise risks.
Although the body of mean sea level research is extensive, professional debate
around the characteristics of the trend signal and its causalities remains high [3]. In
particular, significant scientific debate has centred around the issue of a measurable
acceleration in mean sea level [4–9], a feature central to projections based on the
current knowledge of climate science [10].
Monthly and annual average ocean water level records used by sea level
researchers are a complex composite of numerous dynamic influences of largely
oceanographic, atmospheric or gravitational origins operating on differing temporal
and spatial scales, superimposed on a comparatively low amplitude signal of sea
level rise driven by climate change influences (see [3] for more detail). The mean
sea level (or trend) signal results directly from a change in volume of the ocean
attributable principally to melting of snow and ice reserves bounded above sea level
(directly adding water), and thermal expansion of the ocean water mass. This low
amplitude, non-linear, non-stationary signal is quite distinct from all other known
dynamic processes that influence the ocean water surface which are considered to
be stationary; that is, they cause the water surface to respond on differing scales and
frequencies, but do not change the volume of the water mass. In reality, improved
real-time knowledge of velocity and acceleration rests entirely with improving the
temporal resolution of the mean sea level signal.
Over recent decades, the emergence and rapid improvement of data adaptive
approaches to isolate trends from non-linear, non-stationary and comparatively
noisy environmental data sets such as EMD [11, 12], Singular Spectrum Analysis
(SSA) [13–15] and Wavelet analysis [16–18] are theoretically encouraging. The
continued development of data adaptive and other spectral techniques [19] has given
rise to recent variants such as CEEMD [20, 21] and Synchrosqueezed Wavelet
Transform (SWT) [22, 23].
An innovative process by which to identify the most efficient method for
estimating the trend is to test against a “synthetic” (or custom built) data set with
a known, fixed mean sea level signal [3]. In general, a broad range of analysis
techniques have been applied to the synthetic data set to directly compare their
utility to isolate the embedded mean sea level signal from individual time series.
Various quantitative metrics and associated qualitative criteria have been used to
compare the relative performance of the techniques tested.
2 Method
The method to determine the most robust time series method for isolating mean sea
level with improved temporal accuracy is relatively straightforward and has been
based on three key steps, namely:
1. development of synthetic data sets to test;
2. application of a broad range of analytical methods to isolate the mean sea level
trend from the synthetic data set and
3. comparative assessment of the performance of each analytical method using
a multi-criteria analysis (MCA) based on some key metrics and a range of
additional qualitative criteria relevant to its applicability for broad, general use
on conventional ocean water level data worldwide.
The core synthetic data set developed for this research has been specifically designed
to mimic the key physical characteristics embedded within real-world ocean water
level data, comprising a range of six key known dynamic components added to a
non-linear, non-stationary time series of mean sea level [3]. The fixed mean sea level
signal has been generated by applying a broad cubic smoothing spline to a range of
points over the 1850–2010 time horizon reflective of the general characteristics of
the global trend of mean sea level [24], accentuating the key positive and negative
“inflexion” points evident in the majority of long ocean water level data sets [25].
This data set has been designed as a monthly average time series spanning a 160-
year period (from 1850 to 2010) to reflect the predominant date range for the longer
records in the Permanent Service for Mean Sea Level (PSMSL), which consolidates
the world’s ocean water level data holdings.
The synthetic data set contains 20,000 separate time series, each generated by
successively adding a randomly sampled signal from within each of the six key
dynamic components to the fixed mean sea level signal. The selection of 20,000
time series represents a reasonable balance between optimising the widest possible
set of complex combinations of real-world signals and the extensive computing
time required to analyse the synthetic data set. Further, the 20,000 generated trend
outputs from each analysis provide a robust means of statistically identifying the
better performing techniques for extracting the trend [3].
Additionally, the core 160-year monthly average data set has been subdivided into 2 × 80-year and 4 × 40-year subsets and annualised to create 14 separate data sets, in order to also consider the influence of record length and issues associated with annual versus monthly records.
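As an illustration of the construction described above, a minimal R sketch of generating one synthetic monthly series (the knot points, component amplitudes and noise level are assumptions, not the values used in the study) could be:

```r
# Minimal sketch: one synthetic monthly series built by adding randomly
# sampled dynamic components to a fixed non-linear mean sea level signal
# defined by a broad cubic smoothing spline (illustrative values only).
set.seed(3)
t_mon <- seq(1850, 2010, by = 1 / 12)
knots <- data.frame(year = seq(1850, 2010, by = 20),
                    msl  = c(0, 10, 25, 45, 60, 90, 120, 160, 200))   # mm
msl_fit <- smooth.spline(knots$year, knots$msl, df = 6)
msl     <- predict(msl_fit, t_mon)$y                          # fixed mean sea level signal
seasonal <- 40 * sin(2 * pi * t_mon + runif(1, 0, 2 * pi))    # sampled seasonal cycle
interann <- 30 * sin(2 * pi * t_mon / runif(1, 3, 8))         # sampled interannual cycle
noise    <- rnorm(length(t_mon), 0, 25)                       # residual variability
series   <- msl + seasonal + interann + noise                 # one of the 20,000 series
```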
The time series analysis methods that have been applied to the synthetic data set to
estimate the trend are summarised in Table 1. This research has not been designed to
consider every time series analysis tool available. Rather the testing regime is aimed
at appraising the wide range of tools currently used more specifically for mean sea
level trend detection of individual records, with a view to improving generalised
tools for sea level researchers. Some additional, more recently developed data
adaptive methods such as CEEMD [21] and SWT [22, 23] have also been included
in the analysis to consider their utility for sea level research. It is acknowledged
that various methods permit a wide range of parameterisation that can critically
affect trend estimation. In these circumstances, broad sensitivity testing has been
undertaken to identify the better performing combination and range of parameters
Table 1 Summary of analysis techniques applied to synthetic data set

Method | Sub-method | Additional condition | Software package/additional comment
Linear regression | n/a | n/a | n/a
Polynomial regression | Second order | n/a | n/a
LOESS smoothing | n/a | n/a | α = 0.75, order = 2, weighted least squares
Smoothing splines | Cubic smoothing, thin plate PRS, B-spline | λ based on both GCV and REML | "mgcv" package in R [26–29]
Moving average (a) | 10- to 40-year smooth | Single, triple and quad averaging | "zoo" package in R [30]
Structural models (b) | Seasonal decomposition and basic structural model | Based on LOESS and ARIMA | stl decomposition in R [31]; StructTS in R [32]
Butterworth filter (c) | 10–80 year cycles removed | n/a | GRETL [33]
SSA (d, e) | 1d and Toeplitz variants | Window: 10–80 years | "Rssa" package in R [34]
EMD | Envelope: interpolation, spline smoothing, locfit smoothing; sifting by interpolation and spline smoothing | Boundary condition: none, symmetric, wave, periodic | "EMD" package in R [11, 35, 36]
EEMD (f) | Noise amplitude: 20–200 mm | Trials: 20–200 | "hht" package in R [12, 37, 38]
CEEMD (f) | Noise amplitude: 20–200 mm | Trials: 20–200 | "hht" package in R [21, 37, 38]
Wavelet analysis | Multi-resolution decomposition using MODWT | Daubechies filters: Symmlet (S2–S10) | "wmtsa" package in R [16, 38, 39]
Synchrosqueezed Wavelet Transform (SWT) (e) | Wavelet filters: "Bump" (μ = 1, s = 0.2), "CMHat" (μ = 1, s = 5), "Morlet" (μ = 0.05π), "Gauss" (μ = 2, s = 0.083) | Gen parameter: 100, 1000, 10,000, 100,000 | "SynchWave" package in R [22, 23]

Notes: The above table provides a general summary of the analytical techniques applied to the synthetic data set in order to test the utility of extracting the embedded mean sea level (trend) component. The "Sub-method" and "Additional condition" columns provide details on the sensitivity analysis pertaining to the respective methodologies. Where possible, relevant analytical software from the R open source suite of packages has been used [40]
(a) Moving (or rolling) averages are centred around the data point in question and therefore the determined trend is restricted to half the averaging window inside both ends of the data set
(b) Structural models are only relevant for monthly average data sets
(c) For the respective 40-year monthly and annual synthetic data sets, only cycles up to and including 40 years have been removed by the digital filter
(d) For the respective 40-year data sets, only window lengths from 10 to 30 years have been considered. Similarly, for the respective 80-year data sets, only window lengths from 10 to 70 years have been considered
(e) Auto detection routines have been specifically written to isolate decomposed elements of the time series with low frequency trend characteristics
(f) The noise amplitude for the annual data sets includes the full range, but for the monthly data sets only ranges from 50 to 200 mm
for a particular method when applied specifically to ocean water level records (as
represented by the synthetic data sets).
With methods such as SSA and SWT, it has been necessary to develop auto
detection routines to isolate specific elements of decomposed time series with
characteristics that resemble low frequency trends. Direct consultation with leading
time series analysts and developers of method specific analysis tools has also
assisted to optimise sensitivity testing.
In addition to identifying the analytic that provides the greatest temporal precision in
resolving the trend, the intention is to use this analytic to underpin the development
of tools with wide applicability for sea level researchers. The techniques identified in Table 1 have been assessed and compared across a relevant range of quantitative and qualitative criteria, including:
• Measured accuracy (Criteria A1 ). This criterion is based upon the cumulative
sum of the squared differences between the fixed mean sea level signal and the
trend derived from a particular analytic for each time series in the synthetic data
set. This metric has then been normalised per data point for direct comparison
between the different length synthetic data sets (40, 80 and 160 years) as follows:
A₁ = (1/n) · Σ_{i=1}^{20,000} (x_i − X)²     (1)
where X represents the fixed mean sea level signal embedded within each time series; x_i represents the trend derived from the analysis of the synthetic data set using a particular analytical approach; and n represents the number of data points within each of the respective synthetic data sets (or fewer outputs in the case of moving averages).
It is imperative to note that particular combinations of key parameters used as
part of the sensitivity testing regime for particular methods (refer Table 1), resulted
in no (or limited) outputs for various time series analysed. This occurred either
due to the analytic not resolving a signal within the limitations established for a
trend (particularly for auto detection routines necessary for SSA and SWT) or where
internal thresholds/convergence protocols were not met for a particular algorithm
and the analysis terminated. Where such circumstances occurred, the determined
A1 metric was prorated to equate to 20,000 time series for direct comparison across
methods. Where the outputs of an analysis resolved a trend signal in less than 75 %
(or 15,000 time series) of a particular synthetic data set, the result was not included
in the comparative analysis.
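Referring back to Criteria A₁ above, a minimal sketch of the metric, under the assumption that the squared differences are accumulated over all data points of all 20,000 derived trends and then normalised by the series length n, could be:

```r
# Minimal sketch of the accuracy metric A1 of Eq. (1); 'trends' is a
# hypothetical n x 20000 matrix of extracted trends (one column per synthetic
# series) and 'msl' the fixed mean sea level signal of length n.
criteria_A1 <- function(trends, msl) {
  n <- length(msl)
  sum((trends - msl)^2) / n     # cumulative squared error, normalised per data point
}
```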
3 Results
In total, 1450 separate analyses have been undertaken as part of the testing regime,
translating to precisely 29 million individual time series analyses. Figure 1 provides
a pictorial summary of the complete analysis of all monthly and annual data
Fig. 1 Analysis overview based on Criteria A1 . Notes: This chart provides a summary of all
analysis undertaken (refer Table 1). Scales for both axes are equivalent for direct comparison
between respective analyses conducted on the monthly (top panel) and annual (bottom panel)
synthetic data sets. The vertical dashed lines demarcate the results of each method on the 160-,
80- and 40-year length data sets in moving from left to right across each panel. Where the analysis
permitted the resolution of a trend signal across a minimum of 75 % (or 15,000 time series) of a
synthetic data set, this has been represented as “complete”. Those analyses resolving trends over
less than 75 % of a synthetic data set are represented as “incomplete”
sets (40-, 80- and 160-year synthetic data sets) plotted against the key metric,
criteria A1 . Equivalent scales for each panel provide direct visual and quantitative
comparison between monthly and annual and differing length data sets. For the sake
of completeness, it is worth noting a further 36 monthly analysis results lie beyond
the limit of the scale chosen and therefore are not depicted on the chart. Where
analysis resolves a trend signal across more than 75 % (or 15,000 time series) of
a synthetic data set, the output is used for comparative purposes and depicted on
Fig. 1 as “complete”.
From Fig. 1, it is evident that the cumulative errors of the estimated trend
(criteria A1 ) are appreciably lower for the annual data sets when considered across
the totality of the analysis undertaken. More specifically, for the 579 “complete”
monthly outputs, 408 (or 71 %) fall below an A₁ threshold level of 30 × 10⁶ mm²
(where the optimum methods reside). Comparatively, for the 632 “complete” annual
outputs, 566 (or 90 %) are below this threshold level.
The key reason for this is that the annualised data sets not only provide a natural low-frequency smoothing (through averaging the calendar-year monthly values), but also largely remove the seasonal influence (at monthly frequency), noting that the bin of seasonal signals sampled to create the synthetic data set also contains numerous time-varying seasonal signals derived using ARIMA.
Based on the testing regime performed on the synthetic data sets, EEMD outper-
formed CEEMD. Both variants of the ensemble EMD, using the sensitivity analysis
advised, proved the most computationally expensive of all the algorithms tested.
Both of these EMD variants were substantially outperformed by the MODWT and
SSA, but importantly, processing times were of the order of 3000–4000 times that
of these better performing analytics.
Clearly for these particularly complex ocean water level time series, the excessive
computational expense of these algorithms has not proven beneficial. One of
the more inconsistent performers proved to be the SWT. This algorithm proved
highly sensitive to the combination of wavelet filter and generalisation parameter.
Certain combinations of parameters provided exceptional performance on indi-
vidual synthetic data sets but proved less capable of consistently resolving low
frequency “trend-like” signals across differing length data sets. Of the analytics
tested, this algorithm proved the most complex to optimise in order to isolate and
reconstruct trends from the ridge extracted components. Auto detection routines
were specifically developed to test and isolate the low frequency components based
on first differences. However, a significant portion of the sensitivity analyses for
SWT had difficulty isolating the low frequency signals across the majority of the
data sets tested.
SSA has also been demonstrated to be a superior analytical tool for trend
extraction across the range of synthetic data sets. However, like the SWT, SSA
requires an elevated level of expertise to select appropriate parameters and internal
methods to optimise performance. Auto detection routines were also developed
to isolate the key SSA eigentriple groupings with low frequency “trend-like”
characteristics, based on first differences. With this approach, not all time series
could be resolved to isolate a trend within the limits established. Auto detection
routines based on frequency contribution [42] were also provided by Associate
Professor Nina Golyandina (St Petersburg State University, Russia) to test, proving
comparable to the first differences technique.
4 Discussion
With so much reliance on improving the temporal resolution of the mean sea level
signal due to its association as a key climate change indicator, it is imperative
to maximise the information possible from the extensive global data holdings of
the PSMSL. Numerous techniques have been applied to these data sets to extract
trends and infer accelerations based on local, basin or global scale studies. Ocean
water level data sets, like any environmental time series, are complex amalgams
of physical processes and influences operating on different spatial scales and
frequencies. Further, these data sets will invariably also contain influences and
signals that might not yet be well understood (if at all).
With so many competing and sometimes controversial findings in the scientific
literature concerning trends and more particularly, accelerations in mean sea level
(refer Sect. 1), it is difficult to definitively separate sound conclusions from those
that might unwittingly be influenced by the analytical methodology applied (and to
what extent). This research has been specifically designed as a necessary starting
point to alleviate some of this uncertainty and improve knowledge of the better
performing trend extraction methods for individual long ocean water level data.
Identification of the better performing methods enables the temporal resolution of
mean sea level to be improved, enhancing the knowledge that can be gleaned from
long records which includes associated real-time velocities and accelerations. In
turn, key physically driven changes can be identified with improved precision and
confidence, which is critical not only to sea level research, but also climate change
more generally at increasingly finer (or localised) scales.
The importance of resolving trends from complex environmental and climatic
records has led to the application of increasingly sophisticated, so-called data
adaptive spectral and empirical techniques [12, 19, 43, 44] over comparatively
recent times. In this regard, it is readily acknowledged that whilst the testing
undertaken within this research has indeed been extensive, not every time series
method for trend extraction has been examined. The methods tested are principally
those that have been applied to individual ocean water level data sets within the
literature to estimate the trend of mean sea level.
Therefore spatial trend coherence and multiple time series decomposition tech-
niques such as PCA/EOF, SVD, MC-SSA, M-SSA, XWT, some of which are used
in various regional and global scale sea level studies [45–51] are beyond the scope
of this work and have not been considered. In any case, the synthetic data sets
developed for this work have not been configured with spatially dependent patterns
to facilitate rigorous testing of these methods. In developing the synthetic data sets
to test for this research, Watson [3] noted specifically that a natural extension (or
refinement) of the work might be to attempt to fine tune the core synthetic data set
to reflect the more regionally specific signatures of combined dynamic components.
Other key factors for consideration include identifying the method(s) that
prove robust over the differing length time series available whilst resolving trends
efficiently, with little pre-conditioning or site specificity. Whilst recognising that
various studies investigating mean sea level trends at long gauge sites have utilised
the construction of comparatively detailed site specific general additive models,
these models have little direct applicability or transferability to other sites and have
not been considered further for this work.
Of the analysis methods considered, the comparatively simple 30-year moving
(or rolling) average filter proved the optimal performer against the key A1 criterion
when averaged across all length data sets. Although not isolating and removing
high amplitude signals or contaminating noise, the sheer width of the averaging
window proves to be very efficient in dampening their influence for ocean water
level time series. However, the resulting mean sea level trend finishes 15 years inside
either end of each data set, providing no temporal understanding of the signal for
the most important part of the record—the recent history, which is keenly desired to
better inform the trajectory of the climate related signal. Although well performing
on a range of criteria, this facet is a critical shortcoming of this approach. Whilst
triple and quadruple moving averages were demonstrated to marginally lower the
A1 criteria, respectively, compared to the equivalent single moving average, the loss
of data from the ends of the record was further amplified by these methods.
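As a concrete illustration of this end-effect, the following sketch (using pandas on a purely hypothetical annual series, not the study's synthetic data set) applies a centred 30-year rolling mean and counts the years for which no trend estimate is available.

```python
# Minimal sketch: a centred 30-year moving-average trend on a hypothetical
# annual sea level series, showing the loss of roughly 15 years at each end.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
years = np.arange(1860, 2020)                      # a hypothetical 160-year annual record
msl = 0.01 * (years - 1860) ** 1.3 + rng.normal(0.0, 5.0, years.size)
series = pd.Series(msl, index=years, name="annual_msl_mm")

# Centred 30-year rolling mean: roughly the first and last 15 values are NaN,
# so the filtered trend says nothing about the ends of the record.
trend = series.rolling(window=30, center=True).mean()
print(int(trend.isna().sum()), "of", len(trend), "years have no trend estimate")
```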
It is also noted that simple linear regression performed exceptionally well against the A1 criteria when averaged across all data sets. Based on the comparatively limited amplitude and curvature of the mean sea level trend signal embedded within the synthetic data set, it is perhaps not surprising that the linear regression performs well. But, like the moving average approach, its simplicity brings with it a profound shortcoming, in that it provides limited temporal information on the trend other than its general direction (increasing or decreasing).
No information on how (or when) this signal might be accelerating is possible from
this technique, which regrettably, is a facet of critical focus for contemporary sea
level research.
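For comparison, a minimal sketch of the linear-regression approach on the same kind of hypothetical annual series: a single fitted slope is returned, which is precisely why the method carries no information about acceleration.

```python
# Minimal sketch: a straight-line trend fitted by ordinary least squares
# to a hypothetical annual series (illustration only).
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1860, 2020)
msl = 0.01 * (years - 1860) ** 1.3 + rng.normal(0.0, 5.0, years.size)

# Degree-1 polynomial fit = ordinary least squares regression of level on time.
slope, intercept = np.polyfit(years, msl, deg=1)
print(f"fitted linear trend: {slope:.3f} mm/yr (direction only, no acceleration information)")
```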
It has been noted that unfortunately many studies using wavelet analysis have
suffered from an apparent lack of quantitative results. The wavelet transform
has been regarded by many as an interesting diversion that produces colourful
pictures, yet purely qualitative results [52]. The initial use of this particular multi-
resolution decomposition technique (MODWT) for application to a long ocean
water level record can be found in the work of Percival and Mofjeld [53]. There
is no question from this current research, that wavelet analysis has proven a “star
performer”, producing measurable quantitative accuracy exceeding other methods,
with comparable consistency across all length synthetic data sets and with minimal
computational expense.
Importantly, it is worth noting that the sensitivity testing and MCA used to differentiate the utility of the various methods unduly disadvantage the SSA method. In reality the SSA method performs optimally with a window length
varying between L/4 and L/2 (where L is the length of the time series). Varying the
window length permits necessary optimisation of the separability between the trend,
oscillatory and noise components [54]. However, for the sensitivity analysis around
SSA, only fixed window lengths were compared across all data sets. Although SSA
(with a fixed 30-year window) performed comparably for the key A1 criteria with
MODWT (refer Table 2), a method that optimises the window length parameter
automatically would, in all likelihood have further improved this result. Only a
modest improvement of less than 4 % would be required to put SSA on parity with
the accuracy of MODWT. In addition, auto detection routines designed to select
“trend-like” SSA components are unlikely to perform as well as the interactive
visual inspection (VI) techniques commonly employed by experienced practitioners
decomposing individual time series [43]. Clearly VI techniques were not an option
for the testing regime described herein, which involved processing 14 separate data
sets each containing 20,000 time series.
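For readers wishing to experiment with the window-length issue, the following is a minimal basic-SSA sketch in numpy (trajectory matrix, SVD, diagonal averaging). The window length N/4 and the choice of two leading components as the "trend" group are illustrative assumptions, not necessarily the settings or the implementation used in this study.

```python
# Minimal sketch of basic SSA trend extraction. L = N // 4 follows the
# L/4–L/2 guidance discussed above; deciding how many leading components
# form the "trend" group is exactly the expert-judgement (or auto-detection)
# step the text refers to.
import numpy as np

def ssa_trend(x, L=None, n_components=2):
    x = np.asarray(x, dtype=float)
    N = x.size
    L = L or N // 4
    K = N - L + 1
    # Trajectory (Hankel) matrix and its SVD.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-n reconstruction, then anti-diagonal averaging back to length N.
    Xr = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
    trend = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        trend[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return trend / counts

rng = np.random.default_rng(1)
t = np.arange(480)                                   # e.g. 40 years of monthly values
x = 0.0005 * t**2 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)
print(ssa_trend(x)[:5])
```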
It is important that both the intent and the limitations of the research work
presented here are clearly understood. The process of creating a detailed synthetic
ocean water level data set, embedded with a fixed non-linear, non-stationary mean
sea level signal to test the utility of trend extraction methods is unique for sea
level research. Despite broad sensitivity testing designed herein, this work should
5 Conclusion
The monthly and annual average ocean water level data sets used to estimate mean sea level are, like any environmental or climatic time series data, ubiquitously "contaminated" by numerous complex dynamic processes operating across differing spatial and frequency scales, often with a very high noise-to-signal ratio. Whilst the
primary physical processes and their scale of influence are known generally [3],
not all processes in nature are fully understood and the quantitative attribution
of these associated influences will always have a degree of imprecision, despite
improvements in the sophistication of time series analyses methods [44]. In an
ideal world with all contributory factors implicitly known and accommodated, the
extraction of a trend signal would be straightforward.
In recent years, the controversy surrounding the conclusions of various published works, particularly concerning measured accelerations from long, individual ocean water level records, necessitates a more transparent, qualitative discussion around the utility of various analytical methods to isolate the mean sea level signal with
improved accuracy. The synthetic data set developed by Watson [3] was specifically
designed for long individual records, providing a robust and unique framework
within which to test a range of time series methods to augment sea level research.
The testing and analysis regime summarised in this paper is extensive, involving
1450 separate analyses across monthly and annual data sets of length 40, 80 and
160 years. In total, 29 million individual time series were analysed. From this work,
there are some broad general conclusions to be drawn concerning the extraction of
the mean sea level signal from individual ocean water level records with improved
temporal accuracy:
• Precision is enhanced by the use of the longer, annual average data sets;
• The analytic producing the optimal measured accuracy (Criteria A1 ) across all
length annual data sets was the simple 30-year moving average filter. However,
the outputted trend finishes half the width of the averaging filter inside either end
of the data record, providing no temporal understanding of the trend signal for
the most important part of the record – the recent history;
• The best general purpose analytic requiring minimum expert judgment and
parameterisation to optimise performance was multi-resolution decomposition
using MODWT and
Acknowledgements The computing resources required to facilitate this research have been
considerable. It would not have been possible to undertake the testing program without the
benefit of access to high performance cluster computing systems. In this regard, I am indebted
to John Zaitseff and Dr Zavier Barthelemy for facilitating access to the “Leonardi” and “Manning”
systems at the Faculty of Engineering, University of NSW and Water Research Laboratory, respec-
tively. Further, this component of the research has benefitted significantly from direct consultations
with some of the world’s leading time series experts and developers of method specific analysis
tools. Similarly, I would like to thank the following individuals whose contributions have helped
considerably to shape the final product and have ranged from providing specific and general expert
advice, to guidance and review (in alphabetical order): Daniel Bowman (Department of Geological
Sciences, University of North Carolina at Chapel Hill); Dr Eugene Brevdo (Research Department,
Google Inc, USA); Emeritus Professor Dudley Chelton (College of Earth, Ocean and Atmospheric
Sciences, Oregon State University, USA); Associate Professor Nina Golyandina (Department
of Statistical Modelling, Saint Petersburg State University, Russia); Professor Rob Hyndman
(Department of Econometrics and Business Statistics, Monash University, Australia); Professor
Donghoh Kim (Department of Applied Statistics, Sejong University, South Korea); Alexander
Shlemov (Department of Statistical Modelling, Saint Petersburg State University, Russia); Asso-
ciate Professor Anton Korobeynikov (Department of Statistical Modelling, Saint Petersburg State
University, Russia); Emeritus Professor Stephen Pollock (Department of Economics, University
of Leicester, UK); Dr Natalya Pya (Department of Mathematical Sciences, University of Bath,
UK); Dr Andrew Robinson (Department of Mathematics and Statistics, University of Melbourne,
Australia); and Professor Ashish Sharma (School of Civil and Environmental Engineering,
University of New South Wales, Australia).
References
1. McGranahan, G., Balk, D., Anderson, B.: The rising tide: assessing the risks of climate change
and human settlements in low elevation coastal zones. Environ. Urban. 19(1), 17–37 (2007)
2. Nicholls, R.J., Cazenave, A.: Sea-level rise and its impact on coastal zones. Science 328(5985),
1517–1520 (2010)
3. Watson, P.J.: Development of a unique synthetic data set to improve sea level research and
understanding. J. Coast. Res. 31(3), 758–770 (2015)
4. Baart, F., van Koningsveld, M., Stive, M.: Trends in sea-level trend analysis. J. Coast. Res.
28(2), 311–315 (2012)
5. Donoghue, J.F., Parkinson, R.W.: Discussion of: Houston, J.R. and Dean, R.G., 2011. Sea-
level acceleration based on U.S. tide gauges and extensions of previous global-gauge analyses.
J. Coast. Res. 27(3), 409–417; J. Coast. Res, 994–996 (2011)
6. Houston, J.R., Dean, R.G.: Sea-level acceleration based on U.S. tide gauges and extensions of
previous global-gauge analyses. J. Coast. Res. 27(3), 409–417 (2011)
7. Houston, J.R., Dean, R.G.: Reply to: Rahmstorf, S. and Vermeer, M., 2011. Discussion of:
Houston, J.R. and Dean, R.G., 2011. Sea-level acceleration based on U.S. tide gauges and
extensions of previous global-gauge analyses. J. Coast. Res. 27(3), 409–417; J. Coast. Res.,
788–790 (2011)
8. Rahmstorf, S., Vermeer, M.: Discussion of: Houston, J.R. and Dean, R.G., 2011. Sea-level
acceleration based on U.S. tide gauges and extensions of previous global-gauge analyses. J.
Coast. Res. 27(3), 409–417; J. Coast. Res., 784–787 (2011)
9. Watson, P.J.: Is there evidence yet of acceleration in mean sea level rise around Mainland
Australia. J. Coast. Res. 27(2), 368–377 (2011)
10. IPCC: Summary for policymakers. In: Stocker, T.F., Qin, D., Plattner, G.-K., Tignor, M., Allen,
S.K., Boschung, J., Nauels, A., Xia, Y., Bex, V., Midgley, P.M. (eds.) Climate Change 2013:
The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report
of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge
(2013)
11. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, E.H., Zheng, Q., Tung, C.C., Liu, H.H.:
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary
time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454(1971), 903–995
(1998)
12. Wu, Z., Huang, N.E.: Ensemble empirical mode decomposition: a noise-assisted data analysis
method. Adv. Adapt. Data Anal. 1(01), 1–41 (2009)
13. Broomhead, D.S., King, G.P.: Extracting qualitative dynamics from experimental data. Physica
D 20(2), 217–236 (1986)
14. Golyandina, N., Nekrutkin, V., Zhigljavsky, A.A.: Analysis of Time Series Structure: SSA and
Related Techniques. Chapman and Hall/CRC, Boca Raton, FL (2001)
15. Vautard, R., Ghil, M.: Singular spectrum analysis in nonlinear dynamics, with applications to
paleoclimatic time series. Physica D 35(3), 395–424 (1989)
16. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics
(SIAM), Philadelphia (1992)
17. Grossmann, A., Morlet, J.: Decomposition of Hardy functions into square integrable wavelets
of constant shape. SIAM J. Math. Anal. 15(4), 723–736 (1984)
18. Grossmann, A., Kronland-Martinet, R., Morlet, J.: Reading and understanding continuous
wavelet transforms. In: Combes, J.-M., Grossmann, A., Tchamitchian, P. (eds.) Wavelets,
pp. 2–20. Springer, Heidelberg (1989)
19. Tary, J.B., Herrera, R.H., Han, J., Baan, M.: Spectral estimation—What is new? What is next?
Rev. Geophys. 52, 723–749 (2014)
20. Han, J., van der Baan, M.: Empirical mode decomposition for seismic time-frequency analysis.
Geophysics 78(2), O9–O19 (2013)
21. Torres, M.E., Colominas, M.A., Schlotthauer, G., Flandrin, P.: A complete ensemble empirical
mode decomposition with adaptive noise. In: Proceedings of the IEEE International Confer-
ence on Acoustics, Speech and Signal (ICASSP), May 22–27, 2011, Prague Congress Center
Prague, Czech Republic, pp. 4144–4147 (2011)
22. Daubechies, I., Lu, J., Wu, H.T.: Synchrosqueezed wavelet transforms: an empirical mode
decomposition-like tool. Appl. Comput. Harmon. Anal. 30(2), 243–261 (2011)
23. Thakur, G., Brevdo, E., Fučkar, N.S., Wu, H.T.: The synchrosqueezing algorithm for time-
varying spectral analysis: robustness properties and new paleoclimate applications. Signal
Process. 93(5), 1079–1094 (2013)
24. Bindoff, N.L., Willebrand, J., Artale, V., Cazenave, A., Gregory, J., Gulev, S., Hanawa, K.,
Le Quéré, C., Levitus, S., Nojiri, Y., Shum, C.K., Talley, L.D., Unnikrishnan, A.: Observations:
oceanic climate change and sea level. In: Solomon, S., Qin, D., Manning, M., Chen, Z.,
Marquis, M., Averyt, K.B., Tignor, M., Miller, H.L. (eds.) Climate Change 2007: The
Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report
of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge
(2007)
25. Woodworth, P.L., White, N.J., Jevrejeva, S., Holgate, S.J., Church, J.A., Gehrels, W.R.:
Review—evidence for the accelerations of sea level on multi-decade and century timescales.
Int. J. Climatol. 29, 777–789 (2009)
26. Wood, S.: Generalized Additive Models: An Introduction with R. CRC, Boca Raton, FL (2006)
27. O’Sullivan, F.: A statistical perspective on ill-posed inverse problems. Stat. Sci. 1(4), 502–527
(1986)
28. O’Sullivan, F.: Fast computation of fully automated log-density and log-hazard estimators.
SIAM J. Sci. Stat. Comput. 9(2), 363–379 (1988)
29. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2),
89–102 (1996)
30. Zeileis, A., Grothendieck, G.: Zoo: S3 infrastructure for regular and irregular time series. J.
Stat. Softw. 14(6), 1–27 (2005)
31. Cleveland, R.B., Cleveland, W.S., McRae, J.E., Terpenning, I.: STL: a seasonal-trend decom-
position procedure based on loess. J. Off. Stat. 6(1), 3–73 (1990)
32. Durbin, J., Koopman, S.J.: Time Series Analysis by State Space Methods (No. 38). Oxford
University Press, Oxford (2012)
33. GRETL.: Gnu Regression, Econometrics and Time series Library (GRETL). http://www.gretl.
sourceforge.net/ (2014)
34. Golyandina, N., Korobeynikov, A.: Basic singular spectrum analysis and forecasting with R.
Comput. Stat. Data Anal. 71, 934–954 (2014)
35. Kim, D., Oh, H.S.: EMD: a package for empirical mode decomposition and Hilbert spectrum.
R J. 1(1), 40–46 (2009)
36. Kim, D., Kim, K.O., Oh, H.S.: Extending the scope of empirical mode decomposition by
smoothing. EURASIP J. Adv. Signal Process 2012(1), 1–17 (2012)
37. Bowman, D.C., Lees, J.M.: The Hilbert–Huang transform: a high resolution spectral method
for nonlinear and nonstationary time series. Seismol. Res. Lett. 84(6), 1074–1080 (2013)
38. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis (Cambridge Series in
Statistical and Probabilistic Mathematics). Cambridge University Press, Cambridge (2000)
39. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Commun. Pure Appl.
Math. 41(7), 909–996 (1988)
40. R Core Team.: R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
(2014)
41. Mandic, D.P., Rehman, N.U., Wu, Z., Huang, N.E.: Empirical mode decomposition-based
time-frequency analysis of multivariate signals: the power of adaptive data analysis. IEEE
Signal Process Mag. 30(6), 74–86 (2013)
42. Alexandrov, T., Golyandina, N.: Automatic extraction and forecast of time series cyclic
components within the framework of SSA. In: Proceedings of the 5th St. Petersburg Workshop
on Simulation, June, St. Petersburg, pp. 45–50 (2005)
43. Ghil, M., Allen, M.R., Dettinger, M.D., Ide, K., Kondrashov, D., Mann, M.E., Yiou, P.:
Advanced spectral methods for climatic time series. Rev. Geophys. 40(1), 3–1 (2002)
44. Moore, J.C., Grinsted, A., Jevrejeva, S.: New tools for analyzing time series relationships and
trends. Eos. Trans. AGU 86(24), 226–232 (2005)
45. Church, J.A., White, N.J., Coleman, R., Lambeck, K., Mitrovica, J.X.: Estimates of the regional
distribution of sea level rise over the 1950–2000 period. J. Clim. 17(13), 2609–2625 (2004)
46. Church, J.A., White, N.J.: A 20th century acceleration in global sea-level rise. Geophys. Res.
Lett. 33(1), L01602 (2006)
47. Church, J.A., White, N.J.: Sea-level rise from the late 19th to the early 21st century. Surv.
Geophys. 32(4-5), 585–602 (2011)
48. Domingues, C.M., Church, J.A., White, N.J., Gleckler, P.J., Wijffels, S.E., Barker, P.M., Dunn,
J.R.: Improved estimates of upper-ocean warming and multi-decadal sea-level rise. Nature
453(7198), 1090–1093 (2008)
49. Hendricks, J.R., Leben, R.R., Born, G.H., Koblinsky, C.J.: Empirical orthogonal function
analysis of global TOPEX/POSEIDON altimeter data and implications for detection of global
sea level rise. J. Geophys. Res. Oceans (1978–2012), 101(C6), 14131–14145 (1996)
50. Jevrejeva, S., Moore, J.C., Grinsted, A., Woodworth, P.L.: Recent global sea level acceleration
started over 200 years ago? Geophys. Res. Lett. 35(8), L08715 (2008)
51. Meyssignac, B., Becker, M., Llovel, W., Cazenave, A.: An assessment of two-dimensional past
sea level reconstructions over 1950–2009 based on tide-gauge data and different input sea level
grids. Surv. Geophys. 33(5), 945–972 (2012)
52. Torrence, C., Compo, G.P.: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc.
79(1), 61–78 (1998)
53. Percival, D.B., Mofjeld, H.O.: Analysis of subtidal coastal sea level fluctuations using wavelets.
J. Am. Stat. Assoc. 92(439), 868–880 (1997)
54. Hassani, H., Mahmoudvand, R., Zokaei, M.: Separability and window length in singular
spectrum analysis. Comptes Rendus Mathematique 349(17), 987–990 (2011)
Modellation and Forecast of Traffic Series
by a Stochastic Process
Abstract Traffic at a road point is counted by a device for more than 1 year, giving us time series. The obtained data have a trend with a clear seasonality. This allows us to split the series, taking each season (1 week or 1 day, depending on the series) as a separate path. We regard the resulting set of paths as a random sample of paths from a stochastic process. In this case seasonality is a common behaviour for every path, which allows us to estimate the parameters of the model. We use a Gompertz-lognormal diffusion process to model the paths. With this model we use a parametric function for short-time forecasts that improves on some classical methodologies.
1 Introduction
The challenge we address in this paper is to model and forecast the traffic flow at a road point. We work with four time series with different features, all of them available at the web page http://geneura.ugr.es/timeseries/itise2015, which offers the time series in CSV file format. They correspond to four monitoring devices installed in Granada (Spain) and include data between 14 November 2012 and 07 August 2013. These data are described in detail in [4].
Four different time intervals (24 h, 1 h, 30 min and 15 min) are presented:
– ITISE15-TS_A. Interval: 24 h Time Series, from 2012-11-26 to 2013-07-14. 231
values for training. Seven values for testing.
– ITISE15-TS_B. Interval: 1 h Time Series, from 2013-05-22 00:00 (W) to 2013-
08-07 00:00 (W). 1872 values for training. 24 values for testing.
– ITISE15-TS_C. Interval: 30 min Time Series, from 2012-11-24 00:00 (W) to
2013-07-21 00:00 (W). 11518 values for training. 48 values for testing.
– ITISE15-TS_D. Interval: 15 min Time Series, from 2012-11-14 00:00 (W) to
2013-02-24 00:00 (W). 9888 values for training. 96 values for testing.
forecast with both models. Then we calculate some standard error measurements:
mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared
error (MSE) and root mean squared error (RMSE).
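A minimal sketch of these four error measures, applied to hypothetical observed and forecast vectors over the test horizon only; it assumes the observed values are non-zero so that MAPE is defined.

```python
# Minimal sketch of the error measures used for comparison (MAE, MAPE, MSE, RMSE).
import numpy as np

def error_measures(observed, forecast):
    observed, forecast = np.asarray(observed, float), np.asarray(forecast, float)
    err = observed - forecast
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / observed))   # in per cent
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}

print(error_measures([1200, 1500, 900], [1100, 1600, 950]))
```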
Results show that Gompertz-lognormal diffusion process gives a better forecast
than autoregressive integrated moving average (ARIMA) model does.
The rest of the paper is organised as follows: Sect. 2 succinctly reviews the
state of the art related to time series. Section 3 describes the methodology applied.
Section 4 presents the results for the four temporal series and, finally, Sect. 5 itemises the conclusions.
2 State of the Art

There exists a large number of approaches for modelling temporal series. The most widely applied are ARIMA models.
ARIMA models were developed by Box and Jenkins [2] with the purpose of
searching the best fit of the values of a temporal series, to have accurate forecasts.
There are three primary stages in building an ARIMA model: seasonality identification, estimation and validation. Although ARIMA models are quite flexible (the most general model includes difference operators, autoregressive terms, moving average terms, seasonal difference operators, seasonal autoregressive terms and seasonal moving average terms), building good ARIMA models generally requires more experience than commonly used statistical methods.
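As a minimal, hedged illustration of this workflow (not the specific models fitted in this work), the following sketch fits an ARIMA(1, 1, 1) to a toy series with statsmodels and produces a one-week-ahead forecast; the order is an arbitrary illustrative choice, not the result of a full Box–Jenkins identification.

```python
# Minimal sketch of fitting an ARIMA model and producing short-term forecasts.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.5, 1.0, 300))        # a drifting toy series

model = ARIMA(y, order=(1, 1, 1))               # illustrative order only
fitted = model.fit()
print(fitted.forecast(steps=7))                 # 7-step-ahead point forecasts
```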
On the other hand, diffusion processes are useful for modelling time-dependent variables that increase, usually with an exponential or sigmoidal trend [1]. In these cases practitioners choose a model among a set of them, depending on the features of the observed paths, the kind of data and how interpretable the parameters are. Then, the next step is to estimate the parameters of the model and use a parametric function, such as the mode or mean function, to forecast future unknown values. Some papers focus their attention on other related questions, such as first-passage times [6] or additional information to add as an exogenous factor for the model [5, 7, 12]. The main advantage of using a stochastic diffusion process is that it is not necessary to make changes to the data in order to fit the parameters, because they are estimated by maximum likelihood.
Nevertheless, using stochastic processes is not the common way to deal with time series, so we need to carry out a careful preprocessing of the data. The next section describes the methodology we have followed.
3 Methodology
We have analysed the time series very cautiously in order to obtain the model
for the data. Firstly, we preprocess the data, doing a selection and removing
anomalous values. We describe this step in Sect. 3.1. With the resulting dataset,
The first step for preprocessing data is to split the original data series into trajectories
or paths. The path length is fixed depending on the type of the data series and the
data which we intend to forecast. By splitting the data series, we get data by weeks
(series A) or by days (series B, C and D).
In this case, the data series provides the traffic flow day by day, through some
months. The challenge is to forecast the traffic flow for the following week. We
have split the data series following a weekly pattern, so each week of the data series
is a path for the model.
Some of the paths include holidays, which disturb the general pattern. Because the week that we want to predict does not contain any holiday, we remove those paths before fitting the model.
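A minimal pandas sketch of this preprocessing step, under the assumption of a daily series and a placeholder holiday calendar (the dates shown are hypothetical, not the calendar used by the authors): the series is split into complete weekly paths and weeks containing a holiday are discarded.

```python
# Minimal sketch: split a daily traffic series into weekly paths and drop
# any week containing a holiday. Holiday dates below are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2012-11-26", "2013-07-14", freq="D")
traffic = pd.Series(rng.poisson(8000, idx.size), index=idx)

holidays = pd.to_datetime(["2012-12-25", "2013-01-01", "2013-05-01"])  # hypothetical

weekly_paths = []
for _, week in traffic.groupby(pd.Grouper(freq="W-SUN")):
    if len(week) == 7 and not week.index.isin(holidays).any():
        weekly_paths.append(week.to_numpy())

paths = np.vstack(weekly_paths)          # one row per retained weekly path
print(paths.shape)
```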
We use the rest of the paths for estimating the parameters for the process and we
forecast by using the conditional mean and the conditional mode functions of the
process. Our approach uses the conditional mean and the conditional mode to the
previous step of the path for forecasting, using always the previous path in time.
This approach does not work so well in some situations. For example, if we want to predict the traffic on Monday, the information from the previous day, Sunday, is not representative because it is a weekend day on which the behaviour is very different. The same happens when we try to predict the traffic on Saturday from the information on Friday.
The reason why the model does not work in these particular cases is that the diffusion process is a continuous-time model, whereas we are using it as a discrete-time model. If the discrete values have small time lapses between them, the model works properly. Conversely, if the discrete values are separated by a long time, the previous step provides no relevant information for the predictions.
To avoid this problem and to obtain a better set of discrete values, we consider a second approach. In this case we use a different model for each day of the week. We estimate seven models, one for each day of the week. Then, we use the conditional
mean and mode functions from the previous step of the path for forecasting. With
this approach we get better results.
Figures 5 and 6 show the multiple trajectories that we take into account for
modelling with Gompertz-lognormal diffusion process.
Fig. 5 Processed data of the observed values for ITISE15-TS_A time series for modelling with
Gompertz-lognormal diffusion process, first approach
Fig. 6 Processed data of the observed values for ITISE15-TS_A time series for modelling with
Gompertz-lognormal diffusion process, second approach
For these cases we split the temporal series by days, getting different paths. From
this set of paths we select only those ones that provide information related to our
forecast objective. For example, for one of the cases where the required forecast is
a holy day we only consider paths of holidays. In other series, we observe that the
sample paths drastically change its behaviour from a date on, so we consider that
a good model to forecast should use only new information since this date. Finally,
some paths show anomalous behaviour because the path contains either holy days
or erroneous data. We remove those paths from the study, taking into account that
the required forecast days are working days.
Considering the selected paths, we estimate the parameters of the model and use
conditional mean and mode functions. The conditional mean and mode functions
make forecasts based on the previous path in time. In this case, though the series
considered are discrete, the continuous model fits these discrete data properly.
Figures 7, 8 and 9 display multiple trajectories, obtained by splitting the time
series, for modelling the data with the Gompertz-lognormal diffusion process.
Fig. 7 Processed data of the observed values for ITISE15-TS_B time series for modelling with
Gompertz-lognormal diffusion process
Fig. 8 Processed data of the observed values for ITISE15-TS_C time series for modelling with
Gompertz-lognormal diffusion process
Fig. 9 Processed data of the observed values for ITISE15-TS_D time series for modelling with
Gompertz-lognormal diffusion process
For the first series, named ITISE15-TS_A, seasonality is weekly. In this case, as
we claimed previously, we use two strategies.
Firstly, we get the series of each week separately. In order to forecast the next
week day we use the information from the total set of previous data, removing any
anomalous data. The parameter estimation is unique for the set of paths. For each
day we want to predict, we use the information of the day immediately preceding.
Secondly, we estimate seven different models, one for each day of the week. For each one, we use a sample path containing all the information about that day, eliminating the anomalous cases. In order to make the predictions, we use the respective model and the information provided by the value of the series on the same day of the previous week.
For example, following the first strategy, the forecast of Monday is conditioned on the previous Sunday. However, following the second strategy, the forecast of Monday is conditioned on the previous Monday.
These two approaches are denoted by subscripts 1 and 2, respectively.
The rest of the series, named ITISE15-TS_B, ITISE15-TS_C and ITISE15-TS_D, have daily seasonality; the differences are only in the time interval between measurements. In
these cases we use multiple paths of day-length to estimate the parameters. The
forecast of new values is conditioned to the previous path in each case.
In order to compare the obtained results with classical techniques, we have also fitted an ARIMA model to the same series. For this analysis we did not split the temporal series and we did not preprocess the data (holidays and erroneous data were not removed).
4 Results
We summarise the forecast values jointly with the real observed values in Figs. 10, 11, 12 and 13. They show that both the conditional mode and the conditional mean functions work better than the ARIMA predictions. As we expected, the conditional mode function works better than the conditional mean function. This is an expectable result because the conditional mode function gives the most probable values whereas the conditional mean function gives average values.
Note that lognormal variables do not have a symmetric probability density function. The right asymmetry implies that the mean takes a higher value than the mode. Since for each fixed time t the variable X(t) is lognormally distributed, it is to be expected that the mean function exceeds the mode function.
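This can be made explicit with the standard lognormal moments (a general fact about the lognormal distribution, not specific to the fitted process): if $\log X(t) \sim N(\mu_t, \sigma_t^2)$, then

$$E[X(t)] = e^{\mu_t + \sigma_t^2/2}, \qquad \operatorname{Mode}[X(t)] = e^{\mu_t - \sigma_t^2}, \qquad \frac{E[X(t)]}{\operatorname{Mode}[X(t)]} = e^{3\sigma_t^2/2} > 1,$$

so the mean function always exceeds the mode function, and the gap widens as the variance of the log-process grows.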
Tables 1, 2, 3 and 4 show the error measurements for all series, using the
conditional mode function and the conditional mean function of the Gompertz-
Fig. 10 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the seven test values of ITISE15-TS_A series
Fig. 11 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 24 test values of ITISE15-TS_B series
Fig. 12 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 48 test values of ITISE15-TS_C series
lognormal diffusion process, and using ARIMA model. These errors are calculated
only for the forecast time.
Note that Table 1 summarises results for two different predictions with the Gompertz-lognormal diffusion process and for ARIMA. Firstly, with subscript 1, there are prediction errors for the week, forecasting each value for the day using the
Fig. 13 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 96 test values of ITISE15-TS_D series
Table 1 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_A
MAE MAPE (%) MSE RMSE
Cond. mean1 2261.51 17.17 12264729.06 3502.10
Cond. mode1 2297.23 19.23 9732040.16 3119.62
Cond. mean2 464.86 13.81 338310.52 581.64
Cond. mode2 419.16 14.7 261344.34 511.22
ARIMA model 1227.06 7.67 2017643.49 1420.44
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold
Table 2 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_B
MAE MAPE (%) MSE RMSE
Cond. mean 160.03 30.30 37618.16 193.95
Cond. mode 46.40 14.79 3935.26 62.73
ARIMA model 291.12 223.28 113569.38 337.00
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold
information about the previous day. With subscript 2, we predict each value using
the information about previous week.
Table 3 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_C
MAE MAPE (%) MSE RMSE
Cond. mean 67.88 31.71 7272.40 85.28
Cond. mode 34.72 16.68 2665.53 51.63
ARIMA model 226.10 193.55 68117.50 260.99
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold
Table 4 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_D
MAE MAPE (%) MSE RMSE
Cond. mean 33.54 21.51 2300.81 47.97
Cond. mode 19.85 31.10 950.31 30.83
ARIMA model 85.15 70.87 11674.84 108.05
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold
5 Conclusions
As we can appreciate, after removing seasonality, the use of a diffusion process is helpful in temporal series prediction problems, when the series are continuous enough. Occasionally, we need to select the paths from the temporal series; that allows us to work with the data and the diffusion process. In general, fitting the model to the series allows better predictions, with lower error than the classic ARIMA model.
Moreover, excepting MAPE in Tables 1 and 4, the rest of the measurements point out that the conditional mode function of the Gompertz-lognormal diffusion process provides better predictions for all the temporal series. This can be due to the fact that the finite-dimensional distributions of the process are lognormal, and the lognormal is an asymmetric distribution. The asymmetry implies that the mode function predicts lower values than the mean function.
The main problem that this treatment of the series presents is the required preprocessing work. Nevertheless, this is easily automatable because it is based on obvious and simple procedures.
Other methods to obtain predictions, like logistic regression, multilayer perceptrons and support vector machines, have been used [4]. We can see that their error measurements are not better than those obtained using the Gompertz-lognormal diffusion process together with the preprocessing of the data. All error measurements are lower when we use the Gompertz-lognormal diffusion process, except MAPE in Table 1.
Acknowledgements This work has been supported in part by FQM147 (Junta de Andalucía),
SIPESCA (Programa Operativo FEDER de Andalucía 2007–2013), TIN2011-28627-C04-02 and
TIN2014-56494-C4-3-P (Spanish Ministry of Economy and Competitivity), SPIP2014-01437
(Dirección General de Tráfico), PRY142/14 (Este proyecto ha sido financiado íntegramente por la
Fundación Pública Andaluza Centro de Estudios Andaluces en la IX Convocatoria de Proyectos de
Investigación) and PYR-2014-17 GENIL project (CEI-BIOTIC Granada). Thanks to the Granada
Council, Concejalía de protección ciudadana y movilidad.
References
1. Baudoin, F.: Diffusion Processes and Stochastic Calculus. EMS Textbooks in Mathematics
Series, vol. 16. European Mathematical Society, Zurich (2014)
2. Box, G., Jenkins, G.: Some comments on a paper by Chatfield and Prothero and on a review
by Kendall. J. R. Stat. Soc. 136(3), 337–352 (1973)
3. Box, G., Jenkins, G.: Time Series Analysis: Forecasting and Control. Holden Day, San
Francisco (1976)
4. Castillo, P.A., Fernandez-Ares, A.J., G-Arenas, M., Mora, A., Rivas, V., Garcia-Sanchez, P.,
Romero, G., Garcia-Fernandez, P., Merelo, J.J.: SIPESCA-competition: a set of real data time
series benchmark for traffic prediction. In: International Work-Conference on Time Series
Analysis, Proceedings ITISE 2015, pp. 785–796 (2015)
5. García-Pajares, R., Benitez, J., Palmero, S.G.: Feature selection for time series forecasting:
a case study. In: Proceedings of 8th International Conference on Hybrid Intelligent Systems,
pp. 555–560 (2008)
6. Gutiérrez, R., Román, P., Torres, F.: Inference and first-passage-times for the lognormal
diffusion process with exogenous factors: application to modelling in economics. Appl. Stoch.
Models Bus. Ind. 15, 325–332 (1999)
7. Gutiérrez, R., Román, P., Romero, D., Torres, F.: Forecasting for the univariate lognormal
diffusion process with exogenous factors. Cybern. Syst. 34(8), 709–724 (2003)
8. Gutiérrez, R., Román, P., Romero, D., Serrano, J.J., Torres, F.: A new Gompertz-type diffusion
process with application to random growth. Math. Biosci. 208, 147–165 (2007)
9. Rico, N., Román, P., Romero, D., Torres, F.: Gompertz-lognormal diffusion process for
modelling the accumulated nutrients dissolved in the growth of Capiscum annuum. In:
Proceedings of the 20th Annual Conference of the International Environmetrics Society, vol. 1,
p. 90 (2009)
10. Rico, N., G-Arenas, M., Romero, D., Crespo, J.M., Castillo, P., Merelo, J.J.: Comparing
optimization methods, in continuous space, for modelling with a diffusion process. In:
Advances in Computational Intelligence. Lecture Notes in Computer Science, vol. 9095,
pp. 380–390. Springer, Heidelberg (2015)
11. Romero, D., Rico, N., G-Arenas, M.: A new diffusion process to epidemic data. In: Computer
Aided Systems Theory - EUROCAST 2013. Lecture Notes in Computer Science, vol. 8111,
pp. 69–76. Springer, Heidelberg (2013)
12. Sa’ad, S.: Improved technical efficiency and exogenous factors in transportation demand for
energy: an application of structural time series analysis to South Korean data. Energy 35(7),
2745–2751 (2010)
13. Shumway, R.H., Stoffer, D.S.: ARIMA models. In: Time Series Analysis and Its Applications.
Springer Texts in Statistics. Springer, New York (2011)
Spatio-Temporal Modeling for fMRI Data
Abstract Functional magnetic resonance imaging (fMRI) uses fast MRI techniques
to enable studies of dynamic physiological processes at a time scale of seconds.
This can be used for spatially localizing dynamic processes in the brain, such as
neuronal activity. However, to achieve this we need to be able to infer on models of
four-dimensional data. Predominantly, for statistical and computational simplicity,
analysis of fMRI data is performed in two stages. Firstly, the purely temporal
nature of the fMRI data is modeled at each voxel independently, before considering
spatial modeling on summary statistics from the purely temporal analysis. Clearly,
it would be preferable to incorporate the spatial and temporal modeling into one
all encompassing model. This would allow for correct propagation of uncertainty
between temporal and spatial model parameters. In this paper, the strengths and the
weaknesses of currently available methods will be discussed based on hemodynamic response function (HRF) signal modeling and spatio-temporal noise modeling. Specific
application to a medical study will also be described.
1 Introduction
from fMRI scanner has been shown to be closely linked to neural activity [11].
Through a process called the hemodynamic response, blood releases oxygen to
active neurons at a greater rate than to inactive ones. The difference in magnetic
susceptibility between oxyhemoglobin and deoxyhemoglobin, and thus oxygenated
or deoxygenated blood, leads to magnetic signal variation which can be detected
using an MRI scanner. The relationship between the experimental stimulus and
the BOLD signal involves the hemodynamic response function (HRF). For the
simplicity of illustration, we denote the relationship by
Several pioneer papers have used experiments to confirm that the functional form
of F is approximately linear based on some typical STIMULUS such as auditory
or finger tapping. Furthermore, it has been reported that the fMRI time series is
hampered by the hemodynamic distortion. These effects result from the fact that
the fMRI signal is only a secondary consequence of the neuronal activity. See, for
example, Glover [8]. These observations have simplified the above model greatly
leading to the following popular model adopted by many fMRI studies:
paradigm allows the HRF to return to baseline or recover after every trial. Most of
the approaches are time domain based because of the spiking nature of the stimuli.
In this paper, we introduce a Fourier method based on the transfer function
estimate (TFE). This is a frequency domain nonparametric method for extracting
the HRF from any kind of experimental designs, either block or event-related.
This approach also allows TFE to detect the activation through test of hypotheses.
Moreover, it can be further developed to validate the linearity assumption. This
is very important as reported in some of the earlier papers, the BOLD response
is an approximate linear function of the stimuli and the HRF, and the linearity
assumption may not hold throughout the brain. These desirable features of TFE will
be demonstrated using simulation and real data analysis. More specifically,
– We first extend Bai et al.'s [1] method to a multivariate form to estimate multiple HRFs simultaneously using ordinary least squares (OLS). We also verify and report the consistency and asymptotic normality of the OLS estimator.
– TFE detects the brain activation while estimating HRF by providing the F map
and also tests the linearity assumption inherited from the convolution model.
– TFE is able to compare the difference among multiple HRFs in the experiment
design.
– TFE adapts to all kinds of experiment designs, and it does not depend on the
pre-specified HRF length support.
The present paper is based on the first author's Ph.D. dissertation [5], in which the methodology and its statistical sampling properties are described in more detail. The current approach is based on the OLS method; a weighted least squares (WLS) version for estimating the HRF has also been implemented, and a more detailed discussion of these approaches to fMRI can be found therein.
2 Method
In this section, we will outline how we developed TFE in order to make it feasible to estimate the HRF under a multi-stimulus experimental design.
Using multiple stimuli in one experimental session is in high demand, as it obtains a stronger response signal compared to a single-stimulus design. From the viewpoint of the brain's biological processes, the advantage of multiple stimuli is to avoid the refractory effect, which causes nonlinearity in the response over time [6]. Subjects easily become bored if only one stimulus is shown repeatedly in a session. From the viewpoint of experimental design, multiple stimuli are helpful for making an efficient design within the limited time that the scanner provides. Thus the change of stimulus over successive trials is beneficial for experimental design.
2.1 Model
The convolution model for the observed BOLD series is

$$y(t) = \sum_{i=1}^{n} x_i \otimes h_i(t) + z(t), \qquad t = 0, 1, \ldots, T-1, \qquad (3)$$

where $x_i(t)$ represents the $i$th stimulus function, $h_i(t)$ is the corresponding HRF, and $\otimes$ is the binary convolution operator defined by $x \otimes h(t) = \sum_u x(u)\,h(t-u)$. We assume that the error or noise series, $z(t)$, is stationary with zero mean and power spectrum $s_{zz}(r)$, where $r$ is the radian frequency. Here $T$ is the duration of the experimental trial.

To write the convolution model (3) in a matrix form, let $x(t)$ be an $n$ vector-valued series, i.e., $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))'$. Suppose that $h(u)$ is a $1 \times n$ filter given by $h(u) = (h_1(u), h_2(u), \ldots, h_n(u))$. The BOLD model (3) to be considered in this paper is

$$y(t) = \sum_{u} h(u)\,x(t-u) + z(t). \qquad (4)$$

We further assume that the HRF $h(u)$ is 0 when $u < 0$ or $u > d$, where $d$ is the length of HRF latency determined by the underlying neural activity.

Let $H(\cdot)$ denote the finite Fourier transform (FFT) of $h(\cdot)$ given by $H(r) = \sum_t h(t)\exp(-irt)$. Define similarly

$$Y(r) \equiv Y^{(T)}(r) = \sum_{t=0}^{T-1} y(t)\exp(-irt), \qquad X(r) \equiv X^{(T)}(r) = \sum_{t=0}^{T-1} x(t)\exp(-irt),$$

$$Z(r) \equiv Z^{(T)}(r) = \sum_{t=0}^{T-1} z(t)\exp(-irt), \qquad r \in \mathbb{R}.$$

Then, near a frequency of interest $2\pi K/T$,

$$Y\!\left(\tfrac{2\pi(K+k)}{T}\right) = H(r)\,X\!\left(\tfrac{2\pi(K+k)}{T}\right) + Z\!\left(\tfrac{2\pi(K+k)}{T}\right), \qquad k = 0, \pm 1, \ldots, \pm m. \qquad (5)$$

Here $m$ is an appropriate integer to be specified, which is related to the degree of smoothing of the Fourier transform in estimating the spectral density function.
Relation (5) is seen to have the form of a multiple regression relation involving complex-valued variates provided $H(r)$ is smooth. In fact, according to [3], an efficient estimator is given by

$$\hat{H}(r) = \hat{s}_{yx}(r)\,\hat{s}_{xx}(r)^{-1}, \qquad r \in \mathbb{R}, \qquad (6)$$

and the HRF estimate is then recovered by inverting the Fourier transform:

$$\hat{h}(u) = \frac{1}{T}\sum_{t=0}^{T-1} \hat{H}\!\left(\frac{2\pi t}{T}\right)\exp\!\left(\frac{i\,2\pi t u}{T}\right), \qquad u = 0, 1, \ldots, d. \qquad (7)$$

It has been shown that $\hat{h}(u)$ has attractive sampling properties: it is an asymptotically consistent and efficient estimate of $h(u)$.
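A minimal single-stimulus (n = 1) sketch of this frequency-domain idea, using Welch-smoothed spectra from scipy: the transfer function is estimated as the ratio of the cross-spectrum to the input spectrum, and an inverse FFT returns to the time domain. It is an illustrative simplification of the multivariate estimator above; the toy stimulus, HRF and segment length are assumptions for the example only.

```python
# Minimal sketch (n = 1): H(f) ≈ s_yx(f) / s_xx(f) from Welch-smoothed
# spectra, then an inverse FFT to recover the impulse response (the "HRF").
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
T = 2048
x = (rng.random(T) < 0.05).astype(float)          # a sparse event-related stimulus
true_h = np.exp(-np.arange(16) / 4.0)             # a toy "HRF", length d = 16
y = np.convolve(x, true_h)[:T] + rng.normal(0, 0.3, T)

nperseg = 256
_, s_xx = signal.welch(x, nperseg=nperseg)        # smoothed input spectrum
_, s_yx = signal.csd(x, y, nperseg=nperseg)       # smoothed cross-spectrum
H = s_yx / s_xx                                   # transfer function estimate
h_hat = np.fft.irfft(H, n=nperseg)[:16]           # back to the time domain

print(np.round(h_hat, 2))
print(np.round(true_h, 2))
```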
3 Hypothesis Testing
After introducing the TFE method, this section presents the multivariate tests for fMRI analysis.
First, we introduce two key concepts [3] here to support the hypothesis testing.
Coherence is an important statistic that provides a measure of the strength of a linear time-invariant relation between the series $y(t)$ and the series $x(t)$; that is, it indicates whether there is a strongly linear relationship between the BOLD response and the stimuli. From a statistical point of view, we can test the linear time-invariant assumption of the convolution model; for fMRI exploration, we can choose the voxels with significantly large coherence, where the BOLD series have a functional response to the stimulus, and then estimate the HRF in those voxels.
Coherence is defined as

$$|R_{yx}(r)|^2 = s_{yx}(r)\,s_{xx}(r)^{-1}\,s_{xy}(r)\,/\,s_{yy}(r). \qquad (8)$$

The partial cross-spectrum of $y(t)$ and $x_i(t)$ after removing the linear effects of $x_j(t)$ is

$$s_{yx_i \cdot x_j}(r) = s_{yx_i}(r) - s_{yx_j}(r)\,s_{x_j x_j}(r)^{-1}\,s_{x_j x_i}(r). \qquad (9)$$
Usually the case of interest is the relationship between the response and a single stimulus after the other stimuli are accounted for; that is, $x_i$ is the single stimulus of interest, and $x_j$ denotes the other stimuli involved in the design paradigm.
The partial coherence of $y(t)$ and $x_i(t)$ after removing the linear effects of $x_j(t)$ is defined analogously from these partial spectra. If $n = 2$, that is, if there are two kinds of stimulus in the experiment, it can be written as

$$|R_{yx_i \cdot x_j}(r)|^2 = \frac{\left|R_{yx_i}(r) - R_{yx_j}(r)\,R_{x_j x_i}(r)\right|^2}{\left[1 - |R_{yx_j}(r)|^2\right]\left[1 - |R_{x_j x_i}(r)|^2\right]}. \qquad (11)$$
The linearity assumption functions as the essential basis of the convolution model.
As we know, any nonlinearity in the fMRI data may be caused by the scanner system or by human physiology, such as refractory effects. Refractory effects refer to the reductions in hemodynamic amplitude after several stimuli are presented. If refractory
effects are present, then a linear model will overestimate the hemodynamic response
to closely spaced stimuli, potentially reducing the effectiveness of experimental
analyses. It is critical, therefore, to consider the evidence for and against the linearity
of the fMRI hemodynamic response.
It is possible that the nonlinearity is overwhelmed during scanning. Conse-
quently, it is crucial to make sure that the linearity assumption is acceptable.
The advantage of our method is that we can first determine whether the linearity
assumption is acceptable before using the convolution model for analysis.
The value of coherence, between 0 and 1, reflects the strength of the linear relation between the fMRI response and the stimuli. Under certain conditions, $\hat{R}_{yx}(r)$ is asymptotically normal with mean $R_{yx}(r)$ and variance proportional to $(1 - R_{yx}^2(r))/(Tb)$. Moreover, if $R_{yx} = 0$, then

$$F(r) = \frac{(c - n)\,|\hat{R}_{yx}(r)|^2}{n\,\bigl(1 - |\hat{R}_{yx}(r)|^2\bigr)} \sim F_{2n,\,2(c-n)}, \qquad (12)$$

where $c = bT/\kappa$ and $\kappa$ is a constant (an integral involving the squared lag-window generator) depending on the choice of window function. If the F statistic on coherence is significant, it is reasonable to accept the linearity assumption.
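A minimal sketch of such a coherence-based F test at the stimulus-related frequency. The effective degrees-of-freedom constant c is approximated here by the number of averaged Welch segments, an assumption standing in for the bT/κ constant of (12); n = 1 (single stimulus).

```python
# Minimal sketch of an F-type test on estimated coherence at the
# stimulus-related frequency (illustrative constants, not the chapter's).
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(0)
T, nperseg = 2048, 256
x = np.sin(2 * np.pi * np.arange(T) / 20.0)            # stimulus at frequency 1/20
y = 0.8 * x + rng.normal(0, 1.0, T)                    # noisy linear response

f, Cxy = signal.coherence(x, y, nperseg=nperseg)       # magnitude-squared coherence
k = np.argmin(np.abs(f - 1 / 20))                      # bin nearest the task frequency

n = 1
c = T // (nperseg // 2) - 1                            # approx. number of 50%-overlap segments
F = (c - n) * Cxy[k] / (n * (1 - Cxy[k]))
p = stats.f.sf(F, 2 * n, 2 * (c - n))
print(f"coherence={Cxy[k]:.3f}, F={F:.1f}, p={p:.2g}")
```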
For each brain area, stimuli have varying effects. For the motor cortex in the left hemisphere, right-hand motion causes much more neural activity than left-hand motion. Partial coherence is able to distinguish between right- and left-hand effects, determine whether left-hand motion evokes neural activity, and identify which motion has the greater effect. The following test is applied for these kinds of research questions.
For partial coherence, if $R_{yx_i \cdot x_j}(r) = 0$, an analogous F statistic, given in (13), applies.
The HRF in fMRI indicates the underlying neural activity. If there is activation evoked by the stimulus, then the corresponding HRF cannot be ignored. If there is no HRF in a brain region, there is no ongoing neuronal activity. To detect activation in a brain region is therefore to see whether there is an underlying HRF. For our frequency method, we test $H(r_0) = 0$ at the stimulus-related frequency $r_0$.
We are interested in testing the hypothesis $H(r) = 0$. This is carried out by means of analogs of the statistic (8). In the case $H(r) = 0$, the test statistic is

$$\frac{(bT/\kappa)\,\hat{H}(r)\,\hat{s}_{xx}(r)\,\hat{H}(r)^{*}}{n\,\hat{s}_{zz}(r)}, \qquad (14)$$

where

$$\Sigma = \begin{cases} (bT/\kappa)^{-1}\,\hat{s}_{xx}(r)^{-1}, & r \neq 0 \bmod \pi,\\ (bT/\kappa - 1)^{-1}\,\hat{s}_{xx}(r)^{-1}, & r = 0 \bmod \pi, \end{cases}$$

and $N_n^{C}(\mu, \Sigma)$ is the complex multivariate normal distribution for the $n$ vector-valued random variable [3].

The contrast between different HRF estimates can be represented by $c\,\hat{H}(r)'$, where $c = (c_1, c_2, \ldots, c_n)$ satisfies $\sum_{i=1}^{n} c_i = 0$. For the complex number $c\,\hat{H}(r)'$, the hypothesis testing can be conducted by means of its asymptotic distribution

$$N\!\left(0,\; \tfrac{1}{2}\,s_{zz}(r)\,c_v\,\Sigma_v\,c_v'\right). \qquad (16)$$

Thus, the t statistic for the contrast between different HRF estimates is

$$\frac{c_v\,\hat{H}_v(r)'}{\sqrt{\dfrac{2(bT/\kappa - n)}{bT/\kappa}\;\hat{s}_{zz}(r)\,c_v\,\Sigma_v^{-1}\,c_v'}} \;\sim\; t_{2(bT/\kappa - n)}, \qquad r \neq 0 \bmod \pi. \qquad (18)$$
The contrast is highly utilized in fMRI to point out the discrepancy of responses under different conditions. In the fMRI software packages SPM and FSL, we first need to specify "conditions," which are analogous to the types of stimuli here. For example, if we have two types of stimuli from the right and left hands, there are two conditions: Right and Left. Then we need to set up the contrast of conditions according to our interest. For testing whether the right hand has a greater effect than the left hand, the contrast should be Right > Left, equivalent to Right − Left > 0. So we state the contrast in a vector $c = (1, -1)$, 1 for the condition Right and −1 for the condition Left. After setting the contrast, SPM and FSL will continue with their general linear model, using the parameters to construct the t statistic.
The hypothesis of comparing HRF similarity here is equivalent to the contrasts in SPM and FSL. We have two types of stimuli, Right and Left, and thus respective HRF estimates for Right and Left. To test whether Right > Left, we specify $c = (1, -1)$ in $c\,\hat{H}(r)'$. As a result, the t statistic in (18) is used for testing their difference.
3.6 Remarks
We will make a few remarks or comments on our method in contrast with the
existing methods.
1. Some popular packages [7, 9] estimate the activation based on (3) and (2) using a general linear model (GLM) procedure. The model considered in SPM and FSL
is given by
$$t = \frac{\hat{\beta}}{\mathrm{se}(\hat{\beta})}$$
will further need to verify the error auto-covariance structure for this step, which
can be inefficient for the whole brain.
4. A key issue to be addressed in using the TFE method is the smoothing parameter
in estimating various auto- and cross-spectra. Some data-adaptive procedures
such as cross-validation have been investigated in [5] and the HRF shape has
made such a choice less challenging.
5. When the voxel time series is non-stationary, both time and frequency domain methods will not work that well. In that situation, one can invoke the short-time (or windowed) Fourier transform method (a special case of wavelet analysis) to estimate the HRF; this will be carried out in another paper.
6. An extensive review is given in one of the chapters of Chen [5]. Here the main
objective is to highlight some important contributions to the problem of HRF
estimation using Fourier transform. This type of deconvolution or finite impulse
response (FIR) problems has been known to the area of signal processing.
The various forms of the test statistics (12), (13), however, have not been
appropriately addressed. Hence the aim of the current paper is to illustrate the
usefulness of the statistical inference tools offered by this important frequency
domain method.
4 Simulation
The simulation study was based on a multiple stimuli experiment design and a
simulated brain. The experiment in this section included two types of stimuli, called
left and right. The simulated brain had 8 × 8 voxels and was designed to have
various brain functions under the left and right experiment design. The brain was
divided into four regions: one responded only to left, one responded only to right,
one responded to both left and right, and the remaining one had no response in the
experiment.
The fMRI data was simulated based on the convolution model (3): the convolution
of the pre-specified left HRF h1(·) and right HRF h2(·) with the known
experiment paradigm for the left stimulus x1(t) and the right stimulus x2(t). The response
was given by the sum of these convolutions plus additive noise. The ARMA noise model
was chosen to test the strength of our method under other types of correlated structures,
and the coefficients were selected to illustrate the performance of the procedure under
moderate, serially correlated noise.
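To make this setup concrete, the following sketch generates one simulated voxel time series under the convolution model. It is not the authors' code: the double-gamma HRF is an illustrative stand-in for Glover's HRF, and the 40-s stimulus spacing, amplitudes, and ARMA(1,1) coefficients are assumptions chosen for illustration only.

```python
import math
import numpy as np

def double_gamma_hrf(t):
    # Illustrative double-gamma HRF standing in for Glover's HRF (an assumption):
    # a gamma(6) density minus a scaled gamma(16) density, peaking near 5 s.
    t = np.asarray(t, dtype=float)
    g1 = t**5 * np.exp(-t) / math.factorial(5)
    g2 = t**15 * np.exp(-t) / math.factorial(15)
    return g1 - g2 / 6.0

def simulate_voxel(n_sec=400, dt=1.0, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(0, n_sec, dt)
    # Left and right stimuli alternate: each recurs every 40 s (frequency 1/40),
    # so the overall task-related frequency is 1/20, as described in the text.
    x_left, x_right = np.zeros_like(t), np.zeros_like(t)
    x_left[::40] = 1.0      # left onsets at 0, 40, 80, ... s
    x_right[20::40] = 1.0   # right onsets at 20, 60, 100, ... s
    h = double_gamma_hrf(np.arange(0.0, 30.0, dt))
    signal = np.convolve(x_left, h)[: len(t)] + np.convolve(x_right, h)[: len(t)]
    # Serially correlated ARMA(1,1) noise with illustrative coefficients.
    e = rng.normal(scale=0.2, size=len(t))
    noise = np.zeros(len(t))
    for k in range(1, len(t)):
        noise[k] = 0.5 * noise[k - 1] + e[k] + 0.3 * e[k - 1]
    return t, signal + noise

t, y = simulate_voxel()
```

Halving the amplitude used for the right stimulus would reproduce the setting of the second simulation described below, where the right HRF is half of the left one.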
The illustrated experiment paradigm and the brain map are shown in Fig. 1. This
was an event-related design with left and right interchanging, where the green stands
for the left, and the purple stands for the right. The left and right stimuli came
periodically. The stimulus-related frequency for each was 1/40, and the overall
task-related frequency for the experiment was 1/20. The brain map shows the brain
function in each region (Fig. 1b).
The first simulation was to detect the activation regions in the brain. We assumed
both of the original HRFs for left and right were Glover’s HRF. At the experiment
frequency 1/20, the coherence and F statistic map are shown in Fig. 2. The lighter
the color is, the higher the values are. The high value of coherence in the responsive
region implies a strong linear relation in the simulation. Also, the F statistic
represents the strength of activation in the voxel. As expected, there were three
Fig. 1 The simulated brain map in the simulation. (a) shows the experiment design of the
simulation with two kinds of stimuli, which are finger tapping on the right (shown in purple) and
on the left (shown in green). (b) is the simulated brain map: the purple region only responds to
the right-hand stimulus; the green region only responds to the left-hand stimulus; the brown region
responds to both left and right; and the white region has only noise. Originally published in Chen
(2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved
Fig. 2 Detecting the activation regions by TFE. The activation region is where the brain has
response to the experiment stimulus. (a) shows that the true HRFs for both left and right are the
same. (b) shows the coherence value obtained in voxels (the red color means high intensity, and the
yellow indicates low intensity). (c) shows the corresponding F statistic, called F map. As shown
in (d), both right and left activated regions (marked in red) are detected. Originally published in
Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved
Fig. 3 Hypothesis testing with two identical HRFs in the simulated brain. (a) shows Glover's
HRF for both left and right. (b) shows the overall t statistic over the brain map, where red
color means high positive values, green color means negative values, and yellow means near 0.
(c) shows the rejection region for the test left > right; (d) shows the rejection region for
left < right; (e) shows the rejection region for left = right. Originally published in Chen (2012).
Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved
activation regions: Left (L), Right (R), and Left and Right (L&R), selected
at the α = 0.01 level.
At the stimulus-related frequency 1/40, we compared the similarity of the left
and right HRFs. The true left and right HRFs are the same Glover's HRF. The HRF
discrepancy regions are only two: Left (L) and Right (R), where we regarded
no response as a zero HRF. The simulation result is displayed in Fig. 3. The rejection
region for L>R is the region L; the rejection region for L<R is the region R; the
rejection region for L≠R is L&R at level α = 0.05. As we can see, if the voxel
has the same response to different stimuli, the result shows that there is no
difference in the HRFs.
Fig. 4 Detecting the activation regions by TFE with non-identical HRFs. The activation region is
where the brain has response to the experiment stimulus. (a) shows the true HRFs for both left
(green) and right (purple). (b) shows the coherence obtained in voxels (the red color means high
intensity, and the yellow indicates low intensity). (c) shows the corresponding F statistic, F map.
As shown in (d), both right and left activated regions (marked in red) are detected. Originally
published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights
Reserved
Fig. 5 Hypothesis testing with two non-identical HRFs in the simulated brain. (a) shows
Glover's HRF for left (green) and half of Glover's HRF for right (purple). (b) shows the overall
t statistic over the brain map, where red color means high positive values, green color means
negative values, and yellow means near 0. (c) shows the rejection region for the test left > right;
as the left HRF has a much higher amplitude than the right one, the rejection region for the test
left > right is the two regions that respond to the left-hand stimulus. (d) shows the rejection region
for right > left; (e) shows the rejection region for left = right. Originally published in Chen (2012).
Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved
The second simulation was built on different HRFs. The left HRF kept Glover’s
HRF, and the right HRF reduced Glover’s HRF to half. As we can see, the left and
right HRFs had different amplitudes. The left was larger than the right one. At the
experiment frequency 1/20, the activation region is accurately spotted in Fig. 4.
At the individual-stimulus-related frequency 1/40, the difference between left
and right HRF was detected, as shown in Fig. 5. The rejection region for L>R
contains the regions that respond to both L and R. The hypothesis testing of similar
HRFs clearly separated different HRFs.
In order to test whether the method of Bai et al. [1] is applicable to real data
and detect fMRI activation, we applied the nonparametric method to the published
auditory data set on the Statistical Parametric Mapping website (http://www.fil.ion.
ucl.ac.uk/spm/data/auditory/). According to the information listed on the website,
these whole brain BOLD/EPI images were acquired on a modified 2T Siemens
MAGNETOM Vision system. Each acquisition consisted of 64 contiguous slices
(64 × 64 × 64; 3 mm × 3 mm × 3 mm voxels). Acquisition took 6.05 s, with the scan-
to-scan repeat time (TR) set arbitrarily to 7 s. During the experiment 96 acquisitions
were made in blocks of 6, resulting in 16 blocks of 42 s each. The blocks alternated
between rest and the auditory stimulation. We included eight trials in our dataset,
with the first six images acquired in the first run discarded due to T1 effects. The data
was preprocessed using SPM5, and included realignment, slice timing correction,
coregistration, and spatial smoothing.
Figure 6a shows the time course data from the one voxel that had the greatest F
value in Eq. (14). The voxel time series depicted in Fig. 6a has been detrended [10]
because the trends may result in false-positive activations if they are not accounted
for in the model. Since the voxel has a high F value, its time series has good
relationship to the task, similar to the pattern we obtained in our second simulation.
Figure 6b shows several HRF estimates from the 12 voxels with the highest F-
values in the brain. The majority of the HRF estimates closely match the HRF
shape, showing the increase in the signal that corresponds to the HRF peak and
some even depicting the post-dip after the peak signal. The TR for this dataset is 7 s,
which corresponds to the time interval between the acquisition of data points. This
Fig. 6 HRF estimation for auditory data. (a) is the experimental design paradigm (the red dashed
line) for the auditory data. The solid line is fMRI response from an activated voxel over time;
(b) is the HRF estimates from the 12 highly activated voxels found by using TFE in the brain. Due
to the large TR (7 s), there is a limitation on showing the HRF estimate in finer temporal resolution.
In (b), we still can see the different shapes of HRF. Originally published in Chen (2012). Published
with the kind permission of © Wenjie Chen 2012. All Rights Reserved
leads to a very low temporal resolution with which to measure the hemodynamic
response. The limitation of a large TR for estimating the HRF is not only the low
temporal resolution, but also the incorrect timing of the stimulus onsets.
For instance, if TR is seven seconds, we have 40-s blocks instead of 42-s blocks. If the
stimulus function X(t) is a 0–1 function which indicates the onset every seven
seconds, then we will miss the exact onsets at 40, 80, 120, 160, ... s. The strategy
we used here is interpolation. We interpolated the preprocessed data onto a second-based
time grid, and then applied TFE to estimate the HRF. As a result, we could only
see the overall shape of the HRF and approximate its values. Despite this limitation,
the resulting shape gave us evidence that our method does indeed capture the
various HRFs in the voxels. In addition, it establishes that our HRF-based analysis
can be applied to real data and may be improved with correspondingly refined
temporal resolution.
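A minimal sketch of this interpolation step, assuming a preprocessed voxel series sampled at TR = 7 s (the series below is a random placeholder, not the auditory data), maps the data onto a second-based grid before applying TFE:

```python
import numpy as np

TR = 7.0                                          # repetition time in seconds
rng = np.random.default_rng(0)
voxel_series = rng.normal(size=96)                # placeholder for the 96 preprocessed acquisitions
t_tr = np.arange(len(voxel_series)) * TR          # acquisition times in seconds
t_sec = np.arange(0.0, t_tr[-1] + 1.0, 1.0)       # second-based time grid
voxel_sec = np.interp(t_sec, t_tr, voxel_series)  # linear interpolation onto the 1-s grid
# voxel_sec can now be paired with the exact 0-1 stimulus onset function before running TFE.
```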
In the experimental design of this study, there are seven stimulus blocks in the
time series data that have a total duration of 90 acquisitions. As a result, the task-
related frequency is 7/90 = 0.0778. Using this information, we can apply our
method in order to generate an F-statistic map to show the activation in the brain that
is triggered by the stimuli (bottom row in Fig. 7). For comparison, we also generated
a T map using SPM5 (the SPM T map) that is shown in the upper row of Fig. 7. The
SPM T map is a contrast image that obtains the activation triggered by the stimulus
by applying the canonical SPM HRF uniformly throughout the brain. As a result,
it does not take into account any HRF variation that might occur in the different
regions of the brain.
In both rows of Fig. 7, increased activation is depicted by increased hot color,
such that the bright yellow regions represent more activation. As expected from an
auditory study, both the F map generated using our method and the SPM-generated
T map display activation in the temporal lobe. The F map from our analysis shows
Fig. 7 (b) F map and (a) T map of the activation by using TFE and SPM. The T map contains blue
and hot colors, which respectively indicate negative and positive values. The F map generated by
TFE (bottom row) appears to have less noise compared to the SPM-generated T map (upper row).
Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012.
All Rights Reserved
6 Discussion
The TFE, based on the method of Bai et al. [1], completes the fMRI data analysis
procedure: from adapting to various experimental designs, through estimating the HRF,
to detecting the activation regions.
The first benefit is the experiment design. TFE can be applied for any type of
experimental design, including multiple stimulus design, event-related design, and
block design. Our nonparametric method can be applied to the multiple stimulus
experiment paradigm. From the property of HRF, different stimuli may cause
different hemodynamic responses even in one specific region. Consequently, the
corresponding HRF estimates to each stimulus will be given in our method, and
furthermore we carry out the statistical testing to see whether they are equivalent to
each other.
Our method can also be applied to block designs and some rapid event-related
designs. Most of the existing HRF estimation methods are only applied to event-
related designs. With our method's adaptability to various experimental designs, we
extended the application to rapid event-related designs and to block designs by adding
an extra rest period. In fact, as long as there is a resting period in the design,
our method performs better in estimating the HRF.
The second benefit is reducing the noise. Noise might come from multiple
sources, such as the various types of scanners in use with systematic errors, the
background noise in the environment, and differences in the individual subjects'
heartbeats and breathing. These noise sources lead to heterogeneity of the records
of fMRI data. By using TFE, the heterogeneity is considered in the frequency
domain, which simplifies the error structure estimation process. Such simplicity
comes from the asymptotic independence of the spectrum in different Fourier
frequencies when we transfer the time series analysis to the frequency domain. In
addition, for efficiency, we use the WLS method to estimate the error spectrum.
Unlike the existing work [3, 13] based on the WLS method, which is implemented
by a computationally costly high-dimensional-matrix operation, our method shows
higher performance, since the dimension of our matrix operation depends only on
the number of stimulus types in the experiment design.
The third benefit is HRF estimation. TFE does not require the length of HRF,
which is also called the latency (width) of the HRF. In most HRF modeling methods,
the length of the HRF is an a priori input required to start the analysis. In practice, however,
the latency of the HRF is unknown to the researchers. If the length of the HRF is assumed
to be known, such as in smooth FIR or the two-level method in [13], the final result
may be very sensitive to the input lengths. For TFE, the latency of the HRF is not a
factor that affects the estimates. Additionally, TFE gives us a rough idea about
the latency of the HRF by looking at how the estimates eventually go to zero over time.
One of the most important benefits is that TFE is able to generate the brain
activation map without using the general linear model (GLM). In fact, it simplifies
the analysis by reducing the number of steps from two to one. The typical fMRI
analysis (SPM, FSL) requires two steps to customize the HRF in the analysis. The first
step estimates the HRF, and the second step applies the GLM to study the detection of
activation. Some issues related to the GLM have not been addressed even in the most
recent versions of these packages. For example, the estimated standard error used
in tests for activation is model-based, and it is not efficient to check the
model validity for each voxel. Nevertheless, the GLM method continues to be applied
to explore other areas of brain research such as connectivity. In TFE, activation
detection is generalized by testing the hypothesis for the HRF estimates, which does
not require additional GLM and the specification of the error structure. Applications
of TFE to Parkinson’s disease and schizophrenia patients can be found in [5].
Another unique feature of TFE is the ability to test the linearity
assumption. As the linearity assumption is the foundation of the convolution model
we used, our method is able to test its validity before estimation, which is
important for further analysis. If the linearity assumption is found to be valid for the fMRI
data after testing, we are then able to use our nonparametric method to perform
the analysis, or any analysis tool based on the linearity assumption. If the linearity
testing fails, nonlinearity dominates the fMRI data, and a nonlinear estimation
method might be used [6, 8, 12].
References
1. Bai, P., Huang, X., Truong, Y.K.: Nonparametric estimation of hemodynamic response
function: a frequency domain approach. In: Rojo, J. (ed.) Optimality: The Third Erich L.
Lehmann Symposium. IMS Lecture Notes Monograph Series, vol. 57, pp. 190–215. Institute
of Mathematical Statistics, Beachwood (2009)
2. Boynton, G.M., Engel, S.A., Glover, G.H., Heeger, D.J.: Linear systems analysis of functional
magnetic resonance imaging in human V1. J. Neurosci. 16(13), 4207–4221 (1996)
3. Brillinger, D.R.: Time Series. Holden-Day, San Francisco (1981)
4. Buckner, R.L., Bandettini, P.A., O'Craven, K.M., Savoy, R.L., Petersen, S.E., Raichle, M.E.,
Rosen, B.R.: Detection of cortical activation during averaged single trials of a cognitive task
using functional magnetic resonance imaging. Proc. Natl. Acad. Sci. U. S. A. 93(25), 14878–
14883 (1996)
5. Chen, W.: On estimating hemodynamic response functions. Ph.D. thesis, The University of
North Carolina at Chapel Hill (2012)
6. Friston, K.J., Josephs, O., Rees, G., Turner, R.: Nonlinear event-related responses in fMRI.
Magn. Reson. Med. 39, 41–52 (1998)
7. Friston, K.J., Ashburner, J., Kiebel, S.J., Nichols, T.E., Penny, W.D. (eds.): Statistical
Parametric Mapping: The Analysis of Functional Brain Images. Academic, New York (2007).
http://www.fil.ion.ucl.ac.uk/spm/
8. Glover, G.H.: Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage
9(4), 416–429 (1999)
9. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL. NeuroImage
62, 782–790 (2012). http://www.fmrib.ox.ac.uk/fsl/
10. Marchini, J.L., Ripley, B.D.: A new statistical approach to detecting significant activation in
functional MRI. NeuroImage 12(4), 366–380 (2000)
11. Ogawa, S., Lee, T.M., Kay, A.R., Tank, D.W.: Brain magnetic resonance imaging with contrast
dependent on blood oxygenation. Proc. Natl. Acad. Sci. U. S. A. 87(24), 9868–9872 (1990)
12. Vazquez, A.L., Noll, D.C.: Nonlinear aspects of the BOLD response in functional MRI.
NeuroImage 7(2), 108–118 (1998)
13. Zhang, C., Jiang, Y., Yu, T.: A comparative study of one-level and two-level semiparametric
estimation of hemodynamic response function for fMRI data. Stat. Med. 26(21), 3845–3861
(2007)
Part IV
Machine Learning Techniques in Time
Series Analysis and Prediction
Communicating Artificial Neural Networks
with Physical-Based Flow Model for Complex
Coastal Systems
Bernard B. Hsieh
1 Introduction
Numerical simulation of flows and other processes occurring in water has now
matured into an established and efficient part of hydraulic analyses. However,
models for coastal, estuary, and river systems with millions of nodes are becoming
increasingly common. These models make heavy demands on computing capacity.
Coupling computational intelligence techniques, such as Artificial
Neural Networks (ANNs), with a numerical flow model or other types of physics-
based simulation introduces data-driven techniques into the modeling
procedure in order to obtain more reliable results with a faster turn-around. The
computational strategy is based on the analysis of all the data characterizing the
system.
Several categories of this communication concern how to improve model
accuracy during the calibration/simulation processes, such as building
up more reliable time series boundary conditions and interior calibration points
(application type 1), how to generate a corrected time series when the calibra-
tion of a physics-based model has reached its optimal limit (application
type 2), and how to derive a reliable boundary condition from a simulation-
optimization computational inverse modeling approach (application type 3—Hsieh
[1]). In addition, the great divergence between the response-time requirements
and the computational-time requirements points out the need to reduce the time
required to simulate the impact of input events on the hydrodynamics of the modeled
flow/transport system. The time-reduction application (type 4) can be
called a “surrogate modeling” method: it simplifies a complex system while still
obtaining reasonably accurate results with much shorter computational effort. For
instance, a surrogate modeling system which integrates a storm surge model with
supervised–unsupervised ANNs to meet the response-time requirement is
considered a good example. Due to paper length limitations, this more
advanced approach can be found in Hsieh and Ratcliff [2, 3] and Hsieh [4].
This paper focuses on the use of supervised ANNs to obtain reliable long-term
continuous time series (type 1) for the physics-based model in a tidally influenced
environment as the demonstration example.
The project area is located in Terrebonne Parish and includes the city of Houma
(Fig. 1). The Terrebonne estuary in southern Louisiana, USA, consists of many
waterways, lakes, and salt marsh lands. Houma, at the northern end of the Houma
Navigation Canal (HNC), is home to commerce and industry that relies on the
HNC.
Swarzenski et al. [5] reported that the GIWW influences open water salinities
by transporting freshwater to coastal water at some distance from the Lower
Atchafalaya River (LAR).
[Fig. 1: Map of the project area, showing the Houma GIWW, West of Bayou Lafourche, the GIWW at Larose, West Minors Canal, South of Bayou L’Eau Bleu, Grand Bayou, the HNC, and Dulac]
Fresh water moves south down the HNC. The GIWW
at Larose stays fresh under most conditions. However, when the stage of the LAR
drops, the HNC can become a conduit for saltwater to move northwards from the
Gulf of Mexico to the GIWW. Annual sea level is highest in the fall at the time the
LAR is at its annual lowest. The differential head gradient between the LAR and the
rest of the Louisiana coast is small and easily reversed by wind events, including
tropical storms.
The purpose of the HNC deepening study is to identify the most economically
feasible and environmentally acceptable depth of the HNC. Identification of the
final lock sill depth will be a direct result of the limited reevaluation study
and navigational safety and maintenance concerns. To improve the reliability
for numerical model calibration and simulation, knowledge base enhancement is
conducted by ANNs. Particularly, the enhancement of the continuous record for
flow and salinity inland boundary conditions during the calibration year, the average
flow simulation year, and the low and high flow simulation months are the primary
focus in this paper. The initial study has been summarized by Hsieh and Price [6].
Knowledge recovery system (KRS) deals with the activities which are beyond the
capabilities of normal data recovery systems. It covers most of the information
which is associated with knowledge oriented problems and is not limited to numer-
ical problems. Almost every management decision, particularly for a complex and
dynamic system, requires the application of a mathematical model and the validation
of the model with numerous field measurements. However, due to instrumentation
adjustments and other problems, the data obtained could be incomplete or produce
abnormal recording curves. Data may also be unavailable at appropriate points in the
computational domain when the modeling design work changes. Three of the most
popular temporal prediction/forecasting algorithms are used to form the KRS for
this project: Multi-layered Feed Forward Neural Networks (MLPs), Time-Delayed
Neural Networks (TDNN), and Recurrent Neural Networks (RNN). The detailed
theoretical development for the algorithms is described in Principe et al. [7] and
Haykin [8]. Recurrent networks can be classified as fully or partially recurrent: the
Jordan RNN has feedback (or recurrent) connections from the output layer to its
inputs; the locally recurrent RNN [9] uses only local feedback; and the globally connected
RNN, such as the Elman RNN, has feedback connections from its hidden layer neurons
back to its inputs.
The KRS [10] for missing data is based on the transfer function (response
function) approach. The simulated output can generate a recovered data series
using optimal weights from the best-fit activation function (transfer function). This
series is called the missing window. Three types of KRS are defined: self-recovery,
neighboring gage recovery with same parameter, and mixed neighboring and remote
gages recovery with multivariate parameters. It is noted that the knowledge recovery
will not always be able to recover the data, even with perfect information (no missing
values and high correlation) from neighboring gages.
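As a rough illustration of the neighboring-gage recovery idea (using ordinary least squares as a stand-in for the trained ANN transfer function; all series and coefficients below are synthetic assumptions), a model is fit on the period where both the target gage and its neighbors are observed and then used to fill the missing window:

```python
import numpy as np

rng = np.random.default_rng(1)
neighbors = rng.normal(size=(500, 3))                 # hypothetical series from 3 neighboring gages
target = neighbors @ np.array([0.8, -0.2, 0.5]) + rng.normal(scale=0.1, size=500)
target[200:300] = np.nan                              # the "missing window" at the target gage

# Fit on the overlap period where the target gage is observed.
observed = ~np.isnan(target)
X_obs = np.column_stack([np.ones(observed.sum()), neighbors[observed]])
coef, *_ = np.linalg.lstsq(X_obs, target[observed], rcond=None)

# Recover the missing window from the neighboring gages.
missing = np.isnan(target)
X_mis = np.column_stack([np.ones(missing.sum()), neighbors[missing]])
target[missing] = X_mis @ coef
```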
The knowledge base development for KRS can be conducted as the following
steps:
1. To understand the possible physical system associated with the corresponding
input/output system.
2. To create an input/output system based on an initial cross-analysis between the inputs
and the target output series.
3. To detect and correct outliers and abnormal values.
4. To perform KRS for the input system as necessary.
5. To conduct different training strategies and architectures.
6. To adjust/select additional possible inputs.
means there is less than 10 % of the data missing during the entire year; Partial Missing (PM)
indicates the missing values are between 10 and 50 %; and Most Missing (MM)
means more than 50 % of the values are missing.
The inland boundary conditions were determined at three locations based on the
numeric modeling design and available gages. These locations are West Minors
Canal (WMC-GIWW), West of Bayou Lafourche at Larose (GIWW), and Bayou
Lafourche at Thibodaux. Note that only the WMC flow and salinity gage KRS will
be demonstrated in this paper.
Table 1 summarizes all the knowledge bases for gages related to the above
inland boundary locations. Although the Morgan City gage is out of the numerical
model domain, it is a major freshwater contributor to the system. The United States
Geological Survey (USGS) gage HNC at Dulac is even further south of the inland
boundary points, but it is an indicator of how much tidal flow will enter and leave
the system. Wind stress and tidal fluctuations are two other forcings which drive the
hydrodynamic system. Wind stress and wind direction are converted to horizontal
(x), vertical (y), and along-channel (c) components. The tidal elevations from the
Gulf coast propagate the forcing signal inland. The Grand Isle tidal station, a coastal
gage, is used as this driving forcing, and the West Bank wind gauge represents the
wind forcing.
Flow data for West Minors Canal (GIWW) during the 2004–2005 period were more
than 80 % missing. The existing data fall within the period between 1/25/05 and
3/22/05. The first step of this approach is to determine the most significant inputs
that dominate the change of flow at the target location. Usually, cross-correlation
analysis and an x–y scatter plot can be good tools to aid in making this decision.
For this system, significant inputs include stages and flows from gage information
at Morgan City, Houma, and Dulac, local stages, remote tidal forcing, and wind
stress. The meaningful candidates for the WMC-GIWW are Dulac flow, Morgan
City stage, West Minors Canal stage, Grand Isle tidal elevation, Houma stage, and
wind stress x-component. From an initial analysis, it was found that the hydraulic
gradient plays a significant role. To capture the way in which lower frequency
inputs propagate into the system, the higher frequency information of the remote inputs
(such as the Grand Isle tidal elevation and Dulac flow) is filtered out. The most
significant tidal frequencies for the Gulf of Mexico are diurnal frequencies, so a 25-h
moving average filter is applied to both input functions. The training set used the
above described six series as inputs and the flow of West Minors Canal (GIWW) as
output, from 1/25/05 to 3/22/05.
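A sketch of the 25-h moving average used to isolate the subtidal component of an hourly input series is given below; the synthetic tide series and the centered-window implementation are assumptions for illustration only, not the project's actual processing code:

```python
import numpy as np

def subtidal(series_hourly, window=25):
    """Centered 25-h moving average: attenuates diurnal tidal energy,
    leaving the low-frequency (subtidal) component."""
    kernel = np.ones(window) / window
    return np.convolve(series_hourly, kernel, mode="same")

# Hypothetical hourly series: a diurnal tide riding on a slow trend.
hours = np.arange(24 * 60)
tide = np.sin(2 * np.pi * hours / 24.8) + 0.01 * hours / 24
tide_sub = subtidal(tide)   # the diurnal oscillation is largely removed
```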
Several training algorithms (Neurosolutions, v6.0 [11]), including static and
dynamic systems, and training strategies were tested. The best algorithm for this
application was the Jordan–Elman recurrent ANN. Excellent agreement between the
ANN model and the flow observations was obtained (correlation coefficient = 0.951
and mean absolute error = 229 cfs (7.4 cms)). The most significant inputs are the subtidal
Dulac flow (filtering out the signal periods higher than 25 h), and the Morgan City
stage. With this successful training, the flow near West Minors Canal at GIWW from
3/23/04 to 1/24/05 was simulated by applying the six input series for the same period
(Fig. 2). The annual flow average during this period was estimated to be 2544 cfs (94 cms).
Fig. 2 KRS (blue color shows the knowledge recovery and the pink color represents the
observations) for GIWW flow (cfs) near the West Minors Canal flow gage
After the flow data near West Minors Canal had been recovered,
the focus turned to the salinity data recovery. This was easier to recover due to
smaller data gaps and more complete salinity data at Houma GIWW.
An ANN model constructed with relatively few training patterns will
yield less reliable responses to conditions the system never learned. To obtain a
generalized response of the flow and salinity boundary conditions to other hydrologic
patterns, the ANN model (called the simulator) has to be retrained with all of the
input/output data on a one-year basis, or longer if more information is available.
Unlike the knowledge recovery process for the WMC 2004–2005 salinity, the
intention of this simulator is not to use the flow information obtained from the
WMC/GIWW flow simulator and the salinity from the Houma gage. The optimal
candidate for the salinity simulator at WMC/GIWW is a six-input system. These are
up-gradient subtidal flow, WMC flow, subtidal component from Gulf of Mexico,
Houma gage stage, x-component wind stress, and Inverse Morgan City subtidal
flow. After 6000 training iterations, the simulator (Fig. 3) had a 0.921 correlation
coefficient and 0.03 ppt. mean absolute error.
Fig. 3 ANNs training result (pink color) of the salinity (ppt.) simulator for GIWW near West
Minors Canal based on the 2004–2005 modeling period. The blue color shows the observations
Fig. 4 Annual dry–wet index variance for Morgan City flow from 1994 to 2006
Dry–wet index and variance analyses showed the average flow year to be 2003, the
high flow month to be April 2002, and the low flow month to be September 2006.
A knowledge base with six related gages to cover all three required periods was
then constructed (Table 2). Information from the gages at Morgan City, Dulac, and
Houma were used as remote forcing to quantify and recover the missing flow and
salinity data for the three boundary gages. The remote forcing or stress also included
tidal propagation and wind stress.
The procedure to process these flow and salinity boundary conditions was to
apply the simulators (connection weights) to simulate the target output after the
set of input parameters had been identified by the simulator. Note that stage, flow,
and salinity information during these three periods for the gage GIWW near West
Minors Canal was totally missing. This therefore required a simulation process, not
data recovery. Although the targets were to construct inland boundary condition
gage information for these three required periods, the information from the other
three related gages as well as tidal and wind forcing needed to be recovered first. For
example, to simulate flow at WMC during the average flow year 2003, the missing
values for flow at the Houma gage, tidal elevation at Grand Isle, and wind stress at
the West Bank station during the same period of time had to be estimated first.
Fig. 5 Simulated/recovered flow (cfs) for both GIWW near WMC and West Bayou Lafourche at
Larose gages during the high flow month, April 2003
Figure 5 shows boundary flow simulation results for both GIWW near WMC
and West Bayou Lafourche at Larose gage during the high flow month, while Fig. 6
presents the simulated/recovered salinity of the three inland boundary gages during
the 2003 average flow year using a similar technique.
Fig. 6 Simulated/recovered salinity (ppt.) for the three inland boundary gages—GIWW near
WMC, West Bayou Lafourche at Larose, and Bayou Lafourche at Thibodaux—during the average
flow year 2003
9 Learning Remarks
recovery also provided good results because there were fewer missing values, meaning
fewer values needed to be estimated within this model calibration/validation period.
However, this will not occur in a generalized situation. Therefore, to improve
prediction reliability, the simulators are developed. The method presented herein
overcomes the condition where information is not available
from neighboring gages through its ability to estimate the corresponding variable time
series. This condition becomes “system simulation,” which goes beyond performing
“filling-the-gaps” activities.
A flow pattern search procedure using a renormalized transformation of the long-term
flow record is given to identify the average flow year within the selected time window. The
technique used is an initial part of unsupervised ANNs and can be used to search for
primary flow patterns in a long historical hydrologic record. The year
2003 is the best candidate for the average flow year from the Morgan City monthly average
flow record (1994–2006). A new knowledge base is established and combined
with the developed simulators to successfully simulate the boundary conditions of all
the management scenarios needed.
The data-driven modeling technique in estuarine/coastal systems is not limited
to dealing with the knowledge within the database itself. It can be integrated with a
numerical model to provide higher accuracy and time savings. For example,
computational intelligence techniques can be used to simulate flow and salinity
variation at the interior points of the numerical model, to make correction
analyses when the numerical model reaches its calibration limit, to obtain
more reliable offshore boundary conditions through an inverse modeling approach,
to conduct an unsupervised–supervised ANN approach for designing a more
generalized prediction tool (simulator), and to serve as a forecasting module
for a decision support system.
10 Conclusions
This flow and salinity system is dominated by the source tide from the coast and
freshwater inflow forcing from Morgan City, with minor variation due to wind
stress. Remote forcing plays an important role in driving the dynamic variation,
particularly for slow reacting parameters such as salinity. Addressing the subtidal
component of the variables and the hydraulic gradient between variables signifi-
cantly improved the accuracy of the missing data recovery/system simulations. The
dominant factor for flow and salinity transport is governed by seasonality or special
events, mainly between tidal and freshwater inflow forcing.
To understand the flow and salinity patterns from the Dulac gage data, a simple
physical phenomenon can be described as follows. During the low flow period,
although upland flow condition is a critical factor in determining the total net flow
entering the system, the accuracy of tidal forcing addressed by the model seems
to be more sensitive than the upland flow condition for salinity variation at Dulac.
During the high flow period, the boundary tide becomes less important. However,
when the Dulac flow is very low, a high salinity spike is expected. If we use only
the stage along HNC as model calibration criteria for hydrodynamic behavior, the
accuracy of the hydraulic gradient near Dulac is critical.
Acknowledgement The US Army Corps of Engineers New Orleans District Office funded this work.
References
1. Hsieh, B.B.: Boundary Condition Estimation of Hydrodynamic Systems Using Inverse Neuro-
Numerical Modeling Approach, GIS & RS in Hydrology. Water Resources and Environment,
Yichang (2003)
2. Hsieh, B., Ratcliff J.: A storm surge prediction system using artificial neural networks.
In: Proceedings of the 26th International Society for Computers and Their Applications,
September 2013, Los Angeles, CA, USA (2013)
3. Hsieh, B., Ratcliff J.: Determining necessary storm surge model scenario runs using
unsupervised-supervised ANNs. In: Proceedings of the 13th World Multi-Conference on
Systemics, Cybernetics and Informatics, Orlando, FL, USA, July 10–13, pp. 322–327 (2009)
4. Hsieh, B.: Advanced hydraulics simulation using artificial neural networks—integration with
numerical model. In: Proceedings of Management, Engineering, and Informatics Symposium,
July 3–6, Orlando, FL, USA, pp. 620–625 (2005)
5. Swarzenski, C.N., Tarver A.N., Labbe, C.K.: The Gulf Intracoastal Waterway as a Conduit
of Freshwater and Sediments to Coastal Louisiana Wetlands, Poster, USGS, Baton Rouge,
Louisiana (2002)
6. Hsieh, B., Price, C.: Enhancing a knowledge base development for flow and salinity boundary
conditions during model calibration and simulation processes. In: Proceedings of the 13th
World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, USA, July
10–13, pp. 328–333 (2009)
7. Principe, J.C., Euliano, N.R., Lefebvre, W.C.: Neural and Adaptive Systems: Fundamentals
through Simulations. Wiley, New York (2000)
8. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York (1994)
9. Frasconi, P., Gori, M., Soda, G.: Local feedback multilayered networks. Neural Comput. 4,
120–130 (1992)
10. Hsieh, B., Pratt, T.: Field Data Recovery in Tidal System Using ANNs, ERDC/CHL CHETN-
IV-38, (2001)
11. NeuroSolutions v6.0, Developers Level for Windows, NeuroDimension, Inc., Gainesville
(2011)
Forecasting Daily Water Demand Using Fuzzy
Cognitive Maps
1 Introduction
Forecasting daily water demand has been already addressed in many research
studies. Linear regression and artificial neural network were investigated in [25].
A nonlinear regression was applied in [36]. A hybrid methodology combining
feed-forward artificial neural networks, fuzzy logic, and genetic algorithm was
investigated in [24]. Recently spectral analysis-based methodology to detect cli-
matological influences on daily urban water demand has been proposed [1]. A
comparison of different methods of urban water demand forecasting was made in
[7, 26].
FCM is an emerging soft computing technique mixing artificial neural networks
and fuzzy logic [16]. An FCM models knowledge through causalities among fuzzy
concepts (represented as nodes) [5] and weighted relationships between them [16].
Extensive research has been accomplished on FCMs in different research and
application domains. In the domain of time series prediction, only few research
works based on the FCM methodology have been done. In 2008, Stach et al. first proposed
a widely accepted technique for FCM-based time series modeling [30]. Also,
Froelich and his colleagues used an evolutionary FCM model to effectively forecast
multivariate time series [10, 11, 15, 21, 22]. Recently Homenda et al. have applied
FCMs to univariate time series prediction [33–35] and Lu et al. [19] discussed in
their study time series modeling related to fuzzy techniques in data mining.
In spite of many existing solutions, the problem of daily forecasting of urban
water demand remains a challenge and the investigation of new forecasting methods
to solve it is an open research area. In this research we undertake an effort to
improve the accuracy of forecasting using a model based on FCM.
The contributions of this paper are the following:
– We are proposing a new method for water demand forecasting using FCMs. An
FCM-based forecasting model is designed and trained for daily water demand
time series.
– A new parametrized fuzzyfication function is proposed that enables control of the
amount of data processed in a linear or nonlinear way.
The outline of this paper is as follows. Section 2 describes the theoretical
background on time series and fuzzy cognitive maps. Section 3 presents the design
of the FCM-based model and its mapping to time series. Parameterized fuzzyfication
procedures for the calculation of concept states are proposed in Sect. 4. The results of
comparative experiments are shown in Sect. 5. Section 6 concludes the paper.
2 Theoretical Background
In this section we provide background knowledge on time series and the selected
state-of-the-art forecasting models. Also an introduction to the theory of fuzzy
cognitive maps is presented.
Let y ∈ ℝ be a real-valued variable whose values are observed over a discrete time
scale t ∈ [1, 2, ..., n], where n ∈ ℕ is the length of the considered period. A time
series is denoted as {y(t)} = {y(1), y(2), ..., y(n)}. The objective is to calculate the
forecast ŷ(t+1). The individual forecasting error is calculated as e_t = ŷ(t+1) − y(t+1).
To calculate cumulative prediction errors, which in practice are accounted for by the
value of the fitness function of the evolutionary algorithm, we use the Mean Absolute
Percentage Error (MAPE) given as formula (1):

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right| \cdot 100\,\%. \qquad (1)$$
For the evaluation of forecasting accuracy the concept of a growing window was
used. The growing window is a period that consists of a learning part that always starts
at the beginning of the time series and ends at time step t. The testing part follows
the learning part. In the case of our study, the length of the testing window was set
to 1 and only a 1-step-ahead prediction ŷ(t+1) was made. After every learn-and-test
trial the window grows 1 step ahead until the end of the historical data is reached. The
concept of the growing window is depicted in Fig. 1.
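A minimal sketch of the growing-window evaluation with 1-step-ahead forecasts and the MAPE of formula (1) might look as follows; the forecasting model is abstracted as a function argument, and this is not the implementation used in the experiments:

```python
import numpy as np

def mape(errors, actuals):
    # Formula (1): mean absolute percentage error.
    return np.mean(np.abs(np.asarray(errors) / np.asarray(actuals))) * 100.0

def growing_window_eval(y, forecaster, min_train=30):
    """y: 1-D array; forecaster: callable mapping the history array to a 1-step-ahead forecast."""
    errors, actuals = [], []
    for t in range(min_train, len(y) - 1):
        y_hat = forecaster(y[: t + 1])      # learn on y(1..t), predict y(t+1)
        errors.append(y_hat - y[t + 1])
        actuals.append(y[t + 1])
    return mape(errors, actuals)

# Example: the naive forecaster, for which y_hat(t+1) = y(t).
y = np.random.default_rng(2).normal(loc=100.0, scale=5.0, size=200)
print(growing_window_eval(y, lambda hist: hist[-1]))
```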
For comparative purposes the following selection of state-of-the-art forecast-
ing models was made. Naive—the naive approach is a trivial forecasting method
which assumes that the forecast is assigned the previously observed value, i.e.,
ŷ(t+1) = y(t). ARIMA—the AutoRegressive Integrated Moving Average model is
one of the most popular statistical forecasting models in time series analysis [13]. This
model relies on the assumption of linearity among variables and normal distribution.
The extended version takes into account additional exogenous variables and is
called ARIMAX. LR—the linear regression model is used to relate a scalar dependent
variable and one or more explanatory variables. Linear regression was applied for
forecasting in [2]. PR—polynomial regression is a form of regression in which
the relationship between the independent variable and the dependent variable
is modeled as a polynomial. The use of polynomial regression for forecasting
has been demonstrated in [36]. GARCH—the Generalized Auto-regressive Conditional
Heteroscedastic model is a nonlinear model. It uses a Quasi-Newton optimizer to find the
maximum likelihood estimates of the conditionally normal model. The details of this
model are further described in the literature [20]. ANN (Artificial Neural Network)
is a mathematical model consisting of a group of artificial neurons connected with
each other; the strength of a connection between two nodes is called a weight [8].
HW—the Holt-Winters method is an extended version of exponential smoothing
forecasting. It exploits the detected trend and seasonality within the data.
Fuzzy cognitive maps are defined as an ordered pair ⟨C, W⟩, where C is the set of
concepts and W is the connection matrix that stores the weights w_ij assigned to the
arcs of the FCM [12]. The concepts are mapped to the real-valued activation level
c_i ∈ [0, 1] that is the degree in which the observation belongs to the concept (i.e., the
value of the fuzzy membership function). The reasoning is performed as the calculation
of Eq. (2) [5]:

$$c_j(t+1) = f\!\left(\sum_{i=1,\; i\neq j}^{n} w_{ij}\, c_i(t)\right), \qquad (2)$$

where f(·) is the transformation function that can be bivalent, trivalent, hyperbolic
tangent, or unipolar sigmoid. After numerous trials, for this study we selected to
apply the unipolar sigmoid function given by the formula (3):

$$f(x) = \frac{1}{1 + e^{-c(x-T)}}, \qquad (3)$$

where c and T are constant parameters, set on a trial-and-error basis. The parameter c
determines how quickly the transformation reaches values of 0 and 1. The parameter
T is used to move the transformation along the x axis.
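One FCM reasoning step, combining Eq. (2) with the sigmoid (3), can be sketched as follows; the connection matrix, initial state, and the values of c and T are illustrative assumptions:

```python
import numpy as np

def sigmoid(x, c=5.0, T=0.5):
    # Formula (3): unipolar sigmoid; c controls the steepness, T shifts it along the x axis.
    return 1.0 / (1.0 + np.exp(-c * (x - T)))

def fcm_step(state, W, c=5.0, T=0.5):
    """One application of Eq. (2): state holds the activation levels c_i(t) in [0, 1],
    W[i, j] is the weight w_ij of the arc from concept i to concept j."""
    n = len(state)
    new_state = np.empty(n)
    for j in range(n):
        s = sum(W[i, j] * state[i] for i in range(n) if i != j)
        new_state[j] = sigmoid(s, c, T)
    return new_state

W = np.array([[0.0, 0.6, -0.3],
              [0.4, 0.0, 0.7],
              [0.2, -0.5, 0.0]])   # illustrative connection matrix
state = np.array([0.2, 0.8, 0.5])  # illustrative activation levels
print(fcm_step(state, W))
```

Iterating fcm_step produces the map's trajectory; in the forecasting setting, the concept corresponding to the predicted water demand is read off after one step.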
The objective of learning the FCM is to optimize the matrix W with respect to the
forecasting accuracy. In this paper we use for that purpose the real-coded genetic
algorithm (RCGA) [28] that was proved to be very effective for the time series
forecasting task [9]. In RCGA the populations of candidate FCMs are iteratively
evaluated with the use of the fitness function given in Eq. (4) [29]:

$$\mathrm{fitness}(\mathrm{FCM}) = \frac{\alpha}{\beta \cdot err + 1}, \qquad (4)$$

where α, β are parameters and err is the prediction error. For the purpose of this
paper we assume α = 1, β = 1. The MAPE evaluates the cumulative prediction error,
i.e., err = MAPE.
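For completeness, the fitness evaluation applied to each candidate FCM inside the RCGA reduces to formula (4) with the paper's setting α = β = 1 and err = MAPE (a small sketch, not the authors' implementation):

```python
def fitness(mape_value, alpha=1.0, beta=1.0):
    # Formula (4): larger fitness for smaller cumulative prediction error (err = MAPE).
    return alpha / (beta * mape_value + 1.0)

# e.g., a candidate FCM with MAPE = 5 % receives fitness 1/6.
print(fitness(5.0))
```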
The challenge that we faced was the mapping of FCM structure to the water demand
time series. For that purpose we performed an extensive analysis of the considered
time series.
First the correlation analysis was performed. As can be observed in Fig. 2a the
auto-correlation function reveals a typical pattern with seasonality and trend [3].
The peaks occurring with a frequency of 7 days are related to weekly seasonality.
The decreasing slope of the function indicates that the considered time series involves
a trend.
Partial auto-correlation has been plotted in Fig. 2b. It confirms that the statistical
significance of lags above 7 is very low. In consequence we decided to select lags 7
and 1 for the planned FCM-based forecasting model.
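The lag analysis behind this choice can be reproduced with standard auto-correlation routines; the sketch below uses statsmodels on a synthetic stand-in series with weekly seasonality, since the actual water demand data are not reproduced here:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(3)
# Synthetic stand-in with a weekly (lag-7) seasonal pattern plus noise.
days = np.arange(400)
demand = 100 + 10 * np.sin(2 * np.pi * days / 7) + rng.normal(scale=3, size=days.size)

acf_vals = acf(demand, nlags=30)
pacf_vals = pacf(demand, nlags=30)
# Large values at lags 1 and 7 would motivate using y(t-1) and y(t-7) as FCM input concepts.
print(np.argsort(-np.abs(acf_vals[1:]))[:3] + 1)
```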
In the second step of analysis, we checked the influence of exogenous variables
on the forecasted water demand. Correlation between water demand and mean daily
temperature was calculated and achieved the value 0.20. The correlation between
water demand and precipitation was very weak and negative: −0.02. The low values
of these correlations raised doubts regarding the inclusion of external regressors.
For that reason, we constructed a multivariate time series consisting of the three
considered variables: water demand, mean daily temperature, and precipitation. We
checked whether such a time series is co-integrated, sharing an underlying stochastic
trend [4]. To test co-integration, the Johansen test [14], implemented as the “ca.jo”
function in the “urca” library of the R package, was used. The results given in Table 1
confirmed the null hypothesis of the existence of two co-integration vectors. The
test value for the hypothesis r ≤ 2 was above the corresponding critical value
at all significance levels: 10 %, 5 %, and 1 %. The Johansen test suggested that
the inclusion of weather related variables may improve the forecasting accuracy.
Therefore, the mean daily temperature and precipitation (rain) were added as
[Fig. 2: (a) auto-correlation function (ACF) and (b) partial auto-correlation function (PACF) of the daily water demand time series, plotted against the lag]
$$c_i(t) = \frac{y_i(t) - (1-k)\,\min(y_i)}{(1+k)\,\max(y_i) - (1-k)\,\min(y_i)}, \qquad (5)$$

where min(y_i) and max(y_i) denote the minimum and maximum values of y_i(t),
respectively. With the use of a constant k ∈ [0, 1] it is possible to shift the data closer
to the linear part of the transformation function.
This way, by changing the value of k it is possible to control the amount of data
that are transformed linearly during the reasoning process performed by the FCM.
By increasing k more data are processed purely linearly, i.e., fewer of them fall into
the nonlinear part of the transformation function.
After forecasting, defuzzyfication is made using the reversed function given as
formula (6):

$$y_i(t) = c_i(t)\,\big[(1+k)\,\max(y_i) - (1-k)\,\min(y_i)\big] + (1-k)\,\min(y_i). \qquad (6)$$
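Formulas (5) and (6) can be implemented directly. The sketch below, with illustrative demand values and k = 0.4 (the value selected later in the experiments), also checks that defuzzyfication inverts the fuzzyfication exactly:

```python
import numpy as np

def fuzzify(y, k):
    # Formula (5): map raw values into [0, 1] activation levels, shifted by k toward
    # the linear part of the sigmoid transformation.
    lo, hi = (1 - k) * y.min(), (1 + k) * y.max()
    return (y - lo) / (hi - lo)

def defuzzify(c, y_min, y_max, k):
    # Formula (6): the inverse mapping back to the original scale.
    lo, hi = (1 - k) * y_min, (1 + k) * y_max
    return c * (hi - lo) + lo

y = np.array([1500.0, 2100.0, 1800.0, 2600.0, 1950.0])   # illustrative daily demands
k = 0.4
c = fuzzify(y, k)
y_back = defuzzify(c, y.min(), y.max(), k)
assert np.allclose(y, y_back)
```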
5 Experiments
The considered water demand time series involved 2254 days, with the data missing
for 78 of them. This was due to breaks in data transmission caused by discharged
batteries or other hardware problems in the data transmission channels. This
problem posed some difficulty because the missing values were irregularly dispersed
across the entire time series. We decided to impute them by linear interpolation
using the R function “interpNA”. To illustrate the problem, in Fig. 4a we
show an exemplary part of the time series that was the most substantially affected by
missing data. In Fig. 4b the missing values are imputed by the applied interpolation.
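A Python equivalent of this imputation step (a sketch, not the authors' R workflow) is a plain linear interpolation across the missing entries:

```python
import numpy as np

def interpolate_missing(y):
    # Linearly interpolate NaN entries, analogous to interpNA with linear interpolation.
    y = np.asarray(y, dtype=float).copy()
    idx = np.arange(len(y))
    missing = np.isnan(y)
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
    return y

series = np.array([2100.0, np.nan, np.nan, 2400.0, 2350.0, np.nan, 2500.0])
print(interpolate_missing(series))
```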
The second issue that we encountered was the occurrence of outliers that were
also irregularly distributed across the time series. These were primarily due to
pipe bursts and the subsequent extensive water leakage.
Fig. 4 Dealing with missing data. (a) Raw data. (b) After interpolation
[Fig. 5: an exemplary part of the water demand time series containing an outlier (left) and the same part after outlier filtering (right), water flow plotted over days]
The outliers had to be removed from the time series as the pipe bursts are random anomalies affecting
the regularities expected in the data. To filter out the outliers, the commonly used
Hampel filter was applied [23]. The parameters of the filter were tuned by trial-and-
error to best exclude outliers. To illustrate the problem, an example of a part of the
time series with outliers is presented in Fig. 5a. As shown in Fig. 5b, the exemplary
peak in water demand caused by the pipe burst has been eliminated to a certain
extent (please note the different scales of the y-axes in both figures).
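A common form of the Hampel filter is sketched below; the window size and threshold are illustrative assumptions, since the parameters actually used were tuned by trial-and-error and are not reported here:

```python
import numpy as np

def hampel(y, half_window=7, n_sigmas=3.0):
    """Replace points deviating from the local median by more than n_sigmas local MADs."""
    y = np.asarray(y, dtype=float).copy()
    k = 1.4826  # scale factor relating the MAD to the standard deviation for Gaussian data
    for i in range(len(y)):
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        window = y[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if mad > 0 and abs(y[i] - med) > n_sigmas * mad:
            y[i] = med   # replace the outlier with the local median
    return y

demand = np.array([2000.0, 2050.0, 1980.0, 6500.0, 2020.0, 1990.0, 2010.0, 2040.0])
print(hampel(demand, half_window=3))
```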
The stationarity of the time series is one of the main factors that determine the
applicability of a particular predictive model. First, to test for unit root stationarity,
the augmented Dickey Fuller (ADF) test was performed [6]. The resulting low
The weights of the designed FCM were learned using the genetic algorithm. The
parameters of the genetic algorithm have been assigned on the basis of the literature
and numerous trials. The assigned values are given in Table 3. On a trial-and-error
basis we selected the optimal value of the parameter k = 0.4.
For comparative purposes we also implemented experiments with the state-of-
the-art models that were described in the introduction. For the models Naive, ARIMA,
ARIMAX, GARCH, and Holt-Winters, the R software package was applied for their
implementation [32]. In the case of all those methods, the optimal auto-correlation lags
were adjusted automatically. For fitting the ARIMA model to the data, we used the
function “auto.arima” from the package “forecast.” The Holt-Winters model was
learned using the function ets from the “forecast” package.
For linear regression, polynomial regression, and ANN, the KNIME toolkit was
used for the implementation.
All parameters were adjusted using the trial-and-error method. In particular, to select
the best structure of the ANN, the experiments were repeated many times. A single
hidden layer with 20 neurons and the RProp training algorithm for a multilayer feed-
forward network were found to be the best [27]. The maximum number of iterations
for learning the ANNs was set to 100. For ARIMAX the temperature and precipitation
were added as explanatory variables.
Table 4 shows the results of the experiments. As can be observed, the proposed FCM
model learned by the RCGA outperformed the other state-of-the-art models in terms
of MAPE.
6 Conclusion
In this chapter the problem of forecasting daily water demand was addressed. For
that purpose we designed a forecasting model based on fuzzy cognitive maps. The
proposed model assumes the mapping of the raw time series to the concepts using a
fuzzyfication function. The results of the performed experiments provide evidence
for the superiority of the proposed model over the selected state-of-the-art models.
To improve the forecasting accuracy, further theoretical work is required towards
the enhancement of the proposed FCM model.
Acknowledgements The work was supported by ISS-EWATUS project which has received
funding from the European Union’s Seventh Framework Programme for research, technological
development, and demonstration under grant agreement no. 619228. The authors would like to
thank the water distribution company in Sosnowiec (Poland) for gathering water demand data
and the personnel of the weather station of the University of Silesia for collecting and preparing
meteorological data.
References
1. Adamowski, J., Adamowski, K., Prokoph, A.: A spectral analysis based methodology to detect
climatological influences on daily urban water demand. Math. Geosci. 45(1), 49–68 (2013)
2. Bianco, V., Manca, O., Nardini, S.: Electricity consumption forecasting in Italy using linear
regression models. Energy 34(9), 1413–1421 (2009)
3. Cortez, P., Rocha, M., Neves, J.: Genetic and evolutionary algorithms for time series fore-
casting. In: Engineering of Intelligent Systems, 14th International Conference on Industrial
and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2001,
Budapest, Hungary, June 4–7, 2001, Proceedings, pp. 393–402 (2001)
4. Cowpertwait, P.S.P., Metcalfe, A.V.: Introductory Time Series with R, 1st edn. Springer,
New York (2009)
5. Dickerson, J., Kosko, B.: Virtual worlds as fuzzy cognitive maps. Presence 3(2), 173–189
(1994)
6. Dickey, D.A., Fuller, W.A.: Distribution of the estimators for autoregressive time series with a
unit root. J. Am. Stat. Assoc. 74(366), 427–431 (1979)
7. Donkor, E., Mazzuchi, T., Soyer, R., Alan Roberson, J.: Urban water demand forecasting:
review of methods and models. J. Water Resour. Plan. Manag. 140(2), 146–159 (2014)
8. Du, W., Leung, S.Y.S., Kwong, C.K.: A multiobjective optimization-based neural network
model for short-term replenishment forecasting in fashion industry. Neurocomputing 151(Part
1), 342–353 (2015)
9. Froelich, W., Juszczuk, P.: Predictive capabilities of adaptive and evolutionary fuzzy cognitive
maps - a comparative study. In: Nguyen, N.T., Szczerbicki, E. (eds.) Intelligent Systems
for Knowledge Management, Studies in Computational Intelligence, vol. 252, pp. 153–174.
Springer, New York (2009)
10. Froelich, W., Salmeron, J.L.: Evolutionary learning of fuzzy grey cognitive maps for the
forecasting of multivariate, interval-valued time series. Int. J. Approx. Reason. 55(6), 1319–
1335 (2014)
11. Froelich, W., Papageorgiou, E.I., Samarinas, M., Skriapas, K.: Application of evolutionary
fuzzy cognitive maps to the long-term prediction of prostate cancer. Appl. Soft Comput. 12(12),
3810–3817 (2012)
12. Glykas, M. (ed.): Fuzzy Cognitive Maps, Advances in Theory, Methodologies, Tools and
Applications. Studies in Fuzziness and Soft Computing. Springer, Berlin (2010)
13. Han, A., Hong, Y., Wan, S.: Autoregressive conditional models for interval-valued time series
data. In: The 3rd International Conference on Singular Spectrum Analysis and Its Applications,
p. 27 (2012)
14. Johansen, S.: Estimation and hypothesis testing of cointegration vectors in gaussian vector
autoregressive models. Econometrica 59(6), 1551–1580 (1991)
15. Juszczuk, P., Froelich, W.: Learning fuzzy cognitive maps using a differential evolution
algorithm. Pol. J. Environ. Stud. 12(3B), 108–112 (2009)
16. Kosko, B.: Fuzzy cognitive maps. Int. J. Man Mach. Stud. 24, 65–75 (1986)
17. Kwiatkowski, D., Phillips, P.C., Schmidt, P., Shin, Y.: Testing the null hypothesis of stationarity
against the alternative of a unit root: how sure are we that economic time series have a unit root?
J. Econ. 54(1–3), 159–178 (1992)
18. Ljung G.M, Box G.E.P.: On a measure of lack of fit in time series models. Biometrika 65(2),
297–303 (1978)
19. Lu, W., Pedrycz, W., Liu, X., Yang, J., Li, P.: The modeling of time series based on fuzzy
information granules. Expert Syst. Appl. 41, 3799–3808 (2014)
20. Matías, J.M., Febrero-Bande, M., González-Manteiga, W., Reboredo, J.C.: Boosting GARCH
and neural networks for the prediction of heteroskedastic time series. Math. Comput. Model.
51(3–4), 256–271 (2010)
21. Papageorgiou, E.I., Froelich, W.: Application of evolutionary fuzzy cognitive maps for
prediction of pulmonary infections. IEEE Trans. Inf. Technol. Biomed. 16(1), 143–149 (2012)
22. Papageorgiou, E.I., Froelich, W.: Multi-step prediction of pulmonary infection with the use of
evolutionary fuzzy cognitive maps. Neurocomputing 92, 28–35 (2012)
23. Pearson, R.: Outliers in process modelling and identification. IEEE Trans. Control Syst.
Technol. 10(1), 55–63 (2002)
24. Pulido-Calvo, I., Gutiérrez-Estrada, J.C.: Improved irrigation water demand forecasting using
a soft-computing hybrid model. Biosyst. Eng. 102(2), 202–218 (2009)
25. Pulido-Calvo, I., Montesinos, P., Roldán, J., Ruiz-Navarro, F.: Linear regressions and neural
approaches to water demand forecasting in irrigation districts with telemetry systems. Biosyst.
Eng. 97(2), 283–293 (2007)
340 J.L. Salmeron et al.
26. Qi, C., Chang, N.B.: System dynamics modeling for municipal water demand estimation in an
urban region under uncertain economic impacts. J. Environ. Manag. 92(6), 1628–1641 (2011)
27. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the
RPROP algorithm. In: Proceedings of the IEEE International Conference on Neural Networks
(ICNN), pp. 586–591 (1993)
28. Stach, W., Kurgan, L., Pedrycz, W., Reformat, M.: Genetic learning of fuzzy cognitive maps.
Fuzzy Sets Syst. 153(3), 371–401 (2005)
29. Stach, W., Kurgan, L.A., Pedrycz, W.: Numerical and linguistic prediction of time series with
the use of fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 16(1), 61–72 (2008)
30. Stach, W., Kurgan, L., Pedrycz, W.: A divide and conquer method for learning large fuzzy
cognitive maps. Fuzzy Sets Syst. 161, 2515–2532 (2010)
31. Teräsvirta, T., Lin, C.F., Granger, C.W.J.: Power of the neural network linearity test. J. Time
Ser. Anal. 14(2), 209–220 (1993)
32. R Development Core Team (2008). R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-
project.org
33. Homenda W., Jastrzȩbska A., Pedrycz W.: Joining concept’s based fuzzy cognitive map model
with moving window technique for time series modeling. In: IFIP International Federation
for Information Processing, CISIM 2014. Lecture Notes in Computer Science, vol. 8838.
pp. 397–408 (2014)
34. Homenda W., Jastrzȩbska A., Pedrycz W.: Modeling time series with fuzzy cognitive maps. In:
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 2055–2062 (2014)
35. Homenda W., Jastrzȩbska A., Pedrycz W.: Time series modeling with fuzzy cognitive maps:
simplification strategies, the case of a posteriori removal of nodes and weights. In: IFIP
International Federation for Information Processing, CISIM 2014. Lecture Notes in Computer
Science, vol. 8838, pp. 409–420 (2014)
36. Yasar, A., Bilgili, M., Simsek, E.: Water demand forecasting based on stepwise multiple
nonlinear regression analysis. Arab. J. Sci. Eng. 37(8), 2333–2341 (2012)
Forecasting Short-Term Demand for Electronic
Assemblies by Using Soft-Rules
Tamás Jónás, József Dombi, Zsuzsanna Eszter Tóth, and Pál Dömötör
T. Jónás ()
Research and Development Department, Flextronics International Ltd, Hangár u. 5-37, 1183
Budapest, Hungary
Department of Management and Corporate Economics, Budapest University of Technology
and Economics, Magyar tudósok körútja 2, 1117 Budapest, Hungary
e-mail: [email protected]; [email protected]
J. Dombi
Institute of Informatics, University of Szeged, Árpád tér 2, 6720 Szeged, Hungary
Z.E. Tóth
Department of Management and Corporate Economics, Budapest University of Technology
and Economics, Magyar tudósok körútja 2, 1117 Budapest, Hungary
P. Dömötör
Research and Development Department, Flextronics International Ltd, Hangár u. 5-37,
1183 Budapest, Hungary
1 Introduction
In this paper, a new way of modeling and forecasting customer demand for
electronic assemblies is addressed and discussed. This work presents a compar-
ative forecasting methodology regarding uncertain customer demands via hybrid
techniques. Artificial intelligence forecasting techniques have been receiving much
attention lately for solving problems that can hardly be solved by
traditional forecasting methods. The main objective of the paper is to
introduce an approach which is based on short-term pattern recognition and soft-rule
learning techniques and can be applied to predict short-term demand for electronic
assemblies and to manage customer demand in a fluctuating business environment.
The effectiveness of the proposed approach to the demand forecasting issue is
demonstrated using real-world data.
The available forecasting techniques can be classified into four main groups:
time series methods, causal methods, qualitative methods, and simulation methods
[2]. Most studies predict customer demand based on the first two categories [4].
Customer demand forecasts based on historical information are usually accurate
enough, when demand patterns are relatively smooth and continuous. However,
these methods react to changes slowly in dynamic environments [10]. In the
markets of electronic manufacturing services (EMS) companies, whose customers
are typically original equipment manufacturers (OEMs), the customer demand
for a particular electronic assembly may vary considerably from one week to the next.
There are several reasons behind this, for example, frequent demand changes at
different parties in the whole supply chain, the short-term supply tactics of the
direct customer, or the high number of substituting products. These phenomena
finally result in demand time series which are typically fragmented and do not
form characteristic life-cycles. Therefore, understanding the short-term behavior of
customer demand can be fruitful in managing daily operations at the manufacturer's end.
The application of quantitative forecasting methods has several limitations in
such a case. The relationship between historical data and the demand data can
be complicated or can result in a poor regression. A large amount of data is
often required and non-linear patterns are difficult to capture. Finally, outliers can
bias the estimation of the model parameters [5]. To overcome these limitations,
artificial neural networks (ANNs) have also become a conventional technique in
demand forecasting owing to their adequate performance in pattern recognition.
ANNs are flexible computing frameworks and universal approximators that can
be applied to a wide range of forecasting problems [15]. Fuzzy theory has also been
broadly applied in forecasting (e.g., [6]). Song et al. proposed fuzzy time series
models to deal with forecasting problems [11, 12]. Hybrid models combining neural
networks and fuzzy systems are also of interest and could provide better results
(e.g., [7, 9, 14]). In our approach, a hierarchical neural clustering method, which is
founded on self-organizing maps, was used to recognize the so-called soft-rules in
demand time series of electronic assemblies. Once adequate soft-rules are identified,
they can be used for forecasting purposes.
2 Learning Soft-Rules
We assume that there are $m$ time series, each of which represents the weekly historical demand for an electronic assembly, that is, for the $i$th assembly we have the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$, where $Y_{i,j}$ is the $j$th demand for item $i$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n_i$). Our goal is to identify the short-term behavior of demand changes, that is, we wish to deliver

$$ (L_k, L_{k+1}, \ldots, L_{k+u-1}) \Rightarrow (R_{k+u}, \ldots, R_{k+u+v-1}) \tag{1} $$

like soft-rules, where $L_p$ and $R_q$ are values obtained from $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ ($u \ge 3$, $v \ge 1$; $p = k, \ldots, k+u-1$; $q = k+u, \ldots, k+u+v-1$). Such a soft-rule has the following meaning: if there is a pattern of normalized historical data similar to $L_k, L_{k+1}, \ldots, L_{k+u-1}$, then the future continuation of this series may be $R_{k+u}, \ldots, R_{k+u+v-1}$. Our aim is to identify soft-rules on the short run, that is, we wish to obtain soft-rules learned from the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ that can be used to predict demands for a few periods (typically for 1-5 periods) by using historical demand data of 3-15 periods. Let $i \in \{1, 2, \ldots, m\}$ be a fixed index. At first, we discuss how to generate soft-rules for the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$. For a simpler notation, let $Y_1, Y_2, \ldots, Y_n$ denote this time series. We set the number of historical periods $u$ and the number of periods for prediction $v$ so that $u \ge 3$, $v \ge 1$, $u + v < n$. We differentiate among three cases, depending on the values of each $u$-period long sub-time-series $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ in the time series $Y_1, Y_2, \ldots, Y_n$ ($k = 1, 2, \ldots, n - v - u + 1$).

Case 1. If $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ are not all equal, vectors $z_k^{(u)}$ and $z_{k+u}^{(v)}$ are defined as $z_k^{(u)} = (Z_k, Z_{k+1}, \ldots, Z_{k+u-1})$ and $z_{k+u}^{(v)} = (Z_{k+u}, \ldots, Z_{k+u+v-1})$, where

$$ Z_{k+p} = \frac{Y_{k+p} - m_{k,u}}{M_{k,u} - m_{k,u}}, \tag{2} $$

$$ Z_{k+u+q} = \frac{Y_{k+u+q} - m_{k,u}}{M_{k,u} - m_{k,u}}, \tag{3} $$

and $m_{k,u}$ and $M_{k,u}$ are the minimum and maximum of the values $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$, respectively, $p = 0, \ldots, u-1$, $q = 0, \ldots, v-1$, $k = 1, 2, \ldots, n - v - u + 1$. Each component of $z_k^{(u)}$ is normalized to the $[0, 1]$ interval, and the min-max normalization used for a $u$-period long sub-time-series is also applied to its $v$-period long continuation.

Case 2. If $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ are all the same and nonzero, say they have the value $a$ ($a > 0$), then $z_k^{(u)} = (1, 1, \ldots, 1)$, and vector $z_{k+u}^{(v)}$ is defined as $z_{k+u}^{(v)} = (Y_{k+u}/a, \ldots, Y_{k+u+v-1}/a)$.

Case 3. If $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ are all zeros, then $z_k^{(u)} = (0, 0, \ldots, 0)$, and vector $z_{k+u}^{(v)}$ is defined as $z_{k+u}^{(v)} = (Y_{k+u}, \ldots, Y_{k+u+v-1})$.

We call the pairs $(z_k^{(u)}, z_{k+u}^{(v)})$ study-pairs; each of them contains a normalized $u$-period long sub-time-series and its $v$-period long continuation. The normalizations allow us to make the $z_k^{(u)}$ vectors comparable and to cluster the study-pairs based on their $z_k^{(u)}$ vectors.
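To make the three cases concrete, the following Python sketch (illustrative only; the function and variable names are not from the paper) builds the study-pairs $(z_k^{(u)}, z_{k+u}^{(v)})$ from a single demand series.

```python
import numpy as np

def build_study_pairs(y, u, v):
    """Build (z_u, z_v) study-pairs from a demand series y, following Cases 1-3."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pairs = []
    for k in range(n - u - v + 1):
        left, right = y[k:k + u], y[k + u:k + u + v]
        m, M = left.min(), left.max()
        if M > m:                      # Case 1: min-max normalize both sides with the left side's range
            z_u = (left - m) / (M - m)
            z_v = (right - m) / (M - m)
        elif m > 0:                    # Case 2: constant nonzero left side of value a
            z_u = np.ones(u)
            z_v = right / m
        else:                          # Case 3: all-zero left side
            z_u = np.zeros(u)
            z_v = right.copy()
        pairs.append((z_u, z_v))
    return pairs

# Example: synthetic weekly demands, u = 10 historical periods, v = 5 periods to predict
demand = np.random.default_rng(0).integers(0, 500, size=60)
study_pairs = build_study_pairs(demand, u=10, v=5)
```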
In order to find typical $u$-period long sub-time-series, that is, potential left-sides of soft-rules, we cluster the study-pairs based on their $z_k^{(u)}$ vectors by applying a growing hierarchical self-organizing map (GHSOM) method that is a customized version of the method described in the work of Dittenbach et al. [3]. In our implementation, the growth of the GHSOM is controlled by using the spread-out factor $f_s$; the control method that we use is based on the one by Alahakoon et al. [1]. The spread-out factor $f_s$ is set so that $0 < f_s < 1$. Small values of $f_s$ result in fewer clusters with higher quantization errors, while higher values yield more clusters with lower quantization errors.
Let us assume that the clusters $L_1, L_2, \ldots, L_N$ are formed with $n_{L_1}, n_{L_2}, \ldots, n_{L_N}$ study-pairs, respectively. Let $I_s$ be the set of indexes $k$ of study-pairs $(z_k^{(u)}, z_{k+u}^{(v)})$ that belong to cluster $L_s$ ($s \in \{1, 2, \ldots, N\}$), that is,

$$ I_s = \left\{ k \,\middle|\, (z_k^{(u)}, z_{k+u}^{(v)}) \in L_s,\ 1 \le k \le n - v - u + 1 \right\}. \tag{4} $$
Fig. 1 A cluster of study-pairs: normalized sub-time-series, the cluster centroid, and their continuations plotted over periods (plot not reproduced here)
Denoting the centroid of cluster $L_s$ by $l_s^{(u)}$, the assignment $l_s^{(u)} \Rightarrow \{z_{k+u}^{(v)}\}$ ($k \in I_s$) can be formed, and if $l_s^{(u)} = (L_{s,0}, L_{s,1}, \ldots, L_{s,u-1})$, then this assignment can be treated as a set of $(L_{s,0}, L_{s,1}, \ldots, L_{s,u-1}) \Rightarrow (Z_{k+u}, \ldots, Z_{k+u+v-1})$ like correspondences, where $k \in I_s$.
Figure 1 depicts an example cluster of study-pairs. In this example, the study-pairs are clustered according to their normalized 10-period long sub-time-series. The normalized 10-period long sub-time-series (gray lines), the cluster centroid (thick line), and the normalized 5-period long continuations (gray lines) are shown together. The figure illustrates that different continuations are assigned to the 10-period long normalized typical pattern (cluster centroid).
It is important to see that in the $l_s^{(u)} \Rightarrow \{z_{k+u}^{(v)}\}$ assignment, the left-side is one vector that represents a $u$-period long typical normalized historical pattern, while the right-side is a set of possible $v$-period long continuations of $l_s^{(u)}$. Therefore, the $l_s^{(u)} \Rightarrow \{z_{k+u}^{(v)}\}$ assignment cannot be taken as a rule, unless all the vectors $z_{k+u}^{(v)}$ on its right-side are identical. Forming a soft-rule from the $l_s^{(u)} \Rightarrow \{z_{k+u}^{(v)}\}$ assignment means that we identify a single, $v$-dimensional vector $r_s^{(v)} = (R_{s,0}, R_{s,1}, \ldots, R_{s,v-1})$ which represents all the vectors in $\{z_{k+u}^{(v)}\}$ well ($k \in I_s$). In such a case, $(L_{s,0}, L_{s,1}, \ldots, L_{s,u-1}) \Rightarrow (R_{s,0}, R_{s,1}, \ldots, R_{s,v-1})$ can be considered as a soft-rule. There can be several methods established to derive an appropriate $r_s^{(v)}$ vector from the continuations of $l_s^{(u)}$ in the $\{z_{k+u}^{(v)}\}$ set. We discuss two of them: the Average Right-Side Continuations (ARSC) and the Clustered Right-Side Continuations (CRSC) methods.

Average Right-Side Continuations If the continuations $z_{k+u}^{(v)}$ ($k \in I_s$) are not too spread out, they can be represented well by their average $r_s^{(v)}$, and the soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ can be formed. When the $z_{k+u}^{(v)}$ ($k \in I_s$) continuations are spread out, their average does not represent them well.
Clustered Right-Side Continuations In Fig. 1, we can identify three clusters of the $z_{k+u}^{(v)}$ ($k \in I_s$) continuations. One of them (the uppermost one) contains several continuations with similar patterns, another one contains a single continuation, and the third one includes two continuations. In such cases, clustering the $z_{k+u}^{(v)}$ right-side continuations of the $z_k^{(u)}$ left-side time series (the left-sides of the study-pairs) that belong to cluster $L_s$ ($k \in I_s$) may result in clusters out of which one includes the dominant portion of the $z_{k+u}^{(v)}$, while the others contain far fewer right-side continuations. The right-side continuations are clustered by using the same GHSOM-based clustering method that we discussed before. If the $z_{k+u}^{(v)}$ right-side continuations of $l_s^{(u)}$ ($k \in I_s$) are clustered into the clusters $R_{s,1}, \ldots, R_{s,M_s}$ with the cluster centroids $r_{s,1}^{(v)}, \ldots, r_{s,M_s}^{(v)}$, respectively, then vector $r_s^{(v)}$ can be defined as the centroid of the cluster $R_{s,i}$ that contains the highest number of $z_{k+u}^{(v)}$ continuations: $|R_{s,i}| = \max_{j=1,\ldots,M_s} |R_{s,j}|$, $i \in \{1, \ldots, M_s\}$.
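As an illustration of the two options, the sketch below derives a candidate $r_s^{(v)}$ for one left-side cluster: the ARSC variant simply averages the continuations, while the CRSC variant is emulated with scikit-learn's KMeans as a stand-in for the GHSOM used by the authors, returning the centroid of the most populated continuation cluster. All names, and the use of KMeans, are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def arsc(continuations):
    """ARSC: represent the right-side continuations by their element-wise average."""
    return np.mean(continuations, axis=0)

def crsc(continuations, n_clusters=3, seed=0):
    """CRSC: cluster the continuations and return the centroid of the largest cluster,
    together with the share of continuations it covers (used later as confidence)."""
    X = np.asarray(continuations)
    km = KMeans(n_clusters=min(n_clusters, len(X)), n_init=10, random_state=seed).fit(X)
    labels, counts = np.unique(km.labels_, return_counts=True)
    best = labels[np.argmax(counts)]
    return km.cluster_centers_[best], counts.max() / len(X)

# continuations: the v-period right sides of the study-pairs falling in one cluster L_s
continuations = [z_v for _, z_v in study_pairs[:12]]   # reusing the earlier sketch
r_avg = arsc(continuations)
r_dom, share = crsc(continuations)
```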
Measuring Goodness of a Soft-Rule Of course, it may happen that the clusters $R_{s,1}, \ldots, R_{s,M_s}$ all include the same number of vectors from the set $\{z_{k+u}^{(v)}\}$, or that some of the clusters among $R_{s,1}, \ldots, R_{s,M_s}$ contain the same number of vectors from the set $\{z_{k+u}^{(v)}\}$. In such cases, vector $r_s^{(v)}$ cannot be unambiguously selected from the cluster centroids $r_{s,1}^{(v)}, \ldots, r_{s,M_s}^{(v)}$, nor can the soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ be unambiguously selected from the correspondences, and so the ARSC method may be more appropriate than the CRSC method. As mentioned before, the CRSC method is applicable if one of the clusters $R_{s,1}, \ldots, R_{s,M_s}$ contains the majority of the $z_{k+u}^{(v)}$ continuations ($k \in I_s$).
In order to characterize how representative the soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ is, we use two metrics, the support and the confidence, similarly to the way the support and confidence metrics are used to characterize the strength of association rules [13].

The support $\mathrm{sup}(l_s^{(u)} \Rightarrow r_s^{(v)})$ of soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ is the proportion of $z_k^{(u)}$ vectors whose closest cluster centroid is the vector $l_s^{(u)}$ ($1 \le k \le n - v - u + 1$). In other words, the proportion of $u$-period long sub-time-series represented by vector $l_s^{(u)}$ is the support of $l_s^{(u)} \Rightarrow r_s^{(v)}$, that is,

$$ \mathrm{sup}(l_s^{(u)} \Rightarrow r_s^{(v)}) = \frac{|I_s|}{n - u - v + 1}; \tag{8} $$

the cardinality $|I_s|$ is the support count of soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$.

The confidence $\mathrm{con}(l_s^{(u)} \Rightarrow r_s^{(v)})$ of soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ can be defined as the proportion of $z_{k+u}^{(v)}$ right-side continuations (of $l_s^{(u)}$) whose closest cluster centroid is the vector $r_s^{(v)}$ ($k \in I_s$). Considering the definition of $r_s^{(v)}$, $\mathrm{con}(l_s^{(u)} \Rightarrow r_s^{(v)})$ can be calculated as

$$ \mathrm{con}(l_s^{(u)} \Rightarrow r_s^{(v)}) = \frac{\max_{j=1,\ldots,M_s} |R_{s,j}|}{|I_s|}. \tag{9} $$
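A minimal sketch of how (8) and (9) can be computed for a single soft-rule, assuming the index set $I_s$ and the sizes of the right-side clusters are already available (names are illustrative):

```python
def soft_rule_support(I_s, n, u, v):
    """Support (8): share of u-period sub-series whose closest centroid is l_s."""
    return len(I_s) / (n - u - v + 1)

def soft_rule_confidence(cluster_sizes):
    """Confidence (9): share of continuations falling in the largest right-side cluster."""
    return max(cluster_sizes) / sum(cluster_sizes)

# Example: 14 study-pairs fall in cluster L_s out of a series of length n = 120,
# and their continuations split into right-side clusters of sizes 9, 3 and 2.
sup = soft_rule_support(I_s=range(14), n=120, u=10, v=5)
con = soft_rule_confidence([9, 3, 2])
```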
The problem with this definition of confidence is that it is applicable when the CRSC method is used, but not for the ARSC method; thus, we propose the following algorithm to derive a soft-rule with left-side $l_s^{(u)}$ and to interpret its confidence. The confidence of a rule generated by using the ARSC method is set to $t_{con}$. If definition (9) were applied, the confidence of such a rule would be 1, but the average of the continuations may not represent all the continuations as well as the centroid of the highest-cardinality cluster does when the CRSC method is used to form the soft-rule. Thus, setting the confidence of such a rule to $t_{con}$ expresses that its confidence is not considered to be better than $t_{con}$.
Aggregate Goodness of Soft-Rules Let

$$ l_1^{(u)} \Rightarrow r_1^{(v)},\ l_2^{(u)} \Rightarrow r_2^{(v)},\ \ldots,\ l_N^{(u)} \Rightarrow r_N^{(v)} \tag{10} $$

be the soft-rules formed for the time series $Y_1, Y_2, \ldots, Y_n$. Their aggregate goodness is measured by

$$ \mathrm{supcon}^{(u,v)} = \frac{1}{N} \sum_{s=1}^{N} \mathrm{sup}(l_s^{(u)} \Rightarrow r_s^{(v)})\, \mathrm{con}(l_s^{(u)} \Rightarrow r_s^{(v)}). \tag{11} $$

There are a couple of notable properties of $\mathrm{supcon}^{(u,v)}$ that need to be considered in order to use it appropriately in practice. If the $l_s^{(u)} \Rightarrow r_s^{(v)}$ soft-rules ($s = 1, 2, \ldots, N$) are all formed by using the CRSC method, then, using the definitions of support and confidence in (8) and (9), $\mathrm{supcon}^{(u,v)}$ can be written as

$$ \mathrm{supcon}^{(u,v)} = \frac{1}{N}\, \frac{1}{n - u - v + 1} \sum_{s=1}^{N} \max_{j=1,\ldots,M_s} |R_{s,j}|. \tag{12} $$
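A one-line sketch of the aggregate measure (11), given the per-rule supports and confidences (illustrative names):

```python
def aggregate_supcon(supports, confidences):
    """Aggregate goodness (11): average of support * confidence over the N soft-rules."""
    return sum(s * c for s, c in zip(supports, confidences)) / len(supports)
```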
The Number of Historical Periods Taken into Account The number of historical periods $u$ and the number of periods for prediction $v$ are important parameters of our method. Parameter $v$ is driven by the particular practical application, while $u$ can be user-specified. The $\mathrm{supcon}^{(u,v)}$ and $\mathrm{MSE}^{(u,v)}$ measures can be used to answer the question of which $u$ results in the best soft-rule based model for the time series $Y_1, Y_2, \ldots, Y_n$, if all the other parameters used for forming the rules are fixed. Let

$$ l_1^{(u)} \Rightarrow r_1^{(v)},\ l_2^{(u)} \Rightarrow r_2^{(v)},\ \ldots,\ l_{N_u}^{(u)} \Rightarrow r_{N_u}^{(v)} \tag{15} $$

be the soft-rules formed with $u$ historical periods, and let

$$ \mathrm{dsupcon}^{(u,v)} = 1 - \mathrm{supcon}^{(u,v)}, \tag{16} $$

$$ \mathrm{dMSE}^{(u,v)} = \frac{\mathrm{MSE}^{(u,v)}}{\max_{i=3,\ldots,u_{max}} \mathrm{MSE}^{(i,v)}}. \tag{17} $$
Let us assume that applying the methodology discussed so far to the time series $Y_1, Y_2, \ldots, Y_n$ results in the soft-rules $l_s^{(u)} \Rightarrow r_s^{(v)}$, $s = 1, 2, \ldots, N$. For a normalized $u$-period long input vector $z^{(u)} = (Z_0, Z_1, \ldots, Z_{u-1})$, the weight of soft-rule $s$ is

$$ w_s = \frac{\displaystyle\prod_{i=0}^{u-1} \big(1 - |Z_i - L_{s,i}|\big)}{\displaystyle\sum_{p=1}^{N} \prod_{i=0}^{u-1} \big(1 - |Z_i - L_{p,i}|\big)}. \tag{20} $$

It expresses how similar the left-side of soft-rule $l_s^{(u)} \Rightarrow r_s^{(v)}$ is to the vector $z^{(u)}$ ($s = 1, 2, \ldots, N$). Let vector $r_s^{(v)}$ be $(R_{s,0}, \ldots, R_{s,v-1})$; then vector $r^{(v)} = (R_0, R_1, \ldots, R_{v-1})$ is calculated as

$$ R_t = \sum_{s=1}^{N} w_s R_{s,t}. \tag{21} $$
Case 2.

$$ Y_{n+1+t} = \begin{cases} a R_t, & \text{if } a R_t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{23} $$

Case 3.

$$ Y_{n+1+t} = \begin{cases} R_t, & \text{if } R_t \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{24} $$

($t = 0, \ldots, v - 1$).
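Putting the pieces together, the sketch below normalizes the last $u$ observations as in Sect. 2, weights the soft-rules as in (20), combines their right-sides as in (21), and denormalizes the result. The min-max branch of the denormalization is an assumption of the sketch (the corresponding Case 1 equation is not reproduced above), and all names are illustrative.

```python
import numpy as np

def forecast_with_soft_rules(y, rules, u):
    """Weighted soft-rule forecast; rules is a list of (l_s, r_s) pairs of
    length-u and length-v vectors on the normalized scale."""
    left = np.asarray(y[-u:], dtype=float)
    m, M = left.min(), left.max()
    # normalize the most recent u observations exactly as the study-pair left sides
    if M > m:
        z = (left - m) / (M - m)
    elif m > 0:
        z = np.ones(u)
    else:
        z = np.zeros(u)
    # similarity weight of each rule, Eq. (20)
    sims = np.array([np.prod(1.0 - np.abs(z - np.asarray(l))) for l, _ in rules])
    w = sims / sims.sum()
    # weighted combination of the rule right-sides, Eq. (21)
    R = sum(w_s * np.asarray(r) for w_s, (_, r) in zip(w, rules))
    # denormalize; the min-max branch is an assumption (Case 1's equation is not shown),
    # the constant and all-zero branches follow Eqs. (23) and (24)
    if M > m:
        forecast = R * (M - m) + m
    elif m > 0:
        forecast = m * R
    else:
        forecast = R
    return np.maximum(forecast, 0.0)   # negative forecasts are truncated to zero
```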
Our method was developed for practical applications to support short-term demand
forecasting of electronic assemblies. This section demonstrates how the soft-rules
discovered from demand time series can be used for modeling and forecasting
purposes.
The soft-rule (SF) based, moving average (MA), exponential smoothing (ES),
and ARIMA forecasting methods were applied to generate 1-week (v = 1) and
3-week (v = 3) demand forecasts for 12 electronic assemblies. Forecast values
were compared to actual demand data and mean square error (MSE) values were
calculated for each method. For each time series, the soft-rules were generated with the number of historical periods u that resulted in the best set of rules according to the method discussed in "Aggregate Goodness of Soft-Rules". A spread-out factor f_s of 0.95 (both for left- and right-side clustering) and a confidence threshold t_con of 0.8 were applied to generate the soft-rules for each historical demand time series. The length of the moving
average was set to the value of parameter u of the corresponding SF model. The
number of autoregressive terms (p), nonseasonal differences needed for stationarity
(d), and lagged forecast errors (q) are indicated for the best-fitting ARIMA model
for each time series. Results are summarized in Table 1 in which the bold and
underlined number for each prediction indicates the best MSE value.
The empirical results tell us that in 16 out of the total of 24 cases the soft-rule based forecast method gave the lowest MSE value, while in the rest of the cases the ARIMA method produced better MSE values.
The presented approach, which was implemented in software, can be regarded as a machine learning based one: the GHSOM, as an unsupervised learning technique, is used to discover typical patterns in time series, based on which the left-sides of soft-rules can be identified. We discussed two methods, ARSC and CRSC, to form soft-rules; however, there can be other possibilities to form soft-rules
once their left-sides are identified. For example, the age of each left-side historical
sub-time-series can be taken into consideration as a weight when average of the
right-side continuations is generated. Our method can be taken as a hybrid one
that combines neural learning and statistical techniques. Based on the empirical
results, the introduced technique can be considered as a viable alternative method for
short-term customer demand prediction and as such can support customer demand
management in a fluctuating business environment. One of our further research
plans is to study how our method relates to the ARIMA and fuzzy time series
models as well as how it could be combined with them. We also wish to study
how the spread-out factors for the left- and right-side clustering, and the confidence
threshold impact on the learned soft-rules and goodness of prediction.
References
1. Alahakoon, D., Halgamuge, S., Srinivasan, B.: Dynamic self-organizing maps with controlled
growth for knowledge discovery. IEEE Trans. Neural Netw. 11(3), 601–614 (2000)
2. Chopra, S., Meindl, P.: Supply Chain Management: Strategy, Planning and Operation. Prentice-
Hall, Upper Saddle River, NJ (2001)
3. Dittenbach, M., Merkl, D., Rauber, A.: The growing hierarchical self-organizing map. In:
Proceedings of the International Joint Conference on Neural Networks (IJCNN 2000), Como,
pp. VI-15–VI-19. IEEE Computer Society Press, Los Alamitos, CA (2000)
4. Efendigil, T., Önüt, S., Kahraman, C.: A decision support system for demand forecasting with
artificial neural networks and neuro-fuzzy models: a comparative analysis. Expert Syst. Appl.
36(3), 6697–6707 (2009)
5. Garetti, M., Taisch, M.: Neural networks in production planning and control. Prod. Plan.
Control 10(4), 324–339 (1999)
6. Huarng, K.: Heuristic models of fuzzy time series for forecasting. Fuzzy Sets Syst. 123, 369–
386 (2001)
7. Khashei, M., Hejazi, S., Bijari, M.: A new hybrid artificial neural networks and fuzzy
regression model for time series forecasting. Fuzzy Sets Syst. 159(7), 769–786 (2008)
8. Kriegel, H., Kröger, P., Zimek, A.: Clustering high-dimensional data. ACM Trans. Knowl.
Discov. Data 3(1) (2009). doi:10.1145/1497577.1497578
9. Kuo, R., Xue, K.: An intelligent sales forecasting system through integration of artificial neural
network and fuzzy neural network. Comput. Ind. 37, 1–15 (1998)
10. Petrovic, D., Xie, Y., Burnham, K.: Fuzzy decision support system for demand forecasting with
a learning mechanism. Fuzzy Sets Syst. 157, 1713–1725 (2006)
11. Song, Q., Chissom, B.: Fuzzy time series and its models. Fuzzy Sets Syst. 54, 269–277 (1993)
12. Song, Q., Leland, R.: Adaptive learning defuzzification techniques and applications. Fuzzy
Sets Syst. 81, 321–329 (1996)
13. Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Chap. 6. Association
Analysis: Basic Concepts and Algorithms. Addison-Wesley, Boston (2005)
14. Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L., Guillen, A., Marquez, L., Pasadas,
M.: Hybridization of intelligent techniques and ARIMA models for time series prediction.
Fuzzy Sets Syst. 159, 821–845 (2008)
15. Zhang, G., Hu, B.: Forecasting with artificial neural networks: the state of the art. Neurocom-
puting 56, 205–232 (2004)
Electrical Load Forecasting: A Parallel Seasonal
Approach
Abstract Electrical load forecasting is an important task for electrical distribution companies: it is needed to determine the future demand for power in the short, medium, and long term. To make sure that the prediction remains relevant, different parameters are taken into account, such as gross domestic product (GDP) and weather. This contribution covers medium- and long-term forecasting of the Algerian electrical load, using information contained in past consumption in a parallel approach where each season is forecast separately, and using a dynamic load profile in order to derive daily and hourly load values. Three models are implemented in this work: multi-variable linear regression (MLR), an artificial neural network (ANN) multi-layer perceptron (MLP), and support vector regression (SVR) with a grid search algorithm for hyperparameter optimization; real energy consumption records are used. The proposed approach can be useful in the elaboration of energy policies, since accurate predictions of energy consumption positively affect capital investment while at the same time preserving supply security. In addition, it can be a precise tool for the Algerian mid-/long-term energy consumption prediction problem, which up to now has not been faced effectively. The results are very encouraging and were accepted by the local electricity company.
1 Introduction
2 Related Work
Different methods and approaches have been proposed for forecasting long- and mid-term electric load demand in the last decades. Many of them involve time series analysis with statistical methods such as linear regression. Bianco [8] used multiple linear regression (MLR) with GDP and population as exogenous variables to forecast electricity consumption in Italy; the paper presents the different models used, and the results were globally good, with an error rate varying between 0.11 and 2.4 %.
Nezzar [9] also used a linear regression approach for mid-/long-term Algerian electric load forecasting, using historical information and GDP. The paper is composed of two main parts: the first deals with annual national grid forecasting, and the second introduces the use of load profiles in order to obtain a global load matrix.
Achnata [10] applied support vector regression to a real-world dataset for the state of Alaska provided on the web by the US Energy Information Administration (EIA). In this work the performance of support vector machines (SVM) was compared with that of an MLP for various models. The results obtained show that SVM performs better than neural networks trained with the back-propagation algorithm; the authors concluded that, through proper selection of the parameters, SVM can replace some of the neural-network-based models for electric load forecasting.
On the other hand, one can find intelligent methods such as artificial neural networks (ANN); along these lines, Ekonomou [11] used an MLP to predict Greek long-term energy consumption. Several neural network (MLP) architectures were tested and the one with the best generalizing ability was selected. The selected ANN model results were compared to linear regression and ε-support vector regression. The produced results were much more accurate than those obtained by a linear regression model and similar to those obtained by a support vector machine model.
Kandananond [12] also used an ANN approach for forecasting electricity demand in Thailand; in that paper, three methodologies were used: autoregressive integrated moving average (ARIMA), ANN, and MLR. The objective was to compare the performance of these three approaches, and the empirical data used in the study were historical data regarding electricity demand (population, GDP, stock index, revenue from exporting industrial products, and electricity consumption). The results, based on the error measurement, showed that the ANN model outperforms the other approaches.
3 Models Used
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to the observed data. Every value of
the independent variables is associated with a value of the dependent variable.
Formally, the model for MLR, given N observations, is

$$ r(k) = \alpha_0 + \sum_{i} a_i\, \varphi_i(k) + \varepsilon_k, \qquad k = 1, 2, \ldots, N, \tag{1} $$

where $r(k)$ is the estimated load, $k$ is the observation ID (the month predicted), the sum runs over the explanatory variables $\varphi_i(k)$ with coefficients $a_i$, and $\varepsilon_k$ denotes the model deviations.
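A minimal sketch of fitting a model of the form (1) by ordinary least squares with NumPy; the design matrix and data below are synthetic placeholders, not the Algerian dataset.

```python
import numpy as np

# Phi: N x p matrix of explanatory variables (e.g., lagged peaks, GDP),
# y:   N-vector of observed monthly peak loads (synthetic here).
rng = np.random.default_rng(1)
Phi = rng.normal(size=(120, 4))
y = 3000 + Phi @ np.array([50.0, 20.0, 10.0, 5.0]) + rng.normal(scale=30.0, size=120)

# Append a column of ones for the intercept alpha_0 and solve the least-squares problem.
X = np.column_stack([np.ones(len(y)), Phi])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_0, a = coef[0], coef[1:]
y_hat = X @ coef                     # estimated loads r(k)
```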
ANN is a machine learning approach inspired by the way in which the brain
performs a particular learning task. ANNs are modeled on the human brain and consist of a number of artificial neurons; neurons in ANNs tend to have fewer connections than biological neurons.
Each neuron in an ANN receives a number of inputs, each with an associated weight. An activation function is applied to these inputs, which results in the neuron's activation level. There are different classes of network architectures: single-layer feed-forward, multi-layer feed-forward, and recurrent [14].
The basic idea of SVM is to map the original data into a high-dimensional feature space through a nonlinear mapping function and to construct an optimal hyperplane in the new space.
SVM can be applied to both classification and regression. In the case of
classification, an optimal hyperplane is found that separates the data into two
classes, whereas in the case of regression a hyperplane is constructed that lies as close to as many points as possible [17]. The key characteristics of SVMs are the use of kernels, the absence of local minima, the sparseness of the solution,
and the capacity control obtained by optimizing the margin.
The SVR task is to find a functional form for a function that can correctly predict
new cases that the SVM has not been presented with before. This can be achieved
by training the SVM model on a sample set, i.e., the training set, a process that involves, as in classification (see above), the sequential optimization of an error function, whose definition depends on the type of SVR used [17].
There are a number of kernels that can be used in SVM models. These include
linear, polynomial, RBF, and sigmoid (Table 1).
4 Methodology
The power consumption in Algeria is divided into four seasons, and the electric load values in each season relatively follow the same evolution path.
In order to improve the precision of the forecast, we divided the dataset into
four parts; each part contains the monthly electric load values (peak) of a different
season.
Due to that, we constructed a model on each season's dataset. We then use three months to predict a season: combining the results obtained for the three months gives a season result, and combining the four seasons gives a year. To this end we divided the problem into four sub-problems; in other words, we considered the months belonging to the same season as a single time series (each season is a different time series).

Fig. 1 Diagram illustrating an example of the prediction of monthly load for the year 2012
We will have a parallel approach as shown in Fig. 1 with four ANN, four SVR
with RBF kernel, and so on.
The forecasting models are implemented using historical information. Different approaches are used to find the best monthly forecast model by choosing and testing different input information:
• The electric load values of the same month of the previous years (PMY), i.e., $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots)$
• The electric load values of the previous months (PM), i.e., $Y(y, m) = f(Y(y, m-1), Y(y, m-2), \ldots)$
• A combination of both, i.e., $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots, Y(y, m-1), Y(y, m-2), \ldots)$
This study uses the Algerian national electricity consumption data from year 2000 to
2012, provided by the national electricity company Sonelgaz (http://www.sonelgaz.
dz/), in order to compare the forecasting performances of a parallel seasonal
approach, using SVR models, ANN, and linear regression models.
The data provided concern the national electricity consumption for each hour during the period 2000–2012. Using annual, weekly, and daily profiles we calculated the electrical load values; the monthly peak is then deduced by taking the maximum load value in the considered month.
The data used by the ANN and SVR (PUK, RBF) models have to be normalized, scaling them to the range 0 to 1:

$$ \mathrm{Normalized}(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}}, \tag{4} $$

where $E_{min}$ is the minimum expected value for variable $E$ and $E_{max}$ is the maximum expected value for variable $E$; if $E_{max}$ is equal to $E_{min}$, then $\mathrm{Normalized}(e_i)$ is set to 0.5.
There is a relation between the actual and past electric load. That is why eight combinations of past information (monthly peak load at preceding months) are used as variables in order to predict the future peak load. To forecast a month we use different models for each combination (a sketch of how these variable sets can be assembled is given after the list):
– (2 PMY + 2 PM) use the same month of the two previous years and the electric load values of the two previous months (four variables).
– (2 PMY + PM) use the same month of the two previous years and the electric load values of the previous month (three variables).
– (2 PMY) use the same month of the two previous years (two variables).
– (3 PMY) use the same month of the three previous years (three variables).
– (4 PMY + 2 PM) use the same month of the four previous years and the electric load values of the two previous months (six variables).
– (4 PMY + 3 PM) use the same month of the four previous years and the electric load values of the three previous months (seven variables).
– (4 PMY + PM) use the same month of the four previous years and the electric load values of the previous month (five variables).
– (4 PMY) use the same month of the four previous years (four variables).
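As announced above, the following sketch shows how one of these variable sets can be assembled with pandas from a monthly peak series; the names and the synthetic series are illustrative, not the Sonelgaz data.

```python
import pandas as pd

def make_features(peaks, n_prev_years=4, n_prev_months=3):
    """peaks: pd.Series of monthly peak loads indexed by a monthly PeriodIndex.
    Builds the '4 PMY + 3 PM' style variable set: same month of the previous years
    plus the immediately preceding months."""
    df = pd.DataFrame({"y": peaks})
    for k in range(1, n_prev_years + 1):          # PMY: same month, k years back
        df[f"pmy_{k}"] = peaks.shift(12 * k)
    for k in range(1, n_prev_months + 1):         # PM: k months back
        df[f"pm_{k}"] = peaks.shift(k)
    return df.dropna()

# Example with a synthetic monthly series covering 2000-2012
idx = pd.period_range("2000-01", "2012-12", freq="M")
peaks = pd.Series(range(len(idx)), index=idx, dtype=float)
features = make_features(peaks, n_prev_years=4, n_prev_months=3)
```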
In order to find the best neural network architecture to have a good generalizing
ability, several ANN MLP models were developed and tested. These structures consisted of one to five hidden layers, with two to ten neurons in each hidden layer. The model that presented the best generalizing ability had a compact structure with the following characteristics: two hidden layers, with five and four neurons respectively, trained with the back-propagation learning algorithm and a logarithmic sigmoid transfer function.
Parameter optimization considerably increases the accuracy of the peak forecast [19–21]. In order to obtain better accuracy, the SVR parameters have been chosen by a grid search algorithm [22].
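A hedged sketch of such a grid search with an RBF-kernel SVR, using scikit-learn as a stand-in for the tooling actually employed; the parameter ranges, the MAPE scorer, and the synthetic data are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.metrics import make_scorer

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

param_grid = {"C": [1, 10, 100, 1000],
              "gamma": [1e-3, 1e-2, 1e-1, 1.0],
              "epsilon": [0.01, 0.05, 0.1]}

search = GridSearchCV(SVR(kernel="rbf"),
                      param_grid,
                      scoring=make_scorer(mape, greater_is_better=False),
                      cv=TimeSeriesSplit(n_splits=4))

# Synthetic stand-ins for the normalized lagged-peak features and targets
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = X @ np.array([1.0, 0.5, 0.3, 0.2, 0.1, 0.05]) + rng.normal(scale=0.05, size=100) + 5.0
search.fit(X, y)
best_svr = search.best_estimator_
```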
By linking together the three types of load profiles (every week of the annual profile is linked with a weekly profile and every day of the weekly profile is associated with a daily profile), we obtain a global load profile: a matrix that contains the percentage of each hour of the year compared to the annual peak.
Usually the electric power supplier intuitively classifies the days of the week into three types: working day (WD), weekend (WE), and first day of the week (FDW). In order to verify this hypothesis, we used the K-means [23] algorithm to cluster the weekdays into three different types. To group the weekdays that have an overall similar behavior into the same type, we created 24 synthetic variables: the percentage of each hour of the same weekday in relation to the daily peak (the sketch below illustrates this step). The results are shown in Table 2.
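A sketch of this day-type clustering with scikit-learn's KMeans; the synthetic hourly array and all names are assumptions of the sketch, standing in for the real consumption records.

```python
import numpy as np
from sklearn.cluster import KMeans

# hourly: array of shape (n_days, 24) with the hourly loads of each day (synthetic here)
rng = np.random.default_rng(3)
hourly = rng.uniform(2000, 6000, size=(7 * 52, 24))

# 24 synthetic variables: each hour expressed as a share of that day's peak
profiles = hourly / hourly.max(axis=1, keepdims=True)

# average profile per weekday (0..6), then cluster the 7 weekdays into 3 day types
weekday = np.arange(len(profiles)) % 7
weekday_profiles = np.vstack([profiles[weekday == d].mean(axis=0) for d in range(7)])
day_types = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(weekday_profiles)
```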
From Table 2 we can notice that the weekdays are classified as follows: weekend,¹ working day, and the day after the weekend.
The multiplication of the predicted annual peak by the global load profile will
result in the hourly load prediction of the entire year.
Instead of using the annual peak, it is also possible to multiply the monthly peak by the corresponding weekly profiles combined with the daily profile to get the hourly prediction. It is possible to improve the accuracy of the short-term prediction by using personalized profiles for special days and months (e.g., Ramadan).
Firstly, we train the models on the full dataset without dividing it into four seasons.
The performance of all the models with the same variables set is tested in order
to reveal the advantage of the proposed method.
Each model will be checked using mean absolute percentage error (MAPE). The
forecasting results for each model are presented in Table 3. The minimal errors in
each line are highlighted in bold.
Then we train four models on the dataset that has been divided into four seasons, where each model is specialized on a different season; the results are presented in Table 4.
Finally, we add the GDP as an exogenous variable to the divided dataset; the results are shown in Table 5.
¹ The weekend in Algeria is Friday and Saturday.
Table 3 MAPE of the five models before dividing the dataset into four seasons, using variable sets based on different historical values

               Linear regression   SVR (PUK       SVR (RBF       SVR (Poly      Neural
               model (%)           kernel) (%)    kernel) (%)    kernel) (%)    network (%)
2 PMY + 2 PM   2.96                3.28           3.15           2.98           3.52
2 PMY + PM     3.02                3.17           3.23           3.20           3.74
2 PMY          3.60                3.57           3.56           3.64           3.90
3 PMY          3.55                3.50           3.37           3.42           3.86
4 PMY + 2 PM   2.96                3.02           3.04           3.01           3.75
4 PMY + 3 PM   2.98                3.12           3.11           3.11           3.93
4 PMY + PM     3.03                3.12           3.13           3.23           4.16
4 PMY          3.15                3.13           3.10           3.01           3.77
Table 4 MAPE of the five models after dividing the dataset into four seasons, using variable sets based on different historical values

               Linear regression   SVR (PUK       SVR (RBF       SVR (Poly      Neural
               model (%)           kernel) (%)    kernel) (%)    kernel) (%)    network (%)
2 PMY + 2 PM   2.94                3.21           2.97           2.72           5.48
2 PMY + PM     2.83                3.00           2.76           2.73           4.81
2 PMY          3.63                4.33           3.58           3.48           3.71
3 PMY          3.51                3.81           3.05           3.48           3.51
4 PMY + 2 PM   2.85                2.77           2.77           2.67           4.81
4 PMY + 3 PM   2.93                2.89           2.88           2.65           3.99
4 PMY + PM     2.81                2.58           2.48           2.54           4.63
4 PMY          2.97                2.88           2.66           2.79           4.02
We notice from Tables 3 and 4 that dividing our dataset into seasons increases the forecast precision; for example, the error of the support vector regression with RBF kernel decreases from 3.10 to 2.66 % for the model that uses the values of the same month of the four previous years. The exception is the neural network, because of the insufficient training dataset caused by the division: the training set before dividing contains 120 instances, whereas after dividing the number of instances in each dataset becomes 30. We note, according to the results in Tables 3 and 4, that the error of the neural network increases from 3.52 to 5.48 % for the model that uses the values of the same month of the two previous years and the values of the two previous months. As shown in Table 5, taking the GDP value into account in the model construction significantly increases the accuracy of the prediction; this shows that the Algerian GDP is closely linked to the electricity load peak. From Tables 3, 4, and 5 it can be observed that the hyperparameter-optimized SVRs give better results than the ANN and multiple regression.
To measure the accuracy of the long-term to short-term prediction, we predict the hourly load values of the year 2012, using the annual peak predicted by the SVM (RBF) with the 4 PMY + 3 PM combination and the global load profile of 2011; compared with the real load values, the MAPE error is 3.61 %.
Table 5 MAPE of the five models after dividing the dataset into four seasons, using variable sets based on different historical values and the GDP value

               Linear regression   SVR (PUK       SVR (RBF       SVR (Poly      Neural
               model (%)           kernel) (%)    kernel) (%)    kernel) (%)    network (%)
2 PMY + 2 PM   2.22                2.10           2.14           2.08           3.77
2 PMY + PM     2.21                2.37           2.17           2.30           4.04
2 PMY          3.51                3.36           2.98           3.43           2.81
3 PMY          3.45                3.10           2.62           3.16           3.08
4 PMY + 2 PM   2.35                1.93           1.60           1.99           3.04
4 PMY + 3 PM   2.53                1.91           1.59           2.20           2.68
4 PMY + PM     2.38                1.92           1.78           1.97           2.73
4 PMY          2.74                2.44           2.34           2.84           3.76
6 Conclusion
References
1. Feinberg, E.A., Genethliou, D.: Load forecasting. In: Chow, J.H., Wu, F.F., Momoh, J.A.
(eds.) Anonymous Applied Mathematics for Restructured Electric Power Systems, pp. 269–
285. Springer, Heidelberg (2005)
2. Haida, T., Muto, S.: Regression based peak load forecasting using a transformation technique.
IEEE Trans. Power Syst. 9(4), 1788–1794 (1994)
3. Cleveland, W.P., Tiao, G.C.: Decomposition of seasonal time series: a model for the census
X-11 program. J. Am. Stat. Assoc. 71(355), 581–587 (1976)
4. Perninge, M., Knazkins, V., Amelin, M., Söder, L.: Modeling the electric power consumption
in a multi-area system. Eur. Trans. Electr. Power 21(1), 413–423 (2011)
5. Ozturk, I., Acaravci, A.: The causal relationship between energy consumption and GDP in
Albania, Bulgaria, Hungary and Romania: evidence from ARDL bound testing approach. Appl.
Energy 87(6), 1938–1943 (2010)
6. Laouafi, A., Mordjaoui, M., Dib, D.: One-hour ahead electric load forecasting using neuro-
fuzzy system in a parallel approach. In: Azar, A.T., Vaidyanathan, S. (eds.) Anonymous
Computational Intelligence Applications in Modeling and Control, pp. 95–121. Springer,
Heidelberg (2015)
7. Ahmia, O., Farah, N.: Parallel seasonal approach for electrical load forecasting. In: Interna-
tional Work-Conference on Time Series (ITISE), pp. 615–626. Granada (2015)
8. Bianco, V., Manca, O., Nardini, S.: Electricity consumption forecasting in Italy using linear
regression models. Energy 34(9), 1413–1421 (2009)
9. Nezzar, M., Farah, N., Khadir, T.: Mid-long term Algerian electric load forecasting using
regression approach. In: 2013 International Conference on Technological Advances in Elec-
trical, Electronics and Computer Engineering (TAEECE), IEEE, pp. 121–126 (2013)
10. Achnata, R.: Long term electric load forecasting using neural networks and support vector
machines. Int. J. Comp. Sci. Technol. 3(1), 266–269 (2012)
11. Ekonomou, L.: Greek long-term energy consumption prediction using artificial neural net-
works. Energy 35(2), 512–517 (2010)
12. Kandananond, K.: Forecasting electricity demand in Thailand with an artificial neural network
approach. Energies 4(8), 1246–1257 (2011)
13. Sykes, A.O.: An introduction to regression analysis. Chicago Working Paper in Law and
Economics (1993)
14. Kubat, M.: Neural Networks: A Comprehensive Foundation by Simon Haykin. Macmillan,
New York, 1994, ISBN 0-02-352781-7 (1999)
15. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the
RPROP algorithm. In: IEEE International Conference on Neural Networks, 1993, IEEE, pp.
586–591 (1993)
16. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl.
Disc. 2(2), 121–167 (1998)
17. Schölkopf, B., Burges, C.J., Smola, A.J.: Using support vector machine for time series
prediction. In: Advances in kernel Methods, pp. 242–253. MIT, Cambridge (1999)
18. Abaidoo, R.: Economic growth and energy consumption in an emerging economy: augmented
granger causality approach. Res. Bus. Econ. J. 4(1), 1–15 (2011)
19. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM
regression. Neural Netw. 17(1), 113–126 (2004)
20. Diehl, C.P., Cauwenberghs, G.: SVM incremental learning, adaptation and optimization,
Neural Networks, 2003. In: Proceedings of the International Joint Conference on, vol. 4, pp.
2685–2690 (2003)
21. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection
and hyper parameter optimization of classification algorithms. In: 19th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 847–855. Chicago
(2013)
22. Bi, J., Bennett, K., Embrechts, M., Breneman, C., Song, M.: Dimensionality reduction via
sparse support vector machines. J. Mach. Learn. Res. 3, 1229–1243 (2003)
23. Benabbas, F., Khadir, M.T., Fay, D., et al.: Kohonen map combined to the K-means algorithm
for the identification of day types of Algerian electricity load. In: IEEE 7th Computer
Information Systems and Industrial Management Applications (CISIM’08), pp. 78–83 (2008)
A Compounded Multi-resolution-Artificial
Neural Network Method for the Prediction
of Time Series with Complex Dynamics
Livio Fenga
1 Introduction
One of the most effective and well-established practical uses of time series
analysis is related to the prediction of future values using its past information [1].
However, much of the related statistical analysis is done under linear assumptions which, outside trivial cases and ad hoc, lab-controlled experiments, are hardly ever compatible with the features of real-world DGPs. The proposed forecast procedure has been designed to account for the complex, possibly non-linear dynamics one may encounter in practice, and combines the wavelet multi-resolution decomposition approach with a non-standard, highly computer-intensive statistical method: artificial neural networks (ANN). The acronym chosen for it, MUNI, short for Multi-resolution Neural Intelligence, reflects both these aspects. In more detail, the MUNI procedure is based on the reconstruction of the original time series after its decomposition, performed through an algorithm based on the inverse of a wavelet transform, called Multi-resolution Approximation (MRA) [2–4]. Practically, the
L. Fenga ()
UCSD, University of California San Diego, La Jolla, CA, USA
e-mail: [email protected]
original time series is decomposed into more elementary components (first step), each of them representing an input variable for the predictive model; each component is individually predicted and the predictions are finally combined through the inverse MRA procedure (second step). The predictions are generated by the time-domain, artificial intelligence (AI) part of the method, which exploits an algorithm belonging to the class of parallel distributed processing, i.e., ANN [5, 6], with an autoregressive input structure.
In what follows, the time series (signal) of interest is assumed to be real-valued and uniformly sampled, of finite length $T$, i.e., $x_t := \{x_t\}_{t \in \mathbb{Z}^+}$. MUNI has been implemented with a wavelet [7–9] signal-coefficient transformation procedure of the type Maximum Overlapping Discrete Wavelet Transform (MODWT) [10], which is a filtering approach aimed at modifying the observed series $\{x\}_{t \in \mathbb{Z}^+}$ by artificially introducing an extension of it, so that the unobserved samples $\{x\}_{t \in \mathbb{Z}^-}$ are assigned the observed values $X_{T-1}, X_{T-2}, \ldots, X_0$. This method treats the series as if it were periodic and is known as the use of circular boundary conditions. The wavelet and scaling coefficients are respectively given by

$$ d_{j,t} = \frac{1}{2^{j/2}} \sum_{l=0}^{L_j - 1} \tilde h_{j,l}\, X_{t-l \bmod N}, \qquad s_{J,t} = \frac{1}{2^{j/2}} \sum_{l=0}^{L_j - 1} \tilde g_{j,l}\, X_{t-l \bmod N}, $$

with $\{\tilde h_{j,l}\}$ and $\{\tilde g_{j,l}\}$ denoting the length-$L$, level-$j$ wavelet and scaling filters, obtained by rescaling their Discrete Wavelet Transform counterparts $\{h_{j,l}\}$ and $\{g_{j,l}\}$ as follows: $\tilde h_{j,l} = h_{j,l}/2^{j/2}$ and $\tilde g_{j,l} = g_{j,l}/2^{j/2}$. Here, the sequences of coefficients $\{h_{j,l}\}$ and $\{g_{j,l}\}$ are approximate filters: the former of the band-pass type, with nominal pass-band $f \in [1/(4\tau_j),\, 1/(2\tau_j)]$, and the latter of the low-pass type, with nominal pass-band $f \in [0,\, 1/(4\tau_j)]$, with $\tau_j$ denoting the scale associated with level $j$. Considering all the $J = J^{max}$ sustainable scales, the MRA wavelet representation of $x_t$, in the $L^2(\mathbb{R})$ space, can be expressed as follows:

$$ x(t) = \sum_k s_{J,k}\,\phi_{J,k}(t) + \sum_k d_{J,k}\,\psi_{J,k}(t) + \sum_k d_{J-1,k}\,\psi_{J-1,k}(t) + \cdots + \sum_k d_{j,k}\,\psi_{j,k}(t) + \cdots + \sum_k d_{1,k}\,\psi_{1,k}(t), \tag{1} $$
with $k$ taking integer values from 1 to the length of the vector of wavelet coefficients related to component $j$, and $\phi$ and $\psi$ denoting, respectively, the father and mother wavelets (see, for example, [11, 12]). Assuming that a number $J_0 \le J^{max}$ of scales is selected, the MRA is expressed as $x_t = \sum_{j=1}^{J_0} D_j + S_{J_0}$, with $D_j = \sum_k d_{j,k}\,\psi_{j,k}(t)$ and $S_j = \sum_k s_{j,k}\,\phi_{j,k}(t)$, $j = 1, 2, \ldots, J$. Each sequence of coefficients $d_j$ (in signal processing called a crystal) represents the original signal at a given resolution level, so that the MRA conducted at a given level $j$ ($j = 1, 2, \ldots, J$) delivers the coefficient set $D_j$, which reflects the signal's local variations at detail level $j$, and the set $S_{J_0}$, accounting for the long-run variations. By adding more levels $\{d_j;\ j = 1, 2, \ldots, J^{max}\}$, finer levels $j$ are involved in the reconstruction of the original signal and the approximation becomes closer and closer, until the loss of information becomes negligible. The forecasted values are generated by aggregating the predictions singularly obtained for each of the wavelet components, once they are transformed via the inverse MODWT algorithm, i.e.,

$$ \hat x_t(h) = \sum_{j=1}^{J_0} \hat D^{inv}_j(h) + \hat S^{inv}_{J_0}(h), $$

where $D$ and $S$ are as defined above and the superscript $inv$ indicates the inverse MODWT transform. In total, four choices are required for a proper implementation of the MODWT: the boundary conditions, the type of wavelet filter, its width parameter $L$, and the number of decomposition levels. Regarding the first choice, MUNI has been implemented with periodic boundary conditions; however, alternatives can be evaluated on the basis of the characteristics of the time series and/or as part of a preliminary investigation. The choices related to the type of wavelet function and its length $L$ are generally hard to automatize, therefore their inclusion in MUNI has not been pursued. More simply, it has been implemented with the fourth-order Daubechies least asymmetric wavelet filter (also known as symmlet) [8] of length $L = 8$, usually denoted LA(8). Regarding the forecasting method, MUNI uses a neural topology belonging to the family of multilayer perceptrons [13, 14], of the feed-forward type (FFWD-ANN) [15]. This is a popular choice in computational intelligence for its ability to perform in virtually any functional mapping problem, including autoregressive structures. This network represents the non-linear mapping from past observations $\{x_{t-\tau};\ \tau = 1, 2, \ldots, T-1\}$ to future values $\{x_h;\ h = T, T+1, \ldots\}$, i.e., $x_t = \varphi_{nn}(x_{t-1}, x_{t-2}, \ldots, x_{t-p}; w) + u_t$, with $p$ the maximum autoregressive lag, $\varphi(\cdot)$ the activation function defined over the inputs, $w$ the network parameters, and $u_t$ the error term. In practice, the input-output relationship is learnt by linking, via acyclic connections, the output $x_t$ to its lagged values (constituting the network input) through a set of layers. While the latter usually has a very low cardinality (often 1 or 2), the choice of the input set is critical, since the inclusion of non-significant lags and/or the exclusion of significant ones can affect the quality of the outcomes.
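A compact sketch of the two-step scheme just described, under several assumptions: PyWavelets' stationary wavelet transform is used as a stand-in for the MODWT (with "sym4", whose filter length of 8 corresponds to LA(8)), scikit-learn's MLPRegressor plays the role of the feed-forward network, and the additive recombination of the component forecasts replaces the explicit inverse-MODWT step. It is an illustration, not the author's implementation.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

def mra_components(x, wavelet="sym4", level=4):
    """Additive multi-resolution decomposition via the stationary wavelet transform
    (a stand-in for the MODWT): returns [D_1, ..., D_level, S_level], summing to x."""
    coeffs = pywt.swt(x, wavelet, level=level)            # list of (cA, cD) pairs
    comps = []
    for i in range(level):                                 # detail component: keep one cD band
        kept = [(np.zeros_like(cA), cD if j == i else np.zeros_like(cD))
                for j, (cA, cD) in enumerate(coeffs)]
        comps.append(pywt.iswt(kept, wavelet))
    smooth = [(cA if j == 0 else np.zeros_like(cA), np.zeros_like(cD))
              for j, (cA, cD) in enumerate(coeffs)]        # smooth: coarsest cA only
    comps.append(pywt.iswt(smooth, wavelet))
    return comps

def ar_mlp_forecast(series, p=12, h=12, hidden=(8,), seed=0):
    """Fit a small feed-forward net on lagged values and forecast h steps recursively."""
    X = np.array([series[t - p:t] for t in range(p, len(series))])
    y = series[p:]
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=5000, random_state=seed).fit(X, y)
    window, preds = list(series[-p:]), []
    for _ in range(h):
        nxt = net.predict(np.array(window[-p:])[None, :])[0]
        preds.append(nxt)
        window.append(nxt)
    return np.array(preds)

# Toy series whose length (512) is a multiple of 2**level, as required by pywt.swt
t = np.arange(512)
x = np.sin(2 * np.pi * t / 12) + 0.002 * t + np.random.default_rng(0).normal(0, 0.2, 512)

components = mra_components(x, level=4)                    # D_1..D_4 and S_4
# Per-crystal predictions, then additive recombination (the counterpart of the
# inverse-MODWT aggregation described above).
forecast = np.sum([ar_mlp_forecast(c, p=12, h=12) for c in components], axis=0)
```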
MUNI envisions the time series at hand split into three different, non-overlapping parts, serving respectively as training, test, and validation sets. The training set is the sequence $\{(x_1, q_1), \ldots, (x_p, q_p)\}$, in the form of $p$ ordered pairs of $n$- and $m$-dimensional vectors, where $q_i$ denotes the target value and $x_i$ the matrix of the delayed time series values. The network, usually initialized with random weights, is presented with an input pattern and an output, say $o_i$, is generated as a result. Being in general $o_i \ne q_i$, the learning algorithm tries to find the optimal weight vector minimizing the error function in the $w$-space, that is, $o_p = f_{nn}(w, x_p)$, where the weight vector $w$ refers, respectively, to the $p_i$ output and the $p_i$ input and $f_{nn}$ is the activation function. Denoting the training set by $Tr$ and by $P_{tr}$ the number of pairs, the average error $E$ committed by the network can be expressed as $E(w) = \tilde E(w) + \frac{\lambda}{2} \sum_i w_i^2$, where $\lambda$ is a constraint term aimed at penalizing the model weights and thus limiting the probability of over-fitting.
In this section, MUNI's AI-driven part is illustrated and some notation is introduced. In essence, it is a multi-grid searching system for the estimation of an optimal vector of network parameters under a suitable loss function, i.e., the root mean square error (RMSE), expressed as

$$ B(x_i, \hat{x}_i) = \left[ T^{-1} \sum_{i}^{T} |e_i|^2 \right]^{\frac{1}{2}}, \tag{2} $$

with $x_i$ denoting the observed value, $e_i$ the difference between it and its prediction $\hat{x}_i$, and $T$ the sample size. The parameters subjected to the neural-driven search, listed in Table 1 along with the symbols used to denote each of them, are stored in the vector $\omega$. Each of them is associated with a grid, whose arbitrarily chosen values are all in $\mathbb{Z}^+$. Consistently with the list of parameters of Table 1, the grids are collected in a set, where the cardinality of each grid is denoted by the corresponding tilded symbol ($\tilde\beta$, $\tilde\alpha$, and so on); for example, the grid associated with the number of input neurons collects the candidate values chosen for that parameter. The wavelet-based MRA procedure is applied $\tilde\beta$ times, so that the time series of interest is broken down into $\tilde\beta$ different sets, each containing a different number of crystals, in turn contained in a set denoted $A$, i.e., $\{A_{\beta_w};\ w = 1, 2, \ldots, \tilde\beta\} \equiv A$. Here, each of the $A$'s encompasses a number of decomposition levels ranging from a minimum to a maximum, denoted respectively by $J^{min}$ and $J^{max}$; therefore, for the generic set $A_{\beta_w} \subset A$, it will be:

$$ A_{\beta_w} = \{ J^{min} \le k,\ k+1,\ k+2, \ldots, K \le J^{max};\ J^{min} > 1 \}. \tag{3} $$
Assuming a resolution set $A_{\beta_0}$ and a resolution level $k_0 \in A_{\beta_0}$, the related crystal, denoted by $X_{k_0,\beta_0}$, is processed by the network $N_1$, which is parametrized by the vector of parameters $\omega_1^{(k_0, A_{\beta_0})}$. Once trained, the network $N_1$, denoted by $\tilde C_1^{k_0, A_{\beta_0}}$, is employed to generate $H$-step-ahead predictions. MUNI chooses the best parameter vector $\omega^{*(k_0, A_{\beta_0})}$ for $X_{k_0,\beta_0}$ according to the minimization of a set of cost functions of the form $B(X_{k_0,\beta_0}, \hat X_{k_0,\beta_0})$, iteratively computed on the predicted values $\hat X_{k_0,\beta_0}$ in the validation set. These predictions are generated by a set of networks parametrized and trained according to the set of $k$-tuples (with $k$ the length of $\omega$) induced by the Cartesian relations on the set of grids defined above. Denoting the former by $\tilde C^{k_0, A_{\beta_0}}$ and the latter by $P$, it will be:

$$ \omega^{*(k_0, A_{\beta_0})} = \arg\min_{P} B\big(\hat X_{k_0, A_{\beta_0}}(P),\, X_{k_0, A_{\beta_0}}\big). \tag{4} $$
The set of all the trained networks attempted at the resolution level $A_{\beta_0}$ (i.e., encompassing all the crystals in $A_{\beta_0}$) is denoted by $\tilde C^{A_{\beta_0}}$, whereas the set of networks trained in the whole exercise (i.e., for all the $A$'s) is denoted by $\tilde C^{A}$. The networks $\tilde C^{A_{\beta_0}}$, parametrized with the optimal vector $(\omega^*_{J^{min}}, \ldots, \omega^*_{J^{max}}) \equiv \Omega^*_{\beta_0}$, which is obtained by applying (4) to each crystal, are used to generate predicted values at each resolution level independently. These predictions are combined via the inverse MODWT and evaluated in terms of the loss function $B$, computed on the validation set of the original time series. By repeating the above steps for the remaining sets, i.e., $\{A_{\beta_w};\ w = 1, 2, \ldots, \tilde\beta - 1\} \subset A$, $\tilde\beta$ optimal sets of networks $\tilde C^{A}$, each parametrized by optimal vectors of parameters $\{\Omega^*_w;\ w = 1, 2, \ldots, \tilde\beta\}$, are obtained. Each set of networks in $\tilde C^{A}$ is used to generate one vector of predictions for $x_t$ in the validation set (by combining the multi-resolution predictions via the inverse MODWT), so that, by iteratively applying (2) to each of them, a vector containing $\tilde\beta$ values of the loss function, say $L_w$, is generated. Finally, the set of networks in a resolution set, say $A^*$, whose parametrizations minimize $L_w$ are the winners, i.e., $\Omega^{*A^*} = \arg\min_{(\Omega^*_w)} (L)$.
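The per-crystal selection of Eq. (4) can be sketched as an exhaustive search over the Cartesian product of the parameter grids, retaining the configuration that minimizes the RMSE (2) on the validation segment; the grids, names, and the use of MLPRegressor below are assumptions of this sketch.

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPRegressor

def rmse(x, x_hat):
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)))

def best_network_for_crystal(crystal, n_valid=24, grids=None, seed=0):
    """Grid search over network settings for one crystal; returns (best_params, best_rmse)."""
    grids = grids or {"p": [6, 12, 18],                 # autoregressive input lags
                      "hidden": [(4,), (8,), (8, 4)],   # hidden-layer sizes
                      "alpha": [1e-4, 1e-2]}            # weight-decay penalty
    train, valid = crystal[:-n_valid], crystal[-n_valid:]
    best = (None, np.inf)
    for p, hidden, alpha in itertools.product(grids["p"], grids["hidden"], grids["alpha"]):
        X = np.array([train[t - p:t] for t in range(p, len(train))])
        y = train[p:]
        net = MLPRegressor(hidden_layer_sizes=hidden, alpha=alpha,
                           max_iter=5000, random_state=seed).fit(X, y)
        window, preds = list(train[-p:]), []
        for _ in range(n_valid):                        # recursive multi-step prediction
            nxt = net.predict(np.array(window[-p:])[None, :])[0]
            preds.append(nxt)
            window.append(nxt)
        score = rmse(valid, preds)
        if score < best[1]:
            best = ({"p": p, "hidden": hidden, "alpha": alpha}, score)
    return best

# e.g., params, err = best_network_for_crystal(components[0]), reusing a crystal
# from the previous sketch
```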
² The upper row containing the symbols for the A's has been added for clarity.
Here, the generic column $\beta_w$ represents the set of resolution levels generated by a given MRA procedure, so that its generic element $X_{k,w}$ is the crystal obtained at a given decomposition level $k$ belonging to the set of crystals $A_{\beta_w}$. For each column vector $\beta_w$, a minimum and a maximum decomposition level (3), $J^{min}$ and $J^{max}$, is arbitrarily chosen;
3. the set $P$ of the parametrizations of interest is built. It is the set of all the Cartesian relations over the parameter grids, whose cardinality, expressed through the symbol $|\cdot|$, is denoted by $|P|$;
4. an arbitrary set of decomposition levels, say $A_0 \subset A$, is selected (the symbol $\beta$ is suppressed for easier readability);
5. an arbitrary crystal, say $X_{k_0,A_0} \in A_0$, is extracted;
6. the parameter vector $\omega$ is set to an (arbitrary) initial status $\omega_1 \equiv P_1$, the first element of $P$;
(a) $X_{k_0,A_0}$ is submitted to and processed by a single-hidden-layer ANN of the form

$$ N_1 = \varphi\big(X_{k_0,A_0},\, w^{(1)}\big), \qquad w^{(1)} = w\big(\omega_1^{k_0,A_0}\big), $$

with $\varphi$ being the sigmoid activation function and $w$ the network weights evaluated for a given configuration of the parameter vector, i.e., $\omega_1$;
(b) network $N_1$ is trained and the network $\tilde C_1^{k_0,A_0}$, obtained as a result, is employed to generate $H$-step ahead predictions for the validation set $x^U$. These predictions are stored in the vector $P_1^{k_0,A_0}$;
(c) steps 6a–6b are repeated for each of the remaining $(|P| - 1)$ elements of $P$. The matrix $P_m^{\,k_0,A_0}$, of dimension $U \times (|P| - 1)$ and containing the related predictions (for the crystal $X_{k_0,A_0}$), is generated by the trained networks $\tilde C_{2,\ldots,|P|}^{\,k_0,A_0}$;
(d) the matrix ${}^{full}P^{\,k_0,A_0}$ of dimension $U \times |P|$, containing all the predictions for the crystal $X_{k_0,A_0}$, is generated, i.e.,
$${}^{full}P^{\,k_0,A_0} = P_1^{\,k_0,A_0} \cup P_m^{\,k_0,A_0};$$
(e) steps 6a–6d are repeated for each of the remaining crystals in $A_0$, i.e.,
$$\big\{X_{k_i,A_0};\ i = J^{min}, J^{min+1}, \ldots, (J^{max}-1)\big\} \in A_0 \subset A;$$
$$\hat J^{A_0} = \begin{bmatrix}
\hat X_{J^{min},\omega_1} & \hat X_{J^{min},\omega_2} & \cdots & \hat X_{J^{min},\omega_k} & \cdots & \hat X_{J^{min},\omega_{|P|}} \\
\hat X_{J^{min+1},\omega_1} & \hat X_{J^{min+1},\omega_2} & \cdots & \hat X_{J^{min+1},\omega_k} & \cdots & \hat X_{J^{min+1},\omega_{|P|}} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\hat X_{k,\omega_1} & \hat X_{k,\omega_2} & \cdots & \hat X_{k,\omega_k} & \cdots & \hat X_{k,\omega_{|P|}} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\hat X_{J^{max},\omega_1} & \hat X_{J^{max},\omega_2} & \cdots & \hat X_{J^{max},\omega_k} & \cdots & \hat X_{J^{max},\omega_{|P|}}
\end{bmatrix};$$
7. loss function minimization in the validation set is used to build the set of winner ANNs for each of the crystals in $A_0$, i.e., $C^{A_0} \equiv \big\{{}^{J^{min}}C^{A_0}, \ldots, {}^{J^{max}}C^{A_0}\big\}$. For example, for the generic crystal $k$, the related optimal network is selected according to:
$${}^{k}C^{A_0} = \arg\min_{P}\, B\big(X_k^{U},\ \hat X_k^{U}(P)\big);$$
8. $C^{A_0}$ is employed to generate the matrix $\hat X^{A_0}$ of the optimal predictions for the validation set of each resolution level in $A_0$, i.e.,
$$\hat X^{A_0} \equiv \big[{}^{J^{min}}\hat X^{U}, \ldots, {}^{J^{max}}\hat X^{U}\big];$$
9. by applying the inverse MODWT to $\hat X^{A_0}$, the series $\{x_t^{U}\}$ is reconstructed, i.e., $\mathrm{Inv}(\hat X^{A_0}) = \{\hat x_t^{(U)}\}$, so that the related loss function $B\big(x_t^{U}, \hat x_t^{(U)}\big)$ is computed and its value stored in the vector whose length is $(J^{max} - J^{min} + 1)$;
³ In order to save space, the empty set symbol $\emptyset$ is omitted.
10. steps 4–9 are repeated for the remaining sets of resolutions $A_1, \ldots, A_w, \ldots, A_{\tilde w - 1}$, so that all the $\tilde w$ error function minima are stored in the vector $L$;
11. the network set $C$ generating the estimates of the crystals which minimize $L$ over all the network configurations is the final winner, i.e., $C = \arg\min_{C^A} L(C)$;
12. final performance assessments are obtained by using $C$ on the test set $x_t^{S}$ (a compact sketch of steps 4–12 is given below).
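As a reading aid, the following fragment sketches the outer loop of steps 4–12 in Python. It is a schematic interpretation under explicit assumptions, not the author's code: the MODWT decomposition and its inverse are supplied by the caller through the hypothetical callables decompose and recombine (e.g., thin wrappers around a wavelet library), predict_crystal plays the role of the per-crystal selection of Eq. (4) (for instance, the select_network sketch given earlier), and the RMSE is again used as the loss $B$.

```python
# Schematic sketch of steps 4-12: for every candidate resolution set, model each
# crystal, recombine the crystal-level validation predictions with the inverse
# transform, score the reconstruction against the original series, and keep the
# best-scoring set of networks. decompose, recombine and predict_crystal are
# caller-supplied stand-ins (hypothetical names, not from the chapter).
import numpy as np


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))


def muni_outer_loop(x, resolution_sets, val_len, decompose, predict_crystal, recombine):
    """
    x                 : original series (1-D array-like)
    resolution_sets   : iterable of candidate decomposition settings (the A_w's)
    val_len           : length of the validation segment
    decompose(x, A)   : -> dict {crystal_name: 1-D array} of MODWT-style crystals
    predict_crystal(crystal, val_len) : -> (fitted model, validation predictions)
    recombine(preds, A)               : -> reconstructed validation-segment series
    """
    x = np.asarray(x, dtype=float)
    x_val = x[-val_len:]                      # validation segment of the raw series
    best = {"loss": np.inf, "A": None, "models": None}

    for A in resolution_sets:                 # step 10: loop over the resolution sets
        crystals = decompose(x, A)            # multi-resolution analysis of x
        models, preds = {}, {}
        for name, crystal in crystals.items():    # steps 5-8: one network per crystal
            models[name], preds[name] = predict_crystal(crystal, val_len)
        x_val_hat = recombine(preds, A)       # step 9: inverse MODWT of the predictions
        loss = rmse(x_val, x_val_hat)         # loss B on the original scale
        if loss < best["loss"]:               # step 11: keep the winning configuration
            best = {"loss": loss, "A": A, "models": models}
    return best                               # step 12: assess best["models"] on the test set
```

The wavelet machinery is kept behind two callables so that any MODWT implementation respecting the stated signatures could be plugged in.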
2 Empirical Analysis
[Fig. 1 Time plots of the four time series employed in the empirical study; the horizontal axes report calendar years]
The first two series (TS1 and TS2) show roughly an overall similar pattern, with spikes, irregular seasonality, and non-stationarity in both mean and variance. Such a similarity is roughly confirmed by the patterns of their EACFs (Fig. 2). Regarding the time series TS3 and TS4, they exhibit more regular overall behaviors but deep differences in terms of their structures. In fact, by examining Figs. 1 and 2, it becomes apparent that, unlike TS4, TS3 is roughly trend stationary with a 12-month seasonality whose persistence, according to the (unreported) partial EACF, is of the moving-average type, whereas TS4 appears to follow an autoregressive process. Regarding TS4, this time series has been included for being affected by two major problems: an irregular trend pattern with a significant structural break located in 2009, and seasonal variations of size approximately proportional to the local level of the series.
Fig. 2 EACFs for the time series employed in the empirical study
Fig. 3 TS1: test set. True (continuous line) and 1,2,3,4-step ahead predicted values (dashed line)
2.1 Results
As illustrated in Sect. 1.1.4, each of the employed ANNs has been implemented according to a variable-specific set containing all the grids whose values are reported in Table 4. It is worth emphasizing that, in practical applications, such a set does not necessarily encompass the optimal (in the sense of the target function $B$) parameter values of a given network. More realistically, due to computational constraints, one can only design a grid set able to steer the search procedure towards good approximating solutions (Table 6). The outcomes of the empirical analysis outlined in Sect. 2 are reported in Table 7. From its inspection, it is possible to see the good performances, with a few exceptions, achieved by the procedure. The series denoted TS4, in particular, shows a level of fitting that can be regarded as very satisfactory, at least at the shortest horizons (see Table 7).
Fig. 4 TS4: test set. True (continuous line) and 1,2,3,4-step ahead predicted values (dashed line)
Table 6 Parameters chosen by MUNI for each time series at each frequency component

Time series            Crystal   α (iterations)   Regularization   Input lags             Hidden neurons
TS1 (· = 400, β = 5)   d1        200              0.1              1-2-3-12-18-36         7
                       d2        200              0.1              1-2-3-4-12-36          8
                       d3        200              0.1              1-2-3-4-12-15-18       8
                       d4        200              0.1              1-2-3-4-12-15-18-30    8
                       s4        200              0.1              1-2-3-4-12-18-24-30    8
TS2 (· = 7, β = 6)     d1        100              0.001            1-2-12                 4
                       d2        100              0.001            1-2-12-18              5
                       d3        100              0.001            1-2-3-30               4
                       d4        50               0.001            1-2-4-48               3
                       d5        100              0.001            1-2-12-30              4
                       d6        100              0.01             1-4-36                 2
                       s6        100              0.01             1-2-3-10-60            4
TS3 (· = 2, β = 5)     d1        100              0.01             1-2-12                 2
                       d2        100              0.01             1-2-3-12               2
                       d3        100              0.01             1-2-12-18              3
                       d4        50               0.01             1-2-3-4-21             3
                       s4        100              0.01             1-2-3-5-6-21           3
TS4 (· = 150, β = 5)   d1        100              0.001            1-2-3-12               2
                       d2        100              0.001            1-2-12                 2
                       d3        100              0.001            1-2-5-15               3
                       d4        200              0.01             1-3-4-24               3
                       s4        200              0.1              1-3-15-18-36-48        5
Table 7 Goodness-of-fit statistics computed on the test set for the four time series considered

       Horizon   RMSE    MPE     MAPE            Horizon   RMSE    MPE     MAPE
TS1    1         1.729   0.303   1.485     TS2   1         0.093   0.105   1.424
       2         2.705   0.097   2.354           2         0.186   0.408   2.829
       3         4.71    1.063   3.56            3         0.294   0.75    4.382
       4         7.184   1.588   6.133           4         0.375   1.361   5.835
TS3    1         0.129   0.677   1.841     TS4   1         0.712   0.047   0.849
       2         0.215   0.712   3.224           2         1.507   0.351   1.43
       3         0.224   0.694   3.12            3         2.908   0.035   2.626
       4         0.295   2.693   4.075           4         5.591   0.807   5.938
Values for TS3 obtained by back-transformation
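For reference, the accuracy measures in Table 7 are presumably computed in their standard form; denoting by $x_{t+h}$ the observed value, by $\hat x_{t+h}$ the corresponding $h$-step ahead prediction, and by $n$ the number of forecasts available at horizon $h$ in the test set, the usual definitions read

$$\mathrm{RMSE}(h)=\sqrt{\frac{1}{n}\sum_{t=1}^{n}\big(x_{t+h}-\hat x_{t+h}\big)^{2}},\qquad
\mathrm{MPE}(h)=\frac{100}{n}\sum_{t=1}^{n}\frac{x_{t+h}-\hat x_{t+h}}{x_{t+h}},\qquad
\mathrm{MAPE}(h)=\frac{100}{n}\sum_{t=1}^{n}\left|\frac{x_{t+h}-\hat x_{t+h}}{x_{t+h}}\right|,$$

with MPE and MAPE expressed in percentage points, which is consistent with the magnitudes reported in the table.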
Turning to TS1, it has to be said that, among those included in the empirical experiment, this time series proves to be the most problematic one, both in terms of sample size and for exhibiting a multiple-regime pattern made more complicated by the presence of heteroscedasticity. Such a framework induces MUNI to select architectures which are too complex for the available sample size. As reported in Table 6, in fact, the number of hidden neurons is large and reaches, for almost all the frequency components chosen ($\beta = 5$), its maximum grid value (8). Also, the number of input lags selected is always high, whereas the regularization parameter reaches, for all the decomposition levels, its maximum value (0.1). Such a situation is probably an indication of how the procedure tries to limit model complexity by using the greatest value admitted for the regularization term; nevertheless, the selected networks still seem to over-fit. This impression is also supported by the high number of iterations (the selected value for $\alpha$ is 200 for each of the final networks), which might have induced the selected networks to learn irrelevant patterns. As a result, MUNI is not able to properly screen out undesired, noisy components, which therefore affect the quality of the predictions. Notwithstanding this framework, however, the performances can still be regarded as acceptable for the predictions at lag 1 and perhaps at lag 2 (RMSE = 1.73 and 2.70, respectively), whereas they deteriorate significantly at horizon 3, where the RMSE reaches the value of 4.71. Horizon 4 is where MUNI breaks down, probably because of the increasing degree of uncertainty at higher horizons, combined with poor network generalization capabilities. With an RMSE of 7.18, that is, more than 4 times the value at horizon 1, and an MPE of 1.59 (more than 5 times), additional actions would be in order, e.g., increasing the information set by including ad hoc regressors in the model.
As already mentioned, TS2 shows an overall behavior fairly similar to TS1 in terms of probabilistic structure, non-stationary components, and multiple-regime pattern. However, the more satisfactory performances recorded in this case are most likely connected to the much bigger available sample size. In particular, it is worth emphasizing the good values of the MAPE for the short-term predictions ($h = 1, 2$), respectively equal to 1.42 and 2.83, as well as the RMSE obtained at horizon 4, which amounts to 0.37. In this case, more parsimonious architectures are chosen (see Table 6) for the $\beta = 6$ components the original time series has been broken into, with a number of neurons ranging from 2 (for the crystal d6) to 5 (for the crystal d2), associated with input sets of moderate size (5 is the maximum number of lags selected). As a pre-processed version of TS2, TS3 shows a more regular behavior (even though a certain amount of heteroscedasticity is still present), with weaker correlation structures at the first lags and a single peak at the seasonal lag 12 (EACF = 0.474). As expected, MUNI in this case selects simpler architectures for each of the $\beta = 5$ sub-series, with a limited number of input lags and a smaller number of neurons ($\leq 3$). Although generated by more parsimonious networks, the overall performances seem comparable to those obtained in the case of TS2. In fact, while they are slightly worse for the first two lags (MAPE = 1.84 and 3.22 for $h = 1, 2$, respectively, versus 1.42 and 2.83), the error committed grows more slowly as the forecast horizon increases. In particular, at horizon 4 MUNI delivers better predictions than in the case of the untransformed series: the recorded values of RMSE and MAPE are, respectively, 0.29 and 4.07 for TS3 and 0.37 and 5.83 for TS2.