
Contributions to Statistics

More information about this series at http://www.springer.com/series/2912

Ignacio Rojas • Héctor Pomares
Editors

Time Series Analysis and Forecasting
Selected Contributions from the ITISE Conference

Editors
Ignacio Rojas
CITIC-UGR, University of Granada
Granada, Spain

Héctor Pomares
CITIC-UGR, University of Granada
Granada, Spain

ISSN 1431-1968
Contributions to Statistics
ISBN 978-3-319-28723-2 ISBN 978-3-319-28725-6 (eBook)
DOI 10.1007/978-3-319-28725-6

Library of Congress Control Number: 2016933767

Mathematics Subject Classification (2010): 37M10, 62M10, 62-XX, 68-XX, 60-XX, 58-XX, 37-XX

© Springer International Publishing Switzerland 2016


Chapter 23 was created within the capacity of a US governmental employment. US Copyright protection
does not apply.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG Switzerland
Preface

From measurements of natural phenomena, such as the amount of rainfall or the temperature in a given region, to the electricity consumption of a city or a country, all of these can be regarded as a succession of values obtained, normally at regular intervals: a time series. Let us imagine for a moment that we could know the values of these series in advance; it would never rain on our freshly washed car. The study of time series is thus of huge importance
for our society in general. At this point it should be noted that there are many
types of time series caused by different phenomena and therefore there exist many
different behaviors. If the phenomenon behind the series were known with enough
accuracy, we could build a model from it, and our prediction could be easy, direct,
and accurate. However, most phenomena are not known or, at least, are not known
with sufficient precision, and thus to obtain a model from the phenomenon becomes
unfeasible, unless we are able to infer the model from the values of the series.
This book is intended to provide researchers with the latest advances in the
immensely broad field of time series analysis and forecasting (more than 200,000
papers published in this field since 2000 according to Thomson Reuters’ Web of
Science). Within this context, not only will we consider that the phenomenon or
process where the values of the series come from is such that the knowledge of past
values of the series contains all available information to predict future values but we
will also address the more general case in which other variables outside the series,
also called external or exogenous variables, can affect the process model. It should
also be noted that these exogenous variables can be discrete variables (day of the
week), continuous variables (outside temperature), and even other time series.
The applications in this field are enormous, from weather forecasting or analysis
of stock indices to modeling and prediction of any industrial, chemical, or natural
process. Therefore, a scientific breakthrough in this field exceeds the proper limits
of a certain area. This being said, the field of statistics can be considered the nexus
of all of them, and, for that reason, this book is published in the prestigious series
Contributions to Statistics of the Springer publishing house.
The origin of this book stems from the International Work-Conference on Time
Series, ITISE 2015, held in Granada (Spain) in July 2015. Our aim with the


organization of ITISE 2015 was to create a friendly discussion forum for scientists,
engineers, educators, and students about the latest ideas and realizations in the
foundations, theory, models, and applications for interdisciplinary and multidis-
ciplinary research encompassing disciplines of statistics, mathematical models,
econometrics, engineering, and computer science in the field of time series analysis
and forecasting.
The list of topics in the successive Call for Papers has also evolved, resulting in
the following list for the last edition:
1. Time Series Analysis and Forecasting
• Nonparametric and functional methods
• Vector processes
• Probabilistic approach to modeling macroeconomic uncertainties
• Uncertainties in forecasting processes
• Nonstationarity
• Forecasting with many models. Model integration
• Forecasting theory and adjustment
• Ensemble forecasting
• Forecasting performance evaluation
• Interval forecasting
• Econometric models
• Econometric forecasting
• Data preprocessing methods: data decomposition, seasonal adjustment, singu-
lar spectrum analysis, and detrending methods
2. Advanced Methods and Online Learning in Time Series
• Adaptivity for stochastic models
• Online machine learning for forecasting
• Aggregation of predictors
• Hierarchical forecasting
• Forecasting with computational intelligence
• Time series analysis with computational intelligence
• Integration of system dynamics and forecasting models
3. High Dimension and Complex/Big Data
• Local vs. global forecast
• Techniques for dimension reduction
• Multiscaling
• Forecasting from Complex/Big Data
4. Forecasting in Real Problems
• Health forecasting
• Telecommunication forecasting
• Modeling and forecasting in power markets
• Energy forecasting

• Financial forecasting and risk analysis


• Forecasting electricity load and prices
• Forecasting and planning systems
• Real-time macroeconomic monitoring and forecasting
• Applications in other disciplines
At the end of the submission process of ITISE 2015, and after a careful peer
review and evaluation process (each submission was reviewed by at least 2, and on
the average 2.8, program committee members or additional reviewers), 131 papers
were accepted for oral or poster presentation, according to the recommendations of
reviewers and the authors’ preferences.
High-quality candidate papers (27 contributions, i.e., 20 % of the contributions)
were invited to submit an extended version of their conference paper to be
considered for this special publication in the book series of Springer: Contributions
to Statistics. For the selection procedure, the information/evaluation of the chairman
of every session, in conjunction with the review comments and the summary of
reviews, was taken into account.
So, now we are pleased to have reached the end of the whole process and
present the readers with these final contributions that we hope will provide a clear
overview of the thematic areas covered by the ITISE 2015 conference, ranging from
theoretical aspects to real-world applications of time series analysis and forecasting.
It is important to note that for the sake of consistency and readability of the book,
the presented papers have been classified into the following chapters:
• Part 1: Advanced Analysis and Forecasting Methods. This chapter deals
with the more recent theoretical advances and statistical methods for time series
analysis and forecasting. A total of seven contributions were selected where the
reader can learn, for example, about:
– How to disentangle two additive continuous Markovian stochastic processes,
which are a more general case of the so-called measurement noise concept,
given only a measured time series of the sum process
– How to model directionality in time series in order to provide more accurate
forecasting and more realistic estimation of extreme values
– How to elaborate likelihood-based simultaneous statistical inference methods
in dynamic factor models.
We have also included a study about recent advances in ARIMA decompo-
sition models and the relationship between the Beveridge-Nelson decomposi-
tion and exponential smoothing, the novel application of permutation entropy
as a complexity measure in time series, and an analysis of several types of
novel generative forecasting models for time-variant rates.
Finally, we conclude this chapter with a paper explaining very interesting
experimental findings about how the statistical behavior of the first-passage time
of a time series, i.e., the time required for an output variable that defines the time
series to return to a certain value, can reveal some very important properties of
the time series.

• Part 2: Theoretical and Applied Econometrics. Econometrics is a particularly relevant field within the area of time series analysis, even more so considering the economic crisis we are going through these years almost worldwide. This has led us to give special emphasis, in this 2015 ITISE edition, to promoting high-quality scientific papers dedicated to the use of statistical models for the analysis of economic systems. This chapter is the result of that effort. Within the pages of this chapter, we can find papers of great interest at a theoretical level, such as the paper entitled "The Environmental Impact of Economic Activity on the Planet," which is, just from its title, sufficiently motivating and which offers very important conclusions, for example that the influence of environmental deterioration on mortality rates is positive and statistically significant. This
paper is accompanied by an original vision of the stock indices in emerging
and consolidated economies from a fractal perspective and a series of works of
analysis and forecasting of economic time series with special emphasis on stock
time series.
• Part 3: Applications in Time Series Analysis and Forecasting. No theory can
be considered useful until it is put into practice and the success of its predictions
is scientifically demonstrated. That’s what this chapter is about. It is shown how
multiple and rather different mathematical and statistical models can be used
to analyze and forecast time series in fields such as electricity
demand, road traffic, age-specific death rates, radiometry, sea-level rise, or social
networks.
• Part 4: Machine Learning Techniques in Time Series Analysis and Prediction. Finally, we wanted to finish this book by showing a different view, another way to address real problems using models common in ICT-related disciplines (ICT: information and communication technologies) such as computer science and artificial intelligence. This kind of research may lack, in some cases, the mathematical rigor that well-known linear models give us, but it shows new ways to address the problems of analysis and prediction of series, ways that have been validated experimentally and can provide very powerful solutions in various fields of knowledge. The selection here was very strict, only five contributions, but we are confident that they give a clear enough vision of what we have just said.
Last but not least, we would like to point out that this edition of ITISE was
organized by the University of Granada together with the Spanish Chapter of the
IEEE Computational Intelligence Society and the Spanish Network on Time Series
(RESeT). The guest editors would also like to express their gratitude to all the
people who supported them in the compilation of this book and especially to the
contributing authors for their submissions, the chairmen of the different sessions,
and to the anonymous reviewers for their comments and useful suggestions in order
to improve the quality of the papers.
We wish to thank our main sponsors as well: the Department of Computer
Architecture and Computer Technology, the Faculty of Science of the University of
Granada, the Research Centre for Information and Communications Technologies (CITIC–UGR), and the Ministry of Science and Innovation for their support and grants. Finally, we also wish to thank Alfred Hofmann, Vice President, Publishing–Computer Science, Springer-Verlag, and Dr. Veronika Rosteck, Springer Associate Editor, for their interest in publishing a Springer book based on the best papers of
ITISE 2015.
We hope the readers enjoy these papers as much as we did.

Granada, Spain                                    Ignacio Rojas
Granada, Spain                                    Héctor Pomares
September 2015
Contents

Part I Advanced Analysis and Forecasting Methods


A Direct Method for the Langevin-Analysis of Multidimensional Stochastic
Processes with Strong Correlated Measurement Noise . . . . . . . . . . . . . . 3
Teresa Scholz, Frank Raischel, Pedro G. Lind, Matthias Wächter,
Vitor V. Lopes, and Bernd Lehle

Threshold Autoregressive Models for Directional Time Series . . . . . . . . . 13
Mahayaudin M. Mansor, Max E. Glonek, David A. Green,
and Andrew V. Metcalfe

Simultaneous Statistical Inference in Dynamic Factor Models . . . . . . . . . 27
Thorsten Dickhaus and Markus Pauly

The Relationship Between the Beveridge–Nelson Decomposition
and Exponential Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Víctor Gómez

Permutation Entropy and Order Patterns in Long Time Series . . . . . . . . . 61
Christoph Bandt

Generative Exponential Smoothing and Generative ARMA Models
to Forecast Time-Variant Rates or Probabilities . . . . . . . . . . . . . . . . . . . 75
Edgar Kalkowski and Bernhard Sick

First-Passage Time Properties of Correlated Time Series
with Scale-Invariant Behavior and with Crossovers in the Scaling . . . . . . 89
Pedro Carpena, Ana V. Coronado, Concepción Carretero-Campos,
Pedro Bernaola-Galván, and Plamen Ch. Ivanov


Part II Theoretical and Applied Econometrics


The Environmental Impact of Economic Activity on the Planet . . . . . . . . 105
José Aureliano Martín Segura, César Pérez López,
and José Luis Navarro Espigares

Stock Indices in Emerging and Consolidated Economies
from a Fractal Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
María Antonia Navascués, Maria Victoria Sebastián,
and Miguel Latorre

Value at Risk with Filtered Historical Simulation . . . . . . . . . . . . . . . . . . 123
Mária Bohdalová and Michal Greguš

A SVEC Model to Forecast and Perform Structural Analysis (Shocks)
for the Mexican Economy, 1985Q1–2014Q4 . . . . . . . . . . . . . . . . . . . . . . 135
Eduardo Loría and Emmanuel Salas

Intraday Data vs Daily Data to Forecast Volatility in Financial Markets . . 147
António A.F. Santos

Predictive and Descriptive Qualities of Different Classes of Models
for Parallel Economic Development of Selected EU-Countries . . . . . . . . 161
Jozef Komorník and Magdaléna Komorníková

Search and Evaluation of Stock Ranking Rules Using Internet Activity
Time Series and Multiobjective Genetic Programming . . . . . . . . . . . . . . 175
Martin Jakubéci and Michal Greguš

Integer-Valued APARCH Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Maria da Conceição Costa, Manuel G. Scotto, and Isabel Pereira

Part III Applications in Time Series Analysis and Forecasting


Emergency-Related, Social Network Time Series: Description
and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Horia-Nicolai Teodorescu

Competitive Models for the Spanish Short-Term Electricity
Demand Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
J. Carlos García-Díaz and Óscar Trull

Age-Specific Death Rates Smoothed by the Gompertz–Makeham
Function and Their Application in Projections by Lee–Carter Model . . . . 233
Ondřej Šimpach and Petra Dotlačilová

An Application of Time Series Analysis in Judging the Working
State of Ground-Based Microwave Radiometers and Data Calibration . . . 247
Zhenhui Wang, Qing Li, Jiansong Huang, and Yanli Chu

Identifying the Best Performing Time Series Analytics for Sea
Level Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Phil. J. Watson

Modellation and Forecast of Traffic Series by a Stochastic Process . . . . . . 279
Desiree Romero, Nuria Rico, and M. Isabel Garcia-Arenas

Spatio-Temporal Modeling for fMRI Data . . . . . . . . . . . . . . . . . . . . . . . 293
Wenjie Chen, Haipeng Shen, and Young K. Truong

Part IV Machine Learning Techniques in Time Series Analysis and Prediction

Communicating Artificial Neural Networks with Physical-Based
Flow Model for Complex Coastal Systems . . . . . . . . . . . . . . . . . . . . . . . 315
Bernard B. Hsieh

Forecasting Daily Water Demand Using Fuzzy Cognitive Maps . . . . . . . . 329
Jose L. Salmeron, Wojciech Froelich, and Elpiniki I. Papageorgiou

Forecasting Short-Term Demand for Electronic Assemblies
by Using Soft-Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Tamás Jónás, József Dombi, Zsuzsanna Eszter Tóth, and Pál Dömötör

Electrical Load Forecasting: A Parallel Seasonal Approach . . . . . . . . . . . 355
Oussama Ahmia and Nadir Farah

A Compounded Multi-resolution-Artificial Neural Network Method
for the Prediction of Time Series with Complex Dynamics . . . . . . . . . . . 367
Livio Fenga
Contributors

Oussama Ahmia Département d’Informatique, Université Badji Mokhtar Annaba,


El Hadjar, Algeria
Christoph Bandt Institute of Mathematics, University of Greifswald, Greifswald,
Germany
Pedro Bernaola-Galván Dpto. de Física Aplicada II, ETSI de Telecomunicación,
Universidad de Málaga, Málaga, Spain
Mária Bohdalová Faculty of Management, Comenius University in Bratislava,
Bratislava, Slovakia
Pedro Carpena Dpto. de Física Aplicada II, ETSI de Telecomunicación, Univer-
sidad de Málaga, Málaga, Spain
Concepción Carretero-Campos Dpto. de Física Aplicada II, ETSI de Telecomu-
nicación, Universidad de Málaga, Málaga, Spain
Wenjie Chen American Insurance Group, Inc., New York, NY, USA
Yanli Chu Institute of Urban Meteorological Research, CMA, Beijing, P. R. China
Ana V. Coronado Dpto. de Física Aplicada II, ETSI de Telecomunicación,
Universidad de Málaga, Málaga, Spain
Maria da Conceição Costa Departamento de Matemática and CIDMA, University
of Aveiro, Aveiro, Portugal
Thorsten Dickhaus Institute for Statistics, University of Bremen, Bremen,
Germany
József Dombi Institute of Informatics, University of Szeged, Szeged, Hungary
Pál Dömötör Research and Development Department, Flextronics International
Ltd, Budapest, Hungary


Petra Dotlačilová Department of Statistics and Probability, Faculty of Informatics


and Statistics, University of Economics Prague, Prague, Czech Republic
Nadir Farah Département d’Informatique, Université Badji Mokhtar Annaba, El
Hadjar, Algeria
Wojciech Froelich The University of Silesia, ul. Bedzinska, Sosnowiec, Poland
M. Isabel Garcia-Arenas University of Granada, Granada, Spain
J. Carlos García-Díaz Applied Statistics, Operations Research and Quality
Department, Universitat Politècnica de València, Valencia, Spain
Max E. Glonek School of Mathematical Sciences, University of Adelaide,
Adelaide, SA, Australia
Víctor Gómez Ministry of Finance and P.A., Dirección Gral. de Presupuestos,
Subdirección Gral. de Análisis y P.E., Alberto Alcocer, Madrid, Spain
David A. Green School of Mathematical Sciences, University of Adelaide,
Adelaide, SA, Australia
Michal Greguš Department of Information Systems, Faculty of Management,
Comenius University in Bratislava, Bratislava, Slovakia
and
Faculty of Management, Comenius University in Bratislava, Bratislava, Slovakia
Bernard B. Hsieh US Army Engineer Research and Development Center, Vicks-
burg, MS, USA
Jiansong Huang Collaborative Innovation Center on Forecast and Evaluation of
Meteorological Disasters, CMA Key Laboratory for Aerosol-Cloud-Precipitation,
Nanjing University of Information Science & Technology, Nanjing, China
Plamen Ch. Ivanov Center for Polymer Studies and Department of Physics,
Boston University, Boston, MA, USA
and
Harvard Medical School and Division of Sleep Medicine, Brigham and Women’s
Hospital, Boston, MA, USA
and
Institute of Solid State Physics, Bulgarian Academy of Sciences, Sofia, Bulgaria
Martin Jakubéci Department of Information Systems, Faculty of Management,
Comenius University in Bratislava, Bratislava, Slovakia
Tamás Jónás Research and Development Department, Flextronics International
Ltd, Budapest, Hungary
and
Department of Management and Corporate Economics, Budapest University of
Technology and Economics, Budapest, Hungary
Edgar Kalkowski University of Kassel, Kassel, Germany

Jozef Komorník Faculty of Management, Comenius University, Bratislava,


Slovakia
Magdaléna Komorníková Faculty of Civil Engineering, Slovak University of
Technology, Bratislava, Slovakia
Miguel Latorre Escuela de Ingeniería y Arquitectura, Universidad de Zaragoza,
C/ María de Luna, Zaragoza, Spain
Bernd Lehle Institute of Physics, Carl von Ossietzky University Oldenburg,
Oldenburg, Germany
Qing Li Collaborative Innovation Center on Forecast and Evaluation of Meteoro-
logical Disasters, CMA Key Laboratory for Aerosol-Cloud-Precipitation, Nanjing
University of Information Science & Technology, Nanjing, China
and
School of Atmospheric Physics, Nanjing University of Information Science &
Technology, Nanjing, China
Pedro G. Lind ForWind Center for Wind Energy Research, Carl von Ossietzky
University Oldenburg, Oldenburg, Germany
Livio Fenga UCSD, University of California San Diego, La Jolla, CA, USA
Vitor V. Lopes Universidad de las Fuerzas Armadas - ESPE, Sangolquí, Ecuador
CMAF-CIO, University of Lisbon, Lisbon, Portugal
César Pérez López University Complutense of Madrid, Madrid, Spain
Eduardo Loría Center for Modeling and Economic Forecasting, School of Eco-
nomics, UNAM, Mexico City, Mexico
Mahayaudin M. Mansor School of Mathematical Sciences, University of
Adelaide, Adelaide, SA, Australia
José Aureliano Martín Segura University of Granada, Granada, Spain
Andrew V. Metcalfe School of Mathematical Sciences, University of Adelaide,
Adelaide, SA, Australia
José Luis Navarro Espigares University of Granada, Granada, Spain
María Antonia Navascués Escuela de Ingeniería y Arquitectura, Universidad de
Zaragoza, Zaragoza, Spain
Elpiniki I. Papageorgiou Information Technologies Institute, Center for Research
and Technology Hellas, CERTH, Thermi, Greece
and
Computer Engineering Department, Technological Educational Institute of Central
Greece, TK Lamia, Greece
Markus Pauly Institute of Statistics, University of Ulm, Ulm, Germany

Isabel Pereira Departamento de Matemática and CIDMA, University of Aveiro,


Aveiro, Portugal
Frank Raischel Instituto Dom Luiz (IDL), University of Lisbon, Lisbon, Portugal
Nuria Rico University of Granada, Granada, Spain
Desiree Romero University of Granada, Granada, Spain
Emmanuel Salas Center for Modeling and Economic Forecasting, School of
Economics, UNAM, Mexico City, Mexico
Jose L. Salmeron University Pablo de Olavide, Seville, Spain
António A.F. Santos Faculty of Economics, Monetary and Financial Research
Group (GEMF), Center for Business and Economics Research (CeBER), University
of Coimbra, Coimbra, Portugal
Teresa Scholz Center for Theoretical and Computational Physics, University of
Lisbon, Lisbon, Portugal
Manuel G. Scotto Departamento de Matemática and CIDMA, University of
Aveiro, Aveiro, Portugal
Maria Victoria Sebastián Escuela de Ingeniería y Arquitectura, Universidad de
Zaragoza, Zaragoza, Spain
Haipeng Shen Department of Statistics and Operations Research, University of
North Carolina, Chapel Hill, NC, USA
Bernhard Sick University of Kassel, Kassel, Germany
Ondřej Šimpach Department of Statistics and Probability, Faculty of Informatics
and Statistics, University of Economics Prague, Prague, Czech Republic
Horia-Nicolai Teodorescu Romanian Academy, Iasi Branch, Iasi, Romania
and
‘Gheorghe Asachi’ Technical University of Iasi, Iasi, Romania
Zsuzsanna Eszter Tóth Department of Management and Corporate Economics,
Budapest University of Technology and Economics, Budapest, Hungary
Óscar Trull Applied Statistics, Operations Research and Quality Department,
Universitat Politècnica de València, Valencia, Spain
Young K. Truong Department of Biostatistics, The University of North Carolina,
Chapel Hill, NC, USA
Matthias Wächter ForWind Center for Wind Energy Research, Carl von Ossietzky
University Oldenburg, Oldenburg, Germany
Zhenhui Wang Collaborative Innovation Center on Forecast and Evaluation of
Meteorological Disasters, CMA Key Laboratory for Aerosol-Cloud-Precipitation,
Nanjing University of Information Science & Technology, Nanjing, China

and
School of Atmospheric Physics, Nanjing University of Information Science &
Technology, Nanjing, China
Phil. J. Watson School of Civil and Environmental Engineering, University of
New South Wales, Sydney, NSW, Australia
Part I
Advanced Analysis and Forecasting
Methods
A Direct Method for the Langevin-Analysis
of Multidimensional Stochastic Processes
with Strong Correlated Measurement Noise

Teresa Scholz, Frank Raischel, Pedro G. Lind, Matthias Wächter,
Vitor V. Lopes, and Bernd Lehle

Abstract This paper addresses the problem of finding a direct operational method
to disentangle the sum of two continuous Markovian stochastic processes, a more
general case of the so-called measurement noise concept, given only a measured
time series of the sum process. The presented method is based on a recently
published approach for the analysis of multidimensional Langevin-type stochastic
processes in the presence of strong correlated measurement noise (Lehle, J Stat Phys
152(6):1145–1169, 2013). The method extracts from noisy data the respective drift
and diffusion coefficients corresponding to the Itô–Langevin equation describing
each stochastic process. The method presented here imposes neither constraints nor
parameters, but all coefficients are directly extracted from the multidimensional
data. The method is introduced within the framework of existing reconstruction

T. Scholz ()
Center for Theoretical and Computational Physics, University of Lisbon, Lisbon, Portugal
e-mail: [email protected]
F. Raischel
Instituto Dom Luiz (IDL), University of Lisbon, Lisbon, Portugal
Closer Consulting, Avenida Engenheiro Duarte Pacheco, Torre 1, 15º Andar, 1070-101 Lisboa,
Portugal
e-mail: [email protected]
M. Wächter
ForWind Center for Wind Energy Research, Carl von Ossietzky University Oldenburg,
Oldenburg, Germany
P.G. Lind
ForWind Center for Wind Energy Research, Carl von Ossietzky University Oldenburg,
Oldenburg, Germany
Institut für Physik, Universität Osnabrück, Barbarastrasse 7, 49076 Osnabrück, Germany
V.V. Lopes
Universidad de las Fuerzas Armadas - ESPE, Sangolquí, Ecuador
CMAF-CIO, University of Lisbon, Lisbon, Portugal
B. Lehle
Institute of Physics, Carl von Ossietzky University Oldenburg, Oldenburg, Germany

methods, and then applied to the sum of a two-dimensional stochastic process convoluted with an Ornstein–Uhlenbeck process.
convoluted with an Ornstein–Uhlenbeck process.

Keywords Langevin analysis • Measurement noise • Stochastic process

1 Introduction

Given a measured stochastic time series, we can verify whether it is generated by an underlying continuous Markovian process, and we can completely reconstruct this process solely from the measurement [1]. What about measurements originating from the sum of two independent stochastic processes? This problem has been solved for the case in which one of the two components is uncorrelated Gaussian noise,
been accomplished even for very high intensities of this so-called measurement
noise. However, for the more complicated case of two independent, genuine (i.e.,
autocorrelated) stochastic processes in Langevin form in multiple dimensions, only
recently a solution has been found for this problem. The solution presented in [2]
yields a parametrized reconstruction of the sum of one general multidimensional
Langevin process (X) with multiplicative noise and a second, multidimensional
Ornstein–Uhlenbeck process (Y). Based on this finding, we here present the
complete solution of the problem without the requirement for parametrizing the
processes, and we give a brief review of the relevant literature and important recent
developments.
Continuous Markovian processes are often modeled through two equivalent
descriptions, the Langevin or the Fokker–Planck description. Whereas the Langevin
description considers the temporal evolution of a single instance of a stochastic variable under the influence of a Gaussian white noise source $\Gamma_t$ in a stochastic differential equation (SDE) of the form

$$\dot{X} = D^{(1)}(X) + \sqrt{D^{(2)}(X)}\;\Gamma_t, \tag{1}$$

the Fokker–Planck equation [3] describes the temporal evolution of the corresponding probability density function (pdf) in a PDE of the form

$$\frac{\partial P(x)}{\partial t} = -\frac{\partial}{\partial X}\Bigl[D^{(1)} P(x)\Bigr] + \frac{1}{2}\,\frac{\partial^2}{\partial X^2}\Bigl[D^{(2)} P(x)\Bigr]. \tag{2}$$
A topic of increasing interest in recent years has been the question how—given a
measured time series of a stochastic variable—it can be confirmed that a stochastic
process with Markov properties generates the time series, and if the process can
be reconstructed by means of the SDE or Fokker–Planck description. Friedrich and
Peinke have shown how the Kramers–Moyal coefficients, also called the drift ($D^{(1)}$) and diffusion ($D^{(2)}$) coefficients, can be directly derived from observed or generated

data [4], a method that has found widespread application [1], for example for the
description of turbulence [4, 5], financial markets [6], wind energy generation [7, 8],
and biological systems [9]. A better estimation of transition probabilities can be
achieved by using non-parametric Kernel [10] or Maximum Likelihood estimators
[11].
Given an unknown stochastic time series, it is not clear a priori whether it
follows a Langevin equation driven by Gaussian white noise. In order to apply the
reconstruction process, it is therefore advisable to consider the following approach:
first, it should be confirmed that the time series has (approximately) Markovian
properties, e.g., by performing a Wilcoxon test [5]. If the latter is positive, one
can bin the data and calculate marginal and conditional pdfs [5], or evaluate the
conditional moments directly [12]. From these, if the limit in Eq. (13) below exists,
one can calculate the drift and diffusion coefficients. Finally, it should be verified
whether the noise is actually Gaussian: by inverting the discretized Langevin
equation one can solve for the noise increments in each time step and consider their
distribution [13].
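Where the paragraph above describes inverting the discretized Langevin equation to check that the noise is Gaussian, the following minimal sketch (our own illustration in Python/NumPy, not code from the authors) shows the idea in one dimension; `drift` and `diffusion` stand for previously estimated $D^{(1)}$ and $D^{(2)}$, and the Ornstein–Uhlenbeck example used to exercise it is hypothetical.

```python
import numpy as np
from scipy import stats

def noise_increments(x, dt, drift, diffusion):
    """Invert the Euler-discretized Langevin equation
    x[t+1] - x[t] = D1(x[t])*dt + sqrt(D2(x[t])*dt)*eps[t]
    and return the standardized noise increments eps[t]."""
    dx = np.diff(x)
    return (dx - drift(x[:-1]) * dt) / np.sqrt(diffusion(x[:-1]) * dt)

# Hypothetical check on a simulated Ornstein-Uhlenbeck path with D1(x) = -x, D2(x) = 1
rng = np.random.default_rng(0)
dt, n = 0.01, 20000
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = x[t] - x[t] * dt + np.sqrt(dt) * rng.standard_normal()

eps = noise_increments(x, dt, lambda s: -s, lambda s: np.ones_like(s))
print(stats.kstest(eps, "norm"))   # a large p-value gives no evidence against Gaussianity
```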
In N dimensions, the evolution of a set of stochastic variables can be described
by an equivalent system of Itô–Langevin equations, where the stochastic equations
defined by a deterministic contribution (drift) and fluctuations from stochastic
sources (diffusion) show quite complex behavior [14]. For the general case of a
N-dimensional stochastic process X.t/ the equation is given by
q
dX D D.1/ .X/dt C D.2/ .X/dW.t/; (3)

where dW denotes a vector of increments of independent Wiener processes with $\langle dW_i\, dW_j \rangle = \delta_{ij}\, dt$ for all $i, j = 1, \dots, N$. $D^{(1)}(X)$ and $D^{(2)}(X)$ are the Kramers–Moyal coefficients of the corresponding Fokker–Planck equation that describes the evolution of the joint probability density function $\rho(x)$ of the stochastic variables
X. However, often stochastic processes X cannot be recorded directly, but are
superimposed by additional, uncorrelated noise introduced by a measurement
process, or are convoluted with additional—typically faster—stochastic processes.
In both cases, the additional sources Y have been addressed as measurement
noise in the literature [15, 16]. The measurement then consists of the sum of
two stochastic processes. Methods have been presented to recover the underlying
stochastic process and the magnitude of the measurement noise for the case of
uncorrelated noise [15–17], and even for correlated measurement noise in one
dimension [12]. In this report, we present a novel methodology that allows to
extract both the underlying multidimensional multiplicative noisy process X, and an
overlayed multidimensional correlated measurement noise process Y—solely from
the measured noisy time series $X^*(t) = X(t) + Y(t)$. Our algorithm is based on a method introduced by Lehle [2]; however, the present approach is parameter-free.

We assume in the following that the measurement noise $Y(t)$ is described by an Ornstein–Uhlenbeck process in N dimensions:

$$dY(t) = -A\,Y(t)\,dt + \sqrt{B}\,dW(t), \tag{4}$$

where A and B are $N \times N$ matrices, B is symmetric positive semi-definite, and


the eigenvalues of A have a positive real part. A describes the correlation of the
measurement noise, whereas B contains information about the amplitude of the
measurement noise. It has to be assumed that Y is faster than X, i.e., that the inverse
eigenvalues of A are considerably smaller than the time scale of X.
The methodology presented here allows one to extract the measurement noise parameters A and B as well as the drift and diffusion coefficients $D^{(1)}(X)$ and $D^{(2)}(X)$ from the noisy data $X^*$, which yields a complete description of X and Y.

2 Methodology

The method relies on the direct extraction of the noisy joint moments $m^*$ from the measured time series, which are defined as

$$m^{*(0)}(x) = \int_{x'} \rho^*(x, x', \tau)\, dx'_1 \cdots dx'_N \tag{5a}$$

$$m^{*(1)}_i(x, \tau) = \int_{x'} (x'_i - x_i)\, \rho^*(x, x', \tau)\, dx'_1 \cdots dx'_N \tag{5b}$$

$$m^{*(2)}_{ij}(x, \tau) = \int_{x'} (x'_i - x_i)(x'_j - x_j)\, \rho^*(x, x', \tau)\, dx'_1 \cdots dx'_N, \tag{5c}$$

in analogy to the joint moments m and conditional moments h,

$$h^{(1)}_i(x, \tau) = \bigl\langle X_i(t+\tau) - X_i(t) \bigr\rangle\big|_{X(t)=x} \tag{6a}$$

$$h^{(2)}_{ij}(x, \tau) = \bigl\langle [X_i(t+\tau) - X_i(t)]\,[X_j(t+\tau) - X_j(t)] \bigr\rangle\big|_{X(t)=x} \tag{6b}$$

of the unperturbed process [1], and with the marginal and joint density of the compound process

$$\rho^*(x) = p^*(x, t) \tag{7a}$$

$$\rho^*(x, x', \tau) = p^*(x, t;\, x', t+\tau) = \rho^*(x)\, p^*(x', t+\tau \,|\, x, t). \tag{7b}$$

These quantities can be computed directly from the time series $X^*$. In the next step, from the first noisy joint moments $m^{*(1)}(x, \tau)$, an $N \times N$ matrix Z is computed by

$$Z_{ij}(\tau) = \int_{x} m^{*(1)}_i(x, \tau)\, x_j\, dx_1 \cdots dx_N. \tag{8}$$
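For concreteness, a histogram-based sketch of how the noisy joint moments of Eq. (5) and the matrix Z of Eq. (8) could be estimated from a one-dimensional measured series is given below. It is our own simplified illustration (N = 1, discrete sums in place of the integrals), not the authors' implementation; all function names are ours.

```python
import numpy as np

def joint_moments_1d(x_star, tau, bins=50):
    """Bin-wise estimates of m*(0)(x) and m*(1)(x, tau) for a 1-D measured series
    (discrete analogue of Eq. (5) with N = 1)."""
    x0, x1 = x_star[:-tau], x_star[tau:]                 # pairs (x(t), x(t + tau))
    edges = np.linspace(x_star.min(), x_star.max(), bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x0, edges) - 1, 0, bins - 1)
    m0 = np.bincount(idx, minlength=bins) / len(x0)      # zeroth moment (marginal density per bin)
    m1 = np.bincount(idx, weights=x1 - x0, minlength=bins) / len(x0)
    return centres, m0, m1

def Z_scalar(x_star, tau, bins=50):
    """Discrete analogue of Eq. (8): Z(tau) = sum over bins of m*(1)(x, tau) * x."""
    centres, _, m1 = joint_moments_1d(x_star, tau, bins)
    return np.sum(m1 * centres)
```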

Around Z, a nonlinear equation system is constructed:

$$Z(k\,\Delta t) = \sum_{\nu=1}^{\nu_{\max}} P^{(\nu)}\,(k\,\Delta t)^{\nu} - \bigl(\mathrm{Id} - M(1)^{k}\bigr)V, \qquad k = 1, 2, \dots, k_{\max}. \tag{9}$$

Here, $P^{(1)}, \dots, P^{(\nu_{\max})}$ are auxiliary matrices that describe an expansion in temporal increments $\tau$, which are integer multiples of the temporal sampling $\Delta t$ of the measured time series. Their definition as well as the derivation of equation system (9) are described in [2]. The matrices $M(\tau) = M(k\,\Delta t)$ and V are defined by the measurement noise parameters A and B through the following equations:

$$M(k\,\Delta t) = e^{-A k \Delta t} = M(1)^{k} \tag{10a}$$

$$V = \int_{0}^{\infty} e^{-A s}\, B\, e^{-A^{T} s}\, ds. \tag{10b}$$
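Equations (10) can be evaluated with standard linear-algebra routines, since the integral in (10b) is the solution of the Lyapunov equation $A V + V A^{T} = B$ under the sign convention reconstructed above. The sketch below is our own illustration (assuming SciPy; function names are ours), showing both the forward map from (A, B) to (M(1), V) and its inversion once M(1) and V have been fitted.

```python
import numpy as np
from scipy.linalg import expm, logm, solve_continuous_lyapunov

def M_and_V(A, B, dt):
    """Forward relations of Eq. (10): M(1) = expm(-A*dt) and
    V = integral of expm(-A s) B expm(-A.T s) ds, i.e. the solution of A V + V A.T = B."""
    M1 = expm(-A * dt)
    V = solve_continuous_lyapunov(A, B)
    return M1, V

def A_and_B(M1, V, dt):
    """Inverse relations: recover the measurement-noise parameters from fitted M(1) and V."""
    A = -logm(M1) / dt
    B = A @ V + V @ A.T
    return np.real(A), np.real(B)
```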

Numerical solution of the over-determined nonlinear system of Eq. (9) through


least-squares optimization yields the complete characterization of the measurement noise Y through $M(1)$ and V. The optimization was performed by Ipopt, a nonlinear interior-point solver [18], using a formulation within the CasADi computation
framework [19] and applying the optimized HSL linear solvers [20]. Having
obtained the measurement noise process Y, in a final step an approximation of the perturbed moments $m^*$ in terms of the convolution of the unperturbed moments m with the measurement noise density $\rho_Y$ yields the following system of equations

$$m^{*(0)} = \rho_Y * m^{(0)} \tag{11a}$$

$$m^{*(1)}_i = \rho_Y * \bigl(h^{(1)}_i\, m^{(0)}\bigr) + Q_{ii'}\,\partial_{i'}\, m^{(0)} \tag{11b}$$

$$m^{*(2)}_{ij} = \rho_Y * \bigl(h^{(2)}_{ij}\, m^{(0)}\bigr) + \bigl(Q_{ij} + Q_{ji} - Q_{ii'} Q_{jj'}\,\partial_{i'}\partial_{j'}\bigr)\, m^{(0)} + Q_{ii'}\,\partial_{i'}\, m^{(1)}_j + Q_{jj'}\,\partial_{j'}\, m^{(1)}_i, \tag{11c}$$

where the Einstein summation convention is used [2]. The $Q_{ij}$ are coefficients related to the measurement noise, and $h^{(1)}_i$, $h^{(2)}_{ij}$ are moments of the conditional increments of X,

$$h^{(1)}_i(x, \tau) = \bigl\langle X_i(t+\tau) - X_i(t) \,\big|\, X(t) = x \bigr\rangle \tag{12a}$$

$$h^{(2)}_{ij}(x, \tau) = \bigl\langle [X_i(t+\tau) - X_i(t)]\,[X_j(t+\tau) - X_j(t)] \,\big|\, X(t) = x \bigr\rangle. \tag{12b}$$

Equations (11) are solved in the least square sense through nonlinear optimization
within the aforementioned Casadi/Ipopt/HSL framework [18–20]. Details of this
approach will be outlined elsewhere [21]. In a final step, from the conditional
moments, the drift and diffusion coefficients are computed:

$$D^{(1)}_i(x) = \lim_{\tau \to 0} \frac{1}{\tau}\, \frac{m^{(1)}_i(x, \tau)}{m^{(0)}(x)} \tag{13a}$$

$$D^{(2)}_{ij}(x) = \lim_{\tau \to 0} \frac{1}{\tau}\, \frac{m^{(2)}_{ij}(x, \tau)}{m^{(0)}(x)}, \tag{13b}$$

which completely describe the underlying process for the evolution of X, Eq. (3).
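A minimal one-dimensional transcription of the finite-$\tau$ estimator behind Eq. (13) is sketched below (our own illustration, not the authors' multidimensional, parameter-free implementation): the series is binned in x, the conditional moments are estimated at the smallest available delay $\tau = \Delta t$, and the limit is approximated by that value.

```python
import numpy as np

def kramers_moyal_1d(x, dt, bins=50):
    """Finite-tau estimates of drift D1 and diffusion D2 for a 1-D series,
    following the conditional-moment ratios of Eq. (13) with tau = dt."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max(), bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(x[:-1], edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    m1 = np.bincount(idx, weights=dx, minlength=bins)
    m2 = np.bincount(idx, weights=dx ** 2, minlength=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        D1 = m1 / counts / dt      # conditional mean increment per unit time
        D2 = m2 / counts / dt      # conditional mean squared increment per unit time
    return centres, D1, D2
```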

3 Results

To illustrate the usefulness of this method, we present the same numerical example as [2], namely the stochastic process

$$dX = D^{(1)}(X)\,dt + \sqrt{D^{(2)}(X)}\,dW(t), \tag{14}$$

with the nonlinear drift and diffusion coefficients

$$D^{(1)}(x, y) = \begin{pmatrix} -x - xy \\ x^2 - y \end{pmatrix}, \qquad D^{(2)}(x, y) = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5\,(1 + x^2) \end{pmatrix} \tag{15}$$

and correlated measurement noise from an Ornstein–Uhlenbeck process

$$dY(t) = -A\,Y(t)\,dt + \sqrt{B}\,dW(t), \tag{16}$$

with coefficients

$$A = \begin{pmatrix} 200 & -200/3 \\ 0 & 200/3 \end{pmatrix}, \qquad B = \begin{pmatrix} 75 & 425/12 \\ 425/12 & 125/6 \end{pmatrix}. \tag{17}$$
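Readers wishing to reproduce a synthetic compound series can integrate Eqs. (14)–(17) with a simple Euler–Maruyama scheme, as in the sketch below. This is our own illustration, not the authors' code: the signs of the reconstructed coefficients above should be checked against [2], the series is kept shorter than the $10^6$ points used in the text for speed, and the time step is an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
dt, n = 1e-3, 10 ** 5             # assumed time step; shorter than the 10^6 points used below

A = np.array([[200.0, -200.0 / 3.0], [0.0, 200.0 / 3.0]])
B = np.array([[75.0, 425.0 / 12.0], [425.0 / 12.0, 125.0 / 6.0]])
sqrtB = np.real(sqrtm(B))

def D1(x):                        # drift of Eq. (15)
    return np.array([-x[0] - x[0] * x[1], x[0] ** 2 - x[1]])

def D2_sqrt(x):                   # square root of the diagonal diffusion matrix of Eq. (15)
    return np.diag(np.sqrt([0.5, 0.5 * (1.0 + x[0] ** 2)]))

X = np.zeros((n, 2))
Y = np.zeros((n, 2))
for t in range(n - 1):            # Euler-Maruyama integration of Eqs. (14) and (16)
    dW = np.sqrt(dt) * rng.standard_normal(2)
    dV = np.sqrt(dt) * rng.standard_normal(2)
    X[t + 1] = X[t] + D1(X[t]) * dt + D2_sqrt(X[t]) @ dW
    Y[t + 1] = Y[t] - A @ Y[t] * dt + sqrtB @ dV

X_star = X + Y                    # measured compound series
```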

Fig. 1 Sample of the original two-dimensional time series X (left), measurement noise Y (middle),
and the resulting time series $X^* = X + Y$ (right)

Fig. 2 Zeroth (left), first (middle), and second (right) moment estimated directly from a synthetic
time series X (top) and reconstructed by our parameter-free method (bottom)

As can be seen in Fig. 1, the measurement noise Y introduces a considerable level of additional noise and correlation to the original process X, turning the latter quite opaque in the resulting empirical time series $X^* = X + Y$. We then apply the procedure described above to a compound time series $X^*$ of $10^6$ data points and can reconstruct both the measurement noise Y and the original process X. As an example, in Fig. 2 we display the first component of the first joint moment $m^{(1)}_1$. It can be concluded that our method recovers the underlying process faithfully, although it must be mentioned that the numerical details of the process require some attention [21]. One important observation is in order with respect to the behavior of the noisy conditional moments $h^{(1)}_i$ and $h^{(2)}_{ij}$ as a function of the delay $\tau$. Whereas in the "classical" case of measurement noise, i.e., the addition of a second, uncorrelated Gaussian noise source to a stochastic process signal, one observes a non-vanishing positive offset in the moments, roughly proportional to the intensity of the measurement noise signal [15], this case is different. When adding two autocorrelated stochastic processes with different time scales, we clearly

Fig. 3 The behavior (points) of one of the first noisy moments $h^{(1)}_1$ (left) and one of the second
noisy moments $h^{(2)}_{11}$ (right) as a function of the delay $\tau$. A crossover between time scales can be clearly
seen. Straight lines: faster (Y) and slower (X) moments of the measurement noise and the general
process, for comparison

observe these two different time scales in the moment plots, with a crossover
behavior, see Fig. 3.

Acknowledgements Authors gratefully acknowledge support from Fundação para a Ciência


e a Tecnologia (FCT) through SFRH/BD/86934/2012 (TS), SFRH/BPD/65427/2009 (FR),
UID/GEO/50019/2013-ILD-LA (FR), German Federal Ministry for Economic Affairs and Energy
0325577B (PGL), from FCT and German Academic Exchange Service DRI/DAAD/1208/2013
(TS, FR, PGL, MW) and for the internship fellowship through IPID4all from University of
Oldenburg (TS). VVL thanks the Prometeo Project of SENESCYT (Ecuador) for financial support.
This work is partially supported by FCT, UID/MAT/04561/2013 (VVL).

References

1. Friedrich, R., Peinke, J., Sahimi, M., Tabar, M.R.R.: Approaching complexity by stochastic
methods: from biological systems to turbulence. Phys. Rep. 506(5), 87–162 (2011)
2. Lehle, B.: Stochastic time series with strong, correlated measurement noise: Markov analysis
in n dimensions. J. Stat. Phys. 152(6), 1145–1169 (2013)
3. Risken, H., Frank, T.: The Fokker-Planck Equation. Springer Series in Synergetics, vol. 18.
Springer, Berlin, Heidelberg (1996)
4. Friedrich, R., Peinke, J.: Description of a turbulent cascade by a Fokker-Planck equation. Phys.
Rev. Lett. 78, 863–866 (1997)
5. Renner, C., Peinke, J., Friedrich, R.: Experimental indications for Markov properties of small-
scale turbulence. J. Fluid Mech. 433, 383–409 (2001)
6. Ghashghaie, S., Breymann, W., Peinke, J., Talkner, P., Dodge, Y.: Turbulent cascades in
foreign exchange markets. Nature 381, 767–770 (1996)
7. Milan, P., Wächter, M., Peinke, J.: Turbulent character of wind energy. Phys. Rev. Lett., 110,
138701 (2013)
8. Raischel, F., Scholz, T., Lopes, V.V., Lind, P.G.: Uncovering wind turbine properties through
two-dimensional stochastic modeling of wind dynamics. Phys. Rev. E 88, 042146 (2013)
9. Zaburdaev, V., Uppaluri, S., Pfohl, T., Engstler, M., Friedrich, R., Stark, H.: Langevin dynamics
deciphers the motility pattern of swimming parasites. Phys. Rev. Lett. 106, 208103 (2011)
Correlated Measurement Noise 11

10. Lamouroux, D., Lehnertz, K.: Kernel-based regression of drift and diffusion coefficients of
stochastic processes. Phys. Lett. A 373(39), 3507–3512 (2009)
11. Kleinhans, D.: Estimation of drift and diffusion functions from time series data: a maximum
likelihood framework. Phys. Rev. E 85, 026705 (2012)
12. Lehle, B.: Analysis of stochastic time series in the presence of strong measurement noise.
Phys. Rev. E 83, 021113 (2011)
13. Raischel, F., Russo, A., Haase, M., Kleinhans, D., Lind, P.G.: Optimal variables for describing
evolution of NO2 concentration. Phys. Lett. A 376, 2081–2089 (2012)
14. Vasconcelos, V.V., Raischel, F., Haase, M., Peinke, J., Wächter, M., Lind, P.G., Kleinhans, D.:
Principal axes for stochastic dynamics. Phys. Rev. E 84, 031103 (2011)
15. Böttcher, F., Peinke, J., Kleinhans, D., Friedrich, R., Lind, P.G., Haase, M.: Reconstruction
of complex dynamical systems affected by strong measurement noise. Phys. Rev. Lett. 97,
090603 (2006)
16. Lind, P.G. Haase, M., Böttcher, F., Peinke, J., Kleinhans, D., Friedrich, R.: Extracting strong
measurement noise from stochastic time series: applications to empirical data. Phys. Rev. E
81, 041125 (2010)
17. Carvalho, J., Raischel, F., Haase, M., Lind, P.G.: Evaluating strong measurement noise in data
series with simulated annealing method. J. Phys. Conf. Ser. 285, 012007 (2011)
18. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search
algorithm for large-scale nonlinear programming. Math. Program. 106, 25–57 (2006)
19. Andersson, J., Houska, B., Diehl, M.: Towards a computer algebra system with automatic
differentiation for use with object-oriented modelling languages. In: Proceedings of the 3rd
International Workshop on Equation-Based Object-Oriented Modeling Languages and Tools,
Oslo, pp. 99–105 (2010)
20. HSL: A collection of Fortran codes for large scale scientific computation. http://www.hsl.rl.ac.uk/ (2013)
21. Scholz, T., Raischel, F., Wächter, M., Lehle, B., Lopes, V.V., Lind, P.G., Peinke, J.: Parameter-
free resolution of the superposition of stochastic signals. Phys. Rev. E (2016, submitted)
Threshold Autoregressive Models for Directional
Time Series

Mahayaudin M. Mansor, Max E. Glonek, David A. Green, and Andrew V. Metcalfe

Abstract Many time series show directionality as plots against time and against
time-to-go are qualitatively different. A stationary linear model with Gaussian
noise is non-directional (reversible). Directionality can be emulated by introducing
non-Gaussian errors or by using a nonlinear model. Established measures of
directionality are reviewed and modified for time series that are symmetrical
about the time axis. The sunspot time series is shown to be directional with
relatively sharp increases. A threshold autoregressive model of order 2, TAR(2) is
fitted to the sunspot series by (nonlinear) least squares and is shown to give an
improved fit on autoregressive models. However, this model does not model closely
the directionality, so a penalized least squares procedure was implemented. The
penalty function included a squared difference of the discrepancy between observed
and simulated directionality. The TAR(2) fitted by penalized least squares gave
improved out-of-sample forecasts and more realistic simulations of extreme values.

Keywords Directional time series • Penalized least squares • Reversibility •


Sunspot numbers • Threshold autoregressive models

1 Introduction

Directionality, defined as asymmetry in time [3], enables us to tell the difference


between a sequence of observations plotted in time order (time series) and the
sequence plotted in reverse time order (time-to-go). A clear example of directional-
ity can be seen in the average yearly sunspot numbers 1700–2014 (Fig. 1). A time
series model is reversible if, and only if, its serial properties are symmetric, with
respect to time and time-to-go. A linear time series model with Gaussian errors
(LGE model) is reversible, but a linear time series model with non-Gaussian errors
is directional (LNGE model) [3]. Nonlinear time series models are also directional,
whether or not the errors are Gaussian (NLGE and NLNGE models, respectively).

M.M. Mansor () • M.E. Glonek • D.A. Green • A.V. Metcalfe


School of Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
e-mail: [email protected]; [email protected];
[email protected]; [email protected]


Fig. 1 Graphical inspection of directionality shows the sunspot observations rise more quickly
than they fall in time order (above) and rise more slowly than they fall in reverse time order (below)

Directionality is important for forecasting because it indicates that models other


than LGE should be considered [2, 3]. If the error distribution is better modeled
as non-Gaussian, this will lead to more precise limits of prediction for forecasts. If
nonlinear models provide a better fit than linear models, the forecast from nonlinear
models will be more accurate. Modeling directionality also leads to more realistic
simulation of extreme values.
The paper is arranged as follows. Section 2 describes well-established procedures
for detecting directionality in time series, together with a modification for direc-
tional time series that are symmetrical about the time axis. In Sect. 3 we consider
modeling directionality using TAR(2) models fitted by penalized least squares. In
Sect. 4 we provide evidence of directionality in the sunspot series, discuss a model
and simulation results. The model shows improved predictions and a more realistic
distribution of maxima from a long simulation. A conclusion is given in Sect. 5.

2 Detecting Directionality

In general any trend or seasonality should be removed before investigating direc-


tionality in the stationary series. There are many possible directional features in time
series, and these include, for example, sharp increases followed by slow recessions
or slow increases followed by sharp decreases. In such cases the series is asymmetric
with respect to time and also with respect to its mean or its median. In contrast a
time series may exhibit both sharp increases and sharp decreases, followed by more

gradual returns to the median value. Such time series are asymmetric with respect to
time but symmetric with respect to the median. Different statistics are appropriate
for detecting directionality in series that are asymmetric or symmetric with respect
to the median.
In this paper, we employ relatively simple and well-established tests [3] to
detect directionality in time series: difference in linear quadratic lagged correlations;
proportion of positive differences; skewness of differences; and tests based on
comparisons of time from threshold to peak against time from peak to threshold.
More recent tests are based on properties of Markov chains [1]; spectral estimation
of kurtosis of differences in financial time series [9]; and the FeedbackTS package
in R to detect time directionality occurring in specific fragments of time series [8].

2.1 Difference in Linear Quadratic Lagged Correlations

Directionality has the complementary concept of reversibility. Demonstrating evi-


dence that a series is not reversible is another way of expressing that the series
is directional. Following [3], a time series modeled by random variables $\{X_t\}$ for $t = 0, \pm 1, \pm 2, \dots$ is reversible if the joint distribution of $X_t, X_{t+1}, \dots, X_{t+r}$ and $X_{t+r}, X_{t+r-1}, \dots, X_t$ is the same for all $r = 1, 2, \dots$. In particular, a time series that is reversible has

$$\mathrm{Corr}\bigl(X_t,\, X_{t+1}^2\bigr) = \mathrm{Corr}\bigl(X_t^2,\, X_{t+1}\bigr). \tag{1}$$

A measure of directionality can be based on the difference in the sample estimates of these correlations. The non-dimensional measure used here is

$$\mathrm{DLQC} = \frac{\sum_{t=1}^{n-1} (x_t - \bar{x})\,(x_{t+1} - \bar{x})^2}{\Bigl[\sum_{t=1}^{n} (x_t - \bar{x})^2\Bigr]^{3/2}} \;-\; \frac{\sum_{t=1}^{n-1} (x_t - \bar{x})^2\,(x_{t+1} - \bar{x})}{\Bigl[\sum_{t=1}^{n} (x_t - \bar{x})^2\Bigr]^{3/2}}. \tag{2}$$

The rationale behind this statistic is as follows. Consider, for example, a series with sharp increases followed by slow recessions. Suppose a sharp increase occurs between $x_t$ and $x_{t+1}$; then $(x_t - \bar{x})$ could be negative or positive, but $(x_{t+1} - \bar{x})$ is very likely to be positive. It follows that $(x_t - \bar{x})(x_{t+1} - \bar{x})^2$ is negative or positive, whereas $(x_t - \bar{x})^2(x_{t+1} - \bar{x})$ is positive, and hence DLQC will tend to be negative. Both terms in DLQC are correlations, so bounds for the DLQC are $[-2, 2]$, but typical values in directional time series are smaller by two orders of magnitude. For the sunspot series, which exhibits clear directionality, with relatively sharp increases, DLQC is $-0.06$.
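As a concrete reference, the statistic of Eq. (2) can be computed with a few lines of NumPy; the sketch below is our own transcription (the function name is ours). Applied to the yearly sunspot numbers it should return a value close to the $-0.06$ quoted above.

```python
import numpy as np

def dlqc(x):
    """Difference in linear quadratic lagged correlations, Eq. (2)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d ** 2) ** 1.5
    return (np.sum(d[:-1] * d[1:] ** 2) - np.sum(d[:-1] ** 2 * d[1:])) / denom
```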

2.2 Methods Based on First Differences

More intuitive measures of directionality can be based on the distribution of lag one differences. For example, if there are sharp increases and slow recessions, there will be a few large positive differences and many small negative differences. The distribution will be positively skewed. Let the observed time series be $X_t$ and define the lag one first order differences¹ as

$$Y_t = X_t - X_{t-1} \quad \text{for } t = 2, 3, \dots, n. \tag{3}$$

2.2.1 Percentage of Positive Differences

The percentage of positive differences is

$$P^{+} = \frac{\text{number of positive } Y_t}{\text{number of positive } Y_t + \text{number of negative } Y_t} \times 100. \tag{4}$$
This formula excludes possible zero differences. If the time series is symmetric about the median and there are both sharp increases and sharp decreases, then there will tend to be both large positive and large negative differences, which tend to cancel out. Therefore, we adjust the measure to

$$P_{abm} = \frac{P^{+}_{above} + P^{-}_{below}}{P^{+}_{above} + P^{-}_{above} + P^{+}_{below} + P^{-}_{below}} \times 100, \tag{5}$$

where a difference is classified as above or below according to whether $X_t$ is above or below the median. Also, $P^{+}_{above}$ and $P^{-}_{below}$ are the proportions of differences above, or below, the median that are positive, or negative.
If a series is reversible, the expected values of the percentages in Eqs. (4) and (5) are 50 %. If differences are treated as independent, in a time series of length 1000, $P_{abm}$ would need to differ from 50 % by at least 3.2 % to be statistically significant at the 0.05 level.
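A sketch of how $P^{+}$ and $P_{abm}$ could be computed is given below. It is our own transcription of Eqs. (4) and (5): the classification of each difference by whether $X_t$ (the later point of the pair) is above the median is our reading of the definition, and the function name is ours.

```python
import numpy as np

def positive_difference_percentages(x):
    """P+ of Eq. (4) and the above/below-median adjustment P_abm of Eq. (5)."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)
    keep = y != 0                               # zero differences are excluded
    y = y[keep]
    above = (x[1:] > np.median(x))[keep]        # classify Y_t by the position of X_t
    p_plus = 100.0 * np.mean(y > 0)
    p_above_pos, p_above_neg = np.mean(y[above] > 0), np.mean(y[above] < 0)
    p_below_pos, p_below_neg = np.mean(y[~above] > 0), np.mean(y[~above] < 0)
    p_abm = 100.0 * (p_above_pos + p_below_neg) / (
        p_above_pos + p_above_neg + p_below_pos + p_below_neg)
    return p_plus, p_abm
```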

2.2.2 Product Moment Skewness of Differences

A potentially more sensitive test for directionality is to consider the skewness of the distribution of differences [3], given by

$$\hat{\gamma} = \frac{\sum_{t=1}^{n} (y_t - \bar{y})^3 / (n-1)}{\Bigl[\sum_{t=1}^{n} (y_t - \bar{y})^2 / (n-1)\Bigr]^{3/2}}. \tag{6}$$

1
In the following, “differences” refers to these lag one differences.

If a time series has both sharp increases and sharp decreases and is symmetric about the median, we adapt the definition to be

$$\hat{\gamma}_{abm} = |\hat{\gamma}_{above}| + |\hat{\gamma}_{below}|. \tag{7}$$

Significant nonzero skewness of either $\hat{\gamma}$ or $\hat{\gamma}_{abm}$ is evidence of directionality.
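The two skewness statistics translate directly into code; the sketch below is our own (function names ours), and the above/below split again classifies each difference by whether $X_t$ lies above the median, which is our reading of the definition.

```python
import numpy as np

def _skew(v):
    """Product-moment skewness as in Eq. (6)."""
    d = v - v.mean()
    return (np.sum(d ** 3) / (len(v) - 1)) / (np.sum(d ** 2) / (len(v) - 1)) ** 1.5

def gamma_hat(x):
    """Skewness of the lag-one differences, Eq. (6)."""
    return _skew(np.diff(np.asarray(x, dtype=float)))

def gamma_hat_abm(x):
    """Above/below-median version, Eq. (7)."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)
    above = x[1:] > np.median(x)
    return abs(_skew(y[above])) + abs(_skew(y[~above]))
```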

2.3 Threshold-Peak or Threshold-Trough Test

Consider a threshold (H) set, in this investigation, at the upper quintile of the marginal distribution of the time series $x_t$. Suppose that $x_{j-1} < H$, $x_j > H$, and that $x_t$ remains above H until $x_{j+k+1} < H$. Denote the time when $x_t$ is greatest (peak value) for $j \le t \le (j+k)$ as $(j+p)$. Define the difference between the time from threshold to peak and the time from the peak to the threshold as

$$DHPPH_j = (k - p) - p. \tag{8}$$

A similar definition can be constructed for a threshold-trough test, using the least values (troughs) of series of observations below the lower quintile (L). Denote the difference between the time from threshold to trough and the time from trough to threshold as $DLTTL_j$. Calculate DHPPH and DLTTL as the averages of $DHPPH_j$ and $DLTTL_j$, respectively, over all exceedances of H and excursions below L. The expected value of DHPPH and DLTTL is 0 for a reversible series.
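A sketch of the threshold–peak statistic is given below (our own illustration; the function name and the use of the 0.8 quantile as the upper quintile are ours). The threshold–trough version follows by applying the same function to $-x_t$, which turns excursions below the lower quintile into excursions above a threshold.

```python
import numpy as np

def dhpph(x, q=0.8):
    """Average DHPPH_j of Eq. (8) over all excursions above the upper-quintile threshold H."""
    x = np.asarray(x, dtype=float)
    H = np.quantile(x, q)
    above = x > H
    diffs, j = [], 0
    while j < len(x):
        if above[j]:
            end = j
            while end + 1 < len(x) and above[end + 1]:
                end += 1                       # excursion occupies indices j .. end
            k = end - j                        # duration index as in Eq. (8)
            p = int(np.argmax(x[j:end + 1]))   # offset of the peak within the excursion
            diffs.append((k - p) - p)
            j = end + 1
        else:
            j += 1
    return np.mean(diffs) if diffs else np.nan
```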

2.4 Evidence of Directionality

In general, a directional time series will not show directionality on all of these
statistics. Any one test being statistically significant at the $\alpha$ level, where $\alpha < 0.05$ say, is evidence of directionality. A conservative allowance for multiple testing would be to use a Bonferroni inequality and claim overall significance at less than an $m\alpha$ level, where m is the number of tests.

3 Modeling Directionality

Mansor et al. [5] considered first order autoregressive processes AR(1) of the form

$$X_t = \alpha X_{t-1} + \varepsilon_t, \tag{9}$$

and showed that the choice of non-Gaussian error distributions could lead to a variety of significant directional features. Furthermore, they demonstrated that

realizations from first order threshold autoregressive models TAR(1) with two
thresholds (TL ; TU ) of the form
8
< ˛U Xt1 C t if Xt1 > TU
Xt D ˛M Xt1 C t if TU < Xt1 < TL (10)
:
˛L Xt1 C t if Xt1 > TL

show substantial directionality even with Gaussian errors. They also found that the
product moment skewness of differences (O ) was generally the most effective statis-
tic for detecting directionality. They subsequently fitted a second order threshold
autoregressive model TAR(2) with one threshold (T) of the form

˛1U Xt1 C ˛2U Xt2 C t if Xt1 > T
Xt D (11)
˛1L Xt1 C ˛2L Xt2 C t if Xt1 < T

to the first 200 values in the sunspot series, by nonlinear least squares. The TAR(2)
gave some improvement over an AR(2) model for one-step ahead predictions of
the remaining 115 values in the sunspot series that were not used in the fitting
procedure. However they noted that there was scope for more realistic modeling
of directionality.
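To make the model concrete, the sketch below simulates the one-threshold TAR(2) of Eq. (11) with resampled residuals as errors. It is our own illustration (the function name and argument layout are ours), of the kind needed later to estimate the directionality of candidate parameter sets by simulation.

```python
import numpy as np

def simulate_tar2(n, a_upper, a_lower, T, resid, rng, burn=500):
    """Simulate the one-threshold TAR(2) model of Eq. (11); a_upper = (alpha_1U, alpha_2U),
    a_lower = (alpha_1L, alpha_2L), errors drawn by resampling the residuals `resid`."""
    x = np.zeros(n + burn)
    eps = rng.choice(np.asarray(resid), size=n + burn, replace=True)
    for t in range(2, n + burn):
        a1, a2 = a_upper if x[t - 1] > T else a_lower
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eps[t]
    return x[burn:]
```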
Here we consider the strategy of using a penalized least squares procedure for
the fitting of the TAR(2) model to the sunspot series. Initially, the objective function
to be minimized was

\omega = \sum_{t=3}^{n} r_t^2 + \lambda \, (\hat{\gamma}_{\mathrm{simulated}} - \hat{\gamma}_{\mathrm{observed}})^2 ,   (12)

where {r_t} for t = 3, …, n are the residuals defined by



r_t = \begin{cases} x_t - \hat{\alpha}_{1U} x_{t-1} - \hat{\alpha}_{2U} x_{t-2} & \text{if } x_{t-1} > T \\ x_t - \hat{\alpha}_{1L} x_{t-1} - \hat{\alpha}_{2L} x_{t-2} & \text{if } x_{t-1} < T , \end{cases}   (13)

where {x_t} for t = 1, …, n is the mean adjusted (to 0) time series, and γ̂_observed is
the directionality calculated for the 200-year sunspot series.
For any candidate set of parameter values, the optimization routine has to
determine not only the sum of squared errors, but also the directionality (γ̂). The
directionality γ̂ is not known as an algebraic function of the model parameters, so it
is estimated by simulation of the TAR(2) model with the candidate parameters and
resampled residuals. A simulation of length 2 × 10⁵ was used to establish γ̂ to a
reasonable precision and this is referred to as γ̂_simulated.
The R function optim() which uses the Nelder–Mead algorithm [6] was used
to optimize the parameter values. The long simulation for every set of candidate
parameters makes this a challenging optimization problem, but convergence was
typically achieved within 30 min on a standard desktop computer. The sum of

squared residuals inevitably increases as λ increases from 0, but a substantial
reduction in the difference between γ̂_simulated and γ̂_observed could be achieved with
a relatively small increase in the sum of squared residuals.
However, the procedure was not found to be satisfactory because simulations
with the optimized parameter values and resampled residuals were found to give
lower marginal standard deviations than the standard deviation of the observed time
series (σ̂_observed) (the marginal standard deviation depending on the parameter values
as well as the standard deviation of the error distribution). A lower marginal standard
deviation would result in underestimation of the variability of extreme values, and
lead to unrealistically narrow prediction intervals.
A solution is to include a requirement that the standard deviation of the fitted
series should match that of the observed series in the optimization criterion. A
standard deviation of the TAR(2) model (σ̂_simulated) with candidate parameter values
can conveniently be calculated along with γ̂_simulated. We modify (12) to

\omega = \sum_{t=3}^{n} r_t^2 + \lambda_1 (\hat{\gamma}_{\mathrm{simulated}} - \hat{\gamma}_{\mathrm{observed}})^2 + \lambda_2 (\hat{\sigma}_{\mathrm{simulated}} - \hat{\sigma}_{\mathrm{observed}})^2 ,   (14)

where λ₁ and λ₂ are the weights given to mitigate the discrepancies in γ̂ and σ̂,
respectively. The modification does not noticeably increase the run time. Detailed
results are given in Sect. 4.
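To make the fitting procedure concrete, the following R sketch builds the penalized objective (14) and passes it to optim() (Nelder–Mead, as in the text); x is the mean adjusted series, thr the threshold, and skew_diff() the directionality statistic sketched in Sect. 2.2.2. Function and argument names are illustrative, not the authors' code, and the simulation length can be reduced for quicker experimentation.

```r
# Penalized least squares fit of the TAR(2) model, objective as in Eq. (14).
fit_tar2_pls <- function(x, thr, lambda1 = 1e5, lambda2 = 1e3,
                         start = c(1, 0, 1, 0), nsim = 2e5) {
  n <- length(x)
  skew_obs <- skew_diff(x)
  sd_obs <- sd(x)
  obj <- function(par) {
    a1L <- par[1]; a2L <- par[2]; a1U <- par[3]; a2U <- par[4]
    upper <- x[2:(n - 1)] > thr
    pred <- ifelse(upper,
                   a1U * x[2:(n - 1)] + a2U * x[1:(n - 2)],
                   a1L * x[2:(n - 1)] + a2L * x[1:(n - 2)])
    r <- x[3:n] - pred                          # residuals, Eq. (13)
    # simulate the candidate TAR(2) with resampled residuals to estimate
    # the implied directionality and marginal standard deviation
    sim <- numeric(nsim)
    e <- sample(r, nsim, replace = TRUE)
    for (t in 3:nsim) {
      sim[t] <- if (sim[t - 1] > thr)
        a1U * sim[t - 1] + a2U * sim[t - 2] + e[t]
      else
        a1L * sim[t - 1] + a2L * sim[t - 2] + e[t]
    }
    sum(r^2) + lambda1 * (skew_diff(sim) - skew_obs)^2 +
      lambda2 * (sd(sim) - sd_obs)^2
  }
  optim(start, obj)                             # Nelder-Mead by default
}
```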

4 The Sunspot Series

We provide formal evidence of directionality in the average yearly sunspots (1700–


2014) [7], and fit time series models to the first 200 points (1700–1900). We use the
remaining 115 points (1901–2014) to compare the one-step-ahead forecast errors.
We also compare the distribution of extreme values from the observed sunspots
(1700–2014) and the simulated directional series using various error distributions.

4.1 Directionality in Sunspots

In Table 1 the DLQC, PC and γ̂ all indicate directionality in the series and all
have P-values of 0.00 (two-sided P-values calculated from a parametric bootstrap
procedure) [4]. The P-values for P_abm and γ̂_abm are 0.28 and 0.79, respectively.

Table 1 Summary table of test statistics of directionality for the sunspot series (1700–2014)
Series            Length  Mean   sd     DLQC    PC       P_abm    γ̂       γ̂_abm
Sunspot numbers   315     49.68  40.24  0.0598  42.49 %  51.78 %  0.8555  1.5014

4.2 Threshold Autoregressive Model Fitted by Least Squares

We first consider AR(p) models of orders 1, 2, and 9. AR(9) corresponds to
the lowest Akaike information criterion (AIC) for AR(p) models. The results are
summarized in Table 2. The AR(2) model is a substantial improvement on the AR(1)
model in terms of standard deviation of the errors. An AR(9) and ARIMA(2,0,1)
give some further improvement on AR(2).
We consider the TAR(2) model in (11) with three different thresholds set at the
70 %, 80 %, and 90 % percentiles, respectively. The four parameters of the TAR(2)
model, α₁L, α₂L, α₁U, and α₂U, are estimated by nonlinear least squares from the
mean adjusted (to 0) time series. Of all the models considered, TAR(2)_90% with
four estimated parameters (TAR(2)[LS]) is the best, with an estimated standard
deviation of the errors (σ̂_error) of 13.94 (Table 2).

4.3 Threshold Autoregressive Model Fitted by Penalized Least Squares

We fit a TAR(2)_90% model by penalized least squares (TAR(2)[LSP]), finding
the values of α₁L, α₂L, α₁U, and α₂U that minimize the objective function in (14).
We determine a suitable value of λ₁ for fitting the TAR(2)[LSP] model to the first
200 sunspot numbers after fixing the value of λ₂ at 10³ (for which σ̂_simulated is kept
within 1 % of σ̂_observed, which is 34.8). We compare all γ̂_simulated values modeled by
TAR(2)[LSP] from a long simulation (length of 2 × 10⁵) to the target skewness,
γ̂_observed, of 0.8344 in Table 3. A combination of λ₁ = 10⁵ and λ₂ = 10³ provides an

Table 2 Time series models for the sunspot series (1700–1900) compared by σ̂_error
AR model:                    AR(1)         AR(2)         AR(9)
σ̂_error                     20.34         15.36         14.84
ARIMA model:                 ARIMA(0,0,1)  ARIMA(1,0,1)  ARIMA(2,0,1)
σ̂_error                     21.80         16.60         14.72
TAR model, four parameters:  TAR(2)_70%    TAR(2)_80%    TAR(2)_90%
σ̂_error                     14.67         14.61         13.94

Table 3 Fitting TAR(2)[LSP]: γ̂_simulated and σ̂_error for selected λ₁ (λ₂ = 10³)
λ₁      γ̂_simulated   σ̂_error
0       0.1969         14.27
10⁴     0.2623         14.04
10⁵     0.6315         15.47
10⁶     0.8103         16.00
10¹⁰    0.8315         16.84

Fig. 2 Fitting TAR(2)[LSP] to the observed sunspots (1700–1900): trade-off between minimizing
the sum of squared residuals and minimizing the skewness discrepancy (error variance plotted
against skewness discrepancy)

improved approximation (0.6315) to the target directionality with a relatively small
increase in σ̂_error of the TAR(2)[LSP] model.
We illustrate the relationship between σ̂²_error and the squared difference between
γ̂_simulated and γ̂_observed in Fig. 2. The σ̂_error is monotonically increasing in λ₁ for
λ₁ = 0, 10¹, 10², …, 10¹⁰.

4.4 Details of Fitting AR(2), TAR(2)[LS], and TAR(2)[LSP]

The details of fitting AR(2), TAR(2)[LS], and TAR(2)[LSP] models are given in
Tables 4, 5, and 6. The upper and the lower regimes of the TAR(2)[LS] and the
TAR(2)[LSP] in Tables 5 and 6, respectively, are stable AR(2) processes which
satisfy the requirements of the stationarity triangle α₂ > −1, α₁ + α₂ < 1,
and α₁ − α₂ > −1 [2].

Table 4 Sample mean, two coefficients, and σ̂_error of AR(2)
μ̂      α̂₁      α̂₂       σ̂_error
44.11   1.3459   −0.6575   15.36

Table 5 Sample mean, four estimated parameters, and σ̂_error of TAR(2)[LS]
μ̂      T       α₁L      α₂L       α₁U      α₂U       σ̂_error
44.11   51.79   1.5643   −0.8117   0.9978   −0.3158   13.94

Table 6 Sample mean, four estimated parameters, λ₁, λ₂, and σ̂_error of TAR(2)[LSP]
μ̂      T       α₁L      α₂L       α₁U      α₂U       λ₁    λ₂    σ̂_error
44.11   51.79   1.2817   −0.4408   1.1429   −0.3626   10⁵   10³   15.47

Table 7 Test statistics of directionality in the simulated sunspots and the sunspot numbers (1700–1900)
Series            Length   Mean    sd      DLQC     PC (%)   P_abm (%)   γ̂        γ̂_abm
Sunspot numbers   200      44.11   34.76   0.0609   43.94    48.48       0.8344   1.0646
TAR(2)[LS_G]      2×10⁵    39.95   41.38   0.0088   48.90    50.15       0.1444   0.2939
TAR(2)[LS_R]      2×10⁵    44.38   39.25   0.0173   48.26    49.05       0.2721   0.4878
TAR(2)[LSP_R]     2×10⁵    45.25   35.18   0.0266   45.50    49.18       0.6293   0.9501

4.5 Simulation to Validate TAR(2)[LS] and TAR(2)[LSP]

A simulation of 2 × 10⁵ points from the TAR(2)[LS] model with Gaussian errors
(TAR(2)[LS_G]) and with resampled residuals (TAR(2)[LS_R]), and from the
TAR(2)[LSP] model with resampled residuals (TAR(2)[LSP_R]), gave the statistics shown in Table 7.
The TAR(2)[LSP_R] gives the best fit to the first 200 observations in the sunspot
series in terms of the statistics that were not included as criteria for fitting.

4.6 Comparisons of One-Step-Ahead Predictions

We compare the one-step-ahead forecasting performance of AR(2) with
TAR(2)[LS] and TAR(2)[LSP] for the years 1901–2014 (Fig. 3). We define the
forecasting performance by measuring the relative errors given by the following
measures (Table 8).
\text{Relative error } (E_{\mathrm{rel}}) = \frac{\text{actual} - \text{predicted}}{\text{actual}} ,   (15)

\text{Absolute relative error } (|E_{\mathrm{rel}}|) = \frac{|\text{actual} - \text{predicted}|}{\text{actual}} .   (16)
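A small R helper for these measures, assuming numeric vectors actual and predicted of one-step-ahead forecasts (names are illustrative):

```r
# Mean and sd of the relative and absolute relative errors, Eqs. (15)-(16).
forecast_measures <- function(actual, predicted) {
  e_rel <- (actual - predicted) / actual
  c(mean_Erel = mean(e_rel), sd_Erel = sd(e_rel),
    mean_absErel = mean(abs(e_rel)), sd_absErel = sd(abs(e_rel)))
}
```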
The TAR(2)[LSP] model offers an improvement over the AR(2) and TAR(2)[LS]
in terms of one-step-ahead predictions.

Fig. 3 Comparison of forecast values given by AR(2), TAR(2)[LS], and TAR(2)[LSP] at the 90 %
threshold (predicted sunspots plotted against observed sunspots for each model)

Table 8 Forecasting measures of predicted sunspots for the sunspot series from 1901 to 2014
Model          Mean(E_rel)   sd(E_rel)   Mean(|E_rel|)   sd(|E_rel|)
AR(2)          0.6812        2.4777      0.8713          2.4169
TAR(2)[LS]     0.4886        2.3014      0.7512          2.2289
TAR(2)[LSP]    0.5015        2.1897      0.7394          2.1206

4.7 Comparisons of Distributions of 15-Year Extreme Values

We simulate 2 × 10⁵ values using an AR(2) model with Gaussian errors (AR(2)_G),
and the TAR(2)[LS_G], TAR(2)[LS_R], and TAR(2)[LSP_R] models. The upper and lower
coefficients for TAR(2)[LS] and TAR(2)[LSP] are the optimized parameters in
Tables 5 and 6, respectively. We include two further error distributions for
TAR(2)[LSP]: the back-to-back Weibull distribution (WD), in which one Weibull
distribution is fitted to the positive residuals and another to the absolute values of
the negative residuals; and an Extreme Value Type 1 (Gumbel) distribution
of minima (EV).
We refer to TAR(2)[LSP] with WD and EV errors as TAR(2)[LSP_WD] and
TAR(2)[LSP_EV], respectively. We calculate the extreme values for every 15
consecutive years in the simulated series of length 2 × 10⁵, illustrate them with boxplots
(Fig. 4), and provide descriptive statistics (Table 9).
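A possible R sketch of the back-to-back Weibull error model, assuming res holds the TAR(2)[LSP] residuals; it uses fitdistr() from the MASS package, and the returned function draws simulated errors (positive with the empirical probability of a positive residual). This is an illustration of the idea, not the authors' implementation.

```r
library(MASS)
fit_b2b_weibull <- function(res) {
  pos <- res[res > 0]
  neg <- abs(res[res < 0])
  f_pos <- fitdistr(pos, "weibull")   # Weibull for the positive residuals
  f_neg <- fitdistr(neg, "weibull")   # Weibull for |negative residuals|
  p_pos <- length(pos) / (length(pos) + length(neg))
  function(n) {                       # random generator for simulated errors
    up <- runif(n) < p_pos
    out <- numeric(n)
    out[up]  <-  rweibull(sum(up),  f_pos$estimate["shape"], f_pos$estimate["scale"])
    out[!up] <- -rweibull(sum(!up), f_neg$estimate["shape"], f_neg$estimate["scale"])
    out
  }
}
```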
In general, TAR(2)[LSP] models simulate greater extreme values than TAR(2)
[LS] and AR(2) models, as shown by the inter-quartile range (IQR) and standard
deviation (sd) in Table 9. Furthermore, 15-year extreme values from TAR(2)[LSP_
WD] have the closest sd and IQR to the extreme values from the observed time
series.

Table 9 Descriptive statistics of 15-year extreme values in the sunspot series (1700–2014) and
the simulated series for each model with different residuals
Series             n        Median   Mean     Max      IQR     sd      Skewness
Sunspot numbers    21       111.0    112.20   190.20   53.90   37.89   0.1158
AR(2)_R            13,333   91.60    94.54    221.40   34.53   26.10   0.5903
TAR(2)[LS_G]       13,333   101.10   99.30    175.40   26.24   20.65   0.2523
TAR(2)[LS_R]       13,333   102.70   102.00   195.10   27.58   21.94   0.0025
TAR(2)[LSP_R]      13,333   91.45    91.46    236.60   40.25   29.82   0.2110
TAR(2)[LSP_WD]     13,333   97.39    100.20   358.80   50.06   40.18   0.7913
TAR(2)[LSP_EV]     13,333   83.77    84.61    255.30   41.35   30.60   0.2973

Fig. 4 Boxplots of 15-year extreme values in the simulated series for AR(2) and TAR(2) models.
(a) AR(2)_R; (b) TAR(2)[LS_G]; (c) TAR(2)[LS_R]; (d) TAR(2)[LSP_R]; (e) TAR(2)[LSP_WD];
(f) TAR(2)[LSP_EV]

5 Conclusion

There are many ways in which a time series can exhibit directionality, and different
measures are needed to identify these different characteristics. TAR models provide
a piecewise linear approximation to a wide range of nonlinear processes, and offer a
versatile modeling strategy. The sunspot series shows clear directionality, a physical
interpretation of which is given in [5]. We have shown that a nonlinear TAR(2)[LS]

model gives an improvement on out-of-sample, one-step-ahead predictions made
with an AR(2) model.
With the inclusion of the measure of directionality in the objective function
(12) in the fitting procedures for the TAR(2)_90% model, we are able to reduce
the discrepancy between the observed and the simulated directionality seen in the
TAR(2)[LS] model. Furthermore, we have demonstrated that any consequential
discrepancy in the marginal standard deviations in the fitted model may similarly be
dealt with by the inclusion of the standard deviation term in the improved objective
function (14). This TAR(2)[LSP] model yields improved one-step-ahead predictions
for 115 out-of-sample values.
The use of resampled residuals in simulations of extreme values is unsatisfactory
because the extreme errors in the simulation are restricted to the range of the
residuals. In the case of the sunspots, back-to-back Weibull distributions provided
good fit to the residuals and resulted in far more realistic simulations of 15-year
extreme values.
In summary, we have modeled directionality in the sunspot series by explicitly
using both the measure of directionality and standard deviation as fitting criteria.
The explicit modeling of directionality has provided more accurate forecasting and
more realistic modeling of extreme values.

Acknowledgements We thank the School of Mathematical Sciences at the University of Adelaide


for sponsoring the presentation of this work by Maha Mansor at ITISE 2015 in Granada. We
would also like to thank the Majlis Amanah Rakyat (MARA), a Malaysian government agency
for providing education sponsorship to Maha Mansor at the University of Adelaide, and the SIDC,
World Data Center, Belgium, for data.

References

1. Beare, B.K., Seo, J.: Time irreversible copula-based Markov models. Econ. Theory 30, 1–38
(2012)
2. Chatfield, C.: The Analysis of Time Series: An Introduction, 6th edn., pp. 218–219, 223–224,
44–45. Chapman and Hall/CRC, London/Boca Raton (2004)
3. Lawrance, A.: Directionality and reversibility in time series. Int. Stat. Rev./Revue Internationale
de Statistique 59(1), 67–79 (1991)
4. Mansor, M.M., Green, D.A., Metcalfe, A.V.: Modelling and simulation of directional financial
time series. In: Proceedings of the 21st International Congress on Modelling and Simulation
(MODSIM 2015), pp. 1022–1028 (2015)
5. Mansor, M.M., Glonek, M.E., Green, D.A., Metcalfe, A.V.: Modelling directionality in sta-
tionary geophysical time series. In: Proceedings of the International Work-Conference on Time
Series (ITISE 2015), pp. 755–766 (2015)
6. Nash, J.C.: On best practice optimization methods in R. J. Stat. Softw. 60(2), 1–14 (2014)
7. Solar Influences Data Analysis Center, Sunspot Index and Long-term Solar Observations.
http://www.sidc.be/silso (last accessed 17 October 2015)
8. Soubeyrand, S., Morris, C.E., Bigg, E.K.: Analysis of fragmented time directionality in time
series to elucidate feedbacks in climate data. Environ. Model Softw. 61, 78–86 (2014)
9. Wild, P., Foster, J., Hinich, M.: Testing for non-linear and time irreversible probabilistic structure
in high frequency financial time series data. J. R. Stat. Soc. A. Stat. Soc. 177(3), 643–659 (2014)
Simultaneous Statistical Inference in Dynamic
Factor Models

Thorsten Dickhaus and Markus Pauly

Abstract Based on the theory of multiple statistical hypotheses testing, we elaborate
likelihood-based simultaneous statistical inference methods in dynamic factor
models (DFMs). To this end, we work up and extend the methodology of Geweke
and Singleton (Int Econ Rev 22:37–54, 1981) by proving a multivariate central
limit theorem for empirical Fourier transforms of the observable time series. In
an asymptotic regime with observation horizon tending to infinity, we employ
structural properties of multivariate chi-square distributions in order to construct
asymptotic critical regions for a vector of Wald statistics in DFMs, assuming that
the model is identified and model restrictions are testable. A model-based bootstrap
procedure is proposed for approximating the joint distribution of such a vector
for finite sample sizes. Examples of important multiple test problems in DFMs
demonstrate the relevance of the proposed methods for practical applications.

Keywords Empirical Fourier transform • False discovery rate • Family-wise


error rate • Likelihood ratio statistic • Multiple hypothesis testing • Multivariate
chi-square distribution • Resampling • Time series regression • Wald statistic

1 Introduction and Motivation

Dynamic factor models (DFMs) are multivariate time series models of the form

X(t) = \sum_{s=-\infty}^{\infty} \Lambda(s)\, f(t-s) + \varepsilon(t), \quad 1 \le t \le T.   (1)

T. Dickhaus ()
Institute for Statistics, University of Bremen, P.O. Box 330 440, 28344 Bremen, Germany
e-mail: [email protected]; http://www.math.uni-bremen.de/~dickhaus
M. Pauly
Institute of Statistics, University of Ulm, Helmholtzstr. 20, 89081 Ulm, Germany


Thereby, X = (X(t) : 1 ≤ t ≤ T) denotes a p-dimensional, covariance-stationary
stochastic process in discrete time with mean zero, f(t) = (f₁(t), …, f_k(t))^⊤
with k < p denotes a k-dimensional vector of so-called common factors and
ε(t) = (ε₁(t), …, ε_p(t))^⊤ denotes a p-dimensional vector of "specific factors,"
to be regarded as error or remainder terms. Both f(t) and ε(t) are assumed to
be centered and the error terms are modeled as noise in the sense that they are
mutually uncorrelated at every time point and, in addition, uncorrelated with f(t) at
all leads and lags. The error terms ε(t) may, however, exhibit nontrivial (weak) serial
autocorrelations. The model dimensions p and k are assumed to be fixed, while the
sample size T may tend to infinity.
The underlying interpretation of model (1) is that the dynamic behavior of the
process X can already be described well (or completely) by a lower-dimensional
"latent" process. The entry (i, j) of the matrix Λ(s) quantitatively reflects the
influence of the jth common factor at lead or lag s, respectively, on the ith component
of X(t), where 1 ≤ i ≤ p and 1 ≤ j ≤ k. Recently, the case where factor loadings
may depend on covariates was studied in [36, 54], and applications in economics
and neuroimaging were discussed.
A special case of model (1) results if the influence of the common factors on X
is itself without dynamics, i.e., if the model simplifies to

X(t) = \Lambda f(t) + \varepsilon(t), \quad 1 \le t \le T.   (2)

In [39], methods for the determination of the (number of) common factors in a
factor model of the form (2) and a canonical transformation allowing a parsimonious
representation of X.t/ in (2) in terms of the common factors were derived. Statistical
inference in static factor models for longitudinal data has been studied, for instance,
in [28], where an algorithm for computing maximum likelihood estimators (MLEs)
in models with factorial structure of the covariance matrix of the observables was
developed. For further references and developments regarding the theory and the
interrelations of different types of (dynamic) factor models we refer to [5, 21], and
references therein.
Statistical inference methods for DFMs typically consider the time series in the
frequency domain, cf., among others, [17, 18] and references therein, and analyze
decompositions of the spectral density matrix of X. Nonparametric estimators
of the latter matrix by kernel smoothing have been discussed in [41]. In a
parametric setting, a likelihood-based framework for statistical inference in DFMs
was developed in [19] by making use of central limit theorems for time series
regression in the frequency domain, see [23]. The inferential considerations in
[19] rely on the asymptotic normality of the MLE θ̂ of the (possibly very high-
dimensional) parameter vector θ in the frequency-domain representation of the
model. We will provide more details in Sect. 3. To this end, it is essential that
the time series model (1) is identified in the sense of [19], which we will assume
throughout the paper. If the model is not identified, the individual contributions of
the common factors cannot be expressed unambiguously and, consequently, testing

for significance or the construction of confidence sets for elements of θ is obviously


not informative.
In the present work, we will extend the methodology of Geweke and Singleton
[19]. Specifically, we will be concerned with simultaneous statistical inference in
DFMs under the likelihood framework by considering families of linear hypotheses
regarding parameters of the frequency-domain representation of (1). As we will
demonstrate in Sect. 3, the following two problems, which are of practical interest,
are examples where our methodology applies.
Problem 1 Which of the specific factors have a nontrivial autocorrelation
structure? Solving this problem is substantially more informative than just testing a
single specific factor for trivial autocorrelations as considered in [19]. In practical
applications, typically stylized facts regarding the dynamics of the observable
process are available from expert knowledge. While the common factors capture the
cross-sectional dependencies in X, its autocorrelation structure is influenced by the
specific factors. Therefore, the solution to Problem 1 can be utilized for the purpose
of model diagnosis in the spirit of a residual analysis; cf. also, among others, [6, 16].
Problem 2 Which of the common factors have a lagged influence on X? In many
economic applications, it is informative if certain factors (such as interventions)
have an instantaneous or a lagged effect. By solving Problem 2, this can be
answered for several of the common factors simultaneously, accounting for the
multiplicity of the test problem.
Solving problems of these types requires multiple testing of several hypotheses
simultaneously. In our case, likelihood ratio statistics (or, asymptotically equiva-
lently, Wald statistics) will build the basis for the respective decision rules.
The paper is organized as follows. In Sect. 2, we provide a brief introduction to
multiple testing, especially under positive dependence. In particular, we will analyze
structural properties of multivariate chi-square distributions and provide a numerical
assessment of type I error control for standard multiple tests when applied to
vectors of multivariate chi-square distributed test statistics. This section is meant to
contribute to multiple testing theory and practice in general. Although it is known for
a longer time that the components of a multivariate chi-square distributed random
vector necessarily exhibit pairwise positive correlations, such vectors in general
do not fulfill higher-order dependency concepts like multivariate total positivity
of order 2 (MTP2 ), cf. Example 3.2. in [29]. However, for instance the extremely
popular linear step-up test from [1] for control of the false discovery rate (FDR) is
only guaranteed to keep the FDR level strictly if the vector of test statistics or p-
values, respectively, is MTP2 (or at least positively regression dependent on subsets,
PRDS). Hence, a question of general interest is how this and related tests behave for
multivariate chi-square distributed vectors of test statistics. Section 3 demonstrates
how such vectors of test statistics arise naturally in connection with likelihood-
based solutions to simultaneous inference problems for DFMs of the form (1) when
the observation horizon T tends to infinity. To this end, we revisit and extend the
methodology of Geweke and Singleton [19]. Specifically, we prove a multivariate

central limit theorem for empirical Fourier transforms of the observable time series.
The asymptotic normality of these Fourier transforms leads to the asymptotic
multivariate chi-square distribution of the considered vector of Wald statistics. In
Sect. 4, we propose a model-based resampling scheme for approximating the finite-
sample distribution of this vector of test statistics. We conclude with a discussion in
Sect. 5.

2 Multiple Testing

The general setup of multiple testing theory assumes a statistical model (Ω, F,
(P_θ)_{θ∈Θ}) parametrized by θ ∈ Θ and is concerned with testing a family H =
(H_i, i ∈ I) of hypotheses regarding the parameter θ with corresponding alternatives
K_i = Θ \ H_i, where I denotes an arbitrary index set. We identify hypotheses
with subsets of the parameter space throughout the paper. Let φ = (φ_i, i ∈ I)
be a multiple test procedure for H, meaning that each component φ_i, i ∈ I, is a
(marginal) test for the test problem H_i versus K_i in the classical sense. Moreover,
let I₀ ≡ I₀(θ) ⊆ I denote the index set of true hypotheses in H and V(φ)
the number of false rejections (type I errors) of φ, i.e., V(φ) = Σ_{i∈I₀} φ_i. The
classical multiple type I error measure in multiple hypothesis testing is the family-
wise error rate, FWER for short, and can (for a given θ ∈ Θ) be expressed as
FWER_θ(φ) = P_θ(V(φ) > 0). The multiple test φ is said to control the FWER
at a predefined significance level α, if sup_{θ∈Θ} FWER_θ(φ) ≤ α. A simple, but
often conservative method for FWER control is based on the union bound and is
referred to as Bonferroni correction in the multiple testing literature. Assuming that
|I| = m, the Bonferroni correction carries out each individual test φ_i, i ∈ I, at
(local) level α/m. The "Bonferroni test" φ = (φ_i, i ∈ I) then controls the FWER.
In case that joint independence of all m marginal test statistics can be assumed,
the Bonferroni-corrected level α/m can be enlarged to the "Šidák-corrected" level
1 − (1 − α)^{1/m} > α/m, leading to slightly more powerful (marginal) tests. Both
the Bonferroni and the Šidák test are single-step procedures, meaning that the same
local significance level is used for all m marginal tests.
Another interesting class of multiple test procedures are stepwise rejective tests,
in particular step-up-down (SUD) tests, introduced in [50]. They are most conve-
niently described in terms of p-values p₁, …, p_m corresponding to test statistics
T₁, …, T_m. It goes beyond the scope of this paper to discuss the notion of p-values
in depth. Therefore, we will restrict attention to the case that every individual null
hypothesis is simple, the distribution of every T_i, 1 ≤ i ≤ m, under H_i is continuous,
and each T_i tends to larger values under alternatives. The test statistics considered
in Sect. 3 fulfill these requirements, at least asymptotically. Then, we can calculate
(observed) p-values by p_i = 1 − F_i(t_i), 1 ≤ i ≤ m, where F_i is the cumulative
distribution function (cdf) of T_i under H_i and t_i denotes the observed value of T_i.
The transformation with the upper tail cdf brings all test statistics to a common

scale, because each p-value is supported on [0, 1]. Small p-values are in favor of the
corresponding alternatives.
Definition 1 (SUD Test of Order λ in Terms of p-Values, cf. [15]) Let p_{1:m} <
p_{2:m} < … < p_{m:m} denote the ordered p-values for a multiple test problem. For a
tuning parameter λ ∈ {1, …, m}, an SUD test φ^λ = (φ₁, …, φ_m) (say) of order λ
based on some critical values α_{1:m} ≤ … ≤ α_{m:m} is defined as follows. If p_{λ:m} ≤
α_{λ:m}, set j* = max{ j ∈ {λ, …, m} : p_{i:m} ≤ α_{i:m} for all i ∈ {λ, …, j} }, whereas for
p_{λ:m} > α_{λ:m}, put j* = sup{ j ∈ {1, …, λ − 1} : p_{j:m} ≤ α_{j:m} } (sup ∅ := −∞). Define
φ_i = 1 if p_i ≤ α_{j*:m} and φ_i = 0 otherwise (α_{−∞:m} := −∞).
An SUD test of order λ = 1 or λ = m, respectively, is called a step-down (SD)
or step-up (SU) test, respectively. If all critical values are identical, we obtain a
single-step test.
In connection with control of the FWER, SD tests play a pivotal role, because
they can often be considered a shortcut of a closed test procedure, cf. [33]. For
example, the famous SD procedure of Holm [24] employing critical values α_{i:m} =
α/(m − i + 1), 1 ≤ i ≤ m, is, under the assumption of a complete system of
hypotheses, a shortcut of the closed Bonferroni test (see, for instance, [49]) and
hence controls the FWER at level α.
In order to compare concurring multiple test procedures, also a type II error
measure or, equivalently, a notion of power is required under the multiple testing
framework. To this end, following Definition 1.4 of [8], we define I₁ ≡ I₁(θ) =
I \ I₀, m₁ = |I₁|, S(φ) = Σ_{i∈I₁} φ_i and refer to the expected proportion of correctly
detected alternatives, i.e., power_θ(φ) = E_θ[S(φ)/max(m₁, 1)], as the multiple
power of φ under θ, see also [34]. If the structure of φ is such that φ_i = 1_{p_i ≤ t*}
for a common, possibly data-dependent threshold t*, then the multiple power of φ
is increasing in t*. For SUD tests, this entails that index-wise larger critical values
lead to higher multiple power.

2.1 Multiple Testing Under Positive Dependence

Gain in multiple power under the constraint of FWER control is only possible
if certain structural assumptions for the joint distribution of (p₁, …, p_m)^⊤ or,
equivalently, (T₁, …, T_m)^⊤ can be established, cf. Example 3.1 in [10]. In particular,
positive dependency among p₁, …, p_m in the sense of MTP₂ (see [29]) or PRDS (see
[2]) allows for enlarging the critical values (α_{i:m})_{1≤i≤m}. To give a specific example,
it was proved in [45] that the critical values α_{i:m} = iα/m, 1 ≤ i ≤ m, can be used
as the basis for an FWER-controlling closed test procedure, provided that the joint
distribution of p-values is MTP₂. These critical values have originally been proposed
in [48] in connection with a global test for the intersection hypothesis H₀ = ∩_{i=1}^{m} H_i
and are therefore often referred to as Simes' critical values. In [25] a shortcut for the
aforementioned closed test procedure based on Simes' critical values was worked
out; we will refer to this multiple test as φ^Hommel in the remainder of this work.

Simes' critical values also play an important role in connection with control of
the FDR. The FDR is a relaxed type I error measure suitable for large systems of
hypotheses. Formally, it is defined as FDR_θ(φ) = E_θ[FDP(φ)], where FDP(φ) =
V(φ)/max(R(φ), 1) with R(φ) = V(φ) + S(φ) denoting the total number of
rejections of φ under θ. The random variable FDP(φ) is called the false discovery
proportion. The meanwhile classical linear step-up test from [1], φ^LSU (say), is an
SU test with Simes' critical values. Under joint independence of all p-values, it
provides FDR control at (exact) level m₀α/m, where m₀ = m − m₁, see, for instance,
[14]. In [2, 46] it was independently proved that FDR_θ(φ^LSU) ≤ m₀(θ)α/m for
all θ ∈ Θ if the joint distribution of (p₁, …, p_m)^⊤ is PRDS on I₀ (notice that
MTP₂ implies PRDS on any subset). The multiple test φ^LSU is the by far most
popular multiple test for FDR control and is occasionally even referred to as the
FDR procedure in the literature.
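For reference, the single-step and stepwise procedures discussed in this section are available in base R through p.adjust(); the p-values below are purely illustrative.

```r
p <- c(0.001, 0.004, 0.019, 0.095, 0.201)    # illustrative marginal p-values
alpha <- 0.05
p.adjust(p, method = "bonferroni") <= alpha  # Bonferroni single-step test (FWER)
p.adjust(p, method = "holm")       <= alpha  # Holm step-down test (FWER)
p.adjust(p, method = "hommel")     <= alpha  # Hommel shortcut (valid under MTP2/PRDS)
p.adjust(p, method = "BH")         <= alpha  # linear step-up test of [1] (FDR)
```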

2.2 Multivariate Chi-Square Distributed Test Statistics

Asymptotically, the vectors of test statistics that are appropriate for testing the
hypotheses we are considering in the present work follow under H0 a multivariate
chi-square distribution in the sense of the following definition.
Definition 2 Let m ≥ 2 and ν = (ν₁, …, ν_m)^⊤ be a vector of positive
integers. Let (Z_{1,1}, …, Z_{1,ν₁}, Z_{2,1}, …, Z_{2,ν₂}, …, Z_{m,1}, …, Z_{m,ν_m}) denote Σ_{k=1}^{m} ν_k
jointly normally distributed random variables with joint correlation matrix
R = (ρ(Z_{k₁,ℓ₁}, Z_{k₂,ℓ₂}) : 1 ≤ k₁, k₂ ≤ m, 1 ≤ ℓ₁ ≤ ν_{k₁}, 1 ≤ ℓ₂ ≤ ν_{k₂}) such
that for any 1 ≤ k ≤ m the random vector Z_k = (Z_{k,1}, …, Z_{k,ν_k})^⊤ has a standard
normal distribution on R^{ν_k}. Let Q = (Q₁, …, Q_m)^⊤, where

Q_k = \sum_{\ell=1}^{\nu_k} Z_{k,\ell}^2 \quad \text{for all } 1 \le k \le m.   (3)

Then we call the distribution of Q a multivariate (central) chi-square distribution (of
generalized Wishart-type) with parameters m, ν, and R and write Q ∼ χ²(m, ν, R).
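For simulation purposes, random vectors with this distribution can be generated directly from Definition 2; the R sketch below assumes equal marginal degrees of freedom nu and a common correlation matrix Sigma for the underlying normal blocks (a simplification of the general definition) and uses rmvnorm() from the mvtnorm package.

```r
library(mvtnorm)
# Draw n observations of Q = (Q_1, ..., Q_m) as in Eq. (3).
rmvchisq <- function(n, nu, Sigma) {
  m <- nrow(Sigma)
  Q <- matrix(0, n, m)
  for (l in seq_len(nu)) {
    Z <- rmvnorm(n, sigma = Sigma)   # one block of jointly normal variates
    Q <- Q + Z^2                     # componentwise sums of squares
  }
  Q
}
Sigma <- 0.5^abs(outer(1:3, 1:3, "-"))      # AR(1)-type correlation, rho = 0.5
Q <- rmvchisq(1000, nu = 5, Sigma = Sigma)
```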
Well-known special cases arise if all marginal degrees of freedom are identical,
i.e., ν₁ = ν₂ = … = ν_m ≡ ν, and the vectors (Z_{1,1}, …, Z_{m,1})^⊤, (Z_{1,2}, …, Z_{m,2})^⊤,
…, (Z_{1,ν}, …, Z_{m,ν})^⊤ are independent random vectors. If, in addition, the correlation
matrices among the m components of these latter ν random vectors are all identical
and equal to Σ ∈ R^{m×m} (say), then the distribution of Q is that of the diagonal
elements of a Wishart-distributed random matrix S ∼ W_m(ν, Σ). This distribution
is for instance given in Definition 3.5.7 of [51]. The case of potentially different
correlation matrices Σ₁, …, Σ_ν has been studied in [27]. Multivariate chi-square
distributions play an important role in several multiple testing problems. In Sect. 3
below, they occur as limiting distributions of vectors of Wald statistics. Other

applications comprise statistical genetics (analysis of many contingency tables


simultaneously) and multiple tests for Gaussian variances; see for instance [9, 10]
for more details and further examples.
The following lemma shows that among the components of a (generalized)
multivariate chi-square distribution only nonnegative pairwise correlations can
occur.
Lemma 1 Let Q ∼ χ²(m, ν, R). Then, for any pair of indices 1 ≤ k₁, k₂ ≤ m it
holds that

0 \le \mathrm{Cov}(Q_{k_1}, Q_{k_2}) \le 2\sqrt{\nu_{k_1}\nu_{k_2}} .   (4)

Proof Without loss of generality, assume k₁ = 1 and k₂ = 2. Simple probabilistic
calculus now yields

\mathrm{Cov}(Q_1, Q_2) = \mathrm{Cov}\Bigl(\sum_{i=1}^{\nu_1} Z_{1,i}^2,\; \sum_{j=1}^{\nu_2} Z_{2,j}^2\Bigr) = \sum_{i=1}^{\nu_1}\sum_{j=1}^{\nu_2} \mathrm{Cov}(Z_{1,i}^2, Z_{2,j}^2) = 2 \sum_{i=1}^{\nu_1}\sum_{j=1}^{\nu_2} \rho^2(Z_{1,i}, Z_{2,j}) \ge 0.

The upper bound in (4) follows directly from the Cauchy–Schwarz inequality,
because the variance of a chi-square distributed random variable with ν degrees
of freedom equals 2ν.
In view of the applicability of multiple test procedures for positively dependent
test statistics that we have discussed in Sect. 2.1, Lemma 1 points in the right direction.
However, as outlined in the introduction, the MTP₂ property for multivariate
chi-square or, more generally, multivariate gamma distributions could up to now
only be proved for special cases as, for example, exchangeable gamma variates (cf.
Example 3.5 in [29], see also [47] for applications of this type of multivariate gamma
distributions in multiple hypothesis testing). Therefore, and especially in view of the
immense popularity of φ^LSU, we conducted an extensive simulation study of FWER
and FDR control of multiple tests suitable under MTP₂ (or PRDS) in the case that
the vector of test statistics follows a multivariate chi-square distribution in the sense
of Definition 2. Specifically, we investigated the shortcut test φ^Hommel for control of
the FWER and the linear step-up test φ^LSU for control of the FDR and considered
the following correlation structures among the variates (Z_{k,ℓ} : 1 ≤ k ≤ m) for any
given 1 ≤ ℓ ≤ max{ν_k : 1 ≤ k ≤ m}. (Since only the coefficients of determination
enter the correlation structure of the resulting chi-square variates, we restricted our
attention to positive correlation coefficients among the Z_{k,ℓ}.)
1. Autoregressive, AR(1): ρ_ij = ρ^{|i−j|}, ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}.
2. Compound symmetry (CS): ρ_ij = ρ + (1 − ρ) 1{i = j}, ρ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}.
3. Toeplitz: ρ_ij = ρ_{|i−j|+1}, with ρ₁ ≡ 1 and ρ₂, …, ρ_m randomly drawn from the
interval [0.1, 0.9].
4. Unstructured (UN): The ρ_ij are elements of a normalized realization of a Wishart-
distributed random matrix with m degrees of freedom and diagonal expectation.
The diagonal elements were randomly drawn from [0.1, 0.9]^m.
In all four cases, we have ρ_ij = Cov(Z_{i,ℓ}, Z_{j,ℓ}), 1 ≤ i, j ≤ m_ℓ, where m_ℓ =
|{1 ≤ k ≤ m : ν_k ≥ ℓ}|. The marginal degrees of freedom (ν_k : 1 ≤ k ≤ m)
have been drawn randomly from the set {1, 2, …, 100} for every simulation setup.
In this, we chose decreasing sampling probabilities of the form c/(ν + 1), 1 ≤
ν ≤ 100, where c denotes the norming constant, because we were most interested
in the small-scale behavior of φ^Hommel and φ^LSU under dependency. For the number
of marginal test statistics, we considered m ∈ {2, 5, 10, 50, 100}, and for each such
m several values for the number m₀ of true hypotheses. For all false hypotheses,
we set the corresponding p-values to zero, because the resulting so-called Dirac-
uniform configurations are assumed to be least favorable for φ^Hommel and φ^LSU, see,
for instance, [4, 14]. For every simulation setup, we performed M = 1000 Monte
Carlo repetitions of the respective multiple test procedures and estimated the FWER
or FDR, respectively, by relative frequencies or means, respectively. Our simulation
results are provided in the appendix of Dickhaus [7].
To summarize the findings, φ^Hommel behaved remarkably well over the entire
range of simulation setups. Only in a few cases, it violated the target FWER level
slightly, but one has to keep in mind that Dirac-uniform configurations correspond
to extreme deviations from the null hypotheses which are not expected to be
encountered in practical applications. In line with the results in [2, 46], φ^LSU
controlled the FDR well at level m₀α/m (compare with the bound reported at the end
of Sect. 2). One could try to diminish the resulting conservativity for small values of
m₀ either by pre-estimating m₀ and plugging the estimated value m̂₀ into the nominal
level, i.e., replacing α by mα/m̂₀, or by employing other sets of critical values. For
instance, in [14, 15] nonlinear critical values were developed, with the aim of full
exhaustion of the FDR level for any value of m₀ under Dirac-uniform configurations.
However, both strategies are up to now only guaranteed to work well under the
assumption of independent p-values and it would need deeper investigations of their
validity under positive dependence. Here, we can at least report that we have no
indications that φ^LSU may not keep the FDR level under our framework, militating
in favor of applying this test for FDR control in the applications that we will consider
in Sect. 3.
Remark 1 A different way to tackle the aforementioned problem of lacking higher-
order dependency properties is not to rely on the asymptotic Q ∼ χ²(m, ν, R) (where
R is unspecified), but to approximate the finite-sample distribution of test statistics,
for example by means of appropriate resampling schemes. Resampling-based SD
tests for FWER control have been worked out in [42, 43, 52]. Resampling-based
FDR control can be achieved by applying the methods from [53, 56] or [44], among
others. We will return to resampling-based multiple testing in the context of DFMs
in Sect. 4.

3 Multiple Testing in DFMs

In order to maintain a self-contained presentation, we first briefly summarize some


essential techniques and results discussed in previous literature.
Lemma 2 The spectral density matrix S_X (say) of the observable process X can be
decomposed as

S_X(\omega) = \tilde{\Lambda}(\omega)\, S_f(\omega)\, \tilde{\Lambda}(\omega)' + S_\varepsilon(\omega), \quad -\pi \le \omega \le \pi,   (5)

where \tilde{\Lambda}(\omega) = \sum_{s=-\infty}^{\infty} \Lambda(s) \exp(-i\omega s) and the prime stands for transposition and
conjugation.
Proof The assertion follows immediately by plugging the representation

\Gamma_X(u) = E[X(t) X(t+u)^\top] = \sum_{s=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} \Lambda(s)\, \Gamma_f(u+s-v)\, \Lambda(v)^\top + \Gamma_\varepsilon(u)

for the autocovariance function of X into the formula

S_X(\omega) = (2\pi)^{-1} \sum_{u=-\infty}^{\infty} \Gamma_X(u) \exp(-i\omega u).

The identifiability conditions mentioned in Sect. 1 can be plainly phrased by
postulating that the representation in (5) is unique (up to scaling). All further
methods in this section rely on the assumption of an identified model and on
asymptotic considerations as T → ∞. To this end, we utilize a localization
technique which is due to [22, 23]; see also [19]. We consider a scaled version of
the empirical (finite) Fourier transform of X. Evaluated at harmonic frequencies, it
is given by

\tilde{X}(\omega_j) = (2\pi T)^{-1/2} \sum_{t=1}^{T} X(t) \exp(i t \omega_j), \quad \omega_j = 2\pi j / T, \quad -T/2 < j \le \lfloor T/2 \rfloor.
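In R, these scaled Fourier transforms can be obtained with the columnwise fast Fourier transform, up to the sign convention of the exponent and the ordering of the harmonic frequencies; X is assumed to be a T × p matrix with one row per time point.

```r
# Scaled empirical Fourier transform of a p-variate series; row j of the
# output corresponds to the harmonic frequency 2*pi*(j-1)/T.
fourier_transform <- function(X) {
  Tn <- nrow(X)
  mvfft(X) / sqrt(2 * pi * Tn)
}
```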

For asymptotic inference with respect to T, we impose the following additional
assumptions.

Assumption 1 There exist B disjoint frequency bands Ω₁, …, Ω_B, such that S_X
can be assumed approximately constant and different from zero within each of these
bands. Let ω^(b) ∉ {0, π} denote the center of the band Ω_b, 1 ≤ b ≤ B.
Notice that in [19] an assumption similar to Assumption 1 has been made. As
in [19, 23], we will denote by n_b = n_b(T) a number of harmonic frequencies
(ω_{j,b})_{1≤j≤n_b} of the form 2πj_u/T which are as near as possible to ω^(b), 1 ≤ b ≤ B.
In this, the integers j_u, 1 ≤ u ≤ n_b, in ω_{j,b} = 2πj_u/T are chosen in successive order
In this, the integers ju ; 1  u  nb , in !j;b D 2ju =T are chosen in successive order

of closeness to the center. To derive a weak convergence result for (\tilde{X}(\omega_{j,b}))_j, one of
the following two additional assumptions, which are due to [22, 23], is needed.

Assumption 2 The process X is a generalized linear process of the form

X(t) = \sum_{j=-\infty}^{\infty} A(j)\, \zeta(t-j),   (6)

where the process (\zeta_t)_t is independent and identically distributed (i.i.d.) white noise
and A(j) ∈ R^{p×p} fulfills \sum_j \|A(j)\|^2 < \infty.
Assumption 3 The best linear predictor of X(t) is the best predictor of X(t), both
in the least squares sense, given the past of the process.
Notice that under Assumption 3 we can also represent X as a linear process of
the form

X(t) = \sum_{j=0}^{\infty} A(j)\, e(t-j),   (7)

where A(j) ∈ R^{p×p} and the process (e_t)_t is uncorrelated white noise, see [22]. The
representations of X in (6) and (7) justify the term “white noise factor score model”
(WNFS) which has been used, for instance, in [35].
Throughout the remainder, we denote convergence in distribution by \stackrel{D}{\to}.

Theorem 1 Suppose that Assumption 1 and one of the following two conditions
hold true:
(a) Assumption 2 is fulfilled.
(b) Assumption 3 holds and the A(j) in the representation (7) fulfill

\sum_{j=0}^{\infty} \|A(j)\| < \infty.   (8)

Then we have weak convergence

\bigl((\tilde{X}(\omega_{j,b}))_{1 \le j \le n_b},\, 0_{\mathbb{N}}\bigr) \stackrel{D}{\to} (Z_{j,b})_{j \in \mathbb{N}}, \quad \min(n_b(T), T) \to \infty,   (9)

where the left-hand side of (9) denotes the natural embedding of (\tilde{X}(\omega_{j,b}))_{1 \le j \le n_b}
into (\mathbb{R}^p)^{\mathbb{N}} and (Z_{j,b})_{j \in \mathbb{N}} is a sequence of independent random vectors, each of
which follows a complex normal distribution with mean zero and covariance matrix
S_X(\omega^{(b)}).
Proof Following [3], p. 29 f., it suffices to show convergence of finite-dimensional
margins. Recall that the indices j_u, 1 ≤ u ≤ n_b, are chosen in successive order of
closeness of ω_{j,b} = 2πj_u/T to the center ω^(b). Hence, under Assumptions 1 and 2,
this convergence follows from Theorem 4.13 in [22] together with the continuous
mapping theorem. In the other case, the convergence in (9) is a consequence of
Theorem 3 in [23], again applied together with the continuous mapping theorem.
Remark 2
1. It is well known that (8) entails ergodicity of X.
2. Actually, Theorem 1 holds under slightly weaker conditions; see [23] for details.
Moreover, in [38] the weak convergence of the finite Fourier transform X̃ has
recently been studied under different assumptions.
3. While (7) or (6) may appear structurally simpler than (1), notice that the involved
coefficient matrices A(j) have (potentially much) higher dimensionality than
Λ(s) in (1).
4. In practice, it seems that the bands Ω_b as well as the numbers n_b have to be chosen
adaptively. To avoid frequencies at the boundary of Ω_b, choosing n_b = o(T)
seems appropriate.
Let the parameter vector θ_b contain all d = 2pk + k² + p distinct parameters in
Λ̃(ω^(b)), S_f(ω^(b)) and S_ε(ω^(b)), where each of the (in general) complex elements
in Λ̃(ω^(b)) and S_f(ω^(b)) is represented by a pair of real components in θ_b,
corresponding to its real part and its imaginary part. The full model dimension
is consequently equal to Bd. For convenience and in view of Lemma 2, we write
with slight abuse of notation θ_b = vech(S_X(ω^(b))), and ivech(θ_b) = S_X(ω^(b)). The
above results motivate to study the (local) likelihood function of the parameter θ_b
for a given realization X = x of the process (from which we calculate X̃ = x̃). In
frequency band Ω_b, it is given by

\ell_b(\theta_b; x) = \pi^{-p n_b} |\mathrm{ivech}(\theta_b)|^{-n_b} \exp\Bigl(-\sum_{j=1}^{n_b} \tilde{x}(\omega_{j,b})' \, \mathrm{ivech}(\theta_b)^{-1} \, \tilde{x}(\omega_{j,b})\Bigr);

see [20]. Optimization of the B local (log-)likelihood functions requires to solve a
system of d nonlinear (in the parameters contained in θ_b) equations of the form

2 S_X^{-1}(S_X - \bar{S}) S_X^{-1} \tilde{\Lambda} S_f = 0,
2 \tilde{\Lambda}' S_X^{-1}(S_X - \bar{S}) S_X^{-1} \tilde{\Lambda} = 0,
\mathrm{diag}(S_X^{-1}(S_X - \bar{S}) S_X^{-1}) = 0,

where we dropped the argument ω^(b) in S_X, Λ̃, and S_f, and introduced

\bar{S} = (n_b)^{-1} \sum_{j=1}^{n_b} \tilde{x}(\omega_{j,b}) \tilde{x}(\omega_{j,b})'.

To this end, the algorithm originally developed in [28] for static factor models
can be used (where formally covariance matrices are replaced by spectral density
matrices, cf. [19], and complex numbers are represented by two-dimensional vectors
in each optimization step). The algorithm delivers not only the numerical value of
the MLE θ̂_b, but additionally an estimate V̂_b of the covariance matrix V_b (say) of
√n_b θ̂_b. In view of Theorem 1 and standard results from likelihood theory (cf., e.g.,
Sect. 12.4 in [32]) concerning asymptotic normality of MLEs, it appears reasonable
to assume that

\sqrt{n_b}\,(\hat{\theta}_b - \theta_b) \stackrel{D}{\to} T_b \sim N_d(0, V_b), \quad 1 \le b \le B,   (10)

as min(n_b(T), T) → ∞, where the multivariate normal limit random vectors T_b are
independent for 1 ≤ b ≤ B, and that V̂_b is a consistent estimator of V_b, which
we will assume throughout the remainder. This, in connection with the fact that the
vectors θ̂_b, 1 ≤ b ≤ B, are asymptotically jointly uncorrelated with each other, is
very helpful for testing linear (point) hypotheses. Such hypotheses are of the form
H: Cθ = ξ with a contrast matrix C ∈ R^{r×Bd}, ξ ∈ R^r and θ consisting of all
elements of all the vectors θ_b. In [19] the usage of Wald statistics has been proposed
in this context. The Wald statistic for testing H is given by

W = N (C\hat{\theta} - \xi)^\top (C \hat{V} C^\top)^{+} (C\hat{\theta} - \xi),   (11)

where N = Σ_{b=1}^{B} n_b, V̂ is the block matrix built up from the band-specific matrices
N V̂_b / n_b, 1 ≤ b ≤ B, and A^+ denotes the Moore–Penrose pseudo inverse of a
matrix A.

Theorem 2 Under the above assumptions, W is asymptotically χ²-distributed with
rank(C) degrees of freedom under the null hypothesis H, provided that V is positive
definite and N/n_b ≤ K < ∞ for all 1 ≤ b ≤ B.
Proof The assertion follows from basic central limit theorems for quadratic forms;
see, for example, Theorem 9.2.2 in [40] or Formula (2.6) in [37].
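A direct R sketch of (11), assuming the stacked MLE vector theta_hat, the block covariance estimate V_hat, the contrast matrix C, the vector xi, and N are available; the names are illustrative, and the Moore-Penrose inverse is taken from the MASS package.

```r
library(MASS)
# Wald statistic of Eq. (11) for testing H: C theta = xi.
wald_stat <- function(theta_hat, V_hat, C, xi, N) {
  d <- C %*% theta_hat - xi
  as.numeric(N * t(d) %*% ginv(C %*% V_hat %*% t(C)) %*% d)
}
# Asymptotic p-value under H (Theorem 2): 1 - pchisq(W, df = qr(C)$rank)
```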
In the remainder of this section, we return to the two exemplary simultaneous
statistical inference problems outlined in Problems 1 and 2 and demonstrate that
they can be formalized by families of linear hypotheses regarding (components of)
# which in turn can be tested employing the statistical framework that we have
considered in Sect. 2.
Lemma 3 (Problem 1 Revisited) In the notational framework of Sect. 2, we have
m = p, I = {1, …, p} and for all i ∈ I we can consider the linear hypothesis
H_i: C_Dunnett s_{ε_i} = 0. The contrast matrix C_Dunnett is the "multiple comparisons with
a control" contrast matrix with B − 1 rows and B columns, where in each row j
the first entry equals +1, the (j + 1)th entry equals −1, and all other entries are
equal to zero. The vector s_{ε_i} ∈ R^B consists of the values of the spectral density
matrix S_ε corresponding to the ith noise component, evaluated at the B centers
(ω^(b) : 1 ≤ b ≤ B) of the chosen frequency bins. Denoting the subvector of θ̂
that corresponds to s_{ε_i} by ŝ_{ε_i}, the ith Wald statistic is given by

W_i = (C_{\mathrm{Dunnett}}\, \hat{s}_{\varepsilon_i})^\top \bigl[ C_{\mathrm{Dunnett}}\, \hat{V}_{\varepsilon_i}\, C_{\mathrm{Dunnett}}^\top \bigr]^{+} (C_{\mathrm{Dunnett}}\, \hat{s}_{\varepsilon_i}),

where V̂_{ε_i} = diag(σ̂²_{ε_i}(ω^(b)) : 1 ≤ b ≤ B). Then, under H_i, W_i asymptotically
follows a χ²-distribution with B − 1 degrees of freedom if the corresponding
limit matrix V_{ε_i} is assumed to be positive definite. Considering the vector W =
(W₁, …, W_p)^⊤ of all p Wald statistics corresponding to the p specific factors in the
model, we finally have W ∼ χ²(p, (B − 1, …, B − 1)^⊤, R) asymptotically under the intersection
H₀ of the p hypotheses H₁, …, H_p, with some correlation matrix R. This allows to
employ the multiple tests considered in Sect. 2 for solving Problem 1.
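The contrast matrix C_Dunnett described above can be set up in one line of R; dunnett_contrast is an illustrative helper name.

```r
# B-1 rows and B columns: +1 in the first column, -1 on the shifted diagonal.
dunnett_contrast <- function(B) cbind(1, -diag(B - 1))
dunnett_contrast(4)
```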
Lemma 4 (Problem 2 Revisited) As done in [19], we formalize the hypothesis that
common factor j has a purely instantaneous effect on X_i, 1 ≤ j ≤ k, 1 ≤ i ≤ p, in
the spectral domain by

H_{ij}: |\tilde{\Lambda}_{ij}|^2 is constant across the B frequency bands.

In an analogous manner to the derivations in Lemma 3, the contrast matrix C_Dunnett
can be used as the basis to construct a Wald statistic W_{ij}. The vector W = (W_{ij} :
1 ≤ i ≤ p, 1 ≤ j ≤ k) then asymptotically follows a multivariate chi-square
distribution with B − 1 degrees of freedom in each marginal under the corresponding
null hypotheses and we can proceed as mentioned in Lemma 3.
Many other problems of practical relevance can be formalized analogously by
making use of linear contrasts and thus, our framework applies to them, too.
Furthermore, the hypotheses of interest may also refer to different subsets of
{1, …, B}. In such a case, the marginal degrees of freedom for the test statistics
are not balanced, as considered in the general Definition 2 and in our simulations
reported in Sect. 2.2.

4 Finite-Sample Bootstrap Approximation

It is well known that the convergence of Wald-type statistics to their asymptotic
χ²-distribution is rather slow, see [31, 37] and references therein. To address this
problem and to make use of the actual dependency structure of W in the multiple
test procedure, we propose a model-based bootstrap approximation of the finite-
sample distribution of W in (11), given by the following algorithm (an R sketch of
the scheme is given after the algorithm).
1. Given the data X = x, calculate in each band Ω_b the quantities θ̂_b and V̂_b.
2. For all 1 ≤ b ≤ B, generate (pseudo) random numbers which behave like
realizations of independent random vectors Z*_{1,b}, …, Z*_{n_b,b} i.i.d. ∼ N_d(θ̂_b, V̂_b).
3. For all 1 ≤ b ≤ B, calculate θ̂*_b = n_b^{-1} Σ_{j=1}^{n_b} Z*_{j,b} and
V̂*_b = n_b^{-1} Σ_{j=1}^{n_b} (Z*_{j,b} − θ̂*_b)(Z*_{j,b} − θ̂*_b)^⊤.
4. Calculate W* = N(θ̂* − θ̂)^⊤ C^⊤ (C V̂* C^⊤)^+ C(θ̂* − θ̂), where θ̂* and V̂* are
constructed in analogy to θ̂ and V̂.
5. Repeat steps 2–4 M times to obtain M pseudo replicates of W* and approximate
the distribution of W by the empirical distribution of these pseudo replicates.
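A compact R sketch of this resampling scheme, assuming lists theta_hat and V_hat of band-wise MLEs and covariance estimates, the band sizes n_b, and the contrast matrix C; all names are illustrative, and the scaling of the block matrix V̂* follows the definition of V̂ given below (11).

```r
library(MASS)      # ginv
library(mvtnorm)   # rmvnorm
boot_wald <- function(theta_hat, V_hat, n_b, C, M = 999) {
  B <- length(theta_hat)
  N <- sum(n_b)
  theta <- unlist(theta_hat)
  d_tot <- length(theta)
  replicate(M, {
    theta_star <- numeric(d_tot)
    V_star <- matrix(0, d_tot, d_tot)
    off <- 0
    for (b in seq_len(B)) {
      idx <- off + seq_along(theta_hat[[b]])
      Z <- rmvnorm(n_b[b], mean = theta_hat[[b]], sigma = V_hat[[b]])  # step 2
      m_b <- colMeans(Z)                                               # step 3
      theta_star[idx] <- m_b
      V_star[idx, idx] <- N * crossprod(sweep(Z, 2, m_b)) / n_b[b]^2   # N * V*_b / n_b
      off <- off + length(theta_hat[[b]])
    }
    d <- C %*% (theta_star - theta)
    as.numeric(N * t(d) %*% ginv(C %*% V_star %*% t(C)) %*% d)         # step 4
  })                                                                   # step 5
}
```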
The heuristic justification for this algorithm is as follows. Due to Theorem 1 and
the discussion around (10), it is appropriate to approximate the distribution of the
MLE in band Ω_b by means of Z*_{1,b}, …, Z*_{n_b,b}. Moreover, to capture the structure
of W, we build the MLEs θ̂* and V̂* of the mean and the covariance matrix,
respectively, also in this resampling model. Furthermore, for finite sample sizes it
seems more suitable to approximate the distribution of the quadratic form W by a
statistic of the same structure. Throughout the remainder, we denote convergence in
probability by \stackrel{p}{\to}.
Theorem 3 Under the assumptions of Theorem 2, it holds that

\sup_{w \in \mathbb{R}} \bigl| \mathrm{Prob}(W^* \le w \mid X) - \mathrm{Prob}(W \le w \mid H) \bigr| \stackrel{p}{\to} 0,   (12)

where Prob(W* ≤ · | X) denotes the conditional cumulative distribution function
(cdf) of W* given X and Prob(W ≤ · | H) the cdf of W under H: Cθ = ξ.
Proof Throughout, ϱ_k stands for a distance that metrizes weak convergence on
R^k, k ∈ N, for example the Prohorov distance. Moreover, for a random variable
T we denote by L(T) and L(T|X) the distribution and conditional distribution of T
given X, respectively. Note that we have by assumption convergences in probability
of the conditional mean and variance of Z*_{1,b}, i.e.,

E(Z*_{1,b} | X) = θ̂_b →_p θ_b  and  Var(Z*_{1,b} | X) = V̂_b →_p V_b.

Moreover, for each fixed 1 ≤ b ≤ B and fixed data X, the sequence of random
vectors (Z*_{j,b})_j is row-wise i.i.d. with lim sup E(‖Z*_{1,b}‖⁴ | X) < ∞ almost surely.
Hence an application of Lyapunov's multivariate central limit theorem together with
Slutzky's theorem implies conditional convergence in distribution given the data X
in the sense that

ϱ_d( L(√n_b (θ̂*_b − θ̂_b) | X), L(T_b) ) →_p 0

for all 1 ≤ b ≤ B, where L(T_b) = N_d(0, V_b). Note that, as usual for resampling
mechanisms, the weak convergence originates from the randomness of the bootstrap
procedure given X, whereas the convergence in probability arises from the sample
X. We can now proceed similarly to the proof of Theorem 1 in [37]. Since the
random vectors √n_b (θ̂*_b − θ̂_b) are also independent within 1 ≤ b ≤ B given the data,
the appearing multivariate normal limit vectors T_b, 1 ≤ b ≤ B, are independent as
well. Together with the continuous mapping theorem this shows that the conditional
distribution of √N C(θ̂* − θ̂) given X converges weakly to a multivariate normal
distribution with mean zero and covariance matrix CVC^⊤ in probability:

ϱ_r( L(√N C(θ̂* − θ̂) | X), N_r(0, CVC^⊤) ) →_p 0.

Furthermore, the weak law of large numbers for triangular arrays implies V̂*_b −
V̂_b →_p 0. Since all V_b, 1 ≤ b ≤ B, are positive definite, we finally have det(V̂_b) > 0
almost surely and therefore also det(V̂*_b) > 0 finally almost surely. This, together
with the continuous mapping theorem, implies convergence in probability of the
Moore–Penrose inverses, i.e.,

(C V̂* C^⊤)^+ →_p (C V C^⊤)^+.

Thus another application of the continuous mapping theorem together with Theo-
rem 9.2.2 in [40] shows conditional weak convergence of W* given X to L(W|H),
the distribution of W under H: Cθ = ξ, in probability, i.e.,

ϱ_1( L(W* | X), L(W|H) ) →_p 0.

The final result is then a consequence of the Helly–Bray theorem and Polya's uniform
convergence theorem, since the cdf of W is continuous.
Remark 3
1. Notice that the conditional distribution of W* always approximates the null
distribution of W, even if H does not hold true.
2. In view of applications to multiple test problems involving a vector W =
(W₁, …, W_m)^⊤ as in Problem 1 (m = p) and Problem 2 (m = pk), our
resampling approach can be applied as follows. The vector W can be written
as a continuous function g (say) of Cθ̂ − ξ and V̂. Note that the proof of
Theorem 3 shows that C(θ̂* − θ̂) always approximates the distribution of
C(θ̂ − θ) and V̂* − V̂ converges to zero in probability. Thus, we can approximate
the distribution of W = g(Cθ̂ − ξ, V̂) under H₀ by W* = g(C(θ̂* − θ̂), V̂*).
Slutzky's Theorem, together with the continuous mapping theorem, ensures that
an analogous result to Theorem 3 applies for W*. This immediately implies
that multiple test procedures for weak FWER control can be calibrated by the
conditional distribution of W*. For strong control of the FWER and for FDR
control, the resampling approach is valid under the so-called subset pivotality
condition (SPC) introduced in [55]. Validity of the SPC heavily relies on the
structure of the function g. For Problems 1 and 2, the SPC is fulfilled, because
every W_i depends on mutually different coordinates of θ̂.

5 Concluding Remarks and Outlook

First of all, we would like to mention that the multiple testing results with
respect to FWER control achieved in Sects. 2 and 3 also imply (approximate)
simultaneous confidence regions for the parameters of model (1) by virtue of the
extended correspondence theorem, see Section 4.1 of [12]. In such cases (in which
focus is on FWER control), a promising alternative method for constructing a
multiple test procedure is to deduce the limiting joint distribution of the vector
(Q₁, …, Q_m)^⊤ (say) of likelihood ratio statistics. For instance, one may follow the
derivations in [30] for the case of likelihood ratio statistics stemming from models
with independent and identically distributed observations. Once this limiting joint
distribution is obtained, simultaneous test procedures like the ones developed in [26]
are applicable.
Second, it may be interesting to assess the variance of the FDP in DFMs, too. For
example in [4, 13] it has been shown that this variance can be large in models with
dependent test statistics. Consequently, it has been questioned if it is appropriate
only to control the first moment of the FDP, because this does not imply a type I
error control guarantee for the actual experiment at hand. A maybe more convincing
concept in such cases is given by control of the false discovery exceedance, see [11]
for a good survey.
A topic relevant for economic applications is a numerical comparison of the
asymptotic multiple tests discussed in Sect. 2 and the bootstrap-based method
derived in Sect. 4. We will provide such a comparison in a companion paper.
Furthermore, one may ask to which extent the results in the present paper can be
transferred to more complicated models where factor loadings are modeled as a
function of covariates like in [36]. To this end, stochastic process techniques way
beyond the scope of our setup are required. A first step may be the consideration
of parametric models in which conditioning on the design matrix will lead to our
framework.
Finally, another relevant multiple test problem in DFMs is to test for cross-
sectional correlations between specific factors. While the respective test problems
can be formalized by linear contrasts in analogy to Lemmas 3 and 4, they cannot
straightforwardly be addressed under our likelihood-based framework, because the
computation of the MLE by means of the system of normal equations discussed in
Sect. 3 heavily relies on the general assumption of cross-sectionally uncorrelated
error terms. Addressing this multiple test problem is therefore devoted to future
research.

Acknowledgements The authors are grateful to Prof. Manfred Deistler for valuable comments
regarding Problem 1. Special thanks are due to the organizers of the International work-conference
on Time Series (ITISE 2015) for the successful meeting.
SSI in DFMs 43

References

1. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful
approach to multiple testing. J. R. Stat. Soc. Ser. B 57(1), 289–300 (1995)
2. Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under
dependency. Ann. Stat. 29(4), 1165–1188 (2001)
3. Billingsley, P.: Convergence of Probability Measures. Wiley, New York, London, Sydney,
Toronto (1968)
4. Blanchard, G., Dickhaus, T., Roquain, E., Villers, F.: On least favorable configurations for
step-up-down tests. Stat. Sin. 24(1), 1–23 (2014)
5. Breitung, J., Eickmeier, S.: Dynamic factor models. Discussion Paper Series 1: Economic
Studies 38/2005. Deutsche Bundesbank (2005)
6. Chiba, M.: Likelihood-based specification tests for dynamic factor models. J. Jpn. Stat. Soc.
43(2), 91–125 (2013)
7. Dickhaus, T.: Simultaneous statistical inference in dynamic factor models. SFB 649 Discussion
Paper 2012-033, Sonderforschungsbereich 649, Humboldt Universität zu Berlin, Germany
(2012). Available at http://sfb649.wiwi.hu-berlin.de/papers/pdf/SFB649DP2012-033.pdf
8. Dickhaus, T.: Simultaneous Statistical Inference with Applications in the Life Sciences.
Springer, Berlin, Heidelberg (2014)
9. Dickhaus, T., Royen, T.: A survey on multivariate chi-square distributions and their applica-
tions in testing multiple hypotheses. Statistics 49(2), 427–454 (2015)
10. Dickhaus, T., Stange, J.: Multiple point hypothesis test problems and effective numbers of tests
for control of the family-wise error rate. Calcutta Stat. Assoc. Bull. 65(257–260), 123–144
(2013)
11. Farcomeni, A.: Generalized augmentation to control false discovery exceedance in multiple
testing. Scand. J. Stat. 36(3), 501–517 (2009)
12. Finner, H.: Testing Multiple Hypotheses: General Theory, Specific Problems, and Relation-
ships to Other Multiple Decision Procedures. Habilitationsschrift. Fachbereich IV, Universität
Trier, Germany (1994)
13. Finner, H., Dickhaus, T., Roters, M.: Dependency and false discovery rate: asymptotics. Ann.
Stat. 35(4), 1432–1455 (2007)
14. Finner, H., Dickhaus, T., Roters, M.: On the false discovery rate and an asymptotically optimal
rejection curve. Ann. Stat. 37(2), 596–618 (2009)
15. Finner, H., Gontscharuk, V., Dickhaus, T.: False discovery rate control of step-up-down tests
with special emphasis on the asymptotically optimal rejection curve. Scand. J. Stat. 39(2),
382–397 (2012)
16. Fiorentini, G., Sentana, E.: Dynamic specification tests for dynamic factor models. CEMFI
Working Paper No. 1306, Center for Monetary and Financial Studies (CEMFI), Madrid (2013)
17. Forni, M., Hallin, M., Lippi, M., Reichlin, L.: The generalized dynamic-factor model:
identification and estimation. Rev. Econ. Stat. 82(4), 540–554 (2000)
18. Forni, M., Hallin, M., Lippi, M., Reichlin, L.: Opening the black box: structural factor models
with large cross sections. Econ. Theory 25, 1319–1347 (2009)
19. Geweke, J.F., Singleton, K.J.: Maximum likelihood “confirmatory” factor analysis of economic
time series. Int. Econ. Rev. 22, 37–54 (1981)
20. Goodman, N.: Statistical analysis based on a certain multivariate complex Gaussian distribu-
tion. An introduction. Ann. Math. Stat. 34, 152–177 (1963)
21. Hallin, M., Lippi, M.: Factor models in high-dimensional time series - a time-domain approach.
Stoch. Process. Appl. 123(7), 2678–2695 (2013)
22. Hannan, E.J.: Multiple Time Series. Wiley, New York, London, Sydney (1970)
23. Hannan, E.J.: Central limit theorems for time series regression. Z. Wahrscheinlichkeitstheor.
Verw. Geb. 26, 157–170 (1973)
24. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. Theory Appl.
6, 65–70 (1979)
44 T. Dickhaus and M. Pauly

25. Hommel, G.: A stagewise rejective multiple test procedure based on a modified Bonferroni
test. Biometrika 75(2), 383–386 (1988)
26. Hothorn, T., Bretz, F., Westfall, P.: Simultaneous inference in general parametric models. Biom.
J. 50(3), 346–363 (2008)
27. Jensen, D.R.: A generalization of the multivariate Rayleigh distribution. Sankhyā, Ser. A 32(2),
193–208 (1970)
28. Jöreskog, K.G.: A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika 34(2), 183–202 (1969)
29. Karlin, S., Rinott, Y.: Classes of orderings of measures and related correlation inequalities. I.
Multivariate totally positive distributions. J. Multivar. Anal. 10, 467–498 (1980)
30. Katayama, N.: Portmanteau likelihood ratio tests for model selection. Discussion Paper Series
2008–1. Faculty of Economics, Kyushu University (2008)
31. Konietschke, F., Bathke, A.C., Harrar, S.W., Pauly, M.: Parametric and nonparametric bootstrap
methods for general MANOVA. J. Multivar. Anal. 140, 291–301 (2015)
32. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer Texts in Statistics, 3rd
edn. Springer, New York (2005)
33. Marcus, R., Peritz, E., Gabriel, K.: On closed testing procedures with special reference to
ordered analysis of variance. Biometrika 63, 655–660 (1976)
34. Maurer, W., Mellein, B.: On new multiple tests based on independent p-values and the
assessment of their power. In: Bauer, P., Hommel, G., Sonnemann, E. (eds.) Multiple
Hypothesenprüfung - Multiple Hypotheses Testing. Symposium Gerolstein 1987, pp. 48–66.
Springer, Berlin (1988). Medizinische Informatik und Statistik 70
35. Nesselroade, J.R., McArdle, J.J., Aggen, S.H., Meyers, J.M.: Dynamic factor analysis models
for representing process in multivariate time-series. In: Moskowitz, D.S., Hershberger, S.L.
(eds.) Modeling Intraindividual Variability with Repeated Measures Data: Methods and
Applications, Chap. 9. Lawrence Erlbaum Associates, New Jersey (2009)
36. Park, B.U., Mammen, E., Härdle, W., Borak, S.: Time series modelling with semiparametric
factor dynamics. J. Am. Stat. Assoc. 104(485), 284–298 (2009)
37. Pauly, M., Brunner, E., Konietschke, F.: Asymptotic permutation tests in general factorial
designs. J. R. Stat. Soc., Ser. B. 77(2), 461–473 (2015)
38. Peligrad, M., Wu, W.B.: Central limit theorem for Fourier transforms of stationary processes.
Ann. Probab. 38(5), 2009–2022 (2010)
39. Peña, D., Box, G.E.: Identifying a simplifying structure in time series. J. Am. Stat. Assoc. 82,
836–843 (1987)
40. Rao, C.R., Mitra, S.K.: Generalized Inverse of Matrices and Its Applications. Wiley, New York,
London, Sydney (1971)
41. Robinson, P.: Automatic frequency domain inference on semiparametric and nonparametric
models. Econometrica 59(5), 1329–1363 (1991)
42. Romano, J.P., Wolf, M.: Exact and approximate stepdown methods for multiple hypothesis
testing. J. Am. Stat. Assoc. 100(469), 94–108 (2005)
43. Romano, J.P., Wolf, M.: Stepwise multiple testing as formalized data snooping. Econometrica
73(4), 1237–1282 (2005)
44. Romano, J.P., Shaikh, A.M., Wolf, M.: Control of the false discovery rate under dependence
using the bootstrap and subsampling. TEST 17(3), 417–442 (2008)
45. Sarkar, S.K.: Some probability inequalities for ordered MTP2 random variables: a proof of the
Simes conjecture. Ann. Stat. 26(2), 494–504 (1998)
46. Sarkar, S.K.: Some results on false discovery rate in stepwise multiple testing procedures. Ann.
Stat. 30(1), 239–257 (2002)
47. Sarkar, S.K., Chang, C.K.: The Simes method for multiple hypothesis testing with positively
dependent test statistics. J. Am. Stat. Assoc. 92(440), 1601–1608 (1997)
48. Simes, R.: An improved Bonferroni procedure for multiple tests of significance. Biometrika
73, 751–754 (1986)
SSI in DFMs 45

49. Sonnemann, E.: General solutions to multiple testing problems. Translation of “Sonnemann, E.
(1982). Allgemeine Lösungen multipler Testprobleme. EDV in Medizin und Biologie 13(4),
120–128”. Biom. J. 50, 641–656 (2008)
50. Tamhane, A.C., Liu, W., Dunnett, C.W.: A generalized step-up-down multiple test procedure.
Can. J. Stat. 26(2), 353–363 (1998)
51. Timm, N.H.: Applied Multivariate Analysis. Springer, New York (2002)
52. Troendle, J.F.: A stepwise resampling method of multiple hypothesis testing. J. Am. Stat.
Assoc. 90(429), 370–378 (1995)
53. Troendle, J.F.: Stepwise normal theory multiple test procedures controlling the false discovery
rate. J. Stat. Plann. Inference 84(1–2), 139–158 (2000)
54. van Bömmel, A., Song, S., Majer, P., Mohr, P.N.C., Heekeren, H.R., Härdle, W.K.: Risk
patterns and correlated brain activities. Multidimensional statistical analysis of fMRI data in
economic decision making study. Psychometrika 79(3), 489–514 (2014)
55. Westfall, P.H., Young, S.S.: Resampling-Based Multiple Testing: Examples and Methods
for p-Value Adjustment. Wiley Series in Probability and Mathematical Statistics. Applied
Probability and Statistics. Wiley, New York (1993)
56. Yekutieli, D., Benjamini, Y.: Resampling-based false discovery rate controlling multiple test
procedures for correlated test statistics. J. Stat. Plann. Inference 82(1–2), 171–196 (1999)
The Relationship Between the Beveridge–Nelson
Decomposition and Exponential Smoothing

Víctor Gómez

Abstract In this chapter, two parallel decompositions of an ARIMA model are


presented. Both of them are based on a partial fraction expansion of the model
and can incorporate complex seasonal patterns. The first one coincides with the
well-known Beveridge–Nelson decomposition. The other constitutes an innovations
form of the Beveridge–Nelson decomposition and coincides with many of the usual
additive exponential smoothing models.

Keywords ARIMA models • Beveridge–Nelson decomposition • Exponential


smoothing • Innovations form

1 Introduction

In this chapter, two partial fraction expansions of an ARIMA model are described.
They are based on what is known in electrical engineering as parallel decom-
positions of rational transfer functions of digital filters. The first decomposition
coincides with the one proposed by Beveridge and Nelson [2], henceforth BN, that
has attracted considerable attention in the applied macroeconomics literature, or
is a generalization of it to seasonal models. The second one corresponds to the
innovations form of the BN decomposition.
The two decompositions are analyzed using both state space and polynomial
methods. It is shown that most of the usual additive exponential smoothing models
are in fact BN decompositions of ARIMA models expressed in innovations form.
This fact seems to have passed unnoticed in the literature, although the link between
single source of error (SSOE) state space models and exponential smoothing has
been recognized [6] and used for some time (see, for example, [3]). It is also shown
that these SSOE models are in fact innovations state space models corresponding to
the BN decomposition that defines the model.
The remainder of the chapter is organized as follows. In Sect. 2, the two parallel
decompositions of an ARIMA model are presented. These two decompositions

V. Gómez ()
Ministry of Finance and P.A., Dirección Gral. de Presupuestos, Subdirección Gral. de Análisis y
P.E., Alberto Alcocer 2, 1-P, D-34, 28046 Madrid, Spain
e-mail: [email protected]
© Springer International Publishing Switzerland 2016 47
I. Rojas, H. Pomares (eds.), Time Series Analysis and Forecasting, Contributions
to Statistics, DOI 10.1007/978-3-319-28725-6_4
48 V. Gómez

are analyzed using polynomial and state space methods. The connection with
exponential smoothing is studied.

2 Two Parallel Decompositions of an ARIMA Model

In this section, we will consider what is known in digital filtering as parallel


decompositions of rational transfer functions of digital filters [7, pp. 390–395].
These decompositions are based on partial fraction expansions of the transfer
functions. Since a time series following an ARIMA model can be considered as the
result of applying a rational filter to a white noise sequence, parallel decompositions
can also be useful in time series analysis.
Suppose a time series fyt g, t D 1; : : : ; N, that follows a multiplicative seasonal
ARIMA model, for example,

.B/˚.Bn /.r d rnD yt  / D .B/.Bn /at ; (1)

where is the mean of the differenced series, B is the backshift operator, Byt =
yt1 , n is the number of seasons, d = 0; 1; 2, D = 0; 1, r = 1  B is a regular
difference, and rn = 1  Bn is a seasonal difference. Our aim in this section is
to decompose model (1) using two partial fraction expansions. This can be done
using both polynomial and state space methods. However, it is to be emphasized
that the developments of this and the following section are valid for any kind of
ARIMA model, multiplicative seasonal or not. This means that we may consider
general models with complex patterns of seasonality, like for example
" #
Y
N
2 ni 1
.B/ r d
.1 C B C B C    C B /yt  D .B/at ; (2)
iD1

where n1 ; : : : ; nN denote the seasonal periods and the polynomials .z/ and .z/
have all their roots outside the unit circle but are otherwise unrestricted, or even
models with seasonal patterns with non-integer periods.
The following lemma, which we give without proof, is an immediate conse-
quence of the partial fraction expansion studied in algebra and will be useful later.
See also Lemma 1 of [4, p. 528] and the results contained therein.
Lemma 1 Let the general ARIMA model r d .B/yt = .B/at , where the roots of
.z/ are simple and on or outside of the unit circle and .z/ and .z/ have degrees
p and q, respectively. Then, the following partial fraction decomposition holds

.z/ B1 Bd Xp
Ak
D C0 CC1 zC  CCqpd zqpd
C C  C C :
.1  z/ .z/
d 1z .1  z/ kD1 1  pk z
d

(3)
Relationship Between the BN Decomposition and Exponential Smoothing 49

If pk is complex, then Ak is complex as well and the conjugate fraction Ak = .1  pk z/


also appears on the right-hand side. The two terms can be combined into the real
fraction
   
Ak C Ak  Ak pk C Ak pk z
:
1  . pk C pk / z C pk pk z2

After joining complex conjugate fractions, we can express (3) as

.z/ X Bk d X 1
Ak
m
D C0 C C1 z C    C Cqpd zqpd C C
.z/ kD1
.1  z/ k
kD1
1  pk z

X
m2
Dk C E k z
C ;
kD1
1 C Fk z C Gk z2

where the coefficients Ck , Bk , Ak , Dk , Ek , Fk , Gk , and pk are all real.

2.1 Polynomial Methods

It is shown in [4] that a partial fraction expansion of model (1) leads to the BN
decomposition in all cases usually considered in the literature and that, therefore, we
can take this expansion as the basis to define the BN decomposition for any ARIMA
model. To further justify this approach, consider that in the usual BN decomposition,
yt = pt C ct , models for the components are obtained that are driven by the same
innovations of the series. Thus, if the model for the series is .B/yt = .B/at , the
models for the components are of the form p .B/pt = p at and c .B/ct = c at . But
this implies the decomposition

.z/ p .z/ c .z/


D C ;
.z/ p .z/ c .z/

and, since the denominator polynomials on the right-hand side have no roots in com-
mon, the previous decomposition coincides with the partial fraction decomposition
that is unique.
Assuming then that the parallel decomposition of the ARIMA model (1) is the
basis of the BN decomposition, suppose in (1) that p and P are the degrees of the
autoregressive polynomials, .B/ and ˚.Bn /, and q and Q are those of the moving
average polynomials, .B/ and .Bn /. Then, letting  .B/ = .B/˚.Bn /, .B/ =
r d rnD , and   .B/ = .B/.Bn /, supposing for simplicity that there is no mean in (1)
and using Lemma 1, the partial fraction expansion corresponding to model (1) is

  .z/ ˛p .z/ ˛s .z/ ˛c .z/


 .z/.z/
D .z/ C C C  ; (4)
.1  z/ dCD S.z/ .z/
50 V. Gómez

where S.z/ = 1CzC  Czn1 and we have used in (4) the fact that rn = .1B/S.B/.
Here, we have grouped for simplicity several terms in the expansion so that we are
only left with the components in (4). For example,

X
dCD
Bk ˛p .z/
D ;
kD1
.1  z/k .1  z/dCD

etc. Note that the third term on the right of (4) exists only if D > 0. The degrees
of the .z/, ˛p .z/, ˛s .z/, and ˛c .z/ polynomials in (4) are, respectively, maxf0; q 
p  d  nDg, d  1, n  2, and p  1, where p =p C P, q = q C Q, and d =
d C D.
Based on the previous decomposition, we can define several components that
are driven by the same innovations, fat g. The assignment of the terms in (4) to the
different components depends on the roots of the autoregressive polynomials in (1).

For example, the factor .1  z/d , containing the root one, should be assigned to the
trend component, pt , since it corresponds to an infinite peak in the pseudospectrum
of the series at the zero frequency. Since all the roots of the polynomial S.z/
correspond to infinite peaks in the pseudospectrum at the seasonal frequencies, the
factor S.z/ should be assigned to the seasonal component, st .
The situation is not so clear-cut; however, as regards the roots of the autoregres-
sive polynomial, .z/˚.zn /, and in this case the assignment is more subjective. We
will consider for simplicity in the rest of the chapter only a third component, which
will be referred to as “stationary component,” ct . All the roots of .z/˚.zn / will
be assigned to this stationary component. Therefore, this component may include
cyclical and stationary trend and seasonal components.
According to the aforementioned considerations, the SSOE components model

yt D pt C st C ct (5)

can be defined, where pt is the trend, st is the seasonal, and ct is the stationary
component. The models for these components are given by
 
r d pt D ˛p .B/at ; S.B/st D ˛s .B/at ; .B/ct D .B/at ; (6)

where .z/ = .z/  .z/ C ˛c .z/.


Instead of expressing model (1) using the backshift operator, where the time runs
backwards, it is possible to use the forward operator, Fyt = ytC1 , and let the time
run forward. To this end, let m = maxfq; p C D g and r = maxf0; q  p  D g,
where q , p , and D are the degrees of the polynomials   .z/, and  .z/, and .z/
in (4). Then, using again Lemma 1, but with the z1 instead of the z variable, it is
obtained that

zm   .z/ ı.z1 / ˇp .z1 / ˇs .z1 / ˇc .z1 /


D 1 C C  C C  1 ; (7)
zm  .z/.z/ zr .z1  1/d S.z1 / .z /
Relationship Between the BN Decomposition and Exponential Smoothing 51

 
where .z1 / = zp  .z/ and the degrees of the polynomials ı.z1 /, ˇp .z1 /,
ˇs .z1 /, and ˇc .z1 / are, respectively, maxf0; r  1g, d  1, n  2, and p  1.
Transforming each of the terms of the right-hand side of (7) back to the z variable
yields

  .z/ zˇp .z/ zˇs .z/ zˇc .z/


 .z/.z/
D 1 C zı.z/ C  C C  : (8)
.1  z/ d S.z/ .z/

This decomposition is an innovations form of the ARIMA model because if we


multiply both terms of (8) by the innovation, at , we get the equality

yt D at C ytjt1
D at C ptjt1 C stjt1 C ctjt1 ; (9)

where, given a random variable xt , xtjt1 denotes the orthogonal projection of xt onto
fys W s D t  1; t  2; : : :g. If the series is nonstationary, the orthogonal projection
is done onto the finite past of the series plus the initial conditions. The relationship
among the components pt , st , and ct and their predictors, ptjt1 , stjt1 , and ctjt1 ,
can be obtained by computing the decomposition of each component in the forward
operator. For example, if we take the model followed by the trend component given

in (6), r d pt = ˛p .B/at , we can write

zd ˛p .z/ ˇp .z1 /
  D k p C ;
zd .1  z/d .z1  1/d

where kp is a constant. Then, returning to the backward operator,

˛p .z/ zˇp .z/


D kp C (10)
.1  z/d .1  z/d

and multiplying both terms of (10) by the innovation, at , yields

pt D kp at C ptjt1 : (11)

Therefore, ptjt1 follows the model r d ptjt1 D ˇp .B/at1 : In a similar way, we can
prove that there exist constants, k , ks , and kc such that

˛s .z/ zˇs .z/ ˛c .z/ zˇc .z/


.z/ D k C zı.z/; D ks C ;  .z/
D kc C  ;
S.z/ S.z/ .z/
 
st D ks at C stjt1 ; ct D k C kc at C ctjt1 ;
52 V. Gómez

and stjt1 and ctjt1 follow the models



 

S.B/stjt1 D ˇs .B/at1 ; .B/ctjt1 D .B/ı.B/ C ˇc .B/ at1 :

Note that the previous relations imply the equality

1 D k C kp C ks C kc : (12)

An example will help clarify matters. Suppose the ARIMA model

1
r4 yt D 1  B5 at : (13)
2

Then, the BN decomposition is given by the partial fraction decomposition of the


model, that is

1  12 z5 1 1 1 3 1 1 1  12 z
D z C C C :
1  z4 2 81z 81Cz 2 1 C z2

Thus, defining

1 1 1 3 1  12 B 1 1
pt D at ; s1;t D at ; s2;t D at ; ct D at1 ; (14)
1B8 1CB8 1 C B2 2 2

and st = s1;t Cs2;t , the BN decomposition, yt = pt Cst Cct , is obtained. The innovations
form is given by the partial fraction decomposition of the model using the forward
operator, that is

z5  12 1 1 1 1 3 1 1 z1 C 2
D 1 C C  
z1 .z4  1/ 2 z1 8 z1  1 8 z1 C 1 4 z2 C 1
1 1 z 3 z 1 z C 2z2
D 1C zC   : (15)
2 8 1  z 8 1 C z 4 1 C z2
It follows from this that the innovations form is yt = at C ptjt1 C s1;tjt1 C s2;tjt1 C
ctjt1 , where

1 1 1 3
ptjt1 D at1 ; s1;tjt1 D  at1 ;
1B8 1CB8
1 C 2B 1 1
s2;tjt1 D at1 ; ctjt1 D at1 :
1 C B2 4 2
In addition, the following relations hold
1 3 1
pt D at C ptjt1 ; s1;t D at C s1;tjt1 ; s2;t D at C s2;tjt1 ; ct D ctjt1 :
8 8 2
Relationship Between the BN Decomposition and Exponential Smoothing 53

2.2 State Space Methods

There are many ways to put an ARIMA model into state space form. We will use
in this chapter the one proposed by Akaike [1]. If fyt g follows the ARIMA model
.B/yt = .B/at , where .z/ = 1 C 1 z C    CP p zp and .z/ = 0 C 1 z C    C q zq ,
let r = max. p; q C 1/, .z/ = 1 .z/.z/ = 1 jD0 j z and xt;1 = yt , xt;i = ytCi1 
j
Pi2
jD0 j atCi1j , 2  i  r. Then, the following state space representation holds

xt D Fxt1 C Kf at (16)
yt D Hxt ; (17)

where
2 3 2 3
0 1 0  0 0
6 0 0 1  0 7 6 1 7
6 7 6 7
6 :: :: :: : : :: 7 6 7
FD6 : : : : : 7 ; Kf D 6 ::: 7 ; (18)
6 7 6 7
4 0 0 0  1 5 4 r2 5
 r  r1  r2     1 r1

i D 0 if i > p, xt = Œxt;1 ; : : : ; xt;r 0 and H = Œ1; 0; : : : ; 0. Note that we are assuming
that 0 can be different from one, something that happens with the models for the
components in the BN decomposition. The representation (16)–(18) is not minimal
if q > p, but has the advantage that the first element of the state vector is yt (the other
elements of the state vector are the one to r  1 periods ahead forecasts of yt ). This
is particularly useful if yt is not observed, so that this representation is adequate
to put the BN decomposition into state space form. To see this, suppose that the
BN decomposition is yt = pt C st C ct , where fyt g follows the model (1) and the
components follow the models given by (6). Then, we can set up for each component
a state space representation of the form (16)–(18) so that, with an obvious notation,
we get the following representation for fyt g
2 3 2 32 32 3
xp;t Fp 0 0 xp;t1 Kf ;p
4 xs;t 5 D 4 0 Fs 0 5 4 xs;t1 5 4 Kf ;s 5 at (19)
xc;t 0 0 Fc xc;t1 Kf ;c
2 3
  xp;t
yt D Hp Hs Hc 4 xs;t 5 ; (20)
xc;t

where pt = Hp xp;t , st = Hs xs;t , and ct = Hc xc;t . Letting xt = Œx0p;t ; x0s;t ; x0c;t 0 , F =


diag.Fp ; Fs ; Fc /, Kf = ŒKf0;p ; Kf0;s ; Kf0;c 0 , and H = ŒHp , Hs , Hc , we can assume that
the state space representation of fyt g is given by (16) and (17).
54 V. Gómez

To obtain the innovations state space model corresponding to (16) and (17),
where xt = Œx0p;t ; x0s;t , x0c;t 0 , F = diag.Fp ; Fs ; Fc /, Kf = ŒKf0;p ; Kf0;s ; Kf0;c 0 and H = ŒHp ,
Hs , Hc  satisfy (19) and (20), consider first that in terms of the matrices in (16)
and (17) the transfer function, .z/, of model (1) can be expressed as

.z/ D H .I  Fz/1 Kf
D 1 C zH .I  Fz/1 FKf (21)

and thus the following relation holds

HKf D 1 (22)

This relation is the state space equivalent to the polynomial relation (12). We also
get from (21) that

yt D at C H .I  FB/1 FKf at1 ;

where B is the backshift operator. Then, if we define

K D FKf ; (23)

and

xtC1jt D .I  FB/1 FKf at ;

we obtain the following state space representation

xtC1jt D Fxtjt1 C Kat (24)


yt D Hxtjt1 C at : (25)

Note that xtjt1 is the projection of xt onto fyt1 ; yt2 ; : : : ; y1 ; x1 g because


h i
xt D .I  FB/1 Kf at D Kf C z .I  FB/1 FKf at (26)

D xtjt1 C Kf at :

In fact, (26) is the measurement update formula corresponding to (24). Therefore,


ytjt1 = Hxtjt1 and Eqs. (24) and (25) constitute an innovations state space repre-
sentation for yt = pt C st C ct such that yt = ptjt1 C stjt1 C ctjt1 C at .
If the ARIMA model followed by yt is invertible, so is its transfer function. In
this case, by the matrix inversion lemma applied to (21), it is obtained that
 1
.z/1 D 1  zH I  Fp z K;
Relationship Between the BN Decomposition and Exponential Smoothing 55

where Fp D F  KH has all its eigenvalues inside the unit circle. In fact, it can
be shown that the eigenvalues of Fp coincide with the inverses of the roots of the
moving average polynomial of the model, .z/, see, for example, [5, pp. 97–98].
As an example, we will use again model (13). According to the models (14), the
state space form (19) and (20) is
2 3 2 32 3 2 3
pt 1 0 0 0 00 pt1 1=8
6 7 6 0 1 0 00 07 6 7 6 7
6 s1;t 7 6 7 6 s1;t1 7 6 3=8 7
6 7 6 76 7 6 7
6 s2;t 7 60 0 0 1 0 0 7 6 s2;t1 7 6 1=2 7
6 7 D6 76 7C6 7 at (27)
6 s2;tC1jt 7 60 0 1 00 0 7 6 s2;tjt1 7 6 1=4 7
6 7 6 76 7 6 7
4 ct 5 40 0 0 0 0 1 5 4 ct1 5 4 0 5
ctC1jt 0 0 0 0 00 ctjt1 1=2
2 3
pt
6 s 7
6 1;t 7
 6
6 s 7
7
yt D 1 1 1 0 1 0 6 2;t 7 (28)
6 s2;tC1jt 7
6 7
4 ct 5
ctC1jt

Using (23), the innovations state space form is


2 3 2 32 3 2 3
ptC1jt 1 0 0 000 ptjt1 1=8
6s 7 6 0 1 0 0 0 07 6 7 6 7
6 1;tC1jt 7 6 7 6 s1;tjt1 7 6 3=8 7
6 7 6 76 7 6 7
6 s2;tC1jt 7 6 0 0 0 1 0 0 7 6 s2;tjt1 7 6 1=4 7
6 7D6 76 7C6 7 at (29)
6 s2;tC2jt 7 6 0 0 1 0 0 0 7 6 s2;tC1jt1 7 6 1=2 7
6 7 6 76 7 6 7
4 ctC1jt 5 4 0 0 0 0 0 1 5 4 ctjt1 5 4 1=2 5
ctC2jt 0 0 0 000 ctC1jt1 0
2 3
ptjt1
6 s 7
6 1;tjt1 7
 6
6 s 7
7
yt D 1 1 1 0 1 0 6 2;tjt1 7 C at : (30)
6 s2;tC1jt1 7
6 7
4 ctjt1 5
ctC1jt1

Note that the relation HKf = 1 holds and that the matrix Fp = F  KH has all
its eigenvalues inside the unit circle. Note also that the last row in the transition
equation is zero and that, therefore, the last state can be eliminated from the state
space form. In this way, we would obtain a minimal innovations state space form.
Suppose we are given an innovations state space representation (24) and (25),
minimal or not, where xtjt1 = Œx0p;tjt1 ; x0s;tjt1 ; x0c;tjt1 0 , F = diag.Fp , Fs , Fc /, K =
ŒKp0 ; Ks0 ; Kc0 0 and H = ŒHp , Hs , Hc , and we want to obtain the BN decomposition,
56 V. Gómez

yt = pt C st C ct , in state space form. We assume that Fp and Fs are nonsingular


and Fc may be singular or empty. Of course, if Fc is empty, so are xc;tjt1 , Kc and
Hc . Note that Fp and Fs are the matrices containing the unit and the seasonal roots,
respectively, and that if Fc is singular or empty, then .z/ in (4) is nonzero. To
obtain the BN decomposition we distinguish two cases, depending on whether Fc is
singular or empty or nonsingular. If Fc is nonsingular, then we solve for Kf in

FKf D K

to get (16) and (17), where pt = Hp xp;t , st = Hs xs;t , and ct = Hc xc;t . If Fc is singular,
then defining

ct D ctjt1 C kc at ;

where ctjt1 = Hc xc;tjt1 and kc is a constant, we can write

ct 0 Hc ct1 k
D C c at :
xc;tC1jt 0 Fc xc;tjt1 Kc

If Fc is empty, then the previous expressions collapse to ct = kc at . In the following,


we will only consider the case in which Fc is singular, leaving to the reader the
necessary changes if Fc is empty. Thus, if we further define xac;t = Œct ; x0c;tC1jt 0 , Hca
= Œ1; 0, Kca = Œ0; Kc0 0 , Fca = Œ0; Cc , where Cc = ŒHc0 ; Fc0 0 and Hca , Kca and Fca are
conformal with xac;t , we can write
2 3 2 32 3 2 3
xp;tC1jt Fp 0 0 xp;tjt1 Kp
6 7 4 5 6 7 4 5
4 s;tC1jt 5 D 0 Fs 0 4 s;tjt1 5 C Ks at
x x
xac;tC1jt 0 0 Fca xac;tjt1 Kca
2 3
xp;tjt1
 6 7
yt D Hp Hs Hca 4 xs;tjt1 5 C at : (31)
xac;tjt1

Solving for Kfa in

F a Kfa D K a ; H a Kfa D 1;

0
where F a = diag.Fp , Fs , Fca /, K a = ŒKp0 ; Ks0 , Kca 0 , H a = ŒHp , Hs , Hca , Kfa = ŒKf0;p ; Kf0;s ,
0
Kfa;c 0 and Kfa;c = Œkc ; Kf0;c 0 , we get (16), (17), where pt = Hp xp;t , st = Hs xs;t , xc;t = xac;t ,
and ct = Hca xc;t . Note that kc = 1  Hp Kf ;p  Hs Kf ;s and that (16), (17) is not minimal
in this case.
Relationship Between the BN Decomposition and Exponential Smoothing 57

As an example, the reader can verify that if we start with (29) and (30), we
eliminate the last state in those equations, and we follow the previous procedure,
then we get (27) and (28).

2.3 Connection with Exponential Smoothing

There has been lately some interest in using generalized exponential smoothing
models for forecasting, see [3]. These models are SSOE models that once they
are put into state space form they become innovations model of the type we have
considered in earlier sections. The question then arises as to whether these models
have any connection with the models given by a parallel decomposition of an
ARIMA model. It turns out that many of the basic exponential smoothing models
coincide with those corresponding to the BN decomposition and in those cases,
mostly seasonal, where they do not coincide, the exponential smoothing models
have been shown to have some kind of problem that is solved if the models given by
the parallel decomposition are used instead. To see this, suppose first Holt’s linear
model, yt = pt1 C bt1 C at , where

pt D pt1 C bt1 C k1 at (32)


bt D bt1 C k2 at ; (33)

and k1 and k2 are constants. If we substitute (32) into the expression for yt , it is
obtained that yt = pt C .1  k1 / at . In addition, it follows from (32) and (33) that

r 2 pt D k2 at1 C k1 rat :

Therefore, we can write

˛p .B/
yt D C kc a t ; (34)
.1  B/2

where ˛p .z/ = k2 z C k1 .1  z/ and kc = 1  k1 . Since the partial fraction expansion


of the polynomial in the backshift operator on the right-hand side of (34) is

k2 z C k1 .1  z/ k1  k2 k2
2
C kc D C C kc ;
.1  z/ 1z .1  z/2

if we define ct = kc at , then yt = pt Cct , ptjt1 = pt1 Cbt1 , ctjt1 = 0, and yt = ptjt1 C


at . Thus, it is seen that Holt’s linear model is the innovations form corresponding to
the BN decomposition of an ARIMA model,

r 2 yt D .1 C 1 B C 2 B2 /at ; (35)
58 V. Gómez

where 1 = k1 C k2  2 and 2 = 1  k1 . Note that, since k1 and k2 can univocally be


solved in terms of 1 and 2 in the previous expressions, every ARIMA model (35)
can be put in the form of a Holt’s model, yt = ptjt1 C at , where ptjt1 = pt1 C bt1
and pt and bt are given by (32) and (33).
Suppose now Holt–Winters’ model, yt = pt1 C bt1 C stn C at , where

pt D pt1 C bt1 C k1 at (36)


bt D bt1 C k2 at ; (37)
st D stn C k3 at ; (38)

and k1 , k2 , and k3 are constants. There are apparently three unit roots in the model.
However, a closer look will reveal that there are in fact only two unit roots. To
see this, substitute (36) and (38) into the expression for yt to give yt = pt C st C
.1  k1  k3 / at . In addition, it follows from (36)–(38) that

r 2 pt D k2 at1 C k1 rat ; rn st D k3 at :

Then, we can write

˛p .B/ ˛s .B/
yt D 2
C C kc a t ; (39)
.1  B/ .1  B/S.B/

where S.z/ = 1CzC  Czn1 , ˛p .z/ = k2 zCk1 .1z/, ˛s .z/ = k3 , and kc = 1k1 k3 .
The partial fraction expansion of the polynomial in the backshift operator on the
right-hand side of (39) is

k2 z C k1 .1  z/ k3 k1  k2 k2 k3 =n ˇ.z/
2
C C kc D C 2
C C C kc ;
.1  z/ .1  z/S.z/ 1z .1  z/ 1z S.z/
 
where ˇ.z/ = .n  1/ C .n  2/z C    C 2zn3 C zn2 k3 =n. Thus, if we define ct
= kc at , then yt = pt C st C ct , ptjt1 = pt1 C bt1 , stjt1 = stn , ctjt1 = 0, and
yt = ptjt1 C stjt1 C at . Therefore, Holt–Winters’ model is the innovations form
corresponding to the BN decomposition of an ARIMA model of the form

r 2 S.B/yt D .B/at ; (40)

where .z/ is a polynomial of degree n C 1. However, the components are not well
defined because the seasonal component can be further decomposed as

k3 =n ˇ.B/
st D C at ;
1B S.B/

and we see that the first subcomponent should be assigned to the trend because
the denominator has a unit root. To remedy this problem, the seasonal component
Relationship Between the BN Decomposition and Exponential Smoothing 59

should be defined as the second subcomponent only, so that Holt–Winters’ method


should be modified to a model of the form yt = ptjt1 C stjt1 C at , where pt and bt
are given by (36), (37),

X
n1
st D  sti C ˇ.B/at ;
iD1

Pn1
ptjt1 = pt1 C bt1 and stjt1 =  iD1 sti . The model can be simplified if we
assume ˇ.B/ = k3 . Another possibility is to decompose the seasonal component
further according to the partial fraction expansion of its model,

Œn=2
ˇ.z/ X ki;1 C ki;2 z
D (41)
S.z/ iD1
1  2˛i z C z2

where Œx denotes the greatest integer less than or equal to x, ˛i = cos !i and !i =
2i=n is the ith seasonal frequency. If n is even, !n=2 = 2Œn=2=n =  and the
corresponding term in the sum on the right-hand side of (41) collapses to kn=2 =
.1 C z/. This would lead us to a seasonal component of the form

Œn=2
X
st D si;t (42)
iD1

.1  2˛i B C B2 /si;t D .ki;1 C ki;2 B/at ; (43)

where we can assume ki;1 = k1 and ki;2 = k2 for parsimony. It is a consequence of the
partial fraction decomposition that there is a bijection between ARIMA models of
the form (40) and exponential smoothing models, yt = ptjt1 C stjt1 C at , where pt
and st are given by (36), (37), (42), and (43). A solution similar to (42) and (43) has
been suggested by De Livera et al. [3], where they propose for each component, sit ,
the model

si;t cos !i sin !i si;t1 r


D C 1 at : (44)
si;t  sin !i cos !i 
si;t1 r2

It can be shown that both solutions are in fact equivalent. However, in [3,
pp. 1516, 1520] the expression yt = pt1 C bt1 C st1 C at is used. This implies
si;tjt1 = si;t1 in model (44), something that is incorrect. The correct expression
can be obtained using the method described in Sect. 2.2 and, more specifically,
formula (23).
60 V. Gómez

References

1. Akaike, H.: Markovian representation of stochastic processes and its application to the analysis
of autoregressive moving average processes. Ann. Inst. Stat. Math. 26, 363–387 (1974)
2. Beveridge, S., Nelson, C.R.: A new approach to decomposition of economic time series into
permanent and transitory components with particular attention to measurement of the ‘business
cycle’. J. Monet. Econ. 7, 151–174 (1981)
3. De Livera, A.M., Hyndman, R.J., Snyder, R.D.: Forecasting time series with complex seasonal
patterns using exponential smoothing. J. Am. Stat. Assoc. 106, 1513–1527 (2011)
4. Gómez, V., Breitung, J.: The Beveridge-Nelson decomposition: a different perspective with new
results. J. Time Ser. Anal. 20, 527–535 (1999)
5. Hannan, E.J., Deistler, M.: The Statistical Theory of Linear Systems. Wiley, New York (1988)
6. Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S.: A state space framework for automatic
forecasting using exponential smoothing methods. Int. J. Forecast. 18(3), 439–454 (2002)
7. Oppenheim, A.V., Schafer, R.W.: Discrete-Time Signal Processing, 3rd edn. Prentice Hall,
New Jersey (2010)
Permutation Entropy and Order Patterns
in Long Time Series

Christoph Bandt

Abstract While ordinal techniques are commonplace in statistics, they have been
introduced to time series fairly recently by Hallin and coauthors. Permutation
entropy, an average of frequencies of order patterns, was suggested by Bandt and
Pompe in 2002 and used by many authors as a complexity measure in physics,
medicine, engineering, and economy. Here a modified version is introduced, the
“distance to white noise.”
For datasets with tens of thousands or even millions of values, which are
becoming standard in many fields, it is possible to study order patterns separately,
determine certain differences of their frequencies, and define corresponding auto-
correlation type functions. In contrast to classical autocorrelation, these functions
are invariant with respect to nonlinear monotonic transformations of the data.
For order three patterns, a variance-analytic “Pythagoras formula” combines the
different autocorrelation functions with our new version of permutation entropy.
We demonstrate the use of such correlation type functions in sliding window
analysis of biomedical and environmental data.

Keywords Autocorrelation • Ordinal time series • Permutation entropy

1 Introduction

We live in the era of Big Data. To produce time series, we now have cheap electronic
sensors which can measure fast, several thousand times per second, for weeks or
even years, without getting exhausted. A sensor evaluating light intensity, together
with a source periodically emitting light, can measure blood circulation and oxygen
saturation if it is fixed at your fingertip. In nature it can measure air pollution by
particulates, or water levels and wave heights, and many other quantities of interest.
Classical time series, obtained for instance from quarterly reports of companies,
monthly unemployment figures, or daily statistics of accidents, consist of 20 up to
a few thousand values. Sensor data are more comprehensive. A song of 3 min on

C. Bandt ()
Institute of Mathematics, University of Greifswald, Greifswald, Germany
e-mail: [email protected]

© Springer International Publishing Switzerland 2016 61


I. Rojas, H. Pomares (eds.), Time Series Analysis and Forecasting, Contributions
to Statistics, DOI 10.1007/978-3-319-28725-6_5
62 C. Bandt

CD comprises 16 million values. On the other hand, quarterly economic reports are
usually prepared with great scrutiny, while a mass of machine-generated data may
contain errors, outliers, and missing values. And there is also a difference in scale.
The intensity which a sensor determines is usually not proportional to the effect
which it intends to measure. There is a monotonic correspondence between light
intensity and oxygen-saturated blood, which is unlikely to be linear, and careful
calibration of the equipment is necessary if measurements on metric scale are
required. Raw sensor data usually come on ordinal scale. It is this ordinal aspect,
introduced to time series by Hallin and coauthors [8, 12], which we shall address.
We present autocorrelation type functions which remain unchanged when values are
transformed by a nonlinear monotone map like f .x/ D log x:
Permutation entropy, introduced in 2002 as a complexity measure [5], has been
applied to optical experiments [3, 18, 22], brain data [9, 16, 17, 19], river flow data
[14], control of rotating machines [15, 24], and other problems. For recent surveys,
see [2, 25]. A typical application is the automatic classification of sleep stages from
EEG data, a change point detection problem illustrated in Fig. 1b. We use a new
version of permutation entropy defined in Sect. 6.
Permutation entropy is just the Shannon entropy of the distribution of order
patterns:
X
HD p log p :


Fig. 1 Biomedical data of healthy person n3 from [20], taken from the CAP sleep database at
physionet [11]. (a) Ten seconds sample from EEG channel Fp2-F4, ECG, and plethysmogram.
(b) Expert annotation of sleep depth from [20] agrees with our version 2 of permutation entropy
almost completely. (c) Function Q̌ of the plethysmogram gives an overview of 9 h of high-frequency
circulation data. See Sects. 7 and 8
Order Patterns 63

Order patterns p will be defined below. There are two parameters. One parameter
is n; the length of the order pattern—the number of values to be compared with
each other. There are nŠ order patterns of length n: It could be proved that for many
dynamical systems, the limit of Hnn is the famous Kolmogorov–Sinai entropy [1, 23].
This provides a theoretical justification of permutation entropy. For real-world time
series, however, n > 10 is not meaningful because of the fast growth of possible
patterns. We recommend using n for which nŠ is smaller than the length of the series,
even though the averaging effect of the entropy formula allows to work with larger n:
The other parameter is d; the delay between neighboring equidistant time points
which are to be compared. For d D 1 we consider patterns of n consecutive points.
For d D 2 we compare xt with xtC2 ; xtC4 ; : : : It turns out that for big data series,
we have a lot of choices for d: Actually, d can be considered as delay parameter
in the same way as for autocorrelation. Moreover, one can consider single pattern
frequencies p .d/ instead of their average H; as done in [14, 17]. It turns out
that certain differences of pattern frequencies provide interpretable functions with
better properties than the frequencies themselves [6]. Figure 1c illustrates how
such functions can be used to survey large data series, like one night of high-
frequency circulation data. The purpose of the present note is to demonstrate that
such autocorrelation type functions form an appropriate tool for the study of big
ordinal data series. We focus on patterns of length 3 which seem most appropriate
from a practical viewpoint.

2 Ups and Downs

We consider a time series x D .x1 ; : : : ; xT /: Two consecutive values xt ; xtC1 in a


time series can represent two order patterns: up, xt < xtC1 ; or down, xt > xtC1 : We
shall neglect equality xt D xtC1 : That is, we assume that values xt are measured on
a continuous scale with sufficient precision, so that equality between two values is
very rare. If ties exist, they are not included in our calculation, and this will work
well if the number of ties is small, say not more than 3 % of all pairs of values.
The important point here is that we consider not only consecutive values but also
pairs xt ; xtCd in some distance d: Now we consider the difference of the percentage
of ups and downs as a function of d; which is similar to autocorrelation.
Here comes the definition. For any delay d between 1 and T  1, let n12 .d/ and
n21 .d/ denote the number of time points t for which xt < xtCd and xt > xtCd ; ups and
downs, respectively. We determine the relative frequencies of increase and decrease
over d steps. Let

n12 .d/
ˇ.d/ D p12 .d/  p21 .d/ with p12 .d/ D and p21 .d/ D 1  p12 .d/
n12 .d/ C n21 .d/

for d D 1; 2; : : : Ties are disregarded. ˇ.d/ is called up–down balance and is a kind
of autocorrelation function. It reflects the dependence structure of the underlying
process and has nothing to do with the size of the data.
64 C. Bandt

Fig. 2 Example time series for calculation of frequencies p .d/: For d D 2; one pattern  D 123
is indicated

For the short time series of Fig. 2, we get ˇ.1/ D 0; ˇ.2/ D  15 ; ˇ.3/ D
 24 ; ˇ.4/ D  13 ; ˇ.5/ D 1, and ˇ.6/ D 1 which could be drawn as a function.
To get reliable estimates we need of course longer time series, and we shall always
take d  T=2: Let us briefly discuss the statistical accuracy of ˇ.d/:
If the time series comes from Brownian motion (cf. [6]), there are no ties and
n12 .d/ follows a binomial distribution with p D 12 and n D T  d: The radius of the
95 % confidence interval for p12 .d/ then equals p1n : The error of the estimate will be
larger for more correlated time series, depending on the strength of correlation. For
our applications, we are on the safe side with a factor 2. Since ˇ.d/ D 2p12 .d/  1
4
this gives an error of ˙ pTd : Thus to estimate ˇ.d/ for small d with accuracy
˙0:01 we could need T D 160;000 values. Fortunately, the values of jˇ.d/j in our
applications are often greater than 0.1, and T D 2000 is enough to obtain significant
estimates. Nevertheless, ˇ is definitely a parameter for large time series.
One could think that usually ˇ.d/ D 0 and significant deviations from zero are
exceptional. This is not true. We found that ˇ can describe and classify objects
in different contexts. The data for Fig. 3 are water levels from the database of the
National Ocean Service [21]. Since tides depend on moon and sun, we studied
time series for the full month of September in 18 consecutive years. For intervals
of 6 min, 1 month gives T 7000: We have two tides per day, and the basic
frequency of 25 h can be observed in the functions ˇ.d/: What is more important,
ˇ.d/ characterizes sites, and this has nothing to do with amplitudes, which vary
tremendously between sites, but do not influence ˇ at all.
Since water tends to come fast and disappear more slowly, we could expect ˇ.d/
to be negative, at least for small d: This is mostly true for water levels from lakes,
like Milwaukee in Fig. 2. For sea stations, tidal influence makes the data almost
periodic, with a period slightly smaller than 25 h. The data have 6 min intervals and
we measure d in hours, writing d D 250 as d D 25 h: A strictly periodic time
series with period L fulfils ˇ.L  d/ D ˇ.d/ which in the case L D 25 implies
ˇ.12:5/ D 0; visible at all sea stations in Fig. 2. Otherwise there are big differences:
at Honolulu and Baltimore the water level is more likely to fall within the next few
hours, at San Francisco it is more likely to increase, and at Anchorage there is a
Order Patterns 65

Fig. 3 Water levels at 6 min intervals from [21]. Original data shown for Sept 1–2, 2014. Functions
ˇ are given for September of 18 years 1997–2014, and also for January in case of Anchorage and
San Francisco. d runs from 6 min to 27 h (270 values)

change at 6 h. Each station has its specific ˇ-profile, almost unchanged during 18
years, which characterizes its coastal shape. ˇ can also change with the season, but
these differences are smaller than those between stations.
Thus Fig. 3 indicates that ˇ; as well as related functions below, can solve basic
problems of statistics: describe, distinguish, and classify objects.

3 Patterns of Length 3

Three equidistant values xt ; xtCd ; xtC2d without ties can realize six order patterns.
213 denotes the case xtCd < xt < xtC2d :
66 C. Bandt

For each pattern  and d D 1; 2; : : : ; T=3 we count the number n .d/ of appear-
ances in the same way as n12 .d/: In case  D 312 we count all t D 1; : : : ; T  2d
with xtCd < xtC2d < xt : Let S be the sum of the six numbers. Patterns with ties are
not counted. Next, we compute the relative frequencies p .d/ D n .d/=S: In Fig. 2
we have twice 132 and 312 and once 321 for d D 1; and once 123, 231, 321 for
d D 2: This gives the estimates p321 .1/ D 0:2 and p321 .2/ D 0:33: Accuracy of
estimates p .d/ is similar to accuracy of p12 .d/ discussed above. For white noise it
is known from theory that all p .d/ are 16 [5].
As autocorrelation type functions, we now define certain sums and differences of
the p : The function

1
.d/ D p123 .d/ C p321 .d/ 
3
is called persistence [6]. This function indicates the probability that the sign of
xtCd  xt persists when we go d time steps ahead. The largest possible value of .d/
is 23 ; assumed for monotone time series. The minimal value is  13 : The constant 13
was chosen so that white noise has persistence zero. The letter  indicates that this
is one way to transfer Kendall’s tau to an autocorrelation function. Another version
was studied in [8].
It should be mentioned that ˇ can be calculated also by order patterns of length
3, with a negligible boundary error (see [4]):

ˇ.d/ D p123 .d/  p321 .d/

Beside  and ˇ we define two other autocorrelation type functions. For convenience,
we drop the argument d:

 D p213 C p231  p132  p312

is a measure of time irreversibility of the process, and

ı D p132 C p213  p231  p312

describes up–down scaling since it approximately fulfils ı.d/ D ˇ.2d/  ˇ.d/ [4].
Like ˇ; these functions measure certain symmetry breaks in the distribution of the
time series.
Figure 4 shows how these functions behave for the water data of Fig. 3 and for
much more noisy hourly measurements of particulate matter which also contain a
lot of ties and missing data. Like ties, missing values are just disregarded in the
calculation. Although there is more variation, it can be seen that curves for 11
successive years are quite similar. Autocorrelation  was added for comparison,
the function 2 is defined below.
All four ordinal functions remain unchanged when a nonlinear monotone
transformation is applied to the data. They are not influenced by low frequency
Order Patterns 67

Fig. 4 Autocorrelation  and ordinal functions 2 ; ; ˇ; ; ı (a) for the almost periodic series of
water levels at Los Angeles (September 1997–2014, data from [21], cf. Fig. 3), (b) for the noisy
series of hourly particulate values at nearby San Bernardino 2000–2011 from [7], with weak daily
rhythm. The functions on the left are about three times larger, for 2 nine times, but fluctuations
are of the same size
68 C. Bandt

components with wavelength much larger than d which often appear as artifacts
in data. Since ; ˇ; ; and ı are defined by assertions like XtCd  Xt > 0; they
do not require full stationarity of the underlying process. Stationary increments
suffice—Brownian motion for instance has .d/ D  16 for all d [6]. Finally, the
ordinal functions are not influenced much by a few outliers. One wrong value, no
matter how large, can change .d/ only by ˙2=S while it can completely spoil
autocorrelation.

4 Persistence Versus Autocorrelation

Persistence very much resembles the classical autocorrelation function : A period


of length L in a signal is indicated by minima of  at d D L2 ; 3L 5L
2 ; 2 ; : : : For
these d we have xt xtC2d so that the patterns 123 and 321 are rare. Near to
d D L; 2L; 3L; : : : the function  is large, as : For a noisy sequence, however, a
local minimum appears directly at d D L; 2L; : : : ; since patterns tend to have equal
probability there. The larger the noise, the deeper the bump. In Fig. 4b,  at d D 12
and d D 24 shows exactly this appearance and proves the existence of a 24 h rhythm
better than :
Here is a theoretical example. We take the AR2 process Xt D 1:85Xt1 
0:96Xt2 C Wt with Gaussian noise Wt which has oscillating and not too rapidly
decreasing autocorrelation (Fig. 5). To show the fluctuations, we evaluated ten
simulated time series with 2000 values which were then modified by various kinds
of perturbations. In the original data,  and  show the same minima, and  has

Fig. 5 Autocorrelation and persistence for an AR2 process with various perturbations. In each
case ten samples of 2000 points were processed, to determine statistical variation. On the left,
300 points of the respective time series are sketched. (a) Original signal. (b) Additive Gaussian
white noise. (c) One percent large outliers added. (d) Low-frequency function added. (e) Monotone
transformation applied to the data
Order Patterns 69

clear maxima while the maxima of  have a bump. Additive Gaussian white noise
with signal-to-noise ratio 1 increases this effect: Autocorrelation is diminished by
a factor of 2, and persistence becomes very flat. In practice, there are also other
disturbances. In Fig. 5c we took 1 % outliers with average amplitude 20 times of
the original signal. In d we added the low-frequency function sin t=300: In e the
time series was transformed by the nonlinear monotone transformation y D ex=7
which does not change  at all. In the presence of outliers, nonlinear scale, and
low-frequency correlated noise, persistence can behave better than autocorrelation.

5 Sliding Windows

We said that for an autocorrelation type function like ˇ or  we need a time series
of length T 103 : There are many databases, however, which contain series of
length 10 up to 107 : For example, the German Weather Service [10] offers hourly
5

temperature values from many stations for more than 40 years. To treat such data,
we divide the series into subseries of length 103 up to 104 ; so-called windows. We
determine  or ˇ for each window which gives a matrix. Each column represents a
window, each row corresponds to a value of d: This matrix is drawn as a color-coded
image. This is the technique known from a spectrogram. To increase resolution,
overlapping windows can be used. The size of windows can be taken even smaller
than 103 if only a global impression is needed.
As example, we consider temperature values of the soil (depth 5 cm) at the
author’s town. There is a daily cycle which becomes weak in winter. Persistence
with windows of 1 month length (only 720 values) shows the daily cycle and yearly
cycle. Measurements started in 1978 but Fig. 6 shows that during the first years (up

Fig. 6 Hourly temperature of earth in Greifswald, Germany. Data from German Weather Service
[10]. (a) The last 2000 values of 2013. The daily cycle is weak in winter. (b) Persistence for
d D 1; : : : ; 50 h shows irregularities in the data
70 C. Bandt

to 1991) temperature was taken every 3 h—the basic period is d D 8: Between 1991
and 2000, measurements were taken only three times a day. For screening of large
data series, this technique is very useful.

6 Permutation Entropy and Distance to White Noise


P
The permutation entropy H. p/ D   p log p was discussed in the introduc-
tion. Here p is the vector of all nŠ patterns p of length n: Used as a measure of
complexity and disorder, H can be calculated for time series of less than thousand
values since statistical inaccuracies of the p are smoothed out by averaging. As
a measure of disorder, H assumes its maximum H. p / D log nŠ for white noise
p where every p D nŠ1 : The difference D D log nŠ  H is called divergence or
Kullback–Leibler distance to the uniform distribution p D nŠ1 of white noise.
In this note all functions, including autocorrelation, measure the distance of the
data from white noise. For this reason, we take divergence rather than entropy, and
we replace p log p by p2 : As before, we drop the argument d: The function
X X
2 D . p  1 2

/ D p2  1

 

where the sum runs over all patterns  of length n; will Pbe called the distance of
1
the data from white noise. For n D 3 we have 2 D 2 2
 p  6 : Thus,  is
the squared Euclidean distance between the observed order pattern distribution and
the order pattern distribution of white noise. Considering white noise as complete
disorder, 2 measures the amount of rule and order in the data. The minimal value
0 is obtained for white noise, and the maximum 1  nŠ1 for a monotone time series.
Remark The Taylor expansion of H. p/ near white noise p is

nŠ 2 .nŠ/2 X 1
H. p/ D log nŠ   C . p   /3 :::
2 6  nŠ

Since H is a sum of one-dimensional terms, this is fairly easy to check. For f .q/ D
q log q we have f 0 .q/ D 1  log q , f 00 .q/ D q1 ; and f 000 .q/ D q2 : We insert
q D nŠ1 for all coordinates  to get derivatives at white noise, and see that the linear
term of the Taylor expansion vanishes.
Thus for signals near to white noise, 2 is just a rescaling of H: For patterns of
length n D 3 we have

H log 6  32 :
Order Patterns 71

7 Partition of the Distance to White Noise

A Pythagoras type formula combines 2 with the ordinal functions:

42 D 3 2 C 2ˇ 2 C  2 C ı 2 :

This holds for each d D 1; 2; : : : The equation is exact for random processes with
stationary increments as well as for cyclic time series. The latter means that we
calculate p .d/ from the series .x1 ; x2 ; : : : ; xT ; x1 ; x2 ; : : : ; x2d / where t runs from 1
to T: For real data we go only to T  2d and have a boundary effect which causes
the equation to be only approximately fulfilled. The difference is negligible. For the
proof and checks with various data, see [4].
This partition is related to orthogonal contrasts in the analysis of variance. When
2 .d/ is significantly different from zero, we can define new functions of d:

3 2 ˇ2 2 ı2
Q D ; ˇQ D ; Q D ; ıQ D :
42 22 42 42

By taking squares, we lose the sign of the values, but we gain a natural scale. ; Q Q
Q ˇ;
Q
and ı lie between 0 and 1 D 100 %; and they sum up to 1. For each d; they describe
the percentage of order in the data which is due to the corresponding difference of
patterns.
For Gaussian and elliptical symmetric processes, the functions ˇ; ; and ı are
all zero, and Q is 1, for every d [6]. Thus the map of Q shows to which extent the
data come from a Gaussian process and where the Gaussian symmetry is broken.
This is a visual test for elliptical symmetry. It does not provide p-values but it is a
strong tool in the sense of data mining, with 300,000 results contained in the pixels
of Fig. 7.
In data with characteristic periods, for example heartbeat, respiration, and
speech, we rarely find Q > 80 %: In Fig. 7 this is only true for d near the full or the
half heartbeat period which varies around 1 s during the night. Q is small throughout,
but ˇ is large for small d and around the full period while ıQ is large around half of
the period. This is not a Gaussian process.

8 Biomedical Data

As an example, we studied data from the CAP sleep database by Terzano et al.
[20] available at physionet [11]. For 9 h of ECG data, measured with 512 Hz, Fig. 7
shows τ̃, β̃, γ̃, and δ̃ for 540 1-min non-overlapping sliding windows as graycode
on vertical lines. The delay d runs from 1 up to 600, which corresponds to 1.2 s. The
last row shows the difference in the above equation on a smaller scale, so this figure
is a practical check of the equation for more than 300,000 instances. It can be seen


Fig. 7 Partition of Δ² for ECG data of healthy subject n3 from the CAP sleep data of Terzano et al. [20] at physionet [11]. From above, τ̃, β̃, γ̃, and δ̃ are shown as functions of time (1-min windows) and delay (0.002 up to 1.2 s). The last row shows the error in the partition of Δ² on a much smaller scale, indicating artifacts

that differences larger than 1 % are rare—in fact they mainly include the times with movement artifacts.
When Δ²(d) does not differ significantly from zero, it makes no sense to divide by Δ². As a rule of thumb, we should not consider quantities like τ̃ when Δ²(d) < 15/n, where n is the window length. In Fig. 7 this would exclude the windows with movement artifacts which cause black lines in the last row.
Sleep stages S1–S4 and R for REM sleep were annotated in Terzano et al. [20]
by experts, mainly using the EEG channel Fp2-F4 and the oculogram. Figure 1
demonstrates that the permutation-entropy-related quantity Δ² of that EEG channel, averaged over d = 2, …, 20, gives an almost identical estimate of sleep depth. Permutation entropy
was already recommended as indicator of sleep stages, see [13, 16, 25], and our
calculations gave an almost magic coincidence for various patients and different
EEG channels. REM phases are difficult to detect with EEG alone. Usually, eye
movements are evaluated. Figure 1 indicates that information on dream phases is contained in β̃ of the plethysmogram, which is measured with an optical sensor at
the fingertip. With certainty we can see here, and in Fig. 7, all interruptions of sleep
and a lot of breakpoints which coincide with changes detected by the annotator.
Evaluation of order patterns and their spectrogram-like visualization seem to be
a robust alternative to Fourier techniques in time series analysis for big data in many
fields.

References

1. Amigo, J.: Permutation Complexity in Dynamical Systems. Springer, Heidelberg (2010)


2. Amigo, J., Keller, K., Kurths, J. (eds.): Recent progress in symbolic dynamics and permutation
entropy. Eur. Phys. J. Spec. Top. 222 (2013)
3. Aragoneses, A., Rubido, N., Tiana-Aisina, J., Torrent, M.C., Masoller, C.: Distinguishing
signatures of determinism and stochasticity in spiking complex systems. Sci. Rep. 3, Article
1778 (2012)
4. Bandt, C.: Autocorrelation type functions for big and dirty data series (2014). http://arxiv.org/
abs/1411.3904
5. Bandt, C., Pompe, B.: Permutation entropy: a natural complexity measure for time series. Phys.
Rev. Lett. 88, 174102 (2002)
6. Bandt, C., Shiha, F.: Order patterns in time series. J. Time Ser. Anal. 28, 646–665 (2007)
7. California Environmental Protection Agency, Air Resources Board: www.arb.ca.gov/aqd/
aqdcd/aqdcddld.htm (2014)
8. Ferguson, T.S., Genest, C., Hallin, M.: Kendall’s tau for serial dependence. Can. J. Stat. 28,
587–604 (2000)
9. Ferlazzo, E. et al.: Permutation entropy of scalp EEG: a tool to investigate epilepsies. Clin.
Neurophysiol. 125, 13–20 (2014)
10. German Weather Service: www.dwd.de, Climate and Environment, Climate Data (2014)
11. Goldberger, A.L. et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new
research resource for complex physiologic signals. Circulation 101(23), e215–e220 (2000).
Data at: http://www.physionet.org/physiobank/database/capslpdb (2014)
12. Hallin, M., Puri, M.L.: Aligned rank tests for linear models with autocorrelated error terms. J.
Multivar. Anal. 50, 175–237 (1994)
13. Kuo, C.-E., Liang, S.-F.: Automatic stage scoring of single-channel sleep EEG based on
multiscale permutation entropy. In: 2011 IEEE Biomedical Circuits and Systems Conference
(BioCAS), pp. 448–451 (2011)
14. Lange, H., Rosso, O.A., Hauhs, M.: Ordinal pattern and statistical complexity analysis of daily
stream flow time series. Eur. Phys. J. Spec. Top. 222, 535–552 (2013)
15. Nair, U., Krishna, B.M., Namboothiri, V.N.N., Nampoori, V.P.N.: Permutation entropy based
real-time chatter detection using audio signal in turning process. Int. J. Adv. Manuf. Technol.
46, 61–68 (2010)
16. Nicolaou, N., Georgiou, J.: The use of permutation entropy to characterize sleep encephalo-
grams. Clin. EEG Neurosci. 42, 24 (2011)
17. Ouyang, G., Dang, C., Richards, D.A., Li, X.: Ordinal pattern based similarity analysis for
EEG recordings. Clin. Neurophysiol. 121, 694–703 (2010)
18. Soriano, M.C., Zunino, L., Rosso, O.A., Fischer, I., Mirasso, C.R.: Time scales of a chaotic
semiconductor laser with optical feedback under the lens of a permutation information analysis.
IEEE J. Quantum Electron. 47(2), 252–261 (2011)
19. Staniek, M., Lehnertz, K.: Symbolic transfer entropy. Phys. Rev. Lett. 100, 158101 (2008)
20. Terzano, M.G., et al.: Atlas, rules, and recording techniques for the scoring of cyclic alternating
pattern (CAP) in human sleep. Sleep Med. 2(6), 537–553 (2001)
21. The National Water Level Observation Network: www.tidesandcurrents.noaa.gov/nwlon.html
(2014)
22. Toomey, J.P., Kane, D.M.: Mapping the dynamical complexity of a semiconductor laser with
optical feedback using permutation entropy. Opt. Express 22(2), 1713–1725 (2014)
23. Unakafov, A.M., Keller, K.: Conditional entropy of ordinal patterns. Physica D 269, 94–102
(2014)
24. Yan, R., Liu, Y., Gao, R.X.: Permutation entropy: a nonlinear statistical measure for status
characterization of rotary machines. Mech. Syst. Signal Process. 29, 474–484 (2012)
25. Zanin, M., Zunino, L., Rosso, O.A., Papo, D.: Permutation entropy and its main biomedical
and econophysics applications: a review. Entropy 14, 1553–1577 (2012)
Generative Exponential Smoothing
and Generative ARMA Models to Forecast
Time-Variant Rates or Probabilities

Edgar Kalkowski and Bernhard Sick

Abstract In this chapter we present several types of novel generative forecasting


models for time series that consist of rates. The models are based on exponential
smoothing and ARMA techniques and exploit the fact that rates can be interpreted
as a series of Bernoulli trials in a probabilistic framework. The probabilistic nature
of the models makes it straightforward to extract uncertainty estimates that assess
how certain the model is that an observation will equal the forecast made before the
observation occurred. The forecasting performance of our models is evaluated using
several artificial and real-world data sets.

Keywords ARMA • Exponential smoothing • Generative models • Time series forecasting

1 Introduction

An important application field in the context of time series forecasting is the


forecasting of future values of time-variant processes that can be modeled by
a probability distribution whose parameters change with time. To predict future
values of such a process we first derive an estimate of the future parameters
of the probability distribution and then determine a forecast from that estimated
distribution. As an example consider the forecasting of future values of click through
rates (number of clicks per impression) or conversion rates (number of sales per
click) in online advertising. Accurate predictions of future values of those rates
allow the advertiser to tune their ads and optimize their bidding process to achieve
many sales at little cost.
In this chapter we present several novel kinds of generative forecasting models
that can be used in this application scenario. In principle, a model is called “gener-
ative” if it describes the process underlying the “generation” of the observed data
[1]. Since the observed data have the form of rates we assume that the underlying

E. Kalkowski • B. Sick ()


University of Kassel, Kassel, Germany
e-mail: [email protected]; [email protected]


process that generates the data consists of repeated Bernoulli trials and we have
the number of successes (e.g., clicks) and the number of overall observations
(e.g., impressions) at our disposal. To create a generative model for this process
we employ methods from Bayesian statistics. Starting with a prior distribution
that constitutes an estimate of the parameters of the underlying Bernoulli process
up to some point in time we update that model using Bayes’ equation to first
obtain a posterior distribution. From that posterior distribution a point estimate
of the parameters of the underlying Bernoulli process is derived using the mode
of the posterior distribution. In addition to the Bayesian estimation procedure
our generative models integrate either exponential smoothing (ES) techniques or
autoregressive moving averages (ARMA) of past values of the rate to counteract
noise and outliers in the data. This chapter is a major extension of a conference
contribution [2] in which only exponential smoothing based models were discussed
and evaluated. Here, we generalize the type of linear model our generative models
are based on much further by using AR and ARMA techniques as a basis.
The main advantage of our generative models is that, due to utilizing not only past values of the rate but also past values of the number of success and failure observations, more information is available, which leads to better forecasts of future values. Also, due to the probabilistic nature of our models, it is straightforward
to extract uncertainty estimates for forecast values. Those uncertainty estimates
describe how (un)certain the model is that its forecast value will actually be equal or
very similar to the value of the rate that will be observed in the future. Uncertainty
estimates can be used in an application, e.g., to alert a human expert in case forecasts
become too uncertain.
Apart from some data of the aforementioned application field of online advertis-
ing we apply our generative models to several further artificial and real-world data
sets to evaluate their forecasting performance and compare them to the respective
non-generative models.
The remainder of this chapter is structured as follows: Sect. 2 gives a brief
overview of related work. In Sect. 3 we present our new generative forecasting
models. In Sect. 4 we apply our models to several artificial and real-world data
sets to evaluate and compare their forecasting performance. Finally, in Sect. 5 we
summarize our findings and briefly sketch planned future research.

2 Related Work

There exists an abundance of research related to exponential smoothing and


ARMA models. An overview for exponential smoothing is, e.g., given in [3, 4].
Introductions to ARMA models can, e.g., be found in [5–8]. Thus, in this section
we focus on some pointers to articles that try to combine exponential smoothing or
ARMA techniques with probabilistic modeling approaches.
Similar to our exponential smoothing based models, Sunehag et al. [9] create a
Bayesian estimator for the success parameter of a Bernoulli distribution of which

the values change over time. Their goal is to detect change points and outliers in a
sequence of bits in the context of an adaptive variant of the context tree weighting
compression algorithm (cf. [10, 11]). The authors either use a window of fixed
length of previous observations or exponentially decreasing weights for previous
values. However, no trend is considered.
Yamanishi et al. have integrated discounting factors into both an EM algorithm
for Gaussian mixtures (cf. [12]) and an autoregressive (AR) model which is used
to detect change points and outliers in a time series (cf. [13]). Again, no trends
are considered and the focus is on detecting change points and outliers and not on
accurately forecasting future values of the time series.

3 Generative Models for Rates or Probability Estimates

In this section we first explain the basics of our generative models for rates and
then present several generative models based on exponential smoothing and ARMA
techniques.

3.1 Basic Idea

In the following we denote the most recently observed value of a rate we are
interested in with the time index T ∈ ℕ and a future rate we would like to forecast with time index t′ > T. We then have rates x_1, …, x_T ∈ [0, 1] observed at points in time τ_1, …, τ_T ∈ ℝ which we can use to forecast the value x_{t′}. Each rate x_t can be subdivided into a numerator n_t ∈ ℕ and a nonzero denominator d_t ∈ ℕ ∖ {0}, i.e., x_t = n_t/d_t for 1 ≤ t ≤ T. In this chapter we assume that each observed rate value was
created by a Bernoulli trial with some success parameter p. This parameter is usually
unknown and has to be estimated from observed data. An additional assumption we
make in this chapter is that p changes over time and, thus, estimates of the success
parameter have to be constantly kept up-to-date. Basically, with a Bernoulli trial at
each point in time either a success or a failure is observed. Our approach, however,
is more general and allows for multiple observations at each time step.
All generative models presented in this chapter use a Bayesian approach to
estimate p from observed data. The basic idea is that using Bayes’ equation
new observations can easily be integrated into a model that already describes all
observations up to a certain point in time. For the initial update, a prior distribution
has to be chosen which expresses any prior knowledge about the process a user
may have. In case no prior knowledge is available an uninformative prior can be
chosen [14]. Usually, in Bayesian statistics a conjugate prior distribution [1] is used
because then the derived posterior distribution has the same functional form as the
prior. This enables us to use the posterior as a prior in subsequent updates.

For our problem consisting of rates the conjugate prior distribution is a beta distribution B(x | α, β) defined by

B(x | α, β) = [Γ(α + β) / (Γ(α) Γ(β))] x^(α−1) (1 − x)^(β−1)    (1)

for x ∈ [0, 1] and α, β ∈ ℝ with α, β > 0. Also, Γ(·) is the gamma function defined by

Γ(t) = ∫_0^∞ x^(t−1) exp(−x) dx.    (2)

In case the beta distribution is associated with a specific point in time, that point's time index is used for the distribution and its parameters as well, e.g., B_T(x | α_T, β_T) indicating a beta distribution at time point T with time-specific parameters α_T and β_T.
Using Bayes’ equation

posterior ∝ likelihood × prior    (3)

it can easily be shown that the parameters α_T and β_T of a posterior beta distribution can be computed from the parameters α_{T−1} and β_{T−1} of the respective prior distribution as

α_T = α_{T−1} + n_T,    (4)
β_T = β_{T−1} + (d_T − n_T)    (5)

using the latest observation x_T = n_T/d_T. The beta distribution obtained this way is used
as a prior distribution in the next update step. If a point estimate pT of the success
rate at time point T is required it can be obtained from the posterior distribution.
Possible choices to extract a point estimate are the expected value or the mode
of the distribution. This choice should be made in conjunction with the initial and
possibly uninformative prior distribution especially in case initial forecasts shall be
made based solely on the initial distribution since not for every prior distribution
both mode and expected value of the distribution are well defined. Also, depending
on the combination of initial prior and the way in which to extract point estimates,
not every forecast accurately represents the number of made success and failure
observations. In this chapter, we use a Bayesian prior with α = β = 1 as suggested in [15] and extract point estimates based on the mode of the distribution:

p_T = M_T(x) = { 0.5                                if α_T = β_T = 1
               { (α_T − 1)/(α_T + β_T − 2)          otherwise        (6)

Here, M_T(x) indicates the mode of the posterior beta distribution at time point T with parameters α_T and β_T. The mode has been slightly extended for the case of α_T = β_T = 1 to allow derivation of forecasts even for initial prior distributions.
In addition to forecasts of future values of the rate we want to assess the
uncertainty of those forecasts. An estimate of how uncertain a generative model is
about a forecast can be derived from the variance of the underlying distribution. Due
to our choice of the prior distribution, the variance of a posterior beta distribution lies within [0, 1/12]. To scale our uncertainty estimates to the interval [0, 1] we multiply the variance by 12 before taking the square root to get a scaled standard deviation as uncertainty estimate of a forecast. For the beta distribution we thus get an uncertainty estimate at time point T of

u_T = √(12 · V_T(x)) = √( 12 α_T β_T / [ (α_T + β_T)² (α_T + β_T + 1) ] )    (7)

where V_T(x) is the variance of the posterior beta distribution at time point T.
The key to integrating exponential smoothing and ARMA techniques into this
generative modeling framework lies in modifying the update procedure of the beta
distribution shown in (4) and (5). The following sections explain how different
generative models achieve this and how those models derive forecasts of future
values of the success parameter of the underlying Bernoulli process.
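
To illustrate Eqs. (4)–(7) before the individual model variants are described, here is a minimal sketch in Python; class and variable names are ours and purely illustrative, not the implementation used in [2], and the example counts are hypothetical.

import math

class GenerativeBetaModel:
    """Plain Bayesian beta model for a rate, following Eqs. (4)-(7)."""

    def __init__(self):
        self.alpha = 1.0   # uninformative prior with alpha = beta = 1
        self.beta = 1.0

    def update(self, n_t, d_t):
        # Eqs. (4), (5): add n_t successes and d_t - n_t failures
        self.alpha += n_t
        self.beta += d_t - n_t

    def point_estimate(self):
        # Eq. (6): mode of the beta distribution, extended for alpha = beta = 1
        if self.alpha == 1.0 and self.beta == 1.0:
            return 0.5
        return (self.alpha - 1.0) / (self.alpha + self.beta - 2.0)

    def uncertainty(self):
        # Eq. (7): scaled standard deviation, lies in [0, 1]
        a, b = self.alpha, self.beta
        return math.sqrt(12.0 * a * b / ((a + b) ** 2 * (a + b + 1.0)))

model = GenerativeBetaModel()
for n_t, d_t in [(3, 10), (5, 12), (2, 9)]:    # hypothetical success/total pairs
    model.update(n_t, d_t)
print(model.point_estimate(), model.uncertainty())
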

3.2 Generative Exponential Smoothing

In this section we describe the basic combination of the generative modeling


technique laid out in Sect. 3.1 and exponential smoothing. This model is called
generative exponential smoothing (GES). The main goal is to assign a decreasing
influence to values of the rate the further in the past they were observed. In this
case the weights of rate values decrease exponentially with time. To achieve this we
change the update Eqs. (4) and (5) to

α_T = 1 + λ_T (α_{T−1} − 1) + (1 − λ_T) n_T,    (8)

β_T = 1 + λ_T (β_{T−1} − 1) + (1 − λ_T) (d_T − n_T).    (9)

Here, λ_T = λ^(τ_T − τ_{T−1}) with λ ∈ [0, 1] is a time-dependent smoothing factor that takes
into account the time difference between the most current and the preceding rate
observation. Due to our choice of initial prior distribution (cf. Sect. 3.1) we smooth
both parameter values towards 1.
To derive a point estimate of a future value of the success parameter of the
underlying Bernoulli process we first project the beta distribution into the future
by essentially executing an update according to (8) and (9) but without adding any

new observations:

α_{t′} = 1 + λ_{t′} (α_T − 1),    (10)

β_{t′} = 1 + λ_{t′} (β_T − 1).    (11)

This does not change the mode of the distribution but it influences its variance
and, thus, the uncertainty estimates: The further a forecast lies into the future the
more uncertain the model becomes about that forecast. The forecasts themselves
are independent of the amount of time into the future (forecasting horizon) for
which they are made. Point estimates and uncertainty estimates are derived using (6)
with the parameters α_{t′} and β_{t′} of the projected posterior distribution. More details
regarding the GES model can be found in [2].
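
A compact sketch of the GES recursion (8)–(11) in Python; the symbol λ follows the reconstruction used above, the projection assumes λ_{t′} = λ^(τ_{t′} − τ_T), and the function names and example values are ours.

def ges_update(alpha, beta, n_t, d_t, lam, dt):
    """One GES step, Eqs. (8) and (9); lam in [0, 1], dt = tau_T - tau_{T-1}."""
    lam_t = lam ** dt                       # time-dependent smoothing factor
    alpha = 1.0 + lam_t * (alpha - 1.0) + (1.0 - lam_t) * n_t
    beta = 1.0 + lam_t * (beta - 1.0) + (1.0 - lam_t) * (d_t - n_t)
    return alpha, beta

def ges_forecast(alpha, beta, lam, dt_future):
    """Project the posterior, Eqs. (10) and (11), and return the mode of Eq. (6)
    together with the scaled uncertainty of Eq. (7)."""
    lam_f = lam ** dt_future
    a = 1.0 + lam_f * (alpha - 1.0)
    b = 1.0 + lam_f * (beta - 1.0)
    mode = 0.5 if a == 1.0 and b == 1.0 else (a - 1.0) / (a + b - 2.0)
    unc = (12.0 * a * b / ((a + b) ** 2 * (a + b + 1.0))) ** 0.5
    return mode, unc

alpha, beta = 1.0, 1.0                          # initial prior
for n_t, d_t in [(4, 20), (6, 25), (3, 18)]:    # hypothetical observations
    alpha, beta = ges_update(alpha, beta, n_t, d_t, lam=0.9, dt=1.0)
print(ges_forecast(alpha, beta, lam=0.9, dt_future=3.0))
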

3.3 Generative Exponential Smoothing with Trend

In addition to the exponential smoothing of observed rate values employed by the


GES model presented in Sect. 3.2 we can make use of a local trend in the data, i.e.,
the difference between the most current and the preceding observation. This model
is called generative exponential smoothing with trend (GEST). The update procedure
for the beta distribution is the same as with the GES model but additionally a
smoothed local trend ϑ_T is stored. For a new rate observation this trend value is
updated according to

ϑ_T = η_T ϑ_{T−1} + (1 − η_T) (x_T − x_{T−1}) / (τ_T − τ_{T−1}).    (12)

Here, η_T = η^(τ_T − τ_{T−1}) with η ∈ [0, 1] is an additional trend smoothing factor.


Similar to the GES model, in order to derive a point estimate the current posterior
beta distribution is first projected into the future. However, in contrast to the GES
model the GEST model considers the current value of the smoothed local trend:

p_{t′} = min{1, max{0, M_{t′}(x) + (τ_{t′} − τ_T) ϑ_T}}.    (13)

The forecast has to be limited to the interval [0, 1] of valid rates since due to the consideration of the local trends invalid forecasts outside of [0, 1] could occur. Forecasts made by this model explicitly consider the forecasting horizon. Uncertainty estimates are derived according to (7) using α_{t′} and β_{t′}. More details on
the GEST model can be found in [2].

3.4 Generative Double Exponential Smoothing

To take the approach of Sect. 3.3 one step further we can not only consider the dif-
ferences of subsequent rate values but instead take the differences or smoothed local

trends of subsequent success and failure observations separately into account. This
model is called generative double exponential smoothing (GDES). The parameters
of the beta distribution are updated according to

α_T = 1 + λ_T (α_{T−1} − 1 + (τ_T − τ_{T−1}) ϑ_{α,T}) + (1 − λ_T) n_T,    (14)

β_T = 1 + λ_T (β_{T−1} − 1 + (τ_T − τ_{T−1}) ϑ_{β,T}) + (1 − λ_T) (d_T − n_T).    (15)

Here, λ_T = λ^(τ_T − τ_{T−1}) with λ ∈ [0, 1] is a time-dependent smoothing factor as in (8) and (9). Additionally, ϑ_{α,T} and ϑ_{β,T} are the smoothed local trends of successes and
failures, respectively. Those trends are updated according to

ϑ_{α,T} = η_T ϑ_{α,T−1} + (1 − η_T) (n_T − n_{T−1}) / (τ_T − τ_{T−1}),    (16)

ϑ_{β,T} = η_T ϑ_{β,T−1} + [(1 − η_T) / (τ_T − τ_{T−1})] · (d_T − n_T − d_{T−1} + n_{T−1}).    (17)

As in (12), η_T = η^(τ_T − τ_{T−1}) with η ∈ [0, 1] is a trend smoothing factor.


Similar to the GES model (cf. Sect. 3.2), in order to make a forecast the current
posterior beta distribution is first projected into the future. Similar to the GEST
model (cf. Sect. 3.3), we consider the local trends of the success and failure
observations. Thus, we get

α_{t′} = max{1, 1 + λ_{t′} (α_T − 1 + (τ_{t′} − τ_T) ϑ_{α,T})},    (18)

β_{t′} = max{1, 1 + λ_{t′} (β_T − 1 + (τ_{t′} − τ_T) ϑ_{β,T})}.    (19)

Since we consider the smoothed local trends when updating the parameters it may
happen that invalid parameter combinations are generated. For our choice of the
prior distribution (cf. Sect. 3.1) all values smaller than 1 would be invalid and thus
we limit our parameter updates accordingly. To derive a forecast we directly use the
mode of the resulting distribution as in (6). However, in contrast to the GEST model
the forecasts of the GDES model depend on the forecasting horizon. This is due to
the smoothed local trend being considered when deriving the forecasts. More details
about the GDES model can be found in [2].
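
The GDES recursion (14)–(19) can be sketched along the same lines; the state now carries the two smoothed trends and the previous counts, the symbols λ and η follow the reconstruction used above, the seeding of the very first observation is simplified, and all names are ours.

def gdes_update(state, n_t, d_t, lam, eta, dt):
    """One GDES step, Eqs. (14)-(17).
    state = (alpha, beta, trend_a, trend_b, n_prev, d_prev)."""
    alpha, beta, th_a, th_b, n_prev, d_prev = state
    lam_t, eta_t = lam ** dt, eta ** dt
    # Eqs. (16), (17): smoothed local trends of success and failure counts
    th_a = eta_t * th_a + (1.0 - eta_t) * (n_t - n_prev) / dt
    th_b = eta_t * th_b + (1.0 - eta_t) * ((d_t - n_t) - (d_prev - n_prev)) / dt
    # Eqs. (14), (15): beta parameters including the trend terms
    alpha = 1.0 + lam_t * (alpha - 1.0 + dt * th_a) + (1.0 - lam_t) * n_t
    beta = 1.0 + lam_t * (beta - 1.0 + dt * th_b) + (1.0 - lam_t) * (d_t - n_t)
    return (alpha, beta, th_a, th_b, n_t, d_t)

def gdes_forecast(state, lam, dt_future):
    """Project the parameters, Eqs. (18), (19), and return the mode, Eq. (6)."""
    alpha, beta, th_a, th_b, _, _ = state
    lam_f = lam ** dt_future
    a = max(1.0, 1.0 + lam_f * (alpha - 1.0 + dt_future * th_a))
    b = max(1.0, 1.0 + lam_f * (beta - 1.0 + dt_future * th_b))
    return 0.5 if a == 1.0 and b == 1.0 else (a - 1.0) / (a + b - 2.0)

state = (1.0, 1.0, 0.0, 0.0, 0, 0)              # prior, zero trends, no history yet
for n_t, d_t in [(4, 20), (6, 25), (9, 30)]:    # hypothetical observations
    state = gdes_update(state, n_t, d_t, lam=0.9, eta=0.95, dt=1.0)
print(gdes_forecast(state, lam=0.9, dt_future=2.0))
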

3.5 Generative Autoregressive Model

In this section we present a more general approach where the weights of values
observed in the past are not decreasing exponentially but can be arbitrary real
numbers. This kind of model is called generative autoregressive model (GAR).
Basically, a non-generative autoregressive model creates a forecast by taking the
weighted sum of p values observed in the past. For our GAR model, we apply this

approach to the two parameters of the beta distribution, i.e., the value of a parameter
at a specific point in time T depends on the weighted sum of the previous p success
or failure observations. Due to the data-dependent weights, the result may become
negative depending on the choice of parameters. To make sure that the resulting
beta distribution is well defined and to be consistent with our exponential smoothing
based generative models, we limit the parameter values to be greater than or equal
to 1. This results in
α_T = max{ 1, Σ_{t=1}^{p} φ_t n_{T−t+1} },    (20)

β_T = max{ 1, Σ_{t=1}^{p} ψ_t (d_{T−t+1} − n_{T−t+1}) }.    (21)

Here, φ_1, …, φ_p ∈ ℝ are weights for the number of success observations and ψ_1, …, ψ_p ∈ ℝ are weights for the number of failure observations.
Altogether, a GAR model has 2p parameters for which values have to be
provided. Depending on the desired forecasting horizon different parameters should
be chosen to make sure that suitable values are forecast. Assuming a sufficient
amount of data is available, parameter values for a forecasting horizon h can be
determined by solving the linear least squares problems

(φ_1, …, φ_p)^T = arg min_{x ∈ ℝ^p} ‖A x − b‖,    (22)

(ψ_1, …, ψ_p)^T = arg min_{x ∈ ℝ^p} ‖B x − c‖    (23)

with
      ( n_1            ⋯   n_p      )           ( n_{p+h} )
  A = (  ⋮             ⋱    ⋮       ) ,     b = (    ⋮    ) ,        (24)
      ( n_{T−p−h+1}    ⋯   n_{T−h}  )           (   n_T   )

      ( d_1 − n_1                  ⋯   d_p − n_p          )           ( d_{p+h} − n_{p+h} )
  B = (  ⋮                         ⋱    ⋮                 ) ,     c = (        ⋮          ) .        (25)
      ( d_{T−p−h+1} − n_{T−p−h+1}  ⋯   d_{T−h} − n_{T−h}  )           (    d_T − n_T      )

A forecast of a future value of the rate is made by first computing the parameters
α_T and β_T of the beta distribution and then taking the mode of this distribution
as defined in (6). In contrast to our exponential smoothing based models which
either do not consider the forecasting horizon at all (GES model, cf. Sect. 3.2) or
can dynamically adapt to different forecasting horizons (GEST and GDES models,
cf. Sects. 3.3 and 3.4), the weights of an AR model either have to be recomputed in
case the forecasting horizon changes or multiple sets of weights have to be prepared

one for each horizon. An uncertainty estimate can be derived according to (7) using
the most recent beta distribution.
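
Since Eqs. (22)–(25) are ordinary linear least squares problems, a short sketch with NumPy's np.linalg.lstsq suffices to convey the idea; the function names are ours and the fitted weights are kept in the column order of the training matrices (oldest observation of each window first).

import numpy as np

def fit_gar_weights(n, d, p, h):
    """Fit the 2p GAR weights of Eqs. (22)-(25) for forecasting horizon h.
    n, d: arrays of success counts and totals of equal length."""
    n, d = np.asarray(n, float), np.asarray(d, float)
    f = d - n                                         # failure counts
    rows = len(n) - p - h + 1                         # number of training rows
    A = np.array([n[i:i + p] for i in range(rows)])   # Eq. (24), matrix A
    b = n[p + h - 1:p + h - 1 + rows]                 # Eq. (24), vector b
    B = np.array([f[i:i + p] for i in range(rows)])   # Eq. (25), matrix B
    c = f[p + h - 1:p + h - 1 + rows]                 # Eq. (25), vector c
    phi, *_ = np.linalg.lstsq(A, b, rcond=None)       # Eq. (22)
    psi, *_ = np.linalg.lstsq(B, c, rcond=None)       # Eq. (23)
    return phi, psi

def gar_forecast(n, d, phi, psi):
    """Eqs. (20), (21) and (6): beta parameters from the last p observations."""
    p = len(phi)
    recent_n = np.asarray(n[-p:], float)
    recent_f = np.asarray(d[-p:], float) - recent_n
    alpha = max(1.0, float(phi @ recent_n))
    beta = max(1.0, float(psi @ recent_f))
    return 0.5 if alpha == beta == 1.0 else (alpha - 1.0) / (alpha + beta - 2.0)
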

3.6 Generative Autoregressive Moving Average

In addition to values observed in the past we can also explicitly include noise
terms into our model. This type of model is called generative autoregressive moving
average (GARMA). When including noise terms into (20) and (21) we get
α_T = max{ 1, Σ_{t=1}^{p} φ_t n_{T−t+1} + Σ_{t=1}^{q} θ_t ε_{T−t+1} },    (26)

β_T = max{ 1, Σ_{t=1}^{p} ψ_t (d_{T−t+1} − n_{T−t+1}) + Σ_{t=1}^{q} κ_t ε_{T−t+1} }.    (27)

Here, the φ_t and ψ_t are weights identical to those used in (20) and (21). Additionally, θ_1, …, θ_q ∈ ℝ and κ_1, …, κ_q ∈ ℝ are weights for realizations of random noise ε_t ∼ N(0, σ²). Both values have a hard lower limit of 1 to be consistent with our
exponential smoothing based models and to make sure the resulting beta distribution
is well defined.
Due to not being able to directly observe the random noise variables ε_t, training
a GARMA model is slightly more difficult than training an AR model. For the
simulations in this chapter we used a variation of the Hannan–Rissanen algorithm
[16]. In the first step of this algorithm, a high order GAR model is fitted to available
training data. Then, the forecasting errors of the trained GAR model are used as
estimates for the t . Finally, using the estimated values for t we can estimate the
parameters of a full GARMA model by solving the least squares problems

. 1; : : : ; p ; 1 ; : : : ; q /
T
D arg min
pq
fkC  x  dkg; (28)
x2R

.1 ; : : : ; p ; 1 ; : : : ; q /T D arg min


pq
fkD  x  ekg (29)
x2R

with
      ( n_{m−p+1}     ⋯   n_m       ε_{m−q+1}     ⋯   ε_m      )
  C = (  ⋮            ⋱    ⋮         ⋮            ⋱    ⋮       ) ,        (30)
      ( n_{T−h−p+1}   ⋯   n_{T−h}   ε_{T−h−q+1}   ⋯   ε_{T−h}  )

      ( d_{m−p+1} − n_{m−p+1}      ⋯   d_m − n_m           ε_{m−q+1}     ⋯   ε_m      )
  D = (  ⋮                         ⋱    ⋮                   ⋮            ⋱    ⋮       ) ,        (31)
      ( d_{T−h−p+1} − n_{T−h−p+1}  ⋯   d_{T−h} − n_{T−h}   ε_{T−h−q+1}   ⋯   ε_{T−h}  )

  d = ( n_{m+h}, …, n_T )^T ,   e = ( d_{m+h} − n_{m+h}, …, d_T − n_T )^T        (32)

where m = max{p, q}.


Forecasts of future values are made by first computing the parameters α_T and β_T of the current beta distribution and then taking the mode of that distribution as
defined in (6). Uncertainty estimates can be derived according to (7). As with a GAR
model (cf. Sect. 3.5), the parameters of a GARMA model have to be recomputed in
case the forecasting horizon changes.
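
The two-step estimation can be sketched as follows for the success part (Python/NumPy; all names are ours): a high-order GAR fit supplies residuals that stand in for the unobservable ε_t, and the joint regression corresponding to Eqs. (28) and (30) is then solved by least squares. This is only a schematic variant of the procedure, not the exact implementation used for the simulations; in particular, m is enlarged beyond max{p, q} so that only valid residual estimates enter.

import numpy as np

def fit_garma_success_weights(n, p, q, h, p_high=10):
    """Hannan-Rissanen-style sketch for the success part, cf. Eqs. (28), (30)."""
    n = np.asarray(n, dtype=float)
    # Step 1: high-order AR fit (horizon 1); its residuals estimate eps_t
    rows1 = len(n) - p_high
    A1 = np.array([n[i:i + p_high] for i in range(rows1)])
    b1 = n[p_high:]
    w1, *_ = np.linalg.lstsq(A1, b1, rcond=None)
    eps = np.concatenate([np.zeros(p_high), b1 - A1 @ w1])
    # Step 2: regress n_{t+h} on p past counts and q past residual estimates
    m = p_high + max(p, q)          # keeps the zero-padded residuals out of the rows
    rows2 = len(n) - m - h + 1
    C = np.array([np.concatenate([n[i + m - p:i + m], eps[i + m - q:i + m]])
                  for i in range(rows2)])
    d_vec = n[m + h - 1:m + h - 1 + rows2]
    weights, *_ = np.linalg.lstsq(C, d_vec, rcond=None)
    return weights[:p], weights[p:], eps   # AR weights, MA weights, residuals
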

4 Simulation Results

In this section we apply our generative models to several artificial benchmark data
sets and real-world data sets.

4.1 Data Sets

The first data sets we use for the evaluation and comparison of our models are
artificially generated. For that, we used seven different base signals: square wave,
sine, triangular, and mixed 1 through 4. Those base signals span an overall time
of 300 time steps. The signals are generated by taking 200 base observations and
varying the number of successes according to the respective type of signal. For the
square wave the number of successes alternates between 40 and 160 every 25 time
steps which results in the success rate varying between 0.2 and 0.8. The sine signal
uses a wavelength of 50 time steps and oscillates between 10 and 90 successes
which yields a rate varying between 0.05 and 0.95. The triangular signal is similar
to the sine signal except it has a triangular shaped rate. The four mixed data sets
contain rates whose characteristics change midway from a square wave signal to a
sine signal and vice versa and from a sine with low frequency to a sine with high
frequency and vice versa. For each base signal a number of variants were generated
by adding white noise of variances 0, 5, 10, 15, 20, 25, and 30 to the number of
success observations. This yields a total of 49 artificial data sets.
In addition to the artificial data sets some real-world data sets are considered.
The first is taken from the iPinYou challenge [17] held in 2013 to find a good
real time bidding algorithm for online advertising. While we have not created such
an algorithm we nevertheless use the data from the competition and forecast click
through rates. The data do not have a fixed time resolution but we aggregated them
to contain one click through rate value per hour.

The second real-world data set was used in [18, 19] and is concerned with the
number of actively used computers in a computer pool at the University of Passau,
Germany. The data were recorded between April 15th and July 16th 2002 in 30-min
intervals.
A third real-world data set is taken from the PROBEN1 benchmark data suite
[20, 21]. The data set contains recordings of the energy consumption of a building
with a time resolution of 1 h.
The next 17 real-world data sets are concerned with search engine marketing.
Each of these data sets contains conversion rates for one search keyword over a
varying amount of time and with a varying time resolution.
The last 20 real-world data sets contain the citation rates of scientific articles.
The citation rates were aggregated to a time resolution of 1 year.
The next section presents the results of applying our forecasting models to the
data sets described in this section.

4.2 Forecasting Performance

All of our models have one or more parameters that need to be adjusted before
the model can actually be applied to data. We use an iterative grid search to
automatically find good parameters. During the parameter search we only use the
first quarter of each data set. For each value in the remaining three quarters of each
data set a forecast is made using forecasting horizons from 1 to 6 time steps. For
each horizon the mean squared forecasting error (MSE) is computed which can
then be compared to the errors achieved by other models.
For an overall evaluation of our models we applied the Friedman test [22] using
a significance level of 0.01 followed by a Nemenyi test [23]. The Friedman test is a
non-parametric statistical test which compares a set of multiple forecasting models
on multiple data sets. In order to do that ranks are assigned to the models. For each
data set the model with the best performance gets the lowest (best) rank. The null
hypothesis of the Friedman test is that there are no significant differences between
the forecasting performances of all models averaged over all used data sets. For our
nine models and 534 combinations of data set and forecasting horizon Friedman's χ²_F is distributed according to a χ² distribution with 8 degrees of freedom. The critical value of χ²(8) for a significance level of 0.01 is 20.1, which is lower than Friedman's χ²_F = 153.9, which means the null hypothesis is rejected.
In case there are significant differences a post hoc test such as the Nemenyi test is
employed to find out which of the models performed better or worse. The result of
the Nemenyi test is a critical difference (CD). For our results the Nemenyi test yields
a critical difference of CD = 0.807. This means that any two models whose average ranks differ by more than 0.807 are significantly different from each other. The
results of the Nemenyi test can be visualized by means of a critical difference plot
(cf. Fig. 1). For comparison we also included results of a non-generative exponential


Fig. 1 Critical difference plot of the ranked MSE values achieved by each model for a significance
level of 0.01. Smaller ranks are better than greater ranks. If the ranks of two models differ more
than the critical difference their performance is significantly different on the used data sets

smoothing model (ES), a sliding window average (SW), and a non-generative


autoregressive model (AR).
Averaged over all artificial and real-world data sets the non-generative AR model
performs significantly worse than all other models. The remaining models are partly
connected by lines, which means that there is no significant differences between
some of them. However, we can still conclude that the GARMA model performs
significantly better than the GAR, ARMA, GES, ES, and SW models and the GDES
model is still significantly better than the GAR model.
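
For readers who want to reproduce this kind of comparison, a small sketch is given below (Python; scipy.stats.friedmanchisquare is assumed to be available, the error matrix is purely hypothetical, and the Nemenyi critical difference is computed from the usual formula CD = q_α √(k(k+1)/(6N)) with a table value q_α supplied by the user).

import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical MSE results: rows = data set/horizon combinations, columns = models
mse = np.array([[0.021, 0.019, 0.025],
                [0.034, 0.030, 0.041],
                [0.012, 0.011, 0.013],
                [0.027, 0.024, 0.029]])

stat, p_value = friedmanchisquare(*mse.T)                 # Friedman test
ranks = np.argsort(np.argsort(mse, axis=1), axis=1) + 1   # rank 1 = best per row
avg_ranks = ranks.mean(axis=0)

def nemenyi_cd(q_alpha, k, n_cases):
    """Critical difference CD = q_alpha * sqrt(k(k+1)/(6N)) of the Nemenyi test."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_cases))

# q_alpha must be taken from a table for the chosen significance level and k models
print(stat, p_value, avg_ranks, nemenyi_cd(q_alpha=2.343, k=3, n_cases=len(mse)))
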

4.3 Run-Time

In addition to the forecasting performance we also did a basic run-time analysis


of our models. We measured the run-time required to execute all evaluations
needed to compute the mean squared forecasting error used to compare forecasting
performances in Sect. 4.2. Each time measurement was repeated ten times and the
average of the last seven runs was actually used to assess the run-time of a model.
Measurements were executed on a 2.5 GHz quad-core CPU.
Similar to the forecasting errors we applied the Friedman test followed by the
Nemenyi test to the ranked run-times to find out whether or not there are significant
differences. Here, the critical value of χ²_F(8) for a significance level of 0.01 is 13.4, which is lower than Friedman's χ²_F = 2004.7. This means that the null hypothesis
of the Friedman test is rejected and there are significant differences between the
run-times of the models. In this case the Nemenyi test yields a critical difference of
CD = 0.642. The corresponding CD plot is given in Fig. 2. There are four groups
of models which significantly differ in their respective run-times. The significantly
fastest models are ES, SW, GDES, and GES, which are also the simplest models that
require the fewest computations when applying an update. The GEST model takes
significantly longer to evaluate than the previously mentioned models, followed by
the AR and ARMA models. The slowest models are the GAR and GARMA models.


Fig. 2 Critical difference plot of the ranked run-times required to evaluate the MSE of each model
for all used data sets using a significance level of 0.01. Smaller ranks are better since they result
from smaller run-times. If the ranks of two models differ less than the critical difference there is
no significant difference between the run-times of the respective models

5 Conclusion and Outlook

In this chapter we presented several novel generative models for rates which are
based on either exponential smoothing or ARMA techniques. We evaluated the
forecasting performance of our models using several artificial and real-world data
sets. A statistical test showed that the GARMA model has the best forecasting
performance followed by a GDES model. A brief analysis of the run-time of our
models revealed that, as expected, models such as GARMA and GAR, which require
more operations to derive forecasts, also require more run-time to evaluate. The
exponential smoothing based models GDES and GES are nearly as fast as a non-
generative exponential smoothing model and a sliding window average.
This chapter focused on describing our novel modeling approaches and giving an
impression of their forecasting performance by comparing their performance using
several data sets. For the sake of brevity detailed results for individual data sets
were not given. In the future we would like to further explore how our different
kinds of models perform on data sets with specific characteristics. Also we would
like to compare our models’ forecasting performance to completely different kinds
of models, e.g., nonlinear models based on support vector regression.
For all our generative models it is possible to derive uncertainty estimates that
express how certain the model is that its predictions will actually fit to future
observations of the rate. We did not explore the possibilities of uncertainty estimates
further in this chapter; however, in the future we would like to make use of this
information to, e.g., automatically notify a human expert in case the uncertainty of
forecasts becomes too high.

Acknowledgements The search engine marketing data sets were kindly provided by crealytics.

References

1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
2. Kalkowski, E., Sick, B.: Generative exponential smoothing models to forecast time-variant
rates or probabilities. In: Proceedings of the 2015 International Work-Conference on Time
Series (ITISE 2015), pp. 806–817 (2015)
3. Gardner Jr., E.S.: Exponential smoothing: the state of the art. J. Forecast. 4(1), 1–28 (1985)
4. Gardner Jr., E.S.: Exponential smoothing: the state of the art—part II. Int. J. Forecast. 22(4),
637–666 (2006)
5. Chatfield, C.: Time-Series Forecasting. Chapman and Hall/CRC, Boca Raton (2000)
6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting, 2nd edn. Springer,
New York (2002)
7. Chatfield, C.: The Analysis of Time Series: An Introduction, 6th edn. Chapman and Hall/CRC,
Boca Raton (2003)
8. Granger, C.W.J., Newbold, P.: Forecasting Economic Time Series, 2nd edn. Academic, San
Diego (1986)
9. Sunehag, P., Shao, W., Hutter, M.: Coding of non-stationary sources as a foundation for
detecting change points and outliers in binary time-series. In: Proceedings of the 10th
Australasian Data Mining Conference (AusDM ’12), vol. 134, pp. 79–84 (2012)
10. O’Neill, A., Hutter, M., Shao, W., Sunehag, P.: Adaptive context tree weighting. In:
Proceedings of the 2012 Data Compression Conference (DCC), pp. 317–326 (2012)
11. Willems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context-tree weighting method: basic
properties. IEEE Trans. Inf. Theory 41(3), 653–664 (1995)
12. Yamanishi, K., Takeuchi, J., Williams, G., Milne, P.: On-line unsupervised outlier detection
using finite mixtures with discounting learning algorithms. In: Proceedings of the 6th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000)
(2000)
13. Yamanishi, K., Takeuchi, J.: A unifying framework for detecting outliers and change points
from non-stationary time series data. In: Proceedings of the 8th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD 2002), pp. 676–681 (2002)
14. Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4, 227–241 (1968)
15. Bayes, T., Price, R.: An essay towards solving a problem in the doctrine of chances. By the
late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F.
R. S. Philos. Trans. 53, 370–418 (1763)
16. Hannan, E.J., Rissanen, J.: Recursive estimation of mixed autoregressive-moving average
order. Biometrika 69, 81–94 (1982)
17. Beijing Pinyou Interactive Information Technology Co., Ltd.: iPinYou: global bidding
algorithm competition. http://contest.ipinyou.com/data.shtml [Online]. Last accessed 18 Mar
2015
18. Gruber, C., Sick, B.: Processing short-term and long-term information with a combination of
hard- and soft-computing techniques. In: Proceedings of the IEEE International Conference
on Systems, Man & Cybernetics (SMC 2003), vol. 1, pp. 126–133 (2003)
19. Fuchs, E., Gruber, C., Reitmaier, T., Sick, B.: Processing short-term and long-term information
with a combination of polynomial approximation techniques and time-delay neural networks.
IEEE Trans. Neural Netw. 20(9), 1450–1462 (2009)
20. Kreider, J.F., Haberl, J.S.: The great energy predictor shootout—Overview and discussion of
results, ASHRAE Transactions 100(2), 1104–1118 (1994)
21. Prechelt, L.: PROBEN 1 – a set of neural network benchmark problems and benchmarking rules.
Technical report, University of Karlsruhe (1994)
22. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
23. Nemenyi, P.B.: Distribution-free multiple comparisons. Ph.D. thesis, Princeton University
(1963)
First-Passage Time Properties of Correlated
Time Series with Scale-Invariant Behavior
and with Crossovers in the Scaling

Pedro Carpena, Ana V. Coronado, Concepción Carretero-Campos,


Pedro Bernaola-Galván, and Plamen Ch. Ivanov

Abstract The observable outputs of a great variety of complex dynamical systems


form long-range correlated time series with scale invariance behavior. Important
properties of such time series are related to the statistical behavior of the first-
passage time (FPT), i.e., the time required for an output variable that defines the time
series to return to a certain value. Experimental findings in complex systems have
attributed the properties of the FPT probability distribution and the FPT mean value
to the specifics of the particular system. However, in a previous work we showed
(Carretero-Campos, Phys Rev E 85:011139, 2012) that correlations are a unifying
factor behind the variety of findings for FPT, and that diverse systems characterized
by the same degree of correlations in the output time series exhibit similar FPT
properties. Here, we extend our analysis and study the FPT properties of long-
range correlated time series with crossovers in the scaling, similar to those observed
in many experimental systems. To do so, first we introduce an algorithm able to
generate artificial time series of this kind, and study numerically the statistical
properties of FPT for these time series. Then, we compare our results to those found
in the output time series of real systems and we demonstrate that, independently of
the specifics of the system, correlations are the unifying factor underlying key FPT
properties of systems with output time series exhibiting crossovers in the scaling.

Keywords DFA • First-passage time • Fractal noises • Long-range correlations • Scaling crossovers

P. Carpena () • A.V. Coronado • C. Carretero-Campos • P. Bernaola-Galván


Dpto. de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga, 29071 Málaga,
Spain
e-mail: [email protected]
P.Ch. Ivanov
Center for Polymer Studies and Department of Physics, Boston University, Boston, MA, USA
Harvard Medical School and Division of Sleep Medicine, Brigham and Women’s Hospital,
Boston, MA, USA
Institute of Solid State Physics, Bulgarian Academy of Sciences, Sofia, Bulgaria


1 Introduction

The observable output signals of many complex dynamical systems are long-range
correlated time series with long memory and scale invariance properties, which are
usually studied in the framework of one-dimensional generalized random walks.
Fundamental characteristics of random walks, and of the corresponding time series,
are represented by the statistical properties of the first-passage time (FPT) [1], i.e.,
the time required for the time series to return to a certain value (usually zero).
The main statistical properties of the FPT are the functional form of its probability
distribution p(ℓ) and its average length ⟨ℓ⟩. Empirical studies have reported a variety
of forms for the probability distribution of FPT, including (1) pure exponential for
random uncorrelated processes [2], (2) stretched exponential forms for a diverse
group of natural and social complex systems ranging from neuron firing [3], climate
fluctuations [4], or heartbeat dynamics [5], to Internet traffic [6, 7] and stock
market activity [8, 9]; and (3) power-law form for certain on–off intermittency
processes related to nonlinear electronic circuits [10] and anomalous diffusion [11–
14]. Such diverse behavior is traditionally attributed to the specifics of the individual
system. Identifying common factors responsible for similar behaviors of FPT across
different systems has not been a focus of investigations. Indeed, these systems
exhibit different scale invariant long-range correlated behaviors and how the degree
of correlations embedded in the system dynamics relates to the statistical properties
of FPT is not known. In a previous work [15] we hypothesized that correlations are
the unifying factor behind a class of complex systems of diverse nature exhibiting
similar statistical properties for FPT, and conversely, systems that belong to the same
class of FPT properties possess a comparable degree of correlations. We investigated [15] how the degree of correlations in the system dynamics affects key properties of FPT—the shape of the probability density p(ℓ) and the FPT average length ⟨ℓ⟩. A summary of the results we found in [15] is presented in Sect. 2.
However, instead of a single scaling exponent, many complex systems exhibit
two different scaling regimes characterized by different correlation exponents at
short and large scales with a crossover separating both regimes. This happens in
complex dynamical systems that are regulated by several competing mechanisms
acting at different time scales, thus producing fractal long-range correlations with
scaling crossovers. Examples of such scaling crossovers can be found in heartbeat
dynamics [16, 17], stock market trade [18], or in the proprioceptive system respon-
sible for human balance [19, 20]. Here, we hypothesize that correlations are also
the unifying factor behind the FPT properties of such complex systems. To probe
so, first we present an algorithm able to generate artificial, long-range correlated
time series with scaling crossovers (Sect. 3). Then, by a systematic analysis of such
time series, we present numerical results on how the different correlation exponents
at short and large scales and the scale of the crossover affect drastically the key
properties of FPT, p(ℓ) and ⟨ℓ⟩ (Sects. 4 and 5). Finally, we compare in Sect. 6 the
theoretical and numerical results presented in Sects. 4 and 5 with those obtained in
a real time series with scaling crossover, the trajectory of the Center of Pressure of

the human postural control system in quiet standing. We show the general validity
of our results and conclude that long-range correlations can be seen, indeed, as the
common factor explaining the FPT properties in such complex systems.

2 FPT Characteristics of Time Series with Scale-Invariant


Behavior

The statistical properties of the FPT are among the main characteristics of generalized random walks and of complex time series. In general, the FPT is defined as the time required for the time series x(t) to return to a particular value. In this work, we consider any FPT value as the time interval ℓ between two consecutive crossings of the signal through its mean which, without loss of generality, we will fix to zero (Fig. 1).
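
A minimal sketch (Python/NumPy; the function name is ours) of how such FPT values can be extracted: the series is centered at its mean and the intervals between consecutive zero crossings are collected.

import numpy as np

def first_passage_times(x):
    """Return the intervals between consecutive crossings of the mean of x."""
    y = np.asarray(x, dtype=float)
    y = y - y.mean()                        # fix the reference level to zero
    signs = np.sign(y)
    signs[signs == 0] = 1.0                 # treat exact zeros as positive
    crossings = np.where(np.diff(signs) != 0)[0]    # last index before each crossing
    return np.diff(crossings)               # FPT values in sampling units

# Example: FPTs of an uncorrelated random signal
rng = np.random.default_rng(1)
print(first_passage_times(rng.standard_normal(100000)).mean())
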
Since our hypothesis is that the degree of long-range correlations is the unifying factor explaining the different experimental results found for the statistical FPT properties in time series with scale-invariant dynamics, we use the inverse Fourier filtering method [21] to generate fractal signals with zero mean, unit standard deviation, and the desired degree of long-range power-law correlations in order to study the FPT properties. The algorithm first generates a random signal in the time domain, then Fourier-transforms it to the frequency (f) domain to obtain a white noise, multiplies this noise by a power law of the type f^(−(2α−1)/2), and, finally, Fourier-transforms the signal back into the time domain. Obviously, the power spectrum S(f) of the resulting time series will be a power law of the form S(f) ∼ f^(−(2α−1)).

Fig. 1 Examples of two time series with different degree of correlations as quantified by the scaling exponent α. The piecewise constant lines illustrate the FPTs for both signals

Correlations in the final time series are quantified by the single scaling exponent α, which is an input of the algorithm and corresponds to the detrended fluctuation analysis (DFA) scaling exponent (Fig. 1) by construction.¹ The DFA algorithm [22] is a scaling analysis method which calculates the fluctuations F(n) of the analyzed signal at different time scales n removing the local trends. The analyzed time series presents long-range correlations and scale-invariant properties if

F(n) ∼ n^α    (1)

and the exponent α quantifies the strength of the long-range correlations. Although other scaling exponents could have been used, we prefer to use as our reference the DFA exponent α, since the DFA method has become the standard when studying such long-range correlated time series [18–20, 23, 24], and can also be applied to real-world nonstationary time series. For uncorrelated random signals, α = 0.5; for anticorrelated signals, α < 0.5; and for positively correlated signals, α > 0.5. Processes with 0 < α < 1 are fractional Gaussian noises (fGns) and processes with 1 < α < 2 are fractional Brownian motions (fBms). In particular, α = 1.5 corresponds to the classical random walk.
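
For reference, a compact sketch of the standard DFA procedure with linear detrending (Python/NumPy, our own implementation rather than the code used for the results below): the series is integrated, split into windows of size n, a local linear trend is removed in each window, and the root-mean-square fluctuation F(n) is returned; α is then the slope of log F(n) versus log n.

import numpy as np

def dfa_fluctuation(x, scales):
    """Standard DFA with linear detrending: returns F(n) for each scale n."""
    y = np.cumsum(np.asarray(x, float) - np.mean(x))    # integrated profile
    F = []
    for n in scales:
        n_windows = len(y) // n
        sq = []
        for w in range(n_windows):
            seg = y[w * n:(w + 1) * n]
            t = np.arange(n)
            coef = np.polyfit(t, seg, 1)                # local linear trend
            sq.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(sq)))
    return np.array(F)

scales = np.unique(np.logspace(1, 3, 20).astype(int))
x = np.random.default_rng(2).standard_normal(2 ** 14)   # white noise, alpha near 0.5
F = dfa_fluctuation(x, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print(round(alpha, 2))
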
In [15] we systematically investigated how the degree of correlations in the system dynamics affects key properties of FPT—the shape of the probability density p(ℓ) and the FPT average length ⟨ℓ⟩. We include here our main results for completeness, and also because they will serve us as a reference when we analyze the results obtained for time series with scaling crossovers in the next sections.
Depending on the range of correlations considered, we obtained two different regimes for the FPT probability density p(ℓ):

p(ℓ) ∼ { exp[ −(ℓ/ℓ_0)^γ ]     if 0 < α < 1
       { f(ℓ) / ℓ^(3−α)        if 1 < α < 2        (2)

The first case, which we call the stretched-exponential regime, is obtained when the correlations are in the range 0 < α < 1, i.e., for fGns. p(ℓ) behaves as a stretched exponential and the stretching parameter γ depends on α: for the well-known case α = 0.5 (white noise), we find that γ = 1, corresponding to a pure exponential behavior. For α < 0.5, we find that γ > 1, and γ increases as α decreases. In this case, p(ℓ) decays faster than exponentially. For α > 0.5, we find that γ < 1, and γ decreases as α increases. In this case, p(ℓ) is a real stretched exponential and its tail becomes fatter as α increases. This result matches experimental observations for a great variety of phenomena [4, 5, 8].
The second case, which we call the power-law tail regime, is obtained when the correlations are in the range 1 < α < 2, i.e., for fBms. In this case, the tail of the FPT

¹ When the power spectrum of a time series is of the type S(f) ∼ f^(−β), and the DFA fluctuation function behaves as F(n) ∼ n^α, then the two exponents are related via β = 2α − 1.

probability density p.`/ behaves as a power-law of ` with exponent 3  ˛, whereas


the function f .`/ [Eq. (2)] only affects the short-scale regime (small ` values), and
tends to an unimportant constant as ` increases. This result generalizes the well-
known random walk case (˛ D 3=2), for which p.`/ `3=2 for large `.
Concerning the mean FPT value h`i, we also obtain two different behaviors
corresponding to the two regimes in p.`/, both as as function of the time series
length N:
  
h`i1 1  a
if 0 < ˛ < 1
h`i Nb (3)
N ˛1 if 1 < ˛ < 2

In the stretched exponential regime (0 < α < 1), a and b are positive constants, and ⟨ℓ⟩_∞ is the finite constant asymptotic value in the limit of large N. For increasing α, the exponent b decreases and then the convergence to the asymptotic value ⟨ℓ⟩_∞ is slower with the time series length N, and the values of ⟨ℓ⟩_∞ also increase with α.
In the power-law tail regime (1 < α < 2) we find a power-law dependence of ⟨ℓ⟩ on the time series length N, with exponent α − 1.
The case α = 1 (1/f noise) corresponds to a phase transition between both regimes, where p(ℓ) decays faster than a power law and slower than a stretched exponential, and the mean value ⟨ℓ⟩ increases logarithmically with the time series length N.

3 Algorithm Generating Time Series with Crossovers


in the Scaling

In order to systematically study the statistical properties of FPTs in time series with
scaling crossover, we need a model to artificially generate such kind of time series. We propose to use a modified version of the Fourier filtering method as follows: (1) Generate a random time series in the time domain, then Fourier-transform it to the frequency (f) domain to obtain a white noise. (2) Choose the desired DFA scaling exponents governing short (α_s) and large (α_l) time scales, and the time scale where the crossover between both scaling regimes happens, t_c. Then, multiply the white noise by a function of the type
Q(f) = { f^(−(2α_l − 1)/2)                          if f ≤ f_c
       { f_c^(α_s − α_l) · f^(−(2α_s − 1)/2)        if f > f_c        (4)

Q(f) represents two power laws in the frequency domain matching at the crossover frequency f_c ≈ t_c^(−1). Indeed, the factor f_c^(α_s − α_l) ensures continuity at f_c. Obviously, the resulting power spectrum S(f) consists of two power laws of the type f^(−(2α_l − 1))

Fig. 2 (a) Power spectrum of a time series with N = 2^14 data points obtained by multiplying a white noise in the frequency domain times the function Q(f) in Eq. (4) with exponents α_large ≡ α_l = 1.7 and α_small ≡ α_s = 0.6, and with a crossover frequency f_c = t_c^(−1) = 0.005. (b) Time series x(t) obtained by inverse Fourier transform of the frequency-domain signal shown in (a). (c) DFA fluctuation function F(n) of the time series shown in (b) versus the time scale n (measured in number of data points)

for low frequencies (f ≤ f_c) and f^(−(2α_s − 1)) for high frequencies (f > f_c), matching at f_c (Fig. 2a). (3) Fourier-transform the signal back into the time domain (Fig. 2b). Now, the resulting time series presents a scaling crossover at a time scale t_c, and the DFA scaling exponents are α_s and α_l for short and large time scales, respectively (Fig. 2c).
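
A sketch of this modified Fourier filtering method (Python/NumPy, our own implementation): Gaussian white noise is transformed with np.fft.rfft, multiplied by Q(f) from Eq. (4), and transformed back. The parameters below mirror those of Fig. 2 (α_s = 0.6, α_l = 1.7, f_c = t_c⁻¹ = 0.005).

import numpy as np

def crossover_series(N, alpha_s, alpha_l, t_c, seed=0):
    """Generate a series with DFA exponent alpha_s at scales below t_c and
    alpha_l at scales above t_c, using the filter Q(f) of Eq. (4)."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(N))      # white noise in frequency domain
    f = np.fft.rfftfreq(N, d=1.0)
    f[0] = f[1]                                     # avoid division by zero at f = 0
    fc = 1.0 / t_c
    Q = np.where(f <= fc,
                 f ** (-(2.0 * alpha_l - 1.0) / 2.0),
                 fc ** (alpha_s - alpha_l) * f ** (-(2.0 * alpha_s - 1.0) / 2.0))
    x = np.fft.irfft(spec * Q, n=N)                 # back to the time domain
    x -= x.mean()
    return x / x.std()                              # zero mean, unit standard deviation

x = crossover_series(N=2 ** 14, alpha_s=0.6, alpha_l=1.7, t_c=200)
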

4 FPT Probability Density for Time Series with Crossovers


in the Scaling

With the generation method described above, our aim is to study the functional form of the FPT probability density p(ℓ) as a function of the crossover scale t_c, and also as a function of the values of the scaling exponents α_s and α_l for short and large time scales, respectively.

Fig. 3 FPT probability density p(ℓ) numerically obtained for time series generated with different combinations of scaling exponents α_s and α_l, and for different t_c values following the algorithm introduced in Sect. 3. Every probability density p(ℓ) is obtained from 1000 realizations of time series of length N = 2^23 data points

Our results (see Fig. 3) show that, in general, p(ℓ) shows a mixed behavior of two functional forms: at short scales, p(ℓ) exhibits the profile corresponding to the expected functional form of the FPT probability density obtained from a single scaling time series characterized by the exponent α_s. Conversely, at large scales, p(ℓ) behaves as the FPT probability density expected for a single scaling time series

with exponent α_l. Both functional forms depend on the numerical values of α_s and α_l, as expressed in Eq. (2). Mathematically, we find

p(ℓ) = { p_{α_s}(ℓ)     if ℓ < g(t_c)
       { p_{α_l}(ℓ)     if ℓ > g(t_c)        (5)

In this equation, the symbol '=' means equality in the sense of functional form, and p_{α_s}(ℓ) and p_{α_l}(ℓ) represent the functional forms expected for a single scaling time series with α = α_s and α = α_l, respectively. The function g(t_c), which controls the transition between both functional forms, is a monotonic function of t_c, although its particular value depends on the range of values of α_s and α_l, for which we consider three different cases:
(i) Case α_s, α_l < 1. According to (5), and noting that for this range of α_s and α_l values stretched exponential forms are expected (2), we observe (Fig. 3a, b) such double stretched-exponential behavior, and the transition scale g(t_c) between them depends on t_c as g(t_c) ∼ t_c^c with |c| < 1. When α_s < α_l the exponent c is negative and then the transition displaces to the left as t_c increases, as in the case shown in Fig. 3a, where α_s = 0.1 < α_l = 0.9. In the opposite case α_s > α_l, the exponent c is positive and then the transition displaces to the right as t_c increases, as in Fig. 3b, where α_s = 0.9 > α_l = 0.1. Then, as t_c increases, the range of validity of p_{α_s}(ℓ) also increases and p(ℓ) evolves from near the pure α = 0.1 case (faster decay than exponential) for very low t_c toward the pure α = 0.9 case (slower decay than exponential) for increasing t_c. Note that a perfect exponential decay (expected for α = 0.5) would appear as a perfect straight line in Fig. 3b.
(ii) Case α_s, α_l > 1. In this case, both p_{α_s}(ℓ) and p_{α_l}(ℓ) are decaying power laws (2) with exponents 3 − α_s and 3 − α_l respectively, and then p(ℓ) resembles this mixed behavior (Fig. 3c, d). The transition scale g(t_c) between both power laws is particularly simple in this case, since we obtain g(t_c) ∼ t_c, as can be checked in Fig. 3c, d, corresponding respectively to α_s < α_l and α_s > α_l.
(iii) Case α_s > 1, α_l < 1 (or vice versa). In this case, a mixed stretched-exponential and power-law behavior is expected for p(ℓ). In the particular case α_s < 1 and α_l > 1 (Fig. 3e), p(ℓ) behaves as a stretched exponential for short ℓ values and as a decaying power law of exponent 3 − α_l for large ℓ values. The transition scale between both functional forms behaves as g(t_c) ∼ t_c, as can be observed in Fig. 3e. For the opposite case α_s > 1 and α_l < 1 (Fig. 3f), we observe that p(ℓ) behaves as a decaying power law of exponent 3 − α_s for low ℓ values and as a stretched exponential in the tail, as expected. The transition scale between both functional forms behaves again as g(t_c) ∼ t_c, as can be observed in Fig. 3f, where anytime t_c is doubled, p(ℓ) increases its range by the same amount in log-scale.
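To make the numerical procedure behind Fig. 3 concrete, the sketch below extracts an FPT sequence from a generated series and builds a logarithmically binned estimate of p(ℓ). It reuses the crossover_series sketch above and assumes, purely for illustration, that an FPT is measured as the separation between consecutive crossings of a fixed reference level (here the series mean); the precise FPT definition used by the authors may differ in detail.

```python
import numpy as np

def first_passage_times(x, level=None):
    """Illustrative FPT extraction: number of samples between consecutive
    crossings of a reference level (by default, the series mean)."""
    if level is None:
        level = x.mean()
    above = x > level
    crossings = np.where(above[1:] != above[:-1])[0] + 1   # indices where the series crosses the level
    return np.diff(crossings)

def fpt_density(fpts, n_bins=40):
    """Logarithmically binned estimate of the FPT probability density p(l)."""
    edges = np.logspace(0, np.log10(fpts.max() + 1), n_bins)
    hist, edges = np.histogram(fpts, bins=edges, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])               # geometric bin centers
    return centers, hist

# Pool the FPTs of many realizations, as done for every curve of Fig. 3
fpts = np.concatenate([first_passage_times(crossover_series(2 ** 18, 0.6, 1.7, 0.005, seed=s))
                       for s in range(100)])
ell, p = fpt_density(fpts)
```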

5 Mean FPT Value for Time Series with Crossovers in the Scaling

As we stated previously, another important statistical property of the FPT distribution of time series with scaling crossovers is the behavior of its mean value ⟨ℓ⟩ as a function of the crossover scale t_c. By a systematic numerical study, we find that for small values of t_c, the mean value ⟨ℓ⟩ is similar to the one expected in a single-scaling time series with exponent α = α_l (⟨ℓ⟩_{α_l}). In contrast, for large values of t_c, ⟨ℓ⟩ tends to the value expected in a single-scaling time series with exponent α = α_s (⟨ℓ⟩_{α_s}). At intermediate values of t_c, we observe a smooth and monotonic transition between both extreme values. Mathematically,

$$
\langle \ell \rangle = \begin{cases} \langle \ell \rangle_{\alpha_l} & \text{if } t_c \to 1 \\ \text{intermediate and monotonic} & \text{if } 1 < t_c < N \\ \langle \ell \rangle_{\alpha_s} & \text{if } t_c \to N \end{cases} \qquad (6)
$$

We show in Fig. 4 the behavior of ⟨ℓ⟩ as a function of t_c for several combinations of α_s and α_l corresponding to the three cases discussed in the previous section.

Fig. 4 Mean FPT value ⟨ℓ⟩ as a function of the crossover time scale t_c for the three different cases discussed in Sect. 4: (a) α_s, α_l < 1. (b) α_s, α_l > 1. (c) α_s > 1, α_l < 1 (and vice versa). Every curve is obtained from 1000 realizations of time series of length N = 2^23 data points generated by our algorithm described in Sect. 3

Such behavior of ⟨ℓ⟩ properly reflects the functional form of p(ℓ) in time series with scaling crossovers, (5). When t_c → 1, the scaling exponent of the time series is essentially α_l, and then ⟨ℓ⟩ = ⟨ℓ⟩_{α_l}. In contrast, when t_c → N, the scaling exponent of the time series is solely α_s, and thus ⟨ℓ⟩ = ⟨ℓ⟩_{α_s}. In between both extreme cases, ⟨ℓ⟩ changes from one limiting value to the other (see Fig. 4) as a function of t_c, i.e., ⟨ℓ⟩ = ⟨ℓ⟩(t_c). The functional form of ⟨ℓ⟩(t_c) depends on the particular α_s and α_l values, as can be seen in the numerical results shown in Fig. 4. Nevertheless, ⟨ℓ⟩(t_c) can also be calculated analytically by assuming a p(ℓ) of the form given in (5), and the analytical results are in perfect agreement with the numerical ones. The calculations are in general rather cumbersome, and in the Appendix we include the derivation for the case α_s, α_l > 1.
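A sketch of how the curves of Fig. 4 can be generated numerically follows, reusing the crossover_series and first_passage_times sketches above; the number of realizations and the grid of t_c values are illustrative, not the settings of the original study.

```python
import numpy as np

def mean_fpt(alpha_s, alpha_l, tc, n=2 ** 20, n_real=50):
    """Average FPT of artificial series with a scaling crossover at t_c."""
    fpts = [first_passage_times(crossover_series(n, alpha_s, alpha_l, fc=1.0 / tc, seed=s))
            for s in range(n_real)]
    return np.concatenate(fpts).mean()

# <l> as a function of t_c for one combination of exponents (case alpha_s, alpha_l > 1)
tc_values = [2 ** k for k in range(4, 14)]
curve = [mean_fpt(1.2, 1.8, tc) for tc in tc_values]
```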

6 Comparing FPT Theoretical Predictions with Experimental Observations for Systems with Crossovers in the Scaling

Our working hypothesis is that correlations are the unifying factor behind the statistical properties of FPTs in time series obtained as the output of complex dynamical systems, independently of their different nature and their specific properties. Thus, the theoretical and numerical results we have shown in the two preceding sections, obtained from artificial time series with scaling crossovers, should be valid for real time series exhibiting such scaling behavior. To show this, we choose as our working example the human postural control system and, in particular, the properties of the trajectory of the Center of Pressure (CoP) of the postural sway in quiet standing. This system is known to have a scaling crossover, since there exist two competing dynamical mechanisms acting at different time scales [19, 20].

The data are obtained using a platform equipped with accelerometers which can record the in-plane trajectory of the CoP of a person placed in quiet standing over the platform. A typical xy trajectory of the CoP is shown in Fig. 5a. This trajectory can be decomposed into the x(t) and y(t) time series, to study respectively the medio-lateral and the antero-posterior motions independently. The x(t) time series of the CoP trajectory plotted in Fig. 5a is shown in Fig. 5b, and it is the time series we choose to analyze in the rest of this section.
If we apply DFA to this time series, we obtain a fluctuation function F(n) with two different power-law scalings at short and large time scales, α_s = 1.9 and α_l = 1.2, with a clear scaling crossover at a time scale of about t_c ≈ 1 s (Fig. 5c). Note the similarity between this result and the F(n) function obtained from the artificial time series generated with our model (Fig. 2c).

In this case, we observe that both α_s and α_l are larger than 1, and then the results should correspond to case (ii) discussed in Sect. 4. For this range of α_s and α_l, the FPT probability density p(ℓ) should behave at short scales as a power law with exponent (3 − α_s) = 1.1, and at large scales as a power law with exponent (3 − α_l) = 1.8, with a transition between them at a scale around 1 s.

Fig. 5 (a) A typical CoP trajectory recorded from a healthy subject during 7 min of quiet standing. (b) x(t) time series extracted from (a). (c) DFA fluctuation function F(n) obtained for the x(t) time series shown in (b). F(n) indicates power-law scaling behavior with a crossover at a time scale ≈ 1 s from exponent α_s = 1.9 to α_l = 1.2. (d) FPT cumulative distribution function 1 − P(ℓ) obtained from x(t) (circles). The pronounced crossover in 1 − P(ℓ) results from the crossover in the scaling of the fluctuation function F(n) in (c). Solid lines in (d) represent fits to the two power-law regimes in 1 − P(ℓ), corresponding to the theoretically expected exponents (2 − α_s and 2 − α_l)

However, as the time series length is not large (around N = 16,000 data points), the number of FPTs is very low, and the FPT probability density p(ℓ) can hardly be obtained due to undersampling. In this case, from the numerical point of view it is better to use the FPT cumulative distribution function 1 − P(ℓ) = ∫_ℓ^∞ p(x) dx. The experimental 1 − P(ℓ) function obtained from x(t) is shown as circles in Fig. 5d. If our approach is correct then, as the integral of a power law is another power law with exponent one unit larger, 1 − P(ℓ) should behave as two power laws of exponents (2 − α_s) = 0.1 and (2 − α_l) = 0.8 at short and large scales respectively, with a crossover between them in the vicinity of t_c ≈ 1 s. These two theoretical power laws are shown as solid lines in Fig. 5d and, indeed, the agreement between the experimental and the theoretical results is remarkable, thus suggesting the general validity of our approach.
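For short records like this one, the empirical survival function 1 − P(ℓ) can be obtained directly from the sorted FPTs. A minimal sketch follows, again using the illustrative first_passage_times convention above; the CoP series (x_cop) and the fitting range are placeholders, since the experimental data are not reproduced here.

```python
import numpy as np

def fpt_survival(fpts):
    """Empirical FPT cumulative distribution 1 - P(l) = Prob(FPT > l)."""
    ell = np.sort(np.asarray(fpts, dtype=float))
    surv = 1.0 - np.arange(1, ell.size + 1) / ell.size
    return ell, surv

# Illustrative power-law fit of the short-scale regime on log-log axes:
# ell, surv = fpt_survival(first_passage_times(x_cop))   # x_cop: the CoP x(t) series (not provided here)
# mask = (ell < crossover_scale) & (surv > 0)             # crossover_scale ~ t_c expressed in samples
# slope = np.polyfit(np.log(ell[mask]), np.log(surv[mask]), 1)[0]   # expected close to -(2 - alpha_s)
```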

7 Conclusions

We have presented numerical and theoretical results corresponding to the FPT


properties of long-range correlated time series with scaling crossovers, and we have
shown that those results are useful to explain the FPT properties experimentally
observed in real correlated time series of that kind. This fact confirms our hypothesis
that correlations can be seen as the unifying factor explaining the diversity of
results found for the FPT properties in real complex systems, and that dynamical
systems with similar correlations and scaling crossovers should present similar FPT
properties, irrespective of their specifics.

Acknowledgements We kindly thank Prof. Antonio M. Lallena, from the University of Granada
(Spain), for providing us with CoP data. We thank the Spanish Government (Grant FIS2012-36282)
and the Spanish Junta de Andalucía (Grant FQM-7964) for financial support. P.Ch.I. acknowledges
support from NIH–NHLBI (Grant no. 1R01HL098437-01A1) and from BSF (Grant No. 2012219).

Appendix

For brevity, we only include here the derivation of the behavior of ⟨ℓ⟩ as a function of t_c for the case α_s, α_l > 1, shown graphically in Fig. 4b. For such a case, the functional forms corresponding to both exponents are power laws, and the transition between them occurs at g(t_c) ∼ t_c. Thus, p(ℓ) is of the form:

$$
p(\ell) = k \begin{cases} \ell^{-(3-\alpha_s)} & \text{if } \ell \le t_c \\ \dfrac{t_c^{3-\alpha_l}}{t_c^{3-\alpha_s}}\, \ell^{-(3-\alpha_l)} & \text{if } t_c < \ell < N \end{cases} \qquad (7)
$$

where k is a normalization constant that can be obtained from ∫_1^N p(ℓ) dℓ = 1, and the factor t_c^(3−α_l)/t_c^(3−α_s) ensures continuity at t_c. The mean FPT value is then

$$
\langle \ell \rangle = \int_1^N \ell\, p(\ell)\, d\ell = \frac{\dfrac{t_c^{\alpha_s-1} - 1}{\alpha_s - 1} + \dfrac{t_c^{3-\alpha_l}}{t_c^{3-\alpha_s}}\, \dfrac{N^{\alpha_l-1} - t_c^{\alpha_l-1}}{\alpha_l - 1}}{\dfrac{t_c^{\alpha_s-2} - 1}{\alpha_s - 2} + \dfrac{t_c^{3-\alpha_l}}{t_c^{3-\alpha_s}}\, \dfrac{N^{\alpha_l-2} - t_c^{\alpha_l-2}}{\alpha_l - 2}} \qquad (8)
$$

This expression is complicated, but noting that 1 < α_s, α_l < 2, we can consider the limit of large time series length N by keeping only the largest powers of N in the numerator and denominator of (8). Similarly, and as we are interested in the behavior of ⟨ℓ⟩ as t_c increases, we can keep only the highest powers of t_c in the numerator and denominator of the result. Altogether, we obtain:

$$
\langle \ell \rangle \simeq \frac{2-\alpha_s}{\alpha_l - 1}\, N^{\alpha_l - 1}\, t_c^{\alpha_s - \alpha_l} \qquad (9)
$$
˛l  1

This equation is in perfect agreement with the numerical results shown in Fig. 4b: for increasing t_c values, ⟨ℓ⟩ changes between its two extreme values as a power law of t_c with exponent α_s − α_l (dotted lines in Fig. 4b). Also, Eq. (9) shows the dependence of ⟨ℓ⟩ on the time series length N in the limit of large N for fixed t_c. In this case, and as t_c is finite, the function p(ℓ) (7) is governed in the range (t_c, N) by the scaling exponent α_l, and this range controls the mean value (⟨ℓ⟩ ∼ N^(α_l−1)) exactly in the same way as in the case of time series with single scaling exponents with 1 < α < 2 [see Eq. (3)].
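The asymptotic result (9) can be checked against a direct numerical evaluation of ⟨ℓ⟩ from the density (7); a small sketch follows, where the values of α_s, α_l, N, and t_c are chosen only for illustration.

```python
import numpy as np
from scipy.integrate import quad

def mean_fpt_from_eq7(alpha_s, alpha_l, tc, N):
    """<l> obtained by numerical integration of p(l) in Eq. (7)."""
    fac = tc ** (3 - alpha_l) / tc ** (3 - alpha_s)          # continuity factor at t_c
    p = lambda l: l ** (-(3 - alpha_s)) if l <= tc else fac * l ** (-(3 - alpha_l))
    norm = quad(p, 1, tc)[0] + quad(p, tc, N)[0]             # 1/k, from the normalization condition
    first = quad(lambda l: l * p(l), 1, tc)[0] + quad(lambda l: l * p(l), tc, N)[0]
    return first / norm

alpha_s, alpha_l, N = 1.8, 1.2, 2 ** 23
for tc in (10, 100, 1000):
    exact = mean_fpt_from_eq7(alpha_s, alpha_l, tc, N)
    approx = (2 - alpha_s) / (alpha_l - 1) * N ** (alpha_l - 1) * tc ** (alpha_s - alpha_l)  # Eq. (9)
    print(tc, exact, approx)
```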

References

1. Condamin, S., et al.: First-passage times in complex scale-invariant media. Nature 450, 77–80
(2007)
2. Bunde, A., Havlin, S. (eds.): Fractals in Science, Springer, Heidelberg (1995)
3. Schindler, M., Talkner, P., Hänggi, P.: Firing time statistics for driven neuron models: analytic
expressions versus numerics. Phys. Rev. Lett. 93, 048102 (2004)
4. Bunde, A., et al.: Long-term memory: a natural mechanism for the clustering of extreme events
and anomalous residual times in climate records. Phys. Rev. Lett. 94, 048701 (2005)
5. Reyes-Ramírez, I., Guzmán-Vargas, L.: Scaling properties of excursions in heartbeat dynamics.
Europhys. Lett. 89, 38008 (2010)
6. Leland, W.E., et al.: On the self-similar nature of Ethernet traffic. IEEE ACM Trans. Netw. 2,
1–15 (1994)
7. Cai, S.M., et al.: Scaling and memory in recurrence intervals of Internet traffic. Europhys. Lett.
87, 68001 (2009)
8. Ivanov, P.Ch., et al.: Common scaling patterns in intertrade times of U. S. stocks. Phys. Rev. E
69, 056107 (2004)
9. Wang, F.Z., et al.: Multifactor analysis of multiscaling in volatility return intervals. Phys. Rev.
E 79, 016103 (2009)
10. Ding, M.Z., Yang, W.M.: Distribution of the first return time in fractional Brownian motion
and its application to the study of on-off intermittency. Phys. Rev. E 52, 207–213 (1995)
11. Shlesinger, M.F., Zaslavsky, G.M., Klafter, J.: Strange kinetics. Nature 363, 31–37 (1993)
12. Rangarajan, G., Ding, M.Z.: First passage time distribution for anomalous diffusion. Phys. Lett.
A 273, 322–330 (2000)
13. Khoury, M., et al.: Weak disorder: anomalous transport and diffusion are normal yet again.
Phys. Rev. Lett. 106, 090602 (2011)
14. Eliazar, I., Klafter, J.: From Ornstein-Uhlenbeck dynamics to long-memory processes and
fractional Brownian motion. Phys. Rev. E. 79, 021115 (2009)
15. Carretero-Campos, C., et al.: Phase transitions in the first-passage time of scale-invariant
correlated processes. Phys. Rev. E 85, 011139 (2012)
16. Ivanov, P.Ch.: Scale-invariant aspects of cardiac dynamics. Observing sleep stages and
circadian phases. IEEE Eng. Med. Biol. Mag. 26, 33 (2007)
17. Ivanov, P.Ch., et al.: Levels of complexity in scale-invariant neural signals. Phys. Rev. E 79,
041920 (2009)
18. Ivanov, P.Ch., Yuen, A., Perakakis, P.: Impact of stock market structure on intertrade time and
price dynamics. PLOS One 9, e92885 (2014)
19. Blázquez, M.T., et al.: Study of the human postural control system during quiet standing using
detrended fluctuation analysis. Physica A 388, 1857–1866 (2009)

20. Blázquez, M.T., et al.: On the length of stabilograms: a study performed with detrended
fluctuation analysis. Physica A 391, 4933–4942 (2012)
21. Makse, S., et al.: Method for generating long-range correlations for large systems. Phys. Rev.
E. 53, 5445 (1996)
22. Peng, C.-K., et al.: Mosaic organization of DNA nucleotides. Phys. Rev. E. 49, 1685 (1994)
23. Hu, K., et al: Effect of trends on detrended fluctuation analysis. Phys. Rev. E 64, 011114 (2001)
24. Coronado, A.V., Carpena, P.: Size effects on correlation measures. J. Biol. Phys. 31, 121 (2005)
Part II
Theoretical and Applied Econometrics
The Environmental Impact of Economic Activity
on the Planet

José Aureliano Martín Segura, César Pérez López,


and José Luis Navarro Espigares

Abstract As the United Nations is currently discussing the post-2015 agenda, this
paper provides an updated quantification of the environmental impact index and its
evolution during the last 50 years. An updated and global environmental impact
index estimate, based on the theoretical model of consumption equations initiated
in the 1970s by Paul Ehrlich and John Holdren, was carried out. Included in the
geographic scope of the study are all countries for which data are published in the
World Bank’s database for the period 1961–2012.
Once the growing evolution of this index was noted, the secondary objectives of
the study were focused on the analysis of the relationship between CO2 emissions,
mortality rate, and green investments, the latter being estimated by the volume
of investment in Research and Development (R&D). In both cases our estimation
showed a positive and statistically significant relationship between CO2 emissions
and the mortality and R&D investments variables.

Keywords CO2 emissions • Ehrlich and Holdren index • Mortality rate •


Sustainability

1 Introduction and Objectives

“Climate change is considered one of the most complex challenges for this century.
No country is immune nor can, by itself, address either the interconnected chal-
lenges or the impressive technological change needed” [1]. In addition, international
agencies make it clear that it is developing countries that will bear the brunt;
countries which must simultaneously cope with efforts to overcome poverty and
promote economic growth. Therefore, a high degree of creativity and cooperation,
a “climate-smart approach”, is needed.

J.A. Martín Segura () • J.L. Navarro Espigares


University of Granada, Granada, Spain
e-mail: [email protected]; [email protected]
C. Pérez López
University Complutense of Madrid, Madrid, Spain
e-mail: [email protected]


In 1995 the Intergovernmental Panel on Climate Change (IPCC) warned that


climate change would involve hazards to human health. The recently published
IPCC’s Fifth Assessment Report (AR5) [2] also emphasises the impact of climate
change on health. Co-benefits for health, derived from a climate change mitigation
strategy, are particularly high in today’s developing world [3]. In addition to the
important and controversial political decisions that have to be taken, companies
will play a leading role in this task. Regardless of the strategic market positioning
of companies, entrepreneurial issues such as investment decisions, systems of
responsibility and sustainable management, and the varying degrees of collaboration
between governments and international agencies are all strategic decisions that will
determine the sustainability of the planet [4].
Our main objective is to estimate the impact of economic activity on the
environment by means of the calculation of an environmental impact index. Certain
econometric estimates will be carried out to obtain evidence regarding the influence
of economic activity on mortality around the world. Our estimates will also
include the increase of Research and Development (R&D) investments, because
this has a counterbalance effect on the negative impact of mortality. In this way
we want to show the two sides of human economic activity. On the one hand, we
generate contaminants and illnesses, but on the other hand, our survival instinct
simultaneously leads us to invest in new technologies to make economic activity
more sustainable [5].

2 Methodology

A first estimate was made to test what is actually happening in the world with regard
to the economic and social impact caused by pollutant emissions. The Ehrlich and
Holdren index was calculated for each country in the world, using data provided by
the World Bank (1961–2012). This index was then compared, country by country,
with mortality rates and R&D investments.
The Ehrlich and Holdren index [6, 7] offers a way of connecting environmental
problems and their root causes. The equation describes the relationship between
population, consumption, and environmental impact in approximate terms as the
following:

$$ \mathrm{TEI} = P \times \mathrm{UC/pc} \times \mathrm{EE}^{-1}, \qquad (1) $$

where TEI represents total environmental impact, P represents population, UC/pc


represents (average) units of consumption of products and services per capita, and
EE represents the environmental efficiency of production: the use and disposal of
those units.
However, the average of resources used by each person can be measured by the
GDP per capita. The amount of environmental pollution per each unit of resources
consumed could be measured by the number of tonnes of CO2 per unit of GDP.

Thus:

$$ \mathrm{TEI} = P \times \mathrm{GDP}/P \times \mathrm{CO_2}/\mathrm{GDP} = \mathrm{CO_2}. \qquad (2) $$

When we use these variables to calculate the impact factor, after eliminating common elements in the numerators and denominators of the factors, we note that the resulting index is nothing other than the total CO2 emissions, although it is broken down into various determinants. Therefore, in developing countries, the size of the population and the resulting degradation of potentially renewable resources are often the most decisive factors. However, in developed countries, the main components are the high level of resource utilisation and the pollution generated. Thus, by presenting the index in this way, comparisons can be made between more and less developed countries, in order to discover the root causes of environmental problems [8, 9].
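The decomposition in Eq. (2) is straightforward to compute from the World Bank series; a sketch in Python/pandas follows (the column names are illustrative placeholders, not the World Bank indicator codes).

```python
import pandas as pd

def eh_decomposition(df):
    """Ehrlich-Holdren decomposition of Eq. (2) for a country-year panel with
    illustrative columns 'population', 'gdp' (current US$) and 'co2' (total emissions)."""
    out = pd.DataFrame(index=df.index)
    out['P'] = df['population']
    out['GDP_pc'] = df['gdp'] / df['population']            # consumption per capita proxy
    out['CO2_GDP'] = df['co2'] / df['gdp']                  # emissions per unit of GDP
    out['TEI'] = out['P'] * out['GDP_pc'] * out['CO2_GDP']  # equals total CO2 emissions
    return out
```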
On the other hand, according to Amartya Sen [10], mortality rates give the
best picture of health and disease levels in a population. We have also employed
R&D investments in our comparison to reflect the efforts made by economic agents
to improve the situation, by putting technological innovations at the service of
sustainability.
We are aware that other variables could have been included to study this phenomenon in more detail. Further research will involve an enlargement of our model.

In our research, a first econometric estimation was made through panel data techniques, because this technique allows us to deal with two-dimensional (cross-sectional/time) series.
As Baltagi [11] explains, these models have some advantages over cross-sectional models or time series, because there is no limit to the heterogeneity of the data within them. More informative data are provided and there is a greater suitability for studying the dynamics of change, which is better for detecting and measuring certain effects. This allows more complex models of behaviour to be studied and minimises the bias resulting from the use of total aggregate data. The model utilised in this case would be the following:

$$ Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}, \qquad i = 1, 2, \ldots, 19;\; t = 1, 2, \ldots, 7. \qquad (3) $$

The estimate would depend on the assumptions made about the error term. Firstly, we might consider that the coefficients of the slopes of the β variables are constant for all the regressions calculated for each country, but the independent coefficients, or intercepts, vary for each of these populations, with the subscript being variable. The model would be the following:

$$ Y_{it} = \beta_{1i} + \beta_2 X_{2it} + \beta_3 X_{3it} + u_{it}, \qquad i = 1, 2, \ldots, 19;\; t = 1, 2, \ldots, 7. \qquad (4) $$

This regression model is called fixed effects or least squares dummy variable.

In contrast to this method of calculation stands another important method called random effects or error components model. The basic idea of this method is that, instead of considering the constant term fixed for each population or person, it is supposed to be a random variable with a mean equal to β_1 and a random error term ε_i with a mean value of zero and constant variance σ²_ε. In this way the intercept value for a single cross-sectional unit (State in this case) is expressed as:

$$ \beta_{1i} = \beta_1 + \varepsilon_i. \qquad (5) $$

The model looks like this:

$$ Y_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \varepsilon_i + u_{it} = \beta_1 + \beta_2 X_{2it} + \beta_3 X_{3it} + \omega_{it}, \qquad \omega_{it} = \varepsilon_i + u_{it},\; i = 1, 2, \ldots, 19;\; t = 1, 2, \ldots, 7. \qquad (6) $$

The error term ω_it would consist of two components: ε_i, the individual-specific error component, and u_it, the component already discussed, which combines the time-series and cross-sectional error components. Hence the name error components model, because the model error term has two components (more components can also be considered, for example, a temporal component).
Of these two methods, in accordance with the data we have, the most suitable for the objectives sought is the fixed effects method. In this method, as we have said, the coefficients of the regression equation remain fixed for all countries, but the constant terms are different for each. These different constant terms represent the difference between countries when addressing the issue studied. The reason for using this calculation procedure, rather than the random effects model, is to ensure statistical goodness of fit without biasing the test performed: we are working with all countries, not just a sample.
The data have been collected over time (1961–2012) and for the same individuals
(205 countries). In summary, regarding our approach, we chose the fixed effect
model because, in our opinion, this is the most adequate when interest is only
focussed on drawing inferences about the examined individuals (countries). This
approach assumes that there are unique attributes of individuals (countries) that are
not the results of random variation and that do not vary over time.
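As an illustration of the fixed effects (least squares dummy variable) estimation in (4), a sketch with statsmodels follows; the panel layout and variable names ('country', 'year', 'tei', 'mortality') are assumptions for the example, not the authors' actual code or dataset.

```python
import pandas as pd
import statsmodels.api as sm

def fixed_effects_lsdv(panel):
    """LSDV fixed effects: one intercept dummy per country, common slope for TEI.
    `panel` is a DataFrame indexed by (country, year) with columns 'mortality' and 'tei'."""
    dummies = pd.get_dummies(panel.index.get_level_values('country'),
                             drop_first=True, dtype=float)
    dummies.index = panel.index
    X = sm.add_constant(pd.concat([panel[['tei']], dummies], axis=1))
    return sm.OLS(panel['mortality'], X).fit()

# res = fixed_effects_lsdv(panel); res.params['tei'] gives the common TEI coefficient
```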

3 Results

We have made some comparisons of the EH index components between countries,


which are quite illustrative of the current situation.
Thus, India has an impact index value much lower than the USA, despite having
four times the population, whilst China exceeds the rate of the USA, but not in the
same proportion as its population. On the other hand, by comparing two emerging
countries such as Brazil and Russia, with similar populations and GDP/pc, we can

see that the difference in the index values is due to the high level of emissions per
unit of GDP in Russia (Fig. 1).
The above graph shows the global evolution of each of the components of the
index calculated. In this case the key element for explaining the increase of CO2
emissions is the growth of per capita GDP. The population has grown but well
below the index pace. The ratio of intensity of emissions (CO2 /GDP) has declined,
showing greater energy efficiency. Each additional unit of GDP generates less CO2 .
Finally, the ratio that grows at the same rate as the index is the GDP/PC. This implies
that higher income levels mean more consumption and more production, with the
resulting increase in emissions.
We made two fixed effects econometric estimates, by cross-sections and by years. In the first case (Table 2), we estimate the effect caused by the environmental impact index on the mortality rate, which, as previously stated, is the variable that gives the best view of health and disease levels in a population, as an indicator of human development. In the second estimate (Table 3), we analyse the effect of the impact index on the sum total of R&D investments (without detailing, for the moment, what

(Series plotted in Fig. 1: Median GDPPC, Median POPULATION, Median CO2_GDP, and Median TEI; horizontal axis: years 1960–2015.)

Fig. 1 Evolution of each of the components of the index calculated. Note: Total CO2 emis-
sions was obtained in http://data.worldbank.org/indicator/EN.ATM.CO2E.PC/countries?display=
default. GDP was obtained in http://data.worldbank.org/indicator/NY.GDP.PCAP.CD. POPULA-
TION was obtained in http://data.worldbank.org/indicator/SP.POP.TOTL

Table 1 Comparisons of the EH index components among countries


IND (India) USA CHN (China) BRA (Brazil) RUS (Russia)
Inhabitants (millions) 1205.62 309.26 1333.70 195.21 142.39
GDP/pc 1419.10 48,357.68 4433.36 10,978.09 10,709.52
CO2 /GDP 1174.12 363.21 1397.32 195.86 1141.55
EH index (millions) 2008.82 5433.06 8286.89 419.75 1740.78

Table 2 Effect of total environmental impact on mortality
Dependent variable: MORTALITY
Method: panel least squares
Sample (adjusted): 1961–2012
Variable Coefficient Std. error t-Statistic Prob.
TEI 4.17E-10 1.81E-10 2.300219 0.0215
C 9.995998 0.018092 552.4962 0.0000

Table 3 Effect of total environmental impact index on R&D investments
Dependent variable: RD
Method: panel least squares
Sample (adjusted): 1961–2012
Variable Coefficient Std. error t-Statistic Prob.
TEI 2.82E-10 7.08E-11 3.987271 0.0001
C 0.113991 0.007060 16.14656 0.0000

percentage is dedicated to investment in renewable energy or other technologies to help curb climate change).
In the first case, a positive and significant relationship between the environmental
impact index and the mortality rate is observed, indicating that the damage caused
to the planet by consumption patterns has the effect of increased mortality, despite
the low level of the TEI coefficient. The consistency of this relationship has been
proved by means of a co-integration analysis. Since low p-values were obtained, the
panel unit root tests did not indicate the presence of unit roots. Therefore, the null
hypothesis of no co-integration of the variables within each country and over time
is rejected with a confidence level of 95 %. Because the presence of co-integrated
variables is observed, we can say that this relationship is compact in the long term
and not spurious.
Looking at the table of fixed effects coefficients by countries, it can be seen that the largest effects are, surprisingly, located in some developed countries (Germany, Belgium, United Kingdom, ...), plus China and Russia. If we analyse the fixed effects coefficients by years, in the period 1983–2010 they become negative. This sign shift indicates that the effect on mortality started to decline.
Regarding the second estimate, the index has a positive and significant effect on
R&D investments, many of which are aimed at new technologies and renewable
energy. These investments may be helping to curb the effects of environmental
degradation. Analysing the fixed effects of countries, we see that, alongside the developed countries, emerging countries like Brazil or Russia also appear on the list. With respect
to the time fixed effects model, the period with highest effect appears between the
years 1996 and 2011, coinciding with the greatest R&D investments.

4 Conclusions

The main conclusion reached in this study is that the rate of environmental
impact has maintained an upward trend from the 1960s until today. Although this
development is decreasing in relation to GDP, in absolute terms its increase has
not stopped for over 50 years. This growth is particularly strong in the last decade
and has peaked in recent years, coinciding with the period of economic crisis and
stronger growth in emerging economies.
However, some progress is evident. During the study period CO2 emissions per
unit of production have been reduced, and the relationship between these emissions
and the total population has remained constant. At the same time, it is important to
note that the growth of investment in R&D has remained similar to the growth of
the impact index.
The influence of environmental deterioration on mortality rates has been positive
and statistically significant. However, although the statistical relationship has proved
to be meaningful and not spurious, it is obvious that this relationship does not
have a high intensity (small coefficient) and the coefficients of countries show high
variability depending on their level of development. Regardless of the statistical
consistency of this relationship, epidemiological studies show clear evidence of the
importance of this relationship. Nevertheless, further research is needed to break
down this relationship according to the level of development of countries.
Finally, according to international agencies, two lines of action should be highlighted. We believe that they will be crucial for a successful strategy for combatting
climate change and environmental degradation. These lines are green investments
and socially responsible policies in the business world. There are significant
opportunities for businesses in green investments, in areas such as infrastructure,
education, health and new technologies related to renewable energies.

References

1. World Bank. World Development Report 2010. ©World Bank. https://openknowledge.


worldbank.org/handle/10986/4387 (2010). License: Creative Commons Attribution license
(CC BY 3.0 IGO)
2. IPCC. Climate Change 2014: Mitigation of Climate Change Working Group III Contribution
to the IPCC Fifth Assessment Report, Climatolo. http://www.cambridge.org/es/academic/
subjects/earth-and-environmental-science/climatology-and-climate-change/climate-change-
2014-mitigation-climate-change-working-group-iii-contribution-ipcc-fifth-assessment-
report?format=PB (2014)
3. Bruce, J.P., Yi, H., Haites, E.F.: Climate Change 1995: Economic and Social Dimensions of
Climate Change: Contribution of Working Group III to the Second Assessment Report of the
Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge (1996)
4. United Nations. The road to dignity by 2030: ending poverty, transforming all lives and protect-
ing the planet. http://www.un.org/disabilities/documents/reports/SG_Synthesis_Report_Road_
to_Dignity_by_2030.pdf (2014)

5. UNDP. Human Development Report 2013. The Rise of the South: Human Progress in a Diverse
World. United Nations Development Programme. UN Plaza, New York. http://hdr.undp.org/
sites/default/files/reports/14/hdr2013_en_complete.pdf (2013)
6. Miller, G.T.: Introducción a la ciencia ambiental: desarrollo sostenible de la tierra [Introduction
to environmental science: sustainable development of the earth]. Thomson, Madrid (2002)
7. Ehrlich, P.R., Holdren, J.P.: Impact of population growth. Science 171(3977), 1212–1217
(1971). doi:10.1126/science.171.3977.1212
8. Chertow, M.R.: The IPAT equation and its variants. J. Ind. Ecol. 4(4), 13–29 (2000).
doi:10.1162/10881980052541927
9. European Commission. Environmental Impact of Products (EIPRO). Analysis of the life cycle
environmental impacts related to the final consumption of the EU-25. European Commission
Joint Research Centre (DG JRC) Institute for Prospective Technological Studies. http://ftp.jrc.
es/EURdoc/eur22284en.pdf (2006)
10. Sen, A., Kliksberg, B.: Primero la Gente [People First]. Deusto (2007)
11. Baltagi, B.H.: Econometric Analysis of Panel Data, 4th edn. Wiley, Chichester (2009)
Stock Indices in Emerging and Consolidated
Economies from a Fractal Perspective

María Antonia Navascués, Maria Victoria Sebastián, and Miguel Latorre

Abstract In this chapter we analyze five international stock indices (Eurostoxx


50, Ibovespa, Nikkei 225, Sensex, and Standard & Poor’s 500, of Europe, Brazil,
Japan, India, and USA, respectively) in order to check and measure their geometric
complexity. From the financial point of view, we look for numerical differences in
the wave patterns between emerging and consolidated economies. We are concerned
with the discrimination of new and old markets, in a self-similar perspective. From
the theoretical side, we wish to seek evidences pointing to a fractal structure of
the daily closing prices. We wish to inquire about the type of randomness in the
movement and evolution of the indices through different tests. Specifically, we use
several procedures to find the suitability of an exponential law for the spectral power,
in order to determine if the indices admit a model of colored noise and, in particular,
a Brownian random pattern. Further, we check a possible structure of fractional
Brownian motion as defined by Mandelbrot. For it, we determine several parameters
from the spectral field and the fractal theory that quantify the values of the stock
records and its trends.

Keywords Fractal dimension • Fractional Brownian motions • Random vari-


ables • Stock market indices

M.A. Navascués () • M. Latorre


Escuela de Ingeniería y Arquitectura, Universidad de Zaragoza, C/ María de Luna 3, 50018
Zaragoza, Spain
e-mail: [email protected]; [email protected]
http://pcmap.unizar.es/~navascues/
M.V. Sebastián
Centro Universitario de la Defensa. Academia General Militar de Zaragoza. Ctra. de Huesca s/n
50090 Zaragoza, Spain
e-mail: [email protected]


1 Introduction

A stock index is a mathematical weighted sum of some values that are listed in
the same market. Since its creation in 1884, the stock indices have been used as
an economic and financial activity measurement of a particular trade sector or a
country.
In this paper we wish to inquire about the statistical and fractal structure of the
daily closing prices of five international stock indices (Eurostoxx 50, Ibovespa,
Nikkei 225, Sensex, and Standard & Poor’s 500, of Europe, Brazil, Japan, India,
and USA, respectively) through the years 2000–2014, using two different types of
procedures. The stock data are freely available in the web Yahoo Finance [10]. We
chose these selectives because they are a good representation of the trends of the
market, since they collect and reflect the behaviors and performances of new and
old economies. We wish to check if they have a similar statistical and self-affine
structure (Figs. 1 and 2).
Since the founding work of L. Bachelier in his thesis "La théorie de la spéculation," the random walk model has been present in almost all the mathematical approaches to economic series.
Our goal is to check the hypothesis stating that these time data are Brownian
motions. To this end we compute the following parameters: Hurst scalar, fractal
dimension, and exponent of colored noise.


Fig. 1 Chart of Eurostoxx in the period 2000–2014


Are the Stock Market Indices Really Brownian Motions? 115


Fig. 2 Chart of Sensex in the period 2000–2014

The Hurst exponent characterizes fractional Brownian variables, giving a mea-


sure of the self-similarity of the records. This number was proposed by H.E. Hurst
in a hydrological study [1] and it has been applied over years in very different fields.
Its recent popularity in the financial sector is due to later works like, for instance,
those of Peters et al. [7–9]. Through the exponent we obtain the fractal dimension,
describing the fractal patterns involved in the complexity of the chart.
We compute the annual dimensions, providing an indication of the chaoticity of the data, that is to say, whether every record is a pure random variable (fractal dimension equal to 2) or, on the contrary, it has strong underlying trends (dimension near 1).
We have performed statistical tests as well, in order to elucidate if the computed
parameters are significantly different between the indices studied.
In all the cases we analyze the closing prices. The returns can be treated as time
series considering, for instance, GARCH models [6].

2 Exponent of Colored Noise

In general it is said that economic series are well represented by colored noises, and we wished to test this hypothesis. A variable of this type satisfies an exponential power law:

$$ S(f) \simeq k\, f^{-\alpha}, $$

where f is the frequency, S(f) is the spectral power, and α is the exponent of the colored noise. In our case we compute discrete powers corresponding to discrete frequencies (m f_0), where f_0 = 2π/T is the fundamental frequency and m = 1, 2, … (T is the length of the recording). A logarithmic regression of the variables provides the exponent as the slope of the fitting.

For this we first construct a truncated trigonometric series in order to fit the data [5]. Since the Fourier methods are suitable for variables of stationary type, we previously subtracted the values on the regression line of the record. We analyzed the index year by year and obtained an analytical formula for every period, as the sum of the linear part and the spectral series. We compute in this way a discrete power spectrum which describes numerically the great cycles of the index. We performed a graphical test in order to choose the number of terms for the truncated sum. We find that 52 summands are enough for the representation of the yearly data, which corresponds to the inclusion of the cycles of weekly length. The formula is almost interpolatory. The harmonics of the series allow us to obtain the spectral powers. These quantities enable a numerical comparison between different indicators, for instance. In this case we used the method to inquire into the mathematical structure of the quoted indices. The numerical procedure is described in reference [5] (Fig. 2).
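As an illustration of this procedure, the sketch below estimates the colored-noise exponent of one year of closing prices by a log–log regression of the discrete spectral powers on the harmonic order; it uses a plain FFT periodogram instead of the truncated trigonometric fit of [5], so it should be read as a simplified approximation.

```python
import numpy as np

def colored_noise_exponent(prices, n_harmonics=52):
    """Slope of log S(f) versus log f after removing the linear trend;
    a value close to 2 points to red (Brownian) noise."""
    x = np.asarray(prices, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)      # subtract the regression line of the record
    power = np.abs(np.fft.rfft(x)) ** 2             # discrete spectral powers at m * f0
    m = np.arange(1, n_harmonics + 1)               # harmonics included (weekly cycles and longer)
    slope, _ = np.polyfit(np.log(m), np.log(power[m]), 1)
    return -slope                                   # exponent alpha in S(f) ~ f^(-alpha)
```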
In Table 1 the exponents computed for the years listed are presented. The mean values obtained over the period are: 1.95, 1.94, 1.99, 1.93, and 1.97 for Eurostoxx 50, Ibovespa, Nikkei 225, Standard & Poor's 500, and Sensex, respectively, with standard deviations 0.16, 0.22, 0.20, 0.13, and 0.17. The results clearly suggest a structure close to a red noise, whose exponent is 2. The correlations obtained in their computation are about 0.8.

Table 1 Exponent of colored noise
Year Eurostoxx Ibovespa Nikkei S&P Sensex
2000 1.810 1.851 1.844 1.918 2.069
2001 2.032 1.794 1.819 2.052 1.906
2002 2.078 2.133 2.215 1.976 1.712
2003 1.964 2.248 2.104 2.253 2.426
2004 2.052 1.893 2.039 1.938 2.005
2005 1.915 1.766 1.885 1.849 2.006
2006 2.288 1.866 1.892 1.884 2.037
2007 1.947 1.620 2.316 1.979 1.908
2008 1.595 1.978 2.021 1.736 1.950
2009 1.870 1.456 1.999 1.832 1.905
2010 1.866 2.022 2.312 2.082 2.006
2011 1.780 2.174 1.534 1.878 1.815
2012 2.025 2.115 1.927 1.776 2.071
2013 2.023 2.132 1.861 1.883 1.948
2014 2.003 2.084 2.038 1.974 1.734

3 Fractional Brownian Motion

For stock indices, the Hurst exponent H is interpreted as a measure of the trend of the index. A value such that 0 < H < 0.5 suggests anti-persistence, and the values 0.5 < H < 1 give evidence of a persistent series. Thus, the economic interpretation is that a value lower than 0.5 points to high volatility, that is to say, changes that are more frequent and intense, whereas H > 0.5 shows a more defined tendency. By means of the Hurst parameter, one can deduce whether the record admits a model of fractional Brownian motion [3].
It is said that the Brownian motion is a good model for experimental series and, in particular, for economic historical data. The fractional (or fractal) Brownian motions (fBm) were studied by Mandelbrot (see for instance [2, 3]). They are random functions containing both independence and asymptotic dependence and admit the possibility of a long-term autocorrelation. Another characteristic feature of fBm's is the self-similarity [3]. In the words of Mandelbrot and Van Ness [3]: "fBm falls outside the usual dichotomy between causal trends and random perturbation." The fractional Brownian motion is generally associated with a spectral density proportional to 1/f^(2H+1), where f is the frequency. For H = 1/2 one has a 1/f² noise (Brownian or red).
A fractional Brownian motion with Hurst exponent H, B_H(t, ω), is characterized by the following properties:
1. B_H(t, ω) has almost all sample paths continuous (when t varies in a compact interval I).
2. Almost all trajectories are Hölder continuous for any exponent β < H, that is to say, for each such path there exists a constant c such that

   |B_H(t, ω) − B_H(s, ω)| ≤ c |t − s|^β.

3. With probability one, the graph of B_H(t, ω) has both Hausdorff and box dimension equal to 2 − H.
4. If H = 1/2, B_H(t, ω) is an ordinary Brownian function (or Wiener process). In this case the increments in disjoint intervals are independent.
5. The increments of B_H(t, ω) are stationary and self-similar; in fact

   {B_H(t_0 + T, ω) − B_H(t_0, ω)} ≈ {h^(−H) (B_H(t_0 + hT, ω) − B_H(t_0, ω))},

   where ≈ means that they have the same probability distribution.
6. The increments {B_H(t_0 + T, ω) − B_H(t_0, ω)} are Gaussian with mean zero and variance proportional to T^(2H) [3, Corollary 3.4].
7. For H > 1/2, the process exhibits long-range dependence.
The goal of our numerical experiment is to inquire about the structure of the daily
stock data as fractional Brownian motions. Are they really variables of this type?

Table 2 Hurst parameter
Year Eurostoxx Ibovespa Nikkei S&P Sensex
2000 0.416 0.467 0.496 0.413 0.501
2001 0.478 0.467 0.409 0.500 0.502
2002 0.408 0.487 0.514 0.446 0.533
2003 0.449 0.518 0.467 0.448 0.575
2004 0.482 0.420 0.505 0.475 0.462
2005 0.429 0.428 0.491 0.436 0.537
2006 0.470 0.477 0.493 0.406 0.490
2007 0.424 0.429 0.487 0.401 0.463
2008 0.390 0.413 0.412 0.352 0.463
2009 0.506 0.421 0.499 0.489 0.455
2010 0.449 0.462 0.468 0.474 0.518
2011 0.471 0.472 0.453 0.432 0.506
2012 0.420 0.494 0.585 0.417 0.511
2013 0.454 0.499 0.479 0.459 0.462
2014 0.477 0.503 0.498 0.510 0.536

We take advantage of the defining property 6 in order to compute the exponent H, instead of using the R/S algorithm. We consider annual records and delays of 1, 2, . . . days [4]. The procedure itself constitutes a test to check the model. The method performs much better than the technique described in the previous section. Consequently, we think that the current framework largely improves on the latter.
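A sketch of this estimator, based on property 6 (variance of the increments proportional to T^(2H)), is shown below; the maximum delay is an illustrative choice.

```python
import numpy as np

def hurst_from_increments(x, max_lag=20):
    """Estimate H from Var[x(t+T) - x(t)] ~ T^(2H) (property 6 of fBm),
    via a log-log regression over the delays T = 1, 2, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    variances = [np.var(x[lag:] - x[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0                       # since Var ~ T^(2H)

# Property 3 then gives the fractal (box) dimension of the chart: D = 2 - H
```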
Table 2 displays the Hurst exponents computed for the period considered. The mean values obtained are: 0.45, 0.46, 0.48, 0.44, and 0.50, for Eurostoxx 50, Ibovespa, Nikkei 225, Standard & Poor's 500, and Sensex, respectively. The results emphasize an evident model of fractional Brownian motion, whose exponents range from 0 to 1. The correlations obtained in the computation of the parameter are close to 1, pointing to a more accurate description of the series considered in the Mandelbrot theory. A classical Brownian motion corresponds to an exponent of 0.5.
We summarize now the results obtained for the Hurst parameter:
Regarding the Eurostoxx index, the maximum value (0.506) is recorded in 2009 and the minimum (0.390) in 2008. Thus, the range of variation is 0.116, which represents 22.92 % with respect to the maximum value.
In the Brazilian data, the highest Hurst exponent is reached in 2003, with value 0.518. The minimum occurs in the year 2008, with a value of 0.413. The parameter varies 0.105 points over the period, representing 22.27 % of the peak value.
In the Japanese index the absolute extremes occur in 2012 and 2001, with values 0.585 and 0.409, respectively. The second minimum (0.412) is recorded in 2008. The range of variation is 0.176 (30.08 %).
For the S&P index, the maximum is reached in 2014 with value 0.510, and in 2008 there is a minimum of 0.352. The range of variation of the scalar in the S&P index is 0.158 (30.98 %).

Regarding Sensex, the maximum is set at 0.575 in 2003, which represents the highest exponent recorded globally. A minimum of the Hurst exponent is found in 2009 with value 0.455. The range of absolute variation is 0.12, a 20.87 % with respect to the peak value of the period.
In 2008, at the beginning of the financial crisis, a general drop occurs in the Hurst exponent of the indices, reaching the absolute minimum of the period in the Western countries. The fall in the Eastern countries is lower, pointing to a smaller influence of the financial crisis on these selectives.
Table 3 displays the average Hurst parameters of the indices in the periods 2000–2007 (pre-crisis) and 2008–2014 (crisis). These exponents increase very slightly in the second term, except in the Indian case, but the number of samples is insufficient (and the difference too small) to perform statistical tests in order to check the increment. The variability is higher in the second period too, with the exception of Sensex.
Table 4 shows the annual fractal dimensions computed for the years considered. The average values obtained are: 1.55, 1.54, 1.52, 1.56, and 1.50, for Eurostoxx 50, Ibovespa, Nikkei 225, Standard & Poor's 500, and Sensex respectively, with
Table 3 Mean and standard deviation of the Hurst parameter of the indices in the periods pre-crisis and crisis
Period Statistic Eurostoxx Ibovespa Nikkei S&P Sensex
2000–2007 Mean 0.445 0.462 0.483 0.441 0.508
2000–2007 Stand. dev. 0.029 0.034 0.033 0.035 0.039
2008–2014 Mean 0.452 0.466 0.485 0.448 0.493
2008–2014 Stand. dev. 0.038 0.037 0.053 0.053 0.032

Table 4 Variance fractal dimension
Year Eurostoxx Ibovespa Nikkei S&P Sensex
2000 1.584 1.533 1.504 1.587 1.499
2001 1.522 1.533 1.591 1.500 1.498
2002 1.592 1.513 1.486 1.554 1.467
2003 1.551 1.482 1.533 1.552 1.425
2004 1.518 1.580 1.495 1.525 1.538
2005 1.571 1.572 1.509 1.564 1.463
2006 1.530 1.523 1.507 1.594 1.510
2007 1.576 1.571 1.513 1.599 1.537
2008 1.610 1.587 1.588 1.648 1.537
2009 1.494 1.579 1.501 1.511 1.545
2010 1.551 1.538 1.532 1.526 1.482
2011 1.529 1.528 1.547 1.568 1.494
2012 1.580 1.506 1.415 1.583 1.489
2013 1.546 1.501 1.521 1.541 1.538
2014 1.523 1.497 1.502 1.490 1.464

Table 5 p-Values obtained in the statistical test
p-Values Eurostoxx Ibovespa Nikkei S&P Sensex
Eurostoxx X 0.250 0.015 0.775 0.001
Ibovespa 0.250 X 0.233 0.217 0.021
Nikkei 0.015 0.233 X 0.021 0.233
S&P 0.775 0.217 0.021 X 0.000
Sensex 0.001 0.021 0.233 0.000 X

standard deviations 0.03, 0.03, 0.04, 0.04, and 0.03. The typical fractal dimension of a Brownian motion is 1.5. The highest fractal dimension (1.648) is recorded in the American index during the year 2008 (outbreak of the crisis). The second maximum is European (1.610).

3.1 Statistical Tests

We have performed a nonparametric Mann–Whitney test on the parameters obtained for the five selectives. The objective was to find (if any) significant differences in the values with respect to the index considered. The samples here were the annual Hurst values of each selective. Parametric tests require some hypotheses on the variables, such as normality, equality of variances, etc. In our case we cannot assume the normality of the distribution because it is unknown. Commonly this condition may be acceptable for large samples, but the size is small here, and for this reason we chose a nonparametric test, Mann–Whitney being a valid alternative. We provide the results of the test applied to the outcomes of the Hurst parameter of the five indices and 15 years. The p-values are shown in Table 5.

We can observe that, with a significance level of 0.05, there are differences in Sensex with respect to Standard & Poor's 500, Eurostoxx, and Ibovespa. The same occurs in the Japanese selective, which performs differently from Eurostoxx and S&P. The larger means correspond to the Eastern countries in both periods (see Table 3).
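The pairwise comparisons of Table 5 can be reproduced with the two-sided Mann–Whitney U test; a sketch with SciPy follows, where hurst is assumed to be a dictionary mapping each index name to its 15 annual Hurst values.

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

def pairwise_mannwhitney(hurst):
    """Two-sided Mann-Whitney U test between every pair of indices; returns p-values."""
    return {(a, b): mannwhitneyu(hurst[a], hurst[b], alternative='two-sided').pvalue
            for a, b in combinations(hurst, 2)}

# p_values = pairwise_mannwhitney({'Eurostoxx': [...], 'Ibovespa': [...], ...})
```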

4 Conclusions

The fractal tests on the stock records analyzed in the period 2000–2014 provide a variety of outcomes which we summarize below.
– The numerical results present a great uniformity. Nevertheless, the p-values provided by the statistical test support, at the 95 % confidence level, the numerical differences of the Indian index with respect to S&P, Europe, and Brazil, and those of Japan with respect to Eurostoxx and S&P.
Are the Stock Market Indices Really Brownian Motions? 121


Fig. 3 Fractal dimensions of Eurostoxx (squares) and Sensex (dots) over the period considered

– The fractal dimensions of India and Japan are slightly lower than those of the rest of the indices, pointing to a slightly less complex pattern of the records (see Fig. 3).
– The year 2008 (beginning of the crisis) records a global minimum of the Hurst exponent in the Western countries. The drop of this oscillator is less evident in the Eastern economies, pointing to a more reduced (or delayed) influence of the financial crisis. The location of the maxima is not so uniform and moves over the period.
– The mean average of the Hurst exponents is 0.47. The standard deviations are around 0.03.
– The results obtained for the exponent α are around 1.95, very close to the characteristic value of a red noise or Brownian motion (α = 2), with a typical deviation of 0.02.
– The correlograms of the different indices, whose tendency to zero is slow, preluded a type of random variable very different from a white noise. The computations performed confirmed this fact. The stock records may admit a representation by means of colored noises, in particular of red noise, refined by a rather strict model of fractional Brownian motion. The numerical results suggest that the Hurst exponent is a good predictor of changes in the market.
– The Hurst scalar is suitable for the numerical description of this type of economic signals. The exponent gives a measure of the self-similarity (fractality) of the data.
In general, we observe a mild anti-persistent behavior in the markets (H < 0.5) that is slightly weaker in the Eastern economies (mainly in India during the first

years of the period). Nevertheless, it is likely that the globalization process will lead
to a greater uniformity.
Concerning the methodology, we think that the first test performed is merely
exploratory. The necessary truncation of the series defined to compute the powers
collects only the macroscopic behavior of the variables and omits the fine self-affine
oscillations. The fractal test is however robust. There is absolutely no doubt that the
tools provided by the Fractal Theory allow us to perform a more precise stochastic
study of the long-term trends of the stocks, and give a higher consistency than the
classical hypotheses.

References

1. Hurst, H.E.: Long-term storage of reservoirs: an experimental study. Trans. Am. Soc. Civ. Eng.
116, 770–799 (1951)
2. Mandelbrot, B.B., Hudson, R.L.: The (Mis)Behavior of Markets: A Fractal View of Risk, Ruin
and Reward. Basic Books, New York (2004)
3. Mandelbrot, B.B., Ness, J.V.: Fractional Brownian motions, fractional noises and applications.
SIAM Rev. 10, 422–437 (1968)
4. Navascués, M.A., Sebastián, M.V., Blasco, N.: Fractality tests for the reference index of the
Spanish stock market (IBEX 35). Monog. Sem. Mat. García de Galdeano 39, 207–214 (2014)
5. Navascués, M.A., Sebastián, M.V., Ruiz, C., Iso, J.M.: A numerical power spectrum for
electroencephalographic processing. Math. Methods Appl. Sci. (2015) (Published on-line
doi:10.1002/mma.3343)
6. Navascués, M.A., Sebastián, M.V., Campos, C., Latorre, M., Iso, J.M., Ruiz, C.: Random and
fractal models for IBEX 35. In: Proceedings of International Conference on Stochastic and
Computational Finance: From Academia to Industry, pp. 129–134 (2015)
7. Peters, E.E.: Chaos and Order in the Capital Market: A New View of Cycles, Prices and Market
Volatility. Wiley, New York (1994)
8. Peters, E.E., Peters, D.: Fractal Market Analysis: Applying Chaos Theory to Investment and
Economics. Wiley, New York (1994)
9. Rasheed, K., Qian, B.: Hurst exponent and financial market predictability. In: IASTED
Conference on Financial Engineering and Applications (FEA 2004), pp. 203–209 (2004)
10. Yahoo Finance. http://finance.yahoo.com/ (2015)
Value at Risk with Filtered Historical Simulation

Mária Bohdalová and Michal Greguš

Abstract In this paper we study the properties of estimates of the Value at Risk
(VaR) using the historical simulation method. Historical simulation (HS) method
is widely used method in many large financial institutions as a non-parametric
approach for computing VaR. This paper theoretically and empirically examines
the filtered historical simulation (FHS) method for computing VaR that combines
non-parametric and parametric approach. We use the parametric dynamic models
of return volatility such as GARCH, A-GARCH. We compare FHS VaR with VaR
obtained using historical simulation and parametric VaR.

Keywords A-GARCH • EWMA • Filtered historical simulation • GARCH •


Value at Risk

1 Introduction

The quantification of the potential size of losses and assessing risk levels for
financial instruments (assets, FX rates, interest rates, commodities) or portfolios
composed of them is fundamental in designing risk management and portfolio
strategies [12]. Current methods of evaluation of this risk are based on value-at-
risk (VaR) methodology. VaR determines the maximum expected loss which can be
generated by an asset or a portfolio over a certain holding period with a predeter-
mined probability value. VaR model can be used to evaluate the performance of a
portfolio by providing portfolio managers the tool to determine the most effective
risk management strategy for a given situation [12].
There exist many approaches on how to estimate VaR. An introduction to VaR
methodology is given in [15–18]. A good overview of VaR methods is given in [1,
2, 8, 9], etc. The commonly used techniques include analytic simulation techniques
either parametric or non-parametric [1, 6, 7, 21].

M. Bohdalová () • M. Greguš


Faculty of Management, Comenius University in Bratislava, Odbojárov 10, 82005 Bratislava,
Slovakia
e-mail: [email protected]; [email protected]; http://www.fm.uniba.sk


Many practitioners prefer parametric techniques that involve the selection of an


appropriate distribution for the asset returns and the estimation of the statistical
parameters of these returns. Historical data are used to measure the major param-
eters: means, standard deviations, and correlations [7]. Monte Carlo simulation
techniques are the most flexible and powerful tools, since they are able to take
into account all nonlinearities of the portfolio value with respect to its underlying
risk factor, and to incorporate all desirable distributional properties, such as fat
tails and time-varying volatility. Also, Monte Carlo simulations can be extended
to apply over longer holding periods, making it possible to use these techniques for
measuring credit risk. However, these techniques are also by far the most expensive
computationally [7]. Moreover, they often use factorization techniques that are
sensitive to the ordering of the data [3, 5].
All non-parametric approaches are based on the underlying assumption that the
near future will be sufficiently similar to the recent past so that we can use the data
from the recent past to forecast risks over the near future, and this assumption may
or may not be valid in any given context [9].
The purpose of the paper is to implement several VaR models, under FHS
framework, in order to estimate the 95 % and 99 % 1-day VaR number and compare
them. The rest of the paper is organized as follows. Section 2 provides an overview
of the historical simulation methods. Section 3 presents our results of the empirical
investigation. Section 4 concludes the paper.

2 An Overview of the Historical Simulation VaR

Non-parametric simulation methods for the estimation of VaR, such as historical
simulation VaR (HS VaR), are based on empirical distributions which are not necessarily
independent and identically distributed (i.i.d.). HS VaR is expressed as a percentage
of the portfolio's value: the 100α% h-day historical VaR is the α quantile of
an empirical h-day discounted return distribution. The percentage VaR can be
converted to VaR in value terms by multiplying it by the current portfolio value. His-
torical simulation VaR may be applied to both linear and non-linear portfolios [1].
The traditional HS VaR method was described by Dowd [9], Alexander [1], etc. HS
VaR is usually calculated from a sample of past returns with equal
probabilities. This approach does not take into account age, market volatility or
extreme values, and it is also assumed that all returns in the sample period are i.i.d.
When volatility changes over time, the i.i.d. assumption is violated. This leads to
inconsistent estimates of Value at Risk, as was documented in [5, 9, 11, 13, 19].
Dowd [9] has given a short overview of the generalization of the historical sim-
ulation. The first generalization gives exponentially weighted historical simulation
(or age-weighted HS), which uses weighting of the sample returns to reflect their
relative importance (we have used a decay constant λ, usually λ ∈ ⟨0, 1⟩). Then the
jth observation has weight λ^j w_1, where w_1 = 1 − λ is the weight of the 1-day-old
return [1]. The traditional HS is obtained with zero decay, i.e., λ = 1 [9]. A suitable
choice of λ can make the VaR estimates much more responsive to a large loss of
the returns. Exponentially weighted HS makes risk estimates more efficient and
effectively eliminates any ghost effects [9]. For this reason this method is widely
used in commercial banks.
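As a concrete illustration of the age-weighting scheme just described, the following Wolfram Mathematica sketch computes an age-weighted HS VaR; the function name ageWeightedVaR and the default values of lambda and alpha are illustrative assumptions, not taken from the chapter.

(* age-weighted HS VaR: the j-day-old return gets weight proportional to lambda^j *)
ageWeightedVaR[returns_, lambda_: 0.98, alpha_: 0.01] :=
 Module[{n = Length[returns], w, pairs, cum},
  (* the most recent observation (age 1) gets weight lambda, the oldest gets lambda^n *)
  w = lambda^Range[n, 1, -1];
  w = w/Total[w];
  (* alpha-quantile of the weighted empirical distribution of the returns *)
  pairs = SortBy[Transpose[{returns, w}], First];
  cum = Accumulate[pairs[[All, 2]]];
  -pairs[[First[FirstPosition[cum, _?(# >= alpha &)]], 1]]]

Setting lambda to 1 makes all weights equal, which recovers the traditional equally weighted HS VaR.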
Volatility-weighted historical simulation, introduced by Hull and White [1, 9, 14],
is another generalization of the historical simulation method. This approach was
designed to weight returns by adjusting them from their volatility to the current
volatility. Briefly we can write this approach as follows.
Let the time series of unadjusted historical returns be {r_t}, t = 1, …, T, where T is the time
at the end of the sample, when VaR is estimated. Then the volatility-adjusted return
series at every time t < T is

\tilde{r}_t = \frac{\hat{\sigma}_T}{\hat{\sigma}_t}\, r_t ,   (1)

where T is fixed, t varies over the sample (t = 1, 2, …, T), and σ̂_T, σ̂_t are the
estimates of the standard deviation of the return series at times T and t, respectively
[1, 9]. The actual return in any period t is therefore increased (or decreased), depending
on whether the current forecast of volatility is greater (or less) than the estimated
volatility for period T.
The last approach is the filtered historical simulation (FHS). The FHS approach was
proposed in a series of papers by Barone-Adesi et al. [4, 5]. Barone-Adesi assumed
that the historical returns need to be filtered, which means adjusting them to
reflect current information about security risk. In the case of non-parametric VaR,
FHS combines the benefits of HS with the power and flexibility of conditional
volatility models such as GARCH and A-GARCH [9]. FHS uses a parametric dynamic
model of return volatility, such as the GARCH or A-GARCH models, to simulate log
returns on each day over the risk horizon [1].
The estimation of the 100α% 1-day VaR of a single asset using FHS can be
obtained as follows [9]:
1. In the first step we fit log return data by an appropriate model (for example,
EWMA, GARCH, A-GARCH).
2. In the second step we use the fitted model to forecast the volatility for each day
   in the sample period. The realized returns are then divided by these volatility
   forecasts to produce a set of standardized returns that are i.i.d. We can use the
   following EWMA, GARCH, or A-GARCH recursive formulas for the variance
   estimate at time t of a return series r_t:
   – EWMA variance:

   \hat{\sigma}_t^2 = (1-\lambda)\, r_{t-1}^2 + \lambda\, \hat{\sigma}_{t-1}^2 , \quad 0 < \lambda < 1   (2)

   – GARCH variance:

   \hat{\sigma}_t^2 = \hat{\omega} + \hat{\alpha}\, r_{t-1}^2 + \hat{\beta}\, \hat{\sigma}_{t-1}^2 ; \quad \omega \ge 0,\ \alpha \ge 0,\ \beta \ge 0,\ \alpha+\beta \le 1 .   (3)

   – A-GARCH variance:

   \hat{\sigma}_t^2 = \hat{\omega} + \hat{\alpha}\, (r_{t-1} - \hat{\lambda})^2 + \hat{\beta}\, \hat{\sigma}_{t-1}^2 ; \quad \omega \ge 0,\ \alpha \ge 0,\ \beta \ge 0,\ \lambda \ge 0,\ \alpha+\beta \le 1 ,   (4)

   where t = 2, …, T and

   \hat{\sigma}_1^2 = \frac{1}{T} \sum_{i=1}^{T} r_i^2 .   (5)

3. In the third step we bootstrap from the data set of standardized returns. We
   obtain simulated returns, each of which reflects current market conditions due to
   scaling by today's forecast of tomorrow's volatility using Eq. (1).
4. In the last step we take the VaR as the 100α% quantile of these simulated returns
   (a code sketch of these steps is given below).
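A minimal Wolfram Mathematica sketch of these four steps with the EWMA filter of Eq. (2) is given below; the initial variance follows Eq. (5), the default decay factor 0.94 is the RiskMetrics value mentioned next, and the function and argument names (fhsVaR, returns, nSim) are illustrative assumptions. The GARCH or A-GARCH variants would only change the variance recursion.

(* FHS VaR with an EWMA volatility filter, Eqs. (1), (2) and (5) *)
fhsVaR[returns_, lambda_: 0.94, alpha_: 0.01, nSim_: 10000] :=
 Module[{var, sigma, z, sigmaNext, simulated},
  (* Eq. (2): EWMA variance recursion, initialized by Eq. (5) *)
  var = FoldList[lambda #1 + (1 - lambda) #2^2 &, Mean[returns^2], Most[returns]];
  sigma = Sqrt[var];
  (* standardized (filtered) returns *)
  z = returns/sigma;
  (* one-step-ahead volatility forecast for tomorrow *)
  sigmaNext = Sqrt[lambda Last[var] + (1 - lambda) Last[returns]^2];
  (* bootstrap the standardized returns and rescale by tomorrow's volatility, Eq. (1) *)
  simulated = sigmaNext RandomChoice[z, nSim];
  (* report the VaR as a positive loss quantile *)
  -Quantile[simulated, alpha]]

For example, fhsVaR[returns, 0.94, 0.05] returns the 95 % 1-day FHS VaR of the supplied log-return series as a positive loss.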
The problem is how to estimate the coefficients of each model. The RiskMetrics
methodology recommends setting the decay factor λ to 0.94 in the EWMA volatility
model. GARCH or A-GARCH coefficients can be estimated using maximum-
likelihood estimation (MLE). It is inappropriate to use the recursive formulas (3), (4) to
estimate GARCH coefficients directly because the computational complexity increases
exponentially with the number of data. If some mathematical software does not have the
capability to estimate GARCH or A-GARCH models, it is possible to use our
proposed algorithms. The GARCH model is obtained when λ = 0. The program code is
written in Wolfram Mathematica software.
Program code for the estimation of the A-GARCH coefficients Alpha, Beta, Omega, and Lambda:

(* logarithmic returns are in variable LV *)
Mu = Mean[LV];
Rtn = Flatten[{Mu, LV}];
NumD = Length[Rtn];

(* squared A-GARCH innovation at time t: demeaned return shifted by the asymmetry parameter *)
error[t_, Rtn_, Mu_, Lambda_] := (Rtn[[t]] - Mu - Lambda)^2;

(* terms of the unrolled conditional-variance recursion, computed once for the full sample *)
GcondVarianceTable[t_, Alpha_, Beta_, Omega_, Rtn_, Mu_, Lambda_] :=
  Table[Beta^(t - 1 - k) (Omega + Alpha error[k, Rtn, Mu, Lambda]), {k, 1, t - 1}];

(* conditional variance at time t, rescaling the precomputed table by powers of Beta *)
GcondVariance[tab_, t_, Beta_, Mu_] :=
  Total[tab[[1 ;; t - 1]]] Beta^(t - Length[tab] - 1) + Beta^(t - 1) Mu^2;

(* Gaussian log-likelihood, up to an additive constant *)
LogL[Alpha_, Beta_, Omega_, Lambda_, Rtn_, Mu_] :=
  Module[{tab = GcondVarianceTable[Length[Rtn], Alpha, Beta, Omega, Rtn, Mu, Lambda]},
   -0.5 Sum[
     Module[{gcr = GcondVariance[tab, t, Beta, Mu]},
      Log[gcr] + error[t, Rtn, Mu, Lambda]/gcr],
     {t, 2, Length[Rtn]}]];

(* maximize the log-likelihood subject to the non-negativity and stationarity constraints *)
AgarchEst = NMaximize[{LogL[Alpha, Beta, Omega, Lambda, Rtn, Mu],
   Omega >= 0, Alpha >= 0, Beta >= 0, Lambda >= 0, Alpha + Beta < 1},
  {Alpha, Beta, Omega, Lambda},
  MaxIterations -> 150, WorkingPrecision -> 10, Method -> "NelderMead"]

3 Empirical Investigation

In this section we analyze the risk connected with investing in Bitcoin denominated
in US Dollar. Bitcoin as an online payment system was introduced in 2009 for
worldwide users. Bitcoin is a decentralized virtual currency, and no institution (e.g.,
a central bank) takes control of its value [10]. It is publicly designed and everyone
can participate. Bitcoin is based on peer-to-peer technology; managing transactions
and the issuing of bitcoins is performed collectively by the network. Thanks to
its unique properties, Bitcoin allows uses that no previous payment
system could support.

3.1 Descriptive Statistics

The subject of this paper is to analyze the risk of the Bitcoin currency denominated
in USD using value-at-risk. The Bitcoin fluctuations are measured in terms of
logarithmic returns without dividends (log returns are obtained by the formula r_t =
ln(P_t / P_{t−1}), t = 1, …, T, where P_t is the 24 h average price recorded at time t).
The sample period for BTC/USD began on July 17, 2010, continues up to September
2, 2015, and comprises 1854 daily prices, including weekends and holidays; this
is because Bitcoin allows any bank, business, or individual to securely send and
receive payments anywhere at any time.
Our sample period covers the entire daily history accessible from the global
financial portal Quandl Ltd., (n. d.) [20]. All results were obtained using Wolfram
Mathematica program code written by the authors. The virtual currency Bitcoin
recorded several major changes during the analyzed period. The time series of the
24 h average prices and their logarithmic returns are shown in Fig. 1. The historical

Fig. 1 BTC/USD 24 h average prices (left) and log returns (right). Sample Range: 17/07/2010–
02/09/2015

Fig. 2 BTC/USD 24 h average prices and log returns. Sample Range: 11/06/2013–02/09/2015

minimum of the log return was recorded on December 6, 2013, and historical
maximum of the log return was recorded on February 1, 2011. Last significant
decrease happened on January 14, 2015. We see a small peak in the very right part
of the graph on the left. This part of the graph approximately corresponds to June,
July, and August 2015. This peak highly probably relates to the recent crisis in
Greece, and this peak may be caused by the fact that more and more Greeks may
have started to use Bitcoins as their banks were closed. This may be especially true
about the maximum part of the peak, which occurred in July, because as we know,
this is exactly the time when the Greek banks were closed (see Fig. 1). When the
banks opened, 24 h average prices of Bitcoin started to fall, which is quite natural
as Greeks may have started to believe more in their bank system, and as a result,
they probably purchased fewer bitcoins than during the time when their banks were
closed. Volatility clustering with presence of heteroscedasticity is visible on the log
returns in Fig. 1 on the right. For better understanding of the last history we have
selected another sample which began on June 11, 2013, and ended on September
02, 2015 (see Fig. 2).
Table 1 presents descriptive statistics on the log returns on Bitcoin currency in
USD. All statistics except the skewness are significantly different from zero at the
5 % significance level (absolute values of the test statistics are greater than the quantile

Table 1 Descriptive statistics for BTC/USD log returns


               Sample range                 Sample range
               17/07/2010–02/09/2015        11/06/2013–02/09/2015
               Estimation     Test stat.    Estimation     Test stat.
Mean           0.0045         3.3034        0.0009         0.6009
Std. dev.      0.0593         3706          0.0461         1600
Min            −0.4456                      −0.4456
Max            0.3722                       0.2595
Skewness       0.2556         1.8336        1.4512         22.8648
Kurtosis       13.9643        25.0464       6.8409         26.9464

Fig. 3 Sample autocorrelation coefficients of the squared log returns. Sample ranges: 17/07/2010–02/09/2015 and 11/06/2013–02/09/2015

Table 2 Distribution fit tests for BTC/USD


                  Sample range                  Sample range
                  17/07/2010–02/09/2015         11/06/2013–02/09/2015
Distribution      Anderson–Darling p-value      Anderson–Darling p-value
Normal            0.0000                        0.0000
Student's t       0.0004                        0.1678
Normal mixture    0.0000                        0.1731
Johnson SU        0.0593                        0.7685

z_{0.975} = 1.96 of the normal distribution). The mean daily log return of the Bitcoin
was 0.45 %. The standard deviation was 5.93 % per day.
The squared log returns are significantly positively serially autocorrelated over
long lags, both for the full sample and for the sample covering the last 2 years (see Fig. 3).
This means that heteroscedasticity is present in the analyzed samples.
The Anderson–Darling p-value was used for testing the type of the distribution. We
selected a set of distributions: normal, Student's t, normal mixture, and Johnson SU.
Table 2 shows that the full sample data follow the Johnson SU distribution,
and that the sample of the last 800 days does not follow the normal distribution.
Based on a T = 800 rolling sample, we have generated 1054 out-of-sample
forecasts of the VaR. The parameters of the fitted models are re-estimated each

Fig. 4 5 % and 1 % 1-day FHS VaR for GARCH adjusted log returns

Fig. 5 5 % and 1 % 1-day Normal VaR for GARCH adjusted log returns

trading day to calculate the 95 % and 99 % 1-day VaR. We have used GARCH and
EWMA models as fitting models, which were estimated in Wolfram Mathematica
software using our procedure. A-GARCH model is not suitable because there is no
skewness in our data. We have used especially GARCH filtered returns for which we
have chosen several methods: FHS VaR, normal VaR, Student’s t VaR, and Johnson
SU VaR. We compare all of these methods with real log returns and with historical
simulations VaR. Figure 4 compares FHS VaR with HS 5 %, 1 % 1-day VaR. We
can see that historical VaR does not respond very much to volatility distortions.
FHS predicted more precisely the risks connected with investing into bitcoins.
Comparing Normal VaR (Fig. 5), Student’s t VaR (Fig. 6), and Johnson SU VaR
(Fig. 7), the Johnson SU VaR is the best at recognizing the most extreme distortions,
and Student’s t VaR gives the lowest estimates of the risks. It was not possible to
calculate 1 % 1-day Johnson SU VaR because there were too few overruns of real
losses. Figure 8 shows the estimation of the VaR for EWMA adjusted returns. We
can say that this VaR estimation is the worst because it does not recognize the finest
distortions well.

Fig. 6 5 % and 1 % 1-day Student’s VaR for GARCH adjusted log returns

Fig. 7 5 % 1-day for Johnson SU VaR for GARCH adjusted log returns

Fig. 8 5 % and 1 % 1-day historical VaR for EWMA adjusted log returns

4 Conclusion

We have introduced the FHS method for computing VaR in this paper and
applied it to historical data on Bitcoin denominated in USD. We have
found that the GARCH model was suitable for our data. We have applied historical
simulation and the most widely used parametric VaR methods to the adjusted
log returns. The most suitable method was the Johnson SU VaR, and the second most
suitable method was FHS VaR.

Acknowledgements This research was supported by a VUB grant, no. 2015-3-02/5.

References

1. Alexander, C.: Market Risk Analysis. Wiley, Chichester (2008)
2. Allen, S.: Financial Risk Management. A Practitioner's Guide to Managing Market and Credit
Risk. Wiley, New Jersey (2003)
3. Andersen, T.G., Davis, R.A., Kreiss, J.P., Mikosch, T.: Handbook of Financial Time Series.
Springer, Heidelberg (2009)
4. Barone-Adesi, G., Bourgoin, F., Giannopoulos, K.: Don’t look back. Risk 11, 100–103 (1998)
5. Barone-Adesi, G., Giannopoulos, K., Vosper, L.: VaR without correlations for portfolios of
derivative securities. J. Futur. Mark. 19, 583–602 (1999). Available at http://www.researchgate.net/profile/Kostas_Giannopoulos/publication/230278913_VaR_without_correlations_for_portfolios_of_derivative_securities/links/0deec529c80e0b8302000000.pdf
6. Barone-Adesi, G., Giannopoulos, K., Vosper, L.: Filtering historical simulation. Backtest anal-
ysis. Mimeo. Universita della Svizzera Italiana, City University Business School, Westminster
Business School and London Clearing House (2000)
7. Bohdalová, M.: A comparison of value-at-risk methods for measurement of the financial risk.
In: E-Leader Prague, June 11–13, 2007, pp. 1–6. CASA, New York (2007). http://www.g-casa.
com/PDF/Bohdalova.pdf
8. Christoffersen, P.: Value-at-Risk Models. In: Andersen, T.G., Davis, R.A., Kreiss, J.P.,
Mikosch, T. (eds.) Handbook of Financial Time Series. Springer, Heidelberg (2009)
9. Dowd, K.: An Introduction to Market Risk Measurement. Wiley, Chichester (2002)
10. Easwaran, S., Dixit, M., Sinha, S.: Bitcoin dynamics: the inverse square law of price
fluctuations and other stylized facts. In: Econophysics and Data Driven Modelling of Market
Dynamics New Economic Windows 2015, pp. 121–128 (2015). http://dx.doi.org/10.1007/978-
3-319-08473-2_4
11. Escanciano, J.C., Pei, P.: Pitfalls in backtesting historical simulation VaR models. J. Bank.
Financ. 36, 2233–2244 (2012). http://dx.doi.org/10.1016/j.jbankfin.2012.04.004
12. Hammoudeh, S., Santos, P.A., Al-Hassan, A.: Downside risk management and VaR-based
optimal portfolios for precious metals, oil and stocks. N. Am. J. Econ. Finance 25, 318–334
(2013). http://dx.doi.org/10.1016/j.najef.2012.06.012
13. Hendricks, D.: Evaluation of value at risk models using historical data. FRBNY Economic
Policy Review, April 1996, New York (1996)
14. Hull, J., White, A.: Value at risk when daily changes in market variables are not normally
distributed. J. Derivatives 5, 9–19 (1998)
15. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill,
New York (2001)
16. Jorion, P.: Financial Risk Manager Handbook. Wiley, New Jersey (2003)

17. Kuester, K., Mittnik, S., Paolella, M.S.: Value-at-risk prediction: a comparison of alternative
strategies. J. Financ. Econ. 4, 53–89 (2006)
18. Lu, Z., Huang, H., Gerlach, R.: Estimating value at risk: from JP Morgan’s standard-EWMA to
skewed-EWMA forecasting. OME Working Paper No: 01/2010 (2010). http://www.econ.usyd.
edu.au/ome/research/working_papers
19. McNeil, A.J., Frey, R.: Estimation of Tail-Related Risk Measures for Heteroscedastic Financial
Time Series: An Extreme Value Approach. ETH, Zurich (1999). http://www.macs.hw.ac.uk/~
mcneil/ftp/dynamic.pdf
20. Quandl Ltd.: Bitcoin data (n.d.). Retrieved from https://www.quandl.com/collections/markets/bitcoin-data (2015)
21. Valenzuela, O., Márquez, L., Pasadas, M., Rojas, I.: Automatic identification of ARIMA
time series by expert systems using paradigms of artificial intelligence. In: Monografias del
Seminario Matemático García de Galdeano, vol. 31, pp. 425–435 (2004)
A SVEC Model to Forecast and Perform
Structural Analysis (Shocks) for the Mexican
Economy, 1985Q1–2014Q4

Eduardo Loría and Emmanuel Salas

Abstract A structural vector error correction (SVEC) model was constructed as


an information system that contains: US Industrial Output, the monetary aggregate
M2, the Unemployment Rate, and the Real Exchange Rate (Mexico to the USA).
The model fulfills all the correct specification tests, and proved to be structurally
stable through some recursive tests. The model is based on four theoretical and
empirical facts: (a) Okun’s Law, (b) the existence of real effects of monetary policy,
(c) the dependency of the Mexican economy to US Industrial Output, and (d) a
positive effect of the real exchange rate to GDP. The model has also been employed
in quarterly forecast (2 or 3 years in advance) for the last 10 years.

Keywords Cointegration • Macroeconomic fluctuations • Multicollinearity • Structural shocks • SVEC

1 Introduction

Studying the factors that generate fluctuations in economic aggregates is one of the
most important directions of recent macroeconomic literature.
The economy of Mexico is rather unique in that, on the one hand, it has a
natural market with the USA, depending heavily on its neighbor’s total activity and
industrial cycle; on the other, it is an emerging market that suffers from the structural
problems characteristic to such economies.
The objective is to perform structural analysis of the shocks to Mexican GDP for
1985.Q1–2014.Q4. A structural vector error correction (SVEC) model is estimated,

This article is part of the research project Mexico: growth, cycles and labor precariousness, 1980–
2020 (IN302514), DGAPA, UNAM.
The usual disclaimer applies.
E. Loría () • E. Salas
Center for Modeling and Economic Forecasting, School of Economics, UNAM, Mexico City,
Mexico
e-mail: [email protected]; [email protected]


which enables the identification of a variety of shocks—both permanent and


transitory—by imposing short- and long-term restrictions. In doing so, the direction,
magnitude and temporal features of the effect that economic aggregates have on
GDP fluctuations can be theoretically substantiated.
Due to overparameterization and multicollinearity, two submodels were gener-
ated and the fluctuations under study were correctly identified. The article features
a review of the literature, a justification for the choice of variables, an overview of
stylized facts, econometric issues, and conclusions.

2 Literature Review

Before Nelson and Plosser’s [1] seminal article, macroeconomic variables were
modeled as stochastic processes that evolve around deterministic trends. Today,
it is generally accepted that an economy can be knocked away from its previous
trajectory by a relevant shock, and it may never return to it afterwards [2].
Macroeconomic series are often of order I(1), requiring the use of cointegration
methodologies [3]. SVECs not only enable an adequate estimation, but they also
allow for the incorporation of long- and short-term restrictions, the analysis of
shocks between variables, and the determination of their temporary or permanent
nature.
Two general examples of this are the work of Rudzkis and Kvedaras [4] for
Lithuania, which applies the results of the model to make forecasts; and of Lanteri
[5], which studies as a whole the structural shocks of the economy of Argentina.

2.1 Justification and Variable Selection

This section presents a review of SVEC-related literature in terms of the effect of


each variable on GDP.

2.1.1 Monetary Policy (M2)

The literature review revealed a strong debate regarding the efficiency and tem-
porality of real effects of monetary policy. Assenmacher-Wesche [6] studies the
transmission mechanisms of monetary policy in Switzerland after the modifications
introduced in the year 2000. Bernanke et al. [7] find that shocks generated by
monetary policy in the USA can explain around 20 % of the variation of output.

2.1.2 Unemployment

Utilizing a SVEC, Brüggemann [8] studies different sources of unemployment in


Germany (real wages and productivity); Bukowski et al. [9] study shocks to the
labor market in eight countries of Central and Eastern Europe.

2.1.3 GDP and Real Exchange Rate

Rodrik [10], Rapetti et al. [12], and Razmi et al. [13] demonstrate empirically
that a depreciated real exchange rate is fundamental to explaining the growth of
developing countries.
Ibarra [14] demonstrates that real appreciation of the Mexican Peso has weak-
ened the country’s economic growth in the long term by reducing profit margins,
and thus diminishing investment in the tradable goods sector.

2.1.4 US Industrial Output and Economic Integration

Several authors with different methodologies and periods of study such as Torres
and Vela [15] and Chiquiar and Ramos-Francia [16] have demonstrated empirically
that the industrial structure of Mexico has linked itself systemically and progres-
sively to its US counterpart.
All of the above suggests that the range of effects and reactions is very wide,
and reflects the existence of a considerable volume of literature that supports our
system. Here we can conclude that, although the information system appears at first
to be relatively small (five variables), the range of effects that might be studied and
of hypotheses to be proved is very broad.

3 Stylized Facts

The sample is 1985Q1–2014Q4 and the information system is
Z = {Y, Yus, Q, M2, U}, where Y = Real GDP of Mexico, Yus = Index of US
Industrial Output, Q = Mexico–US real exchange rate (E · INPC_us / INPC_MX),
E = nominal exchange rate, M2 = real monetary aggregate M2, and U = open
unemployment rate.1
A visual inspection of our information system (Fig. 1) reveals that y, yus ,
and m2 have a clear, common trend that generates a serious multicollinearity
problem; this can be verified by the Variance Inflation Factors [17]: u = 19.673,
yus = 2248.956, m2 = 1963.042, q = 151.821, which are above the threshold of 5,

1 Variables in lower case represent logarithms.

Fig. 1 Historical simulation (panels: Q, U, Y, YUS, M2; actuals vs. Scenario 1, with deviations and percent deviations)

Table 1 Partial correlation analysis: variables in first differences, 1985Q2–2014Q4

         D(y)     D(u)     D(yus)   D(m2)
D(u)     0.292
D(yus)   0.954    0.192
D(m2)    0.985    0.380    0.910
D(q)     0.605    0.300    0.572    0.574

and the high partial correlation (Table 1). Multicollinearity reduces the efficiency
of the estimators and generates several inference problems, such as signs that are
opposite to the theoretical relationship, and perturbations in Granger causality and
in the detection of weak exogeneity. The above translates into obstacles for correct
estimation of the shocks.
First, the complete model was estimated and simulated in order to test its
historical replication capability. Then, it was divided into two submodels, which
are nested to free them of multicollinearity and to subsequently extract the relevant
macroeconomic effects of the shocks correctly.

4 Econometric Issues

4.1 Estimation

Z_t is generated by a reduced-form VECM:

\Delta Z_t = \alpha\beta' Z_{t-1} + \Gamma_1 \Delta Z_{t-1} + \cdots + \Gamma_{p-1} \Delta Z_{t-p+1} + C D_t + E X_t + u_t ,   (1)


Table 2 Error correction terms (α)

Error correction   D(y)      D(u)      D(yus)    D(m2)     D(q)
α                  0.416     1.900     0.092     0.057     0.959
(std. error)       (0.084)   (2.156)   (0.052)   (0.149)   (0.278)
[t-statistic]      [4.922]   [0.881]   [1.756]   [0.382]   [3.449]

where Z_t is a K × 1 vector of endogenous variables; X_t is a vector of exogenous or
unmodeled stochastic variables; D_t contains all deterministic terms; and u_t is a K × 1
structural form error that is a zero-mean white noise process with time-invariant
covariance matrix Σ_u.
Following a careful marginalization process [18], and based on the traditional
criteria (trace test and maximum eigenvalue test, Table 6) one cointegration
vector with four lags was found, with a constant and with no deterministic trend.
Deseasonalization dummies and two adjustment dummies are included. There are
no exogenous variables. The resulting model satisfies all correct specification tests
(Table 7).
Since we are interested in finding the determinants of growth, we normalize the
cointegration vector on y, which yields the following long-term result:

y_t = 6.100 - 0.008\, u_t + 0.221\, yus_t + 0.324\, m2_t - 0.122\, q_t + e_t .   (2)

t-statistics:   (2.088)   (7.031)   (23.366)   (3.832)

Considering the statistical significance measures and the values of the error
correction mechanisms, the unrestricted model suggests that the weak exogeneity
criterion for q is not met, which indicates that it should be modeled itself specifically
[19]. Table 2 shows that α_15 is significant and falls in the correct range (−1, 0). This
puts into question the weak exogeneity criterion for q. However, both by theory and
by the above-mentioned economic policy considerations, it is very difficult to argue
that q is endogenous.
Lastly, the historical simulation is very satisfactory for all five variables, par-
ticularly for y, which validates the normalization on this variable and is a further
indicator of correct specification (Fig. 1).
Chow tests on Table 3 also reject the structural break hypothesis. In summary,
even though the unrestricted (i.e., complete) model has issues due to overparameter-
ization, which yields multicollinearity problems,2 it can be considered an adequate

2 This is not a problem for forecasting purposes [17, 20, 21] given the perpetuation of a
stable dependency relationship between variables, and the perpetuation of stable interdependence
relationships within Z [25].

Table 3 Chow tests for structural change

Break point Chow test      445.67
Asymptotic χ² p-value      0.0000
Degrees of freedom         145
Sample split Chow test     310.96
Asymptotic χ² p-value      0.0000
Degrees of freedom         130

Sample range: 1987Q1–2013Q4. H0: Structural change in 1996.2, rejected by the χ² statistic at 99 %

statistical model insofar as it reports the existence of weak and strong exogeneity3
and super exogeneity (i.e., stability of the model). The model thus fulfills the
objectives of econometric analysis: (a) elasticity analysis, (b) forecast, and (c) policy
analysis [18, 22].

4.2 Structural Analysis: Methodology

In order to perform structural analysis that reveals the transmission mechanisms


and the efficiency of policy measures, we can set the cointegrating equation to its
moving average representation.4

\Xi = \beta_\perp \left[ \alpha_\perp' \left( I_K - \sum_{i=1}^{p-1} \Gamma_i \right) \beta_\perp \right]^{-1} \alpha_\perp' .   (3)

The SVEC model can be used to identify the shocks to be traced in an impulse-
response analysis by imposing restrictions on the matrix of long-run shocks and the
matrix B of contemporaneous effects of the shocks.
The long-run effects of the ε shocks are given by ΞB; since rk(Ξ) = K − r, ΞB also
has rank K − r. Hence, there can be at most r shocks with transitory effects (zero
long-run impact) and at least k* = K − r shocks have permanent effects. Due to the
reduced rank of the matrix, each column of zeros stands for only k* independent
restrictions. k*(k* − 1)/2 additional restrictions are needed to exactly identify
the permanent shocks, and r(r − 1)/2 additional contemporaneous restrictions are
needed to exactly identify the transitory shocks. Estimation is done by maximum
likelihood using the reduced form.
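As an illustration of this counting rule with the dimensions used below (the numbers follow from the text itself, not from additional estimation): each submodel has K = 3 variables and r = 1 cointegration vector, so k* = K − r = 2 shocks are permanent and one is transitory. Exact identification requires K(K − 1)/2 = 3 restrictions in total: the column of zeros in ΞB associated with the transitory shock supplies k* = 2 of them, k*(k* − 1)/2 = 1 further restriction separates the two permanent shocks (imposed here as a contemporaneous zero in B), and r(r − 1)/2 = 0 additional restrictions are needed for the single transitory shock.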

3 Which follows from Granger causality.
4 What follows is based on Lütkepohl and Krätzig [23, p. 168].

4.3 Identification and Analysis of Results

To fulfill the second objective of the article, which is to analyze the structural shocks
that underlie the information system, it was necessary to develop two submodels
from the original set, eliminating the above-mentioned colinearity issues. Both
submodels are exactly identified with three restrictions: one short-term restriction,
given that there is only one cointegration vector in each submodel, and two long-
term restrictions. All confidence intervals were constructed using Hall’s percentile
method at 95 % with 100 replications.

4.3.1 Monetary Effects and Okun's Law (Z_1 = {u_t, y_t, m2_t})

The first effect to be proved is the non-neutrality of monetary policy. New Keynesian
theory [7] has accepted from its birth that there are temporary, short-term effects.
This is the foundation of the unconventional monetary policy that has been intensely
applied since the crisis of 2009 to prevent further drops in output and prices. For this
reason, the following restrictions were applied to the matrices B and ΞB (Table 4).
Figure 2 corroborates the above hypothesis, in that it shows a positive, intermit-
tently limited effect of m2 on y, which fades away after quarter number 16. Variance
analysis in Fig. 2 shows that, after the second period, the effect of the shock of
m2 is negligible, as it accounts for no more than 5 % of the variance, and quickly
dissipates afterwards.

Table 4 SVEC submodel restrictions, Z_1 = {u_t, y_t, m2_t}

     B:   β_11  β_12  β_13        ΞB:   ξ_11  ξ_12   0
           0    β_22  β_23              ξ_21  ξ_22   0
          β_31  β_32  β_33              ξ_31  ξ_32  ξ_33

Fig. 2 Impulse response: M2 → y (left panel) and the effect of M2 in the variance decomposition of y (right panel)

Fig. 3 Impulse response: y → u (left panel) and u → y (right panel)

Fig. 4 Impulse response: u → u

Growth and Unemployment

The second effect to be proved within the submodel Z1 is Okun’s Law [24] and
the hysteresis hypothesis of unemployment. The former suggests a negative and
bidirectional relationship between unemployment and the growth of output.
Figure 3 shows clearly the permanent effect of Okun’s Law. Finally, Fig. 4
reflects the presence of hysteresis: once unemployment increases (is shocked), it
does not return to its original level. After ten quarters, it stabilizes at twice the
original level.

4.3.2 External Sector (Z_2 = {yus_t, y_t, q_t})

Since the introduction of NAFTA, Mexico has been an open and integrated
economy, particularly to US industries, as mentioned above. As was done in the
previous section, three restrictions were defined, yielding an exactly identified
model (Table 5).
Figure 5 (left panel) noticeably reflects the immediate, positive, significant, and
permanent effect of the US industrial output shock. The effect of the real exchange
rate on output is clearly expansionary and permanent (Fig. 5, right panel).

Table 5 SVEC restrictions for external sector analysis, Z_2 = {q_t, yus_t, y_t}

     B:   β_11  β_12  β_13        ΞB:   ξ_11  ξ_12   0
           0    β_22  β_23              ξ_21  ξ_22   0
          β_31  β_32  β_33              ξ_31  ξ_32  ξ_33

Fig. 5 Impulse response: yus → y (left panel) and q → y (right panel)

5 Conclusion

Given that forecasting was one of the initial and main objectives of the model,
multicollinearity issues arose. In order to perform structural analysis, this problem
was solved by dividing the cointegration space into two submodels with different
variable sets, and therefore two groups of domestic and external shocks that follow
from what the literature considers to be relevant variables.
This last characteristic of the SVEC technique (i.e., the ability to evaluate shocks)
is widely used to corroborate disputed hypotheses, such as real monetary effects,
hysteresis of unemployment, linkage to the US industrial product, and the exchange
rate as a growth factor.
The summary of the results of this model can be structured based on Rodrik [11]
who, on the basis of New Development theory, proposes the widespread use of an
industrial policy for Mexico, in view of its linkage to the USA, and a high and stable
exchange rate, as a way to increase productivity and thus grow through the external
sector.
Hysteresis of unemployment is one of the most pressing reasons to seek high
growth rates. This is because, according to Okun’s [24] analysis of unemployment,
the unemployment generated by a crisis has long-term effects and, in terms of the
bidirectional relationship, hysteresis of unemployment could generate a drop of the
dynamics of growth, potentially creating a poverty trap.
The crisis of 2009 left profound lessons. One of them was the use of uncon-
ventional, active, and countercyclical monetary policy, mainly from the theoretical
point of view of New Keynesian consensus. The above empirical analysis provides
solid evidence for the use of this tool, particularly in periods of recession or crisis.

6 Statistical Appendix

Table 6 Johansen cointegration and correct specification tests


                 Johansen test (r = 1)       Correct specification tests
                 Trace     Max. eigen        Saikkonen and Lütkepohl    Urzua     LM(11)    White N.C.
Test statistic   39.54     0.16              31.93                      102.49    25.9      752.06
Prob.            0.23      0.34              0.12                       0.55      0.41      0.32

Table 7 VEC Granger causality/block exogeneity Wald tests


Dependent variable   d(y)            d(u)            d(yus)          d(m2)           d(q)
Excluded             χ²      Prob.   χ²      Prob.   χ²      Prob.   χ²      Prob.   χ²      Prob.
d(y)                 –       –       9.50    0.05    1.98    0.74    3.95    0.41    7.91    0.10
d(u)                 13.52   0.01    –       –       5.50    0.24    5.25    0.26    0.57    0.97
d(yus)               28.75   0.00    2.02    0.73    –       –       5.37    0.25    0.57    0.97
d(m2)                14.64   0.01    12.01   0.02    2.60    0.63    –       –       33.57   0.00
d(q)                 40.96   0.00    37.45   0.00    5.85    0.21    23.47   0.00    –       –
all                  140.43  0.00    60.08   0.00    27.67   0.04    49.49   0.00    40.50   0.00
H0: No Granger causality, at 99 %

References

1. Nelson, C., Plosser, C.: Trends and random walks in macroeconomic time series: some
evidence and implications. J. Monet. Econ. 10(2), 139–162 (1982)
2. Durlauf, S., Romer, D., Sims, C.: Output persistence, economic structure, and the choice of
stabilization policy. Brookings Papers on Economic Activity, pp. 69–136 (1989)
3. Johansen, S.: Determination of cointegration rank in the presence of a linear trend. Oxf. Bull.
Econ. Stat. 54(3), 383–397 (1992)
4. Rudzkis, R., Kvedaras, V.: A small macroeconometric model of the Lithuanian economy.
Austrian J. Stat. 34(2), 185–197 (2005)
5. Lanteri, L.: Choques externos y fuentes de fluctuaciones macroeconómicas. Una propuesta con
modelos de SVEC para la economía Argentina, Economía Mexicana Nueva Época. Núm. 1.
Primer semestre. CIDE. México (2011)
6. Assenmacher-Wesche, K.: Modeling monetary transmission in Switzerland with a structural
cointegrated VAR model. Swiss J. Econ. Stat. 144(2), 197–246 (2008)
7. Bernanke, B., Gertler, M., Watson, M., Sims, C., Friedman, B.: Systematic monetary policy and
the effects of oil price shocks. Brookings Papers on Economic Activity, pp. 91–157 (1997)
8. Brüggemann, R.: Sources of German unemployment: a structural vector error correction
analysis. Empir. Econ. 31(2), 409–431 (2006)
9. Bukowski, M., Koloch, G., Lewandowski, P.: Shocks and rigidities as determinants of CEE
labor markets’ performance. A panel SVECM approach. http://mpra.ub.uni-muenchen.de/
12429/1/MPRA_paper_12429.pdf (2008)
10. Rodrik, D.: Growth after the crisis. Working Paper 65, Commission on Growth and Develop-
ment, Washington, DC (2005)

11. Rodrik, D.: The real exchange rate and economic growth. Brook. Pap. Econ. Act. 2, 365–412
(2008)
12. Rapetti, M., Skott P., Razmi A.: The real exchange rate and economic growth: are developing
countries different? Working Paper 2011-07. University of Massachusetts Amherst (2011)
13. Razmi, A., Rapetti, M., Scott, P.: The Real Exchange Rate and Economic Development. Struct.
Chang. Econ. Dyn. 23(2), 151–169 (2012)
14. Ibarra, C.: México: la maquila, el desajuste monetario y el crecimiento impulsado por las
exportaciones. Revista Cepal. 104 Agosto (2011)
15. Torres, A., Vela, O.: Trade integration and synchronization between the business cycles of
Mexico and the United States. N. Am. J. Econ. Financ. 14(3), 319–342 (2003)
16. Chiquiar, D., Ramos-Francia, M.: Trade and business-cycle synchronization: evidence from
Mexican and US manufacturing industries. N. Am. J. Econ. Financ. 16(2), 187–216 (2005)
17. Kennedy, P.: A Guide to Econometrics, 6th edn. Blackwell, Oxford (2008)
18. Hendry, D.: The econometrics of macroeconomic forecasting. Econ. J. 107(444), 1330–1357
(1997)
19. Johansen, S.: Testing weak exogeneity and the order of cointegration in UK money demand
data. J. Policy Model 14(3), 313–334 (1992)
20. Conlisk, J.: When collinearity is desirable. West. Econ. J. 9, 393–407 (1971)
21. Blanchard, O.: Comment. J. Bus. Econ. Stat. 5, 449–451 (1987)
22. Charemza, W., Deadman, D.: New Directions in Econometric Practice: General to Specific
Modelling, Cointegration and Vector Autoregression. E. Elgar, Aldershot (1992)
23. Lütkepohl, H., Krätzig, M.: Applied Time Series Econometrics. Cambridge University Press,
Cambridge (2004)
24. Okun, A.: Potential GNP: its measurement and significance. In: Proceedings of the Business
and Economic Statistics Section, pp. 98–104. American Statistical Association, Alexandria
(1962)
25. Farrar, D., Glauber, R.: Multicollinearity in regression analysis: the problem revisited. Rev.
Econ. Stat. 49, 92–107 (1967)
Intraday Data vs Daily Data to Forecast
Volatility in Financial Markets

António A.F. Santos

Abstract The measurement of the volatility is key in financial markets. It is well


established in the literature that the evolution of the volatility can be forecasted.
Recently, measures of volatility have been developed using intraday data, for
example, the realized volatility. Here the forecasts of volatility are defined through
the stochastic volatility model. After estimating the parameters through Bayesian
estimation methods using Markov chain Monte Carlo, forecasts are obtained
using particle filter methods. With intraday data, additional information can help
to better understand the evolution of financial volatility. The aim is to build a
more clear picture of the evolution of the volatility in financial markets through the
comparison of volatility measures using daily observations and intraday data.

Keywords Bayesian estimation • Big data • Intraday data • Markov chain Monte
Carlo • Particle filter • Realized volatility • Stochastic volatility

1 Introduction

The volatility of financial returns is an important subject addressed in the financial


literature, and is also a measure essential to decision making in financial markets.
Volatility constitutes a measure of risk used in different situations and enters
into many financial econometric models. These need data, which has recently become
an abundant resource. The models have to be adapted to cope with these new
developments.
Since the 1980s, following the articles of Engle [1], Bollerslev [2], and Taylor
[3], the research in this area has been immense. After an initial phase in which the
analysis of the volatility evolution was more model-dependent, at the end of the 1990s
less model-dependent developments appeared. One of them uses what is known as the
realized volatility measure [4–9]. New developments have been considered and some recent research

A.A.F. Santos ()


Faculty of Economics, Monetary and Financial Research Group (GEMF), Center for Business
and Economics Research (CeBER), University of Coimbra, Av. Dias da Silva, 165, 3004-512
Coimbra, Portugal
e-mail: [email protected]


has been trying to establish a link between the realized volatility measure and the
stochastic volatility model [10, 11].
The measures of volatility are important in the decision-making process, as
examples we have decisions associated with the portfolio allocation of assets and the
definition of the price of derivative assets. Different kinds of measures of volatility
have been proposed, and one is the realized volatility. This measure is seen as an
estimator of the integrated volatility defined within a theoretical framework that
accommodates the evolution of asset prices through a continuous stochastic process.
We sought to establish if volatility measures obtained through intraday data are
compatible with the ones obtained using data at lower frequencies. They must be
linked because daily observations must correspond to the end points at each day
through the intraday data. However, the difference in the amount of information is
immense. With daily data, less information is available and more structure is needed.
The aim is to verify if the measures of volatility obtained with less information, but
with more structure supplied by a model, are compatible with the ones obtained
using intraday data. The model used is the stochastic volatility (SV) model. In the
case of compatible approaches, it opens space for the development of more robust
models, where different observations at different frequencies can be considered.
The realized volatility measure is compatible with the forecasts obtained through
the SV model, if it is within the bounds defined by the quantiles associated with the
predictive distribution of the states defined by the model.
The next sections present the results as follows. Section 2 presents the basic
results associated with the definition of the realized volatility measure. It is an
estimator of the integrated volatility and calculated using intraday data. The aim is to
compare this measure with the forecasts obtained through the SV model. In Sect. 3
are presented a series of results associated with the parameters’ estimation through
a Bayesian framework with Markov chain Monte Carlo (MCMC) simulations. The
parameters are needed to define the filter distributions. In Sect. 4 are presented
the results for defining the approximation to the filter distribution using particle
filter methods. In Sect. 5 a set of intraday observations to five stocks are used to
demonstrate the evolution of the volatility through two approaches, and the results
are compared to establish the compatibility of both approaches. In Sect. 6 some
conclusions are presented.

2 Realized Volatility and Integrated Volatility

Assuming that the evolution of the prices of a given asset P(t) follows a diffusion
process given by dP(t) = μ(t)P(t)dt + σ(t)P(t)dW(t), where μ(t) is a mean
evolution function, σ²(t) the variance evolution function, and W(t) the standard
Brownian motion, from this representation, a measure of interest is the integrated
volatility given by

IV_t = \int_0^t \sigma^2(s)\, ds .   (1)

If the evolution of the price is given by the aforementioned diffusion process, for a
period of length t, the volatility for the period is given by (1). As this quantity is not
observable, an estimator was proposed, which is known as the realized volatility.
Considering y_t the return at t, that represents the return associated with one period
(a day), and the partition 0 < t_1 < t_2 < ⋯ < t_n < 1, the intraday returns are given
by y_{t_i} = p_{t_i} − p_{t_{i−1}}, i = 1, …, n, where p_{t_i} is the log-price at t_i. The realized volatility
is given by the sum of the squares of the intraday returns

RV_t = \sum_{i=1}^{n} y_{t_i}^2 .   (2)

Important research has been conducted trying to establish whether RV_t is a consistent
estimator of IV_t, RV_t →^P IV_t, i.e., whether it converges in probability. The statistical
properties of the estimator RV_t were extensively analyzed and some of the main
references are [9, 12–14].
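As a concrete illustration of Eq. (2), the short Wolfram Mathematica sketch below computes the realized volatility of one day from a vector of intraday log-prices; the names realizedVolatility and logPrices, as well as the simulated example with 390 one-minute prices, are illustrative assumptions rather than part of the original text.

(* Eq. (2): realized volatility as the sum of squared intraday log returns *)
realizedVolatility[logPrices_List] := Total[Differences[logPrices]^2]

(* illustrative example with simulated one-minute log prices for a single day *)
logPrices = Accumulate[RandomVariate[NormalDistribution[0, 0.001], 390]];
realizedVolatility[logPrices]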
The diffusion process considered above for the price can be extended with
a diffusion process for the variance, for example, through a Cox–Ingersoll–
Ross kind of process d log(σ²(t)) = (α − β log(σ²(t)))dt + γ log(σ²(t))dZ(t),
where Z(t) is a Brownian motion independent from W(t). The bivariate diffusion
process can be seen as the continuous version of the stochastic volatility model
in discrete time that we use hereafter, where, in the notation used ahead, μ = α, φ = e^{−β} and
η_t = ∫_{t−1}^{t} e^{−β(t−s)} γ log(σ²(s)) dZ(s).

3 The Stochastic Volatility Model

The volatility evolution of financial returns is not directly observable, and different
kinds of models have been proposed to estimate the aforementioned evolution. An
example is the SV model proposed by Taylor [3]. A common representation is
given by
y_t = \exp\left(\frac{\alpha_t}{2}\right) \varepsilon_t , \qquad \varepsilon_t \sim N(0, 1)   (3)

\alpha_{t+1} = \mu + \phi(\alpha_t - \mu) + \sigma_\eta\, \eta_{t+1} , \qquad \eta_{t+1} \sim N(0, 1) ,   (4)

where y_t represents the log-return at t, whose distribution is defined by the evolution
of an unobservable state α_t. The model is characterized by the parameters in the
set θ = (μ, φ, σ_η). The first method proposed to estimate the parameters was the

method of moments. Later, in the seminal paper of Jacquier et al. [15], the authors
show that Bayesian estimation procedures associated with MCMC can be used to
obtain more efficient estimates for the parameters.
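To make the representation in Eqs. (3)–(4) concrete, the following Wolfram Mathematica sketch simulates returns and states from the SV model; the function name simulateSV and the parameter values used in the example call are illustrative assumptions, not estimates reported in this chapter.

(* simulate n observations from the SV model of Eqs. (3)-(4) *)
simulateSV[n_, mu_, phi_, sigmaEta_] :=
 Module[{alpha, y},
  (* state recursion: alpha_{t+1} = mu + phi (alpha_t - mu) + sigmaEta eta_{t+1} *)
  alpha = FoldList[mu + phi (#1 - mu) + sigmaEta #2 &, mu,
    RandomVariate[NormalDistribution[0, 1], n - 1]];
  (* measurement equation: y_t = exp(alpha_t/2) eps_t *)
  y = Exp[alpha/2] RandomVariate[NormalDistribution[0, 1], n];
  {y, alpha}]

{returns, states} = simulateSV[1000, -0.5, 0.97, 0.15];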

3.1 Stochastic Volatility Estimation

The article [15] began a series of important research on the best way to estimate
the model, and for the most cited, see [16–20]. Different approaches have been
proposed, which are associated with different algorithms to obtain samples for
the vector of states. Here, we develop a gaussian approximation which gives high
acceptance rates in the MCMC simulations used in the estimation process.
To estimate the model, the marginal posterior distribution of the parameters is
approximated. This can be done by simulating from the distribution of α_{1:n}, θ | y_{1:n},
where α_{1:n} = (α_1, …, α_n) and y_{1:n} = (y_1, …, y_n). Using Gibbs sampling,
the parameters are sampled conditional on the states, θ | α_{1:n}, y_{1:n}, and the states
conditional on the parameters, α_{1:n} | θ, y_{1:n}. We develop a single-move sampler to
simulate from α_t | α_{−t}, θ, y_{1:n}, where α_{−t} = (α_1, …, α_{t−1}, α_{t+1}, …, α_n). A
second-order Taylor approximation to the target density gives a gaussian density
as the approximating density. Assuming that at iteration k the sampled elements are
θ^{(k)} = (μ^{(k)}, φ^{(k)}, σ_η^{(k)}) and α^{(k)} = (α_1^{(k)}, …, α_n^{(k)}), at iteration k + 1 the algorithm
proceeds as
1. Sample from α_t | α_{t−1}^{(k+1)}, α_{t+1}^{(k)}, y_t, θ^{(k)}, for t = 1, …, n.
2. Sample from μ, φ | σ_η^{(k)}, α^{(k+1)}, y_{1:n}.
3. Sample from σ_η | μ^{(k+1)}, φ^{(k+1)}, α^{(k+1)}.
To obtain samples for the states in step 1, the algorithm proceeds as follows. The
logarithm of the density function assumes the form

\ell(\alpha_t) \propto -\frac{\alpha_t}{2} - \frac{y_t^2}{2 e^{\alpha_t}} - \frac{(\alpha_{t+1} - \mu - \phi(\alpha_t - \mu))^2}{2\sigma_\eta^2} - \frac{(\alpha_t - \mu - \phi(\alpha_{t-1} - \mu))^2}{2\sigma_\eta^2} ,   (5)

for which the maximizer is defined as

\alpha_t^* = W\!\left( \frac{y_t^2 \sigma_\eta^2 e^{-\varphi}}{2(1+\phi^2)} \right) + \varphi , \quad \text{with} \quad \varphi = \frac{\phi(\alpha_{t+1} + \alpha_{t-1})}{1+\phi^2} + \frac{2\mu(1-\phi)^2 - \sigma_\eta^2}{2(1+\phi^2)} ,   (6)

where W is the Lambert W function. The second-order Taylor approximation of ℓ(α_t)
around α_t^* is the log-kernel of a gaussian density with mean α_t^* and variance

s_t^2 = -\frac{1}{\ell''(\alpha_t^*)} = \frac{2\sigma_\eta^2 e^{\alpha_t^*}}{y_t^2 \sigma_\eta^2 + 2(1+\phi^2) e^{\alpha_t^*}} .   (7)

This is the approximating density used to obtain samples for the vector of states.
The approximation is very good and the acceptance rates are very high. However,
even with chains that always move, sometimes they move slowly, and high levels of
autocorrelation are obtained. Due to the simplicity of the sampler, several strategies
may be considered to reduce the levels of autocorrelation and to define more efficient
estimation procedures. The main point to highlight is that, with SV models, gaussian
approximations are straightforward to implement.
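A sketch of how the mean and variance of this gaussian approximating density, Eqs. (6)–(7) as reconstructed above, can be evaluated with the built-in Lambert W function (ProductLog) is given below; the helper name svProposal and the numerical arguments in the example call are illustrative assumptions.

(* mean (Eq. (6)) and variance (Eq. (7)) of the gaussian approximation for alpha_t *)
svProposal[yt_, alphaPrev_, alphaNext_, mu_, phi_, sigmaEta_] :=
 Module[{phiStar, alphaStar, s2},
  phiStar = (phi (alphaNext + alphaPrev))/(1 + phi^2) +
            (2 mu (1 - phi)^2 - sigmaEta^2)/(2 (1 + phi^2));
  alphaStar = ProductLog[(yt^2 sigmaEta^2 Exp[-phiStar])/(2 (1 + phi^2))] + phiStar;
  s2 = (2 sigmaEta^2 Exp[alphaStar])/(yt^2 sigmaEta^2 + 2 (1 + phi^2) Exp[alphaStar]);
  {alphaStar, s2}]

(* a candidate draw for the single-move step of the sampler *)
{m, s2} = svProposal[0.01, -0.4, -0.5, -0.5, 0.97, 0.15];
candidate = RandomVariate[NormalDistribution[m, Sqrt[s2]]];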

4 Particle Filter Methods

The SV model is a state space model where the evolution of the states defines the
evolution of the volatility. Forecasts for the evolution of the states in this setting
require the development of simulation techniques known as Sequential Monte Carlo
(SMC), also referred as particle filter methods [21–27]. The aim is to update the filter
distribution for the states when new information arrive.
Using a model that depends on a set of parameters, all forecasts are conditioned
by the parameters. It is not realistic to assume that the parameters are known, and
the parameters are estimated through Bayesian estimation methods. This constitutes
an approximation, because even if model’s uncertainty is not taken into account, it
can be assumed that the parameters can vary over time.
The quantities of interest are the values of the states governing the evolution
of the volatility, which are propagated to define the predictive density of the
returns, defined here as f(y_{t+1} | y_{1:t}). However, essential to the definition of this
distribution is the filter density associated with the states, f(α_t | y_{1:t}). Bayes' rule
allows us to assert that the posterior density f(α_t | y_{1:t}) of the states is related to
the density f(α_t | y_{1:t−1}) prior to y_t, and to the density f(y_t | α_t) of y_t given α_t, by
f(α_t | y_{1:t}) ∝ f(y_t | α_t) f(α_t | y_{1:t−1}). The predictive density of y_{t+1} given y_{1:t} is defined
by f(y_{t+1} | y_{1:t}) = ∫ f(y_{t+1} | α_{t+1}) f(α_{t+1} | y_{1:t}) dα_{t+1}.
Particle filters approximate the posterior density of interest, f(α_t | y_{1:t}), through
a set of m "particles" {α_{t,1}, …, α_{t,m}} and their respective weights {π_{t,1}, …, π_{t,m}},
where π_{t,j} ≥ 0 and Σ_{j=1}^m π_{t,j} = 1. This procedure must be implemented sequentially,
with the states evolving over time to accommodate new information as it arrives. It
is difficult to obtain samples from the target density, and an approximating density
is used instead; afterwards the particles are resampled to better approximate the
target density. This is known as the sample importance resampling (SIR) algorithm.
A possible approximating density is given by f(α_t | α_{t−1}); however, [27, 28] pointed

out that, as a density to approximate f(α_t | y_{1:t}), it is not generally efficient, because
it constitutes a blind proposal that does not take into account the information
contained in y_t.

4.1 Particle Filter for the SV Model

Through SMC with SIR the aim is to update sequentially the filter density for
the states. The optimal importance density is given by f(y_t | α_t) f(α_t | α_{t−1}, y_t), which
induces importance weights with zero variance. Usually it is not possible to obtain
samples from this density, and an importance density g(α_t), different from the
optimal density, is used to approximate the target density.
To approximate the filter densities associated with the SV model, [27] considered
the same kind of approximations used to sample the states in a static MCMC setting.
However, the approximations were based on a first order Taylor approximation,
and it was demonstrated by Smith and Santos [29] that they are not robust when
information contained in more extreme observations needs to be updated (also called
very informative observations). In [29], a second-order Taylor approximation for the
likelihood combined with the predictive density for the states leads to improvements
in the particle filter algorithm. Like the auxiliary particle filter in [27], it avoids blind
proposals such as the ones proposed in [30], takes into account the information in y_t,
and defines a robust approximation for the target density, which also avoids the
degeneracy of the weights.
Here we develop the aforementioned results using a more robust approximation for
the importance density. The logarithm of the density f(y_t | α_t) f(α_t | α_{t−1}), ℓ(α_t), is
concave in α_t; to maximize the function with respect to α_t, we set the first
derivative equal to zero, ℓ'(α_t) = 0. Solving for α_t, the solution is

\alpha_t^* = W\!\left( \frac{y_t^2 \sigma_\eta^2 e^{-\vartheta}}{2} \right) + \vartheta , \quad \text{with} \quad \vartheta = \mu(1-\phi) + \phi\,\alpha_{t-1} - \frac{\sigma_\eta^2}{2} .   (8)

The second derivative is given by ℓ''(α_t) = −(2e^{α_t} + σ_η² y_t²)/(2σ_η² e^{α_t}), which is
strictly negative for all α_t, so α_t^* maximizes the function ℓ(α_t), defining a global
maximum. The second-order Taylor expansion of ℓ(α_t) around α_t^* defines the log-
kernel of a gaussian density with mean m_t = α_t^* and variance

s_t^2 = \frac{2\sigma_\eta^2 e^{m_t}}{2 e^{m_t} + \sigma_\eta^2 y_t^2} .   (9)

This gaussian density will be used as the importance density in the SIR algorithm.
In the procedures implemented, the estimates of interest were approximated
using particles with equal weights, which means that a resampling step is performed.
Assuming at t − 1 a set of m particles α_{t−1}^m = {α_{t−1,1}, …, α_{t−1,m}} with associated
weights 1/m, which approximate the density f(α_{t−1} | y_{1:t−1}), the algorithm proceeds
as follows:
1. Resample m particles from α_{t−1}^m, obtaining α_t^m = {α_{t,1}, …, α_{t,m}}.
2. For each element of the set, α_{t,i}, i = 1, …, m, sample a value from a
   gaussian distribution with mean and variance defined by (8) and (9), respectively,
   obtaining the set {α*_{t,1}, …, α*_{t,m}}.
3. Calculate the weights,

w_i = \frac{f(y_t \mid \alpha^*_{t,i})\, f(\alpha^*_{t,i} \mid \alpha_{t-1,i})}{g(\alpha^*_{t,i} \mid m_t, s_t^2)} , \qquad \pi_i = \frac{w_i}{\sum_{i=1}^{m} w_i} .   (10)

4. Resample from the set {α*_{t,1}, …, α*_{t,m}} using the set of weights {π_1, …, π_m},
   obtaining a sample {α_{t|1:t,1}, …, α_{t|1:t,m}}, where to each particle a weight of 1/m
   is associated.
For the one-step-ahead volatility forecast, having the approximation to the density f(α_t | y_{1:t}), and given the structure of the system equation in the SV model, an AR(1) with gaussian noise, it is easy to sample from f(α_{t+1} | y_{1:t}), the predictive density for the states.
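To make the algorithm concrete, the following sketch (in Python with NumPy/SciPy) implements one iteration of the filter described above, assuming the standard SV measurement equation y_t = exp(α_t/2) ε_t with ε_t ~ N(0, 1), so that the densities in (8)–(10) take the forms used here. All function and variable names are illustrative assumptions of ours, not taken from the chapter.

    import numpy as np
    from scipy.special import lambertw
    from scipy.stats import norm

    def sv_filter_step(particles, y_t, mu, phi, sigma_eta, rng):
        """One SIR iteration for the SV model using the gaussian
        importance density with mean (8) and variance (9)."""
        m = particles.size
        # Step 1: resample the equally weighted particles from t-1.
        alpha_prev = rng.choice(particles, size=m, replace=True)
        # Mean and variance of the importance density, Eqs. (8)-(9).
        mu_tilde = (1.0 - phi) * mu + phi * alpha_prev - 0.5 * sigma_eta**2
        m_t = np.real(lambertw(0.5 * y_t**2 * sigma_eta**2 * np.exp(-mu_tilde))) + mu_tilde
        s_t = np.sqrt(2.0 * sigma_eta**2 * np.exp(m_t) /
                      (2.0 * np.exp(m_t) + sigma_eta**2 * y_t**2))
        # Step 2: sample proposals from the gaussian importance density.
        alpha_star = rng.normal(m_t, s_t)
        # Step 3: importance weights, Eq. (10), computed on the log scale.
        log_lik = norm.logpdf(y_t, loc=0.0, scale=np.exp(0.5 * alpha_star))
        log_prior = norm.logpdf(alpha_star, loc=(1.0 - phi) * mu + phi * alpha_prev,
                                scale=sigma_eta)
        log_prop = norm.logpdf(alpha_star, loc=m_t, scale=s_t)
        log_w = log_lik + log_prior - log_prop
        w = np.exp(log_w - log_w.max())
        pi = w / w.sum()
        # Step 4: resample to return an equally weighted particle set for time t.
        return rng.choice(alpha_star, size=m, replace=True, p=pi)

    # Usage sketch:
    # filtered = sv_filter_step(filtered, returns[t], mu_hat, phi_hat, sig_hat,
    #                           np.random.default_rng(0))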

5 An Empirical Demonstration

We describe the data and compute some summary statistics, highlighting the differences in the amount of information within each dataset. Intraday data are used to calculate the measures of realized volatility. Forecasts of daily volatility are obtained using daily observations through the filter distributions for the states within the SV model. The filter distributions depend on the parameters' values, which are estimated before forecasting. Data observed in time intervals of equal length are the ones mostly used in econometric models. However, intraday observations are becoming widely available. When only prices that correspond to actual transactions are used, observations that are not equally spaced in time have to be handled. All available observations were used to compute the RV measures, and a comparison is established with the SV forecasts.

5.1 Data Description

Intraday observations were used for five stocks traded in US stock markets. The
companies are The Boeing Company (BA), Bank of America (BAC), General
Electric (GE), International Business Machines Corporation (IBM), and Intel
(INTC). The observations were collected from the publicly available web pages

Table 1 Summary of the data


Stock   M-Pr    M-Vol   MTDS  N. Obs.    M-RV  SD-RV  Sk-RV  Min-RV  Max-RV
BA      137.74   89.55  3.40  2,235,593  1.57   4.98  16.94  0.09     89.59
BAC      16.47  379.81  4.32  2,242,690  1.63   1.63   7.93  0.07     22.93
GE       25.95  230.72  6.84  2,242,865  1.47   8.95  17.67  0.08    161.75
IBM     169.46   86.47  3.43  2,238,062  1.15   1.46  10.73  0.11     22.87
INTC     31.51  185.16  6.13  2,243,192  1.85   2.11   6.83  0.26     26.41
Means for the price and volume are presented, and the number of observations associated with each stock. The mean time difference in seconds (MTDS) is presented in the fourth column. On the right side, the mean, standard deviation, skewness, minimum and maximum statistics are computed for the series of the realized volatility measure

of the company BATS Global Markets, Inc. from January 30, 2014 to August 27,
2015. The web page address is https://www.batstrading.com, and software written
in Objective-C was used to record the intraday data for the period considered.
The data correspond to more than two million observations per stock. The second kind of observations consists of end-of-day prices from January 29, 2012 to August 27, 2015, obtained from the publicly available dataset supplied by Yahoo! Inc. The web page address is http://finance.yahoo.com, and the data can be obtained using publicly available software written in R or Matlab, or even using the Yahoo Query Language (YQL).
Through the analysis of Table 1, we can assess some of the main characteristics associated with the intraday data. The mean price is inversely related to the mean volume (cheaper stocks trade in larger volumes), and the mean time-span between consecutive transactions is around 5 s. However, in our dataset we find that simultaneous transactions are very common. Regarding the RV measure, the mean values are not significantly different between the stocks, but the standard deviations and the values of the skewness are more variable. The distributions of the RVs have positive skewness, and on some days the realized volatility assumes very high values. The maximum values for all the assets were obtained on August 24, 2015, when in a few seconds during the day the Dow Jones index fell more than 1000 points due to the effects of the crisis in the Chinese stock exchanges.
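For reference, a minimal Python sketch of the realized-volatility computation from irregularly spaced intraday prices is given below. It uses the standard model-free definition of realized variance as the sum of squared intraday log returns, which we assume matches the RV measure used in this chapter; the grouping example and all names are illustrative.

    import numpy as np

    def realized_volatility(prices):
        """Realized variance of one trading day: the sum of squared intraday
        log returns (standard model-free definition, assumed here)."""
        log_ret = np.diff(np.log(np.asarray(prices, dtype=float)))
        return float(np.sum(log_ret ** 2))

    # One RV value per day from tick-level prices grouped by calendar date,
    # e.g. with pandas (column name illustrative):
    # rv = ticks.groupby(ticks.index.date)["price"].apply(realized_volatility)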

5.2 Parameters Estimation for the SV Model

Before the application of the particle filter methods, which are used to approximate
the filter distributions for the states, the parameters of the model need to be defined,
but it is unrealistic to assume that the parameters are constant over time. We use
a moving window of financial returns as a way of estimating the time-varying
parameters.
The parameters were estimated using the observations prior to the ones used
to update the filter distributions. The first sample corresponds to observations

from January 29, 2012 to January 28, 2014 (500 observations). The first filter
distribution corresponds to January 29, 2014, which will be used to forecast the
volatility for the next day. After being used to update the filter distribution, the
observation is incorporated in the sample, and the parameters are estimated again.
The first observation of the sample is deleted to maintain the number of observations
constant. Regarding the estimation of the parameters, three choices were made.
First, a sample of 500 observations was used, corresponding approximately to 2
years of daily data. Second, it was assumed that parameters could vary over time.
The parameters are estimated at each iteration of the particle filter. Finally, we are
aware that 500 observations can be too few to estimate the parameters of a nonlinear
state space model, and strong prior distributions for the parameters were assumed.
In the estimation of the SV models using Bayesian methods and MCMC, most
results are associated with the design of efficient samplers for the states. Given the
states, it is fairly straightforward to obtain samples for the parameters. Here we
adopt the prior distributions commonly found in the literature: a gaussian for μ, a beta for (φ + 1)/2, and a scaled chi-square for σ_η². This is in line with the literature and is used in one of the articles where a very efficient way of estimating the SV model is presented, see [20]. Because the samples have a relatively small number of observations, we kept the same form for the prior densities but considered somewhat stronger prior distributions. In Table 2, we present the estimates of the parameters using the entire sample (January 29, 2012–August 27, 2015) for all the companies considered in this section. The estimates are in line with the ones commonly found in the literature, especially the values of the persistence parameter and the values of the effective sample size (ESS). However, it is not realistic to assume that the values of the parameters are constant over time, which is confirmed by the estimations using a moving window. Due to space restrictions, we present only the evolution of the parameters' estimates associated with BAC in Fig. 1. It is apparent that it is reasonable to assume that the parameters vary over time, and at each iteration of the particle filter algorithm a renewed set of parameters is used, based on the updated sample.
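A schematic version of this moving-window re-estimation loop is sketched below in Python; estimate_sv is a placeholder for the Bayesian/MCMC estimation routine (not shown here), and the window length of 500 follows the text. All names are ours.

    import numpy as np

    def rolling_sv_estimates(returns, estimate_sv, window=500):
        """Moving-window re-estimation of (mu, phi, sigma_eta): the window is
        shifted by one observation at each step, keeping its length constant."""
        estimates = []
        for end in range(window, len(returns) + 1):
            sample = returns[end - window:end]      # the most recent `window` returns
            estimates.append(estimate_sv(sample))   # e.g. posterior means of the chains
        return np.array(estimates)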

Table 2 Summary of the SV estimation using daily returns



           μ                    φ                    σ_η
Stock   Mean  sd    ESS      Mean  sd    ESS      Mean  sd    ESS
BA      0.35  0.06  3480     0.76  0.07  458      0.41  0.07  525
BAC     0.59  0.09  2682     0.94  0.02  311      0.27  0.06  251
GE      0.27  0.09  691      0.87  0.04  407      0.37  0.06  468
IBM     0.27  0.09  547      0.88  0.04  467      0.44  0.05  614
INTC    0.44  0.07  4335     0.83  0.04  581      0.44  0.06  537
For each stock the mean of the chains (parameter estimate), the respective standard deviation, and
a measure of estimation inefficiency (ESS) are presented. Chains with 20,000 observations were
considered after a burn-in of 2000 observations

[Figure 1 shows three panels, "μ evolution", "φ evolution" and "σ_η evolution", each plotted against the iteration number.]

Fig. 1 The figure illustrates the evolution of the mean of the chains for the parameters of the SV model associated with BAC. With a moving window of 500 observations, 398 estimation processes were applied

5.3 Volatility Measures Comparison

The comparisons are made between the RV measures and the evolution of the
predictive distribution of the states associated with the SV model. A Bayesian
approach is adopted to approximate through simulations (particle filter) the filter
distribution for the states. The RV measures are obtained using ultra-high-frequency data; they are model-free and represent point estimates through time. On the other hand, the volatility evolution obtained through the SV model is represented by the evolution of the predictive distribution of the states. This uses less information (daily returns) and is model-dependent. Because of the differences in the amount of information and the dependence on a model, there is the possibility that the volatility evolution may be different for each approach. In the experiments with the
data, we compare the evolution of the RV measure through time with the evolution
of the quantiles associated with the predictive distribution of the states. To assess
the compatibility, we check if the RV measures are within the bounds defined by the
respective quantiles of the predictive distributions of the states.
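The compatibility check can be summarized by the proportion of days on which the (log) RV falls outside a central predictive band, which appears to be what Table 3 reports; a possible implementation is sketched below (all names are ours).

    import numpy as np

    def tail_proportion(rv, state_particles, alpha=0.05):
        """Proportion of days on which log realized volatility falls outside the
        central (1 - alpha) band of the predictive distribution of the states.
        `state_particles[t]` holds the particles approximating f(alpha_{t+1}|y_{1:t})."""
        lo = np.quantile(state_particles, alpha / 2, axis=1)
        hi = np.quantile(state_particles, 1 - alpha / 2, axis=1)
        log_rv = np.log(np.asarray(rv, dtype=float))
        return float(np.mean((log_rv < lo) | (log_rv > hi)))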
When the filter distributions of the states are approximated through the particle
filter, we found out that the results are very sensitive to the parameters used in the
SV model. As described above, we considered time-varying parameters obtained
through a sample obtained using a moving window, and the main adjustments
performed were related to the prior distributions associated with the parameters. For
the parameter , the prior considered was N.0; 0:1/ (BA,BAC,GE,IBM), and
N.0:5; 0:1/ (INTC). To the parameter , beta distributions were considered,
B.a; b/, where b D 2:5 for all stocks, a D 50 (BA,BAC,GE), and a D 80
(IBM,INTC). Finally, for 2 the prior distribution 2 ı21 is considered with
ı D 0:1.
The main results are presented in Table 3 and Fig. 2 (just for the GE due to space
constraints) where the evolution of the RV is compared with the evolution of the
quantiles. In general the results are satisfactory: the predictive distribution of the states accommodates the evolution of the RV measure. There are some cases where

Table 3 Summary of quantiles evolution associated with the SV model compared with the RV measures

Stock   p = 0.05   p = 0.1   p = 0.2   p = 0.4
BA      0.061      0.129     0.200     0.403
BAC     0.061      0.095     0.197     0.387
GE      0.067      0.126     0.203     0.393
IBM     0.043      0.086     0.138     0.342
INTC    0.046      0.080     0.138     0.273

The p-value, p_α, represents the sum of the probabilities on the tails. The table reports the proportions of the RV in the tails, compared with the quantiles associated with the predictive distribution of the states

[Figure 2: "Log of realized volatility evolution and respective quantiles"; volatility plotted against time.]

Fig. 2 This figure depicts, for the stock GE, the evolution of the logarithm of the RV (solid line), the respective quantiles q_0.05 and q_0.95 (dashed lines), and q_0.5 (plus signs), associated with the predictive distribution of the states in the SV model

the adjustment is more pronounced and there is some divergence in others. The most
evident divergence is with the INTC, which is in line with the adjustments that were
needed for the prior distributions of the parameters. In the other cases the adjustment
is better, especially with the BA, BAC, and GE.

6 Conclusion

Intraday data represent an important increase in the information available to characterize the evolution of financial volatility. This can be compared with the more standard characterization given by models using data of lower frequency, such as the SV model. A model-free approach (higher frequencies) can be combined with a more model-dependent approach (lower frequencies) when the measures of volatility supplied by both approaches are compatible. We found some evidence that this compatibility exists, and we think that further research is necessary to accommodate the different sources of information for a better understanding of the evolution of financial volatility.

Acknowledgements I would like to thank the editors of this volume and organizers of the ITISE
2015, Granada, professors Ignacio Rojas and Héctor Pomares, as well as all the participants in the conference who contributed with their feedback to improve the results presented here.

References

1. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the variance of


United Kingdom inflation. Econometrica: J. Econometric Soc. 987–1007 (1982)
2. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econometrics
31(3), 307–327 (1986)
3. Taylor, S.J.: Modelling Financial Time Series. Wiley, Chichester (1986)
4. Andersen, T.G., Bollerslev, T.: Intraday periodicity and volatility persistence in financial
markets. J. Empirical Finance 4(2), 115–158 (1997)
5. Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H.: The distribution of realized stock
return volatility. J. Financ. Econ. 61(1), 43–76 (2001)
6. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P.: Modeling and forecasting realized
volatility. Econometrica 71(2), 579–625 (2003)
7. Andersen, T.G., Bollerslev, T., Lange, S.: Forecasting financial market volatility: sample
frequency vis-a-vis forecast horizon. J. Empirical Finance 6(5), 457–477 (1999)
8. Andersen, T.G., Bollerslev, T., Meddahi, N.: Correcting the errors: volatility forecast evaluation
using high-frequency data and realized volatilities. Econometrica 73(1), 279–296 (2005)
9. Hansen, P.R., Lunde, A.: Realized variance and market microstructure noise. J. Bus. Econ.
Stat. 24(2), 127–161 (2006)
10. Koopman, S.J., Scharth, M.: The analysis of stochastic volatility in the presence of daily
realized measures. J. Financ. Econometrics 11(1), 76–115 (2012)
11. Takahashi, M., Omori, Y., Watanabe, T.: Estimating stochastic volatility models using daily
returns and realized volatility simultaneously. Comput. Stat. Data Anal. 53(6), 2404–2426
(2009)
12. Barndorff-Nielsen, O.E., Shephard, N.: Econometric analysis of realized volatility and its use
in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 64(2), 253–
280 (2002)
13. Barndorff-Nielsen, O.E., Shephard, N.: Econometric analysis of realized covariation: high
frequency based covariance, regression, and correlation in financial economics. Econometrica
72(3), 885–925 (2004)
14. Zhang, L., Mykland, P.A., Aït-Sahalia, Y.: A tale of two time scales. J. Am. Stat. Assoc.
100(472) (2005)
15. Jacquier, E., Polson, N.G., Rossi, P.E.: Bayesian analysis of stochastic volatility models. J.
Bus. Econ. Stat. 12(4), 371–89 (1994)
16. Shephard, N., Pitt, M.K.: Likelihood analysis of non-gaussian measurement time series.
Biometrika 84(3), 653–667 (1997)
17. Kim, S., Shephard, N., Chib, S.: Stochastic volatility: likelihood inference and comparison
with arch models. Rev. Econ. Stud. 65(3), 361–393 (1998)
18. Chib, S., Nardari, F., Shephard, N.: Markov Chain Monte Carlo methods for stochastic
volatility models. J. Econometrics 108(2), 281–316 (2002)
19. Chib, S., Nardari, F., Shephard, N.: Analysis of high dimensional multivariate stochastic
volatility models. J. Econometrics 134(2), 341–371 (2006)
20. Omori, Y., Chib, S., Shephard, N., Nakajima, J.: Stochastic volatility with leverage: fast and
efficient likelihood inference. J. Econometrics 140(2), 425–449 (2007)
21. Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov Chain Monte Carlo methods. J. R.
Stat. Soc. Ser. B (Stat. Methodol.) 72(3), 269–342 (2010)

22. Carpenter, J., Clifford, P., Fearnhead, P.: Improved particle filter for nonlinear problems. IEE
Proc. Radar Sonar Navig. 146(1), 2–7 (1999)
23. Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B
(Stat. Methodol.) 68(3), 411–436 (2006)
24. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for
Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000)
25. Fearnhead, P., Wyncoll, D., Tawn, J.: A sequential smoothing algorithm with linear computa-
tional cost. Biometrika 97(2), 447–464 (2010)
26. Godsill, S., Clapp, T.: Improvement strategies for Monte Carlo particle filters. In: Sequential
Monte Carlo Methods in Practice, pp. 139–158. Springer, Berlin (2001)
27. Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc.
94(446), 590–599 (1999)
28. Pitt, M.K., Shephard, N.: Auxiliary variable based particle filters. In: Sequential Monte Carlo
Methods in Practice, pp. 273–293. Springer, Berlin (2001)
29. Smith, J., Santos, A.A.F.: Second-order filter distribution approximations for financial time
series with extreme outliers. J. Bus. Econ. Stat. 24(3), 329–337 (2006)
30. Gordon, N.J., Salmond, D.J., Smith, A.F.: Novel approach to nonlinear/non-gaussian Bayesian
state estimation. In: IEE Proceedings F (Radar and Signal Processing), IET, vol. 140, pp.
107–113. (1993)
Predictive and Descriptive Qualities of Different
Classes of Models for Parallel Economic
Development of Selected EU-Countries

Jozef Komorník and Magdaléna Komorníková

Abstract In this paper, we extend and modify our modelling [presented in the con-
ference paper (Komorník and Komorníková, Predictive and descriptive models of
mutual development of economic growth of Germany and selected non-traditional
EU countries. In: ITISE 2015, International Work-Conference on Time Series,
pp. 55–64. Copicentro Granada S.L, 2015)] of the parallel development of GDP
of Germany (as the strongest EU economy), the so-called V4 countries (Poland,
the Czech Republic, Hungary, Slovakia) and Greece (as the most problematic
EU economy). Unlike in Komorník and Komorníková (Predictive and descriptive
models of mutual development of economic growth of Germany and selected non-
traditional EU countries. In: ITISE 2015, International Work-Conference on Time
Series, pp. 55–64. Copicentro Granada S.L, 2015), we analyse the data provided
by OECD (freely available from http://stats.oecd.org/index.aspx?queryid=218) that
are expressed in USD (using the expenditure approach) and covering a longer time
interval than our former data from EUROSTAT (http://appsso.eurostat.ec.europa.
eu/nui/show.do?wai=true&data-set=namq_10_gdp) (expressed in EUR using the
output approach). The best predictive quality models were found in the class of
multivariate TAR (Threshold Autoregressive) models with aggregation functions’
type thresholds. On the other hand, the best descriptive quality models were found
in the competing classes of one-dimensional MSW (Markov Switching) and STAR
(Smooth Transition Autoregressive) models.

Keywords Aggregation function • Multivariate TAR models • OMA function •


One-dimensional TAR • OWA function • Predictive and descriptive models •
STAR and MSW models • Time series

J. Komorník ()
Faculty of Management, Comenius University, Odbojárov 10, P.O. Box 95, 820 05 Bratislava,
Slovakia
e-mail: [email protected]
M. Komorníková
Faculty of Civil Engineering, Slovak University of Technology, Radlinského 11, 810 05
Bratislava, Slovakia
e-mail: [email protected]


1 Introduction

It is well known that, shortly after the dramatic political changes two and a half decades ago, the foreign trade of the so-called Visegrad group (V4) countries: Poland (Pl), the Czech Republic (Cz), Hungary (Hu) and Slovakia (Sk), was largely reoriented from the former Soviet Union to the EU (and predominantly to Germany).
These intensive trade relations have greatly influenced the economic development
of the V4 countries. In Figs. 1, 2 and 3 we present the parallel development of
the growth of GDP (in %) in the period 2000Q1–2015Q1 for Germany (De), V4
countries and Greece (Gr). These graphs have been calculated from the seasonally
adjusted quarterly data of GDP provided by the OECD [12] (expressed in USD
applying the expenditure approach).
Since the first three of the above V4 countries still use their national currencies,
using USD (as a neutral currency) provides a more balanced approach to the data of
the individual considered countries.
We can observe a parallel dramatic drop of GDP during the short period of the global financial markets' crisis around 2009 (which was most severe in the case of Slovakia), followed by a subsequent moderate recovery in all considered countries (except for Greece, where the period of negative GDP growth lasted for more than

Fig. 1 The development of the GDP growth (in %) for Germany (black) and (gray) Czech
Republic—left, Slovakia—right

Fig. 2 The development of the GDP growth (in %) for Germany (black) and (gray) Poland—left,
Hungary—right

Fig. 3 The development of the GDP growth (in %) for Germany (black) and Greece (gray)

Table 1 The correlation matrix (in each row maximal values are in bold and minimal values are in italic)

      De     Cz     Pl     Hu     Sk     Gr
De    1      0.651  0.267  0.584  0.567  0.201
Cz    0.651  1      0.363  0.712  0.698  0.474
Pl    0.267  0.363  1      0.135  0.185  0.037
Hu    0.584  0.712  0.135  1      0.464  0.445
Sk    0.567  0.698  0.185  0.464  1      0.415
Gr    0.201  0.474  0.037  0.445  0.415  1

4 years starting in 2010). It is also noticeable that slight problems with the German
GDP growth around 2013 are accompanied by similar problems in the last three of
the V4 countries (Cz, Hu, Sk). The Polish economy is the largest in the V4 group
and seems to be more robust than the other three.
As we can see in Table 1, the German data exhibit high correlations with the last
three of the V4 countries (Cz, Hu, Sk) with the maximum reached by the Czech
Republic. Low values of correlations of the Polish data with the remaining ones
correspond to a stronger robustness of the Polish economy (which is by far the
largest in the V4 group). The largest correlations among all considered pairs of countries occur between members of the triple Cz, Hu, Sk, but that slight numerical paradox cannot weaken the significance of the clear German dominance of the foreign trade of those three countries.
The rest of the paper has the following structure. The second part briefly presents
theoretical foundations of the used modelling methodology and the third part
contains the results of our calculations. Finally, related conclusions are outlined.

2 Theoretical Background

2.1 Aggregation Functions

Aggregation functions have been a genuine part of mathematics for years. Recall,
e.g., several types of means (e.g. Heronian means) considered in ancient Greece.
The theory of aggregation functions, however, has only been established recently,
and its overview can be found in [2, 5], among others. We will deal with inputs and
outputs from the real unit interval [0, 1] only, though, in general, any real interval I could be considered. Formally, an aggregation function assigns to an m-tuple of inputs one representative value (this value does not need to be from the input set; compare the arithmetic mean, for example).
Recall that an aggregation function is defined, for a fixed m ∈ ℕ, m ≥ 2, as a mapping (see [2, 5])

A : [0, 1]^m → [0, 1]

that is non-decreasing in all its components and satisfies A(0, ..., 0) = 0 and A(1, ..., 1) = 1.
Definition 1 (see [2, 5]) An aggregation function A is called
– idempotent if A(x, ..., x) = x for all x ∈ [0, 1],
– additive if A(x + y) = A(x) + A(y) for all x, y ∈ [0, 1]^m with x + y ∈ [0, 1]^m,
– modular if A(x ∨ y) + A(x ∧ y) = A(x) + A(y) for all x, y ∈ [0, 1]^m,
where x ∨ y = (max(x_1, y_1), ..., max(x_m, y_m)) and x ∧ y = (min(x_1, y_1), ..., min(x_m, y_m)).
Note that any additive aggregation function is idempotent and modular.
Each additive aggregation function is a weighted average

WA(x) = \sum_{i=1}^{m} w_i x_i   (1)

for given non-negative weights w_1, ..., w_m with \sum_{i=1}^{m} w_i = 1.
Yager [17, 18] introduced the OWA function as a symmetrization of a weighted arithmetic mean, i.e. of an additive aggregation function.
Definition 2 (See Definition 2 in [11]) Let A : [0, 1]^m → [0, 1] be an aggregation function. The symmetrized aggregation function S_A : [0, 1]^m → [0, 1] is given by

S_A(x) = A(x_{σ(1)}, ..., x_{σ(m)}),   (2)

where σ is a permutation of (1, ..., m) such that x_{σ(1)} ≥ ... ≥ x_{σ(m)}. If A = WA is a weighted average (with weights w_1, ..., w_m), then S_A = OWA is called an ordered weighted average

S_A(x) = OWA(x) = \sum_{i=1}^{m} w_i x_{σ(i)}.   (3)

In [11], Mesiar and Zemánková introduced the OMA function as a symmetrization of a modular average, i.e. of a modular aggregation function.
Definition 3 (See [11]) Let A : [0, 1]^m → [0, 1] be an idempotent modular aggregation function (modular average), A(x) = \sum_{i=1}^{m} f_i(x_i), where f_1, ..., f_m : [0, 1] → [0, 1] are non-decreasing functions satisfying \sum_{i=1}^{m} f_i = id (identity on [0, 1]). Then its symmetrization S_A is called OMA (ordered modular average), i.e. S_A = OMA is given by

S_A(x) = OMA(x) = \sum_{i=1}^{m} f_i(x_{σ(i)}),   (4)

where the permutation σ satisfies x_{σ(1)} ≥ ... ≥ x_{σ(m)}.

The following way to define the functions f_i has been used. Let (w_1, ..., w_m) ∈ [0, 1]^m, \sum_{i=1}^{m} w_i = 1, be some weights. Define v_i = \sum_{j=1}^{i} w_j, i.e. v_0 = 0, v_1 = w_1, v_2 = w_1 + w_2, ..., v_m = 1. Let C : [0, 1]^2 → [0, 1] be a copula function [5]. Then

f_i(x) = C(x, v_i) − C(x, v_{i−1}).   (5)

In this paper the copula function C^{(M)}(u, v) = Min(u, v) = min(u, v) has been used. The corresponding functions f_i^{(M)} are

f_i^{(M)}(x) = min(max(x − v_{i−1}, 0), w_i).   (6)

Note that all OWA functions present a specific subclass of OMA functions and can be expressed in the form of (4) and (5) for the product copula C(u, v) = Π(u, v) = u·v, corresponding to all couples of random vectors (X, Y) with independent components and continuous marginal distribution functions.
Remark Observe that the above-mentioned aggregation functions can be seen as particular integrals. Indeed, weighted averages WA coincide with the Lebesgue integral based on a probability measure p : 2^{{1,...,m}} → [0, 1], p(E) = \sum_{i∈E} w_i. Similarly, the OWA function is the Choquet integral based on a symmetric capacity ν : 2^{{1,...,m}} → [0, 1], ν(E) = \sum_{j=1}^{card E} w_j = v_{card E}. OMA functions are copula-based integrals based on ν defined above. When C = Min, see (6), then the Sugeno integral based on ν is recovered. For more details see [7].
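To make the definitions above concrete, the following Python sketch implements the WA, OWA and OMA functions of (1), (3) and (4)–(6), the latter with the Min copula; the function names and the example weights are ours, not part of the original chapter.

    import numpy as np

    def wa(x, w):
        """Weighted average, Eq. (1)."""
        return float(np.dot(w, x))

    def owa(x, w):
        """Ordered weighted average, Eq. (3): weights applied to the inputs
        sorted in decreasing order."""
        return float(np.dot(w, np.sort(x)[::-1]))

    def oma_min(x, w):
        """Ordered modular average, Eqs. (4)-(6), with the Min copula,
        i.e. f_i(x) = min(max(x - v_{i-1}, 0), w_i)."""
        v = np.concatenate(([0.0], np.cumsum(w)))         # v_0, ..., v_m
        xs = np.sort(x)[::-1]                             # x_{sigma(1)} >= ... >= x_{sigma(m)}
        fi = np.minimum(np.maximum(xs - v[:-1], 0.0), w)  # f_i(x_{sigma(i)})
        return float(fi.sum())

    # Example with Sierpinski-carpet weights for m = 3 and p = 0.5, cf. Eq. (11):
    # w = (p**2, (1-p)*p, (1-p)) = (0.25, 0.25, 0.5)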

2.2 Regime-Switching Models

Regime-switching models have become an interesting tool for the modelling of non-linear time series. The reason is their great ability to adjust to the dynamic behaviour of time series which are influenced by dramatic and occasional events such as financial crises, wars, natural disasters, changes of political scenes and so on. All these events can influence the behaviour or the trend of economic and financial time series and many others. Such discrete shifts can cause changes in model parameters. For instance, the expected value can attain two or more different values according to which "regime" or "state" the model is in. We focus on models that assume that in each regime the dynamic behaviour of the time series is determined by an Autoregressive (AR) model, such as Threshold AR (TAR), Self-exciting Threshold AR (SETAR) and Smooth Transition AR (STAR) models. We also cover more general Markov switching (MSW) models using state space representations.
We distinguish between two types of regime-switching models. The models of the first type assume that the regimes can be characterized (or determined) by an observable variable. To this type belong, e.g., TAR, SETAR and STAR models. The models of the second type (to which Markov-switching models belong) assume that the regime cannot actually be observed but is determined by an underlying unobservable stochastic process (for more details see, e.g., [4]).

2.2.1 TAR Models

TAR models were first proposed by Tong [14] and discussed in detail in Tong [15].
The TAR models are simple and easy to understand, but rich enough to generate
complex non-linear dynamics.
A standard form of the k-regime TAR for an m-dimensional time series y_t = (y_{1,t}, ..., y_{m,t}), t = 1, ..., n, is given by (see, e.g., [16])

y_t = Φ_0^{(j)} + \sum_{i=1}^{p_j} Φ_i^{(j)} y_{t−i} + ε_t^{(j)}   if c_{j−1} < q_t ≤ c_j,   t = p + 1, ..., n,   (7)

where n is the length of the time series, p = max(p_1, ..., p_k), p_j is the order of the autoregressive model in regime j = 1, ..., k with corresponding m × m matrices Φ_i^{(j)}, i = 1, ..., p_j, of model coefficients, Φ_0^{(j)} are constant vectors of dimension m, q_t is an observable threshold variable, c_j, j = 0, 1, ..., k, are threshold values that delimit the individual regimes of the model (c_0 = −∞, c_k = +∞), and ε_t^{(j)} is an m × 1 unobservable zero-mean normal white noise vector process (serially uncorrelated) with time-invariant covariance matrix Σ_j.
The methods of estimation of the parameters of the TAR model will be discussed in a subsequent sub-section of this paper. The selection between k-regime TAR models will be based on the Akaike information criterion (see, e.g., [1, 8, 16])

AIC = \sum_{j=1}^{k} n_j ln(det(Σ̂_j)) + 2m(mp + 1),   (8)

where n_j, j = 1, ..., k, are the numbers of data in the jth regime and Σ̂_j is a strongly consistent estimator of the covariance matrix Σ_j of the residuals ε^{(j)}.
For the multi-dimensional TAR model (m > 1) the notation MTAR is used.

2.2.2 STAR Models

In the TAR models, a regime switch happens when the threshold variable crosses a certain threshold. If the discontinuity of the threshold is replaced by a smooth transition function 0 < F(q_t; γ, c) < 1 (where the parameter c can be interpreted as the threshold, as in TAR models, and the parameter γ determines the speed and smoothness of the change in the value of the transition function), TAR models can be generalized to STAR models [4, 13]. The k-regime one-dimensional STAR model is given by (see, e.g., [4, 13])

y_t = Φ_1 · X_t + \sum_{i=1}^{k−1} (Φ_{i+1} − Φ_i) · X_t F(q_t; γ_i, c_i) + ε_t,   t = p + 1, ..., n,   (9)

where Φ_i = (φ_{0,i}, ..., φ_{p,i}), i = 1, 2, ..., k, are autoregressive coefficients, X_t = (1, y_{t−1}, ..., y_{t−p})′, c_1 < c_2 < ... < c_{k−1} are threshold values, γ_i > 0, i = 1, ..., k − 1, and the ε_t's form a strict white noise process with E[ε_t] = 0 and D[ε_t] = σ_ε² for t = 1, ..., n.
Two classes of the so-called transition function F(q_t; γ, c) will be applied:
1. The logistic function F_L(q_t; γ, c) = 1/(1 + e^{−γ(q_t − c)}), γ > 0. The resulting model is called a Logistic STAR (LSTAR) model.
2. The exponential function F_E(q_t; γ, c) = 1 − e^{−γ(q_t − c)²}, γ > 0, and the resulting model is called an Exponential STAR (ESTAR) model.
The following properties can be readily observed [15]:
1. If γ is small, both transition functions switch between 0 and 1 very smoothly and slowly; if γ is large, both transition functions switch between 0 and 1 more quickly.
2. As γ → ∞, both transition functions become binary:

lim_{γ→∞} F_L(q_t; γ, c) = 0 if q_t < c and 1 if q_t > c,   lim_{γ→∞} F_E(q_t; γ, c) = 1  ⇒

the logistic function approaches the indicator function I(q_t > c) and the LSTAR model reduces to a TAR model, while the ESTAR model is reduced to the linear model.
3.

lim_{q_t→−∞} F_L(q_t; γ, c) = 0,   lim_{q_t→+∞} F_L(q_t; γ, c) = 1,   lim_{q_t→c} F_L(q_t; γ, c) = 1/2;

the logistic function is monotonic and the LSTAR model switches between two regimes smoothly depending on how much the threshold variable q_t is smaller than or greater than the threshold c.

lim_{q_t→−∞} F_E(q_t; γ, c) = lim_{q_t→+∞} F_E(q_t; γ, c) = 1,   lim_{q_t→c} F_E(q_t; γ, c) = 0;

the exponential function is symmetrical and the ESTAR model switches between two regimes smoothly depending on how far the threshold variable q_t is from the threshold c.
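A direct transcription of the two transition functions into Python reads as follows (the function names are ours):

    import numpy as np

    def f_logistic(q, gamma, c):
        """Logistic transition function of the LSTAR model."""
        return 1.0 / (1.0 + np.exp(-gamma * (q - c)))

    def f_exponential(q, gamma, c):
        """Exponential transition function of the ESTAR model."""
        return 1.0 - np.exp(-gamma * (q - c) ** 2)

    # For large gamma, f_logistic approaches the indicator I(q > c), while
    # f_exponential approaches 1 everywhere except at q = c.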
In this paper, for the one-dimensional TAR and STAR models, two alternative approaches to the construction of the threshold variable q_t will be used. The first one is the traditional self-exciting one, where q_t = y_{t−d} for a suitable time delay d > 0. We denote this class of TAR models as SETAR (Self-Exciting TAR) models. The second approach uses an exogenous threshold variable, namely the delayed values of the German data.
For the m-dimensional MTAR models (m = 6), again two alternative approaches to the construction of the threshold variable q_t will be used. The first one uses the delayed values of the German data. The second approach is to use the outputs of aggregation functions in the role of threshold variables. For a suitable aggregation function A the value of the threshold variable is

q_t = A(y_{1,t−d}, ..., y_{m,t−d})   (10)

for a suitable delay d > 0.


In the role of aggregation functions, the following types were chosen: projection to the first coordinate (German data) with w_1 = 1 and w_i = 0 otherwise, the Arithmetic Mean (M) with w_i = 1/m, i = 1, ..., m, and the Weighted Average (WA) and Ordered Weighted Average (OWA) with the Sierpinski carpet S related to a probability p (see, e.g., [2, 5])

S = (w_{1,m}, ..., w_{m,m}),   w_{1,m} = p^{m−1},   w_{i,m} = (1 − p) p^{m−i} for i > 1.   (11)

From the class of OWA operators we used the MIN function (with w_m = 1 and w_i = 0 otherwise) and the MAX function (with w_1 = 1 and w_i = 0 otherwise), corresponding to the extremal cases. Finally, the Ordered Modular Average (OMA) function with the Sierpinski carpet S and the functions f_i^{(M)} of (6) and (11) was also used.
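As an illustration, the construction of the threshold variable (10) with Sierpinski-carpet weights (11) could be coded as follows, reusing the owa or oma_min functions sketched in Sect. 2.1; the data are assumed to be already scaled to [0, 1] as described in Sect. 2.3, and the value p = 0.5 as well as all names are illustrative only.

    import numpy as np

    def sierpinski_weights(m, p):
        """Sierpinski-carpet weights of Eq. (11): w_1 = p**(m-1),
        w_i = (1-p) * p**(m-i) for i > 1."""
        return np.array([p ** (m - 1)] +
                        [(1 - p) * p ** (m - i) for i in range(2, m + 1)])

    def threshold_series(Y, d, aggregate, w):
        """Threshold variable q_t = A(y_{1,t-d}, ..., y_{m,t-d}) of Eq. (10),
        where `aggregate` is one of the aggregation functions (e.g. owa, oma_min)
        and Y is an (n x m) array of scaled observations."""
        n = Y.shape[0]
        return np.array([aggregate(Y[t - d], w) for t in range(d, n)])

    # e.g. q = threshold_series(gdp_growth_scaled, d=1, aggregate=oma_min,
    #                           w=sierpinski_weights(m=6, p=0.5))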

2.2.3 Markov-Switching Models MSW

Discrete state Markov processes are very popular choices for modelling state-
dependent behaviour in natural phenomena, and are natural candidates for modelling
the hidden state variables in Markov switching models.
Assuming that the random variable s_t can attain only values from the set {1, 2, ..., k}, the basic k-regime autoregressive Markov-switching model is given by [6]

y_t = φ_{0,s_t} + φ_{1,s_t} y_{t−1} + ... + φ_{p,s_t} y_{t−p} + ε_t,   s_t = 1, ..., k,   (12)

where s_t is a discrete ergodic first-order Markov process, the model is described in the particular regimes by an autoregressive model AR(p), and t = p + 1, ..., n, where n is the length of the time series. We assume that ε_t ~ i.i.d. N(0, σ_ε²).
The Akaike information criterion (AIC) [10] has the form

AIC = −2 ln L(θ̂_ML | Ω_n) + 2r,   (13)

where θ̂_ML is the maximum likelihood estimate of the parameter vector, L(·) is the likelihood function of the model, Ω_n denotes the history of the time series up to and including the observation at time n, r is the number of independent model parameters and n is the length of the time series.

2.3 Model Specification

In order to be able to utilize TAR models with threshold variables obtained by applications of aggregation functions to the values of the considered time series, we transformed them into the interval [0, 1] by the linear transformation

(y − y_min) / (y_max − y_min).

For the evaluation of the predictive qualities of the investigated models (measured by the Root Mean Square Error, RMSE, criterion) for one-step-ahead predictions (see, e.g., [4]), the most recent 9 observations (2013Q1–2015Q1) were left out from all individual time series, and the remaining earlier data (2000Q1–2012Q4) were used for the construction of the models. All calculations were performed using the system Mathematica, version 10.2.
Inspired by the recommendation from Franses and Dijk [4] the following steps
have been applied:
1. specification of an appropriate linear AR(p) model for the time series under
investigation,

2. testing the null hypothesis of linearity against the alternative of regime-switching non-linearity,
3. estimation of the parameters in the selected non-linear model,
4. evaluation of the model using diagnostic tests,
5. modification of the model if necessary,
6. using the model for descriptive or forecasting purposes.
When testing the null hypothesis of linearity against the alternative of TAR-, MTAR- or MSW-type non-linearity, we need to know the parameter estimates of the non-linear model. Therefore, in this case we start the specification procedure for the model with the estimation of the parameters.
For multivariate TAR models the approach of Bacigál [1], Komorník and Komorníková [8], and Tsay [16] was followed. First, an optimal fit of the data in the class of multivariate linear autoregressive models was found (using the Levinson–Durbin estimation procedure and minimizing the information criterion AIC, see, e.g., [4, 16]). This model was used for testing the linearity hypothesis (H_0) against 2-regime alternatives. The order of the autoregression for this model was considered as the upper bound for the autoregression orders of all regime-related autoregressive models. Next, estimates of the 2-regime MTAR model alternative with a threshold variable q_t [satisfying (10)] were calculated [1, 8, 16]. Testing the linearity against the 2-regime MTAR model alternative was based on the method of arranged regression described in [16].
For competing LSTAR, ESTAR and MSW models we applied standard proce-
dures of model specification described, e.g., in [4].
For each of the considered six time series, one-dimensional SETAR, LSTAR and
ESTAR models (where the threshold variables are delayed values of the same time
series) and MSW models were first constructed, followed by one-dimensional TAR,
LSTAR and ESTAR models with delayed values of the dominating German data
as the exogenous threshold variables for the remaining five time series. For both
above approaches and for each model, two as well as three regimes were considered. First, conditional ordinary least squares (OLS) estimates of the regression parameters (separately for each considered regime) and the corresponding conditional residual variance were calculated. Further, the optimal form of the threshold variable (i.e. the optimal value of the delay d > 0) was selected by minimizing the Akaike information criterion AIC (8). Then a test of linearity against the alternative of the considered type of non-linearity was performed. Finally, the residuals were tested for serial correlations and remaining non-linearity.
The selection of the best model for each investigated class of models and each considered country was then performed by minimizing the AIC criterion (8) [or (13) for MSW models].

3 Modelling Results

Table 2 contains the indicators of descriptive quality, expressed by the estimated standard deviations of the residuals. Table 3 presents out-of-sample indicators of the predictive quality of the different models, measured by the one-step-ahead RMSE of the forecasts for the nine most recent of the considered quarters for the individual countries.
The first rows of Tables 2 and 3 correspond to the one-dimensional SETAR models (with q_t = y_{t−d}). The next two rows of these tables contain the results of one-dimensional and six-dimensional TAR models with the threshold variables provided by the delayed German data.

Table 2 The descriptive errors (σ_ε) of selected models (for individual countries)

                          Standard deviation of residuals σ_ε
Model   Threshold     De     Cz     Pl     Hu     Sk     Gr
SETAR                 0.118  0.114  0.204  0.123  0.047  0.148
TAR     De            0.118  0.115  0.206  0.124  0.052  0.118
MTAR    De            0.139  0.119  0.280  0.140  0.156  0.215
MTAR    WA_SC         0.143  0.125  0.279  0.151  0.164  0.227
MTAR    OWA_SC        0.146  0.132  0.278  0.149  0.169  0.232
MTAR    OMA_SC,Min    0.138  0.126  0.226  0.148  0.164  0.224
LSTAR                 0.091  0.073  0.121  0.082  0.064  0.143
LSTAR   De            0.091  0.074  0.114  0.085  0.094  0.098
ESTAR                 0.089  0.068  0.120  0.070  0.048  0.143
ESTAR   De            0.089  0.067  0.114  0.074  0.090  0.136
MSW                   0.047  0.082  0.107  0.052  0.028  0.087
Values in bold represent the optimal models

Table 3 The predictive errors (RMSE) of selected models (for individual countries)

                          Prediction errors RMSE for 8 one-step-ahead forecasts
Model   Threshold     De     Cz     Pl     Hu     Sk     Gr
SETAR                 0.037  0.035  0.064  0.053  0.004  0.088
TAR     De            0.037  0.034  0.064  0.035  0.034  0.087
MTAR    De            0.057  0.099  0.079  0.140  0.037  0.074
MTAR    WA_SC         0.020  0.104  0.054  0.053  0.026  0.054
MTAR    OWA_SC        0.060  0.083  0.095  0.060  0.023  0.067
MTAR    OMA_SC,Min    0.074  0.048  0.136  0.028  0.002  0.039
LSTAR                 0.072  0.164  0.262  0.215  0.063  0.292
LSTAR   De            0.072  0.139  0.369  0.262  0.036  0.130
ESTAR                 0.067  0.432  0.263  0.214  0.415  0.294
ESTAR   De            0.067  0.217  0.315  0.237  0.138  0.087
MSW                   0.261  0.374  0.307  0.180  0.187  0.257
Values in bold represent the optimal models

The next three rows of Tables 2 and 3 correspond to six-dimensional TAR models with threshold variables constructed by the use of the Sierpinski carpet given by (11) via the relations (1), (3) (for the two functions WA_SC and OWA_SC) and by (4)–(6) for the operator OMA_SC.
The best models in the classes LSTAR and ESTAR with endogenous or exogenous threshold variables, as well as in the MSW class (which correspond to the last five rows of Tables 2 and 3), were included in the final version of our conference paper [9], following the suggestions of one of its referees. As in [9], these classes again provide the best descriptive quality models (for all considered countries). However, the models with the best (out-of-sample) predictive qualities (for all considered countries) are in the classes that we investigated in the original version of [9]. All of them (except for the one for the Czech Republic) are included in the MTAR class with the threshold from the OMA (Hu, Sk, Gr) or WA (De, Pl) subclasses. For the data from the Czech Republic (which have the largest correlation with the German data), the optimal model for descriptive purposes is in the one-dimensional ESTAR class with the threshold variable provided by the delayed German data.
Note that the above separation between the classes of models providing the best descriptive and predictive models is somewhat more strongly marked than in [9].

4 Conclusions

The main results of our investigations are two-fold. From the economic point of view, we provided further evidence of the dominance of the German economy with respect to the other economies of the considered group of countries (especially the three smaller V4 economies, mainly the Czech one). On the other hand, a relatively more robust position of the (largest V4) Polish economy has been demonstrated.
Moreover, our analysis provides further empirical justification of applications of the OMA type of aggregation functions in the construction of threshold variables for MTAR models, with promising results concerning their potential out-of-sample predictive performance.

Acknowledgements This work was supported by the grants VEGA 1/0420/15 and APVV-14-
0013.

References

1. Bacigál, T.: Multivariate threshold autoregressive models in Geodesy. J. Electr. Eng. 55(12/s),
91–94 (2004)
2. Calvo, T., Kolesárová, A., Komorníková, M., Mesiar, R.: Aggregation operators: properties, classes and construction methods. In: Calvo, T., Mayor, G., Mesiar, R. (eds.) Aggregation Operators: New Trends and Applications. Studies in Fuzziness and Soft Computing, vol. 97, pp. 3–106. Physica Verlag, Heidelberg (2002)

3. EUROSTAT. http://appsso.eurostat.ec.europa.eu/nui/show.do?wai=true&data-set=namq_10_
gdp (2015)
4. Franses, P.H., Dijk, D.: Non-linear Time Series Models in Empirical Finance. Cambridge
University Press, Cambridge (2000)
5. Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E.: Aggregation Functions. Encyclopedia of
Mathematics and its Applications, vol. 127. Cambridge University Press, Cambridge (2009)
6. Hamilton, J.D.: A new approach to the economic analysis of nonstationary time series and the
business cycle. Econometrica 57(2), 357–384 (1989)
7. Klement, E.P., Mesiar R., Pap, E.: A universal integral as common frame for Choquet and
Sugeno integral. IEEE Trans. Fuzzy Syst. 18(1), art. no. 5361437, 178–187 (2010)
8. Komorník, J., Komorníková, M.: Applications of regime-switching models based on aggrega-
tion operators. Kybernetika 43(4), 431–442 (2007)
9. Komorník, J., Komorníková, M.: Predictive and descriptive models of mutual development
of economic growth of Germany and selected non-traditional EU countries. In: ITISE 2015,
International Work-Conference on Time Series, pp. 55–64. Copicentro Granada S.L (2015)
10. Linhart, H., Zucchini, W.: Model Selection. Wiley Series in Probability and Mathematical
Statistics. Wiley, New York (1986)
11. Mesiar, R., Mesiarová-Zemánková, A.: The ordered modular averages. IEEE Trans. Fuzzy
Syst. 19(1), 42–50 (2011)
12. OECD.Stat. http://stats.oecd.org/index.aspx?queryid=218 (2015)
13. Teräsvirta, T.: Specification, estimation, and evaluation of smooth transition autoregressive
models. J. Am. Stat. Assoc. 89, 208–218 (1994)
14. Tong, H.: On a threshold model. In: Chen, C.H. (ed.) Pattern Recognition and Signal Processing. Sijthoff & Noordhoff, Amsterdam (1978)
15. Tong, H.: Non-linear Time Series: A Dynamical System Approach. Oxford University Press,
Oxford (1990)
16. Tsay, R.S.: Testing and modeling multivariate threshold models. J. Am. Stat. Assoc. 93, 1188–
1202 (1998)
17. Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria decision
making. IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988)
18. Yager, R.R., Kacprzyk, J.: The Ordered Weighted Averaging Operators Theory and Applica-
tions. Kluwer, Boston, MA (1997)
Search and Evaluation of Stock Ranking Rules
Using Internet Activity Time Series
and Multiobjective Genetic Programming

Martin Jakubéci and Michal Greguš

Abstract Hundreds of millions of people are daily active on the internet. They view
webpages, search for different terms, post their thoughts, or write blogs. Time series
can be built from the popularity of different terms on webpages, search engines, or
social networks. It was already shown in multiple publications that popularity of
some terms on Google, Wikipedia, Twitter, or Facebook can predict moves on the
stock market. We are trying to find relations between internet popularity of company
names and the rank of the company’s stock. Popularity is represented by time
series of Google Trends data and Wikipedia view count data. We use multiobjective
genetic programming (MOGP) to find these relations. MOGP uses evolutionary operators to find tree-like solutions to multiobjective problems and has become popular in financial investing in recent years. In our implementation, the stock rank is used in an investment strategy to build stock portfolios; revenue and standard deviation are used as objectives. Solutions found by the MOGP algorithm show the relation between
the internet popularity and stock rank. It is also shown that such data can help to
achieve higher revenue with lower risk. Evaluation is done by comparing the results
with different investment strategies, not only the market index.

Keywords Internet activity • Multiobjective genetic programming • Portfolio

1 Introduction

Modern investment strategies are based on diversification, i.e. investing in multiple assets. The reason is to minimize risk [1–3]. Research in the area of financial forecasting and investing is mostly based on the use of historical stock prices. However, data from the activity of millions of users on the internet can forecast

M. Jakubéci () • M. Greguš


Department of Information Systems, Faculty of Management, Comenius University in Bratislava,
Bratislava, Slovakia
e-mail: [email protected]; [email protected]


future moves on the financial markets. Recent studies document substantial forecasting capability. Sources of internet activity data are, for example:
• Google Trends—they contain the search popularity of different terms in the Google search engine. Preis et al. [4] used the popularity of 98 search terms to build investment strategies.
• Wikipedia page views—Moat et al. [5] found out that market falls are preceded by an increased number of page views of financial terms and companies.
• Twitter posts—Ruiz et al. [6] found a correlation of posts about companies with trading volume and a weaker correlation with the stock price, which was used in a trading strategy.
We are using Google Trends and Wikipedia page views of the traded companies to rank their stock. This rank is then used to select stocks in an investment strategy. We use multiobjective genetic programming (MOGP) to find the model. The advantage of this algorithm is that it is able to search the space of possible models, represented by tree structures, while covering both revenue and risk goals. This representation is useful for complex nonlinear models with multiple inputs.
Genetic programming is an evolutionary optimization algorithm which searches for problem solutions. A solution is a program represented by a tree structure. The tree-based solutions are formed from two different sets of vertices. The first group are terminal symbols, for example, inputs, constants, or any method calls which do not accept any parameters; these are the leaves of the tree structure. The second set are nonterminals, or functions, that accept parameters: for example, arithmetic operators, logical operators, conditions, etc. They are expected to be type- and run-safe, so that the solutions can be executed to transform inputs to outputs. The first vertex in the tree is called the root, and the depth of every vertex is defined as its distance from the root. The first generation of solutions is created randomly. Every next generation is created by stochastic transformation of the previous generation. The transformation is done by applying operators which are inspired by evolution theory. These operators are mostly selection, mutation, and crossover [7]. Every next generation is expected to be better.
The quality of the solutions is evaluated by the fitness function. When dealing with multiobjective optimization, multiple fitness functions are required, one for every objective. There are many algorithms for handling multiple objectives in evolutionary algorithms; an overview can be found in [8].
2 Related Research

Evolutionary algorithms were applied on multiple problems in the area of financial


investing. Mullei and Beling [9] used genetic algorithm to find the weights of a
linear combination evaluating stock, with nine objectives. They compared the results
with a polynomial network without definitive conclusion. Becker et al. [10] were
able to outperform the market with their genetic programming solutions for picking
Search and Evaluation of Stock Ranking Rules Using Internet Activity Time. . . 177

stocks from S&P300 index with three objectives. Huang et al. [11] outperformed
the market with their single objective genetic algorithm, which was used to tune the
parameters of their fuzzy model. Then they improved the results by implementing a
two-objective NSGA II algorithm [12] and adding domain knowledge [13].
Another area of research deals with generating trading rules. These are tree structures that return a logical value, which decides whether to enter or leave the market. Allen and Karjalainen [14] were the first to experiment on the S&P 500 index,
but failed to beat the market. Neely [15] and other researchers added risk as a second
objective, but still failed to beat the market. Potvin et al. [16] were only able to beat
a stable or falling market. Becker and Seshadri [17] made some modifications to
the typical approach and beat the market. They used monthly data instead of daily,
reduced the number of functions, increased the number of precalculated indicators
(so in fact increasing domain knowledge), coevolved separate rules for selling and
buying, penalted more complex solutions, and took into account the number of
profitable periods instead of the total revenue. Lohpetch and Corne [18, 19] were
analyzing the differences between the different approaches and found out that longer
trading periods (monthly instead of daily) and introduction of a validation period
are causing better results. Their NSGA II implementation was able to beat the
single objective solution and also market and US bonds [20]. Briza and Naval [21]
used the multiobjective particle swarm algorithm with revenue and Sharpe ratio as
objectives and outperformed market and five indicators (moving averages, moving
average convergence—divergence, linear regression, parabolic stop, and reverse and
directional movement index) on training period, but failed to outperform market on
testing period.
A lot of research was done in the typical problem of portfolio optimization,
where the optimal weights of stocks in a portfolio are searched. Many single and
multiobjective evolutionary algorithms were used and compared, an overview was
done, for example, by [22] and [23].
Chen and Navet [24] criticize the research in the area of genetic programming
usage in investment strategies and suggest more pretesting. They compare strategies
with random strategies and lottery training without getting good results. Genetic
programming should prove its purpose by performing better than these random
strategies, according to them.
Most of the research in this area compares the results only with the simplest buy and hold strategy, although a wider evaluation should be done. We evaluate the
found rules by comparing them to different trading strategies, for example, random
solutions, bonds, and strategies based on technical or fundamental analysis.
We evaluate also the genetic programming itself by comparing it to the random
search algorithm. This is necessary to show that the algorithm is useful.

3 Goal

Our goal is to find a model for stock ranking based on its previous popularity
on the internet (based on Google Trends and Wikipedia page views). To evaluate
this ranking, we implement an investment strategy and compare its performance to
different strategies. We expect that our strategy can achieve the highest profit.

4 Methods

Genetic programming is used to generate stock evaluation rules using normalized internet popularity data from Google and Wikipedia (with the company names as terms). The Google data were downloaded from the Google Trends service (https://www.google.com/trends) and the Wikipedia data from the service Wikipedia article traffic statistics (http://stats.grok.se).
The rule is then used for daily investing in the 30 Dow Jones Industrial Average (DJIA) companies, because they serve as a good representation of the market. The following functions were used:
• Arithmetic operations: addition, subtraction, multiplication, division, negation,
and exponentiation,
• Equality: higher, lower, equal, or any combination,
• Trigonometric operations: sine, cosine, and
• List operations: lag, moving average.
We use two fitness methods:
• Average daily rate of return, RoR = (1/T) \sum_{t=1}^{T} (R_t − R_{t−1})/R_{t−1}, where R_t is the portfolio value at time t (R_0 being the initial and R_T the final portfolio value).
• Standard deviation of daily returns, StdDev σ = \sqrt{(1/T) \sum_{i=1}^{T} (r_i − μ)^2}, where T is the number of periods, r_i is the return rate at time i, and μ is the average RoR.
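A compact sketch of the two objective computations follows (in Python with illustrative names; the implementation in the paper is in C#):

    import numpy as np

    def fitness(portfolio_values):
        """Return the two objectives: average daily rate of return and the
        standard deviation of daily returns."""
        v = np.asarray(portfolio_values, dtype=float)
        daily = np.diff(v) / v[:-1]          # (R_t - R_{t-1}) / R_{t-1}
        return daily.mean(), daily.std()     # population std, i.e. 1/T normalization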
The SPEA2 algorithm was used to handle the multiple objectives, because it overcomes some issues of other algorithms. It is based on elitism: Pareto-dominant solutions are kept in a separate archive with a fixed size [22]. The algorithm works this way [29]:
1. Initialization—Generate an initial population and create the empty archive (external set). Set t = 0.
2. Fitness assignment—Calculate fitness values of individuals in the population and the archive.
3. Environmental selection—Copy all nondominated individuals in the population and the archive to the new archive. If the size of the new archive exceeds N, reduce the new archive by means of the truncation operator; otherwise, if the size of the new archive is less than N, fill the new archive with dominated individuals from the population and the archive.
4. Termination: If t ≥ T or another stopping criterion is satisfied, set A to the set of decision vectors represented by the nondominated individuals in the archive. Stop.
5. Mating selection: Perform binary tournament selection with replacement on the new archive in order to fill the mating pool.
6. Variation: Apply recombination and mutation operators to the mating pool and set the new population to the resulting population. Increment the generation counter (t = t + 1) and go to Step 2.
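The environmental-selection step (Step 3) is the part that differs most from a plain genetic algorithm; the following Python sketch illustrates it under simplifying assumptions (in particular, the SPEA2 k-th nearest neighbour truncation is replaced by repeatedly dropping the most crowded solution), so it is an illustration of the idea rather than the authors' implementation.

def dominates(a, b):
    # True if objective vector a Pareto-dominates b, all objectives minimized
    # (e.g. a = (-RoR, StdDev)).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def environmental_selection(population, archive, size):
    # Individuals are (solution, objective_vector) pairs.
    union = population + archive
    nondominated = [p for p in union if not any(dominates(q[1], p[1]) for q in union)]
    dominated = [p for p in union if p not in nondominated]
    new_archive = list(nondominated)
    if len(new_archive) > size:
        # Simplified truncation: repeatedly drop the solution closest to another one.
        def gap(p):
            return min(sum((x - y) ** 2 for x, y in zip(p[1], q[1]))
                       for q in new_archive if q is not p)
        while len(new_archive) > size:
            new_archive.remove(min(new_archive, key=gap))
    elif len(new_archive) < size:
        # Fill with the least-dominated of the remaining individuals.
        dominated.sort(key=lambda p: sum(dominates(q[1], p[1]) for q in union))
        new_archive.extend(dominated[:size - len(new_archive)])
    return new_archive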
We used only internet popularity data as input, in contrast to our previous efforts [25, 26], where we combined this data with historical prices. Transaction fees are ignored. The portfolio is updated daily: the ten companies with the highest rank are bought and the ten companies with the lowest rank are sold. Results are compared with the buy and hold strategy on the DJIA, as a representation of the market.
The implementation was done in the C# language, which has high performance but is still easy to use. The language integrates many features from dynamic programming languages, for example, expression trees, which allow working with an algorithm as a data structure. This is important for the genetic programming algorithm, because it allows modifications of the solutions and the application of the evolutionary operators. The MetaLinq library was used to simplify these modifications (http://metalinq.codeplex.com/). We used the Microsoft Automatic Graph Layout library for tree visualization (http://research.microsoft.com/en-us/projects/msagl/).
We compared the results with a set of investment strategies:
• Lottery trading makes decisions randomly. That means that it always gives a random evaluation of a stock.
• Random strategy is a randomly created strategy. Such strategies are created also
in the first generation of the genetic programming simulation.
• Risk-free investment is represented by 3-year US treasury bonds.
• Buy and hold strategy means that the asset is bought at the beginning of the period and sold at the end. It is the most basic strategy and it was applied to the DJI index.
• Dogs of the Dow strategy is investing to ten companies from the DJI index with
the highest dividend yield [27].
• Simple moving average (SMA) is calculated as an average of the previous days' prices; when the price rises above the moving average, the stock should be bought, and when it falls under the moving average, it should be sold [28] (a short sketch of these indicators follows this list).
• Exponential moving averages (EMA) is similar to the SMA, but with decreasing
effect of the older days in the calculation [28].
• Moving average convergence divergence (MACD) is calculated as a difference
between 26-period EMA and 12-period EMA, when it crosses the signal line
(EMA of MACD) from below, it is a buy signal [28].
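As a concrete reference for the three moving-average indicators above, the sketch below computes simple signals from a list of daily closing prices; the window lengths and the 12/26/9 MACD convention are the usual textbook choices and should be treated as assumptions rather than the authors' exact settings.

def sma(prices, window):
    # Simple moving average of the last 'window' prices.
    return sum(prices[-window:]) / window

def ema(prices, window):
    # Exponential moving average with smoothing factor 2 / (window + 1),
    # giving decreasing weight to older prices.
    k = 2.0 / (window + 1)
    value = prices[0]
    for p in prices[1:]:
        value = k * p + (1 - k) * value
    return value

def sma_signal(prices, window=20):
    # Buy when the price rises above its SMA, sell when it falls below.
    return "buy" if prices[-1] > sma(prices, window) else "sell"

def macd_signal(prices, signal_window=9):
    # MACD = EMA(12) - EMA(26); a buy signal occurs when the MACD line
    # crosses its signal line (the EMA of the MACD) from below.
    # Requires at least 27 prices.
    macd_series = [ema(prices[:t], 12) - ema(prices[:t], 26)
                   for t in range(26, len(prices) + 1)]
    signal_now = ema(macd_series, signal_window)
    signal_prev = ema(macd_series[:-1], signal_window)
    crossed_up = macd_series[-2] <= signal_prev and macd_series[-1] > signal_now
    return "buy" if crossed_up else "hold"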

5 Results

We ran the genetic programming multiple times to find Pareto fronts with different solutions. An example of such a solution in tree representation can be seen in Fig. 1. It can be written as Multiply(Multiply(lastGoogle, Cos(Multiply(lastGoogle, Sin(Lag(google, 49))))), Multiply(lastWiki, 0.38)). The meaning of some of the nodes:
• lastWiki and lastGoogle are the popularity values from the previous day,
• Google and Wiki are the popularity time series from the last 50 days, and
• Lag is a shift in the time series.
This shows a linear relation between the rank and the value of the Google and Wikipedia popularity from the last day. We can also see a negative correlation with the popularity from the day before. This negative relation is caused by the cosine operator in the normalized range of values. We can interpret this as a rule to buy a stock while its popularity is rising and to sell it when it decreases. This is the case for almost all solutions (92 %) from the Pareto fronts of multiple simulation runs.
Although a similar relation is present in most of the models, some (7 %) contain even an exponential relation. An example can be seen in Fig. 2.

Fig. 1 Sample solution for


stock ranking

Fig. 2 Exponential relation


for Wikipedia popularity

We used the ranking rule in an investment strategy and compared its RoR and StdDev with those of other strategies. We used the training period to find Pareto fronts, filtered the solutions in a validation period as suggested by [19], and evaluated the results in two different evaluation periods. Results are shown as averages over ten different runs.
First we compared the performance with the random search algorithm. Figure 3 shows that genetic programming is able to find better Pareto fronts. Table 1 shows that genetic programming outperforms the random search in both return and deviation. It also found larger Pareto fronts, which is useful for more diverse trading options with different revenue and risk combinations.
Next we compared the results with the different trading strategies in two time settings, both without a transaction fee and with a 0.5 % transaction fee. Results for the first time setting with a transaction fee are shown in Figures 4, 5, 6, and 7, where RoR is shown on the x axis and StdDev on the y axis. This setting consists of a training period 2010–2012, a validation period 2013, a first evaluation period covering the first half of 2014,

Fig. 3 Pareto fronts of random search and genetic programming

Table 1 Results of random search and genetic programming

                        Random search                Genetic programming
                        RoR      StdDev   #          RoR      StdDev   #
Training 2010–2012      0.0006   0.0125   28         0.0008   0.0102   63
Validation 2013         0.0006   0.0097   13         0.0008   0.0070   35
Evaluation 2014         0.0004   0.0062   13         0.0005   0.0049   35
Evaluation 2008–2009    0.0003   0.0243   13         0.0003   0.0239   35

Fig. 4 Pareto front of genetic programming and other investment strategies in the training period
of 2010–2012, including a transaction fee

Fig. 5 Pareto front of genetic programming and other investment strategies in the validation period
of 2013, including a transaction fee

Fig. 6 Results of genetic programming and other investment strategies in the evaluation period of
2014, including a transaction fee

Fig. 7 Results of genetic programming and other investment strategies in the evaluation period of
2008–2009, including a transaction fee

and a second evaluation period of 2008–2009. The periods were chosen to contain a rising market and also the falling market of the financial crisis.
Figure 4 shows the results in the training period. The line represents the different ranking rules found by genetic programming and the dots represent the other investment strategies; one of them is the genetic programming average. Genetic programming is able to achieve higher revenues while maintaining a lower deviation.
The same information is shown in Fig. 5, but for the validation period. These data were not used for training, so the results are a little worse, but genetic programming still has the highest revenue while having almost the lowest deviation.
Most interesting are the results in the evaluation periods. Figure 6 shows the results in 2014: genetic programming has the highest revenue with the lowest deviation, so it clearly outperforms all the other strategies.
The second evaluation period, shown in Fig. 7, is in fact the falling market of the financial crisis in the years 2008–2009. It can be seen that the market index and fundamental analysis are at a loss, outperformed even by the bonds. The technical analysis strategies have the lowest deviation with a small profit. Genetic programming again has the highest profit.
The results are summarized in Fig. 8, and it is obvious that genetic programming outperformed all the other strategies. The figure includes the information for both time settings and both transaction fee settings. Technical analysis, Dogs of the Dow, and the DJIA index performed quite well too, but were weak in the crisis periods.

Fig. 8 Average daily rate of return of genetic programming and other strategies

It’s obvious that a strategy based on internet popularity can be highly profitable.
More research and evaluation in this area is needed in the future.

6 Conclusion

We found that data about internet activity can be used to rank stocks and to achieve interesting revenues when using this rank for investing. We used a MOGP algorithm to find the relation between internet activity and stock rank.
We found that there is a positive correlation between the stock rank and the term popularity and also a negative correlation between the stock rank and the older popularity. This can be interpreted as a high rank for stocks with rising popularity and a low rank for falling popularity. This partly contradicts the findings of [4], whose strategy was based on selling when popularity was rising; however, they used weekly trading and we used daily trading, and this lag could cause the difference.
We compared the rules used in an investment strategy with other strategies, for example, random algorithms, the market index, and technical and fundamental strategies. Rules found by genetic programming are able to compete with them and even outperform them in most of the cases.
We suggest that more research is needed in this area. The found rules should be compared to more trading strategies, and evaluation on more data periods is needed.

Acknowledgment This research has been supported by a VUB grant no. 2015-3-02/5.

References

1. Bohdalová, M.: A comparison of value-at-risk methods for measurement of the financial risk.
In: The Proceedings of the E-Leader, pp. 1–6. CASA, New York, (2007)
2. Bohdalová, M., Šlahor, L.: Modeling of the risk factors in correlated markets using a
multivariate t-distributions. Appl. Nat. Sci. 2007, 162–172 (2007)
3. Bohdalová, M., Šlahor, L.: Simulations of the correlated financial risk factors. J. Appl. Math.
Stat. Inf. 4(1), 89–97 (2008)
4. Preis, T., Moat, S.H., Stanley, H.E.: Quantifying trading behavior in financial markets using
Google Trends. Sci. Rep. 3, 1684 (2013)
5. Moat, H.S., Curme, CH., Avakian, A., Kenett, D.Y., Stanley, H.E., Preis, T.: Quantifying
Wikipedia usage patterns before stock market moves. Sci. Rep. 3, 1801 (2013)
6. Ruiz, J.E., Hristidis, V., Castillo, C., Gionis, A., Jaimes, A.: Correlating financial time series
with micro-blogging activity. In: Proceedings of the Fifth ACM International Conference on
Web Search and Data Mining, ACM, New York, NY, USA, pp. 513–522 (2012)
7. Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Genetic Programming. http://www.
gp-field-guide.org.uk (2008)
8. Ghosh, A., Dehuri, S.: Evolutionary algorithms for multicriterion optimization: a survey. Int.
J. Comput. Inform. Sci. 2(1), 38–57 (2005)
9. Mullei, S., Beling, P.: Hybrid evolutionary algorithms for a multiobjective financial problem.
In: Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics,
11–14 October 1998, San Diego, CA, USA, vol. 4, pp. 3925–3930 (1998)

10. Becker, Y.L., Fei, P., Lester, A.: Stock selection—an innovative application of genetic
programming methodology. In: Genetic Programming Theory and Practice IV, pp. 315–334.
Springer, New York (2007)
11. Huang, C.F., Chang, C.H., Chang, B.R., Cheng, D.W.: A study of a hybrid evolutionary fuzzy
model for stock selection. In: Proceeding of the 2011 IEEE International Conference on Fuzzy
Systems, 27–30 June 2011, pp. 210–217. Taipei (2011)
12. Chen, S.S., Huang, C.F., Hong, T.P.: A multi-objective genetic model for stock selection.
Proceedings of The 27th Annual Conference of the Japanese Society for Artificial Intelligence,
Toyama, Japan, 4–7 June 2013
13. Chen, S.S., Huang, C.F., Hong, T.P.: An improved multi-objective genetic model for stock
selection with domain knowledge. In: Technologies and Applications of Artificial Intelligence,
Lecture Notes in Computer Science, vol. 8916, pp. 66–73. Springer (2014)
14. Allen, F., Karjalainen, R.: Using genetic algorithms to find technical trading rules. J. Financ.
Econ. 51, 245–271 (1999)
15. Neely, C.H.: Risk-adjusted, ex-ante, optimal technical trading rules in equity markets. Int. Rev.
Econ. Financ. 12, 69–87 (1999)
16. Potvin, J.Y., Soriano, P., Vallée, M.: Generating trading rules on the stock markets with genetic
programming. Comput. Oper. Res. 31(7), 1033–1047 (2004)
17. Becker, L.A., Seshadri, M.: GP-evolved technical rules can outperform buy and hold. In:
Proceedings of the 6th International Conference on Computational Intelligence and Natural
Computing, pp. 26–30 (2003)
18. Lohpetch, D., Corne, D.: Discovering effective technical trading rules with genetic pro-
gramming: towards robustly outperforming buy-and-hold. World Congress on Nature and
Biologically Inspired Computing, pp. 431–467 (2009)
19. Lohpetch, D., Corne, D.: Outperforming buy-and-hold with evolved technical trading rules:
daily, weekly and monthly trading. In: Proceedings of the 2010 International Conference on
Applications of Evolutionary Computation, vol. 6025, pp. 171–181. Valencia, Spain (2010)
20. Lohpetch, D., Corne, D.: Multiobjective algorithms for financial trading: Multiobjective
out-trades single-objective. IEEE Congress on Evolutionary Computation, 5–8 June 2011,
New Orleans, LA, USA, pp. 192–199 (2011)
21. Briza, A.C., Naval, P.C.: Design of stock trading system for historical market data using
multiobjective particle swarm optimization of technical indicators. In: Proceedings of the 2008
GECCO Conference Companion on Genetic and Evolutionary Computation, pp. 1871–1878.
Atlanta, Georgia, USA (2008)
22. Hassan, G.N.A.: Multiobjective genetic programming for financial portfolio management in
dynamic environments. Doctoral Dissertation, University College London (2010)
23. Tapia, G.C., Coello, C.A.: Applications of multi-objective evolutionary algorithms in eco-
nomics and finance: a survey. IEEE Congress on Evolutionary Computation, pp. 532–539
(2007)
24. Chen, S.H., Navet, N.: Failure of genetic-programming induced trading strategies: distin-
guishing between efficient markets and inefficient algorithms. Comput. Intell. Econ. Finan.
2, 169–182 (2007)
25. Jakubéci, M.: Výber portfólia akcií s využitím genetického programovania a údajov o pop-
ularite na internete. VII. mezinárodní vědecká konference doktorandů a mladých vědeckých
pracovníků, pp. 47–56. Opava, Czech Republic (2014)
26. Jakubéci, M.: Evaluation of investment strategies created by multiobjective genetic program-
ming. In: Proceedings of the 7th International Scientific Conference Finance and Performance
of Firms in Science, Education and Practice, 23–24 April 2015, pp. 498–509. Zlin, Czech
Republic (2015)
27. Domiana, D.L., Loutonb, D.A., Mossmanc, C.E.: The rise and fall of the Dogs of the Dow.
Financ. Serv. Rev. 7(3), 145–159 (1998)

28. Kirkpatrick, C., Dahlquist, J.: Technical analysis. FT Press, Upper Saddle River, NJ (2010)
29. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary
algorithm. In: Evolutionary Methods for Design, Optimisation and Control with Application
to Industrial Problems (EUROGEN 2001), pp. 95–100. International Center for Numerical
Methods in Engineering, Barcelona
Integer-Valued APARCH Processes

Maria da Conceição Costa, Manuel G. Scotto, and Isabel Pereira

Abstract The Asymmetric Power ARCH representation for the volatility was
introduced by Ding et al. (J Empir Financ 1:83–106, 1993) in order to account for
asymmetric responses in the volatility in the analysis of continuous-valued financial
time series like, for instance, the log-return series of foreign exchange rates, stock
indices, or share prices. As reported by Brännäs and Quoreshi (Appl Financ Econ
20:1429–1440, 2010), asymmetric responses in volatility are also observed in time
series of counts such as the number of intra-day transactions in stocks. In this work,
an asymmetric power autoregressive conditional Poisson model is introduced for
the analysis of time series of counts exhibiting asymmetric overdispersion. Basic
probabilistic and statistical properties are summarized and parameter estimation is
discussed. A simulation study is presented to illustrate the proposed model. Finally,
an empirical application to a set of data concerning the daily number of stock
transactions is also presented to attest for its practical applicability in data analysis.

Keywords Asymmetric volatility • Ergodicity • Heteroscedasticity • Nonlinear


time series • Overdispersion • Stationarity

1 Introduction

The analysis of continuous-valued financial time series like log-return series of foreign exchange rates, stock indices, or share prices has revealed some common features: sample means not significantly different from zero, sample variances of the order of 10^{-4} or smaller, and sample distributions roughly symmetric in their center, sharply peaked around zero but with a tendency to negative asymmetry. In
particular, it has usually been found that the conditional volatility of stocks responds

M.C. Costa () • I. Pereira


Departamento de Matemática and CIDMA, University of Aveiro, 3810-193 Aveiro, Portugal
e-mail: [email protected]; [email protected]
M.G. Scotto
Departamento de Matemática and CEMAT, Instituto Superior Técnico, University of Lisbon,
Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
e-mail: [email protected]


asymmetrically to positive versus negative shocks: volatility tends to rise higher in response to negative shocks as opposed to positive shocks, which is known as the leverage effect. To account for asymmetric responses in the volatility, Ding et al. [3] introduced the Asymmetric Power ARCH, APARCH$(p,q)$, in which

$$Y_t = \sigma_t Z_t, \qquad \sigma_t^{\delta} = \omega + \sum_{i=1}^{p} \alpha_i \left(|Y_{t-i}| - \gamma_i Y_{t-i}\right)^{\delta} + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^{\delta}, \quad t \in \mathbb{Z}, \qquad (1)$$

where $(Z_t)$ is an i.i.d. sequence with zero mean, $\omega > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$, $\delta \geq 0$, and $-1 < \gamma_i < 1$. The APARCH representation in (1) has some noteworthy advantages, namely the power of the returns for which the predictable structure in the volatility is the strongest is determined by the data, and the model allows the detection of asymmetric responses of the volatility to positive or negative shocks. If $\gamma_i > 0$, the leverage effect arises. The APARCH model includes as particular cases several well-known ARCH-type models such as the GARCH, the TARCH, and the NARCH models; see Turkman et al. [12] for further details.
Asymmetric responses in the volatility are also commonly observed in the analysis of time series representing the number of intra-day transactions in stocks, in which the counts are typically quite small [1]. As an illustration of this kind of data we present in Fig. 1 two time series of count data generated from stock transactions, namely tick-by-tick data for Glaxosmithkline and Astrazeneca downloaded from www.dukascopy.com. The data consist of the number of transactions per minute during one trading day (19/09/2012 for Glaxosmithkline and 21/09/2012 for Astrazeneca), corresponding to 501 observations for each series. Counts are typically small and both time series contain a large number of zeros. After download, the data were filtered by the authors in order to fill in the zero counts during the trading periods considered and to delete all trading during the first and the last 5 min of each day, as trading mechanisms may be different during the opening and closing of the stock exchange market.
The existence of this kind of data motivated our proposal for a counterpart of the APARCH representation of the volatility. To this end, in Sect. 2 an INGARCH-type model suitable to account for time series of counts exhibiting asymmetric overdispersion is introduced. Parameter estimation is covered in Sect. 3. In Sect. 4 a simulation study is carried out to illustrate the INAPARCH(1,1) model. A real-data application is given in Sect. 5.

2 Integer-Valued APARCH(p, q) Processes

In this work, focus is put on models in which the count variable is assumed to be Poisson distributed conditioned on the past, which is to say that the conditional distribution of the count variable, given the past, is assumed to be Poisson with time-varying mean $\lambda_t$ satisfying some autoregressive mechanism. An important family of such observation-driven models that is able to handle overdispersion is the class of

Fig. 1 Time series plots for Glaxosmithkline (top) and Astrazeneca (bottom)

Autoregressive Conditional Poisson (ACP), first introduced in [9], but also referred
to as the INGARCH model due to its analogy to the conventional GARCH model
(see Ferland et al. [5]).
An INteger-valued GARCH process of orders $(p,q)$, INGARCH$(p,q)$ in short, is defined to be an integer-valued process $(Y_t)$ such that, conditioned on the past experience, $Y_t$ is Poisson distributed with mean $\lambda_t$, and $\lambda_t$ is obtained recursively from the past values of the observable process $(Y_t)$ and $(\lambda_t)$ itself, that is

$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Po}(\lambda_t), \qquad \lambda_t = \gamma_0 + \sum_{i=1}^{p} \gamma_i Y_{t-i} + \sum_{j=1}^{q} \delta_j \lambda_{t-j}, \quad t \in \mathbb{Z},$$

where $\mathcal{F}_{t-1} := \sigma(Y_s,\ s \leq t-1)$, $\gamma_0 > 0$, $\gamma_i \geq 0$, and $\delta_j > 0$. In [5] it was shown that the process $(Y_t)$ is strictly stationary with finite first and second order moments provided that $\sum_{i=1}^{p} \gamma_i + \sum_{j=1}^{q} \delta_j < 1$. The particular case $p = q = 1$ was analyzed by Fokianos and Tjøstheim [6] and Fokianos et al. [7] under the designation of Poisson Autoregression. The authors considered linear and nonlinear models for $\lambda_t$.

For the linear model case the representation considered is as follows:

$$Y_t \mid \mathcal{F}_{t-1}^{Y,\lambda} \sim \mathrm{Po}(\lambda_t), \qquad \lambda_t = d + a\lambda_{t-1} + bY_{t-1}, \quad t \in \mathbb{N}, \qquad (2)$$

where it is assumed that the parameters $d$, $a$, $b$ are positive, and $\lambda_0$ and $Y_0$ are fixed. This representation corresponds exactly to the INGARCH(1,1) model in [5]; nevertheless, the approach followed by Fokianos et al. [7] is slightly different in the sense that the linear model is rephrased as $Y_t = N_t(\lambda_t)$, $t \in \mathbb{N}$, with $\lambda_t$ defined as in (2), and $\lambda_0$ and $Y_0$ fixed. For each time point $t$, the authors introduced a Poisson process of unit intensity, $N_t(\cdot)$, so that $N_t(\lambda_t)$ represents the number of such events in the time interval $[0, \lambda_t]$. Following this rephrasing, a perturbation is introduced in order to demonstrate $\phi$-irreducibility, and as a consequence geometric ergodicity follows. The nonlinear case is considered a generalization of the previous situation in which the conditional mean, $E[Y_t \mid \mathcal{F}_{t-1}^{Y,\lambda}] = \lambda_t$, is a nonlinear function of both the past values of $\lambda_t$ and the past values of the observations. Sufficient conditions to prove geometric ergodicity can be found in [7].
It is worth mentioning that the models above cannot cope with the presence of asymmetric overdispersion. This paper aims at giving a contribution in this direction by introducing the INteger-valued APARCH process.
Definition 1 (INAPARCH$(p,q)$ Model) An INteger-valued APARCH$(p,q)$ process is defined to be an integer-valued process $(Y_t)$ such that, conditioned on the past, the distribution of $Y_t$ is Poisson with mean value $\lambda_t$ satisfying the recursive equation

$$\lambda_t^{\delta} = \omega + \sum_{i=1}^{p} \alpha_i \left(|Y_{t-i} - \lambda_{t-i}| - \gamma_i (Y_{t-i} - \lambda_{t-i})\right)^{\delta} + \sum_{j=1}^{q} \beta_j \lambda_{t-j}^{\delta}, \quad t \in \mathbb{Z},$$

with $\omega > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$, $|\gamma_i| < 1$ and $\delta \geq 0$.

Following Doukhan et al. [4] (see also [2, 8, 11]), we will establish the existence and uniqueness of a stationary solution, and ergodicity, for the $p = q = 1$ case. The INAPARCH(1,1) process is defined as an integer-valued process such that

$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Po}(\lambda_t), \qquad \lambda_t^{\delta} = \omega + \alpha\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta} + \beta \lambda_{t-1}^{\delta}, \qquad (3)$$

with $t \in \mathbb{Z}$, $\alpha \equiv \alpha_1$, $\beta \equiv \beta_1$, and $\gamma \equiv \gamma_1$. The parameter $\gamma$ should reflect the leverage effect relative to the conditional mean of the process $(Y_t)$.
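To make the recursion in (3) concrete, the following Python sketch (an illustration, not part of the original chapter) simulates one INAPARCH(1,1) path; the example call uses the C2 parameter combination from the simulation study in Sect. 4, and the starting value lambda_0 is an assumption.

import numpy as np

def simulate_inaparch11(n, omega, alpha, gamma, beta, delta, lam0=1.0, seed=0):
    # Y_t | F_{t-1} ~ Poisson(lambda_t), with
    # lambda_t**delta = omega + alpha*(|Y_{t-1}-lambda_{t-1}|
    #                   - gamma*(Y_{t-1}-lambda_{t-1}))**delta + beta*lambda_{t-1}**delta.
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    y = np.empty(n, dtype=int)
    lam[0] = lam0                       # starting value lambda_0 (an assumption)
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        dev = y[t - 1] - lam[t - 1]
        g = abs(dev) - gamma * dev      # asymmetric deviation term (>= 0 since |gamma| < 1)
        lam[t] = (omega + alpha * g ** delta + beta * lam[t - 1] ** delta) ** (1.0 / delta)
        y[t] = rng.poisson(lam[t])
    return y, lam

# Example (case C2 of Sect. 4):
# y, lam = simulate_inaparch11(500, omega=2.30, alpha=0.03, gamma=0.68,
#                              beta=0.06, delta=2.00)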
Proposition 1 Under the conditions in Definition 1, the bivariate process $(Y_t, \lambda_t)$ has a stationary solution.

Proof For the general Markov chain and according to Theorem 12.0.1(i) in [10], if $(X_t)$ is a weak Feller chain and if for any $\epsilon > 0$ there exists a compact set $C \subset X$ such that $P(x, C^c) < \epsilon$, $\forall x \in X$, then $(X_t)$ is bounded in probability and thus there exists at least one stationary distribution for the chain. We will show that the chain is bounded in probability and therefore admits at least one stationary distribution.
First note that the chain is weak Feller (cf. [2]). Define $C := [-c, c]$; then

$$P(\lambda, C^c) = P(\lambda_t^{\delta} \in C^c \mid \lambda_{t-1} = \lambda) = P\left(\left|\omega + \alpha\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta} + \beta \lambda_{t-1}^{\delta}\right| > c \,\Big|\, \lambda_{t-1} = \lambda\right),$$

which, by Markov's inequality, is

$$\leq \frac{E[|\omega|] + E\left[\left|\alpha\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta}\right| \,\Big|\, \lambda_{t-1} = \lambda\right] + E\left[|\beta \lambda^{\delta}|\right]}{c}.$$

Since $\alpha, \beta, \delta, \lambda > 0$ and in view of the fact that $|\gamma| < 1$ and $|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1}) \geq 0$, the expression above simplifies to

$$P(\lambda, C^c) \leq \frac{\omega + \beta \lambda^{\delta}}{c} + \frac{\alpha}{c}\, E\left[\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta} \,\Big|\, \lambda_{t-1} = \lambda\right].$$

Given the definition of $E\left[\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta} \mid \lambda_{t-1} = \lambda\right]$,

$$P(\lambda, C^c) \leq \frac{\omega + \beta \lambda^{\delta}}{c} + \frac{\alpha}{c}\, e^{-\lambda} \sum_{y_{t-1}=0}^{+\infty} \frac{\lambda^{y_{t-1}}}{(y_{t-1})!}\left(|y_{t-1} - \lambda| - \gamma (y_{t-1} - \lambda)\right)^{\delta}.$$

By d'Alembert's criterion, the series $\sum_{y_{t-1}=0}^{+\infty} \frac{\lambda^{y_{t-1}}}{(y_{t-1})!}\left(|y_{t-1} - \lambda| - \gamma (y_{t-1} - \lambda)\right)^{\delta}$ is absolutely convergent. Being convergent, the series has a finite sum and so it can be written that

$$P(\lambda, C^c) \leq \frac{\omega + \beta \lambda^{\delta}}{c} + \frac{\alpha}{c}\, e^{-\lambda} \sum_{y_{t-1}=0}^{+\infty} \frac{\lambda^{y_{t-1}}}{(y_{t-1})!}\left(|y_{t-1} - \lambda| - \gamma (y_{t-1} - \lambda)\right)^{\delta} < \infty.$$

Thus, for any $\epsilon > 0$ it suffices to choose $c$ large enough so that

$$\frac{1}{c}\left(\omega + \beta \lambda^{\delta} + \alpha e^{-\lambda} \sum_{y_{t-1}=0}^{+\infty} \frac{\lambda^{y_{t-1}}}{(y_{t-1})!}\left(|y_{t-1} - \lambda| - \gamma (y_{t-1} - \lambda)\right)^{\delta}\right) < \epsilon,$$

leading to the conclusion that the chain has at least one stationary solution.
Finally, in proving uniqueness we proceed as follows: first note that the INAPARCH(1,1) model belongs to the class of observation-driven Poisson count processes considered in Neumann [11], $Y_t \mid \mathcal{F}_{t-1}^{Y,\lambda} \sim \mathrm{Po}(\lambda_t)$; $\lambda_t = f(\lambda_{t-1}, Y_{t-1})$, $t \in \mathbb{N}$, with

$$f(\lambda_{t-1}, Y_{t-1}) = \left(\omega + \alpha\left(|Y_{t-1} - \lambda_{t-1}| - \gamma (Y_{t-1} - \lambda_{t-1})\right)^{\delta} + \beta \lambda_{t-1}^{\delta}\right)^{\frac{1}{\delta}}.$$

Thus, the result follows if the function $f$ above satisfies the contractive condition

$$|f(\lambda, y) - f(\lambda', y')| \leq k_1 |\lambda - \lambda'| + k_2 |y - y'|, \quad \forall\, \lambda, \lambda' \geq 0,\ \forall\, y, y' \in \mathbb{N}_0, \qquad (4)$$

where $k_1$ and $k_2$ are nonnegative constants such that $k_1 + k_2 < 1$. For the INAPARCH(1,1) model the contractive condition simplifies to

$$|f(\lambda_{t-1}, Y_{t-1}) - f(\lambda_{t-1}', Y_{t-1}')| \leq \left\|\frac{\partial f}{\partial \lambda_{t-1}}\right\|_{\infty} |\lambda_{t-1} - \lambda_{t-1}'| + \left\|\frac{\partial f}{\partial Y_{t-1}}\right\|_{\infty} |Y_{t-1} - Y_{t-1}'|,$$

where, for the Euclidean space $\mathbb{R}^d$ and $h: \mathbb{R}^d \to \mathbb{R}$, $\|h\|_{\infty}$ is defined by $\|h\|_{\infty} = \sup_{x \in \mathbb{R}^d} |h(x)|$. For the sake of brevity we will skip the theoretical details and conclude that, in the INAPARCH(1,1) case, if

$$\alpha\, 2^{\delta+1} \delta + \beta\, 2^{\delta-1} < 1, \qquad (5)$$

for $\delta \geq 2$, then the contractive condition holds. This concludes the proof.
Neumann [11] proved that the contractive condition in (4) is, indeed, sufficient to ensure uniqueness of the stationary distribution and ergodicity of $(Y_t, \lambda_t)$. The results are quoted below.

Proposition 2 Suppose that the bivariate process $(Y_t, \lambda_t)$ satisfies (3) and (5) for $\delta \geq 2$. Then the stationary distribution is unique and $E[\lambda_1] < \infty$.

Proposition 3 Suppose that the bivariate process $(Y_t, \lambda_t)$ is in its stationarity regime and satisfies (3) and (5) for $\delta \geq 2$. Then the bivariate process $(Y_t, \lambda_t)$ is ergodic and $E[\lambda_1^2] < \infty$.

Furthermore, following Theorem 2.1 in [4], it can be shown that if the process $(Y_t, \lambda_t)$ satisfies (3) and (5) for $\delta \geq 2$, then there exists a solution of (3) which is a weakly dependent (in the sense of [4]) strictly stationary process with finite moments up to any positive order and is ergodic.
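Condition (5) is easy to check numerically; the small helper below (an illustrative snippet, not part of the original chapter) evaluates its left-hand side in the equivalent factored form 2^δ(2αδ + β/2) that is also used later in Sects. 4 and 5.

def condition5_value(alpha, beta, delta):
    # Left-hand side of (5), alpha*2**(delta+1)*delta + beta*2**(delta-1),
    # written in the factored form 2**delta * (2*alpha*delta + beta/2).
    return 2.0 ** delta * (2.0 * alpha * delta + beta / 2.0)

def condition5_holds(alpha, beta, delta):
    # Sufficient condition used in the text: delta >= 2 and the value above < 1.
    return delta >= 2 and condition5_value(alpha, beta, delta) < 1

# Cases from the simulation study in Sect. 4:
# condition5_value(0.03, 0.06, 2.0) -> 0.60 (case C2, condition holds)
# condition5_value(0.30, 0.10, 2.0) -> 5.00 (case C7, condition fails)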

3 Parameter Estimation

In this section, we consider the estimation of the parameters of the model (3). The conditional maximum likelihood (CML) method can be applied in a very straightforward manner. Since the conditional distribution is Poisson, the conditional likelihood function, given the starting value $\lambda_0$ and the observations $y_1, \ldots, y_n$, takes the form

$$L(\theta) := \prod_{t=1}^{n} \frac{e^{-\lambda_t(\theta)}\, \lambda_t^{y_t}(\theta)}{y_t!} \qquad (6)$$

with  WD .!; ˛1 ; : : : ; ˛p ; ˇ1 ; : : : ; ˇq ; 1 ; : : : ; p ; ı/  .1 ; 2 ; : : : ; 2pCqC2 /, the


unknown parameter vector. The log-likelihood function is given by

X
n X
n
ln.L.// D Œyt ln.t /  t  ln.yt Š/  `t ./ : (7)
tD1 tD1

The score function is the vector defined by

@ ln.L.// X @`t ./


n
Sn ./ WD D : (8)
@ tD1
@

For the calculation of the first order derivatives of the general INAPARCH$(p,q)$ model the auxiliary calculations presented below are needed. For $i = 1, \ldots, 2 + 2p + q$,

$$\frac{\partial \ell_t}{\partial \theta_i} = \frac{\partial \lambda_t}{\partial \theta_i}\left(\frac{y_t}{\lambda_t} - 1\right),$$

where $\frac{\partial \lambda_t}{\partial \theta_i} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\, \frac{\partial (\lambda_t^{\delta})}{\partial \theta_i}$, $i = 1, \ldots, 2 + 2p + q$. Thus, for $i = 1, \ldots, p$ and for $j = 1, \ldots, q$, the first derivatives are given by the following expressions:

$$\frac{\partial \lambda_t}{\partial \omega} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left(\delta \sum_{i=1}^{p} \alpha_i g_{t-i}^{\delta-1}(-I_{t-i} + \gamma_i)\frac{\partial \lambda_{t-i}}{\partial \omega} + \sum_{j=1}^{q} \beta_j \frac{\partial \lambda_{t-j}^{\delta}}{\partial \omega} + 1\right),$$

$$\frac{\partial \lambda_t}{\partial \alpha_i} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left(\delta \sum_{k=1}^{p} \alpha_k g_{t-k}^{\delta-1}(-I_{t-k} + \gamma_k)\frac{\partial \lambda_{t-k}}{\partial \alpha_i} + \sum_{j=1}^{q} \beta_j \frac{\partial \lambda_{t-j}^{\delta}}{\partial \alpha_i} + g_{t-i}^{\delta}\right),$$

$$\frac{\partial \lambda_t}{\partial \gamma_i} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left(\delta \sum_{k=1}^{p} \alpha_k g_{t-k}^{\delta-1}(-I_{t-k} + \gamma_k)\frac{\partial \lambda_{t-k}}{\partial \gamma_i} + \sum_{j=1}^{q} \beta_j \frac{\partial \lambda_{t-j}^{\delta}}{\partial \gamma_i} - \delta \alpha_i g_{t-i}^{\delta-1}(y_{t-i} - \lambda_{t-i})\right),$$

$$\frac{\partial \lambda_t}{\partial \beta_j} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left(\delta \sum_{i=1}^{p} \alpha_i g_{t-i}^{\delta-1}(-I_{t-i} + \gamma_i)\frac{\partial \lambda_{t-i}}{\partial \beta_j} + \sum_{k=1}^{q} \beta_k \frac{\partial \lambda_{t-k}^{\delta}}{\partial \beta_j} + \lambda_{t-j}^{\delta}\right),$$

$$\frac{\partial \lambda_t}{\partial \delta} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left\{\sum_{i=1}^{p} \alpha_i g_{t-i}^{\delta}\left[\frac{\delta}{g_{t-i}}(-I_{t-i} + \gamma_i)\frac{\partial \lambda_{t-i}}{\partial \delta} + \ln(g_{t-i})\right] + \sum_{j=1}^{q} \beta_j \frac{\partial \lambda_{t-j}^{\delta}}{\partial \delta} - \frac{\lambda_t^{\delta}}{\delta}\ln(\lambda_t^{\delta})\right\},$$


where $g_{t-i} = |y_{t-i} - \lambda_{t-i}| - \gamma_i (y_{t-i} - \lambda_{t-i})$ and $I_t = \begin{cases} 1, & y_t > \lambda_t \\ -1, & y_t < \lambda_t \end{cases}$. Thus, for the INAPARCH(1,1) model the score function can then be explicitly written as

$$S_n(\theta) = \begin{bmatrix} \sum_{t=1}^{n} \left(\frac{y_t}{\lambda_t} - 1\right)\frac{\partial \lambda_t}{\partial \theta_1} \\ \vdots \\ \sum_{t=1}^{n} \left(\frac{y_t}{\lambda_t} - 1\right)\frac{\partial \lambda_t}{\partial \theta_5} \end{bmatrix}$$

with

$$\frac{\partial \lambda_t}{\partial \theta_1} \equiv \frac{\partial \lambda_t}{\partial \omega} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left[\delta\left(\alpha(-I_{t-1} + \gamma)g_{t-1}^{\delta-1} + \beta\lambda_{t-1}^{\delta-1}\right)\frac{\partial \lambda_{t-1}}{\partial \omega} + 1\right],$$

$$\frac{\partial \lambda_t}{\partial \theta_2} \equiv \frac{\partial \lambda_t}{\partial \alpha} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left[\delta\left(\alpha(-I_{t-1} + \gamma)g_{t-1}^{\delta-1} + \beta\lambda_{t-1}^{\delta-1}\right)\frac{\partial \lambda_{t-1}}{\partial \alpha} + g_{t-1}^{\delta}\right],$$

$$\frac{\partial \lambda_t}{\partial \theta_3} \equiv \frac{\partial \lambda_t}{\partial \gamma} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left[\delta\left(\alpha(-I_{t-1} + \gamma)g_{t-1}^{\delta-1} + \beta\lambda_{t-1}^{\delta-1}\right)\frac{\partial \lambda_{t-1}}{\partial \gamma} - \alpha\delta g_{t-1}^{\delta-1}(y_{t-1} - \lambda_{t-1})\right],$$

$$\frac{\partial \lambda_t}{\partial \theta_4} \equiv \frac{\partial \lambda_t}{\partial \beta} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left[\delta\left(\alpha(-I_{t-1} + \gamma)g_{t-1}^{\delta-1} + \beta\lambda_{t-1}^{\delta-1}\right)\frac{\partial \lambda_{t-1}}{\partial \beta} + \lambda_{t-1}^{\delta}\right],$$

$$\frac{\partial \lambda_t}{\partial \theta_5} \equiv \frac{\partial \lambda_t}{\partial \delta} = \frac{\lambda_t}{\delta \lambda_t^{\delta}}\left[\delta\left(\alpha(-I_{t-1} + \gamma)g_{t-1}^{\delta-1} + \beta\lambda_{t-1}^{\delta-1}\right)\frac{\partial \lambda_{t-1}}{\partial \delta} + \alpha g_{t-1}^{\delta}\ln(g_{t-1}) + \beta\lambda_{t-1}^{\delta}\ln(\lambda_{t-1})\right] - \frac{\lambda_t}{\delta}\ln(\lambda_t).$$

The solution of the equation $S_n(\theta) = 0$ is the CML estimator, $\hat{\theta}$, if it exists. To study the asymptotic properties of the maximum likelihood estimator we proceed as follows: first it can be shown that the score function, evaluated at the true value of the parameter, say $\theta = \theta_0$, is asymptotically normal. The score function has martingale difference terms defined by

$$\frac{\partial \ell_t}{\partial \theta_i} = \left(\frac{y_t}{\lambda_t} - 1\right)\frac{\partial \lambda_t}{\partial \theta_i}.$$

It follows that, at $\theta = \theta_0$,

$$E\left[\frac{\partial \ell_t}{\partial \theta}\,\Big|\, \mathcal{F}_{t-1}\right] = 0,$$

since $E\left[\frac{y_t}{\lambda_t} - 1 \,\big|\, \mathcal{F}_{t-1}\right] = 0$, and $E\left[\left(\frac{y_t}{\lambda_t} - 1\right)^2 \,\big|\, \mathcal{F}_{t-1}\right] = V\left[\frac{y_t}{\lambda_t} - 1 \,\big|\, \mathcal{F}_{t-1}\right] = \frac{1}{\lambda_t}$. It can also easily be shown that, for $\delta \geq 2$,

$$E\left[\lambda_t^{2-2\delta} \mid \mathcal{F}_{t-1}\right] < +\infty, \qquad E\left[\lambda_t^{1-\delta} \mid \mathcal{F}_{t-1}\right] < +\infty,$$
$$E\left[\lambda_t^{2-\delta}\ln(\lambda_t) \mid \mathcal{F}_{t-1}\right] < E\left[\ln(\lambda_t) \mid \mathcal{F}_{t-1}\right] < E\left[\lambda_t \mid \mathcal{F}_{t-1}\right] < +\infty,$$
$$E\left[\lambda_t^{2}\ln^{2}(\lambda_t) \mid \mathcal{F}_{t-1}\right] < +\infty, \qquad E\left[\lambda_t \ln(\lambda_t) \mid \mathcal{F}_{t-1}\right] < +\infty.$$

Thus, it can be concluded that $V\left[\frac{\partial \ell_t}{\partial \theta} \,\big|\, \mathcal{F}_{t-1}\right] < +\infty$ and that $\partial \ell_t / \partial \theta$ is a martingale difference sequence with respect to $\mathcal{F}_{t-1}$. The application of a central limit theorem for martingales guarantees the desired asymptotic normality.
It is worth mentioning here that in Sect. 2 it was concluded that the process has finite moments up to any positive order and is weakly dependent, which implies ergodicity. This is sufficient to state that the Hessian matrix converges in probability to a finite limit. Finally, all third derivatives are bounded by a sequence that converges in probability. Given these three conditions, it is then concluded that the CML estimator, $\hat{\theta}$, is consistent and asymptotically normal,

$$\sqrt{n}\,(\hat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}\left(0, G^{-1}(\theta_0)\right),$$

with variance-covariance matrix $G(\theta)$ given by

$$G(\theta) = E\left[\frac{1}{\lambda_t}\frac{\partial \lambda_t}{\partial \theta}\left(\frac{\partial \lambda_t}{\partial \theta}\right)'\right].$$

A consistent estimator of $G(\theta)$ is given by

$$G_n(\theta) = \sum_{t=1}^{n} V\left[\frac{\partial \ell_t(\theta)}{\partial \theta}\,\Big|\, \mathcal{F}_{t-1}\right] = \sum_{t=1}^{n} \frac{1}{\lambda_t(\theta)}\frac{\partial \lambda_t(\theta)}{\partial \theta}\left(\frac{\partial \lambda_t(\theta)}{\partial \theta}\right)'.$$
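In practice, the maximization of (7) can also be carried out with a generic numerical optimizer instead of solving the score equations explicitly; the Python sketch below (an illustration, not the authors' Matlab code) evaluates the conditional log-likelihood of the INAPARCH(1,1) model recursively and hands it to a quasi-Newton routine. The starting value lambda_0 and the initial parameter guess are pragmatic assumptions.

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, y, lam0=None):
    # Negative conditional Poisson log-likelihood (7) for INAPARCH(1,1);
    # theta = (omega, alpha, gamma, beta, delta). The term ln(y_t!) is
    # constant in theta and is omitted.
    omega, alpha, gamma, beta, delta = theta
    lam = float(np.mean(y)) if lam0 is None else lam0   # starting value lambda_0
    ll = 0.0
    for yt in y:
        ll += yt * np.log(lam) - lam
        dev = yt - lam
        g = abs(dev) - gamma * dev
        lam = (omega + alpha * g ** delta + beta * lam ** delta) ** (1.0 / delta)
    return -ll

def fit_inaparch11(y, theta0=(1.0, 0.1, 0.0, 0.5, 1.0)):
    # Conditional ML estimation by direct numerical optimization.
    bounds = [(1e-6, None), (0.0, None), (-0.999, 0.999), (0.0, None), (1e-3, None)]
    res = minimize(neg_log_likelihood, theta0, args=(y,), bounds=bounds,
                   method="L-BFGS-B")
    return res.x, -res.fun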

4 Simulation

In this section a simulation study, implemented in Matlab, is conducted to illustrate the theoretical findings given above. This study contemplates seven different combinations for θ, which are displayed in Table 1. For each set of parameters, 300 independent replicates of time series of length 500 from the INAPARCH(1,1) model were simulated.
Note that for cases C1–C4, condition (5) holds, whereas for cases C5–C7 such a condition fails. The results are summarized in Table 1 and the bias of the CML estimates is presented in Fig. 2 for the combinations of parameters C2, C4, C5,

Table 1 Parameter estimates and standard deviations (std) in parentheses

Case  ω     α     γ     β     δ     2^δ(2αδ + β/2)   ω̂              α̂              γ̂              β̂              δ̂
C1    2.30  0.01  0.68  0.10  2.00  0.36             1.851 (0.482)   0.064 (0.068)   0.635 (0.318)   0.185 (0.224)   1.924 (0.715)
C2    2.30  0.03  0.68  0.06  2.00  0.60             1.906 (0.514)   0.075 (0.069)   0.617 (0.335)   0.145 (0.198)   1.917 (0.686)
C3    2.30  0.01  0.68  0.10  3.00  0.88             1.967 (0.422)   0.057 (0.068)   0.592 (0.291)   0.157 (0.181)   2.958 (0.718)
C4    2.30  0.05  0.68  0.08  2.00  0.96             1.893 (0.529)   0.088 (0.072)   0.700 (0.307)   0.175 (0.210)   1.953 (0.715)
C5    0.40  0.30  0.68  0.10  1.00  1.30             0.451 (0.128)   0.264 (0.077)   0.769 (0.196)   0.145 (0.138)   0.746 (0.376)
C6    0.80  0.30  0.68  0.10  1.00  1.30             0.746 (0.139)   0.300 (0.087)   0.704 (0.164)   0.135 (0.123)   0.963 (0.387)
C7    2.30  0.30  0.68  0.10  2.00  5.00             2.272 (0.751)   0.308 (0.129)   0.748 (0.222)   0.129 (0.131)   2.040 (0.651)

Fig. 2 Bias of the conditional ML estimates, for cases C2 (top left), C4 (top right), C5 (bottom
left), and C6 (bottom right) (numbers 1–5 below the boxplots refer to the estimated parameters, in
the order appearing in Table 1)

and C6. From this part of the simulation study a few conclusions can be drawn: it is clear that as the theoretical values of the α and β parameters rise, the point estimates obtained are much closer to what was expected, in particular for the α parameter; the γ and δ parameters are fairly well estimated, but there is a certain difficulty in the estimation of the ω parameter, which tends to be underestimated, with the exception of case C5; there is also a very high degree of variability, in particular for the ω and δ parameters. An important conclusion is that condition (5) does not seem to interfere with the quality of the point estimates for this model. In fact, the best overall estimates were obtained for cases C6 and C7, which clearly do not obey condition (5).

4.1 Log-Likelihood Analysis

For cases C2 and C4, 300 samples were simulated considering values of δ varying from 2.0 to 3.0 (i.e., six different situations for each case). After a preliminary data analysis with the construction of boxplots and histograms that confirm the presence of overdispersion, the log-likelihood was studied in the following manner: for each set of 300 samples the log-likelihood was calculated, varying the δ parameter in the range 2.0–3.0. It was expected that the log-likelihood would be maximum for the δ value used to simulate that particular set of 300 samples. Results are presented in Table 2 for Case 2. Case 2 was chosen for representation herein because for this case the first three values of the δ parameter lie inside the region that obeys condition (5) and the last three lie outside this region. Nevertheless, the same behavior was observed for both Case 2 and Case 4, and the δ value for which the calculated log-likelihood was maximum was exactly the expected one for both cases and all six different situations. In Table 2, it can be observed that the mean log-likelihood is maximum for the δ value corresponding to the value used for the simulation of the respective set of samples.

Table 2 Maximum likelihood estimation results for Case 2

Samples simulated with                  Log-likelihood for varying δ
θ = (2.30, 0.03, 0.68, 0.06, δ)         2.0         2.2         2.4         2.6         2.8         3.0
δ = 2.00                                -785.478    -786.156    -787.699    -789.663    -791.803    -793.982
δ = 2.20                                -775.208    -774.593    -775.065    -776.129    -777.501    -779.019
δ = 2.40                                -766.791    -765.102    -764.684    -764.999    -765.733    -766.701
δ = 2.60                                -760.116    -757.574    -756.449    -756.168    -756.395    -756.926
δ = 2.80                                -755.027    -751.778    -750.067    -749.294    -749.102    -749.271
δ = 3.00                                -751.178    -747.302    -745.073    -743.865    -743.302    -743.153

5 Real-Data Example: Transaction Modeling

In this section, the results above are applied in the analysis of the motivating examples presented in Fig. 1, Sect. 1. As already described, the data consist of the number of transactions per minute during one trading day for Glaxosmithkline and Astrazeneca. The CML estimation method was applied and the results are shown in Table 3. Note that the estimated value of γ is negative for both time series, meaning that there is evidence that positive shocks have a stronger impact on overdispersion than negative shocks. Another important feature exhibited by both time series is that the estimated values of δ fail the condition δ ≥ 2. It is worth mentioning that this is not a surprising result, since in the estimation of the Standard and Poor's 500 stock market daily closing price index in [3], the δ estimate obtained also did not satisfy the sufficient condition for the process to be covariance stationary.
A short simulation study was also carried out in this section. The CML point estimates of both real-data series in Table 3 were used to simulate 300 independent replicates of length 500 from the INAPARCH(1,1) model, namely the GSK and AZN cases, referring, respectively, to the samples based on the point estimates for the Glaxosmithkline and Astrazeneca time series. CML estimates were then obtained for these samples and the results are presented in Table 4, with the corresponding bias in Fig. 3. Regarding Fig. 3, it can be seen that the variability and the tendency to underestimate the ω parameter are maintained (taking into consideration the median value), but in relation to the δ parameter the variability has decreased significantly. From inspection of Table 4, it can be said that, in general, the CML point estimates are not very far from what was expected in both cases, although better overall estimates were obtained for the AZN case. Considering that condition (5) was not fulfilled for either the AZN or the GSK case (2^δ(2αδ + β/2) equals 2.0297 for the AZN case and 1.4091

Table 3 Maximum likelihood estimation results for Glaxosmithkline and Astrazeneca time series (standard errors in parentheses)

Time series       ω̂               α̂               γ̂                β̂               δ̂
Glaxosmithkline   0.378 (0.068)    0.139 (0.007)    -0.326 (0.084)    0.879 (0.007)    0.982 (0.0005)
Astrazeneca       2.486 (0.108)    0.282 (0.006)    -0.278 (0.036)    0.750 (0.004)    1.059 (0.0008)

Table 4 Maximum likelihood estimation results for GSK and AZN cases (standard deviations in parentheses)

Samples    ω̂               α̂               γ̂                β̂               δ̂
GSK case   0.739 (0.577)    0.182 (0.094)    -0.241 (0.259)    0.871 (0.025)    1.220 (0.388)
AZN case   2.495 (0.665)    0.309 (0.101)    -0.147 (0.258)    0.699 (0.107)    0.998 (0.161)

Fig. 3 Bias of CML estimates, for the AZN case (numbers 1–5 below the boxplots refer to the
estimated parameters, in the order appearing in Table 4)

for the GSK case), as was already mentioned in Sect. 4, it seems that violating the sufficient condition for ergodicity has no effect on the behavior of the estimation procedure. The impact of violating necessary instead of sufficient conditions for ergodicity remains a topic of future work.

Acknowledgements This research was partially supported by Portuguese funds through the Center for Research and Development in Mathematics and Applications, CIDMA, and the Portuguese Foundation for Science and Technology, "FCT—Fundação para a Ciência e a Tecnologia," project UID/MAT/04106/2013.

References

1. Brännäs, K., Quoreshi, A.M.M.S.: Integer-valued moving average modelling of the number of
transactions in stocks. Appl. Financ. Econ. 20, 1429–1440 (2010)
2. Davis, R.A., Dunsmuir, W.T.M., Streett, S.B.: Observation driven models for Poisson counts.
Biometrika 90, 777–790 (2003)
3. Ding, Z., Granger, C.W., Engle, R.F.: A long memory property of stock market returns and a
new model. J. Empir. Financ. 1, 83–106 (1993)
4. Doukhan, P., Fokianos, K., Tjøstheim, D.: On weak dependence conditions for Poisson
autoregressions. Stat. Probab. Lett. 82, 942–948 (2012)
5. Ferland, R., Latour, A., Oraichi, D.: Integer-valued GARCH process. J. Time Ser. Anal. 6,
923–942 (2006)
6. Fokianos, K., Tjøstheim, D.: Nonlinear Poisson autoregression. Ann. Inst. Stat. Math. 64,
1205–1225 (2012)
7. Fokianos, K., Rahbek, A., Tjøstheim, D.: Poisson autoregression. J. Am. Stat. Assoc. 104,
1430–1439 (2009)

8. Franke, J.: Weak dependence of functional INGARCH processes. Technical Report 126,
Technische Universität Kaiserslautern (2010)
9. Heinen, A.: Modelling time series count data: an autoregressive conditional Poisson model.
Center for Operations Research and Econometrics (CORE) Discussion Paper No. 2003-63,
University of Louvain (2003)
10. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, New York (1994)
11. Neumann, M.H.: Absolute regularity and ergodicity of Poisson count processes. Bernoulli 17,
1268–1284 (2011)
12. Turkman, K.F., Scotto, M.G., de Zea Bermudez, P.: Non-linear Time Series: Extreme Events
and Integer Value Problems. Springer, Cham (2014)
Part III
Applications in Time Series Analysis
and Forecasting
Emergency-Related, Social Network Time
Series: Description and Analysis

Horia-Nicolai Teodorescu

Abstract Emergencies and disasters produce a vast traffic on the social networks
(SNs). Monitoring this traffic for detection of emergency situations reported by
the public, for rapid intervention and for emergency consequences mitigation, is
possible and useful. This article summarizes the results presented in a series of
previous papers and brings new data evidence. We describe the specific traffic on
social networks produced by several emergency situations and analyze the related
time series. These time series are found to present a good picture of the situation
and have specific properties. We suggest a method for the analysis of SN-related
time series with the aim of establishing correlations between characteristics of the
disaster and the SN response. The method may help the prediction of the dimension
of the emergency situations and for forecasting needs and measures for relief and
mitigation. Time series of real data (situation time series) are exemplified. The take-
away from this article is that the SN content must be analyzed statistically in more depth to extract more significant and reliable information about the social and external processes in disasters, and that the learned correlations should be used for prediction and forecasting. This paper heavily relies on previous works of the author and reproduces several of the ideas, sometimes verbatim, from those papers.

Keywords Correlational analysis • Disaster • Event response • Social


networks • Time series

1 Introduction

In recent years, academics, several agencies and organizations, and several com-
panies proposed the use of social networks (SNs) for detecting and evaluating
emergency situations and improving rescue operations. For this goal, it was
suggested that messages must be analyzed and relevant data extracted. Various

H.-N. Teodorescu ()


Romanian Academy, Iasi Branch, Bd. Carol I nr. 8, Iasi 6600, Romania
‘Gheorghe Asachi’ Technical University of Iasi, Iasi, Romania
e-mail: [email protected]


specific analytics and similar applications have been developed and presented in
the literature. Moreover, some commercial, even freely available apps, such as
Banjo allow today almost everyone to follow in almost real-time major events
developing as reflected in a variety of media and SNs. We recall that Banjo collects
messages and posts from all major SNs and social media and lets users see the most
recent ones; the newly introduced feature “rewind” also lets users see older events.
Therefore, this application, supplemented with appropriate means for statistics and
analysis may help researchers perform extended studies on the reflection on SNs
and SMs of developing and older emergencies. However, these tools—standard and
adapted analytics and search tools—are not enough for the substantial understanding
and evaluation of the emergencies, in view of decision making for rescue and
mitigation of the effects of disasters. In the first place, the assumption that the
response to emergency on SNs correlates linearly with the event, as found on or
implied by the sites of several analytics was never tested and the mechanisms
relating the manner and number of responses to dramatic events remain largely
unquestioned and unknown. Thus, finding statistically validated ways to measure
and to represent the correlations between dramatic events in the real world and
the response on SNs requires further studies. This article aims to contribute to the
advancement of the topic and to help elucidating some of these issues.
In [1] we addressed the basic issues of the correlational analysis of the response
on SNs/SMs and the events that produces the related traffic with the applicative
concern of disaster reflection in the SMs/SNs media. This article provides further
evidence support and details on the correlations between events amplitude and traffic
amplitude on SNs/SMs.
We briefly recall several projects regarding the use of SN analytics for disaster situations, among them TweetTracker, proposed as "a powerful tool from Arizona State University that can help you track, analyze, and understand activity on Twitter," extensively described in Kumar et al. [2] and in the papers by Morstatter et al. [3] and Kumar et al. [4]. TweetTracker has reportedly been used by Humanity Road, an organization "delivering disaster preparedness and response information to the global mobile public before, during, and after a disaster" (see http://tweettracker.fulton.asu.edu/). But not only charities and universities have turned toward SNs for monitoring disasters. Federal agencies in the USA are requested to monitor the spread and quality of their e-services using analytics (DAP: Digital Metrics Guidance and Best Practices); these agencies are expressly required to use analytics [5], so it is not surprising that several of these agencies have used monitoring and predictive analytics for event detection, including early detection of earthquakes and flu outbreaks [6], among others. Also, federal agencies in the USA have already applied SN analysis in law enforcement [7]. There is no independent analysis of the results, however.
There are numerous academic studies, among others [8–14], advocating the application of SN content analysis in disasters. However, there are few proven and convincing uses until now.
The organization of this article is linear. Section 2 discusses the incentives for the use, and the limits in the use, of analytics in disasters. The third section introduces issues related to the description of the SN responses to events and of the related time series. Examples of SN-related time series are presented and analyzed in the fourth section. The last section draws conclusions.
Notice that throughout the paper, we name "post" any type of object on SNs/SMs (e.g., tweets, messages, posted pictures, blogs, etc.). Also notice that this article is a review of the work performed mainly by the author and colleagues on the use of SNs for disaster detection, improving rescue timeliness, and increasing the mitigation of the disaster consequences. The papers summarized here include mainly [1, 15–17].

2 Incentives for and Limits in the Use of Analytics in Disasters

There are several reasons for advocating the use of SNs/SMs analysis in view of detecting emergencies. The most obvious are the large amount and the low granularity of the information one can gather, the sometimes fast response of the SNs/SMs to events, and the lower cost of the gathered information. The obvious drawbacks are the questionable respect of the privacy of people suffering hardship, the large amount of noise in the information collected, first analyzed in [17], the lack of studies on the reliability of the collected information, the issue of interpretation of the information (DAP: Digital Metrics Guidance and Best Practices; Herman [5]), and, in some cases, the questionable advantage in the speed of information propagation on SNs compared to sensor networks. Related to the last issue, Teodorescu [1] draws attention to the fact that the optimism expressed in [18] on the speed advantage of SNs should be moderated because "while it is true that the earthquake waves may travel slowly, allowing tweets to arrive at distant enough locations before the earthquake, radio and wire communications between seismic monitoring stations send warnings on a regular basis and much faster than the analysis of tweets, as used in [18], who argues that 'tweets give USGS early warning on earthquakes'."
Nevertheless, there is little doubt that the monitoring and analysis of SNs/SMs for the detection of emergencies and for assessing their implications is a valuable addition to the tools of law enforcers, relief agencies, and decision makers directing the mitigation of disaster effects [19–21]. The question is not whether to use SNs and SMs analysis, but how to best use it. The answer to the previous question requires a deeper understanding of the SNs/SMs response to events. This and previous articles [1, 15–17] address this particular issue.
Because the infancy era of SNs and analytics has ended and tools are already available for collecting various sorts of data, we suggest that statisticians can, in the near future, bring an essential contribution to a deeper understanding of SN behavior. What is needed in the first place are extensive correlation studies to determine the mixture of factors influencing the SN response to events and the roles and weights of those factors. Studies by Schultz et al. [22], Utz et al. [23], and Teodorescu [1, 15] have already emphasized that there are numerous factors that influence SN communications, including various emotions, the type of the crisis, and the SN itself.
Reflecting concerns, systematically validated evidence, and empirical, anecdotal evidence gathered during a project and partly presented in [1, 15–17], and moreover reflecting concerns expressed by other studies [24–27], we summarized in [1, 15] the following reservations:
1. The usefulness of the information collected on SNs in rescuing and in disaster management is limited and case-dependent; a large amount of posts represent noise that, when not filtered, may confuse the efforts of disaster mitigation. In more detail:
2. Posts may help produce a better assessment of the situation when the event involves a large part of the population, but not when the posts refer to localized, distant events not affecting their authors [1]. Stefanidis et al. [28] specifically address the issue of gathering both content and location for improving the data relevance.
3. Posts triggered by disasters frequently represent only a carrier of sentiments and frustrations unrelated to the event—although triggered by it—or the result of a desire to chat. When not filtered, these posts clutter the data and may modify the results of the analysis in unexpected ways [17]. The effect of the noise on the correlation between the SN response to events and the number of the affected population was addressed, for example, in [16].
4. Filtering all the above-described noise is beyond the capabilities of current analytics.
The literature on the relevance of the messages on SNs for disaster mitigation includes various useful points of view, see, for example, [19, 24–27, 29–34]. Castillo et al. [30] tried to propose a more technical approach, by defining the relevance of posts in terms of the number of their retransmissions and their audience. Unfortunately, it is quite easy to see that there is no reason for equality between the importance of a post for rescuers and the interest the audience takes in that message. The person receiving the post is key in this respect. While post retransmissions and audience may help filter part of the unessential data, they do not solve the problem [1]. These issues are exemplified in the subsequent sections of this article.

3 SN Response and Related Time Series

In the generation of the time series discussed in this article, two processes contribute [1, 15]: the external event (disaster) and the societal process represented by the SN reflection of the event. The time series reflect both of them, yet the interests of the rescuers and emergency agencies may differ in scope; some decision makers may be interested in the event itself (how large is the flooded area; how big was the earthquake and how many buildings are destroyed); others may wish to know only about the affected population in terms of the numbers of injured or dead; relief agencies may wish to know about all aspects, including panic spreading. The posts on SNs may not answer all the related questions; we do not yet know well what the SN response correlates with. Subsequently, the interest is restricted to what the time evolution of the number of posts, seen as a time series, may tell us. Precisely, we aim to determine if the future number of posts can be predicted from short segments of the related time series. This question is essential because, on one side, we need to make predictions for better planning and mitigation of the disaster consequences; second, the ability to make correct predictions validates the model and frees the rescuers from part of the task of monitoring the SN response.
We distinguish between "instantaneous" (short-duration) events (e.g., explosions, earthquakes, landslides, tornados, crashes, and some attacks), and those (such as storms and wildfires) that we name long-duration events (also characterized by a slow onset). Short-duration events have an abrupt onset. In technical terms, short-duration events act as Dirac impulse excitations on the system represented by the SN. From another point of view [1], events may be predictable, enabling us to issue warnings (in this category are storms, hurricanes, and most floods), or unpredictable, such as earthquakes and crashes. Notice that actually there is a third category, including the events that allow us a window for warning of the order of minutes (e.g., tsunamis and tornados). For brevity, we are concerned with the main two categories.
The subsequent paragraphs, on the notations and main parameters of the SNs/SMs response to events, are taken almost verbatim from [15] and [1]. Conveniently assume that one can determine the moment of occurrence (onset) of the event, t_on-ev, and the moment of its vanishing, t_off-ev. The duration of the event is T_ev = t_off-ev − t_on-ev. The "intensity" (amplitude) of the event, E(t), is the number of messages or posts per unit time. Then, several features of the event might be defined, for example, its maximal peak amplitude, the onset time, the decrease time, the number of peaks, the (main) frequency of its fluctuations, etc. The elementary parameters of the SN response and their notations are listed subsequently.
• n_ev(t1, t2)—the number of posts in a specified time interval, (t1, t2); when the interval length t2 − t1 is the unity, this parameter is the same as the one below and is the variable in the time series.
• n_ev(t)—the temporal density distribution (i.e., the "instantaneous" number of relevant messages) of the SN/SM response (relevant posts) for the event ev.
• Onset delay of the SN response, τ_on = t(n > 0) − t_0; here t_0 stands for t_on-ev, the onset time of the event.
• N_max—maximal amplitude of the SN response (maximum number of posts per time unit; the specified unit of time may be an hour, a day, or a minute).
• N_tot—total number of posts (messages); N_tot may correlate with the duration of the disaster, with the total number of victims, or with effects of panic, effects of "spectacularity," etc.
• t_max (since t_0) = t(n = N_max) − t_0—time to the peak of the SN response.
• τ_plt—plateau duration (if any) of the SN response.

Further parameters have been introduced and explained in Teodorescu [1]. Also, two examples of time series were given in that paper, related to the SN responses to the weather event (winter storm) in New England, January–February 2015, and to the small earthquake series in Romania during 2014 and the first months of 2015. Further examples are given in this article, using time series represented by n(t_k, t_k + Δt) = n_k, where Δt is typically one day. The main interest is to find a good empirical model for the time series and to try to explain the process leading to the empirical model in the frame of a theoretical one.
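Assuming the daily counts n_k are available as a plain list, the elementary descriptors above can be extracted as in the following Python sketch; the plateau tolerance and the convention that counts[0] corresponds to the event onset day t_0 are assumptions introduced for illustration.

def sn_response_parameters(counts, plateau_tolerance=0.10):
    # Extract elementary descriptors from a daily series n_k of relevant posts.
    n_tot = sum(counts)                        # N_tot
    n_max = max(counts)                        # N_max
    t_max = counts.index(n_max)                # days from t_0 to the peak
    onset_delay = next((k for k, n in enumerate(counts) if n > 0), None)
    # Plateau duration: longest run of consecutive days staying within
    # plateau_tolerance of the peak (an illustrative working definition).
    plateau, run = 0, 0
    for n in counts:
        run = run + 1 if n >= (1 - plateau_tolerance) * n_max else 0
        plateau = max(plateau, run)
    return {"N_tot": n_tot, "N_max": n_max, "t_max": t_max,
            "onset_delay": onset_delay, "plateau_days": plateau}

# Example with hypothetical daily counts:
# sn_response_parameters([0, 120, 540, 510, 200, 80, 30])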

4 Examples of SN-Related Time Series

The method of gathering data (posts) on several SNs and SMs was presented in
Teodorescu [1, 15–17]. A first example of time series is shown in Fig. 1, which
represents the response in number of tweets per day to a long-duration event, the
series of winter storms in New England, January–February 2015 (only January
shown). Data in Fig. 1 was collected by the author daily using both Topsy Social
Analytics (http://topsy.com/) and Social Mention search (http://socialmention.com)
with the search condition “blizzard OR storm AND snow OR ice,” manually
assembled and processed by the author; the figures in the graph represent averaged
numbers. Because of the several relapses of the storm, there are, as expected, several
peaks. Also, an almost-plateau is present due to the multi-day sustained storm. The times
of the peaks coincided with the days of the storm recurrences, but there was no good
correlation of the number of posts with the quantity of snow or the temperature in the
region as announced by major meteorological stations. The manual analysis of the
posts showed that a large number (the majority) of the posts were triggered anecdotally
by the weather conditions, but were not relevant with respect to persons in need.
Worryingly, very few persons in need were found to post messages, and virtually no
posts came from aged or sick persons in need, although a few messages written by
young people referred to neighbors potentially requiring assistance. We concluded
that, in this type of emergency, the SNs' response is, unfortunately, almost useless.

Fig. 1 Time series for the number of messages for the ‘blizzard OR storm AND snow OR ice’
event, measured as tweets per day, for the period 1 February–2 March 2015 (horizontal axis: day
count)

Fig. 2 Response of Twitter to the Czech event of February 24, 2015 (see text). Also shown is the
effect of the noise on the quality of the model [exponential fits shown in the two panels:
y = 36010e^(−1.592x), R² = 0.9418, and y = 10226e^(−x), R² = 0.8408]
Short-duration disasters seem to elicit more attention on SNs and SMs, possibly
because of their usually shocking, unexpected, “spectacular” content. They elicit a fast-
growing number of posts in the first one or two days after the event and a much
slower decrease in interest, with interest easily resurrected in the first week
when some details related to the event surface. Even small, local disasters tend
to have these features. A first example of an SN response to a brief-duration event is
shown in Fig. 2. The event is a somewhat obscure one, with a mentally ill gunman
opening fire on February 24, 2015, in a small town in the Czech Republic, killing several
people.
Notice in Fig. 2 the effect of the noise on the model precision. In the last days of
the SN monitoring (days 5 and 6), relevant messages were very few, but
several unrelated messages mistakenly considered relevant were included in the
search (based on keywords). Forcing the model to include those days significantly
decreases the quality of the resulting model (left panel). Also notice that, knowing
that the model is an exponential decay, y = Ae^(−at), its parameters A and a can be derived
approximately from the data of the first two days of the response. Therefore,
one can make reasonably good predictions of the SN response for the following
days (3–6) based on the SN response in days 1 and 2.
In another example of results of SN monitoring and the related time series connected
to a brief-duration event (in this case, the Copenhagen attack on February 14, 2015), the
approximation of the SN response with an exponential decay (again after the fast
increase during the first day, not shown in the figure) works very well (R² = 0.953),
see Fig. 3. However, the decrease in response is slower in the second day of the
decrease than predicted by the best-matching exponential. A similar
behavior was found for the response to the Dubai fire (fire at the Dubai skyscraper
named The Torch, February 21, 2015).
Fig. 3 Approximation of the decay of the response to the Copenhagen attack on February 14,
2015 [fitted model: y = 32316e^(−0.761x), R² = 0.9535]

Regarding the model for the increase of the number of posts, which typically
occurs in less than 24 h, an exponential model with hour-based sampling seems to
work well enough. The decrease of the response is also best described (among simple
models) by the exponential model, which also has a natural explanation
and can be extended to cover larger time intervals. Although polynomial models
may work better for short durations, they behave wrongly as time increases (see the
negative values of the response and the fluctuations) and have no reasonable basis as
models, see Fig. 4. In Teodorescu [35], an approach was presented for optimizing
the model based on adaptive fuzzy coordinate transformations.

Fig. 4 Forgetting (decrease) model for SN response: polynomial and exponential approximations
for the train derailment event in California, February 24, 2015 [fits shown:
y = −15.778x³ + 251.76x² − 1230.3x + 1856.2 (R² = 0.9896),
y = −16x⁴ + 144.22x³ − 239.67x² − 773.13x + 1828.8 (R² = 0.997),
and y = 1933.4e^(−1.352x) (R² = 0.908)]

5 Discussion and Conclusions

Converting the activity on the SNs and SMs into time series seems a natural
idea; moreover, it is relatively easy to perform, as explained in Section 3. This
representation makes the SNs’ response to events easier to represent and analyze
and provides a large number of study avenues. Importantly, once a type of model
for the response is established, it is easy to determine from the initial data the
parameters of the model. This, in turn, allows us to make reliable enough predictions

on the amplitude of the response in the hours and days to come. The ability to make
predictions is, as it is well-known, crucial in the process of planning for rescues and
making decisions for mitigation of the effects of the emergencies.
However, the predictions of the actual situation in emergencies, based on the
number of posts referring to a specific emergency, should be considered with
care, because many, possibly most of the posts may come from people outside or
unrelated to the emergency. Also, as proved by the analysis of posts related to the
events of snow storms in New England in 2015, even when the posts come from
people in the involved region and even from people involved in the event, the vast
majority of those people may not be in need. Therefore, a detailed analysis of the
posts has to be made before they are included in the count of significant posts and
used to build the time series for representing the situation. The tasks of perfecting
the search for posts of interest related to emergencies and of validating the relevance
of the posts from the points of view of rescuers and decision makers remain to be
accomplished, as currently existing systems are unable to perform these tasks with
enough accuracy.
Additional Material Additional material regarding methods of data collection and analysis, on
the limits of these methods and on the uncertainty in the data can be found on the web address of
the project, at http://iit.academiaromana-is.ro/sps/, or can be asked directly from the author.

Acknowledgments The research and the participation in this conference are partly supported
by the multi-annual NATO SPS grant 984877 “Modelling and Mitigation of Public Response
to Catastrophes and Terrorism.” The content from various freely available analytics, including
the SocialMention™ real-time search platform, Topsy Inc.™, Google Analytics™, etc.,
was used only as a resource (as permitted). Data generated by these analytics were compared and
only derivatives of the data are given in this article.
Conflict of Interest In the above-mentioned grant, the author is the PI. To the best of our knowledge,
neither the author nor his employers or the institutions and organizations named in this article in
relation to the author have any interest in the tools available free on the Internet and used in this
research, or in the companies or organizations that produced those tools.

References

1. Teodorescu, H.N.: Emergency situational time series analysis of SN traffic. In: Proceedings of
the ITISE 2015 International Work-Conference on Time Series, Granada, July 1–3, 2015, pp.
1–12 (also, additional material at http://iit.academiaromana-is.ro/sps/)
2. Kumar, S., Morstatter, F., Liu, H.: Twitter Data Analytics. Springer, New York. http://
tweettracker.fulton.asu.edu/tda/TwitterDataAnalytics.pdf (2013)
3. Morstatter, F., Kumar, S., Liu, H., Maciejewski, R.: Understanding Twitter data with Tweet-
Xplorer (Demo). In: Proceedings of the 19th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD 2013, pp. 1482–1485. ACM, New York (2013)
4. Kumar, S., Barbier, G., Abbasi, M.A., Liu, H.: TweetTracker: an analysis tool for humanitarian
and disaster relief. ICWSM, http://tweettracker.fulton.asu.edu/Kumar-etal_TweetTracker.pdf
(2011)
5. Herman, J.: Social media metrics for Federal Agencies. http://www.digitalgov.gov/2013/04/19/
social-media-metrics-for-federal-agencies/ (2013)

6. Konkel, F.: Predictive analytics allows feds to track outbreaks in real time. http://fcw.com/
articles/2013/01/25/flu-social-media.aspx (2013)
7. Lyngaas, S.: Faster data, better law enforcement. http://fcw.com/articles/2015/02/03/faster-
data-better-enforcement.aspx (2015)
8. Abbasi, M.-A., Kumar, S., Filho, J.A.A., Liu, H.: Lessons learned in using social media
for disaster relief—ASU crisis response game. In: Social Computing, Behavioral—Cultural
Modeling and Prediction. LNCS 7227, pp. 282–289. Springer, Berlin/Heidelberg (2012)
9. Anderson, K.M., Schram, A.: Design and implementation of a data analytics infrastructure in
support of crisis informatics research (NIER track). In: Proceedings of the 33rd International
Conference on Software Engineering (ICSE’11), May 21–28, 2011, Waikiki, Honolulu, HI,
USA. pp. 844–847. ACM, New York (2011)
10. Boulos, M.N.K., Sanfilippo, A.P., Corley, C.D., Wheeler, S.: Social Web mining and exploita-
tion for serious applications: technosocial predictive analytics and related technologies for
public health, environmental and national security surveillance. Comput. Methods Programs
Biomed. 100, 16–23 (2010)
11. Liu, B.F., Austin, L., Jin, Y.: How publics respond to crisis communication strategies: the
interplay of information form and source. Public Relat. Rev. 37(4), 345–353 (2011)
12. Houston, J.B., Spialek, M.L., Cox, J., Greenwood, M.M., First, J.: The centrality of com-
munication and media in fostering community resilience. A framework for assessment and
intervention. Am. Behav. Sci. 59(2), 270–283 (2015)
13. Merchant, R.M., Elmer, S., Lurie, N.: Integrating social media into emergency-preparedness
efforts. N. Engl. J. Med. 365, 289–291 (2011)
14. Teodorescu, H.N.: SN voice and text analysis as a tool for disaster effects estimation—a
preliminary exploration. In: Burileanu, C., Teodorescu, H.N., Rusu, C. (eds.) Proceedings of
the 7th Conference on Speech Technology and Human—Computer Dialogue (SpeD), Oct 16–
19, 2013, pp. 1–8. IEEE, Cluj-Napoca (2013) doi:10.1109/SpeD.2013.6682650
15. Teodorescu, H.N.: Using analytics and social media for monitoring and mitigation of social
disasters. Procedia Eng. 107C, 325–334 (2015). doi:10.1016/j.proeng.2015.06.088
16. Teodorescu, H.N.L.: On the responses of social networks’ to external events. Proceedings of
the 7th IEEE International Conference on Electronics, Computers and Artificial Intelligence
(ECAI 2015), 25 June –27 June, 2015, Bucharest (2015)
17. Teodorescu, H.N.L.: Social signals and the ENR index—noise of searches on SN with
keyword-based logic conditions. In: Proceedings of the IEEE Symposium ISSCS 2015, Iasi
(2015) (978-1-4673-7488-0/15, 2015 IEEE)
18. Konkel, F.: Tweets give USGS early warning on earthquakes. (2013). http://fcw.com/articles/
2013/02/06/twitter-earthquake.aspx
19. Bruns, A., Burgess, J.E.: Local and global responses to disaster: #eqnz and the Christchurch
earthquake. In: Sugg, P. (ed.) Disaster and Emergency Management Conference Proceedings,
pp. 86–103. AST Management Pty Ltd, Brisbane (2012)
20. Pirolli, P., Preece, J., Shneiderman, B.: Cyberinfrastructure for social action on national
priorities. Computer (IEEE Computer Society), pp. 20–21 (2010)
21. Zin, T.T., Tin, P., Hama, H., Toriu, T.: Knowledge based social network applications to
disaster event analysis. In: Proceedings of the International MultiConference of Engineers and
Computer Scientists, 2013 (IMECS 2013), vol. I, Mar 13–15, 2013, pp. 279–284. Hong Kong
(2013)
22. Schultz, F., Utz, S., Göritz, A.: Is the medium the message? Perceptions of and reactions to
crisis communication via twitter, blogs and traditional media. Public Relat. Rev. 37(1), 20–27
(2011)
23. Utz, S., Schultz, F., Glocka, S.: Crisis communication online: how medium, crisis type and
emotions affected public reactions in the Fukushima Daiichi nuclear disaster. Public Relat.
Rev. 39(1), 40–46 (2013)
24. Chae, J., Thom, D., Jang, Y., Kim, S.Y., Ertl, T., Ebert, D.S.: Public behavior response analysis
in disaster events utilizing visual analytics of microblog data. Comput. Graph. 38, 51–60 (2014)

25. Murakami, A., Nasukawa, T.: Tweeting about the tsunami? Mining twitter for information
on the Tohoku earthquake and tsunami. In: Proceedings of the 21st International Confer-
ence Companion on World Wide Web, WWW’12, pp. 709–710. ACM, New York (2012)
doi:10.1145/2187980.2188187
26. Potts, L., Seitzinger, J., Jones, D., Harrison, A.: Tweeting disaster: hashtag constructions
and collisions. In: Proceedings of the 29th ACM International Conference on Design of
Communication, SIGDOC’11, pp. 235–240. ACM, New York (2011)
27. Toriumi, F., Sakaki, T., Shinoda, K., Kazama, K., Kurihara, S., Noda, I. Information sharing
on Twitter during the 2011 catastrophic earthquake. In: Proceedings of the 22nd International
Conference on World Wide Web companion, pp. 1025–1028. Geneva (2013)
28. Stefanidis, A., Crooks, A., Radzikowski, J.: Harvesting ambient geospatial information from
social media feeds. GeoJournal 78, 319–338 (2013)
29. Acar, A., Muraki, Y.: Twitter for crisis communication: lessons learned from Japan’s tsunami
disaster. Int. J. Web Based Communities 7(3), 392–402 (2011)
30. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: WWW 2011—
Session: Information Credibility, World Wide Web Conference (IW3C2), Mar 28–April 1,
2011, pp. 675–684. ACM, Hyderabad (2011)
31. Cheong, F., Cheong, C.: Social media data mining: a social network analysis of tweets during
the 2010–2011 Australian floods. In: Proceedings of the PACIS, Paper 46. http://aisel.aisnet.
org/pacis2011/46 (2011)
32. Gil, Y., Artz, D.: Towards content trust of web resources. Web Semant. Sci. Serv. Agents World
Wide Web 5, 227–239 (2007)
33. Kent, M.L., Carr, B.J., Husted, R.A., Pop, R.A.: Learning web analytics: a tool for strategic
communication. Public Relat. Rev. 37, 536–543 (2011)
34. Qu, Y., Huang, C., Zhang, P., Zhang, J.: Microblogging after a major disaster in China: a case
study of the 2010 Yushu Earthquake. In: CSCW 2011, March 19–23, 2011. ACM, Hangzhou
(2011)
35. Teodorescu, H.N.L.: Coordinate fuzzy transforms and fuzzy tent maps—properties and
applications. Studies Inform. Control 24(3), 243–250 (2015)
Competitive Models for the Spanish Short-Term
Electricity Demand Forecasting

J. Carlos García-Díaz and Óscar Trull

Applied Statistics, Operations Research and Quality Department, Universitat Politècnica de
València, Valencia, Spain
e-mail: [email protected]

Abstract The control and scheduling of the demand for electricity using time
series forecasting is a powerful methodology used in power distribution systems
worldwide. Red Eléctrica de España, S.A. (REE) is the operator of the Spanish
electricity system. Its mission is to ensure the continuity and security of the
electricity supply. The goal of this paper is to improve the forecasting of very
short-term electricity demand using multiple seasonal Holt–Winters models without
exogenous variables, such as temperature, calendar effects or day type, for the
Spanish national electricity market. We implemented 30 different models and
evaluated them using software developed in MATLAB. The performance of the
methodology is validated via out-of-sample comparisons using real data from the
operator of the Spanish electricity system. A comparison study between the REE
models and the multiple seasonal Holt–Winters models is conducted. The method
provides forecast accuracy comparable to the best methods in the competitions.

Keywords Holt–Winters exponential smoothing • Multiple seasonal • Red Eléctrica de España • Short-term electricity demand forecasting • Spanish electricity market

1 Introduction

Electric power markets have become competitive due to the deregulation carried
out in recent years that allows the participation of producers, investors, traders and
qualified buyers. Thus, the price of electricity is determined on the basis of a buying–
selling system. Short-term load forecasting is an essential instrument in power
system planning, operation and control. Many operating decisions are based on load
forecasts, such as dispatch scheduling of generating capacity, reliability analysis and
maintenance planning for the generators. Therefore, demand forecasting plays an
important role for electricity power suppliers, as both excess and insufficient energy

production may lead to high costs and significant reductions in profits. Forecasting
accuracy has a significant impact on electric utilities and regulators.
A market operator, a distribution network operator, electricity producers, con-
sumers and retailers make up the Spanish electricity system. This market is based
on either a pool framework or bilateral contracts. Market clearing is conducted
once a day, providing hourly electricity prices. The market operator in Spain is
OMEL (Operador del Mercado Ibérico de Energía, Polo Español, S. A.), which is
responsible for managing the bidding system for the purchase and sale of electricity
according to the legal duties established, as well as the arrangement of settlements,
payments and collections, incorporating the results of the daily and intra-day
electricity markets. Red Eléctrica de España, S.A. (REE) is the distribution network
operator within the Spanish electricity system, and its mission is to ensure the
continuity and security of the electricity supply. The role of REE as system operator
consists of maintaining a balance. For this purpose, it forecasts consumption,
operating and overseeing the generation and transmission installations in real time,
thus ensuring that the planned production at the power stations coincides at all times
with the actual consumer demand. A good description of the functioning of Spain’s
electricity production market can be found in [1–4].
A state-of-the-art description of different methods of short-term electricity
demand forecasting can be found in [5, 6]. These methods are based on the Holt–
Winters exponential smoothing model, ARIMA time series models, electricity
demand regression models with exogenous variables (such as temperature), artificial
neural networks (ANNs) and hybrid forecast techniques, among others. ANN mod-
els are an alternative approach to load forecast modelling [7, 8]. Some researchers
consider related factors such as temperature, e.g. [9–11], and calendar effects or day
type [12] in load forecasting models.
Exponential smoothing methods are especially suitable for short-term forecasting
[6, 13–16]. Seasonal ARIMA [17] and Holt–Winters exponential smoothing [18]
are widely used, as they require only the quantity-demanded variable, and they
are relatively simple and robust in their forecasting. Recent papers have stimulated
renewed interest in Holt–Winters exponential smoothing for short-term electricity
demand forecasting, due to its simple model formulation and good forecasting
results [16, 19–23]. In a recent paper, Bermudez [24, 25] analyses an extension
of the exponential smoothing formulation, which allows the use of covariates
to introduce extra information in the forecasting process. Calendar effects, such
as national and local holidays and vacation periods, are also introduced using
covariates. This hybrid approach can provide accurate forecasts for weekdays,
public holidays, and days before and after public holidays.
In recent years, new techniques have been used in electricity demand forecasting.
For example, non-linear regression models have been used for load forecasting
[26–28]. Functional time series modelling and forecasting techniques have been
extensively studied for electricity demand forecasting, and [29–32] have also studied
functional nonparametric time series modelling. Hyndman and Shang [27] propose
forecasting functional time series using weighted functional principal component
regression and weighted functional partial least squares regression. Some authors

have studied the possibility of using state-space models [33–36] to improve load
forecasting performance.
It is well known that some exogenous variables, particularly temperature and
calendar effects (including annual, weekly and daily seasonal patterns, as well as
public holidays, and different hours on work days and non-work days), have an
influence on the electricity demand [37]. There are some considerations to take into
account, as the complexity of the model would increase if exogenous variables such
as temperature were introduced. This solution is certainly not a parsimonious way
of modelling. Several authors have considered this problem. The interested reader
is referred to Soares and Medeiros [12] and Valor et al. [38] for more detailed
descriptions. Temperature is not considered for two reasons:
• Available data on Spanish electricity consumption is not disaggregated regionally
into different climatic sub regions. Taking into account that the available
electricity consumption data used in the study correspond to the whole Spanish
state, while the meteorological parameters vary in different geographical regions
(especially between northern and southern Spain, and between the islands
(Canary and Balearic Islands) and the Iberian Peninsula), it is necessary to
calculate weighted indices for the representative meteorological parameters for
the entire geographical region under consideration. Soares and Medeiros [12]
also discuss this problem.
• The hourly temperatures in Spain are not easily accessible. The selected variable
is the mean daily air temperature (°C), as it can better capture thermal oscillation
within a day [9]. Taylor and Buizza [11] present an analysis based on daily
electricity demand data.
Moreover, forecasting models based on linear models now tend to approach this
task by using separate models for each hourly data point (some researchers propose
forecasting models that include 168 separate regression equations, one for each
hour in the week). However, as the problem is intrinsically a multivariate one, the
interrelated information is lost when the models treat each hourly value separately. It
is also possible to use the grouping of work days and weekends, but the problem of
information loss would be the same, due to disruption in the correlation structure
present in the time series. In modelling time series, it is simply not possible to
perform groupings of weekdays and weekends, as this would change the structure
of temporal dependence.
The models discussed in this paper only consider time as an explanatory
variable to create simple, flexible models. In this paper, an automatic forecasting
procedure based on univariate time series models is implemented. The hourly
Spanish electricity demand forecasting performance of multiple seasonal univariate
Holt–Winters models is examined, based on multi-step ahead forecast mean squared
errors, which are greatly dominated by daily, weekly and annual seasonal cycles.
REE’s model for electricity demand forecasting consists of one daily model and
24 hourly models. About 150 parameters are readjusted each hour to enhance the
next hour’s forecasts. The mean forecast accuracy of REE models is reported as a
MAPE (mean absolute percentage error) of around 2 % [2].

The goal of this paper is to present a parsimonious modelling approach to Spanish
electricity demand forecasting, as an alternative to REE, which should provide
similar results with less computational and conceptual effort. The approach will
use HWT methods following multiple seasonal patterns.
The study makes two main contributions. First, the paper proposes a gen-
eralisation of double and triple seasonal Holt–Winters models to an n-multiple
seasonal case, and new initialisation criteria are adapted to multiple seasonal
components. Second, it proposes a new methodology based on multiple seasonal
Holt–Winters models for short-term demand forecasting. A set of 30 models have
been implemented in the MATLAB® software package, as well as the methodology,
in order to help electricity suppliers compute the best short-term hourly demand
forecasts for operational planning.
This work is structured as follows: Section 2 provides the details of the
Spanish electricity demand data set and REE modelling. Section 3 introduces a
generalisation of double and triple seasonal Holt–Winters models to an n-multiple
seasonal case proposed by the authors, and new initialisation criteria are adapted
to multiple seasonal components. In Sect. 4, we carry out a study comparing the
effectiveness of the proposed models and the REE method for predicting short-term
Spanish electricity demand. Computational results are evaluated, and comparative
studies are discussed in this section. Finally, Section 5 presents the research
conclusions.

2 Spanish Short-Term Electricity Demand Forecasting

The data set used in this paper covers the period from 1 July 2007 to 9 April 2014. It
is provided by REE (www.ree.es). REE also provides forecasts for the next hours, as
well as other operational values. The data set comprises all the transactions carried
out in mainland Spain, not including the Canary Islands or the Balearic Islands. This
kind of series is highly dominated by three seasonal patterns: the daily cycles (intra-
day), with a period length of s1 = 24, the weekly cycles (intra-week), with a length of
s2 = 168, and the yearly cycles (intra-year), where s3 = 8766. Figure 1, on the left,
shows a random week from this series, in 2010, 2011, 2012 and 2013, where the
intra-day and intra-week seasonal patterns are recognisable. The right panel depicts
two years on the same axes, and the overlapping of the two series denotes the intra-
year seasonal pattern.

2.1 REE Modelling

The methodology used by REE to provide forecasts for the next day consists of a
combination of 24 hourly models plus 1 daily model. All these models share the
same structure, which considers the special days, weather and some distortions. The


Fig. 1 Seasonal patterns obtained from the hourly Spanish electricity demand time series. On the
left, four representations of week 25 in different years are depicted, where the intra-day cycles
from Monday to Friday and the intra-week cycles on the weekend can be seen. On the right, the
hourly demand for years 2010 and 2011 is depicted. The overlap of the two series denotes the third
seasonality: intra-year

model is expressed as (1):

ln C_t = p_t + s_t + CSD_t + CWEA_t + u_t,   (1)

where p_t refers to the trend, while s_t refers to the seasonality. The factor CSD_t is the
contribution of the special days, and CWEA_t represents the weather conditions. u_t refers
to a stationary disturbance. The component p_t + s_t + u_t is denoted as the base
consumption, modelled using the ARIMA methodology. The rest of the factors are
modelled as dummy variables. More detailed descriptions of the Spanish electricity
system and REE forecasting methods can be found in [1–4].
In this paper we used the forecasts provided directly by REE employing the
previously mentioned methodology.

3 Multiple Seasonal Exponential Smoothing Holt–Winters Models

3.1 Holt–Winters Models Generalisation

Double and triple seasonal Holt–Winters models (HWT) were introduced by Taylor
[16, 20] as an evolution of Holt–Winters methods [18]. Here, we propose a
generalisation to n-seasonality of the HWT models, formed by transition equations:
level smoothing (2); trend smoothing (3); and as many seasonality smoothing

equations (4) as seasonal patterns are taken into account. A final forecast equation
(5) uses the previous information to provide a k-step-ahead forecast. We call this
model nHWT, and the smoothing equations for an additive trend-multiplicative
seasonality model are defined in Eqs. (2), (3), (4), and (5).
S_t = α (X_t / Π_i I_{t−s_i}^(i)) + (1 − α)(S_{t−1} + T_{t−1}),   (2)

T_t = γ (S_t − S_{t−1}) + (1 − γ) T_{t−1},   (3)

I_t^(i) = δ^(i) (X_t / (S_t Π_{j≠i} I_{t−s_j}^(j))) + (1 − δ^(i)) I_{t−s_i}^(i),   (4)

X̂_t(k) = (S_t + k T_t) Π_i I_{t−s_i+k}^(i) + φ_AR^k (X_t − (S_{t−1} + T_{t−1}) Π_i I_{t−s_i}^(i)),   (5)

where X_t are the observed values, S_t is the level, T_t is the additive trend smoothing
and I_t^(i) are the seasonal smoothing equations for the seasonal patterns. The involved
parameters α and γ are the smoothing parameters for level and trend, while δ^(i)
is the smoothing parameter for each seasonal pattern. Here i denotes the seasonal
component, with a seasonal cycle length of s_i. X̂_t(k) is the k-step-ahead forecast.
φ_AR is the adjustment for the first-order autocorrelation error. Additionally, Hyndman et al.
[35] and Taylor [39] proposed damped trend versions of (2), and we gathered all
models following Pegel’s classification, shown in Table 1.
These models are expounded in Table 2 for better understanding. This table
shows the complete formulae for the models adjusted using the first-order autoregressive
error. To obtain the normal models, setting φ_AR = 0 annuls this component.

Table 1 Summary of implemented models

                        Normal                          AR(1) adjusted
Trend / Seasonality     None    Additive    Multip.     None    Additive    Multip.
None                    NNL     NAL         NML         NNC     NAC         NMC
Additive                ANL     AAL         AML         ANC     AAC         AMC
Damped additive         dNL     dAL         dML         dNC     dAC         dMC
Multiplicative          MNL     MAL         MML         MNC     MAC         MMC
Damped multiplicative   DNL     DAL         DML         DNC     DAC         DMC

Notation used: the first letter defines the trend method (N: None, A: Additive, d: Damped additive,
M: Multiplicative, D: Damped multiplicative); the second defines the seasonal method (N: None,
A: Additive, M: Multiplicative) and the last letter stands for the AR(1) adjustment (L: not adjusted,
C: adjusted). (e.g. AMC24,168 is a double (24 and 168 h lengths) multiplicative-seasonal with
additive trend model, adjusted with the first-order autocorrelation error)
Table 2 Multiple seasonal Holt–Winters models (nHWT), including damped versions. Rows of the
original table are sorted by trend method (None, Additive, Damped additive, Multiplicative,
Damped multiplicative) and columns by seasonal method (None, Additive, Multiplicative); the
formulae are grouped here by trend method.

In every model the seasonal smoothing equations depend only on the seasonal method:
  Additive seasonality:       I_t^(i) = δ^(i) (X_t − S_t − Σ_{j≠i} I_{t−s_j}^(j)) + (1 − δ^(i)) I_{t−s_i}^(i)
  Multiplicative seasonality: I_t^(i) = δ^(i) (X_t / (S_t Π_{j≠i} I_{t−s_j}^(j))) + (1 − δ^(i)) I_{t−s_i}^(i)

No trend:
  Level:    S_t = α X_t + (1 − α) S_{t−1} (no seasonality);
            S_t = α (X_t − Σ_i I_{t−s_i}^(i)) + (1 − α) S_{t−1} (additive seasonality);
            S_t = α (X_t / Π_i I_{t−s_i}^(i)) + (1 − α) S_{t−1} (multiplicative seasonality)
  Forecast: X̂_t(k) = S_t + φ_AR^k ε_t;
            X̂_t(k) = S_t + Σ_i I_{t−s_i+k}^(i) + φ_AR^k ε_t;
            X̂_t(k) = S_t Π_i I_{t−s_i+k}^(i) + φ_AR^k ε_t

Additive trend:
  Level:    as for no trend, with (1 − α)(S_{t−1} + T_{t−1}) in place of (1 − α) S_{t−1}
  Trend:    T_t = γ (S_t − S_{t−1}) + (1 − γ) T_{t−1}
  Forecast: as for no trend, with S_t + k T_t in place of S_t

Damped additive trend:
  Level:    as for no trend, with (1 − α)(S_{t−1} + φ T_{t−1})
  Trend:    T_t = γ (S_t − S_{t−1}) + (1 − γ) φ T_{t−1}
  Forecast: as for no trend, with S_t + Σ_{i=1}^{k} φ^i T_t in place of S_t

Multiplicative trend:
  Level:    as for no trend, with (1 − α) S_{t−1} R_{t−1}
  Trend:    R_t = γ (S_t / S_{t−1}) + (1 − γ) R_{t−1}
  Forecast: as for no trend, with S_t R_t^k in place of S_t

Damped multiplicative trend:
  Level:    as for no trend, with (1 − α) S_{t−1} R_{t−1}^φ
  Trend:    R_t = γ (S_t / S_{t−1}) + (1 − γ) R_{t−1}^φ
  Forecast: as for no trend, with S_t R_t^(Σ_{i=1}^{k} φ^i) in place of S_t

Here X_t are the observed values, S_t is the level, T_t is the additive trend, R_t is the multiplicative
trend, I_t^(i) is the seasonal index of the i-th seasonality (of cycle length s_i) and X̂_t(k) is the k-step-ahead
forecast. α, γ and δ^(i) are the smoothing parameters, φ is the damping factor, and φ_AR is the
adjustment with the AR(1) error; ε_t is the one-step-ahead forecast error.

3.2 Computational Strategy for Optimising the Smoothing Parameters

The model evaluation is based on the one-step-ahead prediction errors for obser-
vations within the evaluation period. One-step-ahead predictions are generated
from the model specifications and parameter estimates. Smoothing parameters are
determined in order to minimise the root of the mean of squared one-step-ahead
prediction errors (RMSE), as defined in (6):

RMSE = √( (1/N) Σ_t (X̂_t − X_t)² ).   (6)

The parameters were constrained to [0;1], and they were all optimised simulta-
neously. The main problem is that these optimisation algorithms strongly depend
on the starting values used for the algorithm. Therefore, the optimisation was
performed by first evaluating 10,000 initial vectors of smoothing parameters
v = (α, γ, δ^(i), φ, φ_AR), obtained randomly from a grid whose dimension equals the number
of parameters to optimise, each ranging from 0 to 1. The RMSE of each vector was
evaluated, and the ten vectors with the lowest RMSE were used as starting values
to perform an optimisation algorithm. The vector with the lowest RMSE provides
the smoothing parameters for the model. Since nHWT equations are recursive, the
model must be initialised. This procedure is explained in [20]. For the additive
trend models, the initialisation method proposed by Taylor [16] is used. For the
multiplicative trend models, we adapted the Holt–Winters [18] method to a multiple
seasonal pattern. Level is initialised following [40]. The seasonal indices for each
seasonality are computed as the average of dividing each cycle by its mean. Finally,
the indices are weighted by dividing the longer seasonal period indices by the shorter
ones.

4 Modelling Approach and Results

In order to find the best predictive model, the work is divided into two stages: first,
a search is performed for the in-sample forecasting model, and in a second step, a
validation of out-of-sample forecasting is carried out.
The optimisation problem will be solved using a methodology specifically
developed and implemented in the software program running on the MATLAB®
environment. In this Section, an extensive set of computational results and com-
parative studies are presented to study the performance of the proposed nHWT
models for short-term demand forecasting implemented in the software package.
The results are compared numerically with those obtained by the REE model in the
same periods.

MAPE and symmetrical MAPE (sMAPE), defined in Eqs. (7) and (8), respec-
tively, were used to compare the prediction accuracy:

MAPE = (1/N) Σ_t |(X_t − X̂_t) / X_t| × 100,   (7)

sMAPE = (1/N) Σ_t |(X_t − X̂_t) / (X_t + X̂_t)| × 100.   (8)

Some authors, e.g. [41], have proposed the sMAPE as the best performance
measure to select among models. It is an average measure of the forecast accuracy
across a given forecast horizon, and it provides a global measurement of the
goodness of fit, while MAPE only provides point measures. Thus, in order to select
the best model to forecast future demands, a competition is carried out with all the
models present in Table 1, where the model with the lowest sMAPE for the forecasts
provided is chosen.

4.1 Model Selection

The following scenario was established to conduct the competition: Four time
windows from 2013 were randomly selected, one for each season (spring, summer,
autumn and winter), in order to address different climate conditions, and three
alternatives were considered, according to the methodology:
• Double seasonal (DS) and triple seasonal (TS) models using the full data set from
July 2007 until the day immediately before the forecasting period.
• Double seasonal models using only the seven weeks immediately before the
forecasting period, in order to check the influence of the data length on
predictions.
The REE model was also included as a reference. Because REE updates the
forecasts each hour, the analysis uses an hourly re-estimation for all models, in
order to establish the same comparison scenario. Table 3 summarises the results
obtained by comparing the sMAPE of the best models on four random days, each
corresponding to a different season.
DS methods using the full data set outperformed the other alternatives, even
though the TS method was expected to be the most accurate, mainly due to the
influence of national holidays (in Spain, Christmas ends on 6 January, and 12
October is Hispanic Day). Other authors [23] smooth the series in order to avoid
these unusual day effects, but in this case the series has not been reworked or

Table 3 Symmetrical MAPE comparisons of the most accurate models, each alternative studied,
and the REE model

Alternative              Models             14 January 2013   22 April 2013   8 July 2013   13 October 2013
Double, 7 weeks          AMC24,168          0.741             0.689           0.351         0.611
                         dMC24,168          0.761             0.744           0.343         0.588
                         MMC24,168          0.830             1.265           0.982         0.850
Double, full data set    AMC24,168          0.508             0.576           0.303         0.634
                         DMC24,168          0.505             0.577           0.304         0.633
                         MMC24,168          0.503             0.578           0.302         0.634
Triple, full data set    NAC24,168,8760     0.590             0.579           0.355         0.591
                         AAC24,168,8760     0.750             0.645           0.417         0.619
                         DAC24,168,8760     0.730             0.634           0.426         0.624
REE                                         0.449             0.534           0.416         0.532

Results are shown for four different days, one in each season

adapted. Figure 2 shows the evolution of the MAPE within the forecast horizon
of all the selected models compared to the REE. We used the MAPE graphically, as
it makes it feasible to understand the evolution of the accuracy with the lead time
used. The models that show the best performance are the AMC24,168 , the MMC24,168
and the DMC24,168 . Results reveal that our models perform similarly to REE, and in
some cases, they outperform it.
The Diebold–Mariano test [42] compares the forecast accuracy of two competing
forecasting methods. We proposed and evaluated explicit tests to check whether
there were significant differences among the selected models, and no significant
differences were found. Thus, the AMC24,168 model is chosen, as the DMC24,168
uses more parameters, and the MMC24,168 requires more computational effort.

4.2 Validation

The validation process was carried out by comparing real time forecasts proposed
by REE and forecasts obtained with the selected model. The validation took place
during the week from 3 April 2014 to 9 April 2014, the last dates available at
the time of the study. In order to provide forecasts for these dates, models were
optimised using the full data set until the forecasting period, and 24-h forecasts were
provided. Figure 3 depicts the evolution of the MAPE obtained during the validation
process by the AMC24,168 model and REE. In this case, we use the MAPE because
we want to observe the short-term accuracy evolution compared to the forecast
horizon. The left graph shows the first case, in which 12-h forecast horizon results
are compared, as REE only forecasted until 22:50 h. The accuracy obtained with our
model is around 0.6 %, whereas the REE model reaches 0.9 %. On the graph on the
right, the 9 April 2014 results are analysed. The AMC24,168 model has an average MAPE
of around 1 %, outperforming the REE, which reaches 1.5 %.

Fig. 2 MAPE evolution for the selected double seasonal models, compared to the REE model,
in a forecast horizon of 24 h in four seasons. Top-left panel depicts comparisons for 14 January
2013, and the right 22 April, while the bottom-left panel shows 8 July, and the right 13 October.
Differences among the forecasts provided by AMC, DMC and MMC are not significant; therefore,
the graphs appear to overlap

Fig. 3 Forecasting accuracy evolution of the AMC24,168 model compared to the REE, measured
using MAPE along the forecast horizon. The left panel shows the results for 3 April 2014, and the
right panel for 9 April 2014. The middle panel depicts the MAPE for 7 April 2014 provided by
the AMC24,168 model forecast at 0:00 h, compared to the REE provided at 10:30, as well as the
re-forecast with the AMC24,168 model at 13:00, compared to the REE at 17:00
of around 1 %, outperforming the REE, which reaches 1.5 %.
In order to check the horizon in which our model could provide accurate
forecasts, a forecast was made 24 h ahead, from 01:00 to 24:00 h on 7 April
2014, with AMC24,168 . The model outperforms the forecasts provided by REE at
10:30, which asymptotically reaches the performance obtained by the AMC24,168 .
A re-forecast made at 13:00 with AMC24,168 enhances the results, whereas the re-
estimation by REE at 17:00 h enhances its performance, but decreases drastically
after only a few hours, reaching the MAPE obtained by the original estimation with
AMC24,168 .

5 Conclusions

This paper presents an alternative methodology for forecasting the Spanish short-
term electricity demand, using models that parsimoniously provide forecasts com-
parable to the REE model.
The paper proposes a generalisation of the HWT models to n-multiple seasonal
models, and new initialisation criteria are developed for multiple seasonal com-
ponents. The methodology was implemented in a software application using the
MATLAB® environment to predict hourly electricity demand in Spain by selecting
from 30 multiple Holt–Winters models with improved optimisation of smoothing
parameter methods.
An analysis using the Spanish data is conducted in two stages. The purpose of
the first stage is to select the best model for the forecasts. Three alternatives, using
double and triple seasonal methods for all models, were analysed by computing
forecasts using the implemented models. As a result, the AMC24,168 model is
selected. In a second stage, a validation analysis is conducted by making real time
forecasts and comparing them to REE.
The REE model has more than 100 parameters that are estimated hourly, and
it has a performance of around 2 % in terms of MAPE, whereas the methodology
presented here shows results similar to those obtained by the REE, obtaining MAPE
between 0.6 and 2 % in the time period considered. However, a maximum of
five parameters are optimised in the proposed models, significantly reducing the
computational effort.
Additional conclusions are drawn from comparing the models. The double
seasonal method obtains better forecasts than the triple seasonal, due to the fact
that the series is neither adapted nor reworked when triple seasonal methods are
used.

References

1. Pino, R., Parreño, J., Gomez, A., Priore, P.: Forecasting next-day price of electricity in the
Spanish energy market using artificial neural networks. Eng. Appl. Artif. Intell. 21, 53–62
(2008)
2. Cancelo, J.R., Espasa, A., Grafe, R.: Forecasting the electricity load from one day to one week
ahead for the Spanish system operator. Int. J. Forecast. 24(4), 588–602 (2008)
3. Nogales, F.J., Conejo, A.J.: Electricity price forecasting through transfer function models. J.
Oper. Res. Soc. 57(4), 350–356 (2006)
4. Conejo, A.J., Contreras, J., Espínola, R., Plazas, M.A.: Forecasting electricity prices for a day-
ahead pool-based electric energy market. Int. J. Forecast. 21, 435–462 (2005)
5. Muñoz, A., Sánchez-Úbeda, E., Cruz, A., Marín, J.: Short-term forecasting in power systems:
a guided tour. In: Rebennack, S., Pardalos, P.M., Pereira, M.V.F., Iliadis, N.A. (eds.) Handbook
of Power Systems II, pp. 129–160. Springer, Berlin/Heidelberg (2010)
6. Weron, R.: Electricity price forecasting: a review of the state-of-the-art with a look into the
future. Int. J. Forecast. 30(4), 1030–1081 (2014)
7. Carpinteiro, O., Reis, A., Silva, A.: A hierarchical neural model in short-term load forecasting.
Appl. Soft Comput. 4, 405–412 (2004)
8. Darbellay, G.A., Slama, M.: Forecasting the short-term demand for electricity: do neural
networks stand a better chance? Int. J. Forecast. 1(16), 71–83 (2000)
9. Pardo, A., Meneu, V., Valor, E.: Temperature and seasonality influences on Spanish electricity
load. Energy Econ. 24(1), 55–70 (2002)
10. Pedregal, D.J., Trapero, J.R.: Mid-term hourly electricity forecasting based on a multi-rate
approach. Energy Convers. Manag. 51, 105–111 (2010)
11. Taylor, J.W., Buizza, R.: Using weather ensemble predictions in electricity demand forecasting.
Int. J. Forecast 19(1), 57–70 (2003)
12. Soares, L.J., Medeiros, M.C.: Modeling and forecasting short-term electricity load: a compar-
ison of methods with an application to Brazilian data. Int. J. Forecast 24(4), 630–644 (2008)
13. Chatfield, C., Yar, M.: Holt-Winters forecasting: some practical issues. The Statistician 37,
129–140 (1988)
14. Gardner, E.S.: Exponential smoothing: the state of the art. J. Forecast. 4, 1–28 (1985)
15. Gardner Jr., E.S.: Exponential smoothing: the state of the art, part II. Int. J. Forecast. 22, 637–
666 (2006)
16. Taylor, J.W.: Short-term electricity demand forecasting using double seasonal exponential
smoothing. J. Oper. Res. Soc. 54(8), 799–805 (2003)
17. Box, G., Jenkins, G.M., Reinsel, G.: Time Series Analysis: Forecasting & Control. Prentice-
Hall, Englewood Cliffs, NJ (1994)
18. Winters, P.R.: Forecasting sales by exponentially weighted moving averages. Management 6,
324–342 (1960)
19. Taylor, J.W.: An evaluation of methods for very short-term load forecasting using minute-by-
minute British data. Int. J. Forecast. 24(4), 645–658 (2008)
20. Taylor, J.W.: Triple seasonal methods for short-term electricity demand forecasting. Eur. J.
Oper. Res. 204(1), 139–152 (2010)
21. Arora, S., Taylor, J.W.: Short-term forecasting of anomalous load using rule-based triple
seasonal methods. Power Syst. IEEE Trans. 28(3), 3235–3242 (2013)
22. Taylor, J.W., McSharry, P.E.: Short-term load forecasting methods: an evaluation based on
European data. Power Syst. IEEE Trans. 22(4), 2213–2219 (2007)
23. Taylor, J.W., de Menezes, L.M., McSharry, P.E.: A comparison of univariate methods for
forecasting electricity demand up to a day ahead. Int. J. Forecast. 22(1), 1–16 (2006)
24. Corberán-Vallet, A., Bermúdez, J.D., Vercher, E.: Forecasting correlated time series with
exponential smoothing models. Int. J. Forecast. 27(2), 252–265 (2011)
25. Bermúdez, J.D.: Exponential smoothing with covariates applied to electricity demand forecast.
Eur. J. Ind. Eng. 7(3), 333–349 (2013)

26. Pierrot, A., Goude, Y.: Short term electricity load forecasting with generalized additive models.
In: Proceedings of 16th International Conference Intelligent System Applications to Power
Systems, pp. 410–415. New York: Institute of Electrical and Electronics Engineers (2011)
27. Fan, S., Hyndman, R.J.: Short-term load forecasting based on a semi-parametric additive
model. Power Syst. IEEE Trans. 27(1), 134–141 (2012)
28. Ba, A., Sinn, M., Goude, Y., Pompey, P.: Adaptive learning of smoothing functions: application
to electricity load forecasting. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger,
K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2519–2527. MIT
Press, Cambridge (2012)
29. Antoch J., Prchal, L., DeRosa, M., Sarda, P.: Functional linear regression with functional
response: application to prediction of electricity consumption. In: Proceedings of the Func-
tional and Operatorial Statistics, IWFOS 2008. Springer, Heidelberg (2008)
30. Cho, H., Goude, Y., Brossat, X., Yao, Q.: Modeling and forecasting daily electricity load
curves: a hybrid approach. J. Am. Stat. Assoc. 108, 7–21 (2013)
31. Vilar, J.M., Cao, R., Aneiros, G.: Forecasting next-day electricity demand and price using
nonparametric functional methods. Int. J. Electr. Power Energy Syst. 39(1), 48–55 (2012)
32. Aneiros-Pérez, P., Vieu, G.: Nonparametric time series prediction: a semifunctional partial
linear modeling. J Multivar Anal 99, 834–857 (2008)
33. Harvey, A., Koopman, S.: Forecasting hourly electricity demand using time-varying splines. J.
Am. Stat. Assoc. 88, 1228–1253 (1993)
34. Dordonnat, V., Koopman, S.J., Ooms, M., Dessertaine, A., Collet, J.: An hourly periodic state
space model for modelling French national electricity load. Int. J. Forecast. 24(4), 566–587
(2008)
35. Hyndman, R., Koehler, A.B., Ord, J.K., Snyder, R.D.: Forecasting with Exponential Smooth-
ing: The State Space Approach. Springer, Heidelberg (2008)
36. So, M.K.P., Chung, R.S.W.: Dynamic seasonality in time series. Comput. Stat. Data Anal. 70,
212–226 (2014)
37. Weron, R.: Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach.
Wiley, Chichester (2006)
38. Valor, E., Meneu, V., Caselles, V.: Daily air temperature and electricity load in Spain. J. Appl.
Meteorol. 40(8), 1413–1421 (2001)
39. Taylor, J.W.: Exponential smoothing with a damped multiplicative trend. Int. J. Forecast. 19(4),
715–725 (2003)
40. Chatfield, C.: The Holt-Winters forecasting procedure. Appl. Stat. 27, 264–279 (1978)
41. Makridakis, S., Hibon, M.: The M3-competition: results, conclusions and implications. Int. J.
Forecast. 16(4), 451–476 (2000)
42. Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 13(3), 253–
263 (1995)
Age-Specific Death Rates Smoothed
by the Gompertz–Makeham Function
and Their Application in Projections
by Lee–Carter Model

Ondřej Šimpach and Petra Dotlačilová

Department of Statistics and Probability, University of Economics Prague, Faculty of Informatics
and Statistics, W. Churchill sq. 4, 130 67 Prague, Czech Republic
e-mail: [email protected]; [email protected]
http://fis.vse.cz/

Abstract The aim of this paper is to use a stochastic modelling approach (the Lee–Carter
model) for the age-specific death rates of the Czech population. We use
annual empirical data from the Czech Statistical Office (CZSO) database for the
period from 1920 to 2012. We compare two modelling approaches with each
other: one is based on the empirical time series of age-specific death rates, and the
other one is based on time series smoothed by the Gompertz–Makeham function,
which is currently the most frequently used tool for smoothing the mortality curve
at higher ages. (Our review also includes a description of other advanced models
which are commonly used.) Based on the results of the mentioned approaches we
compare two issues of time series forecasting: variability and stability. Sometimes
a stable development of the time series is the right property, ensuring a significant
and realistic prediction, and sometimes it is not. In the case of mortality it is necessary to
consider both unexpected or stochastic changes and the long-term stable deterministic
trend, and to find a compromise between them.

Keywords Demographic projection • Gompertz–Makeham function • Lee–Carter model • Life expectancy • Time series of mortality

1 Introduction and Literature Review

The trend of mortality is one of the most important indicators of the standard of living. Mor-
tality is an important component of a population's reproduction and its development
is a very interesting topic for demographers and actuaries. If mortality improves,
then people live longer. One reason for the improvement in mortality and also
for the increase in life expectancy (labelled e_{x,t}) could be better health care. The second
one is a greater interest in a healthy life style. On the other hand, the increase of e_{x,t}
means population ageing [19]. More people live to the highest ages, so it is very
important to have a better picture of mortality at these ages. In the past this was
not so important, because only a few people lived to the highest ages.
The level of mortality affects the length of human life. When we analyse the
development of mortality, it is important to know that the biggest changes in
mortality occur at higher ages (approximately 60 years and above), where
mortality has a different character in comparison with lower ages. This is caused not
only by the small numbers of deaths (D_{x,t}), but also by the small numbers of the living at the
highest ages (E_{x,t}). It is also necessary to realize that these data could be affected by
systematic and random errors. If we want to capture the mortality of the oldest people
most accurately, it is a good idea to make minor adjustments in the data matrix. This is
mainly related to the smoothing of the mortality curve and the possibility of its extrapolation
up to the highest ages. Several existing models can be used for smoothing. The oldest
one (but still very often used) is the Gompertz–Makeham function [13, 20]. It is
suitable for the elimination of fluctuations in the age-specific death rates (labelled m_{x,t})
and also for their subsequent extrapolation up to the highest ages. The disadvantage
is that it cannot be used for the prediction of mortality and therefore not for the
calculation of demographic projections ([1] or [25]).
Demographic projections of the possible future evolution of a population are an essential
information channel, providing key information about the potential
evolution of mortality, birth rates, immigration and emigration, or other demo-
graphic statistics [24]. Each projection is based on assumptions, which may or
may not materialise. Stochastic demographic projections are based on the
main components [17], explaining the trend which is included in the development of
the time series of age-specific demographic rates. The length of the time series has a
major influence on the results (see, e.g., Coale and Kisker [8], or compare the multiple
results for the populations in the study by Booth et al. [4]).
In this paper we focus on the evolution of the mortality data for the Czech
Republic, provided by the Czech Statistical Office [9]. The length of the time series
is sufficient for statistically significant projections, but the empirical data contain
high variability at the highest ages. We use two approaches for our analysis (see also
Simpach et al. [26]). The first one uses the empirical m_{x,t} for the period from 1920 to
2012 and the second one uses the values of m_{x,t} smoothed by the Gompertz–Makeham
function. The first model contains unexpected variability in the time series, while the
second one is characterised by stability and the absence of unexpected changes. These
models will be evaluated, and the final projections of m_{x,t} (and also of the estimated e_{x,t}) for
the Czech population until 2050 will be compared with each other.

2 Materials and Used Methods

For the purposes of mortality analysis in the Czech Republic we use the mortality
data from the Czech Statistical Office [9]: the numbers of deaths at completed age
x, D_{x,t}, and the exposure to risk, E_{x,t}, which is estimated as the mid-year population at
age x (males and females separately in all cases). We use annual data for the
reporting period from 1920 to 2012. The age-specific death rates (see, e.g., Erbas
et al. [10]) are calculated as

m_{x,t} = \frac{D_{x,t}}{E_{x,t}}    (1)

and these empirical rates [in logarithms, ln(m_{x,t})] can be seen in the 3D perspective charts
([7] or [14]) in Fig. 1.
Fig. 1 Empirical data of m_{x,t} in logs for the Czech males (top left) and females (top right). Bottom
left and right: these rates smoothed by the Gompertz–Makeham function. Source: Data CZSO
[9], authors' calculations

It is important to know that mortality is influenced by systematic and random
errors (especially at higher ages). That is the reason why a modelling approach is
applied. Before we mention any function, it is useful to note the relation between
m_{x,t} and the intensity of mortality (labelled μ_{x,t}). The relationship can be written
as

m_x = \mu(x + 0.5).    (2)

For modelling mortality at higher ages, the Gompertz–Makeham function is very often
used; it can be written as

\mu_x = a + b c^x,    (3)

where a, b and c are unknown parameters (see, e.g., [12] or [23]). The formula is
based on an exponential increase in mortality with increasing age. It is a well-known
fact that the development of mortality differs in each population. For this reason
several other functions exist for mortality modelling. Let us mention
two logistic functions: the Kannisto model

\mu_x = \frac{a e^{bx}}{1 + a e^{bx}},    (4)

where a and b are unknown parameters, and the Thatcher model

\mu_x = c + \frac{a e^{bx}}{1 + a e^{bx}},    (5)

which is enriched by one more parameter, c (more in [12]). Both functions can
be counted among the more optimistic ones, because they assume a slow increase of
mortality with increasing age. They are suitable for populations with a long-term
low level of mortality. On the other hand, there are functions suitable
for populations with a higher level of mortality, i.e. the Gompertz–Makeham
function written above (3) and the Coale–Kisker model. The latter is defined as
\mu_x = e^{a x^2 + b x + c},    (6)

where a, b and c are also unknown parameters of this model. The third group
includes models which lie somewhere between the two previous groups. Let
us mention, e.g., the Heligman–Pollard model

q_x = \frac{G H^x}{1 + G H^x},    (7)

where G and H are unknown parameters and q_x is in this case the probability of dying;
the model is designed for estimation in this form. (From the life table algorithm we need
to know the relationship between m_{x,t} and q_{x,t}. The probability of surviving (p_{x,t}) follows
the Gompertz law of mortality (see Gompertz [13]), p_{x,t} = e^{-m_{x,t}}, and the probability of
dying is its complement to 1, q_{x,t} = 1 - p_{x,t}.) The Weibull model can be expressed
as
m_x = b x^a,    (8)

where a and b are unknown parameters; this model is designed for estimation in the
“age-specific death rate” form.
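We estimate the Gompertz–Makeham parameters in the DeRaS software (see below); purely for illustration, an equivalent nonlinear least-squares fit of function (3) could be sketched in R as follows (the data object, age range and starting values are illustrative only and such fits may need careful initialisation):

# Illustrative only: 'dat' is assumed to hold one calendar year of data with
# columns age, Dx (deaths) and Ex (exposure to risk).
dat    <- subset(dat, age >= 60 & age <= 83)
dat$mx <- dat$Dx / dat$Ex                         # empirical m_x, Eq. (1)
fit <- nls(mx ~ a + b * c^age, data = dat,
           start = list(a = 0.001, b = 1e-5, c = 1.1))
coef(fit)                                         # estimates of a, b, c in Eq. (3)
mx_smooth <- predict(fit, newdata = data.frame(age = 60:105))  # extrapolation to the highest ages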
We return our attention to the original Gompertz–Makeham function (3): we
estimate its parameters in the DeRaS software (see Burcin et al. [6]) and then
calculate the smoothed values of m_{x,t} for males and females (see also Fig. 1). It is known
that instability of a time series reduces its predictive capability ([2] or [11]),
and the history has the lowest weight in a prediction model. But for the
modelling of mortality, which is a long-term process with its own long-term trend in
each population, the history could be quite important even with a small weight [4].
An interesting idea is therefore to consider both of our data matrices (one empirical and the other
smoothed) for the calculation of the mortality forecast up to the year 2050. The logs
of m_{x,t} can be decomposed ([17] or [18]) as

\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t},    (9)

where x = 0, 1, ..., (ω - 1), t = 1, 2, ..., T, a_x are the age-specific mortality
profiles independent of time, b_x are the additional age-specific components which
determine how much each age group changes when k_t changes, k_t are the time-varying
parameters (mortality indices) and ε_{x,t} is the error term. The m_{x,t} at age x and
year t create an (ω - 1) × T dimensional matrix

\mathbf{M} = \mathbf{A} + \mathbf{B}\mathbf{K}^{\top} + \mathbf{E},    (10)

and the identification of the Lee–Carter model is ensured by

\sum_{x=0}^{\omega - 1} b_x = 1 \quad \text{and} \quad \sum_{t=1}^{T} k_t = 0.    (11)

The estimation of the parameters b_x and k_t is based on the Singular Value Decomposition
(SVD) of the matrix of m_{x,t}, as presented by Bell and Monsell [3], Lee and Carter
[17] or Lundstrom and Qvist [19], and finally

a_x = \frac{\sum_{t=1}^{T} \ln m_{x,t}}{T}    (12)

is the simple arithmetic average of the logs of m_{x,t}.
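To make this estimation step concrete, a minimal base-R sketch of the SVD estimation in Eqs. (9)–(12) could look as follows (the matrix name lnM is illustrative; in Sect. 3 we use the SVD implementation of the “demography” package):

# Illustrative sketch; 'lnM' is assumed to be an (age x year) matrix of ln(m_{x,t}).
ax  <- rowMeans(lnM)                          # Eq. (12): a_x = mean over t of ln(m_{x,t})
Z   <- sweep(lnM, 1, ax)                      # remove the age profile
dec <- svd(Z)                                 # singular value decomposition
bx  <- dec$u[, 1] / sum(dec$u[, 1])           # first left singular vector, scaled so sum(b_x) = 1
kt  <- dec$d[1] * dec$v[, 1] * sum(dec$u[, 1])  # mortality index; sum(k_t) = 0 holds by construction
fitted_lnM <- ax + outer(bx, kt)              # fit of the form a_x + b_x * k_t, cf. Eq. (20)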
In the past, the approach of linear extrapolation of the logs of m_{x,t} over time was
often used. The justification was that the series of ln(m_{x,t}) are approximately
linear at each age x and decreasing over time. It was then possible to conclude that, if
we find a suitable intercept (b_{0x}) and slope (b_{1x}) by linear regression,
we can easily make a linear extrapolation into the future as

\ln m_{x,t} = b_{0x} + b_{1x} t + \varepsilon_{x,t},    (13)

where t is the time variable. Figure 2 (top left and right) shows the logs of m_{x,t} for
males and females in a “rainbow” chart over time (see the study by Hyndman [14]), while
the bottom charts show the male and female logs of m_{x,t} smoothed
by the Gompertz–Makeham function. There are ages at which the development
of the time series is approximately linear and indeed decreasing. But especially at
advanced ages in the case of the empirical data (top charts), we can see greater
variability, which cannot be explained by linear models alone.

Fig. 2 Empirical logs of m_{x,t} of the Czech males (top left) and females (top right) over time.
Bottom left and right show the development of these rates smoothed by the Gompertz–
Makeham function. Source: Data CZSO [9], authors' calculations
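For illustration, this per-age linear extrapolation of Eq. (13) can be sketched in R as follows (object names are illustrative):

# Hypothetical sketch of Eq. (13): one ordinary least-squares line per age x,
# extrapolated to 2050. 'lnM' is an (age x year) matrix, 'years' its column years.
future  <- 2013:2050
lin_ext <- t(apply(lnM, 1, function(y) {
  fit <- lm(y ~ years)                                    # intercept b_0x and slope b_1x
  predict(fit, newdata = data.frame(years = future))      # extrapolated ln(m_{x,t})
}))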

Because we want to explain the major components of mortality of the Czech
population, we use the stochastic Lee–Carter model (9), in which, for the purposes of
predicting future m_{x,t}, it is necessary to forecast only the values of the parameter k_t.
This forecast is usually calculated by ARIMA(p, d, q) models ([5] or [21]). The values of
the parameters a_x and b_x are independent of time, and the prediction using the Lee–
Carter model is therefore purely extrapolative (see the proof by Lee and Tuljapurkar
[18]).
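For illustration, such a forecast can be produced with the “forecast” package used in Sect. 3; a sketch with illustrative object names is:

# Sketch: forecast the mortality index k_t to 2050 with an ARIMA(1,1,0) + drift,
# as in Table 1. 'kt' is assumed to be the index estimated above for one sex/model.
library(forecast)
kt_ts  <- ts(kt, start = 1920)
fit_kt <- Arima(kt_ts, order = c(1, 1, 0), include.drift = TRUE)
fc_kt  <- forecast(fit_kt, h = 2050 - 2012, level = 95)   # forecasts with 95 % intervals
summary(fit_kt)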
In the last part we estimate e_{x,t} from the predicted m_{x,t} of both models, using
the life table algorithm. The relationship between m_{x,t} and the probability of
surviving (p_{x,t}) or dying (q_{x,t}) was given above in connection with the
Heligman–Pollard and Weibull models. The next part of the calculation relates to
the table (i.e. imaginary) population. First, we select the initial number of live births
in the table population: l_{0,t} = 100,000. Based on the knowledge of p_{x,t}, we can
calculate the number of survivors at further exact ages as

l_{x+1,t} = l_{x,t} \, p_{x,t},    (14)

where l_{x,t} is the number of survivors at exact age x from the initial cohort
of 100,000 live births of the table population. The number of deaths in the table
population is given by

d_{x,t} = l_{x,t} \, q_{x,t}.    (15)

After that we calculate the number of person-years lived (L_{x,t}) and the number of years of
remaining life (T_{x,t}) as

L_{0,t} = l_{0,t} - 0.85 \, d_{0,t} \quad \text{for } x = 0,    (16)

(where α = 0.85 is the share of deaths at age 0 belonging to the lower elementary set of deaths),

L_{x,t} = \frac{l_{x,t} + l_{x+1,t}}{2} \quad \text{for } x > 0,    (17)

and

T_{x,t} = T_{x+1,t} + L_{x,t}.    (18)

Finally, we obtain life expectancy as

e_{x,t} = \frac{T_{x,t}}{l_{x,t}}.    (19)
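For illustration, the life-table steps (14)–(19) for a single calendar year can be sketched in R as follows (the input vector mx of projected death rates for ages 0–100 is illustrative, and the closure of the open age interval is a simplifying assumption not taken from the text):

# Sketch of Eqs. (14)-(19) for one calendar year; 'mx' is assumed to be a vector
# of (projected) death rates for ages 0..100.
px <- exp(-mx)                         # probability of surviving, p_x = e^(-m_x)
qx <- 1 - px                           # probability of dying
n  <- length(mx)
lx <- numeric(n); lx[1] <- 100000
for (i in 2:n) lx[i] <- lx[i - 1] * px[i - 1]        # Eq. (14)
dx <- lx * qx                                        # Eq. (15)
Lx <- numeric(n)
Lx[1]         <- lx[1] - 0.85 * dx[1]                # Eq. (16), alpha = 0.85
Lx[2:(n - 1)] <- (lx[2:(n - 1)] + lx[3:n]) / 2       # Eq. (17)
Lx[n]         <- lx[n] / mx[n]                       # crude closure of the open age interval (assumption)
Tx <- rev(cumsum(rev(Lx)))                           # Eq. (18)
ex <- Tx / lx                                        # Eq. (19)
ex[1]                                                # life expectancy at birth e_0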

3 Results and Discussion

Using the SVD method implemented in the “demography” package [14], developed for use
in RStudio [22], we estimate the parameters â_x (age-specific mortality
profiles independent of time) and b̂_x (additional age-specific components determining
how much each age group changes when k_t changes) for both Lee–Carter models.
They are shown in Fig. 3, which also makes clear the comparison between the
different evolutions of these parameters depending on the input variability.
The mortality indices k̂_t (the time-varying parameters) were estimated for both
models (empirical and smoothed), and it was found that the results are almost
identical. This is because these indices are almost independent of
the input variability of m_{x,t}. These estimates are shown in Fig. 4. For them
we calculated predictions up to the year 2050 based on the ARIMA methodology
[5], run with the “forecast” package in R [15, 16]. The results are
four ARIMA(1,1,0) models with drift (see Table 1). AR(1) parameters marked by
an asterisk are equal to zero at the 5 % significance level. From these predictions with 95 %
confidence intervals (which can also be seen in Fig. 4) it is clear that the models for
females provide slightly lower values of these estimates.
Now we evaluate all the Lee–Carter models on the basis of the approach
presented by Charpentier and Dutang [7]. Using RStudio, we display the Pearson
residuals in “rainbow” bubble charts, first for the empirical males' model, second
for the empirical females' model, third for the smoothed males' model and finally
for the smoothed females' model. Each model is evaluated on the basis of the
residuals by age x (left charts) and the residuals over time t (right charts). Most
residuals are concentrated around 0; the higher variability is explained by the estimated model.

Fig. 3 Comparison of the two Lee–Carter models for males and females: the estimates of the parameters
â_x and b̂_x. Circles represent the model based on the empirical data matrix, lines the model
based on the smoothed data matrix. Source: authors' calculations

Fig. 4 Comparison of the two Lee–Carter models: the estimates of the time-varying parameters
k̂_t (mortality indices) for males and females. Source: authors' calculations

Table 1 Estimated parameters of ARIMA models

Model | Specification | AR (s.e.) | Drift (s.e.)
Empirical model—males | ARIMA(1,1,0) with drift | 0.0546 (0.1043) | 2.2153 (0.6313)
Empirical model—females | ARIMA(1,1,0) with drift | 0.2182 (0.1020) | 2.7256 (0.4844)
Smoothed model—males | ARIMA(1,1,0) with drift | 0.0835 (0.1042) | 2.2210 (0.6184)
Smoothed model—females | ARIMA(1,1,0) with drift | 0.2321 (0.1019) | 2.6888 (0.4776)
Source: authors' calculations

The Pearson residuals for the empirical and smoothed models (males and
females) are shown in Fig. 5. Because the evaluation results are good, we proceed
to fit and then estimate the future values of ln(m_{x,t}) using the parameters â_x, b̂_x and k̂_t
of both Lee–Carter models for males and for females.
Let us use the Lee–Carter model (9) in the following form

\ln m_{x,t} = \hat{a}_x + \hat{b}_x \hat{k}_t,    (20)

where x = 0, 1, ..., 100 and t = 1920, ..., 2050. The fitted values with the attached
forecasts of ln(m_{x,t}), based on the empirical and the smoothed data matrix, are displayed in
Fig. 6. It is evident that the empirical model provides more variable values of ln(m_{x,t})
than the smoothed model, especially at the highest ages (60+).
We use the life table algorithm and estimate life expectancy at birth (e_{0,t}), which is
one of the most important statistical results of demographic forecasts. Table 2 shows
these estimates in 5-year steps, based on the empirical and the smoothed model.
We believe that the model based on data smoothed by the Gompertz–Makeham
function provides a prediction closer to reality. Mortality is explained and predicted
by its main components, which is a much more sophisticated approach than assuming,
e.g., a simple linear decrease (see Fig. 2: there is a risk that the trend
would not be sufficiently explained and unexplained variability would remain in the
residuals). Another study in the Czech Republic [1] also predicts death rates, but
the approach used there was based on a shortened data matrix of m_{x,t} (since 1950) and
without smoothing.

Fig. 5 Diagnostic control of the two Lee–Carter models: Pearson residuals for males and females.
Source: authors' calculations

Fig. 6 Fitted values of ln(m_{x,t}) for the Czech males (left charts) and females (right charts) for
the period from 1920 to 2012 with attached forecasts of these rates. Top charts are constructed by the
model based on empirical data, bottom charts by the model based on data smoothed by the Gompertz–
Makeham function. Source: authors' calculations

Table 2 Estimated life expectancy at birth (e_{0,t}). Source: authors' calculations

Model | 2015 | 2020 | 2025 | 2030 | 2035 | 2040 | 2045 | 2050
Empirical model—males | 76.05 | 76.47 | 76.89 | 77.29 | 77.69 | 78.07 | 78.45 | 78.82
Empirical model—females | 82.12 | 82.65 | 83.15 | 83.63 | 84.09 | 84.52 | 84.94 | 85.34
Smoothed model—males | 76.01 | 76.44 | 76.86 | 77.27 | 77.67 | 78.06 | 78.44 | 78.82
Smoothed model—females | 81.93 | 82.47 | 82.98 | 83.46 | 83.93 | 84.37 | 84.80 | 85.21

4 Conclusion

In our paper we examined whether the Lee–Carter model provides better predictions
of future ln(m_{x,t}) when based on the empirical data matrix or on the data matrix
smoothed by the Gompertz–Makeham function (currently the best-known function
for modelling and extrapolating mortality curves). The
advantage of the empirical model was that we analysed the data without any modifications.
The residuals of both models seem favourable, so on this basis neither of the models
can be ruled out. But if we look at our results in Fig. 6, we can see
that ln(m_{x,t}) declines across all age groups in the smoothed model only,
which is consistent with the Gompertz law of mortality. This leads to one
important conclusion: from our comparison we can claim that the model based
on smoothed data fits reality better, because it corresponds to the expected future
development of the analysed population.

Acknowledgements This paper was supported by the Czech Science Foundation project No.
P402/12/G097 DYME—Dynamic Models in Economics.

References

1. Arltova, M.: Stochasticke metody modelovani a predpovidani demografickych procesu [habil-


itation thesis], 131 p. University of Economics Prague, Prague (2011)
2. Bell, W.R.: Comparing and assessing time series methods for forecasting age-specific fertility
and mortality rates. J. Off. Stat. 13(3), 279–303 (1997)
3. Bell, W.R., Monsell, B.: Using principal components in time series modelling and forecasting
of age-specific mortality rates. In: Proceedings of the American Statistical Association, Social
Statistics Section, pp. 154–159 (1991)
4. Booth, H., Tickle, L., Smith, L.: Evaluation of the variants of the Lee-Carter method of
forecasting mortality: a multi-country comparison. N. Z. Popul. Rev. 31(1), 13–34 (2005)
5. Box, G.E.P., Jenkins, G.: Time Series Analysis: Forecasting and Control, 537 pp. Holden-Day,
San Francisco (1970)
6. Burcin, B., Tesarkova, K.H., Komanek, D.: DeRaS: software tool for modelling mortality
intensities and life table construction. Charles University in Prague. http://deras.natur.cuni.cz
(2012)
7. Charpentier, A., Dutang, Ch.: L’Actuariat avec R [working paper]. Decembre 2012. Paternite-
Partage a l’indentique 3.0 France de Creative Commons, 215 pp. (2012)

8. Coale, A.J., Kisker, E.E.: Mortality crossovers: reality or bad data? Popul. Stud. 40, 389–401
(1986)
9. CZSO: Life tables for the CR since 1920. Czech Statistical Office, Prague. https://www.czso.
cz/csu/czso/life_tables (2015)
10. Erbas, B., et al.: Forecasts of COPD mortality in Australia: 2006–2025. BMC Med. Res.
Methodol. 2012, 12–17 (2012)
11. Gardner Jr., E.S., McKenzie, E.: Forecasting trends in time series. Manag. Sci. 31(10), 1237–
1246 (1985)
12. Gavrilov, L.A., Gavrilova, N.S.: Mortality measurement at advanced ages: a study of social
security administration death master file. N. Am. Actuar. J. 15(3), 432–447 (2011)
13. Gompertz, B.: On the nature of the function expressive of the law of human mortality, and on
a new mode of determining the value of life contingencies. Philos. Trans. R. Soc. Lond. 115,
513–585 (1825)
14. Hyndman, R.J.: Demography: forecasting mortality, fertility, migration and population data. R
package v. 1.16. http://robjhyndman.com/software/demography/ (2012)
15. Hyndman, R.J., Shang, H.L.: Forecasting functional time series. J. Korean Stat. Soc. 38(3),
199–221 (with discussion) (2009)
16. Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S.: A state space framework for automatic
forecasting using exponential smoothing methods. Int. J. Forecast. 18(3), 439–454 (2002)
17. Lee, R.D., Carter, L.R.: Modeling and forecasting U.S. mortality. J. Am. Stat. Assoc. 87, 659–
675 (1992)
18. Lee, R.D., Tuljapurkar, S.: Stochastic population forecasts for the United States: beyond high,
medium, and low. J. Am. Stat. Assoc. 89, 1175–1189 (1994)
19. Lundstrom, H., Qvist, J.: Mortality forecasting and trend shifts: an application of the Lee-
Carter model to Swedish mortality data. Int. Stat. Rev. (Revue Internationale de Statistique)
72(1), 37–50 (2004)
20. Makeham, W.M.: On the law of mortality and the construction of annuity tables. Assur. Mag.
and J. Inst. Actuar. 8(1860), 301–310 (1860)
21. Melard, G., Pasteels, J.M.: Automatic ARIMA modeling including intervention, using time
series expert software. Int. J. Forecast. 16, 497–508 (2000)
22. R Development Core Team: R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna (2008)
23. Simpach, O.: Faster convergence for estimates of parameters of Gompertz-Makeham function
using available methods in solver MS Excel 2010. In: Proceedings of 30th International
Conference on Mathematical Methods in Economics, Part II, pp. 870–874 (2012)
24. Simpach, O.: Detection of outlier age-specific mortality rates by principal component method
in R software: the case of visegrad four cluster. In: International Days of Statistics and
Economics, pp. 1505–1515. Melandrium, Slany (2014)
25. Simpach, O., Pechrova, M.: The impact of population development on the sustainability of
the rural regions. In: Agrarian Perspectives XXIII – The Community-Led Rural Development,
pp. 129—136. Czech University of Life Sciences, Prague (2014)
26. Simpach, O., Dotlacilova, P., Langhamrova, J.: Effect of the length and stability of the time
series on the results of stochastic mortality projection: an application of the Lee-Carter model.
In: Proceedings ITISE 2014, pp. 1375–1386 (2014)
An Application of Time Series Analysis
in Judging the Working State of Ground-Based
Microwave Radiometers and Data Calibration

Zhenhui Wang, Qing Li, Jiansong Huang, and Yanli Chu

Abstract Time series analysis on clear-sky brightness temperature (TB) data


observed in the morning with ground-based microwave radiometer for atmospheric
remote sensing is adapted to judge the working state of the radiometer system
according to meteorological data variation features in terms of radiative transfer. The
TB data taken as the first example in this study was for the ground-based microwave
radiometer at Nanjing during the period from Nov. 27, 2010 to May 29, 2011.
The radiometer has 12 channels including five channels at 22.235, 23.035, 23.835,
26.235, and 30 GHz for sensing air humidity and liquid water content and seven
channels at 51.25, 52.28, 53.85, 54.94, 56.66, 57.29, and 58.80 GHz for sensing
air temperature profiles. The correlation coefficients between the TB readouts from
the radiometer and the simulated TB with radiosonde temperature and humidity
profiles as input to radiative transfer calculation are greater than 0.9 for the first five
channels while the correlation coefficients for the last seven channels are quite poor,
especially for the channels at lower frequency such as 51.25, 52.28, and 53.38 GHz,
at which the TB readout values in the time series do not show the right atmospheric
temperature variation features as time goes from November (winter) through spring
to early summer (late May). The results show that the first five channels worked well
in the period while the last seven did not, implying that the radiometer needed
to be maintained or repaired by the manufacturer. The methodology suggested in this
paper has been applied to similar radiometers at Wuhan and Beijing for quality
control and calibration of observed TB data.

Keywords Microwave radiometer • Time series feature • Working state monitoring

1 Introduction

The ground-based microwave radiometer has become a standard instrument for atmospheric
observations by remote sensing [1]. It can operate continuously
with a typical temporal resolution of 1 s and satisfies the requirements for
continuously monitoring atmospheric boundary layer temperature and humidity
profiles [2–9] and unique liquid water profiles [10–13], and may even detect lightning
[14]. It is therefore very important to know the working state of a ground-based
microwave radiometer.
The output from a microwave radiometer is the brightness temperature (abbreviated as TB),
which represents the electromagnetic energy received by the radiometer at a certain
reference frequency and must be inverted or assimilated against the forward-calculated
TB from radiative transfer theory [15, 16]. Therefore the consistency
between the TB observed with the radiometer and the TB forward-calculated with
radiative transfer theory is critically important.
In this study, time series analysis of clear-sky TB data observed in the morning
with ground-based microwave radiometers for atmospheric remote sensing
is adapted to judge the working state of radiometer systems according to the
consistency between the observed TB and the forward-calculated TB, which reflects
the meteorological data variation features in terms of radiative
transfer. The observed TB data used in this study were obtained with four ground-based
microwave radiometers: one at Nanjing during the period from
Nov. 27, 2010 to May 29, 2011 [17], one at Beijing during the period from Jan. 1,
2010 to Dec. 31, 2011 [18], one at Wuhan in the summers of both 2009 and 2010, and
another also at Wuhan but in Feb. 2010.

2 Clear-Sky Brightness Temperature Simulations and Observations

The clear-sky atmospheric radiance in terms of brightness temperature measured
with a ground-based radiometer pointing upward can be simulated according to the
radiative transfer equation [16]:

T_B(0) = T_B(\infty)\,\tau(0,\infty) + \int_0^{\infty} k_a(z)\,T(z)\,\tau(0,z)\,\sec\theta \, dz,    (1)

where
\tau(0,z) = \exp\left\{ -\int_0^{z} k_a(z')\,\sec\theta \, dz' \right\},    (2)

is the transmittance of the air from height z down to the antenna (z = 0), τ(0, ∞)
is the transmittance of the whole atmosphere, and T_B(∞) is the cosmic brightness
temperature, taken as 2.9 K [16] for the computations in this paper. T(z) is the
temperature profile and k_a(z) is the absorption coefficient, due mainly to oxygen
and water vapor in the clear-sky case, which depends mainly on pressure, temperature,
humidity, and frequency [19, 20].
Software for brightness temperature simulation based on Eq. (1) was designed and
has been used for many years [16]; the absorption coefficient k_a(z) is calculated
according to Liebe's model [19, 20]. The atmospheric profiles needed for the computation
in Eq. (1) can be obtained from either radiosondes or NCEP model outputs (http://rda.ucar.edu/datasets/ds083.2/, doi:10.5065/D6M043C6).
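For illustration only, a numerical evaluation of Eqs. (1) and (2) for zenith viewing (sec θ = 1) can be sketched in R as follows; the discretised profiles are assumed inputs and the absorption model itself is not reproduced here, so this is not the software of [16]:

# 'z' (m), 'Tz' (K) and 'ka' (Np/m) are assumed discretised profiles on the same
# levels, extending high enough that tau(0, z_top) approximates tau(0, infinity).
tb_zenith <- function(z, Tz, ka, Tcos = 2.9) {
  dz    <- diff(z)
  # optical depth from the antenna (z = 0) up to each level, trapezoidal rule
  tau   <- c(0, cumsum((ka[-1] + ka[-length(ka)]) / 2 * dz))
  trans <- exp(-tau)                               # transmittance tau(0, z), Eq. (2)
  integrand <- ka * Tz * trans
  # downwelling term plus attenuated cosmic background, Eq. (1)
  sum((integrand[-1] + integrand[-length(integrand)]) / 2 * dz) +
    Tcos * trans[length(trans)]
}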
The radiometer at Nanjing has 12 channels including five channels at 22.235,
23.035, 23.835, 26.235, and 30 GHz for sensing air humidity and liquid water
content and seven channels at 51.25, 52.28, 53.85, 54.94, 56.66, 57.29, and
58.80 GHz for sensing air temperature profiles. A ground-based radiometer should
be calibrated with liquid nitrogen (LN) once every few months, and the radiometer
at Nanjing had been LN-calibrated a few days before Nov. 27, 2010; therefore, data
collection for the experiment started on Nov. 27, 2010. Observations affected by cloud
must be deleted from the sample because of the uncertainty in the radiance of cloud.
Using the clear-sky criterion “relative humidity < 85 %” [21], a sample of 77 “clear-sky”
events, as listed in Table 1, was obtained for Nanjing station at 08:00 BST in
the experiment period from Nov. 27, 2010 to May 29, 2011.

Table 1 Clear-sky dates in the experiment period from Nov. 27, 2010 to May 29, 2011 at Nanjing
(each of the three column groups lists: Date, Event index, No. of days)
2010-11-27 1 0 2011-01-12 27 46 2011-03-31 53 124
2010-11-28 2 1 2011-01-13 28 47 2011-04-04 54 128
2010-12-01 3 4 2011-01-16 29 50 2011-04-09 55 133
2010-12-03 4 6 2011-01-24 30 58 2011-04-10 56 134
2010-12-04 5 7 2011-01-25 31 59 2011-04-12 57 136
2010-12-07 6 10 2011-01-30 32 64 2011-04-17 58 141
2010-12-08 7 11 2011-01-31 33 65 2011-04-18 59 142
2010-12-09 8 12 2011-02-02 34 67 2011-04-19 60 143
2010-12-10 9 13 2011-02-03 35 68 2011-04-20 61 144
2010-12-11 10 14 2011-02-04 36 69 2011-04-23 62 147
2010-12-16 11 19 2011-02-05 37 70 2011-04-25 63 149
2010-12-17 12 20 2011-02-11 38 76 2011-04-26 64 150
2010-12-20 13 23 2011-02-15 39 80 2011-04-27 65 151
2010-12-23 14 26 2011-03-03 40 96 2011-04-28 66 152
2010-12-26 15 29 2011-03-04 41 97 2011-04-29 67 153
2010-12-28 16 31 2011-03-07 42 100 2011-05-01 68 155
2010-12-29 17 32 2011-03-08 43 101 2011-05-02 69 156
2010-12-30 18 33 2011-03-11 44 104 2011-05-07 70 161
2010-12-31 19 34 2011-03-12 45 105 2011-05-13 71 167
2011-01-01 20 35 2011-03-16 46 109 2011-05-17 72 171
2011-01-04 21 38 2011-03-17 47 110 2011-05-19 73 173
2011-01-06 22 40 2011-03-23 48 116 2011-05-20 74 174
2011-01-07 23 41 2011-03-25 49 118 2011-05-25 75 179
2011-01-08 24 42 2011-03-27 50 120 2011-05-28 76 182
2011-01-10 25 44 2011-03-28 51 121 2011-05-29 77 183
2011-01-11 26 45 2011-03-30 52 123

3 Comparison Between the Simulated and the Observed Brightness Temperature Series

3.1 At the Frequencies for Water Vapor and Liquid Water Remote Sensing

Figure 1 shows the simulated brightness temperature series together with the
observed series for the 77 clear-sky events. In this figure,
the curve TBM is the observation and the curve TBC the simulation.
It can be seen from Fig. 1 that the consistency between simulation and
observation is good for each of the five channels, though significant biases
exist. Results from the statistical analyses are given in Table 2. The 30 GHz channel is
slightly worse compared with the other four channels, and some of the observed
brightness temperature values are negative (see Fig. 1e), which is obviously wrong.
Fig. 1 The calculated and observed brightness temperature variations at 08:00BT on 77 clear days
for the five channels at the frequencies of (a) 22.235, (b) 23.035, (c) 23.835, (d) 26.235, and
(e) 30.000 GHz

Table 2 Statistics for the calculated and observed brightness temperatures for the five channels used to
sense humidity and liquid water content (sample size N = 77)

Channel Frq./GHz | 22.235 | 23.035 | 23.835 | 26.235 | 30.000
TBM/K | 26.59 | 24.97 | 20.68 | 18.25 | 1.93
TBC/K | 18.49 | 18.15 | 16.55 | 13.08 | 12.57
Bias(a)/K | 8.10 | 6.82 | 4.13 | 5.17 | -10.64
R | 0.9821 | 0.9832 | 0.9821 | 0.9816 | 0.9168
(a) Bias = TBM - TBC

The correlation coefficients are all better than 0.91, which implies that all the
observed data after bias correction are good for further applications and that all five
channels were working well in the given period.
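The working-state check used here therefore reduces to simple per-channel statistics; a minimal sketch in R, with illustrative vector names, is:

# 'tbm' and 'tbc' are assumed numeric vectors over the 77 clear-sky events
# for one channel (observed and simulated brightness temperatures).
bias <- mean(tbm - tbc)          # Bias = TBM - TBC, as in Table 2
r    <- cor(tbm, tbc)            # correlation coefficient
channel_ok <- (r > 0.9)          # working-state criterion suggested by these results
c(bias = bias, r = r, ok = channel_ok)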

3.2 At the Frequencies for Air Temperature Remote Sensing

The simulated and the observed brightness temperature series at the seven frequen-
cies for atmospheric temperature remote sensing are given in Fig. 2.
As time goes from winter through spring to summer, the air temperature
at Nanjing increases and, accordingly, the brightness temperature increases. The
brightness temperature at the higher frequencies reflects the temperature in the lower
troposphere, while that at the lower frequencies reflects the temperature of the
whole atmosphere including the upper troposphere [16].
Therefore, the brightness temperature at the higher frequencies increases more quickly than
at the lower frequencies (as shown by the TBC series in Fig. 2a–g), since the
temperature increase in the lower troposphere, driven by the land surface, is quicker
than that in the upper troposphere.
But the observed brightness temperatures at the lower frequencies such as 51.25,
52.28, and 53.38 GHz gradually go down and shift away from the
simulated values, as shown by the TBM series in Fig. 2, especially at 51.25 GHz
(Fig. 2a). A similar phenomenon can also be seen at the higher frequencies,
though it is not as obvious as at the lower frequencies. This implies that
the radiometer was not working properly at these channels. (The manufacturer later
found that this was caused by a failed diode.)
If one’s purpose is to make use of the obtained TBM as shown in Fig. 2, from
which the air temperature information can be retrieved, the correction for the time-
shift must be performed in advance as an important further processing. It can be seen
that the influence of the radiometer malfunction on the data quality was gradually
increasing as the time was going. Therefore suppose the divergence as a function of
time can be expressed by f (D), where D is the number of days from the first day of
the experiment (see Table 1) so that TBM after the time-shift correction becomes

TBO D TBM C f .D/: (3)

The obtained TBO is expected to be more consistent with TBC than TBM is. It has
been found that a polynomial of the form

f(D) = a_0 + a_1 D + a_2 D^2 + a_3 D^3    (4)

can give quite good results, as shown by the TBO series in Fig. 2 and, in this case,
by the correlations given in Table 3. The values of a_0–a_3 in the polynomial above
were obtained by regression analysis on the data. It can be seen from Table 3 that
the correlation coefficients between TBC and TBO after correction are clearly
improved compared with those between TBC and TBM before correction.
Figure 2 shows that the TBO series keep both the shorter-scale temperature
fluctuation features and the long-scale variation from winter to early summer, and
would be better than TBM for air temperature retrievals.
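For illustration, the drift correction of Eqs. (3) and (4) can be obtained by ordinary regression; a sketch in R with illustrative vector names follows (regressing TBC − TBM on a cubic in D is one natural way to obtain a_0–a_3, and is an assumption about the fitting detail):

# 'tbm', 'tbc' and 'D' are assumed vectors over the 77 events for one channel.
fit_f <- lm(I(tbc - tbm) ~ poly(D, 3, raw = TRUE))   # f(D) = a0 + a1*D + a2*D^2 + a3*D^3
tbo   <- tbm + predict(fit_f)                        # Eq. (3): TBO = TBM + f(D)
cor(tbc, tbm); cor(tbc, tbo)                         # compare with Table 3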
Fig. 2 The calculated and observed brightness temperature variations at 08:00 on 77 clear days
for the seven temperature-sensing channels at the frequencies of (a) 51.25, (b) 52.28, (c) 53.85,
(d) 54.94, (e) 56.66, (f) 57.29, and (g) 58.80 GHz

Table 3 The correlation coefficients between TBC and TBM before and after correction for the
seven channels used to sense air temperature profiles (sample size N = 77)

Channel Frq./GHz | 51.25 | 52.28 | 53.85 | 54.94 | 56.66 | 57.29 | 58.80
R before correction | 0.6017 | 0.6761 | 0.5817 | 0.2356 | 0.6384 | 0.8387 | 0.9420
R after correction | 0.5728 | 0.7575 | 0.7627 | 0.8339 | 0.9321 | 0.9521 | 0.9784

4 Applications of the Method to Other Radiometers

The method described above for judging the working state of a ground-based
microwave radiometer based on time series analysis has been applied successfully
to other radiometers.

4.1 The Situation of the Radiometers at Wuhan

There are two radiometers at Wuhan. Table 4 gives the statistics for the comparison
between the calculated and observed brightness temperatures for the two radiometers.
It can be seen that Radiometer A (one of the two) worked quite well in the
two inspection summers of 2009 and 2010, with the only exception of the channel
at 54.940 GHz according to its correlation coefficient (R² = 0.7022). The data
obtained with Radiometer A were used directly for precipitation weather analysis
[22]. But Radiometer B (the other of the two) was worse, and all its channels for
water vapor remote sensing failed, as their biases were very large and their correlation
coefficients very low, as shown in Table 4. The channels for upper-air temperature
remote sensing (working in the lower frequency range) did not work well either,
according to their low correlation coefficients. Therefore, the data obtained with
Radiometer B in the inspection period of Feb. 2010 need further processing before
application, and it is suggested that the radiometer needs LN calibration and, possibly,
hardware inspection as well.

4.2 The Situation of the Beijing Radiometer

Figure 3a, b shows the time series of observed clear-sky brightness temperatures
(TBM) compared with the simulated ones (TBC) for two typical channels of the Beijing
radiometer during the period from Jan. 1, 2010 to Dec. 31, 2011. It can be seen that
the seasonal variation shown by TBC is very clear in Fig. 3a for air humidity
and in Fig. 3b for air temperature, but there are two abrupt changes (mutations) in the TBM series.

Table 4 Statistics for comparison between the calculated and observed brightness temperatures
for Wuhan Radiometer A in summers of both 2009 and 2010 and Wuhan Radiometer B in Feb.
2010

Channel | Frq./GHz | A: TBC/K | A: Bias/K | A: R² | B: TBC/K | B: Bias/K | B: R²
1 | 22.234 | 53.17 | 3.76 | 0.9392 | 10.79 | 13.90 | 0.3747
2 | 22.500 | 53.68 | 1.14 | 0.9331 | 10.59 | 13.93 | 0.7467
3 | 23.034 | 51.93 | 3.52 | 0.9399 | 10.48 | 16.09 | 0.7402
4 | 23.834 | 45.78 | 1.18 | 0.9436 | 10.08 | 14.83 | 0.3189
5 | 25.000 | 36.94 | 0.47 | 0.9399 | 9.57 | 10.92 | 0.5953
6 | 26.234 | 30.88 | 0.53 | 0.9369 | 9.37 | 8.42 | 0.3073
7 | 28.000 | 26.64 | 0.32 | 0.8974 | 9.53 | 13.53 | 0.1019
8 | 30.000 | 25.14 | 2.20 | 0.8950 | 10.12 | 17.34 | 0.1138
9 | 51.248 | 121.76 | 4.99 | 0.8535 | 103.71 | 4.39 | 0.0902
10 | 51.760 | 140.20 | 5.64 | 0.8283 | 120.97 | 5.96 | 0.1204
11 | 52.280 | 164.90 | 5.08 | 0.8528 | 143.70 | 1.59 | 0.0163
12 | 52.804 | 196.01 | 4.40 | 0.9039 | 172.39 | 0.07 | 0.3651
13 | 53.336 | 230.62 | 1.78 | 0.9250 | 205.11 | 0.67 | 0.0357
14 | 53.848 | 259.92 | 0.37 | 0.9722 | 233.95 | 0.83 | 0.0950
15 | 54.400 | 279.91 | 0.52 | 0.9800 | 254.57 | 0.39 | 0.6646
16 | 54.940 | 288.25 | 0.87 | 0.7022 | 263.38 | 0.94 | 0.9688
17 | 55.500 | 291.19 | 0.91 | 0.9895 | 266.37 | 1.30 | 0.7966
18 | 56.020 | 292.36 | 0.93 | 0.9850 | 267.44 | 0.39 | 0.9935
19 | 56.660 | 293.15 | 0.97 | 0.9857 | 268.10 | 1.68 | 0.9662
20 | 57.288 | 293.61 | 0.93 | 0.9761 | 268.43 | 1.38 | 0.9889
21 | 57.964 | 293.88 | 0.58 | 0.9737 | 268.60 | 1.34 | 0.9645
22 | 58.800 | 294.08 | 0.85 | 0.9652 | 268.68 | 1.27 | 0.9655

The first mutation was found to be associated with an LN calibration at event index 294,
as shown in Fig. 3a, and the second was caused by the instrument being moved from Shunyi
(40°07′N, 116°36′E, 34.0 m ASL) to Shangdianzi (40°39′N, 117°07′E, 293.3 m
ASL) at event index 520, as seen in Fig. 3b. For the humidity channels, the TBM before the LN
calibration differs greatly from the TBM after it, while for the temperature channels
the TBM decreases markedly after the instrument movement.
Therefore, the whole sample should be split into three sub-samples for any further
comparison and application. The authors [18] performed a linear regression analysis on
the three sub-samples to establish the relationship between TBM and TBC for each
channel, so that TBM can be transformed into TBO according to

TBO = a \cdot TBM + b.    (5)


Fig. 3 The time series of observed clear-sky brightness temperatures before and after transforma-
tion (TBM and TBO) as compared with model simulated results (TBC) for the two typical channels
of the radiometer at Beijing during the period from Jan. 1, 2010 to Dec. 31, 2011. (a) Channel 4
(23.834 GHz), (b) channel 14 (53.848 GHz), (c) channel 4, and (d) channel 14

The coefficients a and b are obtained from linear regression analysis under the condition

\sum (TBO - TBC)^2 = \min.    (6)
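A per-sub-sample least-squares fit satisfying condition (6) can be sketched in R as follows (vector names are illustrative):

# 'tbm', 'tbc' and 'sub' (sub-sample identifier) are assumed vectors over all
# clear-sky events for one channel.
tbo <- tbm
for (s in unique(sub)) {
  idx      <- sub == s
  fit      <- lm(tbc[idx] ~ tbm[idx])              # minimises sum((a*TBM + b - TBC)^2)
  tbo[idx] <- coef(fit)[2] * tbm[idx] + coef(fit)[1]  # Eq. (5): TBO = a*TBM + b
}
cor(tbc, tbm); cor(tbc, tbo)                       # compare with Table 5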

Table 5 gives the coefficients a and b for the data transformation and the correlation
coefficients before and after the transformation. Figure 3c, d is the same as
Fig. 3a, b but with TBM replaced by TBO. The comparison between Fig. 3c
and a and between Fig. 3d and b, respectively, shows that the discrepancy between
TBM and TBC has decreased. It can be seen that for most channels, except
channels 7 and 8, the linear transformation has improved the correlation.
Channels 7 and 8 need further inspection.

Table 5 The coefficients a and b in the equation TBO = a·TBM + b and the comparison of the
correlation coefficients of the full sample before and after transformation

Channel | a, b for sub-sample 1 (N = 294) | a, b for sub-sample 2 (N = 226) | a, b for sub-sample 3 (N = 83) | R² before transformation (N = 603) | R² after transformation (N = 603)
1 | 1.2103, 15.8 | 1.1058, 1.9 | 1.1042, 3.3 | 0.8673 | 0.9408
2 | 1.2170, 15.9 | 1.1138, 1.2 | 1.0872, 0.6 | 0.8525 | 0.9398
3 | 1.2930, 19.8 | 1.2048, 1.7 | 1.0544, 2.1 | 0.7373 | 0.8904
4 | 1.2614, 18.9 | 1.2538, 2.1 | 1.0223, 3.4 | 0.6828 | 0.8991
5 | 1.2635, 14.2 | 1.3039, 2.6 | 0.8770, 4.7 | 0.5987 | 0.8080
6 | 1.1073, 7.9 | 1.3270, 3.4 | 0.7857, 3.7 | 0.5217 | 0.6843
7 | 0.6872, 1.4 | 1.1621, 0.4 | 0.1642, 10.3 | 0.1528 | 0.3066
8 | 0.0828, 17.9 | 0.1271, 12.9 | 0.0948, 13.6 | 0.0165 | 0.0354
9 | 0.9044, 10.9 | 1.0356, 1.0 | 0.7305, 34.4 | 0.8242 | 0.9162
10 | 0.9705, 2.5 | 1.1330, 14.0 | 0.9021, 18.5 | 0.7635 | 0.9239
11 | 0.8240, 29.7 | 0.9642, 9.4 | 0.9942, 9.7 | 0.9036 | 0.9553
12 | 0.8485, 31.1 | 0.9542, 12.2 | 1.0048, 8.6 | 0.9290 | 0.9722
13 | 0.8642, 32.4 | 0.9609, 11.3 | 0.9596, 17.1 | 0.9527 | 0.9887
14 | 0.9177, 21.4 | 0.9626, 10.0 | 0.9463, 18.6 | 0.9738 | 0.9887
15 | 0.9599, 10.5 | 0.9865, 3.3 | 0.9535, 15.3 | 0.9846 | 0.9959
16 | 0.9780, 5.6 | 0.9966, 0.3 | 0.9476, 16.2 | 0.9893 | 0.9960
17 | 0.9929, 1.5 | 1.0178, 5.8 | 0.9466, 16.4 | 0.9844 | 0.9909
18 | 0.9973, 0.2 | 1.0144, 4.9 | 0.9292, 21.1 | 0.9869 | 0.9931
19 | 1.0030, 1.6 | 1.0161, 5.3 | 0.9252, 22.4 | 0.9856 | 0.9922
20 | 1.0043, 1.9 | 1.0221, 7.1 | 0.9193, 24.0 | 0.9834 | 0.9903
21 | 1.0101, 3.6 | 1.0244, 7.7 | 0.9156, 25.1 | 0.9825 | 0.9899
22 | 1.0159, 5.2 | 1.0318, 9.8 | 0.9122, 26.2 | 0.9802 | 0.9886

5 Conclusion

Time series analysis of clear-sky brightness temperature (TB) data observed in
the morning with a ground-based microwave radiometer for atmospheric remote
sensing has been adapted to judge the working state of the radiometer system according
to meteorological data variation features in terms of radiative transfer. The TB data
observed with a ground-based microwave radiometer at Nanjing during the period
from Nov. 27, 2010 to May 29, 2011 have been taken as the example for this paper.
The suggested methodology has been applied to a few similar radiometers at Wuhan
and Beijing for quality control of the observed TB data.
The correlation coefficients between the TB readouts from the radiometers and
the TB simulated with radiosonde temperature and humidity profiles as input to the
radiative transfer calculation are usually greater than 0.9 for channels in a good
state. The correlation coefficients are quite poor, and even negative, for
channels at which the time series of TB readouts does not show the correct atmospheric
temperature and humidity variation features as time goes from one season to another.
In such a case, the radiometer working state must be checked and the obtained TB readouts
must be calibrated in advance of any further application in retrieving
meteorological information.

Acknowledgement CMA/Institutes of Urban Meteorological Research in Beijing and of Heavy


Raining in Wuhan provided the radiometer-observed data used in this study. NCEP FNL data is
obtained from http://rda.ucar.edu/datasets/ds083.2/ (doi:10.5065/D6M043C6). The work is jointly
supported by National Nature Science Foundation of China (41275043), China Commonweal
Industry Research Project (GYHY201306078), and China Meteorological Administration Urban
Meteorological Research Foundation IUMKY&UMRF201101.

References

1. Westwater, E., Crewell, S., Matzler, C.: A review of surface-based microwave and millimeter-
wave radiometric remote sensing of the troposphere. Radio Sci. Bull. 2004(310), 59–80 (2004)
2. Cimini, D., Westwater, E., Gasiewski, A., Klein M., Leuski V., Liljegren J.: Ground-based
millimeter- and submillimiter-wave observations of low vapor and liquid water contents. IEEE
Trans. Geosci. Remote Sens. 45(7), Part II, 2169–2180 (2007)
3. Cimini, D., Campos, E., Ware, R., Albers, S., Giuliani, G., Oreamuno, J., Joe, P., Koch,
S.E., Cober, S., Westwater, E.: Thermodynamic atmospheric profiling during the 2010 Winter
Olympics using ground-based microwave radiometry. IEEE Trans. Geosci. Remote Sens.
49(12), 4959–4969 (2011)
4. Cimini, D., Nelson, M., Güldner, J., Ware, R.: Forecast indices from ground-based
microwave radiometer for operational meteorology. Atmos. Meas. Tech. 8(1), 315–333 (2015).
doi:10.5194/amt-8-315-2015
5. Güldner, J., Spänkuch, D.: Remote sensing of the thermodynamic state of the atmospheric
boundary layer by ground-based microwave radiometry. J. Atmos. Oceanic Tech. 18, 925–933
(2001)
6. Hewison, T.: 1D-VAR retrievals of temperature and humidity profiles from a ground-based
microwave radiometer. IEEE Trans. Geosci. Remote Sens. 45(7), 2163–2168 (2007)
7. Maschwitz, G., Löhnert, U., Crewell, S., Rose, T., Turner, D.D.: Investigation of ground-based
microwave radiometer calibration techniques at 530 hPa. Atmos. Meas. Tech. 6, 2641–2658
(2013). doi:10.5194/amt-6-2641-2013
8. Stähli, O., Murk, A., Kämpfer, N., Mätzler, C., Eriksson, P.: Microwave radiometer to retrieve
temperature profiles from the surface to the stratopause. Atmos. Meas. Tech. 6, 2477–2494
(2013). doi:10.5194/amt-6-2477-2013
9. Xu, G., Ware, R., Zhang, W., Feng, G., Liao, K., Liu, Y.: Effect of off-zenith observation
on reducing the impact of precipitation on ground-based microwave radiometer measurement
accuracy in Wuhan. Atmos. Res. 140–141, 85–94 (2014)
10. Campos, E., Ware, R., Joe, P., Hudak, D.: Monitoring water phase dynamics in winter clouds.
Atmos. Res. 147–148, 86–100 (2014)
11. Serke, D., Hall, E., Bognar, J., Jordan, A., Abdo, S., Baker, K., Seitel, T., Nelson, M., Reehorst,
A., Ware, R., McDonough, F., Politovich, M.: Supercooled liquid water content profiling case
studies with a new vibrating wire sonde compared to a ground-based microwave radiometer.
Atmos. Res. 149, 77–87 (2014)
12. Ware, R., Cimini, D., Campos, E., Giuliani, G., Albers, S., Nelson, M., Koch, S.E., Joe, P.,
Cober, S.: Thermodynamic and liquid profiling during the 2010 Winter Olympics. Atmos. Res.
132, 278–290 (2013)

13. Ware, R., Solheim, F., Carpenter, R., Güldner, J., Liljegren, J., Nehrkorn, T., Vandenberghe, F.:
A multi-channel radiometric profiler of temperature, humidity and cloud liquid. Rad. Sci. 38,
8079–8032 (2003)
14. Wang, Z., Li, Q., Hu, F., Cao, X., Chu, Y.: Remote sensing of lightning by a ground-based
microwave radiometer. Atmos. Res. 150, 143–150 (2014)
15. Solheim, F., Godwin, J., Westwater, E., Han, Y., Keihm, S., Marsh, K., Ware, R.: Radiometric
profiling of temperature, water vapor, and cloud liquid water using various inversion methods.
Radio Sci. 33(2), 393–404 (1998)
16. Westwater, E., Wang, Z., Grody, N.C., et al.: Remote sensing of temperature profiles from a
combination of observations from the satellite-based microwave sounding unit and the ground-
based profiler. J. Atmos. Oceanic Tech. 2, 97–109 (1985)
17. Wang, Z., Cao, X., Huang, J., et al.: Analysis on the working state of a ground based microwave
radiometer based on radiative transfer model and meteorological data variation features. Trans.
Atmos. Sci. 37(1), 1–8 (2014) (in Chinese with English abstract)
18. Li, Q., Hu, F., Chu, Y., Wang, Z., Huang, J., Wang, Y., Zhu, Y.: A consistency analysis
and correction of the brightness temperature data observed with a ground based microwave
radiometer in Beijing. Remote. Sens. Technol. Appl. 29(4), 547–556 (2014) (in Chinese with
English abstract)
19. Liebe, H.J.: An updated model for millimeter wave propagation in moist air. Radio Sci. 20(5),
1069–1089 (1985)
20. Liebe, H.J., Rosenkranz, P.W., Hufford, G.A.: Atmospheric 60 GHz oxygen spectrum: new
laboratory measurements and line parameters. J. Quant. Spectrosc. Radiat. Transfer 48, 629–
643 (1992)
21. Decker, M.T., Westwater, E.R., Guiraud, F.O.: Experimental evaluation of ground-based
microwave radiometric sensing of atmospheric temperature and water vapor profiles. J. Appl.
Meteorol. 17(12), 1788–1789 (1978)
22. Ao, X., Wang, Z., Xu, G., Zhai, Q.i., Pan, X.: Application of Wuhan ground-based radiometer
observations in precipitation weather analysis. Torrential Rain Disasters 30(4), 358–365 (2011)
(in Chinese with English abstract)
Identifying the Best Performing Time Series
Analytics for Sea Level Research

Phil. J. Watson

Abstract One of the most critical environmental issues confronting mankind


remains the ominous spectre of climate change, in particular, the pace at which
impacts will occur and our capacity to adapt. Sea level rise is one of the key
artefacts of climate change that will have profound impacts on global coastal
populations. Although extensive research has been undertaken into this issue, there
remains considerable scientific debate about the temporal changes in mean sea
level and the climatic and physical forcings responsible for them. This research has
specifically developed a complex synthetic data set to test a wide range of time series
methodologies for their utility to isolate a known non-linear, non-stationary mean
sea level signal. This paper provides a concise summary of the detailed analysis
undertaken, identifying Singular Spectrum Analysis (SSA) and multi-resolution
decomposition using short length wavelets as the most robust, consistent methods
for isolating the trend signal across all length data sets tested.

Keywords Climate change • Sea-level rise • Trend analysis

1 Introduction

Sea level rise is one of the key artefacts of climate change that will have profound
impacts on global coastal populations [1, 2]. Understanding how and when impacts
will occur and change are critical to developing robust strategies to adapt and
minimise risks.
Although the body of mean sea level research is extensive, professional debate
around the characteristics of the trend signal and its causalities remains high [3]. In
particular, significant scientific debate has centred around the issue of a measurable
acceleration in mean sea level [4–9], a feature central to projections based on the
current knowledge of climate science [10].


Monthly and annual average ocean water level records used by sea level
researchers are a complex composite of numerous dynamic influences of largely
oceanographic, atmospheric or gravitational origins operating on differing temporal
and spatial scales, superimposed on a comparatively low amplitude signal of sea
level rise driven by climate change influences (see [3] for more detail). The mean
sea level (or trend) signal results directly from a change in volume of the ocean
attributable principally to melting of snow and ice reserves bounded above sea level
(directly adding water), and thermal expansion of the ocean water mass. This low
amplitude, non-linear, non-stationary signal is quite distinct from all other known
dynamic processes that influence the ocean water surface which are considered to
be stationary; that is, they cause the water surface to respond on differing scales and
frequencies, but do not change the volume of the water mass. In reality, improved
real-time knowledge of velocity and acceleration rests entirely with improving the
temporal resolution of the mean sea level signal.
Over recent decades, the emergence and rapid improvement of data adaptive
approaches to isolate trends from non-linear, non-stationary and comparatively
noisy environmental data sets such as EMD [11, 12], Singular Spectrum Analysis
(SSA) [13–15] and Wavelet analysis [16–18] are theoretically encouraging. The
continued development of data adaptive and other spectral techniques [19] has given
rise to recent variants such as CEEMD [20, 21] and Synchrosqueezed Wavelet
Transform (SWT) [22, 23].
An innovative process by which to identify the most efficient method for
estimating the trend is to test against a “synthetic” (or custom built) data set with
a known, fixed mean sea level signal [3]. In general, a broad range of analysis
techniques have been applied to the synthetic data set to directly compare their
utility to isolate the embedded mean sea level signal from individual time series.
Various quantitative metrics and associated qualitative criteria have been used to
compare the relative performance of the techniques tested.

2 Method

The method to determine the most robust time series method for isolating mean sea
level with improved temporal accuracy is relatively straightforward and has been
based on three key steps, namely:
1. development of synthetic data sets to test;
2. application of a broad range of analytical methods to isolate the mean sea level
trend from the synthetic data set and
3. comparative assessment of the performance of each analytical method using
a multi-criteria analysis (MCA) based on some key metrics and a range of
additional qualitative criteria relevant to its applicability for broad, general use
on conventional ocean water level data worldwide.

2.1 Step 1: Development of Synthetic Data Sets for Testing


Purposes

The core synthetic data set developed for this research has been specifically designed
to mimic the key physical characteristics embedded within real-world ocean water
level data, comprising a range of six key known dynamic components added to a
non-linear, non-stationary time series of mean sea level [3]. The fixed mean sea level
signal has been generated by applying a broad cubic smoothing spline to a range of
points over the 1850–2010 time horizon reflective of the general characteristics of
the global trend of mean sea level [24], accentuating the key positive and negative
“inflexion” points evident in the majority of long ocean water level data sets [25].
This data set has been designed as a monthly average time series spanning a 160-
year period (from 1850 to 2010) to reflect the predominant date range for the longer
records in the Permanent Service for Mean Sea Level (PSMSL), which consolidates
the world’s ocean water level data holdings.
The synthetic data set contains 20,000 separate time series, each generated by
successively adding a randomly sampled signal from within each of the six key
dynamic components to the fixed mean sea level signal. The selection of 20,000
time series represents a reasonable balance between optimising the widest possible
set of complex combinations of real-world signals and the extensive computing
time required to analyse the synthetic data set. Further, the 20,000 generated trend
outputs from each analysis provide a robust means of statistically identifying the
better performing techniques for extracting the trend [3].
Additionally, the core 160-year monthly average data set has been subdivided
into 2 × 80 and 4 × 40 year subsets and annualised to create 14 separate data sets to
also consider the influence of record length and issues associated with annual versus
monthly records.
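Purely as an illustration of the construction logic (and not of the actual six physically based components, their amplitudes or periods, which are placeholders here), a simplified generator might look as follows in R:

# Highly simplified illustration only: a fixed smooth trend plus randomly sampled
# stationary cycles and noise, sampled monthly from 1850 to 2010.
set.seed(1)
t_mon <- seq(1850, 2010, by = 1/12)
knots <- data.frame(yr = c(1850, 1900, 1940, 1970, 2010),
                    mm = c(0, 60, 140, 190, 300))          # hypothetical trend points (mm)
trend <- spline(knots$yr, knots$mm, xout = t_mon)$y        # non-linear, non-stationary trend signal
make_series <- function() {
  seasonal <- runif(1, 20, 80) * sin(2 * pi * t_mon + runif(1, 0, 2 * pi))
  decadal  <- runif(1, 10, 40) * sin(2 * pi * t_mon / runif(1, 8, 20) + runif(1, 0, 2 * pi))
  noise    <- rnorm(length(t_mon), sd = 25)
  trend + seasonal + decadal + noise
}
synthetic <- replicate(100, make_series())                 # e.g. 100 of the 20,000 series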

2.2 Step 2: Application of Analysis Methods to Extract Trend


from Synthetic Data Sets

The time series analysis methods that have been applied to the synthetic data set to
estimate the trend are summarised in Table 1. This research has not been designed to
consider every time series analysis tool available. Rather the testing regime is aimed
at appraising the wide range of tools currently used more specifically for mean sea
level trend detection of individual records, with a view to improving generalised
tools for sea level researchers. Some additional, more recently developed data
adaptive methods such as CEEMD [21] and SWT [22, 23] have also been included
in the analysis to consider their utility for sea level research. It is acknowledged
that various methods permit a wide range of parameterisation that can critically
affect trend estimation. In these circumstances, broad sensitivity testing has been
undertaken to identify the better performing combination and range of parameters
Table 1 Summary of analysis techniques applied to synthetic data set

Method | Sub-method | Additional condition | Software package/additional comment
Linear regression | n/a | n/a | n/a
Polynomial regression | Second order | n/a | n/a
LOESS smoothing | n/a | n/a | α = 0.75, order = 2, weighted least squares
Smoothing splines | Cubic smoothing, thin plate PRS, B-Spline | λ based on both GCV and REML | "mgcv" package in R [26–29]
Moving average (a) | 10- to 40-year smooth | Single, triple and quad averaging | "zoo" package in R [30]
Structural models (b) | Seasonal decomposition and basic structural model | Based on LOESS and ARIMA | stl decomposition in R [31]; StructTS in R [32]
Butterworth filter (c) | 10–80 year cycles removed | n/a | GRETL [33]
SSA (d, e) | 1-D and Toeplitz variants | Window: 10–80 years | "Rssa" package in R [34]
EMD | Envelope: interpolation, spline smoothing, locfit smoothing; sifting by interpolation and spline smoothing | Boundary condition: none, symmetric, wave, periodic | "EMD" package in R [11, 35, 36]
EEMD (f) | Noise amplitude: 20–200 mm | Trials: 20–200 | "hht" package in R [12, 37, 38]
CEEMD (f) | Noise amplitude: 20–200 mm | Trials: 20–200 | "hht" package in R [21, 37, 38]
Wavelet analysis | Multi-resolution decomposition using MODWT | Daubechies filters: Symmlet (S2–S10) | "wmtsa" package in R [16, 38, 39]
Synchrosqueezed Wavelet Transform (SWT) (e) | Wavelet filters: "Bump" (μ = 1, s = 0.2), "CMHat" (μ = 1, s = 5), "Morlet" (μ = 0.05π), "Gauss" (μ = 2, s = 0.083) | Gen parameter: 100, 1000, 10,000, 100,000 | "SynchWave" package in R [22, 23]

Notes: The above-mentioned table provides a general summary of the analytical techniques applied to the synthetic data set in order to test the utility of extracting the embedded mean sea level (trend) component. The "Sub-method" and "Additional condition" provide details on the sensitivity analysis pertaining to the respective methodologies. Where possible, relevant analytical software from the R open source suite of packages have been used [40]
(a) Moving (or rolling) averages are centred around the data point in question and therefore the determined trend is restricted to half the averaging window inside both ends of the data set
(b) Structural models are only relevant for monthly average data sets
(c) For the respective 40-year monthly and annual synthetic data sets, only cycles up to and including 40 years have been removed by the digital filter
(d) For the respective 40-year data sets, only window lengths from 10 to 30 years have been considered. Similarly, for the respective 80-year data sets, only window lengths from 10 to 70 years have been considered
(e) Auto detection routines have been specifically written to isolate decomposed elements of the time series with low frequency trend characteristics
(f) The noise amplitude for the annual data sets includes the full range, but, for the monthly data sets, only ranges from 50 to 200 mm

for a particular method when applied specifically to ocean water level records (as
represented by the synthetic data sets).
With methods such as SSA and SWT, it has been necessary to develop auto
detection routines to isolate specific elements of decomposed time series with
characteristics that resemble low frequency trends. Direct consultation with leading
time series analysts and developers of method-specific analysis tools has also
assisted in optimising the sensitivity testing.

2.3 Step 3: Multi-Criteria Assessment of Analytical Methods


for Isolating Mean Sea Level

In addition to identifying the analytic that provides the greatest temporal precision in
resolving the trend, the intention is to use this analytic to underpin the development
of tools for wide applicability by sea level researchers. The techniques identified in
Table 1 have been compared and assessed across a relevant range of quantitative and
qualitative criteria, including:
• Measured accuracy (Criteria A1). This criterion is based upon the cumulative
sum of the squared differences between the fixed mean sea level signal and the
trend derived from a particular analytic for each time series in the synthetic data
set. This metric has then been normalised per data point for direct comparison
between the different length synthetic data sets (40, 80 and 160 years) as follows
(a computational sketch of this metric is given after this list):

A_1 = \frac{1}{n} \sum_{i=1}^{20{,}000} (x_i - X)^2 \qquad (1)

where X represents the fixed mean sea level signal embedded within each time
series; x_i represents the trend derived from the analysis of the synthetic data set
using a particular analytical approach and n represents the number of data points
within each of the respective synthetic data sets (or lesser outputs in the case of
moving averages).
It is imperative to note that particular combinations of key parameters used as
part of the sensitivity testing regime for particular methods (refer Table 1), resulted
in no (or limited) outputs for various time series analysed. This occurred either
due to the analytic not resolving a signal within the limitations established for a
trend (particularly for auto detection routines necessary for SSA and SWT) or where
internal thresholds/convergence protocols were not met for a particular algorithm
and the analysis terminated. Where such circumstances occurred, the determined
A1 metric was prorated to equate to 20,000 time series for direct comparison across
methods. Where the outputs of an analysis resolved a trend signal in less than 75 %
(or 15,000 time series) of a particular synthetic data set, the result was not included
in the comparative analysis.

• Maximum standard deviation (Criteria A2 ). This straightforward statistical


measure is based on the outputted trends from the application of a particular
analytical method to the synthetic data sets, providing a measure of the scale of
the spread of outputted trend estimates. Intuitively, the better performing analytic
will minimise both criteria A1 and A2.
• Computational expense (Criteria A3 ). This criterion provides a comparative
assessment of the average processing time to isolate the trend from the longest
synthetic data set (160 years). This metric provides an intuitive appraisal of the
value of some of the more computationally demanding analytical approaches
when weighed against, in particular, the measured accuracy (criteria A1 ).
• Consistency across differing length data sets (Criteria A4 ). This criterion is
based on a qualitative assessment of the consistency in the performance of the
respective method across the three key length data sets (40, 80 and 160 years)
which cover the contemporary length of global data used by sea level researchers.
It is important to gain an understanding of how the relative accuracy changes in
the extraction of the trend (if at all) from shorter to longer length data sets. A
simple tick indicates a general consistency in the level of accuracy across all data
sets. A cross indicates that the analytic may not have been able to consistently
isolate a signal with “trend-like” characteristics across all length data sets within
the limits established through the sensitivity testing regime.
• Capacity to improve temporal resolution of trend characteristics (Criteria
A5 ). This criterion is similarly based on a qualitative assessment of the capacity
for the isolated trend to inform changes to associated real-time velocity and
accelerations, which are of great contemporary importance to sea level and
climate change researchers.
• Resolution of trend over full data record (Criteria A6 ). This criterion relates
to the ability of a particular analytic to resolve the trend over the full length of
the data record. It has become increasingly important for sea level researchers to
gain a real-time understanding of any temporal changes in the characteristics of
the mean sea level (or trend) signal in the latter portion of the record.
• Ease of application by non-expert practitioners (Criteria A7 ). Several ana-
lytical approaches considered require extensive expert judgement to optimise
performance. Despite the sensitivity analyses undertaken to broadly identify the
optimal settings of a specific analytic in relation to the signals within the synthetic
data sets, the sensitivity of key parameters can be quite high. Where limited (or
no) specific knowledge of the analytic is required to optimise its performance the
analytic has been denoted with a tick.
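As a concrete illustration of Criteria A1, the following R sketch (not the author's code, and assuming the extracted trends are held in an n × 20,000 matrix called trends alongside the fixed signal true_msl) computes the normalised squared-error metric, pro-rating to 20,000 series where some analyses fail to resolve a trend:

```r
# Hedged sketch of the A1 metric: `trends` is an n x 20000 matrix (one column per
# synthetic time series), `true_msl` the fixed mean sea level signal of length n.
a1_metric <- function(trends, true_msl) {
  n <- length(true_msl)
  resolved <- colSums(is.na(trends)) == 0          # series where a trend was resolved
  sq_err <- colSums((trends[, resolved, drop = FALSE] - true_msl)^2)
  # Pro-rate to 20,000 time series and normalise per data point, as described above
  sum(sq_err) * (20000 / sum(resolved)) / n
}
```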

3 Results

In total, 1450 separate analyses have been undertaken as part of the testing regime,
translating to precisely 29 million individual time series analyses. Figure 1 provides
a pictorial summary of the complete analysis of all monthly and annual data

Fig. 1 Analysis overview based on Criteria A1 . Notes: This chart provides a summary of all
analysis undertaken (refer Table 1). Scales for both axes are equivalent for direct comparison
between respective analyses conducted on the monthly (top panel) and annual (bottom panel)
synthetic data sets. The vertical dashed lines demarcate the results of each method on the 160-,
80- and 40-year length data sets in moving from left to right across each panel. Where the analysis
permitted the resolution of a trend signal across a minimum of 75 % (or 15,000 time series) of a
synthetic data set, this has been represented as “complete”. Those analyses resolving trends over
less than 75 % of a synthetic data set are represented as “incomplete”

sets (40-, 80- and 160-year synthetic data sets) plotted against the key metric,
criteria A1 . Equivalent scales for each panel provide direct visual and quantitative
comparison between monthly and annual and differing length data sets. For the sake
of completeness, it is worth noting a further 36 monthly analysis results lie beyond
the limit of the scale chosen and therefore are not depicted on the chart. Where
analysis resolves a trend signal across more than 75 % (or 15,000 time series) of
a synthetic data set, the output is used for comparative purposes and depicted on
Fig. 1 as “complete”.
From Fig. 1, it is evident that the cumulative errors of the estimated trend
(criteria A1 ) are appreciably lower for the annual data sets when considered across
the totality of the analysis undertaken. More specifically, for the 579 “complete”
monthly outputs, 408 (or 71 %) fall below an A1 threshold level of 30 × 10⁶ mm²
(where the optimum methods reside). Comparatively, for the 632 “complete” annual
outputs, 566 (or 90 %) are below this threshold level.
The key reason for this is that the annualised data sets not only provide a
natural low frequency smooth (through averaging calendar year monthlies), but,
the seasonal influence (at monthly frequency) is largely removed, noting the bin
of seasonal signals sampled to create the synthetic data set also contains numerous
time-varying seasonal signals derived using ARIMA.

Based on visual inspection of Fig. 1, it is difficult to distinguish the influence


of record length on capacity to isolate the trend component. However, detailed
examination of the “complete” monthly outputs indicates that 77 % of the 160-
year data set are contained below the A1 threshold level of 30 × 10⁶ mm², falling to
62 % for the 40-year data sets. Similarly for the “complete” annual outputs, 98 % of
the 160-year data set are contained below this threshold, falling to 85 % for the 40-
year data sets. The above-mentioned results provide strong evidence that estimates
of mean sea level are enhanced generally through the use of longer, annual average
ocean water level data.
Based upon the appreciably reduced error in the estimate of the trend by using
annual over monthly average ocean water level data, the multi-criteria assessment of
the various methodologies advised in Table 1 have been limited to analysis outputs
based solely on the annual synthetic data sets. Table 2 provides a summary of the
multi-criteria assessment of the better performing methods, based on optimisation
of relevant parameters for each specific analytic. From this assessment, multi-
resolution decomposition using the maximal overlap discrete wavelet transform
(MODWT) with short-length wavelets has proven the optimal analytic over the broad
range of criteria outlined in Sect. 2.3, whereby limited expert judgment is required
to optimise performance.
In addition to the results discussed above, there are some other interesting
observations to be gleaned from the weight of analysis undertaken as part of this
work. Of all methods considered in Table 1, the comparatively simple structural
models applied to the monthly data sets provided the least utility in extracting the
mean sea-level trend component. This is not unexpected given that the range of
complex signals within the synthetic data set are forced to be resolved into trend,
seasonal and noise components only by these general models.
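For context, a minimal sketch of the two structural approaches listed in Table 1 (STL decomposition and the basic structural model via StructTS), applied here to a toy monthly series rather than the synthetic data set, could look as follows; the "BSM" type and periodic seasonal window are illustrative settings, not the study's:

```r
# Hedged sketch of the structural approaches named above, on a toy monthly series.
x <- ts(3000 + 0.5 * (1:480) + 50 * sin(2 * pi * (1:480) / 12) + rnorm(480, sd = 30),
        frequency = 12)
trend_stl <- stl(x, s.window = "periodic")$time.series[, "trend"]   # STL trend
trend_bsm <- fitted(StructTS(x, type = "BSM"))[, "level"]           # BSM level
```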
Similarly, methods such as EMD with inherent limitations associated with mode
mixing and splitting, aliasing and end effects [41], performed comparatively poorly
across the range of synthetic data sets and across the range of parameters varied to
optimise performance. The EEMD variant [12] which effectively combines EMD
with noise stabilisation to offset the propensity for mode mixing and aliasing [19],
exhibited substantially enhanced performance compared to EMD. Across all 14
monthly and annual average synthetic data sets, EEMD exhibited more stable and
consistent results across all sensitivity tests with the best performing EEMD on
average reducing the squared error by 15 % compared to the best performing EMD
combination.
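For reference, a minimal sketch of EMD-based trend extraction with the "EMD" R package cited in Table 1, where the sifting residue is taken as the low frequency trend (the toy series and the boundary choice here are illustrative, not the study's settings), is:

```r
# Hedged sketch of EMD trend extraction: the residue left after sifting out the
# intrinsic mode functions is taken as the low frequency (trend) component.
library(EMD)
x <- 3000 + cumsum(rnorm(160, mean = 0.5)) + 20 * sin(2 * pi * (1:160) / 18)
decomp <- emd(xt = x, tt = 1:length(x), boundary = "wave")
trend_emd <- decomp$residue
```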
A further advancement in the form of CEEMD [21] was developed to overcome
a nuance of EEMD in which the sum of the intrinsic mode functions determined
by the algorithm does not necessarily reconstruct the original signal [19]. When
similarly averaged across all synthetic data sets, the best performing combination of
CEEMD parameterisation only reduced the squared error by less than 5 % compared
to the best performing EMD combination. Further, it should be noted the CEEMD
algorithm was not able to resolve a trend for every time series where internal
thresholds/convergence protocols were not met.
Table 2 Multi-criteria assessment by method across annual synthetic data sets

Method (note 1) | A1 (mm² × 10⁶) | A2 (mm) (a) | A3 (seconds) (b) | A4 | A5 | A6 | A7
Single MA (30YR) | 26.0 | 81 | <0.01 | ✓ | ✓ | ✗ | ✓
SWT (wavelet: CMH, gen. par. 10⁵) | 37.1 | 89 | 0.36 | ✗ | ✓ | ✓ | ✗
Linear regression | 37.2 | 71 | <0.01 | ✓ | ✗ | ✓ | ✓
Multi-resolution wavelet decomposition (MODWT) (wavelet: s2) | 37.8 | 65 | <0.01 | ✓ | ✓ | ✓ | ✓
SSA (1-D Toeplitz, auto select, window = 30YR) | 39.3 | 63 | 0.01 | ✓ | ✓ | ✓ | ✗
EEMD (noise = 100 mm, trials = 200) | 40.9 | 94 | 24.06 | ✓ | ✓ | ✓ | ✗
Second order polynomial | 43.7 | 102 | <0.01 | ✓ | ✗ | ✓ | ✓
Butterworth digital filter (removal up to 40YR cycles) | 45.5 | 115 | <0.01 | ✓ | ✓ | ✓ | ✓
CEEMD (noise = 100 mm, trials = 100) | 49.3 | 106 | 26.80 | ✗ | ✓ | ✓ | ✗
LOESS smoothing | 49.7 | 139 | <0.01 | ✓ | ✓ | ✓ | ✓
B-Spline smoothing (λ based on REML) | 50.8 | 122 | 0.01 | ✓ | ✓ | ✓ | ✓
EMD (spline smooth sifting, symmetric end, λ based on golden search) | 51.0 | 125 | 20.57 | ✓ | ✓ | ✓ | ✗

Notes: The above-mentioned table provides a summary of the better performing methods based on optimisation of relevant parameters for each specific analytic (refer Table 1 for the full range of sensitivity analyses). Only methods which resolved a trend component for a minimum of 75 % of each of the respective annual data sets (160, 2 × 80 and 4 × 40 year) have been considered
(a) Criteria A1 and A2 are based on the sum of the metrics for the 160-year data set added to the respective averages for the 2 × 80 year and 4 × 40 year data sets
(b) Criteria A3 represents the average time in seconds to analyse a single time series from the 160-year annual average synthetic data set. The multi-resolution wavelet decomposition highlighted demonstrates optimal performance across all criteria considered. Only the top 12 methods are indicated based on criteria A1 ranking

Based on the testing regime performed on the synthetic data sets, EEMD outper-
formed CEEMD. Both variants of the ensemble EMD, using the sensitivity analysis
advised, proved the most computationally expensive of all the algorithms tested.
Both of these EMD variants were substantially outperformed by the MODWT and
SSA, but importantly, processing times were of the order of 3000–4000 times that
of these better performing analytics.
Clearly for these particularly complex ocean water level time series, the excessive
computational expense of these algorithms has not proven beneficial. One of
the more inconsistent performers proved to be the SWT. This algorithm proved
highly sensitive to the combination of wavelet filter and generalisation parameter.
Certain combinations of parameters provided exceptional performance on indi-
vidual synthetic data sets but proved less capable of consistently resolving low
frequency “trend-like” signals across differing length data sets. Of the analytics
tested, this algorithm proved the most complex to optimise in order to isolate and
reconstruct trends from the ridge extracted components. Auto detection routines
were specifically developed to test and isolate the low frequency components based
on first differences. However, a significant portion of the sensitivity analyses for
SWT had difficulty isolating the low frequency signals across the majority of the
data sets tested.
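The exact routine is not reproduced here, but one plausible form of a first-difference screen for "trend-like" components (purely illustrative; both the rule and the tolerance are assumptions, not the routine actually used) is:

```r
# Hedged, illustrative screen: a component is flagged as trend-like when its first
# differences are small relative to its overall variability; 0.1 is an assumed tolerance.
is_trend_like <- function(component, tol = 0.1) {
  sd(diff(component)) < tol * sd(component)
}
```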
SSA has also been demonstrated to be a superior analytical tool for trend
extraction across the range of synthetic data sets. However, like the SWT, SSA
requires an elevated level of expertise to select appropriate parameters and internal
methods to optimise performance. Auto detection routines were also developed
to isolate the key SSA eigentriple groupings with low frequency “trend-like”
characteristics, based on first differences. With this approach, not all time series
could be resolved to isolate a trend within the limits established. Auto detection
routines based on frequency contribution [42] were also provided by Associate
Professor Nina Golyandina (St Petersburg State University, Russia) to test, proving
comparable to the first differences technique.

4 Discussion

With so much reliance on improving the temporal resolution of the mean sea level
signal due to its association as a key climate change indicator, it is imperative
to maximise the information possible from the extensive global data holdings of
the PSMSL. Numerous techniques have been applied to these data sets to extract
trends and infer accelerations based on local, basin or global scale studies. Ocean
water level data sets, like any environmental time series, are complex amalgams
of physical processes and influences operating on different spatial scales and
frequencies. Further, these data sets will invariably also contain influences and
signals that might not yet be well understood (if at all).
With so many competing and sometimes controversial findings in the scientific
literature concerning trends and more particularly, accelerations in mean sea level

(refer Sect. 1), it is difficult to definitively separate sound conclusions from those
that might unwittingly be influenced by the analytical methodology applied (and to
what extent). This research has been specifically designed as a necessary starting
point to alleviate some of this uncertainty and improve knowledge of the better
performing trend extraction methods for individual long ocean water level data.
Identification of the better performing methods enables the temporal resolution of
mean sea level to be improved, enhancing the knowledge that can be gleaned from
long records which includes associated real-time velocities and accelerations. In
turn, key physically driven changes can be identified with improved precision and
confidence, which is critical not only to sea level research, but also climate change
more generally at increasingly finer (or localised) scales.
The importance of resolving trends from complex environmental and climatic
records has led to the application of increasingly sophisticated, so-called data
adaptive spectral and empirical techniques [12, 19, 43, 44] over comparatively
recent times. In this regard, it is readily acknowledged that whilst the testing
undertaken within this research has indeed been extensive, not every time series
method for trend extraction has been examined. The methods tested are principally
those that have been applied to individual ocean water level data sets within the
literature to estimate the trend of mean sea level.
Therefore spatial trend coherence and multiple time series decomposition tech-
niques such as PCA/EOF, SVD, MC-SSA, M-SSA, XWT, some of which are used
in various regional and global scale sea level studies [45–51] are beyond the scope
of this work and have not been considered. In any case, the synthetic data sets
developed for this work have not been configured with spatially dependent patterns
to facilitate rigorous testing of these methods. In developing the synthetic data sets
to test for this research, Watson [3] noted specifically that a natural extension (or
refinement) of the work might be to attempt to fine tune the core synthetic data set
to reflect the more regionally specific signatures of combined dynamic components.
Other key factors for consideration include identifying the method(s) that
prove robust over the differing length time series available whilst resolving trends
efficiently, with little pre-conditioning or site specificity. Whilst recognising that
various studies investigating mean sea level trends at long gauge sites have utilised
the construction of comparatively detailed site specific general additive models,
these models have little direct applicability or transferability to other sites and have
not been considered further for this work.
Of the analysis methods considered, the comparatively simple 30-year moving
(or rolling) average filter proved the optimal performer against the key A1 criterion
when averaged across all length data sets. Although not isolating and removing
high amplitude signals or contaminating noise, the sheer width of the averaging
window proves to be very efficient in dampening their influence for ocean water
level time series. However, the resulting mean sea level trend finishes 15 years inside
either end of each data set, providing no temporal understanding of the signal for
the most important part of the record—the recent history, which is keenly desired to
better inform the trajectory of the climate related signal. Although well performing
on a range of criteria, this facet is a critical shortcoming of this approach. Whilst

triple and quadruple moving averages were demonstrated to marginally lower the
A1 criteria, respectively, compared to the equivalent single moving average, the loss
of data from the ends of the record was further amplified by these methods.
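A minimal sketch of the centred 30-year moving average on an annual series, using the "zoo" package cited in Table 1 (the toy series below is illustrative, and the NA padding makes the 15-year loss at each end explicit), is:

```r
# Hedged sketch of the centred 30-year rolling mean; note the first and last ~15
# values are NA, which is the end-of-record shortcoming discussed above.
library(zoo)
x <- 3000 + cumsum(rnorm(160, mean = 0.5)) + 20 * sin(2 * pi * (1:160) / 18)
trend_ma30 <- rollmean(x, k = 30, fill = NA, align = "center")
```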
It is also noted that the simple linear regression analysis also performed
exceptionally well against the A1 criteria when averaged across all data sets. Based
on the comparatively limited amplitude and curvature of the mean sea level trend
signal embedded within the synthetic data set it is perhaps not surprising that
the linear regression performs well. But, like the moving average approach, its
simplicity brings with it a profound shortcoming, in that it provides limited temporal
instruction on the trend other than its general direction (increasing or decreasing).
No information on how (or when) this signal might be accelerating is possible from
this technique, which regrettably, is a facet of critical focus for contemporary sea
level research.
It has been noted that unfortunately many studies using wavelet analysis have
suffered from an apparent lack of quantitative results. The wavelet transform
has been regarded by many as an interesting diversion that produces colourful
pictures, yet purely qualitative results [52]. The initial use of this particular multi-
resolution decomposition technique (MODWT) for application to a long ocean
water level record can be found in the work of Percival and Mofjeld [53]. There
is no question from this current research, that wavelet analysis has proven a “star
performer”, producing measurable quantitative accuracy exceeding other methods,
with comparable consistency across all length synthetic data sets and with minimal
computational expense.
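As an indication of how such a decomposition can be applied in practice, the following hedged sketch extracts the smooth (low frequency) component of a MODWT multi-resolution analysis; it uses the waveslim package as an alternative implementation to the wmtsa package cited in Table 1, and the la8 filter, five levels and toy series are illustrative choices, not the study's optimised settings:

```r
# Hedged sketch of MODWT multi-resolution trend extraction (waveslim used here as
# an alternative implementation; filter and level choices are illustrative only).
library(waveslim)
extract_trend_modwt <- function(x, levels = 5, filter = "la8") {
  decomp <- mra(x, wf = filter, J = levels, method = "modwt", boundary = "periodic")
  decomp[[paste0("S", levels)]]          # the smooth component, taken as the trend
}
x <- 3000 + cumsum(rnorm(160, mean = 0.5)) + 20 * sin(2 * pi * (1:160) / 18)
trend <- extract_trend_modwt(x)
```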
Importantly, it is worth noting that the sensitivity testing and MCA used to
differentiate the utility of the various methods, unduly disadvantages the SSA
method. In reality the SSA method performs optimally with a window length
varying between L/4 and L/2 (where L is the length of the time series). Varying the
window length permits necessary optimisation of the separability between the trend,
oscillatory and noise components [54]. However, for the sensitivity analysis around
SSA, only fixed window lengths were compared across all data sets. Although SSA
(with a fixed 30-year window) performed comparably for the key A1 criteria with
MODWT (refer Table 2), a method that optimises the window length parameter
automatically would, in all likelihood have further improved this result. Only a
modest improvement of less than 4 % would be required to put SSA on parity with
the accuracy of MODWT. In addition, auto detection routines designed to select
“trend-like” SSA components are unlikely to perform as well as the interactive
visual inspection (VI) techniques commonly employed by experienced practitioners
decomposing individual time series [43]. Clearly VI techniques were not an option
for the testing regime described herein, which involved processing 14 separate data
sets each containing 20,000 time series.
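For completeness, a hedged sketch of SSA trend extraction with the "Rssa" package cited in Table 1 is given below; the window length of L/3 and the grouping of only the leading eigentriple are illustrative assumptions standing in for the auto detection and VI approaches described above:

```r
# Hedged sketch of SSA trend extraction; window length L/3 and the use of the
# first eigentriple alone are illustrative, not the study's selection rules.
library(Rssa)
extract_trend_ssa <- function(x, L = floor(length(x) / 3)) {
  s <- ssa(x, L = L, kind = "toeplitz-ssa")
  reconstruct(s, groups = list(trend = 1))$trend
}
x <- 3000 + cumsum(rnorm(160, mean = 0.5)) + 20 * sin(2 * pi * (1:160) / 18)
trend_ssa <- extract_trend_ssa(x)
```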
It is important that both the intent and the limitations of the research work
presented here are clearly understood. The process of creating a detailed synthetic
ocean water level data set, embedded with a fixed non-linear, non-stationary mean
sea level signal to test the utility of trend extraction methods is unique for sea
level research. Despite broad sensitivity testing designed herein, this work should

be viewed as a starting point rather than a fait accompli in providing a transparent


appraisal of the utility of currently used techniques for isolating the mean sea level
trend from individual ocean water level time series. The author warmly welcomes
the opportunity to work further with analysts on refining parameters of tested
methods and alternative methods of trend extraction to optimise performance of
these tools for sea level research.

5 Conclusion

The monthly and annual average ocean water level data sets used to estimate
mean sea level are like any environmental or climatic time series data, ubiquitously
“contaminated” by numerous complex dynamic processes operating across differing
spatial and frequency scales, often with very high noise to signal ratio. Whilst the
primary physical processes and their scale of influence are known generally [3],
not all processes in nature are fully understood and the quantitative attribution
of these associated influences will always have a degree of imprecision, despite
improvements in the sophistication of time series analyses methods [44]. In an
ideal world with all contributory factors implicitly known and accommodated, the
extraction of a trend signal would be straightforward.
In recent years, the controversy surrounding the conclusions of various published
works, particularly concerning measured accelerations from long, individual ocean
water level records necessitates a more transparent, qualitative discussion around
the utility of various analytical methods to isolate the mean sea level signal with
improved accuracy. The synthetic data set developed by Watson [3] was specifically
designed for long individual records, providing a robust and unique framework
within which to test a range of time series methods to augment sea level research.
The testing and analysis regime summarised in this paper is extensive, involving
1450 separate analyses across monthly and annual data sets of length 40, 80 and
160 years. In total, 29 million individual time series were analysed. From this work,
there are some broad general conclusions to be drawn concerning the extraction of
the mean sea level signal from individual ocean water level records with improved
temporal accuracy:
• Precision is enhanced by the use of the longer, annual average data sets;
• The analytic producing the optimal measured accuracy (Criteria A1 ) across all
length annual data sets was the simple 30-year moving average filter. However,
the outputted trend finishes half the width of the averaging filter inside either end
of the data record, providing no temporal understanding of the trend signal for
the most important part of the record – the recent history;
• The best general purpose analytic requiring minimum expert judgment and
parameterisation to optimise performance was multi-resolution decomposition
using MODWT and

• The optimum performing analytic is most likely to be SSA whereby interactive


visual inspection (VI) techniques are used by experienced practitioners to
optimise window length and component separability.
This work provides a very strong argument for the utility of SSA and multi-
resolution decomposition using MODWT techniques to isolate mean sea level with
improved temporal resolution from long individual ocean water level data using
a unique, robust, measurable approach. Notwithstanding, there remains scope to
improve the utility of several of the data adaptive approaches using more extensive
tuning of alternative parameters to optimise their performance to enhance mean sea
level research.

Acknowledgements The computing resources required to facilitate this research have been
considerable. It would not have been possible to undertake the testing program without the
benefit of access to high performance cluster computing systems. In this regard, I am indebted
to John Zaitseff and Dr Zavier Barthelemy for facilitating access to the “Leonardi” and “Manning”
systems at the Faculty of Engineering, University of NSW and Water Research Laboratory, respectively.
Further, this component of the research has benefitted significantly from direct consultations
with some of the world’s leading time series experts and developers of method specific analysis
tools. Similarly, I would like to thank the following individuals whose contributions have helped
considerably to shape the final product and have ranged from providing specific and general expert
advice, to guidance and review (in alphabetical order): Daniel Bowman (Department of Geological
Sciences, University of North Carolina at Chapel Hill); Dr Eugene Brevdo (Research Department,
Google Inc, USA); Emeritus Professor Dudley Chelton (College of Earth, Ocean and Atmospheric
Sciences, Oregon State University, USA); Associate Professor Nina Golyandina (Department
of Statistical Modelling, Saint Petersburg State University, Russia); Professor Rob Hyndman
(Department of Econometrics and Business Statistics, Monash University, Australia); Professor
Donghoh Kim (Department of Applied Statistics, Sejong University, South Korea); Alexander
Shlemov (Department of Statistical Modelling, Saint Petersburg State University, Russia); Asso-
ciate Professor Anton Korobeynikov (Department of Statistical Modelling, Saint Petersburg State
University, Russia); Emeritus Professor Stephen Pollock (Department of Economics, University
of Leicester, UK); Dr Natalya Pya (Department of Mathematical Sciences, University of Bath,
UK); Dr Andrew Robinson (Department of Mathematics and Statistics, University of Melbourne,
Australia); and Professor Ashish Sharma (School of Civil and Environmental Engineering,
University of New South Wales, Australia).

References

1. McGranahan, G., Balk, D., Anderson, B.: The rising tide: assessing the risks of climate change
and human settlements in low elevation coastal zones. Environ. Urban. 19(1), 17–37 (2007)
2. Nicholls, R.J., Cazenave, A.: Sea-level rise and its impact on coastal zones. Science 328(5985),
1517–1520 (2010)
3. Watson, P.J.: Development of a unique synthetic data set to improve sea level research and
understanding. J. Coast. Res. 31(3), 758–770 (2015)
4. Baart, F., van Koningsveld, M., Stive, M.: Trends in sea-level trend analysis. J. Coast. Res.
28(2), 311–315 (2012)
5. Donoghue, J.F., Parkinson, R.W.: Discussion of: Houston, J.R. and Dean, R.G., 2011. Sea-
level acceleration based on U.S. tide gauges and extensions of previous global-gauge analyses.
J. Coast. Res. 27(3), 409–417; J. Coast. Res, 994–996 (2011)

6. Houston, J.R., Dean, R.G.: Sea-level acceleration based on U.S. tide gauges and extensions of
previous global-gauge analyses. J. Coast. Res. 27(3), 409–417 (2011)
7. Houston, J.R., Dean, R.G.: Reply to: Rahmstorf, S. and Vermeer, M., 2011. Discussion of:
Houston, J.R. and Dean, R.G., 2011. Sea-level acceleration based on U.S. tide gauges and
extensions of previous global-gauge analyses. J. Coast. Res. 27(3), 409–417; J. Coast. Res.,
788–790 (2011)
8. Rahmstorf, S., Vermeer, M.: Discussion of: Houston, J.R. and Dean, R.G., 2011. Sea-level
acceleration based on U.S. tide gauges and extensions of previous global-gauge analyses. J.
Coast. Res. 27(3), 409–417; J. Coast. Res., 784–787 (2011)
9. Watson, P.J.: Is there evidence yet of acceleration in mean sea level rise around Mainland
Australia. J. Coast. Res. 27(2), 368–377 (2011)
10. IPCC: Summary for policymakers. In: Stocker, T.F., Qin, D., Plattner, G.-K., Tignor, M., Allen,
S.K., Boschung, J., Nauels, A., Xia, Y., Bex, V., Midgley, P.M. (eds.) Climate Change 2013:
The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report
of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge
(2013)
11. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, E.H., Zheng, Q., Tung, C.C., Liu, H.H.:
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary
time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454(1971), 903–995
(1998)
12. Wu, Z., Huang, N.E.: Ensemble empirical mode decomposition: a noise-assisted data analysis
method. Adv. Adapt. Data Anal. 1(01), 1–41 (2009)
13. Broomhead, D.S., King, G.P.: Extracting qualitative dynamics from experimental data. Physica
D 20(2), 217–236 (1986)
14. Golyandina, N., Nekrutkin, V., Zhigljavsky, A.A.: Analysis of Time Series Structure: SSA and
Related Techniques. Chapman and Hall/CRC, Boca Raton, FL (2001)
15. Vautard, R., Ghil, M.: Singular spectrum analysis in nonlinear dynamics, with applications to
paleoclimatic time series. Physica D 35(3), 395–424 (1989)
16. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics
(SIAM), Philadelphia (1992)
17. Grossmann, A., Morlet, J.: Decomposition of Hardy functions into square integrable wavelets
of constant shape. SIAM J. Math. Anal. 15(4), 723–736 (1984)
18. Grossmann, A., Kronland-Martinet, R., Morlet, J.: Reading and understanding continuous
wavelet transforms. In: Combes, J.-M., Grossmann, A., Tchamitchian, P. (eds.) Wavelets,
pp. 2–20. Springer, Heidelberg (1989)
19. Tary, J.B., Herrera, R.H., Han, J., Baan, M.: Spectral estimation—What is new? What is next?
Rev. Geophys. 52, 723–749 (2014)
20. Han, J., van der Baan, M.: Empirical mode decomposition for seismic time-frequency analysis.
Geophysics 78(2), O9–O19 (2013)
21. Torres, M.E., Colominas, M.A., Schlotthauer, G., Flandrin, P.: A complete ensemble empirical
mode decomposition with adaptive noise. In: Proceedings of the IEEE International Confer-
ence on Acoustics, Speech and Signal (ICASSP), May 22–27, 2011, Prague Congress Center
Prague, Czech Republic, pp. 4144–4147 (2011)
22. Daubechies, I., Lu, J., Wu, H.T.: Synchrosqueezed wavelet transforms: an empirical mode
decomposition-like tool. Appl. Comput. Harmon. Anal. 30(2), 243–261 (2011)
23. Thakur, G., Brevdo, E., Fučkar, N.S., Wu, H.T.: The synchrosqueezing algorithm for time-
varying spectral analysis: robustness properties and new paleoclimate applications. Signal
Process. 93(5), 1079–1094 (2013)
24. Bindoff, N.L., Willebrand, J., Artale, V., Cazenave, A., Gregory, J., Gulev, S., Hanawa, K.,
Le Quéré, C., Levitus, S., Nojiri, Y., Shum, C.K., Talley, L.D., Unnikrishnan, A.: Observations:
oceanic climate change and sea level. In: Solomon, S., Qin, D., Manning, M., Chen, Z.,
Marquis, M., Averyt, K.B., Tignor, M., Miller, H.L. (eds.) Climate Change 2007: The
Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report
of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge
(2007)

25. Woodworth, P.L., White, N.J., Jevrejeva, S., Holgate, S.J., Church, J.A., Gehrels, W.R.:
Review—evidence for the accelerations of sea level on multi-decade and century timescales.
Int. J. Climatol. 29, 777–789 (2009)
26. Wood, S.: Generalized Additive Models: An Introduction with R. CRC, Boca Raton, FL (2006)
27. O’Sullivan, F.: A statistical perspective on ill-posed inverse problems. Stat. Sci. 1(4), 502–527
(1986)
28. O’Sullivan, F.: Fast computation of fully automated log-density and log-hazard estimators.
SIAM J. Sci. Stat. Comput. 9(2), 363–379 (1988)
29. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2),
89–102 (1996)
30. Zeileis, A., Grothendieck, G.: Zoo: S3 infrastructure for regular and irregular time series. J.
Stat. Softw. 14(6), 1–27 (2005)
31. Cleveland, R.B., Cleveland, W.S., McRae, J.E., Terpenning, I.: STL: a seasonal-trend decom-
position procedure based on loess. J. Off. Stat. 6(1), 3–73 (1990)
32. Durbin, J., Koopman, S.J.: Time Series Analysis by State Space Methods (No. 38). Oxford
University Press, Oxford (2012)
33. GRETL: Gnu Regression, Econometrics and Time series Library (GRETL). http://www.gretl.sourceforge.net/ (2014)
34. Golyandina, N., Korobeynikov, A.: Basic singular spectrum analysis and forecasting with R.
Comput. Stat. Data Anal. 71, 934–954 (2014)
35. Kim, D., Oh, H.S.: EMD: a package for empirical mode decomposition and Hilbert spectrum.
R J. 1(1), 40–46 (2009)
36. Kim, D., Kim, K.O., Oh, H.S.: Extending the scope of empirical mode decomposition by
smoothing. EURASIP J. Adv. Signal Process 2012(1), 1–17 (2012)
37. Bowman, D.C., Lees, J.M.: The Hilbert–Huang transform: a high resolution spectral method
for nonlinear and nonstationary time series. Seismol. Res. Lett. 84(6), 1074–1080 (2013)
38. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis (Cambridge Series in
Statistical and Probabilistic Mathematics). Cambridge University Press, Cambridge (2000)
39. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Commun. Pure Appl.
Math. 41(7), 909–996 (1988)
40. R Core Team.: R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
(2014)
41. Mandic, D.P., Rehman, N.U., Wu, Z., Huang, N.E.: Empirical mode decomposition-based
time-frequency analysis of multivariate signals: the power of adaptive data analysis. IEEE
Signal Process Mag. 30(6), 74–86 (2013)
42. Alexandrov, T., Golyandina, N.: Automatic extraction and forecast of time series cyclic
components within the framework of SSA. In: Proceedings of the 5th St. Petersburg Workshop
on Simulation, June, St. Petersburg, pp. 45–50 (2005)
43. Ghil, M., Allen, M.R., Dettinger, M.D., Ide, K., Kondrashov, D., Mann, M.E., Yiou, P.:
Advanced spectral methods for climatic time series. Rev. Geophys. 40(1), 3–1 (2002)
44. Moore, J.C., Grinsted, A., Jevrejeva, S.: New tools for analyzing time series relationships and
trends. Eos. Trans. AGU 86(24), 226–232 (2005)
45. Church, J.A., White, N.J., Coleman, R., Lambeck, K., Mitrovica, J.X.: Estimates of the regional
distribution of sea level rise over the 1950–2000 period. J. Clim. 17(13), 2609–2625 (2004)
46. Church, J.A., White, N.J.: A 20th century acceleration in global sea-level rise. Geophys. Res.
Lett. 33(1), L01602 (2006)
47. Church, J.A., White, N.J.: Sea-level rise from the late 19th to the early 21st century. Surv.
Geophys. 32(4-5), 585–602 (2011)
48. Domingues, C.M., Church, J.A., White, N.J., Gleckler, P.J., Wijffels, S.E., Barker, P.M., Dunn,
J.R.: Improved estimates of upper-ocean warming and multi-decadal sea-level rise. Nature
453(7198), 1090–1093 (2008)

49. Hendricks, J.R., Leben, R.R., Born, G.H., Koblinsky, C.J.: Empirical orthogonal function
analysis of global TOPEX/POSEIDON altimeter data and implications for detection of global
sea level rise. J. Geophys. Res. Oceans (1978–2012), 101(C6), 14131–14145 (1996)
50. Jevrejeva, S., Moore, J.C., Grinsted, A., Woodworth, P.L.: Recent global sea level acceleration
started over 200 years ago? Geophys. Res. Lett. 35(8), L08715 (2008)
51. Meyssignac, B., Becker, M., Llovel, W., Cazenave, A.: An assessment of two-dimensional past
sea level reconstructions over 1950–2009 based on tide-gauge data and different input sea level
grids. Surv. Geophys. 33(5), 945–972 (2012)
52. Torrence, C., Compo, G.P.: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc.
79(1), 61–78 (1998)
53. Percival, D.B., Mofjeld, H.O.: Analysis of subtidal coastal sea level fluctuations using wavelets.
J. Am. Stat. Assoc. 92(439), 868–880 (1997)
54. Hassani, H., Mahmoudvand, R., Zokaei, M.: Separability and window length in singular
spectrum analysis. Comptes Rendus Mathematique 349(17), 987–990 (2011)
Modellation and Forecast of Traffic Series
by a Stochastic Process

Desiree Romero, Nuria Rico, and M. Isabel Garcia-Arenas

Abstract Traffic at a road point is counted by a device for more than 1 year,
giving us time series. The obtained data have a trend with a clear seasonality. This
allows us to split the data, taking each season (1 week or 1 day, depending on the
series) as a separate path. We view the set of paths obtained as a random sample of
paths of a stochastic process. In this case seasonality is a common behaviour of every
path, which allows us to estimate the parameters of the model. We use a Gompertz-lognormal
diffusion process to model the paths. With this model we use a parametric function
for short-term forecasts that improves on some classical methodologies.

Keywords Stochastic process • Time series prediction • Traffic flow

1 Introduction

The challenge we assume in this paper is to model and forecast the traffic flow
at a road point. We work with four time series with different features, all of them
available at the web page http://geneura.ugr.es/timeseries/itise2015, which offers the time
series in CSV file format. They correspond to four monitoring devices installed in
Granada (Spain) and include data between 14 November 2012 and 07 August 2013.
These data are widely described in [4].
Four different time intervals (24 h, 1 h, 30 min and 15 min) are presented:
– ITISE15-TS_A. Interval: 24 h Time Series, from 2012-11-26 to 2013-07-14. 231
values for training. Seven values for testing.
– ITISE15-TS_B. Interval: 1 h Time Series, from 2013-05-22 00:00 (W) to 2013-
08-07 00:00 (W). 1872 values for training. 24 values for testing.
– ITISE15-TS_C. Interval: 30 min Time Series, from 2012-11-24 00:00 (W) to
2013-07-21 00:00 (W). 11518 values for training. 48 values for testing.
– ITISE15-TS_D. Interval: 15 min Time Series, from 2012-11-14 00:00 (W) to
2013-02-24 00:00 (W). 9888 values for training. 96 values for testing.

D. Romero (✉) • N. Rico • M.G. Arenas


University of Granada, Granada, Spain
e-mail: [email protected]; [email protected]; [email protected]


Fig. 1 Observed values of the time series ITISE15-TS_A

Fig. 2 Observed values of the time series ITISE15-TS_B

Figures 1, 2, 3 and 4 show the time series observed data.


We deal with the modelling problem from a new point of view, using a diffusion
process model. This model is useful for fitting and predicting time-dependent variables
with an exponential trend [7] or a Gompertz-type trend [8]. Since in traffic series the trend

Fig. 3 Observed values of the time series ITISE15-TS_C

Fig. 4 Observed values of the time series ITISE15-TS_D

is a mixture of lognormal and sigmoidal shapes, we choose a mixed Gompertz-lognormal
model [9]. This is helpful for modelling situations with a trend that changes
from increasing to decreasing [11].
In this paper we compare this diffusion process model with classical ARIMA
techniques [3, 13]. We carry out the parameter estimation for each series and the

forecast with both models. Then we calculate some standard error measurements:
mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared
error (MSE) and root mean squared error (RMSE).
Results show that the Gompertz-lognormal diffusion process gives better forecasts
than the autoregressive integrated moving average (ARIMA) model does.
The rest of the paper is organised as follows: Sect. 2 succinctly reviews the
state of the art related to time series. Section 3 describes the methodology applied.
Section 4 presents the results for the four temporal series and, ending, Sect. 5 itemise
the conclusions.

2 State of the Art

There exists a large number of approaches for modelling temporal series. The most
widely applied are ARIMA models.
ARIMA models were developed by Box and Jenkins [2] with the purpose of
finding the best fit to the values of a temporal series in order to obtain accurate forecasts.
There are three primary stages in building an ARIMA model: seasonality identifi-
cation, estimation and validation. Although ARIMA models are quite flexible, the
most general model includes difference operators, autoregressive terms, moving
average terms, seasonal difference operators, seasonal autoregressive terms and
seasonal moving average terms. Building good ARIMA models generally requires
more experience than commonly used statistical methods.
On the other hand, diffusion processes are useful for modelling time-dependent
variables that increase, usually with an exponential or sigmoidal trend [1]. In these
cases practitioners choose a model among a set of them, depending on the features
of the observed paths, the kind of data and how interpretable the parameters are.
The next step is then to estimate the parameters of the model and use a parametric
function, such as the mode or mean function, to forecast future unknown values.
Some papers focus their attention on other related questions, such as first-passage
times [6] or additional information to add as an exogenous factor to the model [5, 7, 12].
The main advantage of using a stochastic diffusion process is that it is not necessary
to make changes to the data to fit the parameters because they are estimated by maximum
likelihood.
Nevertheless, using stochastic processes is not the common way to deal with time
series, so the data require careful preprocessing. The next section describes the
methodology we have followed.

3 Methodology

We have analysed the time series very cautiously in order to obtain the model
for the data. Firstly, we preprocess the data, selecting paths and removing
anomalous values. We describe this step in Sect. 3.1. With the resulting dataset,

we use the Gompertz-lognormal diffusion process to model the series, estimating


its parameters, and then we obtain the predictions, for the times proposed in the
challenge, using both the estimated conditional mean and the estimated conditional
mode functions of the model. We describe this part of the methodology in Sect. 3.2.

3.1 Preprocessing Data

The first step for preprocessing data is to split the original data series into trajectories
or paths. The path length is fixed depending on the type of the data series and the
data which we intend to forecast. By splitting the data series, we get data by weeks
(series A) or by days (series B, C and D).
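As an illustration, a minimal R sketch of this splitting step (assuming the raw series has been read into a numeric vector y, that incomplete periods at the end are discarded, and taking the period of 96 values that corresponds to the 15-min series) is:

```r
# Hedged sketch of the path-splitting step: each column of the returned matrix is
# one path (e.g. one day of 96 quarter-hourly values for series D).
split_into_paths <- function(y, period = 96) {
  n_full <- floor(length(y) / period) * period   # drop a trailing incomplete period
  matrix(y[seq_len(n_full)], nrow = period)
}
```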

3.1.1 Week-Long Paths: Series A

In this case, the data series provides the traffic flow day by day over several
months. The challenge is to forecast the traffic flow for the following week. We
have split the data series following a weekly pattern, so each week of the data series
is a path for the model.
Some of the paths include holidays, which disturb the general pattern. Because
the week that we want to predict does not contain any holiday, we remove those paths
to fit the model.
We use the rest of the paths for estimating the parameters for the process and we
forecast by using the conditional mean and the conditional mode functions of the
process. Our approach conditions the mean and the mode on the previous step of the
path for forecasting, always using the previous path in time.
This approach does not work so well in some situations; for example, if we want to
predict the traffic on Monday, the information of the previous day, Sunday, is not
representative because it is a weekend day on which the behaviour is very different. The
same happens when we try to predict the traffic on Saturday from the information
on Friday.
The reason why the model does not work in these particular cases is that the
diffusion process is a continuous-time model, whereas we are using it as a discrete-time
model. If the discrete values have small lapses between them, the model works
properly. Otherwise, if the discrete values are separated by a long time, the previous
step provides no relevant information for the predictions.
To avoid this problem and to obtain a better set of discrete values, we consider
a second approach. In this case we use a different model for each weekday: we
estimate seven models, one for each day of the week. Then, we use the conditional
mean and mode functions from the previous step of the path for forecasting. With
this approach we get better results.
Figures 5 and 6 show the multiple trajectories that we take into account for
modelling with Gompertz-lognormal diffusion process.

Fig. 5 Processed data of the observed values for ITISE15-TS_A time series for modelling with
Gompertz-lognormal diffusion process, first approach

Fig. 6 Processed data of the observed values for ITISE15-TS_A time series for modelling with
Gompertz-lognormal diffusion process, second approach

3.1.2 Day-Long Paths: Series B, C and D

For these cases we split the temporal series by days, getting different paths. From
this set of paths we select only those that provide information related to our
forecast objective. For example, for one of the cases where the required forecast is
a holiday we only consider holiday paths. In other series, we observe that the
sample paths drastically change their behaviour from a certain date on, so we consider
that a good forecasting model should use only information from that date onwards. Finally,
some paths show anomalous behaviour because they contain either holidays
or erroneous data. We remove those paths from the study, taking into account that
the required forecast days are working days.
Considering the selected paths, we estimate the parameters of the model and use
the conditional mean and mode functions. The conditional mean and mode functions
make forecasts based on the previous path in time. In this case, although the series
considered are discrete, the continuous model fits these discrete data properly.
Figures 7, 8 and 9 display multiple trajectories, obtained by splitting the time
series, for modelling the data with the Gompertz-lognormal diffusion process.

3.2 The Gompertz-Lognormal Diffusion Process

This section is devoted to briefly describe the chosen model.

Fig. 7 Processed data of the observed values for ITISE15-TS_B time series for modelling with
Gompertz-lognormal diffusion process

Fig. 8 Processed data of the observed values for ITISE15-TS_C time series for modelling with
Gompertz-lognormal diffusion process

Fig. 9 Processed data of the observed values for ITISE15-TS_D time series for modelling with
Gompertz-lognormal diffusion process

The Gompertz-lognormal diffusion process can be seen as a mixture between a


Gompertz-type process and a lognormal process since the infinitesimal moments
that define the Gompertz-lognormal diffusion process are

A_1(x, t) = \left( m\,e^{-\beta t} + c \right) x

A_2(x, t) = \sigma^2 x^2,

with m > β > 0, c ∈ ℝ.


Taking c = 0 the process is Gompertz-type, and for a known fixed value of β the
process is a homogeneous lognormal diffusion process. In other cases, the process
has a shape with two inflection points. This makes the process useful for modelling
situations with observed paths along time that turn from increasing to decreasing or
from concave to convex.
We determine the set of parameters {m, β, c, σ} using maximum likelihood
estimation. Since there is no analytical solution for the maximum likelihood
equation system, we solve it using a numerical method; here we choose the Newton–
Raphson methodology. Other methodologies and a comparison between them can
be seen in [10]. In the estimation process we consider multiple paths of the process
and take all of them to obtain the estimates.
Once we have the estimates of the set of parameters {m̂, β̂, ĉ, σ̂}, we use the
estimated conditional mode (1) and conditional mean (2) to forecast the next values.
\mathrm{Mo}[X(t) \mid X(s) = x_s] = x_s \exp\!\left( \frac{\hat{m}}{\hat{\beta}} \left( e^{-\hat{\beta} s} - e^{-\hat{\beta} t} \right) + \left( \hat{c} - \frac{3\hat{\sigma}^2}{2} \right) (t - s) \right) \qquad (1)

\mathrm{Me}[X(t) \mid X(s) = x_s] = x_s \exp\!\left( \frac{\hat{m}}{\hat{\beta}} \left( e^{-\hat{\beta} s} - e^{-\hat{\beta} t} \right) + \hat{c}\,(t - s) \right) \qquad (2)

Both expressions follow from the fact that, conditional on X(s) = x_s, the variable X(t) is lognormally distributed, so (1) and (2) are the mode and the mean of that conditional lognormal distribution evaluated at the estimated parameters.
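A minimal R sketch of these two forecast functions (not the authors' code; the parameter values in the usage line are purely illustrative) is:

```r
# Hedged sketch of the conditional forecasts (1) and (2): x_s is the value observed
# at time s, t > s, and m_hat, beta_hat, c_hat, sigma_hat are the ML estimates.
cond_mode <- function(x_s, s, t, m_hat, beta_hat, c_hat, sigma_hat) {
  x_s * exp((m_hat / beta_hat) * (exp(-beta_hat * s) - exp(-beta_hat * t)) +
              (c_hat - 3 * sigma_hat^2 / 2) * (t - s))
}
cond_mean <- function(x_s, s, t, m_hat, beta_hat, c_hat) {
  x_s * exp((m_hat / beta_hat) * (exp(-beta_hat * s) - exp(-beta_hat * t)) +
              c_hat * (t - s))
}

# One-step-ahead forecast from an observed value (illustrative numbers only)
cond_mode(x_s = 500, s = 10, t = 11,
          m_hat = 0.8, beta_hat = 0.3, c_hat = 0.01, sigma_hat = 0.05)
```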

For the first series, named ITISE15-TS_A, seasonality is weekly. In this case, as
we stated previously, we use two strategies.
Firstly, we take the series of each week separately. In order to forecast the next
weekday we use the information from the total set of previous data, removing any
anomalous data. The parameter estimation is unique for the set of paths. For each
day we want to predict, we use the information of the day immediately preceding it.
Secondly, we estimate seven different models, one for each day of the week. For each
one, we use a sample path containing all the information about that day, eliminating
the anomalous cases. In order to make the predictions, we use the respective model and
the information provided by the value of the series on the same day of the previous week.
For example, following the first strategy, the forecast for Monday is conditioned
on the previous Sunday, whereas, following the second strategy, the forecast for Monday
is conditioned on the previous Monday.
These two approaches are denoted with subscripts 1 and 2, respectively.
The rest of the series, named ITISE15-TS_B, ITISE15-TS_C and ITISE15-
TS_D, have daily seasonality; the differences are only in the time interval between observations. In

these cases we use multiple paths of day-length to estimate the parameters. The
forecast of new values is conditioned to the previous path in each case.
In order to compare the obtained results with classical techniques, we have
carried out an ARIMA model for the same series. For this analysis we did not split
the temporal series and we did not preprocessed the data (holidays and erroneous
data are not removed).
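The chapter does not state which ARIMA implementation was used; a hedged sketch of an automatic benchmark of this kind, on a simulated hourly-like series with daily seasonality standing in for the raw traffic data, could look as follows:

```r
# Hedged sketch of an automatic ARIMA benchmark using the forecast package;
# the simulated series below merely stands in for the raw (unsplit) traffic data.
library(forecast)
y <- ts(500 + 100 * sin(2 * pi * (1:480) / 24) + rnorm(480, sd = 20), frequency = 24)
fit <- auto.arima(y)
arima_pred <- forecast(fit, h = 24)$mean   # 24 one-step-ahead test values
```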

4 Results

We resume the forecast values jointly with the real observed values in
Figs. 10, 11, 12 and 13. They show that both conditional mode and mean functions
work better than ARIMA predictions. As we expected, the conditional mode
function works better than conditional mean function. It is an expectable result
because conditional mode function shows more probable values and conditional
mean function shows average results.
Note that lognormal variables have not symmetric density probability function.
Right asymmetry provokes the mean has higher value than the mode. Since for each
fixed time t, the variable X.t/ is lognormal distributed, then is expectable that mean
function exceeds the mode function.
Tables 1, 2, 3 and 4 show the error measurements for all series, using the
conditional mode function and the conditional mean function of the Gompertz-

Fig. 10 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the seven test values of ITISE15-TS_A series

Fig. 11 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 24 test values of ITISE15-TS_B series

Fig. 12 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 48 test values of ITISE15-TS_C series

lognormal diffusion process, and using the ARIMA model. These errors are calculated
only for the forecast period.
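A minimal sketch of how these four error measurements can be computed, where obs are the observed test values and pred the forecasts of a given method, is:

```r
# Hedged sketch of the error measurements reported in Tables 1-4.
error_measures <- function(obs, pred) {
  e <- obs - pred
  c(MAE  = mean(abs(e)),
    MAPE = 100 * mean(abs(e / obs)),
    MSE  = mean(e^2),
    RMSE = sqrt(mean(e^2)))
}
```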
Note that Table 1 summarises results for two different predictions with the Gompertz-
lognormal diffusion process and for ARIMA. Firstly, with subscript 1, the table shows
the prediction errors for the week when each daily value is forecast using the

Fig. 13 Observed values, conditional mode forecast, conditional mean forecast and ARIMA
model forecast for the 96 test values of ITISE15-TS_D series

Table 1 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_A
MAE MAPE (%) MSE RMSE
Cond. mean1 2261.51 17.17 12264729.06 3502.10
Cond. mode1 2297.23 19.23 9732040.16 3119.62
Cond. mean2 464.86 13.81 338310.52 581.64
Cond. mode2 419.16 14.7 261344.34 511.22
ARIMA model 1227.06 7.67 2017643.49 1420.44
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold

Table 2 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_B
MAE MAPE (%) MSE RMSE
Cond. mean 160:03 30:30 37618:16 193:95
Cond. mode 46:40 14:79 3935:26 62:73
ARIMA model 291:12 223:28 113569:38 337:00
Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results
are in bold

information about the previous day. With subscript 2, we predict each value using
the information about previous week.

Table 3 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_C

              MAE     MAPE (%)  MSE       RMSE
Cond. mean    67.88   31.71     7272.40   85.28
Cond. mode    34.72   16.68     2665.53   51.63
ARIMA model   226.10  193.55    68117.50  260.99

Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results are in bold

Table 4 Error measurements for the predictions of the temporal series in dataset ITISE15-TS_D

              MAE    MAPE (%)  MSE       RMSE
Cond. mean    33.54  21.51     2300.81   47.97
Cond. mode    19.85  31.10     950.31    30.83
ARIMA model   85.15  70.87     11674.84  108.05

Data are fitted by a Gompertz-lognormal diffusion process and by an ARIMA model. Best results are in bold
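For reference, the four error measures reported in Tables 1, 2, 3 and 4 can be computed as in the following sketch, where the observed and predicted arrays are restricted to the forecast horizon (illustrative code, not the authors' implementation):

```python
# Minimal sketch of the error measures used in Tables 1-4.
import numpy as np

def forecast_errors(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = observed - predicted
    mae = np.mean(np.abs(err))                       # mean absolute error
    mape = 100.0 * np.mean(np.abs(err / observed))   # mean absolute percentage error
    mse = np.mean(err ** 2)                          # mean squared error
    rmse = np.sqrt(mse)                              # root mean squared error
    return {"MAE": mae, "MAPE (%)": mape, "MSE": mse, "RMSE": rmse}
```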

5 Conclusions

As we can appreciate, after removing the seasonality, the use of a diffusion process is helpful in temporal series prediction problems when the series are continuous enough. Occasionally, we need to select the paths from the temporal series; that allows us to work with the data and the diffusion process. In general, fitting the model to the series gives better predictions, with lower error than the classic ARIMA model. Moreover, except for the MAPE in Tables 1 and 4, the rest of the measurements indicate that the conditional mode function of the Gompertz-lognormal diffusion process provides better predictions for all the temporal series. This may be because the finite-dimensional distributions of the process are lognormal, and the lognormal is an asymmetrical distribution. The asymmetry implies that the mode function predicts lower values than the mean function.
The main drawback of this treatment of the series is the preprocessing work it requires. Nevertheless, this is easily automatable because it is based on obvious and simple procedures.
Other methods to obtain predictions, such as logistic regression, multilayer perceptrons and support vector machines, have been used [4]. We can see that their error measurements are not better than those obtained using the Gompertz-lognormal diffusion process together with the preprocessing of the data. All error measurements are lower when we use the Gompertz-lognormal diffusion process, except the MAPE in Table 1.

Acknowledgements This work has been supported in part by FQM147 (Junta de Andalucía),
SIPESCA (Programa Operativo FEDER de Andalucía 2007–2013), TIN2011-28627-C04-02 and
TIN2014-56494-C4-3-P (Spanish Ministry of Economy and Competitivity), SPIP2014-01437
(Dirección General de Tráfico), PRY142/14 (Este proyecto ha sido financiado íntegramente por la
Fundación Pública Andaluza Centro de Estudios Andaluces en la IX Convocatoria de Proyectos de
Investigación) and PYR-2014-17 GENIL project (CEI-BIOTIC Granada). Thanks to the Granada
Council, Concejalía de protección ciudadana y movilidad.

References

1. Baudoin, F.: Diffusion Processes and Stochastic Calculus. EMS Textbooks in Mathematics
Series, vol. 16. European Mathematical Society, Zurich (2014)
2. Box, G., Jenkins, G.: Some comments on a paper by Chatfield and Prothero and on a review
by Kendall. J. R. Stat. Soc. 136(3), 337–352 (1973)
3. Box, G., Jenkins, G.: Time Series Analysis: Forecasting and Control. Holden Day, San
Francisco (1976)
4. Castillo, P.A., Fernandez-Ares, A.J., G-Arenas, M., Mora, A., Rivas, V., Garcia-Sanchez, P.,
Romero, G., Garcia-Fernandez, P., Merelo, J.J.: SIPESCA-competition: a set of real data time
series benchmark for traffic prediction. In: International Work-Conference on Time Series
Analysis, Proceedings ITISE 2015, pp. 785–796 (2015)
5. García-Pajares, R., Benitez, J., Palmero, S.G.: Feature selection for time series forecasting:
a case study. In: Proceedings of 8th International Conference on Hybrid Intelligent Systems,
pp. 555–560 (2008)
6. Gutiérrez, R., Román, P., Torres, F.: Inference and first-passage-times for the lognormal
diffusion process with exogenous factors: application to modelling in economics. Appl. Stoch.
Models Bus. Ind. 15, 325–332 (1999)
7. Gutiérrez, R., Román, P., Romero, D., Torres, F.: Forecasting for the univariate lognormal
diffusion process with exogenous factors. Cybern. Syst. 34(8), 709–724 (2003)
8. Gutiérrez, R., Román, P., Romero, D., Serrano, J.J., Torres, F.: A new Gompertz-type diffusion
process with application to random growth. Math. Biosci. 208, 147–165 (2007)
9. Rico, N., Román, P., Romero, D., Torres, F.: Gompertz-lognormal diffusion process for
modelling the accumulated nutrients dissolved in the growth of Capiscum annuum. In:
Proceedings of the 20th Annual Conference of the International Environmetrics Society, vol. 1,
p. 90 (2009)
10. Rico, N., G-Arenas, M., Romero, D., Crespo, J.M., Castillo, P., Merelo, J.J.: Comparing
optimization methods, in continuous space, for modelling with a diffusion process. In:
Advances in Computational Intelligence. Lecture Notes in Computer Science, vol. 9095,
pp. 380–390. Springer, Heidelberg (2015)
11. Romero, D., Rico, N., G-Arenas, M.: A new diffusion process to epidemic data. In: Computer
Aided Systems Theory - EUROCAST 2013. Lecture Notes in Computer Science, vol. 8111,
pp. 69–76. Springer, Heidelberg (2013)
12. Sa’ad, S.: Improved technical efficiency and exogenous factors in transportation demand for
energy: an application of structural time series analysis to South Korean data. Energy 35(7),
2745–2751 (2010)
13. Shumway, R.H., Stoffer, D.S.: ARIMA models. In: Time Series Analysis and Its Applications.
Springer Texts in Statistics. Springer, New York (2011)
Spatio-Temporal Modeling for fMRI Data

Wenjie Chen, Haipeng Shen, and Young K. Truong

Abstract Functional magnetic resonance imaging (fMRI) uses fast MRI techniques
to enable studies of dynamic physiological processes at a time scale of seconds.
This can be used for spatially localizing dynamic processes in the brain, such as
neuronal activity. However, to achieve this we need to be able to infer on models of
four-dimensional data. Predominantly, for statistical and computational simplicity,
analysis of fMRI data is performed in two-stages. Firstly, the purely temporal
nature of the fMRI data is modeled at each voxel independently, before considering
spatial modeling on summary statistics from the purely temporal analysis. Clearly,
it would be preferable to incorporate the spatial and temporal modeling into one
all encompassing model. This would allow for correct propagation of uncertainty
between temporal and spatial model parameters. In this paper, the strengths and the
weaknesses of currently available methods will be discussed based on hemodynamic
response (HRF) signal modeling and spatio-temporal noise modeling. Specific
application to a medical study will also be described.

1 Introduction

Functional magnetic resonance imaging (fMRI) is based on the blood-oxygen level


dependent (BOLD) principle. When neurons are active they consume oxygen,
which then leads to increased blood flow to the activated area. As a result, neural
activities can be inferred from the BOLD signals. In fact, the BOLD signal

Research supported by NSF DMS 1106962.


W. Chen ()
American Insurance Group, Inc., New York, NY, USA
H. Shen
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC
27599, USA
Y.K. Truong
Department of Biostatistics, The University of North Carolina, Chapel Hill, NC 27599, USA


from fMRI scanner has been shown to be closely linked to neural activity [11].
Through a process called the hemodynamic response, blood releases oxygen to
active neurons at a greater rate than to inactive ones. The difference in magnetic
susceptibility between oxyhemoglobin and deoxyhemoglobin, and thus oxygenated
or deoxygenated blood, leads to magnetic signal variation which can be detected
using an MRI scanner. The relationship between the experimental stimulus and
the BOLD signal involves the hemodynamic response function (HRF). For the
simplicity of illustration, we denote the relationship by

BOLD = F(STIMULUS, HRF) + NOISE.

Several pioneer papers have used experiments to confirm that the functional form
of F is approximately linear based on some typical STIMULUS such as auditory
or finger tapping. Furthermore, it has been reported that the fMRI time series is
hampered by the hemodynamic distortion. These effects result from the fact that
the fMRI signal is only a secondary consequence of the neuronal activity. See, for
example, Glover [8]. These observations have simplified the above model greatly
leading to the following popular model adopted by many fMRI studies:

BOLD = convolution(STIMULUS, HRF) + NOISE.   (1)

Estimating or determining the HRF is important for the correct interpretation of


neural science and human brain research. For example, standard fMRI analysis
packages such as Statistical Parametric Map (SPM) [7] and FMRIB Software
Library (FSL) [9] developed statistical inference based on

BOLD = β · convolution(STIMULUS, HRF) + NOISE   (2)

by estimating the parameter β under the following assumptions:


1. The HRF takes a pre-specified function known as the double-gamma function in
the literature [8]:

HRF.t/ D c1 tn1 exp.t=t1 /  a2 c2 tn2 exp.t=t2 /;


ci D max.tni exp.t=ti //; i D 1; 2:

2. The noise takes on simple time series models such as AR(1).
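The following sketch illustrates model (2) with a double-gamma HRF. The parameter values (n_1 = 6, n_2 = 16, t_1 = t_2 = 1, a_2 = 1/6), the scaling of each lobe to unit peak, the event timing and the white-noise term are typical illustrative choices, not the exact SPM/FSL defaults:

```python
# Sketch: BOLD = beta * (stimulus convolved with a double-gamma HRF) + noise.
import numpy as np

def double_gamma_hrf(t, n1=6.0, t1=1.0, n2=16.0, t2=1.0, a2=1.0 / 6.0):
    # two gamma-like lobes, each scaled to unit peak; a2 sets the undershoot depth
    g1 = t ** n1 * np.exp(-t / t1)
    g2 = t ** n2 * np.exp(-t / t2)
    return g1 / g1.max() - a2 * g2 / g2.max()

t = np.arange(0.0, 32.0)                 # HRF support on a 1-s grid
hrf = double_gamma_hrf(t)

stimulus = np.zeros(600)                 # 600 s of experiment, 1-s grid
stimulus[::20] = 1.0                     # hypothetical events every 20 s

beta = 2.0                               # activation amplitude to be estimated
bold = beta * np.convolve(stimulus, hrf)[: len(stimulus)] \
       + np.random.normal(scale=0.3, size=len(stimulus))   # stand-in noise
```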


Some of the earlier papers also reported that the HRF may vary according to the
location of the neurons and the type of stimulus [2, 4]. For example, the parameters
n_i and t_i depend on the motor or auditory regions of the brain. Thus flexible
nonparametric estimation of HRF is an important problem in fMRI that allows
different HRFs for different subjects and different brain regions. The identification
of the HRF has been studied under event-related designs because the design

paradigm allows the HRF to return to baseline or recover after every trial. Most of
the approaches are time domain based because of the spiking nature of the stimuli.
In this paper, we introduce a Fourier method based on the transfer function
estimate (TFE). This is a frequency domain nonparametric method for extracting
the HRF from any kind of experimental designs, either block or event-related.
This approach also allows TFE to detect the activation through test of hypotheses.
Moreover, it can be further developed to validate the linearity assumption. This
is very important as reported in some of the earlier papers, the BOLD response
is an approximate linear function of the stimuli and the HRF, and the linearity
assumption may not hold throughout the brain. These desirable features of TFE will
be demonstrated using simulation and real data analysis. More specifically,
– We first extend Bai et al.’s [1] method to multivariate form to estimate multiple
HRFs simultaneously using ordinary least squares (OLS). We also verified and
reported the consistency and asymptotic normality of the OLS estimator.
– TFE detects the brain activation while estimating HRF by providing the F map
and also tests the linearity assumption inherited from the convolution model.
– TFE is able to compare the difference among multiple HRFs in the experiment
design.
– TFE adapts to all kinds of experiment designs, and it does not depend on the
pre-specified HRF length support.
The present paper is based on the first author’s Ph.D. dissertation [5] in which
the methodology and its statistical sampling properties are described in more
details. The current approach is based on the OLS method, a weighted least square
(WLS) version for estimating the HRF has been implemented and a more detailed
discussion on these approaches to fMRI can be found therein.

2 Method

In this section, we will outline how we developed TFE in order to make it feasible
to estimate HRF under a multi-stimulus design experiment.
Using multiple stimuli in one experiment session is in high demand as it obtains
stronger response signal compared to the single stimulus design. In the view of
human brain’s biological process, the advantage from multiple stimuli is to avoid the
refractory effect, which causes nonlinearity in the response over time [6]. Subjects
are easy to get bored if only one stimulus shows repeatedly in a session. In the
view of experiment design, multiple stimuli are helpful to make efficient design in a
limited time that the scanner provides. Thus the change of stimulus over successive
trials is beneficial for experiment design.

2.1 Model

Consider the BOLD response model given by

y(t) = h_1 ⊗ x_1(t) + h_2 ⊗ x_2(t) + ··· + h_n ⊗ x_n(t) + z(t),   t = 0, …, T − 1,   (3)

where x_i(t) represents the ith stimulus function, h_i(t) is the corresponding HRF, and ⊗ is the binary convolution operator defined by x ⊗ h(t) = Σ_u x(u) h(t − u). We assume that the error or noise series z(t) is stationary with 0 mean and power spectrum s_zz(r), where r is the radian frequency. Here T is the duration of the experimental trial.
To write the convolution model (3) in matrix form, let x(t) be an n vector-valued series, i.e., x(t) = (x_1(t), x_2(t), …, x_n(t))^τ. Suppose that h(u) is a 1 × n filter given by h(u) = (h_1(u), h_2(u), …, h_n(u)). The BOLD model (3) to be considered in this paper is

y(t) = Σ_u h(u) x(t − u) + z(t).   (4)

We further assume that the HRF h(u) is 0 when u < 0 or u > d, where d is the length of the HRF latency determined by the underlying neural activity.
Let H(·) denote the finite Fourier transform (FFT) of h(·) given by H(r) = Σ_t h(t) exp(−irt). Define similarly

Y(r) ≡ Y^(T)(r) = Σ_{t=0}^{T−1} y(t) exp(−irt),   X(r) ≡ X^(T)(r) = Σ_{t=0}^{T−1} x(t) exp(−irt),

Z(r) ≡ Z^(T)(r) = Σ_{t=0}^{T−1} z(t) exp(−irt),   r ∈ R.

By the properties of the FFT, we have

Y(r) = H(r) X(r) + Z(r),   r ∈ R.

If H(·) is smooth, then it can be estimated reasonably well by a local approximation of the above relationship. Specifically, let K be an integer with 2πK/T near the radian frequency r. Suppose T is sufficiently large. From the asymptotic property of the FFT,

Y(2π(K + k)/T) = H(r) X(2π(K + k)/T) + Z(2π(K + k)/T),   k = 0, ±1, …, ±m.   (5)

Here m is an appropriate integer to be specified, which is related to the degree of smoothing of the Fourier transform in estimating the spectral density function.
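As a quick numerical check of this Fourier relation (a sketch that assumes a circular convolution so the identity is exact; for a linear convolution it holds approximately, up to edge effects, when T is large):

```python
# Verify that convolution in time becomes multiplication in frequency.
import numpy as np

T = 256
x = np.random.binomial(1, 0.1, size=T).astype(float)     # spiky toy stimulus
h = np.zeros(T)
h[:16] = np.hanning(32)[16:]                              # toy HRF supported on 16 lags
z = 0.05 * np.random.randn(T)

y = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(x))) + z   # circular convolution + noise

Y, X, H, Z = np.fft.fft(y), np.fft.fft(x), np.fft.fft(h), np.fft.fft(z)
print(np.allclose(Y, H * X + Z))                          # True
```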

By applying the Fourier transform, the convolution in the time domain is transformed into a product in the frequency domain. The product forms a linear relationship between Y(·) and X(·), which makes the estimation very similar to the typical linear regression setting. Also, applying the Fourier transform is a good way to avoid estimating the autocorrelation in the time domain. When we deal with Fourier coefficients instead of time points, these coefficients are asymptotically uncorrelated at different frequencies.

2.2 OLS Estimate

Relation (5) is seen to have the form of a multiple regression relation involving complex-valued variates provided H(r) is smooth. In fact, according to [3], an efficient estimator is given by

Ĥ(r) = ŝ_yx(r) ŝ_xx(r)^{−1},   r ∈ R.   (6)

Here ŝ_xx is a smooth or window-type estimator of the auto-spectral density function s_xx of the vector-valued process x, and ŝ_yx is defined similarly as a smooth estimator of the cross-spectrum s_yx of the processes y and x.
A reasonable estimator of the HRF h is obtained by the inverse FFT of Ĥ:

ĥ(u) = (1/T) Σ_{t=0}^{T−1} Ĥ(2πt/T) exp(i2πtu/T),   u = 0, 1, …, d.   (7)

It has been shown that ĥ(u) has attractive sampling properties: it is an asymptotically consistent and efficient estimate of h(u).
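A compact sketch of the estimator in (6)-(7) for a single voxel is given below (not the authors' implementation; the smoothing span m, the HRF support d, and the tiny ridge added before inversion are illustrative choices):

```python
# Transfer function estimate: smoothed cross-spectra -> H_hat -> inverse FFT.
import numpy as np

def tfe(y, x, m=4, d=20):
    """y: length-T voxel series; x: T x n matrix of stimulus series."""
    T, n = x.shape
    Y = np.fft.fft(y)
    X = np.fft.fft(x, axis=0)
    I_yx = Y[:, None] * np.conj(X)                    # cross-periodograms, shape (T, n)
    I_xx = X[:, :, None] * np.conj(X[:, None, :])     # auto-periodogram matrices, (T, n, n)

    def smooth(a):                                    # average over 2m+1 neighbouring frequencies
        return sum(np.roll(a, k, axis=0) for k in range(-m, m + 1)) / (2 * m + 1)

    s_yx, s_xx = smooth(I_yx), smooth(I_xx)           # normalising constants cancel in the ratio
    H = np.stack([s_yx[k] @ np.linalg.inv(s_xx[k] + 1e-12 * np.eye(n))   # Eq. (6)
                  for k in range(T)])
    h = np.real(np.fft.ifft(H, axis=0))               # Eq. (7): inverse FFT of H_hat
    return h[: d + 1]                                 # HRF estimates, lags 0..d, one column per stimulus
```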

3 Hypothesis Testing

After introducing TFE method, this section introduces the multivariate tests for
fMRI analysis.

3.1 Key Concepts

First, we introduce two key concepts [3] here to support the hypothesis testing.
Coherence is an important statistic that provides a measure of the strength of a linear time invariant relation between the series y(t) and the series x(t); that is, it indicates whether there is a strong linear relationship between the BOLD response and the stimuli. From a statistical point of view, we can test the linear time invariant assumption for the convolution model; for the fMRI exploration, we can choose the voxels with significantly large coherence, where the BOLD series has a functional response to the stimulus, and then estimate the HRF in those voxels.
Coherence is defined as

|R_yx(r)|^2 = s_yx(r) s_xx(r)^{−1} s_xy(r) / s_yy(r).   (8)

Coherence is seen as a form of correlation coefficient, bounded by 0 and 1. The closer to 1, the stronger the linear time invariant relation between y(t) and x(t).
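A coherence estimate of this kind can be obtained, for example, with Welch-type smoothing; the stimulus series, toy HRF and noise level below are made up for illustration:

```python
# Squared coherence between a voxel series and a single stimulus series.
import numpy as np
from scipy import signal

T = 600
x = (np.arange(T) % 40 < 2).astype(float)                 # hypothetical periodic stimulus
y = np.convolve(x, np.exp(-np.arange(20) / 4.0))[:T] + 0.5 * np.random.randn(T)

freqs, coh = signal.coherence(y, x, fs=1.0, nperseg=128)  # |R_yx(r)|^2 estimate
print(freqs[np.argmax(coh)], coh.max())                   # frequency of strongest linear relation
```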
The second concept is partial coherence. If we look at the stimulus individually,
it is interesting to consider the complex analogues of the partial correlations or
partial coherence. The estimated partial cross-spectrum of y(t) and x_i(t) after removing the linear effects of the other x_j(t) is given by

s_{yx_i·x_j}(r) = s_{yx_i}(r) − s_{yx_j}(r) s_{x_j x_j}(r)^{−1} s_{x_j x_i}(r).   (9)

Usually the case of interest is the relationship between the response and a single stimulus after the other stimuli are accounted for; that is, x_i is the single stimulus of interest, and x_j denotes the other stimuli involved in the design paradigm.
The partial coherence of y(t) and x_i(t) after removing the linear effects of x_j(t) is given by

|R_{yx_i·x_j}(r)|^2 = |s_{yx_i·x_j}(r)|^2 / (s_{yy·x_j}(r) s_{x_i x_i·x_j}(r)).   (10)

If n = 2, that is, if there are two kinds of stimulus in the experiment, it can be written as

|R_{yx_i·x_j}(r)|^2 = |R_{yx_i}(r) − R_{yx_j}(r) R_{x_j x_i}(r)|^2 / ([1 − |R_{yx_j}(r)|^2][1 − |R_{x_i x_j}(r)|^2]).   (11)

Partial coherence is especially important when we focus on a specific stimulus.


Not all stimuli are considered in equal measure. Stimuli such as the heart beat
and breathing, which cannot be avoided in any experiment involving humans,
are of secondary concern. Furthermore, as each type of stimulus has its own
characteristics, it is natural to perform an individual statistical analysis to see how
each one affects the overall fMRI response.
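For two stimuli, Eq. (11) can be evaluated directly from pairwise coherency estimates, as in the following sketch (the toy series and the Welch segment length are illustrative):

```python
# Two-stimulus partial coherence from pairwise complex coherencies, Eq. (11).
import numpy as np
from scipy import signal

def coherency(a, b, nperseg=128):
    # complex coherency R_ab(r) = s_ab(r) / sqrt(s_aa(r) s_bb(r))
    f, s_ab = signal.csd(a, b, nperseg=nperseg)
    _, s_aa = signal.welch(a, nperseg=nperseg)
    _, s_bb = signal.welch(b, nperseg=nperseg)
    return f, s_ab / np.sqrt(s_aa * s_bb)

T = 600
t = np.arange(T)
x1 = (t % 40 == 0).astype(float)                   # hypothetical stimulus 1
x2 = (t % 40 == 20).astype(float)                  # hypothetical stimulus 2
y = np.convolve(x1 + 0.5 * x2, np.exp(-np.arange(20) / 4.0))[:T] + 0.5 * np.random.randn(T)

f, R_y1 = coherency(y, x1)
_, R_y2 = coherency(y, x2)
_, R_21 = coherency(x2, x1)
# Eq. (11) with i = 1, j = 2
partial = np.abs(R_y1 - R_y2 * R_21) ** 2 / ((1 - np.abs(R_y2) ** 2) * (1 - np.abs(R_21) ** 2))
```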

3.2 Testing the Linearity

The linearity assumption functions as the essential basis of the convolution model.
As we know, any nonlinearity in the fMRI data may be caused by the scanner system
or the human physical capability such as refractoriness. Refractory effects refer to the reductions in hemodynamic amplitude after several stimuli are presented. If refractory
effects are present, then a linear model will overestimate the hemodynamic response
to closely spaced stimuli, potentially reducing the effectiveness of experimental
analyses. It is critical, therefore, to consider the evidence for and against the linearity
of the fMRI hemodynamic response.
It is possible that the nonlinearity is overwhelmed during scanning. Conse-
quently, it is crucial to make sure that the linearity assumption is acceptable.
The advantage of our method is that we can first determine whether the linearity
assumption is acceptable before using the convolution model for analysis.
The value of the coherence, between 0 and 1, reflects the strength of the linear relation between the fMRI response and the stimuli. Under certain conditions, R̂_yx(r) is asymptotically normal with mean R_yx(r) and variance proportional to (1 − R_yx^2(r))/Tb. Moreover, if R_yx = 0, then

F(r) = (c − n)|R̂_yx(r)|^2 / [n(1 − |R̂_yx(r)|^2)]  ∼  F_{2n, 2(c−n)},   (12)

where c = bT/κ and κ = ∫ψ^2, with ψ being the lag-window generator depending on the choice of window function. If the F statistic on the coherence is significant, it is reasonable to accept the linearity assumption.
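A hedged sketch of this test follows. The constant c (bT/κ in the text, the equivalent number of independent spectral estimates) depends on the chosen smoothing scheme and is treated here as a user-supplied value, e.g. the number of periodogram ordinates averaged:

```python
# F-test of zero coherence, following the form of (12).
import numpy as np
from scipy import stats

def coherence_f_test(coh_sq, c, n=1):
    """coh_sq: estimated squared coherence at the frequency of interest."""
    F = (c - n) * coh_sq / (n * (1.0 - coh_sq))
    p = stats.f.sf(F, 2 * n, 2 * (c - n))     # upper-tail p-value
    return F, p

F, p = coherence_f_test(coh_sq=0.6, c=9, n=1)   # e.g. averaging 2m+1 = 9 ordinates
print(F, p)
```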

3.3 Testing the Effect from a Specific Stimulus

For each brain area, stimuli have varying effects. For the motor cortex in the left
hemisphere, right-hand motion causes much more neural activities than left-hand
motion. Partial coherence is able to distinguish between right- and left-hand effects,
determine whether left-hand motion evokes neural activity, and identify which
motion has greater effect. The following test is applied for these kinds of research
questions.
For the partial coherence, if R_{yx_i·x_j} = 0, then

F(r) = c′|R̂_{yx_i·x_j}(r)|^2 / (1 − |R̂_{yx_i·x_j}(r)|^2)  ∼  F_{2, 2(c′−1)},   (13)

where c′ = bT/κ − n + 1 and κ = ∫ψ^2, with ψ being the lag-window generator.

3.4 Detecting the Activation

The HRF in fMRI indicates the underlying neural activity. If there is activation evoked by the stimulus, then the corresponding HRF cannot be ignored. If there is no HRF in a brain region, there is no ongoing neuronal activity there. To detect activation in a brain region is to see whether there is an underlying HRF. For our frequency method, we test H(r_0) = 0 at the stimulus-related frequency r_0.
We are interested in testing the hypothesis H(r) = 0. This is carried out by means of analogs of the statistic (8). In the case H(r) = 0,

(bT/κ) Ĥ(r) ŝ_xx(r) Ĥ(r)^* / (n ŝ_zz(r))   (14)

is distributed asymptotically as F_{2, 2(bT/κ − n)}.

3.5 Testing the Difference Between HRFs

The multivariate method simplifies the functional hypothesis testing by comparing


the corresponding Fourier coefficients at frequency r in order to see whether there
is any discrepancy between the HRF curves corresponding to different stimuli. HRF curves are functions, but when we focus just on the Fourier coefficients at frequency r, we look at a common hypothesis test on points. To see the difference between two HRFs, it is enough to consider the hypothesis that the two Fourier coefficients at the task-related frequency r_0 are equivalent.
As we know, H(·) is the Fourier transform of h(·). For the contrast hypothesis to compare HRF functions, c h^τ = 0, we have the equivalent hypothesis c H(r)^τ = 0, where r is usually the task-related frequency r_0.
In the OLS method, we know that the distribution of Ĥ(r)^τ is asymptotically

N_n^C(H(r)^τ, s_zz(r) Σ),   H(r) ∈ C^n,   (15)

where

Σ = (bT/κ)^{−1} ŝ_xx(r)^{−1} if r ≠ 0 mod π,   and   Σ = (bT/κ − 1)^{−1} ŝ_xx(r)^{−1} if r = 0 mod π.

N_n^C(·, ·) is the complex multivariate normal distribution for an n vector-valued random variable [3].
The contrast between different HRF estimates can be represented by c Ĥ(r)^τ, where c = (c_1, c_2, …, c_n) satisfies Σ_{i=1}^n c_i = 0. For the complex number c Ĥ(r)^τ, the hypothesis testing can be conducted by the definition of

the complex normal distribution, which converts the complex normal into a multivariate normal distribution.
Under the hypothesis c H(r)^τ = 0, the quantity (c c)(Re Ĥ(r), Im Ĥ(r))^τ, denoted by c_v Ĥ_v, where c_v = (c_1, …, c_n, c_1, …, c_n), is distributed asymptotically as

N(0, (1/2) s_zz(r) c_v Σ_v c_v^τ).   (16)

At the same time, the distribution of ŝ_zz(r) is approximated by an independent

s_zz(r) χ²_{2(bT/κ − n)} / (2(bT/κ − n)),   r ≠ 0 mod π.   (17)

Thus, the t statistic for the contrast between different HRF estimates is

c_v Ĥ_v(r) / √( (1/2) ŝ_zz(r) c_v Σ_v c_v^τ )  ∼  t_{2(bT/κ − n)},   r ≠ 0 mod π.   (18)

The contrast is highly utilized in fMRI to point out the discrepancy of responses in different conditions. In the fMRI software packages SPM and FSL, we first need to specify "conditions," which are analogous to the types of stimuli here. For example, if we have two types of stimuli from the right and left hands, there are two conditions: Right and Left. Then we need to set up the contrast of conditions according to our interest. For testing whether the right hand has a greater effect than the left hand, the contrast should be Right > Left, equivalent to Right − Left > 0. So we state the contrast in a vector c = (1, −1), with 1 for the condition Right and −1 for the condition Left. After setting the contrast, SPM and FSL will continue with their general linear model, using parameters to construct the t statistic.
The hypothesis of comparing the HRF similarity here equates to the contrasts in SPM and FSL. We have two types of stimuli: Right and Left. Then we have respective HRF estimates for Right and Left. To test whether Right > Left, we specify c = (1, −1) in c Ĥ(r)^τ. As a result, the t statistic in (18) is used for testing their difference.

3.6 Remarks

We will make a few remarks or comments on our method in contrast with the
existing methods.
1. Some popular packages [7, 9] estimate the activation based on (3) and (2) using the general linear model (GLM) procedure. The model considered in SPM and FSL is given by

   y(t) = β_1 h_1 ⊗ x_1(t) + β_2 h_2 ⊗ x_2(t) + ··· + β_n h_n ⊗ x_n(t) + z(t),   t = 0, …, T − 1,   (19)
   where the HRFs h_1, …, h_n are pre-specified (such as the double-gamma function described in Sect. 1) and the time series z(t) is a simple AR(1) model with two parameters to be estimated. Model (19) is a linear regression model with β = (β_1, …, β_n) as the regression parameters, and the design matrix is formed by the convolution of the stimuli x(t) and the double-gamma functions h(t). The estimates of β_1, …, β_n are obtained by the least squares (LS) method, which may be consistent (depending on the specification of the HRF) but may not be efficient. The latter is caused by the mis-specification of the AR(1) model, and the standard errors (se) of the LS estimates may be large. Another challenge is to conduct the voxel-wise significance test of activation. Namely, at each voxel, the test is based on the t-statistic given by

   t = β̂ / se(β̂),

   whose p-value may be over- or under-estimated depending on whether the standard error se(β̂) has been correctly estimated, which in turn depends on the specification of the model for the (noise) time series z(t). This will result in higher false-positive or false-negative rates in detecting activation. One may address the estimation of the model for z(t), but it is inefficient to fit and test a correct time series model at each voxel. The default setting in those packages usually starts with AR(1) at the expense of specification error.
This problem can be remedied using the TFE approach described above.
Applying the discrete Fourier transform to (3) yields two important results at once. First, the TFE of the HRF is a direct (without introducing the β's) and consistent estimate. Second, the FFT of the noise time series is uncorrelated (exactly independent in the Gaussian case), and consequently the F-test (14) is consistent for testing activation. Note that our F-test is described in the frequency domain; it should not be confused with the model-based F-test in SPM or FSL, described in the time domain, which is easily accessible to the majority of statisticians. In the current setting, however, the Fourier transformation provides a direct and highly interpretable activation detection.
2. The TFE method was thought to be only applicable to block designs with some
forms of periodicity. Hence the popular event-related designs have ruled out the
use of it. This is certainly not the case as one can see from [3]. The current paper
demonstrates the usefulness of the TFE through simulation and real data sets
taken from the above package sites for the sake of comparison. More results can
be found in [5].
3. The TFE also offers a check on the linearity assumption in (3) using the F-
statistics based on the coherence (12). On the other hand, the time domain method

will further need to verify the error auto-covariance structure for this step, which
can be inefficient for the whole brain.
4. A key issue to be addressed in using the TFE method is the smoothing parameter
in estimating various auto- and cross-spectra. Some data-adaptive procedures
such as cross-validation have been investigated in [5] and the HRF shape has
made such a choice less challenging.
5. When the voxel time series is non-stationary, then both time and frequency
domain methods will not work that well. In that situation, one can invoke the
short time (or windowed) Fourier transform method (a special case of wavelets
analysis) to estimate the HRF, this will be carried out in another paper.
6. An extensive review is given in one of the chapters of Chen [5]. Here the main
objective is to highlight some important contributions to the problem of HRF
estimation using Fourier transform. This type of deconvolution or finite impulse
response (FIR) problems has been known to the area of signal processing.
The various forms of the test statistics (12), (13), however, have not been
appropriately addressed. Hence the aim of the current paper is to illustrate the
usefulness of the statistical inference tools offered by this important frequency
domain method.

4 Simulation

The simulation study was based on a multiple stimuli experiment design and a
simulated brain. The experiment in the section included two types of stimuli, called
left and right. The simulated brain had 8 × 8 voxels, which was designed to have
various brain functions in the left and right experiment design. The brain was
divided into four regions: one only responded to left, one only responded to right,
one can respond to both left and right, and the remaining one had no response in the
experiment.
The fMRI data was simulated based on the convolution model (3): the con-
volution of the pre-specified left HRF h1 ./ and right HRF h2 ./, and the known
experiment paradigm for left stimulus x1 .t/ and right stimulus x2 .t/. The response
was given by

Y(t) = h_1 ⊗ x_1(t) + h_2 ⊗ x_2(t) + z(t),   t = 0, …, T − 1,   (20)

with T = 600. The noise was generated from an ARMA(2, 2) model:

z(t) − 0.8897 z(t − 1) + 0.4858 z(t − 2) = w(t) − 0.2279 w(t − 1) + 0.2488 w(t − 2),

w(t) ~ iid N(0, 0.2²),   t = 0, 1, …, 599.

The ARMA was chosen to test the strength of our method under other types of
correlated structures, and the coefficients were selected to illustrate the performance
of the procedure under moderate, serially correlated noise.
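A minimal sketch of how data of this form can be generated follows (the event timing, HRF shapes and the use of scipy's lfilter are illustrative assumptions; only the ARMA coefficients are taken from the text above):

```python
# Simulated BOLD series: two convolved stimuli plus ARMA(2,2) noise, as in (20).
import numpy as np
from scipy.signal import lfilter

T = 600
t = np.arange(T)
x1 = (t % 40 == 0).astype(float)           # "left" events, frequency 1/40
x2 = (t % 40 == 20).astype(float)          # "right" events, interleaved

lag = np.arange(0, 30)
h1 = lag ** 6 * np.exp(-lag)
h1 /= h1.max()                             # Glover-like HRF for left
h2 = 0.5 * h1                              # half-amplitude HRF for right (second simulation)

w = np.random.normal(scale=0.2, size=T)
z = lfilter([1, -0.2279, 0.2488], [1, -0.8897, 0.4858], w)   # ARMA(2,2) noise

Y = np.convolve(x1, h1)[:T] + np.convolve(x2, h2)[:T] + z
```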
The illustrated experiment paradigm and the brain map are shown in Fig. 1. This
was an event-related design with left and right interchanging, where the green stands
for the left, and the purple stands for the right. The left and right stimuli came
periodically. The stimulus-related frequency for each was 1/40, and the overall
task-related frequency for the experiment was 1/20. The brain map shows the brain
function in each region (Fig. 1b).
The first simulation was to detect the activation regions in the brain. We assumed
both of the original HRFs for left and right were Glover’s HRF. At the experiment
frequency 1/20, the coherence and F statistic map are shown in Fig. 2. The lighter
the color is, the higher the values are. The high value of coherence in the responsive
region implies a strong linear relation in the simulation. Also, the F statistic
represents the strength of activation in the voxel. As expected, there were three activation regions: Left (L), Right (R), and Left and Right (L&R), which were selected at the α = 0.01 level.

Fig. 1 The simulated brain map in the simulation. (a) shows the experiment design of the simulation with two kinds of stimuli, which are finger tapping on the right (shown in purple) and on the left (shown in green). (b) is the simulated brain map: the purple region only responds to the right-hand stimulus; the green region only responds to the left-hand stimulus; the brown region responds to both left and right; and the white region has only noise. Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved

Fig. 2 Detecting the activation regions by TFE. The activation region is where the brain has a response to the experiment stimulus. (a) shows that the true HRFs for both left and right are the same. (b) shows the coherence value obtained in voxels (the red color means high intensity, and the yellow indicates low intensity). (c) shows the corresponding F statistic, called the F map. As shown in (d), both right and left activated regions (marked in red) are detected. Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved

Fig. 3 Hypothesis testing with two identical HRFs in the simulated brain. (a) shows Glover's HRF for both left and right. (b) shows the overall t statistic over the brain map, where red color means high positive values, green color means negative values, and yellow means near 0. (c) shows the rejection region for the test left > right; (d) shows the rejection region for left < right; (e) shows the rejection region for left = right. Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved
At the stimulus-related frequency 1/40, we compared the similarity of the left and right HRFs. The true left and right HRFs are the same Glover's HRF. The HRF discrepancy region consists of only two regions: Left (L) and Right (R), where we regarded no response as a zero HRF. The simulation result is displayed in Fig. 3. The rejection region for L > R is the region L; the rejection region for L < R is the region R; the rejection region for L ≠ R is L&R, at level α = 0.05. As we can see, if a voxel has the same response to different stimuli, the result shows that there is no difference in the HRFs.

Fig. 4 Detecting the activation regions by TFE with non-identical HRFs. The activation region is where the brain has a response to the experiment stimulus. (a) shows the true HRFs for both left (green) and right (purple). (b) shows the coherence obtained in voxels (the red color means high intensity, and the yellow indicates low intensity). (c) shows the corresponding F statistic, the F map.
As shown in (d), both right and left activated regions (marked in red) are detected. Originally
published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights
Reserved

Fig. 5 Hypothesis testing with two non-identical HRFs in the simulated brain. (a) shows Glover's HRF for left (green) and half of Glover's HRF for right (purple). (b) shows the overall t statistic over the brain map, where red color means high positive values, green color means negative values, and yellow means near 0. (c) shows the rejection region for the test left > right; as the left HRF has much higher amplitude than the right one, the rejection region for the test left > right is the two regions that respond to the left-hand stimulus. (d) shows the rejection region for right > left; (e) shows the rejection region for left = right. Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved

The second simulation was built on different HRFs. The left HRF kept Glover’s
HRF, and the right HRF reduced Glover’s HRF to half. As we can see, the left and
right HRFs had different amplitudes. The left was larger than the right one. At the
experiment frequency 1/20, the activation region is accurately spotted in Fig. 4.
At the individual-stimulus-related frequency 1/40, the difference between left
and right HRF was detected, as shown in Fig. 5. The rejection region for L>R
contains the regions that respond to both L and R. The hypothesis testing of similar
HRFs clearly separated different HRFs.

5 Real Data Analysis

5.1 Auditory Data

In order to test whether the method of Bai et al. [1] is applicable to real data
and detect fMRI activation, we applied the nonparametric method to the published
auditory data set on the Statistical Parametric Mapping website (http://www.fil.ion.
ucl.ac.uk/spm/data/auditory/). According to the information listed on the website,
these whole brain BOLD/EPI images were acquired on a modified 2T Siemens
MAGNETOM Vision system. Each acquisition consisted of 64 contiguous slices
(64 × 64 × 64; 3 mm × 3 mm × 3 mm voxels). Acquisition took 6.05 s, with the scan
to scan repeat time (TR) set arbitrarily to 7 s. During the experiment 96 acquisitions
were made in blocks of 6 that resulted in 16 42-s blocks. The blocks alternated
between rest and the auditory stimulation. We included eight trials in our dataset,
with the first six images acquired in the first run discarded due to T1 effects. The data
was preprocessed using SPM5, and included realignment, slice timing correction,
coregistration, and spatial smoothing.
Figure 6a shows the time course data from the one voxel that had the greatest F
value in Eq. (14). The voxel time series depicted in Fig. 6a has been detrended [10]
because the trends may result in false-positive activations if they are not accounted
for in the model. Since the voxel has a high F value, its time series has good
relationship to the task, similar to the pattern we obtained in our second simulation.
Figure 6b shows several HRF estimates from the 12 voxels with the highest F-
values in the brain. The majority of the HRF estimates closely match the HRF
shape, showing the increase in the signal that corresponds to the HRF peak and
some even depicting the post-dip after the peak signal. The TR for this dataset is 7 s,
which corresponds to the time interval between the acquisition of data points.
(a) Raw fMRI data from the voxel with the greatest F value   (b) HRF estimates

Fig. 6 HRF estimation for auditory data. (a) is the experimental design paradigm (the red dashed
line) for the auditory data. The solid line is fMRI response from an activated voxel over time;
(b) is the HRF estimates from the 12 highly activated voxels found by using TFE in the brain. Due
to the large TR (7 s), there is a limitation on showing the HRF estimate in finer temporal resolution.
In (b), we still can see the different shapes of HRF. Originally published in Chen (2012). Published
with the kind permission of © Wenjie Chen 2012. All Rights Reserved

This leads to a very low temporal resolution with which to measure the hemodynamic response. The limitation of a large TR for estimating the HRF is not only the low temporal resolution; it also produces the wrong timing for the stimulus onsets. For instance, if the TR is seven seconds, we have 40-s blocks instead of 42-s ones. If the stimulus function X(t) is a 0–1 function which indicates an onset every seven seconds, then we will miss the exact onsets at 40, 80, 120, 160, … s. The strategy we used here is interpolation. We interpolated the preprocessed data onto a second-based time series, and then applied TFE to see the HRF. As a result, we could only
see the framework of the HRF and approximate its value. Despite this limitation,
the resulting framework gave us evidence that our method does indeed capture the
various HRFs in the voxels. In addition, it establishes that our HRF-based analysis
can be applied to real data and may be improved with correspondingly refined
temporal resolution.
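A minimal sketch of that interpolation step (the 96-acquisition series and the 1-s target grid are illustrative):

```python
# Resample a 7-s TR series onto a 1-s grid before applying TFE.
import numpy as np

TR = 7.0
bold_tr = np.random.randn(96)                 # stand-in for the 96 acquisitions
t_scan = np.arange(len(bold_tr)) * TR         # acquisition times in seconds
t_sec = np.arange(0.0, t_scan[-1] + 1.0, 1.0) # 1-s grid
bold_sec = np.interp(t_sec, t_scan, bold_tr)  # linear interpolation
```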
In the experimental design of this study, there are seven stimulus blocks in the
time series data that have a total duration of 90 acquisitions. As a result, the task-
related frequency is 7/90 = 0.0778. Using this information, we can apply our
method in order to generate an F-statistic map to show the activation in the brain that
is triggered by the stimuli (bottom row in Fig. 7). For comparison, we also generated
a T map using SPM5 (the SPM T map) that is shown in the upper row of Fig. 7. The
SPM T map is a contrast image that obtains the activation triggered by the stimulus
by applying the canonical SPM HRF uniformly throughout the brain. As a result,
it does not take into account any HRF variation that might occur in the different
regions of the brain.
Fig. 7 (b) F map and (a) T map of the activation by using TFE and SPM. T maps contain blue and hot colors which respectively indicate negative and positive factors. The F map generated by TFE (bottom row) appears to have less noise compared to the SPM-generated T map (upper row). Originally published in Chen (2012). Published with the kind permission of © Wenjie Chen 2012. All Rights Reserved

In both rows of Fig. 7, increased activation is depicted by increased hot color, such that the bright yellow regions represent more activation. As expected from an auditory study, both the F map generated using our method and the SPM-generated T map display activation in the temporal lobe. The F map from our analysis shows increased activation almost exclusively in the temporal lobe, again as would be
expected from an auditory study. Whereas the contrast map generated using SPM
also displays the activation in the temporal lobe, there is also significant activation
in other regions of the brain, including parietal and prefrontal cortical areas. In
addition, the activation in the temporal lobe is more diffuse using SPM compared
to that seen using our F method. We conclude that the map generated using our
method appears to display less noise, such that there is less activation in regions
other than the primary auditory cortex. In addition, our method displayed a less
diffuse activation area in the auditory region, which may be interpreted as either a
more focused activation pattern or there may be some loss of sensitivity for detecting
the actual activation. Despite this possible limitation associated with our method, it
does have the additional benefit of being a test for the linearity assumption.

6 Discussion

The TFE, based on the method of Bai et al. [1], completes the fMRI data analysis
procedure: from adapting to various experimental design, through estimating HRF,
to detecting the activation region.
The first benefit is the experiment design. TFE can be applied for any type of
experimental design, including multiple stimulus design, event-related design, and
block design. Our nonparametric method can be applied to the multiple stimulus
experiment paradigm. From the property of HRF, different stimuli may cause
different hemodynamic responses even in one specific region. Consequently, the
corresponding HRF estimate for each stimulus will be given by our method, and
furthermore we carry out the statistical testing to see whether they are equivalent to
each other.
Our method can also be applied to block design and some rapid event-related
design. Most of the existing HRF estimation methods are only applied to event-
related design. With our method’s adaptability to various experimental design, we
extended the application to rapid event-related design and to block design by adding
an extra rest period. In fact, as long as there is a resting period during the design,
our method is better in estimating HRF.
The second benefit is reducing the noise. Noise might come from multiple sources, such as the various types of scanners with systematic errors in use, the background noise in the environment, and differences in the individual subjects' heart beats and breathing. These noise sources would lead to heterogeneity of the records
of fMRI data. By using TFE, the heterogeneity is considered in the frequency
domain, which simplifies the error structure estimation process. Such simplicity
comes from the asymptotic independence of the spectrum in different Fourier
frequencies when we transfer the time series analysis to the frequency domain. In
addition, for efficiency, we use the WLS method to estimate the error spectrum.
Unlike the existing work [3, 13] based on the WLS method, which is implemented
by a computationally costly high-dimensional-matrix operation, our method shows

higher performance, since the dimension of our matrix operation depends only on
the number of stimulus types in the experiment design.
The third benefit is HRF estimation. TFE does not require the length of HRF,
which is also called the latency (width) of HRF. As in most HRF modeling methods,
the length of the HRF is an a priori input to start the analysis. In practice, however,
the latency of HRF is unknown for the researchers. If the length of HRF is assumed
as known, such as in smooth FIR or the two-level method in [13], the final result
may be very sensitive to the input lengths. For TFE, the latency of HRF is not a
factor that affects the estimates. Additionally, the TFE gives us a rough idea about
the latency of HRF by looking at how the estimates go to zero eventually over time.
One of the most important benefits is that TFE is able to generate the brain
activation map without using the general linear model (GLM). In fact, it simplified
the analysis by reducing the number of steps from two to one. The typical fMRI
analysis (SPM, FSL) requires two steps to customize HRF in the analysis. The first
step estimates HRF, and the second step applies the GLM to study the detection of
activation. Some issues related to GLM have not been addressed even in the most
recent versions of these packages. For example, the estimated standard error used
in tests for activation is a model-based approach and it is not efficient to check the
model validity for each voxel. Nevertheless, the GLM method continues to be applied
to explore other areas of brain research such as connectivity. In TFE, activation
detection is generalized by testing the hypothesis for the HRF estimates, which does
not require additional GLM and the specification of the error structure. Applications
of TFE to Parkinson’s disease and schizophrenia patients can be found in [5].
Also, the unique feature of using TFE is being able to test the linearity
assumption. As the linearity assumption is the foundation of the convolution model
we used, our method is able to test its validity before estimation, which is definitely
important for further analysis. If the linearity assumption is found to be valid for the fMRI data after testing, we are then able to use our nonparametric method to perform
the analysis, or any analysis tool based on the linearity assumption. If the linearity
testing fails, nonlinearity dominates the fMRI data, and the nonlinear estimation
method might be used [6, 8, 12].

References

1. Bai, P., Huang, X., Truong, Y.K.: Nonparametric estimation of hemodynamic response
function: a frequency domain approach. In: Rojo, J. (ed.) Optimality: The Third Erich L.
Lehmann Symposium. IMS Lecture Notes Monograph Series, vol. 57, pp. 190–215. Institute
of Mathematical Statistics, Beachwood (2009)
2. Boynton, G.M., Engel, S.A., Glover, G.H., Heeger, D.J.: Linear systems analysis of functional
magnetic resonance imaging in human V1. J. Neurosci. 16(13), 4207–4221 (1996)
3. Brillinger, D.R.: Time Series. Holden-Day, San Francisco (1981)
4. Buckner, R.L., Bandettini, P.A., O'Craven, K.M., Savoy, R.L., Petersen, S.E., Raichle, M.E.,
Rosen, B.R.: Detection of cortical activation during averaged single trials of a cognitive task
using functional magnetic resonance imaging. Proc. Natl. Acad. Sci. U. S. A. 93(25), 14878–
14883 (1996)

5. Chen, W.: On estimating hemodynamic response functions. Ph.D. thesis, The University of
North Carolina at Chapel Hill (2012)
6. Friston, K.J., Josephs, O., Rees, G., Turner, R.: Nonlinear event-related responses in fMRI.
Magn. Reson. Med. 39, 41–52 (1998)
7. Friston, K.J., Ashburner, J., Kiebel, S.J., Nichols, T.E., Penny, W.D. (eds.): Statistical
Parametric Mapping: The Analysis of Functional Brain Images. Academic, New York (2007).
http://www.fil.ion.ucl.ac.uk/spm/
8. Glover, G.H.: Deconvolution of impulse response in event-related BOLD fMRI1. NeuroImage
9(4), 416–429 (1999)
9. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL. NeuroImage
62, 782–790 (2012). http://www.fmrib.ox.ac.uk/fsl/
10. Marchini, J.L., Ripley, B.D.: A new statistical approach to detecting significant activation in
functional MRI. NeuroImage 12(4), 366–380 (2000)
11. Ogawa, S., Lee, T.M., Kay, A.R., Tank, D.W.: Brain magnetic resonance imaging with contrast
dependent on blood oxygenation. Proc. Natl. Acad. Sci. U. S. A. 87(24), 9868–9872 (1990)
12. Vazquez, A.L., Noll, D.C.: Nonlinear aspects of the BOLD response in functional MRI.
NeuroImage 7(2), 108–118 (1998)
13. Zhang, C., Jiang, Y., Yu, T.: A comparative study of one-level and two-level semiparametric
estimation of hemodynamic response function for fMRI data. Stat. Med. 26(21), 3845–3861
(2007)
Part IV
Machine Learning Techniques in Time
Series Analysis and Prediction
Communicating Artificial Neural Networks
with Physical-Based Flow Model for Complex
Coastal Systems

Bernard B. Hsieh

Abstract Four applications of an intelligent computational method, Artificial Neural Networks (ANNs), in combination with a physical-based flow model have been studied recently: enhancing the knowledge base for model calibration, correcting simulation errors, performing inverse modeling estimation, and serving as a surrogate modeling approach. In this paper, the first application, which enhances the knowledge base for the calibration of a physical-based flow model, is demonstrated. For most physical-based models, invariably there are problems collecting complete data sets to evaluate and model estuarine or coastal hydrodynamic systems. These data gaps lead to uncertainties associated with the understanding or modeling of these processes. One new approach to reduce this uncertainty and to fill in these data gaps is based on the analysis of all the data characterizing the system using ANNs. To demonstrate this data-driven approach, the computational efforts undertaken for the Houma Navigation Canal (HNC) project, Louisiana, USA, are presented. For this project, the following three major benefits are obtained from many "pieces" of hydro-environmental information: to fill in the data gaps so that complete data sets are available for the largest number of instrumentation locations, to develop a better understanding of the system responses, and to understand the sensitivity of the system variables. This knowledge base is important for understanding and assisting with the physical-based model development and calibration/verification processes.

Keywords Artificial Neural Networks • Coastal/estuary waterways • Data recov-


ery system • Numerical hydrodynamic modeling supporting tool • Time series
analysis

B.B. Hsieh ()


US Army Engineer Research and Development Center, 3909 Halls Ferry Road, Vicksburg, MS
39180, USA
e-mail: [email protected]


1 Introduction

Numerical simulation of flows and other processes occurring in water has now
matured into an established and efficient part of hydraulic analyses. However,
models for coastal, estuary, and river systems with millions of nodes are becoming
increasingly common. These models make heavy demands on computing capacity.
The communication of computationally intelligent techniques, such as Artificial Neural Networks (ANNs), with a numerical flow model or other types of physical-based simulation introduces data-driven techniques into the modeling procedures in order to obtain more reliable and faster turn-around results. The computational strategy is based on the analysis of all the data characterizing the system.
Several categories of communication concern how we can improve the model accuracy during the modeling calibration/simulation processes, such as building up more reliable time series boundary conditions and interior calibration points (application type 1), how to generate a corrected time series when the calibration of a physical-based model has reached its optimal limitation (application type 2), and how to derive a reliable boundary condition from a simulation-optimization computational inverse modeling approach (application type 3—Hsieh [1]). In addition, the great divergence between the response-time requirements and the computational-time requirements points out the need to reduce the time required to simulate the impact of input events on the hydrodynamics of the modeled flow/transport system. The time-reducing application (type 4) can be called a "surrogate modeling" method: it simplifies a complex system while still producing reasonably accurate results with much smaller computational resources. For instance, a surrogate modeling system which integrates a storm surge model with supervised–unsupervised ANNs to meet the response-time requirement is a good example. Due to the paper length limitation, this more advanced approach can be found in Hsieh and Ratcliff [2, 3] and Hsieh [4].
This paper focuses on the use of supervised ANNs to obtain reliable long-term continuous time series (type 1) for the physical-based model in a tidally influenced environment as the demonstration example.

2 Brief Description of the Study Site

The project area is located in Terrebonne Parish and includes the city of Houma
(Fig. 1). The Terrebonne estuary in southern Louisiana, USA, consists of many
waterways, lakes, and salt marsh lands. Houma, at the northern end of the Houma
Navigation Canal (HNC), is home to commerce and industry that relies on the
HNC.
Swarzenski et al. [5] reported that the GIWW influences open water salinities
by transporting freshwater to coastal water at some distance from the Lower
Atchafalaya River (LAR).

Fig. 1 Schematic of inland flow and salinity boundaries and their neighboring gages (locations shown include Morgan City, GIWW at Houma, East of Larose, Bayou Lafourche at Thibodaux and at Larose, West Minors Canal, South of Bayou L'Eau Bleu, Grand Bayou, and HNC at Dulac)

Fresh water moves south down the HNC. The GIWW
at Larose stays fresh under most conditions. However, when the stage of the LAR
drops, the HNC can become a conduit for saltwater to move northwards from the
Gulf of Mexico to the GIWW. Annual sea level is highest in the fall at the time the
LAR is at its annual lowest. The differential head gradient between the LAR and the
rest of the Louisiana coast is small and easily reversed by wind events, including
tropical storms.
The purpose of the HNC deepening study is to identify the most economically
feasible and environmentally acceptable depth of the HNC. Identification of the
final lock sill depth will be a direct result of the limited reevaluation study
and navigational safety and maintenance concerns. To improve the reliability
for numerical model calibration and simulation, knowledge base enhancement is
conducted by ANNs. Particularly, the enhancement of the continuous record for
flow and salinity inland boundary conditions during the calibration year, the average
flow simulation year, and the low and high flow simulation months are the primary
focus in this paper. The initial study has been summarized by Hsieh and Price [6].

3 Knowledge Recovery System Using ANNs

Knowledge recovery system (KRS) deals with the activities which are beyond the
capabilities of normal data recovery systems. It covers most of the information
which is associated with knowledge oriented problems and is not limited to numer-
ical problems. Almost every management decision, particularly for a complex and
dynamic system, requires the application of a mathematical model and the validation
of the model with numerous field measurements. However, due to instrumentation
adjustments and other problems, the data obtained could be incomplete or produce
abnormal recording curves. Data may also be unavailable at appropriate points in the

computational domain when the modeling design work changes. Three of the most
popular temporal prediction/forecasting algorithms are used to form the KRS for
this project: Multi-layered Feed Forward Neural Networks (MLPs), Time-Delayed
Neural Networks (TDNN), and Recurrent Neural Networks (RNN). The detailed
theoretical development for the algorithms is described in Principe et al. [7] and
Haykin [8]. Recurrent networks can be classified as fully or partially recurrent: the Jordan RNN has feedback (or recurrent) connections from the output layer to its inputs; the locally recurrent RNN [9] uses only local feedback; and the globally connected RNN, such as the Elman RNN, has feedback connections from its hidden layer neurons back to its inputs.
The KRS [10] for missing data is based on the transfer function (response
function) approach. The simulated output can generate a recovered data series
using optimal weights from the best-fit activation function (transfer function). This
series is called the missing window. Three types of KRS are defined: self-recovery,
neighboring gage recovery with same parameter, and mixed neighboring and remote
gages recovery with multivariate parameters. It is noted that the knowledge recovery
will not always be able to recover the data even with perfect information (no missing
values or highly correlated) from neighboring gages.
The knowledge base development for the KRS can be conducted in the following steps:
1. To understand the possible physical system associated with the corresponding input/output system.
2. To create an input/output system based on an initial cross analysis between the inputs and the target output series.
3. To detect and correct outliers and abnormal values.
4. To perform KRS for the input system as necessary.
5. To apply different training strategies and architectures.
6. To adjust/select additional possible inputs.

4 Existing Knowledge Base During 2004–2005 Periods

To develop the knowledge base for assisting the numeric calibration/verification process, two databases are established: one deals with interior points (gages) for model calibration and the other relates to the inland flow and salinity boundaries at the GIWW and Bayou Lafourche at Thibodaux. Model boundary conditions also include wind stress and tidal forcing from the Gulf of Mexico.
To meet the requirement of data availability for a one year calibra-
tion/verification, an optimal period with minimum overall gaps was selected—from
March 23, 2004 to March 22, 2005. This part of the KRS effort seeks to develop an
hourly knowledge base to cover the modeling domain for inland boundary gages.
Nine interior gages are extracted and processed as hourly data for the modeling
time window. The conditions of data availability are defined as: Less Missing (LM)
means there is less than 10 % of the data missing during the entire year; Partial Missing (PM) indicates the missing values are between 10 and 50 %; and Most Missing (MM) means more than 50 % of the values are missing.

Table 1 Knowledge base availability for inland boundary conditions related gages during the 2004–2005 modeling period

Gauge               Parameter   Availability
Morgan City         Stage       Available
Morgan City         Flow        Available
GIWW-WMC            Stage       Partial missing
GIWW-WMC            Flow        Most missing
GIWW-WMC            Salinity    Partial missing
GIWW-Houma          Stage       Less missing
GIWW-Houma          Flow        Less missing
GIWW-Houma          Salinity    Available
GIWW-East of Lar.   Stage       Partial missing
GIWW-East of Lar.   Flow        Most missing
GIWW-East of Lar.   Salinity    Partial missing
GIWW-BL at Lar.     Stage       Most missing
GIWW-BL at Lar.     Flow        Partial missing
GIWW-BL at Lar.     Salinity    Less missing
BL at Thibodaux     Flow        Less missing
BL at Thibodaux     Salinity    Total missing
HNC at Dulac        Stage       Available
HNC at Dulac        Flow        Less missing
HNC at Dulac        Salinity    Less missing
GBC-South of BLB    Stage       Partial missing
GBC-South of BLB    Salinity    Partial missing
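To make these availability categories concrete, a small illustrative function (Python; not part of the original study, and the thresholds simply restate the definitions above) could classify a gage record by its fraction of missing hourly values:

```python
def classify_availability(values):
    """Classify a record by its gap fraction, using the thresholds stated in the text.

    `values` is a sequence of observations where missing entries are None.
    LM: less than 10 % missing, PM: 10-50 % missing, MM: more than 50 % missing.
    """
    n = len(values)
    missing = sum(1 for v in values if v is None)
    frac = missing / n if n else 1.0
    if frac == 0.0:
        return "Available"
    if frac < 0.10:
        return "Less missing (LM)"
    if frac <= 0.50:
        return "Partial missing (PM)"
    return "Most missing (MM)"

# Example: one year of hourly data with roughly 12 % of the values missing
record = [1.0] * 7700 + [None] * 1060
print(classify_availability(record))  # -> Partial missing (PM)
```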
The inland boundary conditions were determined at three locations based on the
numeric modeling design and available gages. These locations are West Minors
Canal (WMC-GIWW), West of Bayou Lafourche at Larose (GIWW), and Bayou
Lafourche at Thibodaux. Note that only the WMC flow and salinity gage KRS will
be demonstrated in this paper.
Table 1 summarizes all the knowledge bases for gages related to the above
inland boundary locations. Although the Morgan City gage is out of the numerical
model domain, it is a major freshwater contributor to the system. The United States
Geological Survey (USGS) gage HNC at Dulac is even further south of the inland
boundary points, but it is an indicator of how much tidal flow will enter and leave
the system. Wind stress and tidal fluctuations are two other forcings which drive the
hydrodynamic system. Wind stress and wind direction are converted to horizontal
(x), vertical (y), and along-channel (c) components. The tidal elevations from the
Gulf coast propagate the forcing signal inland. The Grand Isle tidal station, a coastal gage, is used as this tidal driving force, and the West Bank wind gauge is used to represent the wind forcing.

5 Development of KRS for Inland Boundary Conditions During 2004–2005 Periods

Flow data for West Minors Canal (GIWW) during the 2004–2005 period has more than 80 % missing. The existing data fall within the period between 1/25/05 and 3/22/05. The first step of this approach is to determine the most significant inputs
that dominate the change of flow at the target location. Usually, cross-correlation
analysis and an x–y scatter plot can be good tools to aid in making this decision.
For this system, significant inputs include stages and flows from gage information
at Morgan City, Houma, and Dulac, local stages, remote tidal forcing, and wind
stress. The meaningful candidates for the WMC-GIWW are Dulac flow, Morgan
City stage, West Minors Canal stage, Grand Isle tidal elevation, Houma stage, and
wind stress x-component. From an initial analysis, it was found that the hydraulic
gradient plays a significant role. To incorporate the way in which lower frequency
inputs propagate into the system, higher frequency information for remote inputs
(such as the Grand Isle tidal elevation and Dulac flow) are filtered out. The most
significant tidal frequencies for Gulf of Mexico are diurnal frequencies, so a 25-h
moving average filter is applied to both input functions. The training set used the
above described six series as inputs and the flow of West Minors Canal (GIWW) as
output, from 1/25/05 to 3/22/05.
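The 25-h moving average used to extract the subtidal component of the remote inputs can be sketched as follows (Python/NumPy; an illustrative stand-in for the filtering actually used in the study, whose exact implementation is not given):

```python
import numpy as np

def subtidal(series, window=25):
    """Centered moving-average filter for an hourly series.

    A 25-h window attenuates the diurnal tidal constituents, leaving the
    subtidal (low-frequency) component used for remote inputs such as the
    Grand Isle tidal elevation and the Dulac flow.
    """
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps the original length; edge values rest on partial windows
    return np.convolve(series, kernel, mode="same")

# Example: an hourly signal with a 24-h oscillation riding on a slow trend
t = np.arange(24 * 14)                       # two weeks of hourly samples
hourly = 100 + 0.05 * t + 20 * np.sin(2 * np.pi * t / 24)
low_freq = subtidal(hourly)                  # tidal oscillation largely removed
```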
Several training algorithms (Neurosolutions, v6.0 [11]), including static and
dynamic systems, and training strategies were tested. The best algorithm for this
application was the Jordan–Elman recurrent ANN. Excellent agreement between the ANN model and the flow observations was obtained (correlation coefficient = 0.951, mean absolute error = 229 cfs (7.4 cms)). The most significant inputs are the subtidal Dulac flow (obtained by filtering out signal components with periods shorter than 25 h) and the Morgan City stage. With this successful training, the flow near West Minors Canal at GIWW from
3/23/04 to 1/24/05 was simulated when six input series in the same period were
applied (Fig. 2). The annual flow average during this period was estimated to be 2544 cfs (94 cms).

Fig. 2 KRS (blue color shows the knowledge recovery and the pink color represents the observations) for GIWW flow (cfs) near West Minors Canal flow gage

After the flow data near West Minors Canal had been recovered,
the focus turned to the salinity data recovery. This was easier to recover due to
smaller data gaps and more complete salinity data at Houma GIWW.

6 Development of Inland Flow and Salinity Boundary Simulators (WMC-GIWW) During 2004–2005 Periods

An ANN model constructed with relatively few training patterns will yield less reliable responses to conditions the system never learned. To obtain a generalized response of the flow and salinity boundary conditions to other hydrologic patterns, the ANN model (called the simulator) has to be retrained with all the input/output data on at least a one-year basis, or longer if more information is available. Unlike the knowledge recovery process for the WMC 2004–2005 salinity, the intention of this simulator is not to rely on the flow information obtained from the WMC/GIWW flow simulator and the salinity from the Houma gage. The optimal candidate for the salinity simulator at WMC/GIWW is a six-input system. These are
up-gradient subtidal flow, WMC flow, subtidal component from Gulf of Mexico,
Houma gage stage, x-component wind stress, and Inverse Morgan City subtidal
flow. After 6000 training iterations, the simulator (Fig. 3) had a 0.921 correlation
coefficient and 0.03 ppt. mean absolute error.

Fig. 3 ANNs training result (pink color) of salinity (ppt.) simulator for GIWW near West Minors Canal based on the 2004–2005 modeling period. The blue color shows the observations

7 Patterns Search from the Historical Hydrological Record

After the numeric model is calibrated/validated by providing the boundary and


initial conditions, a series of simulations are conducted under the project-specific
management concerns. Management decisions can be made more soundly if the response variable captures the salinity variation at selected locations in the model under a wide range of flow conditions. To meet this requirement, a procedure to determine the average flow year, the high flow month, and the low flow month is needed. Selecting a high flow/low flow month is a straightforward process, but the determination of the average flow year requires further calculation from a flow record.
Since the flow from the Atchafalaya River (Morgan City gage) is the major freshwater source for the system, the analysis of its historical record can be used to represent the flow conditions. After reviewing the record and filling the missing gaps, a monthly average data set from 1994 to 2006 was established. The record high average monthly flow for this 13-year period occurred in May 1996. However, data for many other variables were missing for that period, so April 2002, the second highest, was selected as the high flow month. The lowest monthly average, in September 2006, was used to represent the low flow month.
Due to the relatively short length of processed data at Morgan City, we were not able to identify very clear clusters. However, we were able to define the average flow year using the concept of a renormalized input transformation and minimum variance calculations applied to this 13-year record. The normalized input value for each particular month was calculated as the ratio of the deviation of that monthly average flow from the mean of the monthly averages for that month to the variance of that month over the record years. The resulting quantitative index (dry–wet index) can be used to represent the deviation of a particular month from the record years. When this index is close to zero, that particular month is close to the average of the monthly averages within the study record. Therefore, if each month of a year has a dry–wet index at or near zero, that year is considered an average flow year. Since it is impossible to get a wet–dry index equal to zero for each month, the yearly variance is computed to identify the final average flow year. The final average flow year is therefore 2003, due to its lowest variance of the wet–dry index (Fig. 4).
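A minimal sketch of the dry–wet index and average-flow-year selection described above (Python/NumPy; the function and variable names are illustrative assumptions, not taken from the study):

```python
import numpy as np

def average_flow_year(monthly_flow, years):
    """Pick the average flow year from an (n_years x 12) array of monthly mean flows."""
    monthly_flow = np.asarray(monthly_flow, dtype=float)
    month_mean = monthly_flow.mean(axis=0)        # mean of the monthly averages, per month
    month_var = monthly_flow.var(axis=0)          # variance of each calendar month over the years
    dry_wet_index = (monthly_flow - month_mean) / month_var
    yearly_variance = dry_wet_index.var(axis=1)   # spread of the index within each year
    return years[int(np.argmin(yearly_variance))], dry_wet_index

# Example with synthetic data standing in for the 1994-2006 Morgan City record
rng = np.random.default_rng(0)
flows = 2500 + 400 * rng.standard_normal((13, 12))
best_year, index = average_flow_year(flows, list(range(1994, 2007)))
```

The year whose twelve monthly index values stay closest to zero, i.e., with the smallest index variance, is reported as the average flow year.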

Fig. 4 Annual dry–wet index variance for Morgan City flow from 1994 to 2006

8 Inland Flow and Salinity Boundary Conditions Simulation During Selected Average Flow Year, High and Low Flow Months

Dry–wet index and variance analyses showed the average flow year to be 2003, the
high flow month to be April 2002, and the low flow month to be September 2006.
A knowledge base with six related gages to cover all three required periods was
then constructed (Table 2). Information from the gages at Morgan City, Dulac, and
Houma were used as remote forcing to quantify and recover the missing flow and
salinity data for the three boundary gages. The remote forcing or stress also included
tidal propagation and wind stress.
The procedure to process these flow and salinity boundary conditions was to
apply the simulators (connection weights) to simulate the target output after the
set of input parameters had been identified by the simulator. Note that stage, flow,
and salinity information during these three periods for the gage GIWW near West
Minors Canal was totally missing. This therefore required a simulation process, not
data recovery. Although the targets were to construct inland boundary condition
gage information for these three required periods, the information from the other
three related gages as well as tidal and wind forcing needed to be recovered first. For
example, to simulate flow at WMC during the average flow year 2003, the missing
values for flow at the Houma gage, tidal elevation at Grand Isle, and wind stress at
the West Bank station during the same period of time had to be estimated first.

Table 2 Knowledge base availability for inland boundary conditions related gages during management scenario runs of the average flow year (year 2003), the high flow month (April 2002), and the low flow month (September 2006)

Gauge         Parameter   AFY/HFM/LFM
Morgan City   Flow        Available/available/PM
Dulac         Flow        Available/TM/available
Dulac         Stage       Available/available/available
GIWW-Houma    Salinity    PM/TM/available
GIWW-Houma    Flow        PM/available/available
GIWW-Houma    Stage       Available/available/available
Larose        Salinity    Available/TM/available
Larose        Flow        PM/TM/available
Larose        Stage       Available/available/available
Thibodaux     Salinity    TM/TM/TM
Thibodaux     Flow        LM/PM/available
Thibodaux     Stage       Available/available/available
WMC           Salinity    TM/TM/TM
WMC           Flow        TM/TM/TM
WMC           Stage       TM/TM/TM

PM partial missing, TM total missing, LM less missing, AFY average flow year, HFM high flow month, LFM low flow month

Fig. 5 Simulated/recovered flow (cfs) for both GIWW near WMC and West Bayou Lafourche at Larose gages during the high flow month, April 2002

Figure 5 shows boundary flow simulation results for both GIWW near WMC
and West Bayou Lafourche at Larose gage during the high flow month, while Fig. 6
presents the simulated/recovered salinity of the three inland boundary gages during
the 2003 average flow year with a similar approach technique.

Fig. 6 Simulated/recovered salinity (ppt.) for the three inland boundary gages—GIWW near WMC, West Bayou Lafourche at Larose, and Bayou Lafourche at Thibodaux—during the average flow year 2003

9 Learning Remarks

The knowledge discovery in databases (KDD) activities include preprocessing the raw data and analyzing the patterns, filling missing gaps based on the relationships among the influencing variables, establishing the flow and salinity simulators for the inland boundary conditions, searching flow patterns in a historical record for performing management runs, and generating flow and salinity boundary conditions for the selected average flow year, high flow month, and low flow month. These activities are successfully applied to a very dynamic coastal-estuary salinity and flow transport system in the interacting water bodies of the HNC and GIWW. The purposes of this effort are to provide a better knowledge base at the interior gauges of the numerical modeling domain for model calibration/validation, to develop hourly, gap-free inland flow and salinity boundary conditions at three locations for model calibration/validation, and to conduct the simulation of such flow and salinity boundary conditions when flow conditions for management scenarios are proposed.
The period between March 23, 2004 and March 22, 2005 is selected as the numerical model calibration/validation window due to the data availability for the interior and boundary gauges of the model. The parameters used for this enhancement effort are stage, flow, salinity, tidal elevation, and wind forcing. The inland model boundary conditions are set near the West Minors Canal (GIWW), West of Bayou Lafourche at Larose (GIWW), and Bayou Lafourche at Thibodaux gauges. The most challenging knowledge recovery involved the roughly 80 % of flow data missing at GIWW near West Minors Canal during this period.
Higher correlation can be obtained for both flow boundary gages if “rich” information is available surrounding the gage. The salinity boundary condition recovery also provided good results because there were fewer missing values, meaning fewer values needed to be estimated within this model calibration/validation period. However, this will not occur in a generalized situation. Therefore, to improve prediction reliability, the simulators are developed. The method presented herein overcomes the condition where information is not available from neighboring gages through its ability to estimate the corresponding variable time series. This condition becomes “system simulation,” which goes beyond performing “filling-the-gaps” activities.
A flow pattern search procedure using a renormalized transformation of the long-term flow record is given to identify the average flow year within the selected time window. The technique used is an initial part of unsupervised ANNs, which can be applied to search for primary flow patterns in a long historical hydrologic record. The year 2003 is the best candidate for the average flow year based on the Morgan City monthly average flow record (1994–2006). A new knowledge base is established and combined with the developed simulators to successfully simulate the boundary conditions needed for all management scenarios.
The data-driven modeling technique in estuarine/coastal systems is not limited to dealing with knowledge within the database itself. It can be integrated with a numerical model to provide higher accuracy and time savings. For example, computational intelligence techniques can be used to simulate flow and salinity variation at the interior points of the numerical model, to make correction analyses when the numerical model reaches its calibration limits, to obtain more reliable offshore boundary conditions through an inverse modeling approach, to apply an unsupervised–supervised ANNs approach for designing a more generalized prediction tool (simulator), and to serve as a forecasting module for a decision support system.

10 Conclusions

This flow and salinity system is dominated by the source tide from the coast and
freshwater inflow forcing from Morgan City, with minor variation due to wind
stress. Remote forcing plays an important role in driving the dynamic variation,
particularly for slow reacting parameters such as salinity. Addressing the subtidal
component of the variables and the hydraulic gradient between variables signifi-
cantly improved the accuracy of the missing data recovery/system simulations. The
dominant factor for flow and salinity transport is governed by seasonality or special
events, mainly between tidal and freshwater inflow forcing.
To understand the flow and salinity patterns from the Dulac gage data, a simple
physical phenomenon can be described as follows. During the low flow period,
although the upland flow condition is a critical factor in determining the total net flow entering the system, the salinity variation at Dulac seems to be more sensitive to the accuracy of the tidal forcing addressed by the model than to the upland flow condition. During the high flow period, the boundary tide becomes less important. However,
when the Dulac flow is very low, a high salinity spike is expected. If we use only
the stage along HNC as model calibration criteria for hydrodynamic behavior, the
accuracy of the hydraulic gradient near Dulac is critical.

Acknowledgement The US Army Corps of Engineers New Orleans District Office funded this work.

References

1. Hsieh, B.B.: Boundary Condition Estimation of Hydrodynamic Systems Using Inverse Neuro-
Numerical Modeling Approach, GIS & RS in Hydrology. Water Resources and Environment,
Yichang (2003)
2. Hsieh, B., Ratcliff J.: A storm surge prediction system using artificial neural networks.
In: Proceedings of the 26th International Society for Computers and Their Applications,
September 2013, Los Angeles, CA, USA (2013)
3. Hsieh, B., Ratcliff J.: Determining necessary storm surge model scenario runs using
unsupervised-supervised ANNs. In: Proceedings of the 13th World Multi-Conference on
Systemics, Cybernetics and Informatics, Orlando, FL, USA, July 10–13, pp. 322–327 (2009)
4. Hsieh, B.: Advanced hydraulics simulation using artificial neural networks—integration with
numerical model. In: Proceedings of Management, Engineering, and Informatics Symposium,
July 3–6, Orlando, FL, USA, pp. 620–625 (2005)
5. Swarzenski, C.N., Tarver A.N., Labbe, C.K.: The Gulf Intracoastal Waterway as a Conduit
of Freshwater and Sediments to Coastal Louisiana Wetlands, Poster, USGS, Baton Rouge,
Louisiana (2002)
6. Hsieh, B., Price, C.: Enhancing a knowledge base development for flow and salinity boundary
conditions during model calibration and simulation processes. In: Proceedings of the 13th
World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, USA, July
10–13, pp. 328–333 (2009)
7. Principe, J.C., Eulino, N.R., Lefebvre, W.C.: Neural and Adaptive Systems: Fundamentals
through Simulations. Wiley, New York (2000)
8. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York (1994)
9. Frasconi, P., Gori, M., Soda, G.: Local feedback multilayered networks. Neural Comput. 4,
120–130 (1992)
10. Hsieh, B., Pratt, T.: Field Data Recovery in Tidal System Using ANNs, ERDC/CHL CHETN-
IV-38, (2001)
11. NeuroSolutions v6.0, Developers Level for Windows, NeuroDimension, Inc., Gainesville
(2011)
Forecasting Daily Water Demand Using Fuzzy
Cognitive Maps

Jose L. Salmeron, Wojciech Froelich, and Elpiniki I. Papageorgiou

Abstract In this chapter, we describe the design of a multi-regressive forecasting


model based on fuzzy cognitive maps (FCMs). Growing window approach and
1-day ahead forecasting are assumed. The proposed model is retrained every day
as more data become available. To improve forecasting accuracy, mean daily
temperature and precipitation are applied as additional explanatory variables. The
designed model is trained and tested using data gathered from a water distribution
system. Comparative experiments provide evidence for the superiority of the
proposed approach over the selected state-of-the-art competitive methods.

Keywords Forecasting water demand • Fuzzy cognitive maps

1 Introduction

Forecasting water demand is an important problem considered within the domain


of water management. The existing forecasting methods for urban water demand are classified as long-term (the prediction resolution is 1 year or greater), mid-term (the resolution ranges from a month to a year), and short-term (the resolution ranges from hours to days). In this paper we address the problem of short-term, daily forecasting of water demand.

J.L. Salmeron ()


University Pablo de Olavide, Seville, Spain
e-mail: [email protected]
W. Froelich
The University of Silesia, ul. Bedzinska 39, Sosnowiec, Poland
e-mail: [email protected]
E.I. Papageorgiou
Computer Engineering Department, Technological Educational Institute of Central Greece,
3o Km Old National Road Lamia-Athens, TK 35100 Lamia, Greece
e-mail: [email protected]; [email protected]


Forecasting daily water demand has been already addressed in many research
studies. Linear regression and artificial neural network were investigated in [25].
A nonlinear regression was applied in [36]. A hybrid methodology combining
feed-forward artificial neural networks, fuzzy logic, and genetic algorithm was
investigated in [24]. Recently, a spectral analysis-based methodology to detect climatological influences on daily urban water demand has been proposed [1]. A comparison of different methods of urban water demand forecasting was made in [7, 26].
FCM is an emerging soft computing technique mixing artificial neural networks and fuzzy logic [16]. An FCM models knowledge through causalities using fuzzy concepts (represented as nodes) [5] and weighted relationships between them [16]. Extensive research has been accomplished on FCMs in different research and application domains. In the domain of time series prediction, only a few research works based on the FCM methodology have been done. In 2008, Stach et al. first proposed a widely accepted technique for FCM-based time series modeling [30]. Also, Froelich and his colleagues used an evolutionary FCM model to effectively forecast multivariate time series [10, 11, 15, 21, 22]. Recently, Homenda et al. have applied FCMs to univariate time series prediction [33–35], and Lu et al. [19] discussed in their study time series modeling related to fuzzy techniques in data mining.
In spite of many existing solutions, the problem of daily forecasting of urban water demand remains a challenge, and the investigation of new forecasting methods to solve it is an open research area. In this research we undertake an effort to improve the accuracy of forecasting using a model based on FCM.
The contributions of this paper are the following:
– We propose a new method for water demand forecasting using FCMs. An FCM-based forecasting model is designed and trained for daily water demand time series.
– A new parametrized fuzzyfication function is proposed that enables control of the amount of data processed in a linear or a nonlinear way.
The outline of this paper is as follows. Section 2 describes the theoretical background on time series and fuzzy cognitive maps. Section 3 presents the design of the FCM-based model and its mapping to time series. A parametrized fuzzyfication procedure for the calculation of concept states is proposed in Sect. 4. The results of comparative experiments are shown in Sect. 5. Section 6 concludes the paper.

2 Theoretical Background

In this section we provide background knowledge on time series and the selected
state-of-the-art forecasting models. Also an introduction to the theory of fuzzy
cognitive maps is presented.

2.1 Forecasting Time Series

Let $y \in \mathbb{R}$ be a real-valued variable whose values are observed over a discrete time scale $t \in [1, 2, \ldots, n]$, where $n \in \mathbb{N}$ is the length of the considered period. A time series is denoted as $\{y(t)\} = \{y(1), y(2), \ldots, y(n)\}$. The objective is to calculate the forecast $\hat{y}(t+1)$. The individual forecasting error is calculated as $e_t = \hat{y}(t+1) - y(t+1)$. To calculate cumulative prediction errors, which in practice are accounted for by the value of the fitness function of the evolutionary algorithm, we use the Mean Absolute Percentage Error (MAPE) given as formula (1):

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{y_t}\right| \cdot 100\,\%. \qquad (1)$$

For the evaluation of forecasting accuracy the concept of a growing window was used. The growing window is a period that consists of a learning part that always starts at the beginning of the time series and ends at time step t. The testing part follows the learning part. In the case of our study, the length of the testing window was set to 1 and only the 1-step-ahead prediction $\hat{y}(t+1)$ was made. After every learn-and-test trial the window grows 1 step ahead until the end of the historical data is reached. The concept of the growing window is depicted in Fig. 1.

Fig. 1 Exemplary growing window
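A compact sketch of the growing-window evaluation with the MAPE of formula (1) is given below (Python; the `fit_and_forecast` callback is a hypothetical placeholder for any of the compared models, not an interface defined in the paper):

```python
def growing_window_mape(y, fit_and_forecast, start=30):
    """1-step-ahead growing-window evaluation.

    y                : list of observed daily demands y(1)..y(n)
    fit_and_forecast : callable taking the learning part and returning y_hat(t+1)
    start            : minimal length of the initial learning window
    """
    errors = []
    for t in range(start, len(y) - 1):
        y_hat = fit_and_forecast(y[:t + 1])      # learn on y(1)..y(t), predict y(t+1)
        errors.append(abs((y_hat - y[t + 1]) / y[t + 1]))
    return 100.0 * sum(errors) / len(errors)     # MAPE in percent

def naive(history):                              # y_hat(t+1) = y(t), the naive model
    return history[-1]

demand = [1000, 1020, 980, 1010, 995, 1030, 990] * 20
print(growing_window_mape(demand, naive, start=10))
```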
For comparative purposes the following selection of state-of-the-art forecasting models was made. Naive—the naive approach is a trivial forecasting method which assumes that the forecast equals the previously observed value, i.e., $\hat{y}(t+1) = y(t)$. ARIMA—the AutoRegressive Integrated Moving Average model is one of the most popular statistical forecasting models in time series analysis [13]. This model relies on the assumption of linearity among variables and normal distribution. The extended version takes into account additional exogenous variables and is called ARIMAX. LR—the linear regression model is used to relate a scalar dependent variable and one or more explanatory variables. Linear regression was applied for forecasting in [2]. PR—polynomial regression is a form of regression in which the relationship between the independent variable and the dependent variable is modeled as a polynomial. The use of polynomial regression for forecasting has been demonstrated in [36]. GARCH—the Generalized Auto-regressive Conditional Heteroscedastic model is a nonlinear model. It uses a Quasi-Newton optimizer to find the
maximum likelihood estimates of the conditionally normal model. The details of this
model are further described in the literature [20]. ANN (Artificial Neural Networks)
is a mathematical model consisting of a group of artificial neurons connecting with
each other; the strength of a connection between two nodes is called weight [8].
HW—the Holt-Winters method is an extended version of the exponential smoothing
forecasting. It exploits the detected trend and seasonality within data.

2.2 FCM Fundamentals

Fuzzy cognitive maps are defined as an ordered pair $\langle C, W\rangle$, where $C$ is the set of concepts and $W$ is the connection matrix that stores the weights $w_{ij}$ assigned to the arcs of the FCM [12]. The concepts are mapped to real-valued activation levels $c_i \in [0, 1]$ that express the degree in which the observation belongs to the concept (i.e., the value of the fuzzy membership function). The reasoning is performed as the calculation of Eq. (2) [5]:

$$c_j(t+1) = f\left(\sum_{i=1,\, i\neq j}^{n} w_{ij}\, c_i(t)\right), \qquad (2)$$

where $f(\cdot)$ is the transformation function that can be bivalent, trivalent, hyperbolic tangent, or unipolar sigmoid. After numerous trials, for this study we selected the unipolar sigmoid function given by formula (3):

$$f(x) = \frac{1}{1 + e^{-c(x-T)}}, \qquad (3)$$

where $c$ and $T$ are constant parameters set on a trial-and-error basis. The parameter $c$ determines how quickly the transformation reaches the values of 0 and 1. The parameter $T$ is used to move the transformation along the x axis.
The objective of learning the FCM is to optimize the matrix $W$ with respect to the forecasting accuracy. In this paper we use for that purpose the real-coded genetic algorithm (RCGA) [28] that was proved to be very effective for the time series forecasting task [9]. In RCGA the populations of candidate FCMs are iteratively evaluated with the use of the fitness function given in Eq. (4) [29]:

$$\mathrm{fitness}(\mathrm{FCM}) = \frac{\alpha}{\beta \cdot err + 1}, \qquad (4)$$

where $\alpha$, $\beta$ are parameters and $err$ is the prediction error. For the purpose of this paper we assume $\alpha = 1$, $\beta = 1$. The MAPE evaluates the cumulative prediction error, i.e., $err = \mathrm{MAPE}$.
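To illustrate Eqs. (2)–(4), the following sketch (Python/NumPy; the weight matrix is random here, whereas in the paper it is optimized by the RCGA) performs one FCM reasoning step with the unipolar sigmoid and evaluates the MAPE-based fitness:

```python
import numpy as np

def sigmoid(x, c=5.0, T=0.0):
    """Unipolar sigmoid transformation of Eq. (3); c and T are set by trial and error."""
    return 1.0 / (1.0 + np.exp(-c * (x - T)))

def fcm_step(W, c_t, c_par=5.0, T=0.0):
    """One reasoning step of Eq. (2): c_j(t+1) = f(sum_{i != j} w_ij * c_i(t))."""
    n = len(c_t)
    c_next = np.empty(n)
    for j in range(n):
        s = sum(W[i, j] * c_t[i] for i in range(n) if i != j)
        c_next[j] = sigmoid(s, c_par, T)
    return c_next

def fitness(err, alpha=1.0, beta=1.0):
    """Fitness of Eq. (4), with err given by the MAPE of the candidate FCM."""
    return alpha / (beta * err + 1.0)

# Example: a 5-concept map (c1..c5 as in the model of Sect. 3) with random weights
rng = np.random.default_rng(1)
W = rng.uniform(-1.0, 1.0, size=(5, 5))
state = np.array([0.4, 0.6, 0.5, 0.1, 0.5])    # current activation levels in [0, 1]
next_state = fcm_step(W, state)
print(fitness(err=7.1))                         # higher fitness for lower MAPE
```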

3 Mapping FCM Model to Water Demand Time Series

The challenge that we faced was the mapping of FCM structure to the water demand
time series. For that purpose we performed an extensive analysis of the considered
time series.
First the correlation analysis was performed. As can be observed in Fig. 2a the
auto-correlation function reveals a typical pattern with seasonality and trend [3].
The peaks occurring with the frequency of 7 days are related to weekly seasonality.
The decreasing slope of the function exhibits that the considered time series involves
trend.
Partial auto-correlation has been plotted in Fig. 2b. It confirms that the statistical
significance of lags above 7 is very low. In consequence we decided to select lags 7
and 1 for the planned FCM-based forecasting model.
In the second step of the analysis, we checked the influence of exogenous variables on the forecasted water demand. The correlation between water demand and mean daily temperature was calculated and reached the value 0.20. The correlation between water demand and precipitation was very weak and negative (−0.02). The low value of these correlations raised doubts regarding the inclusion of external regressors. For that reason, we constructed a multivariate time series consisting of the three considered variables: water demand, mean daily temperature, and precipitation. We checked whether such a time series is co-integrated, sharing an underlying stochastic trend [4]. To test co-integration, the Johansen test [14], implemented as the “ca.jo” function in the “urca” library of the R package, was used. The results given in Table 1 confirmed the null hypothesis of the existence of two co-integration vectors. The test value for the hypothesis $r \le 2$ was above the corresponding critical value at all significance levels: 10 %, 5 %, and 1 %. The Johansen test suggested that the inclusion of weather-related variables may improve the forecasting accuracy.
Therefore, the mean daily temperature and precipitation (rain) were added as additional exogenous regressors to the considered multi-regressive models. The final set of the selected variables $Y = \{y_i\}_{i=1}^{5}$ is presented in Table 2.

Fig. 2 Correlation coefficients. (a) Auto-correlation. (b) Partial auto-correlation

Table 1 Johansen test

Hypothesis   Test     10 %    5 %     1 %
r ≤ 2        18.77    10.49   12.25   16.26
r ≤ 1        44.25    22.76   25.32   30.45
r = 0        370.20   39.06   42.44   48.45

Table 2 Variables and description

Variable   Description
y1         Lagged water demand (lag 8)
y2         Lagged water demand (lag 1)
y3         Mean daily temperature
y4         Mean daily precipitation (rain)
y5         Forecasted water demand

Fig. 3 FCM-based regressive model
On the basis of the described analysis we designed the architecture of the FCM that is shown in Fig. 3. The FCM contains only one output node, c5, which is the forecasted daily water demand. The concepts c1 and c2 are auto-regressive and influence the forecasted water demand directly. The concept c1 is related to the weekly seasonality that was previously discovered in the data. The concept c2 is related to the trend. Concepts c3 and c4 are related to the exogenous variables of temperature and rain, respectively.

4 Parametrized Membership Function

Similarly as in most practical applications, the fuzzyfication function is simplified as a normalization [12]. However, in this study, we propose to use the parametrized fuzzyfication function given by formula (5):

$$c_i(t) = \frac{y_i(t) - (1-k)\cdot \min(y_i)}{(1+k)\cdot \max(y_i) - (1-k)\cdot \min(y_i)}, \qquad (5)$$

where $\min(y_i)$ and $\max(y_i)$ denote the minimum and maximum values of $y_i(t)$, respectively. With the use of the constant $k \in [0, 1]$ it is possible to shift the data closer to the linear part of the transformation function.
This way, by changing the value of $k$ it is possible to control the amount of data that are transformed linearly during the reasoning process performed by the FCM. By increasing $k$ more data are processed purely linearly, i.e., fewer of them fall into the nonlinear part of the transformation function.
After forecasting, defuzzyfication is made using the reversed function given as formula (6):

$$y_i(t) = c_i(t)\cdot\big((1+k)\cdot \max(y_i) - (1-k)\cdot \min(y_i)\big) + (1-k)\cdot \min(y_i). \qquad (6)$$
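Formulas (5) and (6) translate directly into code. The sketch below (Python) is illustrative; it assumes the minimum and maximum are taken over the available learning data, which the text does not state explicitly:

```python
def fuzzyfy(y_t, y_min, y_max, k=0.4):
    """Parametrized fuzzyfication of formula (5); k in [0, 1] shifts the data
    toward the linear part of the sigmoid transformation."""
    return (y_t - (1 - k) * y_min) / ((1 + k) * y_max - (1 - k) * y_min)

def defuzzyfy(c_t, y_min, y_max, k=0.4):
    """Reverse mapping of formula (6), applied to the forecasted concept state."""
    return c_t * ((1 + k) * y_max - (1 - k) * y_min) + (1 - k) * y_min

# Round-trip example for a single daily demand value
y_min, y_max = 1500.0, 4200.0
c = fuzzyfy(2600.0, y_min, y_max, k=0.4)
assert abs(defuzzyfy(c, y_min, y_max, k=0.4) - 2600.0) < 1e-9
```

With k = 0 the mapping reduces to an ordinary min-max normalization; k = 0.4 is the value reported as optimal in Sect. 5.3.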

5 Experiments

In this section, we present our testing of forecasting models, gradually selecting


the best possible set up for the experiments and then selecting the best individual
forecasting method based on our data.

5.1 Quality of Data

The considered water demand time series involved 2254 days, with the data missing
for 78 of them. This was due to breaks in data transmission caused by discharged
batteries or other hardware problems in the data transmission channels. This
problem posed some difficulty because the missing values were irregularly dispersed
across the entire time series. We decided to impute them by linear interpolation
using function “interpNA” from package R. To illustrate the problem, in Fig. 4a we
show an exemplary part of the time series that was the most substantially affected by
missing data. In Fig. 4b the missing values are imputed by the applied interpolation.
The second issue that we encountered was the occurrence of outliers that were
also irregularly distributed across the time series. These were primarily due to
pipe bursts and the subsequent extensive water leakage. The outliers had to be
removed from the time series, as the pipe bursts are random anomalies affecting the regularities expected in the data. To filter out the outliers, the commonly used Hampel filter was applied [23]. The parameters of the filter were tuned by trial and error to best exclude outliers. To illustrate the problem, an example of a part of the time series with outliers is presented in Fig. 5a. As shown in Fig. 5b, the exemplary peak in water demand caused by the pipe burst has been eliminated to a certain extent (please note the different scales of the y-axes in both figures).

Fig. 4 Dealing with missing data. (a) Raw data. (b) After interpolation

Fig. 5 Filtering outliers. (a) Raw data. (b) After filtering
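A sketch of this preprocessing step is shown below (Python/pandas rather than the R function "interpNA" used in the study; the Hampel window length and threshold are illustrative guesses, since the paper only states that the filter parameters were tuned by trial and error):

```python
import numpy as np
import pandas as pd

def preprocess(demand: pd.Series, window: int = 7, n_sigmas: float = 3.0) -> pd.Series:
    """Impute missing days by linear interpolation, then damp outliers with a Hampel filter."""
    # 1) linear interpolation of the irregularly dispersed gaps
    filled = demand.interpolate(method="linear")

    # 2) Hampel filter: replace points farther than n_sigmas robust deviations
    #    from the rolling median by that median
    rolling_median = filled.rolling(window, center=True, min_periods=1).median()
    abs_dev = (filled - rolling_median).abs()
    mad = abs_dev.rolling(window, center=True, min_periods=1).median()
    threshold = n_sigmas * 1.4826 * mad          # 1.4826 scales MAD to a std estimate
    return filled.where(abs_dev <= threshold, rolling_median)

# Example: a short series with a gap and a pipe-burst-like spike
s = pd.Series([2100, 2150, np.nan, 2120, 6100, 2090, 2110], dtype=float)
print(preprocess(s))
```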

5.2 Statistical Features of Data

The stationarity of the time series is one of the main factors that determine the
applicability of a particular predictive model. First, to test for unit root stationarity,
the augmented Dickey Fuller (ADF) test was performed [6]. The resulting low
p-value = 0.05912 suggested that the null hypothesis of nonstationarity should be rejected. As this result was puzzling, we decided to perform the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [17]. The resulting p-value was very close to 0, rejecting the hypothesis of stationarity.
In addition, we performed nonlinearity tests. First, the ARIMA model was fitted to the entire time series and the residuals were checked using the Ljung–Box portmanteau test [18]. The p-value close to 0 rejected the hypothesis of linearity of the considered time series. Second, the test based on ANNs, implemented as “terasvirta.test” in the R package, was performed [31]. With a p-value close to 1, this confirmed that the considered time series is nonlinear.
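For readers who wish to reproduce this kind of check, the sketch below uses Python's statsmodels as a stand-in for the R functions used in the study (an assumption; only the ADF and KPSS tests are shown):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_report(series):
    """Print the p-values of the ADF test (H0: unit root) and the KPSS test (H0: stationarity)."""
    adf_stat, adf_p, *_ = adfuller(series)
    kpss_stat, kpss_p, *_ = kpss(series, regression="c", nlags="auto")
    print(f"ADF  p-value: {adf_p:.4f}  (small value -> reject nonstationarity)")
    print(f"KPSS p-value: {kpss_p:.4f}  (small value -> reject stationarity)")

# Example on a synthetic trend + weekly-seasonal series resembling daily demand
rng = np.random.default_rng(2)
t = np.arange(730)
demand = 2500 + 0.5 * t + 150 * np.sin(2 * np.pi * t / 7) + 80 * rng.standard_normal(730)
stationarity_report(demand)
```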

5.3 Experimental Setup

The weights of the designed FCM were learned using the genetic algorithm. The parameters of the genetic algorithm were assigned on the basis of the literature and numerous trials. The assigned values are given in Table 3. On a trial-and-error basis we selected the optimal value of the parameter k = 0.4.
For comparative purposes we also implemented experiments with the state-of-the-art models that were described in the introduction. For the models Naive, ARIMA, ARIMAX, GARCH, and Holt-Winters, the R software package was applied for their implementation [32]. In the case of all those methods, the optimal auto-correlation lags were adjusted automatically. For fitting the ARIMA model to the data, we used the function “auto.arima” from the package “forecast.” The Holt-Winters model was learned using the function “ets” from the “forecast” package.
For linear regression, polynomial regression, and ANN, the KNIME toolkit was used for the implementation.
All parameters were adjusted using the trial-and-error method. In particular, to select the best structure of the ANN, the experiments were repeated many times. A single hidden layer with 20 neurons and the RProp training algorithm for a multilayer feed-forward network were found to be the best [27]. The maximum number of iterations for learning the ANNs was set to 100. For ARIMAX the temperature and precipitation were added as explanatory variables.

Table 3 The parameters of the Genetic Algorithm

Description                                      Value
Cardinality of the initial population card(P1)   250
Probability of mutation                          0.4
Probability of crossover                         0.6
Maximal number of iterations (x_max)             150
Rounding used for the calculations               10^(-6)

Table 4 Forecasting accuracy

Method                  MAPE
Naive                   7.519984
ARIMA                   7.352071
ARIMAX                  7.896630
Linear regression       8.026895
Polynomial regression   13.253680
GARCH                   47.982280
ANN                     8.688714
Holt-Winters            7.418528
FCM                     7.13234

Bold represents the lowest (best) value in the table

Table 4 shows the results of the experiments. As can be observed, the proposed FCM model learned by the RCGA outperformed the other state-of-the-art models in terms of MAPE.

6 Conclusion

In this chapter the problem of forecasting daily water demand was addressed. For
that purpose we designed a forecasting model based on fuzzy cognitive maps. The
proposed model assumes the mapping of raw time series to the concepts using
fuzzyfication function. The results of the performed experiments provide evidence
for the superiority of the proposed model over the selected state-of-the-art models.
To improve the forecasting accuracy, further theoretical work is required towards
the enhancement of the proposed FCM model.

Acknowledgements The work was supported by ISS-EWATUS project which has received
funding from the European Union’s Seventh Framework Programme for research, technological
development, and demonstration under grant agreement no. 619228. The authors would like to
thank the water distribution company in Sosnowiec (Poland) for gathering water demand data
and the personnel of the weather station of the University of Silesia for collecting and preparing
meteorological data.

References

1. Adamowski, J., Adamowski, K., Prokoph, A.: A spectral analysis based methodology to detect
climatological influences on daily urban water demand. Math. Geosci. 45(1), 49–68 (2013)
2. Bianco, V., Manca, O., Nardini, S.: Electricity consumption forecasting in Italy using linear
regression models. Energy 34(9), 1413–1421 (2009)
3. Cortez, P., Rocha, M., Neves, J.: Genetic and evolutionary algorithms for time series fore-
casting. In: Engineering of Intelligent Systems, 14th International Conference on Industrial
and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2001,
Budapest, Hungary, June 4–7, 2001, Proceedings, pp. 393–402 (2001)
4. Cowpertwait, P.S.P., Metcalfe, A.V.: Introductory Time Series with R, 1st edn. Springer,
New York (2009)
5. Dickerson, J., Kosko, B.: Virtual worlds as fuzzy cognitive maps. Presence 3(2), 173–189
(1994)
6. Dickey, D.A., Fuller, W.A.: Distribution of the estimators for autoregressive time series with a
unit root. J. Am. Stat. Assoc. 74(366), 427–431 (1979)
7. Donkor, E., Mazzuchi, T., Soyer, R., Alan Roberson, J.: Urban water demand forecasting:
review of methods and models. J. Water Resour. Plan. Manag. 140(2), 146–159 (2014)
8. Du, W., Leung, S.Y.S., Kwong, C.K.: A multiobjective optimization-based neural network
model for short-term replenishment forecasting in fashion industry. Neurocomputing 151(Part
1), 342–353 (2015)
9. Froelich, W., Juszczuk, P.: Predictive capabilities of adaptive and evolutionary fuzzy cognitive
maps - a comparative study. In: Nguyen, N.T., Szczerbicki, E. (eds.) Intelligent Systems
for Knowledge Management, Studies in Computational Intelligence, vol. 252, pp. 153–174.
Springer, New York (2009)
10. Froelich, W., Salmeron, J.L.: Evolutionary learning of fuzzy grey cognitive maps for the
forecasting of multivariate, interval-valued time series. Int. J. Approx. Reason. 55(6), 1319–
1335 (2014)
11. Froelich, W., Papageorgiou, E.I., Samarinas, M., Skriapas, K.: Application of evolutionary
fuzzy cognitive maps to the long-term prediction of prostate cancer. Appl. Soft Comput. 12(12),
3810–3817 (2012)
12. Glykas, M. (ed.): Fuzzy Cognitive Maps, Advances in Theory, Methodologies, Tools and
Applications. Studies in Fuzziness and Soft Computing. Springer, Berlin (2010)
13. Han, A., Hong, Y., Wan, S.: Autoregressive conditional models for interval-valued time series
data. In: The 3rd International Conference on Singular Spectrum Analysis and Its Applications,
p. 27 (2012)
14. Johansen, S.: Estimation and hypothesis testing of cointegration vectors in gaussian vector
autoregressive models. Econometrica 59(6), 1551–1580 (1991)
15. Juszczuk, P., Froelich, W.: Learning fuzzy cognitive maps using a differential evolution
algorithm. Pol. J. Environ. Stud. 12(3B), 108–112 (2009)
16. Kosko, B.: Fuzzy cognitive maps. Int. J. Man Mach. Stud. 24, 65–75 (1986)
17. Kwiatkowski, D., Phillips, P.C., Schmidt, P., Shin, Y.: Testing the null hypothesis of stationarity
against the alternative of a unit root: how sure are we that economic time series have a unit root?
J. Econ. 54(1–3), 159–178 (1992)
18. Ljung G.M, Box G.E.P.: On a measure of lack of fit in time series models. Biometrika 65(2),
297–303 (1978)
19. Lu, W., Pedrycz, W., Liu, X., Yang, J., Li, P.: The modeling of time series based on fuzzy
information granules. Expert Syst. Appl. 41, 3799–3808 (2014)
20. MatíAs, J.M., Febrero-Bande, M., González-Manteiga, W., Reboredo, J.C.: Boosting garch
and neural networks for the prediction of heteroskedastic time series. Math. Comput. Model.
51(3–4), 256–271 (2010)
21. Papageorgiou, E.I., Froelich, W.: Application of evolutionary fuzzy cognitive maps for
prediction of pulmonary infections. IEEE Trans. Inf. Technol. Biomed. 16(1), 143–149 (2012)
22. Papageorgiou, E.I., Froelich, W.: Multi-step prediction of pulmonary infection with the use of
evolutionary fuzzy cognitive maps. Neurocomputing 92, 28–35 (2012)
23. Pearson, R.: Outliers in process modelling and identification. IEEE Trans. Control Syst.
Technol. 10(1), 55–63 (2002)
24. Pulido-Calvo, I., Gutiérrez-Estrada, J.C.: Improved irrigation water demand forecasting using
a soft-computing hybrid model. Biosyst. Eng. 102(2), 202–218 (2009)
25. Pulido-Calvo, I., Montesinos, P., Roldán, J., Ruiz-Navarro, F.: Linear regressions and neural
approaches to water demand forecasting in irrigation districts with telemetry systems. Biosyst.
Eng. 97(2), 283–293 (2007)
26. Qi, C., Chang, N.B.: System dynamics modeling for municipal water demand estimation in an
urban region under uncertain economic impacts. J. Environ. Manag. 92(6), 1628–1641 (2011)
27. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the
RPROP algorithm. In: Proceedings of the IEEE International Conference on Neural Networks
(ICNN), pp. 586–591 (1993)
28. Stach, W., Kurgan, L., Pedrycz, W., Reformat, M.: Genetic learning of fuzzy cognitive maps.
Fuzzy Sets Syst. 153(3), 371–401 (2005)
29. Stach, W., Kurgan, L.A., Pedrycz, W.: Numerical and linguistic prediction of time series with
the use of fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 16(1), 61–72 (2008)
30. Stach, W., Kurgan, L., Pedrycz, W.: A divide and conquer method for learning large fuzzy
cognitive maps. Fuzzy Sets Syst. 161, 2515–2532 (2010)
31. Teräsvirta, T., Lin, C.F., Granger, C.W.J.: Power of the neural network linearity test. J. Time
Ser. Anal. 14(2), 209–220 (1993)
32. R Development Core Team (2008). R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-
project.org
33. Homenda W., Jastrzȩbska A., Pedrycz W.: Joining concept’s based fuzzy cognitive map model
with moving window technique for time series modeling. In: IFIP International Federation
for Information Processing, CISIM 2014. Lecture Notes in Computer Science, vol. 8838.
pp. 397–408 (2014)
34. Homenda W., Jastrzȩbska A., Pedrycz W.: Modeling time series with fuzzy cognitive maps. In:
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 2055–2062 (2014)
35. Homenda W., Jastrzȩbska A., Pedrycz W.: Time series modeling with fuzzy cognitive maps:
simplification strategies, the case of a posteriori removal of nodes and weights. In: IFIP
International Federation for Information Processing, CISIM 2014. Lecture Notes in Computer
Science, vol. 8838, pp. 409–420 (2014)
36. Yasar, A., Bilgili, M., Simsek, E.: Water demand forecasting based on stepwise multiple
nonlinear regression analysis. Arab. J. Sci. Eng. 37(8), 2333–2341 (2012)
Forecasting Short-Term Demand for Electronic
Assemblies by Using Soft-Rules

Tamás Jónás, József Dombi, Zsuzsanna Eszter Tóth, and Pál Dömötör

Abstract Organizations competing in the electronic industry, where product life-


cycles are getting shorter and shorter, need to understand the short-term behavior of
customer demand to make the right decisions in time. This work presents a new
way of modeling and forecasting customer demand for electronic assemblies in
the electronic manufacturing services industry, where demand patterns are more
volatile. The objective of the paper is to propose a new short-term forecasting
technique that is based on learning soft-rules in demand time series of electronic
assemblies. The approach presented is also verified by empirical results. The
prediction capability of the introduced methodology is compared to moving average,
ARIMA, and exponential smoothing based forecasting methods. According to the
results the presented forecasting method can be considered as a viable alternative
customer demand prediction technique.

Keywords Customer demand • Electronic industry • Growing hierarchical self-


organizing map • Short-term forecasting • Soft-rules

T. Jónás ()
Research and Development Department, Flextronics International Ltd, Hangár u. 5-37, 1183
Budapest, Hungary
Department of Management and Corporate Economics, Budapest University of Technology
and Economics, Magyar tudósok körútja 2, 1117 Budapest, Hungary
e-mail: [email protected]; [email protected]
J. Dombi
Institute of Informatics, University of Szeged, Árpád tér 2, 6720 Szeged, Hungary
Z.E. Tóth
Department of Management and Corporate Economics, Budapest University of Technology
and Economics, Magyar tudósok körútja 2, 1117 Budapest, Hungary
P. Dömötör
Research and Development Department, Flextronics International Ltd, Hangár u. 5-37,
1183 Budapest, Hungary


1 Introduction

In this paper, a new way of modeling and forecasting customer demand for
electronic assemblies is addressed and discussed. This work presents a compar-
ative forecasting methodology regarding uncertain customer demands via hybrid
techniques. Artificial intelligence forecasting techniques have been receiving much attention lately in order to solve problems that can hardly be solved by the use of traditional forecasting methods. The main objective of the paper is to
introduce an approach which is based on short-term pattern recognition and soft-rule
learning techniques and can be applied to predict short-term demand for electronic
assemblies and to manage customer demand in a fluctuating business environment.
The effectiveness of the proposed approach to the demand forecasting issue is
demonstrated using real-world data.
The available forecasting techniques can be classified into four main groups:
time series methods, causal methods, qualitative methods, and simulation methods
[2]. Most studies predict customer demand based on the first two categories [4].
Customer demand forecasts based on historical information are usually accurate
enough, when demand patterns are relatively smooth and continuous. However,
these methods react to changes slowly in dynamic environments [10]. In the
markets of electronic manufacturing services (EMS) companies, whose customers
are typically original equipment manufacturers (OEMs), the customer demand
for a particular electronic assembly may vary a lot from one week to another.
There are several reasons behind it, for example, the frequent demand changes at
different parties in the whole supply chain, the short-term supply tactics of the
direct customer, or the high number of substituting products. These phenomena
finally result in demand time series which are typically fragmented and do not
form characteristic life-cycles. Therefore, understanding the short-term behavior of
customer demand can be fruitful in managing the daily operation at the manufacturer
end.
The application of quantitative forecasting methods has several limitations in
such a case. The relationship between historical data and the demand data can
be complicated or can result in a poor regression. A large amount of data is
often required and non-linear patterns are difficult to capture. Finally, outliers can
bias the estimation of the model parameters [5]. To overcome these limitations,
artificial neural networks (ANNs) have also become a conventional technique in
demand forecasting owing to its adequate performance in pattern recognition.
ANNs are flexible computing frameworks and universal approximators that can
be applied to a wide range of forecasting problems [15]. Fuzzy theory has also been
broadly applied in forecasting (e.g., [6]). Song et al. proposed fuzzy time series
models to deal with forecasting problems [11, 12]. Hybrid models combining neural
networks and fuzzy systems are also of interest and could provide better results
(e.g., [7, 9, 14]). In our approach, a hierarchical neural clustering method, which is
founded on self-organizing maps, was used to recognize the so-called soft-rules in
demand time series of electronic assemblies. Once adequate soft-rules are identified,
they can be used for forecasting purposes.

2 Learning Soft-Rules

We assume that there are $m$ time series, each of which represents the weekly historical demand for an electronic assembly, that is, for the $i$th assembly we have the $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ time series, where $Y_{i,j}$ is the $j$th demand for item $i$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n_i)$. Our goal is to identify the short-term behavior of demand changes, that is, we wish to deliver

$$(L_k, L_{k+1}, \ldots, L_{k+u-1}) \Rightarrow (R_{k+u}, \ldots, R_{k+u+v-1}) \qquad (1)$$

like soft-rules, where $L_p$ and $R_q$ are values obtained from $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ $(u \ge 3,\ v \ge 1;\ p = k, \ldots, k+u-1;\ q = k+u, \ldots, k+u+v-1)$. Such a soft-rule has the following meaning: if there is a pattern of normalized historical data similar to $L_k, L_{k+1}, \ldots, L_{k+u-1}$, then the future continuation of this series may be $R_{k+u}, \ldots, R_{k+u+v-1}$. Our aim is to identify soft-rules on the short run, that is, we wish to obtain soft-rules learned from the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$ that can be used to predict demands for a few periods (typically 1–5 periods) by using historical demand data of 3–15 periods. Let $i \in \{1, 2, \ldots, m\}$ be a fixed index. At first, we discuss how to generate soft-rules for the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$. For simpler notation, let $Y_1, Y_2, \ldots, Y_n$ denote the time series $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,n_i}$.

2.1 Data Normalization and the Study-Pairs

We set the number of historical periods .u/ and the number of periods for prediction
.v/ so that u 3; v 1, u C v < n. We will differentiate among three cases,
depending on the values of each u period long sub-time-series Yk ; YkC1 ; : : : ; YkCu1
in the time series Y1 ; Y2 ; : : : Yn .k D 1; 2; : : : ; n  v  u C 1/.
.u/ .v/
Case 1. If Yk ; YkC1 ; : : : ; YkCu1 are not all equal, vectors zk and zkCu are defined
.u/ .v/
as zk D .Zk ; ZkC1 ; : : : ; ZkCu1 /, zkCu D .ZkCu ; : : : ; ZkCuCv1 /, where

YkCp  mk;u
ZkCp D ; (2)
Mk;u  mk;u
YkCuCq  mk;u
ZkCuCq D ; (3)
Mk;u  mk;u
344 T. Jónás et al.

and $m_{k,u}$ and $M_{k,u}$ are the minimum and maximum of the values $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$,
respectively, $p = 0, \ldots, u-1$, $q = 0, \ldots, v-1$, $k = 1, 2, \ldots, n - v - u + 1$.
Each component of $\mathbf{z}_k^{(u)}$ is normalized to the $[0, 1]$ interval, and the min–max
normalization used for a $u$-period long sub-time-series is also applied to its $v$-period
long continuation.

Case 2. If $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ are all the same and nonzero, say have the value
$a$ ($a > 0$), then $\mathbf{z}_k^{(u)} = (1, 1, \ldots, 1)$, and vector $\mathbf{z}_{k+u}^{(v)}$ is defined as
$\mathbf{z}_{k+u}^{(v)} = (Y_{k+u}/a, \ldots, Y_{k+u+v-1}/a)$.

Case 3. If $Y_k, Y_{k+1}, \ldots, Y_{k+u-1}$ are all zeros, then $\mathbf{z}_k^{(u)} = (0, 0, \ldots, 0)$, and vector
$\mathbf{z}_{k+u}^{(v)}$ is defined as $\mathbf{z}_{k+u}^{(v)} = (Y_{k+u}, \ldots, Y_{k+u+v-1})$.

We call the pairs $(\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)})$ study-pairs; each of them contains a normalized
$u$-period long sub-time-series and its $v$-period long continuation. The normalizations
allow us to make the $\mathbf{z}_k^{(u)}$ vectors comparable and to cluster the $(\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)})$ study-pairs
based on their $\mathbf{z}_k^{(u)}$ vectors.

2.2 Forming Soft-Rules

In order to find typical $u$-period long sub-time-series, that is, potential left-sides
of soft-rules, we cluster the study-pairs based on their $\mathbf{z}_k^{(u)}$ vectors by applying a
growing hierarchical self-organizing map (GHSOM) method, a customized
version of the method described in the work of Dittenbach et al. [3]. In our
implementation, growth of the GHSOM is controlled by using the spread-out factor
$f_s$. The control method that we use is based on the one by Alahakoon et al. [1].
The spread-out factor $f_s$ is set so that $0 < f_s < 1$. Small values of $f_s$ result in fewer
clusters with higher quantization errors, while higher values yield more clusters
with lower quantization errors.

Let us assume that the clusters $L_1, L_2, \ldots, L_N$ are formed with $n_{L_1}, n_{L_2}, \ldots, n_{L_N}$
study-pairs, respectively. Let $I_s$ be the set of indexes $k$ of study-pairs
$(\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)})$ that belong to cluster $L_s$ ($s \in \{1, 2, \ldots, N\}$), that is,

$I_s = \{\, k \mid (\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)}) \in L_s,\ 1 \leq k \leq n - v - u + 1 \,\}$   (4)

and let $\mathbf{l}_s^{(u)}$ be the centroid of the vectors $\mathbf{z}_k^{(u)}$ in cluster $L_s$. Each $\mathbf{l}_s^{(u)}$ represents a $u$-period
long normalized typical pattern learned from the $Y_1, Y_2, \ldots, Y_n$ time series
($s = 1, 2, \ldots, N$). The $\mathbf{l}_s^{(u)}$ vectors are the left-sides of the soft-rules that we wish to
identify.

Fig. 1 Clustered study-pairs (normalized values plotted against periods, 0–16)

For cluster $L_s$, the cluster representative (centroid) $u$-period long normalized
typical pattern $\mathbf{l}_s^{(u)}$ can be associated with each $v$-period long continuation $\mathbf{z}_{k+u}^{(v)}$ that
belongs to this cluster ($k \in I_s$). By this means, in each cluster $L_s$, the assignment

$\mathbf{l}_s^{(u)} \Rightarrow \{\, \mathbf{z}_{k+u}^{(v)} \,\}$   (5)

can be formed, and if $\mathbf{l}_s^{(u)} = (L_{s,0}, L_{s,1}, \ldots, L_{s,u-1})$, then this assignment can be
treated as a set of correspondences of the form $(L_{s,0}, L_{s,1}, \ldots, L_{s,u-1}) \Rightarrow (Z_{k+u}, \ldots, Z_{k+u+v-1})$,
where $k \in I_s$.

Figure 1 depicts an example of a cluster of study-pairs. In this example, the
study-pairs are clustered according to normalized 10-period long sub-time-series.
The normalized 10-period long sub-time-series (gray lines), the cluster centroid
(thick line), and the normalized 5-period long continuations (gray lines) are shown
together. The figure illustrates that different continuations are assigned to the 10-period
long normalized typical pattern (cluster centroid).

It is important to see that in the $\mathbf{l}_s^{(u)} \Rightarrow \{\mathbf{z}_{k+u}^{(v)}\}$ assignment, the left-side is one
vector that represents a $u$-period long typical normalized historical pattern, while
the right-side is a set of possible $v$-period long continuations of $\mathbf{l}_s^{(u)}$. Therefore, the
$\mathbf{l}_s^{(u)} \Rightarrow \{\mathbf{z}_{k+u}^{(v)}\}$ assignment cannot be taken as a rule, unless all the vectors $\mathbf{z}_{k+u}^{(v)}$ on
its right-side are identical. Forming a soft-rule from the $\mathbf{l}_s^{(u)} \Rightarrow \{\mathbf{z}_{k+u}^{(v)}\}$ assignment
means that we identify a single $v$-dimensional vector $\mathbf{r}_s^{(v)} = (R_{s,0}, R_{s,1}, \ldots, R_{s,v-1})$
which represents all the vectors in $\{\mathbf{z}_{k+u}^{(v)}\}$ well ($k \in I_s$). In such a case,
$(L_{s,0}, L_{s,1}, \ldots, L_{s,u-1}) \Rightarrow (R_{s,0}, R_{s,1}, \ldots, R_{s,v-1})$ can be considered as a soft-rule.
Several methods can be established to derive an appropriate $\mathbf{r}_s^{(v)}$ vector from
the continuations of $\mathbf{l}_s^{(u)}$ in the $\{\mathbf{z}_{k+u}^{(v)}\}$ set. We discuss two methods: the Average

of Right-Side Continuations (ARSC) and the Clustered Right-Side Continuations


(CRSC). After that, we introduce certain metrics to measure goodness of soft-rules,
and an algorithm that uses the two mentioned methods and the goodness metrics to
form soft-rules.
Average of Right-Side Continuations  An obvious approach to derive an appropriate
$\mathbf{r}_s^{(v)}$ vector from the continuations of $\mathbf{l}_s^{(u)}$ in the $\{\mathbf{z}_{k+u}^{(v)}\}$ set is to define it as the
average of the continuations, that is,

$R_{s,q} = \dfrac{\sum_{k \in I_s} Z_{k+u+q}}{n_{L_s}}, \quad (q = 0, 1, \ldots, v-1).$   (6)

If the continuations $\mathbf{z}_{k+u}^{(v)}$ ($k \in I_s$) are not too spread out, they can be represented
well by their average $\mathbf{r}_s^{(v)}$, and the soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ can be formed. When the
$\mathbf{z}_{k+u}^{(v)}$ ($k \in I_s$) continuations are spread out, their average does not represent them well.
Clustered Right-Side Continuations  In Fig. 1, we can identify three clusters of
the $\mathbf{z}_{k+u}^{(v)}$ ($k \in I_s$) continuations. One of them (the uppermost one) contains several
continuations with similar patterns, another one contains a single continuation, and
the third one includes two continuations. In such cases, clustering the $\mathbf{z}_{k+u}^{(v)}$ right-side
continuations of the $\mathbf{z}_k^{(u)}$ left-side time series (left-sides of study-pairs) that belong
to cluster $L_s$ ($k \in I_s$) may result in clusters out of which one includes a dominant
portion of the $\mathbf{z}_{k+u}^{(v)}$, while the others contain far fewer right-side continuations. The
right-side continuations are clustered by using the same GHSOM-based clustering
method that we discussed before. If the $\mathbf{z}_{k+u}^{(v)}$ right-side continuations of $\mathbf{l}_s^{(u)}$ ($k \in I_s$)
are clustered into the clusters $\mathcal{R}_{s,1}, \ldots, \mathcal{R}_{s,M_s}$ with cluster centroids
$\mathbf{r}_{s,1}^{(v)}, \ldots, \mathbf{r}_{s,M_s}^{(v)}$, respectively, then vector $\mathbf{r}_s^{(v)}$ can be defined as the centroid of the cluster $\mathcal{R}_{s,i}$
that contains the highest number of $\mathbf{z}_{k+u}^{(v)}$ continuations: $|\mathcal{R}_{s,i}| = \max_{j=1,\ldots,M_s} |\mathcal{R}_{s,j}|$,
$i \in \{1, \ldots, M_s\}$.
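For illustration, the two ways of deriving $\mathbf{r}_s^{(v)}$ can be written as follows. This sketch uses k-means (scikit-learn) purely as a compact stand-in for the GHSOM-based clustering of the continuations, and the fixed number of right-side clusters is our simplification (the paper grows the map adaptively).

import numpy as np
from sklearn.cluster import KMeans

def arsc(right_sides):
    """Average of Right-Side Continuations: r_s is the mean of all continuations."""
    return np.mean(right_sides, axis=0)

def crsc(right_sides, n_clusters=3, seed=0):
    """Clustered Right-Side Continuations: r_s is the centroid of the most populated
    right-side cluster; its share of the continuations is also returned."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(right_sides)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    biggest = int(np.argmax(sizes))
    share = sizes[biggest] / len(right_sides)
    return km.cluster_centers_[biggest], share

# right_sides: the v-period continuations z_{k+u}^(v) of one left-side cluster L_s
right_sides = np.random.default_rng(1).random((12, 5))
r_avg = arsc(right_sides)
r_dom, dominant_share = crsc(right_sides)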
Measuring Goodness of a Soft-Rule  Of course, it may happen that the clusters
$\mathcal{R}_{s,1}, \ldots, \mathcal{R}_{s,M_s}$ all include the same number of vectors from the set $\{\mathbf{z}_{k+u}^{(v)}\}$, or
that some of the clusters among $\mathcal{R}_{s,1}, \ldots, \mathcal{R}_{s,M_s}$ contain the same number of vectors
from the set $\{\mathbf{z}_{k+u}^{(v)}\}$. In such cases, vector $\mathbf{r}_s^{(v)}$ cannot be unambiguously selected
from the cluster centroids $\mathbf{r}_{s,1}^{(v)}, \ldots, \mathbf{r}_{s,M_s}^{(v)}$, nor can the soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ be
unambiguously selected from the correspondences

$\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_{s,1}^{(v)},\ \ \mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_{s,2}^{(v)},\ \ldots,\ \mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_{s,M_s}^{(v)},$   (7)

and so the ARSC method may be more appropriate than the CRSC method.
As mentioned before, the CRSC method is applicable if one of the clusters
$\mathcal{R}_{s,1}, \ldots, \mathcal{R}_{s,M_s}$ contains the majority of the $\mathbf{z}_{k+u}^{(v)}$ continuations ($k \in I_s$).

In order to characterize how representative the soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ is, we use
two metrics, the support and the confidence, similarly to the way the support and
confidence metrics are used to characterize the strength of association rules [13].

The support $\mathrm{sup}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)})$ of soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ is the proportion of $\mathbf{z}_k^{(u)}$
vectors whose closest cluster centroid is the vector $\mathbf{l}_s^{(u)}$ ($1 \leq k \leq n - v - u + 1$). In
other words, the proportion of $u$-period long sub-time-series represented by vector
$\mathbf{l}_s^{(u)}$ is the support of $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$, that is,

$\mathrm{sup}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) = \dfrac{|I_s|}{n - u - v + 1};$   (8)

the cardinality $|I_s|$ is the support count of soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$.

The confidence $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)})$ of soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ can be defined as the
proportion of $\mathbf{z}_{k+u}^{(v)}$ right-side continuations (of $\mathbf{l}_s^{(u)}$) whose closest cluster centroid
is the vector $\mathbf{r}_s^{(v)}$ ($k \in I_s$). Considering the definition of $\mathbf{r}_s^{(v)}$, $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)})$ can be
calculated as

$\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) = \dfrac{\max_{j=1,\ldots,M_s} |\mathcal{R}_{s,j}|}{|I_s|}.$   (9)
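A short numerical illustration of (8) and (9), with invented sizes: for a series of $n = 120$ weekly demands, $u = 10$ and $v = 5$, there are $n - u - v + 1 = 106$ study-pairs in total.

n, u, v = 120, 10, 5              # illustrative sizes, not taken from the paper's data
n_pairs = n - u - v + 1           # 106 study-pairs in total
support_count = 20                # |I_s|: study-pairs falling into cluster L_s
largest_right_cluster = 14        # size of the most populated right-side cluster

support = support_count / n_pairs                     # Eq. (8): 20/106 = 0.189
confidence = largest_right_cluster / support_count    # Eq. (9): 14/20 = 0.7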

The problem with this definition of confidence is that it is applicable when the
CRSC method is used, but not for the ARSC method; thus, we propose the following
algorithm to derive a soft-rule with left-side $\mathbf{l}_s^{(u)}$ and to interpret its confidence.

Algorithm (Forming a Soft-Rule)

Inputs: cluster $L_s$ of study-pairs $(\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)})$; the centroid $\mathbf{l}_s^{(u)}$ of the $\mathbf{z}_k^{(u)}$ vectors in cluster $L_s$; the set $I_s$
of indexes $k$ of study-pairs $(\mathbf{z}_k^{(u)}, \mathbf{z}_{k+u}^{(v)})$ that belong to cluster $L_s$; the confidence threshold $t_{con}$,
$0.5 < t_{con} < 1$.

Step 1: Specify a spread-out factor and apply the introduced hierarchical neural clustering to cluster
the $\mathbf{z}_{k+u}^{(v)}$ right-side continuations of the $\mathbf{z}_k^{(u)}$ left-side time series (left-sides of study-pairs) that
belong to cluster $L_s$ ($k \in I_s$). Results of clustering: clusters $\mathcal{R}_{s,1}, \ldots, \mathcal{R}_{s,M_s}$ with cluster centroids
$\mathbf{r}_{s,1}^{(v)}, \ldots, \mathbf{r}_{s,M_s}^{(v)}$, respectively.

Step 2: Let $\mathbf{r}_s^{(v)}$ be the centroid of the cluster $\mathcal{R}_{s,i}$ for which $|\mathcal{R}_{s,i}| = \max_{j=1,\ldots,M_s} |\mathcal{R}_{s,j}|$, $i \in \{1, \ldots, M_s\}$,
and consider $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ as a potential soft-rule.

Step 3: Compute the confidence $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)})$ of the potential soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ using (9).
– If $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) \geq t_{con}$, then form the soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$.
– If $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) < t_{con}$, then compute $\mathbf{r}_s^{(v)}$ according to the ARSC method, and form the
soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$. In this case, let $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) = t_{con}$.

Outputs: the soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ and its confidence $\mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)})$.
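A compact sketch of this rule-forming algorithm is given below, again with k-means standing in for the GHSOM clustering of the right-side continuations; $t_{con}$ and the number of right-side clusters are illustrative inputs, not the authors' settings.

import numpy as np
from sklearn.cluster import KMeans

def form_soft_rule(left_centroid, right_sides, t_con=0.8, n_clusters=3, seed=0):
    """Return (left_side, right_side, confidence) for one cluster of study-pairs,
    following Steps 1-3: try CRSC first, fall back to ARSC if confidence < t_con."""
    right_sides = np.asarray(right_sides, dtype=float)
    # Step 1: cluster the right-side continuations (GHSOM in the paper, k-means here)
    km = KMeans(n_clusters=min(n_clusters, len(right_sides)),
                n_init=10, random_state=seed).fit(right_sides)
    sizes = np.bincount(km.labels_)
    # Step 2: candidate right side = centroid of the most populated cluster
    r_candidate = km.cluster_centers_[int(np.argmax(sizes))]
    # Step 3: accept it if its confidence reaches t_con, otherwise use the ARSC average
    confidence = sizes.max() / len(right_sides)
    if confidence >= t_con:
        return left_centroid, r_candidate, confidence
    return left_centroid, right_sides.mean(axis=0), t_con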

The confidence of a rule generated by using the ARSC method is set to $t_{con}$.
If definition (9) were applied, the confidence of such a rule would be 1, but the
average of the continuations may not represent all the continuations as well as the
centroid of the cluster with the highest cardinality does in the case when
the CRSC method is used to form the soft-rule. Thus, setting the confidence of such
a rule to $t_{con}$ expresses that its confidence is not considered to be better than $t_{con}$.
Aggregate Goodness of Soft-Rules  Let

$\mathbf{l}_1^{(u)} \Rightarrow \mathbf{r}_1^{(v)},\ \ \mathbf{l}_2^{(u)} \Rightarrow \mathbf{r}_2^{(v)},\ \ldots,\ \mathbf{l}_N^{(u)} \Rightarrow \mathbf{r}_N^{(v)}$   (10)

be the soft-rules learned for the time series $Y_1, Y_2, \ldots, Y_n$ by following the method
discussed so far ($N \in \mathbb{N}$, $N > 0$). The goodness of each of these soft-rules can
be characterized by its support and confidence, and the greater these two values are for
a soft-rule, the better its goodness is. Based on this, the average of support and
confidence products $\overline{supcon}^{(u,v)}$ can be used to express the aggregate goodness of all
the soft-rules learned for one time series:

$\overline{supcon}^{(u,v)} = \dfrac{1}{N} \sum_{s=1}^{N} \mathrm{sup}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}) \cdot \mathrm{con}(\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}).$   (11)

There are a couple of notable properties of $\overline{supcon}^{(u,v)}$ that need to be considered in
order to use it appropriately in practice. If the soft-rules $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ ($s = 1, 2, \ldots, N$)
are all formed by using the CRSC method, then, using the definitions of support and
confidence in (8) and (9), $\overline{supcon}^{(u,v)}$ can be written as

$\overline{supcon}^{(u,v)} = \dfrac{1}{N} \cdot \dfrac{1}{n - u - v + 1} \sum_{s=1}^{N} \max_{j=1,\ldots,M_s} |\mathcal{R}_{s,j}|.$   (12)

As discussed before, $u + v < n$, but if $u + v$ is large compared to $n$, then the
number of soft-rules $N$ is small and $n - u - v + 1$ is small as well, and so $\overline{supcon}^{(u,v)}$
may be relatively large. For example, if $u + v = n - 1$, then one or two soft-rules
are formed and $\overline{supcon}^{(u,v)}$ is 1 or 0.5. This means that using the measure $\overline{supcon}^{(u,v)}$ to
characterize the aggregate goodness of soft-rules can be misleading when the sum of the
number of historical periods ($u$) and the number of periods for prediction ($v$) is
close to the total length of the studied time series. In practice, we use our method
with $3 \leq u \leq 15$ and $1 \leq v \leq 5$, while the total length of the analyzed time series
is typically greater than 100, so we do not face the highlighted limitation of
$\overline{supcon}^{(u,v)}$. It is worth mentioning that $u + v$ needs to be kept small in any case, because if
$u + v$ were large, then we would face the problems of clustering high-dimensional
data [8]. Considering definition (9) for the confidence of a soft-rule formed by using the
CRSC method and the interpretation of confidence in the case when the ARSC method

is used, it can be seen that the inequalities

$\dfrac{t_{con}}{N} \leq \overline{supcon}^{(u,v)} \leq \dfrac{1}{N}$   (13)

hold. Note that this backs up the observation that if $u + v$ increases, and so $N$ decreases, then
$\overline{supcon}^{(u,v)}$ tends to increase.
The confidence and support values of individual soft-rules, as well as the value of
$\overline{supcon}^{(u,v)}$, depend not only on $u$ and $v$, but also on parameters such as the spread-out
factors for the left- and right-side clustering, or the confidence threshold $t_{con}$,
that were applied to form them. Thus, the confidence of a soft-rule may be found quite
good while, at the same time, the right-side continuations may be relatively
spread around the right-side of the rule that is to represent them (e.g., when the
spread-out factor for the right-side clustering is relatively low). In order to measure
how much the corresponding right-side continuations are spread around the right-sides
of the soft-rules on average, additional aggregate goodness measures can be
used as well. In our practical application the average mean square error $\overline{MSE}^{(u,v)}$ was
used as an additional aggregate goodness measure of soft-rules:

$\overline{MSE}^{(u,v)} = \dfrac{1}{N} \sum_{s=1}^{N} \dfrac{1}{|I_s|} \sum_{k \in I_s} \dfrac{1}{v} \sum_{q=0}^{v-1} \left( Z_{k+u+q} - R_{s,q} \right)^2.$   (14)

The Number of Historical Periods Taken into Account  The number of historical
periods ($u$) and the number of periods for prediction ($v$) are important parameters
of our method. Parameter $v$ is driven by the particular practical application, while $u$
can be user-specified. The $\overline{supcon}^{(u,v)}$ and $\overline{MSE}^{(u,v)}$ measures can be used to answer
the question of which $u$ results in the best soft-rule based model for the time series
$Y_1, Y_2, \ldots, Y_n$, if all the other parameters used for forming the rules are fixed.

Let

$\mathbf{l}_1^{(u)} \Rightarrow \mathbf{r}_1^{(v)},\ \ \mathbf{l}_2^{(u)} \Rightarrow \mathbf{r}_2^{(v)},\ \ldots,\ \mathbf{l}_{N_u}^{(u)} \Rightarrow \mathbf{r}_{N_u}^{(v)}$   (15)

be the soft-rules generated for a given $u$ ($3 \leq u \leq u_{max}$; $N_u \in \mathbb{N}$, $N_u > 0$). As
discussed before, $u_{max} \leq n - v - 1$; in our practical application $u_{max} \leq 15$. We wish
to determine a $u = u^*$ which results in the best soft-rules in terms of their average
of support and confidence products $\overline{supcon}^{(u,v)}$ and their average mean square error $\overline{MSE}^{(u,v)}$.
According to the definitions of these two metrics, high values of $\overline{supcon}^{(u,v)}$
and, at the same time, low values of $\overline{MSE}^{(u,v)}$ are preferred. As $0 \leq \overline{supcon}^{(u,v)} \leq 1$ and
$0 \leq \overline{MSE}^{(u,v)}$, we use the following metrics:

$d_{supcon}^{(u,v)} = 1 - \overline{supcon}^{(u,v)}$   (16)

$d_{MSE}^{(u,v)} = \dfrac{\overline{MSE}^{(u,v)}}{\max_{i=3,\ldots,u_{max}} \overline{MSE}^{(i,v)}},$   (17)

and determine $u^*$ as the minimum of the set $U$, where

$U = \operatorname*{argmin}_{u} \left[ \left( d_{supcon}^{(u,v)} \right)^2 + \left( d_{MSE}^{(u,v)} \right)^2 \right].$   (18)
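Given, for every candidate $u$, the aggregate measures of Eqs. (11) and (14), the selection of $u^*$ in (16)–(18) reduces to a few array operations. The sketch below assumes those aggregates have already been computed and stored per $u$; the variable names and the numbers are ours.

import numpy as np

def select_u(candidates):
    """candidates: dict mapping u -> (supcon_bar, mse_bar), the aggregate measures
    of Eqs. (11) and (14) for the rules generated with that u.  Returns u*."""
    us = sorted(candidates)
    supcon = np.array([candidates[u][0] for u in us])
    mse = np.array([candidates[u][1] for u in us])
    d_supcon = 1.0 - supcon                    # Eq. (16)
    d_mse = mse / mse.max()                    # Eq. (17)
    score = d_supcon**2 + d_mse**2             # objective of Eq. (18)
    return us[int(np.argmin(score))]

# Illustrative aggregate measures for u = 3..6 (invented numbers)
u_star = select_u({3: (0.25, 410.0), 4: (0.31, 380.0), 5: (0.36, 395.0), 6: (0.30, 450.0)})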

3 Using Soft-Rules for Demand Forecasting

Let us assume that applying the methodology discussed so far to the time series
$Y_1, Y_2, \ldots, Y_n$ results in the following soft-rules:

$\mathbf{l}_1^{(u)} \Rightarrow \mathbf{r}_1^{(v)},\ \ \mathbf{l}_2^{(u)} \Rightarrow \mathbf{r}_2^{(v)},\ \ldots,\ \mathbf{l}_N^{(u)} \Rightarrow \mathbf{r}_N^{(v)}$   (19)

($N \in \mathbb{N}$, $N > 0$), where $u = u^*$ was determined according to (18). A $v$-period long
forecast for the continuation of the time series $Y_1, Y_2, \ldots, Y_n$, based on its last $u$ historical
values and using the soft-rules in (19), can be provided as follows.

Let $Y_{n-u+1}, \ldots, Y_n$ be the last $u$ values in the time series $Y_1, Y_2, \ldots, Y_n$. We handle
three cases separately, and define the vector $\mathbf{z}^{(u)} = (Z_0, Z_1, \ldots, Z_{u-1})$ depending on the
$Y_{n-u+1}, \ldots, Y_n$ values.

Case 1. If the $Y_{n-u+1}, \ldots, Y_n$ values are not all the same, we apply a min–max
normalization to them: $Z_p = \frac{Y_{n-u+1+p} - \alpha}{\beta - \alpha}$, where $\alpha$ and $\beta$ are the minimum and
maximum values of $Y_{n-u+1}, \ldots, Y_n$, respectively, $p = 0, \ldots, u-1$.

Case 2. If the $Y_{n-u+1}, \ldots, Y_n$ values are all the same and nonzero, say $a$ ($a > 0$),
then $\mathbf{z}^{(u)} = (Z_0, Z_1, \ldots, Z_{u-1}) = (1, 1, \ldots, 1)$.

Case 3. If the $Y_{n-u+1}, \ldots, Y_n$ values are all zeros, then $\mathbf{z}^{(u)} = (Z_0, Z_1, \ldots, Z_{u-1}) = (0, 0, \ldots, 0)$.

All components of vector $\mathbf{z}^{(u)}$ and of $\mathbf{l}_s^{(u)} = (L_{s,0}, L_{s,1}, \ldots, L_{s,u-1})$ are in the $[0, 1]$
interval, and so $|Z_i - L_{s,i}|$ is in the $[0, 1]$ interval, too, for each $i \in \{0, 1, \ldots, u-1\}$
($s = 1, 2, \ldots, N$). This allows us to use the fuzzy similarity $w_s$ between vector $\mathbf{z}^{(u)}$ and
$\mathbf{l}_s^{(u)}$:

$w_s = \dfrac{\prod_{i=0}^{u-1} \left( 1 - |Z_i - L_{s,i}| \right)}{\sum_{p=1}^{N} \prod_{i=0}^{u-1} \left( 1 - |Z_i - L_{p,i}| \right)}.$   (20)

It expresses how similar the left-side of soft-rule $\mathbf{l}_s^{(u)} \Rightarrow \mathbf{r}_s^{(v)}$ is to vector
$\mathbf{z}^{(u)}$ ($s = 1, 2, \ldots, N$). Let vector $\mathbf{r}_s^{(v)}$ be $(R_{s,0}, \ldots, R_{s,v-1})$; then vector $\mathbf{r}^{(v)} =
(R_0, R_1, \ldots, R_{v-1})$ is calculated as

$R_t = \sum_{s=1}^{N} w_s R_{s,t}$   (21)

.t D 0; 1; : : : ; v  1/, that is, right-side of each soft-rule is considered that much in


vector r.v/ as much the left-side of the rule is similar to vector z.u/ .
 
The YnC1 ; : : : ; YnCv v-period long forecast for the continuation of Y1 ; Y2 ; : : : ; Yn
for the three cases discussed above can be calculated as
Case 1.

 .ˇ  ˛/Rt C ˛; if .ˇ  ˛/Rt C ˛ 0
YnC1Ct D (22)
0; otherwise

Case 2.

 aRt ; if aRt 0
YnC1Ct D (23)
0; otherwise

Case 3.

 Rt ; if Rt 0
YnC1Ct D (24)
0; otherwise

.t D 0; : : : ; v  1/.
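The forecasting step (20)–(22) can be sketched as below for Case 1 (the last $u$ values are not all equal); Cases 2 and 3 follow (23)–(24) analogously. The rule set is assumed to be a list of (left, right) pairs produced as in Sect. 2, and the function name is ours.

import numpy as np

def forecast_with_soft_rules(history, rules, v):
    """Forecast the next v values from the last u observations using soft-rules.
    rules: list of (l_s, r_s) pairs with l_s of length u and r_s of length v."""
    u = len(rules[0][0])
    tail = np.asarray(history[-u:], dtype=float)
    alpha, beta = tail.min(), tail.max()
    z = (tail - alpha) / (beta - alpha)                     # min-max normalized tail
    # Fuzzy similarity of z to every rule's left side, Eq. (20)
    sims = np.array([np.prod(1.0 - np.abs(z - l)) for l, _ in rules])
    w = sims / sims.sum()
    # Weighted combination of the rules' right sides, Eq. (21)
    r = sum(w_s * np.asarray(r_s, dtype=float) for w_s, (_, r_s) in zip(w, rules))
    # Denormalize and clip at zero, Eq. (22)
    return np.maximum((beta - alpha) * r + alpha, 0.0)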

4 Empirical Results and Conclusions

Our method was developed for practical applications to support short-term demand
forecasting of electronic assemblies. This section demonstrates how the soft-rules
discovered from demand time series can be used for modeling and forecasting
purposes.

The soft-rule (SF) based, moving average (MA), exponential smoothing (ES),
and ARIMA forecasting methods were applied to generate 1-week ($v = 1$) and
3-week ($v = 3$) demand forecasts for 12 electronic assemblies. Forecast values
were compared to actual demand data and mean square error (MSE) values were
calculated for each method. For each time series, the soft-rules were generated with
the number of historical periods $u$ resulting in the best set of rules according to the method
discussed in "Aggregate Goodness of Soft-Rules". A spread-out factor $f_s$ of 0.95 (both for left-
and right-side clustering) and a confidence threshold $t_{con}$ of 0.8 were applied to
generate the soft-rules for each historical demand time series. The length of the moving
average was set to the value of parameter $u$ of the corresponding SF model. The
number of autoregressive terms ($p$), nonseasonal differences needed for stationarity
($d$), and lagged forecast errors ($q$) are indicated for the best fitting ARIMA model
for each time series. Results are summarized in Table 1, in which an asterisk marks
the best MSE value for each prediction.

Table 1 MSE values for 1- and 3-week ahead predictions (* marks the best MSE value for each prediction)

MSEs of 1-week ahead forecasts
Assembly  Length of time series  u of SF  (p,d,q) of ARIMA  SF        MA        ES        ARIMA
1         157                    8        (3,1,3)           627.5     642.4     638.7     592.7*
2         144                    9        (2,0,2)           441.7*    2114.4    1928.7    533.9
3         163                    7        (1,0,2)           835.4     1777.3    1835.4    698.4*
4         183                    10       (3,1,1)           6379.1    8152.8    7134.2    6002.7*
5         152                    8        (3,0,1)           7025.9*   10,422.2  9483.6    7472.7
6         172                    11       (4,0,0)           6135.3*   7943.7    9103.4    7006.1
7         178                    7        (3,1,1)           685.5*    1149.2    1076.7    725.6
8         169                    9        (4,0,2)           2300.4*   3009.3    3018.4    2574.6
9         159                    12       (3,0,1)           9023.1*   11,443.8  12,315.9  9552.8
10        191                    6        (3,1,3)           3631.3*   7255.1    6332.9    3817.5
11        147                    9        (2,0,4)           4029.3*   8975.2    9278.2    4477.8
12        182                    13       (2,0,1)           2587.4*   5312.6    5892.8    3255.6

MSEs of 3-week ahead forecasts
Assembly  Length of time series  u of SF  (p,d,q) of ARIMA  SF        MA        ES        ARIMA
1         157                    9        (3,1,3)           6123.2*   9312.5    9189.8    6282.3
2         144                    11       (2,0,2)           1332.2*   4432.4    4312.3    1442.6
3         163                    8        (1,0,2)           1311.7    3882.2    3952.4    1072.4*
4         183                    8        (3,1,1)           9426.4*   17,835.7  17,511.5  10,208.7
5         152                    9        (3,0,1)           6873.8    12,805.2  11,613.2  6332.1*
6         172                    12       (4,0,0)           9886.2    17,872.2  16,999.2  9223.7*
7         178                    6        (3,1,1)           5521.3*   9256.1    9986.9    5922.7
8         169                    11       (4,0,2)           4611.3*   15,183.1  14,291.2  5012.6
9         159                    8        (3,0,1)           8694.1    14,442.8  15,192.1  8162.4*
10        191                    8        (3,1,3)           9324.2*   17,423.0  18,189.4  10,097.3
11        147                    9        (2,0,4)           6002.4    12,722.4  11,031.9  5537.3*
12        182                    10       (2,0,1)           9588.7*   18,922.3  19,012.2  10,387.2

The empirical results tell us that in 16 cases out of the total 24 cases the soft-rule
based forecast method gave the lowest MSE value, while in the rest of the cases the
ARIMA method produced better MSE values.

The presented approach, which was implemented in software, can be regarded as a
machine learning based one. Namely, the GHSOM, as an unsupervised learning
technique, is used to discover typical patterns in time series, based on which the left-sides
of soft-rules can be identified. We discussed two methods, the ARSC and the
CRSC, to form soft-rules; however, there are other possibilities to form soft-rules
once their left-sides are identified. For example, the age of each left-side historical
sub-time-series can be taken into consideration as a weight when the average of the
right-side continuations is generated. Our method can be regarded as a hybrid one
that combines neural learning and statistical techniques. Based on the empirical
results, the introduced technique can be considered a viable alternative method for
short-term customer demand prediction and as such can support customer demand
management in a fluctuating business environment. One of our further research
plans is to study how our method relates to the ARIMA and fuzzy time series
models as well as how it could be combined with them. We also wish to study
how the spread-out factors for the left- and right-side clustering and the confidence
threshold affect the learned soft-rules and the goodness of prediction.

References

1. Alahakoon, D., Halgamuge, S., Srinivasan, B.: Dynamic self-organizing maps with controlled
growth for knowledge discovery. IEEE Trans. Neural Netw. 11(3), 601–614 (2000)
2. Chopra, S., Meindl, P.: Supply Chain Management: Strategy, Planning and Operation. Prentice-
Hall, Upper Saddle River, NJ (2001)
3. Dittenbach, M., Merkl, D., Rauber, A.: The growing hierarchical self-organizing map. In:
Proceedings of the International Joint Conference on Neural Networks (IJCNN2000), Como, pp. VI-
15–VI-19. IEEE Computer Society Press, Los Alamitos, CA (2000)
4. Efendigil, T., Önüt, S., Kahraman, C.: A decision support system for demand forecasting with
artificial neural networks and neuro-fuzzy models: a comparative analysis. Expert Syst. Appl.
36(3), 6697–6707 (2009)
5. Garetti, M., Taisch, M.: Neural networks in production planning and control. Prod. Plan.
Control 10(4), 324–339 (1999)
6. Huarng, K.: Heuristic models of fuzzy time series for forecasting. Fuzzy Sets Syst. 123, 369–
386 (2001)
7. Khashei, M., Hejazi, S., Bijari, M.: A new hybrid artificial neural networks and fuzzy
regression model for time series forecasting. Fuzzy Sets Syst. 159(7), 769–786 (2008)
8. Kriegel, H., Kröger, P., Zimek, A.: Clustering high-dimensional data. ACM Trans. Knowl.
Discov. Data 3(1) (2009). doi:10.1145/1497577.1497578
9. Kuo, R., Xue, K.: An intelligent sales forecasting system through integration of artificial neural
network and fuzzy neural network. Comput. Ind. 37, 1–15 (1998)
10. Petrovic, D., Xie, Y., Burnham, K.: Fuzzy decision support system for demand forecasting with
a learning mechanism. Fuzzy Sets Syst. 157, 1713–1725 (2006)
11. Song, Q., Chissom, B.: Fuzzy time series and its models. Fuzzy Sets Syst. 54, 269–277 (1993)
12. Song, Q., Leland, R.: Adaptive learning defuzzification techniques and applications. Fuzzy
Sets Syst. 81, 321–329 (1996)
13. Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Chap. 6. Association
Analysis: Basic Concepts and Algorithms. Addison-Wesley, Boston (2005)
14. Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L., Guillen, A., Marquez, L., Pasadas,
M.: Hybridization of intelligent techniques and ARIMA models for time series prediction.
Fuzzy Sets Syst. 159, 821–845 (2008)
15. Zhang, G., Hu, B.: Forecasting with artificial neural networks: the state of the art. Neurocom-
puting 56, 205–232 (2004)
Electrical Load Forecasting: A Parallel Seasonal
Approach

Oussama Ahmia and Nadir Farah

Abstract The electrical load forecast is an important aspect for electrical
distribution companies: it is important to determine the future demand for power
in the short, medium, and long term.
To make sure that the prediction remains relevant, different parameters are taken
into account, such as gross domestic product (GDP) and weather.
This contribution covers medium- and long-term forecasting of the Algerian
electrical load, using information contained in past consumption in a parallel
approach where each season is forecasted separately, and using dynamic load profiles
in order to deduce daily and hourly load values.
Three models are implemented in this work: multivariable linear regression
(MLR), an artificial neural network (ANN) multi-layer perceptron (MLP), and support
vector regression (SVR) with a grid search algorithm for hyperparameter
optimization; we use real energy consumption records. The proposed approach
can be useful in the elaboration of energy policies, since accurate predictions of
energy consumption positively affect capital investment while conserving at the
same time the supply security.
In addition, it can be a precise tool for the Algerian mid- and long-term energy
consumption prediction problem, which up to now has not been addressed effectively.
The results are very encouraging and were accepted by the local electricity company.

Keywords Electricity demand • Linear regression • MLP neural network •
Pearson VII • Support vector machines • Time series

1 Introduction

Electric energy is considered as an important factor in the economic and social


development of a country and therefore in the people’s wealth and everyday life.

O. Ahmia () • N. Farah


LABGED Laboratory, Département d’Informatique, Université Badji Mokhtar Annaba, Bp 12,
El Hadjar 23000, Algeria
e-mail: [email protected]; [email protected]


Energy consumption forecasting is necessary in studies of energy supply
strategy and industrial investment. In addition, an exact prediction helps to decrease
the loss of charge or overproduction [1]. What really matters for power companies
is to determine the consumption peak [2], per day, per month, per season, or per year. Then,
according to these given peaks, we can use a dynamic profile on a monthly, weekly, or
yearly scale; this information will give a prediction of the consumption for the considered
profiles. Put another way, using only the peak and relying on profiles,
we can deduce daily or hourly load values. One of the components of chronological
series is the seasonal variation, which corresponds to a periodic phenomenon [3]. In
the case of power consumption, the year is divided into four seasons [4].

In power consumption there are three types of voltage: the low voltage used in
domestic houses, the medium voltage used by small industries, and the high voltage
used by big companies, like steel industries and soaking water industries. The gross
domestic product (GDP) is an important parameter for the forecast of the power load
in the case of medium or high voltage, because the economy of a country relies in large
part on power consumption. If we consider the transformation of resources into
goods and services and the fact that every transformation requires power, it is
obvious that economic production depends on power production [5].

Our work consists in finding a model which corresponds to the consumption
diagram, considering the problem of seasonal variation of consumption. With these
parameters, we choose a parallel approach [6, 7] using different models in order to
establish which one gives the most accurate prediction. These models are
linear regression, the multi-layer perceptron (MLP) neural network with different
architectures, and support vector regression (SVR) with different kernel functions
(polynomial, radial basis function (RBF), and Pearson VII (PUK)), while the daily and
hourly load values are deduced using dynamic load profiles.

The paper is organized as follows: in the second section, we present related
work. The methods used are discussed briefly in the third section. In the fourth
section we present the work methodology and preprocessing. In the fifth section we
report some experiments and results. Finally, a conclusion comes in the last section.

2 Related Work

Different methods and approaches have been proposed in the last decades for forecasting
long- and mid-term electric load demand. Many of them involve time series
analysis with statistical methods like linear regression. Indeed, Bianco [8] used
multiple linear regression (MLR) with GDP and population as selected exogenous
variables to forecast electricity consumption in Italy; the paper presents the different
models used, and the results were globally good, with an error rate varying between
0.11 and 2.4 %.

Nezzar [9] also used a linear regression approach for mid- and long-term Algerian
electric load forecasting, using historical information and GDP. The paper is
composed mainly of two parts: the first one deals with the annual national grid forecasting

and the second one introduces the use of load profiles in order
to obtain a global load matrix.
Achnata [10] applied support vector regression to a real-world dataset provided
on the web by the Energy Information Administration (EIA) of the United States
for the state of Alaska. In this work the support vector machine (SVM) performance
was compared with that of MLPs for various models. The results obtained show that SVM
performs better than neural networks trained with the back-propagation algorithm; the
authors concluded that, through proper selection of the parameters, SVM can replace some
of the neural network based models for electric load forecasting.
On the other hand, we can find intelligent methods such as artificial neural
networks (ANN); along this line, Ekonomou [11] used an MLP to predict Greek long-term
energy consumption. Several neural network (MLP) architectures were tested
and the one with the best generalizing ability was selected. The selected ANN model
results were compared to linear regression and ε-support vector regression. The
produced results were much more accurate than those obtained by a linear regression
model and similar to those obtained by a support vector machine model.

Kandananond [12] also used an ANN approach for forecasting electricity demand
in Thailand; in that paper, three methodologies were used: autoregressive integrated
moving average (ARIMA), ANN, and MLR.

The objective was to compare the performance of these three approaches, and
the empirical data used in the study were historical data regarding the electricity
demand (population, GDP, stock index, revenue from exporting industrial products,
and electricity consumption).

The results, based on the error measurement, showed that the ANN model outperforms
the other approaches.

3 Models Used

3.1 Multivariable Regression

Regression analysis is a statistical tool for the investigation of relationships between
variables. Usually, we need to know the causal effect of one variable upon another,
for example the effect of GDP growth upon demand.

To explore such issues, we need to assemble data on the underlying variables
of interest and employ regression to estimate the quantitative effect of the causal
variables upon the variable that they influence.

Regression techniques have been central to the field of economic statistics
("econometrics"). MLR attempts to model the relationship between two or more
explanatory variables and a response variable by fitting a linear equation to the
observed data [13].

In this work multiple linear regression is used to forecast the load, which consists
in attempting to model the relationship between several explanatory variables and a
response variable by fitting a linear equation to the observed data. Every value of
the independent variables is associated with a value of the dependent variable.
Formally, the model for MLR, given $N$ observations, is

$\hat{y}(k) = \alpha_0 + \sum_{i=1}^{\nu} a_i \phi_i(k) + \varepsilon_k, \quad k = 1, 2, \ldots, N,$   (1)

where $\hat{y}(k)$ is the estimated load, $k$ is the observation ID (the month predicted), $\nu$
is the number of variables, the $\phi_i$ are the explanatory variables, and $\varepsilon_k$ denotes the
model deviations.
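A minimal example of the kind of multivariable regression in (1), fitted with scikit-learn; the feature matrix here (lagged peaks plus a GDP column) and all values are invented, only to show the mechanics.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Illustrative design matrix: [peak same month last year, peak previous month, monthly GDP]
X = rng.uniform(3000, 7000, size=(120, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 100, size=120)

mlr = LinearRegression().fit(X, y)      # estimates alpha_0 and the a_i of Eq. (1)
next_month = mlr.predict(X[-1:])        # forecast for one new observation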

3.2 Artificial Neural Network

ANN is a machine learning approach inspired by the way in which the brain
performs a particular learning task. ANNs are modeled on the human brain and consist
of a number of artificial neurons. Neurons in ANNs tend to have fewer connections
than biological neurons.
Each neuron in an ANN receives a number of inputs (each input has a weight). An
activation function is applied to these inputs, which results in a neuron activation
level. There are different classes of network architectures: single-layer feed-forward,
multi-layer feed-forward, and recurrent [14].

3.2.1 Multi-Layer Perceptron

The MLP is a feedforward ANN which consists of one or more layers of computation
nodes in a directed graph; the signal propagates through the network layer by layer, with
each layer fully connected to the next one.
The input layer consists of (sensory) input nodes; each computation node is a neuron (or
processing element) with a nonlinear function, namely the sigmoid function, which is widely
used to generate the output activation in the computation nodes [15].
In the general case, MLPs are trained with the back-propagation algorithm to
increase the success of classification and regression systems.

3.3 Support Vector Machines

Support vector machine (SVM) is a statistical learning algorithm developed by


Vladimir Vapnik and co-workers [16]. SVM is based on the concept of decision
planes that define decision boundaries. A decision plane is one that separates
between a set of objects having different class memberships. The basic idea of SVM
Electrical Load Forecasting: A Parallel Seasonal Approach 359

Table 1 Kernels summary

Kernel: Polynomial
  Function: $[1 + (X \cdot X_i)]^p$
  Comment: the power $p$ is specified by the user

Kernel: RBF
  Function: $\exp\!\left(-\frac{1}{2\sigma^2}\|X - X_i\|^2\right)$
  Comment: the width $\sigma^2$ is specified by the user

Kernel: PUK (Pearson VII)
  Function: $f(x) = H\Big/\Big[1 + \big(2(x - x_0)\sqrt{2^{1/\omega} - 1}\,/\,\sigma\big)^2\Big]^{\omega}$
  Comment: $H$ is the peak height at the center $x_0$ of the peak, and $x$ represents the independent variable; the parameters $\sigma$ and $\omega$ control the half-width and the tailing factor of the peak

is to map the original data into a feature space of high dimensionality through a
nonlinear mapping function and to construct an optimal hyperplane in the new space.
SVM can be applied to both classification and regression. In the case of
classification, an optimal hyperplane is found that separates the data into two
classes, whereas in the case of regression a hyperplane is constructed that lies
as close to as many points as possible [17]. The key characteristics of SVMs are the
use of kernels, the absence of local minima, the sparseness of the solution,
and the capacity control obtained by optimizing the margin.

3.3.1 Support Vector Regression

The SVR task is to find a functional form for a function that can correctly predict
new cases that the SVM has not been presented with before. This can be achieved
by training the SVM model on a sample set, i.e., a training set, a process that involves,
as in classification (see above), the sequential optimization of an error function,
whose exact form depends on the chosen error definition [17].
There are a number of kernels that can be used in SVM models. These include the
linear, polynomial, RBF, and sigmoid kernels (Table 1).
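scikit-learn's SVR ships with the linear, polynomial, RBF, and sigmoid kernels; the Pearson VII (PUK) kernel of Table 1 is not built in, but a custom kernel can be supplied as a callable that returns the Gram matrix. The sketch below is one possible way to do that; sigma and omega are the PUK parameters of Table 1, and the values and data used here are placeholders, not the authors' settings.

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import euclidean_distances

def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    """Pearson VII (PUK) kernel: K = 1 / [1 + (2*d*sqrt(2^(1/omega)-1)/sigma)^2]^omega."""
    d = euclidean_distances(X, Y)
    return 1.0 / (1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2) ** omega

rng = np.random.default_rng(0)
X_train, y_train = rng.random((60, 4)), rng.random(60)

puk_svr = SVR(kernel=lambda A, B: puk_kernel(A, B, sigma=1.0, omega=0.5))
puk_svr.fit(X_train, y_train)
y_hat = puk_svr.predict(X_train[:5])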

4 Methodology

The power consumption in Algeria is divided into four seasons, and the electric load
values in each season relatively follow the same evolution path.
In order to improve the precision of the forecast, we divided the dataset into
four parts; each part contains the monthly electric load values (peaks) of a different
season.
Accordingly, we constructed a model on each season's dataset. Then we use three
months to predict a season. Combining the results obtained for the three months we
obtain a season result, and then combining the four seasons we obtain a year. To
this end we divided the problem into four sub-problems. In other words, we considered

Fig. 1 Diagram illustrating an example of prediction of monthly load for the year 2012

the months belonging to the same season as a single time series (each season is a
different time series).
We thus have a parallel approach, as shown in Fig. 1, with four ANNs, four SVRs
with RBF kernel, and so on.
The forecasting models are implemented using historical
information. Different approaches are used to find the best monthly forecast model
by choosing and testing different input information:
• The electric load values of the same month of the previous years (PMY), i.e.:

  $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots)$

• The electric load values of the previous months (PM), i.e.:

  $Y(y, m) = f(Y(y, m-1), Y(y, m-2), \ldots)$

• and a combination of them, i.e.:

  $Y(y, m) = f(Y(y-1, m), Y(y-2, m), \ldots, Y(y, m-1), Y(y, m-2), \ldots)$

where $y$ is the year and $m$ is the month.



According to the existence of a unidirectional causal relationship between
economic growth (GDP growth) and energy consumption, with the direction of causality
running from economic growth to energy consumption [18], we add the GDP as an
exogenous variable.

Since the future GDP values are unknown, we use a power
regression model, Eq. (2) (autoregressive model), to predict them:

$F(X_t) = 1.8056\, X_{t-1}^{0.9349}.$   (2)

As our interest is focused on monthly forecasts and the GDP is, unfortunately, an annual
figure, we use the load profile in order to get the monthly value of the GDP. We
get the monthly value of the GDP by multiplying the GDP value by the percentage
corresponding to the monthly power value compared to the annual peak (monthly
profile).
Example: suppose that the profile value for a month is 80 %; then the monthly
GDP (GDPm) is calculated by Eq. (3):

$\mathrm{GDPm} = 0.8 \cdot \mathrm{GDP}.$   (3)
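A small worked example of Eqs. (2) and (3): project next year's GDP from the current one with the fitted power model, then convert it to a monthly value through the load profile. The GDP figure and the 80 % share are arbitrary illustrations.

def project_gdp(gdp_current):
    """Autoregressive power model of Eq. (2)."""
    return 1.8056 * gdp_current ** 0.9349

def monthly_gdp(gdp_annual, profile_share):
    """Eq. (3): scale the annual GDP by the month's share of the annual peak."""
    return profile_share * gdp_annual

gdp_next = project_gdp(160.0)           # arbitrary annual GDP value
gdp_month = monthly_gdp(gdp_next, 0.8)  # month whose peak is 80 % of the annual peak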

4.1 Data Selection and Preprocessing

This study uses the Algerian national electricity consumption data from the year 2000 to
2012, provided by the national electricity company Sonelgaz (http://www.sonelgaz.dz/),
in order to compare the forecasting performances of a parallel seasonal
approach using SVR models, ANN, and linear regression models.
The data provided concern the national electricity consumption for each hour
during the period 2000–2012; using annual, weekly, and daily profiles we calculated
the electrical load values, after which the monthly peak is deduced by taking the
maximum load value in the considered month.
The data used by the ANN and the SVR (PUK, RBF) have to be normalized on a scale
from 0 to 1:

$\mathrm{Normalized}(e_i) = \dfrac{e_i - E_{min}}{E_{max} - E_{min}},$   (4)

where $E_{min}$ is the minimum expected value for variable $E$ and $E_{max}$ is the maximum
expected value for variable $E$; if $E_{max}$ is equal to $E_{min}$, then Normalized($e_i$) is set to
0.5.
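The preprocessing can be summarized as: split the monthly peaks by season, then min-max scale each variable as in (4). A sketch follows; the season-to-month mapping and the dummy records are our assumptions, used only to make the example self-contained.

def normalize(e, e_min, e_max):
    """Eq. (4): min-max scaling to [0, 1]; a degenerate range maps to 0.5."""
    return 0.5 if e_max == e_min else (e - e_min) / (e_max - e_min)

# Split a list of (year, month, peak) records into four seasonal series
SEASONS = {"winter": (12, 1, 2), "spring": (3, 4, 5),
           "summer": (6, 7, 8), "autumn": (9, 10, 11)}

def split_by_season(records):
    out = {name: [] for name in SEASONS}
    for year, month, peak in records:
        for name, months in SEASONS.items():
            if month in months:
                out[name].append((year, month, peak))
    return out

records = [(2000 + i // 12, i % 12 + 1, 4000 + 10 * i) for i in range(156)]  # dummy data
seasonal = split_by_season(records)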

4.2 Considered Variables

There is a relation between the actual and past electric loads. That is why eight
combinations of past information (monthly peak load at preceding months) are
used as variables in order to predict the future peak load. To forecast a month we use
a different model for each combination (a sketch of how such lagged feature sets can
be built follows the list):
– (2 PMY + 2 PM) use the same month of the two previous years and the electric
load values of the two previous months (four variables).
– (2 PMY + PM) use the same month of the two previous years and the electric
load values of the previous month (three variables).
– (2 PMY) use the same month of the two previous years (two variables).
– (3 PMY) use the same month of the three previous years (three variables).
– (4 PMY + 2 PM) use the same month of the four previous years and the electric
load values of the two previous months (six variables).
– (4 PMY + 3 PM) use the same month of the four previous years and the electric
load values of the three previous months (seven variables).
– (4 PMY + PM) use the same month of the four previous years and the electric
load values of the previous month (five variables).
– (4 PMY) use the same month of the four previous years (four variables).
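The eight variable sets above boil down to collecting, for a target month, the peak of the same month in the previous n_pmy years and the peaks of the previous n_pm months. A possible feature builder, operating on a dict keyed by (year, month), is sketched below; the names and dummy values are ours.

def lag_features(peaks, year, month, n_pmy, n_pm):
    """peaks: dict {(year, month): peak}.  Returns the feature vector
    [same month of the n_pmy previous years] + [the n_pm previous months],
    or None if any required lag is missing from the history."""
    feats = []
    for k in range(1, n_pmy + 1):                      # PMY lags
        feats.append(peaks.get((year - k, month)))
    for k in range(1, n_pm + 1):                       # PM lags
        y, m = year, month - k
        while m < 1:
            y, m = y - 1, m + 12
        feats.append(peaks.get((y, m)))
    return None if any(f is None for f in feats) else feats

# Example: the "4 PMY + 2 PM" combination for July 2011, on synthetic peaks
peaks = {(y, m): 4000 + 30 * (y - 2000) + 5 * m
         for y in range(2000, 2013) for m in range(1, 13)}
x = lag_features(peaks, 2011, 7, n_pmy=4, n_pm=2)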

4.3 Design of the Proposed SVR and MLP ANN Models

In order to find the neural network architecture with the best generalizing ability,
several ANN MLP models were developed and tested. These structures
consisted of one to five hidden layers with two to ten neurons in each hidden
layer. The model which presented the best generalizing ability had a compact
structure with the following characteristics: two hidden layers, with five and four
neurons respectively, trained with the back-propagation learning algorithm and a
logarithmic sigmoid transfer function.
Parameter optimization considerably increases the accuracy of the peak
forecast [19–21]. In order to obtain better accuracy, the SVR parameters have
been chosen by a grid search algorithm [22].
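Hyperparameter selection by grid search for an RBF-kernel SVR can be reproduced with scikit-learn's GridSearchCV, as in the sketch below. The parameter grid, the time-series cross-validation scheme, and the MAPE scorer (available in recent scikit-learn versions) are our choices, not necessarily those of the authors.

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.random(100)     # placeholder lagged features and peaks

param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=TimeSeriesSplit(n_splits=4),
                      scoring="neg_mean_absolute_percentage_error")
search.fit(X, y)
best_svr = search.best_estimator_        # SVR refitted with the best (C, gamma, epsilon)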

4.4 From the Long to Short Term

By linking together the three types of load profiles (every week of the annual profile
is linked with a weekly profile and every day of the weekly profile is associated with
a daily profile), we obtain a global load profile: a matrix that contains the
percentage of each hour of the year compared to the annual peak.

Table 2 Clustering of the year 2012 using K-means

Observation   Cluster
SUN           1
MON           2
TUE           2
WED           2
THU           2
FRI           3
SAT           3

Usually the electricity power supplier intuitively classifies week days into three
types: working day (WD), weekend (WE), and first day of the week (FDW). In
order to verify this hypothesis, we used the K-means [23] algorithm to cluster
the weekdays into three different types.
To classify the weekdays that have an overall similar behavior into the same
group (type), we create synthetic variables (24 variables): the percentage of each
hour of the same weekday in relation to the daily peak.
The results are shown in Table 2.
From Table 2 we can notice that the weekdays are classified as follows:
weekend,¹ working day, and the day after the weekend.
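This day-type identification can be reproduced as follows: each weekday is described by its 24 hourly shares of the daily peak, and K-means groups the seven day profiles into three types. The hourly matrix below is random placeholder data, not the Sonelgaz measurements.

import numpy as np
from sklearn.cluster import KMeans

days = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
rng = np.random.default_rng(0)

# 7 x 24 matrix: average share of the daily peak at each hour, per weekday (dummy values)
profiles = rng.uniform(0.4, 1.0, size=(7, 24))
profiles /= profiles.max(axis=1, keepdims=True)   # express each hour relative to the daily peak

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
day_types = dict(zip(days, labels))               # e.g. {"SUN": 0, "MON": 1, ...}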
The multiplication of the predicted annual peak by the global load profile results
in the hourly load prediction for the entire year.
Instead of using the annual peak, it is also possible to multiply the monthly peak by
the corresponding weekly profiles combined with the daily profile to get an hourly
prediction. It is possible to improve the accuracy of the short-term prediction by
using personalized profiles for special days and months (e.g., Ramadan).

5 Results and Discussion

Firstly, we train the models on the full dataset without dividing it into four seasons.
The performance of all the models with the same variable sets is tested in order
to reveal the advantage of the proposed method.
Each model is checked using the mean absolute percentage error (MAPE). The
forecasting results for each model are presented in Table 3. The minimal error in
each line is marked with an asterisk.
Then we train four models on the dataset divided into four
seasons, where each model is specialized on a different season; the results
are presented in Table 4.
Finally, we add the GDP as an exogenous variable to the divided dataset; the results
are shown below.
¹ The weekend in Algeria is Friday and Saturday.

Table 3 MAPE before dividing the dataset into four seasons, for the five models using different
variable sets based on different historical values (* marks the minimal error in each line)

Variables      Linear regression (%)  SVR (PUK) (%)  SVR (RBF) (%)  SVR (poly) (%)  Neural network (%)
2 PMY + 2 PM   2.96*                  3.28           3.15           2.98            3.52
2 PMY + PM     3.02*                  3.17           3.23           3.20            3.74
2 PMY          3.60                   3.57           3.56*          3.64            3.90
3 PMY          3.55                   3.50           3.37*          3.42            3.86
4 PMY + 2 PM   2.96*                  3.02           3.04           3.01            3.75
4 PMY + 3 PM   2.98*                  3.12           3.11           3.11            3.93
4 PMY + PM     3.03*                  3.12           3.13           3.23            4.16
4 PMY          3.15                   3.13           3.10           3.01*           3.77

Table 4 MAPE after dividing the dataset into four seasons, for the five models using different
variable sets based on different historical values (* marks the minimal error in each line)

Variables      Linear regression (%)  SVR (PUK) (%)  SVR (RBF) (%)  SVR (poly) (%)  Neural network (%)
2 PMY + 2 PM   2.94                   3.21           2.97           2.72*           5.48
2 PMY + PM     2.83                   3.00           2.76           2.73*           4.81
2 PMY          3.63                   4.33           3.58           3.48*           3.71
3 PMY          3.51                   3.81           3.05*          3.48            3.51
4 PMY + 2 PM   2.85                   2.77           2.77           2.67*           4.81
4 PMY + 3 PM   2.93                   2.89           2.88           2.65*           3.99
4 PMY + PM     2.81                   2.58           2.48*          2.54            4.63
4 PMY          2.97                   2.88           2.66*          2.79            4.02

We notice from Tables 3 and 4 that dividing our dataset into seasons increases
the forecast precision; for example, the error of the support vector regression with
RBF kernel decreases from 3.10 to 2.66 % for the model that uses the values of
the same month of the four previous years. The exception is the neural network,
because of the insufficient training dataset caused by the division: the training
set before dividing contains 120 instances, and after dividing, the number of instances in
each dataset becomes 30. We note, according to the results in Tables 3 and 4, that
the error of the neural network increases from 3.52 to 5.48 % for the model that
uses the values of the same month of the two previous years and the values of the
two previous months. As shown in Table 5, taking the GDP value into account in the model
construction increases significantly the accuracy of the prediction; this shows that the
Algerian GDP is closely linked to the electricity load peak. From Tables 3, 4, and
5 it can be observed that hyperparameter-optimized SVRs give better results than
ANN and multiple regression.
To measure the accuracy of the long-term to short-term prediction, we predict the
hourly load values of the year 2012; we use the annual peak predicted by the SVM
(RBF) with the 4 PMY + 3 PM combination and the global load profile of 2011. In
comparison with the real load values, the MAPE error is 3.61 %.

Table 5 MAPE after dividing the dataset into four seasons, for the five models using different
variable sets based on different historical values and the GDPm value (* marks the minimal error in each line)

Variables      Linear regression (%)  SVR (PUK) (%)  SVR (RBF) (%)  SVR (poly) (%)  Neural network (%)
2 PMY + 2 PM   2.22                   2.10           2.14           2.08*           3.77
2 PMY + PM     2.21                   2.37           2.17*          2.30            4.04
2 PMY          3.51                   3.36           2.98           3.43            2.81*
3 PMY          3.45                   3.10           2.62*          3.16            3.08
4 PMY + 2 PM   2.35                   1.93           1.60*          1.99            3.04
4 PMY + 3 PM   2.53                   1.91           1.59*          2.20            2.68
4 PMY + PM     2.38                   1.92           1.78*          1.97            2.73
4 PMY          2.74                   2.44           2.34*          2.84            3.76

6 Conclusion

A parallel approach application of support vector regression, MLP neural networks,


and MLRs for electric load forecasting has been presented in this paper. Several
combinations of past load values were used in order to find which ones have the
best generalizing ability. The main idea of this work is to improve the accuracy of
the forecast by dividing the problem into several smaller problems; the results show
that by using a parallel approach where a model is constructed for season improve
precision of the forecasts.
The obtained results show that SVR performs better than neural network and
MLRs. It was also observed that parameter selection using grid search algorithm in
the case of SVR increases significantly the performance of the model.
The use of load profiles allows us to move a load value from the long term to
short term.

References

1. Feinberg, E.A., Genethliou, D.: Load forecasting. In: Chow, J.H., Wu, F.F., Momoh, J.A.
(eds.) Anonymous Applied Mathematics for Restructured Electric Power Systems, pp. 269–
285. Springer, Heidelberg (2005)
2. Haida, T., Muto, S.: Regression based peak load forecasting using a transformation technique.
IEEE Trans. Power Syst. 9(4), 1788–1794 (1994)
3. Cleveland, W.P., Tiao, G.C.: Decomposition of seasonal time series: a model for the census
X-11 program. J. Am. Stat. Assoc. 71(355), 581–587 (1976)
4. Perninge, M., Knazkins, V., Amelin, M., Söder, L.: Modeling the electric power consumption
in a multi-area system. Eur. Trans. Electr. Power 21(1), 413–423 (2011)
5. Ozturk, I., Acaravci, A.: The causal relationship between energy consumption and GDP in
Albania, Bulgaria, Hungary and Romania: evidence from ARDL bound testing approach. Appl.
Energy 87(6), 1938–1943 (2010)
366 O. Ahmia and N. Farah

6. Laouafi, A., Mordjaoui, M., Dib, D.: One-hour ahead electric load forecasting using neuro-
fuzzy system in a parallel approach. In: Azar, A.T., Vaidyanathan, S. (eds.) Anonymous
Computational Intelligence Applications in Modeling and Control, pp. 95–121. Springer,
Heidelberg (2015)
7. Ahmia, O., Farah, N.: Parallel seasonal approach for electrical load forecasting. In: Interna-
tional Work-Conference on Time Series (ITISE), pp. 615–626. Granada (2015)
8. Bianco, V., Manca, O., Nardini, S.: Electricity consumption forecasting in Italy using linear
regression models. Energy 34(9), 1413–1421 (2009)
9. Nezzar, M., Farah, N., Khadir, T.: Mid-long term Algerian electric load forecasting using
regression approach. In: 2013 International Conference on Technological Advances in Elec-
trical, Electronics and Computer Engineering (TAEECE), IEEE, pp. 121–126 (2013)
10. Achnata, R.: Long term electric load forecasting using neural networks and support vector
machines. Int. J. Comp. Sci. Technol. 3(1), 266–269 (2012)
11. Ekonomou, L.: Greek long-term energy consumption prediction using artificial neural net-
works. Energy 35(2), 512–517 (2010)
12. Kandananond, K.: Forecasting electricity demand in Thailand with an artificial neural network
approach. Energies 4(8), 1246–1257 (2011)
13. Sykes, A.O.: An introduction to regression analysis. Chicago Working Paper in Law and
Economics (1993)
14. Kubat, M.: Neural Networks: A Comprehensive Foundation by Simon Haykin. Macmillan,
New York, 1994, ISBN 0-02-352781-7 (1999)
15. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: the
RPROP algorithm. In: IEEE International Conference on Neural Networks, 1993, IEEE, pp.
586–591 (1993)
16. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl.
Disc. 2(2), 121–167 (1998)
17. Schölkopf, B., Burges, C.J., Smola, A.J.: Using support vector machine for time series
prediction. In: Advances in kernel Methods, pp. 242–253. MIT, Cambridge (1999)
18. Abaidoo, R.: Economic growth and energy consumption in an emerging economy: augmented
granger causality approach. Res. Bus. Econ. J. 4(1), 1–15 (2011)
19. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM
regression. Neural Netw. 17(1), 113–126 (2004)
20. Diehl, C.P., Cauwenberghs, G.: SVM incremental learning, adaptation and optimization,
Neural Networks, 2003. In: Proceedings of the International Joint Conference on, vol. 4, pp.
2685–2690 (2003)
21. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection
and hyper parameter optimization of classification algorithms. In: 19th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 847–855. Chicago
(2013)
22. Bi, J., Bennett, K., Embrechts, M., Breneman, C., Song, M.: Dimensionality reduction via
sparse support vector machines. J. Mach. Learn. Res. 3, 1229–1243 (2003)
23. Benabbas, F., Khadir, M.T., Fay, D., et al.: Kohonen map combined to the K-means algorithm
for the identification of day types of Algerian electricity load. In: IEEE 7th Computer
Information Systems and Industrial Management Applications (CISIM’08), pp. 78–83 (2008)
A Compounded Multi-resolution-Artificial
Neural Network Method for the Prediction
of Time Series with Complex Dynamics

Livio Fenga

Abstract Time series realizations of stochastic processes exhibiting complex dynamics
are dealt with. They can be affected by a number of phenomena, like asymmetric
cycles, irregular spikes, low signal-to-noise ratio, chaos, and various sources of
turbulence. Linear models are not designed to perform optimally in such a
context; therefore a mixed self-tuning prediction method, able to account for
complicated patterns, is proposed. It is a two-stage approach, exploiting the multi-
resolution capabilities delivered by Wavelet theory in conjunction with artificial
neural networks. Its out-of-sample forecast performances are evaluated through an
empirical study, carried out on macroeconomic time series.

Keywords Complex dynamics • Feed-forward artificial neural networks • Maximum overlapping discrete wavelet transform • Time series forecast

1 Introduction

One of the most effective and well-established practical uses of time series
analysis is the prediction of future values using past information [1].
However, much of the related statistical analysis is done under linear assumptions
which, outside trivial cases and ad hoc lab-controlled experiments, hardly ever
possess features compatible with real-world DGPs. The proposed forecast
procedure has been designed to account for the complex, possibly non-linear dynamics
one may encounter in practice, and combines the wavelet multi-resolution decomposition
approach with a non-standard, highly computer-intensive statistical method:
artificial neural networks (ANN). The acronym chosen for it, MUNI, short for
Multi-resolution Neural Intelligence, reflects both these aspects. In more detail,
the MUNI procedure is based on the reconstruction of the original time series after its
decomposition, performed through an algorithm based on the inverse of a wavelet
transform, called Multi-resolution Approximation (MRA) [2–4]. Practically, the

L. Fenga ()
UCSD, University of California San Diego, La Jolla, CA, USA
e-mail: [email protected]


original time series is decomposed into more elementary components (first step),
each of them representing an input variable for the predictive model and, as such,
individually predicted and finally combined through the inverse MRA procedure
(second step). In charge of generating the predictions is the time-domain, Artificial
Intelligence (AI) part of the method, which exploits an algorithm belonging to the
class of parallel distributed processing, i.e., ANN [5, 6], with an input structure of
the autoregressive type.

1.1 Signal Decomposing and Prediction Procedures

In what follows the time series (signal) of interest is assumed to be real-valued and
uniformly sampled, of finite length $T$, i.e.: $x_t := \{(x_t)_{t \in \mathbb{Z}^+}\}^{T}$. MUNI has been
implemented with a wavelet [7–9] signal-coefficient transformation procedure of the
type Maximum Overlapping Discrete Wavelet Transform (MODWT) [10], which is
a filtering approach aimed at modifying the observed series $\{x\}_{t \in \mathbb{Z}^+}$ by artificially
introducing an extension of it, so that the unobserved samples $\{x\}_{t \in \mathbb{Z}^-}$ are assigned
the observed values $X_{T-1}, X_{T-2}, \ldots, X_0$. This method treats the series as if it were
periodic and is known as using circular boundary conditions; the wavelet and
scaling coefficients are respectively given by:

$d_{j,t} = \dfrac{1}{2^{j/2}} \sum_{l=0}^{L_j - 1} \tilde{h}_{j,l}\, X_{t-l \bmod N}, \qquad s_{J,t} = \dfrac{1}{2^{j/2}} \sum_{l=0}^{L_j - 1} \tilde{g}_{j,l}\, X_{t-l \bmod N},$

with $\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{j,l}\}$ denoting the length-$L$, level-$j$ wavelet and scaling filters,
obtained by rescaling their Discrete Wavelet Transform counterparts, i.e., $\{h_{j,l}\}$ and
$\{g_{j,l}\}$, as follows: $\tilde{h}_{j,l} = \dfrac{h_{j,l}}{2^{j/2}}$ and $\tilde{g}_{j,l} = \dfrac{g_{j,l}}{2^{j/2}}$. Here, the sequences of coefficients $\{h_{j,l}\}$
and $\{g_{j,l}\}$ are approximate filters: the former of the band-pass type, with nominal
pass-band $f \in [1/4^j,\ 1/2^j]$, and the latter of the low-pass type, with a nominal
pass-band $f \in [0,\ 1/4^j]$, with $j$ denoting the scale. Considering all the $J = J_{max}$
sustainable scales, the MRA wavelet representation of $x_t$, in the $L^2(\mathbb{R})$ space, can be
expressed as follows:

$x(t) = \sum_k s_{J,k}\, \phi_{J,k}(t) + \sum_k d_{J,k}\, \psi_{J,k}(t) + \sum_k d_{J-1,k}\, \psi_{J-1,k}(t) + \cdots + \sum_k d_{j,k}\, \psi_{j,k}(t) + \cdots + \sum_k d_{1,k}\, \psi_{1,k}(t),$   (1)

with k taking integer values from 1 to the length of the vector of wavelet coefficients related to the component j, and $\phi$ and $\psi$ denoting, respectively, the father and mother wavelets (see, for example, [11, 12]). Assuming that a number $J_0 \le J^{max}$ of scales is selected, the MRA is expressed as $x_t = \sum_{j=1}^{J_0} D_j + S_{J_0}$, with $D_j = \sum_k d_{j,k}\,\psi_{j,k}(t)$ and $S_j = \sum_k s_{J,k}\,\phi_{J,k}(t)$, $j = 1, 2, \ldots, J$. Each sequence of coefficients $d_j$ (called a crystal in signal processing) represents the original signal at a given resolution level, so that the MRA conducted at a given level j ($j = 1, 2, \ldots, J$) delivers the coefficient set $D_j$, which reflects the signal's local variations at the detailing level j, and the set $S_{J_0}$, which accounts for the long-run variations. By adding more levels $\{d_j;\; j = 1, 2, \ldots, J^{max}\}$, finer levels j are involved in the reconstruction of the original signal and the approximation becomes closer and closer, until the loss of information becomes negligible. The forecasted values are generated by aggregating the predictions obtained individually for each of the wavelet components, once they are transformed via the inverse MODWT algorithm, i.e., $\hat{x}_t(h) = \sum_{j=1}^{J_0} \hat{D}^{inv}_j(h) + \hat{S}^{inv}_{J_0}(h)$, where D and S are as defined above and the superscript inv indicates the inverse MODWT transform.

In total, four choices are required for a proper implementation of MODWT: the boundary conditions, the type of wavelet filter, its width parameter L, and the number of decomposition levels. Regarding the first choice, MUNI has been implemented with periodic boundary conditions; however, alternatives can be evaluated on the basis of the characteristics of the time series and/or as part of a preliminary investigation. The choices related to the type of wavelet function and its length L are generally hard to automatize, therefore their inclusion in MUNI has not been pursued. More simply, MUNI has been implemented with the fourth-order Daubechies least asymmetric wavelet filter (also known as a symmlet) [8] of length L = 8, usually denoted LA(8). Regarding the forecasting method, MUNI uses a neural topology belonging to the family of multilayer perceptrons [13, 14], of the feed-forward type (FFWD-ANN) [15]. This is a popular choice in computational intelligence for its ability to perform in virtually any functional mapping problem, including autoregressive structures. This network represents the non-linear function mapping from past observations $\{x_t;\; t = 1, 2, \ldots, T-1\}$ to future values $\{x_h;\; h = T, T+1, \ldots\}$, i.e., $x_t = \phi_{nn}(x_{t-1}, x_{t-2}, \ldots, x_{t-p}; w) + u_t$, with p the maximum autoregressive lag, $\phi_{nn}(\cdot)$ the activation function defined over the inputs, w the network parameters, and $u_t$ the error term. In practice, the input–output relationship is learnt by linking, via acyclic connections, the output $x_t$ to its lagged values, which constitute the network input, through a set of layers. While the latter usually has a very low cardinality (often 1 or 2), the input set is critical, since the inclusion of non-significant lags and/or the exclusion of significant ones can affect the quality of the outcomes.
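To make the decomposition step concrete, the sketch below (not the author's code) uses PyWavelets' stationary wavelet transform as a stand-in for the MODWT: the two undecimated transforms differ essentially in filter normalization and boundary handling, and the 'sym4' filter is the 8-tap Daubechies least-asymmetric wavelet corresponding to LA(8). The toy series and the number of levels are assumptions made only for illustration.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
T = 256                              # swt requires a length divisible by 2**J0
x = np.cumsum(rng.normal(size=T))    # toy series standing in for x_t

J0 = 4                               # number of decomposition levels (assumption)
wavelet = "sym4"                     # 8-tap least-asymmetric filter, i.e. LA(8)

# One (approximation, detail) coefficient pair per level, coarsest level first.
coeffs = pywt.swt(x, wavelet, level=J0)

# The transform is invertible: recombining the components recovers the series,
# which is the property MUNI exploits when aggregating per-crystal forecasts.
x_rec = pywt.iswt(coeffs, wavelet)
print(np.allclose(x, x_rec))         # expected: True
```

In a forecasting setting each detail (and the final approximation) would be predicted separately and the inverse transform applied to the predicted components, mirroring the aggregation formula given above.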

1.1.1 The Learning Algorithm and the Regularization Parameter λ

MUNI envisions the time series at hand split into three different, non-overlapping parts, serving respectively as training, test, and validation sets. The training set is the sequence $\{(x_1, q_1), \ldots, (x_p, q_p)\}$, in the form of p ordered pairs of n- and m-dimensional vectors, where $q_i$ denotes the target value and $x_i$ the matrix of the delayed time series values. The network, usually initialized with random weights, is presented with an input pattern and an output, say $o_i$, is generated as a result. Since in general $o_i \neq q_i$, the learning algorithm tries to find the optimal weight vector minimizing the error function in the w-space, that is, $o_p = f_{nn}(w, x_p)$, where the weight vector w refers, respectively, to the $p_i$ output and the $p_i$ input, and $f_{nn}$ is the activation function. Denoting the training set by Tr and the number of pairs by $P_{tr}$, the average error E committed by the network can be expressed as $E(w) = \tilde{E}(w) + \frac{1}{2}\lambda \sum_i w_i^2$, where λ is a constraint term aimed at penalizing the model weights and thus limiting the probability of over-fitting.
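A minimal sketch of such a regularized, autoregressive feed-forward network, assuming a scikit-learn implementation as a stand-in for the paper's FFWD-ANN: the lag set, grid values, and the helper lagged_design are illustrative choices, and alpha is scikit-learn's name for the L2 weight-decay term playing the role of λ above (it is unrelated to the iteration parameter α of Table 1).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def lagged_design(x, lags):
    """Build the autoregressive input matrix and target from a chosen lag set."""
    p = max(lags)
    X = np.column_stack([x[p - l:len(x) - l] for l in lags])
    y = x[p:]
    return X, y

# Toy series and hypothetical lag set, used only to make the example runnable.
x = np.sin(np.linspace(0, 60, 400)) + np.random.default_rng(1).normal(0, 0.1, 400)
X, y = lagged_design(x, lags=[1, 2, 12])

net = MLPRegressor(hidden_layer_sizes=(4,),   # number of hidden neurons (eta)
                   activation="logistic",     # sigmoid squashing function
                   alpha=0.01,                # L2 weight decay, the role of lambda
                   max_iter=200,              # cap on training iterations
                   solver="lbfgs",
                   random_state=0).fit(X, y)

one_step_ahead = net.predict(X[-1:])          # prediction from the latest lag pattern
```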

1.1.2 Intelligent Network Parameters Optimization

In this section, MUNI's AI-driven part is illustrated and some notation is introduced. In essence, it is a multi-grid searching system for the estimation of an optimal vector of network parameters under a suitable loss function, i.e., the root mean square error (RMSE), expressed as

$$B(x_i, \hat{x}_i) = \left[ T^{-1} \sum_i^T |e_i|^2 \right]^{\frac{1}{2}}, \qquad (2)$$

with $x_i$ denoting the observed value, $e_i$ the difference between it and its prediction $\hat{x}_i$, and T the sample size. The parameters subjected to the neural-driven search, listed in Table 1 along with the symbols used to denote each of them, are stored in the vector $\omega$, i.e., $\omega \equiv (\beta, \gamma, \alpha, \lambda, \zeta, \eta)$. Each of them is associated with a grid, whose arbitrarily chosen values are all in $\mathbb{Z}^+$. Consistently with the list of parameters of Table 1, the set of these grids is formalized as follows: $\Gamma = (\{\Gamma_\beta\}, \{\Gamma_\gamma\}, \{\Gamma_\alpha\}, \{\Gamma_\lambda\}, \{\Gamma_\zeta\}, \{\Gamma_\eta\})$, where each subset $\{\Gamma_{(\cdot)}\}$ has cardinality equal to $\tilde\beta$, $\tilde\gamma$, $\tilde\alpha$, $\tilde\lambda$, $\tilde\zeta$, and $\tilde\eta$, respectively.¹

¹For example, for the grid $\Gamma_\zeta$ we have $\Gamma_\zeta = \{\zeta^{(1)}, \zeta^{(2)}, \ldots, \zeta^{(\tilde\zeta)}\}$, with the generic element $\zeta^{(j)}$ denoting one of the values chosen for the number of input neurons.

Table 1 MUNI's parameters and related notation

  Symbol   Parameter
  β        Number of the sets of decomposition levels
  γ        Rescaling factor
  α        Number of iterations
  λ        Decay ratio
  ζ        Number of input neurons (lag-set)
  η        Number of hidden layer neurons

The wavelet-based MRA procedure is applied $\tilde\beta$ times, so that the time series of interest is broken down into $\tilde\beta$ different sets, each containing different numbers of crystals, in turn contained in a set denominated A, i.e., $\{A_{\beta_w};\; w = 1, 2, \ldots, \tilde\beta\} \in A$. Here, each of the A's encompasses a number of decomposition levels ranging from a minimum to a maximum, respectively denoted by $J^{min}$ and $J^{max}$; therefore, for the generic set $A_{\beta_w} \in A$, it will be:

$$A_{\beta_w} = \left\{ J^{min} \le k, k+1, k+2, \ldots, K \le J^{max};\; J^{min} > 1 \right\}. \qquad (3)$$

Assuming a resolution set $A_{\beta_0}$ and a resolution level $k_0 \in A_{\beta_0}$, the related crystal, denoted by $X_{k_0,\beta_0}$, is processed by the network $N_1$, which is parametrized by the vector of parameters $\omega_1^{(k_0,A_{\beta_0})} \equiv (\gamma^{(k_0,A_{\beta_0})}, \alpha^{(k_0,A_{\beta_0})}, \lambda^{(k_0,A_{\beta_0})}, \zeta^{(k_0,A_{\beta_0})}, \eta^{(k_0,A_{\beta_0})})$. Once trained, the network $N_1$, denoted by $\tilde{C}_1^{k_0,A_{\beta_0}}$, is employed to generate H-step ahead predictions. MUNI chooses the best parameter vector ${}^{\star}\omega^{(k_0,A_{\beta_0})}$ for $X_{k_0,\beta_0}$ according to the minimization of a set of cost functions of the form $B(X_{k_0,\beta_0}, \hat{X}_{k_0,\beta_0})$, iteratively computed on the predicted values $\hat{X}_{k_0,\beta_0}$ in the validation set. These predictions are generated by a set of networks parametrized and trained according to the set of k-tuples (with k the length of ω) induced by the set of the Cartesian relations on Γ. Denoting the former by $\tilde{C}^{k_0,A_{\beta_0}}$ and the latter by P, it will be:

$${}^{\star}\omega^{(k_0,A_{\beta_0})} = \arg\min_{P} B\big(\hat{X}_{k_0,A_{\beta_0}}(P), X_{k_0,A_{\beta_0}}\big). \qquad (4)$$

The set of all the trained networks attempted at the resolution level $A_{\beta_0}$ (i.e., encompassing all the crystals in $A_{\beta_0}$) is denoted by $\tilde{C}^{A_{\beta_0}}$, whereas the set of networks trained in the whole exercise (i.e., for all the A's) is denoted by $\tilde{C}^{A}$. The networks $\tilde{C}^{A_{\beta_0}}$, parametrized with the optimal vector $({}^{\star}\omega_{J^{min}}, \ldots, {}^{\star}\omega_{J^{max}}) \equiv {}^{\star}\Omega_{\beta_0}$, obtained by applying (4) to each crystal, are used to generate predicted values at each resolution level independently. These predictions are combined via the inverse MODWT and evaluated in terms of the loss function B, computed on the validation set of the original time series. By repeating the above steps for the remaining sets, i.e., $\{A_{\beta_w};\; w = 1, 2, \ldots, \tilde\beta - 1\} \in A$, $\tilde\beta$ optimal sets of networks ${}^{\star}\tilde{C}^{A}$ are obtained, each parametrized by an optimal vector of parameters in $\{{}^{\star}\Omega_w;\; w = 1, 2, \ldots, \tilde\beta\}$. Each set of networks in ${}^{\star}\tilde{C}^{A}$ is used to generate one vector of predictions for $x_t$ in the validation set (by combining the multi-resolution predictions via the inverse MODWT), so that, by iteratively applying (2) to each of them, a vector containing $\tilde\beta$ values of the loss function, say $L_w$, is generated. Finally, the set of networks in a resolution set, say ${}^{\star}A$, whose parametrizations minimize $L_w$, are the winners, i.e., ${}^{\star}A_{\Omega} = \arg\min_{({}^{\star}\Omega_w)} (L)$.

Table 2 MUNI procedure: a priori activities and choices

  Decomposition unit                AI unit
  Data exploration (common to both units)
  Data pre-processing (common to both units)
  Waveform and its length           Network topology
  Wavelet transform method          Number of hidden layers
  Boundary conditions               Sets of grids
  Reconstruction algorithm          Squashing function
                                    Prediction horizon
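To make the multi-grid search of (4) concrete, the following sketch (not the author's code) enumerates the Cartesian product of a set of toy grids and keeps, for one crystal, the parametrization with the smallest validation RMSE, i.e., the loss (2). The grid values and the fit/predict callables are illustrative assumptions supplied by the user.

```python
import itertools
import numpy as np

# A toy version of the grid set Gamma; keys map loosely to alpha (iterations),
# lambda (decay), zeta (candidate lag sets) and eta (hidden neurons).
grids = {
    "iterations": [50, 100, 200],
    "decay": [0.001, 0.01, 0.1],
    "lags": [(1, 2, 12), (1, 2, 3, 12)],
    "hidden": [2, 3, 4],
}

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def select_network(crystal_train, crystal_valid, fit, predict):
    """Return (loss, params) of the best parametrization for one crystal."""
    best = (np.inf, None)
    for combo in itertools.product(*grids.values()):      # all k-tuples in P
        params = dict(zip(grids.keys(), combo))
        model = fit(crystal_train, **params)               # train one network
        preds = predict(model, horizon=len(crystal_valid)) # H-step predictions
        score = rmse(crystal_valid, preds)                  # loss B of Eq. (2)
        if score < best[0]:
            best = (score, params)
    return best
```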

1.1.3 Human-Driven Decisions

MUNI is a partially self-regulating method: while it embodies automatic estimation procedures for a number of parameters, it requires a set of preliminary, human-driven choices involving both the decomposition and the AI parts, as summarized in Table 2. Here, the first two entries are common to both units, as they refer to activities which are not unit-specific.

1.1.4 The Algorithm

The MUNI procedure is now detailed in a step-by-step fashion.

1. Let $x_t$ be the time series of interest (as defined in Sect. 1.1), split into three disjoint segments: the training set $\{x^{Tr}_t;\; t = 1, 2, \ldots, T - (S + V + 1)\}$, the validation set $\{x^{U}_t;\; t = T - (S + V + 1), \ldots, T - V\}$, and the test set $\{x^{S}_t;\; t = T - (V + 1), \ldots, T\}$, where V and S denote the lengths of the validation and test sets, respectively;
2. MODWT is applied $\tilde\beta$ times to $x^{Tr}_t$, and the related sets of crystals are stored in $\tilde\beta$ different sets, which in turn are stored in the matrix $D_{j,w}$, of dimension $(J^{MAX} \times \tilde\beta)$, whose columns correspond², from left to right, to the sets $A_{\beta_1}, A_{\beta_2}, \ldots, A_{\beta_w}, \ldots, A_{\beta_{\tilde\beta}}$:

$$
D_{j,w} =
\begin{bmatrix}
X_{J^{MIN},1}   & X_{J^{MIN},2}   & \cdots & X_{J^{MIN},w}   & \cdots & X_{J^{MIN},\tilde\beta} \\
X_{J^{MIN}+1,1} & X_{J^{MIN}+1,2} & \cdots & X_{J^{MIN}+1,w} & \cdots & X_{J^{MIN}+1,\tilde\beta} \\
\vdots          & \vdots          &        & \vdots          &        & \vdots \\
X_{k,1}         & X_{k,2}         & \cdots & X_{k,w}         & \cdots & X_{k,\tilde\beta} \\
X_{J^{MAX},1}   & \vdots          &        & \vdots          &        & \vdots \\
\emptyset       & X_{J^{MAX},2}   &        & \vdots          &        & \vdots \\
\emptyset       & \emptyset       & \ddots & \vdots          &        & \vdots \\
\emptyset       & \emptyset       & \cdots & X_{J^{MAX},w}   &        & \vdots \\
\emptyset       & \emptyset       & \cdots & \emptyset       & \cdots & X_{J^{MAX},\tilde\beta}
\end{bmatrix}.
$$

²The column labels for the A's have been added for clarity.

Here, the generic column $\beta_w$ represents the set of resolution levels generated by a given MRA procedure, so that its generic element $X_{k,w}$ is the crystal obtained at a given decomposition level k belonging to the set of crystals $A_{\beta_w}$. For each column vector $\beta_w$, a minimum and a maximum decomposition level (3), $J^{min}$ and $J^{max}$, are arbitrarily chosen;
3. the set P of the parametrizations of interest is built; it is the set of all the Cartesian relations $P \equiv \Gamma_\gamma \times \Gamma_\alpha \times \Gamma_\lambda \times \Gamma_\zeta \times \Gamma_\eta$, whose cardinality, expressed through the symbol $|\cdot|$, is denoted by $|P|$;
4. an arbitrary set of decomposition levels, say $A_0 \in A$, is selected (the symbol β is suppressed for easier readability);
5. an arbitrary crystal, say $X_{k_0,A_0} \in A_0$, is extracted;
6. the parameter vector ω is set to an (arbitrary) initial status $\omega_1 \equiv P_1 \equiv (\gamma^{(1)}, \alpha^{(1)}, \lambda^{(1)}, \zeta^{(1)}, \eta^{(1)})$;
   (a) $X_{k_0,A_0}$ is submitted to and processed by a single hidden layer ANN of the form
   $$\begin{cases} N_1 = \phi\big(X_{k_0,A_0}; w^{(1)}\big) \\ w^{(1)} = (\omega_1^{k_0,A_0}), \end{cases}$$
   with φ being the sigmoid activation function and w the network weights evaluated for a given configuration of the parameter vector, i.e., $\omega_1$;
   (b) the network $N_1$ is trained and the resulting network $\tilde{C}_1^{k_0,A_0}$ is employed to generate H-step ahead predictions for the validation set $x^U$. These predictions are stored in the vector $P_1^{k_0,A_0}$;

   (c) steps 6a–6b are repeated for each of the remaining $(|P| - 1)$ elements of P. The matrix $P_m^{k_0,A_0}$, of dimension $U \times (|P| - 1)$ and containing the related predictions (for the crystal $X_{k_0,A_0}$), is generated by the trained networks $\tilde{C}_{2,\ldots,|P|}^{k_0,A_0}$;
   (d) the matrix ${}^{full}P^{k_0,A_0}$, of dimension $U \times |P|$ and containing all the predictions for the crystal $X_{k_0,A_0}$, is generated, i.e.,
   $${}^{full}P^{k_0,A_0} = P_1^{k_0,A_0} \cup P_m^{k_0,A_0};$$
   (e) steps 6a–6d are repeated for each of the remaining crystals in $A_0$, i.e.,
   $$\{X_{k_i,A_0};\; i = J^{min}, J^{min+1}, \ldots, (J^{max}-1)\} \in A_0 \in A,$$
   so that $i = 1, 2, \ldots, (J^{max}-1)$ prediction matrices $P^{k_i,A_0}$ are generated;
   (f) the $(J^{max} - J^{min} + 1) \times |P|$ dimension matrix $\hat{J}^{A_0} \equiv \{{}^{full}P^{k_0,A_0} \cup P^{k_i,A_0};\; i = J^{min}, J^{min+1}, \ldots, (J^{max}-1)\}$, containing all the predictions for the validation set $\{X^U_t\}$ of all the crystals in $A_0$, is generated,³ i.e.:

$$
\hat{J}^{A_0} =
\begin{bmatrix}
\hat{X}_{J^{MIN},\omega_1}   & \hat{X}_{J^{MIN},\omega_2}   & \cdots & \hat{X}_{J^{MIN},\omega_k}   & \cdots & \hat{X}_{J^{MIN},\omega_{|P|}} \\
\hat{X}_{J^{MIN}+1,\omega_1} & \hat{X}_{J^{MIN}+1,\omega_2} & \cdots & \hat{X}_{J^{MIN}+1,\omega_k} & \cdots & \hat{X}_{J^{MIN}+1,\omega_{|P|}} \\
\vdots                       & \vdots                       &        & \vdots                       &        & \vdots \\
\hat{X}_{k,\omega_1}         & \hat{X}_{k,\omega_2}         & \cdots & \hat{X}_{k,\omega_k}         & \cdots & \hat{X}_{k,\omega_{|P|}} \\
\vdots                       & \vdots                       &        & \vdots                       &        & \vdots \\
\hat{X}_{J^{MAX},\omega_1}   & \hat{X}_{J^{MAX},\omega_2}   & \cdots & \hat{X}_{J^{MAX},\omega_k}   & \cdots & \hat{X}_{J^{MAX},\omega_{|P|}}
\end{bmatrix};
$$

³In order to save space, the empty set symbol ∅ is omitted.

7. loss function minimization in the validation set is used to build the set of winner ANNs for each of the crystals in $A_0$, i.e., ${}^{\star}C^{A_0} \equiv \{{}^{J^{min}}C^{A_0}, \ldots, {}^{J^{max}}C^{A_0}\}$. For example, for the generic crystal k, the related optimal network is selected according to ${}^{k}C^{A_0} = \arg\min_{P} B\big(X^U_k, \hat{X}^U_k(P)\big)$;
8. ${}^{\star}C^{A_0}$ is employed to generate the matrix $\hat{X}^{A_0}$ of the optimal predictions for the validation set of each resolution level in $A_0$, i.e., $\hat{X}^{A_0} \equiv \big[{}^{J^{min}}\hat{X}^U, \ldots, {}^{J^{max}}\hat{X}^U\big]$;
9. by applying the inverse MODWT to $\hat{X}^{A_0}$, the series $\{x^U_t\}$ is reconstructed, i.e., $Inv(\hat{X}^{A_0}) = \{\hat{x}^U_t\}_{A_0}$, so that the related loss function $B(x^U_t, \hat{x}^{(U)}_t)$ is computed and its value stored in the vector Ξ, whose length is $(J^{max} - J^{min} + 1)$;

10. steps 4–9 are repeated for the remaining sets of resolutions $A_1, \ldots, A_w, \ldots, A_{\tilde{w}-1}$, so that all the $\tilde{w}$ error function minima are stored in the vector Ξ;
11. the network set ${}^{\star}C$, generating the estimates of the crystals that minimize Ξ over all the network configurations C, is the final winner, i.e., ${}^{\star}C = \arg\min_{C^A} \Xi(C)$;
12. final performance assessments are obtained by using ${}^{\star}C$ on the test set $x^S_t$.
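The steps above can be summarized by the following skeleton, written under the assumption that decompose, reconstruct, and select_network wrap the MODWT, its inverse, and the per-crystal grid search sketched earlier; it is an outline of the loop under those assumed interfaces, not the author's implementation.

```python
import numpy as np

def muni_forecast(x, level_sets, V, S, decompose, reconstruct, select_network):
    """Outline of the MUNI loop; helper callables are assumed, not provided."""
    x_train, x_valid, x_test = x[:-(V + S)], x[-(V + S):-S], x[-S:]       # step 1
    results = []
    for levels in level_sets:                         # one pass per set A_beta_w
        crystals = decompose(np.concatenate([x_train, x_valid]), levels)  # step 2
        # steps 3-8: per-crystal grid search and V-step-ahead predictions
        winners = [select_network(c[:-V], c[-V:]) for c in crystals]
        preds = [w.predict(V) for w in winners]
        x_hat_valid = reconstruct(preds, levels)      # step 9: inverse MODWT
        loss = np.sqrt(np.mean((x_valid - x_hat_valid) ** 2))   # loss B, Eq. (2)
        results.append((loss, winners, levels))       # entries of the vector of minima
    best = min(results, key=lambda r: r[0])           # steps 10-11: final winner
    return best, x_test                               # step 12: assess on the test set
```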

2 Empirical Analysis

In this section, the outcomes of an empirical study conducted on four macroeconomic time series are presented: the Japan/USA exchange rate, the USA civilian unemployment rate (un-transformed and differenced data), and the Italian industrial production index, denoted TS1, TS2, TS3, and TS4, respectively. These series (detailed in Tables 3 and 5), along with their empirical autocorrelation functions (EACF) [16], depicted respectively in Figs. 1 and 2, have been considered because they differ substantially in the type of phenomenon measured as well as in their inherent characteristics, such as time span, probabilistic structure, seasonality, and frequency components. In particular, TS2 and TS3 refer to the same variable (US civilian unemployment rate) and are included in the empirical analysis to emphasize MUNI's capability to yield comparable results when applied to both the original and the transformed data (and thus to simulate the case of a non-pre-processed input series). As expected, the two series exhibit different patterns: the un-transformed one (TS2) shows an ill-behaved pattern, in terms of both seasonal components and non-stationarity, in comparison with its differenced (the difference orders are 1 and 12) counterpart TS3. On the other hand, TS1 and TS2

Table 3 Specification of the time series employed in the empirical analysis

  Name  Variable                             Source                                             Period               Span  Transform
  TS1   Japan/USA exchange rate              Board of Governors of the Federal Reserve System   Jan. 1992–Jan. 2015  277   No
  TS2   Civilian unemployment rate           US Bureau of Labor Statistics (Household survey)   Jan. 1948–Jan. 2015  805   No
  TS3   Civilian unemployment rate           US Bureau of Labor Statistics (Household survey)   Jan. 1948–Jan. 2015  792   Diff(1,12)
  TS4   Italian industrial production index  Italian National Institute of Statistics           Jan. 1981–Jun. 2014  402   No

Fig. 1 Graphs of the time series employed in the empirical study

show a roughly similar overall pattern, with spikes, irregular seasonality, and non-stationarity in both mean and variance. Such a similarity is roughly confirmed by the patterns of their EACFs (Fig. 2). Regarding the time series TS3 and TS4, they exhibit more regular overall behaviors but deep differences in terms of their structures. In fact, by examining Figs. 1 and 2, it becomes apparent that, unlike TS4, TS3 is roughly trend-stationary with a 12-month seasonality whose persistence is of the moving-average type, according to the (unreported) partial EACF, different from the one characterizing TS4, which appears to follow an autoregressive process. TS4, in turn, has been included because it is affected by two major problems: an irregular trend pattern, with a significant structural break located in 2009, and seasonal variations whose size is approximately proportional to the local

Fig. 2 EACFs for the time series employed in the empirical study

level of the mean. As a potential source of nonlinearity, this form of seasonality is often dealt with by making it additive through ad hoc data transformations. However, this is not a risk-free procedure, since it is usually associated with the critical task of back-transforming the data, as shown in [17, 18]. Quantitative assessment of the quality of the predictions generated by MUNI is made by means of the following three metrics, computed on the test sets of each of the four time series:

$$\mathrm{RMSE}(h) = \sqrt{\frac{1}{s}\sum \left|x^S - \hat{x}^S\right|^2}, \qquad \mathrm{MPE}(h) = 100\,\frac{1}{s}\sum \left[\frac{x^S - \hat{x}^S}{x^S}\right], \qquad \mathrm{MAPE}(h) = 100\,\frac{1}{s}\sum \left|\frac{x^S - \hat{x}^S}{x^S}\right|,$$

with s the length of $x^S$ and h the number of steps ahead at which the predictions are evaluated, here $h = 1, 2, 3, 4$ (Figs. 3 and 4).
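A direct rendering of the three metrics, assuming x and x_hat are equal-length arrays holding the test-set values and their h-step-ahead predictions:

```python
import numpy as np

def rmse(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

def mpe(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return float(100.0 * np.mean((x - x_hat) / x))      # signed percentage error

def mape(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return float(100.0 * np.mean(np.abs((x - x_hat) / x)))
```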

Fig. 3 TS1: test set. True (continuous line) and 1,2,3,4-step ahead predicted values (dashed line)

2.1 Results

As illustrated in Sect. 1.1.4, each of the employed ANNs has been implemented according to a variable-specific set Γ containing all the grids, whose values are reported in Table 4. It is worth emphasizing that, in practical applications, the set Γ does not necessarily encompass the optimal (in the sense of the target function B) parameter values of a given network. More realistically, due to computational constraints, one can only design a grid set able to steer the searching procedure towards good approximating solutions (Table 6). The outcomes of the empirical analysis outlined in Sect. 2 are reported in Table 7. From its inspection, it is possible to see the good performances, with a few exceptions, achieved by the procedure. The series denominated TS4, in particular, shows a level of fitting that can

Fig. 4 TS4: test set. True (continuous line) and 1,2,3,4-step ahead predicted values (dashed line)

be considered particularly interesting, especially in the light of its moderate sample


size and the irregularities exhibited by the lower frequency components, i.e., d4 and
s4 (Fig. 5). The procedure chooses in this case relatively simple architectures: in
fact (see Table 6), excluding s4 (with six lags and the parameter  D 5), for all
the remaining components we have a more limited number of delays and of hidden
neurons .  3/. Regarding the performances, it seems remarkable the level of
fitting obtained at horizon 1 and 2, for a MAPE respectively equal to 0.85 and 1.43,
and a RMSE of 0.71 and 1.5. Visual inspection of Fig. 4 confirms this impression
as well as the less degree of accuracy recorded at farther horizons, even though it
appears how, especially horizon 3 predictions (MAPE D 2:63), can provide some
insights about the future behavior of this variable. Other than from Table 7, less
impressive performances can be noticed for TS1 by examining Fig. 3. However, it
Table 4 Grid values employed in the empirical analysis

  Series  β        γ         α             λ                         η (hidden neurons)  ζ (candidate lags)
  TS1     3; 4; 5  300; 400  50; 100; 200  0.001; 0.01; 0.1          1–8                 1–6; 12; 15; 18; 21; 24; 30; 36
  TS2     4; 5; 6  2; 6      50; 100; 200  0.0001; 0.001; 0.01; 0.1  1–10                1–12; 15; 18; 21; 24; 27; 30; 33; 36; 48; 60; 72
  TS3     4; 5; 6  2; 3      50; 100; 200  0.0001; 0.001; 0.01; 0.1  1–6                 1–6; 8; 12; 18; 21; 24; 36; 48
  TS4     3; 4; 5  150; 200  50; 100; 200  0.001; 0.01; 0.1          1–6                 1–12; 15; 18; 21; 24; 27; 30; 33; 36; 48; 60

Table 5 Length of the subsets of the original time series

  Set              TS1  TS2  TS3  TS4
  Training set     229  673  660  354
  Validation set    36  120  120   60
  Test set          12   12   12   12

Fig. 5 TS4 and its MODWT coefficient sequences $d_{j,t}$, $j = 1, \ldots, 4$ (panels d1–d4 and s4)



Table 6 Parameters chosen by MUNI for each time series at each frequency component

  Time series          Crystal  α    λ      ζ (lag-set)          η
  TS1 (γ = 400, β = 5) d1       200  0.1    1-2-3-12-18-36       7
                       d2       200  0.1    1-2-3-4-12-36        8
                       d3       200  0.1    1-2-3-4-12-15-18     8
                       d4       200  0.1    1-2-3-4-12-15-18-30  8
                       s4       200  0.1    1-2-3-4-12-18-24-30  8
  TS2 (γ = 7, β = 6)   d1       100  0.001  1-2-12               4
                       d2       100  0.001  1-2-12-18            5
                       d3       100  0.001  1-2-3-30             4
                       d4       50   0.001  1-2-4-48             3
                       d5       100  0.001  1-2-12-30            4
                       d6       100  0.01   1-4-36               2
                       s6       100  0.01   1-2-3-10-60          4
  TS3 (γ = 2, β = 5)   d1       100  0.01   1-2-12               2
                       d2       100  0.01   1-2-3-12             2
                       d3       100  0.01   1-2-12-18            3
                       d4       50   0.01   1-2-3-4-21           3
                       s4       100  0.01   1-2-3-5-6-21         3
  TS4 (γ = 150, β = 5) d1       100  0.001  1-2-3-12             2
                       d2       100  0.001  1-2-12               2
                       d3       100  0.001  1-2-5-15             3
                       d4       200  0.01   1-3-4-24             3
                       s4       200  0.1    1-3-15-18-36-48      5

Table 7 Goodness of fit statistics computed on the test set for the four time series considered

        Horizon  RMSE   MPE    MAPE          Horizon  RMSE   MPE    MAPE
  TS1   1        1.729  0.303  1.485    TS2  1        0.093  0.105  1.424
        2        2.705  0.097  2.354         2        0.186  0.408  2.829
        3        4.71   1.063  3.56          3        0.294  0.75   4.382
        4        7.184  1.588  6.133         4        0.375  1.361  5.835
  TS3   1        0.129  0.677  1.841    TS4  1        0.712  0.047  0.849
        2        0.215  0.712  3.224         2        1.507  0.351  1.43
        3        0.224  0.694  3.12          3        2.908  0.035  2.626
        4        0.295  2.693  4.075         4        5.591  0.807  5.938

  Values for TS3 obtained by back-transformation

has to be said that, among those included in the empirical experiment, this time series proves to be the most problematic one, both in terms of sample size and for exhibiting a multiple-regime pattern made more complicated by the presence of heteroscedasticity. Such a framework induces MUNI to select architectures which are too complex for the available sample size. As reported in Table 6, in fact, the number of hidden neurons is large and reaches, for almost all the frequency components chosen (β = 5), its maximum grid value (η = 8). Also, the number of input lags selected is always high, whereas the regularization parameter reaches, for all the decomposition levels, its maximum value (λ = 0.1). Such a situation is probably an indication of how the procedure tries to limit model complexity by using the greatest value admitted for the regularization term; nevertheless, the selected networks still seem to over-fit. This impression is also supported by the high number of iterations (the selected value of α is 200 for each of the final networks), which might have induced the selected networks to learn irrelevant patterns. As a result, MUNI is not able to properly screen out undesired, noisy components, which therefore affect the quality of the predictions. Notwithstanding this framework, however, the performances can still be regarded as acceptable for the predictions at lag 1 and perhaps at lag 2 (RMSE = 1.73 and 2.70, respectively), whereas they deteriorate significantly at horizon 3, where the RMSE reaches the value of 4.71. Horizon 4 is where MUNI breaks down, probably because of the increasing degree of uncertainty present at higher horizons, associated with poor network generalization capabilities. With an RMSE of 7.18, that is, more than 4 times higher than at horizon 1, and an MPE of 1.59 (more than 5 times higher), additional actions would be in order, e.g., increasing the information set by including ad hoc regressors in the model. As already mentioned, TS2 shows an overall behavior fairly similar to TS1, in terms of probabilistic structure, non-stationarity components, and multiple-regime pattern. However, the more satisfactory performances recorded in this case are most likely connected to the much bigger available sample size. In particular, it is worth emphasizing the good values of the MAPE for the short-term predictions (h = 1, 2), respectively equal to 1.42 and 2.83, as well as the RMSE obtained at horizon 4, which amounts to 0.37. In this case, more parsimonious architectures are chosen (see Table 6) for the β = 6 selected components the original time series has been broken into, with a number of neurons ranging from 2 (for the crystal d6) to 5 (for the crystal d2), which are associated with input sets of moderate size (ζ = 5 is the maximum value selected). As a pre-processed version of TS2, TS3 shows a more regular behavior (even though a certain amount of heteroscedasticity is still present), with weaker correlation structures at the first lags and a single peak at the seasonal lag 12 (EACF = 0.474). As expected, MUNI selects in this case simpler architectures for each of the β = 5 sub-series, with a limited number of input lags and a smaller number of neurons (η ≤ 3). Although generated by more parsimonious networks, the overall performances seem to be comparable to those obtained in the case of TS2. In fact, while they are slightly worse for the first two lags (MAPE = 1.84 and 3.22 for h = 1, 2, respectively, versus 1.42 and 2.83), the error committed seems to decrease more smoothly as the forecast horizon increases. In particular, at horizon 4 MUNI delivers better predictions than in the case of
the un-transformed series: in fact, the recorded values of RMSE and MAPE are 0.29 and 4.07 for TS3, against 0.37 and 5.83 for TS2.
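As a side note on the Diff(1,12) transform applied to TS3 (Table 3) and on the back-transformation behind the TS3 figures of Table 7, the sketch below shows one way the differencing and the recovery of forecasts on the original scale could be carried out; it is an assumption about the mechanics, not the author's code.

```python
import numpy as np

def diff_1_12(x):
    """Apply a first difference followed by a seasonal (lag-12) difference."""
    y = np.diff(x)               # (1 - B) x_t
    return y[12:] - y[:-12]      # (1 - B^12) applied to the first differences

def invert_diff_1_12(d_future, x_history):
    """Rebuild levels recursively: x_t = d_t + x_{t-1} + x_{t-12} - x_{t-13}."""
    x = list(x_history)          # needs at least the last 13 observed levels
    for d in d_future:
        x.append(d + x[-1] + x[-12] - x[-13])
    return np.array(x[len(x_history):])
```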

Acknowledgements The author is deeply indebted to Professor Dimitris N. Politis of the


Applied Physics and Mathematics Department of the University of California San Diego, for his
valuable research assistance. Computational support provided by Mr. Wilson Cheung of the same
Department is also gratefully acknowledged.

References

1. De Gooijer, J.G., Hyndman, R.J.: 25 years of time series forecasting. Int. J. Forecast. 22,
443–473 (2006)
2. Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation.
IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989)
3. Mallat, S.: A Wavelet Tour of Signal Processing. Academic, New York (1999)
4. Akansu, A.N., Haddad, R.A.: Multiresolution Signal Decomposition: Transforms, Subbands,
and Wavelets. Academic, New York (2001)
5. Patra, J., Pal, R., Chatterji, B., Panda, G.: Identification of nonlinear dynamic systems using
functional link artificial neural networks. IEEE Trans. Syst. Man Cybern. B Cybern. 29, 254–
262 (1999)
6. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York,
NY (1995)
7. Farge, M.: Wavelet transforms and their applications to turbulence. Annu. Rev. Fluid Mech.
24, 395–458 (1992)
8. Daubechies, I., et al.: Ten lectures on wavelets. In: Society for Industrial and Applied
Mathematics, vol. 61. SIAM, Philadelphia (1992)
9. Qian, T., Vai, M.I., Xu, Y.: Wavelet Analysis and Applications. Springer, Berlin (2007)
10. Percival, D.B., Walden, A.T.: Wavelet Methods for Time Series Analysis, vol. 4. Cambridge
University Press, Cambridge (2006)
11. Aboufadel, E., Schlicker, S.: Discovering Wavelets. Wiley, New York (2011)
12. Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A.: Wavelets, Approximation, and
Statistical Applications, vol. 129. Springer, Berlin (2012)
13. Webb, A.R.: Statistical Pattern Recognition. Wiley, New York (2003)
14. Pal, S.K., Mitra, S.: Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural
Netw. 3, 683–697 (1992)
15. Michie, D., Spiegelhalter, D.J., Taylor, C.C.: Machine Learning, Neural and Statistical
Classification. Horwood Ellis, Ltd., Hempstead (1994)
16. Box, G.E.P., Jenkins, G.M.: Time Series Analysis, Forecasting, and Control, vol. 136. Holden-Day, San Francisco (1976)
17. Chatfield, C., Prothero, D.: Box-Jenkins seasonal forecasting: problems in a case-study. J. R. Stat. Soc. Ser. A (General) 136, 295–336 (1973)
18. Poirier, D.J.: Experience with using the Box-Cox transformation when forecasting economic
time series: a comment. J. Econ. 14, 277–280 (1980)
