Exploratory Causal Analysis With Time Series Data - James M. McCracken (Morgan & Claypool, 2016)
James M. McCracken
George Mason University
Morgan & Claypool Publishers
Copyright © 2016 by Morgan & Claypool
DOI 10.2200/S00707ED1V01Y201602DMK012
Lecture #12
Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of Maryland
Wei Wang, University of North Carolina, Chapel Hill
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago
Series ISSN
Print 2151-0067 Electronic 2151-0075
ABSTRACT
Many scientific disciplines rely on observational data of systems for which it is difficult (or impos-
sible) to implement controlled experiments. Data analysis techniques are required for identifying
causal information and relationships directly from such observational data. This need has led to
the development of many different time series causality approaches and tools including transfer
entropy, convergent cross-mapping (CCM), and Granger causality statistics.
A practicing analyst can explore the literature to find many proposals for identifying drivers
and causal connections in time series data sets. Exploratory causal analysis (ECA) provides a
framework for exploring potential causal structures in time series data sets and is characterized
by a myopic goal to determine which data series from a given set of series might be seen as the
primary driver. In this work, ECA is used on several synthetic and empirical data sets, and it is
found that all of the tested time series causality tools agree with each other (and intuitive notions
of causality) for many simple systems but can provide conflicting causal inferences for more com-
plicated systems. It is proposed that such disagreements between different time series causality
tools during ECA might provide deeper insight into the data than could be found otherwise.
KEYWORDS
time series causality, leaning, exploratory causal analysis
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Notation and Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Time Series Causality Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Exploratory Causal Analysis Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Moving Forward with Data Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Causality Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Foundational Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Philosophical Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Natural Science Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Psychological (and other Social Science) Studies . . . . . . . . . . . . . . . . . . . 13
2.2 Data Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Statistical Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Computational Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3 Time Series Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.1 ECA Results and Efficacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Preface
Consider a scientist wishing to find the driving relationships among a collection of time series
data. The scientist probably has a particular problem in mind, e.g., comparing the potential driving
effects of different space weather parameters, but a quick search of the data analysis literature
would reveal that this problem is found in many different fields. They would find proposals for
different approaches, most of which are justified with philosophical arguments about definitions
of causality and are only applicable to specific types of data. However, the scientist would not find
any consensus on which tools consistently provide intuitive causal inferences for specific types of
systems. The literature seems to lack straightforward guidance for drawing causal inferences from
time series data. Many of the proposed approaches are tested on a small number of data sets,
usually generated from complex dynamics, and most authors do not discuss how their techniques
might be used as part of a general causal analysis.
This work was developed from the realization that drawing causal inferences from time
series data is subtle. The study of causality in data sets has a long history, so the first step is to
develop a loose taxonomy of the field to help frame the specific types of approaches an analyst may
be seeking (e.g., time series causality). Then, the philosophical causality studies must be carefully
and deliberately divorced from the data causality studies, which is done here with the introduction
of exploratory causal analysis (ECA). Finally, examples need to be presented where the different
approaches are compared on identical data sets that have strongly intuitive driving relationships.
Using such an approach, the analyst can develop an understanding of how a causal analysis might
be performed, and how the results of that analysis can be interpreted. This work presents all three
of these steps and is intended as an introduction and guide to such analysis.
James M. McCracken
March 2016
CHAPTER 1
Introduction
Data analysis is a fundamental part of physics. The theory-experiment cycle common to most
sciences [1] invariably involves an analyst trying to draw conclusions from a collection of data
points. Gerald van Belle outlines data analysis as four questions in what he calls the "Four Questions
Rule of Thumb" [2] (distilled from Fisher [3]), "Any statistical treatment must address the
following questions:
2. Can it be measured?
These items are each addressed by one of the three primary components of data analysis,
• the question (i.e., what is the purpose of the data analysis?; van Belle item # 1),
• the data (i.e., what measurements are available to address the question?; van Belle items #
2 and 3), and
• the tools (i.e., what methods, techniques, or approaches can be used on the data to address
the question?; van Belle item # 4).
This work focuses on the tools (the third component) given data in a specific form (the second
component) and a distinct, limited question (the first component). The concern of this work is data
analysis to answer the question, "Does a given pair of time series data potentially contain some
causal structure?" In physics, this question is common when data has been collected over time
(i.e., time series data) in an experiment that was not intentionally designed to verify some existing
theory, e.g., recording daily precipitation and temperature at a specific location for many years. The
physicist's preliminary analysis of such data would be exploratory [4], and a part of that exploratory
analysis may involve investigations of potential causal structure. The word "potential" is important
to the analysis, as confirmation of any causal structure in the system would require further analysis,
possibly including the collection of more data and/or the design of new experiments to collect
new data. Such analysis would be confirmatory, and one of the main purposes of the exploratory
analysis is to guide confirmatory analysis [4].
Causal inference as a part of data analysis (referred to in this work as data causality; see
Section 2) is a topic of debate among many researchers and has been for many years (see [5] for
an introduction). In 1980, Clive Granger stated
“Attitudes toward causality differ widely, from the defeatist one that it is impossible to
define causality, let alone test for it, to the populist viewpoint that everyone has their
own personal definition and so it is unlikely that a generally acceptable definition
exists. It is clearly a topic in which individual tastes predominate, and it would be
improper to try to force research workers to accept a definition with which they feel
uneasy. My own experience is that, unlike art, causality is a concept whose definition
people know what they do not like but few know what they do like.” [6]
Granger was referring to the attitudes of statisticians, economists, and philosophers of the time
who often repeated the platitude “correlation is not causation” [7] and declared statements of
causality were only possible in experiments with randomized trials, as described by Fisher [3, 8].
General statements of causality in modern social science data analysis often still require Fisher
randomization in the experimental design [9, 10]. It has come to be recognized, however, that
randomized trials can be prohibitively difficult (e.g., too expensive) or impossible to conduct in
practice (e.g., subjecting one subject to multiple treatments simultaneously) [9, 10]. Techniques
have been developed for causal inference when Fisher randomization is unavailable. It has been
argued that, in principle, Fisher randomization is not possible in any social science¹ experiment
and thus modern data causality techniques are required for rigorous causal inference of such data
sets [11].
Physics has a long history of studying causality, often more closely related to philosophical
considerations rather than specific data sets of physical systems (see, e.g., [12]). Physics applies
the scientific tradition of a prediction-experiment-repeat cycle [1] to fundamental systems, such
as billiard balls and particle interactions, where randomization of hidden variables is usually not
much of a concern. Some physicists consider statements of causality to be impossible without
direct intervention into the system dynamics [13]. Modern data collection techniques, however,
include measurements of system dynamics for which interventions are not technologically feasible.
One such data set discussed in this work is the OMNI data set from NASA/GSFC's Space
Physics Data Facility's OMNIWeb (or CDAWeb or ftp) service [14]. Causality studies of such
systems require data causality tools rather than traditional experiments (i.e., interventions into
the system dynamics).
1. develop a loose taxonomy of the field to help frame the specific types of approaches an
analyst may be seeking;
2. carefully and deliberately divorce the philosophical causality studies from the data causality
studies; and
3. present examples where the different approaches are compared on identical data sets, which
have driving relationships that are strongly intuitive.
Step 1 has been attempted by several authors (see, e.g., [15, 16, 29]). However, the taxonomies
are usually either too broad to help an analyst surveying the field for useful tools, or far
too specific for such an analyst to understand without reading large amounts of background literature.
This work introduces a taxonomy in Section 2 for which the category boundaries are defined
by the type of question the analyst is asking and the type of data available. This approach necessarily
leads to fuzzy boundaries within the taxonomy (e.g., many different types of data might be
available to answer the same question), but the taxonomy is not meant to be rigorous. This type
of taxonomy is currently missing from the field, and is intended to help guide the analyst within
the vast literature of causality studies.
Step 2 is ignored by most authors, with the notable exception of Granger [6]. Many
authors take the opposite approach and use philosophical arguments to justify a particular tool
or set of tools (see, e.g., [15, 30, 31]). However, for a practicing analyst, this approach does not
provide an understanding of how a particular tool might work with a given data set. The primary
question for an analyst is "Does this tool work?" or, more specifically, "Does this particular
approach to drawing causal inferences from the data do so correctly and consistently for the
particular type of data available to me?" Many philosophical arguments require the analyst
to define what they mean by "causal," "correctly," "consistently," and sometimes even "data," which
can be both frustrating and potentially limiting for an analyst who may avoid a particular tool (that
would otherwise be useful for them) because of philosophical objections. This work introduces
the concept of exploratory causal analysis,⁸ as opposed to confirmatory causal analysis, in which
operational definitions of causality are presented for specific tools. Such causal definitions are
specific and are not expected to meet any philosophical requirements such as those presented by
Suppes [32] or Good [33].⁹ Furthermore, this work posits that the data analysis may use many
different operational definitions of causality to draw as much causal inference from the data as
possible. This work not only outlines this new approach to data causality but also demonstrates
its usefulness with different examples and illustrates how such an approach may be undertaken
on a time series data set.¹⁰
Step 3 is, for an analyst looking for practical tools, a troubling omission in the current field.
Time series causality studies in particular often introduce new tools using data sets for which
there are no driving relationships that are either intuitively obvious or justified by outside theory
⁸See Section 1.3.
⁹See Section 1.2.
¹⁰See Section 4.
(see, e.g., [30, 34, 35]). It is difficult to interpret the results of such studies. Does the proposed
tool work? Is the operational definition of causality presented in the study comparable to the
causal inferences the analyst is hoping to draw from their own data? Such questions cannot be
answered if the causal inferences implied by the proposed tools cannot be compared to nominally
“correct” causal inferences (i.e., intuitive causal inferences or those supported by well-established
theoretical models of the system dynamics). Many authors comment on how a proposed time
series causality tool may compare to existing tools but rarely apply both tools to the same data,
which leaves an analyst to rely on the authors’ comments alone when trying to determine which
tool might be best for a particular causal analysis. It has been shown that such an approach to in-
troducing new time series causality tools can actually lead an analyst to trust that a given tool may
provide reliable causal inference in situations where that is not true. Convergent cross-mapping
(CCM)¹¹ was shown to provide counter-intuitive causal inferences for simple example systems,
which this work addresses by introducing pairwise asymmetric inference; see Section 3.3 and
[36]. It is shown in Section 4 that many different time series causality tools may provide iden-
tical causal inferences despite expectations to the contrary (e.g., compare the discussion of how
Granger causality compares to CCM in [30] to the results presented in Section 4.2.5). This work
introduces the idea of comparing different time series causality tools on identical data sets, not
only to compare the tools but also to understand how applying different operational definitions
of causality to the same data set may provide deeper insight into the data.
CHAPTER 2
Causality Studies
Studies of causality are as old as the scientific method itself.¹ An overview of such studies is far
beyond the scope of this work and would probably fill several volumes. As Holland says, “So
much has been written about causality by philosophers that it is impossible to give an adequate
coverage of the ideas that they have expressed in a short article” [16]. Physicists, economists (and
econometricians), mathematicians, statisticians, computer scientists, and scientists from a wide
array of other fields (including medicine, psychology, and sociology) have also contributed greatly
to the vast body of literature on causality.
This section is meant to provide a loose taxonomy of these studies, not an overview of
them. Authors often discuss causality with little attention given to the specific definitions of their
causal language or the specific goals they are hoping to achieve with their work (see, e.g., [6, 16]
for a discussion of such issues). This taxonomy is meant to explain where exploratory causal
analysis, as it is defined in Section 1.3, relates to other causality studies. This taxonomy will not
be complete, nor will the boundaries be clean. Several authors, particularly in theoretical physics,
blur boundaries, e.g., between data studies and philosophy (see, e.g., [13]).
Holland outlines four primary focuses for authors studying causality: the ultimate meaningfulness
of the notion of causality, the details of causal mechanisms, the causes of a given effect,
and the effects of a given cause [16]. The first two goals in this list will fall into a category called
foundational causality, and the last two will fall into a category called data causality. It is assumed
that all causality studies can be placed into one of these two categories, but the categories are
not considered disjoint. Data causality studies are characterized by the use of data (empirical or
synthetic) in discussion of causality, and foundational causality studies are characterized as not
being data causality studies.² Many authors have argued that foundational causality should not be
studied independent of data causality (see, e.g., [7, 12, 16, 42–45]). The authors of data causality
studies often bemoan the lack of academic consensus among foundational studies while recognizing
the need for operational notions of causality in data analysis (see, e.g., [6, 15, 46–48]). Time
¹Aristotle is often credited with both the first theory of causality [37, 38] and an early version of the scientific method [39]
(along with other ancient Greek philosophers [40]). Aristotle’s death was in 322 BCE [41].
²These two notions may seem at odds. If $s$ is a causality study, then, given the description, it may belong to either the set of
foundational studies, $F$, or the set of data studies, $D$; i.e., $s \in F$ or $s \in D$. If $F \cap D \neq \emptyset$, then $\exists s$ such that $s \in F$ and $s \in D$.
If $F = D^c$, where $D^c$ is the complement of $D$, i.e., foundational studies are those studies that are not data studies, then $\exists s$
such that $s \in D^c$ and $s \in D$. This notion may seem nonsensical when presented in this set notation, but is common in practice
due to the large variety of things that may be described as $s$. For example, a single study $s$ may include the introduction of
a completely new definition of "causality" and present several examples using empirical data to motivate the veracity of the
definition. See the work of Illari et al. for an exploration of these ideas [5, 11].
series causality is considered a subset of data causality, and, therefore, data causality will be ex-
plored in more detail than foundational causality. Introductions and discussions of foundational
causality studies can be found in [5, 9, 12, 13, 15, 31].
“In speaking of cause and effect we arbitrarily give relief to those elements to whose
connection we have to attend in the reproduction of a fact in the respect in which it
is important to us. There is no cause and effect in nature; nature has but an individual
existence; nature simply is. Recurrences of like cases in which A is always connected
with B , that is, like results under like circumstances, that is again, the essence of the
connection of cause and effect, exist but in the abstraction which we perform for the
purpose of mentally reproducing the facts.”[83]
Bunge provides a critical examination of Mach’s proposal that causality is nothing more than
functional dependence in [13].
The issue facing an analyst, however, is that Eq. (2.1) can never be measured for some systems.
Holland refers to this as the Fundamental Problem of Causal Inference, i.e.,
where $\beta_{mn}$ is the structural coefficient (to which the causal interpretations are applied), $\alpha$ is a
fitting coefficient, and $\varphi_i$ are noise terms [18, 113, 115]. This equation is, by definition, an SEM,
not a regression, which implies it is not equivalent [18] to
$$n_i = \beta_{mn}^{-1} \left( m_i - \alpha - \varphi_i \right) . \quad (2.4)$$
The first SEM is for the causal relationship M → N, and modeling the associated causal relationship
of this data pair, N → M, requires a separate SEM in this framework, e.g.,
$$n_i = \alpha + \beta_{nm} m_i + \vartheta_i . \quad (2.5)$$
The ambiguity of the equals sign (i.e., "$=$") in SEM has been argued to be one of the primary
sources of confusion for analysts using the method, and proposals have been made to instead use
assignment operators (i.e., "$:=$") [114] or associated "path diagrams" (i.e., directed acyclic graphs)
[15].
Bollen and Pearl provide an overview of many common criticisms (along with responses)
of SEM [18]. The philosophical framework of using SEM for causal inference has also been
criticized in the literature (see, e.g., [116, 117]). There have also been criticisms regarding the
causal interpretations of SEM in particular (see, e.g., [118, 119]). It has been shown that SEM
and the potential outcome approach are equivalent [10, 15, 18], although it has been argued that
the potential outcome approach is more formal (see a summary of the argument in [18]), but that
an SEM may be seen as a "formula for computing the potential outcomes" [120].
Graphical models are a part of many statistical fields, particularly statistical physics [121].
Path analysis has been a part of statistics since the early work of Wright [122]. Eerola notes
in [110] that the practice actually began before Wright with the work of Yule in 1903 [123].
Directed acyclic graphs (DAGs; also called path diagrams or causal graphs) were originally an
aid for causal inference with SEM [120]. The formal theory of DAGs was developed as part of
machine learning efforts to model probabilistic reasoning, e.g., using Bayesian nets [124]. An
example of a DAG for a system containing three variables A, B, and C might be
$$A \to B \to C \quad (2.6)$$
or
$$A \to B \leftarrow C , \quad (2.7)$$
where $\to$ and $\leftarrow$ represent general causal relations. Formally, a DAG obeys three conditions:
the "causal Markov" condition, the "causal minimality" condition, and the faithfulness condition
[15, 124]. It may be assumed that A and C are independent in the second DAG, which is written
as $A \perp\!\!\!\perp C$. The Markov condition states A and C are also independent in the first DAG given (or
"conditional on") the presence of B, i.e., $A \perp\!\!\!\perp C \mid B$ [124]. The minimality condition states that
any proper subgraph (e.g., removing one of the directed edges, i.e., deleting an arrow) of the DAG
containing the same vertex set will obey the Markov condition [124, 125]. The faithfulness condition
states that all probabilistic independences within the model set of variables (e.g., $\{A, B, C\}$
in the example DAGs above) are required by the Markov condition [15, 124, 125]. These conditions
are the framework by which a DAG models probabilistically causal relationships among
variables [15].
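These conditions can be checked numerically on simple synthetic data. The following sketch (a minimal illustration, not drawn from the text; the chain coefficients, sample size, and function names are arbitrary choices) simulates the first DAG, A → B → C, and shows that A and C are marginally dependent but approximately independent conditional on B, consistent with the causal Markov condition:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate the chain DAG A -> B -> C with additive Gaussian noise.
A = rng.normal(size=n)
B = 0.8 * A + rng.normal(size=n)
C = 0.8 * B + rng.normal(size=n)

def partial_corr(x, y, z):
    # Correlate the residuals of x and y after linearly regressing out z.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(A, C)[0, 1])  # clearly nonzero: A and C are dependent
print(partial_corr(A, C, B))    # approximately 0: A independent of C given B
```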
SEM and DAG approaches to computational causality have been combined, along with
counterfactuals, in the Structural Causal Model (SCM) proposed by Pearl [15, 126]. Pearl lists
the goals of SCM as (in [126]):
1. “e unification of the graphical, potential outcome, structural equations, decision ana-
lytical [104], interventional [127], sufficient component [128] and probabilistic [32] ap-
proaches to causation; with each approach viewed as a restricted version of the SCM.”
2. “e definition, axiomatization and algorithmization of counterfactuals and joint probabil-
ities of counterfactuals.”
3. “Reducing the evaluation of ‘effects of causes,’ ‘mediated effects,’ and ‘causes of effects’ to
an algorithmic level of analysis.”
4. “Solidifying the mathematical foundations of the potential-outcome model, and formulat-
ing the counterfactual foundations of structural equation models.”
5. “Demystifying enigmatic notions such as ‘confounding,’ ‘mediation,’ ‘ignorability,’ ‘compa-
rability,’ ‘exchangeability (of populations),’ ‘superexogeneity’ and others within a single and
familiar conceptual framework.”
6. “Weeding out myths and misconceptions from outdated traditions [129–134].”
Introductions to SCM may be found in [15, 126]. Pearl emphasizes that SCM pays special atten-
tion to confounding (through the “back-door criteria” for DAGs) and relies on non-parametric
(i.e., in general, non-linear) SEMs to study counterfactuals, rather than the more traditional linear
SEM [126]. Criticisms of SCM include arguments over the types of causes being explored with
the approach and discussion of counterexamples for which SCM fails (see, e.g., [116, 135, 136]).
Kleinberg has proposed a logical approach to data causality whereby causes are formulated
using probabilistic computation tree logic (PCTL) [31, 35, 137]. A potential cause $c$ is identified
in this framework as a PCTL formula obeying the following three conditions: $c$ has some finite
probability of occurring, the probability of the effect $e$ is less than some value $p$, and there is a
logical path from $c$ to $e$ that may occur with a probability greater than or equal to $p$ within a
finite amount of time (paraphrased from [137]). The causal significance $\varepsilon_{avg}$ is defined as
$$\varepsilon_{avg}(c, e) = \sum_{x \in X \setminus c} \frac{P(e \mid c \wedge x) - P(e \mid \neg c \wedge x)}{|X \setminus c|} , \quad (2.8)$$
where $X$ is the set of all potential causes of $e$ [35, 137], $P(a)$ is the probability of $a$, $A \wedge B$ is the
intersection of $A$ and $B$, and $A \setminus B$ is the set $A$ minus the set $B$, i.e., $A \setminus B = \{x \mid x \in A \text{ and } x \notin B\}$. The logic used to define the cause and effect relationships is temporal, so there are time
windows associated with each term of the sum in Eq. (2.8). A cause $c$ is an $\varepsilon$-significant cause of $e$
if $|\varepsilon_{avg}| \geq \varepsilon$ for some significance level $\varepsilon$ [137]. A full introduction to this approach can be found
in [31], which also includes discussions of algorithmically finding $\varepsilon$-significant causes in large
data sets and appropriately determining $\varepsilon$ for the given data (i.e., determining what significance
level $\varepsilon$ may lead to useful causal inference).
A consequence of these statements is that the causal variable can help forecast the effect variable
after other data has first been used. …" (emphasis in original) [7]. This statement reflects the motivation
behind Granger causality, which is perhaps the best known and most widely used time
series causality tool. The precedent cause is common to all time series causality tools, in the way
they are being considered in this work. Granger does not provide formal definitions of "information,"
in contrast to the information-theoretic measures, but the second item in the quote may
also be considered common to most time series causality tools. However, Granger's "consequence
of these statements" is the formulation of a tool based on forecasting models, which is unique
among the time series causality tools discussed in this work.
3.1.1 BACKGROUND
Consider a discrete universe with two time series $X = \{X_t \mid t = 1, \ldots, n\}$ and $Y = \{Y_t \mid t = 1, \ldots, n\}$, where $t = n$ is considered the present⁷ time. All knowledge available in the universe at
all times $t \leq n$ will be denoted as $\Omega_n$. Granger introduces the following two axioms, which he
assumes always hold [6] (Axioms A and B, Ibid.),
Axiom 3.1 The past and present may cause the future, but the future cannot cause the past.
where "; are uncorrelated, Gaussian noise terms and a, b , and c are the model coefficients.
If 2 .f2 .X// < 2 .f1 .X//, where 2 .Z/ is the variance of the forecast errors for Z, then Y
“Granger causes” X; i.e., if a better prediction of X can be made with Y rather than without it,
then Y causes X in Granger’s framework [27]. is operational definition corresponds Granger’s
original formulation of causality [24, 25], as Granger points out in [6]. Often, however, vector
autoregressive models are used in practice, in conjunction with statistical tests (see, e.g., [165,
166]).
Consider a vector autoregressive (VAR) model for the system of $X$ and $Y$,
$$\begin{bmatrix} X_t \\ Y_t \end{bmatrix} = \sum_{i=1}^{n} \begin{bmatrix} A_{xx}^i & A_{xy}^i \\ A_{yx}^i & A_{yy}^i \end{bmatrix} \begin{bmatrix} X_{t-i} \\ Y_{t-i} \end{bmatrix} + \begin{bmatrix} \varepsilon_{x,t} \\ \varepsilon_{y,t} \end{bmatrix} , \quad (3.4)$$
where $\varepsilon_{x,y}$ are, again, uncorrelated noise terms. Suppose the coefficient matrix $A$ is found such
that this model is optimal in some sense, e.g., the forecast error is minimized. If this model is
the best fit possible for the system of $X$ and $Y$, then $Y$ Granger causes $X$ only if $A_{xy}^i \neq 0 \ \forall i$,
and $X$ Granger causes $Y$ only if $A_{yx}^i \neq 0 \ \forall i$. A statistical test can then be formulated under the
null hypothesis for non-causality, i.e., $Y$ does not Granger cause $X$ (or vice versa) if $A_{xy}^i = 0$ (or
$A_{yx}^i = 0$). For example, Toda et al. outline the creation of several different statistics, including a
Wald statistic, to test for Granger non-causality [167].
$$X = \{X_t \mid t = 1, 2, \ldots, 10\} = \{0, 0, 1, 0, 0, 1, 0, 0, 1, 0\}$$
$$Y = \{Y_t \mid t = 1, 2, \ldots, 10\} = \{0, 0, 0, 1, 0, 0, 1, 0, 0, 1\} .$$
are fit to the data using well-known algorithms (see [143] for a discussion of such algorithms).
These models can be compared using a null hypothesis that, e.g., $A_{xy,1} = A_{xy,2} = \ldots = A_{xy,n} = 0$ in Eq. (3.6), which can be tested using different test statistics [143, 167]. It has been argued
that the log-likelihood statistic
$$F_{Y \to X} = \ln \frac{|\Sigma'_{xx}|}{|\Sigma_{xx}|} , \quad (3.8)$$
where $\Sigma'_{xx} = \mathrm{cov}(\varepsilon'_{x,t})$ is the covariance of the $X$ model residuals for Eq. (3.7) and $\Sigma_{xx} = \mathrm{cov}(\varepsilon_{x,t})$ is the covariance of the $X$ model residuals for Eq. (3.6), is the most appropriate Granger
causality test statistic because of various formal mathematical properties and an information theoretic
interpretation in analogy with transfer entropy [143, 168].
The interpretation of the magnitude of the log-likelihood statistic in information theoretic
terms (i.e., as information flow) allows the individual statistics to be compared. As Barnett et al.
point out [143], Granger causality statistics are often used only for hypothesis testing. However,
the primary question of bivariate exploratory causal analysis is which time series may be seen
as the stronger driver, and such a question may be difficult to address using traditional Granger
causality testing. For example, consider a scenario in which the null hypothesis, $N_{XY}$, "X does not
cause Y" has been rejected at a significance level of 0.05 and the null hypothesis, $N_{YX}$, "Y does not
cause X" has also been rejected at a significance level of 0.05. What should the causal inference
be? If $N_{XY}$ is rejected at a lower significance level than $N_{YX}$, e.g., $10^{-3}$ and 0.05, respectively,
should the causal inference be Y → X because there is "more confidence" that X does not cause
Y than vice versa (i.e., $N_{XY}$ is rejected at a lower significance level)? If the log-likelihood statistic
is interpreted as an information flow, then the information flow between the time series can be
compared.⁹ For example, if $F_{X \to Y} - F_{Y \to X} > 0$, then the causal inference is X → Y. See [143]
for a discussion of interpreting the Granger causality log-likelihood measure as a value with units
(e.g., "bits"), as is done in this work, rather than just the test statistic of a hypothesis test.
⁹A similar approach will be used with the transfer entropy in Section 3.2.
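The comparison described above can be sketched in a few lines. The snippet below (a minimal illustration with scalar series, a single lag order, and a toy system; the variable names and coupling are assumptions, not from the text) estimates the restricted and unrestricted autoregressions by least squares and compares the log-likelihood statistics of Eq. (3.8) in both directions:

```python
import numpy as np

def granger_loglik(x, y, p=1):
    """Log-likelihood Granger statistic F_{Y->X} at lag order p: the log
    ratio of restricted to unrestricted residual variances (scalar case)."""
    n = len(x)
    x_lags = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])
    y_lags = np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])
    target = x[p:]
    ones = np.ones((n - p, 1))

    def resid_var(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    var_restricted = resid_var(np.hstack([ones, x_lags]))      # x past only
    var_full = resid_var(np.hstack([ones, x_lags, y_lags]))    # plus y past
    return np.log(var_restricted / var_full)

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = np.roll(x, 1) + 0.1 * rng.normal(size=1000)  # y_t ~ x_{t-1}: X drives Y
# F_{X->Y} - F_{Y->X} > 0 suggests the inference X -> Y.
print(granger_loglik(y, x) - granger_loglik(x, y))
```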
Many of the theoretical criticisms of Granger causality can be levied on most of the other
time series causality measures used in this work. For example, Granger emphasizes that if X Granger
causes Y then X should only be considered a potential physical cause [24, 169], which is true of
every measure used in this work. The inability of Granger causality to identify confounding in the
system, the apparent equating of causality with predictability, and the complete dependence on
observed data for causal inference are, likewise, not unique to this tool. These issues are discussed
in Section 1.2.
A pair of perceived limitations of Granger causality, linearity and stationarity [170], has
been addressed by extensions of the operational definitions described in Section 3.1.1. Eq. (3.1)
is not restricted by linearity or stationarity, but the operational definitions originally introduced
by Granger relied on linear models for the time series under consideration. Nonlinear extensions
have been approached with a variety of techniques, including SSR techniques [171], radial basis
functions [172], and assumptions of local linearity [173]. The use of Granger causality with non-stationary
time series has also been explored in the literature (see, e.g., [174–176]). Computational
approaches to Granger causality have also been studied as tools for both causal modeling [177] and
causal inference in general [178, 179]. The Granger causality framework has also been extended
to include spectral (e.g., Fourier transform) methods [180].
3.2.1 BACKGROUND
Consider a random variable $X$ that takes value $X_n$ with probability $p_n$ with $n = 1, 2, \ldots, N_X$.
The probability distribution $P(X = X_n) = p_n \ \forall X_n$ has a discrete Shannon entropy, $H_X$, defined
as
$$H_X = -\sum_{n=1}^{N_X} p_n \log_2 p_n , \quad (3.9)$$
where the logarithm base determines the entropy units, which, in this case, is "bits" [34, 148, 149,
181], and $\log_2 0 := 0$. The Shannon entropy was developed as a measure "of how uncertain we
are of the outcome" $X = X_n$ [181]; i.e., if a transmitter sends a message to a receiver over some
channel, where the message is modeled as the random variable $X$, then how much certainty does
the receiver have in receiving the specific value $X = X_n$? If the transmitter only ever sends $X_n$, i.e.,
$P(X = X_n) = 1$, then the receiver is completely certain that $X = X_n$ and the Shannon entropy
is zero [181].
The error made by incorrectly assuming $P(X = X_n) = q_n$ (rather than $p_n$) is the Kullback
entropy¹⁰
$$K_X = \sum_{n=1}^{N_X} p_n \log_2 \frac{p_n}{q_n} , \quad (3.10)$$
where, again, the base of the logarithm is due to the unit choice [34, 148, 149, 183]. Henceforth
throughout this section, the logarithm base notation will be dropped and all logarithms should
be assumed base 2 (i.e., everything is in units of bits) unless otherwise noted.
Consider a second random variable $Y$ that takes value $Y_m$ with probability $p_m$ with $m = 1, 2, \ldots, N_Y$. The joint entropy is defined as
$$H_{X,Y} = -\sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} p_{n,m} \log p_{n,m} , \quad (3.11)$$
where $p_{n,m}$ is the joint probability $P(X = X_n, Y = Y_m)$. If the two random variables are statistically
independent, then $H_{X,Y} = H_X + H_Y$ [148], which is motivation for the introduction of
the mutual information $I_{X;Y}$ as
$$I_{X;Y} = H_X + H_Y - H_{X,Y} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} p_{n,m} \log \frac{p_{n,m}}{p_n p_m} , \quad (3.12)$$
where the last equality can also be seen as the Kullback entropy due to assuming $P(X = X_n, Y = Y_m) = p_n p_m$ [149]. The mutual information is symmetric with respect to $X$ and $Y$ and is, therefore,
not much use as a time series causality tool. Time lags and conditional entropies, i.e.,
$H_{X|Y} = H_{X,Y} - H_Y$, can be used to make the mutual information non-symmetric, but such
modifications are not easily interpreted in terms of information flow [34] (the assumption is that
a flow of information from one time series to another may be indicative of a driving relationship).
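These definitions are straightforward to evaluate with plug-in (frequency-count) probability estimates. The short sketch below (an illustration; the plug-in estimator and function names are assumptions about implementation, not part of the text) computes the entropies of Eqs. (3.9) and (3.11) for the discrete example series used in this chapter, and shows the symmetry of Eq. (3.12):

```python
import numpy as np
from collections import Counter

def entropy(seq):
    """Plug-in Shannon entropy in bits, Eq. (3.9)."""
    counts = np.array(list(Counter(seq).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I_{X;Y} = H_X + H_Y - H_{X,Y}, Eq. (3.12)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

X = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
# The two values are equal: mutual information is symmetric in X and Y.
print(mutual_information(X, Y), mutual_information(Y, X))
```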
Schreiber’s approach for using entropies to study dynamical structure of data was to focus on
transitional probabilities in the system [34]. Suppose at time t , X.t/ D Xn with some probability
P .X.t/ D Xn.t / / D pn . e additional temporal structure allows for more complicated questions
such as “what is the uncertainty in X.t/ given some value of X.t 1/?” e more basic question
is how to define the transitional probabilities themselves, i.e., what is P .X.t / D Xn jX.t 1/ D
Xn 1 ; X.t 2/ D Xn 2 ; : : : ; X.0/ D X0 / D pnjn 1;n 2;:::;0 ?
Assume $X(t)$ is governed by a Markov process; i.e., $p_{n|n-1,n-2,\ldots,0} = p_{n|n-1}$ [184]. The
basic concept of transfer entropy is to measure the error made by assuming the Markov process
generating $X$ does not depend at all on the second time series $Y$. This error is quantified, in
¹⁰This quantity is also known as the Kullback-Leibler divergence, relative entropy, and discrimination information, among
others. Kullback noted in 1987 that this quantity could be found under nine different names in the literature [182].
analogy to the mutual information expression shown above, with the Kullback entropy as
$$T_{Y \to X} = \sum_{n=1}^{N_X} \sum_{m=1}^{N_Y} p_{n+1,n,m} \log \frac{p_{n+1|n,m}}{p_{n+1|n}} , \quad (3.13)$$
$$X = \{X_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 1, 0, 0, 1, 0, 0, 1, 0\}$$
$$Y = \{Y_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 0, 1, 0, 0, 1, 0, 0, 1\} ,$$
where, for convenience, the notation is $Z_t := Z(t)$. This example system obeys the joint and
conditional probabilities shown in Table 3.1. The causal intuition for this system is that X drives
Y.
The transfer entropy pair for this example can be calculated using Eq. (3.13) and Table 3.1
as
$$T_{Y \to X} = \frac{1}{9} \log \frac{1}{2} + \frac{2}{9} \log(2) + \frac{1}{3} \log(1) + \frac{1}{3} \log \frac{3}{2} \approx 0.31 \quad (3.14)$$
and
$$T_{X \to Y} = \frac{4}{9} \log \frac{7}{4} + \frac{2}{9} \log(1) + \frac{1}{3} \log \frac{7}{3} \approx 0.77 . \quad (3.15)$$
The difference $T_{X \to Y} - T_{Y \to X} = 0.46$ is positive, which implies X → Y as expected.
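The worked example can be verified with a direct plug-in estimate of Eq. (3.13). The sketch below (an illustrative first-order implementation; the counting scheme is an assumption about how the probabilities are estimated) reproduces $T_{Y\to X} \approx 0.31$ and $T_{X\to Y} \approx 0.77$ for the series above:

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target):
    """T_{source->target} of Eq. (3.13), first-order Markov, base-2 logs."""
    # Triples (x_{t+1}, x_t, y_t) where x is the target and y the source.
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_xxy = Counter(triples)
    c_xy = Counter((xt, yt) for _, xt, yt in triples)
    c_xx = Counter((xt1, xt) for xt1, xt, _ in triples)
    c_x = Counter(xt for _, xt, _ in triples)
    te = 0.0
    for (xt1, xt, yt), n_xxy in c_xxy.items():
        p_joint = n_xxy / n                      # p(x_{t+1}, x_t, y_t)
        p_full = n_xxy / c_xy[(xt, yt)]          # p(x_{t+1} | x_t, y_t)
        p_self = c_xx[(xt1, xt)] / c_x[xt]       # p(x_{t+1} | x_t)
        te += p_joint * np.log2(p_full / p_self)
    return te

X = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(transfer_entropy(Y, X))  # T_{Y->X} ~ 0.31
print(transfer_entropy(X, Y))  # T_{X->Y} ~ 0.77
```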
It can be shown that the transfer entropy is equivalent to Granger causality (up to a factor
of 2) if X and Y are jointly multivariate Gaussian (i.e., both variables follow normal [Gaussian]
distributions individually and jointly) [168]. As Barnett et al. point out in their paper, this re-
sult provided “for the first time a unified framework for data-driven causal inference that bridges
Table 3.1: Conditional and joint probabilities of $\{X, Y\}$ in Section 3.2.2 (the shorthand notation
$Z_t := Z(t)$ is used for convenience)
3.3.1 BACKGROUND
CCM uses points with the most similar histories to $X_t$ to estimate $Y_t$, where similar histories are
defined as nearest neighbors on a shadow manifold. The CCM correlation is the squared Pearson
correlation coefficient¹³ between the original time series $Y$ and an estimate of $Y$ made using its
convergent cross-mapping with $X$, which is labeled as $Y|\tilde{X}$:
$$C_{YX} = \rho\left(Y, Y|\tilde{X}\right)^2 . \quad (3.16)$$
Any pair of time series, $X$ and $Y$, will have two CCM correlations, $C_{YX}$ and $C_{XY}$, which are
compared to determine the time series causality. Sugihara et al. [30] define a difference of CCM
¹²For example, CCM has been used to draw conclusions regarding the “controversial sardine-anchovy-temperature” problem
[30], confirm predictions of climate effects on sardines [198], compare the driving effects of precipitation, temperature, and
solar radiation on the atmospheric CO2 growth rate [199], and to quantify cognitive control in developmental psychology
[200]. The technique has also been presented as a useful tool in studying the causality of respiratory systems in insects [201].
¹³This definition differs slightly from the definition in [30], which uses the un-squared Pearson's correlation coefficient.
correlations
$$\Delta = C_{YX} - C_{XY} \quad (3.17)$$
and use the sign of $\Delta$ (along with arguments of convergence¹⁴) to determine the time series causality
between $X$ and $Y$. If $X$ can be estimated using $Y$ better than $Y$ can be estimated using $X$ (e.g.,
if $\Delta < 0$), then $X$ drives $Y$.
An algorithm to find the CCM correlations may be written in terms of five steps:
1. Create the shadow manifold for $X$, called $\tilde{X}$. Given an embedding dimension $E$, the shadow
manifold of $X$, labeled $\tilde{X}$, is created by associating an $E$-dimensional vector (also called a
delay vector) to each point $X_t$ in $X$, i.e., $\tilde{X}_t = \left(X_t, X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-(E-1)\tau}\right)$. The first
such vector is created at $t = 1 + (E-1)\tau$ and the last is at $t = L$, where $L$ is the number
of points in the time series (also called the library length).
2. Find the nearest neighbors to a point in the shadow manifold at time $t$, $\tilde{X}_t$. The minimum
number of points required for a bounding simplex in an $E$-dimensional space is $E + 1$ [195,
196]. Thus, the set of $E + 1$ nearest neighbors must be found for each point on the shadow
manifold, $\tilde{X}_t$. For each $\tilde{X}_t$, the nearest neighbor search results in a set of distances that are
ordered by closeness $\{d_1, d_2, \ldots, d_{E+1}\}$ and an associated set of times $\{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_{E+1}\}$.
The distances from $\tilde{X}_t$ are
$$d_i = D\left(\tilde{X}_t, \tilde{X}_{\hat{t}_i}\right) , \quad (3.18)$$
where $D(\tilde{a}, \tilde{b})$ is the Euclidean distance between vectors $\tilde{a}$ and $\tilde{b}$.
3. Create weights using the nearest neighbors. Each of the $E + 1$ nearest neighbors¹⁵ is used
to compute an associated weight. The weights are defined as
$$w_i = \frac{u_i}{N} , \quad (3.19)$$
where $u_i = e^{-d_i/d_1}$ and the normalization factor is $N = \sum_{j=1}^{E+1} u_j$.
4. Estimate $Y$ using the weights (this estimate is called $Y|\tilde{X}$). A point $Y_t$ in $Y$ is estimated
using the weights calculated above. This estimate is
$$Y_t|\tilde{X} = \sum_{i=1}^{E+1} w_i Y_{\hat{t}_i} . \quad (3.20)$$
5. Compute the CCM correlation, $C_{YX}$, between $Y$ and the estimate $Y|\tilde{X}$ using Eq. (3.16).
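A compact implementation of these steps might look like the following sketch (the toy signals, the exclusion of only the self-distance, and the handling of ties are simplifying assumptions, not prescriptions from the text):

```python
import numpy as np

def shadow_manifold(x, E=3, tau=1):
    """Step 1: delay vectors (x_t, x_{t-tau}, ..., x_{t-(E-1)tau})."""
    t0 = (E - 1) * tau
    return np.array([[x[t - i * tau] for i in range(E)]
                     for t in range(t0, len(x))])

def cross_map_estimate(x, y, E=3, tau=1):
    """Steps 2-4: estimate y from the shadow manifold of x (Y|X~)."""
    Mx = shadow_manifold(x, E, tau)
    y_aligned = np.asarray(y[(E - 1) * tau:], dtype=float)
    y_hat = np.empty_like(y_aligned)
    for t, v in enumerate(Mx):
        d = np.linalg.norm(Mx - v, axis=1)
        d[t] = np.inf                        # exclude the point itself
        nbrs = np.argsort(d)[:E + 1]         # step 2: E+1 nearest neighbors
        u = np.exp(-d[nbrs] / d[nbrs[0]])    # step 3: weights, Eq. (3.19)
        w = u / u.sum()
        y_hat[t] = w @ y_aligned[nbrs]       # step 4: weighted estimate
    return y_aligned, y_hat

# Step 5: the CCM correlation C_YX of Eq. (3.16).
x = np.sin(np.linspace(0, 40, 500))          # toy signals for illustration
y = np.roll(x, 1)
y_true, y_est = cross_map_estimate(x, y)
print(np.corrcoef(y_true, y_est)[0, 1] ** 2)
```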
$$X = \{X_t \mid t = 1, 2, \ldots, 10\} = \{0, 0, 1, 0, 0, 1, 0, 0, 1, 0\}$$
$$Y = \{Y_t \mid t = 1, 2, \ldots, 10\} = \{0, 0, 0, 1, 0, 0, 1, 0, 0, 1\} .$$
This noiseless impulse-response system obeys $Y_t = X_{t-1}$, leading to the intuitive causal relationship
X drives Y. The PAI shadow manifolds for these two signals can be written down using an
embedding dimension $E = 3$ and a delay time step $\tau = 1$ as
$$\tilde{X} = \left\{\tilde{X}_t \mid t = 3, 4, \ldots, 10\right\} \quad (3.22)$$
$$= \left\{\{1,0,0,0\},\ \{0,1,0,1\},\ \{0,0,1,0\},\ \{1,0,0,0\},\ \{0,1,0,1\},\ \{0,0,1,0\},\ \{1,0,0,0\},\ \{0,1,0,1\}\right\} \quad (3.23)$$
and
$$\tilde{Y} = \left\{\tilde{Y}_t \mid t = 3, 4, \ldots, 10\right\} \quad (3.24)$$
$$= \left\{\{0,0,0,1\},\ \{1,0,0,0\},\ \{0,1,0,0\},\ \{0,0,1,1\},\ \{1,0,0,0\},\ \{0,1,0,0\},\ \{0,0,1,1\},\ \{1,0,0,0\}\right\} . \quad (3.25)$$
Let the Euclidean distances between a PAI shadow manifold vector and every other vector in the
manifold be
$$d_t^Z = \left\{D\left(\tilde{Z}_m, \tilde{Z}_t\right) \mid m \in [3, 10]\right\} \quad (3.26)$$
$$= \left\{\sqrt{(a_m - a_t)^2 + (b_m - b_t)^2 + (c_m - c_t)^2 + (d_m - d_t)^2} \mid m \in [3, 10]\right\} , \quad (3.27)$$
$$\left\{d_t^X\right\} = \left\{
\begin{aligned}
&\left\{0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}\right\},\\
&\left\{\sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0\right\},\\
&\left\{\sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}\right\},\\
&\left\{0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}\right\},\\
&\left\{\sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0\right\},\\
&\left\{\sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}\right\},\\
&\left\{0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}\right\},\\
&\left\{\sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0\right\}
\end{aligned}
\right\} \quad (3.28)$$
and
$$\left\{d_t^Y\right\} = \left\{
\begin{aligned}
&\left\{0, \sqrt{2}, \sqrt{2}, 1, \sqrt{2}, \sqrt{2}, 1, \sqrt{2}\right\},\\
&\left\{\sqrt{2}, 0, \sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}, 0\right\},\\
&\left\{\sqrt{2}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}\right\},\\
&\left\{1, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}\right\},\\
&\left\{\sqrt{2}, 0, \sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}, 0\right\},\\
&\left\{\sqrt{2}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}, 0, \sqrt{3}, \sqrt{2}\right\},\\
&\left\{1, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}, \sqrt{3}, 0, \sqrt{3}\right\},\\
&\left\{\sqrt{2}, 0, \sqrt{2}, \sqrt{3}, 0, \sqrt{2}, \sqrt{3}, 0\right\}
\end{aligned}
\right\} . \quad (3.29)$$
Let the pair $\left(d_{t(j)}^Z, j\right)$ denote, for each manifold vector $\tilde{Z}_t$, the distance to $\tilde{Z}_j$ along with the time $t = j$.
The ordered distances, ignoring the self-distances, are then
$$\left\{d_t^X\right\}_o = \left\{
\begin{aligned}
&\left\{(0, 6), (0, 9), (\sqrt{2}, 8), (\sqrt{2}, 5), (\sqrt{3}, 4), (\sqrt{3}, 7), (\sqrt{3}, 10)\right\},\\
&\left\{(0, 7), (0, 10), (\sqrt{3}, 3), (\sqrt{3}, 5), (\sqrt{3}, 6), (\sqrt{3}, 8), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 8), (\sqrt{2}, 3), (\sqrt{2}, 6), (\sqrt{2}, 9), (\sqrt{3}, 4), (\sqrt{3}, 7), (\sqrt{3}, 10)\right\},\\
&\left\{(0, 3), (0, 9), (\sqrt{2}, 5), (\sqrt{2}, 8), (\sqrt{3}, 4), (\sqrt{3}, 7), (\sqrt{3}, 10)\right\},\\
&\left\{(0, 4), (0, 10), (\sqrt{3}, 3), (\sqrt{3}, 5), (\sqrt{3}, 6), (\sqrt{3}, 8), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 5), (\sqrt{2}, 3), (\sqrt{2}, 6), (\sqrt{2}, 9), (\sqrt{3}, 4), (\sqrt{3}, 7), (\sqrt{3}, 10)\right\},\\
&\left\{(0, 3), (0, 6), (\sqrt{2}, 5), (\sqrt{2}, 8), (\sqrt{3}, 4), (\sqrt{3}, 7), (\sqrt{3}, 10)\right\},\\
&\left\{(0, 4), (0, 7), (\sqrt{3}, 3), (\sqrt{3}, 5), (\sqrt{3}, 6), (\sqrt{3}, 8), (\sqrt{3}, 9)\right\}
\end{aligned}
\right\} \quad (3.30)$$
and
$$\left\{d_t^Y\right\}_o = \left\{
\begin{aligned}
&\left\{(\sqrt{2}, 4), (\sqrt{2}, 5), (\sqrt{2}, 7), (\sqrt{2}, 8), (\sqrt{2}, 10), (1, 6), (1, 9)\right\},\\
&\left\{(0, 7), (0, 10), (\sqrt{2}, 3), (\sqrt{2}, 5), (\sqrt{2}, 8), (\sqrt{3}, 6), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 8), (\sqrt{2}, 3), (\sqrt{2}, 4), (\sqrt{2}, 7), (\sqrt{2}, 10), (\sqrt{3}, 6), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 9), (\sqrt{3}, 4), (\sqrt{3}, 5), (\sqrt{3}, 7), (\sqrt{3}, 8), (\sqrt{3}, 10), (1, 3)\right\},\\
&\left\{(0, 4), (0, 10), (\sqrt{2}, 3), (\sqrt{2}, 5), (\sqrt{2}, 8), (\sqrt{3}, 6), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 5), (\sqrt{2}, 3), (\sqrt{2}, 4), (\sqrt{2}, 7), (\sqrt{2}, 10), (\sqrt{3}, 6), (\sqrt{3}, 9)\right\},\\
&\left\{(0, 6), (\sqrt{3}, 4), (\sqrt{3}, 5), (\sqrt{3}, 7), (\sqrt{3}, 8), (\sqrt{3}, 10), (1, 3)\right\},\\
&\left\{(0, 4), (0, 7), (\sqrt{2}, 3), (\sqrt{2}, 5), (\sqrt{2}, 8), (\sqrt{3}, 6), (\sqrt{3}, 9)\right\}
\end{aligned}
\right\} . \quad (3.31)$$
These ordered distances are then subset to the $E + 1 = 4$ closest non-zero¹⁶ distances for each
vector in the manifold, which are then used to calculate the weights; i.e.,
$$\left\{w_t^X\right\}_o = \left\{
\begin{aligned}
&\left\{(0.278, 8), (0.278, 5), (0.222, 4), (0.222, 7)\right\},\\
&\left\{(0.250, 3), (0.250, 5), (0.250, 6), (0.250, 8)\right\},\\
&\left\{(0.263, 3), (0.263, 6), (0.263, 9), (0.210, 4)\right\},\\
&\left\{(0.278, 5), (0.278, 8), (0.222, 4), (0.222, 7)\right\},\\
&\left\{(0.250, 3), (0.250, 5), (0.250, 6), (0.250, 8)\right\},\\
&\left\{(0.263, 3), (0.263, 6), (0.263, 9), (0.210, 4)\right\},\\
&\left\{(0.278, 5), (0.278, 8), (0.222, 4), (0.222, 7)\right\},\\
&\left\{(0.250, 3), (0.250, 5), (0.250, 6), (0.250, 8)\right\}
\end{aligned}
\right\} \quad (3.32)$$
and
$$\left\{w_t^Y\right\}_o = \left\{
\begin{aligned}
&\left\{(0.250, 4), (0.250, 5), (0.250, 7), (0.250, 8)\right\},\\
&\left\{(0.263, 3), (0.263, 5), (0.263, 8), (0.210, 6)\right\},\\
&\left\{(0.250, 3), (0.250, 4), (0.250, 7), (0.250, 10)\right\},\\
&\left\{(0.250, 4), (0.250, 5), (0.250, 7), (0.250, 8)\right\},\\
&\left\{(0.263, 3), (0.263, 5), (0.263, 8), (0.210, 6)\right\},\\
&\left\{(0.250, 3), (0.250, 4), (0.250, 7), (0.250, 10)\right\},\\
&\left\{(0.250, 4), (0.250, 5), (0.250, 7), (0.250, 8)\right\},\\
&\left\{(0.263, 3), (0.263, 5), (0.263, 8), (0.210, 6)\right\}
\end{aligned}
\right\} , \quad (3.33)$$
where $\left\{w_t^Z\right\}_o$ is the set of $E + 1$ pairs $\left(w_{t(j)}^Z, j\right)$ consisting of normalized¹⁷ weights $w_{t(j)}$ calculated
from the Euclidean distances between manifold vectors $\tilde{Z}_t$ and $\tilde{Z}_j$ as described in Eq. (3.19).
These weights can be used to estimate the original signals as
$$Y|\tilde{X} = \left\{Y_t|\tilde{X} \mid t = 3, 4, \ldots, 10\right\} \quad (3.34)$$
$$= \left\{0.444, 0, 0.210, 0.444, 0, 0.210, 0.444, 0\right\} \quad (3.35)$$
and
$$X|\tilde{Y} = \left\{X_t|\tilde{Y} \mid t = 3, 4, \ldots, 10\right\} \quad (3.36)$$
$$= \left\{0, 0.473, 0.250, 0, 0.473, 0.250, 0, 0.473\right\} . \quad (3.37)$$
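For reference, the estimates above can be reproduced programmatically. The sketch below (an illustration; function names are assumptions) builds the four-component PAI delay vectors of Eqs. (3.22)–(3.25) and applies the weighting of Eq. (3.19). Note that where the ordering printed in Eq. (3.31) differs from a plain ascending sort of the distances (e.g., its first row), a plain nearest-neighbor sort will not reproduce the corresponding entries of Eq. (3.37) exactly:

```python
import numpy as np

def pai_manifold(a, b):
    """PAI shadow manifold with E = 3, tau = 1, per Eqs. (3.22)-(3.25):
    the delay vector of a augmented with the simultaneous value of b."""
    return np.array([[a[t], a[t - 1], a[t - 2], b[t]]
                     for t in range(2, len(a))])

def cross_map(manifold, target, n_neighbors=4):
    """Weighted estimate of `target` from `manifold` using the E+1 = 4
    closest non-zero-distance vectors and the weights of Eq. (3.19)."""
    est = []
    for t, v in enumerate(manifold):
        d = np.linalg.norm(manifold - v, axis=1)
        nbrs = [j for j in np.argsort(d, kind="stable")
                if d[j] > 0][:n_neighbors]
        u = np.exp(-d[nbrs] / d[nbrs[0]])
        w = u / u.sum()
        est.append(round(float(w @ target[nbrs]), 3))
    return est

X = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)
Y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1], dtype=float)
print(cross_map(pai_manifold(X, Y), Y[2:]))  # Y|X~, cf. Eq. (3.35)
print(cross_map(pai_manifold(Y, X), X[2:]))  # X|Y~, cf. Eq. (3.37) (see note)
```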
¹⁸As Ma et al. state, "In fact, due to the computational way of state space reconstruction using delayed embedding technique,
sufficiently long time series are required to guarantee that the nearest neighbors on the reconstructed attractor converge to
the true neighborhood. … Thus, detecting causality based on nearest neighbors and mutual neighbors essentially requires
sufficiently long time series data to make reliable causality detection." PAI, like CCM, relies on finding nearest neighbors in
the shadow manifold, so this concern also applies to PAI.
¹⁹This technique relies on the proof that any smooth map can be approximated by a neural network [205].
²⁰This technique relies on a hypothesis-testing framework to explore continuity in state space reconstructions developed by
Pecora et al. [207].
3.4 CORRELATION CAUSALITY
Lagged cross-correlation, also known as cross-lagged correlation, has been a popular time series
causality tool in psychology [19, 208] and general signal analysis for many years [163]. Some
authors consider it to be the first time series causality tool [208], with origins that can be traced
back to 1901 [209]. The shortcomings of lagged cross-correlation have been discussed at length
in the literature [19, 163, 210]. It is still, however, among the most popular time series causality
tools because of its simplicity [19, 163].
3.4.1 BACKGROUND
Consider two time series $X = \{X_t \mid t = 0, 1, 2, \ldots, N\}$ and $Y = \{Y_t \mid t = 0, 1, 2, \ldots, N\}$. The
lagged cross-correlation is defined as the normalized cross-covariance [139]
$$\rho_l^{xy} = \frac{E\left[(X_t - \mu_X)(Y_{t-l} - \mu_Y)\right]}{\sqrt{\sigma_X^2 \sigma_Y^2}} , \quad (3.40)$$
where $l$ is the lag, $\sigma_Z^2$ is the variance of $Z$, $\mu_Z$ is the mean of $Z$, $E\left[(X_t - \mu_X)(Y_{t-l} - \mu_Y)\right]$ is the
cross-covariance, and $E[z]$ is the expectation value of $z$ [139, 163, 211]. Causal inference usually
relies on using differences of these cross-correlations [19, 212, 213]; i.e.,
$$\Delta_l = \left|\rho_l^{xy}\right| - \left|\rho_l^{yx}\right| ,$$
where $|z|$ is the absolute value of $z$. If $\Delta_l$ is positive, then the correlation between $\{X_t \mid t = l, l+1, l+2, \ldots, N\}$ and $\{Y_t \mid t = 0, 1, 2, \ldots, N-l\}$ is higher (i.e., further from zero) than the
correlation between $\{X_t \mid t = 0, 1, 2, \ldots, N-l\}$ and $\{Y_t \mid t = l, l+1, l+2, \ldots, N\}$. The causal
interpretation is as follows: If $\Delta_l > 0$, then Y → X at lag $l$, and if $\Delta_l < 0$, then X → Y at lag $l$.
If $\Delta_l = 0$, then there is no causal inference at lag $l$. This interpretation depends on the definition
of $l$ as a lag, i.e., $l \geq 0$. If $l$ is allowed to be a lead, i.e., $l < 0$, then these causal inference rules
need to be altered.
$$X = \{X_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 1, 0, 0, 1, 0, 0, 1, 0\}$$
$$Y = \{Y_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 0, 1, 0, 0, 1, 0, 0, 1\} .$$
The structure of this system, i.e., $Y_t = X_{t-1}$, implies X drives Y. Table 3.2 can be calculated²¹ by
considering $l \in [0, 6]$.
Table 3.2: Lagged cross-correlation calculations for the example time series pair $\{X, Y\}$ in Section 3.4.2

l    ρ_l^{xy}    ρ_l^{yx}    Δ_l
0    0.43        0.43         0.0
1    0.38        1.0         -0.62
2    0.75        0.45         0.30
3    0.40        0.55        -0.15
4    0.32        1.0         -0.68
5    0.61        0.41         0.20
6    0.33        0.58        -0.24
One of the main practical issues with lagged cross-correlations is the decision of which
lag to consider for causal inference. Table 3.2 shows $\Delta_l$ is nonzero for most $l$ but does not suggest
the same causal inference $\forall\ l > 0$. One strategy is to determine the relevant lag as the lag
corresponding to the maximum difference, i.e., use $\Delta_m$ for causal inference where
$$m = \arg\max_l \left|\Delta_l\right| .$$
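Table 3.2 can be reproduced with a few lines of code. In the sketch below (an illustration; the sample Pearson estimator over the overlapping segments is an assumed convention), the $\Delta_l$ column matches the table, while the signs of the individual correlations depend on the convention used:

```python
import numpy as np

X = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)
Y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1], dtype=float)

def lagged_corr(x, y, l):
    """Pearson correlation of x_t with y_{t-l}, i.e., rho_l^{xy}."""
    if l == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[l:], y[:-l])[0, 1]

for l in range(7):
    rho_xy = lagged_corr(X, Y, l)            # x_t vs y_{t-l}
    rho_yx = lagged_corr(Y, X, l)            # y_t vs x_{t-l}
    delta = abs(rho_xy) - abs(rho_yx)        # Delta_l
    print(l, round(rho_xy, 2), round(rho_yx, 2), round(delta, 2))
```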
3.5.1 BACKGROUND
The causal penchant $\rho_{EC} \in [-1, 1]$ is defined as
$$\rho_{EC} := P(E \mid C) - P(E \mid \bar{C}) . \quad (3.43)$$
The motivation for this expression is in the straightforward interpretation of $\rho_{EC}$ as a causal indicator;
i.e., if $C$ drives $E$, then $\rho_{EC} > 0$, and if $\rho_{EC} \leq 0$, then the direction of causal influence is
undetermined.
One of the main ideas of the penchant definition is to circumvent philosophical issues
regarding $P(E \mid \bar{C})$ as being unobservable (see [15]) by using an expression for Eq. (3.43) that
does not have this term. Eq. (3.43) can be rewritten using Bayes' theorem
$$P(E \mid C) = P(C \mid E) \frac{P(E)}{P(C)} \quad (3.44)$$
and the definitions of probability complements
$$P(\bar{C}) = 1 - P(C) . \quad (3.45)$$
This expression gives a penchant that requires only a single conditional probability estimate:
$$\rho_{EC} = P(E \mid C) \left[1 + \frac{P(C)}{1 - P(C)}\right] - \frac{P(E)}{1 - P(C)} . \quad (3.47)$$
The penchant is not defined if $P(C)$ or $P(\bar{C})$ are zero (because the conditionals in
Eq. (3.43) would be undefined). Thus, the penchant is not defined if $P(C) = 0$ or if $P(C) = 1$.
The former condition corresponds to an inability to determine causal influence between two time
series when a cause does not appear in one of the series; the latter condition is interpreted as
an inability to determine causal influence between two time series if one is constant. The use of
Bayes' theorem in the derivation of Eq. (3.47) implies that the penchant is not defined if $P(E)$
or $P(\bar{E})$ are zero. The method given in this work uses no a priori assignment of "cause" or "effect"
to a given time series pair when using penchants for causal inference. So, operationally, the
constraints on $P(C)$ and $P(E)$ only mean that the penchant is undefined between pairs of time
series where one series is constant.
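Eq. (3.47) translates directly into code. A minimal sketch (the function name and arguments are illustrative) is:

```python
def penchant(p_e_given_c, p_c, p_e):
    """Causal penchant of Eq. (3.47); requires P(C) != 0 and P(C) != 1."""
    return p_e_given_c * (1 + p_c / (1 - p_c)) - p_e / (1 - p_c)

# Values from the worked example later in this section:
# P(y_t=1 | x_{t-1}=1) = 1 and P(C) = P(E) = 3/9, giving a penchant of 1.
print(penchant(1.0, 3 / 9, 3 / 9))
```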
Consider the assignment of $X$ as the cause, $C$, and $Y$ as the effect, $E$. If $\rho_{EC} > 0$, then
the probability that $X$ drives $Y$ is higher than the probability that it does not, i.e., $X \stackrel{pen}{\to} Y$. It is
possible, however, that the penchant could also be positive when $X$ is assumed as the effect and
$Y$ is assumed as the cause, i.e., $Y \stackrel{pen}{\to} X$. The leaning addresses this apparent confusion via
$$\lambda_{EC} := \rho_{EC} - \rho_{CE} ,$$
for which $\lambda_{EC} \in [-2, 2]$. A positive leaning implies the assumed cause $C$ drives the assumed effect
$E$ more than the assumed effect drives the assumed cause, a negative leaning implies the assumed effect $E$
drives the assumed cause $C$ more than the assumed cause drives the assumed effect, and a zero
leaning yields no causal inference.
The possible outcomes are notated as
$$\lambda_{EC} > 0,\ \{C, E\} = \{X, Y\} \Rightarrow X \stackrel{lean}{\to} Y$$
$$\lambda_{EC} < 0,\ \{C, E\} = \{X, Y\} \Rightarrow Y \stackrel{lean}{\to} X$$
$$\lambda_{EC} = 0,\ \{C, E\} = \{X, Y\} \Rightarrow \text{no conclusion}$$
with $\{C, E\} = \{A, B\}$ meaning $A$ is the assumed cause and $B$ is the assumed effect.
If $\lambda_{EC} > 0$ with $X$ as the assumed cause and $Y$ as the assumed effect, then $X$ has a larger
penchant to drive $Y$ than $Y$ does to drive $X$. That is, $\lambda_{EC} > 0$ implies that the difference between
the probability that $X$ drives $Y$ and the probability that it does not is higher than the difference
between the probability that $Y$ drives $X$ and the probability that it does not.
The leaning is a function of four probabilities, $P(C)$, $P(E)$, $P(C \mid E)$, and $P(E \mid C)$. The
usefulness of the leaning for causal inference will depend on an effective method for estimating
these probabilities from time series and a more specific definition of the cause-effect assignment
within the time series pair. An operational definition of $C$ and $E$ will need to be drawn directly
from the time series data if the leaning is to be useful for causal inference. Such assignments,
however, may be difficult to develop and may be considered arbitrary without some underlying
theoretical support. For example, if the cause is $x_{t-1}$ and the effect is $y_t$, then it may be considered
unreasonable to provide a causal interpretation of the leaning without theoretical support that $X$
may be expected to drive $Y$ on the time scale of $\Delta t = 1$. This issue is, however, precisely one of the
reasons for divorcing the causal inference proposed in this work (i.e., exploratory causal inference)
from traditional ideas of causality. Statistical tools are associational, and cannot be given formal
causal interpretation without the use of assumptions and outside theories (see [5] for an in-depth
discussion of these ideas). In practice, many different potential cause-effect assignments may be
used to calculate different leanings, which may then be compared as part of the causal analysis
of the data. It can be noted that $\lambda_{AB} := \rho_{AB} - \rho_{BA} \Rightarrow \lambda_{AB} = -(\rho_{BA} - \rho_{AB}) =: -\lambda_{BA}$. Thus, the
causal inference is independent of which time series is initially assumed to be the cause (or effect).
$$X = \{x_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 1, 0, 0, 1, 0, 0, 1, 0\}$$
$$Y = \{y_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 0, 1, 0, 0, 1, 0, 0, 1\} .$$
Because $y_t = x_{t-1}$, one may conclude that $X$ drives $Y$. However, to show this result using a
leaning calculation requires first a calculation using the cause-effect assignment $\{C, E\} = \{X, Y\}$.
For consistency with the intuitive definition of causality, we require that a cause must precede an
effect. It follows that a natural assignment may be $\{C, E\} = \{x_{t-l}, y_t\}$ for $1 \leq l < t \leq 9$. This
cause-effect assignment will be referred to as the $l$-standard assignment.
The cause-effect assignment is an assignment of some countable feature of the data in one
time series as the "cause" and another in the other time series as the "effect." For example, in
the $l$-standard cause-effect assignment, the cause is the lag $l$ time step in one time series and
the effect is the current time step in the other. The leaning compares the symmetric application
of these cause-effect definitions to the time series pair. So, for the above example of $\{C, E\} = \{x_{t-l}, y_t\}$, the first penchant will be calculated using $\{C, E\} = \{x_{t-l}, y_t\}$ and the second will be
calculated using $\{C, E\} = \{y_{t-l}, x_t\}$. The second penchant is not the direct interchange of $C$,
$E$ from the first penchant because such an interchange would violate the assumption that a cause
must precede an effect. For example, if the first penchant in the leaning calculation is calculated
using $\{C, E\} = \{x_{t-l}, y_t\}$, then the second penchant is not calculated using $\{C, E\} = \{y_t, x_{t-l}\}$
because the definition of the effect, $x_{t-l}$, precedes the definition of the cause, $y_t$.
Given $(X, Y)$, one possible penchant that can be defined using the 1-standard assignment
is
$$\rho_{y_t=1, x_{t-1}=1} = \alpha \left[1 + \frac{P(x_{t-1} = 1)}{1 - P(x_{t-1} = 1)}\right] - \frac{P(y_t = 1)}{1 - P(x_{t-1} = 1)} ,$$
with $\alpha = P(y_t = 1 \mid x_{t-1} = 1)$. Another penchant defined using this assignment is $\rho_{y_t=0, x_{t-1}=0}$
with $\alpha = P(y_t = 0 \mid x_{t-1} = 0)$. These two penchants are called observed penchants because they
correspond to conditions that were found in the measurements. Two other penchants have
$\alpha = P(y_t = 0 \mid x_{t-1} = 1)$ and $\alpha = P(y_t = 1 \mid x_{t-1} = 0)$, and they are associated with unobserved
conditions. These unobserved penchants in the leaning calculation would involve a comparison
of how unlikely postulated causes are to cause given effects. Such comparisons are not as
easily interpreted in the intuitive framework of causality, and as such, are often not used as part
of leaning calculations.
The probabilities in the penchant calculations can be estimated from time series using counts, e.g.,
$$P(y_t = 1 \mid x_{t-1} = 1) = \frac{n_{EC}}{n_C} = \frac{3}{3} = 1,$$
where $n_{EC}$ is the number of times $y_t = 1$ and $x_{t-1} = 1$ appear together in $(X, Y)$, and $n_C$ is the number of times the assumed cause, $x_{t-1} = 1$, has appeared in $(X, Y)$.
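These counts are straightforward to reproduce in MATLAB. The following is a minimal sketch of the estimate above; the variable names (C, E, nC, nEC) are illustrative and not taken from the author's published scripts.

    % Count-based estimate of P(y_t = 1 | x_{t-1} = 1) for the example pair.
    X = [0 0 1 0 0 1 0 0 1 0];
    Y = [0 0 0 1 0 0 1 0 0 1];
    C = X(1:end-1);                 % assumed causes, x_{t-1}
    E = Y(2:end);                   % assumed effects, y_t
    nC  = sum(C == 1);              % appearances of the assumed cause
    nEC = sum(E == 1 & C == 1);     % joint appearances of effect and cause
    PEgC = nEC / nC                 % = 3/3 = 1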
Estimating the other two probabilities in this penchant calculation using frequency counts from $(X, Y)$ requires accounting for the assumption that the cause must precede the effect by shifting X and Y into $\tilde X$ and $\tilde Y$ such that, for any given t, $\tilde x_t$ precedes $\tilde y_t$. For this example, the shifted sequences are
$$\tilde X = \{0, 0, 1, 0, 0, 1, 0, 0, 1\}$$
$$\tilde Y = \{0, 0, 1, 0, 0, 1, 0, 0, 1\},$$
which are both shorter than their counterparts above by a single value because the penchants are being calculated using the 1-standard cause-effect assignment. It follows that $\tilde x_t = x_{t-1}$ and $\tilde y_t = y_t$. The probabilities are then
$$P(y_t = 1) = \frac{n_E}{L} = \frac{3}{9} \quad (3.49)$$
and
$$P(x_{t-1} = 1) = \frac{n_C}{L} = \frac{3}{9}, \quad (3.50)$$
where $n_C$ is the number of times $\tilde x_t = 1$, $n_E$ is the number of times $\tilde y_t = 1$, and L is the ("library") length of $\tilde X$ and $\tilde Y$ (which are assumed to be the same length).
The two observed penchants in this example under the assumption that X causes Y (with $l = 1$) are
$$\rho_{y_t=1,\,x_{t-1}=1} = 1 \quad (3.51)$$
and
$$\rho_{y_t=0,\,x_{t-1}=0} = 1.$$
The three observed penchants under the assumption that Y causes X (with $l = 1$) are
$$\rho_{x_t=1,\,y_{t-1}=0} = \frac{3}{7},$$
$$\rho_{x_t=0,\,y_{t-1}=1} = \frac{3}{7},$$
and
$$\rho_{x_t=0,\,y_{t-1}=0} = -\frac{3}{7}.$$
The mean observed penchant is the algebraic mean of the observed penchants. For X causes Y, it is
$$\langle\rho_{y_t, x_{t-1}}\rangle = \frac{1}{2}\left(\rho_{y_t=1,\,x_{t-1}=1} + \rho_{y_t=0,\,x_{t-1}=0}\right) = 1,$$
and, similarly, for Y causes X it is $\langle\rho_{x_t, y_{t-1}}\rangle = \frac{1}{3}\left(\frac{3}{7} + \frac{3}{7} - \frac{3}{7}\right) = \frac{1}{7}$. The mean observed leaning is then
$$\langle\lambda_{y_t, x_{t-1}}\rangle = \langle\rho_{y_t, x_{t-1}}\rangle - \langle\rho_{x_t, y_{t-1}}\rangle \quad (3.52)$$
$$= \frac{6}{7}. \quad (3.53)$$
The weighted mean observed penchant is defined similarly to the mean observed penchant, but each penchant is weighted by the number of times it appears in the data; e.g.,
$$\langle\rho_{y_t, x_{t-1}}\rangle_w = \frac{1}{L}\left(n_{y_t=1, x_{t-1}=1}\,\rho_{y_t=1,\,x_{t-1}=1} + n_{y_t=0, x_{t-1}=0}\,\rho_{y_t=0,\,x_{t-1}=0}\right) = 1$$
and
$$\langle\rho_{x_t, y_{t-1}}\rangle_w = \frac{1}{L}\left(n_{x_t=1, y_{t-1}=0}\,\rho_{x_t=1,\,y_{t-1}=0} + n_{x_t=0, y_{t-1}=1}\,\rho_{x_t=0,\,y_{t-1}=1} + n_{x_t=0, y_{t-1}=0}\,\rho_{x_t=0,\,y_{t-1}=0}\right) = \frac{3}{63},$$
where $n_{a,b}$ is the number of times the assumed cause a appears with the assumed effect b and L is the library length of $\tilde X$ (i.e., $L = N - l$ where N is the library length of X and l is the lag used in the $l$-standard cause-effect assignment).

The weighted mean observed leaning follows naturally as
$$\langle\lambda_{y_t, x_{t-1}}\rangle_w = \langle\rho_{y_t, x_{t-1}}\rangle_w - \langle\rho_{x_t, y_{t-1}}\rangle_w = \frac{60}{63}.$$
For this example, $\langle\lambda_{y_t, x_{t-1}}\rangle_w > 0 \Rightarrow X \to Y$, as expected.
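The full calculation can be reproduced end-to-end. The following MATLAB sketch computes the weighted mean observed leaning of 60/63 from the definitions above; the helper name wmop is illustrative, and the sketch assumes the 1-standard assignment on noiseless data (no tolerance domains).

    % Weighted mean observed leaning for the example pair (1-standard assignment).
    X = [0 0 1 0 0 1 0 0 1 0];
    Y = [0 0 0 1 0 0 1 0 0 1];
    lean = wmop(X, Y) - wmop(Y, X)   % = 60/63 > 0, so X -> Y

    function rhoW = wmop(A, B)
      % Weighted mean observed penchant with {C,E} = {a_{t-1}, b_t}.
      C = A(1:end-1);  E = B(2:end);  L = numel(C);
      rhoW = 0;
      for pair = unique([C(:) E(:)], 'rows')'   % each observed (cause,effect) pair
        c = pair(1);  e = pair(2);
        n  = sum(C == c & E == e);              % weight: joint occurrences
        PC = sum(C == c)/L;  PE = sum(E == e)/L;
        kappa = n / sum(C == c);                % P(e | c)
        rho = kappa*(1 + PC/(1 - PC)) - PE/(1 - PC);  % the penchant
        rhoW = rhoW + n*rho/L;
      end
    end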
Conceptually, the weighted mean observed penchant is preferred to the mean observed penchant because it accounts for the frequency of observed cause-effect pairs within the data, which is assumed to be a predictor of causal influence. For example, given some pair $(A, B)$, if it is known that $a_{t-1}$ causes $b_t$ and both $b_t = 0 \mid a_{t-1} = 0$ and $b_t = 0 \mid a_{t-1} = 1$ are observed, then a comparison of the frequencies of occurrence is used to determine which of the two pairs represents the cause-effect relationship.
If the example time series contained noise, then a realization of the example time series $(X', Y')$ could be
$$X' = \{x'_t \mid t = 0, 1, \ldots, 9\} = \{0, 0, 1.1, 0, 0, 1, 0.1, 0, 0.9, 0\}$$
$$Y' = \{y'_t \mid t = 0, 1, \ldots, 9\} = \{0, 0.2, 0.1, 1.2, 0, 0.1, 0.9, 0.1, 0, 1\}.$$
The previous time series pair, $(X, Y)$, had only five observed penchants, but $(X', Y')$ has more due to the noise. It can be seen in the time series definitions that $x'_t = x_t \pm 0.1 := x_t \pm \delta_x$ and $y'_t = y_t \pm 0.2 := y_t \pm \delta_y$. The weighted mean observed leaning for $(X', Y')$ is $\langle\lambda_{y'_t, x'_{t-1}}\rangle_w \approx 0.19$.
If the noise is not restricted to a small set of discrete values, then the effects of noise on the leaning calculations can be addressed by using the tolerances $\delta_x$ and $\delta_y$ in the probability estimations from the data. For example, the penchant calculation in Eq. (3.51) relied on estimating $P(y_t = 1 \mid x_{t-1} = 1)$ from the data, but if, instead, the data is known to be noisy, then the relevant probability estimate may be $P(y_t \in [1 - \delta_y, 1 + \delta_y] \mid x_{t-1} \in [1 - \delta_x, 1 + \delta_x])$. If the tolerances, $\delta_x$ and $\delta_y$, are made large enough, then the noisy-system weighted mean observed leaning, $\langle\lambda_{y'_t \pm \delta_y, x'_{t-1} \pm \delta_x}\rangle_w$, can, at least in the simple examples considered here, be made equal to the noiseless-system weighted mean observed leaning, i.e., $\langle\lambda_{y'_t \pm \delta_y, x'_{t-1} \pm \delta_x}\rangle_w = \langle\lambda_{y_t, x_{t-1}}\rangle_w$. Tolerance domains, however, can be set too large. If the tolerance domain is large enough to encompass every point in the time series, then the probability of the assumed cause becomes one, which leads to undefined penchants (the denominator $1 - P(C)$ vanishes). For example, given the symmetric definition of the tolerance domain used here, $\delta_x = 2$ implies $P(x_{t-1} = 1 \pm \delta_x) = 1$, which implies $\langle\lambda_{y'_t, x_{t-1}}\rangle_w$ is undefined. The tolerance domains can be interpreted as the set of values that an analyst is willing to consider equivalent causes or effects. For example, if the lag $l$ time step of X, i.e., $x_{t-l}$, is the assumed cause of some assumed effect (e.g., the current time step of Y, $y_t$), then an x-tolerance domain of $[x_{t-l} - a, x_{t-l} + b]$ may be thought of as an analyst's willingness to consider all values that fall within that domain equivalently as the assumed cause of that assumed effect.
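In code, the tolerance domains simply turn the equality tests above into interval tests. A minimal sketch for the noisy pair $(X', Y')$ given earlier, assuming $\delta_x = 0.1$ and $\delta_y = 0.2$ are known:

    % Tolerance-based estimate of P(y in [1-dy,1+dy] | x in [1-dx,1+dx]).
    Xn = [0 0 1.1 0 0 1 0.1 0 0.9 0];
    Yn = [0 0.2 0.1 1.2 0 0.1 0.9 0.1 0 1];
    dx = 0.1;  dy = 0.2;
    C = Xn(1:end-1);  E = Yn(2:end);     % 1-standard assignment
    isC = abs(C - 1) <= dx;              % cause matches within tolerance
    isE = abs(E - 1) <= dy;              % effect matches within tolerance
    PEgC = sum(isE & isC) / sum(isC)     % = 1, recovering the noiseless estimate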
Reasonable leaning calculations require an understanding of the noise in the measurements, which may not always be possible. Estimating relevant tolerance domains is one of the two key difficulties in using penchants and leanings for causal inference. The other is finding an appropriate cause-effect assignment.
CHAPTER 4

¹The scripts and functions used to calculate the penchants and leanings can be found at https://github.com/jmmccracken.
²The software used to calculate PAI can be found at https://github.com/jmmccracken.
³I.e., unless otherwise noted, the JIDT transfer entropy calculator will always be instantiated in MATLAB with the following commands: "teCalc=javaObject('infodynamics.measures.continuous.kernel.TransferEntropyCalculatorKernel');", "teCalc.setProperty('NORMALISE','true');", and "teCalc.initialize(1,0.5);".
The causal inferences drawn by each tool may be collected into a ternary vector defined as
$$\vec g = (g_1, g_2, g_3, \ldots, g_n), \quad (4.1)$$
where each trit⁴ represents the causal inference of a given time series causality tool for a given time series pair $(X, Y)$: 0 for $X \to Y$, 1 for $Y \to X$, and 2 if there is no conclusion (e.g., if the leaning is 0 within some expected error for all the tested cause-effect assignments and tolerance domains). The vector $\vec g$ may then be used to provide a concise summary of the exploratory causal analysis (ECA) results, which will be called the ECA summary; i.e., if all $g_i = 0$, then $X \to Y$, and if all $g_i = 1$, then $Y \to X$. The inner product of $\vec g$ with itself may be a simple test for an ECA summary,
$$\vec g \cdot \vec g = |\vec g|^2 = \begin{cases} 0 & \text{ECA summary is } X \to Y \\ \text{anything else} & \text{ECA summary is undefined or } Y \to X. \end{cases} \quad (4.2)$$
If $g_i = 1\ \forall\, g_i \in \vec g$, then the ECA summary would be $Y \to X$, which may make it tempting to interpret $|\vec g|^2 = n$ as implying $Y \to X$. It may be true, however, that $|\vec g|^2 = n$ without $g_i = 1\ \forall\, g_i \in \vec g$. As discussed in the previous section, $n = 5$ in all the examples in this work.
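As a sketch (with an illustrative example vector), the test of Eq. (4.2) and the caveat above can be implemented as:

    % ECA summary from a ternary inference vector g (0: X->Y, 1: Y->X, 2: none).
    g = [0 0 0 0 0];                % example: every tool implies X -> Y
    if dot(g, g) == 0
      summary = 'X -> Y';           % all trits are 0
    elseif all(g == 1)
      summary = 'Y -> X';           % |g|^2 == n alone is not sufficient
    else
      summary = 'undefined';
    end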
It should be emphasized that the ECA summary is neither the final product nor the only conclusion that should be drawn from the exploratory causal analysis. The ECA summary will often be undefined in situations where a majority of the g-trits in $\vec g$ agree with each other. For example, the transfer entropy, Granger causality, lagged cross-correlation, and PAI tools could all provide a causal inference of $Y \to X$ while the leaning fails to provide any causal inference because, e.g., the cause-effect assignment is inappropriate for the system being studied. An undefined ECA summary does not immediately imply that the exploratory causal inference is inconclusive. Rather, it only implies that each time series causality tool may need to be applied more carefully.

The automated generation of ECA summaries for a set of time series may be appealing for the speed and ease with which a large number of causal inferences can be performed, but such automated procedures should not be considered a substitute for a more complete exploratory causal analysis. The ECA summary is part of exploratory causal analysis and, as discussed in Section 1.3, no part of such analysis should be confused with causality as it is defined traditionally in fields such as physics and philosophy.

In all the examples presented in this work, $\vec g$ has five elements with $g_1$ as the causal inference implied by the JIDT transfer entropy, $g_2$ as the causal inference implied by the MVGC Granger causality log-likelihood test statistics, $g_3$ as the causal inference implied by the PAI differences, $g_4$ as the causal inference implied by the weighted mean observed leanings averaged over all the tested lags, and $g_5$ as the causal inference implied by the lagged cross-correlation differences averaged over all the tested lags.
Consider the time series pair generated, for $t = 0, 1, \ldots, L$, by
$$x_t = \begin{cases} 2 & t = 1 \\ A\xi_t & \forall\, t \in \{t \mid t \ne 1 \text{ and } t \bmod 5 \ne 0\} \\ 2 & \forall\, t \in \{t \mid t \bmod 5 = 0\} \end{cases} \quad (4.3)$$
and
$$y_t = x_{t-1} + B\xi_t$$
with $y_0 = 0$, $A, B \in \mathbb{R}_{\ge 0}$, and $\xi_t \sim \mathcal{N}(0, 1)$. Specifically, consider $A \in [0, 1]$ and $B \in [0, 1]$. The driving system X is a periodic impulse with a signal amplitude above the maximum noise level of both the driving and the response systems, and the response system Y is a lagged version of the driving signal with standard normal (i.e., $\mathcal{N}(0, 1)$) noise of amplitude B applied at each time step.
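A minimal MATLAB sketch of generating one instance of Eq. (4.3); the vector layout (index $t+1$ for time $t$) is illustrative:

    % One instance of Eq. (4.3): periodic impulse X driving a lagged noisy Y.
    L = 500;  A = 0.1;  B = 0.4;
    x = A*randn(1, L+1);              % A*xi_t background noise (t = 0..L)
    x(2) = 2;                         % t = 1 impulse (MATLAB index t+1)
    x(1:5:end) = 2;                   % t mod 5 == 0 impulses
    y = [0, x(1:L) + B*randn(1, L)];  % y_t = x_{t-1} + B*xi_t, with y_0 = 0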
Let the instance of Eq. (4.3) with $L = 500$, $A = 0.1$, and $B = 0.4$ shown in Figure 4.1 be considered the synthetic data set $(X, Y)$ upon which the exploratory causal analysis will be performed. A preliminary visual inspection of Figure 4.1 shows X appears less noisy than Y (as expected), but perhaps more importantly, both time series appear to have two major groupings for their data, around 0 and 2 for both X and Y, albeit with a wider spread in Y. This observation is supported by the histograms of these data shown in Figure 4.2. This data set is synthetic, so such observations may seem pointlessly obvious. However, if the data were not synthetic, then these observations would be useful for setting the tolerance domains, $\delta_x$ and $\delta_y$, for the leaning. It has been shown that if A and B are known a priori, then the leaning will agree with intuition if $\delta_x = A$ and $\delta_y = B$ [36]. It is assumed here, however, that A and B are unknown to the analyst. From Figure 4.2, initial tolerance domains of $\pm\delta_x = 0.5$ and $\pm\delta_y = 1$ will be used for the leaning calculations.

Autocorrelations in the data are useful to understand, both for potential cause-effect assignments for the leaning calculations and for potential issues in drawing causal inferences from the lagged cross-correlations.
Figure 4.1: An instance of Eq. (4.3) for $L = 500$, $A = 0.1$, and $B = 0.4$. (a) X; (b) Y.
This data is synthetic, so it is known from Eq. (4.3) that a reasonable cause-effect assignment for the leaning would be $\{C, E\} = \{x_{t-1}, y_t\}$. The goal here is to show how such potential cause-effect assignments can be drawn directly from the data as part of ECA. Figure 4.3 shows strong autocorrelations for both X and Y at $l = 6, 12, 18, \ldots, 48$. The autocorrelations appear cyclic, which implies the leaning and lagged cross-correlation time series causality tools need not be calculated for more than $l = 1, 2, \ldots, 6$, which is the lag after which the autocorrelations pattern seems to repeat. If $l > 6$ in these calculations, then the causal inference may be strongly influenced by the autocorrelations in the data.
Figure 4.2: Histograms of the instance of Eq. (4.3) shown in Figure 4.1. (a) X ($x_t$ bins vs. counts); (b) Y ($y_t$ bins vs. counts).
The similar autocorrelations patterns seen in Figure 4.3 may imply a strong driving relationship within the time series pair (or perhaps a strong shared driving relationship with some outside driver) but do not immediately suggest a cause-effect assignment for the leaning calculation. The most straightforward cause-effect assignment is the $l$-standard assignment, which will be used with $l = 1, 2, \ldots, 6$.

The leaning calculations using the $l$-standard cause-effect assignment and tolerance domains suggested by Figures 4.3 and 4.2, respectively, fit naturally on a plot with the lagged cross-correlations (because both depend on some lag $l$).
Figure 4.3: Autocorrelations of the instances of X and Y of Eq. (4.3) shown in Figure 4.1 given lags of $l = 1, 2, \ldots, 50$. The autocorrelations are $|r(b_{t-l}, b_t)|^2$ where $r(\cdot)$ is the Pearson correlation coefficient between the lagged series $\{b_{t-l}\}$ and the time series $\{b_t\}$. (a) X; (b) Y.
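The autocorrelations plotted in Figure 4.3 can be computed directly with the built-in corrcoef; a sketch, assuming x is a row vector such as the instance generated in the earlier sketch:

    % |r(x_{t-l}, x_t)|^2 for lags l = 1..50, as plotted in Figure 4.3.
    maxLag = 50;  r2 = zeros(1, maxLag);
    for l = 1:maxLag
      R = corrcoef(x(1:end-l), x(1+l:end));  % Pearson correlation matrix
      r2(l) = abs(R(1, 2))^2;
    end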
These values are shown in Figure 4.4 for $l = 1, 2, \ldots, 6$. This figure shows both the weighted mean observed leaning, $\langle\lambda_l\rangle$, and the lagged cross-correlation differences, $\Delta_l$, which suggest the same causal inference for almost every lag.⁵ The causal inferences for the leanings are $Y \to X$ for $l = 2, \ldots, 6$ and $X \to Y$ for $l = 1$,
⁵$\langle\lambda_l\rangle > 0$ and $\Delta_l < 0 \Rightarrow X \to Y$, and vice versa. See Sections 3.5 and 3.4.2 for more details.
Figure 4.4: Lagged cross-correlation differences, $\Delta_l$, and weighted mean observed leanings, $\langle\lambda_l\rangle$, (given an $l$-standard cause-effect assignment with $\pm\delta_x = 0.5$ and $\pm\delta_y = 1$) for the instance of Eq. (4.3) shown in Figure 4.1 given lags of $l = 1, 2, \ldots, 6$.
with a maximum absolute value at $l = 1$, which implies $X \to Y$, and a mean across all the lags, $\langle\langle\lambda_l\rangle\rangle_l = 6.6 \times 10^{-3}$, which also implies $X \to Y$. The causal inferences for the lagged cross-correlation differences are $Y \to X$ for $l = 2, 3, 5, 6$ and $X \to Y$ for $l = 1, 4$, with a maximum absolute value at $l = 1$, which implies $X \to Y$, and a mean across all the lags, $\langle\Delta_l\rangle_l = -2.8 \times 10^{-3}$, which also implies $X \to Y$. Both the mean across all lags and the maximum absolute value for both tools yield the same causal inference, and that causal inference agrees with intuition for this example. However, the second largest value of both tools in Figure 4.4 suggests the opposite (counter-intuitive) causal inference.
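The exact form of the lagged cross-correlation difference $\Delta_l$ is given in Section 3.4.2 (not reproduced here); the sketch below is one realization consistent with the sign convention of footnote 5 ($\Delta_l < 0 \Rightarrow X \to Y$) and should be treated as an assumption rather than the author's exact definition. It assumes the row vectors x and y from the earlier generation sketch.

    % A lagged cross-correlation difference consistent with footnote 5.
    nLags = 6;  Delta = zeros(1, nLags);
    for l = 1:nLags
      Rxy = corrcoef(x(1:end-l), y(1+l:end));      % r(x_{t-l}, y_t)
      Ryx = corrcoef(y(1:end-l), x(1+l:end));      % r(y_{t-l}, x_t)
      Delta(l) = abs(Ryx(1, 2)) - abs(Rxy(1, 2));  % < 0 suggests X -> Y
    end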
These lags can be investigated further by examining the correlation plots, i.e., plots of $(y_{t-l}, x_t)$ (Figure 4.5) and $(x_{t-l}, y_t)$ (Figure 4.6) for $l = 1, 2, \ldots, 6$.

These figures show the data clusters around 0 and 2, as expected from Figure 4.2. A notable feature is the strong linear relationship seen in Figure 4.6a and Figure 4.5e. These are the relationships shown by the lagged cross-correlation differences (and the leaning) in Figure 4.4. These relationships are not unexpected given the structure of Eq. (4.3). The $(y_t, x_{t-1})$ relationship is the explicit part of the system that accounts, in large part, for the intuitive causal structure. The $(x_t, y_{t-5})$ relationship is a by-product of the structured pulse pattern created for X; i.e., if the pulses of X were set to occur at a different interval, then it is expected that this $(x_t, y_{t-l})$ relationship would appear for a lag $l$ corresponding to that different interval.

The assumption, however, has been that the analyst doing the exploratory causal analysis of this synthetic data does not know Eq. (4.3) a priori. The question is what causal inference should such an analyst make for this example? It has already been shown that the causal inference
Figure 4.5: Lagged cross-correlation plots, $(y_{t-l}, x_t)$, for the instance of Eq. (4.3) shown in Figure 4.1 given lags of $l = 1, 2, \ldots, 6$. Panels (a) through (f) correspond to $l = 1$ through $l = 6$.
would be $X \to Y$ (which agrees with intuition) if the analyst were to use the average across all the calculated lags for either the leaning or the lagged cross-correlation difference, but the strong counter-inferences provided by $l = 5$ (both $\langle\lambda_l\rangle$ and $\Delta_l$ imply $Y \to X$ for $l = 5$) may be seen as a concern that needs to be investigated further. The most desirable path forward may be to change the impulse pattern of X (e.g., by changing the period of the impulse) and see if the leaning and lagged cross-correlation differences change for either $l = 1$ (which implies the intuitive inference) or $l = 5$ (which implies the counter-intuitive inference). As explained in the previous paragraph, it is expected that doing so would change the result for $l = 5$ but may create a similar result for a different lag, while the $l = 1$ result is expected to remain unchanged. This type of experimental result may be seen as strong evidence that the $l = 1$ results are the more correct causal inferences for this example system, but such experiments may not be possible. The analyst may only have access to the single instances of X and Y shown in Figure 4.1. Another possible approach is to calculate the leaning using other tolerance domains.
Figure 4.6: Lagged cross-correlation plots, $(x_{t-l}, y_t)$, for the instance of Eq. (4.3) shown in Figure 4.1 given lags of $l = 1, 2, \ldots, 6$. Panels (a) through (f) correspond to $l = 1$ through $l = 6$.
Consider the tolerance domains used in Figure 4.4, $\pm\delta_x = 0.5$ and $\pm\delta_y = 1$. These domains were found by visual inspection of Figures 4.1 and 4.2 (by estimating the widths of the peaks around 0). A computational approach is to set the tolerance domains as $\pm\delta_x = f(\max(X) - \min(X))$ and $\pm\delta_y = f(\max(Y) - \min(Y))$ with $f = 1/4$ (see [36] for other examples of setting tolerance domains). This method for setting the tolerance domains will be used often, so for brevity, it will be referred to as the f-width tolerance domains; i.e., this example uses the $(1/4)$-width tolerance domains. Figure 4.7 shows the leaning calculations using the $l$-standard assignment for $l = 1$ and $l = 5$ for different reasonable⁶ tolerance domains of $\pm\delta_x \in [0, 1]$ and $\pm\delta_y \in [0, 1]$ in steps of 0.05. This figure shows the causal inferences implied for $l = 1$ and $l = 5$ by the method used in Figure 4.4 are consistently implied by the leaning calculations using different tolerance domains.
⁶"Reasonable" is defined in reference to Figure 4.2. For example, $\pm\delta_x = 2$ would lead to undefined leanings as the domain would be large enough to include all possible values of $x_t$, and $\pm\delta_x = 1.5$ appears equivalent to $\pm\delta_x = 1.0$, so both such tolerance domains are considered "unreasonable."
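In code, the f-width tolerance domains are one line each; a sketch with $f = 1/4$, assuming the row vectors x and y from the earlier generation sketch:

    % f-width tolerance domains for the leaning calculation.
    f = 1/4;
    dx = f*(max(x) - min(x));   % +/- delta_x
    dy = f*(max(y) - min(y));   % +/- delta_y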
Figure 4.7: Leaning calculations for the instance of Eq. (4.3) shown in Figure 4.1 using the $l$-standard assignment for $l = 1$ and $l = 5$ and tolerance domains defined as $\pm\delta_x \in [0, 1]$ and $\pm\delta_y \in [0, 1]$ in steps of 0.05. (a) $l = 1$; (b) $l = 5$.
Figure 4.7 shows the leaning calculation changes sign for small tolerance domains, as expected,⁷ but is consistently either positive (for $l = 1$) or negative (for $l = 5$) as the tolerance domain increases. This behavior implies the signs of the original leaning calculations were not attributable to the specific tolerance domains used in those calculations.

For this example, the MVGC toolbox returns Granger causality log-likelihood statistics of $F_{Y \to X} = 4.1 \times 10^{-3}$ and $F_{X \to Y} = 4.5 \times 10^{-1}$, or $F_{X \to Y} - F_{Y \to X} \approx 4.5 \times 10^{-1}$. The JIDT transfer entropy calculation returns $T_{X \to Y} = 6.0 \times 10^{-1}$ and $T_{Y \to X} = 6.9 \times 10^{-2}$, or $T_{X \to Y} - T_{Y \to X} = 5.3 \times 10^{-1}$. Both of these results imply $X \to Y$, which agrees with intuition. The PAI correlation difference is $8.3 \times 10^{-3}$, which also implies $X \to Y$. If the leaning and lagged cross-correlation difference contributions to the ECA summary vector are defined as the mean across all the tested lags, then the ECA summary for this example is $|\vec g|^2 = 0 \Rightarrow X \to Y$, which, again, is the intuitively correct answer.
Figure 4.8 shows the ECA summary for different instances of Eq. (4.3) evaluated with $A \in [0, 1]$ and $B \in [0, 1]$ in steps of 0.05 and $L = 500$. The leaning and lagged cross-correlation difference causal inferences were made with the average across all the tested lags, and the leaning calculations use the $l$-standard cause-effect assignment with $l = 1, 2, \ldots, 6$ (which are the same lags used by the lagged cross-correlation differences). The tolerance domains are calculated as $\pm\delta_x = f(\max(X) - \min(X))$ and $\pm\delta_y = f(\max(Y) - \min(Y))$ with $f = 1/4$. The maximum value of $|\vec g|^2$ shown in Figure 4.8 is 1, which implies that the ECA summary never implies the counter-intuitive causal inference of $Y \to X$ for this example.⁸
⁷See Section 3.5.
Figure 4.8: ECA summaries for different instances of Eq. (4.3) evaluated with $A \in [0, 1]$ and $B \in [0, 1]$ in steps of 0.05 and $L = 500$ with $l$-standard assignment leaning calculations and lagged cross-correlation differences given $l = 1, 2, \ldots, 6$ and leaning tolerance domains calculated as $\pm\delta_x = f(\max(X) - \min(X))$ and $\pm\delta_y = f(\max(Y) - \min(Y))$ with $f = 1/4$. Black indicates $|\vec g|^2 = 0 \Rightarrow X \to Y$, as expected, and white indicates $|\vec g|^2 \ne 0$.
⁸The causal inference for $g_2$ failed to be defined for approximately 20 points in Figure 4.8 because the MVGC toolbox failed to successfully fit a VAR model with the desired maximum model orders for some of the points with low noise in the impulse signal, e.g., $A = 0.05$. Every defined $g_2$, however, agreed with intuition, i.e., $g_2 = 0$. Every calculated $g_1$ and $g_4$ agreed with intuition.
A more complete analysis might also explore changing, e.g., the lags used or the subsets of the data in the lagged cross-correlation difference calculations. If, and how, such changes to the calculations change the causal inferences may provide insight into the relationship between the data sets. Such effort may seem inconsequential or trivial on a synthetic data set, but if an analyst has very little knowledge of the system dynamics that generated a given time series pair, then anything that may be drawn from exploratory causal analysis may be considered helpful.
Consider next the time series pair generated, for $t = 0, 1, \ldots, L$, by
$$x_t = a\sin(bt + c) + A\xi_t \quad (4.4)$$
and
$$y_t = x_{t-1} + B\xi_t$$
with $y_0 = 0$, $A \in [0, 1]$, $B \in [0, 1]$, $\xi_t \sim \mathcal{N}(0, 1)$, and with the amplitude a, the frequency b, and the phase c all in the appropriate units. This example is very similar to the previous one, except that the driving system X is sinusoidal.
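A sketch of one instance of Eq. (4.4), mirroring the Eq. (4.3) sketch above, with $a = b = 1$ and $c = 0$:

    % One instance of Eq. (4.4): sinusoidal X driving a lagged noisy Y.
    L = 500;  A = 0.1;  B = 0.4;  a = 1;  b = 1;  c = 0;
    t = 0:L;
    x = a*sin(b*t + c) + A*randn(1, L+1);
    y = [0, x(1:L) + B*randn(1, L)];  % y_t = x_{t-1} + B*xi_t, with y_0 = 0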
Let the instance of Eq. (4.4) with $L = 500$, $A = 0.1$, $B = 0.4$, $a = b = 1$, and $c = 0$ shown in Figure 4.9 be considered the synthetic data set $(X, Y)$ for the exploratory causal analysis. The histograms of these data are shown in Figure 4.10. These data have a wider spread than the data seen in Figure 4.1 and do not appear to be clustered about some small set of points. This observation is supported by Figure 4.10. The tolerance domains might be set by visual inspection of the histograms, as was done in Section 4.2.1, but for this example, the leaning calculation will use the $(1/4)$-width tolerance domains.

Figure 4.11 shows strong autocorrelations for both X and Y, and the autocorrelations appear cyclic. So, the leaning and lagged cross-correlation time series causality tools will not be calculated for more than $l = 1, 2, \ldots, 20$, which is the lag after which the autocorrelations pattern seems to repeat. The most straightforward cause-effect assignment is the $l$-standard assignment, which will be used with $l = 1, 2, \ldots, 20$.
The leaning and lagged cross-correlation differences are shown in Figure 4.12 for $l = 1, 2, \ldots, 20$. This figure shows both the weighted mean observed leaning, $\langle\lambda_l\rangle$, and the lagged cross-correlation differences, $\Delta_l$, seem to suggest different causal inferences for almost every third lag. The causal inference for both tools at $l = 1$ is the same, intuitively correct, inference of $X \to Y$, and the means across all the lags are $\langle\langle\lambda_l\rangle\rangle_l = 3.9 \times 10^{-3}$ and $\langle\Delta_l\rangle_l = -2.9 \times 10^{-2}$, which also imply $X \to Y$.

For this example, the MVGC toolbox returns Granger causality log-likelihood statistics of $F_{Y \to X} = 2.8 \times 10^{-2}$ and $F_{X \to Y} = 2.4 \times 10^{-1}$, or $F_{X \to Y} - F_{Y \to X} = 2.1 \times 10^{-1}$. The JIDT transfer entropy calculation returns $T_{X \to Y} = 7.6 \times 10^{-1}$ and $T_{Y \to X} = 5.7 \times 10^{-1}$, or $T_{X \to Y} - T_{Y \to X} = 1.9 \times 10^{-1}$. Both of these results imply $X \to Y$, which agrees with intuition.
Figure 4.9: An instance of Eq. (4.4) for $L = 500$, $A = 0.1$, and $B = 0.4$. (a) X; (b) Y.
Figure 4.10: Histograms of the instance of Eq. (4.4) shown in Figure 4.9. (a) X ($x_t$ bins vs. counts); (b) Y ($y_t$ bins vs. counts).
Figure 4.13 shows the ECA summaries for different instances of Eq. (4.4); the leaning calculations use the $l$-standard cause-effect assignment with $l = 1, 2, \ldots, 20$ (which are the same lags used by the lagged cross-correlation differences). The tolerance domains used the $(1/4)$-width domains. The maximum value of $|\vec g|^2$ shown in Figure 4.13 is 1, not 5, which suggests that the ECA summary never implies the counter-intuitive causal inference of $Y \to X$ for this example. However, a visual comparison of Figures 4.8 and 4.13 shows the ECA summary agrees with intuition less reliably for this example than for the example of Section 4.2.1.
Figure 4.11: Autocorrelations of the instances of X and Y of Eq. (4.4) shown in Figure 4.9 given lags of $l = 1, 2, \ldots, 50$. The autocorrelations are $|r(b_{t-l}, b_t)|^2$ where $r(\cdot)$ is the Pearson correlation coefficient between the lagged series $\{b_{t-l}\}$ and the time series $\{b_t\}$. (a) X; (b) Y.
Figure 4.12: Lagged cross-correlation differences, $\Delta_l$, and weighted mean observed leanings, $\langle\lambda_l\rangle$, (given an $l$-standard cause-effect assignment with $(1/4)$-width tolerance domains) for the instance of Eq. (4.4) shown in Figure 4.9 given lags of $l = 1, 2, \ldots, 20$.
Figure 4.13: ECA summaries for different instances of Eq. (4.4) evaluated with $A \in [0, 1]$ and $B \in [0, 1]$ in steps of 0.05 and $L = 250$ with $l$-standard assignment leaning calculations and lagged cross-correlation differences given $l = 1, 2, \ldots, 20$ and $(1/4)$-width tolerance domains. Black indicates $|\vec g|^2 = 0 \Rightarrow X \to Y$, as expected, and white indicates $|\vec g|^2 \ne 0$.
The majority of the elements of $\vec g$, i.e., $g_1$ (transfer entropy), $g_2$ (Granger), and $g_5$ (lagged cross-correlation), imply the intuitive causal inference of $X \to Y$ for almost every point⁹ calculated in Figure 4.13. Where the ECA summary failed to agree with intuition in this example, it did so because of counter-intuitive inferences implied by $g_3$ (PAI) and $g_4$ (leaning), which differs from the example shown in Section 4.2.1 where $g_4$ implied the intuitive causal inference for every tested time series pair. This observation helps illustrate the need for multiple types of tools in an exploratory causal analysis; the naive application of some tools may be better suited to a given system than others, and the use of different tools together can help guide the analyst in determining how to apply the time series causality tools differently (e.g., by changing the cause-effect assignment of the leaning or the model parameters of the Granger test statistic) for different systems. The inference implied by $g_3$ (PAI) becomes counter-intuitive as A increases, which is similar to the behavior seen for this tool in Section 4.2.1. The inference implied by $g_4$ (leaning) becomes counter-intuitive as B increases, which is in contrast to the behavior of $g_3$ (PAI) and implies the ECA summary may agree with intuition for most of the tested time series pairs if either $g_4$ or $g_3$ were removed from $\vec g$.
As with the example in Section 4.2.1, a more rigorous exploratory causal analysis of this example might investigate further those points in Figure 4.13 for which the ECA summary does not agree with intuition. It may be possible to change the PAI and weighted mean observed leaning calculations, e.g., by using different embedding dimensions E and/or delay time steps $\tau$ in the PAI calculations or tolerance domains in the leaning calculations.
$$\{V, I\} = \{\{V_t\}, \{I_t\}\} \quad (4.6)$$
¹⁰This scenario leads to a division by zero. See steps 2 and 3 of the algorithm outlined in Section 3.3. The implementation of the PAI algorithm used in this work declares the correlations undefined in such scenarios.
Figure 4.14: The voltage and current signals of a circuit containing a resistor R and inductor L in series where the driving voltage is $V(t) = \sin(t)$ and the current response is given by the solution to Eq. (4.5). Eq. (4.5) is solved both numerically (represented by the open dots in (a) and (b)) and analytically (represented by the solid dots in (a) and (b)). The signals are plotted for a sampling length of $8\pi$ seconds at a sampling interval of $\pi \times 10^{-1}$ (or a sampling frequency of approximately 3 Hz), which corresponds to a time series length of 81 data points. (Continues.)
This example can also illustrate the importance of sampling frequency and sampling length. For example, the leaning calculation requires an assumed cause and effect pair to appear in the data enough times to provide reliable estimates of probabilities. Thus, data that is sampled for too few periods or too sparsely may lead to leanings that do not agree with intuition. Consider the signals shown in Figure 4.14 with $L = 10$ H and $R = 5\ \Omega$ sampled at the time steps $t = 0, f, 2f, 3f, \ldots, 8\pi$ for the different sampling intervals f shown in Table 4.1.
Figure 4.14: (Continued.) (c) V.
Figure 4.15: Histograms of the signals shown in Figure 4.14. (a) I ($L = 10$ H and $R = 5\ \Omega$), analytical; (b) I ($L = 10$ H and $R = 5\ \Omega$), numerical; (e) V.
Figure 4.16: Autocorrelations of the signals shown in Figure 4.14. The numerical solution for I is represented by the open dots in (a) and (b) and the analytical solution is represented by the solid dots in (a) and (b). (Continues.)
Next, consider the time series pair generated by
$$x_t = a\sin(bt + c) + A\xi_t \quad (4.8)$$
and
$$y_t = Bx_{t-1}(1 - Cx_{t-1}) + D\xi_t,$$
with $y_0 = 0$, $A, B, C, D \in [0, 1]$, $\xi_t \sim \mathcal{N}(0, 1)$, and with the amplitude a, the frequency b, and the phase c all in the appropriate units, given $t = 0, f, 2f, 3f, \ldots, 6\pi$ with $f = \pi/30$, which implies $L = 181$.
Figure 4.16: (Continued.) (c) V.
Table 4.1: The ECA summary and vector, $\vec g$, for the signals shown in Figure 4.14 with $L = 10$ H and $R = 5\ \Omega$ sampled at the time steps $t = 0, f, 2f, 3f, \ldots, 8\pi$

f | $\vec g$ | ECA summary
$\pi \cdot 80^{-1}$ | (0, 2, 2, 0, 0) | undefined
$\pi \cdot 60^{-1}$ | (0, 2, 2, 0, 0) | undefined
$\pi \cdot 40^{-1}$ | (0, 2, 2, 0, 0) | undefined
$\pi \cdot 20^{-1}$ | (0, 2, 2, 0, 0) | undefined
$\pi \cdot 10^{-1}$ | (0, 2, 2, 0, 0) | undefined
$\pi \cdot 5^{-1}$ | (0, 2, 2, 0, 0) | undefined
Let the instance of Eq. (4.8) with $A = 0.1$, $B = 0.3$, $C = 0.4$, $D = 0.5$, $a = b = 1$, and $c = 0$ shown in Figure 4.18 be the time series pair $(X, Y)$ for the exploratory causal analysis. The tolerance domains for the leaning calculation are set as the $(1/4)$-width tolerance domains.

Figure 4.19 shows strong cyclic autocorrelations for X. The autocorrelations of Y are apparently acyclic, which might make it difficult to set the number of leaning and lagged cross-correlation difference lags to calculate based on Y in Figure 4.19. The leaning (using the $l$-standard assignment) and lagged cross-correlation time series causality tools will not be calculated for more than $l = 1, 2, \ldots, 15$, which is the lag after which the autocorrelations pattern seems to repeat for X.
Figure 4.17: Lagged cross-correlation differences, $\Delta_l$, and weighted mean observed leanings, $\langle\lambda_l\rangle$, (given an $l$-standard cause-effect assignment with $(1/4)$-width tolerance domains) for the signals shown in Figure 4.14 given lags of $l = 1, 2, \ldots, 5$. (a) I ($L = 10$ H and $R = 5\ \Omega$), analytical; (b) I ($L = 10$ H and $R = 5\ \Omega$), numerical.
The leaning and lagged cross-correlation differences are shown in Figure 4.20 for $l = 1, 2, \ldots, 15$. This figure shows both the weighted mean observed leaning, $\langle\lambda_l\rangle$, and the lagged cross-correlation differences, $\Delta_l$, seem to suggest the intuitive causal inference, $X \to Y$, for the majority of the calculated lags. The means across all the lags are $\langle\langle\lambda_l\rangle\rangle_l = 8.4 \times 10^{-3}$ and $\langle\Delta_l\rangle_l = -6.8 \times 10^{-2}$, which both imply $X \to Y$.

For this example, the MVGC toolbox returns a Granger causality log-likelihood statistics difference of $F_{X \to Y} - F_{Y \to X} = 2.6 \times 10^{-1}$. The JIDT transfer entropy calculation returns $T_{X \to Y} - T_{Y \to X} = 2.7 \times 10^{-1}$. The PAI correlation difference is $1.8 \times 10^{-3}$, which also implies $X \to Y$. If the leaning and lagged cross-correlation difference contributions to the ECA summary vector are defined as the mean across all the tested lags, then the ECA summary for this example is $|\vec g|^2 = 0 \Rightarrow X \to Y$. All of these results imply $X \to Y$, which agrees with intuition.
Figure 4.18: An instance of Eq. (4.8) for $A = 0.1$, $B = 0.3$, $C = 0.4$, $D = 0.5$, $a = b = 1$, and $c = 0$. (a) X; (b) Y.
A natural question is whether the data series shown in Figure 4.18 is a unique instance of Eq. (4.8). The synthetic data examples have, so far, interpreted the instances of Eq. (4.3), (4.4), and (4.8) being explored as the only collection of data points potentially collected by an analyst. However, the stochastic noise present in each of these examples raises the question of whether or not the ECA summaries for these examples would have agreed with intuition for a different instance of these systems; i.e., for instances of Eq. (4.3), (4.4), and (4.8) with the same coefficients
Figure 4.19: Autocorrelations of the instances of X and Y of Eq. (4.8) shown in Figure 4.18 given lags of $l = 1, 2, \ldots, 50$. The autocorrelations are $|r(b_{t-l}, b_t)|^2$ where $r(\cdot)$ is the Pearson correlation coefficient between the lagged series $\{b_{t-l}\}$ and the time series $\{b_t\}$. (a) X; (b) Y.
as those used to create Figures 4.1, 4.9, and 4.18 but with different realizations of the stochastic noise terms $\xi_t$. In many physical systems, an analyst may be able to measure several instances of X and Y. Exploratory causal analysis may be performed on the collection of time series pairs. For example, given m sets of the pair $(X, Y)$, an analyst may combine the time series in some way
Figure 4.20: Lagged cross-correlation differences, $\Delta_l$, and weighted mean observed leanings, $\langle\lambda_l\rangle$, (given an $l$-standard cause-effect assignment with $(1/4)$-width tolerance domains) for the instance of Eq. (4.8) shown in Figure 4.18 given lags of $l = 1, 2, \ldots, 15$.
(e.g., $\bar X = \{\bar x_t = m^{-1}\sum_{i=1}^{m} x_{t,i}\ \forall\, t = 1, 2, \ldots, L\}$, where $X_i = \{x_{t,i}\}$ is the $i$th measured instance of X, which contains L data points) and form an ECA summary using the combined data series, or an analyst may calculate m ECA summaries which may then be used for causal inference.
Consider $10^4$ instances of Eq. (4.8) with the coefficients shown in Figure 4.18. Table 4.2 shows the ECA summary vector distribution for this collection of time series pairs. The individual causal inferences $g_1$ (transfer entropy), $g_2$ (Granger),¹¹ and $g_3$ (PAI) agree with intuition for every instance. The two that imply counter-intuitive causal inferences do so only for a minority of the tested instances, approximately 41% of the instances for $g_4$ (leaning) and approximately 27% of the instances for $g_5$ (lagged cross-correlation).

Table 4.2: The ECA summary distribution of $10^4$ instances of Eq. (4.8) with the coefficients shown in Figure 4.18

$\vec g$ | Counts | ECA summary
(0, 0, 0, 0, 0) | 5707 | $X \to Y$
(0, 0, 0, 0, 1) | 231 | undefined
(0, 0, 0, 1, 0) | 1608 | undefined
(0, 0, 0, 1, 1) | 2454 | undefined

¹¹It is interesting to note that the MVGC toolbox uses a linear VAR model fitting procedure that implies the intuitively correct causal inference for every instance of this nonlinear example. This might be seen as evidence for using every available time series causality tool during exploratory causal inference, even those with well-documented shortcomings.
The leaning calculation uses $(1/4)$-width tolerance domains, and both the leaning and the lagged cross-correlation differences are calculated as the mean over lags $l = 1, 2, \ldots, 15$. The average leaning, $\langle\langle\lambda_l\rangle\rangle$, across all $10^4$ instances is $\langle\langle\lambda_l\rangle\rangle = 4 \times 10^{-3}$ and the average lagged cross-correlation difference is $\langle\Delta_l\rangle = -4.3 \times 10^{-2}$, both of which imply the intuitive causal inference. A bootstrapping [218] procedure can be set up with the sample of leaning and lagged cross-correlation calculations, whereby $10^6$ means are calculated from new sets (of the same size as the original set) of leanings and lagged cross-correlations that have been sampled (with replacement) from the original set. This procedure yields no negative means for the leaning calculation and no positive means for the lagged cross-correlation differences; the null hypothesis that the mean leaning value is negative (i.e., $\langle\langle\lambda_l\rangle\rangle < 0$) and the null hypothesis that the mean lagged cross-correlation difference is positive (i.e., $\langle\Delta_l\rangle > 0$) can each be rejected with a p-value less than $10^{-6}$. The 90% confidence interval for the mean of the $10^6$ bootstrapped leaning calculation means is $[3.7 \times 10^{-3}, 4.3 \times 10^{-3}]$ and for the lagged cross-correlation means is $[-4.4 \times 10^{-2}, -4.1 \times 10^{-2}]$. These results imply the intuitively correct causal inference for both $g_4$ and $g_5$.
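The bootstrap itself is a short script. The sketch below uses a synthetic stand-in sample for the $10^4$ per-instance leanings (the real values come from the instances described above) and a reduced number of resamples for speed; the 90% interval is read from the sorted bootstrap means.

    % Bootstrapped means of the leaning sample (sampling with replacement).
    lam = 4e-3 + 2e-2*randn(1e4, 1);  % stand-in sample; replace with the computed leanings
    n = numel(lam);  nBoot = 1e4;     % the text uses 10^6 resamples
    mu = zeros(nBoot, 1);
    for k = 1:nBoot
      mu(k) = mean(lam(randi(n, n, 1)));
    end
    pNeg = mean(mu <= 0);             % fraction of negative means (null-hypothesis check)
    mus = sort(mu);
    ci90 = [mus(ceil(0.05*nBoot)), mus(floor(0.95*nBoot))];  % 90% confidence interval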
It becomes more difficult to visualize the agreement between the ECA summary and intuition for different sets of system parameters as the parameter space becomes larger. Eq. (4.8) has a 4-dimensional parameter space in which such agreement may be explored: A, B, C, and D, where a, b, and c are held constant. One option for visualization may be plotting points within the unit cube framed by B, C, and D for a given A where $|\vec g|^2 = 0$. Every point tested within the 4-D parameter space for which the ECA summary agrees with intuition should appear in one of the unit cubes. The number of plots, however, becomes unwieldy if A is sampled at more than a few points on the unit domain, and the reader would need to rely on counting the number of points on each plot and comparing it to the total number of tested points to determine how often the ECA summary failed to agree with intuition. Rather than produce such a set of plots, the 4-D parameter space will be sampled to provide descriptive statistics. The four parameters were sampled in steps of 0.05 in the domains of $A \in [0.05, 0.55]$ and $B, C, D \in [0.05, 1.0]$ for a total of 88,000 instances of Eq. (4.8). The ECA summary agreed with intuition, i.e., $|\vec g|^2 = 0$, in 60,155 (68%) of those instances. Most of the time series causality tools that failed to imply the intuitive causal inference did so by implying the counter-intuitive causal inference, i.e., $g_i \ne 2$ for any $i \ne 2$ of any of the ECA summary vectors. The only exception was $g_2$ (Granger), which failed to provide any causal inference if a VAR forecast model could not be fit to the data within the requested maximum model order by the MVGC toolbox. The fewest counter-intuitive inferences were implied by $g_1$ (transfer entropy) with 381, or 0.43% of the total number of tested instances. The majority (327, or 86%) of those 381 counter-intuitive implications occurred for instances with $C > 0.5$ and/or $D > 0.5$. The majority (3,414, or 73%) of the 4,686 (5.3% of the total number of tested instances) counter-intuitive inferences implied by $g_3$ (PAI) occurred for instances with $B > 0.5$ and/or $C > 0.5$. The highest numbers of counter-intuitive inferences were implied by $g_4$ (leaning) and $g_5$ (lagged cross-correlation) with 19,971 and 14,051, or 23% and 16% of the total number of instances, respectively. The counter-intuitive inferences implied by both tools occurred mostly (65% for $g_4$ and 68% for $g_5$) for instances with $D > 0.5$. The number of lags used in the calculation of both the leaning and the lagged cross-correlation difference was fixed (with $l = 1, 2, \ldots, 15$) for each tested instance of Eq. (4.8). More intuitive causal inferences may have been implied by $g_4$ and $g_5$ if the number of lags used in those calculations had been set algorithmically, e.g., with autocorrelation lengths as was discussed in Section 4.2.3. The $(1/4)$-width tolerance domains used in the leaning calculation may also have contributed to the counter-intuitive inferences in some instances. This assumption might be checked by varying the tolerance domains in the leaning calculation for every tested instance of Eq. (4.8) for which the $(1/4)$-width tolerance domain calculations implied a counter-intuitive causal inference.
Consider the coupled system given by
$$x_t = x_{t-1}\left(r_x - r_x x_{t-1} - \beta_{xy} y_{t-1}\right) \quad (4.9)$$
and
$$y_t = y_{t-1}\left(r_y - r_y y_{t-1} - \beta_{yx} x_{t-1}\right),$$
where the parameters $r_x, r_y, \beta_{xy}, \beta_{yx} \in \mathbb{R}_{\ge 0}$. This pair of equations is a specific form of the two-dimensional coupled logistic map system often used to model population dynamics [219], and it was a system used in the introduction of convergent cross-mapping, CCM, which is an SSR time series causality tool [30].

Sugihara et al. [30] note that $\beta_{xy} > \beta_{yx}$ intuitively implies Y "drives" X more than X "drives" Y, and vice versa. Such intuition, however, can be difficult to justify for all instances of Eq. (4.9). The $x_{t-1}$ term that appears in $y_t$ can be seen as a function of $x_{t-2}$ with coefficients of $\beta_{yx} r_x$. These product coefficients suggest that if $r_x > r_y$, then X may be seen as the stronger driver in the system even if $\beta_{yx} < \beta_{xy}$. The same argument can be made, with the appropriate substitutions, to show that Y may be seen as the stronger driver in the system even if $\beta_{xy} < \beta_{yx}$. As such, there is no clear intuitive causal inference for this system.
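A sketch of iterating one instance of Eq. (4.9), using the parameter values of the instance discussed next:

    % One instance of the coupled logistic map of Eq. (4.9).
    L = 500;  bxy = 0.5;  byx = 1.5;  rx = 3.8;  ry = 3.2;
    x = zeros(1, L+1);  y = zeros(1, L+1);
    x(1) = 0.4;  y(1) = 0.4;          % initial conditions x_0 = y_0 = 0.4
    for t = 2:L+1
      x(t) = x(t-1)*(rx - rx*x(t-1) - bxy*y(t-1));
      y(t) = y(t-1)*(ry - ry*y(t-1) - byx*x(t-1));
    end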
Consider the instance of Eq. (4.9) with $L = 500$, $\beta_{xy} = 0.5$, $\beta_{yx} = 1.5$, $r_x = 3.8$, and $r_y = 3.2$ with initial conditions $x_0 = y_0 = 0.4$ shown in Figure 4.21. For this example, $r_x > r_y$, $\beta_{yx} > \beta_{xy}$, and the initial conditions are the same, which implies the intuitive causal inference for this example is $X \to Y$. The histograms of these data are shown in Figure 4.22. For this example, the leaning calculation will use the $(1/4)$-width tolerance domains.

Figure 4.23 shows strong autocorrelations for both X and Y for a lag of $l = 1$ and progressively weaker autocorrelations for $l = 2, 3, 4$, with minimal autocorrelations for $l \ge 5$. The autocorrelations do not appear cyclic.
Figure 4.21: An instance of Eq. (4.9) for $L = 500$, $\beta_{xy} = 0.5$, $\beta_{yx} = 1.5$, $r_x = 3.8$, and $r_y = 3.2$ with initial conditions $x_0 = y_0 = 0.4$. (a) X; (b) Y.
The argument was made in Section 4.2.1 that cyclic patterns in the autocorrelations of either time series might be used to limit the number of lags for which the leanings and lagged cross-correlation differences are calculated; i.e., it was argued that a repeating autocorrelation pattern after a given lag $l$ may imply that the leaning or cross-correlation calculations using lags greater than $l$ would give redundant information for drawing causal inferences. This argument is not applicable for this example given that there are no apparent cyclic autocorrelation patterns.
Figure 4.22: Histograms of the instance of Eq. (4.9) shown in Figure 4.21. (a) X ($x_t$ bins vs. counts); (b) Y ($y_t$ bins vs. counts).
There is no a priori reason to assume reasonable cause-effect assignments for the leaning might be drawn from autocorrelation patterns (i.e., the relationship of a signal to itself at different points in time is not necessarily related to the relationship between that signal and its potential driving or response partner signal). However, visual inspection of the signals in Figure 4.21 shows the signals have (roughly) similar shapes. So, it may be reasonable to assume that if the signal X has a strong relationship with its own past at a given lag $l$, then, by a loose similarity argument, the partner signal Y may also have a strong relationship with X at lag $l$.
Figure 4.23: Autocorrelations of the instances of X and Y of Eq. (4.9) shown in Figure 4.21 given lags of $l = 1, 2, \ldots, 50$. The autocorrelations are $|r(b_{t-l}, b_t)|^2$ where $r(\cdot)$ is the Pearson correlation coefficient between the lagged series $\{b_{t-l}\}$ and the time series $\{b_t\}$. (a) X; (b) Y.
Figure 4.24: Lagged cross-correlation differences, $\Delta_l$, and weighted mean observed leanings, $\langle\lambda_l\rangle$, (given an $l$-standard cause-effect assignment with $(1/4)$-width tolerance domains) for the instance of Eq. (4.9) shown in Figure 4.21 given lags of $l = 1, 2, \ldots, 4$.
For this example, the leaning (with the $l$-standard assignment) and lagged cross-correlation time series causality tools will be calculated for lags $l = 1, 2, \ldots, 4$, given that the autocorrelations seem to decrease significantly for $l \ge 5$. A cause-effect assignment must be made to calculate the leaning. It will be shown that the leaning calculated using the $l$-standard assignment with $l = 1, 2, \ldots, 4$ implies the same causal inference as the other time series causality tools used in this example.

The leaning and lagged cross-correlation differences are shown in Figure 4.24 for $l = 1, 2, \ldots, 4$. This figure shows both the weighted mean observed leaning, $\langle\lambda_l\rangle$, and the lagged cross-correlation differences, $\Delta_l$, imply the intuitively correct causal inference $X \to Y$. The means across all the lags are $\langle\langle\lambda_l\rangle\rangle_l = 2.7 \times 10^{-1}$ and $\langle\Delta_l\rangle_l = -2.6 \times 10^{-1}$, both of which imply $X \to Y$.

For this example, the MVGC toolbox returns a Granger causality log-likelihood statistics difference of $F_{X \to Y} - F_{Y \to X} = 5.4 \times 10^{-1}$. The JIDT transfer entropy calculation returns $T_{X \to Y} - T_{Y \to X} = 4.9 \times 10^{-1}$. The PAI correlation difference is $3.9 \times 10^{-3}$. If the leaning and lagged cross-correlation difference contributions to the ECA summary vector are defined as the mean across all the tested lags, then the ECA summary for this example is $|\vec g|^2 = 0 \Rightarrow X \to Y$. All of these results imply $X \to Y$, which agrees with intuition.
The parameter space of Eq. (4.9) provides an opportunity to ask many interesting questions about the exploratory causal analysis. For example, an instance of Eq. (4.9) with $L = 500$, $\beta_{xy} = \beta_{yx} = 1.0$, $r_x = r_y = 3.5$, and $x_0 = y_0 = 0.4$ leads to an ECA summary vector of $\vec g = (2, 2, 2, 2, 2)$ (given $g_4$ [leaning] and $g_5$ [lagged cross-correlation] calculated in the same fashion as in the previous example, i.e., Figure 4.18). The intuitive causal inference for this instance of Eq. (4.9) is not obvious, and the undefined ECA summary seems to imply this confusion is not remedied by the exploratory causal analysis approach. Changing the system parameters slightly to $L = 500$, $\beta_{xy} = \beta_{yx} = 1.0$, $r_x = 3.6$, $r_y = 3.4$, and $x_0 = y_0 = 0.4$ leads to an ECA summary vector of $\vec g = (1, 2, 1, 0, 0)$. It may be argued that the intuitive causal inference in this case is $X \to Y$ because $r_x > r_y$ with all the other system parameters being equal. This intuition, however, is not reflected by the ECA summary vector, where only two of the time series causality tools imply that intuitive inference, $g_4$ (leaning) and $g_5$ (lagged cross-correlation). Consider instead $L = 500$, $\beta_{xy} = 1.1$, $\beta_{yx} = 0.9$, $r_x = r_y = 3.5$, and $x_0 = y_0 = 0.4$, which leads to an ECA summary vector of $\vec g = (0, 2, 2, 0, 1)$. The intuitive causal inference in this case might be $Y \to X$ because $\beta_{xy} > \beta_{yx}$ with all the other system parameters being equal, but only one tool, $g_5$ (lagged cross-correlation), implies this inference in the ECA summary vector. The initial conditions of the system also affect the ECA summary, e.g., $L = 500$, $\beta_{xy} = \beta_{yx} = 1.0$, $r_x = r_y = 3.5$, $x_0 = 0.3$, and $y_0 = 0.5$ leads to $\vec g = (1, 2, 2, 1, 0)$. The ECA summary does, however, seem to agree with intuition when such intuitions are straightforward for a given instance of Eq. (4.9); e.g., $L = 500$, $\beta_{xy} = \beta_{yx} = 0.5$, $r_x = 3.0$, $r_y = 3.8$, and $x_0 = y_0 = 0.4$ leads to $\vec g = (1, 1, 0, 1, 1)$ (which implies $Y \to X$, as expected, for all $g_i$ except $g_3$ (PAI)), and $L = 500$, $\beta_{xy} = 0.5$, $\beta_{yx} = 2.0$, $r_x = r_y = 3.5$, and $x_0 = y_0 = 0.4$ leads to $\vec g = (0, 2, 2, 0, 0)$ (which implies $X \to Y$, as expected, for a majority of the $g_i$).
Consider next the trivariate system generated, for $t = 0, 1, \ldots, L$, by
$$x_t = \begin{cases} 2 & t = 1 \\ A\xi_t & \forall\, t \in \{t \mid t \ne 1 \text{ and } t \bmod 5 \ne 0\} \\ 2 & \forall\, t \in \{t \mid t \bmod 5 = 0\} \end{cases} \quad (4.10)$$
and
$$y_t = x_{t-1} + B\xi_t,$$
and either (case 1)
$$z_t = y_{t-1} \quad (4.11)$$
or (case 2)
$$z'_t = y_{t-1} + y_t = y_{t-1} + x_{t-1} + B\xi_t \quad (4.12)$$
or (case 3)
$$z''_t = y_{t-1} + x_{t-1} + z''_{t-1} \quad (4.13)$$
with $y_0 = 0$, $B \in \mathbb{R}_{\ge 0}$, $\xi_t \sim \mathcal{N}(0, 1)$, and $L = 500$.
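A sketch of the three response cases, reusing an (x, y) instance generated as in the Eq. (4.3) sketch of Section 4.2.1 (with A = 0.4 and B = 0.6):

    % The three Z cases of Eq. (4.11)-(4.13), given row vectors x and y.
    z1 = [0, y(1:end-1)];                  % case 1: z_t = y_{t-1}
    z2 = [0, y(1:end-1)] + y;              % case 2: z'_t = y_{t-1} + y_t
    z3 = zeros(size(y));                   % case 3: autoregressive z''
    for t = 2:numel(y)
      z3(t) = y(t-1) + x(t-1) + z3(t-1);   % z''_t = y_{t-1} + x_{t-1} + z''_{t-1}
    end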
In case 1, Z depends directly on Y and indirectly on X (through Y, which depends directly on X). The intuitive causal inference is then $Y \to Z$ and $X \to Z$. Case 2, despite the additional Y dependence in Z, has the same intuitive causal inference as case 1. In case 3, Z depends directly on itself and on both Y and X. Case 3 also has the same intuitive causal inference.

Consider the instance of Eq. (4.10) with $L = 500$, $A = 0.4$, and $B = 0.6$ shown in Figures 4.25 and 4.26. The leaning calculation will use $(1/4)$-width tolerance domains.
Figure 4.25: An instance of Eq. (4.10) (X and Y) for $L = 500$, $A = 0.4$, and $B = 0.6$. (a) X; (b) Y.
Figure 4.26: An instance of Eq. (4.10) (Z) for $L = 500$, $A = 0.4$, and $B = 0.6$. (a) $Z = \{z_t\}$ (case 1); (b) $Z = \{z'_t\}$ (case 2). (Continues.)
Histograms of these data are shown in Figure 4.27. Figure 4.28 shows strong autocorrelations for X, Y, and Z (for cases 1 and 2) at $l = 6, 12, 18, \ldots, 48$, which is expected given the similar forms of Eq. (4.10) and (4.3) (see Section 4.2.1). The leaning (using the $l$-standard assignment) and lagged cross-correlation calculations will use lags $l = 1, 2, \ldots, 6$.

Figure 4.29 shows both the weighted mean observed leaning, $\langle\lambda_l\rangle$, and the lagged cross-correlation differences, $\Delta_l$, for each of the seven time series pairs in this example: $(X, Y)$, $(X, Z)$ (for all three cases), and $(Y, Z)$ (for all three cases). Table 4.3 shows the values of each of the five time series causality tools used in this exploratory causal analysis for each of the time series pairs in this example, with the mean leaning across all the lags labeled "L," the mean lagged cross-correlation difference labeled "LCC," the Granger causality log-likelihood statistics difference (i.e., $F_{X \to Y} - F_{Y \to X}$, calculated by the MVGC toolbox) labeled "GC," the transfer entropy difference (i.e., $T_{X \to Y} - T_{Y \to X}$, calculated with the JIDT) labeled "TE," and the PAI correlation difference labeled "PAI." Table 4.3 also shows the ECA summary and vector for each pair.
Figure 4.26: (Continued.) An instance of Eq. (4.10) (Z) for $L = 500$, $A = 0.4$, and $B = 0.6$. (c) $Z = \{z''_t\}$ (case 3).
Table 4.3 shows the individual values for each tool used during the exploratory analysis along with the ECA summary vector. Table 4.4 shows how these ECA summaries compare both to the intuitive causal inferences for each case and to the majority-vote inference, i.e., the causal inference implied by the majority of the time series causality tools used during the analysis. The ECA summary of $X \to Y$, and its agreement with the intuitive inference, may have been expected given the similarity of this example to the one discussed in Section 4.2.1, but the only two other ECA summaries that agree with the intuitive inferences are for cases 1 and 2 of the pair $(X, Z)$. The ECA summary is undefined for every other time series pair. The ECA summary was undefined for cases 1 and 2 of the pair $(Y, Z)$ because $g_2$ (Granger) was undefined. In these scenarios, along with case 3 of both pairs, the MVGC toolbox failed to fit a VAR model to the data with the maximum requested model parameters and/or within the maximum allotted computation time. However, the majority-vote inference agrees with intuition for cases 1 and 2 of the pair $(Y, Z)$. Interestingly, the majority-vote inference for case 3 of both time series pairs is counter-intuitive. Only $g_4$ (leaning) agrees with intuition for the case 3 pairs. These counter-intuitive majority-vote inferences may imply case 3 of Z has some property that makes time series causality particularly unreliable. Case 3 of Z is, e.g., apparently non-stationary.¹² The autoregressive term in $z''_t$ of Eq. (4.10) is unique among all the cases of Z in this example. These properties may lead to unreliable exploratory causal analysis with the time series causality tools being used in this work.
¹²The phrase "apparently" is used here to indicate that the approximate stationarity of the data is drawn from visual inspections of Figures 4.25 and 4.26 rather than formal tests.
Figure 4.27: Histograms of the instance of Eq. (4.10) shown in Figures 4.25 and 4.26. (a) X; (b) Y; remaining panels show the $z_t$, $z'_t$, and $z''_t$ bins.
Figure 4.28: Autocorrelations of the instances of X, Y, and Z of Eq. (4.10) shown in Figures 4.25 and 4.26 given lags of $l = 1, 2, \ldots, 50$. The autocorrelations are $|r(b_{t-l}, b_t)|^2$ where $r(\cdot)$ is the Pearson correlation coefficient between the lagged series $\{b_{t-l}\}$ and the time series $\{b_t\}$. (a) X; (b) Y; remaining panels show $z_t$, $z'_t$, and $z''_t$.
Table 4.3: The values of each of the five time series causality tools used in this exploratory causal analysis for each of the time series pairs shown in Figures 4.25 and 4.26, with the mean leaning across all the lags labeled "L," the mean lagged cross-correlation difference labeled "LCC," the Granger causality log-likelihood statistics difference (i.e., $F_{X \to Y} - F_{Y \to X}$, calculated by the MVGC toolbox) labeled "GC," the transfer entropy difference (i.e., $T_{X \to Y} - T_{Y \to X}$, calculated with the JIDT) labeled "TE," and the PAI correlation difference labeled "PAI." An entry of "n/a" in the GC column indicates the MVGC toolbox failed to fit a VAR model to the data with the maximum requested model parameters and/or within the maximum allotted computation time.
0.6 0.6
〈λ 〉 〈λ 〉
l l
0.4 ∆ 0.4 ∆
l l
0.2 0.2
0 0
−0.2 −0.2
−0.4 −0.4
−0.6 −0.6
−0.8 −0.8
1 2 3 4 5 6 1 2 3 4 5 6
l l
x 10−3
0.3 8
〈λ 〉 〈λ 〉
l l
0.2 ∆ 6 ∆l
l
0.1 4
0
2
−0.1
0
−0.2
−0.3 −2
−0.4 −4
1 2 3 4 5 6 1 2 3 4 5 6
l l
Figure 4.29: Lagged cross-correlation differences, l , and weighted mean observed leanings, hl i,
(given an l -standard cause-effect assignment with .1=4/-width tolerance domains) for the instances
of Eq. (4.10) shown in Figures 4.25 and 4.26 given lags of l D 1; 2; : : : ; 6. (Continues.)
explored by focusing on more synthetic data examples with response signals similar to case 3 of Z, i.e., with autoregressive terms and/or non-stationarity. In this example, however, the counter-intuitive inferences illustrate that exploratory causal analysis inferences may be unreliable in the presence of confounding. Still, if an analyst only had access to Y and Z in this example, then the majority-vote inferences would imply the intuitive causal inference in every case except case 3, as seen in Table 4.4.
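The majority-vote bookkeeping used throughout this chapter can be made concrete with a short sketch. The following is a minimal Python illustration rather than the implementation used in this work: the tool ordering (TE, GC, PAI, leaning, LCC) follows the ECA summary vectors reported in this chapter, but the sign conventions are assumptions inferred from the tables in this section, and the example values are hypothetical.

    # Assumed orientation of each tool difference: +1 if a positive value
    # implies X -> Y, -1 if a negative value implies X -> Y. These conventions
    # are inferred from the tables in this section, not stated definitions.
    ORIENTATION = {"TE": +1, "GC": +1, "PAI": -1, "L": +1, "LCC": -1}

    def summary_vector(tool_values):
        """Map each tool difference to 0 (X -> Y), 1 (Y -> X), or 2 (undefined)."""
        g = []
        for name, value in tool_values.items():
            if value is None:                    # e.g., MVGC failed to fit a VAR model
                g.append(2)
            elif value * ORIENTATION[name] > 0:
                g.append(0)
            else:
                g.append(1)
        return g

    def eca_summary(g):
        """The tables suggest the summary is defined only when all tools agree."""
        if all(gi == 0 for gi in g):
            return "X -> Y"
        if all(gi == 1 for gi in g):
            return "Y -> X"
        return "undefined"

    def majority_vote(g):
        """Inference implied by the majority of the defined (non-2) tools."""
        x_to_y, y_to_x = g.count(0), g.count(1)
        if x_to_y == y_to_x:
            return "no majority"
        return "X -> Y" if x_to_y > y_to_x else "Y -> X"

    # Hypothetical values: GC undefined, LCC disagreeing with the other tools.
    g = summary_vector({"TE": 2.1e-2, "GC": None, "PAI": -3.4e-2,
                        "L": 3.7e-2, "LCC": 2.3e-2})
    print(g, eca_summary(g), majority_vote(g))  # [0, 2, 0, 0, 1] undefined X -> Y

Under this reading, an undefined ECA summary with a well-defined majority vote is exactly the situation reported above for cases 1 and 2 of the pair (Y, Z).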
Figure 4.29: (Continued.) Lagged cross-correlation differences, Δₗ, and weighted mean observed leanings, ⟨λₗ⟩, (given an l-standard cause-effect assignment with (1/4)-width tolerance domains) for the instances of Eq. (4.10) shown in Figures 4.25 and 4.26 given lags of l = 1, 2, …, 6. [Final panel: (g) (Y, Z) (case 3).]
4.3 EMPIRICAL DATA EXAMPLES

Figure 4.30: Daily snowfall (the expected response) and mean temperature (the expected driver) from July 1, 1972, to December 31, 2009, at Whistler, BC, Canada (Latitude: 50° 04′ 04.000″ N, Longitude: 122° 56′ 50.000″ W, Elevation: 1835.00 meters).
Table 4.4: Comparisons of the ECA summaries for each of the time series pairs shown in Figures 4.25 and 4.26 with the intuitive inferences and the "majority-vote" inference, i.e., the causal inference implied by the majority of the time series causality tools used during the exploratory causal analysis.
Figure 4.32 shows the autocorrelations for the data shown in Figure 4.30, and Figure 4.31 shows the 100-bin histograms of the data. The autocorrelations do not appear cyclic (within the 50 lags that were calculated), but, for the snowfall time series (i.e., Y), the autocorrelations approach zero for l > 20. This observation will be used to set the lags for the leaning (with the l-standard assignment) and lagged cross-correlation calculations as l = 1, 2, …, 20. The tolerance domains for the leaning calculation will be the (1/4)-width domains.
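As a sketch of this lag-selection heuristic (numpy is assumed, and the threshold and placeholder series are illustrative), the squared Pearson autocorrelation can be computed directly and the lags kept up to the point where it first decays toward zero:

    import numpy as np

    def squared_autocorrelation(b, max_lag=50):
        """|r(b_{t-l}, b_t)|^2 for l = 1, ..., max_lag."""
        b = np.asarray(b, dtype=float)
        return np.array([np.corrcoef(b[:-l], b[l:])[0, 1] ** 2
                         for l in range(1, max_lag + 1)])

    # Keep the lags before the autocorrelation first falls below a small
    # threshold (for the snowfall series, |r|^2 approaches zero for l > 20).
    rng = np.random.default_rng(0)
    y = rng.standard_normal(8000).cumsum()   # placeholder, not the Whistler data
    r2 = squared_autocorrelation(y)
    below = np.nonzero(r2 < 0.05)[0]         # indices where |r|^2 < threshold
    l_max = int(below[0]) if below.size else len(r2)
    lags = range(1, max(l_max, 1) + 1)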
The mean leaning across all the lags is ⟨⟨λₗ⟩⟩ₗ = 3.7 × 10⁻², which implies the intuitive
causal inference. The mean lagged cross-correlation difference across all the lags is ⟨Δₗ⟩ₗ = 2.3 × 10⁻², which
implies the counter-intuitive causal inference. The MVGC toolbox returns a Granger causality log-likelihood
statistic difference of F_{X→Y} − F_{Y→X} = −2.6 × 10⁻³, which also implies the counter-intuitive
causal inference. The JIDT transfer entropy calculation returns T_{X→Y} − T_{Y→X} = 2.1 × 10⁻²,
and the PAI correlation difference¹³ is −3.4 × 10⁻², both of which imply the intuitive causal inference.
If the leaning and lagged cross-correlation difference contributions to the ECA summary
vector are defined as the means across all the tested lags, then the ECA summary vector for this
example is g⃗ = (0, 1, 0, 0, 1), which implies the ECA summary is undefined. The majority-vote
causal inference, however, implies X → Y, which agrees with intuition.
This example again illustrates the benefit of using multiple time series causality tools.
Consider g₄ (leaning), which implied an intuitive causal inference, and g₅ (cross-correlation),
which implied a counter-intuitive causal inference. If those tools are calculated as the means
across all tested lags with l = 1, 2, …, l_max, then, as seen above, l_max = 20 ⇒ g₄ = 0, g₅ = 1.
¹³For this example, the embedding dimension was set as E = 100, rather than E = 3, which was the embedding dimension
in every previous example. The larger embedding dimension was not set with any formal procedure (see, e.g., [159–161, 221,
222]). Instead, the PAI correlation difference was calculated with E = 100 once it was discovered that the algorithm failed
with E = 3 (see the footnote in Section 4.2.3 for a discussion of failures of the PAI algorithm). The time delay used in this
example was the same as in every other example, i.e., τ = 1.
Figure 4.31: 100-bin histograms of the data shown in Figure 4.30. [Panels: (a) x_t bins, (b) y_t bins.]

Figure 4.32: Autocorrelations of the data shown in Figure 4.30 given lags of l = 1, 2, …, 50. The autocorrelations are |r(b_{t−l}, b_t)|², where r(·) is the Pearson correlation coefficient between the lagged series {b_{t−l}} and the time series {b_t}. [Panels: (a) X, (b) Y.]
It is also true that any l_max = 2, 3, …, 20 ⇒ g₄ = 0, g₅ = 1. The lagged cross-correlation difference
with the maximum absolute value across all the tested lags is positive, which also implies
the counter-intuitive inference. The majority of the lagged cross-correlation differences implied
the counter-intuitive inference (16 of the 20 calculated lags). So, it seems most reasonable
approaches to determining g₅ from the set of lagged cross-correlation differences would lead
to g₅ = 1, which does not agree with intuition. Likewise, g₂ (Granger) implies the counter-intuitive
causal inference; i.e., g₂ = 1. It may be argued that the MVGC log-likelihood difference of
F_{X→Y} − F_{Y→X} = −2.6 × 10⁻³ is better interpreted as g₂ = 2 because the value is two orders
of magnitude closer to zero than any other MVGC log-likelihood difference calculated in this
work. But neither conclusion, g₂ = 1 or g₂ = 2, agrees with the intuitive causal inference. The
use of either of these time series causality tools alone (i.e., g₂ or g₅) may lead an analyst to draw
a counter-intuitive causal inference. The use of these tools as part of a set, however, allows the
analyst to compare and contrast the different tools. The analyst may decide to use a majority-vote
inference, which agrees with intuition for this example, or may use outside assumptions
to determine that certain tools, e.g., g₅ or g₂, are unreliable for the type of data being analyzed. At
a minimum, the different implications of the tools strongly suggest the analyst should consider
applying each tool more carefully, e.g., by changing the model parameters for g₂.
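A sketch of these alternative reductions may be useful. Since the formal definition of Δₗ is given earlier in this work and not repeated here, the form below (lagged Y against present X, minus lagged X against present Y, so that a positive value favors Y → X) is an assumed stand-in chosen to match the sign conventions of this section; numpy is assumed.

    import numpy as np

    def lcc_difference(x, y, lag):
        """Assumed stand-in for Delta_l; positive values favor Y -> X."""
        r_yx = np.corrcoef(y[:-lag], x[lag:])[0, 1]  # lagged y against present x
        r_xy = np.corrcoef(x[:-lag], y[lag:])[0, 1]  # lagged x against present y
        return r_yx ** 2 - r_xy ** 2

    def g5_candidates(deltas):
        """Three reasonable reductions of the per-lag differences to one g5 entry."""
        deltas = np.asarray(deltas)
        mean_based = deltas.mean()                     # mean across all tested lags
        max_based = deltas[np.argmax(np.abs(deltas))]  # lag with maximum |Delta_l|
        vote_based = np.sign(deltas).sum()             # per-lag majority (16 of 20 here)
        return mean_based, max_based, vote_based

For the snowfall example, all three reductions share the same sign, which is why g₅ = 1 appears robust to the choice of reduction.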
Figure 4.33: Hourly measurements of the magnetic field component, B_z, and disturbance storm time index, D_st, from the beginning of January 1, 1963, until the end of December 31, 2012, taken from the NASA OMNI data set. [Panels: (a) B, (b) D.]
Figure 4.34: Histograms of the data shown in Figure 4.33. [Panels: (a) B, (b) D.]

Figure 4.35: Hourly measurements of the rectified magnetic field, B_s, from the beginning of January 1, 1963, until the end of December 31, 2012, and the 1000-bin histogram of that data. [Panels: (a) Bᵣ (time series), (b) Bᵣ (histogram).]
set). Both of these properties are practical concerns for the data analyst. The length may make the
calculation of certain time series causality tools computationally infeasible, and the missing data
may lead to erroneous results, if the algorithm used to calculate a given time series causality tool
can even handle missing data. The magnetic field data, B, contains 131,928 missing data points,
and the storm index, D_st, contains 7,736 missing data points. Some, but not all, of the missing
data coincide within the time series pair. Creating a new time series from only the defined points of
B and D would lead to nonuniform time intervals between subsequent time steps in the series, which
would make causal interpretations more difficult.¹⁵ The length of the time series may be addressed
by considering only contiguous subsets of the series, but there is a risk that data associated with
interesting physical phenomena captured in one subset may not be present in another. This
work addresses these practical issues with two different approaches: averaging and sampling.
There are no missing data in D between 00:30:00 January 1, 1997 UTC and 23:30:00 December 31, 2001 UTC. Let D̄, B̄, and B̄ᵣ be the daily (i.e., 24-hour) averages of D, B, and Bᵣ,
respectively, within this time period, where a daily average is the arithmetic mean of 24 hours of
data ignoring any missing values. Each of the new time series, D̄, B̄, and B̄ᵣ, contains 1,826 data
points with no missing points. Figures 4.36, 4.37, and 4.38 show both the time series and the
100-bin histograms of these averaged data sets.
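A minimal sketch of this averaging step, assuming numpy and that missing points have been marked as NaN (the placeholder series below is illustrative):

    import numpy as np

    def daily_average(hourly, hours_per_day=24):
        """Mean of each 24-hour block, ignoring NaNs (missing points) in the block."""
        hourly = np.asarray(hourly, dtype=float)
        n_days = hourly.size // hours_per_day
        blocks = hourly[:n_days * hours_per_day].reshape(n_days, hours_per_day)
        return np.nanmean(blocks, axis=1)  # e.g., 20 valid hours -> mean of 20 points

    # Five years of hourly data (1997-2001, one leap year) -> 1,826 daily points.
    hourly = np.random.default_rng(1).standard_normal(1826 * 24)
    daily = daily_average(hourly)
    assert daily.size == 1826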
The autocorrelations of D̄ and B̄ are shown in Figure 4.39, from which the lags for the
leaning (using the l-standard cause-effect assignment) and lagged cross-correlation difference
calculations are set as l = 1, 2, …, 30. The leaning calculation will use the (1/4)-width tolerance
domains. The PAI correlation differences are calculated with E = 100 and τ = 1. ECA summary
vectors can be constructed for the two time series pairs (B̄, D̄) and (B̄ᵣ, D̄), which are shown
in Tables 4.6 and 4.5. The majority-vote inference agrees with intuition for both pairs, but the
ECA summary only agrees with intuition for the pair (B̄ᵣ, D̄). The ECA summary for the pair
(B̄, D̄) is undefined because g₂ (Granger) implies B̄ ← D̄ while every other time series causality
tool implies the intuitively correct B̄ → D̄. The question of whether or not g₂ might imply the
intuitive causal inference for different VAR modeling parameters was not explored in this work.
Another approach to the practical issues discussed above is to sample smaller contiguous
subsets of D, B, and Bᵣ. Let D̂ᴸ = {D_st(t′)}, B̂ᴸ = {B_z(t′)}, and B̂ᵣᴸ = {B′_z(t′)} with
t′ = t′₀, t′₁, t′₂, …, t′_L be ordered subsets of D, B, and Bᵣ. An ECA summary vector could be
constructed for each sampled pair (B̂ᴸ, D̂ᴸ) and (B̂ᵣᴸ, D̂ᴸ). A set of n sampled pairs, each with a
different t′₀, would produce a set of n values for each time series causality tool. These sets of values
can then be used to develop a mean ECA summary vector from which to draw causal inferences.
Let L = 500 and n = 10⁴. The starting points for each time series are sampled from a
uniform distribution over [0, N − L]. The leaning calculation uses the (1/4)-width tolerance
domains and the l-standard assignment where l is set as l = 1, …, l_a, where l_a is the lag for which
¹⁵Consider the leaning calculated using the 1-standard cause-effect assignment for a given time series pair (A, B). If the time
intervals of the time steps of A and B are not uniform, an analyst must be certain to understand that the leaning is comparing
time steps, not physical time intervals. Thus, {C, E} = {a_{t−1}, b_t} would not be an assumption of, e.g., an hour in the past
of A driving the present of B, as it would be if Δt = t_n − t_{n−1} = 1 hour ∀ t_n in A and B.
Figure 4.36: Daily averages of the hourly measurements of the magnetic field, B_z, and disturbance storm time index, D_st, from the beginning of January 1, 1997, until the end of December 31, 2001, taken from the NASA OMNI data set. [Panels: (a) B̄, (b) D̄.]
Figure 4.37: 100-bin histograms of the data shown in Figure 4.36. [Panels: (a) B̄, (b) D̄.]

Figure 4.38: Daily averages of the hourly measurements of the rectified magnetic field, B_s, from the beginning of January 1, 1997, until the end of December 31, 2001, and the 100-bin histogram of that data. [Panels: (a) B̄ᵣ (time series), (b) B̄ᵣ (histogram).]
Figure 4.39: Autocorrelations of the data shown in Figure 4.36 given lags of l = 1, 2, …, 100. The autocorrelations are |r(b_{t−l}, b_t)|², where r(·) is the Pearson correlation coefficient between the lagged series {b_{t−l}} and the time series {b_t}. [Panels: (a) B̄, (b) D̄.]
Table 4.5: Comparisons of the ECA summaries for each of the time series pairs shown in Figures 4.36 and 4.38 with the intuitive inferences and the "majority-vote" inference, i.e., the causal inference implied by the majority of the time series causality tools used during the exploratory causal analysis.
the lagged autocorrelation of D̂ᴸ is minimum within a set of autocorrelations calculated with lags from
1 to L/2. The same l used in the leaning calculation will be used in the lagged cross-correlation
difference calculations. The PAI correlation differences are calculated with E = 50 and τ = 1.
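A sketch of the sampling loop (numpy is assumed; `tool` stands in for any one of the five tool-difference calculations, and all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)

    def sample_tool_values(b, d, tool, L=500, n=10_000):
        """One `tool` value per random L-point window; windows with NaNs are skipped."""
        b, d = np.asarray(b, dtype=float), np.asarray(d, dtype=float)
        values = []
        for _ in range(n):
            t0 = rng.integers(0, len(b) - L, endpoint=True)  # uniform over [0, N - L]
            bw, dw = b[t0:t0 + L], d[t0:t0 + L]
            if np.isnan(bw).any() or np.isnan(dw).any():
                continue                 # windows with missing data are rejected,
            values.append(tool(bw, dw))  # so far fewer than n values may survive
        return np.array(values)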
Table 4.11 shows the mean value across the n calculated values of each time series causality tool,
and Table 4.7 shows a comparison of the implied causal inferences of these mean values with the
intuitive inferences. Table 4.9 shows the 90% confidence intervals across the n calculated values of
each time series causality tool, and Table 4.8 shows a comparison of the implied causal inferences
of these confidence intervals with the intuitive inferences. The 90% confidence interval is defined
as [p₅, p₉₅], where pᵢ is the i-th percentile of the data; i.e., the lower bound of the 90% confidence
interval is the value that is above 5% of the data, and the upper bound is the value that is above
95% of the data. A bootstrapping [218] procedure can be set up with the sample of n values for
each of the time series causality tool calculations, whereby 10⁵ means are calculated from new sets
(of the same size as the original set) of values that have been sampled (with replacement) from
the original set. The 90% confidence intervals for the means of these bootstrapped samples, and
the comparisons of the implied causal inferences with intuition, are shown in Tables 4.12 and 4.10.
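A sketch of the percentile interval and the bootstrap of the mean (numpy is assumed; a production version would vectorize or chunk the resampling loop):

    import numpy as np

    rng = np.random.default_rng(3)

    def percentile_ci(values, lo=5, hi=95):
        """90% confidence interval [p5, p95] of a sample of tool values."""
        return np.percentile(values, [lo, hi])

    def bootstrap_mean_ci(values, n_boot=100_000):
        """[p5, p95] over n_boot means of resamples drawn with replacement."""
        values = np.asarray(values, dtype=float)
        means = np.empty(n_boot)
        for i in range(n_boot):
            means[i] = rng.choice(values, size=values.size, replace=True).mean()
        return percentile_ci(means)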
This sampling procedure resulted in five sets of 3,025 values, one set for each of the time
series causality tool calculations. Only approximately 3 × 10³ of the approximately 10⁴ sampled
time series subsets D̂ᴸ, B̂ᴸ, and B̂ᵣᴸ (with L = 500) contained no missing data. These were the
only subsets for which the time series causality tools were calculated. The means of these sets,
shown in Table 4.11, lead to undefined ECA summaries but to majority-vote inferences that agree
with intuition for both pairs. The 90% confidence intervals of these sets, however, imply neither
the intuitive nor the counter-intuitive causal inference, as shown in Table 4.9. The bootstrapped
90% confidence intervals, shown in Table 4.12, imply ECA summary vectors that agree with
Table 4.11. In both cases, g₁ (transfer entropy) and g₃ (PAI) imply the counter-intuitive
causal inference, while every other element of the ECA summary vector implies the intuitive
causal inference.
The exploratory causal analysis of this data, like any exploratory analysis [4], is dependent
on the framework within which the causal inferences are drawn. The inferences of Table 4.6
Table 4.6: The values of each of the five time series causality tools used in the exploratory causal analysis for each of the time series pairs shown in Figures 4.36 and 4.38, with the mean leaning across all the lags labeled "L," the mean lagged cross-correlation difference labeled "LCC," the Granger causality log-likelihood statistic difference (i.e., F_{X→Y} − F_{Y→X}, calculated by the MVGC toolbox) labeled "GC," the transfer entropy difference (i.e., T_{X→Y} − T_{Y→X}, calculated with the JIDT) labeled "TE," and the PAI correlation difference labeled "PAI." An entry of "n/a" in the GC column indicates the MVGC toolbox failed to fit a VAR model to the data with the maximum requested model parameters and/or within the maximum allotted computation time.

            TE          GC           PAI          L           LCC          g⃗                ECA summary
(B̄, D̄)     3.6 × 10⁻³  −4.6 × 10⁻³  −1.2 × 10⁻¹  5.0 × 10⁻³  −1.2 × 10⁻²  (0, 1, 0, 0, 0)  undefined
(B̄ᵣ, D̄)    2.7 × 10⁻²  4.6 × 10⁻³   −1.3 × 10⁻¹  7.6 × 10⁻³  −2.5 × 10⁻²  (0, 0, 0, 0, 0)  B̄ᵣ → D̄
Table 4.7: Comparisons of the ECA summaries for each of the time series pairs shown in Table 4.11 with the intuitive inferences and the "majority-vote" inference, i.e., the causal inference implied by the majority of the time series causality tools used during the exploratory causal analysis.

Table 4.8: Comparisons of the ECA summaries for each of the time series pairs shown in Table 4.9.

Table 4.9: The 90% confidence intervals of the average values shown in Table 4.11.

              TE                   GC                 PAI                L                  LCC                  g⃗                ECA summary
(B̂ᴸ, D̂ᴸ)    [−3.0, −2.9] × 10⁻²  [1.9, 2.0] × 10⁻¹  [1.2, 1.2] × 10⁻¹  [1.2, 1.2] × 10⁻¹  [−1.6, −1.6] × 10⁻¹  (1, 0, 1, 0, 0)  undefined
(B̂ᵣᴸ, D̂ᴸ)   [−5.5, −5.2] × 10⁻³  [2.2, 2.2] × 10⁻¹  [9.8, 9.9] × 10⁻²  [1.1, 1.1] × 10⁻¹  [−1.9, −1.9] × 10⁻¹  (1, 0, 1, 0, 0)  undefined
Table 4.10: Comparisons of the ECA summaries for each of the time series pairs shown in Table 4.12.
were drawn from an averaged time series subset of the available data. The averaging procedure
implies the majority-vote inference of B̄ → D̄ and the ECA summary of B̄ᵣ → D̄ are statements about
daily averages only during the time period of 1997 to 2001. These are not statements about the
hourly data during the available time period of 1963 to 2012. It may be that the five years used
during the averaging procedure are not representative of other five-year subsets of the 50 years of
Table 4.11: The average values of each of the five time series causality tools used in this exploratory causal analysis for each of the time series pairs (B̂ᴸ, D̂ᴸ) and (B̂ᵣᴸ, D̂ᴸ), with the mean leaning across all the lags labeled "L," the mean lagged cross-correlation difference labeled "LCC," the Granger causality log-likelihood statistic difference (i.e., F_{X→Y} − F_{Y→X}, calculated by the MVGC toolbox) labeled "GC," the transfer entropy difference (i.e., T_{X→Y} − T_{Y→X}, calculated with the JIDT) labeled "TE," and the PAI correlation difference labeled "PAI." An entry of "n/a" in the GC column indicates the MVGC toolbox failed to fit a VAR model to the data with the maximum requested model parameters and/or within the maximum allotted computation time.

              TE           GC          PAI         L           LCC          g⃗                ECA summary
(B̂ᴸ, D̂ᴸ)    −2.9 × 10⁻²  1.9 × 10⁻¹  1.2 × 10⁻¹  1.2 × 10⁻¹  −1.2 × 10⁻¹  (1, 0, 1, 0, 0)  undefined
(B̂ᵣᴸ, D̂ᴸ)   −5.3 × 10⁻³  2.2 × 10⁻¹  9.8 × 10⁻²  1.1 × 10⁻¹  −1.1 × 10⁻¹  (1, 0, 1, 0, 0)  undefined
Table 4.12: The 90% confidence intervals for the set of 10⁵ bootstrap calculations of the means shown in Table 4.11.

              TE                   GC                        PAI                       L                          LCC                         g⃗                ECA summary
(B̂ᴸ, D̂ᴸ)    [−9.7, −2.9] × 10⁻²  [7.8 × 10⁻², 3.5 × 10⁻¹]  [4.1 × 10⁻², 2.1 × 10⁻¹]  [−8.3 × 10⁻³, 3.4 × 10⁻¹]  [−2.6 × 10⁻¹, −3.9 × 10⁻²]  (2, 0, 1, 2, 0)  undefined
(B̂ᵣᴸ, D̂ᴸ)   [−6.5, −5.5] × 10⁻²  [6.8 × 10⁻², 4.2 × 10⁻¹]  [1.8 × 10⁻², 1.8 × 10⁻¹]  [−9.2 × 10⁻³, 3.6 × 10⁻¹]  [−2.9 × 10⁻¹, −7.8 × 10⁻²]  (2, 0, 1, 2, 0)  undefined
available data. Trends in the hourly time series may not be present in the daily time series. For
example, g₁ (transfer entropy) and g₃ (PAI) imply the intuitive causal inference for the daily time
series (Table 4.6) and the counter-intuitive causal inference for the sampled hourly time series
(Table 4.11). However, the majority-vote inferences for both the daily and sampled hourly time
series pairs agree with intuition. The sampling procedure sampled time series of length L = 500,
which corresponds to approximately 20 days of hourly data. It follows that the majority-vote
inferences of B̂ᴸ → D̂ᴸ and B̂ᵣᴸ → D̂ᴸ are not statements about the time series pairs (B, D) and
(Bᵣ, D). The causal inferences drawn from the 20-day sampled time series may be the same causal
inferences that would be drawn from an exploratory causal analysis of the entire time series pairs
(B, D) and (Bᵣ, D), but such assumptions would need theoretical (or analytical) support that has
not been explored in this example. For example, if the sampling procedure is performed again but
with L = 1000 (i.e., approximately 40 days of data), then the ECA vector¹⁶ for the pair (B̂ᴸ, D̂ᴸ)
is g⃗ = (1, 0, 1, 0, 0), which is the same as in the L = 500 case. However, the ECA vector for the pair
(B̂ᵣᴸ, D̂ᴸ) is g⃗ = (0, 0, 1, 0, 0), which is different from the L = 500 case. Both ECA vectors for
the longer sampled subset time series have majority-vote inferences that agree with intuition, just
as was found for the L = 500 case. But g₁ (transfer entropy) for (B̂ᵣᴸ, D̂ᴸ) implies the intuitive
inference of B̂ᵣᴸ → D̂ᴸ for L = 1000, which differs both from the g₁ inference for (B̂ᴸ, D̂ᴸ) with
L = 1000 and from the g₁ inference for either pair with L = 500. Changing only the length of
the sampled time series subsets can change the causal inference implied by g₁ for B̂ᵣᴸ → D̂ᴸ.
The missing data points have also affected both the daily averaging procedure and the sampling
procedure. The averaging procedure involves the arithmetic mean of 24 hours of data, where
missing data within that 24-hour period are ignored; i.e., if, e.g., only 20 hours of data are available
for a given 24-hour period, then the reported daily mean for that period is the arithmetic
mean of only 20 data points rather than the expected 24. This process may or may not have
led to daily averages that are representative of the physical system; i.e., the calculated daily averages
may be significantly different from the daily averages that might have been calculated if
the data were not missing. Such counter-factual concerns, however, are not (and perhaps cannot
be) addressed in this example. The missing data also bias the sampling procedure. The sampling
procedure involves the random selection of 500 contiguous data points within the time series B,
Bᵣ, and D. If any of the subset time series are missing data, then the exploratory causal analysis
is not performed and different subset time series are selected. This is why there are only 3,025
samples of each time series causality tool calculation in the attempted sampling of 10⁴ time series
subsets.¹⁷ It may be that 500 contiguous data points containing no missing data across all three
time series are relatively rare within the data, which may cause the sampling procedure to over-represent
certain 20-day time periods in the analysis. This issue was also not addressed in this
example.
¹⁶The ECA vector here is found by using the mean value of each time series causality tool to find the implied causal inference,
as was done in Table 4.11.
The majority-vote inferences of both the daily average and sampling procedures used in this
example agree with intuition. This analysis, however, is exploratory causal analysis and should not
be confused with confirmatory causal analysis. The issues discussed in the previous two paragraphs
help illustrate the distinction between the two. The results presented in this section do not confirm
any theory that posits B → D. Rather, the results show B̄ → D̄ and B̂ᵣᴸ → D̂ᴸ (for L = 500) are
potential causal structures of the time series pairs (B̄, D̄) and (B̂ᵣᴸ, D̂ᴸ) (for L = 500), respectively.
These results may be useful for the confirmatory causal analysis of B → D but would require, as
stated previously, outside theoretical (or analytical) support.
¹⁷Only 1,366 samples were found in 5,800 attempts when the sampling procedure was run with L = 1000.
CHAPTER 5
Conclusions
5.1 ECA RESULTS AND EFFICACY
The causal inferences drawn from every tested time series causality tool agree with intuition for
four of the five bivariate synthetic data examples, including every linear example and certain parameters
for both nonlinear examples. PAI and transfer entropy lead to counter-intuitive causal
inferences for specific parameter values of the coupled logistic map example.¹ PAI and the Granger
causality log-likelihood statistic can fail to be defined (and, thus, provide no causal inference) for
the RL circuit example (depending on the sampling frequency of the voltage signal).² Overall,
PAI provides intuitive causal inferences as often as transfer entropy, when it is defined, and is
defined as often as the Granger tool. These results indicate that the PAI algorithm introduced
in [36] may need to be refined (to better handle zero-distance nearest neighbors) and that the algorithm
parameters (i.e., the embedding dimension, time delay, and number of weights to use in
the cross-mapping procedure) may need to be set more carefully during the ECA process. The
leaning never fails to be defined for any of the examples and provides the intuitive causal inference
for a majority of the tested parameter values for both nonlinear examples. The leaning even provides
intuitive causal inferences for examples in which both the transfer entropy and PAI provide
counter-intuitive inferences.³

The leaning is the only tested time series causality tool that consistently provides the intuitive
causal inference for both empirical data examples. The Granger tool and lagged cross-correlation
are the only tested tools that imply counter-intuitive causal inferences for the snowfall
example.⁴ The second empirical example, the OMNI data, shows how the time series sampling
procedure can affect the implied causal inferences. The tested time series causality tools did not
all provide the same causal inference for either empirical example, which implies the empirical
data sets do not have clear, intuitive driving relationships in the sense of the simpler synthetic
data examples. The disagreement between the tools implies some complexity in the system dynamics
but can also provide additional inferences. For example, the transfer entropy and Granger
tool provide conflicting causal inferences for both the snowfall example and most of the sampling
procedures used for the OMNI data example. It is known that these two tools are equivalent if
the joint distribution of the data is Gaussian [168]. Thus, this disagreement implies the empirical
data are not jointly Gaussian.
Several examples of exploratory causal analysis have been shown, some with successful naive ECA
summaries and some that required individual time series analysis tools to be applied in different
ways. Overall, the theme of using more than one tool to draw exploratory causal inferences proved
useful both in understanding the potential causal structure of the system and in understanding
any failures of particular tools (i.e., disagreements with intuition and/or disagreements with
the majority of the other tools). This approach potentially helps prevent errant causal inferences at
the cost of (potentially) redundant calculations. This work, however, has not explored these ideas
formally. What properties of the system dynamics consistently lead to counter-intuitive causal
inferences from one time series causality tool but not another? Are there such properties? These
types of research questions may help an analyst better understand the time series causality tools,
but they may also help guide the system modeling (e.g., if a given Granger causality tool is known to
always fail when the time series data is generated by non-linear dynamics, then the failure of such
a tool may guide the analyst to eliminate linear approaches during the system modeling effort).
This work briefly noted in Section 4.2.4 that a given Granger causality tool (i.e., the MVGC
implementation of the Granger log-likelihood statistic) consistently implied the intuitive causal
inference for synthetic data sets generated by non-linear dynamics, despite the expectation that it would not
do so (which also happened for the example shown in Section 4.2.5). The exploration of such
research questions, however, may require more than computational testing. It may be fruitful to
formally explore, e.g., how a given cause-effect assignment in the leaning calculation might appear
in the VAR forecast model used by a given Granger causality tool.
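The last point can be made concrete with a generic version of the VAR comparison that underlies a Granger log-likelihood statistic. This is an illustrative least-squares sketch (numpy assumed, one lag for brevity), not the MVGC toolbox's implementation:

    import numpy as np

    def residual_variance(y, regressors):
        """Least-squares residual variance of y on the given regressor columns."""
        X = np.column_stack(regressors)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.var(y - X @ coef)

    def granger_statistic(x, y):
        """ln(var_restricted / var_full) for forecasting y; positive values
        suggest that lagged x helps forecast y (i.e., x -> y)."""
        y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
        ones = np.ones_like(y_lag)
        restricted = residual_variance(y_now, [y_lag, ones])   # AR model of y
        full = residual_variance(y_now, [y_lag, x_lag, ones])  # adds lagged x
        return np.log(restricted / full)

    # The difference used in this work compares the two directions:
    # F_{X->Y} - F_{Y->X} ~ granger_statistic(x, y) - granger_statistic(y, x).

A cause-effect assignment such as {C, E} = {x_{t−1}, y_t} picks out exactly the lagged regressor that distinguishes the restricted and full models here, which suggests one place where such a formal comparison could begin.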
The synthetic data example of Section 4.2.5 is the only example in this work of exploratory
causal analysis of coupled system dynamics. There is a history of studying such systems in non-linear
and chaotic time series analysis (see, e.g., [138]). Such systems, however, do not always
have clear physical causal intuitions, so this work did not explore them in detail (i.e., beyond
what is done in Section 4.2.5). These systems may provide an intriguing study of the exploratory
causal analysis approach. Is it possible for multiple time series analysis tools to imply the same
causal inference for such systems? If so, how are the tools that agree related, e.g., formally or
computationally? Exploratory causal analysis of well-studied chaotic dynamics may help guide
analysts in applying such techniques to, e.g., building forecast models, which is known to be
difficult for data generated from non-linear and/or chaotic dynamics [228].
The leaning was introduced in Section 3.5 and used throughout this work. All leaning calculations
rely on a cause-effect assignment and tolerance domains, which, given empirical data, must
be set with some reasonable data analysis. This work used the (1/4)-width tolerance domains extensively,
which are straightforward to define for a given data set. The l-standard cause-effect assignments
were used exclusively in this work, with l set as a function of the autocorrelation lengths
of one or both time series. Section 4.2.1 discussed one possible method for setting l algorithmically.
There was no discussion, however, of determining a reasonable cause-effect assignment algorithmically
in general. The l-standard assignment is a straightforward choice, but an analyst may be
limiting the usefulness of the leaning calculation by not trying other possible cause-effect assignments.
Consider, for example, a system for which one signal D is a steady impulse representing the
times at which a grain of sand is dropped onto a pile, i.e., D = {d_t = 1 ∀ t ∈ 𝒟; d_t = 0 ∀ t ∉ 𝒟},
where 𝒟 is the set of times at which a grain of sand is dropped, and the other signal H is the
maximum height of the pile, i.e., H = {h_t = f(d_i) | i = 1, 2, …, t}, where the maximum height
of the pile at time t is a function f of all the previously dropped grains of sand d_i. It is expected
that f is sufficiently complex to represent both the height increases due to each additional grain
of sand and the occasional height decreases due to sand avalanches when the pile meets certain
physical requirements. The intuitive causal inference for this scenario is D → H, but the leaning
may not be able to imply such an inference using only the l-standard assignment because of the
occasional avalanches. Instead, the leaning may require an "autoregressive" cause-effect assignment
of {C, E} = {d_{t−l} and h_{t−k}, h_t}, i.e., the assumed cause is defined using both the l-lagged
time steps of the impulse signal D and the k-lagged time steps of the response signal H, to
return the intuitive causal inference (a sketch of such a signal pair appears after this paragraph).
Perhaps instead there is some fixed value h₀ of h_t that should
be used in the cause-effect assignment somehow, e.g., {C, E} = {d_{t−l} and (h_{t−1} < h₀), h_t}. It
would be useful to algorithmically determine which cause-effect assignments might be useful for
the leaning calculation. This task may be prohibitively difficult,⁸ but it may allow the leaning to
provide more detailed causal inferences, e.g., by identifying which cause-effect assignment leads
to a higher leaning than another. Likewise, it may be useful to algorithmically set the embedding
dimension and time delay in the PAI correlation difference calculation, e.g., by finding the embedding
dimension and time delay that maximize the SSR correlation (i.e., the correlation between
a given time series and the estimation of that time series calculated using the shadow manifold of
itself). If all the time series causality tool parameters were set algorithmically, directly from the
data being analyzed, then the exploratory causal analysis may become significantly automated.
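The sand-pile scenario can be sketched directly; the height function f below (unit growth per grain, an avalanche past a critical height) and all parameter values are hypothetical illustrations, not taken from this work:

    import numpy as np

    rng = np.random.default_rng(4)

    T = 1000
    d = (rng.random(T) < 0.3).astype(int)  # impulse signal D: grain-drop times

    h = np.zeros(T)                        # response signal H: maximum pile height
    critical_height = 10
    for t in range(1, T):
        h[t] = h[t - 1] + d[t]             # each grain raises the maximum height
        if h[t] > critical_height:         # occasional avalanche lowers the pile
            h[t] = max(h[t] - rng.integers(3, 8), 0)

    # An "autoregressive" cause-effect assignment would pair the assumed cause
    # {d_{t-l} and h_{t-k}} with the assumed effect h_t, rather than d_{t-l} alone.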
Section 1.3 emphasized that exploratory causal analysis of bi-variate time series data is
focused on answering the question "Given (A, B), is the potential driving relationship A → B or
B → A, or can no conclusion be drawn with the tools being used?" This focus may be a myopic use
⁸The leaning calculation is, as discussed in Section 3.5, essentially a structured counting of features in the time series pairs. So,
algorithmically setting a cause-effect assignment for the leaning calculation may be equivalent to pattern finding and matching
within and between the two series, and then calculating the leaning for all potential cause-effect assignments.
of the time series tools. The actual values calculated with each tool were only used for comparison
to another value and were distilled into a ternary (yes-no-unknown) answer. A fruitful extension of
this work may be to explore how the actual values calculated with each of these tools compare to
each other and to the causal intuitions drawn from system parameter values in the synthetic data
examples. Consider, for example, an instance of Eq. (4.9) with some fixed x₀ = y₀, r_x > r_y, and
β_yx > β_xy. It was shown in Section 4.2.5 that such an instance of Eq. (4.9) can lead to an ECA
summary that agrees with the intuitive causal inference of X → Y. If a second instance of Eq. (4.9)
is generated with all the same system parameters except β_yx, which is increased by some amount
δ, do the individual values of the time series causality tools also change by some
amount related to δ? Does, for example, the transfer entropy difference become more positive by
some amount related to δ and/or related to the amount by which the PAI correlation difference
becomes more negative? These types of research questions may lead to a deeper understanding
of both the time series causality tools themselves and of what type of "causality" is actually being
investigated by these tools.
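As a sketch of this experiment, suppose Eq. (4.9) takes the coupled logistic map form common in the CCM literature [30]; the functional form and all parameter values below are assumptions for illustration, not taken from this work:

    import numpy as np

    def coupled_logistic(r_x, r_y, b_xy, b_yx, x0=0.4, y0=0.4, T=1000):
        """x_{t+1} = x_t (r_x - r_x x_t - b_xy y_t), and symmetrically for y."""
        x, y = np.empty(T), np.empty(T)
        x[0], y[0] = x0, y0
        for t in range(T - 1):
            x[t + 1] = x[t] * (r_x - r_x * x[t] - b_xy * y[t])
            y[t + 1] = y[t] * (r_y - r_y * y[t] - b_yx * x[t])
        return x, y

    # Baseline instance (b_yx > b_xy, so X drives Y more strongly) and a second
    # instance with b_yx increased by delta; each causality tool would be
    # evaluated on both pairs and the change in its value compared to delta.
    delta = 0.02
    x1, y1 = coupled_logistic(r_x=3.8, r_y=3.5, b_xy=0.02, b_yx=0.1)
    x2, y2 = coupled_logistic(r_x=3.8, r_y=3.5, b_xy=0.02, b_yx=0.1 + delta)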
Finally, this work has also purposefully ignored the philosophical foundations of the time
series causality tools being used. Some of these tools have been discussed in the philosophical
literature (e.g., Granger causality and information-theoretic causality tools; see [5]) but others,
particularly the leaning, have not. The penchant is derived from probabilistic causality notions
and is related to quantities that have been discussed philosophically, including Kleinberg's causal
significance, for which the foundational causality issues are discussed at length in [31]. The derivation
of the leaning from the penchants, and the penchants' relationship to the causal significance,
may imply that the leaning has an interpretation within Kleinberg's causal framework.
Bibliography
[1] P. Godfrey-Smith. Theory and Reality: An Introduction to the Philosophy of Science. Science and Its Conceptual Foundations series. University of Chicago Press, 2009. DOI: 10.7208/chicago/9780226300610.001.0001. 1, 2, 14
[2] G. van Belle. Statistical Rules of Thumb. Wiley Series in Probability and Statistics. Wiley, 2011. DOI: 10.1002/9780470377963. 1
[3] R. A. Fisher. The Design of Experiments, volume 12. Oliver and Boyd, Edinburgh, 1960. DOI: 10.1136/bmj.1.3923.554-a. 1, 2, 14, 15, 16
[4] J. W. Tukey. Exploratory Data Analysis. 1977. 1, 4, 5, 6, 102
[5] P. Illari and F. Russo. Causality: Philosophical Theory Meets Scientific Practice. Oxford University Press, 2014. DOI: 10.5860/choice.190085. 2, 5, 6, 11, 12, 15, 16, 41, 113
[6] C. W. J. Granger. Testing for causality: A personal viewpoint. Journal of Economic Dynamics
and Control, 2(0):329–352, 1980. DOI: 10.1016/0165-1889(80)90069-X. 2, 8, 11, 23, 24,
41
[7] C. Granger. Time series analysis, cointegration, and applications. Nobel Lecture, pages
360–366, 2003. DOI: 10.1257/0002828041464669. 2, 3, 11, 23
[8] R. A. Fisher. Statistical methods for research workers. 1934. DOI: 10.1007/978-1-4612-
4380-9_6. 2
[9] G. W. Imbens and D. B. Rubin. Causal Inference in Statistics, Social, and Biomedical Sci-
ences. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction.
Cambridge University Press, 2015. DOI: 10.1017/cbo9781139025751. 2, 5, 12, 14, 15,
16
[10] S. L. Morgan and C. Winship. Counterfactuals and Causal Inference. An-
alytical Methods for Social Research. Cambridge University Press, 2014. DOI:
10.1017/cbo9780511804564. 2, 14, 15, 16, 18
[11] P. Illari, F. Russo, and J. Williamson. Causality in the Sciences. OUP Oxford, 2011. DOI:
10.1093/acprof:oso/9780199574131.001.0001. 2, 11, 12, 17
[12] D. Bohm. Causality and Chance in Modern Physics. Pennsylvania paperbacks. University
of Pennsylvania Press, Incorporated, 1971. DOI: 10.1063/1.3060163. 2, 6, 11, 12, 13, 14
[13] M. Bunge. Causality and Modern Science. Courier Corporation, 1979. 2, 6, 11, 12, 13, 14
[14] J. H. King and N. E. Papitashvili. Solar wind spatial scales in and comparisons of hourly Wind and ACE plasma and magnetic field data. Journal of Geophysical Research: Space Physics (1978–2012), 110(A2), 2005. DOI: 10.1029/2004ja010649. 2, 93
[15] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
DOI: 10.1017/cbo9780511803161. 2, 3, 4, 5, 6, 8, 11, 12, 14, 15, 17, 18, 19, 39
[16] P. W. Holland. Statistics and causal inference. Journal of the American statistical Association,
81(396):945–960, 1986. DOI: 10.1080/01621459.1986.10478354. 2, 3, 4, 8, 11, 12, 14,
15, 16, 17
[17] B. Chen and J. Pearl. Regression and causation: A critical examination of six econometrics
textbooks. Real-World Economics Review, Issue, (65):2–20, 2013. 3
[18] K. A. Bollen and J. Pearl. Eight myths about causality and structural equation models.
In Handbook of Causal Analysis for Social Research, pages 301–328. Springer, 2013. DOI:
10.1007/978-94-007-6094-3_15. 3, 17, 18
[20] B. Russell. On the notion of cause. In Proceedings of the Aristotelian society, pages 1–26.
JSTOR, 1912. DOI: 10.1093/aristotelian/13.1.1. 6, 12
[21] B. Russell. Human knowledge: Its Scope and its Limits. New York: Simon & Schuster, 1948.
DOI: 10.4324/9780203875353. 12
[22] W. Salmon. Scientific Explanation and the Causal Structure of the World. Princeton Univer-
sity Press, 1984.
[23] H. Reichenbach. The philosophy of space and time. Lorentz and Poincaré Invariance: 100 Years of Relativity, 8:218, 2001. 6
[24] C. W. J. Granger. Economic processes involving feedback. Information and Control,
6(1):28–48, 1963. DOI: 10.1016/s0019-9958(63)90092-5. 7, 21, 23, 24, 26
[30] G. Sugihara, R. May, H. Ye, C. Hsieh, E. Deyle, M. Fogarty, and S. Munch. Detecting
causality in complex ecosystems. Science, 338(6106):496–500, 2012. DOI: 10.1126/sci-
ence.1227079. 8, 9, 30, 31, 32, 36, 75
[31] S. Kleinberg. Causality, Probability, and Time. Cambridge University Press, 2012. DOI:
10.1017/cbo9781139207799. 8, 12, 19, 38, 113
[32] P. Suppes. A Probabilistic Theory of Causality. North Holland Publishing Company, 1970. 8, 12, 18, 38, 41
[33] I. J. Good. Causal propensity: A review. PSA: Proceedings of the Biennial Meeting
of the Philosophy of Science Association, 1984:829–850, 1984. DOI: 10.1086/psaprocbien-
meetp.1984.2.192542. 8, 12, 38
[34] T. Schreiber. Measuring information transfer. Phys. Rev. Lett., 85:461–464, 2000. DOI:
10.1103/physrevlett.85.461. 9, 26, 27, 28
[38] M. G. Evans. Causality and explanation in the logic of Aristotle. Philosophy and Phenomenological Research, 19(4):466–485, 1959. DOI: 10.2307/2105115. 11, 12
[39] C. F. Bolduan. The autobiography of science. American Journal of Public Health and the Nations Health, 35(10):1090, 1945. DOI: 10.2105/ajph.35.10.1090-a. 11
[40] K. R. Popper, A. F. Petersen, and J. Mejer. The World of Parmenides: Essays on the Presocratic Enlightenment. Routledge, 1998. DOI: 10.4324/9781315824482. 11
[41] I. Düring. Aristotle in the Ancient Biographical Tradition. Distr.: Almqvist & Wiksell,
Stockholm, 1957. 11
[42] A. Plotnitsky. "Dark materials to create more worlds:" On causality in classical physics, quantum physics, and nanophysics. Journal of Computational and Theoretical Nanoscience, 8(6):983–997, 2011. DOI: 10.1166/jctn.2011.1778. 11
[45] A. Zellner. Causality and causal laws in economics. Journal of Econometrics, 39(1):7–21,
1988. DOI: 10.1016/0304-4076(88)90038-3. 11
[47] Z. Zheng and P. A. Pavlou. Research note-toward a causal interpretation from observa-
tional data: A new bayesian networks method for structural models with latent variables.
Information Systems Research, 21(2):365–391, 2010. DOI: 10.1287/isre.1080.0224. 12
[49] A. Honore. Causation in the law. In Edward N. Zalta, Ed., The Stanford Encyclopedia of Philosophy. Winter edition, 2010. DOI: 10.1145/379437.379789. 12
[51] J. Locke. An Essay Concerning Human Understanding. Eliz. Holt, 1700. DOI:
10.1093/oseo/instance.00018020. 12
[53] I. Kant and J. M. D. Meiklejohn. Critique of Pure Reason. Bohn’s philosophical library.
Henry G. Bohn, 1855. DOI: 10.1037/11654-000. 12
[54] D. Hume and L.A. Selby-Bigge. A Treatise of Human Nature. Clarendon Press, 1888.
DOI: 10.1093/oseo/instance.00032872. 12
[55] J. S. Mill. A System of Logic, Ratiocinative and Inductive: Being a Connected View of the
Principles of Evidence and the Methods of Scientific Investigation. Harper & Brothers, 1858.
DOI: 10.1017/cbo9781139149846. 12
[56] Plato. The Allegory of the Cave. P & L Publication, 2010. 12
[57] B. Spinoza. On the Improvement of the Understanding: The Ethics; Correspondence. Dover books on philosophy. Dover, 1955. 12
[58] K. Pearson. Mathematical Contributions to the Theory of Evolution. On Homotyposis in Homologous but Differentiated Organs. 1903. DOI: 10.1098/rspl.1902.0099. 12, 13
[59] W. C. Salmon. Causality and Explanation. Oxford University Press, Oxford, 1998. DOI:
10.1093/0195108647.001.0001. 12
[60] N. Bohr. Essays 1958-1962 on Atomic Physics and Human Knowledge. Ox Bow Press, 1963.
DOI: 10.1063/1.3051271. 13
[61] N. Bohr. Causality and complementarity. Philosophy of Science, 4(3):289–298, 1937. DOI:
10.1086/286465.
[62] P. J. Riggs. Quantum Causality: Conceptual Issues in the Causal eory of Quantum Mechanics,
vol. 23. Springer Science & Business Media, 2009. DOI: 10.1007/978-90-481-2403-9.
13
[63] A. Bohm, H. D. Doebner, and P. Kielanowski. Irreversibility and Causality: Semigroups
and Rigged Hilbert Spaces. Lecture Notes in Physics. Springer Berlin Heidelberg, 2013.
DOI: 10.1007/bfb0106772. 13
[64] M. Pawłowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Zukowski.
Information causality as a physical principle. Nature, 461(7267):1101–1104, 2009. DOI:
10.1038/nature08400. 13
[65] A. Kuzmich, A. Dogariu, L. J. Wang, P. W. Milonni, and R. Y. Chiao. Signal velocity,
causality, and quantum noise in superluminal light pulse propagation. Physical Review
Letters, 86(18):3925, 2001. DOI: 10.1103/physrevlett.86.3925. 13
[66] J. A. Smolin and J. Oppenheim. Locking information in black holes. Physical Review
Letters, 96(8):081302, 2006. DOI: 10.1103/physrevlett.96.081302. 13
[67] G. W. Gibbons and C. A. R. Herdeiro. Supersymmetric rotating black holes and causal-
ity violation. Classical and Quantum Gravity, 16(11):3619, 1999. DOI: 10.1088/0264-
9381/16/11/311. 13
[68] E. C. Zeeman. Causality implies the Lorentz group. Journal of Mathematical Physics, 5(4):490–493, 1964. DOI: 10.1063/1.1704140. 13
[69] F. J. Tipler. Singularities and causality violation. Annals of Physics, 108(1):1–36, 1977.
DOI: 10.1016/0003-4916(77)90348-7.
[70] S. Liberati, S. Sonego, and M. Visser. Faster-than-c signals, special relativity, and causality.
Annals of Physics, 298(1):167–185, 2002. DOI: 10.1006/aphy.2002.6233. 13
[71] G. F. R. Ellis. Physics, complexity and causality. Nature, 435(7043):743–743, 2005. DOI:
10.1038/435743a. 13
[72] J. Barbour. The End of Time: The Next Revolution in Physics. Oxford University Press, 1999. DOI: 10.5860/choice.38-0352.
[73] L. S. Schulman. Time’s Arrows and Quantum Measurement. Cambridge
monographs on mathematical physics. Cambridge University Press, 1997. DOI:
10.1017/cbo9780511622878.
[74] R. Penrose. The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics. Popular Science Series. OUP Oxford, 1999. DOI: 10.1119/1.16207. 13, 14
[75] F. Dowker. Causal sets as discrete spacetime. Contemporary Physics, 47(1):1–9, 2006. DOI:
10.1080/17445760500356833. 13
[76] L. Bombelli, J. Lee, D. Meyer, and R. D. Sorkin. Space-time as a causal set. Phys. Rev.
Lett., 59:521–524, 1987. DOI: 10.1103/physrevlett.59.521. 13
[77] K. S. Thorne. Closed timelike curves. In General Relativity and Gravitation 1992, Proceedings of the Thirteenth INT Conference on General Relativity and Gravitation, held at Cordoba, Argentina, 28 June–4 July 1992, page 295. CRC Press, 1993. 13
[78] S. Aaronson and J. Watrous. Closed timelike curves make quantum and classical computing equivalent. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 465, pages 631–647. The Royal Society, 2009. DOI: 10.1098/rspa.2008.0350. 13
[79] F. Lobo and P. Crawford. Time, closed timelike curves and causality. In The Nature of Time: Geometry, Physics and Perception, pages 289–296. Springer, 2003. DOI: 10.1007/978-94-010-0155-7_30. 13
[80] S. M. Korotaev and E. O. Kiktenko. Quantum causality in closed timelike curves. Physica
Scripta, 90(8):085101, 2015. DOI: 10.1088/0031-8949/90/8/085101. 13
[81] J. G. Cramer and N. Herbert. An inquiry into the possibility of nonlocal quantum com-
munication. arXiv preprint arXiv:1409.5098, 2014. 13
[82] E. Mach and B. F. McGuinness. Principles of the Theory of Heat: Historically and Critically Elucidated. Vienna Circle Collection. Springer Netherlands, 1986. DOI: 10.1007/978-94-009-4622-4. 13
[83] E. Mach. The Science of Mechanics. Cambridge Scholars Publishing, 2009. DOI: 10.1017/cbo9781107338401. 13
[84] P. A. White. Ideas about causation in philosophy and psychology. Psychological Bulletin,
108(1):3, 1990. DOI: 10.1037/0033-2909.108.1.3. 13
[85] K. Sawa. Predictive behavior and causal learning in animals and humans. Japanese Psychological Research, 51(3):222–233, 2009. DOI: 10.1111/j.1468-5884.2009.00396.x. 13
[86] A. H. Taylor, R. Miller, and R. D. Gray. New Caledonian crows reason about hidden causal agents. Proceedings of the National Academy of Sciences, 109(40):16389–16391, 2012. DOI: 10.1073/pnas.1208724109. 13
[87] P. A. White. Causal processing: origins and development. Psychological Bulletin, 104(1):36,
1988. DOI: 10.1037//0033-2909.104.1.36. 13
[88] P. A. White. A theory of causal processing. British Journal of Psychology, 80(4):431–454,
1989. DOI: 10.1111/j.2044-8295.1989.tb02334.x.
[89] T. R. Shultz. Causal reasoning in the social and nonsocial realms. Canadian Journal
of Behavioural Science/Revue Canadienne des Sciences du Comportement, 14(4):307, 1982.
DOI: 10.1037/h0081266. 13
[90] P. W. Cheng. From covariation to causation: a causal power theory. Psychological Review,
104(2):367, 1997. DOI: 10.1037/0033-295x.104.2.367. 13
[91] A. M. Leslie and S. Keeble. Do six-month-old infants perceive causality? Cognition,
25(3):265–288, 1987. DOI: 10.1016/S0010-0277(87)80006-9. 13
[92] L. M. Oakes and L. B. Cohen. Infant perception of a causal event. Cognitive Development,
5(2):193–207, 1990. DOI: 10.1016/0885-2014(90)90026-p.
[93] A. Michotte, T. R. Miles, and E. Miles. The perception of causality. British Journal for the Philosophy of Science, 15(59):254–259, 1964. 13
[94] S. Golin, P. D. Sweeney, and D. E. Shaeffer. e causality of causal attributions in depres-
sion: A cross-lagged panel correlational analysis. Journal of Abnormal Psychology, 90(1):14,
1981. DOI: 10.1037/0021-843x.90.1.14. 14
[95] W. Yan and E. L. Gaier. Causal attributions for college success and failure: An Asian-American comparison. Journal of Cross-Cultural Psychology, 25(1):146–158, 1994. DOI: 10.1177/0022022194251009. 14
[96] S. P. Nguyen and K. S. Rosengren. Causal reasoning about illness: A comparison between European- and Vietnamese-American children. Journal of Cognition and Culture, 4(1):51–78, 2004. DOI: 10.1163/156853704323074750. 14
[97] S. A. Sloman and D. Lagnado. Causality in thought. Annual Review of Psychology,
66(1):223–247, 2015. PMID: 25061673. DOI: 10.1146/annurev-psych-010814-015135.
14
[98] D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized
studies. Journal of Educational Psychology, 66(5):688, 1974. DOI: 10.1037/h0037350. 16
[99] D. B. Rubin. Statistics and causal inference: Comment: Which ifs have causal an-
swers. Journal of the American Statistical Association, 81(396):961–962, 1986. DOI:
10.2307/2289065. 16
[100] J. Neyman and K. Iwaszkiewicz. Statistical problems in agricultural experimentation.
Supplement to the Journal of the Royal Statistical Society, pages 107–180, 1935. DOI:
10.2307/2983637. 16
[101] A. P. Dawid. Causal inference without counterfactuals. Journal of the American Statistical
Association, 95(450):407–424, 2000. DOI: 10.1080/01621459.2000.10474210. 16, 17
[102] J. Pearl. Causal inference without counterfactuals: Comment. Journal of the American
Statistical Association, pages 428–431, 2000. DOI: 10.2307/2669380. 16
[103] C. Granger. Comment. Journal of the American Statistical Association, 81(396):967–968,
1986. DOI: 10.1080/01621459.1986.10478358. 16
[104] A. P. Dawid. Influence diagrams for causal modelling and inference. International Statis-
tical Review, 70(2):161–189, 2002. DOI: 10.1111/j.1751-5823.2002.tb00354.x. 16, 18
[105] A. P. Dawid. Counterfactuals, Hypotheticals and Potential Responses: A Philosophi-
cal Examination of Statistical Causality. In Federica Russo and Jon Williamson, Eds.,
Causality and Probability in the Sciences, pages 503–532. 2007. 16
[106] A. P. Dawid. Beware of the dag! NIPS Causality: Objectives and Assessment, 6:59–86, 2010.
16
[107] S. Geneletti and A. P. Dawid. Defining and identifying the effect of treatment
on the treated. In Phyllis McKay Illari, Federica Russo, and Jon Williamson,
Eds., Causality in the Sciences. Oxford University Press, Oxford, 2011. DOI:
10.1093/acprof:oso/9780199574131.001.0001. 17
[108] E. Arjas and M. Eerola. On predictive causality in longitudinal studies. Journal of Statistical
Planning and Inference, 34(3):361–386, 1993. DOI: 10.1016/0378-3758(93)90146-w. 17
[109] E. Arjas and J. Parner. Causal reasoning from longitudinal data. Scandinavian Journal of Statistics, 31(2):171–187, 2004. DOI: 10.1111/j.1467-9469.2004.02-134.x. 17
[110] M. Eerola. Probabilistic Causality in Longitudinal Studies. Lecture Notes in Statistics.
Springer, New York, 2012. DOI: 10.1007/978-1-4612-2684-0. 17, 18
[111] T. G. Dietterich. Machine learning research: Four current directions. Artificial Intelligence Magazine, 4:97–136, 1997. 17
[112] I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov. Causality Workbench. In Phyllis McKay Illari, Federica Russo, and Jon Williamson, Eds., Causality in the Sciences. OUP Oxford, 2011. DOI: 10.1093/acprof:oso/9780199574131.001.0001. 17
[114] J. Pearl. The causal foundations of structural equation modeling. Technical report, DTIC Document, 2012. 18
[115] K. A. Bollen. Structural Equations with Latent Variables. Wiley Series in Probability and
Statistics. Wiley, 2014. DOI: 10.1002/9781118619179. 17
[116] N. Hall. Structural equations and causation. Philosophical Studies, 132(1):109–136, 2007.
DOI: 10.1007/s11098-006-9057-9. 18, 19
[119] N. Cliff. Some cautions concerning the application of causal modeling methods. Multi-
variate Behavioral Research, 18(1):115–126, 1983. DOI: 10.1207/s15327906mbr1801_7.
18
[124] P. Spirtes, C. N. Glymour, and R. Scheines. Causation, Prediction, and Search. Adaptive
computation and machine learning. MIT Press, 2000. DOI: 10.1007/978-1-4612-2748-
9. 18
[129] C. Meek and C. Glymour. Conditioning and intervening. e British Journal for the
Philosophy of Science, 45(4):1001–1021, 1994. DOI: 10.1093/bjps/45.4.1001. 19
[130] S. Greenland. Relation of probability of causation to relative risk and doubling dose: a
methodologic error that has become a social problem. American Journal of Public Health,
89(8):1166–1169, 1999. DOI: 10.2105/ajph.89.8.1166.
[131] S. R. Cole and M. A. Hernán. Fallibility in estimating direct effects. International Journal
of Epidemiology, 31(1):163–165, 2002. DOI: 10.1093/ije/31.1.163.
[132] O. A. Arah. The role of causal reasoning in understanding Simpson's paradox, Lord's paradox, and the suppression effect: covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology, 5(1):1–5, 2008. DOI: 10.1186/1742-7622-5-5.
[133] I. Shrier. Propensity scores. Statistics in Medicine, 28(8):1317–1318, 2009. DOI:
10.1002/sim.3554.
[134] J. Pearl. Letter to the editor: Remarks on the method of propensity score. Department of
Statistics, UCLA, 2009. DOI: 10.1002/sim.3521. 19
[135] C. Glymour, D. Danks, B. Glymour, F. Eberhardt, J. Ramsey, R. Scheines, P. Spirtes,
C. M. Teng, and J. Zhang. Actual causation: a stone soup essay. Synthese, 175(2):169–
192, 2010. 19
[136] P. Menzies. Causal models, token causation, and processes. Philosophy of Science,
71(5):820–832, 2004. DOI: 10.1086/425057. 19
[137] S. Kleinberg. A logic for causal inference in time series with discrete and continuous vari-
ables. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22,
page 943, 2011. DOI: 10.5591/978-1-57735-516-8/IJCAI11-163. 19
[138] H. Kantz and T. Schreiber. Nonlinear Time Series Analysis. Cambridge nonlinear science
series. Cambridge University Press, 2004. DOI: 10.1017/cbo9780511755798. 19, 21, 22,
111
[140] J. Cui, L. Xu, S. L. Bressler, M. Ding, and H. Liang. Bsmart: a matlab/c toolbox for anal-
ysis of multichannel neural time series. Neural Networks, 21(8):1094–1104, 2008. DOI:
10.1016/j.neunet.2008.05.007. 21
[141] Q. Luo, W. Lu, W. Cheng, P. A. Valdes-Sosa, X. Wen, M. Ding, and J. Feng. Spatio-
temporal granger causality: A new framework. NeuroImage, 79:241–263, 2013. DOI:
10.1016/j.neuroimage.2013.04.091.
[142] M. G. Tana, R. Sclocco, and A. M. Bianchi. Gmac: A matlab toolbox for spectral granger
causality analysis of fmri data. Computers in Biology and Medicine, 42(10):943–956, 2012.
DOI: 10.1016/j.compbiomed.2012.07.003.
[143] L. Barnett and A. K. Seth. The MVGC multivariate Granger causality toolbox: a new approach to Granger-causal inference. Journal of Neuroscience Methods, 223:50–68, 2014. DOI: 10.1016/j.jneumeth.2013.10.018. 25
[144] Z. Zang, C. Yan, Z. Dong, J. Huang, and Y. Zang. Granger causality analysis imple-
mentation on matlab: A graphic user interface toolkit for fmri data processing. Journal of
Neuroscience Methods, 203(2):418–426, 2012. DOI: 10.1016/j.jneumeth.2011.10.006.
[145] A. K. Seth. A {MATLAB} toolbox for granger causal connectivity analysis. Journal of
Neuroscience Methods, 186(2):262–273, 2010. DOI: 10.1016/j.jneumeth.2009.11.020. 21
[146] M. J. Kaminski and K. J. Blinowska. A new method of the description of the infor-
mation flow in the brain structures. Biological Cybernetics, 65(3):203–210, 1991. DOI:
10.1007/bf00198091. 21
[151] M. Lindner, R. Vicente, V. Priesemann, and M. Wibral. Trentool: A matlab open source
toolbox to analyse information flow in time series data with transfer entropy. BMC Neu-
roscience, 12(1):119, 2011. DOI: 10.1186/1471-2202-12-119.
[152] A. Montalto, L. Faes, and D. Marinazzo. Mute: a matlab toolbox to compare estab-
lished and novel estimators of the multivariate transfer entropy. 2014. DOI: 10.1371/jour-
nal.pone.0109462.
[153] J. T. Lizier. JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. Frontiers in Robotics and AI, 1(11), 2014. DOI: 10.3389/frobt.2014.00011. 21, 47
[155] T. Sauer, J. A. Yorke, and M. Casdagli. Embedology. Journal of Statistical Physics, 65(3-
4):579–616, 1991. DOI: 10.1007/bf01053745. 22, 30
[156] J. D. Farmer and J. J. Sidorowich. Predicting chaotic time series. Phys. Rev. Lett., 59:845–
848, 1987. DOI: 10.1103/physrevlett.59.845. 22, 30
[158] H. Ma and C. Han. Selection of embedding dimension and delay time in phase space
reconstruction. Frontiers of Electrical and Electronic Engineering in China, 1(1):111–114,
2006. DOI: 10.1007/s11460-005-0023-7. 22, 32
[174] W. Hesse, E. Moller, M. Arnold, and B. Schack. The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. Journal of Neuroscience Methods, 124(1):27–44, 2003. DOI: 10.1016/s0165-0270(02)00366-7. 26
[176] L. Barnett and A. K. Seth. Granger causality for state-space models. Phys. Rev. E,
91:040101, 2015. DOI: 10.1103/physreve.91.040101. 26, 47
[177] A. Arnold, Y. Liu, and N. Abe. Temporal causal modeling with graphical granger meth-
ods. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, KDD ’07, pages 66–75, New York, NY, USA, 2007. ACM. DOI:
10.1145/1281192.1281203. 26
[178] A. Shojaie and G. Michailidis. Discovering graphical Granger causality using the truncating lasso penalty. Bioinformatics, 26(18):i517–i523, 2010. DOI: 10.1093/bioinformatics/btq377. 26
[179] A. C. Lozano, N. Abe, Y. Liu, and S. Rosset. Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics, 25(12):i110–i118, 2009. DOI: 10.1093/bioinformatics/btp199. 26
[180] M. Dhamala, G. Rangarajan, and M. Ding. Estimating Granger causality from Fourier and wavelet transforms of time series data. Phys. Rev. Lett., 100:018701, 2008. DOI: 10.1103/physrevlett.100.018701. 26
[201] A. Bozorgmagham and S. Ross. Dynamical system tools and causality analysis. SIAM (Society for Industrial and Applied Mathematics) Student Chapter at Virginia Tech, 2013. 30
[202] I. Vlachos and D. Kugiumtzis. State space reconstruction from multiple time series.
In Topics on Chaotic Systems: Selected Papers from Chaos 2008 International Conference,
page 378. World Scientific, 2009. DOI: 10.1142/9789814271349_0043. 32
[203] H. Ma, K. Aihara, and L. Chen. Detecting causality from nonlinear dynamics with short-
term time series. Scientific Reports, 4, 2014. DOI: 10.1038/srep07464. 36
[204] B. Cummins, T. Gedeon, and K. Spendlove. On the efficacy of state space reconstruction
methods in determining causality. SIAM Journal on Applied Dynamical Systems, 14(1):335–
381, 2015. DOI: 10.1137/130946344. 36
[207] L. M. Pecora, T. L. Carroll, and J. F. Heagy. Statistics for mathematical properties of maps between time series embeddings. Phys. Rev. E, 52:3420–3439, 1995. DOI: 10.1103/physreve.52.3420. 36
[208] D. A. Kenny. Cross-lagged panel correlation: A test for spuriousness. Psychological Bulletin,
82(6):887, 1975. DOI: 10.1037//0033-2909.82.6.887. 37
[209] M. S. Bartlett. Some aspects of the time-correlation problem in regard to tests of significance. Journal of the Royal Statistical Society, 98(3):536–543, 1935. DOI: 10.2307/2342284. 37
[210] J. Runge, V. Petoukhov, and J. Kurths. Quantifying the strength and delay of climatic
interactions: the ambiguities of cross correlation and a novel measure based on graphical
models. Journal of Climate, 27(2):720–739, 2014. DOI: 10.1175/jcli-d-13-00159.1. 37,
38
[211] G. C. Carter. Coherence and time delay estimation. Proceedings of the IEEE, 75(2):236–
255, 1987. DOI: 10.1109/proc.1987.13723. 37
[212] R. M. Rozelle and D. T. Campbell. More plausible rival hypotheses in the cross-
lagged panel correlation technique. Psychological Bulletin, 71(1):74, 1969. DOI:
10.1037/h0026863. 37
[213] A. H. Yee and N. L. Gage. Techniques for estimating the source and direction of causal
influence in panel data. Psychological Bulletin, 70(2):115, 1968. DOI: 10.1037/h0025927.
37
[214] D. L. Weakliem. A critique of the Bayesian information criterion for model selection. Soci-
ological Methods & Research, 27(3):359–397, 1999. DOI: 10.1177/0049124199027003002.
47
[215] B. Hayes. Third base. American Scientist, 89(6), 2001. DOI: 10.1511/2001.40.3268. 48
[216] D. Halliday, R. Resnick, and J. Walker. Fundamentals of Physics. John Wiley & Sons,
2010. DOI: 10.1063/1.3070817. 64
[217] R. D. Knight. Physics for Scientists and Engineers with Modern Physics: A Strategic Approach. Prentice Hall, 2012. 64
[218] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman &
Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, 1994.
DOI: 10.1007/978-1-4899-4541-9. 74, 102
[219] A. L. Lloyd. The coupled logistic map: a simple model for the effects of spatial heterogeneity on population dynamics. Journal of Theoretical Biology, 173(3):217–230, 1995. DOI: 10.1006/jtbi.1995.0058. 75
[220] K. Bache and M. Lichman. UCI Machine Learning Repository, 2013. 88
[221] H. Ma and C. Han. Selection of embedding dimension and delay time in phase space
reconstruction. Frontiers of Electrical and Electronic Engineering in China, 1(1):111–114,
2006. DOI: 10.1007/s11460-005-0023-7. 90
[222] D. Kugiumtzis. State space reconstruction parameters in the analysis of chaotic time
series—the role of the time window length. Physica D: Nonlinear Phenomena, 95(1):13–28,
1996. DOI: 10.1016/0167-2789(96)00054-1. 90
[223] M. Sugiura and T. Kamei. IAGA Bulletin No. 40. International Association of Geomagnetism and Aeronomy, 1991. 93
[224] M. A. Hapgood. Space physics coordinate transformations: A user guide. Planetary and
Space Science, 40(5):711–717, 1992. DOI: 10.1016/0032-0633(92)90012-d. 93
[225] W. D. Gonzalez, J. A. Joselyn, Y. Kamide, H. W. Kroehl, G. Rostoker, B. T. Tsurutani,
and V. M. Vasyliunas. What is a geomagnetic storm? Journal of Geophysical Research: Space
Physics, 99(A4):5771–5792, 1994. DOI: 10.1029/93ja02867. 93
[226] R. K. Burton, R. L. McPherron, and C. T. Russell. The terrestrial magnetosphere—a half-wave rectifier of the interplanetary electric field. Science, 189(4204):717–718, 1975. DOI: 10.1126/science.189.4204.717. 93
[227] J. W. Dungey. Interplanetary magnetic field and the auroral zones. Physical Review Letters,
6(2):47, 1961. DOI: 10.1103/physrevlett.6.47. 93
[228] H. Tong. Non-linear Time Series: A Dynamical System Approach. Clarendon Press, 1993. 112