Deriving Inferential Statistics From Recurrence Plots: A Recurrence-Based Test of Differences Between Sample Distributions and Its Comparison To The Two-Sample Kolmogorov-Smirnov Test

This document describes a study that proposes using recurrence plots (RPs) to replicate certain types of inferential statistics, specifically the two-sample Kolmogorov-Smirnov test. RPs are generated from sample data without temporal dependencies to capture differences between sample distributions. A χ2 statistic is derived from ratios of recurrence points in RPs to obtain p-values, and these are compared to p-values from the Kolmogorov-Smirnov test. Simulated data show RPs can feasibly mimic this standard statistical test, suggesting RPs have potential for broader application in inferential statistics beyond analyzing time series dynamics.

Deriving inferential statistics from recurrence plots: A recurrence-based test of differences between sample distributions and its comparison to the two-sample Kolmogorov-Smirnov test
Sebastian Wallot and Giuseppe Leonardi

Citation: Chaos 28, 085712 (2018); doi: 10.1063/1.5024915

View online: https://doi.org/10.1063/1.5024915
View Table of Contents: http://aip.scitation.org/toc/cha/28/8
Published by the American Institute of Physics
Sebastian Wallot (1) and Giuseppe Leonardi (2, a)
(1) Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322 Frankfurt am Main, Germany
(2) Faculty of Psychology, University of Finance and Management, ul. Pawia 55, 01-030 Warsaw, Poland
(Received 5 February 2018; accepted 13 June 2018; published online 28 August 2018)
Recurrence plots (RPs) have proved to be a very versatile tool for analyzing the temporal dynamics of time series data. However, it has also been conjectured that RPs can be used to model samples of random variables, that is, data that do not contain any temporal dependencies. In the current paper, we show that RPs can indeed be used to mimic nonparametric inferential statistics. In particular, we use the case of the two-sample Kolmogorov-Smirnov test as a proof-of-concept, showing how such a test can be done based on RPs. Simulations on differences in mean, variance, and shape of two distributions show that the results of the classical two-sample Kolmogorov-Smirnov test and the recurrence-based test for differences in distributions of two independent samples scale well with each other. While the Kolmogorov-Smirnov test seems to be more sensitive in detecting differences in means, the recurrence-based test seems to be more sensitive in detecting heteroscedasticity and asymmetry. Potential improvements of our approach as well as extensions to tests with individual distributions are discussed. Published by AIP Publishing. https://doi.org/10.1063/1.5024915

a) Author to whom correspondence should be addressed: [email protected], Tel.: +48 22 536-54-11.

Since their introduction, recurrence plots (RPs) have stimulated much work and methodological advancement in several fields of research. Our paper proposes that the range of application of recurrence plots could be extended to cover problems of inferential statistics and hypothesis testing, and to answer the question of whether or not two sample distributions are derived from the same population. Simulated data show the feasibility of this proposal.

I. INTRODUCTION

Recurrence Quantification Analysis (RQA) is a method developed in the context of nonlinear dynamic systems theory, which has its specific range of applications in the analysis of empirically observed time series [1–3]. By providing the time series as an input to RQA, it is possible to extract several quantities characterizing the dynamics of the underlying system generating the time series itself [1,4]. These measures are technically derived from the Recurrence Plot (RP) of the time series, the core of all recurrence-based analyses [5]. The RP is a thresholded distance matrix of every value in the series (or its associated phase-space coordinates), where all distances below the threshold are counted as recurrent and all distances exceeding the threshold are counted as non-recurrent. The structures formed by recurrence points in the RP, like diagonal and vertical lines, are informative of the dynamic behavior of the system and can be quantified [6].

However, beyond the realm of nonlinear analysis, it has been conjectured for a while that RQA can also be used to replicate certain types of inferential statistics and linear analyses (such as the Fast Fourier Transform, FFT [7]). (This conjecture goes back to discussions of the first author with Guy Van Orden and Charles Webber throughout the years 2009-2013 about the possibility of using recurrence plots to conduct standard inferential statistical analysis.) While a complete generalization of the General Linear Model (GLM) and similar non-parametric statistics currently seems out of the scope of RPs, it has been shown that RPs do generalize certain types of linear analyses, particularly spectral analysis [7,8], and the current article aims to move this kind of research forward by showing how RPs can be used to mimic the Kolmogorov-Smirnov test [9–11] for inferring differences between sample distributions.

In the next sections, we will first briefly describe how an RP is generated for a single, un-embedded series of values. Then, we will describe how an RP can be used to reflect differences between two independent samples by capturing the ratio of the number of recurrence points in specific areas of the RP, and we show how a $\chi^2$-statistic can be used to derive a p-value from such a ratio. We will then show how the p-values obtained from the RP-based distribution test scale with p-values obtained from the classical Kolmogorov-Smirnov test.

II. A BRIEF INTRODUCTION TO RECURRENCE PLOTS

The recurrence plot (RP) is the crucial building block for RQA when analyzing a time series. The RP is a depiction of recurring patterns of a phase-space portrait based on a one-dimensional time series. However, in order to properly capture the dynamics of a time series, phase-space reconstruction is usually performed [12,13], effectively converting the original one-dimensional time series into a higher-dimensional phase-space trajectory. For the current application, however, we are explicitly not interested in temporal properties of data (in fact, our application does not pertain to time series at all, but
rather to the analysis of random variables drawn from some distribution). Hence, no embedding is performed and the RP of interest is directly based on the values of the input data.

In such a situation, the values of our input data are cross-matched at every point, so that each value is compared to all the other values in the data set with regard to the distance (i.e., the absolute numerical distance) among these pairs of values. By running the comparisons for all the values of the input data, we fill up all the cells of a square matrix $R_{ij}$ with the distances of the ith and jth values of the data, for all i ≤ N and j ≤ N, where N is the total number of points in the data set. This matrix is then thresholded using a distance ε. This procedure effectively turns the distance matrix into a binary matrix of recurrent and non-recurrent values (i.e., distances between two data points that are ≤ ε and distances > ε):

$$R_{i,j} = \Theta(\varepsilon - \lVert X_i - X_j \rVert), \quad i, j = 1, \dots, N, \tag{1}$$

where R is the thresholded distance matrix, ε is the threshold parameter, Θ(·) is the Heaviside step function [where Θ(x) = 0 if x < 0, and Θ(x) = 1 otherwise], X is a variable containing the input data, ||...|| is a distance norm, typically the Euclidean norm, and N is the number of data points of X.

Note that we have explicitly shifted our terminology for X from time series to input data. This is because in the following we are interested in the analysis of pairs of samples drawn from two independent and identically distributed (i.i.d.) random variables that do not contain any temporal dependence between their data points, akin to the commonly known Kolmogorov-Smirnov test for the comparison of sample distributions, which provides the probability under the null hypothesis that the two independent samples of random variables are drawn from the same population.

III. INFERENTIAL TESTS FOR DISTRIBUTIONAL DIFFERENCES AMONG TWO SAMPLES

The aim of this paper is to show how an RP can be used to mimic standard inferential statistics. As a proof-of-concept, we specifically aim to show how information obtained from the RP can be used to reproduce the Kolmogorov-Smirnov test for two samples.

The two-sample Kolmogorov-Smirnov test [14] aims to test for the difference between two distributions by quantifying a statistic D, which is the largest absolute difference between the cumulative frequency distributions of two i.i.d. random variables X and Y, where the cumulative frequency distribution of X (and likewise of Y) is defined as

$$F(X) = \frac{1}{N} \sum_{i=1}^{N} I_{[-\infty, X]}(x_i), \tag{2}$$

where N is the sample size, $x_i$ is the ith data point of X, and $I_{[-\infty, X]}$ is an indicator function equal to 1 if $x_i \le X$ and equal to 0 otherwise. The Kolmogorov-Smirnov statistic D is then simply defined as the largest absolute difference between the cumulative distribution functions of X and Y:

$$D = \max \lvert F(X) - F(Y) \rvert. \tag{3}$$

For sufficiently large sample size, the test statistic can be converted into an approximate chi-square by

$$\chi^2 = 4 D^2 \frac{N_X N_Y}{N_X + N_Y}, \tag{4}$$

where $N_X$ is the sample size of X and $N_Y$ is the sample size of Y. To derive a p-value indicating the probability of the distance D between the two cumulative distributions under the null hypothesis that both distributions are drawn from the same population, the $\chi^2$-value is evaluated with two degrees of freedom (df = 2). The null hypothesis can be rejected if p is below a pre-set criterion α (i.e., the value for Type I error) to conclude that there is in fact a difference between the two sample distributions, and that they do indeed come from different populations.

Let us now consider the simple situation where we have two samples to compare with the same number of data points in each of them. The logic of an inferential test based on the RP starts by formulating the usual null hypothesis that the distributions of the two different samples come from the same population, and hence the distances measured over the data points within X and within Y should be statistically the same as the distances measured between the data points of X and Y. However, if the two samples come from different populations that differ with regard to their distribution, the distances of the data points within X and the distances of the data points within Y should be smaller compared to the distances between the data points of X and Y. This can now be tested by examining different areas of an RP with regard to the population of recurrences.

To that end, we prepare the input data so that the data points within each of the two vectors (or lists, to be precise) X and Y are concatenated into a single variable Z, which will be the single input series for the construction of the recurrence plot:

$$X = (x_1, x_2, \dots, x_n), \tag{5}$$

$$Y = (y_1, y_2, \dots, y_n), \tag{6}$$

$$Z = (x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n). \tag{7}$$

In order to construct the RP of Z, we use Eq. (1) with a threshold parameter ε equal to the median of all distance values of the distance matrix $D_{i,j} = \lVert Z_i - Z_j \rVert$ of Z:

$$R_{i,j} = \Theta(\varepsilon - \lVert z_i - z_j \rVert), \quad i, j = 1, \dots, N, \tag{8}$$

where the norm used is the usual Euclidean distance.

Since the recurrence plot basically represents in a graphical way the distances of every data point in the sample to all the other data points, by choosing an appropriate value for ε we can observe how the smaller distances among those within each sample, separately, group together in the form of recurrence points in specific areas of the plot. If the numerical values of the data points that compose the two samples are very different, recurrence points should appear more often when comparing two values from the same sample than two values across the different samples. For this reason, and in the specific situation we are considering of two samples with an equal number of data points, the appropriate value of
the threshold ε for this kind of analysis is the one which keeps the value of the critical recurrence rate $REC_{crit}$ fixed to $REC_{crit} = 0.5$ for the whole RP.

Using the median distance as a threshold will lead to a recurrence rate $REC_{crit} \approx 0.5$, that is, about every other coordinate on the resulting RP will display a recurrence. Figure 1 shows three resulting recurrence plots with $N_X = N_Y = 50$, $SD_X = SD_Y = 1$, and a difference between the sample means λ of 0.0, 2.0, and 4.0, respectively.

FIG. 1. Three example recurrence plots where the mean differences of the two independent samples inputted were 0.0 (a), 2.0 (b), and 4.0 (c). The top panels show the histograms of the two distributions, each containing N = 50 values drawn from a standard normal distribution. The middle panels show the associated recurrence plots, and the bottom panels show the effective input vector Z, with values belonging to sample X marked in black and values belonging to sample Y marked in red.

A schematic division of different areas in the resulting RP can be drawn (Fig. 2), which is also evident in the RPs presented in Fig. 1 (see also Rapp et al., who use a quadrant scan of the RP of time-series data to determine transition points in that time series [15]). Recurrences within sample X are found in the 3rd cell in the lower-left, and recurrences within sample Y are found in the 2nd cell in the upper-right. Recurrences between samples are found in cells 1 and 4 (i.e., upper-left and lower-right). Further, as the mean distance between samples increases [from Figs. 1(a)–1(c)], these cells become less equally populated: While there are approximately equally many recurrence points in all four cells when the mean difference between the two samples is 0.0 [Fig. 1(a)], cells 2 and 3, reflecting within-sample recurrence, become more densely populated, while cells 1 and 4 become increasingly de-populated [Figs. 1(b) and 1(c)] as the mean difference between samples increases.

One can sort the values of the data points within each of the two variables, X and Y, in ascending order with regard to their numerical values before concatenation, so that $x_1 \le x_2, x_2 \le x_3, \dots, x_{N-1} \le x_N$ and $y_1 \le y_2, y_2 \le y_3, \dots, y_{N-1} \le y_N$. Note that this is not a necessary step: the results of the analysis will not be impacted by sorting the values, but sorting will improve the visual distinctness of the resulting RP and will make it possible to determine visually how the two distributions differ by examining the recurrence pattern (see Fig. 2, cell 1 and cell 4).

We can now use this density of recurrence points in the various cells to infer differences in sample means. This can be done by considering the ratio of the recurrence points between the two samples in cells 1 and 4 (Fig. 2) to the recurrence points within the two samples in cells 2 and 3.

By counting the number of recurrence points in each of the groups of cells, we could use a ratio of the number of recurrence points to derive a p-value that relates to the mean differences between the samples. However, because the RP always has a line of identity filled with recurrence points at
its main diagonal, and because the RP is always symmetric about this diagonal, we cannot simply sum up the recurrence points in cells 2 and 3, and 1 and 4. Rather, to determine the within-sample recurrence rate $RR_{within}$ and the between-sample recurrence rate $RR_{between}$, we need to subtract the recurrences at the diagonal and account for the symmetry of the plot. If we define $RR_{c1}$ as the number of recurrence points in cell 1, $RR_{c2}$ as the number of recurrence points in cell 2, $RR_{c3}$ as the number of recurrence points in cell 3, and $RR_{c4}$ as the number of recurrence points in cell 4, then

$$RR_{within} = (RR_{c2} + RR_{c3} - N_Z)/2 \tag{9}$$

and

$$RR_{between} = (RR_{c1} + RR_{c4})/2. \tag{10}$$

FIG. 2. Schematic representation of a recurrence plot and the different cells (quadrants on the recurrence plot) that are associated with recurrences within and between samples.

After determining the observed frequencies of $RR_{within}$ and $RR_{between}$, a $\chi^2$-statistic can be computed with the expected frequencies under $H_0$ being an equal distribution of recurrence points in cells 1 and 4 compared to cells 2 and 3, $RR_{within} = RR_{between}$. Hence, the expected values for $RR_{within}$ and $RR_{between}$, under the discussed parametrization of ε which leads to $REC_{crit} = 0.5$ and discounting also for the diagonal and the symmetry of the RP, are

$$E_{within} = \left( \frac{N_Z^2}{4} - N_Z/2 \right) / 2 \tag{11}$$

and

$$E_{between} = N_Z^2 / 8, \tag{12}$$

where $N_X + N_Y = N_Z$ is the total number of data points of the two samples. That is, we can test the observed distribution of recurrences against the theoretical uniform distribution of frequencies in the cells under $H_0$. If two samples came from the same population, and would hence distribute similarly, $RR_{within}$ would tend to be equal to $RR_{between}$. The more the two population distributions separate, the more the recurrences would tend to gather in the within-sample cells, i.e., $RR_{within} > RR_{between}$.

Now, a p-value can be derived from the observed $RR_{within}$ and $RR_{between}$ and the expected frequencies E by using a $\chi^2$-statistic with df = 1:

$$\chi^2 = \frac{(RR_{within} - E_{within})^2}{E_{within}} + \frac{(RR_{between} - E_{between})^2}{E_{between}}. \tag{13}$$

In the case of unequal sample size (i.e., $N_X \ne N_Y$), the expected frequencies are not equal for $RR_{within}$ and $RR_{between}$, but rather proportional to the size of the different cells in the RP (i.e., cells 1 and 4 compared to cells 2 and 3). Here, we can compute the expected values $E_{within}$ and $E_{between}$ as

$$E_{within} = \left( 1 - \frac{(N_X + N_Y)^2 - N_X^2 - N_Y^2}{(N_X + N_Y)^2} \right) \cdot \frac{N_Z^2}{4} = \frac{N_X^2 + N_Y^2 - N_Z}{4} \tag{14}$$

and

$$E_{between} = \frac{(N_X + N_Y)^2 - N_X^2 - N_Y^2}{(N_X + N_Y)^2} \cdot \frac{N_Z^2}{4} = \frac{N_X \cdot N_Y}{4}, \tag{15}$$

with the calculation of the $\chi^2$-statistic again as in Eq. (13).

IV. SIMULATION

Finally, we are interested in the relative performance of the classical two-sample Kolmogorov-Smirnov test compared to the recurrence-based test for differences in distributions. To that end, we conduct a small series of simulations. The basic question we will pursue is how the results of the two-sample Kolmogorov-Smirnov test scale with the results of the recurrence-based test (a) given increasing differences in means $\lambda = \lvert \mu_X - \mu_Y \rvert$ between two otherwise similar distributions under the assumption of normality and homoscedasticity of the population (with μ = 0 and σ = 1) from which the samples are drawn; (b) given increasing heteroscedasticity between the two distributions under the assumptions of normality and similarity in means (with μ = 0); (c) given increasing difference in skewness between the two distributions, by comparing a standard normal distribution (with μ = 0 and σ = 1) to a skew-normal distribution (with μ = 0, σ = 1, and increasing coefficient of skewness $\gamma_1$); (d) given increasing difference in distributional shape, comparing a standard normal distribution (with μ = 0 and σ = 1) to mixtures of normal and log-normal distributions (both with μ = 0 and σ = 1) with increasing log-normal component.

For the four simulations, the following parameter space is inspected:

(a) The parameter λ indicating the difference in means among two otherwise similar normal distributions. We choose a series of values going from λ = 0 (indicating no difference in population mean μ) to λ = 1 in steps of 0.1.
(b) The parameter δ indicating the difference in standard deviation among two normal distributions with the same mean. Again, we choose a series of values going from
δ = 0 (no difference in standard deviation σ) to δ = 1 in steps of 0.1.
(c) The parameter $\gamma_1$ indicating the coefficient of skewness, ranging from a value of $\gamma_1 = 0$ (normally distributed data without skew) to a value of $\gamma_1 = 0.9$ (strongly right-skewed distribution) in steps of 0.1. In every case, skew-normal distributions will be compared to a fixed normal distribution. In this case, the mean and standard deviation of both the normal and the skew-normal distributions are kept constant (μ = 0 and σ = 1).
(d) The percentage of sampling data points drawn from a standardized log-normal distribution. In this case, one sample is drawn from a normal distribution with μ = 0 and σ = 1, while the second sample is drawn from a mixture of a normal and a log-normal distribution, with an increasing portion of the data points coming from the log-normal (i.e., positively skewed) distribution, whose values have been standardized (i.e., sample 2 has M = 0 and SD = 1 as well). In this way, we introduce a random but increasing component of positively skewed data points in sample 2, testing a homogeneous (i.e., normal) vs. a heterogeneous (i.e., composed of a normal and a log-normal) distribution.

In addition to the above parameters, in each simulation we varied the sample size N for each of the comparisons outlined above (while keeping it equal for both samples), going from $N_X = N_Y = 10$ to 100 in steps of 10. The chosen levels of sample size cover typical ranges of sizes used in empirical research (at least in the field of the social sciences, which is the one we are more acquainted with).

By randomly extracting 1000 independent samples for each comparison and parameter combination as outlined above, we ran a standard two-sample Kolmogorov-Smirnov test in R using the ks.test() function of the "stats" package and the recurrence-based test described above, i.e., a test based on the distribution of recurrence points in the RP built from the data extracted. The implementation of the recurrence-based test in R, together with the simulations outlined above, can be accessed online in the supplementary material.

A. Normal distributions with equal standard deviations but different means

At a value of λ = 0, the two samples are derived from the same population; therefore, the statistics computed on them ($\chi^2$) should (almost) always fall outside the critical region of the sampling distribution. Considering that in this case we are basically testing two samples when the null hypothesis is true, over 1000 trials we should observe a significant result in proportions close to the value of α = 0.05. As λ increases, though, the two populations are no longer equal in the parameter μ (i.e., the null hypothesis is false), so both tests should quickly increase the proportion of significant conclusions over the 1000 trials, up to a point when the difference between the samples (λ) is always detected and all the tests run on them should be statistically significant.

This pattern of results is what we observe in our simulation (see Fig. 3). For values of λ > 0, the number of statistically significant tests at α = 0.05 increases, with a steeper slope of the curve for the traditional Kolmogorov-Smirnov test. Yet, the recurrence test we are discussing here shows a very similar behavior, scaling well with the benchmark Kolmogorov-Smirnov test. In general, however, it seems that the recurrence-based test is less sensitive in registering a difference between the samples' distributions, although its performance gets closer and closer to that of the Kolmogorov-Smirnov test with increasing sample size.

FIG. 3. Simulation results: case a. The proportion of significant tests (over 1000 trials) is represented as a function of the displacement between the population means from which the two samples are extracted and for different sample sizes (see text for details).

B. Normal distributions with equal mean but different standard deviations

The critical parameter in this case is the difference δ in the standard deviations of otherwise identical normal distributions. At values of δ = 0, there is no difference in the σ of the two distributions, and both tests are expected to reject the null hypothesis only by chance.

As δ increases, the two distributions increasingly differ in spread (though still being centered around the same value μ = 0) and the two tests should reject the null hypothesis at higher rates. From Fig. 4, we see that this is hardly the case, even for quite big differences in σ, when the sample size is small (N < 40). However, both the Kolmogorov-Smirnov test and the recurrence test are able to detect the difference in distribution shape, due to the different standard deviations, for bigger sample sizes. Here, the recurrence-based test consistently shows greater sensitivity compared to the Kolmogorov-Smirnov test, a reversal of the pattern found for differences in means (Fig. 3).

C. Normal distribution compared to a skew-normal distribution with equal mean and standard deviation

In this case, we wanted to check if two distributions with the same mean and standard deviation but differing in terms of the asymmetry around the mean can be reliably differentiated by the two tests. We compared samples extracted from a standard normal distribution (μ = 0 and σ = 1) to samples extracted from a skew-normal distribution [16] with positive skew increasing from 0 (symmetric) to 0.9. We manipulated only the value of the coefficient of asymmetry $\gamma_1$, keeping mean and standard deviation equal across both distributions.

Again, a value of $\gamma_1 = 0$ means that the two distributions are identical and the null hypothesis should only be rejected by chance. As $\gamma_1$ increases, the capacity of the two tests to reliably differentiate the two distributions increases, but is very low for most of the values of $\gamma_1$ and N, reaching some weak power (∼15% of significant tests) only for the most asymmetrical distributions ($\gamma_1 \ge 0.8$) in the comparisons involving the biggest samples (N ≥ 90; see Fig. 5). This pattern of results is comparable for the recurrence-based test and the Kolmogorov-Smirnov test, though the former tends to perform slightly better in those cases.

D. Normal distribution compared to a variable mixture of normal and log-normal distributions

Finally, we devised a simulation strategy to examine how the tests performed when comparing a homogeneous vs. a heterogeneous (positively skewed) distribution, comparing a sample drawn from a standard normal distribution with the usual
parameters of mean and standard deviation (μ = 0 and σ = 1) against a sample whose data points consist of a mixture of data drawn from the same normal distribution as in the first sample and a log-normal distribution. Log-normal distributions are normal distributions on a logarithmic scale, and hence, on a regular scale, they are characterized by a thicker positive tail, i.e., extreme positive values have higher probabilities of being observed than in the standard case. The manipulated parameter space was in this case the proportion of points in sample 2 coming from such a skewed distribution, which we varied from 0.00 (i.e., both samples come from a normal distribution with the same parameters) to 1.00 (i.e., data from sample 2 all come from a log-normal distribution, and hence it is much more likely for us to observe extreme positive values).
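
As a rough illustration of the comparison just described, the following Python sketch draws one normal sample and one normal/log-normal mixture sample and applies both the recurrence-based test [Eqs. (8)-(13), equal sample sizes only] and the chi-square approximation of the Kolmogorov-Smirnov test [Eqs. (3) and (4)]. It is not the authors' R implementation from the supplementary material. The function names, the use of a fixed count of log-normal points, and the standardization by the theoretical mean and standard deviation of a log-normal(0, 1) distribution are assumptions made for this example only.

```python
import math
import random
import statistics


def mixture_sample(n, prop_lognormal, rng):
    """Hypothetical helper: normal values mixed with standardized log-normal values.

    Assumption: exactly round(n * prop_lognormal) points are log-normal, and they
    are standardized with the theoretical mean and SD of a log-normal(0, 1).
    """
    ln_mean = math.exp(0.5)
    ln_sd = math.sqrt((math.e - 1.0) * math.e)
    k = round(n * prop_lognormal)
    skewed = [(rng.lognormvariate(0.0, 1.0) - ln_mean) / ln_sd for _ in range(k)]
    normal = [rng.gauss(0.0, 1.0) for _ in range(n - k)]
    return skewed + normal


def recurrence_test(x, y):
    """Recurrence-based test for two equally sized samples (sketch of Eqs. 8-13)."""
    if len(x) != len(y):
        raise ValueError("this sketch covers the equal-sample-size case only")
    z = list(x) + list(y)
    n_x, n_z = len(x), len(z)
    # Full distance matrix of the concatenated input, including the zero diagonal.
    dist = [[abs(a - b) for b in z] for a in z]
    # Threshold = median of all distances, giving a recurrence rate near 0.5.
    eps = statistics.median(d for row in dist for d in row)
    within = between = 0
    for i in range(n_z):
        for j in range(n_z):
            if dist[i][j] <= eps:            # Heaviside step: recurrent if <= eps
                if (i < n_x) == (j < n_x):   # both points from the same sample
                    within += 1
                else:
                    between += 1
    rr_within = (within - n_z) / 2           # Eq. (9): drop diagonal, halve
    rr_between = between / 2                 # Eq. (10)
    e_within = (n_z ** 2 / 4 - n_z / 2) / 2  # Eq. (11)
    e_between = n_z ** 2 / 8                 # Eq. (12)
    chi2 = ((rr_within - e_within) ** 2 / e_within
            + (rr_between - e_between) ** 2 / e_between)  # Eq. (13)
    # Upper tail of a chi-square distribution with df = 1.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def ks_test_chi2(x, y):
    """Two-sample KS statistic D with the chi-square approximation of Eq. (4)."""
    n_x, n_y = len(x), len(y)
    d = max(abs(sum(v <= t for v in x) / n_x - sum(v <= t for v in y) / n_y)
            for t in list(x) + list(y))
    chi2 = 4 * d ** 2 * n_x * n_y / (n_x + n_y)
    # Upper tail of a chi-square distribution with df = 2.
    return d, chi2, math.exp(-chi2 / 2)


if __name__ == "__main__":
    rng = random.Random(1)                   # arbitrary seed for repeatability
    sample_1 = [rng.gauss(0.0, 1.0) for _ in range(100)]
    sample_2 = mixture_sample(100, 0.8, rng)
    print("recurrence test (chi2, p):", recurrence_test(sample_1, sample_2))
    print("KS test (D, chi2, p):", ks_test_chi2(sample_1, sample_2))
```

Both p-values use closed forms for the chi-square upper tail (df = 1 and df = 2), so the sketch needs only the standard library. Repeating the two calls over many random draws and counting the p-values below α would give the kind of rejection proportions reported in the simulations.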

FIG. 4. Simulation results: case b. The proportion of significant tests (over 1000 trials) is represented as a function of the difference between the standard deviations of the populations from which the two samples are extracted and for different sample sizes (see text for details).

FIG. 5. Simulation results: case c. The proportion of significant tests (over 1000 trials) is represented as a function of the difference in skewness between the populations from which the two samples are extracted and for different sample sizes (see text for details).

FIG. 6. Simulation results: case d. The proportion of significant tests (over 1000 trials) is represented as a function of the proportion of the mixture of sample points from the theoretical normal and log-normal populations (in sample 2) and for different sample sizes (see text for details).

As evidenced in Fig. 6, for small sample sizes (N ≤ 30), the recurrence test and the Kolmogorov-Smirnov test hardly lead to any significant result. This is, moreover, true in general for any sample size when the proportion of log-normal data in the mixture is at or below 0.5. This hints at the fact that the distributions of the sampled data points under such conditions may not be that different after all in the two samples. (In these cases, a few extreme values can still be assimilated as coming from the standard normal distribution.) When the proportion of data coming from a log-normal distribution starts to be considerable (i.e., greater than 0.5) and when sample size increases, a more definite difference in the distribution of the two samples emerges, which seems to be more promptly and reliably detected by the recurrence test as compared to the Kolmogorov-Smirnov test. This result, combined with the previous simulation, leads us to
conclude that in the presence of moderate to strong skewness in one of the samples, the recurrence test is better able to discriminate distributional differences between the samples and produce significant results.

On the basis of the simulation data presented above, we tentatively conclude that the recurrence-based test of the kind outlined in this paper compares well to the classical two-sample Kolmogorov-Smirnov test for differences in distributions, and that recurrence plots could find an application for the kind of statistical problems treated by inferential statistics.

V. DISCUSSION

In the present paper, we showed how inferential statistics could be computed by taking advantage of particular patterns of recurrences on recurrence plots, using the case of the two-sample Kolmogorov-Smirnov test as a proof-of-concept. Comparing the results from the classical two-sample Kolmogorov-Smirnov test and the recurrence-based test on simulated data, we can say that both seem to perform similarly well overall, although they differ in their ability to detect particular kinds of differences between distributions. Specifically, our simulations show that the Kolmogorov-Smirnov test seems to be superior in detecting differences in sample means, while the recurrence-based test seems to be superior in detecting differences among samples drawn from populations with unequal variance, and to a certain extent among samples drawn from distributions with unequal symmetry. It has been previously noted¹¹ that the Kolmogorov-Smirnov test tends to be more sensitive near the center of the distribution than at the tails. The results of our simulations confirm this observation and, at the same time, position our recurrence-based test as a valuable complementary method to test for differences in distributions when the differences manifest themselves at the tails rather than the center (e.g., heteroscedasticity, asymmetry).

While the derivation of a p-value for the Kolmogorov-Smirnov test via the χ²-statistic is well understood, there might be alternative ways to derive p-values for the recurrence-based test that might be more appropriate or potentially more sensitive in assessing the differences in distributions of recurrence points between and within samples, an issue which warrants further exploration.

The somewhat lower sensitivity of the recurrence-based test might also be a consequence of the "nominalization" of the data (i.e., only considering whether two values are different, but disregarding the magnitude of the difference), which usually goes along with a loss of power if the data are not modeled adequately.

We also want to discuss possible generalizations of the recurrence-based test to other applications or inferential problems. An extension of the recurrence-based test to the one-sample Kolmogorov-Smirnov test (i.e., testing whether an observed distribution significantly differs from some ideal distribution) seems rather straightforward: One could simply test the observed distribution against a random sample drawn from the ideal distribution of interest and conduct that test as described above. Moreover, this could be done in a semi-bootstrapped procedure, drawing multiple samples from the ideal distribution to increase the reliability of the test. Potentially, such a test might also be applicable to discrete distributions, but this is currently only a speculative conjecture.

On the other hand, the usage of RPs for conducting multi-factorial analyses might in general be more complicated, but an extension of our approach to other methods of drawing statistical inferences seems possible: instead of comparing only four cells on the RP that capture within- and between-sample recurrence, one could systematically divide the RP into additional cells that, given the structure of the input data, could be systematically compared to each other to assess factorial designs. In sum, the current paper provides a positive proof-of-concept that analysis using recurrence plots can be extended to applications of hypothesis testing using inferential statistics.

SUPPLEMENTARY MATERIAL

See supplementary material for the R code used for the simulations.

1. C. L. J. Webber and J. P. Zbilut, J. Appl. Physiol. 76, 965 (1994).
2. J. P. Zbilut and C. L. J. Webber, Phys. Lett. A 171, 199 (1992).
3. N. Marwan, M. C. Romano, M. Thiel, and J. Kurths, Phys. Rep. 438, 237 (2007).
4. C. L. J. Webber and N. Marwan, Recurrence Quantification Analysis: Theory and Best Practices (Springer, Cham, 2015).
5. J.-P. Eckmann, S. O. Kamphorst, and D. Ruelle, Europhys. Lett. 4, 973 (1987).
6. C. L. J. Webber and J. P. Zbilut, in Tutorials in Contemporary Nonlinear Methods for the Behavioral Sciences, edited by M. A. Riley and G. C. Van Orden (NSF, 2005), pp. 26–94.
7. J. P. Zbilut and N. Marwan, Phys. Lett. A 372, 6622 (2008).
8. C. L. J. Webber, M. A. Schmidt, and J. M. Walsh, J. Appl. Physiol. 78, 814 (1995).
9. A. Kolmogorov, G. Dell’Istituto Ital. Degli Attuari 4, 83 (1933).
10. N. V. Smirnov, Bull. Math. Univ. Moscou 2, 3 (1939).
11. I. M. Chakravarti, R. G. Laha, and J. Roy, Handbook of Methods of Applied Statistics (John Wiley & Sons, New York, 1967).
12. F. Takens, in Dynamical Systems and Turbulence Warwick 1980, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), pp. 366–381.
13. R. Mañé, in Dynamical Systems and Turbulence Warwick 1980, edited by D. Rand and L.-S. Young (Springer, Berlin, 1981), pp. 230–242.
14. W. Feller, Ann. Math. Stat. 19, 177 (1948).
15. P. E. Rapp, D. M. Darmon, and C. J. Cellucci, IEICE Proc. Ser. 2, 286 (2014).
16. A. Azzalini and A. Capitanio, The Skew-Normal and Related Families (Cambridge University Press, Cambridge, 2014).
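
Illustrative sketch (not part of the published supplementary material). The one-sample, semi-bootstrapped extension suggested in the Discussion could look roughly like the Python code below. Everything in it is an assumption made for illustration: the function names, the recurrence radius, the number of draws, the use of a 2x2 χ² table (within- versus between-sample pairs, recurrent versus non-recurrent) with 1 degree of freedom, and the choice of the median to combine p-values. It is not necessarily the statistic derived in this paper, and because pairs that share a data point are not independent, the resulting p-values should be read as approximate.

```python
import math
import random
import statistics


def recurrence_counts(a, b, radius, same=False):
    """Return (recurrent pairs, total pairs), where a pair recurs if |x - y| <= radius."""
    recurrent = total = 0
    for i, x in enumerate(a):
        # Within a single sample, visit each unordered pair once and skip self-pairs.
        for y in (a[i + 1:] if same else b):
            total += 1
            recurrent += abs(x - y) <= radius
    return recurrent, total


def recurrence_chi2_pvalue(s1, s2, radius):
    """Approximate p-value from a 2x2 chi-square table (assumed form, 1 df)."""
    r11, n11 = recurrence_counts(s1, s1, radius, same=True)
    r22, n22 = recurrence_counts(s2, s2, radius, same=True)
    r12, n12 = recurrence_counts(s1, s2, radius)
    within_rec, within_tot = r11 + r22, n11 + n22
    # Rows: within-sample / between-sample pairs. Columns: recurrent / non-recurrent.
    table = [[within_rec, within_tot - within_rec], [r12, n12 - r12]]
    row = [within_tot, n12]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    grand = within_tot + n12
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / grand
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def one_sample_recurrence_test(observed, draw_ideal, radius, n_draws=50, seed=0):
    """Semi-bootstrap: compare against repeated draws from the ideal distribution."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_draws):
        reference = [draw_ideal(rng) for _ in observed]  # same size as the data
        pvals.append(recurrence_chi2_pvalue(observed, reference, radius))
    return statistics.median(pvals)  # hypothetical way of combining the p-values


# Usage: data with inflated variance tested against a standard normal "ideal".
data_rng = random.Random(1)
observed = [data_rng.gauss(0, 3) for _ in range(60)]
p = one_sample_recurrence_test(observed, lambda r: r.gauss(0, 1), radius=0.5)
print(f"median p-value over reference draws: {p:.4g}")
```

Taking the median over reference draws is only one possible design choice; other ways of combining the p-values, or a permutation-based null distribution, would be equally plausible under the description given in the Discussion.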
