Practical Method For Determining The Minimum Embedding Dimension
Abstract
A practical method is proposed to determine the minimum embedding dimension from a scalar time series. It has the
following advantages: (1) does not contain any subjective parameters except for the time-delay for the embedding; (2) does
not strongly depend on how many data points are available; (3) can clearly distinguish deterministic signals from stochastic
signals; (4) works well for time series from high-dimensional attractors; (5) is computationally efficient. Several time series
are tested to show the above advantages of the method.
There have been many discussions on how to determine the optimal embedding dimension from a scalar time series based on Takens' theorem [1] or its extensions [2]; for a survey, see e.g., Ott et al. [3]. Following are the three basic methods which are usually used to choose the minimum embedding dimension:

(1) computing some invariant on the attractor [4]. By increasing the embedding dimension used for the computation, one notes when the value of the invariant stops changing. The typical problem with this approach is that it is often very data intensive, certainly subjective, and time-consuming in computation.

(2) singular value decomposition [5]. The procedure identifies orthogonal directions in the embedding space which may be ordered according to the magnitude of the variance of the trajectory's projection on them. The ordering is done using the singular values of the embedding. The number of these directions visited by the reconstructed trajectory, and indicated by large singular values, is an estimate of the dimension of the smallest space that contains the trajectory. This approach is also subjective to some extent. As Mees et al. [6] pointed out, the number of large singular values may depend on the details of the embedding and the accuracy of the data as much as they do on the dynamics of the system.

(3) the method of false neighbors [7]. It was developed based on the fact that choosing too low an embedding dimension results in points that are far apart in the original phase space being moved closer together in the reconstruction space. Certainly this method is a good approach. But the criterion in [7] is subjective in some sense for saying that a neighbor is false, where different values of the parameters Rtol and Atol may lead to different results (see example (v) later). Therefore, for realistic time series, different optimal embedding dimensions are probably obtained if we use different values of the parameters.
In addition, one will see some other weaknesses of the false neighbor method through our examples in this paper.

There are other methods and some modified methods developed based on the above methods, e.g., [8], but they are more or less subjective as well in determining the minimum embedding dimension, or contain the other shortcomings we mentioned above.

The method presented in this paper overcomes the shortcomings of the above methods. Suppose that we have a time series x_1, x_2, ..., x_N. The time-delay vectors can be reconstructed as follows:

\[
y_i(d) = (x_i, x_{i+\tau}, \ldots, x_{i+(d-1)\tau}), \qquad i = 1, 2, \ldots, N - (d-1)\tau,
\]

where d is the embedding dimension and τ is the time-delay. Note that y_i(d) means the ith reconstructed vector with embedding dimension d. Similar to the idea of the false neighbor method [7], we define

\[
a(i,d) = \frac{\| y_i(d+1) - y_{n(i,d)}(d+1) \|}{\| y_i(d) - y_{n(i,d)}(d) \|}, \qquad i = 1, 2, \ldots, N - d\tau, \tag{1}
\]

where ‖·‖ is some measure of Euclidean distance and is given in this paper by the maximum norm, i.e.,

\[
\| y_i(m) - y_l(m) \| = \max_{0 \le j \le m-1} | x_{i+j\tau} - x_{l+j\tau} |;
\]

y_i(d+1) is the ith reconstructed vector with embedding dimension d+1, i.e., y_i(d+1) = (x_i, x_{i+τ}, ..., x_{i+dτ}); n(i,d) (1 ≤ n(i,d) ≤ N − dτ) is an integer such that y_{n(i,d)}(d) is the nearest neighbor of y_i(d) in the d-dimensional reconstructed phase space in the sense of the distance ‖·‖ defined above. n(i,d) depends on i and d.

Notes. (1) The n(i,d) in the numerator of Eq. (1) is the same as that in the denominator. (2) If y_{n(i,d)}(d) equals y_i(d), we take the second nearest neighbor instead of it.

If d qualifies as an embedding dimension by the embedding theorems [1,2], then any two points which stay close in the d-dimensional reconstructed space will still be close in the (d+1)-dimensional reconstructed space. Such a pair of points are called true neighbors; otherwise, they are called false neighbors. Perfect embedding means that no false neighbors exist. This is the idea of the false neighbor method in [7], where the authors diagnosed a false neighbor by checking whether a(i,d) (note: in [7], a(i,d) = |x_{i+dτ} − x_{n(i,d)+dτ}| / ‖y_i(d) − y_{n(i,d)}(d)‖, which is slightly different from Eq. (1)) is larger than some given threshold value. The problem is how to choose this threshold value. From the definition of a(i,d) in [7] or Eq. (1), one can see that the threshold value should be determined by the derivative of the underlying signal; therefore, for different phase points i, a(i,d) should have different threshold values, at least in principle. Furthermore, different time series data may have different threshold values. These imply that it is very difficult, and even impossible, to give an appropriate and reasonable threshold value which is independent of the dimension d, of each trajectory's point, and of the considered time series data.

To avoid the above problem, we instead define the following quantity, i.e., the mean value of all the a(i,d)'s:

\[
E(d) = \frac{1}{N - d\tau} \sum_{i=1}^{N - d\tau} a(i,d). \tag{2}
\]

E(d) is dependent only on the dimension d and the lag τ. To investigate its variation from d to d+1, we define

\[
E1(d) = E(d+1)/E(d). \tag{3}
\]

We found that E1(d) stops changing when d is greater than some value d_0 if the time series comes from an attractor. Then d_0 + 1 is the minimum embedding dimension we look for.

Remarks. The parameter τ is a necessary parameter which must be given before the minimum embedding dimension is determined numerically, no matter what methods are used. Although in principle the embedding dimension is independent of the time delay τ, the minimum embedding dimension is dependent on τ in practice. Different values of τ may lead to different minimum embedding dimensions, especially for time series from continuous time systems; see example (vi) later (for time series from discrete time maps, the best choice of τ is 1). It can be easily understood that a good choice of τ may decrease the minimum embedding dimension.

L. Cao / Physica D 110 (1997) 43-50
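Eqs. (1)-(3) translate directly into code. The following is a minimal NumPy sketch of E(d) and E1(d); the function names, the brute-force O(N²) neighbor search, and the handling of coincident points (Note (2) above) are our own implementation choices, not part of the paper:

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay vectors y_i(d) = (x_i, x_{i+tau}, ..., x_{i+(d-1)tau})."""
    n = len(x) - (d - 1) * tau
    return np.array([x[i:i + (d - 1) * tau + 1:tau] for i in range(n)])

def E(x, d, tau):
    """Mean of a(i, d) over the N - d*tau usable points (Eq. (2)),
    with nearest neighbours taken in the maximum norm."""
    y_d = delay_embed(x, d, tau)
    y_d1 = delay_embed(x, d + 1, tau)
    n = len(y_d1)                    # only N - d*tau points have a (d+1)-dim image
    total = 0.0
    for i in range(n):
        # maximum-norm distances from y_i(d) to all candidate neighbours
        dist = np.max(np.abs(y_d[:n] - y_d[i]), axis=1)
        dist[i] = np.inf             # never pick the point itself
        nn = int(np.argmin(dist))
        while dist[nn] == 0.0:       # identical vector: take the next nearest
            dist[nn] = np.inf
            nn = int(np.argmin(dist))
        total += np.max(np.abs(y_d1[i] - y_d1[nn])) / dist[nn]
    return total / n

def E1(x, d, tau):
    """E1(d) = E(d+1)/E(d) (Eq. (3)); it saturates once d exceeds d0."""
    return E(x, d + 1, tau) / E(x, d, tau)
```

On a Hénon x-series with τ = 1, E1(1) comes out well below 1 while E1(d) for d ≥ 2 sits near 1, consistent with the minimum embedding dimension of 2 reported for example (i) below.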
To distinguish deterministic signals from stochastic signals, we define another quantity:

\[
E^*(d) = \frac{1}{N - d\tau} \sum_{i=1}^{N - d\tau} | x_{i+d\tau} - x_{n(i,d)+d\tau} |, \tag{4}
\]

where the meaning of n(i,d) is the same as above, i.e., it is the integer such that y_{n(i,d)}(d) is the nearest neighbor of y_i(d). We define

\[
E2(d) = E^*(d+1)/E^*(d). \tag{5}
\]

For time series data from a random set of numbers, E1(d), in principle, will never attain a saturation value as d increases. But in practical computations, it is difficult to resolve whether E1(d) is slowly increasing or has stopped changing if d is sufficiently large. In fact, since available observed data samples are limited, it may happen that E1(d) stops changing at some d although the time series is random. To solve this problem, we consider the quantity E2(d). For random data, since the future values are independent of the past values, E2(d) will be equal to 1 for any d. For deterministic data, however, E2(d) is certainly related to d; as a result, it cannot be a constant for all d. In other words, there must exist some d's such that E2(d) ≠ 1.

We recommend calculating both E1(d) and E2(d) for determining the minimum embedding dimension of a scalar time series, and for distinguishing deterministic data from random data. Now we turn to test several time series.

(i) data from the x-component values of the Hénon attractor [10] with the usual parameters (a = 1.4, b = 0.3). To investigate whether our method depends strongly on the length of the time series, we calculate E1(d) and E2(d) using 1000 data points and 10 000 data points, respectively, where we let the time delay τ equal 1. Shown in Fig. 1 are our results. Very clearly the minimum embedding dimension is 2, and the result does not strongly depend on how many data points are used.

Fig. 1. The values E1 and E2 for time series data from the chaotic Hénon attractor. "E1-1T" and "E1-10T" represent E1 values obtained using 1000 data points and 10 000 data points, respectively, and similarly for "E2-1T" and "E2-10T".

(ii) data from the x-component values of the Ikeda attractor [11]. The Ikeda map is as follows:

\[
x_{n+1} = p + \mu (x_n \cos t - y_n \sin t), \qquad
y_{n+1} = \mu (x_n \sin t + y_n \cos t),
\]

where p = 1.0, μ = 0.9, t = k − α/(1 + x_n² + y_n²) with k = 0.4 and α = 6.0. The map is the same as that considered in [7]. In the same way as above, we also calculate E1(d) and E2(d) using 1000 data points and 10 000 data points, respectively, where τ = 1. Shown in Fig. 2 are our results. Very clearly the minimum embedding dimension is 4, which is the same as the result in [7]. Our result does not strongly depend on how many data points are used. Here we also used the false neighbor method, and found that if we use 1000 data points the result is not stable, i.e., the percentage of false neighbors increases after dimension 4. This implies that more data points are needed using the false neighbor method than using our method.
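Eqs. (4) and (5) admit the same kind of direct implementation. The sketch below is ours (helper names and the brute-force neighbor search are implementation choices, not from the paper); it illustrates how E2 separates random from deterministic data:

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay vectors y_i(d) = (x_i, x_{i+tau}, ..., x_{i+(d-1)tau})."""
    n = len(x) - (d - 1) * tau
    return np.array([x[i:i + (d - 1) * tau + 1:tau] for i in range(n)])

def E_star(x, d, tau):
    """E*(d) of Eq. (4): mean |x_{i+d*tau} - x_{n(i,d)+d*tau}| over the
    N - d*tau points, with n(i, d) the maximum-norm nearest neighbour."""
    y_d = delay_embed(x, d, tau)
    n = len(x) - d * tau
    total = 0.0
    for i in range(n):
        dist = np.max(np.abs(y_d[:n] - y_d[i]), axis=1)
        dist[i] = np.inf             # exclude the point itself
        nn = int(np.argmin(dist))
        total += abs(x[i + d * tau] - x[nn + d * tau])
    return total / n

def E2(x, d, tau):
    """E2(d) = E*(d+1)/E*(d) (Eq. (5)); it stays near 1 for all d on random data."""
    return E_star(x, d + 1, tau) / E_star(x, d, tau)
```

On white Gaussian noise E2(d) fluctuates around 1 for every d, whereas on a Hénon x-series E2(1) falls clearly below 1, matching the behaviour described above.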
Fig. 2. The same as Fig. 1 but the data come from the Ikeda attractor.

(iii) data from the x-component values of the Lorenz attractor [12] with the usual parameters (σ = 10, r = 28, b = 8/3). We numerically integrate the equations with integration step 0.01, and record the time series data from the numerical solution (x-component values) with sampling time 0.01 after all transients have diminished. We test this time series as above and show the results in Fig. 3, where τ = 15 sampling times (i.e., τ = 15 × 0.01 = 0.15). One can see that 3 is the minimum embedding dimension.

Fig. 3. The values E1 and E2 for the data from the chaotic Lorenz attractor.

(iv) data from random colored noise. We generate time series data {x_n} from a white Gaussian time series {y_n}. Such a time series can fool the false neighbor method, but our method can clearly distinguish it from deterministic chaos. Fig. 4 shows the results on this time series by our method, where τ = 1. Here the E2 values approximately equal 1 for any d and certainly have no relation to the E1 values. As a comparison, we show the results by the false neighbor method in Fig. 5, where the values of the two parameters in the method are Atol = 2 and Rtol = 10. One can see that the false neighbor method cannot distinguish this time series from chaotic time series.

Fig. 4. The same as the above figures but the data come from random colored noise.

(v) data from the Mackey-Glass delay-differential equation [13],

\[
\frac{dx(t)}{dt} = -0.1\, x(t) + \frac{0.2\, x(t-\Delta)}{1 + x(t-\Delta)^{16}}, \qquad \Delta = 100. \tag{6}
\]

Our aim in choosing Δ = 100 in the above equation is to generate a time series from a high-dimensional attractor [14]. The information dimension of the attractor generated by Eq. (6) is about 10, estimated by the Kaplan-Yorke formula [15], see also [16].
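The paper says only that Eq. (6) is integrated numerically with step 0.01; the sketch below is one possible generator using a fixed-step Euler scheme with a ring buffer for the delayed term. The constant initial history, transient length, and default sampling interval of 6 time units (600 steps, matching the setting used later for Fig. 6) are our assumptions:

```python
import numpy as np

def mackey_glass(n_samples, delta=100.0, dt=0.01, sample_every=600,
                 transient_steps=50_000, x0=0.9):
    """Sampled Euler solution of Eq. (6).

    A ring buffer holds the last delta/dt values so that x(t - delta)
    is read in O(1); history is taken constant, x(t) = x0 for t <= 0."""
    lag = int(round(delta / dt))
    buf = np.full(lag, x0)          # buf[idx] currently holds x(t - delta)
    x = x0
    idx = 0
    out = []
    total = transient_steps + n_samples * sample_every
    for step in range(total):
        x_lag = buf[idx]
        dx = -0.1 * x + 0.2 * x_lag / (1.0 + x_lag ** 16)
        buf[idx] = x                # overwrite the oldest value with x(t)
        x += dt * dx
        idx = (idx + 1) % lag
        if step >= transient_steps and (step - transient_steps) % sample_every == 0:
            out.append(x)
    return np.array(out)
```

A higher-order scheme (e.g. Runge-Kutta adapted for delay equations) would be more accurate; Euler is used here only to keep the sketch short.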
Fig. 5. The percentage of false nearest neighbors for the same time series data as used in Fig. 4, i.e., the random colored noise. The line with the diamonds corresponds to the case of using 1000 data points and the line with the crosses to the case of using 10 000 data points.

Fig. 6. The values E1 and E2 for the data from the Mackey-Glass delay-differential equation with delay equal to 100.

For a more accurate estimate of the attractor dimension, one can use the method proposed in [17]. As we know, most recent discussions of time series analysis are concerned only with low-dimensional chaotic systems, which is quite unsatisfactory. Through this example, we show that our method works very well for time series from high-dimensional attractors. We numerically integrate Eq. (6) with integration step 0.01, and record the time series data from the numerical solution with sampling time 6 after all transients have diminished. The results for this time series obtained by our method are shown in Fig. 6, where τ = 1 sampling time (i.e., τ = 1 × 6). One can see that E1(d) attains its saturation value at d = 17 and so does E2(d); therefore, 17 should be the minimum embedding dimension for the time series in this example. Obviously the result does not strongly depend on how many data points are used. A curious phenomenon in Fig. 6 is that there is a sudden dip at d = 15. We do not fully understand it. The reason may be related to the fact that the sampling time used to generate the time series was too large, so that the data are best modeled by a discrete map rather than a differential equation. We will be investigating it in our future work.

We now test the above Mackey-Glass time series using the false neighbor method. We fix the parameter Atol = 2 and let the parameter Rtol = 10, 20, 30 and 40, respectively. The results are shown in Fig. 7 (10 000 data points are used). One can see that the minimum embedding dimension is about 6 by the method of false neighbors, which is too small since the information dimension of the attractor is about 10. In addition, we can also see that different values of Rtol can lead to different minimum embedding dimensions (for example, if we think 0.5% is small enough, then the minimum embedding dimension is 6 for Rtol = 10, 5 for Rtol = 20, and 4 for Rtol = 30 and 40).

In the following we would like to test two unusual time series. The first one is usually considered an example for which it is difficult to determine the embedding dimension. For the second one, we know its exact embedding dimension; we just want to test whether our method can give the correct result.

(vi) data from a torus. The data are generated from the flow

\[
x(t) = \sin(t) + 0.3 \sin(\pi t + 1), \qquad t \in \mathbb{R}, \tag{7}
\]

with sampling time 0.1. Note that Eq. (7) is a flow, not a map. Shown in Fig. 8 are our results, where the time delay τ = 10 sampling times (i.e., τ = 10 × 0.1 = 1).
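Generating the Eq. (7) signal is straightforward; the sketch below samples it at the 0.1 interval used in the text (the function name and series length are our choices). Because the two angular frequencies, 1 and π, are incommensurate, the sampled trajectory fills a 2-torus rather than closing on itself:

```python
import numpy as np

def torus_series(n, dt=0.1):
    """Sample x(t) = sin(t) + 0.3*sin(pi*t + 1) of Eq. (7) at interval dt."""
    t = np.arange(n) * dt
    return np.sin(t) + 0.3 * np.sin(np.pi * t + 1.0)
```

With a delay of 10 samples this reproduces the setting of Fig. 8 (τ = 10 × 0.1 = 1); with a delay of 2 samples it reproduces the setting of Fig. 9.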
Fig. 7. The percentage of false nearest neighbors for the same time series data as used in Fig. 6. Here 10 000 data points are used. We only plot the points with the percentage of false nearest neighbors less than 5. The dashed straight line corresponds to the percentage equal to 0.5. (a) Rtol = 10. (b) Rtol = 20. (c) Rtol = 30. (d) Rtol = 40.

Very clearly the minimum embedding dimension is 3, and the result does not strongly depend on how many data points are used. Here we also used the false neighbor method and found the same results. We now take this example to test how the minimum embedding dimension depends on τ. Fig. 9 shows the results with τ = 2 sampling times by our method, where one can obviously see that the minimum embedding dimension becomes equal to 4. In fact, τ = 2 is too small and therefore the components of the reconstructed vectors with this τ are strongly correlated; as a result, the dimension which is needed to frame the attractor is larger than that with a larger τ such as τ = 10.

(vii) data from the following map:

\[
x_{n+4} = \sin(x_n + 5) + \sin(2 x_{n+1} + 5) + \sin(3 x_{n+2} + 5) + \sin(4 x_{n+3} + 5). \tag{8}
\]

Certainly the minimum embedding dimension is 4 for time series from this map, because we can exactly write down the following predictive model with embedding dimension 4 for this time series:

\[
x_{n+4} = F(x_n, x_{n+1}, x_{n+2}, x_{n+3})
= \sin(x_n + 5) + \sin(2 x_{n+1} + 5) + \sin(3 x_{n+2} + 5) + \sin(4 x_{n+3} + 5).
\]

We check our method by testing this time series. Shown in Fig. 10 are our results, where τ = 1. Obviously we obtain the minimum embedding dimension 4, which is the same as the theoretical result. We also test this time series using the false neighbor method, and show the results in Fig. 11, where Atol = 2 and Rtol = 10. From Fig. 11, we can see that the false neighbor method cannot distinguish this time series from random data if using 1000 data points, and the result is not stable if using 10 000 data points, where the percentage of false neighbors increases after dimension 4.
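A short generator for the Eq. (8) series (the map needs four starting values; the particular initial conditions chosen here are arbitrary and not from the paper):

```python
import numpy as np

def map_series(n, x_init=(0.1, 0.2, 0.3, 0.4)):
    """Iterate the map of Eq. (8) from four given initial values."""
    x = list(x_init)
    while len(x) < n:
        x.append(np.sin(x[-4] + 5) + np.sin(2 * x[-3] + 5)
                 + np.sin(3 * x[-2] + 5) + np.sin(4 * x[-1] + 5))
    return np.array(x[:n])
```

Every value after the fourth satisfies the four-dimensional predictive model exactly, which is why the minimum embedding dimension of this series is known to be 4.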
Fig. 8. The values E1 and E2 for the data from the torus, Eq. (7), with τ = 10 sampling times.

Fig. 9. The same as Fig. 8, but with τ = 2 sampling times.

Fig. 10. The values E1 and E2 for the data from the map, Eq. (8).

Fig. 11. The percentage of false nearest neighbors for the same data as used in Fig. 10, i.e., generated from the map Eq. (8). The line with the diamonds corresponds to the case of using 1000 data points and the line with the crosses to the case of using 10 000 data points.

The above examples have shown the effectiveness of our method in determining the minimum embedding dimension of a scalar time series. Since it does not contain any free parameters except the time delay τ, we expect that our method will be more useful than the previous methods in practical time series analysis where the underlying dynamics is unknown. We would like to test a realistic time series as our final example.

(viii) data from the experimental laser data, Santa Fe Data A [18]. The time series contains 1000 samples. Fig. 12 shows the results by our method, and one can see that the minimum embedding dimension is 7, where τ = 1. As a comparison, we show the result by the false neighbor method in Fig. 13, where Atol = 2 and Rtol = 10.
One can see that the result in Fig. 13 is not stable, where the percentage of false neighbors increases after dimension 7.

Fig. 12. The values E1 and E2 for the experimental laser data, Santa Fe Data A, where τ = 1 sampling time, and the time series has only 1000 samples.

Fig. 13. The percentage of false nearest neighbors for the Santa Fe Data A, where τ = 1 sampling time, Atol = 2, Rtol = 10.

In summary, we have proposed a method for determining the minimum embedding dimension from a scalar time series, which has the advantages mentioned in the abstract. Several time series were tested, and the numerical results showed that the method presented in this paper is very effective. We hope that our method will be useful in applications of nonlinear techniques to the analysis of realistic time series as well as artificial time series.

Acknowledgements

References

[1] F. Takens, Lecture Notes in Mathematics, Vol. 898 (Springer, Berlin, 1981) p. 366.
[2] T. Sauer, J.A. Yorke and M. Casdagli, J. Stat. Phys. 65 (1991) 579.
[3] E. Ott, T. Sauer and J.A. Yorke, Coping with Chaos (Wiley, New York, 1994).
[4] P. Grassberger and I. Procaccia, Phys. Rev. Lett. 50 (1983) 346.
[5] D.S. Broomhead and G.P. King, Physica D 20 (1986) 217.
[6] A.I. Mees, P.E. Rapp and L.S. Jennings, Phys. Rev. A 36 (1987) 340.
[7] M. Kennel, R. Brown and H. Abarbanel, Phys. Rev. A 45 (1992) 3403.
[8] D. Kaplan and L. Glass, Phys. Rev. Lett. 68 (1992) 427; A.M. Albano, J. Muench, C. Schwartz, A.I. Mees and P.E. Rapp, Phys. Rev. A 38 (1988) 3017; for a review, see H.D. Abarbanel, R. Brown, J. Sidorowich and L. Tsimring, Rev. Mod. Phys. 65 (1993) 1331.
[9] A.M. Fraser and H.L. Swinney, Phys. Rev. A 33 (1986) 1134; see also W. Liebert and H.G. Schuster, Phys. Lett. A 142 (1989) 107.
[10] M. Hénon, Commun. Math. Phys. 50 (1976) 69.
[11] K. Ikeda, Opt. Commun. 30 (1979) 257; S.M. Hammel, C.K.R.T. Jones and J.V. Moloney, J. Opt. Soc. Amer. B 2 (1985) 552.
[12] E.N. Lorenz, J. Atmospheric Sci. 20 (1963) 130.
[13] M.C. Mackey and L. Glass, Science 197 (1977) 287.
[14] J.D. Farmer, Physica D 4 (1982) 366.
[15] J.-P. Eckmann and D. Ruelle, Rev. Mod. Phys. 57 (1985) 617.
[16] M. Casdagli, Physica D 35 (1989) 335.
[17] M. Ding, C. Grebogi, E. Ott, T. Sauer and J.A. Yorke, Physica D 69 (1993) 404.
[18] A.S. Weigend and N.A. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past (Addison-Wesley, Reading, MA, 1994).