Complex and Adaptive Dynamical Systems
A Comprehensive Introduction
Fifth Edition
Claudius Gros
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2008, 2011, 2013, 2015, 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
These are a few examples treated in this textbook. Complex systems theory is a
fascinating subject becoming increasingly important in our socio-scientific culture,
with its rapidly growing degrees of complexity. It is also a deeply interdisciplinary
subject, providing the mathematical basis for applications in fields ranging from
the social sciences and psychology on one side to the neurosciences and artificial
intelligence on the other side.
Readership and Preconditions This primer is intended for students and scientists
from the natural and social sciences, engineering, informatics, experimental psy-
chology, or neuroscience. Technically, the reader should have a basic knowledge
of mathematics as it is used in the natural sciences or engineering. This textbook
is suitable both for studies in conjunction with teaching courses as well as for the
individual reader.
Course Material and the Modular Approach When used for teaching, this
primer is suitable for a course running over one or two terms, depending on the
pace and on the number of chapters covered. In general, there should be no problem in following the mathematics involved. Considerable care has been taken to perform the respective derivations on a step-by-step basis.
Style This interdisciplinary primer sets a high value on conveying concepts and
notions within their respective mathematical settings. Believing that a concise style
helps the reader to go through the material, this textbook mostly abstains from long
text passages with general background explanations or philosophical considerations.
Widespread use has been made of paragraph headings with the intention to facilitate
scientific reading.
Exercises and Suggestions for Further Readings Towards the end of each
chapter, a selection of exercises is presented. The respective solutions can be found
in a dedicated chapter. In addition, the section “Further Reading” at the end of each
chapter contains references to standard introductory textbooks and review articles.
Also listed are selected articles for further in-depth studies dealing with specific
issues treated within the respective chapter.
Content of the Fifth Edition Since its first printing in 2008, this textbook has seen
several extensions. The present fifth edition has been completely revised. As a main
addition, the complex systems theory of modern machine learning architectures has
been included in a new chapter. In addition, a range of new sections has been added to existing chapters, each dealing with an exciting topic, such as “Rate Induced Tipping”, “Partially Predictable Chaos”, the “Tragedy of the Commons”, and “Piecewise Linear Dynamical Systems”, to mention a few. The field of complex
systems is continuously evolving, and so is this book.
their comments, for the preparation of figures and the reading of the manuscript.
Particular thanks go to Roser Valentí for continuing support.
1 Network Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Properties of Real-World Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 The Small World Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Basic Graph-Theoretical Concepts . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Network Degree Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Spectral Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Graph Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 Percolation in Generalized Random Graphs . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.1 Graphs with Arbitrary Degree Distributions . . . . . . . . . . . . . 17
1.3.2 Probability Generating Function Formalism . . . . . . . . . . . . . 23
1.3.3 Distribution of Component Sizes . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Robustness of Random Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.5 Small-World Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.6 Scale-Free Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2 Bifurcations and Chaos in Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.1 Basic Concepts of Dynamical Systems Theory . . . . . . . . . . . . . . . . . . . 45
2.2 Fixpoints, Bifurcations and Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.2.1 Fixpoints Classification and Jacobian . . . . . . . . . . . . . . . . . . . . 53
2.2.2 Bifurcations and Normal Forms . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.2.3 Hopf Bifurcations and Limit Cycles . . . . . . . . . . . . . . . . . . . . . 59
2.3 Global Bifurcations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3.1 Infinite Period Bifurcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.3.2 Catastrophe Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.3 Rate Induced Tipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4 Logistic Map and Deterministic Chaos . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.4.1 Colliding Attractors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.5 Dynamical Systems with Time Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.5.1 Distributed Time Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3 Dissipation, Noise and Adaptive Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.1 Chaos in Dissipative Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.1.1 Phase Space Contraction and Expansion . . . . . . . . . . . . . . . . . 87
3.1.2 Strange Attractors and Dissipative Chaos . . . . . . . . . . . . . . . . 91
3.1.3 Partially Predictable Chaos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.2 Adaptive Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.2.1 Conserving Adaptive Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.3 Diffusion and Transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.3.1 Random Walks, Diffusion and Lévy Flights . . . . . . . . . . . . . 106
3.3.2 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.4 Stochastic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.4.1 Langevin Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.4.2 Stochastic Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.5 Noise-Controlled Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.5.1 Fokker–Planck Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.5.2 Stochastic Escape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.5.3 Stochastic Resonance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4 Self Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.1 Interplay Between Diffusion and Reaction . . . . . . . . . . . . . . . . . . . . . . . . 129
4.1.1 Travelling Wavefronts in the Fisher Equation . . . . . . . . . . . 131
4.1.2 Sum Rule for the Shape of the Wavefront. . . . . . . . . . . . . . . . 135
4.1.3 Self-Stabilization of Travelling Wavefronts. . . . . . . . . . . . . . 136
4.2 Interplay Between Activation and Inhibition . . . . . . . . . . . . . . . . . . . . . . 138
4.2.1 Turing Instability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.2.2 Pattern Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.2.3 Gray–Scott Reaction Diffusion System . . . . . . . . . . . . . . . . . . 142
4.3 Collective Phenomena and Swarm Intelligence . . . . . . . . . . . . . . . . . . . 147
4.3.1 Phase Transitions in Social Systems . . . . . . . . . . . . . . . . . . . . . 147
4.3.2 Collective Decision Making and Stigmergy . . . . . . . . . . . . . 149
4.3.3 Collective Behavior and Swarms . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.3.4 Opinion Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.4 Car Following Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.4.1 Linear Flow and Carrying Capacity . . . . . . . . . . . . . . . . . . . . . . 157
4.4.2 Self-Organized Traffic Congestions . . . . . . . . . . . . . . . . . . . . . . 158
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
1 Network Theory
Dynamical and adaptive networks are the backbone of many complex systems.
Examples range from ecological prey–predator networks to the gene- and protein
expression networks on which all living creatures are based. The basis of our own
identity is provided by the highly sophisticated neural network we carry in our
heads. On a social level we interact through social and technical networks like the
Internet. Networks are ubiquitous throughout the domain of all living creatures.
A good understanding of network theory is of basic importance for complex
systems theory. In this chapter we will discuss the most important notions of
graph theory, like clustering and degree distributions, together with basic network
realizations. Central concepts like percolation, the robustness of networks with
regard to failure and attacks, and the “rich-get-richer” phenomenon in evolving
social networks will be treated.
Eight or more billion humans live on earth and it might seem that the world is a big
place. But, as an Italian proverb states,
Six Degrees of Separation About 20% of Milgram’s letters did eventually reach
their destination. Milgram found that it had only taken an average of six steps
for a letter to get from Nebraska to Boston. This result, dubbed “six degrees of
separation”, suggests that it should be possible to connect any two persons on earth
via social networks in a similar number of steps.
SMALL-WORLD EFFECT The “small-world effect” denotes the result that the average
distance linking two nodes belonging to the same network can be orders of magnitude
smaller than the number of nodes making up the network.
The small-world effect occurs in all kinds of networks. Milgram examined the networks of friends, which may originate from participating in common activities. This is the case also for other social networks, as illustrated in Fig. 1.1. An example is given by managers serving together on the board of directors of the same company, which results in a dense network of acquaintances between the managers.
Networks Are Everywhere Social networks are just one important example of communication networks. Most human communication takes place directly among individuals. The spreading of news, rumors, jokes and of diseases takes place
by contacts between individuals. And we are all aware that rumors and epidemic
infections can spread fast in densely webbed social networks.
Communication networks are ubiquitous. Well known examples are the Internet
and the world-wide web. Inside a cell the constituent proteins form an interacting
network, as illustrated in Fig. 1.2. The same holds for artificial neural networks and for the scaffolding of our brain, the biological neural network. It is therefore important to
understand the statistical properties of core network classes.
We start with some basic concepts that allow us to characterize graphs and real-world networks. We will use the terms graph and network interchangeably, vertex, site and node as synonyms, and either edge or link.
COORDINATION NUMBER The coordination number z is the average number of links per
vertex, viz the average degree.
A graph with an average degree $z$ has $Nz/2$ connections. Alternatively, we can characterize a graph by the probability $p$ to find a given edge.
CONNECTION PROBABILITY The probability that a given edge occurs is called the
connection probability p.
$$p = \frac{Nz}{2}\,\frac{2}{N(N-1)} = \frac{z}{N-1} \qquad (1.1)$$
for the relation between the coordination number z and the connection probability
p. In (1.1) we divided the total number of active links, .N z/2, by the total number
of possible links, .N(N − 1)/2.
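A quick numerical check of relation (1.1) can be done by sampling an Erdös–Rényi graph and comparing the measured coordination number with $p(N-1)$. The following minimal Python sketch uses only numpy; the values of $N$ and $p$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 2000, 0.004                      # illustrative graph size and connection probability

# sample a symmetric adjacency matrix without self-loops
upper = np.triu(rng.random((N, N)) < p, k=1).astype(int)
A = upper + upper.T

degrees = A.sum(axis=1)
print("measured coordination number z:", degrees.mean())
print("prediction p*(N-1)            :", p * (N - 1))     # relation (1.1)
```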
THE THERMODYNAMIC LIMIT The limit where the number of elements making up a
system diverges to infinity is called the “thermodynamic limit” in physics. A quantity is
extensive if it is proportional to the number of constituting elements, and intensive if it
scales to a constant in the thermodynamic limit.
For real-world networks, the thermodynamic limit makes sense for very large
numbers of nodes. This is the case for the world-wide web, the network of
hyperlinks, which is so large that its size can only be estimated. In order of magnitude, the number of documents available online grew from $N \simeq 0.8\times 10^{9}$ in 1999 to $N \simeq 3.4\times 10^{15}$ in 2023.
For the network diameter $D$, the maximal distance between any two nodes of the network, one has
$$z^{D} \approx N, \qquad D \propto \log N/\log z\,, \qquad (1.2)$$
in order of magnitude, since nodes have $z$ neighbors, $z^{2}$ next-nearest neighbors, and so on. The logarithmic increase in the number of degrees of separation with the size of the network is characteristic of small-world networks. A logarithm increases very slowly with its argument, which implies that network diameters remain small even for networks containing large numbers of nodes.
For large networks, the diameter is closely related to the average distance.
AVERAGE DISTANCE The average distance .𝓁 is the average of the minimal path length
between all pairs of nodes of a network.
The clustering coefficient scales to zero in the thermodynamic limit, viz for large
networks. In Table 1.1 the respective clustering coefficients for some real-world
networks and for the corresponding random networks are listed for comparison.
CLIQUES A clique is a set of vertices for which (a) every node is connected by an edge
to every other member of the clique and (b) no node outside the clique is connected to all
members of the clique.
The term “clique” comes from social networks. A clique is a group of friends
where everybody knows everybody else. A clique corresponds, in terms of graph
theory, to a maximal fully connected subgraph. For Erdös–Rényi graphs with N
vertices and linking probability p, the number of cliques is
$$\binom{N}{K}\, p^{K(K-1)/2}\, \left(1 - p^{K}\right)^{N-K},$$
Table 1.1 The number of nodes N , average degree of separation .𝓁, and clustering coefficient
C, for four real-world networks. The last column is the value .Crand = z/(N − 1) the clustering
coefficient would take for a random graph with the same size and coordination number. From
Watts and Strogatz (1998) and Ludueña et al. (2013)
– $\left(1 - p^{K}\right)^{N-K}$ is the probability that every one of the $N-K$ out-of-clique vertices is not connected to all $K$ vertices of the clique.
The only cliques occurring in random graphs in the thermodynamic limit have size two, since $p = z/N$.
Also given in Table 1.1 are the clustering coefficients .Crand of random graphs
of identical size and coordination number. Note that the real-world value is
systematically higher than that of random graphs. The small values for the average
distances $\ell$ given in Table 1.1 for three of the four networks reflect their small-world
nature.
Erdös–Rényi random graphs obviously do not match the properties of real-world
networks well. In Sect. 1.3 we will discuss generalizations of random graphs that
approximate the properties of real-world graphs better. Before, we will discuss
several general properties of random graphs.
Correlation Effects The degree distribution $p_k$ captures the statistical properties of nodes as if they were all independent of each other. In general, the properties of a given node will however depend on the properties of other nodes, e.g. of its neighbors. When this happens one speaks of “correlation effects”, with the clustering coefficient C being an example.
Another example for a correlation effect is what one calls “assortative mixing”.
A network is assortatively correlated whenever large-degree nodes, the hubs, tend
to be mutually interconnected and assortatively anti-correlated when hubs are pre-
dominantly linked to low-degree vertices. Social networks tend to be assortatively
correlated, in agreement with the everyday experience that the friends of influential
persons, the hubs of social networks, tend to be VIPs themselves.
Tree Graphs Most real-world networks show strong local clustering and loops
abound. An exception are metabolic networks which contain typically only very
few loops since energy is consumed and dissipated when biochemical reactions go
around in circles.
For many types of graphs commonly considered in graph theory, like Erdös–
Rényi graphs, the clustering coefficient vanishes in the thermodynamic limit, and
loops become irrelevant. One denotes a loopless graph a “tree graph”.
BIPARTITE GRAPH A bipartite graph has two kinds of vertices with links only between
vertices of unlike kinds.
DEGREE DISTRIBUTION If .Xk is the number of vertices having the degree k, then .pk =
Xk /N is the degree distribution, where N is the total number of nodes.
The degree distribution is a probability distribution function and hence normalized, $\sum_k p_k = 1$.
For Erdös–Rényi random graphs, the degree distribution approaches, in the limit of large $N$,
$$p_k \simeq e^{-pN}\,\frac{(pN)^{k}}{k!} = e^{-z}\,\frac{z^{k}}{k!}\,, \qquad (1.5)$$
where $z$ is the average coordination number. We have used
$$\lim_{N\to\infty}\left(1 - \frac{x}{N}\right)^{N} = e^{-x}, \qquad \binom{N-1}{k} = \frac{(N-1)!}{k!\,(N-1-k)!} \simeq \frac{(N-1)^{k}}{k!}\,,$$
and $(N-1)^{k} p^{k} = z^{k}$, see (1.1). Equation (1.5) is a Poisson distribution with mean
$$\langle k\rangle = \sum_{k=0}^{\infty} k\, e^{-z}\,\frac{z^{k}}{k!} = z\, e^{-z}\sum_{k=1}^{\infty} \frac{z^{k-1}}{(k-1)!} = z\,.$$
The variance of the Poisson distribution is $z$, which implies that the relative width $\sigma/\langle k\rangle = 1/\sqrt{z}$ becomes progressively smaller when the coordination number $z$ increases.
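The Poisson form (1.5) is easily verified numerically: sample an Erdös–Rényi graph with $p = z/(N-1)$ and compare the empirical degree distribution with $e^{-z} z^{k}/k!$. The sketch below reuses the sampling idea from above; all parameters are again illustrative.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(1)
N, z = 2000, 4.0
p = z / (N - 1)                                  # connection probability, Eq. (1.1)

upper = np.triu(rng.random((N, N)) < p, k=1).astype(int)
degrees = (upper + upper.T).sum(axis=1)

for k in range(9):
    empirical = np.mean(degrees == k)            # fraction of vertices with degree k
    poisson = exp(-z) * z**k / factorial(k)      # prediction of Eq. (1.5)
    print(f"k={k}: empirical {empirical:.4f}   Poisson {poisson:.4f}")
```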
Fig. 1.5 Four networks with three nodes exist, with 0/1/2/3 edges respectively (from left to right). Given a link probability $p$, the probability to generate networks with 0/1/2/3 edges is $(1-p)^{3}$, $3p(1-p)^{2}$, $3p^{2}(1-p)$ and $p^{3}$. One speaks of ensemble fluctuations, compare (1.7)
In an ensemble of random graphs, the average number of vertices with degree $k$ will be given by
$$\frac{1}{N}\,\langle X_k\rangle = p_k\,. \qquad (1.6)$$
Here .〈. . .〉 denotes the ensemble average. One can go one step further and calculate
the probability .P (Xk = R) that in a realization of a random graph the number of
vertices with degree k equals R. It is given in the large-N limit by
$$P(X_k = R) = e^{-\lambda_k}\,\frac{(\lambda_k)^{R}}{R!}\,, \qquad \lambda_k = \langle X_k\rangle\,. \qquad (1.7)$$
Note the similarity to (1.5) and that the mean .λk = 〈Xk 〉 is in general extensive
while the mean z of the degree distribution (1.5) is intensive.
Power Law Distribution A special case is given by degree distributions decaying for large degrees as a power law,
$$p_k \sim \frac{1}{k^{\alpha}}\,, \qquad k \to \infty\,. \qquad (1.8)$$
Typically, for real-world graphs, the scaling $\sim k^{-\alpha}$ holds only for large degrees $k$. The divergence at $k \to 0$ is avoided by the “Tsallis–Pareto distribution” $p_k = (\alpha-1)(1+k)^{-\alpha}$. For the normalization on $\mathbb{R}^{+}$ we consider $k \in [0, K]$, with $K \to \infty$,
$$\sum_{k=0}^{K} p_k \approx (\alpha-1)\int_0^{K} \frac{dk}{(1+k)^{\alpha}} = \left.\frac{-1}{(1+k)^{\alpha-1}}\right|_{k=0}^{K} = 1 - \frac{1}{(1+K)^{\alpha-1}}\,.$$
Fig. 1.6 Distribution of the in-degrees of the hyperlink network between 6,242,194 domains, on a log–log scale (log10 of the in-degree vs. log10 of the degree distribution). To a fair degree, the degree distribution is scale invariant, $p_k \sim 1/k^{\alpha}$. The value of the exponent, $\alpha \approx 2.2$ (fitted slope $-2.2$), implies that the mean degree is finite, with a diverging variance. At the time of the crawl, $\langle k\rangle = 35.8$ was found. Data from Ludueña et al. (2013)
Power-law distributions are hence normalizable only if $\alpha > 1$. The average degree of the Tsallis–Pareto distribution is
$$\langle k\rangle = (\alpha-1)\int_0^{K} \frac{k+1-1}{(1+k)^{\alpha}}\,dk = \frac{\alpha-1}{2-\alpha}\Big[(1+k)^{2-\alpha}\Big]_0^{K} - 1\,,$$
which diverges when $\alpha \le 2$. Otherwise, when $\alpha > 2$, one has $\langle k\rangle = 1/(\alpha-2)$. The variance is finite in analogy when $\alpha > 3$.
Scale-Free Graphs Power-law functional relations are “scale free”, since any
rescaling .k → a k can be reabsorbed into the normalization constant. A famous
example of a scale-invariant degree distribution is that of the in-degree of the
hyperlink network between domains, as shown in Fig. 1.6.
Scale-free functional dependencies are considered to be ‘critical’, since they
occur generally at the critical point of a phase transition. We will come back to
this issue recurrently in the following chapters.
ADJACENCY MATRIX The $N\times N$ adjacency matrix $\hat A$ has elements $A_{ij} = 1$ if nodes $i$ and $j$ are connected, with $A_{ij} = 0$ if they are not connected.
Fig. 1.7 Spectral density $\rho(\lambda)$ of an Erdös–Rényi graph; for comparison, the expected result for the thermodynamic limit $N \to \infty$, the semi-circle law (1.15), is given. A broadening of $\epsilon = 0.1$ has been used, as defined by (1.11)
SPECTRUM OF A GRAPH The spectrum of a graph G is given by the set of eigenvalues .λi
of the adjacency matrix .Â.
A graph with N nodes has N eigenvalues .λi and it is useful to define the
corresponding “spectral density”
$$\rho(\lambda) = \frac{1}{N}\sum_{j} \delta(\lambda - \lambda_j)\,, \qquad \int d\lambda\,\rho(\lambda) = 1\,, \qquad (1.9)$$
where .δ(λ) is the Dirac delta function. In Fig. 1.7 the spectral density of an Erdös–
Rényi graph is given.
Green’s Function The spectral density .ρ(λ) can be evaluated once the Green’s
function1 .G(λ),
$$G(\lambda) = \frac{1}{N}\,\mathrm{Tr}\!\left[\frac{1}{\lambda - \hat A}\right] = \frac{1}{N}\sum_j \frac{1}{\lambda - \lambda_j}\,, \qquad (1.10)$$
is known. Here $\mathrm{Tr}[\ldots]$ denotes the trace over the matrix $(\lambda - \hat A)^{-1} = (\lambda\mathbb{1} - \hat A)^{-1}$, where $\mathbb{1}$ is the identity matrix. In the complex plane, we can decompose the inverse,
$$\lim_{\varepsilon\to 0} \frac{1}{\lambda - \lambda_j + i\varepsilon} = P\,\frac{1}{\lambda - \lambda_j} - i\pi\,\delta(\lambda - \lambda_j)\,, \qquad (1.11)$$
1 The reader without prior experience with Green’s functions may skip the following derivation
and pass directly to the result, namely to (1.15).
where P denotes the principal part, which takes into account that the positive and
the negative contributions of the .1/λ divergence cancel. We find the relation
$$\rho(\lambda) = -\frac{1}{\pi}\,\lim_{\varepsilon\to 0}\,\mathrm{Im}\, G(\lambda + i\varepsilon)\,. \qquad (1.12)$$
Semi-Circle Law The graph spectra can be evaluated for random matrices2 for the
case of small link densities .p = z/N, where z is the average connectivity. We note
that the Green’s function (1.10) can be expanded into powers of .Â/λ,
$$G(\lambda) = \frac{1}{N\lambda}\,\mathrm{Tr}\!\left[1 + \frac{\hat A}{\lambda} + \left(\frac{\hat A}{\lambda}\right)^{2} + \ldots\right]\,. \qquad (1.13)$$
Traces over powers of the adjacency matrix . can be interpreted, for random graphs,
as random walks which need to come back to the starting vertex.
Starting from a random site we can connect on the average to z neighboring
sites and from there on to .z − 1 next-nearest neighboring sites, and so on. This
consideration allows us to recast the Taylor expansion (1.13) for the Green’s function
into a recursive continued fraction,
$$G(\lambda) = \cfrac{1}{\lambda - \cfrac{z}{\lambda - \cfrac{z-1}{\lambda - \cfrac{z-1}{\lambda - \ldots}}}} \;\approx\; \frac{1}{\lambda - z\,G(\lambda)}\,, \qquad (1.14)$$
since .limλ→∞ G(λ) = 0. The spectral density (1.12) then takes the form
$$\rho(\lambda) = \begin{cases} \sqrt{4z - \lambda^{2}}\,/\,(2\pi z) & \text{if}\ \lambda^{2} < 4z \\ 0 & \text{if}\ \lambda^{2} > 4z \end{cases} \qquad (1.15)$$
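The semi-circle law (1.15) can be illustrated by diagonalizing the adjacency matrix of a sampled Erdös–Rényi graph and comparing a histogram of the bulk eigenvalues with $\sqrt{4z-\lambda^{2}}/(2\pi z)$. A minimal sketch (the single Perron eigenvalue near $\lambda \approx z$ lies outside the plotted bulk and is ignored here; parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, z = 2000, 10.0
p = z / N                                        # small link density

upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
A = upper + upper.T
eigvals = np.linalg.eigvalsh(A)                  # graph spectrum

bins = np.linspace(-2 * np.sqrt(z), 2 * np.sqrt(z), 9)
hist, edges = np.histogram(eigvals, bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(4 * z - centers**2) / (2 * np.pi * z)   # Eq. (1.15)
for c, h, s in zip(centers, hist, semicircle):
    print(f"lambda = {c:6.2f}: empirical {h:.4f}   semi-circle {s:.4f}")
```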
Loops and the Clustering Coefficient The total number of triangles, viz
the overall number of loops of length three in a network is, on the average,
.C(N/3)(z − 1)z/2, where C is the clustering coefficient. This relation holds for
2 Random matrix theory will be further discussed in Sect. 10.2.1 of Chap. 10.
large networks. One can relate the clustering coefficient C to the adjacency matrix
A via
$$C\,\frac{z(z-1)}{2}\,\frac{N}{3} \;=\; \text{number of triangles} \;=\; \frac{1}{6}\sum_{i_1,i_2,i_3} A_{i_1 i_2} A_{i_2 i_3} A_{i_3 i_1} \;=\; \frac{1}{6}\,\mathrm{Tr}\,A^{3}\,,$$
since three sites $i_1$, $i_2$ and $i_3$ are interconnected only when the respective entries of the adjacency matrix are unity. The sum on the right-hand side of the above relation is a moment of the graph spectrum. The factors $1/3$ and $1/6$ account for overcountings. The above formula holds only when all nodes have identical degrees, $k_i \equiv z$. Otherwise one needs to take an average, $z(z-1) \to \sum_i k_i(k_i-1)/N$.
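The trace relation above suggests a direct numerical route to the clustering coefficient: count triangles via $\mathrm{Tr}\,A^{3}/6$ and connected triples via $\sum_i k_i(k_i-1)/2$. A minimal sketch for an Erdös–Rényi graph (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 400, 0.05
upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
A = upper + upper.T

triangles = np.trace(A @ A @ A) / 6.0       # every triangle contributes six closed paths
k = A.sum(axis=1)
triples = np.sum(k * (k - 1)) / 2.0         # connected triples of vertices

C = 3.0 * triangles / triples               # clustering coefficient
print("clustering coefficient C :", C)
print("Erdos-Renyi value z/(N-1):", k.mean() / (N - 1))
```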
Moments of the Spectral Density The graph spectrum is directly related to certain
topological features of a graph via its moments. The lth moment of .ρ(λ) is given by
$$\int d\lambda\,\lambda^{l}\,\rho(\lambda) = \frac{1}{N}\sum_{j=1}^{N} \lambda_j^{l} = \frac{1}{N}\,\mathrm{Tr}\,A^{l} = \frac{1}{N}\sum_{i_1,i_2,\ldots,i_l} A_{i_1 i_2} A_{i_2 i_3}\cdots A_{i_l i_1}\,, \qquad (1.16)$$
as one can see from (1.9). The lth moment of .ρ(λ) is therefore equivalent to the
number of closed paths of length l, the number of all paths of length l returning to
the starting point.
$$\frac{d}{dx} f(x) = \lim_{\Delta x\to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}\,,$$
$$\Lambda_{ij} = \sum_l A_{il}\,\delta_{ij} - A_{ij} = \begin{cases} k_i & i = j \\ -1 & i\ \text{and}\ j\ \text{connected} \\ 0 & \text{otherwise} \end{cases} \qquad (1.17)$$
where the .Λij = (Λ̂)ij are the elements of the Laplacian matrix, .Aij the adjacency
matrix, and where .ki is the degree of vertex i.
Apart from a sign convention, .Λ̂ corresponds to a straightforward generalization
of the usual Laplace operator .Δ = ∂ 2 /∂x 2 +∂ 2 /∂y 2 +∂ 2 /∂z2 . To see this, just apply
the Laplacian matrix .Λij to a graph-function .f = (f1 , . . . , fN ). The graph Laplacian
is hence intrinsically related to diffusion processes on networks.3 Alternatively one
defines by
$$L_{ij} = \begin{cases} 1 & i = j \\ -1/\sqrt{k_i k_j} & i\ \text{and}\ j\ \text{connected} \\ 0 & \text{otherwise} \end{cases} \qquad (1.18)$$
the “normalized graph Laplacian”, where $k_i = \sum_j A_{ij}$ is the degree of vertex $i$. The
eigenvalues of the normalized graph Laplacian have a straightforward interpretation
in terms of the underlying graph topology. One needs however to remove first all
isolated sites from the graph, which do not generate entries to the adjacency matrix.
The respective .Lij would be ill defined.
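Both Laplacians (1.17) and (1.18) are easily constructed from the adjacency matrix. The following minimal sketch does this for a small example graph (a triangle plus one pendant vertex, chosen here purely for illustration) and prints the two spectra:

```python
import numpy as np

# adjacency matrix of a small example graph: a triangle (0,1,2) plus a pendant vertex 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

k = A.sum(axis=1)                                       # vertex degrees k_i
Lam = np.diag(k) - A                                    # graph Laplacian, Eq. (1.17)
L = np.eye(len(k)) - A / np.sqrt(np.outer(k, k))        # normalized Laplacian, Eq. (1.18)

print("Laplacian spectrum :", np.round(np.linalg.eigvalsh(Lam), 3))
print("normalized spectrum:", np.round(np.linalg.eigvalsh(L), 3))
# for both matrices the lowest eigenvalue is zero; for L the corresponding
# eigenvector is proportional to (sqrt(k_1), ..., sqrt(k_N)), compare (1.19)
```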
The eigenvalues are bounded,
$$0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1} \le 2\,.$$
The lowest eigenvalue $\lambda_0 = 0$ corresponds to the eigenvector
$$e(\lambda_0) = \frac{1}{\sqrt{C}}\Big(\sqrt{k_1},\, \sqrt{k_2},\, \ldots,\, \sqrt{k_N}\Big)\,, \qquad (1.19)$$
where $C$ is a normalization constant and where the $k_i$ are the respective vertex degrees.
– The sum over all eigenvalues, $\sum_l \lambda_l \le N$, holds generally. The equality holds for connected graphs, viz when $\lambda_0$ has degeneracy one.
– For a complete N-site graph, viz when all sites are mutually interconnected, the eigenvalues are
$$\lambda_0 = 0, \qquad \lambda_l = N/(N-1), \qquad l = 1,\ldots,N-1\,.$$
– For a complete bipartite graph (all sites of one subgraph are connected to all sites of the other subgraph) the eigenvalues are
$$\lambda_0 = 0, \qquad \lambda_{N-1} = 2, \qquad \lambda_l = 1, \qquad l = 1,\ldots,N-2\,.$$
The highest eigenvalue, $\lambda_{N-1} = 2$, corresponds to the eigenvector
$$e(\lambda_{N-1}) = \frac{1}{\sqrt{C}}\Big(\underbrace{\sqrt{k_A},\ldots,\sqrt{k_A}}_{A\ \text{sublattice}},\ \underbrace{-\sqrt{k_B},\ldots,-\sqrt{k_B}}_{B\ \text{sublattice}}\Big)\,. \qquad (1.20)$$
Denoting with $N_A$ and $N_B$ the number of sites in the two sublattices A and B, with $N_A + N_B = N$, the degrees $k_A$ and $k_B$ of vertices belonging to sublattice A and B are $k_A = N_B$ and $k_B = N_A$, respectively.
The most random of all graphs are Erdös–Rényi graphs. One can relax the extent
of randomness somewhat and construct random networks with an arbitrarily given
degree distribution. This procedure is also denoted “configurational model”. We will
use the configurational model to examine the “percolation transition”, which occurs as a function of the link probability p. Graphs decompose into a set of disconnected
subgraphs when containing only a few links, but not for high link densities. The
respective transition is the percolation transition.
DEGREE SEQUENCE A degree sequence is a specified set .{ki } of the degrees for the
vertices .i = 1 . . . N .
The degree sequence is also the first information one obtains when examining
real-world networks and hence the foundation for all further analysis.
The construction of a random graph with a given degree distribution $p_k$ proceeds, in the limit $N \to \infty$, as follows:
– Assign .ki “stubs” (ends of edges emerging from a vertex) to all vertices, .i =
1, . . . , N .
– Iteratively choose pairs of stubs at random and join them together to make
complete edges.
When all stubs have been used up, the resulting graph is a random member of
the ensemble of graphs with the desired degree sequence. Figure 1.8 illustrates the
construction procedure.
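The stub-matching construction of Fig. 1.8 translates almost literally into code. The minimal sketch below pairs up randomly shuffled stubs; as in the bare configurational model, occasional self-loops and multi-edges may occur, but they become irrelevant for large $N$. The degree sequence used is an arbitrary example.

```python
import numpy as np

def configuration_model(degrees, seed=0):
    """Stub matching: random graph with a prescribed degree sequence."""
    rng = np.random.default_rng(seed)
    stubs = np.repeat(np.arange(len(degrees)), degrees)   # vertex i appears k_i times
    rng.shuffle(stubs)
    return stubs.reshape(-1, 2)                            # pair up consecutive stubs

degree_sequence = [2, 3, 2, 2, 3, 2, 1, 2, 1]              # must sum to an even number
edges = configuration_model(degree_sequence)
print(edges)       # self-loops and multi-edges may occur, but are rare for large N
```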
Average Degree and Clustering The mean number of neighbors is the coordina-
tion number
$$z = \langle k\rangle = \sum_k k\,p_k\,.$$
Fig. 1.8 Construction procedure of a random network with nine vertices and degrees .X1 = 2,
.X2 = 3, .X3 = 2, .X4 = 2. In step A (left) the vertices with the desired number of stubs (degrees)
are constructed. In step B (right) the stubs are connected randomly
The probability that one of the second neighbors of a given vertex is also a first
neighbor scales as .N −1 for random graphs, regardless of the degree distribution.
Loops can be ignored in the limit .N → ∞.
Following an edge to a neighboring vertex, the probability that this neighbor has $k$ further outgoing edges is
$$q_{k-1} = \frac{k\,p_k}{\sum_j j\,p_j}\,, \qquad q_k = \frac{(k+1)\,p_{k+1}}{\sum_j j\,p_j}\,, \qquad (1.21)$$
since $k\,p_k$ is the degree distribution of a neighbor, compare Fig. 1.9. The distribution $q_k$ of the outgoing edges of a neighbor vertex is the “excess degree distribution”.
Fig. 1.9 The excess degree distribution .qk−1 is the probability of finding k outgoing links of a
neighboring site. Here the site B is a neighbor of site A and has a degree .k = 5 and .k − 1 = 4
outgoing edges, compare (1.21). The probability that B is a neighbor of site A is proportional to
the degree k of B
The mean number of outgoing edges of a neighboring vertex is then
$$\sum_k k\,q_k = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\,. \qquad (1.22)$$
We denote with $z_m$, $z_1 = \langle k\rangle \equiv z$, the average number of $m$-nearest neighbors. Equation (1.22) gives the average number of vertices two steps away from the starting vertex A via a particular neighbor vertex. Multiplying this by the mean degree of A, namely $z_1 \equiv z$, we find that the mean number of second neighbors $z_2$ of a vertex is
$$z_2 = \langle k^2\rangle - \langle k\rangle\,. \qquad (1.23)$$
Note that $z_2$ is not given by the variance of the degree distribution,4 which would be $\langle (k - \langle k\rangle)^2\rangle = \langle k^2\rangle - \langle k\rangle^2$.
The degree distribution of an Erdös–Rényi graph is the Poisson distribution, $p_k = e^{-z} z^k/k!$, see (1.5). For the average number of second neighbors, Eq. (1.23), we obtain
$$z_2 = \sum_{k=0}^{\infty} k^2\, e^{-z}\,\frac{z^k}{k!} - z = z\,e^{-z}\sum_{k=1}^{\infty} (k-1+1)\,\frac{z^{k-1}}{(k-1)!} - z = z^2 = \langle k\rangle^2\,.$$
Number of Far Away Neighbors The average number of edges emerging from a second neighbor, and not leading back to where it came from, is also given by (1.22), and indeed this is true at any distance m away from vertex A. The average number of neighbors at a distance m is then
$$z_m = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}\, z_{m-1} = \frac{z_2}{z_1}\, z_{m-1}\,, \qquad (1.24)$$
where $z_1 \equiv z = \langle k\rangle$ and $z_2$ are given by (1.23). Iterating this relation we find
$$z_m = z_1\left(\frac{z_2}{z_1}\right)^{m-1}\,. \qquad (1.25)$$
Giant Connected Cluster Depending on whether .z2 is greater than .z1 or not,
Eq. (1.25) will either diverge or converge exponentially as m becomes large.
$$\lim_{m\to\infty} z_m = \begin{cases} \infty & \text{if}\ z_2 > z_1 \\ 0 & \text{if}\ z_2 < z_1 \end{cases} \qquad (1.26)$$
The percolation point is therefore $z_1 = z_2$. Below the transition, the total number of neighbors is
$$\sum_{m=1}^{\infty} z_m = z_1 \sum_m \left(\frac{z_2}{z_1}\right)^{m-1} = \frac{z_1}{1 - z_2/z_1} = \frac{z_1^2}{z_1 - z_2}\,.$$
The total number of sites connected to the starting node is finite below the
percolation transition, which implies that the network decomposes into non-
connected components. Beyond the percolation transition one or more clusters with
macroscopic numbers of nodes form.
If the total number of neighbors is infinite, then there must be a giant connected
component. When the total number of neighbors is finite, there can be no giant
connected component.
This phase transition occurs when .z2 = z1 . Making use of (1.23), .z2 = 〈k 2 〉−〈k〉,
we find that this condition is equivalent to
$$\langle k^2\rangle - 2\langle k\rangle = 0, \qquad \sum_{k=0}^{\infty} k(k-2)\,p_k = 0\,. \qquad (1.27)$$
Because of the factor .k(k − 2), vertices of degree zero and degree two do not
contribute to the sum. The number of vertices with degree zero or two therefore
affects neither the phase transition nor the existence of a giant component.
– Vertices of degree zero are not connected to any other node, they do not
contribute to the network topology.
– Vertices of degree one constitute the sole negative contribution to the percolation
condition (1.27). No further path is available when arriving at a site with degree
one.
– Vertices of degree two act as intermediators between two other nodes. Removing
vertices of degree two does not change the topological structure of a graph.
One can therefore remove (or add) vertices of degree two or zero without affecting
the existence of the giant component.
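The percolation criterion (1.27) can be evaluated directly for any given degree distribution. As a minimal sketch, the following code checks $\sum_k k(k-2)p_k > 0$ for Poisson distributions of varying mean, recovering the Erdös–Rényi threshold at $z = 1$ (the truncation of the sum is an arbitrary numerical choice):

```python
import numpy as np
from math import exp, factorial

def giant_component_exists(pk):
    """Percolation criterion (1.27): true if sum_k k(k-2) p_k > 0."""
    k = np.arange(len(pk))
    return np.sum(k * (k - 2) * pk) > 0

def poisson_pk(z, kmax=80):
    pk = np.array([exp(-z) * z**j / factorial(j) for j in range(kmax)])
    return pk / pk.sum()

for z in (0.5, 0.9, 1.1, 2.0):
    print(f"z = {z}: giant component ->", giant_component_exists(poisson_pk(z)))
```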
Requiring that essentially all $N$ nodes are reached within $\ell$ steps, $z_\ell \approx N$, one finds
$$\log(N/z_1) = (\ell - 1)\,\log(z_2/z_1)\,, \qquad \ell = \frac{\log(N/z_1)}{\log(z_2/z_1)} + 1\,, \qquad (1.28)$$
when using $z_l = z_1 (z_2/z_1)^{l-1}$, as given by (1.25). For Erdös–Rényi random graphs one has $z_1 = z$ and $z_2 = z^2$, which leads to $\ell \approx \log N/\log z$.
where the notation .〈. . .〉q indicates that the average is to be taken with respect to the
excess degree distribution .q = qk , as given by (1.21).
As expected, the clustering coefficient vanishes in the thermodynamic limit
.N → ∞. However, it may have a very big leading coefficient, especially for degree
distributions with fat tails. The differences listed in Table 1.1, between the measured
clustering coefficient C and the value .Crand = z/N for Erdös–Rényi graphs,
are partly due to the fat tails in the degree distributions .pk of the corresponding
networks.
Network theory is about the statistical properties of graphs. A powerful method from
probability theory is the generating function formalism, which we will discuss now
and apply later on.
We define with
$$G_0(x) = \sum_{k=0}^{\infty} p_k\,x^k \qquad (1.30)$$
the “generating function” $G_0(x)$ for the probability distribution $p_k$. The generating function $G_0(x)$ contains all information present in $p_k$. We can recover $p_k$ from $G_0(x)$ by differentiation,
$$p_k = \frac{1}{k!}\,\frac{d^k G_0}{dx^k}\bigg|_{x=0}\,. \qquad (1.31)$$
Generating Function for the Excess Degree Distribution For later use we define
the generating function .G1 (x) for the distribution .qk = (k + 1)pk+1 /z measuring
the number of non-returning edges leaving a neighboring vertex,
$$G_1(x) = \sum_{k=0}^{\infty} q_k\,x^k = \frac{\sum_{k=0}^{\infty}(k+1)\,p_{k+1}\,x^k}{z} = \frac{\sum_{k=1}^{\infty} k\,p_k\,x^{k-1}}{z} = \frac{G_0'(x)}{z}\,, \qquad z = \langle k\rangle\,, \qquad (1.32)$$
where .G'0 (x) denotes the first derivative of .G0 (x) with respect to its argument.
– NORMALIZATION
The normalization of $p_k$ implies
$$G_0(1) = \sum_k p_k = 1\,. \qquad (1.33)$$
– MEAN
Straightforward differentiation,
$$G_0'(1) = \sum_k k\,p_k = \langle k\rangle\,, \qquad (1.34)$$
i.e. by the product of the individual generating functions. This is the reason
why generating functions are so useful in describing combinations of independent
random events.
As an application consider $n$ randomly chosen vertices. The sum $\sum_i k_i$ of the respective degrees has a cumulative degree distribution, which is generated by
$$\big[G_0(x)\big]^{n}\,.$$
Generating Function of the Poisson Distribution The generating function $G_0 = \sum_k p_k x^k$ of the Poisson distribution $p_k = e^{-z} z^k/k!$ is
$$G_0(x) = e^{-z}\sum_{k=0}^{\infty} \frac{z^k}{k!}\,x^k = e^{z(x-1)}\,,$$
with $z$ being the average degree. Compare (1.5). The generating function $G_1(x)$ for the excess degree distribution $q_k$ is
$$G_1(x) = \frac{G_0'(x)}{z} = e^{z(x-1)}\,, \qquad (1.36)$$
compare (1.32). As expected, we find .G1 (x) = G0 (x) for the Poisson distribution.
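The basic properties (1.33), (1.34) and the relation between $p_k$ and $q_k$ can be checked numerically by truncating the sums at a large maximal degree. A minimal sketch for the Poisson case (the cutoff and the value of $z$ are arbitrary choices):

```python
import numpy as np
from math import exp, factorial

z, kmax = 3.0, 60
k = np.arange(kmax)
pk = np.array([exp(-z) * z**j / factorial(j) for j in range(kmax)])   # Poisson, Eq. (1.5)

def G0(x):
    """Generating function G_0(x) = sum_k p_k x^k, Eq. (1.30)."""
    return np.sum(pk * x**k)

eps = 1e-6
print("G0(1)  =", G0(1.0))                                    # normalization, Eq. (1.33)
print("G0'(1) =", (G0(1 + eps) - G0(1 - eps)) / (2 * eps))    # mean degree, Eq. (1.34)

qk = (k[:-1] + 1) * pk[1:] / z                                # excess degrees, Eq. (1.21)
print("sum qk =", qk.sum())                                   # q_k is normalized as well
```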
Stochastic Sum of Independent Variables Let’s assume that the random variables $k_1, k_2, \ldots$ are identically distributed, which implies that they have the same generating function $G_0(x)$. Then
$$G_0^2(x), \qquad G_0^3(x), \qquad G_0^4(x), \qquad \ldots$$
generate the distributions of
$$k_1 + k_2, \qquad k_1 + k_2 + k_3, \qquad k_1 + k_2 + k_3 + k_4\,,$$
and so on. Next consider that the number of times n this stochastic process is
executed is distributed as .pn . As an example consider throwing a dice several times,
with a probability .pn to throw n times. The distribution of the results obtained is
then generated by
Fig. 1.10 Graphical representation of the self-consistency (1.39) for the generating function
H1 (x), represented by the open box. A single vertex is represented by a hashed circle. The
subcluster connected to an incoming vertex can be either a single vertex or an arbitrary number
of subclusters of the same type connected to the first vertex. In analogy to Newman et al. (2001)
We denote with
$$H_1(x) = \sum_m h_m^{(1)}\,x^m$$
the generating function for the distribution of cluster sizes containing a given vertex $j$, with the condition that $j$ is linked to a specific incoming edge, see Fig. 1.10. That is, $h_m^{(1)}$ is the probability that the such-defined cluster contains $m$ nodes.
– The first vertex j belongs to the subcluster with probability 1, its generating
function is x.
– The probability that the vertex j has k outgoing stubs is qk .
– At every stub outgoing from vertex j there is a subcluster.
– The total number of vertices consists of those generated by [H1 (x)]k plus the
starting vertex.
Compare Sect. 1.3.2. The self-consistency equation for the total number of vertices reachable is then
$$H_1(x) = x\sum_{k=0}^{\infty} q_k\,[H_1(x)]^k = x\,G_1(H_1(x))\,, \qquad (1.39)$$
where the prefactor x corresponds to the generating function of the starting vertex.
The complete distribution of component sizes is given by solving (1.39) self-
consistently for H1 (x) and then substituting the result into (1.40).
Mean Component Size The calculation of H1 (x) and H0 (x) in closed form is
not possible. We are, however, interested only in the first moment, viz the mean
component size, see (1.34).
The component size distribution is generated by $H_0(x)$, Eq. (1.40). The mean component size below the percolation transition is therefore
$$\langle s\rangle = H_0'(1) = \Big[G_0(H_1(x)) + x\,G_0'(H_1(x))\,H_1'(x)\Big]_{x=1}\,,$$
where we made use of the normalization $G_0(1) = H_1(1) = 1$ of generating functions, see (1.33). The value of $H_1'(1)$ can be calculated from (1.39) by differentiating, which leads to
$$\langle s\rangle = 1 + \frac{G_0'(1)}{1 - G_1'(1)}\,. \qquad (1.43)$$
Percolation Transition On a closer look, the above expression for the mean component size reproduces the percolation condition $z_2 = z_1$ derived in Sect. 1.3. We make use of (1.23) and
$$G_0'(1) = \sum_k k\,p_k = \langle k\rangle = z_1\,, \qquad (1.44)$$
$$G_1'(1) = \frac{\sum_k k(k-1)\,p_k}{\sum_k k\,p_k} = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle} = \frac{z_2}{z_1}\,,$$
where $z_1$ and $z_2$ are the mean numbers of nearest- and next-nearest neighbors. Substitution into (1.43) gives the average component size below the transition as
$$\langle s\rangle = 1 + \frac{z_1^2}{z_1 - z_2}\,. \qquad (1.45)$$
Size of the Giant Component Above the percolation transition the network
contains a giant connected component, which contains a finite fraction S of all
sites N . Formally we can decompose the generating functional for the component
sizes into a part generating the giant component, H0∗ (x), and a part generating the
remaining components,
Fig. 1.11 The size S of the giant component as a function of the coordination number z, compare (1.48)
$$1 - S = H_0(1) = G_0(u), \qquad u = H_1(1) = G_1(u)\,. \qquad (1.46)$$
For Erdös–Rényi graphs we have $G_1(x) = G_0(x) = \exp(z(x-1))$, see (1.36). This leads to $1 - S = u$ in (1.46), which takes with
$$1 - S = u = e^{z(u-1)} = e^{-zS}, \qquad S = 1 - e^{-zS} \qquad (1.47)$$
the form of a self-consistency condition for the mean size S of the giant component. Percolation takes place for $z \ge 1$.
An expansion for small S,
$$S = 1 - e^{-zS} \approx zS - \frac{(zS)^2}{2}, \qquad S(z) \approx \frac{2(z-1)}{z^2}\,, \qquad (1.48)$$
shows that $S(z)$ increases linearly in the percolating regime, as shown in Fig. 1.11. It follows from (1.47) that the giant component engulfs the entire net exponentially fast for large degrees z, namely that $\lim_{z\to\infty} S(z) \to 1$.
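The self-consistency condition (1.47) has no closed-form solution for $S$, but a simple fixed-point iteration converges quickly away from the critical point $z = 1$. A minimal sketch:

```python
import numpy as np

def giant_component_size(z, iterations=2000):
    """Fixed-point iteration of S = 1 - exp(-z S), Eq. (1.47)."""
    S = 1.0                              # start from the fully percolating guess
    for _ in range(iterations):
        S = 1.0 - np.exp(-z * S)
    return S

for z in (0.5, 1.5, 2.0, 3.0, 5.0):
    print(f"z = {z}: S = {giant_component_size(z):.4f}")
```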
Degree distributions of real-world networks often have “fat tails”, a vaguely defined
notion that a given distribution decays only slowly when the degree becomes large.
In general, fat tails increase the robustness of a network. That is, the network retains
functionality even when a certain number of vertices or edges is removed. The
Internet remains functional, to give an example, even when a substantial number
of Internet routers fail.
The generating function
$$F_0(x) = \sum_{k=0}^{\infty} p_k\, b_k\, x^k \qquad (1.49)$$
generates the probabilities that a vertex has degree $k$ and is present, where $b_k$ denotes the probability that a vertex with degree $k$ is active. Strictly speaking, $F_0(x)$ is not a proper generating function because the normalization $F_0(1)$ is not unity, but the fraction of all vertices that are active. This shortcoming could be rectified by adding a constant, via
$$1 - \sum_k p_k b_k + \sum_{k=0}^{\infty} p_k\, b_k\, x^k\,.$$
In analogy one defines with $F_1(x)$ the generating function for the degree distribution of neighbor sites. Also $F_1(x)$ is not properly normalized.
We first consider the case that the probability of a vertex being present is independent of the degree $k$ and just equal to a constant $b$, which means that
$$b_k \equiv b \le 1, \qquad F_0(x) = b\,G_0(x), \qquad F_1(x) = b\,G_1(x)\,,$$
where $G_0(x)$ and $G_1(x)$ are the standard generating functions for the degree of a vertex and of a neighboring vertex, see (1.30) and (1.32). This implies that the mean size of a cluster of connected and present vertices is
$$\langle s\rangle = b + \frac{b^2 z_1^2}{z_1 - b\,z_2}\,,$$
which diverges at the critical fraction
$$b_c = \frac{z_1}{z_2} = \frac{1}{G_1'(1)}\,. \qquad (1.53)$$
If the fraction b of the vertices present in the network is smaller than the critical
fraction .bc , then there will be no giant component. This is the point at which the
network ceases to be functional in terms of connectivity. When there is no giant
component, connecting paths exist only within small isolated groups of vertices,
but no long-range connectivity exists. For a communication network such as the
Internet, this would be fatal.
For networks with fat tails, however, we expect that the number of next-nearest neighbors $z_2$ is large compared to the number of nearest neighbors $z_1$, and that $b_c$ is consequently small. The network is robust, as one would need to take out a substantial fraction of all vertices before losing the giant component. Consider a power-law degree distribution,
$$p_k \sim \frac{1}{k^{\alpha}}\,, \qquad \int^{\infty}\frac{dk}{k^{\alpha}} < \infty, \qquad \alpha > 1\,,$$
see (1.8) and Sect. 1.6. The first two moments are $\langle k\rangle \sim \int dk\,(k/k^{\alpha})$ and $\langle k^2\rangle \sim \int dk\,(k^2/k^{\alpha})$. Noting that the number of next-nearest neighbors is $z_2 = \langle k^2\rangle - \langle k\rangle$, we can identify three regimes:
– $\alpha \le 2$, with both $z_1$ and $z_2$ diverging in the thermodynamic limit.
– $2 < \alpha \le 3$, with $z_1 < \infty$ but $z_2 \to \infty$. The critical fraction $b_c = z_1/z_2$ vanishes in the thermodynamic limit; almost all vertices can be randomly removed with the network remaining above the percolation limit. The network is extremely robust.
– $3 < \alpha$, with $z_1 < \infty$ and $z_2 < \infty$. $b_c = z_1/z_2$ can acquire any value and the network has normal robustness.
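The dependence of the critical fraction $b_c = z_1/z_2$ on the exponent $\alpha$ can be made quantitative by evaluating the moments of a power-law degree distribution with a large but finite cutoff, which here stands in for the system size. A minimal sketch (cutoff and exponents are illustrative choices):

```python
import numpy as np

def critical_fraction(alpha, kmax=10**6):
    """b_c = z_1/z_2, Eq. (1.53), for p_k ~ k^(-alpha) with k = 1..kmax."""
    k = np.arange(1, kmax + 1, dtype=float)
    pk = k**(-alpha)
    pk /= pk.sum()
    z1 = np.sum(k * pk)                   # mean degree <k>
    z2 = np.sum(k * k * pk) - z1          # next-nearest neighbors, Eq. (1.23)
    return z1 / z2

for alpha in (2.2, 2.8, 3.2):
    print(f"alpha = {alpha}: b_c ~ {critical_fraction(alpha):.5f}")
```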
Biased Failure of Vertices What happens when somebody sabotages the most
important sites of a network? This is equivalent to removing vertices in decreasing
order of their degrees, starting with the highest degree vertices. The probability that
a given node is active then takes the form
$$b_k = \theta(k_c - k)\,, \qquad (1.54)$$
where $\theta(\cdot)$ denotes the step function. This form of targeted attack corresponds to setting the upper limit of the sum in (1.49) to $k_c$. Differentiating (1.51) with respect to $x$ yields
$$H_1'(1) = F_1(H_1(1)) + F_1'(H_1(1))\,H_1'(1), \qquad H_1'(1) = \frac{F_1(1)}{1 - F_1'(1)}\,.$$
Fig. 1.12 For a scale-free degree distribution $p_k \sim 1/k^{\alpha}$ the critical fraction $f_c$ of vertices, compare (1.59)
$$H_{k_c}^{(\alpha-2)} - H_{k_c}^{(\alpha-1)} = H_{\infty}^{(\alpha-1)}\,, \qquad (1.57)$$
where $H_n^{(r)}$ is the $n$th harmonic number of order $r$,
$$H_n^{(r)} = \sum_{k=1}^{n} \frac{1}{k^{r}}\,. \qquad (1.58)$$
The number of vertices present is $F_0(1)$, see (1.49), or $F_0(1)/\sum_k p_k$, since the degree distribution $p_k$ is normalized. If we remove a certain fraction $f_c$ of the vertices we reach the transition determined by (1.57),
$$f_c = 1 - \frac{F_0(1)}{\sum_k p_k} = 1 - \frac{H_{k_c}^{(\alpha)}}{H_{\infty}^{(\alpha)}}\,. \qquad (1.59)$$
It is impossible to determine $k_c$ from (1.57) and (1.59) in order to obtain $f_c$ in closed form. However, one can solve (1.57) numerically for $k_c$ and substitute it into (1.59). The
results are shown in Fig. 1.12, as a function of the exponent .α. The network is very
susceptible with respect to a biased removal of highest-degree vertices.
– A removal of more than about 3% of the highest degree vertices always leads to
a destruction of the giant connected component. Maximal robustness is achieved
for .α ≈ 2.2, which is actually close to the exponents measured in some real-
world networks. Compare Fig. 1.6.
Fig. 1.13 Regular linear graphs with connectivities .z = 2 (top) and .z = 4 (bottom)
– Networks with $\alpha > \alpha_c = 3.4788$ never have a giant connected component. The critical exponent $\alpha_c$ is given by the percolation condition $H_{\infty}^{(\alpha-2)} = 2\,H_{\infty}^{(\alpha-1)}$, see (1.27).
Random graphs and random graphs with arbitrary degree distribution show no
clustering in the thermodynamic limit, in contrast to real-world networks. It is
therefore important to find methods to generate graphs that have a finite clustering
coefficient and, at the same time, the small-world property.
Clustering in Lattice Models Lattice models and random graphs are two extreme
cases of network models. In Fig. 1.13 we illustrate a simple one-dimensional lattice
with connectivity z. For periodic boundary conditions, viz when the chain wraps
around itself in a ring, one can evaluate the clustering coefficient C analytically.
– ONE DIMENSION
For the clustering coefficient one finds
$$C = \frac{3(z-2)}{4(z-1)}\,, \qquad (1.60)$$
– GENERAL DIMENSION
For a regular hypercubic lattice in $d$ dimensions one finds
$$C = \frac{3(z-2d)}{4(z-d)}\,, \qquad (1.61)$$
which generalizes (1.60). We note that the clustering coefficient tends to $3/4$ for $z \gg 2d$ for regular hypercubic lattices in all dimensions.
Distances in Lattice Models Regular lattices do not show the small-world effect.
A regular hypercubic lattice in d dimensions with linear size L has .N = Ld vertices.
The average vertex–vertex distance increases as $L$, or equivalently as $\ell \approx N^{1/d}$.
The Watts and Strogatz Model Watts and Strogatz proposed a small-world model
that interpolates smoothly between a regular lattice and an Erdös–Rényi random
graph. The construction starts with a one-dimensional periodic lattice, see Fig. 1.14.
There are several possibilities
– One goes through all the links of the lattice and rewires the link with a given
probability .pnew .
– A single edge is selected and rewired, with the procedure repeated .Nnew times.
– As a variation one may add links, instead of rewiring. The advantage is that
networks may not become disconnected.
For small .pnew and/or .Nnew , this process produces a graph that is still mostly regular,
possessing however a few connections stretching long distances across the lattice,
as illustrated in Fig. 1.14. The average coordination number z of the lattice remains
constant. The number of neighbors of any particular vertex can, however, be greater
or smaller than z.
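A minimal sketch of the rewiring construction described above, in plain Python: a one-dimensional ring with coordination number $z$ is built first, and each link is then redirected with probability $p_{\text{new}}$ to a randomly chosen vertex (the original link is kept if the redirection would create a self-loop or a duplicate edge). Parameter values are illustrative.

```python
import random

def small_world(N, z, p_new, seed=0):
    """Ring lattice with coordination number z; each link rewired with probability p_new."""
    rng = random.Random(seed)
    edges = []
    for i in range(N):                          # regular ring: z/2 neighbors on either side
        for d in range(1, z // 2 + 1):
            edges.append((i, (i + d) % N))

    existing = set(edges)
    rewired = []
    for (i, j) in edges:
        if rng.random() < p_new:                # redirect the far end to a random vertex
            k = rng.randrange(N)
            if k != i and (i, k) not in existing:
                rewired.append((i, k))
                continue
        rewired.append((i, j))                  # otherwise keep the original short-range link
    return rewired

print(len(small_world(N=500, z=4, p_new=0.05)), "edges")
```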
Social Network Interpretation The small-world models such as the ones illus-
trated in Fig. 1.14, have an intuitive justification for social networks. Most people
are friends with their immediate neighbors. Neighbors on the same street, people
that they work with, or their relatives. However, some people are also friends with
a few far-away persons. Far away in a social sense, like people in other countries,
people from other walks of life, acquaintances from previous eras of their lives, and
so forth. These long-distance acquaintances are represented by the long-range links
in small-world models.
In Fig. 1.15 the clustering coefficient is shown as a function of network diameter
when rewiring is performed edge by edge. The key result is that a few steps are
sufficient to achieve a drastic reduction of intra-network distances, as measured by
Fig. 1.15 The scaled clustering coefficient $C/C_0$ as a function of the scaled diameter $l/l_0$, where $C_0 = 3/4$ and $l_0 = N/4$ are the initial values, for a one-dimensional ring with $N = 500$ sites and $z = 4$, compare Fig. 1.14. Points correspond to the consecutive random rewiring of a single edge
the graph diameter. At the same time, the clustering coefficient remains high, as
observed in many real-world networks.
Evolving Networks Most real-world networks are open, i.e. they are formed by
the continuous addition of new vertices to the system. The number of vertices, N,
increases throughout the lifetime of the network, as is the case for the world wide web, which grows exponentially by the continuous addition of new web pages. The
small world networks discussed in Sect. 1.5 are, however, constructed for a fixed
number of nodes N, growth is not considered.
A newly created web page, to give an example, will include links to well-known
sites with a quite high probability. Popular web pages will therefore have both a high
number of incoming links and a high growth rate for incoming links. The growth of
vertices in terms of edges is therefore in general not uniform.
Fig. 1.16 Illustration of the preferential attachment model for an evolving network. At t = 0 the
system consists of N0 = 3 isolated vertices. At every time step a new vertex (shaded circle) is
added, which is connected to m = 2 vertices, preferentially to the vertices with high connectivity,
determined by the rule (1.62)
– GROWTH
At every time step a new vertex is added and connected with m ≤ N0 links to
existing nodes.
– PREFERENTIAL ATTACHMENT
The new links are determined by connecting the m stubs of the new vertex to
nodes with degrees ki with probability
$$\Pi(k_i) = \frac{k_i + C}{a(t)}\,, \qquad a(t) = 2m(t-1) + C(N_0 + t - 1)\,. \qquad (1.62)$$
The resulting degree distribution is scale free,
$$p_k \sim k^{-\gamma}\,, \qquad \gamma = 3 + \frac{C}{m}\,. \qquad (1.63)$$
This scaling relation is valid for large degrees ki in the limit t → ∞.
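A minimal simulation of the preferential attachment rule, for the case $C = 0$ and starting, as a simplification, from $m$ initial vertices: keeping a list in which every vertex appears once per link end makes degree-proportional sampling trivial. The network size and $m$ are arbitrary illustrative choices.

```python
import random
from collections import Counter

def preferential_attachment(N, m, seed=0):
    """Growing network; every new vertex attaches m links preferentially (C = 0)."""
    rng = random.Random(seed)
    targets = list(range(m))        # the first added vertex links to the m initial vertices
    pool = []                       # every vertex appears here once per link end
    degrees = Counter()
    for t in range(m, N):
        for v in targets:
            degrees[t] += 1
            degrees[v] += 1
            pool += [t, v]
        targets = set()             # next targets, drawn with probability ~ degree
        while len(targets) < m:
            targets.add(rng.choice(pool))
        targets = list(targets)
    return degrees

degrees = preferential_attachment(N=20000, m=2)
print("largest degrees:", sorted(degrees.values(), reverse=True)[:5])   # a few hubs emerge
print("average degree :", sum(degrees.values()) / len(degrees))         # approaches 2m
```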
Per time step, the connectivity of node $i$ changes on the average by
$$\Delta k_i(t) \equiv k_i(t+1) - k_i(t) \approx \frac{\partial k_i}{\partial t} = A\,\Pi(k_i) = A\,\frac{k_i + C}{a(t)}\,, \qquad (1.64)$$
where $\Pi(k_i)$ is the attachment probability. The overall number of new links is proportional to a normalization constant $A$, which is hence determined by the sum rule
$$\frac{\partial k_i}{\partial t} = m\,\frac{k_i + C}{(2m+C)t + a_0} \approx \frac{m}{2m+C}\,\frac{k_i + C}{t}\,, \qquad (1.65)$$
where we neglected $a_0 = C(N_0 - 1) - 2m$ for large times $t$. The solution of (1.65) is given by
$$\frac{\dot k_i}{k_i + C} = \frac{m}{2m+C}\,\frac{1}{t}\,, \qquad k_i(t) = (C+m)\left(\frac{t}{t_i}\right)^{m/(2m+C)} - C\,, \qquad (1.66)$$
where ti is the time at which the vertex i had been added, ki (t) = 0 for t < ti .
Adding Times We specialize to the case $C = 0$; the general result for $C > 0$ can be obtained subsequently from scaling considerations. A vertex added at time $t_i = N_i - N_0$ has initially $m = k_i(t_i)$ links, where we define the initial $N_0$ nodes to have adding times zero. This is in agreement with (1.66), which reads
$$k_i(t) = m\left(\frac{t}{t_i}\right)^{0.5}, \qquad t_i = t\,m^2/k_i^2 \qquad (1.67)$$
for $C = 0$. Older nodes, i.e. those with smaller $t_i$, increase their connectivity faster than the younger vertices, viz those with larger $t_i$, see Fig. 1.17. For social networks this mechanism is dubbed the rich-get-richer phenomenon.
Fig. 1.17 Left: Time evolution of the connectivities for vertices with adding times t = 1, 2, 3, . . .
and m = 2, following (1.67). Right: The integrated probability, P (ki (t) < k) = P (ti > tm2 /k 2 ),
see (1.68)
Integrated Probabilities Using (1.67), the probability that a vertex has a connec-
tivity ki (t) smaller than a certain k, P (ki (t) < k) can be written as
$$P(k_i(t) < k) = P\!\left(t_i > \frac{m^2 t}{k^2}\right)\,. \qquad (1.68)$$
Adding times are uniformly distributed, compare Fig. 1.17, which implies that the
probability P (ti ) to find an adding time ti is
$$P(t_i) = \frac{1}{N_0 + t}\,, \qquad (1.69)$$
viz just the inverse of the total number of adding times, which coincides in turn with
the total number of nodes. P (ti > m2 t/k 2 ) is therefore the cumulative number of
adding times ti larger than m2 t/k 2 , multiplied with the probability P (ti ) to add a
new node,
$$P\!\left(t_i > \frac{m^2 t}{k^2}\right) = \left(t - \frac{m^2 t}{k^2}\right)\frac{1}{N_0 + t}\,. \qquad (1.70)$$
Growth and preferential attachment are thus jointly responsible for the power-law scaling in the degree distribution. To verify that both ingredients are really necessary, we investigate variants of the above model.
One can repeat the above calculation for a finite offset $C > 0$. The exponent $\gamma$ is identical to the inverse of the scaling power of $k_i$ with respect to time $t$ in (1.66), plus one, which leads to $\gamma = (2m+C)/m + 1 = 3 + C/m$.
Growth with Random Attachment We examine whether growth alone can result
in a scale-free degree distribution, assuming random instead of preferential attach-
ment. The growth equation for the connectivity ki of a given node i, compare (1.65),
then takes the form
$$\frac{\partial k_i}{\partial t} = \frac{m}{N_0 + t}\,. \qquad (1.72)$$
The m new edges are linked randomly at time t to the (N0 + t − 1) nodes present at
the previous time step. Solving (1.72) for ki , with the initial condition ki (ti ) = m,
we obtain
$$k_i = m\left[\ln\!\left(\frac{N_0 + t}{N_0 + t_i}\right) + 1\right]\,, \qquad (1.73)$$
which is a logarithmic increase with time. The probability that vertex i has
connectivity ki (t) smaller than k is
$$P(k_i(t) < k) = P\!\left(t_i > (N_0+t)\,e^{1-k/m} - N_0\right) = \Big[t - (N_0+t)\,e^{1-k/m} + N_0\Big]\,\frac{1}{N_0+t}\,, \qquad (1.74)$$
where we assumed that we add the vertices uniformly in time to the system. Using (1.74), one obtains an exponentially decaying degree distribution, with the characteristic degree scale
$$k^{*} = m\,. \qquad (1.76)$$
Note that $p_k$ in (1.75) is not properly normalized, nor in (1.71), since we used a large-$k$ approximation during the respective derivations.
The model reduces to the original preferential attachment model in the limit r → 1.
The scaling exponent γ can be evaluated along the lines used above for the case
r = 1. One finds
$$p_k \sim \frac{1}{k^{\gamma}}\,, \qquad \gamma = 1 + \frac{1}{1 - r/2}\,. \qquad (1.77)$$
The exponent $\gamma = \gamma(r)$ interpolates smoothly between two and three, with $\gamma(1) = 3$ and $\gamma(0) = 2$. For most real-world graphs $r$ is quite small, viz most links are added internally. Note that the average connectivity $\langle k\rangle = 2m$ remains constant, since one new vertex is added for $2m$ new stubs.
Exercises
$$p(x) = \frac{\alpha - 1}{(1+x)^{\alpha}}\,, \qquad (1.78)$$
Further Reading
For further studies several books and review articles on general network theory are
recommended, Estrada (2012), Kadushin (2012), and Albert and Barabási (2002).
The interested reader might delve into some of the original literature, including
the original Watts and Strogatz (1998) small-world model, the mean-field solution
of the preferential attachment model, Barabasi et al. (1999), a study regarding the
community structure of real-world networks, Palla et al. (2005), or the mathematical
basis of graph theory, Erdös and Rényi (1959). A good starting point is the account
by Milgram (1967) of his by now famous experiment, which led to the law of “six
degrees of separation”, Guare (1990).
References
Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern
Physics, 74, 47–97.
Barabasi, A. L., Albert, R., & Jeong, H. (1999). Mean-field theory for scale-free random networks.
Physica A, 272, 173–187.
Brinkman, W. F., & Rice, T. M. (1970). Single-particle excitations in magnetic insulators. Physical
Review B, 2, 1324–1338.
Erdös, P., & Rényi, A. (1959). On random graphs. Publications Mathematicae, 6, 290–297.
Estrada, E. (2012). The structure of complex networks: Theory and applications. Oxford: Oxford
University Press.
Guare, J. (1990). Six degrees of separation: A play. New York: Vintage.
Kadushin, C. (2012). Understanding social networks: Theories, concepts, and findings. Oxford:
Oxford University Press.
Ludueña, G. A., Meixner, H., Kaczor, G., & Gros, C. (2013). A large-scale study of the world wide
web: Network correlation functions with scale-invariant boundaries. The European Physical
Journal B, 86, 1–7.
Milgram, S. (1967). The small world problem. Psychology Today, 2, 60–67.
Newman, M. E. J., Strogatz, S. H., & Watts, D. J. (2001). Random graphs with arbitrary degree
distributions and their applications. Physical Review E, 64, 026118.
Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community
structure of complex networks in nature and society. Nature, 435, 814–818.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of small world networks. Nature, 393,
440–442.
Bifurcations and Chaos in Dynamical Systems
2
Complex systems theory deals with dynamical systems that often contain large
numbers of variables. It extends dynamical systems theory, which treats dynamical
systems containing a few variables. A good understanding of dynamical systems
theory is therefore a prerequisite when studying complex systems.
In this chapter we introduce core concepts, like attractors and Lyapunov expo-
nents, bifurcations, and deterministic chaos from the realm of dynamical systems
theory. An introduction to catastrophe theory will be provided together with the
notion of rate-induced tipping and colliding attractors.
Most of the chapter will be devoted to ordinary differential equations and maps,
the traditional focus of dynamical systems theory, venturing however towards the
end into the intricacies of time delay dynamical systems.
Dynamical systems theory deals with the properties of coupled differential equa-
tions, determining the time evolution of a few, typically a handful of variables. We
present a concise overview covering the most important concepts and phenomena.
ṙ = (Γ − r²) r ,     ϕ̇ = ω     (2.2)
Fig. 2.1 The solution of the non-linear rotator, compare (2.1) and (2.2). For Γ < 0 the orbit is
attracted by a stable fixpoint (left), for Γ > 0 by a stable limit cycle (right)
govern the dynamical behavior. The typical orbits (x(t), y(t)) are illustrated in
Fig. 2.1. The limiting behavior of (2.2) is

lim_{t→∞} (x(t), y(t)) = { (0, 0)                        Γ < 0
                           (rc cos(ωt), rc sin(ωt)),     rc² = Γ > 0     (2.3)
In the first case, Γ < 0, trajectories are attracted by the stable fixpoint x∗0 = (0, 0).
In the second case, Γ > 0, the dynamics approaches a stable periodic orbit.
LIMIT CYCLE Trajectories retrace themselves with a period T for limit cycles, viz one
has .x(t + T ) = x(t).
Limit cycles can be attracting or repelling. In (2.3) one has an attracting limit
cycle for .Γ > 0.
and r(t) decreases (increases) for Γ < 0 (Γ > 0). For a d-dimensional system
x = (x1, . . . , xd) the stability of a fixpoint x∗ is determined by calculating the
eigenvalues of the linearized problem, as discussed in Sect. 2.2.
PHASE SPACE The phase space is the space spanned by all allowed values of the variables
entering a set of first-order differential equations defining the dynamical system.
For a two-dimensional system (x, y) the phase space is R², but in polar
coordinates one has
Attracting States and Manifolds Attracting states play a central role in dynamical
systems theory.
ATTRACTOR A bounded region in phase space to which orbits with certain initial
conditions come arbitrarily close is called an attractor.
Attractors can be isolated points, fixpoints, limit cycles or more complex objects
like attracting manifolds, viz subsets of phase space, or chaotic attractors.
BASIN OF ATTRACTION The set of initial conditions in phase space leading to orbits
approaching a certain attractor arbitrarily closely is the basin of attraction.
Most of the time the extent of a basin of attraction can be evaluated only
numerically.
d³x(t)/dt³ = f(x, ẋ, ẍ) .     (2.4)

Using

x1(t) = x(t),     x2(t) = ẋ(t),     x3(t) = ẍ(t) ,     (2.5)
One can reduce any set of coupled differential equations to a set of first-order differ-
ential equations by introducing an appropriate number of additional variables. We
therefore consider in the following only first-order, ordinary differential equations.
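As an illustration of this reduction, the short sketch below rewrites a third-order equation of the
type (2.4) as the three first-order equations (2.5) and integrates them numerically. The concrete
choice of the right-hand side f and the integrator settings are assumptions made only for this
example; they are not taken from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(x, xdot, xddot):
    """Illustrative third-order flow d^3x/dt^3 = f(x, xdot, xddot)."""
    return -x - xdot - xddot          # arbitrary example, not from the text

def rhs(t, u):
    # first-order form, compare (2.5): u = (x1, x2, x3) = (x, xdot, xddot)
    x1, x2, x3 = u
    return [x2, x3, f(x1, x2, x3)]

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0, 0.0], max_step=0.05)
print("x(20) ~", sol.y[0, -1])
```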
dx(t)/dt = f(x(t)),     x, f ∈ R^d,     t ∈ [−∞, +∞] ,     (2.6)

when time is continuous, or, equivalently, maps such as

x(t + 1) = g(x(t)),     x, g ∈ R^d,     t = 0, 1, 2, . . . ,     (2.7)
when time is discrete. Together with the time evolution equation one has to set
the initial condition .x0 = x(t0 ). An evolution equation of type (2.15) is denoted
“autonomous”, since it does not contain an explicit time dependence. A system of
type .ẋ = f(t, x) is dubbed “non-autonomous”.
A particular solution .x(t) of a dynamical system in phase space is denoted “tra-
jectory”, or “orbit”. Orbits are uniquely determined by the set of initial conditions,
.x(0) ≡ x0 , a consequence of dealing with first-order differential equations.
Σ = { (x1, x2, 0, . . . , 0) | x1, x2 ∈ R }
The Poincaré map is a discrete map, compare (2.7), which can be constructed for
continuous-time dynamical systems like (2.15). The Poincaré map is very useful,
since we can print and analyze it directly. In the simplest case, a periodic orbit
would show up in the Poincaré map as the identity mapping.
– CONSTANT OF MOTION
A function .F (x) on phase space .x = (x1 , . . . , xd ) is a “constant of motion” if it
is conserved under the time evolution of the dynamical system, i.e. when
d/dt F(x(t)) = Σ_{i=1}^{d} ∂F(x)/∂xi  ẋi(t) ≡ 0
holds for all times t. In many mechanical systems the energy is a conserved
quantity.
– ERGODICITY
A dynamical system in which orbits come arbitrarily close to any allowed point
in the phase space, irrespective of the initial condition, is called ergodic.
(x1, . . . , xf , v1, . . . , vf ),     vi = ẋi ,     i = 1, . . . , f ,
Fig. 2.4 Left: A KAM-torus can be cut along two lines (vertical/horizontal) and unfolded. Right:
A closed orbit on the unfolded torus with .ω1 /ω2 = 3/1. The numbers indicate points that coincide
after refolding (periodic boundary conditions)
KAM Theorem Kolmogorov, Arnold and Moser (KAM) examined the question
of what happens to an integrable system when it is perturbed. Let us consider
a two-dimensional torus, as illustrated in Fig. 2.4. The orbit wraps around the torus
with frequencies ω1 and ω2, respectively. A key quantity is the ratio of revolution
frequencies ω1/ω2, which can be rational or irrational.
We recall that irrational numbers r may be approximated with arbitrary accuracy
by a sequence of quotients

m1/s1,  m2/s2,  m3/s3,  . . .     s1 < s2 < s3 < . . .
Gaps in the Saturn Rings A spectacular example for the instability of rational
KAM-tori are the gaps in the rings of the planet Saturn.
The time a hypothetical particle in Cassini’s gap (between the A- and the B-
ring, .r = 118,000 km) would need to orbit Saturn is half the orbiting period of the
‘shepherd-moon’ Mimas. The quotient of these two revolving frequencies is 2:1.
Particles orbiting in Cassini's gap are therefore unstable against the perturbations caused
by Mimas and are thrown out of their orbits.
ẋ = f(x),     f(x∗) = 0 .     (2.9)

d/dt (x − x∗) = ẋ ≈ f(x∗) + f'(x∗)(x − x∗) + . . . ,     (2.10)

d/dt Δx = f'(x∗) Δx,     Δx = x − x∗ ,
where we neglected terms of order .(x − x ∗ )2 and higher and where we made use of
the fixpoint condition .f (x ∗ ) = 0. This equation has the solution
Δx(t) = Δx(0) e^{t f'(x∗)}  →  { ∞   f'(x∗) > 0
                                 0   f'(x∗) < 0 .     (2.11)
The perturbation .Δx decreases/increases with time and the fixpoint .x ∗ is hence
stable/unstable for .f ' (x ∗ ) < 0 and .f ' (x ∗ ) > 0 respectively.
The Lyapunov exponent controls the direction of the flow close to a fixpoint.
Orbits are exponentially repelled/attracted for .λ > 0 and for .λ < 0 respectively. For
more than a single variable one has to find all eigenvalues of the linearized problem,
as discussed further below, with the fixpoint being stable only when all eigenvalues
are negative.
x(t + 1) = g(x(t)),     x∗ = g(x∗)     (2.12)
controls the stability of the fixpoint. Note the differences in the relation of the
Lyapunov exponent .λ to the derivatives .f ' (x ∗ ) and .g ' (x ∗ ) for differential equations
and maps respectively, as given by (2.11) and (2.14).
dx(t)/dt = f(x(t)),     x, f ∈ R^d,     f(x∗) = 0     (2.15)
Jacobian and Lyapunov Exponents For a stability analysis of the fixpoint .x∗ one
linearizes (2.15) around the fixpoint, using .xi (t) ≈ xi∗ + δxi (t), with small .δxi (t).
One obtains
The matrix Jij of all possible partial derivatives is the “Jacobian” of the dynamical
system (2.15). The Jacobian allows one to generalize the definition of the Lyapunov
exponent for one-dimensional systems, as given previously.
LYAPUNOV SPECTRUM The set of eigenvalues {λi}, i = 1, .., d, of the Jacobian is the
spectrum characterizing the fixpoint x∗.
Lyapunov exponents λn = λ'n + iλ''n may have real λ'n and imaginary λ''n
components, which leads to the time evolution

e^{λn t} = e^{λ'n t} e^{iλ''n t} .
HYPERBOLIC FIXPOINT The flow is well defined in linear order when all .λ'i /= 0. In this
case the fixpoint is hyperbolic.
Pairwise Conjugate Exponents With λ = λ' + iλ'' also its conjugate λ∗ = λ' − iλ''
is an eigenvalue of the Jacobian, which is a real matrix. It follows that λ and λ∗ differ
when λ'' ≠ 0. In this case there are (at least) two eigenvalues having the same real
part λ'.
Fixpoint Classification for .d = 2 In Fig. 2.5 some example trajectories are shown
for several fixpoints in .d = 2 dimensions.
– NODE
Both eigenvalues are real and have the same sign, which is negative/positive for
a stable/unstable node.
– SADDLE
Both eigenvalues are real and have opposite signs.
– FOCUS
The eigenvalues are complex conjugate to each other. The trajectories spiral
in/out for negative/positive real parts.
Fig. 2.5 Example trajectories for a stable node (left), with a ratio λ2/λ1 = 2, for a saddle (middle)
with λ2/λ1 = −3, and for an unstable focus (right)
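The classification above (node, saddle, focus) can be read off directly from the eigenvalues of the
Jacobian. The sketch below is a minimal illustration; the three example matrices are arbitrary
choices mirroring the eigenvalue ratios quoted in Fig. 2.5, not systems discussed in the text.

```python
import numpy as np

def classify_fixpoint(J):
    """Classify a d = 2 fixpoint from the eigenvalues of its Jacobian."""
    lam = np.linalg.eigvals(np.asarray(J, dtype=float))
    if np.any(np.abs(lam.imag) > 1e-12):              # complex conjugate pair
        return "stable focus" if np.all(lam.real < 0) else "unstable focus"
    re = lam.real
    if re[0] * re[1] < 0:                              # real, opposite signs
        return "saddle"
    return "stable node" if np.all(re < 0) else "unstable node"

examples = [np.diag([-1.0, -2.0]),                     # both eigenvalues negative
            np.diag([1.0, -3.0]),                      # opposite signs
            np.array([[0.2, -1.0], [1.0, 0.2]])]       # complex pair, Re > 0
for J in examples:
    print(np.round(np.linalg.eigvals(J), 2), "->", classify_fixpoint(J))
```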
Stable and Unstable Manifolds For real eigenvalues .λn /= 0 of the Jacobian J ,
with eigenvectors .en and a sign .sn = λn /|λn |, we define via
For a neutral Lyapunov exponent with λn = 0 one defines a “center manifold”,
which we will discuss in the context of catastrophe theory in Sect. 2.3.2. The term
manifold denotes in mathematics, loosely speaking, a smooth topological object.
Stable and unstable manifolds control the flow infinitesimally close to the fixpoint
along the eigendirections of the Jacobian and may be continued to all positive and
negative times t. Typical examples are illustrated in Figs. 2.5 and 2.6.
which has two saddles .x∗± = (x ∗ , 0), where .x ∗ = ±1. The eigenvectors of the
Jacobian .J (x ∗ ) are aligned with the x and the y axis respectively, for all values of
the control parameter .ϵ.
The flow diagram, as illustrated in Fig. 2.6, is invariant when inverting both .x ↔
(−x) and .y ↔ (−y). The system contains additionally the .y = 0 axis as a mirror
line for a vanishing .ϵ = 0 and there is a heteroclinic orbit connecting an unstable
manifold of .x∗− to one of the stable manifolds of .x∗+ . A finite .ϵ removes the mirror
Fig. 2.6 Sample trajectories of the system (2.18), for .ϵ = 0 (left) and .ϵ = 0.2 (right). Shown
are the stable manifolds (thick green lines), the unstable manifolds (thick blue lines) and the
heteroclinic orbit (thick red line)
The nature of the solution to a dynamical system, as defined by a suitable first order
differential equation (2.15), may change qualitatively as a function of a given control
parameter. When this happens, a bifurcation occurs. Here we list the most important
classes of what is called a “local bifurcation”. The distinction to their counterpart,
“global bifurcations”, will be elucidated in Sect. 2.3.
A local bifurcation can be characterized by a simple archetypical equation, the
“normal form”, to which a more complex dynamical system will generically reduce
close to the transition point.
dx/dt = a − x² ,     (2.19)
for a real variable x and a real control parameter a. The fixpoints .ẋ = 0
x∗+ = +√a ,     x∗− = −√a ,     a > 0     (2.20)
exist only for positive control parameters, .a > 0; there are no fixpoints for negative
a < 0. For the flow we find
dx/dt = { < 0   for x > √a
          > 0   for x ∈ [−√a, √a]     (2.21)
          < 0   for x < −√a
Fig. 2.7 The saddle-node bifurcation, as described by (2.19). There are two fixpoints for a > 0,
an unstable branch x∗− = −√a and a stable branch x∗+ = +√a. Left: The phase diagram together
with the flow (arrows). Right: The bifurcation potential U(x) = −ax + x³/3, compare (2.26)

when a > 0. The upper branch x∗+ is hence stable and the lower branch x∗− unstable,
as illustrated in Fig. 2.7.
For a saddle-node bifurcation a stable and an unstable fixpoint collide and
annihilate each other, one speaks also of a fold, tangential or blue sky bifurcation.
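The collision of the two branches can be made explicit by evaluating the fixpoints (2.20) and the
slope of the flow (2.19) for a few control parameters. The sketch below is only an illustration; the
chosen values of a are arbitrary.

```python
import numpy as np

def flow(x, a):
    """Saddle-node normal form (2.19), dx/dt = a - x^2."""
    return a - x * x

for a in (1.0, 0.25, -0.25):
    if a > 0:
        xs = np.sqrt(a)
        # stability follows from the slope of the flow, d(a - x^2)/dx = -2x
        print(f"a={a:+.2f}: fixpoints at ±{xs:.2f}; "
              f"slope at +sqrt(a) is {-2*xs:+.2f} (stable), "
              f"at -sqrt(a) is {+2*xs:+.2f} (unstable)")
    else:
        print(f"a={a:+.2f}: no fixpoints, the flow is negative everywhere, "
              f"e.g. f(0) = {flow(0.0, a):+.2f}")
```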
dx/dt = x (a − x) ,     (2.22)
again for a real variable x and a real control parameter a. The two fixpoint solutions
ẋ = 0,

x∗0 = 0,     x∗a = a,     ∀a     (2.23)
exist for all values of the control parameter. The direction of the flow .ẋ is positive for
x in between the two solutions and negative otherwise, see Fig. 2.8. The respective
stabilities of the two fixpoint solutions exchange consequently at .a = 0.
dx/dt = ax − x³ ,     x∗0 = 0,     x∗+ = +√a,     x∗− = −√a .     (2.24)
Bifurcation Potentials In many cases one can write the dynamical system under
consideration as
dx/dt = −dU(x)/dx ,     (2.25)
Fig. 2.8 The transcritical bifurcation, see (2.22). The two fixpoints .x0∗ = 0 and .xa∗ = a exchange
stability at .a = 0. Left: Phase diagram and flow (arrows). Right: The bifurcation potential .U (x) =
−ax 2 /2 + x 3 /3, compare (2.26)
Usaddle(x) = −ax + x³/3 ,     Utrans(x) = −(a/2) x² + x³/3 ,     (2.26)
respectively, see the definitions (2.19) and (2.22). The bifurcation potentials, as
shown in Figs. 2.7 and 2.8, bring immediately to evidence the stability of the
respective fixpoints. The bifurcation potential of the pitchfork bifurcation,
Upitch(x) = −(a/2) x² + (1/4) x⁴ ,     (2.27)
is identical to the Landau-Ginzburg potential describing second-order phase transi-
tions in statistical physics.3 Of relevance is also the subcritical pitchfork transition,
defined by .ẋ = ax + x 3 .
3 The Landau-Ginzburg theory of phase transitions will be treated at length in the context of self-
organized criticality, in Sect. 6.1. of Chap. 6.
Fig. 2.9 The supercritical pitchfork bifurcation, as defined by (2.24). The fixpoint x∗0 = 0 becomes
unstable for a > 0 and two new stable fixpoints, x∗+ = +√a and x∗− = −√a, appear. Left: Phase
diagram and flow (arrows). Right: The bifurcation potential U(x) = −ax²/2 + x⁴/4, compare
(2.27)
The bifurcation potentials of the saddle-node and the pitchfork transitions are
respectively antisymmetric and symmetric under a sign change .x ↔ −x of the
dynamical variable, compare (2.26) and (2.27).
The bifurcation potential of the transcritical bifurcation is, on the other hand,
antisymmetric under the combined symmetry operation .x ↔ −x and .a ↔ −a,
compare (2.26).
Hopf Bifurcation A Hopf bifurcation occurs when a fixpoint changes its stability
together with the appearance of an either stable or unstable limit cycle, e.g. as
for the non-linear rotator illustrated in Fig. 2.1. The canonical equations of motion are⁴

ẋ = −y + d(Γ − x² − y²) x
ẏ = x + d(Γ − x² − y²) y     (2.28)
4 In the complex plane, with z = x + iy, Eq. (2.28) takes the form of a Stuart–Landau oscillator,
ż = iωz + d(Γ − |z|²)z, with ω = 1.
in Euclidean phase space .(x, y) = (r cos ϕ, r sin ϕ). The standard non-linear rotator
(2.2) is recovered when setting .d = 1. There are two steady-state solutions for
.Γ > 0,
(x∗0, y∗0) = (0, 0),     (x∗Γ , y∗Γ) = √Γ (cos(t), sin(t)) ,     (2.29)
a fixpoint and a limit cycle. The limit cycle disappears for .Γ < 0.
Super- vs. Sub-critical Hopf Bifurcation For d > 0 the bifurcation is “supercrit-
ical”. The fixpoint x∗0 = (x0∗ , y0∗ ) is stable/unstable for Γ < 0 and Γ > 0 and the
limit cycle x∗Γ = (xΓ∗ , yΓ∗ ) is stable, as illustrated in Fig. 2.1.
The direction of flow is reversed for d < 0, with the limit cycle x∗Γ becoming
repelling. The fixpoint x∗0 is then unstable/stable for Γ < 0 and Γ > 0 and one
speaks of a “subcritical” Hopf bifurcation.
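The supercritical case can be checked by integrating the normal form (2.28) directly: for d > 0
and Γ > 0 the orbit should settle onto the limit cycle of radius √Γ, while for Γ < 0 it spirals
into the origin. The values of Γ, d and the initial condition below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def hopf_normal_form(t, u, Gamma, d):
    """Canonical Hopf equations (2.28)."""
    x, y = u
    s = d * (Gamma - x * x - y * y)
    return [-y + s * x, x + s * y]

for Gamma in (-0.5, 0.5):
    sol = solve_ivp(hopf_normal_form, (0.0, 200.0), [0.1, 0.0],
                    args=(Gamma, 1.0), rtol=1e-8, max_step=0.1)
    r_final = np.hypot(sol.y[0, -1], sol.y[1, -1])
    print(f"Gamma={Gamma:+.1f}: final radius {r_final:.3f}"
          f"  (limit cycle at {np.sqrt(max(Gamma, 0.0)):.3f})")
```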
Hopf Bifurcation Theorem One may be interested to find out whether a generic
two dimensional system,
ẋ = fμ (x, y),
. ẏ = gμ (x, y) , (2.30)
can be reduced to the normal form (2.28) for a Hopf bifurcation, where μ is the
bifurcation parameter. Without loss of generality one can assume that the fixpoint
x∗0 stays at the origin for all values of μ and that the transition takes place for μ = 0.
To linear order the normal form (2.28) and (2.30) are equivalent if the Jacobian
of (2.30) has a pair of complex conjugate eigenvalues, with the real part crossing
zero at μ = 0 with a finite slope, which corresponds to a transition from a stable to an
unstable focus.
Comparing (2.28) and (2.30) to quadratic order one notices that quadratic terms
are absent in the normal form (2.28) but not in (2.30). One can however show,
with the help of a suitable non-linear transformation, that it is possible to eliminate
all quadratic terms from (2.30). One finds that the nature of the bifurcation is
determined by a combination of partial derivatives up to cubic order,
a = [ fxxx + fxyy + gxxy + gyyy ] / 16
  + [ fxy (fxx + fyy) − gxy (gxx + gyy) − fxx gxx − fyy gyy ] / (16ω)
where ω > 0 is the imaginary part of the Lyapunov exponent at the critical point
μ = 0 and where the partial derivatives such as fxy are to be evaluated at μ = 0 and
x → x∗0. The Hopf bifurcation is supercritical and subcritical respectively for a < 0
and a > 0.
Fig. 2.10 Left: For a fold bifurcation of limit cycles, the locations R± = √(1 ± μ) of the stable
and unstable limit cycles, R− and R+, as defined by (2.31) and (2.32). At μ = 0 a fold bifurcation
of limit cycles occurs and a subcritical Hopf bifurcation at μ = 1. Right: The respective flow
diagram. The filled/open circles denote stable/unstable limit cycles. For μ → 1 the unstable limit
cycle vanishes, a subcritical Hopf bifurcation. The stable and the unstable limit cycle collided for
positive μ → 0 and annihilate each other, a fold bifurcation of limit cycles
ṙ = −(r² − γ−)(r² − γ+) r ,     ϕ̇ = ω ,     γ− ≤ γ+ .     (2.31)
For γ− < 0 the first factor (r 2 − γ− ) is smooth for r 2 ≥ 0 and does not influence
the dynamics qualitatively. In this case (2.31) reduces to a standard supercritical
Hopf bifurcation, as a function of the bifurcation parameter γ+ .
where μ will be our bifurcation parameter. For μ ∈ [0, 1] we have two positive roots
and consequently also two limit cycles, a stable and an unstable one. For μ → 1 the
unstable limit cycle vanishes in a subcritical Hopf bifurcation, compare Fig. 2.10.
The bifurcations discussed in Sect. 2.2 are termed “local” as they are based on
Taylor expansions around a local fixpoint, which implies that the dynamical state
changes smoothly at the bifurcation point. There are, on the other hand, bifurcations
characterized by the properties of extended orbits. These kinds of bifurcations are
hence of “global” character.
The conservative contribution to the force is .−V ' (x) = x(1 − x), as illustrated in
Fig. 2.11. The energy, defined by
E = ẋ²/2 + V(x) ,     dE/dt = (ẍ + V'(x)) ẋ = (x − μ) ẋ² ,     (2.34)
is dissipated when x < μ, viz when the term (x − μ)ẋ in (2.33) reduces the velocity.
The energy increases however for x > μ, which means that the term (x − μ)ẋ
induces an accelerated movement. This interplay between energy dissipation and
uptake is typical for adaptive systems.5
Fixpoints and Jacobian The Takens–Bogdanov system (2.33) has two fixpoints
(x∗, 0), with x∗ = 0, 1. The eigenvalues of the Jacobian,

J = ( 0           1      ) ,     λ±(0, 0) = −μ/2 ± √(μ²/4 + 1)
    ( 1 − 2x∗     x∗ − μ )       λ±(1, 0) = (1 − μ)/2 ± √((1 − μ)²/4 − 1)
5 Per definition, adaption implies that a system may both dissipate energy and increase its own
reservoir, as discussed further in Sect. 3.2 of Chap. 3.
Fig. 2.11 Left: The potential V(x) of the Takens–Bogdanov system (2.33). Energy is dissipated
to the environment for x < μ and taken up for x > μ. For μ < 1 the local minimum x = 1 of
the potential becomes an unstable focus. Right: The flow for the critical μ = μc ≈ 0.8645. The
stable and unstable manifolds form a homoclinic loop (red line)
The local minimum (1, 0) of the potential is a stable/unstable focus for μ > 1 and
μ < 1 respectively, with μ = 1 being the locus of a supercritical Hopf bifurcation.
We consider now .μ ∈ [0, 1] and examine the further evolution of the resulting limit
cycle.
An unstable and a stable manifold cross at .μ = μc , compare Figs. 2.11 and 2.12,
forming a homocline. The homocline is also the endpoint of the limit cycle present
for .μ > μc , which disappears for .μ < μc . The limit cycle is hence destroyed at the
Fig. 2.12 The flow for the Takens–Bogdanov system (2.33). Left: The flow in the subcritical
region, for μ = 0.9 > μc ≈ 0.8645, together with the resulting limit cycle (thick black line).
For μ → μc the unstable (red) and stable (orange) manifold join to form a homoclinic loop, which
is identical to the locus of the limit cycle for μ → μc. Right: The flow in the supercritical region,
for μ = 0.8 < μc. The limit cycle has broken up after touching the saddle at (0, 0)
point of its maximal size for a homoclinic bifurcation, and not when vanishing, as
for a supercritical Hopf bifurcation.
For a further example of how a limit cycle may disappear discontinuously we con-
sider two coupled oscillators characterized by their individual phases, respectively
θ1 and θ2. A typical evolution equation for the phase difference ϕ = θ1 − θ2 is⁶

ϕ̇ = 1 − K cos(ϕ) .     (2.35)
6 This formulation parallels the Kuramoto model, which is treated in detail in Sect. 9.1 of Chap. 9.
Infinite Period Bifurcation The limit cycle for .|K| < 1 has a revolution period T
of
T = ∫₀ᵀ dt = ∫₀^{2π} (dt/dϕ) dϕ = ∫₀^{2π} dϕ/ϕ̇ = ∫₀^{2π} dϕ/(1 − K cos(ϕ)) = 2π/√(1 − K²) ,
which diverges in the limit |K| → 1. The transition occurring at |K| = 1 is hence an
“infinite period bifurcation”, being characterized by a diverging time scale.
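The divergence of the period can be checked numerically by evaluating the integral above for
several values of K and comparing with 2π/√(1−K²). The sketch below uses a generic quadrature
routine; the chosen values of K are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

# revolution period of the limit cycle of (2.35): T = int_0^{2 pi} dphi / (1 - K cos(phi))
for K in (0.0, 0.5, 0.9, 0.99):
    T_num, _ = quad(lambda phi: 1.0 / (1.0 - K * np.cos(phi)), 0.0, 2.0 * np.pi)
    T_exact = 2.0 * np.pi / np.sqrt(1.0 - K * K)
    print(f"K = {K:4.2f}:  numerical period {T_num:8.3f},   2*pi/sqrt(1-K^2) = {T_exact:8.3f}")
```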
ẋ = h + ax − x³ .     (2.37)
For .h = 0 the system reduces to the pitchfork normal form of (2.24), becoming
invariant under the parity transformation .x ↔ −x.
Fig. 2.14 Left: The self-consistency condition x³ = h + ax for the fixpoints x∗ of the symmetry
broken pitchfork system (2.37), for various fields h and a positive a > 0. The unstable fixpoint
at x = 0 would become stable for a < 0, compare Fig. 2.9. Right: The hysteresis loop (1) →
(2) → (3) → (4) → (1) occurring for a > 0 as function of the field h
a = a(T) = a0 (Tc − T) ,     a0 > 0 ,
where T is the temperature. In the absence of an external field h, only the trivial
fixpoint .x ∗ = 0 exists for .T > Tc , viz for temperatures above the critical
temperature .Tc . In the ordered state, for .T < Tc , there are two possible phases,
characterized by the positive and negative stable fixpoints .x ∗ .
Hysteresis and Memory The behavior of the phase transition changes when an
external field h is present. Switching the sign of the field is accompanied, in the
ordered state for .T < Tc , by a “hysteresis-loop”
– The field h changes from negative to positive values along .(1) → (2), with the
fixpoint .x ∗ remaining negative.
– At (2) the negative stable fixpoint disappears and the system makes a rapid
transition to (3), the catastrophe.
– Lowering eventually again the field h, the system moves to (4), jumping in the
end back to (1).
The system retains its state, x∗ being positive or negative, to a certain extent and
one speaks of a memory in the context of catastrophe theory.
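The hysteresis loop can be reproduced by letting the state of (2.37) relax adiabatically while the
field h is ramped slowly up and down; the jumps should occur near the fold fields h = ±2(a/3)^{3/2}
derived below in (2.41). The relaxation scheme, the ramp, and all parameter values in this sketch
are illustrative assumptions.

```python
import numpy as np

def relax(x, h, a, dt=0.01, steps=2000):
    """Let x relax towards the nearby stable fixpoint of (2.37)."""
    for _ in range(steps):
        x += dt * (h + a * x - x**3)
    return x

a = 1.0
h_up = np.linspace(-1.0, 1.0, 81)
h_cycle = np.concatenate([h_up, h_up[::-1]])   # ramp the field up, then back down

x, branch = -1.0, []                           # start on the negative branch
for h in h_cycle:
    x = relax(x, h, a)
    branch.append(x)

h_c = 2.0 * (a / 3.0) ** 1.5                   # fold fields, compare (2.41)
print(f"predicted jump fields: ±{h_c:.3f}")
print(f"x at h=+1 (after ramping up): {branch[len(h_up)-1]:+.3f},"
      f"   x back at h=-1: {branch[-1]:+.3f}")
```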
ẋ = f(x),     Jij = ∂fi/∂xj |_{x=x∗} ,     J en = λn en ,
may have a number of neutral Lyapunov exponents with vanishing eigenvalues .λi =
0 of the Jacobian.
CENTER MANIFOLD The space spanned by the neutral eigenvectors .ei is the center
manifold.
of the pitchfork system (2.37) vanishes at the jump-points .(2) and .(4) at which the
catastrophe occurs, compare Fig. 2.14. At the jump-points .h + ax is per definition
tangent to .x 3 , having identical slopes,
d/dx x³ = d/dx (h + ax) ,     3x² = a .     (2.38)
At these transition points the autonomous flow becomes hence infinitesimally slow,
since .λ → 0, a phenomenon called “critical slowing down” in the context of the
theory of thermodynamic phase transitions.
Center Manifold Normal Forms The program of catastrophe theory consists
in finding and classifying the normal forms for the center manifolds of stationary
points x∗, by expanding to the next, non-vanishing order in δx = x − x∗. The
aim is hence to classify the types of dynamical behavior potentially leading to
discontinuous transitions.
Changing the Controlling Parameters How does the location .x∗ of a fixpoint
change upon variations .δc around the set of parameters .cc determining the catas-
trophic fixpoint .x∗c ? With
x∗c = x∗(cc),     x∗ = x∗(c),     δc = c − cc

J δx∗ + P δc = 0,     Jij = ∂fi/∂xj ,     Pij = ∂fi/∂cj ,     (2.39)

δx∗ = −J⁻¹ P δc,     if |J| ≠ 0 .     (2.40)
The fixpoint may change however in a discontinuous fashion whenever the determi-
nant .|J | of the Jacobian vanishes, viz in the presence of a center manifold. This is
precisely what happens at a catastrophic fixpoint .x∗c (cc ).
Catastrophic Manifold The set .x∗c = x∗c (cc ) of catastrophic fixpoints is deter-
mined by two conditions, by the fixpoint condition .f = 0 and by .|J | = 0. For the
pitchfork system (2.37) we find,
ac = 3 (x∗c)² ,     hc = (x∗c)³ − ac x∗c = −2 (x∗c)³ ,
when using (2.38). Solving for x∗c = √(ac/3) we can eliminate x∗c and obtain

hc = −2 (ac/3)^{3/2} ,     (hc/2)² = (ac/3)³ ,     (2.41)
which determines the “catastrophic manifold” .(ac , hc ) for the pitchfork transition,
as illustrated in Fig. 2.15.
Gradient Dynamical Systems At times the flow .f(x) of a dynamical system may
be represented as a gradient of a bifurcation potential .U (x),
This is generically the case for one-dimensional systems, as discussed in Sect. 2.2.2,
but otherwise not. For the gradient representation
the case.
For gradient dynamical systems one needs to discuss only the properties of the
bifurcation potential U(x), as a scalar quantity, and they are hence somewhat easier
to investigate than a generic dynamical system of the form ẋ = f(x). Catastrophe
theory is mostly limited to gradient systems.
Bifurcation vs. Tipping A rapid transfer between distinct stable manifolds may
occur when systems reach a bifurcation point upon adiabatic parameter changes, as
illustrated in Fig. 2.14. Systems undergoing self-accelerating instabilities are said
“to tip”, which is a somewhat broader terminology. Instead of a description in terms
Fig. 2.16 Rate-induced tipping. Left: The fixpoints x∗ = a/b are stable for the individual flows
(2.43), here for a = 5 and b = −5. The stationary points shift to ±√(5² − 1) ≈ ±4.9 when the
two flows are superimposed, as given by (2.45). Right: The orbits x(t) under the influence of a
forcing ȧ = r. For small rates the system manages to adapt, as for r = 0.4, for larger rates not. For
r = 0.5/0.6/1.2 the system tips towards the alternative steady state
ẋ = f(x − a) ,     f(z) = −z/(1 + z²) .     (2.43)
In this example the flow decays inversely with the distance to the fixpoint .x ∗ = a,
as illustrated in Fig. 2.16, after being maximal in magnitude at .x = a ± 1. For two
attracting states the dynamics takes the form
ẋ = f(x − a) + f(x − b) ,     (2.44)
7 See Chap. 8.
which makes the system bistable. The two flows compete with each other, which
shifts the two stable fixpoints,
(x − a) [1 + (x − b)²] = −(x − b) [1 + (x − a)²] ,

as determined by ẋ = 0, or

(x − a)(x − b) = −1 ,     x = ±√(a² − 1) ,     (2.45)
where we specialized in the last step to b = −a. The two stable fixpoints are
separated by an unstable fixpoint at x = 0. Equation (2.44) describes a situation
in which the influence of attracting states decays in phase space, which is typical
for dynamical systems composed of interacting real-world entities. An example
would be physical objects, whose sphere of influence is often limited to the
immediate surroundings.
Adaption Catastrophe In (2.44) the dynamical variable x = x(t) reflects the state
of the system. In the following we take x ≈ a to correspond to a healthy system,
say a pristine ecosystem, and states x ≈ b to a disrupted system. External impacts
may change the control parameter a = a(t), e.g. with a constant rate r,

ȧ = r .

The orbit cannot follow when a moves to the right too fast, which happens when
ẋ < ȧ = r. Given that f(x − a) is bounded, this necessarily happens with increasing
r. At this point the system fails to adapt to the changing environment, with the orbit
reverting to a disrupted state close to b. Rate-induced tipping occurs in the form of
an adaption catastrophe. Extended transients are observed close to the critical rate
r, as shown in Fig. 2.16.
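The adaption failure can be reproduced by integrating (2.44) while a is ramped with rate r, using
the values a = 5, b = −5 and the rates quoted in Fig. 2.16. The integration time and the criterion
used below to decide whether the orbit still follows the moving fixpoint are assumptions of this
sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(z):
    """Attracting flow (2.43); maximal in magnitude at z = +-1."""
    return -z / (1.0 + z * z)

def rhs(t, u, r, b):
    x, a = u                              # x: system state, a: drifting parameter
    return [f(x - a) + f(x - b), r]       # bistable flow (2.44), with da/dt = r

a0, b = 5.0, -5.0                         # values of Fig. 2.16
x0 = np.sqrt(a0 ** 2 - 1.0)               # healthy fixpoint, compare (2.45)
for r in (0.4, 0.5, 0.6, 1.2):
    sol = solve_ivp(rhs, (0.0, 60.0), [x0, a0], args=(r, b),
                    rtol=1e-8, max_step=0.1)
    x_end, a_end = sol.y[0, -1], sol.y[1, -1]
    lag = a_end - x_end
    status = "fails to follow (tips)" if lag > 5.0 else "adapts to the moving fixpoint"
    print(f"r={r:.1f}:  x(60)={x_end:6.2f},  a(60)={a_end:5.1f},  lag={lag:6.2f}  ->  {status}")
```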
Fig. 2.17 The average accuracy of weather forecasting, normalized to [0, 1], decreases rapidly
with increasing prediction timespan, due to the chaotic nature of weather dynamics. Increasing
the resources devoted to improving the prediction accuracy results in decreasing returns close to
the resulting complexity barrier. Reprinted from Gros (2012) under CC-BY-4.0 license, © 2012
Complex Systems Publications, Inc.
This means that a very small change in the initial setting can blow up even after
a short time. When considering real-world applications, where models need to be
determined from measurements containing inherent errors and limited accuracies,
an exponential sensitivity can result in unpredictability. A well known example is
the problem of long-term weather prediction, as shown in Fig. 2.17.
Logistic Map One of the most cherished models in the field of deterministic chaos
is the logistic map of the interval [0, 1] onto itself,

xn+1 = g(xn) = r xn (1 − xn) ,     r ∈ [0, 4] ,     (2.46)

where we made use of the notation xn = x(n), for discrete times n = 0, 1, .. . The
functional dependence is illustrated in Fig. 2.18. Despite its apparent simplicity, the
logistic map shows an infinite series of bifurcations that culminate in a transition to
chaos.
which is limited for high population densities .x → 1, viz when resources become
scarce. The classical example is that of a herd of reindeer on an island.
Knowing the population density .xn in a given year n we may predict via (2.46)
the population density for all subsequent years exactly; the system is deterministic.
Fig. 2.18 The logistic map .g(x) = rx(1 − x) (red), and the iterated logistic map .g(g(x)) (green);
for .r = 2.5 (left) and .r = 3.3 (right). Also shown are iterations of .g(x), starting from .x = 0.1 (thin
solid line). Note, that the fixpoint .g(x) = x is stable/unstable for .r = 2.5 and .r = 3.3, respectively.
The orbit is attracted to a fixpoint of .g(g(x)) for .r = 3.3, corresponding to a cycle of period two
for .g(x)
Nevertheless the population shows irregular behavior for certain values of r, which
is hence chaotic.
x = rx(1 − x)     ⟺     x = 0   or   1 = r(1 − x) .

1/r = 1 − x ,     x^(1) = 1 − 1/r ,     r1 < r ,     r1 = 1 .     (2.47)

It is present for r1 < r, with r1 = 1, due to the restriction x^(1) ∈ [0, 1].
Stability Analysis For maps, fixpoints are stable when the derivative .g ' of the flow
is smaller than one in magnitude, see (2.14). For .x (1) = 1 − 1/r this translates to
g'(x) = r(1 − 2x) ,     g'(x^(1)) = 2 − r .     (2.48)
Fixpoints of Period Two For .r > 3 a fixpoint of period two appears. This fixpoint
is a fixpoint of the iterated function
1 = r²(1 − rx + rx²) − r²x(1 − rx + rx²) ,
0 = r³x³ − 2r³x² + (r³ + r²)x + 1 − r² .     (2.50)
In order to find the roots of (2.50) we use the fact that .x = x (1) = 1 − 1/r is a
stationary point of both .g(x) and .g(g(x)), see Fig. 2.18. Dividing (2.50) by the root
.(x − x
(1) ) = (x − 1 + 1/r) one obtains
(r³x³ − 2r³x² + (r³ + r²)x + 1 − r²) : (x − 1 + 1/r) = r³x² − (r³ + r²)x + (r² + r) ,
which leads to
x±^(2) = (1/2)(1 + 1/r) ± √[ (1/4)(1 + 1/r)² − (1/r + 1/r²) ] .     (2.51)
Period Doubling Bifurcation We have three fixpoints of period two for .r > 3
(two stable ones and one unstable), and only a single fixpoint for .r < 3. What
happens at .r = 3?
x±^(2)(r = 3) = (1/2)·(3+1)/3 ± √[ (1/4)((3+1)/3)² − (3+1)/9 ]
             = 2/3 = 1 − 1/3 = x^(1)(r = 3) .
At .r = 3 the fixpoint splits into two stable and one unstable branch, see Fig. 2.19,
akin to the pitchfork bifurcations discussed in Sect. 2.2. Given that a period doubling
occurs at .r = 3, the transition can be seen alternatively as the discrete analog of a
supercritical Hopf bifurcation.
Fig. 2.19 Left: The values .xn for the iterated logistic map (2.46). For .r < r∞ ≈ 3.57 the .xn go
through cycles with finite but progressively longer periods. For .r > r∞ the plot would be fully
covered in most regions if all .xn would be shown. Right: The corresponding maximal Lyapunov
exponents, as defined by (2.55). Positive Lyapunov exponents .λ indicate chaotic behavior
Bifurcation Cascade One may carry out a stability analysis for x±^(2), just as for
x^(1). One finds a critical value r3 > r2 such that

x±^(2)(r) stable     ⟺     r2 < r < r3 .     (2.52)
Going further on one finds an .r4 such that there are four fixpoints of period four,
that is of .g(g(g(g(x)))), for .r3 < r < r4 . In general there are critical values .rn and
.rn+1 such that there are
The logistic map therefore shows iterated bifurcations. This, however, is not yet
chaotic behavior.
Chaos in the Logistic Map There exists a critical .r∞ at which the period-doubling
series converges,
lim_{n→∞} rn → r∞ ,     r∞ = 3.5699456 . . .

r∞ < r < 4 .
In order to characterize the sensitivity of (2.46) with respect to the initial condition,
we consider two slightly different starting points .x1 and .x1' :
x1 − x1' = y1 ,     |y1| ⪡ 1 .
ym = xm − x'm ,     ym+1 ≈ ym dg(x)/dx |_{x=xm}     (2.53)

between the two respective orbits is still small after m iterations. For (2.53) we used
x'm = xm − ym, neglecting terms ∼ ym². We hence obtain
For .|ϵ| < 1 the map is stable, as two initially different populations close in with
time passing. For .|ϵ| > 1 they diverge; the map is chaotic.
|ϵ| = e^λ ,     λ = log | dg(x)/dx |     (2.54)
the Lyapunov exponent .λ = λ(r) of a map, as introduced in Sect. 2.2. For positive
Lyapunov exponents the time development is exponentially sensitive to the initial
conditions and shows chaotic features,
This is indeed observed in nature, e.g. for populations of reindeer on isolated islands,
as well as for the logistic map for .r∞ < r < 4, compare Fig. 2.19.
λ^(max) = lim_{n⪢1} (1/n) log | dg^(n)(x)/dx | ,     g^(n)(x) = g(g^(n−1)(x)) .     (2.55)
Using (2.54) for the short time evolution we can decompose .λ(max) into an averaged
sum of short time Lyapunov exponents. .λ(max) is the “global Lyapunov exponent”.
One needs to select advisedly the number of iterations n in (2.55). On one side n
should be large enough such that short-term fluctuations of the Lyapunov exponent
are averaged out. The available phase space is however generically finite, for the
logistic map we have .y ∈ [0, 1]. Therefore, two initially close orbits cannot diverge
ad infinitum. One needs hence to avoid phase-space restrictions, evaluating .λ(max)
for large but finite numbers of iterations n.
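The decomposition of λ^(max) into an averaged sum of short-time exponents log|g'(xn)| translates
directly into a few lines of code. The sketch below evaluates it for the logistic map; the transient
length, the number of iterations and the chosen values of r are arbitrary assumptions.

```python
import numpy as np

def lyapunov_logistic(r, n_transient=1000, n_average=10000, x0=0.3):
    """Maximal Lyapunov exponent of the logistic map, compare (2.55):
    the chain rule turns log|dg^(n)/dx| into a sum of log|g'(x_m)|."""
    x = x0
    for _ in range(n_transient):             # let the orbit settle on the attractor
        x = r * x * (1.0 - x)
    acc = 0.0
    for _ in range(n_average):
        acc += np.log(abs(r * (1.0 - 2.0 * x)))   # log|g'(x_m)|
        x = r * x * (1.0 - x)
    return acc / n_average

for r in (2.5, 3.3, 3.57, 3.8, 4.0):
    print(f"r = {r:4.2f}   lambda ~ {lyapunov_logistic(r):+6.3f}")
```

Negative values should appear in the regular regime and positive values in the chaotic regime,
reproducing the right panel of Fig. 2.19 pointwise.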
Routes to Chaos The chaotic regime .r∞ < r < 4 of the logistic map connects to
the regular regime .0 < r < r∞ with increasing period doubling. One speaks of a
“route to chaos via period-doubling”. The study of chaotic systems is a wide field of
research and a series of routes leading from regular to chaotic behavior have been
found. Two important alternative routes to chaos are:
Bifurcations are generically the result of colliding invariant manifolds, viz of stable
and unstable fixpoints, limit cycles and chaotic sets. Examples are the saddle-node
bifurcation (2.19), at which a stable and an unstable fixpoint merge, and the collision
between a limit cycle and a saddle within the Takens–Bogdanov system (2.33).
Odd Logistic Map The collision between two chaotic attractors can be studied
within the odd logistic map
xn+1 = g(xn) = xn (λ − xn²) ,     λ > 0 ,     (2.56)
which is invariant under inversion .x ↔ (−x). The local maximum .gm = g(xm ) is
given by
g' = 0 ,     xm = √(λ/3) ,     gm = 2 √(λ³/3³) ,     (2.57)
Merging Chaotic Attractors For .λ < 1 only the trivial fixpoint exists. Afterwards,
for .1 < λ < λs , there are two attracting states that are related via inversion
symmetry, as shown in Fig. 2.20. Both branches undergo a period-doubling transition
to chaos, in equivalence to the one observed for the standard logistic map, compare
Fig. 2.20 Left: For λ = 2, the odd logistic map (2.56). The local maximum gm is reached for
xm = √(λ/3) ≈ 0.82. Right: The bifurcation diagram of the odd logistic map as a function of λ.
One has a unique fixpoint xn = 0 (black) for λ < 1 and two equivalent attractors (blue/green) for
1 < λ < λs, which merge at λs = √27/2 ≈ 2.6. Inversion symmetry is restored at λs (red). A
period-doubling transition to chaos is observable in the symmetry broken phase
The dynamical systems we considered so far all had instantaneous dynamics, being
of the type
d/dt y(t) = f(y(t)) ,     t > 0     (2.59)
y(t = 0) = y0 ,
when denoting with .y0 the initial condition. This is the simplest case: one dimen-
sional (a single dynamical variable only), autonomous (.f (y) is not an explicit
function of time) and deterministic (no noise).
d/dt y(t) = f(y(t), y(t − T)) ,     t > 0     (2.60)
Due to the delayed coupling we need now to specify an entire initial function φ(t).
Differential equations containing one or more time delays need to be considered
carefully, with the time delay introducing additional dimensions to the problem. We
discuss several basic examples.
d/dt y(t) = −a y(t) − b y(t − T) ,     a, b > 0 .     (2.61)
The only constant solution for .a + b /= 0 is the trivial state .y(t) ≡ 0. The trivial
solution is stable in the absence of time delays, .T = 0, whenever .a + b > 0. The
question is, whether a finite T may change this.
We may expect the existence of a certain critical .Tc , such that .y(t) ≡ 0 remains
stable for small time delays .0 ≤ T < Tc . In this case the initial function .φ(t) will
affect the orbit only transiently, in the long run the motion would be damped out,
approaching the trivial state asymptotically for .t → ∞.
Delay Induced Hopf Bifurcation Trying our luck with the usual exponential
ansatz, we find
y(t) = y0 e^{λt} ,     λ = −a − b e^{−λT} ,     λ = p + iq .     (2.62)
p + a = −b e^{−pT} cos(qT) ,
q = b e^{−pT} sin(qT) .     (2.63)
For .T = 0 the solution is .p = −(a + b), .q = 0, as expected, and the trivial solution
y(t) ≡ 0 is stable. A numerical solution is shown in Fig. 2.21 for .a = 0.1 and .b = 1.
.
a = −b cos(qT) ,     q = b sin(qT) .     (2.64)
The first condition in (2.64) can be satisfied only for .a < b. Taking the squares in
(2.64) and eliminating qT one has
q = √(b² − a²) ,     T ≡ Tc = arccos(−a/b)/q .
One therefore finds a Hopf bifurcation at .T = Tc , which implies that the trivial
solution becomes unstable for .T > Tc . For .a = 0 the transition point is defined
by .q = b, together with .Tc = π/(2b). Note, that there is a Hopf bifurcation only
for .a < b, viz whenever the time delay dominates, and that q becomes non-zero
well before the bifurcation point, compare Fig. 2.21. One has therefore a region of
damped oscillatory behavior with .q /= 0 and .p < 0.
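The delay-induced instability can also be seen in a direct integration of (2.61) with a fixed-step
Euler scheme and a history buffer. For a = 0, b = 1 the trivial solution should lose stability near
Tc = π/2 ≈ 1.571; the step size, the constant initial function φ(t) ≡ 1, and the chosen delays are
assumptions of this sketch.

```python
import numpy as np

def integrate_delay(a, b, T, t_max=200.0, dt=0.001, phi=1.0):
    """Euler integration of dy/dt = -a y(t) - b y(t-T) with y(t<=0) = phi."""
    n_delay = int(round(T / dt))
    n_steps = int(round(t_max / dt))
    y = np.full(n_steps + n_delay + 1, phi)     # entries [0 : n_delay] hold the history
    for i in range(n_delay, n_delay + n_steps):
        y[i + 1] = y[i] + dt * (-a * y[i] - b * y[i - n_delay])
    return y[n_delay:]

a, b = 0.0, 1.0
for T in (1.4, 1.5, 1.6, 1.7):
    y = integrate_delay(a, b, T)
    late = np.max(np.abs(y[-10000:]))           # amplitude over the last 10 time units
    # the initial amplitude is of order one, so late > 1 indicates growth
    print(f"T = {T:.1f}:  late-time amplitude {late:9.3f}"
          f"   ({'growing' if late > 1.0 else 'decaying'})")
```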
Discontinuities For differential equations with time delays one may specify
arbitrary initial functions .φ(t), also initial functions with discontinuities, which
will induce discontinuities in the derivatives of the respective trajectories. As an
example we consider the case .a = 0, .b = 1 of (2.61), with a non-zero constant
initial function,
d/dt y(t) = −y(t − T) ,     φ(t) ≡ 1 .     (2.65)
The solution can be evaluated by stepwise integration,
y(t) − y(0) = ∫₀ᵗ dt' ẏ(t') = −∫₀ᵗ dt' y(t' − T) = −∫₀ᵗ dt' = −t ,     0 < t < T .
lim_{t→0⁻} d/dt y(t) = 0 ,     lim_{t→0⁺} d/dt y(t) = −1 .
In analogy one shows that the second derivative has a discontinuity at t = T, the
third derivative at t = 2T, and so on.
Injectivity Ordinary differential equations are injective in the sense that distinct
initial conditions lead to distinct trajectories. This holds regardless of the presence
of attracting manifolds, which determine solely the long-term behavior.
Delay systems are not necessarily injective with respect to the initial function.
Consider logistic growth with a delayed growth rate,
d/dt y(t) = y(t − T) [ y(t) − 1 ] ,     φ(t = 0) = 1 .     (2.66)
For any .φ(t) with .φ(0) = 1 the solution is .y(t) ≡ 1 for all .t ∈ [0, ∞]. Distinct
initial functions lead to identical orbits.
Non-constant Time Delays Things may become rather weird when the time delays
are not constant, as for
d/dt y(t) = y(t − T(y)) + 1/2 ,     T(y) = 1 + |y(t)| ,
φ(t) = { 1     t < −1                                          (2.67)
         0     t ∈ [−1, 0]
y(t) = t/2 ,     y(t) = 3t/2 ,     t ∈ [0, 2] ,
are both solutions of (2.67), with appropriate continuations for t > 2. Two different
solutions of the same differential equation and identical initial conditions: this
cannot happen for ordinary differential equations. It is evident that special care
must be taken when examining dynamical systems with time delays numerically.
Basic delay differential systems contain a single time delay T , like (2.61), which
corresponds to an instantaneous memory process. In general, the memory .yM (t) of
past trajectories will be a convolution,
yM(t) = ∫₀^∞ K(τ) y(t − τ) dτ ,     ∫₀^∞ K(τ) dτ = 1 ,     (2.68)
where we defined with .K(τ ) the delay kernel. For a sharply peaked delay kernel,
K(τ) = δ(τ − T), one recovers yM(t) = y(t − T).
K(τ) = (1/T) e^{−τ/T}     (2.69)
exponentially distributed time delays. One has
d/dt yM(t) = ∫₀^∞ dτ K(τ) d/dt y(t − τ) = −∫₀^∞ dτ K(τ) d/dτ y(t − τ) ,     (2.70)
which allows to integrate the last expression by parts when using (2.69). The
resulting closed expression,
ẏM = −(1/T) ∫₀^∞ dτ e^{−τ/T} d/dτ y(t − τ) = (y − yM)/T ,     (2.71)
corresponds to a trailing average, with the memory variable .yM trying to approach
y.
Kernel Series Framework Equation (2.71) implies that a delay differential equa-
tion with exponentially distributed delays can be mapped exactly to a system of
ordinary differential equations by adding an additional variable, namely .yM (t).
For generic kernels .K(τ ) in (2.68) one can generalize this concept by adding a
diverging number of memory variables. With this approach, denoted “kernel series
framework”, one can map any time delay system to an N-dimensional system of
ordinary differential equations. For the case of a single time delay one speaks of the
“linear chain trick”. In general one has .N → ∞, which reflects the notion that delay
systems are formally infinite dimensional.
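For the single exponential kernel the equivalence (2.71) can be verified directly: the memory
variable obtained by explicit convolution of a prescribed trajectory with K(τ) should coincide,
up to discretization errors, with the value produced by the additional ODE ẏM = (y − yM)/T.
The driving signal y(t) and all numerical parameters below are arbitrary choices of this sketch.

```python
import numpy as np

T, dt = 2.0, 0.001
t = np.arange(0.0, 40.0, dt)
y = np.sin(0.7 * t)                      # an arbitrary prescribed trajectory y(t)

# (a) memory variable as an extra ODE, dyM/dt = (y - yM)/T, compare (2.71)
yM_ode = np.zeros_like(y)
for i in range(1, len(t)):
    yM_ode[i] = yM_ode[i - 1] + dt * (y[i - 1] - yM_ode[i - 1]) / T

# (b) memory variable as an explicit convolution with K(tau) = exp(-tau/T)/T,
#     compare (2.68); y(t) is taken as zero before t = 0
tau = np.arange(0.0, 20.0, dt)
K = np.exp(-tau / T) / T
i_check = len(t) - 1
yM_conv = np.sum(K * np.interp(t[i_check] - tau, t, y, left=0.0)) * dt

print(f"ODE memory variable   yM({t[i_check]:.0f}) = {yM_ode[i_check]:.4f}")
print(f"convolved kernel      yM({t[i_check]:.0f}) = {yM_conv:.4f}")
```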
Exercises
ẋ = −x + y ,     ẏ = −ry ,     r > 0 ,     (2.72)
and determine the eigenvalues of the Jacobian for the fixpoint (0, 0) and its
eigenvectors. What happens in the limit r → 1?
(2.2) CATASTROPHIC SUPERLINEAR GROWTH
The growth of a resource variable x(t) ≥ 0, as defined by
ẋ = x^β − γ x ,     γ = 0, 1 ,     (2.73)
together with the Lyapunov exponent. Plot the iterated map for r = r∗ and
values a bit smaller/larger. The occurring saddle-node bifurcation is also
called a ‘tangent transition’. Why?
(2.12) CAR-FOLLOWING MODEL
A car moving with velocity ẋ(t) follows another car driving with velocity
v(t) via
ẍ(t + T) = α [ v(t) − ẋ(t) ] ,     α > 0 ,     (2.76)
with T > 0 being the reaction time of the driver. Prove the stability of the
steady-state solution for a constant velocity v(t) ≡ v0 of the preceding car.8
Further Reading
References
Devaney, R. (2018). An introduction to chaotic dynamical systems. London: CRC Press.
Goldstein, H. (2002). Classical mechanics (3rd ed.). Reading: Addison-Wesley.
Gros, C. (2012). Pushing the complexity barrier: Diminishing returns in the sciences. Complex
Systems, 21, 183.
Gutzwiller, M. C. (2013). Chaos in classical and quantum mechanics. Berlin: Springer Science &
Business Media.
Kharitonov, V. (2012). Time-delay systems: Lyapunov functionals and matrices. Berlin: Springer
Science & Business Media.
Kielhöfer, H. (2012). Bifurcation theory. Berlin: Springer.
Nevermann, D. H., & Gros, C. (2023). Mapping dynamical systems with distributed time delays
to sets of ordinary differential equations. Journal of Physics A: Mathematical and Theoretical,
56, 345702.
Ott, E. (2002). Chaos in dynamical systems. Cambridge: Cambridge University Press.
Poston, T., & Stewart, I. (2014). Catastrophe theory and its applications. Chelmsford: Courier
Corporation.
Ritchie, P. D., Alkhayuon, H., Cox, P. M., & Wieczorek, S. (2023). Rate-induced tipping in natural
and human systems. Earth System Dynamics, 14, 669–683.
Wernecke, H., Sandor, B., & Gros, C. (2019). Chaos in time delay systems, an educational review.
Physics Reports, 824, 1.
Dissipation, Noise and Adaptive Systems
3
Most dynamical systems are not isolated, but interacting with an embedding
environment that may add stochastic components to the evolution equations. The
internal dynamics slows down when energy is dissipated to the outside world,
approaching attracting states which may be regular, such as fixpoints or limit cycles,
or irregular, such as chaotic attractors. Adaptive systems alternate between phases of
energy dissipation and uptake, until a balance between these two opposing processes
is achieved.
In this chapter an introduction to adaptive, dissipative and stochastic systems
will be given together with important examples from the realm of noise controlled
dynamics, like diffusion, random walks and stochastic escape and resonance. We
will discuss to which extent chaos, a regular guest of adaptive systems, may remain
predictable.
Energy is conserved when the contributions from all constituent parts of the
overall system are taken into account. Friction just stands for a transfer process
of energy, in which energy is transferred from a system we observe, like a car on a
motorway with the engine turned off, to a system not under observation, such as the
surrounding air. In this case the combined kinetic energy of the car and the thermal
surrounding air. In this case the combined kinetic energy of the car and the thermal
energy of the air body remains constant; the air heats up a little bit while the car
slows down.
which describes a pendulum with a rigid bar, capable of turning over completely,
with φ corresponding to the angle between the bar and the vertical.
The mathematical pendulum reduces to the damped harmonic oscillator for small
φ ≈ sin φ, which is underdamped/critical/overdamped for γ < 2ω0, γ = 2ω0 and γ >
2ω0. In the absence of damping, γ = 0, the energy
E = φ̇²/2 − ω0² cos φ     (3.2)
is conserved,
d/dt E = φ̇ φ̈ + ω0² φ̇ sin φ = φ̇ (φ̈ + ω0² sin φ) = −γ φ̇² ,
when using (3.1).
ẋ = y
ẏ = −γ y − ω0² sin x .     (3.3)
The phase space for x = (x, y) is R². For all γ > 0 the motion approaches for t → ∞
one of the equivalent global fixpoints (2πn, 0), where n ∈ Z.
ΔV (t) = Δx(t)Δy(t)Δz(t)
Fig. 3.1 Simulation of the mathematical pendulum φ̈ = − sin(φ) − γ φ̇, illustrating the evolution
of the phase space volume for consecutive times (shaded regions), starting with t = 0 (top). Left:
The dissipationless case γ = 0. The energy, see (3.2), is conserved as well as the phase space
volume (Liouville’s theorem). Shown are trajectories for E = 1 and E = −0.5 (solid/dashed line).
Right: For γ = 0.4. Note the consecutive contraction of the phase space volume
corresponds to a small volume in phase space when |x − x' | is small. Its time
evolution is given by
d/dt ΔV = Δẋ Δy Δz + Δx Δẏ Δz + Δx Δy Δż ,
or
In Fig. 3.1 the time evolution of a phase space volume is illustrated for the
case of the mathematical pendulum. Volumes of the phase space remain connected
under the effect of the time evolution, undergoing however at times substantial
deformations.
which implies
∇ · ẋ = −γ < 0 .
ṙ = (Γ − r²) r ,     ϕ̇ = ω ,     (3.5)
we have
∂ṙ/∂r + ∂ϕ̇/∂ϕ = Γ − 3r² = { < 0   for Γ < 0
                             < 0   for Γ > 0 and r > rc/√3          (3.6)
                             > 0   for Γ > 0 and 0 < r < rc/√3 ,

where rc = √Γ is the radius of the limit cycle existing when Γ > 0. The system
might either dissipate or take up energy, which is typical behavior of adaptive
systems, as we will discuss further in Sect. 3.2.
Phase space contracts globally when Γ < 0, and locally, close to the limit cycle,
when Γ > 0. In the latter case, when Γ > 0, phase space expands around the
unstable fixpoint (0, 0).
The respective infinitesimal phase space volumes are related via the Jacobian,
dx dy = r dr dϕ ,
and we find
Comparing with (3.6) we see that the amount and even the sign of phase space
contraction can depend on the choice of the coordinate system. However, phase
space will always contract close to an attractor, regardless of the coordinate system
selected. This holds for (0, 0) when Γ < 0 and for the limit cycle at r = √Γ when
Γ > 0.
ΔV̇/ΔV = ∇ · f = Σᵢ ∂fᵢ/∂xᵢ = Σᵢ λᵢ ,     (3.7)

and hence by the trace of the Jacobian Jij = ∂fi/∂xj, as mentioned before. The
trace of a matrix corresponds to the sum Σᵢ λᵢ of its eigenvalues λᵢ. Phase space
hence contracts when the sum of the local Lyapunov exponents is negative.
Lorenz Model A rather natural question regards the possible existence of attractors
with irregular behaviors, i.e. which are different from stable fixpoints, periodic or
quasi-periodic motion. For this question we examine the Lorenz model
ẋ = −σ (x − y) ,
ẏ = −xz + rx − y ,     (3.8)
ż = xy − bz .
The classical values are .σ = 10 and .b = 8/3, with r being the control variable.
Fig. 3.2 A typical trajectory of the Lorenz system (3.8), for the classical set of parameters, σ =
10, b = 8/3 and r = 28. The chaotic orbit loops around the remnants of the two fixpoints (3.9),
which are unstable for the selected set of parameters. Color coding with respect to z, projected to
the z = 0 plane
Fixpoints of the Lorenz Model A trivial fixpoint is (0, 0, 0). The non-trivial
fixpoints are
0 = −σ (x − y) ,     x = y ,
0 = −xz + rx − y ,     z = r − 1 ,
0 = xy − bz ,     x² = y² = b (r − 1) .
It is easy to see by linear analysis that the fixpoint (0, 0, 0) is stable for r < 1. For
r > 1 it becomes unstable via a pitchfork bifurcation and two new fixpoints appear,
These are stable for r < rc = 24.74 (for σ = 10 and b = 8/3), at which point a
subcritical Hopf bifurcation occurs. Generally one has rc = σ (σ +b+3)/(σ −b−1).
For r > rc the behavior becomes more complicated and generally non-periodic.
Strange Attractors One can show that the Lorenz model has a positive Lyapunov
exponent for r > rc. It is chaotic with sensitive dependence on the initial conditions.
A typical orbit is illustrated in Fig. 3.2. The Lorenz model is at the same time
globally dissipative, since
∂ẋ/∂x + ∂ẏ/∂y + ∂ż/∂z = −(σ + 1 + b) < 0 ,     σ, b > 0 .     (3.10)
The consequence is that the attractor of the Lorenz system cannot be a smooth
surface. Close to the attractor phase space contracts. At the same time two nearby
orbits are repelled due to the positive Lyapunov exponents. One finds a self-similar
structure for the Lorenz attractor with a fractal dimension 2.06 ± 0.01, as defined
further below. Such a structure is called a “strange attractor”.
Dissipative Chaos and Strange Attractors Strange attractors can only occur in
dynamical systems of dimension three and higher; in one dimension fixpoints are the
only possible attracting states and one needs at least two dimensions for limit cycles.
The Lorenz model has an important historical relevance for the development
of chaos theory. It is considered a paradigmatic model, since chaos in dissipative
and deterministic dynamical systems is closely related to the emergence of strange
attractors. Chaos may arise in one-dimensional maps, as we have seen in Sect. 2.4, but
continuous-time dynamical systems need to be at least three dimensional in order to
show chaotic behavior.
N(l) ∝ l^{−DH} ,     for l → 0 ,     (3.11)
then DH is called the Hausdorff dimension of the set. Alternatively we can rewrite
(3.11) as
N(l)/N(l') = (l/l')^{−DH} ,     DH = − log[N(l)/N(l')] / log[l/l'] ,     (3.12)
DH → − log[8/1]/log[1/3] = log 8/log 3 ≈ 1.8928 .
The strange attractor of the Lorenz model has the form of a ‘butterfly’, but otherwise
seemingly no further internal structure. Orbits exponentially diverge close to the
attractor, remaining however bounded by its overall size. The only feature remaining
predictable is that orbits will not diverge indefinitely. The situation is more complex
for most real-world chaotic systems.
As an example of a real-world system with substantial chaotic components we
may take weather forecasting. When a hurricane approaches a coastline, one may
Fig. 3.5 The evolution of two initially close trajectories (blue/magenta lines). The respective
initial states (shaded circles) are evolved for the identical period t (filled circles). Shown are orbits
for the non-linear rotator (3.5), for .Γ < 0 (left panel) and .Γ > 0 (middle panel), which converge
respectively to a stable focus and limit cycle (black ring). For the Lorenz model (right panel),
trajectories diverge on the attractor, with one orbit circling one of the unstable fixpoints (grey
circles) an additional time. Parameters as in Fig. 3.2
not be able to predict the location of landfall precisely, nor the strength of the
hurricane at that point. Nobody will doubt however, that a full-blown hurricane
is approaching. Certain features can remain predictable even in otherwise chaotic
systems. This is the notion of “partially predictable chaos”, which we will examine
in the following.
For a precise definition of partially predictable chaos one needs an observational
tool allowing to determine the presence of chaos directly from the properties of
typical orbits, without resorting to the evaluation of Lyapunov exponents. This tool,
a “0-1 test for chaos”, is developed first.
Initially Close Orbits For a given dynamical system we examine the long-term
fate of two initially close orbits, .x = x(t) and .x' = x' (t), with
which is defined as the average distance between .x and .x' for times .t > T , where T
is larger than the time scales of the defining dynamical system.
0-1 Test for Chaos In Fig. 3.5 the evolution for two initial close orbits is illustrated
for the case that both orbits approach the same attractor, which may be a fixpoint,
limit cycle, or a chaotic attractor.
96 3 Dissipation, Noise and Adaptive Systems
ΔXt
classical fully
decorrelated
partially
predictable
log(time)
Fig. 3.6 Chaotic dynamics is partially predictable (black curve) when the distance .ΔXt of two
initially close orbits remains at a temporary plateau after the starting divergence. The fully
decorrelated final state is reached only after a substantial delay. Information is lost within a single
process for classical chaos (orange curve)
– FIXPOINT
One has .ΔX∞ → 0 independently of .ΔX0 .
– LIMIT CYCLE
When .ΔX0 is small, substantially smaller than the size of the attracting limit
cycle, one observes .ΔX∞ ∝ ΔX0 . The two orbits will follow each other
indefinitely after having entered the limit cycles at slightly different points, with
the average distance .ΔX∞ scaling linearly with .ΔX0
– CHAOTIC ATTRACTOR
Orbits fully decorrelate on the attractor, with .ΔX∞ approaching the average two-
point distance irrespectively of .ΔX0 .
Above rules constitute a “0-1 test” for chaos. Operationally, one integrates the
system in question for pairs of initially close orbits. The resulting average final
distance .ΔX∞ is plotted relative to its initial value. The attracting state is a limit
cycle if .ΔX∞ scales with .ΔX0 , otherwise a fixpoint or a chaotic attractor.
In general, complex systems are neither fully conserving nor fully dissipative.
Adaptive systems will have phases where they take up energy and periods where
they give energy back to the environment. An example is the non-linear rotator
defined in (3.5), see also (3.6).
One affiliates with the term “adaptive system” often to the notion of complexity
and adaption. Strictly speaking, any dynamical system is adaptive if .∇ · ẋ may take
both positive and negative values. In practice, however, it is usual to reserve the
term adaptive system to dynamical systems showing a certain complexity, such as
emerging behavior.
Van der Pol Oscillator Circuits or mechanisms built for the purpose of controlling
an engine or machine are intrinsically adaptive. An example is the Van der Pol
oscillator,
ẋ = y
ẍ − ϵ(1 − x 2 )ẋ + x = 0,
. (3.14)
ẏ = ϵ(1 − x 2 )y − x
where .ϵ > 0 and where we did use the phase space variables .x = (x, y). The
evolution of the phase space volume is
∇ · ẋ = ϵ (1 − x 2 ) .
.
The oscillator takes up/dissipates energy for .x 2 < 1 and .x 2 > 1, respectively. A
simple mechanical example for a system with similar properties is illustrated in
Fig. 3.7
Amplitude a and phase .φ are arbitrary, as usual for harmonic oscillators. The
perturbation .ϵ(1 − x 2 )ẋ may change both the amplitude and the unperturbed
frequency .ω0 = 1 by an amount .∝ ϵ. In order to account for this “secular
perturbation” we make the ansatz
x(t) = A(T )eit + A∗ (T )e−it + ϵx1 ,
. A(T ) = A(ϵt) , (3.16)
98 3 Dissipation, Noise and Adaptive Systems
which differs from the usual expansion .x(t) → x0 (t) + ϵx ' (t) of the full solution
.x(t) of a dynamical system with respect to a small parameter .ϵ. The yet to be
Expansion The goal is to expand the Van der Pol oscillator together with (3.16)
with respect to the perturbation2 We start be evaluating several expressions involing
1
.x(t) up to order .O(ϵ ), namely
x 2 ≈ A2 e2it + 2|A|2 + (A∗ )2 e−2it + 2ϵx1 Aeit + Ae−it
.
ϵ(1 − x 2 ) ≈ ϵ(1 − 2|A|2 ) − ϵ A2 e2it + (A∗ )2 e−2it ,
∂A(T )
ẋ ≈ (ϵAT + iA) eit + c.c. + ϵ ẋ1 , AT =
∂T
ϵ(1 − x 2 )ẋ = ϵ(1 − 2|A|2 ) iAeit − iA∗ e−it
− ϵ A2 e2it + (A∗ )2 e−2it iAeit − iA∗ e−it
and
ẍ =
. ϵ 2 AT T + 2iϵAT − A eit + c.c. + ϵ ẍ1
≈ (2iϵAT − A) eit + c.c. + ϵ ẍ1 .
2 The following derivations are informative, but somewhat advanced. In case, the reader may skip
directly to the result, Eq. (3.18).
3.2 Adaptive Systems 99
to order .O(ϵ 1 ).
of the two terms on the right-hand side of (3.17) are proportional to the unperturbed
frequency .ω0 = 1 and to .3ω0 , respectively.
∂A 1 ∂A ϵ
AT =
. = 1 − |A|2 A, = 1 − |A|2 A , (3.18)
∂T 2 ∂t 2
where we used .T = ϵt. The solvability condition (3.18) can be written as
ϵ
ȧ eiφ + i φ̇ a eiφ =
. 1 − a 2 a eiφ ,
2
ȧ = ϵ 1 − a 2 a/2,
. φ̇ ∼ O(ϵ 2 ) . (3.19)
The system takes up energy for .a < 1 and the amplitude a increases till the
saturation limit .a → 1, the conserving point. For .a > 1 the system dissipates
energy to the environment and the amplitude a decreases, approaching unity for
.t → ∞, just as we discussed in connection with (3.5).
In the limit .ϵ → 0 the long-term solution of the Van der Pol oscillator is
.x(t) ≈ 2 a cos(t), compare (3.16) and (3.19), which constitutes an amplitude-
regulated oscillation. This behavior, compare Fig. 3.8, is relevant for technical
control tasks.
3 The harmonic oscillator is resonant only when the frequency of the perturbation matches the
internal frequency. Non-harmonic oscillators may however unstable against rational frequency
ratios, as discussed in Sect. 2.1 of Chap. 2 in the context of the KAM theorem, with regard to
the gaps in the Saturn rings.
100 3 Dissipation, Noise and Adaptive Systems
-2
-4
d
. ϵ Y (t) = −x(t) = ẍ(t) − ϵ 1 − x 2 (t) ẋ(t) (3.20)
dt
the Liénard variable .Y (t), where the second equality is just the definition of the Van
der Pol oscillator, see (3.14). With the .X(t) ≡ x(t) we rewrite (3.20) as
ϵ Ẏ = Ẍ − ϵ 1 − X2 Ẋ,
. X(t) = x(t) ,
X3
ϵY = Ẋ − ϵ X −
. ,
3
where we did set the integration constant to zero. Together with (3.20) we obtain
Ẋ = c Y − f (X) ,
. f (X) = X3 /3 − X,
Ẏ = −X/c , (3.21)
Relaxation Oscillations For a large driving c, we can discuss the solution of the
Van der Pol oscillator (3.21) graphically, as illustrated in Fig. 3.9. Of relevance is the
flow .(Ẋ, Ẏ ) in phase space .(X, Y ). For .c ⪢ 1 there is a separation of time scales,
Y y
2
2
1
a0 1 a0
1 -a 0 X -a 0 x
1
Fig. 3.9 Van der Pol oscillator for a large driving .c ≡ ϵ. Left: Relaxation oscillations with respect
to the Liénard variables (3.21). Indicated is the flow .(Ẋ, Ẏ ) (arrows), for .c = 3, see (3.21). Also
shown is the .Ẋ = 0 isocline .Y = −X + X 3 /3 (solid line) and the limit cycle, which includes a
non-constant part (dashed line) and a section of the isocline. Right: The limit cycle in terms of the
original variables .(x, y) = (x, ẋ) = (x, v). Note that .X(t) = x(t)
– Starting at a general .(X(t0 ), Y (t0 )) the orbit develops very fast .∼ c and nearly
horizontally until it hits the “isocline”4
. Ẋ = 0, Y = f (X) = −X + X3 /3 . (3.22)
– Once the orbit is close to the .Ẋ = 0 isocline .Y = −X + X3 /3 the motion slows
down, proceeding with a velocity .∼ 1/c close-to (but not exactly on) the isocline
(3.22).
– Once the slow motion reaches one of the two local extrema .X = ±a0 = ±1 of
the isocline, it cannot follow the isocline any more and makes a rapid transition
towards the other branch of the .Ẋ = 0 isocline, with .Y ≈ const. Note, that
trajectories may cross the isocline vertically, which implies that .Ẏ |X=±1 = ∓1/c
is small but finite right at the extrema.
Per definition, conserving dynamical systems conserves the volume of phase space
enclosed by a set of trajectories, as illustrated in Fig. 3.1. In contrast, phase space
expands and contracts along a given orbit when the system is adaptive, as we
discussed for the case of the Van der Pol oscillator, as defined by (3.14), and for the
Taken-Bogdanov system.6 Adaptive systems cannot conserve phase space volume,
but they may conserve other quantities, like a generalized energy functional.
ẋ = v v2
. E(x, v) = + V (x) (3.23)
v̇ = −∇V (x) 2
for a mechanical systems with a potential .V (x) conserves energy, .E = E(x, v),
dE ∂E ∂E
. = ẋ + v̇ = ∇V + v̇ v = 0 .
dt ∂x ∂v
Energy is an instance of a “constant of motion”, viz of a conserved quantity.
Lotka–Volterra Model for Rabbits and Foxes Evolution equations for one or
more interacting species are termed “Lotka–Volterra” models. A basic example is
that of a prey (rabbit) with population density x being hunted by a predator (fox)
with population density y,
ẋ = Ax − Bxy
. . (3.24)
ẏ = −Cy + Dxy
The population x of rabbits can grow by themselves but the foxes need to eat rabbits
in order to multiply. All constants .A, B, C and D, are positive.
A 0 0 −BC/D
J0 =
. , J1 = .
0 −C AD/B 0
6 SeeSect. 2.3 of Chap. 2). for an in-depth treament of the Taken-Bogdanov equations.
7 We recall that the Jacobian is the matrix of all possible partial derivatives, see Sect. 2.2.1, of
Chap. 2.
3.2 Adaptive Systems 103
3
y
1
E = -2.2 -3.0 -4.0
0
0 1 2 3
x
Fig. 3.10 In the space of population densities .(x, y), the flow of the fox and rabbit Lotka–Volterra
model, see (3.24) and (3.26). The flow expands (contracts) for .x > y (.x < y), as indicated
(blue/green orbits). Trajectories coincide with the iso-energy lines of the conserved function E, as
defined by (3.27), with .(0, 0) being a saddle and .(1, 1) a neutral focus (open red circle). Lyapunov
exponents are real in the shaded region and complex otherwise
The trivial fixpoint .x∗0 is hence a saddle and .x∗1 a neutral focus with purely imaginary
√
Lyapunov exponents .λ = ±i CA. The trajectories circling the focus close onto
themselves, as illustrated in Fig. 3.10 for .A = B = C = D = 1.
Phase Space Evolution We now consider the evolution of phase space volume, as
defined by (3.4),
∂ ẋ ∂ v̇
. + = A − By − C + Dx . (3.26)
∂x ∂v
The phase space expands/contracts for y smaller/larger than .(A + Dx − C)/B, the
tell sign of an adaptive system.
on phase space .(x, y) is a constant of motion for the Lotka–Volterra model (3.24),
since
dE
. = Aẏ/y + C ẋ/x − B ẏ − D ẋ
dt
= A(−C + Dx) + C(A − By) − B(−C + Dx)y − D(A − By)x
= 0.
dE
. = ∇E · ẋ = 0, ∇E ⊥ ẋ . (3.28)
dt
The flow .ẋ is consequently perpendicular to the gradient .∇E the conserved
generalized energy. Orbits are therefore confined to an iso-energy manifold, which is
one-dimensional, given that phase space is two-dimensional. Trajectories coincided
hence with the iso-energy manifold for the fox and rabbit Lotka–Volterra, as
illustrated in Fig. 3.10.
Lyapunov Exponent We evaluate the Jacobian for a generic point .(x, y) in phase
space,
(1 − y) −x x−y 1
. , λ± = ± (x − y)2 − 4(x + y − 1) ,
y (x − 1) 2 2
where we did set .A = B = C = D = 1. The Lyapunov exponents .λ± are real close
to the axes and complex further away, with the separatrix given by
√
(x − y)2 = 4(x + y − 1),
. y =x+2± 8x . (3.29)
There is no discernible change in the flow dynamics across the separatrix (3.29),
which we included in Fig. 3.10, viz when the Lyapunov exponents acquire finite
imaginary components.
Invariant Manifolds Fixpoints and limit cycles are examples of invariant subsets
of phase space.
INVARIANT MANIFOLD A subset of phase space invariant under the flow for all times
.t∈ [−∞, ∞] is denoted an invariant manifold.
3.2 Adaptive Systems 105
All trajectories of the fox and rabbit Lotka–Volterra model, apart from the stable
and the unstable manifolds of the saddle .(0, 0), are closed and constitute hence
invariant manifolds.
Fixpoints and limit cycles are invariant manifolds with dimensions zero and one
respectively, strange attractors, see Sect. 3.1.1, have generically a fractal dimension.
Phase space cannot expand or contract forever for orbits on closed invariant
manifolds M, which are finite. This is the case for all trajectories of the fox and
rabbit Lotka–Volterra model and manifestly evident for the case .A = B = C =
D = 0 discussed above, for which the real part of the Lyapunov exponent .λ± is
anti-symmetric under the exchange .x ↔ y.
x
ẋ = Ax 1 −
. − Bxy, ẏ = (Dx − C)y .
xmax
C A C
. x∗ = , y∗ = 1− , (3.30)
D B Dxmax
which exists for .C < Dxmax . The population of rabbits is never large enough to
support a finite population of foxes in the opposite case, when .C > Dxmax , which
leads to .x → xmax and .y → 0. When existing, the steady state defined by (3.30) is
stable.
The stability of (3.30) can be shown via a direct evaluation of the respective
Jacobian. On a general level one can argue that the resource-limiting factor .1 −
x/xmax adds a contracting element. It is hence not surprising that the previously
neutral fixpoint is stabilized.
106 3 Dissipation, Noise and Adaptive Systems
pt (x),
. x = 0, ±1, ±2, .. . . . , t = 0, 1, 2, . . .
pt (x − 1) + pt (x + 1)
.pt+1 (x) = . (3.31)
2
Next we take the limit of continuous time and space by generalizing (3.31) to
discrete steps .Δx and .Δt in space and time,
where we subtracted on both sides the current distribution .pt (x). Taking the limit
Δx, Δt → 0 in such a way that .(Δx)2 /(2Δt) remains finite, we obtain the diffusion
.
equation
with D being the diffusion constant. Note that the diffusion equation can be cast
into the form of a continuity equation,
ṗ + ∇ · j = 0,
. j = −D ∇p , (3.34)
with the diffusion current .j encoding the diffusive transport of particle from high- to
low concentrations.
Solution of the Diffusion Equation The solution Φ(x, t) of the diffusion equation
(3.33) is given by
∞
1 x2
Φ(x, t) = √
. exp − , dx Φ(x, t) = 1 , (3.35)
4π Dt 4Dt −∞
which holds8 for a localized initial state Φ(x, t = 0) = δ(x). For the derivation one
enters the appropiate derivates,
where we assumed that the mean 〈x〉 = 0 vanishes. Diffusive transport is therefore
characterized by transport sublinear in time, in contrast to ballistic transport
following x = vt. Compare Fig. 3.11.
Green’s Function for Diffusion For general initial distributions p0 (x) = p(x, 0)
of walkers the diffusion equation (3.33) is solved by
. p(x, t) = dy Φ(x − y, t) p0 (y) , (3.37)
√ √
e−x
2 /a
8 Note that dx = aπ, together with lima→0 exp(−x 2 /a)/ aπ = δ(x).
108 3 Dissipation, Noise and Adaptive Systems
Fig. 3.11 Examples of random walkers with scale-free distributions ∼ 1/|Δx|1+β for real-space
jumps, see (3.39). Left: β = 3, which falls into the universality class of standard Brownian motion.
Right: β = 0.5, a typical Levy flight. Note the occurrence of longer-ranged jumps in conjunction
with local walking
First Passage Time When starting from the origin, which is the typical time ty a
random walker would need to reach a certain distance y > 0 for the first time? We
define the survival probability
y
Sy (t) =
. dx Φ(x, t) − Φ(x − 2y, t) , Sy (0) = 1, Sy (∞) = 0 ,
−∞
which denotes the probability that the walker is below y at time t, without having
ever crossed the ‘cliff’ x = y. Here we used the solution (3.35) of the diffusion
equation, as describing walkers starting respectively at x = 0 and x = 2y.
Importantly, the kernel Φ(x, t) − Φ(x − 2y, t) vanishes for x = y. The survival
probability is monotonically decreasing with increasing time. The first passage time
t = tF is not fixed, but distributed as
d y d2
.Fy (t) = − Sy (t) = −D dx Φ(x, t) − Φ(x − 2y, t) ,
dt −∞ d 2x
3.3 Diffusion and Transport 109
where we used the diffusion equation Φ̇ = DΔΦ in one dimension below the
integral. Direct integration yields
−D d 2 −x 2 /(4Dt)
y
−(x−2y)2 /(4Dt)
Fy (t) = √
. edx − e
4π Dt −∞ dx 2
−D d −x 2 /(4Dt)
− e−(x−2y) /(4Dt)
2
= √ e
4π Dt dx x=y
y y2
= √ exp − , (3.38)
4π Dt 3 4Dt
For fixed y, one has that (3.38) scales as ∼ t −α for large t, with α = 3/2. The
first moment of the first passage time density Fy (t) diverges consequently.10 An
average first passage time is not defined.
Lévy Flights One can generalize the concept of a random walker, which is at the
basis of ordinary diffusion, and consider a random walk with distributions ρ(Δt)
and ρ(Δx) for waiting times Δti and jumps Δxi , at time step i = 1, 2, . . . of the
walk, as illustrated in Fig. 3.12. One may assume scale-free distributions
1 1
ρ(Δt) ∼
. , ρ(Δx) ∼ , α, β > 0 . (3.39)
(Δt)1+α (Δx)1+β
If α > 1 (finite mean waiting time) and β > 2 (finite variance), nothing special
happens. In this case the central limiting theorem for well behaved distribution
functions is valid for the spatial component and one obtains standard Brownian
diffusion. Relaxing the above conditions one finds four regimes: normal Brownian
diffusion, “Lévy flights”, fractional Brownian motion, also denoted “subdiffusion”
and generalized Lévy flights termed “ambivalent processes”. Their respective
scaling laws are listed in Table 3.1, with two examples being shown in Fig. 3.11.
Lévy flights occur for a wide range of processes, such as for the flight patterns
of wandering albatrosses. Human travel habits seem to be characterized by a
generalized Lévy flight with α, β ≈ 0.6.
√
9 The Lévy distribution c/(2π ) exp(− 2tc )/t 3/2 is normalized on the interval t ∈ [0, ∞].
10 The moments of powerlaw distributions are discussed in Sect. 1.1.3 of Chap. 1.
110 3 Dissipation, Noise and Adaptive Systems
distance
Δti and jumps Δxi may Δ ti
become a generalized Lévy
flight, compare (3.39)
Δxi
time
A memory would be present, on the other hand, if the transition rule .xt → xt+1
would be functionally dependent on earlier .xt−1 , xt−2 , . . . elements of the process.
since one always arrives to some state .xt+1 = y when starting from a given .xt = x.
A process may stay in place, which occurs with the probability .p(x, x). A state .x ∗
is “absorbing” whenever
p(x ∗ , x ∗ ) = 1,
. p(x ∗ , y) = 0, ∀y /= x ∗ . (3.41)
Table 3.1 The four regimes of a generalized walker with distribution functions, Eq. (3.39),
characterized by scalings ∼ (Δt)−1−α and ∼ (Δx)−1−β for the waiting times Δt and jumps
Δx, as depicted in Fig. 3.12
√
α>1 β>2 x̄ ∼ t Ordinary diffusion
α>1 0<β<2 x̄ ∼ t 1/β Lévy flights
0<α<1 β>2 x̄ ∼ t α/2 Subdiffusion
0<α<1 0<β<2 x̄ ∼ t α/β Ambivalent processes
3.3 Diffusion and Transport 111
starting from a given state .x0 . A famous example is the Galton-Watson process,
which describes the extinction probabilities of family names.11
Master Equation We consider density distributions .ρt (x) of walkers, with each
walker having the same transition probabilities .p(x, y). For discrete times .t =
0, 1, . . . , the evolution of the density of walkers is given by the “master equation”
ρt+1 (y) = ρt (y) + x ρt (x)p(x, y) − ρt (y)p(y, x)
. = ρt (y) + x ρt (x)p(x, y) − ρt (y) (3.42)
= x ρt (x)p(x, y) ,
where
we took into account that the number of walkers is conserved, namely that
. y p(x, y) = 1. Random walks and any other stochastic time series, are described
by their defining master equations.
where P is the transition matrix .p(x, y). The stationary distribution of walkers .ρ ∗
is consequently a left eigenvector of P .
α 1−α
P =
. , α, β ∈ [0, 1] (3.43)
1−β β
the transition matrix P for the general two-state Markov process, compare Fig. 3.13.
The eigenvalues .λ of the left eigenvectors .ρ ∗ = (ρ1 , ρ2 ) of P are determined by
α ρ1 + (1 − β)ρ2 = λρ1
. , (3.44)
(1 − α)ρ1 + β ρ2 = λρ2
and
1 1
λ2 = α + β − 1,
. ρλ∗2 = √ .
2 −1
The first eigenvalue dominates generically, .λ1 = 1 > |α + β − 1| = |λ2 |, with the
contribution to .ρλ∗2 dying out. Absorbing states are present whenever .α/β = 1, see
Fig. 3.13.
Random Surfer Model A famous diffusion process is the “random surfer model”
which tries to capture the behavior of Internet users. This model is at the basis of
the original Page & Brin Google page-rank algorithm.
N
ρi (t),
. ρi (t) = 1
i=1
the probability of finding an Internet surfer visiting host i at time t. The surfers are
assumed to perform a markovian walk on the Internet by clicking randomly any
available out-going hyperlink, giving raise to the master equation
c Aij
ρi (t + 1) =
. + (1 − c) ρj (t) . (3.45)
N l Alj j
Normalization is conserved,
Aij
. ρi (t + 1) = c + (1 − c) i ρj (t) = c + (1 − c) ρj (t) .
i j l Alj j
Hence . i ρi (t + 1) = 1 whenever . j ρj (t) = 1.
Google Page Rank The parameter c in the random surfer model regulates the
probability to randomly enter the Internet:
– For .c = 1 the adjacency matrix and hence the hyperlinks are irrelevant. We can
interpret therefore c as the uniform probability to enter the Internet.
– For .c = 0 a surfer never enters or leaves the Internet, continuing to click around
forever, c is hence also the probability to stop clicking hyperlinks.
The random surfer model (3.45) can be solved iteratively. Convergence is fast for
not too small c. At every iteration authority is transferred from one host j to other
hosts i through its outgoing hyperlinks .Aij . The steady-state density .ρi of surfers
can hence be considered as a measure of host authority and is equivalent to the
orginal Google page rank, which was an important score for ranking search results
at the dawn of the Internet.
Relation to Graph Laplacian The continuous time version of the random surfer
model can be derived, for the case .c = 0, from
where .kj is the out-degree of host j and .Δt the time step. Taking the limit .Δt → 0
yields
d Λij
. ρ = Λ̃ ρ, Λ̃ij = − , Λij = kj δij − Aij , (3.46)
dt kj
m v̇ = −m γ v + ξ(t),
. 〈ξ(t)〉 = 0, 〈ξ(t)ξ(t ' )〉 = Qδ(t − t ' ) , (3.47)
where .v(t) is the velocity of the particle and .m > 0 its mass.
for the formal solution of the Langevin equation (3.47), where v0 ≡ v(0).
Mean Velocity Taking the ensemble average 〈v(t)〉 of the velocity leads to
−γ t e−γ t t '
〈v(t)〉 = v0 e
. + dt ' eγ t 〈ξ(t ' )〉 = v0 e−γ t . (3.49)
m 0
0
Mean Square Velocity For the ensemble average 〈v 2 (t)〉 of the velocity squared
one finds
2 v0 e−2γ t t '
〈v (t)〉 =
.
2
v02 e−2γ t + dt ' eγ t 〈ξ(t ' )〉
m 0
0
e−2γ t t t ' ''
+ dt ' dt '' eγ t eγ t 〈ξ(t ' )ξ(t '' )〉
m2 0 0
Q δ(t ' −t '' )
Q e−2γ t t '
= v02 e−2γ t + dt ' e2γ t ,
m2
0
(e2γ t −1)/(2γ )
and finally
Q
〈v 2 (t)〉 = v02 e−2γ t +
. 1 − e−2γ t . (3.50)
2 γ m2
3.4 Stochastic Systems 115
Q
. lim 〈v 2 (t)〉 = (3.51)
t→∞ 2 γ m2
1
〈x v̇〉 = −γ 〈x v〉 +
. 〈x ξ 〉 . (3.52)
m
We note that
d x2 d2 x 2
x v = x ẋ =
. , x v̇ = x ẍ = − ẋ 2
dt 2 dt 2 2
and
t t t' ' '' )
e−γ (t −t
〈xξ 〉 = ξ(t)
. v(t ' )dt ' = dt ' dt '' ξ(t)ξ(t '' ) = 0 ,
0 0 0 m
Qδ(t−t '' )
where we have used (3.48) in the limit of large times and that t '' < t. We then find
d2 〈x 2 〉 d 〈x 2 〉
.
2
− 〈v 2 〉 = −γ
dt 2 dt 2
for (3.52), or
d2 2 d Q
. 〈x 〉 + γ 〈x 2 〉 = 2〈v 2 〉 = , (3.53)
dt 2 dt γ m2
the latter with the help of the long-time result (3.51) for 〈v 2 〉. The solution of (3.53)
is
Q
〈x 2 〉 = γ t − 1 + e−γ t
. . (3.54)
γ 3 m2
116 3 Dissipation, Noise and Adaptive Systems
Q Q
. lim 〈x 2 〉 ≃ t ≡ 2Dt, D= (3.55)
t→∞ γ 2 m2 2γ 2 m2
that the solutions of the Langevin equation show diffusive behavior, compare (3.36).
This result, that D ∝ Q, underpins the notion that diffusion is microscopically due
to a stochastic process.
Massless Limit We add an external force F (x) to the Langevin equation (3.47),
ẋ = v,
. m v̇ = −m γ v + F (x) + ξ(t) , (3.56)
Γ ẋ = F (x) + ξ(t) .
. (3.57)
For stochastic processes in non-physical settings, like in finance, one usually starts
with (3.57), which may be further adapted to the problem at hand.
Γ ẋ = F (x) + b(x)ξ(t) ,
. (3.58)
which is called the “non-linear Langevin equation”. While looking like a fairly
innocuous term, .b(x)ξ is actually not uniquely defined. The random kicks the
system receives at time t depend via .b(x) = b(x(t)) on a yet undefined position
.x(t). Equation (3.58) needs hence to be supplemented with a rule on how to treat
the last term. This is usually done by looking at an integral over a small time inveral
.[t, t + Δt].
Ito Stochastic Calculus Assuming that the strength of the kicks received is
determined by the position of the sytem immediately before the kick takes place
is consistent with the substitution
t+Δt t+Δt
. dt ' b(x(t ' )) ξ(t ' ) → b(x(t)) dt ' ξ(t ' ) , (3.59)
t t
Stratonovich Stochastic Calculus Alternatively, one may assume that the strength
of the individual kicks depend on a suitable average of the position. A possibility to
t+Δt '
model this situation is to substitute . t dt b(x(t ' )) ξ(t ' ) by
b(x(t)) + b(x(t + Δt)) t+Δt
. dt ' ξ(t ' ) , (3.60)
2 t
– NEURAL NETWORKS
Networks of interacting neurons are responsible for the cognitive information
processing in the brain. They must remain functional also in the presence of
noise and be stable as stochastic systems. In this case the introduction of a noise
term to the evolution equation should not change the dynamics qualitatively. This
postulate should be valid for the vast majorities of biological networks.
– DIFFUSION
The Langevin equation reduces, in the absence of noise, to a damped motion
without an external driving force, with .v = 0 acting as a global attractor. The
stochastic term is therefore essential in the long-time limit, leading to diffusive
behavior, with .〈x 2 〉 ∝ t.
– STOCHASTIC ESCAPE AND STOCHASTIC RESONANCE
A particle trapped in a local minimum may escape the minimum by a noise-
induced diffusion process; a phenomenon denoted “stochastic escape”. Stochas-
tic escape in a driven bistable system leads to an even more subtle consequence
of noise-induced dynamics, “stochastic resonance”.
In the following we detail out both stochastic escape and stochastic resonance.
Drift Velocity We add an external potential .V (x) to the Langevin equation (3.47),
d
m v̇ = −m γ v + F (x) + ξ(t),
. F (x) = −V ' (x) = − V (x) , (3.61)
dx
118 3 Dissipation, Noise and Adaptive Systems
where v and m are the velocity and the mass of the particle, .〈ξ(t)〉 = 0 and
〈ξ(t)ξ(t ' )〉 = Qδ(t − t ' ). In the absence of damping and noise, when .γ = 0 = Q,
.
We consider for a moment a constant force F (x) = F and the absence of noise,
ξ(t) ≡ 0. The system reaches an equilibrium for t → ∞ when relaxation and force
cancel each other:
F
m v̇D = −m γ vD + F ≡ 0,
. vD = . (3.62)
γm
∂P (x, t) ∂J (x, t)
. + = 0. (3.63)
∂t ∂x
There are two contributions, JD and Jξ , to the total particle current density, J =
JD + Jξ , induced respectively by diffusion and by stochastic motion. We derive
these two contributions in two steps.
Drift and Diffusion Currents In a first step we disregard noise in (3.61) and set
Q = 0. In the stationary limit particles move in this case uniformly, with the drift
velocity vD . The respective current density is
JD = vD P (x, t) .
.
In a second step we derive the contribution Jξ of the noise term ∼ ξ(t) to the particle
current density by setting the force term to zero, F = 0. For this purpose we rewrite
the diffusion equation (3.33) with
∂P (x, t)
Jξ = −D
. . (3.64)
∂x
∂P (x, t)
J (x, t) = vD P (x, t) − D
. (3.65)
∂x
F Q ∂P (x, t)
= P (x, t) −
γm 2γ 2 m2 ∂x
for the total current density J = JD + Jξ . Substituting this expression for the
total particle current density into the continuity equation, see (3.63), one obtains
the “Fokker–Planck” or “Smoluchowski” equation
aka the master equation of the density distribution P (x, t). The first and second term
on the right-hand side correspond respectively to ballistic and diffusive transport.
Without going into details, we note that above expression for the Fokker–Planck
equation is consistent with the Ito stochastic calculus, as defined by (3.59).
Harmonic Potential One can solve the Fokker–Planck equation (3.66) analytically
for a harmonic confining potential,
f 2
V (x) =
. x , F (x) = −f x .
2
We are interested in particular in the stationary density distribution,
dP (x, t) dJ (x, t)
. =0 =⇒ = 0,
dt dx
where the second equation follows from the continuity condition (3.63). With (3.65)
for the total current density we find
d fx Q d d d
. + P (x) = 0 = βf x + P (x) ,
dx γ m 2γ 2 m2 dx dx dx
120 3 Dissipation, Noise and Adaptive Systems
V(x)
ΔV
P(x)
xmax xmin
Fig. 3.14 Left: Stationary distribution .P (x) of diffusing particles in a harmonic potential .V (x).
Right: Stochastic escape from a local minimum, with .ΔV = V (xmax )−V (xmin ) being the potential
barrier height and J the escape current
1
P (x) = A e−βf x = A e−βV (x) ,
2 /2
. A= √ , (3.68)
2π σ 2
where .σ 2 = 1/(βf ) = Q/(2f γ m), with normalization condition . dxP (x) = 1
being fulfilled. The density of diffusing particles in a harmonic trap is Gaussian-
distributed, see Fig. 3.14.
V (x) ∼ −x + x 3 .
. (3.69)
Without noise, the particle will oscillate around the local minimum eventually
coming to a standstill under the influence of friction, with x → xmin . With noise,
the particle will have a small but finite probability
to reach the next saddle, where ΔV is the potential difference between the saddle
and the local minimum, see Fig. 3.14.
3.5 Noise-Controlled Dynamics 121
The solution (3.68) for the stationary particle distribution in a confining potential
V (x) has a vanishing total current J . For non-confining potentials, like (3.69), the
particle current J (x, t) never vanishes. Stochastic escape occurs when starting with
a density of diffusing particles close the local minimum, as illustrated in Fig. 3.14.
The escape current will be nearly constant whenever the escape probability is small.
In this case the escape current
J (x, t)
. ∝ e−β [V (xmax )−V (xmin )] ,
x=xmax
will be proportional to the probability a particle has to reach the saddle. Function-
ally, we did approximate P (x) with one valid for a perfect harmonic potential, as
given by (3.68).
Kramer’s Escape When the escape current is finite, there is a finite probability
per unit of time for the particle to escape the local minima, the “Kramer’s escape
rate”, rK ,
ωmax ωmin
rK =
. exp − β (V (xmax ) − V (xmin )) , (3.70)
2π γ
where β = 2γ m/Q and where the prefactors can be derived from a more detailed
2 = |V '' (x ''
min )|/m and ωmax = |V (xmax )|/m.
calculation, with ωmin 2
1 1
ẋ = −V ' (x) + A0 cos(Ωt) + ξ(t),
. V (x) = − x 2 + x 4 . (3.71)
2 4
Several remarks.
14 Stochastic escape from local fitness maxima will be discussed in more detail in Sect. 8.4 of
Chap. 8.
122 3 Dissipation, Noise and Adaptive Systems
Transient State Dynamics The system will stay close to one of the two minima,
x ≈ ±1, for most of the time when both A0 and the noise strength are weak, as
illustrated in Fig. 3.16. The dynamics is therefore characterized by rapid transitions
between transiently stable states.
Switching Times An important question is then: How often does the system switch
between the two preferred states x ≈ 1 and x ≈ −1? There are two time scales
present.
– STOCHASTIC ESCAPE
In the absence of external driving, A0 ≡ 0, the transitions are noise driven and
irregular, with the average switching time given by Kramer’s lifetime TK =
1/rK , see Fig. 3.16. The system is translational invariant with respect to time
and the ensemble averaged expectation value
. 〈x(t)〉 = 0
which follows the time evolution of the driving potential with a certain phase
shift φ̄, see Fig. 3.17.
3.5 Noise-Controlled Dynamics 123
Resonance Condition When the time scale 2TK = 2/rK to switch back and forth
due to the stochastic process equals the period 2π/Ω, we expect a large response x̄,
see Fig. 3.17. The time-scale matching condition
2π 2
. ≈
Ω rK
depends via (3.70) on the noise-level Q, which enters the Kramer’s escape rate
rK . For otherwise constant parameters, the response x̄ first increases with rising Q,
decreasing however for elevated noise levels Q. This is the telltale characteristic of
“stochastic resonance”, as shown in Fig. 3.17.
Ice Ages The average temperature Te of the earth differs by about ΔTe ≈ 10 ◦ C
in between a typical ice age and interglacial periods. Both states of the climate are
locally stable.
– ICE AGE
A substantial ice covering increases the albedo of the earth, which leads in turn
to a larger part of sunlight to be reflected back to space. Earth remains cool.
124 3 Dissipation, Noise and Adaptive Systems
– INTERGLACIAL PERIOD
The ice covering is reduced. A larger portion of the sunlight is absorbed by the
oceans and land, with earth remaining warm.
A parameter of the orbit of planet earth, the eccentricity, varies slightly with a period
T = 2π/Ω ≈ 105 years. The intensity of the incoming radiation from the sun
therefore varies with the same period. Long-term climate changes can therefore be
modeled by a driven two-state system, i.e. by (3.71). The driving force, viz the
variation of the energy flux the earth receives from the sun, is however small. The
increase in the amount of incident sunlight is too weak to pull the earth out of
an ice age into an interglacial period or vice versa. Random climatic fluctuation,
like variations in the strength of ocean circulations, are needed to finish the job.
The alternation of ice ages with interglacial periods may therefore be modeled as a
stochastic resonance phenomenon.
Exercises
where the %-sign denotes the modulus operation. This representation can be
used to evaluate analytically the distribution p(x) of finding x when iterating
the logistic map at r = 4 ad infinitum.
(3.2) FIXPOINTS OF THE LORENZ MODEL
Perform
√ the stability
√ analysis of the fixpoint (0, 0, 0) and of C+,− =
(± b(r − 1), ± b(r − 1), r − 1) for the Lorenz model (3.8) with r, b > 0.
(3.3) HAUSDORFF DIMENSION OF THE CANTOR SET
Calculate the Hausdorff dimension of the Cantor set, which is generated
by removing consecutively the middle-1/3 segment of a line having a given
initial length. Start with the Hausdorff dimension of simple straight line.
(3.4) DRIVEN HARMONIC OSCILLATOR
Solve the driven, damped harmonic oscillator
ẍ + γ ẋ + ω02 x = ϵ cos(ωt)
.
Find the general solution. Furthermore, compare to the logistic map (2.46)
for discrete times t = 0, Δt, 2Δt, ...
126 3 Dissipation, Noise and Adaptive Systems
Further Reading
References
Baronchelli, A., & Radicchi, F. (2013). Lévy flights in human behavior and cognition. Chaos,
Solitons & Fractals, 56, 101–105.
Benzi, R. (2010). Stochastic resonance: From climate to biology. Nonlinear Processes in Geo-
physics, 17, 431–441.
Datseris, G., & Parlitz, U. (2022). Nonlinear dynamics: A concise introduction interlaced with
code. Springer.
Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme geforderte
Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik, 17, 549.
Ginoux, J. M., & Letellier, C. (2012). Van der Pol and the history of relaxation oscillations: Toward
the emergence of a concept. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22,
023120.
Karatzas, I., & Shreve, S. (2012). Brownian motion and stochastic calculus. Springer.
Kulkarni, V. G. (2016). Modeling and analysis of stochastic systems. CRC Press.
References 127
Langevin, P. (1908). Sur la théorie du mouvement brownien. Comptes Rendus, 146, 530–532.
Layek, G. C. (2015). An introduction to dynamical systems and chaos. Springer.
Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20,
130–141.
Wernecke, H., Sándor, B., & Gros, C. (2017). How to test for partially predictable chaos. Scientific
Reports, 7, 1087.
Self Organization
4
∂ ∂2 ∂2 ∂2
. ρ(x, t) = DΔρ(x, t), Δ= + + , (4.1)
∂t ∂x 2 ∂y 2 ∂z2
with .D > 0 denoting the diffusion constant, .Δ the Laplace operator and .ρ(x, t) the
density of walkers at any given point .(x, t) in space and time.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 129
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_4
130 4 Self Organization
ρ̇ = rρ(1 − ρ) + Dρ '' ,
. ρ ∈ [0, 1] , (4.3)
∂ 1 ∂ ∂2 1 ∂2
t = α t˜,
. x = β x̃, = , = .
∂t α ∂ t˜ ∂x 2 β 2 ∂ x̃ 2
∂ρ αD ∂ 2 ρ αD
. = αrρ(1 − ρ) + 2 , αr = 1, = 1.
∂ t˜ β ∂ x̃ 2 β2
ρ̇ = ρ(1 − ρ) + ρ ''
. (4.4)
. lim ρ(x, t) = 1, ∀x ,
t→∞
4.1 Interplay Between Diffusion and Reaction 131
1
t=0
t=1
t=2
0.8 t=3
t=4
t=5
t=6
0.6 t=7
ρ(x,t)
t=8
0.4
0.2
0
-15 -10 -5 0 5 10 15
x
Fig. 4.1 Simulation of the Fisher reaction-diffusion equation (4.4). Plotted is .ρ(x, t) for .t =
0, . . . , 8. For the initial distribution .ρ(x, 0) a Gaussian has been taken, which does not correspond
to the natural line-form, but already at .t = 1 the system has relaxed. The wavefronts propagate
asymptotically with velocities .±2
viz that the system saturates. The question of interest is however how saturation is
achieved when starting from a local population .ρ(x, 0), a simulation is presented in
Fig. 4.1. The system is seen to develop wavefronts with a characteristic shape and
velocity. In a biological setting this corresponds to an expansion wave allowing
an initial local population to invade ballistically the uninhabited regions of the
surrounding ecosystem.
This is an interesting observation, since diffusion alone,
√ viz in the absence of
a reaction term, would lead to a sublinear expansion .∼ t. In contrast, ballistic
propagation is linear in time.
For the Fisher equation (4.4), this ansatz leads to the two-component ordinary
differential equation
u' = v
u'' + cu' + u(1 − u) = 0,
. , (4.6)
v ' = −cv − u(1 − u)
132 4 Self Organization
0.2
v 0
-0.2
0 0.5 1
u
Fig. 4.2 Phase space trajectories of the travelling-wave solution .ρ(x, t) = u(x − ctz), with .v =
u' , as given by (4.6). The shape of the propagating wavefront is determined by the heteroclinic
trajectory (red line) emerging from the saddle .(1, 0) and leading to the stable fixpoint .(0, 0)
Minimal Propagation Velocity For the stability of the trivial fixpoint .u∗0 = (0, 0)
one expands (4.6) for small .(u, v),
u' 0 1 u
. = , λ(λ + c) + 1 = 0 ,
v' −1 −c v
1
λ± =
. −c ± c2 − 4 , c ≥ 2. (4.8)
2
A complex eigenvalue would lead to a spiral around .u∗0 = (0, 0), which is not
admissible since .u ∈ [0, 1] needs to be strictly positive. Hence .c = 2 is the√ minimal
occurring propagation velocity. The trivial fixpoint .(0, 0) is stable, since . c2 − 4 <
c for .c ≥ 2 and hence .λ± < 0. The situation is illustrated in Fig. 4.2.
Saddle Restpoint The eigenvalues of the .u∗1 = (1, 0) fixpoint are given by
d u−1 0 1 u−1 1
. = , λ± = −c ± c2 + 4 ,
dz v 1 −c v 2
when denoting .u = u(z). The fixpoint .(1, 0) is hence a saddle, with .λ− < 0 and
λ+ > 0.
.
2 The entries of Jacobian matrix are the derivatives of the flow, see Sect. 2.2 of Chap. 2.
4.1 Interplay Between Diffusion and Reaction 133
1
c=2 (numerical)
c=2.041 (exact)
0.75
u(z)
0.5
0.25
0
0 5 10 15 20
z=x-ct
Fig. 4.3 Numerical result for the minimal velocity (.c = 2) wavefront solution .u∗ (z), compare
(4.9), of
√ the Fisher equation (4.4). For comparison, the wavefront for a slightly larger velocity
.c = 5/ 6 ≈ 2.041 has been plotted, for which the shape can be obtained analytically, see (4.10).
Indicated is .u = 1/4 (dashed lines), which has been used to align the respective horizontal offsets
The unstable direction .u∗ (z), emerging from the saddle and leading to the stable
fixpoint .(0, 0), viz the heteroclinic trajectory, is the only trajectory in phase space
fulfilling the conditions
characterizing a propagating wavefront. The lineshape .u∗ (z) of the wavefront can
be evaluated numerically, see Fig. 4.3.
1
ρ ∗ (x, t) = σ 2 (x − ct),
. σ (z) = , (4.10)
1 + eβz
−βeβz
.σ' = 2
= −βσ (1 − σ ), σ '' = β 2 (1 − 2σ )σ (1 − σ ) , (4.11)
1 + eβz
which leads to
∂ρ ∗
. = 2cβσ 2 (1 − σ )
∂t
and
∂ 2ρ∗ ∂
. = (−2βσ 2 )(1 − σ ) = β 2 (4σ − 6σ 2 )σ (1 − σ ) .
∂x 2 ∂x
134 4 Self Organization
∂ρ ∗ ∂ 2ρ∗ !
. − = σ 2 (1 − σ ) 2cβ − β 2 (4 − 6σ ) = σ 2 (1 − σ 2 ) ≡ ρ ∗ (1 − ρ ∗ ) ,
∂t ∂x 2
!
= (1+σ )
if
4 10 5
1 = 6β 2 ,
. 1 = 2cβ − 4β 2 = 2cβ − , 2cβ = = .
6 6 3
This last condition determines the two free parameters .β and c as
1 5 5
β=√ ,
. c= = √ ≈ 2.041 , (4.12)
6 6β 6
which shows that the propagation velocity c of .ρ ∗ (x, t) is very close to the lower
bound .c = 2 for the allowed velocities. The lineshape of the exact particular
solution is nearly identical to the numerically-obtained minimal-velocity shape for
the propagating wavefront, as shown in Fig. 4.3.
u'' + cu' + u ≈ 0 ,
.
1 1 + 4β 2 6+4 5
c=
. + 2β = = √ =√ ,
2β 2β 2 6 6
√
in agreement with (4.12), when using .β = 1/ 6.
4.1 Interplay Between Diffusion and Reaction 135
may be used to derive a sum rule for the shape .u(z) of the wavefront. For this
purpose one integrates (4.14),
2
z u3 (z) u2 (z) u' (z)
u' (w) dw = A +
2
c
. − − . (4.15)
−∞ 3 2 2
where the second equation is the sum rule for .u' . The wavefront is steepest for the
minimal velocity .c = 2.
Sum Rule for Generic Reaction Diffusion Systems Sum rules equivalent
to (4.16) can be derived for any integrable reaction term .R(ρ) in (4.2). One finds,
by generalizing the derivation leading to (4.16),
∞ 1
1
u' (w) dw =
2
. R(ρ)dρ . (4.17)
−∞ c 0
Sum Rule for the Exact Particular Solution We verify that the sum rule (4.16)
for the Fisher equation is satisfied by the particular solution (4.10),
1
u∗ (z) = σ 2 (z),
. σ (z) = σ ' = −βσ (1 − σ ) .
1 + eβz
136 4 Self Organization
where we used .β 2 = 1/6 and .c = 5β, see (4.12). Note that only the lower bound
z → −∞ contributes to above integral.
.
The Fisher equation .ρ̇ = ρ(1 − ρ) + ρ '' supports travelling wavefront solutions
ρ(x, t) = u(x − ct) for any velocity .c ≥ 2. The observed speed of a wavefront
.
may either depend on the starting configuration .ρ(x, 0), or may self-organize to a
universal value. We use perturbation theory in order to obtain an heuristic insight
into this issue.
can be constructed for arbitrary initial conditions .p0 (x) = ρ(x, 0), using via
1
e−z /(4t)
2
ρ0 (x, t) =
. dy Φ(x − y, t) p0 (y), Φ(z, t) = √ (4.19)
4π t
the Green’s function .Φ(z, t) for diffusion systems,3 which obeys .limt→0 Φ(z, t) =
δ(z). Note that .Φ(z, t) is the generic solution of the diffusion equation,viz of (4.18)
with .r → 0. One can easily verify that
is the solution of the linearized Fisher equation (4.18), propagating the initial
distribution .p0 (x) to finite times. Equation (4.20) describes exponentially growing
diffusive behavior.
Velocity Stabilization and Self Organization Using (4.19) for .ρ0 (x, t) in (4.20)
leads to terms of the type
= e−(z−2t)(x+2t)/(4t) (4.21)
in the kernel, viz inside the integral, with .z = x − y. Equation (4.21) describes
left- and right propagating fronts, travelling with propagation speeds .c = ±2. The
envelopes of the respective wavefronts are time dependent and do not show the
simple exponential tail (4.13), as exponential falloffs are observed only for solutions
.ρ(x, t) = u(x − ct) characterized by a single velocity c.
– Ballistic Transport
The expression (4.21) shows that the interplay between diffusion and exponential
growth, the reaction of the system, leads to ballistic transport.
– Velocity Selection
The perturbative result (4.21) indicates, that the minimal velocity .|c| = 2 is
achieved for the wavefront when starting from an arbitrary localized initial state
.p0 (x), since .limt→0 e Φ(x, t) = δ(x).
t
– Self Organization
Propagating wavefronts with any .c ≥ 2 are stable solutions of the Fisher
equation, but the system settles to .c = 2 for localized initial conditions. The
stabilization of a non-trivial dynamical property for a wide range of starting
conditions is an example of a self-organizing process.
Above considerations concern the speed c of the travelling wavefront, but do not
make any direct statement regarding the lineshape.
Stability Analysis of the Wavefront In order to examine the stability of the shape
of the wavefront we consider
where the second term with .ϵ ⪡ 1 is a perturbation to the solution .u(z) = u(x − ct)
of the travelling wave Eq. (4.6). The particular form of above ansatz is motivated by
yet to be derived final expression.
With the help of the derivatives,
we obtain
Wavefront Self-Stabilization With above results the Fisher equation .ρ̇ − ρ '' =
ρ(1 − ρ) reduces with
d2 c2
. − + V (z) ψ(z) = λψ(z), V (z) = 2u(z) + −1, (4.23)
dz2 4
to a differential equation for .ψ(z), which holds to order .O(ϵ). This expression
corresponds to a time-independent one-dimensional Schrödinger equation4 for a
particle with a mass .m = h̄2 /2 moving in a potential .V (z) that is strictly positive
since .u(z) ∈ [0, 1],
V (z) ≥ 0,
. for c≥2.
the eigenvalues .λ are hence also positive. The perturbation term in (4.22) conse-
quently decays with .t → ∞, which implies that that the wavefront self-stabilizes.
This result would be trivial if it were known a priori that the wavefront equation
'' '
.u +cu +u(1−u) = 0 has a unique solution .u(z). In this case, all states of the form
of (4.22) would need to contract to .u(z). The stability condition (4.23) indicates that
the travelling-wavefront solution to the Fisher equation is indeed unique.
Finally, we remark that the procedure followed here, to derive a differential
equation for a perturbation containing a yet unknown wavefunction, here .ψ(z), is
analogous to secular perturbation theory, which we got to know in the context of the
Van der Pol oscillator.5
In chemical reaction systems one reagent may activate or inhibit the production of
the other components, leading to non-trivial chemical reaction dynamics. Chemical
reagents typically also diffuse spatially and the interplay between the diffusion
process and the chemical reaction dynamics may give rise to the development to
spatially structured patterns.
2
4 Recall the expression .ih̄ ∂ψ
∂t = − 2mh̄
Δψ + V ψ for the time-dependent one dimensional
Schrödinger equation. Equation (4.23) is recovered for an exponential time dependency .∼
exp(−iλt/h̄) of the wavefunction .ψ.
5 For the Van der Pol oscillator, secular perturbation theory is developed in Sect. 3.2 of Chap. 3.
4.2 Interplay Between Activation and Inhibition 139
The reaction-diffusion system (4.2) contains additively two terms, the reaction and
the diffusion term. Diffusion alone leads generically to an homogeneous steady
state.
For the reaction terms considered in Sect. 4.1 the reference state .ρ = 0 was
unstable against perturbations. Will now consider reaction terms for which the
reference homogeneous state is however stable. Naively one would expect a further
stabilization of the reference state, but this is not necessarily the case.
TURING INSTABILITY The interaction of two processes, which separately would stabilize
a given homogeneous reference state, can lead to an instability.
The Turing instability is thought to be the driving force behind the formation of
spatio-temporal patterns observed in many physical and biological settings, such as
the stripes of a zebra.
Turing Instability of Two Stable Foci As a first example we consider a linear two-
dimensional dynamical system composed of two subsystem, which are per se stable
foci6
−ϵ1 1 −ϵ2 −a
.ẋ = Ax, A1 = , A2 = . (4.24)
−a −ϵ1 1 −ϵ2
Here .0 < a < 1 and .ϵα > 0, for .α = 1, 2. The eigenvalues for the matrices .Aα and
A = A1 + A2 are
.
√
λ± (Aα ) = −ϵα ± i a,
. λ± (A) = −(ϵ1 + ϵ2 ) ± (1 − a) .
For negative determinants .Δ < 0 the system is a saddle, having both an attracting
and a repelling eigenvalue.
6 As a reminder, note that a node/focus has real/complex Lypunov exponents, as defined in Sect. 2.2
of Chap. 2.
140 4 Self Organization
with
The system .ẋ = A1 x has a stable focus at .x = 0 when the trace .ϵb − ϵa < 0 is
negative, compare (4.25), viz when it has two complex conjugate eigenvalues with
negative real parts. Possible values are, e.g. .ϵa = 1/2 and .ϵb = 1/4.
Next we add a stable node
−a 0 −ϵa − a 1
.A2 = , A= . (4.27)
0 −b −1 ϵb − b
Can we select .a, b > 0 such that .A = A1 + A2 becomes a saddle? In this case, the
determinant .Δ12 of A,
should then become negative, compare (4.25). This is clearly possible for .b < ϵb
and a large enough a.
ρ̇ = f (ρ, σ ) + Dρ Δρ
. (4.28)
σ̇ = g(ρ, σ ) + Dσ Δσ
describing the deviation from the equilibrium state .(ρ0 , σ0 ) by a single Fourier
component. We obtain
δ ρ̇ δρ −Dρ k 2 0
. =A , A2 = , (4.31)
δ σ̇ δσ 0 −Dσ k 2
fρ < 0,
. gσ > 0, 0 > gσ fρ = −|gσ fρ | ,
sub-critical
sub-critical
critical
determinant Δ super-critical
super-critical
stable focus
stable node
saddle
wavevector k
Fig. 4.4 The determinant .Δ of the Turing bifurcation matrix, see (4.32), for a range .d = Dσ /Dρ
of ratios of the two diffusion constants. For large .d → 1 the determinant remains positive,
becoming negative for a finite interval of wavevectors k when d becomes small enough. With
decreasing size of the determinant the fixpoint changes from a stable focus (two complex conjugate
eigenvalues with negative real components) to a stable node (two negative real eigenvalues) and to
a saddle (a real positive and a real negative eigenvalue), compare (4.25)
Diffusion processes may act as bifurcation parameters also within other bifurcation
scenarios, like a Hopf bifurcation, as we will discuss in more detail in Sect. 4.2.3.
ρ̇ = −ρσ 2 + F (1 − ρ) + Dρ Δρ,
. (4.33)
σ̇ = ρσ 2 − (F + k)σ + Dσ Δσ ,
ρi∗ σi∗ = K,
. Fρi∗ + Kσi∗ = F, i = 1, 2 , (4.34)
∗ 1±a ∗ 1∓a 4K 2
ρ1,2
. = , σ1,2 = F, a2 = 1 − .
2 2K F
The trivial fixpoint .p∗0 has a diagonal Jacobian with eigenvalues .−F and .−K
respectively. It is always stable, even in the presence of diffusion.
∗ merge when .a = 0,
Critical Loss of Particles The two non-trivial fixpoints .ρ1,2
which happens at
F = 4K 2 = 4(k + F )2 ,
. kc = Fc /2 − Fc . (4.35)
We recall that two fixpoints merge via a saddle-node bifurcation,7 which implies
that the two non-trivial fixpoints .ρ1,2∗ cease to exists when the loss rate .K = F + k
of .σ -particles becomes too large. Beyond, .p∗0 = (1, 0) remains as a global attractor.
The critical line .(kc , Fc ) is shown in Fig. 4.5, where the phase diagram of the Gray–
Scott system is presented.
Saddle Fixpoint .p∗1 The Jacobians and determinants of the non-trivial restpoints
.p
∗ are
1,2
∗ )2 ) −2K ∗ )2 − F
Δ1,2 = K (σ1,2
−(F + (σ1,2
. , . (4.36)
∗ )2 ∗ )2 /F − 1
(σ1,2 K = KF (σ1,2
The restpoint corresponding to the minus sign in above expression, .p∗1 , is always
a saddle, given that .Δ1 is negative for all .γ 2 ∈ [0, 1/2], viz when .p∗1 exists.
Compare (4.25).
0.1
2
saddle-node bifurcation: F = 4(k+F)
stable focus -> unstable focus
stable node -> stable focus
simulation parameters
F
0.05
0
0 0.05
k
Fig. 4.5 Phase diagram for the reaction term of the Gray–Scott model (4.33). Inside the saddle-
node bifurcation line (solid red line), see (4.35) there are three fixpoints .p∗i (.i = 0, 1, 2), outside
the saddle-node bifurcation line only the stable trivial restpoint .p∗0 is present. .p∗1 is a saddle and .p∗2
a stable node (checkerboard brown area), a stable focus (light shaded green area) or an unstable
focus (shaded brown area), the later two regions being separated by the focus transition line (solid
blue line), as defined by (4.39). Also include is conjunction point .pc = (1, 1)/16 (black filled
circle) and the parameters used for the simulation presented in Fig. 4.7 (open diamonds)
Focus Transition for .p∗2 Eigenvalues are complex when the discriminant .D =
(trace)2 − 4Δ is negative, compare (4.25). For the Gray–Scott system one has
2
∗ 2 ∗ 2
D = k − (σ1,2
. ) − 4K(σ1,2 ) + 4KF ,
where we used .k = K −F . The discriminant is positive for small and large densities
∗ , but negative in between. At the transition, .p∗ changes from a stable node to a
σ1,2
.
2
stable focus. The explicit analytic expression is not of relevance for the following,
the transition line is however shown in Fig. 4.5.
Stability of .p∗2 The focus .p∗2 changes stability when the trace of its Jacobian (4.36),
Kf − Ff − (σ2∗ )2 = 0,
. σ2∗ = Kf − F f = kf , (4.38)
changes sign. The respective focus line .(kf , Ff ) has been included in Fig. 4.5.
Inserting this relation, .σ2∗ = kf , into the fixpoint conditions (4.34)
K K F + (σ2∗ )2
.
∗ = ρ2∗ = 1 − σ2∗ , K = σ2∗ ,
σ2 F F
4.2 Interplay Between Activation and Inhibition 145
0.5 0.5
0.4 0.4
0.3 0.3
σ
σ
0.2 0.2
0.1 0.1
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
ρ ρ
Fig. 4.6 Flow of the reaction term of the Gray–Scott model (4.33) in phase space .(ρ, σ ), for
.F = 0.01. The stable fixpoint .p∗0 = (1.0) (black filled circle) is globally attracting. The focus
transition (4.39) occurs at .kf = 0.0325 and the saddle-node transition (4.35) at .kc = 0.04 Left:
∗ ∗
.kf < k = 0.035 < kc , with .p1 and .p2 (red circles) being a saddle an unstable focus respectively.
Right: .kc < k = 0.045. The locus of the attractor relict (shaded circle) corresponds to the local
minimum of .q = f2 , see (4.41), which evolves into the bifurcation point for .k → kc
we obtain
(Ff + kf )2 = Ff
. kf , (4.39)
where we also used .K = F + k. One can verify that the Lyapunov exponents are
complex along (4.39), becoming however real for lower values of k, as illustrated in
Fig. 4.5. The endpoint of (4.39), determined by .∂kf /∂Ff = 0, is
in the .(k, F ) plane, which coincides with the turning point of the saddle-node
line (4.35).
Merging of an Unstable Focus and a Saddle In Fig. 4.6 we illustrate the flow of
the reaction term of the Gray–Scott model for two sets of parameters .(k, F ). The
unstable focus .p∗2 and the saddle .p∗1 annihilate each other for .k → kc . One observes
that the large-scale features of the flow are remarkable stable similar for .k < kc and
.k > kc , as all trajectories, apart from the stable manifolds of the saddle, flow to the
Attractor Relict Dynamics Close to the outside of the saddle-node line .(kc , Fc )
the dynamics slows down when approaching a local minimum of
q(x) = f2 (x),
. ẋ = f(x) , (4.41)
with q being a measure for the velocity of the flow, which vanishes at a fixpoint.
146 4 Self Organization
Fig. 4.7 Dynamical patterns for the Gray–Scott model (4.33). Shown is .ρ(x, y, t), the diffusion
constants are .Dρ /2 = Dσ = 10−5 . The simulation parameters are .(k, F ) = (0.062, 0.03)
(left) and .(k, F ) = (0.06, 0.037) (right), as indicated in the phase diagram, Fig. 4.5 (Illustrations
courtesy P. Bastani)
Beyond the transition, for .k > kc , the attractor relict still influences the flow
strongly, as indicated in Fig. 4.6, determining a region in phase space where the
direction of the flow turns sharply.
being the respective Jacobian. There is no way to add a diffusion term .A2 , compare
expression (4.31), such that .A1 + A2 would have a positive eigenvalue. Pattern
formation in the Gray–Scott system is hence not due to a Turing instability.
4.3 Collective Phenomena and Swarm Intelligence 147
When a system is composed out of many similar or identical constituent, parts,8 their
mutual interaction may give rise to interesting phenomena. Several distinct concepts
have been developed in this context, each carrying its own specific connotation.
– Collective Phenomena
Particles like electrons obey relatively simple microscopic equations of motions,
like Schrödinger’s equation, interacting pairwise. Their mutual interactions may
lead to phase transitions and to emergent macroscopic collective properties,
like superconductivity or magnetism, not explicitly present in the underlying
microscopic description.
– Emergence
At times heavily loaded with philosophical connotations, “weak emergence” is
equivalent to collective behavior, e.g. such as occurring in physics. There is
also “strong emergence”, which denotes the emergence of higher-level properties
which cannot be traced back causally to microscopic laws. Strong emergence is
a popular playball in philosophy, it transcends however the realm of scientific
investigations.
– Self Organization
When generic generative principles give rise to complex behavior for a wide
range of environmental and/or starting conditions, one speak of self organization
in the context of complex systems theory. The resulting properties may be
interesting, biologically relevant or emergent.
– Swarm Intelligence
In biological settings, with the agents being individual animals, one speaks of
swarm intelligence whenever the resulting collective behavior is of behavioral
relevance. Given that evolutionary optimized behavior is performance oriented,
one can use the principles underlying swarm intelligence as an algorithm to solve
certain computational problems.
Collective phenomena arise, loosely speaking, when “the sum is more than the
parts”, just as mass psychology transcends the psychology of the constituent
individuals.
Phase transitions occur in many physical systems,9 being of central importance also
in biology, climatology and sociology. A well known psychological phenomenon is,
in this context, the transition from normal crowd behavior to collective hysteria. As
a basic example we consider here the nervous rats problem.
Calm and Nervous Rats Consider N rats constrained to live in an area A, with an
overall population density .ρ = N/A. There are .Nc calm and .Nn nervous rats with
.Nc + Nn = N, together with the respective densities .ρc = Nc /A and .ρn = Nn /A.
Comfort Zone Each rat has a zone .a = π r 2 around it, with r being a characteristic
radius. A calm rat will get nervous if at least one nervous rat comes too close,
entering its comfort zone a, with a nervous rat calming down when having its
comfort zone all for itself.
Master Equation The time evolution for the density of nervous rats is given by
Stationary Solution For the stationary solution .ρ̇n = 0 of the master equa-
tion (4.43) we obtain
. 1 − e−ρn a ρc = e−ρa ρn , ρc = ρ − ρn ,
which has a trivial solution .ρn = 0 for all population densities .ρ and comfort
zones a. Multiplying with a yields
σn = eσ 1 − e−σn (σ − σn ),
. σ = ρa, σn = ρn a , (4.45)
where the dimensionless densities .σ and .σn correspond to the average numbers of
rats within a comfort area a.
4.3 Collective Phenomena and Swarm Intelligence 149
self-consistency condition σ = σ*
(4.45), for various numbers of
rats .σ in the comfort zone a, 0.1
as function of the average σ < σ*
number of nervous rats .σn per
a. The transition occurs at .σ ∗ 0
0 0.1 0.2
(dashed line) σn
Critical Rat Density The graphical representation of (4.45) is given in Fig. 4.8. A
non-trivial solution .σn > 0 is possible only above a certain critical number .σ ∗ of
rats per comfort zone. At .σ = σ∗ the right-hand side of (4.45) has unitary slope
with respect to .σn , for .σn → 0,
∗
1 = eσ σ ∗ ,
. σ ∗ ≈ 0.56713 , (4.46)
Herding animals and social insects are faced at times with the problem of taking
decisions collectively. When selecting a new nest for swarming honey bees or a
good foraging site for ants, no single animal will compare two prospective target
sites. Individual opinions regarding the quality of prospect sites are instead pooled
together into groups of alternative decisions with a competitive dynamical process
reducing the number of competing opinions until a single target site remains.
Swarming Bees Bees communicate locally by dancing, both when they are in the
nest and communicate prospective foraging sites as well when a new queen leaves
the old nest together with typically about 10.000 workers, searching for a suitable
location for building a new nest.
The swarm stays huddled together with a small number of typically .5% of bees
scouting in the meantime for prospective new nest sites. Scouts coming back to
150 4 Self Organization
the waiting swarm will advertise new locations they found by dancing, with the
duration of the dance being proportional to the estimated quality of the prospective
nest location.
New scouts ready to fly will observe the dancing returning scouts. The proba-
bility that departing scouts will target advertised sites is consequently proportional
to their quality. This mechanism leads to a competitive process with lower quality
prospective nest sites receiving fewer consecutive visits by scouting bees.
Opinion pooling is inherent in this process as there are many scout bees flying
out for site inspection. The whole consensus process is an extended affair, taking up
to a few days.
Information Encoding and Stigmergy Bees and ants need an observable scalar
quantity in order to exchange information about the quality of prospective sites.
This quantity is time for the case of bees and pheromone intensity for the ants.
Producing pheromone traces ants manipulate the environment for the purpose of
information exchange.
Ant and Pheromone Dynamics For the binary decision process of Fig. 4.9 we
denote with .ρ1,2 the densities of travelling ants along the two routes, with .ϕ1,2 and
.Q1,2 the respective pheromone densities and site qualities. The process
Ant Number Conservation The updating rules .ρ̇1,2 need to obey conservation of
the overall number .ρ = ρ1 + ρ2 of ants. This can be achieved by selecting the flux
.Ψ in (4.47) appropriately as
2Ψ = ρ(ϕ1 + ϕ2 ) − ρ1 ϕ1 − ρ2 ϕ2 2T ρ̇1 = ρ2 ϕ1 − ρ1 ϕ2
. , (4.48)
= ρ2 ϕ1 + ρ1 ϕ2 2T ρ̇2 = ρ1 ϕ2 − ρ2 ϕ1
Φ̇ = −Γ Φ + Q1 Q2 (ρ1 + ρ2 ),
. Φ → ρ Q1 Q2 /Γ ,
when using (4.47). Any initial weighted pheromone concentration .Φ will hence
relax fast to .ρQ1 Q2 /Γ . It is hence enough to consider the time evolution in the
subspace spanned by .ϕ2 = (Φ − Q2 ϕ1 )/Q1 ,
ρ1 = ρ,
. ρ2 = 0, ϕ1 = Q1 ρ/Γ, ϕ2 = 0 ,
152 4 Self Organization
and viceversa, with .1 ↔ 2. The Jacobian of (4.49) for the fixpoint .ρ1 = 0 = ϕ1 is
−Φ/Q1 ρ
. , Δ = ΦΓ /Q1 − Q1 ρ = (Q2 − Q1 )ρ ,
Q1 −Γ
when setting .2T = 1 for simplicity. The trace .−(Γ + Φ/Q1 ) of the Jacobian
is negative and the fixpoint is hence stable/unstable, compare (4.25), when the
determinant .Δ is positive/negative, hence when .Q2 > Q1 and .Q1 > Q2
respectively.
The dynamics (4.49) hence leads to binary decision process with all ants
proceeding in the end along the path with the higher quality factor .Qj .
In a large body of moving agents, like a flock of birds in the air, a school of fishes in
the ocean, or cars on a motorway, agents will adapt individually their proper velocity
according to the perceived positions and movements of other close-by agents. For
modelling purposes one can consider the behavior of the individual agents, as we
will do. Alternatively, for a large enough number of agents one may also use a
hydrodynamic description together with a corresponding master equation for the
density of agents.
Individual Decision Making For a swarm of birds, the individual birds take their
decisions individually, an explicit mechanism for collective decision making is not
present, f.i. regarding the overall direction the swarm should be heading to. The
resulting group behavior may nevertheless have high biological relevance, such as
predator avoidance.
example is
ẋi = vi ,
. (4.50)
v̇i = γ (vi0 )2 − (vi )2 vi + f(xj , xi ) + g(xj , xi |vj , vi ) ,
j /=i j /=i
4.3 Collective Phenomena and Swarm Intelligence 153
with the first term in .v̇i modelling the preference for moving with a preferred
velocity .vi0 . The collective behavior is generated by the pairwise interaction terms
.f and .g. Reacting to observations needs time and time delays are inherently present
in .f and .g.
Distance Regulation Animals and humans alike have preferred distances. The
regulation of the distance to neighbors, to the side, front, back, above and below,
is job of the inter-agent force .f (xj , xi ), which may be taken as the derivative of a
“Mexican hat potential” .V (z),
with the tendency to align vanishing both for identical velocities .vj and .vi and for
large inter-agent distances .|xj − xi | ⪢ Rv .
Hiking Humans Numerical simulation of equations like (4.50) describe nicely the
observed flocking behavior of birds and schools of fishes. For a specific application
154 4 Self Organization
we ask when a group of humans hiking along a one-dimensional trail will break
apart due to differences in the individual preferred hiking speeds .vi0 .
⇐ ⇐ ⇐
x0 , v0 x1 , v1 x2 , v2
.
The distance alignment force .f is asymmetric when walkers pay attention only to the
person in front, but not the one next in line. In the simplest setting we may assume
furthermore that .g vanishes, namely that the hikers are primarily concerned to walk
close to their own preferred velocities .vi . The group stays together in this case if
everybody walks with the speed .v ≡ v0 of the leader, we consider here the case that
.vi < v0 for .i = 1, 2, ...
The restraining potential illustrated in Fig. 4.10 has a maximal positive slope
.fmax when the walker lags behind, which implies that a maximal speed difference
The group is more likely to break apart when the desire .γ to walk with one’s own
preferred velocity .vi0 is large. An analogous argument holds for .vi > v0 , for which
the maximal negative slope of .V (z) would enter above expression.
Opinion dynamics research deals with agents having real-valued opinions .xi , which
may change through various processes, like conviction or consensus building.
The two agents do not agree to a common opinion if they initially differ beyond the
confidence bound d. As a consequence, distinct sets of attracting opinions tend to
form, as illustrated in Fig. 4.11.
Master Equation For large populations of agents we may define with .ρ(x, t) the
density of agents having opinion x, with the time evolution given by the master
4.3 Collective Phenomena and Swarm Intelligence 155
log(1+(nunber of agents))
4 million
of .60 · 103 agents with 16 million
64 million
.1//4/16/64 million pairwise
updatings of type (4.53).
Opinion attractors with a 5
large number of supporters
tend to stabilize faster than
attracting states containing
smaller number of agents
0
0 0.2 0.4 0.6 0.8 1
opinion
equation
d/2 d
τ ρ̇(x) = 2
. ρ(x + y)ρ(x − y)dy − ρ(x)ρ(x + y)dy , (4.54)
−d/2 −d
with .τ setting the time scale of the consensus dynamics and with the time
dependency of .ρ(x, t) being suppressed. The first term results from two agents
agreeing on x, the second term describes the flow of agents leaving opinion x by
agreeing with other agents within the confidence interval d.
y2
ρ(x + y) ≈ ρ(x) + ρ ' (x)y + ρ '' (x)
. + ... . (4.55)
2
Substituting (4.55) into (4.54) one notices that the terms proportional to .y 0 cancel, as
the overall number of agents is conserved. The terms .∼ y 1 also vanish by symmetry
and we obtain
d/2 d y2
ρρ '' − ρ ' y 2 dy − ρρ ''
2
τ ρ̇ = 2
. dy
−d/2 −d 2
3 d3 d 3 ''
2 1d
= 4 ρρ '' − ρ ' − ρρ '' ρρ + ρ '
2
=− .
3 8 3 6
∂ρ d 3 ∂ 2ρ2
. =− , ρ̇ = −Do Δρ 2 , (4.56)
∂t 12τ ∂ 2 x
with .Δ denoting the Laplace operator in one dimension. .ρ 2 enters the evolution
equation as two agents need to interact for the dynamics to proceed.
156 4 Self Organization
ρ̇ + ∇ · jo = 0,
. jo = Do ∇ρ 2 , (4.57)
defines the opinion current .jo . With .D0 and .ρ being positive this implies that the
current strictly flows uphill, an example of a “rich gets richer dynamics” which is
evident also in the simulation shown in Fig. 4.11.
Voter Models Large varieties of models describing opinion dynamics have been
examined, such as networks of agents having a discrete number of opinions, like 0/1
for conservative/progressive, which can be described by “voter models”.11 Voters
change opinion when being ‘infected’ by the opinion of one or more neighbors.
The properties of the underlying social network, like the clustering coefficient, will
determine the overall outcome.
The flow of cars on a motorway can be modelled by “car following models”, akin
to the bird flocking model discussed in Sect. 4.3.3. Of central importance is here the
interplay between velocity dependent forces and human reaction times.
Chain of Cars We denote with .xj (t) the position of cars on a one-dimensional
motorway, .j = 0, 1, .. . The acceleration .ẍj of the j th car is given by
.ẍj +1 (t + T ) = g(xj , xi |ẋj , ẋi ) = λ ẋj (t) − ẋj +1 (t) , (4.58)
j /=i
where .λ is a reaction constant and T the reaction time.12 Actions are delayed by T
relative to the perception of sensory input. Drivers tend to break when closing in to
the car in the front and to accelerate when the distance grows. In car following
models, using the notation of (4.50), one considers mostly velocity-dependent
forces.
10 An analogous rescaling is done when deriving the diffusion equation .ṗ = DΔp,as discussed in
Sect. 3.3.1 of Chap. 3.
11 The classical voter model is treated in exercise (4.8).
12 The general theory of dynamical systems with time delays is developed in Sect. 2.5 of Chap. 2.
4.4 Car Following Models 157
One of the most important questions for traffic planning regards the carrying
capacity of a road, namely the maximal possible number of cars per hour, q. When
all cars advance with identical velocities u, the optimal situation, one would like to
evaluate .q(u).
Carrying Capacity for the Linear Model Integrating the linear model (4.58) and
assuming a steady state, identical velocities .ẋj ≡ u and inter-vehicle distances .xj −
xj +1 ≡ s, one obtains
ẋj +1 = λ[xj − xj +1 ] + c0 ,
. u = λ(s − s0 ) , (4.59)
where we did write the constant of integration as .c0 = −λs0 , with .s0 being the
minimal allowed distance between the cars. The carrying capacity q, the number
of cars per time transiting, is given by the product of the mean velocity u and the
density .1/s of cars,
u s0 u u
q=
. =λ 1− = , s = s0 + , (4.60)
s s s0 + u/λ λ
Maximal Velocity The linear model (4.58) cannot be correct, as the velocity .u =
λ(s − s0 ), as given by (4.59), would diverge for empty highways, namely when the
inter-vehicle distances diverge, .s → ∞. The circumstance that real-world cars have
an upper velocity .umax implies that the carrying capacity must vanish instead as
.umax /s for large inter-vehicle spacings s.
A natural way to overcome this deficiency of the basic model (4.58) would be to
consider terms, like for the bird flocking model (4.50), expressing the preference
to drive with a certain velocity. An alternative venue, pursued normally when
modelling traffic flow, is to consider non-trivial distance dependencies for the
velocity dependent force .g(xj , xi |ẋj , ẋi ).
Non-Linear Model Driving a car one reacts stronger when the car in front is closer,
an observation which can be modeled as
ẋj (t) − ẋj +1 (t)
ẍj +1 (t + T ) = λ
. , α > 0, (4.61)
[xj (t) − xj +1 (t)]1+α
158 4 Self Organization
cars / time
non-linear car following
model (4.63). The carrying
capacity vanishes when the
road is congested, and the
mean velocity vanishes,
.u → 0. Arbitrary large
cruising velocities are
possible for the linear model
in the limit of empty streets
0 umax
velocity
with .s0 denoting again the minimal inter vehicle distance and .d0 = λ/(αs0α ). For
.α > 0 the mean velocity u takes a finite value
λ λ 1 α 1/α
umax =
. , u = umax − , = (umax − u)1/α
αs0α αs α s λ
for near to empty streets with .s → ∞. The carrying capacity .q = u/s is then given
by the parabola
A steady flow of cars with .ẋj (t) ≡ u may become unstable when fluctuations
propagate along the line of vehicles, a phenomena that can induce traffic congestion
even in the absence of external influences, such as an accident.
Moving Frame of Reference The linear car following model (4.58) is the minimal
model for analyzing the dynamics for intermediate to high densities of vehicles. We
are interested in the evolution of the relative deviations .zj = xi /u − 1 from the
4.4 Car Following Models 159
steady-state velocity u,
żj +1 (t + T ) = λ zj − zj +1 ,
. ẋi (t) = u 1 − zi (t) , (4.64)
z0 (t) → eγ t ,
. γ = 1/τ + iω τ >0. (4.65)
For evaluating the exact time evolution of the following cars one would need to
integrate (4.64) piecewise, step by step for the intervals .t ∈ [nT , (n + 1)T ].
The situation simplifies when assuming that the delay interval T is substantially
smaller than the timescale of the perturbation, .τ . In this case the system will follow
the behavior of the lead car smoothly.
Recursion We make the ansatz that the column follows the leading car with the
same time-dependency, as given by (4.65), albeit with individual amplitudes .aj ,
zj (t) = aj eγ t ,
. γ eγ T aj +1 = λ(aj − aj +1 ) .
we obtain
1/2
λ λ2
.
λ + γ eT γ = λ2 + ω2 − 2λω sin(T ω) (4.67)
for the norm of the prefactor in the recursion (4.66). The recursion is unstable if
Suitable large time delays, with .T ω ≈ π/2, induce an instability for any values of
λ and .ω.13 Non-constant driving may induce traffic jams.
.
1/(2λ) < T ,
. sin(T ω) ≈ T ω . (4.69)
When reactions are strong and slow (.λ and T large), self-organized traffic jams are
likely to emerge.
λτ
zn (t) = a0 C n et/τ = a0 en log(C) et/τ ,
. C= .
λτ + eT /τ
−1
zn (t) = a0 e(n−vt) log(C) ,
. v= . (4.70)
τ log(C)
The speed is positive, .v > 0, since .0 < C < 1 and .log(C) < 0. Large reaction
times T limit propagation speed, since
in the limit .T ⪢ τ .
Exercises
13 This result is in accordance with our discussion in Sect. 2.5 of Chap. 2, regarding the influence
of time delays in ordinary differential equations.
Exercises 161
Prove that the resulting explicit time evolution map becomes unstable for
DΔt/(Δx)2 > 1/2 by considering ρ(x, 0) = cos(π x/Δx) as a particular
initial state.
(4.2) EXACT PROPAGATING WAVEFRONT SOLUTION
Find the reaction-diffusion system (4.2) for which the Fermi function
1
.ρ ∗ (x, t) =
1 + eβ(x−ct)
2
ρ̇ = ρ(1 − ρ) + ρ '' +
. ρ ' )2 (4.71)
1−ρ
and show that it is equivalent to the linearized Fisher equation (4.18) using the
transformation ρ = 1 − 1/u.
(4.4) TURING INSTABILITY WITH TWO STABLE NODES
Is it possible that the matrix A1 entering the Turing instability and defined
in (4.26), with positive ϵa , ϵb > 0, describes a stable node with both λ± <
0? If yes, show that the superposition of two stable nodes may generate an
unstable direction.
(4.5) EIGENVALUES OF 2 × 2 MATRICES
Use the standard expression (4.25) for the eigenvalues of a 2 × 2 matrix
and show that local maxima of the potential V (x) of a one-dimensional
mechanical system
ẋ = y,
. ẏ = −λ(x)y − V ' (x)
Further Reading
The interested reader is encouraged to take a look at Ellner et al. (2011) for an
in-depth treatise on mathematical modelling in biology and ecology, at Walgraef
(2012) and Landge et al. (1920) for the generic mathematics of pattern formation in
reaction-diffusion systems, and to Petrovskii et al. (2010) for a discussion of exactly
solvable models in the field.
We further suggest Hassanien et al. (2018) for a comprehensive textbook on
swarm intelligence, Kerner (2004) and Jusup et al. (2022) for traffic modelling and
for reviews on opinion dynamics and flocking behaviors. A review of voter models
is given in Redner (2019).
References
Ellner, S. P., & Guckenheimer, J. (2011). Dynamic models in biology. Princeton University Press.
Hassanien, A. E., & Emary, E. (2018). Swarm intelligence: Principles, advances, and applications.
CRC Press.
Jusup, M., et al. (2022). Social physics. Physics Reports, 948, 148.
Kerner, B. S. (2004). The physics of traffic: Empirical freeway pattern seatures, engineering
applications, and theory. Springer.
Landge, A. N., Jordan, B. M., Diego, X., & Müller, P. (2020). Pattern formation mechanisms of
self-organizing reaction-diffusion systems. Developmental Biology, 460, 460.
Petrovskii, S. V., & Bai-Lian, L. (2010). Exactly solvable models of biological invasion. CRC
Press.
Redner, S. (2019). Reality-inspired voter models: A mini-review. Comptes Rendus Physique, 20,
275–292.
Walgraef, D. (2012). Spatio-temporal pattern formation: with examples from physics, chemistry,
and materials science. Springer Science & Business Media.
Information Theory of Complex Systems
5
What do we mean, when we say that a given system shows “complex behavior”,
can we provide precise measures for the degree of complexity? This chapter offers
an account of several common measures of complexity together with the relation of
complexity to predictability and emergence.
Following a self-contained introduction to information theory and statistics,
we will learn about probability distribution functions, Bayesian inference, the
law of large numbers, and the central limit theorem. Next, Shannon entropy and
mutual information will be discussed, two concepts that play central roles both in
the context of time series analysis, and as starting points for the formulation of
quantitative measures of complexity. The chapter concludes with a short overview
regarding generative approaches to complexity.
Statistics is ubiquitous in everyday life. We are used to chat, e.g. about the
probability that our child will have blue or brown eyes, the chances to win a lottery,
or those of a candidate to win the presidential elections. Statistics is ubiquitous
also throughout the realms of science. Indeed, basic statistical concepts are used
abandonedly all over these lecture notes.
x ∈ [0, ∞],
. xi ∈ {1, 2, 3, 4, 5, 6}, α ∈ {blue, brown, green} .
For example, we may define with .p(x) the probability distribution of human life
expectancy x, with .p(xi ) the chances to obtain .xi when throwing a dice, or with
.p(α) the probability to meet somebody having eyes of color .α. Probabilities are
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 163
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_5
164 5 Information Theory of Complex Systems
The notation used for a given variable will indicate in the following its nature, i.e.
whether it is a continuous or discrete variable, or denoting a symbol. For continuous
variables the distribution .ρ(x) represents a probability density function (PDF).
the resulting discrete distribution function .p(xi ) is not any more normalized; the
properly normalized discrete distribution function is .p(xi )Δx. The two notations,
.pi and .p(xi ), are both used for discrete distributions.
1
Mean, Median and Standard Deviation Common symbols for the average .〈x〉
are .μ and .x̄. Average and standard deviation .σ are given by
〈x〉 =
. x p(x) dx, σ2 = (x − x̄)2 p(x) dx . (5.2)
Mean and expectation value are synonyms for .x̄, with .σ 2 being the variance.2 For
everyday life situations the median .x̃, defined by
1
. p(x) dx = = p(x) dx , (5.3)
x<x̃ 2 x>x̃
is somewhat more intuitive than the mean. We have a 50 % chance to meet somebody
being smaller/taller than the median height.
1 The expression .p(xi ) is context specific and can denote both a properly normalized discrete
distribution function as well as the value of a continuous probability distribution function.
2 In formal texts on statistics and information theory, the notation .μ = E(X) is used, where X
stands for an abstract random variable, with x denoting a particular value and .pX (x) the probability
density.
5.1 Probability Distribution Functions 165
1 0.4
exp(-t)
0.5
0
0 ln(2) 1 2 3 t 4 -4 -3 -2 -1 0 1 2 3 x 4
Fig. 5.1 Left: For an average waiting time .T = 1, the exponential distribution .exp(−t/T )/T .
With 50% probability waiting times are below the the median .ln(2)
√ (shaded area). Right: For a
standard deviation .σ = 1, the normal distribution .exp(−x 2 /2)/ 2π . The probability to draw a
result within one/two standard deviations of the mean, .x ∈ [−1, 1] and .x ∈ [−2, 2] respectively
(shaded regions), is 68 and .95%
t˜ = T ln(2),
. σ =T.
In 50 % of times one has to wait less than .t˜ ≈ 0.69 T , which is smaller than the
average waiting time T , compare Fig. 5.1.
Standard Deviation and Bell Curve The standard deviation .σ measures the size
of the fluctuations around the mean. The standard deviation is especially intuitive
for the Gaussian distribution
2
1 − (x−μ)
.p(x) = √ e 2 2σ , 〈x〉 = μ, 〈(x − x̄)2 〉 = σ 2 . (5.5)
2π σ 2
“Gaussian”, “Bell curve”, and “normal distribution” all denote (5.5). Bell curves
are ubiquitous in daily life, characterizing cumulative processes, as detailed out in
Sect. 5.1.1.
Gaussians falls off rapidly with distance from the mean .μ, compare Fig. 5.1.
The probability to draw a value within n standard deviation of the mean, viz the
166 5 Information Theory of Complex Systems
which are defined for discrete distributions .pk , where .k = 0, 1, 2, .. . For the
normalization and the mean .k̄ = 〈k〉 one evaluates
G0 (1) =
. pk = 1, G'0 (1) = k pk = 〈k〉 . (5.7)
k k
d 2
.σ 2 = 〈k 2 〉 − k̄ 2 = x G'0 (x) − G'0 (1)
dx x=1
2
= G''0 (1) + G'0 (1) − G'0 (1) . (5.9)
This relation is easily verified, f.i. for the case of two random variables by
multiplying out .G10 (x)G20 (x).
Throwing a dice many times and adding up the results obtained, the resulting
average will be close to .3.5 N, where N is the number of throws. This is the typical
outcome for cumulative stochastic processes.4
LAW OF LARGE NUMBERS Repeating . N times a stochastic process with mean . x̄ and
standard deviation . σ √
, the mean and the standard deviation of the cumulative result will
approach . x̄ N and . σ N respectively in the thermodynamic limit . N → ∞.
Proof We prove the law of large numbers for a a discrete process .pk described by
the generating functional .G0 (x). This is not really a restriction, since probability
densities of continuous variables can be discretized with arbitrary accuracy. The
generating function of the cumulative process is .GN 0 (x), which allows to express
the mean as
d N −1 '
k̄ (N ) =
. G0 (x) = N GN
0 (x) G0 (x) = N k̄ ,
dx x=1 x=1
with the help of (5.7). For the standard deviation .σ (N ) of the cumulative process we
obtain
2
2 d d N
. σ (N ) = x G0 (x) − N k̄ (5.10)
dx dx x=1
d −1 '
2
= x N GN0 (x) G 0 (x) − N 2 G'0 (1)
dx x=1
2 2
= NG0 (1) + N(N − 1) G0 (1) + N G''0 (1) − N 2 G'0 (1)
' '
2
= N G''0 (1) + G'0 (1) − G'0 (1) ≡ N σ2 ,
viz the law of large numbers, where (5.9) was used twice.
4 Please take note of the difference between a cumulative stochastic process, when adding the
results of individual trials, and the “cumulative PDF”, .F (x), defined by .F (x) = −∞ p(x ' )dx ' .
x
168 5 Information Theory of Complex Systems
flat, N=1
Gaussian
probability densities
0.02 flat, N=3
0.01
0
0 0.2 0.4 0.6 0.8 1
x
Fig. 5.2 The flat distribution, with variance .σ 2 = 1/12, and the probability density of the sum
of .N =√ 3 flat distributions. The latter approximates remarkably well the limit Gaussian with
.σ = 1/ 3 · 12 = 1/6, compare (5.11), in accordance with the central limit theorem. .10 random
5
Central Limit Theorem The law of large numbers states, that the variance .σ 2 is
additive for cumulative processes, not the standard deviation .σ . The “central limit
theorem” tells us in addition, that the limit distribution is a Gaussian, as illustrated
in Fig. 5.2.
In most cases one is not interested in the cumulative result, but in averaged
quantities, which are obtained by rescaling variables,
2
1 − (y−μ̄)
y = x/N,
. μ̄ = μ/N, σ̄ = σ/N, p(y) = √ e 2σ̄ 2 .
σ̄ 2π
√
The rescaled standard deviation scales with .1/ N. To see this, just consider
identical processes with .σi ≡ σ0 ,
1 σ0
σ̄ =
. σi2 = √ , (5.11)
N N
i
Is Everything Boring Then? One might be tempted to draw the conclusion that
systems containing large numbers of variables are boring, since everything seems to
average out. This is actually not the case, the law of large numbers holds only for
statistically independent processes. Subsystems of distributed complex systems are
however dynamically dependent, and it is often the case that dynamical correlations
lead to highly non-trivial properties in the thermodynamic limit.
5.1 Probability Distribution Functions 169
The notions of statistics considered so far can be easily generalized to the case of
more than one random variable. Whenever a certain subset of random variables is
considered to be the causing event for the complementary subset of variables one
speaks of inference, a domain of the Bayesian approach.
Conditional Probability Events and processes may have dependencies upon each
other. A physician will typically have to know, to give an example, the probability
that a patient has a certain illness, given that the patient shows a specific symptom.
CONDITIONAL PROBABILITY The probability that an event x occurs, given that an event
y has happened, is the “conditional probability” .p(x|y).
Throwing a dice twice, the probability that the first throw resulted in a 1, given
that the total result was .4 = 1 + 3 = 2 + 2 = 3 + 1, is 1/3. One defines with
.p(x) = p(x|y)p(y)dy (5.12)
Bayes Theorem The probability distribution of throwing x in the first throw and
y in the second throw is determined by the joint distribution .p(x, y), which obeys
.1= p(x, y)dxdy, p(x) = p(x, y)dy . (5.13)
Together, the two expressions (5.12) and (5.13) for the marginal are equivalent to
or
p(x|y)p(y) p(x|y)p(y)
p(y|x) =
. = , (5.14)
p(x) p(x|y)p(y)dy
where (5.12) was used in the second step. This relation is denoted “Bayes theorem”.
The conditional probability .p(x|y) of x happing given that y had occurred, is the
“likelihood”.
with the latter being the rate of false positives. The probability of a positively tested
person of being infected is then actually just 33%,
p(pos|ill)p(ill)
p(ill|pos) =
.
p(pos|ill)p(ill) + p(pos|healthy)p(healthy)
0.99 · 0.01 1
= = ,
0.99 ∗ 0.01 + 0.02 ∗ 0.99 3
where Bayes theorem (5.14) was used. A second follow-up test is hence necessary.
and solve for our estimate .p(ill) of infections. In addition one needs to estimate the
confidence of the obtained result, viz the expected fluctuations due to the limited
number of tests actually carried out.
Bayesian Inference We start by noting that both sides of Bayes theorem (5.14) are
properly normalized,
p(x|y)p(y) dy
. p(y|x) dy = 1 = .
p(x)
For a given x, the probability that any y happens is unity, and vice versa. For a given
x we may hence interprete the left-hand side as the probability that y is true. We
change the notation slightly,
p(x|y)p0 (y)
p1 (y) ≡
. . (5.16)
p(x|y)p0 (y)dy
5.1 Probability Distribution Functions 171
One denotes
Equation (5.16) constitutes the basis of Bayesian inference. In this setting one is not
interested in finding a self-consistent solution .p0 (y) = p1 (y) = p(y). Instead it is
premised that one disposes of prior information, viz knowledge and expectations,
about the status of the world, .p0 (y). Performing an experiment a new result x is
obtained which is then used to improve the expectations of the world status through
.p1 (y), using (5.16).
This update procedure of the knowledge .pi (y) about the world is independent of
the grouping of observations .xi , viz
p 0 → p1 → · · · → p n
. and p 0 → pn
yield the same result, due to the multiplicative nature of the likelihood .p(x|y), viz
when considering in the last relation all consecutive observations .{x1 , . . . , xn } as a
single event.
Beyond the elementary parameters, like mean and variance, one is interested in
many cases in estimating the very probability distribution, in particular for data
generated by some known or unknown process, like the temperature measurements
of a weather station. When doing so, it is important to keep a few caveats in mind.
172 5 Information Theory of Complex Systems
6
xn + 1 = rxn(1 − xn)
0,4
4
joint probabilities
2 0,2
0 0
0 0,2 0,4 0,6 0,8 1 p++ p+− p−+ p−−
t
Fig. 5.3 For the logistic map with .r = 3.9 and .x0 = 0.6, two statistical analyses of the
identical time series .{xn |n = 0, . . . , N }, with .N = 106 . Left: The distribution .p(x) of the .xn .
Plotted at the midpoints of the respective bins is .p(x)(Nbin /N ), for .Nbin = 10 (square symbols)
and .Nbin = 100 (vertical bars). Right: The joint probabilities .p±± , as defined by (5.20), of
consecutive increases/decreases of .xn . The probability .p−− that the data decreases consecutively
twice vanishes
xn+1 = r xn (1 − xn ),
. xn ∈ [0, 1], r ∈ [0, 4] . (5.17)
For systems with continuous readings, as for the logistic map, one needs to bin
observations in order to estimate the respective probability distribution. In Fig. 5.3
the statistics of a time series in the chaotic regime is given, here for .r = 3.9.
Apart from the overall number of bins, .Nbin , a choice has to be made regarding
the positions and the widths of the individual bins. When the data is not uniformly
distributed, one may place more bins in a region of interest, generalizing the
relation (5.1) through .Δx → Δxi , with the .Δxi being the width of the individual
bins.
For the example shown in Fig. 5.3, we selected .Nbin = 10/100 equidistant
bins. Note that the average number of observations scales with .1/Nbin . Rescaling
the count per bin with .Nbin allows therefore to compare distributions obtained for
different .Nbin , as done in Fig. 5.3.
The selection of the binning procedure is in general an intricate choice. Fine
structure will be lost when .Nbin is too low, but statistical noise will dominate for
large numbers of bins.
5A detailed account of period doubling and chaos in the logistic map can be found in Sect. 2.4, of
Chap. 2.
5.1 Probability Distribution Functions 173
Adaptive Binning Classically, number and positions of the set .{Bi } of bins are
predetermined,
Bi = x|x ∈ [bi− , bi+ ] ,
. (5.18)
where .bi± is the upper/lower border of the ith bin. Observations .xn ∈ Bj are added
to the count of the j th bin. Results are assigned either to the bare midpoint .(bi+ +
bi− )/2, or to a weighted average.
For adaptive binning, the desired number .Nobs of observations per bin is
predetermined, not the .Bi themselves. For the following we assume that the
observations .xn > 0 are ordered, with .xn+1 ≥ xn . Starting with .b1− = 0, the
following steps are repeated.
– For the ith bin, add data points until .Nobs is reached. Say, that the last addition is
the mth observation.
– Set the border of the current bin halfway to the next data point, .bi+ = (xm+1 +
xm )/2.
−
– Repeat for the next bin, with matching bin borders, .bi+1 = bi+ .
Per construction, the statistical accuracy of all bins are identical when adaptive
binning is used. Adaptive binning improves the quality of estimates substantially
when the data is distributed unequally over large ranges. This is in particular the
case for data showing powerlaw behavior, as for scale-free graphs.6
Till now, we implicitly assumed now that the statistical evaluation of a given set
of observations is done directly, without further preprocessing. This is however not
always the optimal approach.
6 The data for the in-degree of Internet domains presented in Fig. 1.6 of Chap. 1, has been processed
The consecutive development of the .δt may also be encoded, using higher-level
symbolic stochastic variables. For example, one might be interested in the joint
probabilities
p++ = p(δt = 1, δt−1 = 1) t , p+− = p(δt = 1, δt−1 = −1) t ,
. (5.20)
p−+ = p(δt = −1, δt−1 = 1) t , p−− = p(δt = −1, δt−1 = −1) t ,
where .p++ gives the probability that the data increases at least twice consecutively,
etc., and where .〈. . . 〉t denotes the time average. In Fig. 5.3 the values for the joint
probabilities .p±± are given for a selected time series of the logistic map in the
chaotic regime. The data never decreases twice consecutively, .p−− = 0, a somewhat
unexpected result. It tells us, that certain properties of an otherwise chaotic system
may be predictable,7 at times even at a 100% level.
The symbolization procedure selected to analyze a given time series determines
the type of information one may hope to extract, as evident from the results
presented in Fig. 5.3. The selection of symbolization procedures is given further
attention in Sect. 5.2.2.
The four initial conditions 00, 01, 10 and 11, give rise to the following time series’,
. . . 000000000 . . . 101101101
. (5.22)
. . . 110110110 . . . 011011011
where time runs from right to left. In (5.22) the initial conditions .σ1 and .σ0 have been
underlined. The typical time series, occurring for 75 % of the initial conditions, is
.. . . 011011011011 . . ., with .p(0) = 1/3 and .p(1) = 2/3 for the probability to find
respectively .0/1. When averaging over all four initial conditions, we have on the
other hand .(2/3)(3/4) = 1/2 for the probability to find a 1. Then
2/3 typical
p(1) =
. .
1/2 average
7 General aspects of predictability in chaotic systems are developed in Sect. 3.1.3 of Chap. 3.
8 Remember, that .XOR(0, 0) = 0 = XOR(1, 1) and .XOR(0, 1) = 1 = XOR(1, 0).
5.1 Probability Distribution Functions 175
When observing a single time series we are likely to obtain the typical probability,
analyzing many time series will result on the other hand in the average probability.
The XOR series is not self averaging and one can generally not assume self
averaging to occur. An inconvenient situation whenever only a single time series is
available, as it is the case for most historical data, e.g. of past climatic conditions.
XOR Series with Noise Most real-world processes involve a certain degree of
noise. It is therefore tempting to presume, that noise could effectively restart the
dynamics, leading to an implicit averaging over initial conditions. This assumption
is not generally valid, it holds however for the XOR process with noise,
XOR(σt , σt−1 ) with probability 1 − ξ
σt+1 =
. 0 ≤ ξ ⪡ 1.
¬ XOR(σt , σt−1 ) with probability ξ
. . . . 000000001101101101011011011011101101101100000000 . . .
for the noise-induced transition probabilities. In the stationary case .p000 = p011 /3
for the XOR process with noise, the same ratio one would obtain for the determin-
istic XOR series averaged over the initial conditions.
The introduction of noise generally introduces complex dynamics akin to (5.23),
which will lead in most cases to self-averaging time series. This is also the case for
the OR time series,9 for which the small noise limit does however not coincide with
the time series obtained in the absence of noise.
Time Series Analysis and Cognition Time series analysis is a tricky business
whenever the fundamentals of the generative process are unknown, e.g. whether
noise is important or not. This is however the setting in which cognitive systems
are operative. Our sensory organs, eyes and ears, provide us with a continuous time
Animals need to perform online analysis of their sensory data input streams,
otherwise they would not survive long enough to react. Training of most machine
learning algorithms is however offline.
respectively the “trailing average” .μt and the trailing variance .σt2 . Trailing expec-
tation values exponentially discount older data, with the respective moments of the
input stream .x(t) being recovered in the limit .T → ∞. The factor .1/T in (5.24)
and (5.25) normalizes the respective trailing averages. For the case of a constant,
time independent input .x(t) ≡ x̄, we obtain correctly
∞
1
μt →
. dτ x̄ e−τ/T = x̄ .
T 0
The trailing average can be evaluated by a simple online update rule, there is no
need to store past data .x(t − τ ). To see this, we evaluate the time dependence
∞ ∞
1 −τ/T d −1 d
.μ̇t = dτ e x(t − τ ) = dτ e−τ/T x(t − τ ) .
T 0 dt T 0 dτ
The last expression can be evaluated by direct partial integration.10 One obtains
x(t) − μt
μ̇t =
. , (5.26)
T
10 The identical procedure can be used in the context of dynamical systems with distributed time
delays, as shown in Sect. 2.5.1 of Chap. 2.
5.2 Entropy and Information 177
together with an analogous update rule for the variance .σt2 , by substituting .x →
(x − μ)2 . Expression (5.26) is an archetypical example of an online updating rule
for a time averaged quantity, here the trailing average .μt .
Entropy and Life Living organisms have a body, which means that they are
capable of creating ordered structures from basic chemical constituents. As a
consequence, living entities decrease entropy locally, with their bodies, seemingly
in violation of the second law. In reality, the local entropy depressions are created
at the expense of corresponding entropy increases in the environment, in agreement
with the second law of thermodynamics. All living beings need to be capable of
manipulating entropy.
e.g. when transmitting a message. Typically, in everyday computers, the .σt are
words of bits. Let us consider two time series of bits, e.g.
The first example is predictable, from the perspective of a time-series, and ordered,
from the perspective of an one-dimensional alignment of bits. The second example
is unpredictable and disordered respectively.
Information can be transmitted through a time series of symbols only when this
time series is not predictable. Talking to a friend, to illustrate this statement, we will
not learn anything new when capable of predicting his next joke. We have therefore
the following two perspectives,
large disorder physics
high entropy =
. ˆ
high information content information theory
178 5 Information Theory of Complex Systems
and vice versa. Only seemingly disordered sequences of symbols are unpredictable
and thus potential carriers of information. Note, that the predictability of a given
time series, or its degree of disorder, may not necessarily be as self evident as in
above example, Eq. (5.27), depending generally on the analysis procedure used, see
Sect. 5.2.2.
A typical extensive property is the mass, a typical intensive property the density.
When lumping together two chunks of clay, their mass adds, but the density does
not change.
One demands, both in physics and in information theory, that the entropy should
be an extensive quantity. The information content of two independent transmission
channels should be just the sum of the information carried by the two individual
channels.
where .p(xi ) is a normalized discrete probability distribution function and where the
brackets in .H [p] denote the functional dependence.11 Note, that .−p log(p) ≥ 0 for
.0 ≤ p ≤ 1, see Fig. 5.4, the entropy is therefore strictly positive.
b is the base of the logarithm used in (5.28). Common values of b are 2, Euler’s
number e and 10. The corresponding units of entropy are then termed “bit” for
.b = 2, “nat” for .b = e and “digit” for .b = 10. In physics the natural logarithm is
always used and there is an additional constant (the Boltzmann constant .kB ) in front
of the definition of the entropy. Here we will use .b = 2 and drop in the following
the index b.
11 A function .f (x) is a function of a variable x; a functional .F [f ] is, on the other hand, functionally
dependent on a function .f (x). In formal texts on information theory the notation .H (X) is often
used for the Shannon entropy and a random variable X with probability distribution .pX (x).
5.2 Entropy and Information 179
2
0.4
0
-x log 2 (x)
-2
0.2
-4 log2 (x)
0 -6
0 0.2 0.4 0.6 0.8 1 0 1 2 3
x x
Fig. 5.4 Left: Plot of .−x log2 (x). Right: The logarithm .log2 (x) (full line) is concave, every cord
(dashed line) lies below the graph
The logarithm is the only function which maps a multiplicative input onto an
additive output. Consequently,
H [p] = −
. p(xi , yj ) log(p(xi , yj ))
xi ,yj
=− pX (xi )pY (yj ) log(pX (xi )) + log(pY (yj ))
xi ,yj
=− pX (xi ) pY (yj ) log(pY (yj ))
xi yj
− pY (yj ) pX (xi ) log(pX (xi ))
yj xi
= H [pY ] + H [pX ] ,
a celebrated result. The entropy grows logarithmically with the number of degrees
of freedom.
Shannon’s Source Coding Theorem So far we did show, that (5.28) is the
only possible definition, modulo renormalizing factors, for an extensive quantity
depending exclusively on the probability distribution. The operative significance of
the entropy .H [p] in terms of informational content is given by Shannon’s theorem.
SOURCE CODING THEOREM Given is a random variable x with a PDF .p(x) and entropy
.H [p]. The cumulative entropy .N H [p] is then, for .N → ∞, a lower bound for the number
of bits necessary when compressing N independent processes drawn from .p(x).
Entropy and Compression Let’s make an example. Consider the four letter
alphabet .{A, B, C, D}. Suppose, that these four letters do not occur with the same
probability, the relative frequencies being instead
1 1 1
. p(A) = , p(B) = , p(C) = = p(D) .
2 4 8
When transmitting a long series of words using this alphabet we will have the
entropy
−1 1 1 1
H [p] =
. log(1/2) − log(1/4) − log(1/8) − log(1/8)
2 4 8 8
1 2 3 3
= + + + = 1.75 , (5.30)
2 4 8 8
since we are using the logarithm with base .b = 2. The most naive bit encoding,
A → 00,
. B → 01, C → 10, D → 11 ,
would use exactly 2 bit, which is larger than the Shannon entropy. An optimal
encoding would be, on the other hand,
1 2 3 3
p(A) + 2p(B) + 3p(C) + 3p(D) =
. + + + = 1.75 , (5.32)
2 4 8 8
which is the same as the information entropy .H [p]. The encoding given in (5.31) is
actually “prefix-free”. When we read the words from left to right, we know where a
5.2 Entropy and Information 181
110000010101
. ←→ AADCBB ,
without ambiguity. Fast algorithms for optimal, or close to optimal encoding are
clearly of importance in the computer sciences and for the compression of audio
and video data.
where .pi = p(xi )Δx is the properly normalized discretized PDF, see (5.1). The
difference .log(Δx) between
the continuous-variable entropy .H [p]con and the dis-
cretized version .H [p] dis diverges as .Δx → 0, the transition is hence discontinuous.
Entropy
of a Continuous PDF If follows from (5.33), that the Shannon entropy
H [p]con can be negative for a continuous probability distribution function. As an
.
The absolute value of the entropy is hence not meaningful forcontinuous probability
density, only entropy differences. Hence one refers to .H [p]con as the “differential
entropy”.
182 5 Information Theory of Complex Systems
see Fig. 5.4, it is intuitive that a flat distribution might be optimal. This is indeed
correct in the absence of any constraint other than the normalization condition
. p(x)dx = 1.
where the notation used will be of use later on. Maximizing a functional like .H [p]
is a typical task of variational calculus, which examines the variation .δp(x) around
an optimal function .popt (x),
where .f ' (p) = 0 follows from the fact that .δp is an arbitrary function.
For the entropy functional .f (p) = −p log(p) we find then with
For a given mean .μ, the the exponential distribution (5.38) maximises entropy. The
Lagrange parameter .λ is determined such that the condition (5.37) is satisfied. For a
support .x ∈ [0, ∞], as assumed above, we have .λ loge (2) = 1/μ.
One can generalize this procedure and consider distribution maximizing the
entropy under the constraint of a given mean .μ and variance .σ 2 ,
μ=
. x p(x) dx, σ2 = (x − μ)2 p(x) dx . (5.39)
Generalizing the derivation leading to (5.38), one sees that the maximal entropy
distribution constrained by (5.39) is a Gaussian,12 as given by (5.5).
e−H
p(x1 , . . . , xn ) =
. , H = Jij xi xj + λi xi (5.41)
N
ij i
the .n(n − 1)/2 variational parameters .Jij in order to reproduce given .n(n − 1)/2
pairwise correlations .〈xi xj 〉, and the Lagrange multiplier .λi for regulating the
respective individual averages .〈xi 〉.
The maximal entropy distribution (5.41) has the form of a Boltzman factor of
statistical mechanics with H representing an Hamiltonian, viz the energy function.
It contains coupling constants .Jij encoding the strength of pairwise interactions.
processes, such as the time series of certain financial data or the data stream
produced by our sensory organs.
Symbolization and Information Content The result obtained for the information
content of a real-world time series .{σt } depends in general on the symbolization
procedure used. Let us consider, as an example, the first time series of (5.27),
. . . . 101010101010 . . . . (5.42)
1 1
p(0) =
. = p(1), H [p] = −2 log(1/2) = 1 ,
2 2
as expected. If, on the other hand, we use a 2-bit symbolization, we find
When 2-bit encoding is presumed, the time series is predictable and carries no
information. This seems intuitively the correct result and the question is: Can we
formulate a general guiding principle which tells us which symbolization procedure
would yield the more accurate result for the information content of a time series at
hand?
Minimal Entropy Principle The Shannon entropy constitutes a lower bound for
the number of bits per symbol necessary when compressing the data without infor-
mation loss. Trying various symbolization procedures, the symbolization procedure
yielding the lowest information entropy allows us consequently to represent a given
time series lossless with the least number of bits.
MINIMAL ENTROPY PRINCIPLE The information content of a time series with unknown
encoding is given by the minimum (actually the infimum) of the Shannon entropy over all
possible symbolization procedures.
The minimal entropy principle then gives us a definite answer with respect to
the information content of the time series in (5.42). We have seen that at least one
symbolization procedure yields a vanishing entropy, the lowest possible value since
.H [p] ≥ 0. This is the expected result, since .. . . 01010101 .. . . is predictable.
So far, we have been concerned mostly with individual stochastic processes as well
as the properties of cumulative processes generated by the sum of stochastically
independent random variables. In order to understand complex systems, we need to
develop tools for the description of a large number of interdependent processes. As a
first step towards this direction, we study in the following the case of two stochastic
processes, which may now be statistically correlated.
This dynamics is markovian, as the value for the state .{σt+1 , τt+1 } depends only on
the state at the previous time step,13 viz on .{σt , τt }.
When state space is finite, as in our example, one has a Markov chain.
. . . σt+1 σt . . . : 00010000001010...
.
. . . τt+1 τt . . . : 00011000001111...
pt+1 (0, 0) = (1 − ξ ) [pt (1, 1) + pt (0, 0)] pt+1 (1, 0) = ξ [pt (0, 1) + pt (1, 0)]
.
pt+1 (1, 1) = (1 − ξ ) [pt (1, 0) + pt (0, 1)] pt+1 (0, 1) = ξ [pt (0, 0) + pt (1, 1)]
for the ensemble averaged joint probability distributions .pt (σ, τ ) = 〈p(σt , τt )〉ens ,
where the average .〈..〉ens denotes the average over an ensemble of time series. For
the solution in the stationary case, .pt+1 (σ, τ ) = pt (σ, τ ) ≡ p(σ, τ ), we use the
normalization
finding
p(1, 1) + p(0, 0) = 1 − ξ,
. p(1, 0) + p(0, 1) = ξ ,
p(0, 0) = (1 − ξ )2 p(1, 0) = ξ 2
. , . (5.44)
p(1, 1) = (1 − ξ )ξ p(0, 1) = ξ(1 − ξ )
For .ξ = 1/2 the two channels become 100 % uncorrelated, as the .τ -channel is then
fully random. The dynamics of the Markov process given in (5.43) is self averaging,
which allows to verify (5.44) by a straightforward numerical simulation.
2
H[pσ] + H[pτ]
entropy [bit] 1.5
H[p]
H[pσ]
1
H[pτ]
0.5
0
0 0.1 0.2 0.3 0.4 0.5
ξ
Fig. 5.5 For the two-channel XOR-Markov chain .{σt , τt } with noise .ξ , see (5.43), the entropy
.H [p] of the combined process (full line), see (5.47), of the individual channels (dashed lines),
see(5.46), .H [pσ ] and .H [pτ ], together with the sum of the joint entropies (dot-dashed line). Note
the positiveness of the mutual information, .I (σ, τ ) > 0, with .I (σ, τ ) = H [pσ ] + H [pτ ] − H [p]
for the marginal distributions .pσ and .pτ , we find from (5.44)
Joint and Marginal Entropy We evaluate now two entropies, that of the individual
channels, .H [pσ ] and .H [pτ ], the “marginal entropies”, viz
In Fig. 5.5 the respective entropies are plotted as a function of noise strength .ξ .
Some observations.
– For maximal noise .ξ = 0.5, the information content of both individual chains is
1 bit and of the combined process 2 bits, implying statistical independence.
188 5 Information Theory of Complex Systems
– For general noise strengths .0 < ξ < 0.5, the two channels are statistically corre-
lated. The information content of the combined process .H [p] is consequently
smaller than the sum of the information contents of the individual channels,
.H [pσ ] + H [pτ ].
MUTUAL INFORMATION For two stochastic processes .σt and .τt , the difference
between the sum of the marginal entropies .H [pσ ] + H [pτ ] and the joint entropy .H [p] is
the mutual information .I (σ, τ ).
When two dynamical processes become correlated, information is lost and this
information loss is given by the mutual information. Note, that .I (σ, τ ) = I [p] is
a functional of the joint probability distribution p only, the marginal distribution
functions .pσ and .pτ being themselves functionals of p.
as illustrated in Fig. 5.4. The inequality (5.51) holds for .p1 , p2 ∈ [0, 1], with .p1 +
p2 = 1, which expresses that any cord of a concave function lies below the graph.
We can regard .p1 and .p2 as the coefficients of a distribution function and generalize,
p1 δ(x − x1 ) + p2 δ(x − x2 )
. −→ p(x) ,
5.2 Entropy and Information 189
where .p(x) is now a generic, properly normalized probability density. The concave-
ness condition (5.51) then becomes
. log p(x) x dx ≥ p(x) log(x)dx, Φ (〈x〉) ≥ 〈 Φ(x) 〉 . (5.52)
This is the Jensen inequality, which holds for any concave function .Φ(x). It remains
valid when substituting .x → pX pY /p for the argument of the logarithm.14 For the
mutual information (5.50) we then obtain
p X pY pX pY
I (X, Y ) = −
. p log dxdy ≥ − log p dxdy
p p
= − log pX (x)dx pY (y)dy = − log(1) = 0 ,
viz that .I (X, Y ) is non-negative. Information can only be lost, and not gained, when
correlating two previously independent processes.
Conditional Entropy There are various ways to rewrite the mutual information,
using Bayes theorem .p(x, y) = p(x|y)pY (y) between the joint density .p(x, y), the
conditional probability distribution .p(x|y) and the marginal .pY (y), e.g.
p p(x|y)
I (X, Y ) = log
. = p(x, y) log dxdy
pX pY pX (x)
≡ H (X) − H (X|Y ) , (5.53)
where we used the notation .H (X) = H [pX ] for the marginal entropy, together with
the “conditional entropy”
H (X|Y ) = −
. p(x, y) log(p(x|y))dxdy . (5.54)
is positive, given that .−p log(p) ≥ 0 holds in the interval .p ∈ [0, 1]. Compare (5.33)
for changing from continuous to discrete variables. Several variants for the condi-
tional entropy can be used to define statistical complexity measures, as discussed in
Sect. 5.3.1.
14 For a proof consider the generic substitution .x → q(x) and a transformation of variables .x → q
via .dx = dq/q ' , with .q ' = dq(x)/dx, for the integration in (5.52).
190 5 Information Theory of Complex Systems
p(x|y) = p(x),
. H (X|Y ) → H (X) .
The opposite extreme is realized when the first channel is just a function of the
second channel, viz when
xi = f (yi ),
. p(xi |yi ) = δxi ,f (yi ) , p(xi , yi ) = δxi ,f (yi ) p(yi ) .
since .δxi ,f (yj ) is either unity, in which case .log(δ) = log(1) = 0, or zero, in
which case .0 log(0) vanishes as a limiting process. The conditional entropy .H (X|Y )
measures hence the amount of information present in the stochastic process X which
is not causally related to the process Y . The mutual entropy reduces to the marginal
entropy, as a corollary,
I (X, Y ) → H (X) ,
.
One is often interested in comparing two distribution functions .p(x) and .q(x)
with respect to their similarity. When trying to construct a measure for the degree
of similarity one is facing the dilemma that probability distributions are positive
definite and one can hence not define a scalar product as for vectors; two probability
densities cannot be orthogonal. It is nevertheless possible to define a positive definite
measure.
Relation to the .χ 2 Test We consider the case that the two distribution functions p
and q are nearly identical,
2
δp δp
. log(q) = log(p + δp) ≈ log(p) + − + ... ,
p p
obtaining
2
δp δp
K[p; q] ≈
. dx p log(p) − log(p) − +
p p
(δp)2 (p − q)2
= dx = dx , (5.56)
p p
since . δpdx = 0, as a consequence of the normalization conditions . pdx = 1 =2
qdx. This measure for the similarity of two distribution functions is termed .χ
test. It is actually symmetric under exchanging .q ↔ p, up to order .(δp)2 .
Example As a simple example we take two distributions .p(σ ) and .q(σ ) for a
binary variable .σ = 0/1,
with .p(σ ) being flat and .α ∈ [0, 1]. The Kullback-Leibler divergence,
p(σ ) −1 1
K[p; q] =
. p(σ ) log = log(2α) − log(2(1 − α))
q(σ ) 2 2
σ =0,1
−1
= log(4(1 − α)α) ≥ 0 ,
2
is unbounded, since .limα→0,1 K[p; q] → ∞. Interchanging .p ↔ q yields
0 0.5 1
α
which is now finite in the limit .limα→0/1 . The Kullback-Leibler divergence is highly
asymmetric, compare Fig. 5.6.
This expression is identical to the mutual information (5.50) for the case that .q(x, y)
is the product of the two marginal distributions of .p(x, y),
.q(x, y) = p(x)p(y), p(x) = p(x, y)dy, p(y) = p(x, y)dx .
with .p = p(y, θ ) and .p' = ∂p(y, θ )/∂θ . The first term in (5.60),
∂ ∂
.(−δθ ) dy p(y, θ ) = (−δθ ) 1 ≡ 0,
∂θ ∂θ
vanishes. The second term in (5.60) contains the Fisher information (5.59). Hence
(δθ )2
K p(y, θ ); p(y, θ + δθ ) = F (θ )
. , (5.61)
2
which establishes the role of the Fisher information as a metric.
complexity
disordered regime. For some
applications it may however
be meaningful to consider
complexity measures
maximal for random states
(dashed line)
order disorder
15 States of the forest fire model are presented in Fig. 6.7, see Chap. 6.
5.3 Complexity Measures 195
Complexity and Behavior The search for complexity measures is not just an
abstract academic quest. As an example consider how bored we are when our envi-
ronment is repetitive, having low complexity, and how stressed when complexity
overwhelms our sensory organs. There are indeed indications that a valid behavioral
strategy for highly developed cognitive systems may consist in optimizing the
degree of complexity. Well defined complexity measures are necessary in order to
quantify this intuitive statement mathematically.
xn , xn−1 , . . . , x2 , x1 ,
. xi = x(ti ), xi ∈ X, , (5.62)
It makes no sense to evaluate joint probabilities .pn for time differences .tn ⪢ τ ,
as all joint distributions factorize when time lags become large. For finite values of
n large numbers of subsets of length n can be cut out of a complete time series,
16 An analogous discussion for the autocorrelation function of critical vs. non-critical system is
presented in Sect. 6.2. of Chap. 6.
196 5 Information Theory of Complex Systems
providing the basis for reliable statistical estimates. This is an admissible procedure
for stationary dynamical processes.
1
.h∞ = lim H [pn ] , (5.65)
n→∞ n
which exists for stationary dynamical processes with finite time horizons. The
entropy density is the mean number of bits per time step needed for encoding the
time series statistically.
The excess entropy is equivalent to the non-extensive part of the entropy, being the
coefficient of the term .∝ n0 when expanding the entropy in powers of .1/n,
H [pn ] = n h∞ + E + O(1/n),
. n → ∞, (5.67)
compare Fig. 5.8. The excess entropy E is positive as long as .H [pn ] is concave as a
function of n, which is the case for stationary dynamical processes.17 For practical
purposes, the excess entropy can be approximated using finite differences,
$h_\infty = \lim_{n\to\infty} h_n, \qquad h_n = H[p_{n+1}] - H[p_n]$ ,   (5.68)
since .h∞ corresponds to the asymptotic slope of .H [pn ], compare Fig. 5.8.
– One may use (5.68) together with (5.54) for expressing entropy density .hn in
terms of an appropriately generalized conditional entropy.
17 To prove that the excess entropy is positive is the task of exercise (5.10).
The excess entropy may equivalently be written as

$E = \lim_{n\to\infty} n \left( \frac{H[p_n]}{n} - h_\infty \right)$ .

In this form the excess entropy is known as the “effective measure complexity” (EMC) or “Grassberger entropy”.
Excess Entropy and Predictability The excess entropy vanishes both for a
random and for an ordered system. For a random system
H [pn ] = n H [pX ] ≡ n h∞ ,
.
where .pX is the marginal probability. The excess entropy (5.66) vanishes conse-
quently. For an example of ordered dynamics we can take a system generating only
two types of sequences, say
. . . . 000000000000000 . . . , . . . 111111111111111 . . . ,
p(0, . . . , 0) = α,
. p(1, . . . , 1) = 1 − α, ∀n ,
The entropy density .h∞ vanishes for .α → 0, 1, viz in the deterministic limit, with
the excess entropy E becoming .H [pn ].
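As an illustration of how these quantities can be estimated in practice, the sketch below generates a binary Markov chain (an assumed toy process, not taken from the text) and evaluates the block entropies H[p_n], the finite-n entropy densities h_n of (5.68), and a finite-n estimate of the excess entropy.

# Minimal sketch: block entropies, entropy density and excess entropy of a
# stationary binary Markov chain (toy process; parameters are illustrative).
import math, random
from collections import Counter

def generate_markov(T, flip=0.1, seed=42):
    """Binary sequence; the next symbol differs from the current one with probability 'flip'."""
    random.seed(seed)
    seq, x = [], 0
    for _ in range(T):
        if random.random() < flip:
            x = 1 - x
        seq.append(x)
    return seq

def block_entropy(seq, n):
    """Shannon entropy (bits) of the empirical distribution of length-n blocks."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

seq = generate_markov(200000)
H = [block_entropy(seq, n) for n in range(1, 9)]
for n in range(1, 8):
    h_n = H[n] - H[n - 1]              # finite-difference entropy density, compare (5.68)
    E_n = H[n - 1] - n * h_n           # finite-n estimate of the excess entropy, compare (5.67)
    print(f"n={n}  H[p_n]={H[n-1]:.4f}  h_n={h_n:.4f}  E_n={E_n:.4f}")
# For this Markov chain h_n approaches the binary entropy of the flip rate,
# about 0.469 bits, together with a positive excess entropy.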
Individual Objects For the statistical analysis of a time series we have been
concerned with ensembles of time series, as generated by the identical underlying
dynamical system, together with the limit of infinitely long times. In this section we
will be dealing with individual objects composed of a finite number of n symbols,
like
0000000000000000000000,
. 0010000011101001011001 .
The question is then: which dynamical model can generate the given string of
symbols? One is interested, in particular, in strings of bits and in computer codes
capable of reproducing them.
Turing Machine In theoretical informatics, the reference computer code is the set
of instructions needed for a “Turing machine” to carry out a given computation. The
exact definition of a Turing machine is not of relevance here; it is essentially a finite-state machine working on a set of instructions called the code. The Turing machine
plays a central role in the theory of computability, e.g. when one is interested in
examining how hard it is to find the solution to a given set of problems.
Weak and Strong Emergence On a final note one needs to mention that a
vigorous distinction is being made in philosophy between the concept of ‘weak
emergence’, which we treated here, and the scientifically irrelevant notion of ‘strong
emergence’. Properties of a complex system generated via weak emergence result
from the underlying microscopic laws, whereas strong emergence leads to top-level
properties which are strictly novel, in the sense that they cannot, as if by magic, be linked causally to the underlying microscopic laws of nature.
18 Transitions between extended chaotic and regular phases occur in boolean networks, see Chap. 7.
19 For strange attractors and the like consult Chap. 3.
20 The Kuramoto model is the standard reference for globally synchronized states, as detailed in Chap. 9.
Exercises
Which exponent γ minimizes K[p; q]? How many times do the graphs for
p(x) and q(x) cross?
(5.9) CHI-SQUARED TEST
The quantity

$\chi^2[p; q] = \sum_{i=1}^{N} \frac{(p_i - q_i)^2}{p_i}$   (5.69)
$H_q[p] = \frac{1}{1-q} \sum_k \left( p_k^q - p_k \right), \qquad 0 < q \le 1$   (5.70)
Self-Organized Criticality
6
Classically, a phase transition occurs when the properties of a system change upon
tuning an external parameter, like the temperature. Is it possible that a complex
system regulates an internal parameter on its own, self-organized, such that it
approaches a critical point all by itself? This is the central question discussed in
this chapter.
Starting with an introduction to the Landau theory of phase transitions, particular
attention will be devoted to cellular automata, an important and popular class of
standardized dynamical systems. Cellular automata allow for an intuitive construc-
tion of models, such as the forest fire model, the game of life, and the sandpile model,
which exhibits “self-organized criticality”. Mathematically, a further understanding
will be attained with the help of random branching theory. The chapter concludes
with a discussion of whether self-organized criticality occurs in the most adaptive
dynamical system of all, namely in the context of long-term evolution.
One may describe the physics of thermodynamic phases either microscopically with
the tools of statistical physics, or by considering the general properties close to a
phase transition. The Landau theory of phase transitions does the latter, providing a
general framework, valid irrespective of the microscopic details of the material.
Fig. 6.1 Phase diagram of a magnet in an external magnetic field h. Left: The order parameter
M, here the magnetization, as a function of temperature across the phase transition. Illustrated
are typical arrangements of the local moments (arrows). In the ordered phase there is a net
magnetic moment, viz magnetization. For .h = 0/.h > 0 the transition disorder-order is a sharp
transition/crossover. Right: The .T − h phase diagram. A sharp transition occurs only for vanishing
external fields h
Table 6.1 Examples of common types of phase transitions in physical systems. When the transition is continuous/discontinuous one speaks of a second-/first-order phase transition. Note that order parameters can be non-intuitive. The superconducting state, notable for its ability to carry electrical current without dissipation, breaks what one calls the U(1)-gauge invariance of the normal (non-superconducting) metallic state

Transition           Type                   Order parameter φ
Superconductivity    Second-order           U(1)-gauge
Magnetism            Mostly second-order    Magnetization
Ferroelectricity     Mostly second-order    Polarization
Bose–Einstein        Second-order           Amplitude of k = 0 state
Liquid–gas           First-order            Density
Fig. 6.2 Landau-Ginzburg theory for .t > 1 (dotted green line) and .t < 1 (dashed blue line),
corresponding respectively to the disordered and ordered phase, where a = (t − 1)/2. Left: The
functional dependence of .f (T , φ, h) − f0 (T , h) = −h φ + a φ 2 + b φ 4 , including the case .h = 0
(full red line). Right: For a non-vanishing field .h /= 0, the graphical solution of (6.5), where .φ0 is
the order parameter in the disordered phase, .φ1 and .φ3 the stable solutions in the ordered state, and
.φ2 the unstable solution for .t < 1
Free Energy A statistical mechanical system takes the configuration with the
lowest energy at zero temperature. At finite temperatures .T > 0, it is however
not the energy E which is minimized, but the free energy F, which differs from
the energy by a term proportional to the entropy and to the temperature. For the
following this difference is however not directly of relevance.
Close to the transition temperature .Tc , the order parameter .φ is small and one
assumes within the Landau-Ginzburg model that the free energy density,
$f = f(T, \phi, h), \qquad f = \frac{F}{V}$ ,

can be expanded for a small order parameter φ and a small external field h,

$f(T, \phi, h) = f_0(T, h) - h\,\phi + a\,\phi^2 + b\,\phi^4 + \ldots$   (6.1)
where the parameters .a = a(T ) and .b = b(T ) are functions of the temperature
T . The external field h, e.g. a magnetic field for the case of magnetic systems, is
presumed to couple linearly to the order parameter .φ. For (6.1) to be stable for large
.φ one needs .b > 0, compare Fig. 6.2.
$f(T, \phi, h) = f(T, -\phi, -h), \qquad \phi \leftrightarrow -\phi, \quad h \leftrightarrow -h$ ,
which is generically the case. Inversion symmetry is broken when the temperature
is lowered below .Tc , viz when the order parameter .φ acquires a finite expectation
value. This is the phenomenon of “spontaneous symmetry breaking”, which is
inherent to second-order phase transitions.1
$0 = \delta f = \left( -h + 2a\phi + 4b\,\phi^3 \right)\delta\phi$ ,   (6.2)
where .δf and .δφ denote small variations respectively of the free energy and of the
order parameter. The stationary condition (6.2) corresponds to a minimum of the
free energy if
$\delta^2 f > 0, \qquad \delta^2 f = \left( 2a + 12b\,\phi^2 \right)(\delta\phi)^2$ .   (6.3)
In this case the solution is “locally stable”, since any change in .φ from its optimal
value would increase the free energy.
Solutions for .h = 0 When there is no external field, for .h = 0, the solution of (6.2)
is
0 for a > 0
.φ = √ . (6.4)
± −a/(2b) for a < 0
The trivial solution φ = 0 is locally stable if a > 0. The nontrivial solutions are given by $\phi^2 = -a/(2b)$; they are stable for a < 0,

$\delta^2 f \big|_{\phi\neq 0} = -4a\,(\delta\phi)^2$ .
1 Spontaneous symmetry breaking is not present for first-order transitions, at which properties
change discontinuous, see Table 6.1.
Graphically this is immediately evident,2 see Fig. 6.2. For a > 0 there is a single global minimum at φ = 0, whereas one has two symmetric minima when a < 0.
Continuous Phase Transition Given (6.4), one has that the Ginzburg-Landau
functional (6.1) describes a continuous phase transition when .a = a(T ) changes
sign at the critical temperature .Tc . We may expand .a(T ) for small .T − Tc ,
a(T ) ∼ T − Tc ,
. a = a0 (t − 1), t = T /Tc , a0 > 0 ,
where we used .a(Tc ) = 0. For .T < Tc , in the ordered phase, the solution takes the
form
$\phi = \pm\sqrt{\frac{a_0}{2b}\,(1 - t)}, \qquad t < 1, \quad T < T_c$ .
Simplification by Rescaling One can always rescale the order parameter .φ, the
external field h and the free energy density f , such that .a0 = 1/2 and .b = 1/4. This
leads to
$a = \frac{t-1}{2}, \qquad f(T, \phi, h) - f_0(T, h) = -h\,\phi + \frac{t-1}{2}\,\phi^2 + \frac{1}{4}\,\phi^4$
and

$\phi = \pm\sqrt{1 - t}, \qquad t = T/T_c$ ,

for the order parameter at vanishing field. For a finite field h, the stationarity condition takes the form

$h = (t - 1)\,\phi + \phi^3 \equiv P(\phi)$ ,   (6.5)
as illustrated in Fig. 6.2. In general one finds three solutions .φ1 < φ2 < φ3 and one
can show that the intermediate solution is always locally unstable,3 with .φ3 (.φ1 )
being globally stable for .h > 0 (.h < 0).
2 From a dynamical systems point of view, the transition shown in Fig. 6.2 is equivalent to a pitchfork bifurcation, as detailed in Sect. 2.2.2 of Chap. 2.
3 The stability of the three solutions is treated in exercise (6.1).
First-Order Phase Transition We note, see Fig. 6.2, that the solution .φ3 for .h > 0
remains locally stable when we vary the external field slowly, viz adiabatically,
(h > 0)
. → (h = 0) → (h < 0) ,
in the ordered state .T < Tc . At a certain critical field, see Fig. 6.3, the order
parameter changes sign abruptly, jumping from the branch corresponding to .φ3 > 0
to the branch .φ1 < 0. One speaks of hysteresis, a phenomenon typical for first-order
phase transitions.
Susceptibility When the system approaches the phase transition from above in the
disordered state, it has an increased sensitivity towards ordering under the influence
of an external field h.
Diverging Response Taking the derivative with respect to the external field h
in (6.5), .h = (t − 1) φ + φ 3 , we find
$1 = \left[ (t-1) + 3\phi^2 \right] \frac{\partial\phi}{\partial h}, \qquad \chi(T) = \left.\frac{\partial\phi}{\partial h}\right|_{h\to 0} = \frac{1}{t-1} = \frac{T_c}{T - T_c}$   (6.6)
for the disordered phase, since .φ(h = 0) = 0 for .T > Tc . The susceptibility diverges
at the phase transition for .h = 0, see Fig. 6.3. This divergence is a typical precursor
of ordering for a second-order phase transition. Exactly at .Tc , viz at criticality, the
response of the system is formally infinite.
Fig. 6.3 Left: Discontinuous phase transition and hysteresis in the Landau model. Plotted is the
solution .φ = φ(h) of .h = (t − 1)φ + φ 3 in the ordered phase, viz for .t < 1, when changing the
field h. Right: The susceptibility χ = ∂φ/∂h for h = 0 (solid line) and h > 0 (dotted line). The susceptibility diverges in the absence of an external field, compare (6.6)
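The hysteresis loop of Fig. 6.3 can be reproduced with a few lines of code. The following Python sketch (parameters and function names are illustrative) ramps the field h adiabatically and lets the order parameter relax to a locally stable solution of (6.5).

# Hysteresis in the ordered phase, t < 1: the field h is ramped adiabatically and
# phi follows a locally stable solution of h = (t-1)*phi + phi^3, obtained here by
# relaxing the rescaled free energy f = -h*phi + (t-1)/2*phi^2 + 1/4*phi^4.
def relax(phi, h, t, steps=5000, eta=0.02):
    """Gradient descent in phi; converges to a local minimum of f."""
    for _ in range(steps):
        phi -= eta * (-h + (t - 1.0) * phi + phi ** 3)
    return phi

t = 0.5                                   # t = T/Tc < 1, ordered phase
hs = [i * 0.01 for i in range(-40, 41)]   # field values from -0.4 to 0.4

phi, branch_down = relax(+1.0, hs[-1], t), {}
for h in reversed(hs):                    # sweep the field downwards
    phi = relax(phi, h, t)
    branch_down[round(h, 3)] = phi

phi, branch_up = relax(-1.0, hs[0], t), {}
for h in hs:                              # sweep the field upwards
    phi = relax(phi, h, t)
    branch_up[round(h, 3)] = phi

print("phi(h=0) on downward sweep:", branch_down[0.0])
print("phi(h=0) on upward   sweep:", branch_up[0.0])
# The two values differ in sign: the order parameter depends on the history of the field.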
Length Scales Physical or complex systems normally dispose of well defined time and space scales. As an example we take a look at the Schrödinger equation for the hydrogen atom,

$i\hbar\, \partial_t \Psi = H\,\Psi, \qquad H = -\frac{\hbar^2}{2m}\,\Delta - \frac{Ze^2}{|\mathbf r|}$ ,

where
$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$
is the Laplace operator. One does not need to know the physical significance of the
parameters to realize that one can rewrite the differential operator H , namely the
“Hamilton operator”, as
$H = -E_R \left( a_0^2\,\Delta + \frac{2a_0}{|\mathbf r|} \right), \qquad E_R = \frac{m Z^2 e^4}{2\hbar^2}, \quad a_0 = \frac{\hbar^2}{m Z e^2}$ .
The length scale $a_0 = 0.53\,$Å$/Z$ is called the “Bohr radius”, with the energy scale $E_R = 13.6\,$eV being the “Rydberg energy”. The energy scale $E_R$ determines both the ground-state energy and the excitation energies. The Bohr radius $a_0$ sets the scale for the mean radius of the ground-state wavefunction and all other radius-dependent properties. Similar length scales can be defined for essentially all dynamical systems defined by a set of differential equations. The damped harmonic oscillator and the diffusion
Similar length scales can be defined for essentially all dynamical systems defined
by a set of differential equations. The damped harmonic oscillator and the diffusion
$\ddot x(t) + \gamma\,\dot x(t) + \omega^2 x(t) = 0, \qquad \frac{\partial\rho(t, \mathbf r)}{\partial t} = D\,\Delta\rho(t, \mathbf r)$ .   (6.7)
The parameters 1/γ and 1/ω determine respectively the time scales for relaxation
and oscillation, with D being the diffusion constant.
Here ρ(t0 , x) denotes the particle density, for the case of the diffusion equation
or when considering a statistical mechanical system of interacting particles. The
exact expression for ρ(t0 , x) in general depends on the type of dynamical system
considered, for the Schrödinger equation ρ(t, x) = Ψ ∗ (t, x)Ψ (t, x), i.e. the
probability to find the particle at time t at the point x.
In normal, viz non-critical systems, correlations over arbitrary large distances cannot
be built up, which implies that the correlation function decays exponentially, over
a scale determined by the correlation length ξ . The notation d − 2 + η > 0 for the
decay exponent of the critical system is a convention from statistical physics, where
d = 1, 2, 3, . . . is the dimensionality of the embedding system.
SCALE INVARIANCE If a measurable quantity, like the correlation function, decays like a
power of the distance ∼ (1/r)δ , with a critical exponent δ, the system is said to be critical
or scale-invariant.
Fig. 6.4 Simulation of the 2D-Ising model on a square lattice, $H = -\sum_{\langle i,j\rangle} \sigma_i \sigma_j$, with nearest-neighbor interactions ⟨i, j⟩. There are two magnetization orientations σi = ±1 (dark/light dots). Shown is the ordered state T < Tc (left), the critical state T ≈ Tc (middle), and the disordered state T > Tc (right). Note the occurrence of fluctuations at all length scales at criticality, reflecting self-similarity
Scale invariance implies that fluctuations occur over all length scales, albeit
with varying probabilities. This can be seen by observing snapshots of statistical
mechanical simulations of simple models. The case of the two-dimensional Ising
model is given in Fig. 6.4.
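Snapshots like those of Fig. 6.4 can be generated with a straightforward Metropolis simulation. The sketch below is a minimal, illustrative implementation; lattice size, sweep numbers and the initial condition are arbitrary choices.

# Metropolis simulation of the 2D Ising model, H = -J * sum_<i,j> s_i s_j.
import random, math

L, J = 32, 1.0

def sweep(spins, T):
    """One Metropolis sweep over the L x L lattice with periodic boundaries."""
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
              spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2.0 * J * spins[i][j] * nb            # energy cost of flipping spin (i, j)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            spins[i][j] *= -1

Tc = 2.0 * J / math.log(1.0 + math.sqrt(2.0))      # exact critical temperature, about 2.269 J
for T in (0.5 * Tc, Tc, 2.0 * Tc):
    spins = [[1] * L for _ in range(L)]            # ordered initial state
    for _ in range(300):
        sweep(spins, T)
    m = abs(sum(map(sum, spins))) / (L * L)
    print(f"T/Tc = {T / Tc:3.1f}   |magnetization per site| = {m:.3f}")
# Far below Tc the magnetization stays close to one, far above it becomes small;
# close to Tc fluctuations appear on all length scales, as in Fig. 6.4.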
The scale invariance of the correlation function at criticality is a central result of
the theory of phase transitions within statistical physics. Properties of systems close
to a phase transition are not determined by the exact values of their parameters, but
by the structure of the governing equations and their symmetries. This circumstance
is denoted “universality”, it constitutes one of the reasons for classifying phase
transitions according to the symmetry of their order parameters, see Table 6.1.
4 Power laws in terms of scale-free degree distributions are a cornerstone of network theory, please
consult Sect. 1.6 of Chap. 1.
which can be defined for any time-dependent measurable quantity A, such as $A(t) = \rho(t, \vec r\,)$. Note that observations are defined relative to $\langle A\rangle^2$, viz relative to the mean
fluctuations. The denominator in (6.10) is a normalization convention, it leads to
Γ (0) ≡ 1.
$\Gamma(t) \sim e^{-t/\tau}, \qquad t \to \infty$ ,   (6.11)

with the correlation time τ diverging at the critical point,

$\tau \sim \xi^z, \qquad \xi = \frac{1}{|T - T_c|^\nu}, \qquad \xi \xrightarrow{\;T\to T_c\;} \infty$ .
It may then come as a surprise that there should exist complex dynamical
systems that attain a critical state for a finite range of parameters. This possibility,
denoted “self-organized criticality”, is somewhat counter-intuitive. We can regard
the parameters entering the evolution equation as given externally. Self-organized
criticality then implies that the system adapts to changes in the external parameters,
e.g. to changes in the given time and length scales, in such a way that the stationary
state becomes effectively independent of those changes.
1/f Noise The power spectrum of the noise generated by many real-world dynamical processes falls off inversely with the frequency f. This 1/f noise has been
observed for various biological activities, like the heart beat rhythms, for functioning
electrical devices or for meteorological data series. The origin of “pink noise”,
another word for a .1/f power spectrum, could be self-organized phenomena. Within
this view, one describes pink noise as being generated by a continuum of weakly
coupled damped oscillators representing the environment.
For large frequencies .ω ⪢ 1/τ , the power spectrum .S(ω, τ ) falls off like .1/ω2 .
Being interested in the large-f behavior, we neglect .ω0 in the following.
One postulates a scale-invariant distribution,

$D(\tau) \approx \frac{1}{\tau^\alpha}$ ,   (6.12)
for the relaxation times .τ . This distribution of relaxation times leads to a frequency
spectrum
τ τ 1−α
S(ω) =
. dτ D(τ ) ∼ dτ
1 + (τ ω)2 1 + (τ ω)2
1 (ωτ )1−α 1
= d(ωτ ) ∼ 2−α . (6.13)
ω ω1−α 1 + (τ ω) 2 ω
The question is then how assumption (6.12) can be justified. The wide-spread
appearance of .1/f noise can only happen when scale-invariant distribution of
relaxation times are ubiquitous, viz if they were self-organized. The .1/f noise
therefore constitutes an interesting motivation for the search of possible mechanisms
leading to self-organized criticality.
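The scaling (6.13) is easily verified numerically. The sketch below superposes Lorentzian spectra with relaxation times drawn from D(τ) ~ τ^(−α) and extracts the effective exponent of S(ω); integration range and grid are illustrative choices.

# Superposition of Lorentzian spectra with power-law distributed relaxation times.
import math

def spectrum(omega, alpha, tau_min=1e-3, tau_max=1e3, n=100000):
    """S(omega) = int dtau tau^(-alpha) * tau / (1 + (tau*omega)^2), on a log-spaced grid."""
    s, log_min, log_max = 0.0, math.log(tau_min), math.log(tau_max)
    dlog = (log_max - log_min) / n
    for i in range(n):
        tau = math.exp(log_min + (i + 0.5) * dlog)
        s += tau ** (-alpha) * tau / (1.0 + (tau * omega) ** 2) * tau * dlog  # dtau = tau*dlog
    return s

alpha = 1.0                       # alpha = 1 corresponds to pure 1/f (pink) noise
for omega in (0.1, 1.0, 10.0):
    s1, s2 = spectrum(omega, alpha), spectrum(2.0 * omega, alpha)
    print(f"omega={omega:5.1f}   effective exponent = {math.log(s1 / s2) / math.log(2.0):.3f}")
# The measured exponent is close to 2 - alpha = 1 for frequencies well inside the
# window 1/tau_max << omega << 1/tau_min.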
Cellular automata are finite-state lattice systems with discrete local update rules,

$z_i(t+1) = f\big(z_i(t), \{ z_{i+\delta}(t) \}\big)$ ,

where i + δ denote neighboring sites of site i. Each site or “cell” of the lattice follows a prescribed rule evolving in discrete time steps. At each step the new value for a cell depends only on the current state of itself and on the state of its neighbors.
Cellular automata differ from general dynamical networks in two aspects.5
– The update functions are all identical: .fi () ≡ f (), viz they are translational
invariant.
– The number n of states per cell is usually larger than two, with n = 2 corresponding to the boolean case.
Cellular automata can give rise to exceedingly complex behavior despite their
deceptively simple dynamical structure. We note that cellular automata are always
updated synchronously and never sequentially or randomly. The state of all cells is
renewed simultaneously.
zi = 0 (dead),
. zi = 1 (alive) .
For a basic type of rules the evolution of a given cell to the next time step depends only on the current state of the cell and on the values of each of its eight nearest neighbors. In this case there are

$2^{2^9} = 2^{512} \approx 10^{154}$

possible update rules, since any one of the $2^9 = 512$ configurations can be mapped independently to ‘live’ or ‘dead’. For comparison note that the age of the universe is of the order of $3\times 10^{17}$ seconds.
Totalistic Update Rules It clearly does not make sense to explore systematically
the consequences of arbitrary updating rules. One simplification is to consider a
mean field approximation that results in a subset of rules termed “totalistic”. For
mean field rules, the new state of a cell depends only on the total number of living
neighbors, besides its own state. An eight-cell neighborhood has $2 \times 9 = 18$ distinct input configurations (the own state times the number 0, …, 8 of living neighbors) and hence

$2^{18} = 262144$

totalistic update rules. This is a large number, but it is exponentially smaller than the number of all possible update rules for the same neighborhood.
The “game of life” takes its name because it attempts to simulate the reproductive
cycle of a species. It is formulated on a square lattice with an update rule involving
the eight-cell neighborhood. A new offspring needs exactly three parents in its
neighborhood. A living cell dies of loneliness if it has less than two live neighbors,
and of overcrowding if it has more than three live neighbors. A living cell feels
comfortable with two or three live neighbors; in this case it survives. The complete
set of updating rules is listed in Table 6.2.
Living Isolated Sets The time evolution of an initial cluster of living cells can show varied types of behavior. Fixpoints of the updating rules, such as a square

$\{(0, 0), (1, 0), (0, 1), (1, 1)\}$

of four neighboring live cells, survive unaltered. There are many configurations of living cells which oscillate, such as three live cells in a row or column,

$\{(-1, 0), (0, 0), (1, 0)\}, \qquad \{(0, -1), (0, 0), (0, 1)\}$ .
Table 6.2 Updating rules for the game of life, with zi = 0, 1 corresponding to empty and living cells. An “x” as an entry denotes what is going to happen for the respective number of living neighbors

                          Number of living neighbors
zi(t)   zi(t+1)     0    1    2    3    4...8
0       1                          x
        0           x    x    x         x
1       1                     x    x
        0           x    x              x
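A minimal implementation of the rules of Table 6.2 may look as follows; the lattice size and the propagated glider configuration are illustrative choices.

# Game of life update (Table 6.2) on a small periodic lattice, propagating a glider.
def step(grid):
    L = len(grid)
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            n = sum(grid[(i + di) % L][(j + dj) % L]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))                 # eight-cell neighborhood
            if grid[i][j] == 1:
                new[i][j] = 1 if n in (2, 3) else 0        # survival with 2 or 3 live neighbors
            else:
                new[i][j] = 1 if n == 3 else 0             # birth with exactly 3 parents
    return new

L = 12
grid = [[0] * L for _ in range(L)]
for (i, j) in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:    # a glider
    grid[i][j] = 1
for t in range(8):                                         # two glider periods
    grid = step(grid)
print("live cells after 8 steps:",
      sorted((i, j) for i in range(L) for j in range(L) if grid[i][j]))
# After four steps the glider reproduces itself, shifted by one cell diagonally.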
Fig. 6.5 Time evolution of some living configurations for the game of life, see Table 6.2. (a) The
“block”; it quietly survives. (b) The “blinker”; it oscillates with period 2. (c) The “glider”; it shifts
by (.−1, 1) after four time steps
A particular configuration of five living cells is dubbed the “glider”, since it returns to its initial shape after four time steps but is displaced by (−1, 1), see Fig. 6.5. It constitutes a fixpoint of $f(f(f(f(\cdot))))$ times the translation by (−1, 1). The glider continues to propagate until it encounters a cluster of other living cells.
The forest fire model is a cellular automaton with three states per cell,

$z_i = 0 \ \text{(empty)}, \qquad z_i = 1 \ \text{(tree)}, \qquad z_i = 2 \ \text{(fire)}$ .
Fig. 6.6 For the forest fire model, the time evolution (from left to right) of a configuration of living trees (green), burning trees (red) and of places burnt down (grey). Places burnt down can regrow spontaneously with a small rate; the fire always spreads to nearest neighboring trees
A tree sapling can grow on every empty cell, with probability p < 1. There is no need for nearby parent trees, as seeds are carried by the wind over wide distances. Trees do not die in this model, but they catch fire from any burning nearest-neighbor tree. The rules are illustrated in Fig. 6.6.
The forest fire automaton differs from typical rules, such as Conway’s game of
life, because it has a stochastic component. In order to have an interesting dynamics
one needs to adjust the growth rate p as a function of system size, so as to keep the
fire burning continuously. The fires burn down the whole forest when trees grow too
fast. When the growth rate is too low, on the other hand, the fires, being surrounded
by ashes, may die out completely.
zi(t)    zi(t+1)    Condition
Empty    Tree       With probability p < 1
Tree     Tree       No fire close by
Tree     Fire       At least one fire close by
Fire     Empty      Always
When adjusting the growth rate properly, one reaches a steady state, the system
having fire fronts continually sweeping through the forest, as it is observed for
real-world forest fires, see Fig. 6.7 for an illustration. In large systems stable spiral
structures form and set up a steady rotation.
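A minimal simulation of these rules (without lightning) may look as follows; all parameters are illustrative and not tuned.

# Forest fire automaton: empty -> tree with probability p, trees next to a fire
# catch fire, burning trees become empty.
import random

EMPTY, TREE, FIRE = 0, 1, 2
L, p = 60, 0.02
random.seed(1)
grid = [[TREE if random.random() < 0.5 else EMPTY for _ in range(L)] for _ in range(L)]
grid[L // 2][L // 2] = FIRE                                 # ignite the centre

def step(grid):
    new = [row[:] for row in grid]
    for i in range(L):
        for j in range(L):
            if grid[i][j] == FIRE:
                new[i][j] = EMPTY                           # burnt down
            elif grid[i][j] == TREE:
                neighbours = (grid[(i + 1) % L][j], grid[(i - 1) % L][j],
                              grid[i][(j + 1) % L], grid[i][(j - 1) % L])
                if FIRE in neighbours:
                    new[i][j] = FIRE                        # fire spreads to nearest neighbors
            elif random.random() < p:
                new[i][j] = TREE                            # regrowth on empty cells
    return new

for t in range(200):
    grid = step(grid)
counts = [sum(row.count(state) for row in grid) for state in (EMPTY, TREE, FIRE)]
print("empty / trees / fires after 200 steps:", counts)
# For suitably tuned p, fire fronts keep sweeping through the forest indefinitely.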
Criticality and Lightning As defined above, the forest fire model is not critical,
since the characteristic time scale .1/p for the regrowth of trees governs the
dynamics. This time scale translates into a characteristic length scale .1/p, which
can be observed in Fig. 6.7, via the propagation rule for the fire.
Self-organized criticality can, however, be induced in the forest fire model when
introducing an additional rule, namely that a tree might ignite spontaneously with a
small probability f , when struck by lightning, causing also small patches of forest
Fig. 6.7 Simulations of the forest fire model. Left: Fires burn in characteristic spirals for a growth
probability .p = 0.005 and no lightning, .f = 0. Reprinted from Clar et al. (1996) with permissions,
© 1996 IOP Publishing Ltd. Right: A snapshot of the forest fire model with a growth probability
.p = 0.06 and a lightning probability .f = 0.0001. Note the characteristic fire fronts with trees in
front and ashes behind
to burn. We will not discuss this mechanism in detail here, treating instead in the
next section the occurrence of self-organized criticality in the sandpile model on a
firm mathematical basis.
Self-Organized Criticality The notion of ‘life at the edge of chaos’ states,6 that
certain dynamical and organizational aspects of living organisms may be critical.
Normal physical and dynamical systems show criticality however only for selected
parameters, e.g. .T = Tc , see Sect. 6.1. For criticality to be biologically relevant, the
system must evolve into a critical state starting from a wide range of initial states,
the defining trait of “self-organized criticality”.
Sandpile Model Per Bak and coworkers introduced a simple cellular automaton,
the BTW model, that mimics the properties of sandpiles. Every cell is characterized
by a force
zi = z(i, j ) = 0, 1, 2, . . . ,
. i, j = 1, . . . , L
6 The concept of life at the edge of chaos was first developed in the context of information-processing
$z_j \to z_j - \Delta_{ij} \qquad \text{if } z_i > K, \qquad \Delta_{ij} = \begin{cases} 4 & i = j \\ -1 & i, j \text{ nearest neighbors} \\ 0 & \text{otherwise} \end{cases}$

This update rule is valid for the four-cell neighborhood {(0, ±1), (±1, 0)}.
threshold K is arbitrary, a shift in K simply shifts zi . It is customary to consider
K = 3. Any initial random configuration will eventually relax into a steady-state
with
zi = 0, 1, 2, 3,
. (stable state) .
CONSERVING QUANTITY If there is a quantity that is not changed by the update rule, it
is said to be conserving.
The sandpile model is locally conserving and hence said to be “abelian”. The total height $\sum_j z_j$ is constant due to $\sum_j \Delta_{i,j} = 0$. Globally, however, it is not conserving, as one uses open boundaries, at which excess sand is lost, toppling down from the table. Sand is lost when a site at the boundary topples.
Avalanches When starting from a random initial state with $z_i \le K$, the system settles into a dynamical steady state when adding “grains of sand” for a while. When a grain is dropped onto a site with $z_i = K$,

$z_i \to z_i + 1, \qquad z_i = K$ ,
Fig. 6.8 The progress of an avalanche, with duration t = 3 and size s = 13, for a sandpile
configuration on a 5 × 5 lattice with K = 3. The height of the sand in each cell is indicated by the
numbers. Also shown is avalanche progression (shaded region). The avalanche stops after step 3
a toppling event is induced, which may in turn lead to a whole series of topplings.
The resulting avalanche is characterized by its duration t and the size s of affected
sites. It continues until a new stable configuration is reached. In Fig. 6.8 a small
avalanche is shown.
Distribution of Avalanches We define with D(s) and D(t) the distributions of the
size and of the duration of avalanches. One finds that they are scale free,
$D(s) \sim s^{-\alpha_s}, \qquad D(t) \sim t^{-\alpha_t}$ ,   (6.16)
as we will discuss in the next section. The scaling relations (6.16) express the
essence of self-organized criticality, which we expect to be valid for a wide range
of cellular automata with conserving dynamics, independent of the special values of
the parameters entering the respective update functions. Numerical simulations and
analytic approximations for d = 2 dimensions yield
$\alpha_s \approx \frac{5}{4}, \qquad \alpha_t \approx \frac{3}{4}$ .
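The following sketch simulates the BTW model and records avalanche sizes, here measured as the number of topplings, a common convention; system size and the number of added grains are illustrative.

# BTW sandpile on an L x L lattice with open boundaries and threshold K = 3.
import random
from collections import Counter

L, K = 30, 3
z = [[random.randint(0, K) for _ in range(L)] for _ in range(L)]

def add_grain():
    """Drop one grain and relax; returns the avalanche size (total topplings)."""
    i, j = random.randrange(L), random.randrange(L)
    z[i][j] += 1
    unstable, size = [(i, j)], 0
    while unstable:
        i, j = unstable.pop()
        if z[i][j] <= K:
            continue
        z[i][j] -= 4                                  # toppling; sand at the edge is lost
        size += 1
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < L and 0 <= nj < L:
                z[ni][nj] += 1
                if z[ni][nj] > K:
                    unstable.append((ni, nj))
        if z[i][j] > K:
            unstable.append((i, j))
    return size

sizes = Counter()
for _ in range(20000):
    sizes[add_grain()] += 1
for s in (1, 2, 4, 8, 16, 32, 64):
    print(f"avalanches of size {s:3d}: {sizes[s]}")
# In the self-organized critical steady state the histogram follows D(s) ~ s^(-alpha_s).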
Conserving Dynamics and Self-Organized Criticality Given that toppling events
are locally conserving, avalanches of arbitrary large sizes must occur, as sand can
be lost only at the boundary of the system. Vice versa, only avalanches reaching the
boundary contribute to the powerlaw scaling expressed by (6.16). Self-organized
criticality breaks down as soon as there is a small but non-vanishing probability to
lose sand somewhere inside the system.
Features of the Critical State The empty board, when all cells are initially empty, $z_i \equiv 0$, is not critical. The system remains in the frozen phase when adding sand, as long as most $z_i < K$. Adding one grain of sand after the other, the critical state is slowly approached. There is no way to avoid the critical state.
Once the critical state is achieved the system remains critical. The critical state is
paradoxically also the point at which the system is dynamically most unstable. It has
an unlimited susceptibility to the external driving, viz to the addition of grains, using
the terminology of Sect. 6.1, as a single added grain of sand can trip avalanches of
arbitrary size.
– All topplings will stop eventually whenever the average number of grains is too
small. The resulting inactive configuration is called “absorbing state”.
– For a large average number of grains, the redistribution of grains will never terminate, resulting in a continuously active state.
Adding externally a single grain to an absorbing state will lead at most to a single avalanche, with the transient activity terminating in another absorbing state. In this picture, the grain of sand added has been absorbed.
is characterized by the mean number of sites with heights $z_i$ greater than the threshold K, the active sites. The avalanche shown in Fig. 6.8 has, to give an example, 1/2/2/0 active sites at time steps 1/2/3/4. The transition from the absorbing to the active state is of second order, as illustrated in Fig. 6.9, with the number of active sites acting as an order parameter.8
7 Generically, absorbing states are final configurations of Markov chains, as discussed in Sect. 3.3.2
of Chap. 3.
8 A fully self consistent mean-field theory of absorbing phase transitions in the presence of
Fig. 6.9 Changing the mean particle density ρ, an absorbing phase transition may occur, with the density of active sites acting as an order parameter. The system self-organizes towards the critical particle density ρc through the balancing of a slow external drive, realized by adding grains in the sandpile model, and a fast internal dissipative process, when losing grains of sand at the boundaries
number, driving the system towards criticality from below, as shown in Fig. 6.9.
Particles surpassing the critical density are instantaneously dissipated through large
avalanches reaching the boundaries of the systems, the mean particle density is
hence pinned at criticality.
Branching theory deals with the growth of networks via branching. Networks
generated by branching processes are loopless; they typically arise in theories of
evolutionary processes.
Avalanches have an intrinsic relation to branching processes; at every time step the
avalanche can either continue or stop. Random branching theory is hence a suitable
method for studying self-organized criticality.
time 0:   $z_i \to z_i - 4, \qquad z_j \to z_j + 1$
time 1:   $z_i \to z_i + 1, \qquad z_j \to z_j - 4$
Fig. 6.10 Branching processes. Left: The two possible processes of order .n = 1. Right: A
generic process of order .n = 3 with an avalanche of size .s = 7
when two neighboring cells i and j initially have .zi = K + 1 and .zj = K.
This implies that an avalanche typically intersects with itself. However, for a d-
dimensional hypercubic lattice with .K = 2d − 1, self-interaction of avalanches
becomes unimportant when .1/d → 0. In large dimensions, avalanches can be
mapped rigorously to a random branching process.9
For p < 1/2 the number of new active sites decreases on average, with avalanches dying out; $p_c = 1/2$ is the critical state with, on average, conserving dynamics. See Fig. 6.10 for some examples of branching processes.
Small Avalanches For small s and large n one can evaluate the probability for
small avalanches to occur by hand,
$P_n(1, p) = 1 - p, \qquad P_n(3, p) = p\,(1-p)^2, \qquad P_n(5, p) = 2p^2(1-p)^3$ ,
compare Figs. 6.10 and 6.11. Note that .Pn (1, p) is the probability to find an
avalanche of just one site.
the generating functional $f_n(x, p) = \sum_s P_n(s, p)\, x^s$ of Eq. (6.17) for the probability distribution $P_n(s, p)$. We note that

$P_n(s, p) = \frac{1}{s!} \left. \frac{\partial^s f_n(x, p)}{\partial x^s} \right|_{x=0}, \qquad n, p \ \text{fixed} .$   (6.18)
10 For an introduction to generating functions for probability distributions we refer to Sect. 1.3.2 of Chap. 1.
is valid. To see why, one builds the branching network backwards, adding a site at
the top.
for the steady-state generating functional .f (x, p). The normalization condition
$f(1, p) = \frac{1 - \sqrt{1 - 4p(1-p)}}{2p} = \frac{1 - \sqrt{(1 - 2p)^2}}{2p} = 1$

is fulfilled for $p \in [0, 1/2]$. For $p > 1/2$ the last step in the above equation would not be correct.
Comparing this with the definition of the generating functional (6.17) we note that
s = 2k − 1, which is equivalent to .k = (s + 1)/2. This implies
.
$P(s, p) \sim \frac{1}{p}\, 4p(1-p)\, \big[4p(1-p)\big]^{s/2} \sim e^{-s/s_c(p)}$   (6.21)

for the probability to find an avalanche of size s, where we have used the relation

$a^{s/2} = e^{\ln(a^{s/2})} = e^{-s\,(\ln a)/(-2)}, \qquad a = 4p(1-p)$
$s_c(p) = \frac{-2}{\ln\big[4p(1-p)\big]}, \qquad \lim_{p\to 1/2} s_c(p) \to \infty$
of the avalanche correlation length .sc (p). For .p < 1/2, the correlation length .sc (p)
is finite and the avalanche is consequently not scale-free, compare Sect. 6.2. The
characteristic size of an avalanche .sc (p) diverges for .p → pc = 1/2. Note that
.sc (p) > 0 for .p ∈]0, 1[.
This expression is still unwieldy. However, we are only interested in the asymptotic behavior for large avalanche sizes s. For this purpose we consider the recursion relation

$P_c(k+1) = \frac{1/2 - k}{k + 1}\,(-1)\, P_c(k) = \frac{1 - 1/(2k)}{1 + 1/k}\, P_c(k)$
We denote with

$Q_n(\sigma, p)$

the probability for an avalanche of duration n to have σ cells at the boundary, see Fig. 6.11. One can derive a recursion relation for the corresponding generating functional, in analogy to (6.19), and solve it self-consistently.11 One finds

$Q_n \sim n^{-2}, \qquad D(t) \sim t^{-2}, \qquad \alpha_t = 2$ .   (6.23)
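The branching picture is easy to simulate directly. The sketch below generates binary branching avalanches at p = 1/2 and estimates the size and duration distributions; the cutoff and sample sizes are illustrative.

# Critical binary branching process: each active site branches into two with
# probability p and dies otherwise; size and duration of avalanches are recorded.
import random
from collections import Counter

def avalanche(p, max_size=10000):
    """Size (sites) and duration (generations) of one avalanche from a single active site."""
    active, size, duration = 1, 1, 0
    while active > 0 and size < max_size:
        duration += 1
        offspring = sum(2 for _ in range(active) if random.random() < p)
        size += offspring
        active = offspring
    return size, duration

random.seed(0)
runs = 50000
sizes, durations = Counter(), Counter()
for _ in range(runs):
    s, t = avalanche(0.5)               # p_c = 1/2, critical branching
    sizes[s] += 1
    durations[t] += 1
for s in (1, 3, 5, 7, 15, 31):
    print(f"P(size={s:3d}) ~ {sizes[s] / runs:.5f}")
for t in (1, 2, 4, 8, 16):
    print(f"Q(duration={t:3d}) ~ {durations[t] / runs:.5f}")
# Expected: P(1)=1/2, P(3)=1/8, P(5)=1/16, with the tails approaching the power
# laws s^(-3/2) and n^(-2) of (6.22) and (6.23).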
$z_j \to z_j - \Delta_{ij}, \qquad \text{if } z_i > K$ ,

and

$\Delta_{i,j} = \begin{cases} K & i = j \\ -c\, K/4 & i, j \text{ nearest neighbors} \\ -(1-c)\, K/8 & i, j \text{ next-nearest neighbors} \\ 0 & \text{otherwise} \end{cases}$   (6.24)
Fig. 6.12 Galton–Watson processes. Left: Example of a reproduction tree, with pm being the probabilities of having m = 0, 1, . . . offspring. Right: Graphical solution of the fixpoint equation (6.27), for various average numbers of offspring W
for a square-lattice with four nearest neighbors and eight next-nearest neighbors.
The update rules are conserving,

$\sum_j \Delta_{ij} = 0, \qquad \forall\, c \in [0, 1]$ .
History of Family Names Family names are handed down traditionally from
father to son. Family names regularly die out, leading over the course of time
to a substantial reduction of the pool of family names. This effect is especially
pronounced in countries looking back on millennia of cultural continuity, like China, where 22% of the population share only three family names.
The evolution of family names is described by a Galton–Watson process, with a
key quantity of interest being the extinction probability, viz the probability that the
last person bearing a given family name dies without descendants.
We denote with $p_m$ the probability that an individual has m offspring and with $p_D^{(n)}$ the probability of finding a total of D descendants in the n-th generation. The respective generating functions,

$G_0(x) = \sum_m p_m\, x^m, \qquad G^{(n)}(x) = \sum_D p_D^{(n)}\, x^D$ ,
obey the recursion relation

$G^{(n)}(x) = G_0\big(G^{(n-1)}(x)\big)$ ,   (6.25)

which can be solved together with the starting point $G^{(0)}(x) = x$. This representation is the basis for all further considerations; we consider here the extinction probability q.
Extinction Probability The reproduction process dies out when there is a genera-
tion with zero members. The probability of having zero persons bearing a specific
family name in the n-th generation is

$q = p_0^{(n)} = G^{(n)}(0) = G_0\big(G^{(n-1)}(0)\big) = G_0(q)$ ,   (6.26)
where we have used the recursion relation (6.25) and the stationarity condition
.G(n) (0) ≈ G(n−1) (0). The extinction probability q is hence given by the fixpoint
.q = G0 (q) of the generating functional .G0 (x) of the reproduction probability.
12 See Chap. 8.
with the smaller root being here of relevance. The extinction probability vanishes
for a reproduction rate of two,
$q(W) = \begin{cases} 0 & W = 2 \\ q \in\, ]0, 1[ & 1 < W < 2 \\ 1 & W \le 1 \end{cases}$
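The extinction probability can be obtained by simply iterating the fixpoint condition (6.26). The sketch below assumes, for illustration, a binary reproduction process with p_0 = 1 − W/2 and p_2 = W/2, so that W is the average number of offspring.

# Extinction probability via fixpoint iteration q -> G_0(q) = p_0 + p_2 * q^2.
def extinction_probability(W, iterations=10000):
    p0, p2 = 1.0 - W / 2.0, W / 2.0
    q = 0.5                                  # any starting value in [0, 1)
    for _ in range(iterations):
        q = p0 + p2 * q * q
    return q

for W in (0.5, 1.0, 1.5, 1.8, 2.0):
    print(f"W = {W:.1f}   extinction probability q = {extinction_probability(W):.4f}")
# For W <= 1 the iteration converges to q = 1, for 1 < W < 2 to a value in ]0,1[,
# and only for W = 2 does the extinction probability vanish.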
Fitness Landscapes Evolution deals with the adaption of species and their fitness
relative to the ecosystem they live in.
FITNESS LANDSCAPES The function that determines the chances of survival of a species,
its fitness, is the fitness landscape.
In Fig. 6.13 a simple fitness landscape is illustrated, in which there is only one
dimension in the genotype and/or phenotype space.13
Due to the presence of fitness barriers between adjacent local peaks, the
population will spend most of its time in a local fitness maximum whenever the
mutation rate is low with respect to the selection rate, see Fig. 6.13. Mutations
are random processes that induce evolutionary transitions from one local fitness
maximum to the next through stochastic escape.14
13 The term “genotype” denotes the ensemble of genes. The actual form of an organism, the
“phenotype”, is determined by the genotype plus environmental factors, like food supply during
growth.
14 Stochastic escape is discussed in Sect. 3.5.2 of Chap. 3.
Fig. 6.13 Schematic fitness landscape, with the species fitness plotted as a function of the genotype. A species evolving from an adaptive peak P to a new adaptive peak Q needs to overcome the fitness barrier B
such as average rainfall and temperature, and biological influences, namely the
properties and actions of the constituting species. The evolutionary progress of
one species will therefore be likely to trigger adaption processes in other species
appertaining to the same ecosystem, a process denoted “coevolution”.
Evolutionary Time Scales In the model of Bak and Sneppen there are no explicit
fitness landscapes like the one illustrated in Fig. 6.13. Instead, the model proposes
that a single number, the “fitness barrier”, can be used as a proxy for the influence of
all other species making up the ecosystem. The time needed for a stochastic escape
from one local fitness optimum increases exponentially with barrier height. We may
therefore assume that the average time t it takes to mutate across a fitness barrier of
height B scales as
$t = t_0\, e^{B/T}$ ,   (6.28)
where .t0 and T are appropriate constants. The value of .t0 is arbitrary, as it merely
sets the time scale. The parameter T depends on the mutation rate, and the assumption that the mutation rate is low implies that T is small compared to the typical barrier heights B in the landscape. In this case the time scales t for crossing slightly
different barriers are distributed over many orders of magnitude and only the lowest
barrier is relevant.
for its further evolution. The initial .Bi (0) are drawn randomly from .[0, 1]. The
evolutionary dynamics generated by the model consists of the repetition of two
steps.
Fig. 6.14 The barrier values (dots) for a 100 species one-dimensional Bak–Sneppen model with
.K = 2 after 50, 200 and 1600 steps of a simulation. Included in each frame is the approximate
position of the upper edge of the “gap” (horizontal line). A few species have barriers below this
level, indicating that they were involved in an avalanche at the moment when the snapshot of the
system was taken
– The times for a stochastic escape are exponentially distributed, see (6.28). It
is therefore reasonable to assume that the species with the lowest barrier .Bi
mutates and escapes first. After escaping, it will adapt quickly to a new local
fitness maximum and acquire a new barrier for mutation, which is assumed to be
uniformly distributed in .[0, 1].
– The fitness function for a species i is determined by the ecological environment
it lives in, which is made up of the other species. When a given species
mutates it therefore influences the fitness landscape for a certain number of
other species. Within the Bak–Sneppen model this translates into assigning new
random barriers .Bj for .K − 1 neighbors of the mutating species i.
With these two rules, the Bak–Sneppen model tries to capture two essential ingre-
dients of long-term evolution: The exponential distribution of successful mutations
and the interaction of species when one constituting species evolves.
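A direct simulation of these two rules is straightforward. The sketch below implements a one-dimensional version with K = 2 (parameters are illustrative) and checks that barriers below roughly 1/K become rare in the steady state.

# One-dimensional Bak-Sneppen model: the species with the lowest barrier mutates,
# and K-1 neighbors receive new random barriers as well.
import random

N, K, steps = 100, 2, 50000
random.seed(3)
barriers = [random.random() for _ in range(N)]
for _ in range(steps):
    i = min(range(N), key=lambda k: barriers[k])       # species with the lowest barrier
    barriers[i] = random.random()                      # it mutates and gets a new barrier
    for n in range(K - 1):                             # K-1 neighbors are also affected
        j = (i + 1 + n) % N
        barriers[j] = random.random()

below = sum(1 for b in barriers if b < 1.0 / K)
print(f"fraction of barriers below 1/K = {below / N:.2f}")
print(f"lowest barrier                 = {min(barriers):.3f}")
# Most barriers end up above the threshold; only species momentarily participating
# in an avalanche carry barriers below it, compare Fig. 6.14.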
Random Neighbor Model The topology of the interaction between species in the
Bak–Sneppen model is unclear. It might be chosen to be two-dimensional, if the
species are thought to live geographically separated, or one-dimensional in a toy
model. In reality, the topology is complex and can be assumed to be random, in first
approximation, resulting in the soluble random neighbor model.
With increasing average barrier height the characteristic lowest barrier is also
raised, and eventually a steady state will be reached, just as in the sandpile model
discussed previously. It turns out that the characteristic value for the lowest barrier
is about 1/K at equilibrium when the mean-field approximation is used, and that the steady state is critical.
The model is characterized by the barrier distribution $p(x, t)$, together with the probability

$Q(x) = \int_x^1 dx'\, p(x'), \qquad Q(0) = 1, \quad Q(1) = 0$ ,   (6.29)

to find a barrier larger than x.
The dynamics is governed by the size of the smallest barrier. The distribution function $p_1(x)$ for the lowest barrier is

$p_1(x) = N\, p(x)\, Q^{N-1}(x)$ ;

it is given by the probability p(x) for one barrier (out of the N barriers) to have the barrier height x, while all the other N − 1 barriers are larger. $p_1(x)$ is normalized,

$\int_0^1 dx\, p_1(x) = (-N) \int_0^1 dx\, \frac{\partial Q(x)}{\partial x}\, Q^{N-1}(x) = -Q^N(x)\Big|_{x=0}^{x=1} = 1$ ,
Time Evolution of Barrier Distribution The time evolution for the barrier
distribution consists in taking away one (out of N) barrier, the lowest, via
$p(x, t) - \frac{1}{N}\, p_1(x, t)$ ,
N
and by removing randomly .K − 1 barriers from the remaining .N − 1 barriers, and
adding K random barriers,
$p(x, t+1) = p(x, t) - \frac{1}{N}\, p_1(x, t) - \frac{K-1}{N-1}\left[ p(x, t) - \frac{1}{N}\, p_1(x, t) \right] + \frac{K}{N}$ .   (6.31)
We note that .p(x, t + 1) is normalized whenever .p(x, t) and .p1 (x, t) were
normalized correctly,
$\int_0^1 dx\, p(x, t+1) = 1 - \frac{1}{N} - \frac{K-1}{N-1}\left(1 - \frac{1}{N}\right) + \frac{K}{N} = \left(1 - \frac{K-1}{N-1}\right)\frac{N-1}{N} + \frac{K}{N} = \frac{N-K}{N} + \frac{K}{N} \equiv 1 .$
In the stationary state, $p(x, t+1) = p(x, t)$, the condition

$0 = N(N-K)\, Q^{N-1}\, \frac{\partial Q(x)}{\partial x} + (K-1)N\, \frac{\partial Q(x)}{\partial x} + K(N-1)$

follows, or, equivalently,

$0 = N(N-K)\, Q^{N-1}\, dQ + (K-1)N\, dQ + K(N-1)\, dx$ .
Integration, using $Q(1) = 0$ and neglecting terms of order 1/N, yields

$0 = Q^N(x) + (K-1)\, Q(x) - K(1-x)$ .   (6.33)
We note that .Q(x) ∈ [0, 1] and that .Q(0) = 1, .Q(1) = 0. There must therefore be
some .x ∈]0, 1[ for which .0 < Q(x) < 1. Then
$Q^N(x) \to 0, \qquad Q(x) \approx \frac{K}{K-1}\,(1-x)$ .   (6.34)
Fig. 6.15 The distribution .Q(x) to find a fitness barrier larger than .x ∈ [0, 1] for the Bak–Sneppen
model, for the case of random barrier distribution (dotted blue line) and the stationary distribution
(dashed red line), compare (6.35)
Since $Q(x) \le 1$, the linear behavior (6.34) can extend only down to a critical barrier height $x_c$, determined by

$1 = \frac{K}{K-1}\,(1 - x_c), \qquad x_c = \frac{1}{K}$ .
Below $x_c$ one has $Q(x) \equiv 1$, compare (6.35), and hence the stationary barrier distribution

$p(x) = \begin{cases} 0 & x < 1/K \\ K/(K-1) & x > 1/K \end{cases}$   (6.36)

when using $p(x) = -\partial Q(x)/\partial x$. This result compares qualitatively well with the numerical results presented in Fig. 6.14. Note, however, that the mean-field solution (6.36) does not predict the exact critical barrier height for non-random neighbors, which is somewhat larger for a one-dimensional arrangement of neighbors, as in Fig. 6.14.
Distribution of the Lowest Barrier Equation (6.36) cannot be rigorously true for
.N < ∞, since there is a finite probability for barriers with .Bi < 1/K to reappear
at every step. If the barrier distribution is zero below the self-organized threshold
.xc = 1/K and constant above, then the lowest barrier must be below .xc with equal
probability:
$p_1(x) \to \begin{cases} K & \text{for } x < 1/K \\ 0 & \text{for } x > 1/K \end{cases} , \qquad \int_0^1 dx\, p_1(x) = 1$ .   (6.37)
Fig. 6.16 Time series of coevolutionary avalanches interrupting the punctuated equilibrium. Shown are the new barrier values (dots), viz the evolving species, as a function of the species index
Coevolution and Avalanches When the species with the lowest barrier mutates,
we assign new random barrier heights to it and to its .K − 1 neighbors. This
causes an avalanche of evolutionary adaptations whenever one of the new barriers
becomes the new lowest fitness barrier. One calls this phenomenon “coevolution”
since the evolution of one species drives the adaption of other species belonging to
the same ecosystem. In Fig. 6.16 this process is illustrated for the one-dimensional
model. The avalanches in the system are clearly visible and well separated in
time. In between the individual avalanches the barrier distribution does not change
appreciably; one speaks of a “punctuated equilibrium”.
With probability x one of the new random barriers is in .[0, x] and below the
actual lowest barrier, which is distributed with .p1 (x), see (6.37). We then have
$p_{\mathrm{bran}} = K \int_0^1 p_1(x)\, x\, dx = K \int_0^{1/K} K\, x\, dx = \frac{K^2}{2}\, x^2 \Big|_0^{1/K} \equiv \frac{1}{2}$ ,
viz the avalanches are critical. The distribution of the size s of the coevolutionary
avalanches is then
$D(s) \sim \left(\frac{1}{s}\right)^{3/2}$ ,
as evaluated within the random branching approximation, see (6.22), and indepen-
dent of K. The size of a coevolutionary avalanche can be arbitrarily large and
involve, in extremis, a finite fraction of the ecosystem, compare Fig. 6.16.
Features of the Critical State The sandpile model evolves into a critical state
under the influence of an external driving by adding one grain of sand after another.
The critical state is characterized by a distribution of slopes (or heights) .zi that has
a discontinuity; there is a finite fraction of slopes with .zi = Z − 1, but no slope with
.zi = Z, apart from some of the sites transiently participating in an avalanche.
In the Bak–Sneppen model the same process occurs, but without external
drivings. At criticality, the barrier distribution $p(x) = -\partial Q(x)/\partial x$ has a discontinuity
at .xc = 1/K, see Fig. 6.15. One could say, cum grano salis, that the system
has developed an “internal phase transition”, namely a transition in the barrier
distribution .p(x), which is an internal variable. The emergent state for .p(x) is a
many-body or collective effect, since it results from the reciprocal interactions of
the species participating in the formation of the ecosystem.
Exercises
two arbitrary links with probability p and rewiring the four resulting stubs
randomly.15
Define an appropriate dynamical order parameter and characterize the changes
as a function of the rewiring probability.
(6.5) FOREST FIRE MODEL
Develop a mean-field theory for the forest fire model by introducing appro-
priate probabilities to find cells with trees, fires and ashes. Find the critical
number of nearest neighbors Z for fires to continue burning.
(6.6) REALISTIC SANDPILE MODEL
Propose a cellular automata model that simulates the physics of real-world
sandpiles somewhat more realistically than the original model. The cell value
z(x, y) should correspond to the local height of sand. Write a program to
simulate the model.
(6.7) RECURSION RELATION FOR AVALANCHE SIZES
Use the definition (6.17) for the generating functional fn (x, p) of avalanche
sizes in (6.19) and derive a recursion relation for the probability Ps (n, p)
of finding an avalanche of size s in the nth generation, given a branching
probability p. How does this recursion relation change when the branching is
not binary but, as illustrated in Fig. 6.12, determined by the probability pm of
generating m offsprings?
(6.8) RANDOM BRANCHING MODEL
Derive the distribution of avalanche durations (6.23) in analogy to the steps
explained in Sect. 6.5, by considering a recursion relation for the integrated
duration probability $\tilde Q_n = \sum_{n'=0}^{n} Q_{n'}(0, p)$, viz for the probability that an avalanche lasts maximally n time steps.
(6.9) GALTON–WATSON PROCESS
Use the fixpoint condition (6.26) and show that the extinction probability is
unity if the average reproduction rate is smaller than one.
Further Reading
15 This procedure corresponds to a 2D Watts and Strogatz model, which is discussed in Sect. 1.5
of Chap. 1.
References
Bak, P., & Sneppen, K. (1993). Punctuated equilibrium and criticality in a simple model of
evolution. Physical Review Letters, 71, 4083–4086.
Calcaterra, C. (2022). Existence of life in Lenia. arXiv:2203.14390.
Clar, S., Drossel, B., & Schwabl, F. (1996). Forest fires and other examples of self-organized
criticality. Journal of Physics: Condensed Matter, 8, 6803–6824.
Creutz, M. (2004). Playing with sandpiles. Physica A, 340, 521–526.
Drossel, B. (2000). Scaling behavior of the Abelian Sandpile model. Physical Review E, 61, R2168.
Drossel, B., & Schwabl, F. (1992). Self-organized critical forest-fire model. Physical Review
Letters, 69, 1629–1632.
Flyvbjerg, H., Sneppen, K., & Bak, P. (1993). Mean field theory for a simple model of evolution.
Physical Review Letters, 71, 4087–4090.
Hinrichsen, H. (2000). Non-equilibrium critical phenomena and phase transitions into absorbing states. Advances in Physics, 49, 815–958.
Marković, D., & Gros, C. (2014). Powerlaws and self-organized criticality in theory and nature.
Physics Reports, 536, 41–74.
Rennard, J. P. (2002). Implementation of logical functions in the game of life. In Collision-based
computing (pp. 491–512). Springer.
Sinai, Y. G. (2014). Theory of phase transitions: Rigorous results. Elsevier.
Wolfram, S. (Ed.). (1986). Theory and applications of cellular automata. World Scientific.
Zapperi, S., Lauritsen, K. B., & Stanley, H. E. (1995). Self-organized branching processes: Mean-
field theory for avalanches. Physical Review Letters, 75, 4071–4074.
Random Boolean Networks
7
Complex systems theory deals with dynamical systems containing a very large
number of variables. The resulting dynamical behavior can be arbitrarily complex
and sophisticated. It is therefore important to have well controlled benchmarks,
dynamical systems which can be investigated and understood in a controlled way
for large numbers of variables.
Networks of interacting binary variables, i.e. boolean networks, constitute such
canonical complex dynamical systems. They allow the formulation and investigation
of important concepts, among others regarding information retention and loss
and the occurrence of phase transition in the resulting dynamical states. Boolean
networks are recognized to be the starting points for the modeling of gene expression
and protein regulation networks; the fundamental networks at the basis of life.
7.1 Introduction
BOOLEAN COUPLING FUNCTION A boolean function .{0, 1}K → {0, 1} maps K boolean
variables onto a single one.
The notion would hence be that cell types correspond to dynamical states of a
complex system, i.e. of the gene expression network. This proposal by Kauffman
has received on one side strong support from experimental studies. In Sect. 7.5.2 the
case of the yeast cell division cycle will be discussed, supporting the notion that gene
regulation networks constitute the underpinnings of life. Regarding the mechanisms
of cell differentiation in multicellular organisms, the situation is, on the other side,
less clear. Cell types are mostly determined by DNA methylation, which affects the
respective gene expression on long time scales.
N–K Networks There are several types of random boolean networks. The most
simple realization is the N–K model. It is made up of N boolean variables, each
variable interacting exactly with K other randomly chosen variables. The respective
coupling functions are also chosen randomly from the set of all possible boolean
functions mapping K boolean inputs onto a single boolean output.
An ideal realization of the N–K model in nature is not known. All real physical
or biological problems have specific couplings determined by the structure and the
physical and biological interactions of the system in question. In many instances,
the topology of the couplings is complex and mostly unknown, it is then often a
good starting point to model a real-world system using a generic framework, like
the N–K model.
Boolean networks have a rich variety of possible concrete model realizations. The
most important types are discussed in the following.
σi ∈ {0, 1},
. i = 1, 2, . . . , N
σi = σi (t),
. t = 1, 2, . . . .
The value of a given boolean element σi at the next time step is determined by the
values of K controlling variables.
CONTROLLING ELEMENTS The controlling elements σj1 (i) , σj2 (i) , . . ., σjKi (i) of a
boolean variable σi determine its time evolution.
Table 7.1 Examples of boolean functions of three arguments. Given is a particular random
function and a canalizing function of the first argument. For the latter, the function value is 1
when σ1 = 0. If σ1 = 1, the output can be either 0 or 1. For the additive function shown, the
output is 1 (active) if at least two inputs are active. The generalized XOR is true when the number
of true-bits is odd
f (σ1 , σ2 , σ3 )
σ1 σ2 σ3 Random Canalizing Additive Gen. XOR
0 0 0 0 1 0 0
0 0 1 1 1 0 1
0 1 0 1 1 0 1
0 1 1 0 1 1 0
1 0 0 1 0 0 1
1 0 1 0 1 1 0
1 1 0 1 0 1 0
1 1 1 1 0 1 1
Model Definition For a complete definition of the system several parameters need
to be specified.
– CONNECTIVITY
The first step is to select the connectivity Ki of each element, i.e. the number of
its controlling elements. With
$\langle K \rangle = \frac{1}{N} \sum_{i=1}^{N} K_i$
the average connectivity is defined. Mostly we will consider the case in which
the connectivity is identical for all nodes, Ki = K.
– LINKAGES
The second step is to select the specific set of controlling elements
. σj1 (i) , σj2 (i) , . . . , σjKi (i)
Geometry of the Network The way the linkages are assigned determines the
topology of the network and networks can have highly diverse topologies. It is
custom to consider two special cases.
LATTICE ASSIGNMENT The boolean variables σi are assigned to the nodes of a regular
lattice. The K controlling elements σj1 (i) , σj2 (i) , . . ., σjK (i) are then chosen in a regular,
translational invariant manner, as in Fig. 7.2.
UNIFORM ASSIGNMENT For a uniform assignment, the set of controlling elements are
randomly drawn from all N sites of the network. This is the case for the N–K model, the
original Kauffman net. In terms of graph theory one also speaks of an Erdös–Rényi random
graph.
has $2^K$ different arguments. To each argument value one can assign either 0 or 1. Thus, there are a total of

$N_f = 2^{2^K} = \begin{cases} 2 & K = 0 \\ 4 & K = 1 \\ 16 & K = 2 \\ 256 & K = 3 \end{cases}$

possible coupling functions. In Table 7.1 several examples are presented for the case K = 3, out of the $2^{2^3} = 256$ distinct K = 3 boolean functions.
– K=0
There are only two constant functions, f = 1 and f = 0.
– K=1
Apart from the two constant functions,
forming the class A, there is the identity
and the negation, lumped together into
class B.
σ     Class A     Class B
0     0   1       0   1
1     0   1       1   0
2 Erdös–Rényi graphs are discussed in conjunction with small-world and scale-free networks in
Chap. 1.
– K=2
There are four classes of functions f (σ1 , σ2 ), as listed in Table 7.2, with each
class being invariant under the interchange 0 ↔ 1 in either the arguments or the
value of f .
A: Constant functions.
B1: Fully canalizing functions.
B2: Normal canalizing functions, see also Table 7.1.
C: Non-canalizing functions, also denoted “reversible functions”.
For fully canalizing functions one of the arguments determines the output determin-
istically.
– UNIFORM DISTRIBUTION
As introduced originally by Kauffman, the uniform distribution specifies all
possible coupling functions to occur with the same probability 1/Nf .
– MAGNETIZATION BIAS3
The probability of a coupling function to occur is proportional to p if the outcome
is 0 and proportional to 1 − p if the outcome is 1.
– FORCING FUNCTIONS
For forcing or canalizing functions, the function value is determined when one of
its arguments, say m ∈ {1, . . . , K}, has a specific value, say σm = 0. In contrast,
the function value is not specified if the forcing argument has a different value,
here when σm = 1. Compare Table 7.1.
Table 7.2 The 16 boolean functions for K = 2. For the definition of the various classes see Sect. 7.2.2 and Aldana-Gonzalez et al. (2003)

    σ1 σ2 | Class A | Class B1  | Class B2          | Class C
     0  0 |  1  0   | 0 1 0 1   | 1 0 0 0 0 1 1 1   | 1 0
     0  1 |  1  0   | 0 1 1 0   | 0 1 0 0 1 0 1 1   | 0 1
     1  0 |  1  0   | 1 0 0 1   | 0 0 1 0 1 1 0 1   | 0 1
     1  1 |  1  0   | 1 0 1 0   | 0 0 0 1 1 1 1 0   | 1 0
3 Magnetic moments may have only two possible directions, up or down in the language of spin-1/2
particles. A compound is hence magnetic when more moments point into one of the two possible
directions, viz if the two directions are populated unequally.
– ADDITIVE FUNCTIONS
In order to simulate the additive properties of inter-neural synaptic activities one
can choose

    σ_i(t+1) = Θ( \tilde f_i(t) ) , \qquad \tilde f_i(t) = −h + \sum_{j=1}^{N} w_{ij} σ_j(t) ,
where Θ(x) is the Heaviside step function, h the threshold for activating the
neuron and wij the synaptic weight connecting the pre- and post-synaptic
neurons j and i. The value of σi (t + 1) depends only on a weighted sum of
its controlling elements at time t.
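A minimal sketch of one synchronous update step with such an additive coupling function; the network size, threshold, and random ±1 weights below are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, h = 8, 0.5                                   # illustrative number of elements and threshold
w = rng.choice([-1.0, 1.0], size=(N, N))        # synaptic weights w_ij = +/-1
sigma = rng.integers(0, 2, size=N)              # boolean state, sigma_i in {0, 1}

def additive_step(sigma, w, h):
    """sigma_i(t+1) = Theta( -h + sum_j w_ij sigma_j(t) )."""
    field = -h + w @ sigma
    return (field > 0).astype(int)              # Heaviside step, Theta(x) = 1 for x > 0

print(sigma, "->", additive_step(sigma, w, h))
```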
7.2.3 Dynamics
Model Realizations A given set of linkages and boolean functions .{fi } defines
what one calls a “realization” of the model, with the dynamics following from (7.1).
For the updating of the elements during a time step one has several choices.
– SYNCHRONOUS UPDATE
All variables .σi (t) are updated simultaneously.
– ASYNCHRONOUS UPDATING
A single variable is updated at every step. This variable may be picked at random
or by some predefined ordering scheme. Hence also the name “serial update”.
The choice of updating does not affect thermodynamic properties, like the phase diagram discussed in Sect. 7.3.2. The occurrence and the properties of cycles and attractors, as discussed in Sect. 7.4, depend however crucially on the form of update.
Selection of the Model Realization There are several alternatives for choosing the
model realization during numerical simulations.
– QUENCHED MODEL4
One specific realization of coupling functions is selected at the beginning and
kept throughout all time.
4 A compound is said to be “quenched” when it has been cooled down so quickly that it remains stuck in a specific atomic configuration, which does not change anymore subsequently.
– ANNEALED MODEL5
A new realization is randomly selected at each time step. Then either the linkages
or the coupling functions or both change with every update, depending on the
choice of the algorithm.
– GENETIC ALGORITHM
If the network is thought to approach a predefined goal, one may employ
a genetic algorithm in which the system slowly modifies its realization with
passing time.
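A minimal sketch of a quenched model realization with synchronous updating, for the uniform (Kauffman) assignment of linkages and coupling functions; N, K and the number of time steps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 20, 2                                       # illustrative network size and connectivity

# Quenched realization: linkages and truth tables are drawn once and kept fixed.
links = np.array([rng.choice(N, size=K, replace=False) for _ in range(N)])
tables = rng.integers(0, 2, size=(N, 2 ** K))      # uniform ensemble of coupling functions

def synchronous_step(sigma):
    """Update all sigma_i simultaneously from their K controlling elements."""
    idx = np.zeros(N, dtype=int)
    for k in range(K):                             # encode the K argument bits as a table index
        idx = 2 * idx + sigma[links[:, k]]
    return tables[np.arange(N), idx]

sigma = rng.integers(0, 2, size=N)
for t in range(10):
    sigma = synchronous_step(sigma)
print(sigma)
```

An annealed variant would simply redraw `links` and `tables` before every call of `synchronous_step`.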
Since the number of possible states is finite, the dynamics must eventually return to a previously visited configuration; part of the original trajectory is then retraced and cyclic behavior follows. The resulting cycle acts as an attractor for a set of initial conditions.
We will now examine how we can characterize the dynamical state of boolean
networks in general and of N–K nets in particular. Two concepts will turn out to
be of central importance, the relation of robustness to the flow of information and
the characterization of the overall dynamical state, which we will find to be either
frozen, critical or chaotic.
5 A compound is said to be “annealed” when it has been kept long enough at elevated temperatures such that the thermodynamically stable configuration has been achieved.
Fig. 7.3 A boolean network with N = 3 sites and connectivities Ki ≡ 2. Left: Definition of the
network linkage and coupling functions. Right: The complete network dynamics. Reprinted from
Luque and Sole (2000) with permissions, © 2000 Elsevier Science B.V
Consider two initial configurations Σ0 and Σ̃0. We are typically interested in the case when they are close, viz when they differ in the values of only a few elements. A suitable measure for the distance is the “Hamming distance” D(t) ∈ [0, N],

    D(t) = \sum_{i=1}^{N} ( σ_i(t) − σ̃_i(t) )^2 ,    (7.2)
which is just the sum of the elements that differ in Σ0 and Σ̃0 . As an example take
    Σ1 = {1, 0, 0, 1} ,   Σ2 = {0, 1, 1, 0} ,   Σ3 = {1, 0, 1, 1} .
We have 4 for the Hamming distance Σ1 –Σ2 and 1 for the Hamming distance Σ1 –
Σ3 . If the system is robust, two close-by initial conditions will never move far apart
with time passing, in terms of the Hamming distance.
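The worked example can be reproduced directly; the short sketch below evaluates (7.2) for the three configurations given above.

```python
Sigma1 = [1, 0, 0, 1]
Sigma2 = [0, 1, 1, 0]
Sigma3 = [1, 0, 1, 1]

def hamming(a, b):
    """D = sum_i (sigma_i - sigma~_i)^2, the number of differing elements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

print(hamming(Sigma1, Sigma2))   # 4
print(hamming(Sigma1, Sigma3))   # 1
```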
Normalized Overlap The normalized overlap a(t) ∈ [0, 1] between two configu-
rations is defined as
    a(t) = 1 − \frac{D(t)}{N} = 1 − \frac{1}{N} \sum_{i=1}^{N} ( σ_i^2(t) − 2 σ_i(t) σ̃_i(t) + σ̃_i^2(t) ) ≈ \frac{2}{N} \sum_{i=1}^{N} σ_i(t) σ̃_i(t) ,    (7.3)

where we assumed the absence of any magnetization bias, namely that

    \frac{1}{N} \sum_i σ_i^2 ≈ \frac{1}{2} ≈ \frac{1}{N} \sum_i σ̃_i^2 ,

in the last step.
in the last step. The normalized overlap (7.3) corresponds to a rescaled scalar
product between Σ and Σ̃. On the average, two arbitrary states have a Hamming
distance of N/2, which translates to a normalized overlap a = 1 − D/N of 1/2.
Information Retention for Long Times The difference between two initial states
Σ and Σ̃ can also be interpreted as an information for the system. One then has two
possible behaviors.
– LOSS OF INFORMATION
limt→∞ a(t) → 1, which implies that two states are identical in the thermody-
namic limit, or that they differ only by a finite number of elements. This happens
when two states are attracted by the same cycle. All information about the starting
states is lost.
– INFORMATION RETENTION
limt→∞ a(t) = a ∗ < 1, which means that the system ‘remembers’ that the
two configurations were initially different, with the difference measured by the
respective Hamming distance.
The system is robust when information is routinely lost, which holds for a ∗ = 1.
Robustness depends on the value of a ∗ when information is kept. If a ∗ > 0 then two
trajectories retain a certain similarity for all time scales.
For small initial differences the time evolution of the Hamming distance is generically exponential, D(t) ≈ D(0) e^{λt}, where 0 < D(0) ≪ N is the initial Hamming distance, with λ being the Lyapunov exponent.
The question is then whether two initially close trajectories converge or diverge
initially. One may generally distinguish between three different types of behaviors
or phases.
– CHAOTIC PHASE
λ > 0 : The Hamming distance grows exponentially, i.e. information is
transferred to an exponentially large number of elements. Two initially close
orbits soon become different. This behavior, found for large connectivities K,
is not suitable for real-world biological systems.
– FROZEN PHASE
λ < 0 : Two close trajectories typically converge, as they are attracted by the
same attractor. This behavior arises for small connectivities K. The system is
locally robust.
– CRITICAL PHASE
λ = 0 : Exponential time dependencies, when present, dominate all other contributions; they are absent when the Lyapunov exponent λ vanishes. In this case, the Hamming distance will depend algebraically on time, D(t) ∝ t^γ.
All three phases can be found in the N–K model when N → ∞. This is our next
step.
Mean-Field Theory We recall that we are working with two initial states,
    Σ_0 ,  Σ̃_0 , \qquad D(0) = \sum_{i=1}^{N} ( σ_i − σ̃_i )^2 ,
and that the Hamming distance .D(t) measures the number of elements differing in
Σt and .Σ̃t . An illustration of the following arguments is presented in Fig. 7.4.
Fig. 7.4 The time evolution of the overlap between two states .Σt and .Σ̃t (left/right panels). The
vertices (squares) have values 0 or 1. Vertices with identical values in both states, .Σt and .Σ̃t , are
highlighted (gray background). The values of vertices at the next time step, .t + 1, can only differ
if the corresponding arguments are not identical. It is indicated whether vertices at time .t + 1 must
have the same value in both states (grey), or whether they can be different (star)
– For the N–K model, every boolean coupling function f_i is equally likely to occur.
– On average, variables are controlling elements for K other variables.
– The variables differing in .Σt and .Σ̃t affect on the average .KD(t) coupling
functions.
– In the absence of a magnetization bias, coupling functions change their value
with probability .1/2.
Taken together, we conclude that the number of elements different in .Σt+1 and .Σ̃t+1
, viz the Hamming distance .D(t + 1), is given by
    D(t+1) = \frac{K}{2} D(t) , \qquad D(t) = \left(\frac{K}{2}\right)^{t} D(0) = D(0)\, e^{t \ln(K/2)} ,    (7.5)
Classification of Phases The connectivity K then determines the phase of the N–K
network.
– CHAOTIC
  K > 2 : Two initially close orbits diverge; the number of different elements, i.e. the Hamming distance, grows exponentially with time.
– FROZEN
  K < 2 : The Hamming distance decays exponentially; two initially close orbits converge.
– CRITICAL
  K = 2 : Within mean-field theory the Hamming distance neither grows nor decays exponentially.
The power laws typical for critical regimes cannot be deduced within mean-field
theory, which discards fluctuations.
The mean-field theory takes only average quantities into account. The evolution law D(t+1) = (K/2) D(t) holds only on the average. Fluctuations, viz the deviation of the evolution from the mean-field prediction, are however of importance only close to a phase transition, i.e. close to the critical point K = 2.
The mean-field approximation generally works well for lattice physical systems
in high spatial dimensions and fails in low dimensions. The Kauffman network has
no dimension per se, but the connectivity K plays an analogous role.
Phase Transitions in Dynamical Systems and the Brain The notion of a ‘phase
transition’ originally comes from physics, where it denotes the transition between
two or more different physical phases, like ice, water and gas, which are well
characterized by their respective order parameters.
Classically, the term phase transition denotes therefore a transition between two
stationary states. The phase transition discussed here involves the characterization of
the overall behavior of a dynamical system. They are well defined phase transitions
in the sense that .1 − a ∗ plays the role of an order parameter; its value uniquely
characterizes the frozen phase and the chaotic phase in the thermodynamic limit.
An interesting, completely open and unresolved question is in this context
whether dynamical phase transitions play a role in the most complex information
processing system known, the mammalian brain. It is tempting to speculate,
e.g., that the phenomenon of consciousness may result from a dynamical state
characterized by a yet unknown order parameter. In this case, consciousness would be an ‘emergent’ state.
In deriving (7.5), we assumed that the coupling functions f_i of the system acquire the values 0 and 1 with the identical probabilities p = 1/2. We generalize this approach to the case of a magnetization bias, as defined by

    f_i = \begin{cases} 0 & \text{with probability } p \\ 1 & \text{with probability } 1−p \end{cases} .
For a given value of the bias p and connectivity K, there are critical values K_c(p) and p_c(K), such that for K < K_c (K > K_c) the system is in the frozen phase (chaotic phase). Vice versa, keeping the connectivity fixed and varying p, we have that p_c(K) separates the system into a chaotic and a frozen phase.
Evolution of the Overlap We note that the overlap .a(t) = 1 − D(t)/N between
two states .Σt and .Σ̃t at time t is the probability that two vertices have the same
value both in .Σt and in .Σ̃t .
Fig. 7.5 Solution of the self-consistency condition a* = 1 − [1 − (a*)^K]/K_c, see (7.9). Left: Graphical solution equating both sides. Right: Numerical result for a* for K_c = 3. The fixpoint a* = 1 becomes unstable for K > K_c = 3
– The probability that all arguments of the function f_i are the same in both configurations is

      ρ_K = [a(t)]^{K} .    (7.6)
– As illustrated by Fig. 7.4, the values at the next time step differ with a probability 2p(1−p), but only if the corresponding arguments of the coupling functions are not identical.
– The probability that at least one controlling element has different values in Σ_t and Σ̃_t is 1 − ρ_K. This gives the probability (1 − ρ_K) 2p(1−p) of values being different at the next time step.
We then have

    a(t+1) = 1 − (1 − ρ_K)\, 2p(1−p) = 1 − \frac{1 − [a(t)]^{K}}{K_c} , \qquad K_c = \frac{1}{2p(1−p)} ,    (7.7)

with the overlap approaching, in the limit of long times, the fixpoint

    a^{*} = 1 − \frac{1 − [a^{*}]^{K}}{K_c} .    (7.9)
Kc
This self-consistency condition for the normalized overlap can be solved graphically
or numerically by simple iterations, compare Fig. 7.5.
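A sketch of such an iterative solution of (7.9): repeated application of (7.7) drives a(t) to its fixpoint a*. Here K_c = 3 mimics the right panel of Fig. 7.5; the starting value and iteration count are illustrative.

```python
def overlap_fixpoint(K, Kc, a0=0.5, steps=2000):
    """Iterate a(t+1) = 1 - (1 - a(t)^K)/Kc, eq. (7.7), towards the fixpoint of (7.9)."""
    a = a0
    for _ in range(steps):
        a = 1.0 - (1.0 - a ** K) / Kc
    return a

Kc = 3.0
for K in (1, 2, 3, 4, 7, 10):
    print(K, round(overlap_fixpoint(K, Kc), 4))
# a* stays (very close to) 1 for K <= Kc and drops below 1 for K > Kc
```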
a* = 1 always constitutes a solution of (7.9). We examine its stability under the time evolution, Eq. (7.7), by considering a small deviation 0 < Δa_t ≪ 1 from the fixpoint solution, a_t = a* − Δa_t:

    1 − Δa_{t+1} = 1 − \frac{1 − [1 − Δa_t]^{K}}{K_c} , \qquad Δa_{t+1} ≈ \frac{K\, Δa_t}{K_c} .    (7.10)

The trivial fixpoint a* = 1 therefore becomes unstable for K/K_c > 1, viz when K > K_c = [2p(1−p)]^{−1}.
Bifurcation Equation (7.9) has two solutions for K > K_c, a stable fixpoint a* < 1 and the unstable solution a* = 1. One speaks of a bifurcation, which is shown in Fig. 7.5. We note that

    K_c \big|_{p=1/2} = 2 ,

in agreement with our previous mean-field result, Eq. (7.5). For large connectivities one finds

    \lim_{K→∞} a^{*} = \lim_{K→∞} \left( 1 − \frac{1 − [a^{*}]^{K}}{K_c} \right) = 1 − \frac{1}{K_c} = 1 − 2p(1−p) ,

since a* < 1 for K > K_c, compare Fig. 7.5. Note that a* = 1/2 for p = 1/2 corresponds to the average normalized overlap for two completely unrelated states in the absence of a magnetization bias. Two initially similar states eventually become completely uncorrelated for t → ∞ when also the connectivity K diverges.
Rigidity of the Kauffman Net We can connect the results for the phase diagram
of the N–K network illustrated in Fig. 7.6 with our discussion on robustness, see
Sect. 7.3.1.
– CHAOTIC PHASE
  K > K_c : The infinite time normalized overlap a* is less than unity even when two trajectories Σ_t and Σ̃_t start out very close to each other. However, a* always remains above the value expected for two completely unrelated states. The reason is that the Hamming distance remains constant, modulo small-scale fluctuations that do not contribute in the thermodynamic limit, N → ∞, once the two orbits have entered two distinct attractors.
Fig. 7.6 Phase diagram for the N–K model. Shown is the separation .Kc = [2p(1−p)]−1 between
the chaotic and order/frozen phase (solid line). The insets are simulations for .N = 50 networks
with .K = 3 and .p = 0.60 (chaotic phase), .p = 0.79 (on the critical line) and .p = 0.90 (frozen
phase). The site index runs horizontally, the time vertically. Note the fluctuations for .p = 0.79.
Reprinted from Luque and Sole (2000) with permissions, © 2000 Elsevier Science B.V
– FROZEN PHASE
  K < K_c : The infinite time overlap a* is exactly one. All trajectories approach essentially the same configuration independently of the starting point, apart from fluctuations that vanish in the thermodynamic limit. The system is said to “order”.
In the frozen phase, close-by orbits are attracted by the same cyclic attractor, in the
chaotic state by different attracting states.
Fig. 7.7 Normalized Hamming distance .D(t)/N for a Kauffman net with .N = 10, 000 variables,
connectivity .K = 4 and .D(0) = 100, viz .D(0)/N = 0.01. Top: Frozen phase (.p = 0.05),
critical (.pc ≃ 0.1464) and chaotic (.p = 0.4) phases, plotted with a logarithmic scale. Bottom:
Hamming distance for the critical phase (.p = pc ) but on a non-logarithmic scale. Reprinted from
Aldana-Gonzalez et al. (2003) with permissions, © 2003 Springer-Verlag New York, Inc
– If the damage, viz the difference between Σ_t and Σ̃_t, spreads for long times to the opposite edge, then the system is said to be percolating and in the chaotic phase.
– If the damage never reaches the opposite edge, then the system is in the frozen
phase.
Numerical Simulations The results of the mean-field solution for the Kauffman net are confirmed by numerical solutions of finite-size networks. In Fig. 7.7 the normalized Hamming distance, D(t)/N, is plotted for a Kauffman graph with connectivity K = 4 containing N = 10,000 elements. The results are shown for parameters corresponding to the frozen phase and to the chaotic phase, in addition to a parameter close to the critical line. Note that 1 − a* = D(t)/N → 0 in the frozen phase.
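A sketch of the corresponding damage-spreading experiment: two copies of one quenched Kauffman realization are started from states differing in a few elements and the normalized Hamming distance is recorded. The network here is far smaller than the N = 10,000 net of Fig. 7.7, and all parameters are illustrative; with K = 4 and p = 1/2 the damage should grow towards the finite value 1 − a* of the chaotic phase.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, T, flips = 500, 4, 50, 5                      # illustrative sizes; K = 4, p = 1/2 is chaotic
links = np.array([rng.choice(N, size=K, replace=False) for _ in range(N)])
tables = rng.integers(0, 2, size=(N, 2 ** K))       # quenched, unbiased coupling functions

def step(sigma):
    idx = np.zeros(N, dtype=int)
    for k in range(K):
        idx = 2 * idx + sigma[links[:, k]]
    return tables[np.arange(N), idx]

sigma = rng.integers(0, 2, size=N)
tilde = sigma.copy()
tilde[rng.choice(N, size=flips, replace=False)] ^= 1    # initial damage, D(0)/N = flips/N

for t in range(T):
    sigma, tilde = step(sigma), step(tilde)
print("D(T)/N =", np.mean(sigma != tilde))
```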
The Kauffman model is a reference model which can be generalized in various ways,
e.g. by considering small-world or scale-free networks.
For a scale-free connectivity distribution P(K) ∝ K^{−γ}, the average connectivity is

    ⟨K⟩ = \sum_{K=1}^{∞} K\, P(K) = \begin{cases} ∞ & \text{if } 1 < γ ≤ 2 \\ \dfrac{ζ(γ−1)}{ζ(γ)} < ∞ & \text{if } γ > 2 \end{cases} ,    (7.12)
Annealed Approximation As before, we examine two states .Σt and .Σ̃t together
with the respective normalized overlap,
.a(t) = 1 − D(t)/N ,
which is identical to the probability that two vertices in .Σ and .Σ̃ have the same
value. For a magnetization bias p, we derived in Sect. 7.3.3 that
    a(t+1) = 1 − (1 − ρ_K)\, 2p(1−p) .    (7.13)

The corresponding fixpoint condition reads

    a^{*} = F(a^{*}) ,
Stability of the Trivial Fixpoint We repeat the stability analysis of the trivial fixpoint a* = 1, as in Sect. 7.3.3, and assume a small deviation Δa > 0 from a*. The fixpoint a* becomes unstable if F'(a*) > 1. The critical point is determined by

    1 = \lim_{a→1^{−}} \frac{dF(a)}{da} = 2p(1−p) \sum_{K=1}^{∞} K\, P(K) .
For the classical N–K model all elements have the same connectivity, .Ki = 〈K〉 =
K, with (7.16) reducing to (7.10).
Frozen and Chaotic Phases for the Scale-Free Model For 1 < γ ≤ 2 the average connectivity is infinite, see (7.12). F'(1) = 2p(1−p) ⟨K⟩ is then always larger than unity and a* = 1 unstable, as illustrated in Fig. 7.8. Equation (7.15) then has a stable fixpoint a* ≠ 1; the system is in the chaotic phase for all p ∈ ]0, 1[.
For .γ > 2 the first moment of the connectivity distribution .P (K) is finite and
the phase diagram is identical to that of the N–K model shown in Fig. 7.6, with
.K replaced by .ζ (γc − 1)/ζ (γc ). The phase diagram in .γ –p space is presented in
Fig. 7.8. One finds that .γc ∈ [2, 2.5] for any value of p. There is no chaotic scale-
free network for .γ > 2.5. It is interesting to note that .γ ∈ [2, 3] for many real-world
scale-free networks.
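A sketch locating the critical exponent γ_c numerically from the condition F'(1) = 2p(1−p) ζ(γ−1)/ζ(γ) = 1 for γ > 2. The zeta function is approximated here by a truncated sum plus an integral tail, and the bias p as well as the bisection bounds are illustrative.

```python
def zeta(s, terms=20000):
    """Riemann zeta for s > 1: truncated sum plus integral tail estimate."""
    return sum(k ** (-s) for k in range(1, terms)) + terms ** (1.0 - s) / (s - 1.0)

def slope_at_one(gamma, p):
    """F'(1) = 2p(1-p) <K> with <K> = zeta(gamma-1)/zeta(gamma), cf. (7.12)."""
    return 2.0 * p * (1.0 - p) * zeta(gamma - 1.0) / zeta(gamma)

def gamma_c(p, lo=2.01, hi=3.0):
    for _ in range(50):                       # bisection: the system is chaotic (slope > 1) below gamma_c
        mid = 0.5 * (lo + hi)
        if slope_at_one(mid, p) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(gamma_c(0.5), 3))                 # largest gamma_c, close to the bound ~2.5 quoted above
```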
We emphasized so far the general properties of boolean networks, such as the phase
diagram. We now turn to a more detailed inspection of the dynamics, in particular
regarding the structure of the attractors.
Fig. 7.8 Phase diagram in the plane spanned by the exponent γ and the magnetization bias p, for a scale-free boolean network with connectivity distribution ∝ K^{−γ}, as given by (7.16). The ordered (frozen) region is indicated
Self-Retracting Orbits From now on we consider quenched systems for which the
coupling functions f_i(σ_{i_1}, …, σ_{i_K}) are fixed for all times. Orbits will eventually retrace themselves, at least partly, since the state space, of size Ω = 2^N, is finite. Long-term trajectories are therefore cyclic.
see Figs. 7.3 and 7.9 for some examples. Fixed points are cycles of length 1.
ATTRACTION BASIN The attraction basin B of an attractor A0 is the set {Σt } ⊂ Ω for
which there is a time T < ∞ such that ΣT ∈ A0 .
Fig. 7.9 Cycles and linkages. Left: sketch of the state space where every bold point stands for a
state Σt = {σ1 , . . . , σN }. The state space decomposes into distinct attractor basins for each cyclic-
or fixpoint attractor. Right: linkage loops for an N = 20 model with K = 1. The controlling
elements are listed in the center column. Each arrow points from the controlling element toward
the direct descendant. There are three modules of uncoupled variables. Reprinted from Aldana-
Gonzalez et al. (2003) with permissions, © 2003 Springer-Verlag New York, Inc
ANCESTORS AND DESCENDANTS The elements a vertex affects consecutively via the coupling functions are its descendants. Going backwards in time one finds the ancestors of an element.
In the 20-site network illustrated in Fig. 7.9 the descendants of σ11 are σ11 , σ12
and σ14 .
When an element is its own descendant (and ancestor) it is said to be part of a
“linkage loop”. Different linkage loops can overlap, as is the case for the linkage
loops
.σ1 → σ2 → σ3 → σ4 → σ1 , σ1 → σ2 → σ3 → σ1
shown in Fig. 7.1. Linkage loops are disjoint for K = 1, compare Fig. 7.9.
Modules and Time Evolution The set of ancestors and descendants determines
the overall dynamical dependencies.
MODULE The collection of all ancestors and descendants of a given element σi is the
module (or component) to which σi belongs.
Relevant Nodes and Dynamic Core Taking a look at the dynamics of the 20-site model illustrated in Fig. 7.9, we notice that the elements σ12 and σ14 just follow the dynamics of σ11; they are “enslaved” by σ11. These two elements do not control any other element and one can delete them from the system without qualitative changes to the overall dynamics.
RELEVANT NODE A node is termed relevant if its state is not constant and if it controls at least one other relevant element (possibly itself).
Lattice Nets vs. Kauffman Nets Linkages are short-ranged for lattice systems and
whenever a given element σj acts as a controlling element for another element σi ,
there is a high probability that also the reverse holds, viz that σi is an argument
of fj .
The linkages are generally non-reciprocal for the Kauffman net; the probability
for reciprocity is K/N, which vanishes in the thermodynamic limit for finite K. The
number of disjoint modules in a random network therefore grows more slowly than
the system size. For lattice systems, on the other hand, the number of modules is
proportional to the size of the system. The differences between lattice and Kauffman
networks translate to different cycle structures, as every periodic orbit for the full
system is constructed out of the individual attractors of all modules present in the
network.
We start our discussion of the cycle structure of Kauffman nets with the case .K = 1,
which can be solved exactly. The maximal length .lmax for a linkage loop is on the
average of the order of
    l_{max} ∼ N^{1/2} .    (7.17)

At each of the l steps of a sequence of length l, the next node σ_i could be one of the l sites visited previously. Overall, the typical cycle length is reached when the probability to close, ∼ l (l/N), becomes of order one, which leads to (7.17). This derivation can be made rigorous following the line of arguments we will develop in Sect. 7.4.4 for large-K nets.
Three-Site Linkage Loop with Identities Linkage loops determine the cycle
structure together with the choice of the coupling ensemble. As an example we
take the case .N = 3.
For .K = 1 there are only two non-constant coupling functions, i.e. the identity I
and the negation .¬. When all three coupling functions are the identity, we have
where we denoted with .A, B, C the values of the binary variables .σi , .i = 1, 2, 3.
There are two cycles of length one in which all elements are identical. When the
three elements are not identical, the cycle length is three. The complete dynamics is
then
Three-Site Linkage Loops with Negations Let us consider now the case that all
three coupling functions are negations:
The complete state space .Ω = 23 = 8 decomposes into two cycles, one of length 6
and one of length 2.
Three-Site Linkage Loops with a Constant Function Let us see what happens
if any of the coupling functions is a constant function. For illustration purposes we
consider the case of two constant functions 0 and 1 and the identity,
Generally it holds that the cycle length is one if any of the coupling functions is a constant function; in this case only a single fixpoint attractor exists. Equation (7.18) holds for
all .A, B, C ∈ {0, 1}; the basin of attraction for 001 is hence the whole state space,
with 001 being a global attractor.
The Kauffman net contains very large linkage loops for .K = 1, see (7.17).
The probability that a given linkage loop contains at least one constant function is
consequently very high, which implies that the average cycle length remains short.
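The statements above can be checked by exhaustive enumeration. The sketch below iterates a three-site K = 1 loop, σ1 → σ2 → σ3 → σ1, for all 2^3 initial states and lists the attractor lengths. The "one constant" combination is an illustrative variant (a single constant-0 function plus two identities), not the specific example of the text.

```python
from itertools import product

def step(state, couplings):
    """Each element takes the (possibly transformed) value of its controlling element."""
    A, B, C = state
    fA, fB, fC = couplings
    return (fA(C), fB(A), fC(B))

def attractor_lengths(couplings):
    cycles = set()
    for state in product([0, 1], repeat=3):
        seen = []
        while state not in seen:
            seen.append(state)
            state = step(state, couplings)
        cycles.add(tuple(sorted(seen[seen.index(state):])))
    return sorted(len(c) for c in cycles)

identity = lambda x: x
negation = lambda x: 1 - x
const0 = lambda x: 0

print(attractor_lengths((identity,) * 3))              # [1, 1, 3, 3]: two fixpoints, two 3-cycles
print(attractor_lengths((negation,) * 3))              # [2, 6]: one 2-cycle and one 6-cycle
print(attractor_lengths((const0, identity, identity))) # [1]: a single fixpoint attractor
```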
Loops and Attractors Attractors are made up of the set of linkage loops, which we illustrate by means of a five-site network with two linkage loops,

    A →(I) B →(I) C →(I) A , \qquad D →(I) E →(I) D ,

with all couplings given by the identity I. The four fixpoints of the combined dynamics are

    00000 , \quad 00011 , \quad 11100 , \quad 11111 ,

together with longer cyclic attractors built from the length-three and length-two cycles of the two individual loops.
In general, the length of an attractor is given by the least common multiple of the
periods of the constituent loops. This relation holds for .K = 1 boolean networks,
for general K the attractors are composed of the cycles of the constituent set of
modules.
The .K = 2 Kauffman net is critical, as discussed in Sects. 7.3.1 and 7.3.2. When
physical systems undergo a second-order phase transition, right at the point of
transition response functions will be scale free, following power laws. It is therefore
natural to expect the same for critical dynamical systems, such as a random boolean
network.
Initially, this expectation was borne out by a series of mostly numerical investigations, which indicated that both the typical cycle lengths, as well as the mean number of different attractors, would grow algebraically with N, namely like √N. It was
therefore tempting to relate power laws seen in natural organisms to the behavior of
critical random boolean networks.
Within mean-field theory, which holds for the fully connected .K = N network,
one can evaluate the average number and the length of cycles using probability
arguments.
For K = N the coupling functions assign to every state a random successor state, such that the dynamics corresponds to an uncorrelated walk through configuration space,

    Σ_0 , Σ_1 , Σ_2 , …
Closing the Random Walk The walk through configuration space continues until
it hits a previously visited point, as illustrated in Fig. 7.11. We define with
– .qt : the probability that the trajectory remains unclosed after t steps;
– .Pt : the probability of terminating the excursion exactly at time t.
Including .Σ0 and .Σt , .t + 1 different sites have been visited if the trajectory is still
open at time t. Therefore, there are .t + 1 ways of terminating the walk at the next
time step and the relative probability of termination is .ρt = (t + 1)/Ω. The overall
Fig. 7.11 A random walk in configuration space. The relative probability .ρt = (t + 1)/Ω of
closing the loop at time t is the probability that .Σt+1 ≡ Σt ' , with a certain .0 ≤ t ' ≤ t
probability to terminate the walk exactly at time t+1 is then

    P_{t+1} = ρ_t\, q_t = \frac{t+1}{Ω}\, q_t .
In analogy, the probability for the trajectory to remain open after .t + 1 steps is given
by

    q_{t+1} = q_t (1 − ρ_t) = q_t \left( 1 − \frac{t+1}{Ω} \right) = q_0 \prod_{i=1}^{t+1} \left( 1 − \frac{i}{Ω} \right) , \qquad q_0 = 1 .

In the thermodynamic limit, Ω → ∞, the approximation

    q_t = \prod_{i=1}^{t} \left( 1 − \frac{i}{Ω} \right) ≈ \prod_{i=1}^{t} e^{−i/Ω} = e^{−\sum_i i/Ω} = e^{−t(t+1)/(2Ω)}    (7.19)
becomes exact. For large times t we may use .t (t + 1)/(2Ω) ≈ t 2 /(2Ω). The overall
probability
    \sum_{t=1}^{∞} P_t ≃ \int_0^{∞} dt\, \frac{t}{Ω}\, e^{−t^2/(2Ω)} = 1 .
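A Monte Carlo sketch of this random walk: successor states are drawn uniformly from the Ω configurations until a previously visited state recurs. The mean closure time should approach √(πΩ/2), consistent with the distribution (7.19); N and the number of samples are illustrative.

```python
import math, random

def closure_time(Omega, rng):
    """Draw uniformly random successor states until a previously visited one recurs."""
    visited = set()
    state = rng.randrange(Omega)
    t = 0
    while state not in visited:
        visited.add(state)
        state = rng.randrange(Omega)
        t += 1
    return t

rng = random.Random(3)
Omega = 2 ** 16                      # Omega = 2^N with N = 16
samples = [closure_time(Omega, rng) for _ in range(2000)]
print(sum(samples) / len(samples), math.sqrt(math.pi * Omega / 2))
```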
Cycle Length Distribution The average number .〈Nc (L)〉 of cycles of length L is
Average Number of Cycles From (7.20) the mean number .N̄c of cycles can be
extracted,
    \bar N_c = \sum_{L ≥ 1} ⟨N_c(L)⟩ ≃ \int_1^{∞} dL\, ⟨N_c(L)⟩ .    (7.21)
When going from the sum \sum_L to the integral \int dL in (7.21) we neglected terms of order unity. Rescaling variables by u = L/√(2Ω), one obtains

    \bar N_c = \int_1^{∞} dL\, \frac{\exp[−L^2/(2Ω)]}{L} = \underbrace{\int_{1/√(2Ω)}^{1} du\, \frac{e^{−u^2}}{u}}_{≡ I_1} + \underbrace{\int_{1}^{∞} du\, \frac{e^{−u^2}}{u}}_{≡ I_2} .

A separation \int_{1/√(2Ω)}^{∞} = \int_{1/√(2Ω)}^{c} + \int_{c}^{∞} of the integral was performed, for simplicity with c = 1; any other finite value for c would also do the job. The second integral, I_2, does not diverge as Ω → ∞. For I_1 we have

    I_1 = \int_{1/√(2Ω)}^{1} du\, \frac{e^{−u^2}}{u} = \int_{1/√(2Ω)}^{1} du\, \frac{1}{u} \left( 1 − u^2 + \frac{u^4}{2} + … \right) ≈ \ln\big(√(2Ω)\big) ,

since all further terms ∝ \int_{1/√(2Ω)}^{1} du\, u^{n−1} < ∞ for n = 2, 4, … and Ω → ∞. The average number of cycles is then

    \bar N_c = \ln\big(√(2^N)\big) + O(1) ≃ \frac{N \ln 2}{2} ,    (7.22)

which holds for K = N Kauffman nets approaching the thermodynamic limit N → ∞.
Mean Cycle Length On the average, the length .L̄ of a random cycle is
    \bar L = \frac{1}{\bar N_c} \sum_{L ≥ 1} L\, ⟨N_c(L)⟩ ≈ \frac{1}{\bar N_c} \int_1^{∞} dL\; L\, \frac{\exp[−L^2/(2Ω)]}{L} = \frac{1}{\bar N_c} \int_1^{∞} dL\, e^{−L^2/(2Ω)} = \frac{√(2Ω)}{\bar N_c} \int_{1/√(2Ω)}^{∞} du\, e^{−u^2}    (7.23)
after rescaling with u = L/√(2Ω) and using (7.20). The last integral on the right-hand side of (7.23) converges for Ω → ∞, with the consequence that the mean cycle length \bar L scales as

    \bar L ∼ Ω^{1/2}/N = 2^{N/2}/N ,    (7.24)
7.5 Applications
7.5.1 Living at the Edge of Chaos
Gene expression networks of real-world cells are not random. However, the web
of linkages and connectivities among the genes in a living organism is intricate
and one may consider it a good zeroth-order approximation to model gene–gene interactions as random. The purpose would be to gain generic insights into the
properties of gene expression networks, namely results that are independent of the
particular set of linkages and connectivities realized in a particular living cell.
In the chaotic phase the mean cycle length grows exponentially with the genome size,

    \bar L ∼ 2^{αN} ,

e.g. with α = 1/2 for K = N, see (7.24). Considering that N ≈ 20,000 for the
human genome, an exponentially large mean cycle length L̄ is somewhat unsettling,
a single cell would take the universe’s lifetime to complete just a single cycle. It
then follows that operational gene expression networks of living organisms cannot
be in the chaotic phase.
Living at the Edge of Chaos There are but two possibilities left if the gene expression network cannot operate in the chaotic phase: the frozen phase, or operation close to the critical point. In the frozen phase, the average cycle length is short and the
dynamics stable, see Sect. 7.4.2. The system is consequently resistant to damage of
linkages.
But what about Darwinian evolution? Is too much stability good for the
adaptability of cells in a changing environment? Kauffman suggested that gene
expression networks would operate “at the edge of chaos”, an expression that
became legendary. By this Kauffman meant that networks close to criticality benefit from the stability properties of the close-by frozen phase, while being at the same time sensitive to changes in the network structure, such that Darwinian adaption remains possible.
But how can a system reach criticality by itself? For the N–K network there is
no extended critical phase, only a single critical point, K = 2. One speaks of “self-
organized criticality” when internal mechanisms allow adaptive systems to evolve
autonomously in such a way that they approach the critical point.9 This would be the
case if Darwinian evolution trims the gene expression networks towards criticality.
Cells close to the critical point would have the highest fitness, as cells in the chaotic
phase die because they are operationally unstable, with cells deep in the frozen
phase being selected out in the course of time, given that they are unable to adapt to
environmental changes.
Cell Division Process Cells have two tasks: to survive and to multiply. When a
living cell grows too big, a cell division process starts. The cell cycle has been
studied intensively for budding yeast. In the course of the division process, the cell
goes through a distinct set of states
G1 → S → G2 → M → G1 ,
.
with .G1 being the “ground state” in physics slang, viz the normal cell state; the
chromosome division takes place during the M phase. These states are characterized
by distinct gene activities, i.e. by the kinds of proteins active in the cell. All
eukaryote cells have similar cell division cycles.
9 See Chap. 6.
Yeast Gene Expression Network From the ∼ 800 genes involved only 11–13 core
genes are actually responsible for regulating the part of the gene expression network
responsible for the division process; all other genes are more or less just descendants
of the core genes. The cell dynamics contains certain checkpoints, where the cell
division process can be stopped if something was to go wrong. When eliminating the
checkpoints a core network with only 11 elements remains, as shown in Fig. 7.12.
Boolean Dynamics The full dynamical dependencies are not known in detail for
the yeast gene expression network. The simplest model is to assume
    σ_i(t+1) = \begin{cases} 1 & \text{if } a_i(t) > 0 \\ 0 & \text{if } a_i(t) ≤ 0 \end{cases} , \qquad a_i(t) = \sum_j w_{ij}\, σ_j(t) ,    (7.25)
i.e. a boolean dynamics10 for the binary variables σi (t) = 0/1 representing the
activation/deactivation of protein i, with couplings wij = ± 1 for an excitatory/inhi-
bitory functional relation.
10 Genes are boolean variables in the sense that they are either expressed or not. The quantitative
amount of proteins produced by a given active gene is regulated via a separate mechanism involving
microRNA, small RNA snippets.
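A minimal sketch of the dynamics (7.25). The actual 11-node coupling matrix of the yeast network (Li et al. 2004) is not reproduced in the text, so a small hypothetical four-protein chain with ±1 couplings is used instead; as for Cln3 in the cell cycle, the first node is excited at t = 0, the activity pulse travels through the chain, and the network relaxes back to its all-inactive fixpoint.

```python
import numpy as np

# Hypothetical coupling matrix w_ij: +1 activating, -1 inhibiting, 0 absent.
w = np.array([[ 0, -1,  0,  0],
              [ 1,  0, -1,  0],
              [ 0,  1,  0, -1],
              [ 0,  0,  1,  0]])

def update(sigma, w):
    """Eq. (7.25): sigma_i(t+1) = 1 if a_i > 0 else 0, with a_i = sum_j w_ij sigma_j(t)."""
    return (w @ sigma > 0).astype(int)

sigma = np.array([1, 0, 0, 0])           # excite the first protein at t = 0
for t in range(6):
    print(t, sigma)
    sigma = update(sigma, w)
```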
Fig. 7.13 The yeast cell cycle as an attractor trajectory of the gene expression network. Out of the 2^{11} = 2,048 states in phase space, shown are the 1764 states (green dots) that make up the basin of attraction of the biologically stable G1 state (bottom). After starting with the excited G1 normal state (the first state in the biological pathway), compare Fig. 7.12, the boolean dynamics runs through the known intermediate states (blue arrows), until the G1 attractor state is again reached, this time representing the two daughter cells. Reprinted from Li et al. (2004) with permissions,
© 2004 by The National Academy of Sciences, U.S.A
Fixpoints The 11-site network has 7 attractors, all cycles of length one, viz
fixpoints. The dominating fixpoint has an attractor basin of 1764 states, representing
about 72% of the state space Ω = 2^{11} = 2,048. Remarkably, the protein activation
pattern of the dominant fixpoint corresponds exactly to that of the experimentally
determined G1 ground state of the living yeast cell.
Cell Division Cycle In the G1 ground state the protein Cln3 is inactive. When
the cell reaches a certain size it becomes expressed, i.e. it becomes active. For the
network model one then just starts the dynamics by setting

    σ_{Cln3} → 1 \quad \text{at } t = 0
in the G1 state. The ensuing simple boolean dynamics, induced by (7.25), is depicted
in Fig. 7.13.
The remarkable result is that the system follows an attractor pathway that runs
through all experimentally known intermediate cell states, reaching the ground state
G1 in 12 steps.
Nevertheless, the yeast protein network shows more or less the same susceptibil-
ity to damage as a random network. The core yeast protein network has an average
connectivity of 〈K〉 = 27/11 ≃ 2.46. The core network has only N = 11 sites, a
number far too small to allow comparison with the properties of N–K networks in
the thermodynamic limit N → ∞. Nevertheless, an average connectivity of 2.46 is
remarkably close to K = 2, i.e. the critical connectivity for N–K networks.
Life as an Adaptive Network Living beings are complex and adaptive dynamical
systems.11 The insights into the yeast gene expression network discussed here indicate that this statement is not just an abstract notion. Adaptive regulative networks constitute the core of all living organisms.
– ENSEMBLE ENCODING
Ensemble encoding entails that the activity of a sensory input is transmitted
via the firing of certain ensembles of neurons. Distinct sensory inputs, like the different smells sensed by the nose, are encoded by dedicated neural ensembles.
– TIME ENCODING
Time encoding is present if the same neurons transmit more than one piece of
sensory information by changing their respective firing patterns.
Cyclic attractors are an obvious tool to generate time encoded information. One
would envision that appropriate initial conditions corresponding to certain activity
patterns of the primary sensory organs start the dynamics of neural and/or random
boolean networks, which will settle eventually into a cycle, as discussed in Sect. 7.4.
The random network may then be used to encode initial firing patterns by the time
Fig. 7.14 Illustration of ensemble and time encoding (left/right). Left: Neuronal receptors corresponding to the same class of input signals are combined, as occurring in the nose for different odors. Right: The primary input signals are mixed together by a random neural network close to criticality; the relative weights of odor components are subsequently time encoded by the output signal
sequence of neural activities resulting from the firing patterns of the corresponding
limiting cycle, see Fig. 7.14.
The chaotic phase is unsuitable for information processing, any input results
in an unbounded response and saturation. The response in the frozen phase is
strictly proportional to the input and is therefore well behaved, but also relatively
uninteresting. The critical state, on the other hand, has the possibility of nonlinear
signal amplification.
Sensory organs in animals can routinely process physical stimuli, such as light,
sound, pressure, or odorant concentrations, which vary by many orders of magnitude
in intensity. The primary sensory cells, e.g. the light receptors in the retina, have, however, a linear sensitivity to the intensity of the incident light, with a relatively small dynamical range. It is therefore conceivable that the huge dynamical range
of sensory information processing of animals is a collective effect, as it occurs in a
random neural network close to criticality. This mechanism, which is plausible from
the view of possible genetic encoding mechanisms, is illustrated in Fig. 7.15.
Fig. 7.15 The primary response of sensory receptors can be enhanced by many orders of
magnitude using the non-linear amplification properties of a random neural network close to
criticality
Exercises
– f1 = f2 = f3 = identity,
– f1 = f2 = f3 = negation, and
– f1 = f2 = negation, f3 = identity.
with I denoting the identity coupling and ¬ the negation, compare Sect. 7.2.2.
Find all attractors by considering first the dynamics of the individual linkage
loops. Is there any state in phase space which is not part of any cycle?
    Ψ = \lim_{T→∞} \frac{1}{T} \int_0^{T} |s(t)|\, dt , \qquad s(t) = \lim_{N→∞} \frac{1}{N} \sum_{i=1}^{N} σ_i(t) .
See Huepe and Aldana-González (2002) and additional hints in the solutions
section.
(7.7) LOWER BOUND FOR BOND PERCOLATION
When covering the individual bonds of a d-dimensional hypercubic lattice with a probability p, there is a giant connected component above a certain critical p_c(d) and none below. Define with

    λ(d) = \lim_{n→∞} [ σ_n(d) ]^{1/n}    (7.26)

the connectivity constant λ(d) of the original lattice, where σ_n(d) is the number of paths of length n starting from a given site. Find an upper bound for σ_n(d)
of paths of length n starting from a given site. Find an upper bound for σn (d)
and hence for the connectivity constant. For percolation to happen the bond
probability p must be at least as large as 1/λ(d). Why? Use this argument to
find a lower bound for pc (d).
Further Reading
The interested reader may want to take a look at Kauffman’s seminal work
on random boolean networks, Kauffman (1969), or to study his well-known
book, Kauffman (1993). For reviews on boolean networks please consult Aldana-
Gonzalez et al. (2003) and Schwab et al. (2020). See also Hopfensitz et al. (2013),
for a tutorial on attractors in boolean nets, and Wang et al. (2012), for boolean
network modeling of biological systems.
Original studies of potential interest include a numerical investigation of Kauff-
man nets, Bastolla and Parisi (1998), together with an investigation of the scaling
of the number of attractors with size, Samuelsson and Troein (2003). The modeling
of the yeast reproduction cycle by boolean networks is given in Li et al. (2004). For
the concept of observational scale invariance, and of nonlinear signal amplification
close to criticality, see respectively Marković and Gros (2013) and Kinouchi and
Copelli (2006).
References
Aldana-Gonzalez, M., Coppersmith, S., & Kadanoff, L. P. (2003). Boolean dynamics with random
couplings. In E. Kaplan, J. E. Marsden, & K. R. Sreenivasan (Eds.), Perspectives and problems
in nonlinear science. A celebratory volume in honor of Lawrence Sirovich. Springer Applied
Mathematical Sciences Series (pp. 23–89). Springer.
Bastolla, U., & Parisi, G. (1998). Relevant elements, magnetization and dynamical properties in
Kauffman Networks: A numerical study. Physica D, 115, 203–218.
Hopfensitz, M., Müssel, C., Maucher, M., & Kestler, H. A. (2013). Attractors in Boolean networks:
a tutorial. Computational Statistics, 28, 19–36.
Huepe, C., & Aldana-González, M. (2002). Dynamical phase transition in a neural network model
with noise: An exact solution. Journal of Statistical Physics, 108, 527–540.
Kauffman, S. A. (1969). Metabolic stability and epigenesis in randomly constructed nets. Journal
of Theoretical Biology, 22, 437–467.
Kauffman, S. A. (1993). The origins of order: Self-organization and selection in evolution. Oxford
University Press.
Kinouchi, O., & Copelli, M. (2006). Optimal dynamical range of excitable networks at criticality.
Nature Physics, 2, 348–352.
Li, F., Long, T., Lu, Y., Ouyang, Q., & Tang, C. (2004). The yeast cell-cycle network is robustly
designed. Proceedings of the National Academy Science, 101, 4781–4786.
Luque, B., & Sole, R.V. (2000). Lyapunov exponents in random Boolean networks. Physica A,
284, 33–45.
Marković, D., & Gros, C. (2013). Criticality in conserved dynamical systems: Experimental
observation vs. exact properties. Chaos, 23, 013106.
Samuelsson, B., & Troein, C. (2003). Superpolynomial growth in the number of attractors in
Kauffman networks. Physical Review Letters, 90, 098701.
Schwab, J. D, et al. (2020). Concepts in Boolean network modeling: What do they all mean?
Computational and Structural Biotechnology Journal, 18, 571–582.
Wang, R. S., Saadatpour, A., & Albert, R. (2012). Boolean modeling in systems biology: An
overview of methodology and applications. Physical Biology, 9, 055001.
8  Darwinian Evolution, Hypercycles and Game Theory
8.1 Introduction
Population Genetics The ecosystem of the earth is a complex and adaptive system.
It formed via Darwinian evolution through species differentiation and adaptation
to a changing environment. Reproduction success is based on a set of inheritable
traits, the genome, that is passed from parent to offspring, with the interplay
between random mutations and natural selection playing a central role. This process, acting within a single species, is denoted “microevolution”.1
– POPULATION
Mostly we will assume that the number of individuals M does not change with
time, which models the competition for a limited supply of resources.
– GENOME
A genome of size N encodes the inheritable traits by a set of N binary variables,
    s = (s_1, s_2, …, s_N) , \qquad s_i = ±1 .
We will treat predominantly models for asexual reproduction, though most concepts
can be easily generalized to the case of sexual reproduction. In Table 8.1 some
Table 8.1 Genome size N and the spontaneous mutation rates μ, compare (8.2), per base for
two RNA-based bacteria and DNA-based eukaryotes. From Jain and Krug (2006) and Drake
et al. (1998).
    Organism           Genome size    Rate per base    Rate per genome
    Bacteriophage Qβ   4.5 × 10^3     1.4 × 10^−3      6.5
    Bacteriophage λ    4.9 × 10^4     7.7 × 10^−8      0.0038
    E. Coli            4.6 × 10^6     5.4 × 10^−10     0.0025
    C. Elegans         8.0 × 10^7     2.3 × 10^−10     0.018
    Mouse              2.7 × 10^9     1.8 × 10^−10     0.49
    Human              3.2 × 10^9     5.0 × 10^−11     0.16
1 Evolutionary
processes transcending a single species, like the radiation of lineages, would be the
domain of “macroevolution”.
Fig. 8.1 A simple form of epistatic interaction occurs when the influence of one gene builds on
the outcome of another. In this fictitious example black hair can only be realized when the gene for
brown hair is also present
typical values for the size N of genomes are listed. Note the three orders of
magnitude between simple eukaryotic life forms and the human genome.
State of the Population The state of the population at time t can be described by specifying the genomes {s^α(t)}, α = 1, …, M, of all the individuals. This is equivalent to the definition of the number X_s of individuals with genome s for each of the 2^N points s in genome space. Typically, most of these occupation numbers vanish;
biological populations are extremely sparse in genome space.
– GENOTYPE
The genotype of an organism is the class to which that organism belongs as
determined by the DNA that was passed to the organism by its parents at
conception.
– PHENOTYPE
The phenotype of an organism is the class to which that organism belongs as
determined by the physical and behavioral characteristics of the organism, for
example its size and shape, its metabolic activities and its pattern of movement.
Strictly speaking, selection acts upon phenotypes, but only the genotype is
bequeathed. Variations in phenotypes act therefore as a source of noise for the
selection process. Pheno- and genotypes are set to be identical in the following.
Constant Mutation Rates We furthermore assume that the mutation rates are constant over time. Alternative frameworks would require refined modeling, which is beyond our scope.
– REPRODUCTION
  The individual α at generation t is the offspring of an individual α' living at generation t − 1. Reproduction is thus represented as a stochastic map

      α \;→\; α' = G_t(α) ,
Most mutations are neutral in real life, which leads to a “neutral regime” when
selection pressure is low, as further discussed in Sect. 8.4.
Point Mutations and Mutation Rate The basic theory is based on independent
point mutations, namely that every element of the genome is modified independently
of the other elements,
    s_i^{α}(t) = −\,s_i^{G_t(α)}(t−1) \quad \text{with probability } μ ,    (8.2)
Fitness and Fitness Landscape The fitness W(s), also called “Wrightian fitness”, of a genotype trait s is proportional to the average number of offspring an individual possessing the trait s has. It is strictly positive and can therefore be written as

    W(s) = e^{k F(s)} ,    (8.3)

with k the inverse selection temperature and F(s) the fitness landscape.
Selection acts in first place upon phenotypes, but we neglect here the difference,
considering the variations in phenotypes as a source of noise, as discussed above.
Several key parameters and functional dependencies enter (8.3).
The inverse selection temperature takes its name from physics slang.3 For the Malthusian fitness, one rewrites (8.3) as W(s) = e^{w(s) Δt}, where Δt is the generation time. We will work here with discrete time, viz with non-overlapping generations, making use only of the Wrightian fitness W(s).
Fig. 8.3 Illustration of idealized (smooth) one-dimensional model fitness landscapes F (s). In
contrast, real-world fitness landscapes are likely to contain discontinuities. Left: A fitness
landscape with peaks and valleys, metaphorically denoted “rugged landscape”. Right: A fitness
landscape containing a single smooth peak, as described by (8.19)
Fitness Ratios The assumption of a constant population size makes the reproductive success a relative notion. Only the ratios W(s)/W(s') of fitness values are relevant for selection.

Fundamental Theorem of Natural Selection We first consider mutation-free evolution, driven by selection alone. The theorem states that the average fitness of the population cannot decrease in time under these circumstances, and that the average fitness becomes stationary only when all individuals in the population have the maximal reproductive fitness.
The proof is straightforward. We define with
    ⟨W⟩_t ≡ \frac{1}{M} \sum_{α=1}^{M} W(s^{α}(t)) = \frac{1}{M} \sum_{s} W(s)\, X_s(t) ,    (8.5)

the average fitness of the population.
In the absence of mutations the population evolves as

    X_s(t+1) = \frac{W(s)}{⟨W⟩_t}\, X_s(t) ,    (8.6)

where W(s)/⟨W⟩_t is the relative reproductive success. The overall population size remains constant,

    \sum_s X_s(t+1) = \frac{1}{⟨W⟩_t} \sum_s X_s(t)\, W(s) = M ,
due to the definition (8.5) of the average fitness. For the average fitness at the next generation one finds

    ⟨W⟩_{t+1} = \frac{1}{M} \sum_s W(s)\, X_s(t+1) = \frac{ \frac{1}{M} \sum_s W^2(s)\, X_s(t) }{ \frac{1}{M} \sum_{s'} W(s')\, X_{s'}(t) } = \frac{⟨W^2⟩_t}{⟨W⟩_t} ≥ ⟨W⟩_t ,

since ⟨W^2⟩_t ≥ ⟨W⟩_t^2. The stationarity condition

    ⟨W⟩_{t+1} = ⟨W⟩_t , \qquad ⟨W^2⟩_t = ⟨W⟩_t^2 ,
is only realizable when all individuals 1 . . . M in the population have the same
fitness. Modulo degeneracies, this implies identical genotypes.
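A sketch of the selection-only dynamics (8.6) for a handful of genotypes, showing that the average fitness increases monotonically towards the maximal fitness; the fitness values and the initial composition are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.uniform(0.5, 1.5, size=10)       # fitness of ten genotypes
x = rng.random(10)
x /= x.sum()                             # normalized population fractions x_s

for t in range(10):
    mean_W = float(np.dot(W, x))         # <W>_t
    print(t, round(mean_W, 4))
    x = W * x / mean_W                   # selection step (8.6); sum_s x_s stays 1
print("max fitness:", round(W.max(), 4))
```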
Baldwin Effect Variations in the phenotype may be induced not only via stochastic
influences of the environment, but also through adaption of the phenotype itself to
the environment, viz through cognitive learning. Learning can speed up evolution
whenever the underlying fitness landscape is very rugged, by smoothing it out and
providing a stable gradient towards the genotype with the maximal fitness. One
speaks of the “Baldwin effect”.
Mutations are random events, which implies that evolution is inherently a stochastic
process. This does not hold in the limit of infinite population size .M → ∞,
for which stochastic fluctuations average out, becoming irrelevant. In this limit
the equations governing evolution are deterministic and governed only by average
transition rates. This allows one to study in detail the conditions necessary for adaptation to occur for various mutation rates.
or, in terms of the normalized population fractions,

    x_s(t+1) = \frac{ \sum_{s'} x_{s'}(t)\, W(s')\, Q_μ(s' → s) }{ ⟨W⟩_t } , \qquad ⟨W⟩_t = \sum_{s} W(s)\, x_s(t) ,    (8.8)
where

    x_s(t) = \frac{X_s(t)}{M} , \qquad \sum_s x_s(t) = 1 .
The evolution dynamics (8.8) retains the overall size \sum_s X_s(t) of the population, due to the normalization of the mutation matrix Q_μ(s' → s), via (8.7).
The Hamming distance

    d_H(s, s') = \sum_{i=1}^{N} \frac{(s_i − s_i')^2}{4} = \frac{N}{2} − \frac{1}{2} \sum_{i=1}^{N} s_i\, s_i'    (8.9)

measures the number of units that are different in two genome configurations s and s', e.g. before and after the effect of a mutation event.
Mutation Matrix for Point Mutations We examine the effects of the simplest
mutation process, namely the case that a genome of fixed length N is affected by
random transcription errors afflicting only individual loci. For a mutation probability μ per locus one finds, using (8.9),

    Q_μ(s' → s) ∝ \exp\big\{ [\log μ − \log(1−μ)]\, d_H \big\} ∝ \exp\Big\{ β \sum_i s_i\, s_i' \Big\} ,    (8.11)

with the “inverse temperature” β = \tfrac{1}{2} \log[(1−μ)/μ].
The relation of the evolution equation (8.11) to the partition function of a thermo-
dynamical system, hinted at by the terminology “inverse temperature” will become
evident below.
Evolution Equations for Point Mutations With the help of the exponential representations for both the fitness, W(s) = exp[k F(s)], see (8.3), and for the mutation matrix Q_μ(s' → s), we may express the evolution equation (8.8) as

    x_s(t+1) = \frac{1}{N_t} \sum_{s'} x_{s'}(t)\, \exp\Big\{ β \sum_i s_i\, s_i' + k F(s') \Big\} ,    (8.12)

where N_t is the appropriate normalization.
It is convenient to introduce unnormalized variables y_s(t) via

    x_s(t) = \frac{y_s(t)}{\sum_{s'} y_{s'}(t)} .    (8.13)
The idea is that the normalization \sum_s y_s(t) can be selected freely for every generation t = 1, 2, 3, …. The evolution (8.12) becomes

    y_s(t+1) = Z_t \sum_{s'} y_{s'}(t)\, \exp\Big\{ β \sum_i s_i\, s_i' + k F(s') \Big\} ,    (8.14)

where

    Z_t = \frac{1}{N_t}\, \frac{\sum_s y_s(t+1)}{\sum_s y_s(t)} .
The form

    y_s(t+1) = \sum_{s'} e^{β H[s, s']}\, y_{s'}(t)

of (8.14) defines an effective Hamiltonian, β H[s, s'] = β \sum_i s_i s_i' + k F(s'), which is a function of the binary variables s and s'. Given that s and s' run over the genome space of subsequent generations, we may rename variables, s → s(t+1) and s' → s(t), which leads to the notation

    y_{s(t+1)} = \sum_{s(t)} e^{β H[s(t+1), s(t)]}\, y_{s(t)} .    (8.16)
4 In classical thermodynamics, the Hamiltonian H(α), or energy function, determines the probability that a given state α is populated. This probability is proportional to the Boltzmann factor exp(−βH(α)), where β = 1/(k_B T) is the inverse temperature.
The two dimensions are genome space, i = 1, . . . , N, and time t, viz the sequence
of subsequent generations. This expression will be used in the next sections.
The smooth single-peak fitness landscape is defined by

    F(s) = \sum_{i=1}^{N} h_i\, s_i , \qquad W(s) = e^{\,k \sum_{i=1}^{N} h_i s_i} ,    (8.19)

which can be written as a scalar product,

    F(s) = s_0 · s , \qquad s_0 = (h_1, h_2, …, h_N) .

The fitness of a given genome s is directly proportional to the scalar product with the master sequence s_0, with a well defined gradient pointing towards the optimal genome.
The Fujiyama Hamiltonian Epistatic interactions are absent in the smooth peak
landscape (8.19). In terms of the corresponding Hamiltonian, see (8.18), this fact
expresses itself as

    β H = β \sum_{i=1}^{N} H_i , \qquad H_i = \sum_t \Big[ s_i(t+1)\, s_i(t) + \frac{k h_i}{β}\, s_i(t) \Big] .    (8.20)
5 Any system of binary variables is equivalent to a system of interacting Ising spins, which retains
only the classical contribution to the energy of interacting quantum mechanical spins (the magnetic
moments).
Transfer Matrix The Hamiltonian (8.20) does not contain interactions between
different loci of the genome. It is enough, as a consequence, to examine the evolution
of a single gene locus i, together with the associated Hamiltonian Hi . We define with
    y = ( y_+(t),\; y_−(t) ) , \qquad y^{T} = \begin{pmatrix} y_+(t) \\ y_−(t) \end{pmatrix}
For the last expression s, s = ±1 was used, together with the symmetrized form
    β H_i = β \sum_t s_i(t+1)\, s_i(t) + \frac{k h_i}{2} \sum_t [ s_i(t+1) + s_i(t) ]
of the one-dimensional Ising model. The matrix T transfers the state of the system
from one generation to the next, from time t to t + 1, hence the naming.
For the homogeneous case one takes h_i ≡ 1 and s_0 = (1, 1, …, 1).
The entire population is hence within a finite distance of the optimal genome
sequence s0 whenever y− /y+ < 1, viz for ω > 0. We recall that
    β = \frac{1}{2} \log\frac{1−μ}{μ} , \qquad W(s) = e^{k F(s)} ,
where μ is the mutation rate for point mutations. Thus we see that there is some
degree of adaptation whenever the fitness landscape does not vanish (k > 0). This is
the case even for a maximal mutation rate μ → 1/2, which corresponds to β → 0.
The result of the previous Sect. 8.3.2, i.e. the occurrence of adaptation in a smooth
fitness landscape for any non-trivial model parameter, is due to the absence of
epistatic interactions in the smooth fitness landscape. Epistatic interactions intro-
duce a phase transition to a non-adapting regime once the mutation rate becomes
too high.
Sharp Peak Landscape One possibility to study this phenomenon is the limiting
case of strong epistatic interactions. In this limit, individual elements of the genotype do not, per se, provide information on the value of the fitness, which is defined by

    W(s) = \begin{cases} 1 & \text{if } s = s_0 \\ 1−σ & \text{otherwise} \end{cases} ,    (8.24)
with .σ > 0. In this case, all genome sequences but one have the identical fitness,
which is lower than the fitness of the master sequence .s0 . The corresponding
landscape .F (s), defined by .W (s) = ekF (s) , is equally discontinuous, with (8.24)
corresponding to a fitness landscape with a “tower”. This landscape is devoid of
gradients pointing towards the master sequence of maximal fitness.
At first sight, the presence of epistatic interactions in (8.24) may not be evident. For this we recall that the fitness landscape F(s) of beanbag genetics is strictly additive, viz the fitness W(s) is multiplicative. Epistatic interactions are hence present whenever the fitness W(s) is not strictly multiplicative, which is the case for the sharp-peak model (8.24).
Distance from Optimality We define by .xk the fraction of the population whose
genotype has a Hamming distance k from the preferred genotype,
    x_k(t) = \frac{1}{M} \sum_{s} δ_{d_H(s, s_0),\,k}\; X_s(t) .
The evolution equations can be formulated entirely in terms of the .xk , which
correspond to the fraction of the population being k point mutations away from
the master sequence.
Infinite Genome Limit For the limit .N → ∞ we rescale the rate for point
mutations, μ, as

    μ = u/N ,    (8.25)

such that the average number u = Nμ of mutations occurring per genome remains finite. Experimentally, this approximation holds well
for microbial taxa, as “Drake’s rule” states, with .u ≈ 0.003 for base substitutions.
Absence of Back Mutations Starting from the optimal genome .s0 we examine the
effect of successive mutations. Successful mutations increase the distance k from
the optimal genome .s0 . Assuming .u 1 in (8.25) leads to several consequences.
294 8 Darwinian Evolution, Hypercycles and Game Theory
Fig. 8.4 The linear chain model for the tower landscape, as defined by (8.26), (8.27), and (8.28),
with k denoting the number of point mutations necessary to reach the optimal genome. The
population fraction .xk (t + 1) is influenced only by .xk−1 (t) and .xk (t), its own value at time t
– Back mutations, which would reduce k, occur with a relative probability of order

      \frac{k}{N−k} ≪ 1 ,

  and can be neglected.
Linear Chain Model We have two parameters: u, which measures the mutation rate, and σ, setting the strength of the selection. Remembering that the fitness W(s) is proportional to the number of offspring, see (8.24), we find

    x_0(t+1) = \frac{1}{⟨W⟩}\; x_0(t)\, (1−u) ,    (8.26)

    x_1(t+1) = \frac{1}{⟨W⟩}\; [ u\, x_0(t) + (1−u)(1−σ)\, x_1(t) ] ,    (8.27)

    x_k(t+1) = \frac{1}{⟨W⟩}\; [ u\, x_{k−1}(t) + (1−u)\, x_k(t) ] (1−σ) , \qquad k > 1,    (8.28)

where ⟨W⟩ is the average fitness. These equations describe a linear chain model,
as illustrated in Fig. 8.4. The population of individuals with the optimal genome .x0
is constantly losing members, due to mutations. But it also has a higher number of
offspring than all other populations, due to its larger fitness.
The average fitness is given by

    ⟨W⟩ = x_0 + (1−σ)(1−x_0) = 1 − σ(1−x_0) .    (8.29)
When looking for the stationary distribution .{xk∗ }, one notes that the equation for .x0∗
does not involve .xk∗ with .k > 0,
    x_0^{*} = \frac{x_0^{*}\,(1−u)}{1 − σ(1−x_0^{*})} , \qquad 1 − σ(1−x_0^{*}) = 1−u ,

which leads to

    x_0^{*} = \begin{cases} 1 − u/σ & \text{if } u < σ \\ 0 & \text{if } u ≥ σ \end{cases} ,    (8.30)
due to the normalization condition .0 ≤ x0∗ ≤ 1. For .u > σ the model becomes ill
defined. The stationary solutions for the x_k^{*} follow; for k = 1 one has

    x_1^{*} = \frac{u}{1 − σ(1−x_0^{*}) − (1−u)(1−σ)}\; x_0^{*} ,

compare (8.27) and (8.29). The stationary probability x_k^{*} to find genomes at a generic distance from optimality, k > 1, is

    x_k^{*} = \frac{(1−σ)\, u}{1 − σ(1−x_0^{*}) − (1−u)(1−σ)}\; x_{k−1}^{*} .    (8.31)
u=σ
.
being the transition point. The density .x0∗ of optimal genomes can be taken as an
order parameter,6 being zero for .u ≥ σ , and finite otherwise. In physics language,
epistasis corresponds to many-body interactions, which can be thought to induce a
phase transition in the sharp peak model. In this view, the absence of many-body
interactions in the smooth fitness landscape model is the underlying reason for the
likewise absence of distinct phases, compare Sect. 8.3.2.
σ (1 − x0∗ ) = σ (1 − 1 + u/σ ) = u
.
6 Order parameters characterize the different between ordered and disordered phases, as explained
which is summable whenever .u < σ . In this adaptive regime the population forms
what Manfred Eigen denoted a “quasispecies”, see Fig. 8.5.
Wandering Regime and Error Threshold In the regime of a large mutation rate,
u > σ , we have .xk∗ = 0, .∀k, which is equivalent to a situation where the population
.
is distributed in an essentially uniform way over genotype space. Fitness will drop
to very low levels, given that the whole population lies far away from the preferred
genotype. The size of the population will drop equally, until the preconditions of
above analysis are no longer valid and a “wandering regime” is reached. In this
state, the effects of finite population size are prominent.
8.4 Finite Populations and Stochastic Escape 297
ERROR CATASTROPHE The transition from the adaptive (quasispecies) regime to the
wandering regime is denoted the “error threshold” or “error catastrophe”.
Drift Barrier Hypothesis So far, our line of arguments has been based on short-
term adaptive pressures, namely on a fitness landscape that is defined in relation
to a static environment. In this situation, the more individuals possess the optimal
genome, the better. However, environments change on longer time scales, which
is bad news for a well adapted species. Too few variations in the extant gene pool
severely restricts adaptability to new environmental forcings.
M
uM =
. μ
N
where .μ is the mutation rate per base pair, in analogy to (8.2). If genetic variability
can be assumed to be similar between species, it follows that optimal mutation rates
scale as .1/M, viz inversely with population size. This is indeed seen in multicellular
eukaryotic species. There is however a catch, .u ∼ 1/M will cross the “drift barrier”,
as the error threshold is called in this context, when the population size M becomes
too small. This is the “drift barrier hypothesis”, which can be seen as an instance
of Kauffman’s notion of “life at the edge of chaos”.7 Both concepts deal with the
tradeoff between short- and long-term survivability.
NEUTRAL REGIME The stage where evolution is driven essentially by random mutations
is called the neutral, wandering, or drifting regime.
Our first inroad into the population dynamics of finite populations is based on three
basic assumptions.
– Finite populations.
– Strong selective pressure.
– Small mutation rates.
Strong selective pressure implies that one can represent the population by a single
point in genome space, namely that the genomes of all individuals are taken to be
equal.
8 Coevolutionary avalanches are likely to lead to a self-organized critical state, as shown in Sect. 6.6
of Chap. 6.
9 A general account of the theory of stochastic escape is given in Sect. 3.5.2 of Chap. 3.
8.4 Finite Populations and Stochastic Escape 299
N=2
1
0
Fig. 8.6 Local fitness optima in a one-dimensional random fitness distribution. The illustration
shows that random distributions may exhibit enormously large numbers of local optima (filled
circles). For these, all sequences with a hamming distance of one point mutations have lower
fitness values
– At each time step, only a single locus of the genome of a certain individual
mutates.
– If the mutation leads to a genotype with higher fitness, the new genotype spreads
rapidly throughout the entire population, which moves altogether to the new
position in genome space.
– If the fitness of the new genotype is lower, the mutation is rejected and the
population remains at the old position.
Physicists would call this type of dynamics a Monte Carlo process at zero
temperature. Further below we will relax the condition that unfavorable mutations
are relaxed.
Random Energy Model As formulated so far, our rules do not allow to transverse
a valley with low fitness, viz to pass from one local optimum to the next. As a first
step it hence important to investigate the statistical properties of local optima, which
depend on the properties of the fitness landscape. A suitable approach is to assume
that fitness values are randomly distributed.
RANDOM ENERGY MODEL The fitness landscape .F (s) is uniformly distributed between
zero and one.
Local Optima in the Random Energy Model Let us denote by N the number of
genome elements. The probability that a sequence with fitness .F (s) has a higher
fitness than all its N neighbors is given by
F N = F N (s) ,
.
300 8 Darwinian Evolution, Hypercycles and Game Theory
has a fitness less than F . The probability that a point in genome space is a local
optimum is consequently given by
1 1
Plocal optimum =
. F N dF = ,
0 N +1
since the fitness F is equally distributed in .[0, 1]. There are therefore many local
optima, namely .2N /(N +1). A schematic picture of the large number of local optima
in a random distribution is given in Fig. 8.6.
Time Needed for One Successful Mutation Even though the number of success-
ful mutations (8.33) needed to arrive at the local optimum is small, the time needed
to climb to the local peak can be very long. In first place, a given mutation must be
successful before it can contribute to the climbing process. For the random energy
8.4 Finite Populations and Stochastic Escape 301
Fitness F
difficult it becomes to climb
further. With the escape
probability .pesc the
population jumps somewhere
else, escaping from the local
optimum
different genotypes
model this is only rarely the case close to the top, because mutations lead to a
randomly distributed fitness of the offspring.
We define with
.tF = n Pn , n : number of generations ,
n
the average number of generations necessary for a population with fitness F to attain
a single successful mutation. We denoted with
Pn = (1 − F )F n−1
.
the probability that it takes exactly n generations, which is the case if .n − 1 mutation
attempts with lower fitness are followed by one with fitness in .[F, 1]. The climbing
time is evaluated as
∞
∞
1−F 1−F ∂ n
.tF = nF =n
F F
F F ∂F
n=0 n=0
∂ 1 1
= (1 − F ) = . (8.34)
∂F 1 − F 1−F
typ
Topt = tF
. 2
=0
302 8 Darwinian Evolution, Hypercycles and Game Theory
the expected total number of generations to arrive at a local optimum, where . typ
is the typical number of successful mutations needed to arrive at a local optimum,
see (8.33). Using (8.34) for the average number of generations necessary for a single
successful mutation, .tF , we obtain
1 − 2 typ +1
Topt = tF
. ≈ tF 2 typ +1 = tF e( typ +1) log 2
1−2
N
≈ tF elog N = ≈ 2N , (8.35)
1−F
where we used . typ + 1 = log N/ log 2, setting .F ≈ 1/2 in the last step, our choice
for a typical starting fitness. When starting randomly, the number of generations
needed to climb to a local maximum in the random fitness landscape is hence
proportional to the length of the genome.
The focus of Sect. 8.4.1 has been on the average properties of adaptive climbing. We
now take fluctuations in the reproductive process into account. Our aim to compare
two typical time scales, the number of generations needed for a stochastic escape
and for adaptive climbing.
Mutations occur with probability u per genome, leading with probability F for
descendants having a lower fitness. The probability for this to happen, viz the
probability .pesc of stochastic escape, is consequently
pesc ∝ (F u)M ≈ uM ,
. F ≈ 1.
The escape can only happen when an adverse mutation occurs for every member of
the population within the same generation. A single individual not mutating retains
a higher fitness, the one of the present local optimum, with the consequence that all
other mutations are discarded when the selective pressure is strong, as assumed here.
The same holds whenever a single positive mutation in .[F, 1] shows up. Compare
also Fig. 8.7).
a = 1 − (1 − F )u
.
the probability that the fitness of an individual does not increase with respect to
the current fitness F of the population. The probability .qpos that at least one better
genotype is found is then given by
qpos = 1 − a M .
.
– ADAPTIVE WALK
The escape probability .pesc is much smaller than the probability to increase the
fitness, .qpos pesc . The population continuously increases its fitness via small
mutations.
– WANDERING REGIME
The adaptive dynamics slows down close to a local optimum. The probability
of stochastic escape .pesc may then become comparable to that of an adaptive
process, .pesc ≈ qpos .
In the “drifting regime”, as the second case is also called, the population wanders
around in genome space, starting a new adaptive walk after every successful escape.
Typical Escape Fitness The fitness F increases steadily during the adaptive walk
regime, until it reaches a certain typical fitness, .Fesc , for which the probability of
stochastic escape becomes substantial, i.e. when .pesc ≈ qpos ,
As .1 − Fesc is small, we can set .(Fesc u)M ≈ uM and expand the right-hand-side of
above expression with respect to .1 − Fesc ,
obtaining
The fitness .Fesc necessary for the stochastic escape to become relevant is exponen-
tially close to the global optimum .F = 1 for large populations M.
304 8 Darwinian Evolution, Hypercycles and Game Theory
1 uM−1 1 uM−1
Ftyp = 1 −
. ≡ Fesc = 1 − , = ,
N M N M
where we have used (8.32) for .Ftyp . The last expression is independent of the details
of the fitness landscape, containing only the measurable parameters .N, M and u.
This condition can be fulfilled only when the number of individuals M is much
smaller than the genome length N, as .u < 1. The phenomenon of stochastic escape
occurs only for very small populations.
Prebiotic evolution deals with the questions surrounding the origin of life. Is it
possible to define chemical autocatalytic networks in the primordial soup having
properties akin to those of the metabolic reaction networks powering the workings
of every living cell?
In this scenario, a key point regards the cell membrane, which acts as a defining
boundary, separating body and environment on a physical level. Precursors of life
would not have membranes, but one may hypothesize that chemical regulation
networks emerging within a primordial soup of macromolecules did evolve into
the protein regulation networks of living cells once enclosed by a membrane.
more realistic model of interacting chemical reactions, which will be, hopefully,
functionally closer to the precursors of the protein regulation network of living cells.
Mass Conservation We can choose the flux .−xφ(t) in Eigen’s equation (8.37) for
prebiotic evolution such that the total concentration
C=
. xi , Ċ = Wij xj − C φ,
i ij
is conserved for long times. Demanding .Ċ → 0 and .C → 1 for large times suggests
φ(t) =
. Wij xj (t) (8.38)
ij
for a suitable choice for the field .φ(t), which leads in turn to
d
Ċ = φ (1 − C),
. (C − 1) = −φ (C − 1) . (8.39)
dt
The total concentration .C(t) will therefore approach 1 for .t → ∞ for .φ > 0,
which we assume to be the case in the first case, implying total mass conservation.
In this case the autocatalytic rates .Wii dominate with respect to the transmolecular
mutation rates .Wij (.i = j ).
where W is the matrix .Wij . Assuming for simplicity a symmetric mutation matrix
Wij = Wj i , the solution of the linear differential equation (8.40) is given in terms
.
The eigenvector .eλmax with the largest eigenvalue .λmax will dominate for .t → ∞,
due to the overall mass conservation, as enforced by (8.39). The flux will likewise
adapt to the largest eigenvalue,
leading to the stationary condition .ẋi = 0 for the evolution equation (8.40) in the
long time limit.
Quasispecies The eigenvectors .eλ will contain only a single non-zero entry when
W is diagonal in (8.40), viz when mutations are absent, such as .(0, . . . , 1, . . . , 0).
In this case, a single macromolecule remains in the primordial soup for .t → ∞.
RNA World The macromolecular evolution equations (8.37) do not contain terms
describing the catalysis of molecule i by molecule j . This process is, however,
important both for the prebiotic evolution, as stressed by Manfred Eigen, as well
as for the protein reaction network in living cells.
HYPERCYCLES Two or more molecules may form a stable catalytic (hyper) cycle when
the respective intermolecular catalytic rates are large enough to mutually support their
respective synthesis.
10 A sequence of elements is summable if the sum does not diverge when the length is successively
A B
Fig. 8.8 The simplest hypercycle. Here, A/B are self-replicating molecules, with A acting as
a catalyst for B, and vice versa. The replication rate of one species increases hence with the
concentration of the other
matically and as a precursor of the genetic material. One speaks also of an “RNA
world”.
= (1 − C) φ → 0 ,
κi=j = κ,
. κii = 0, λi = α i , (8.43)
308 8 Darwinian Evolution, Hypercycles and Game Theory
par n
3
0
2
1
where we have used i xi = 1. The stationary concentrations {xi∗ } of (8.44) are
∗ (λi + κ − φ)/κ
.xi = λi = α, 2α, . . . , N α , (8.45)
0
as illustrated in Fig. 8.10, where the non-zero solution is valid when the respective
xi∗ is positive, namely for λi − κ − φ > 0. The flux φ is determined self-consistently
via (8.42).
x*i : κ=50
40 x*i : κ=200 0.2
0 0
0 10 20 30 i 40 50
Fig. 8.10 The autocatalytic growth rates λi (left axis), as in (8.43) with α = 1. The stationary
concentrations xi∗ (right axis) constitutes a prebiotic quasispecies, compare (8.46). Shown are
results for various mean transcatalytic rates κ. The individual molecules i = 1, . . . , 50 are indexed
along the horizontal axis
α
N N
λi + κ − φ κ −φ
.1= xi∗ = = i+ N + 1 − N∗
∗
κ κ ∗
κ
i i=N i=N
α κ − φ
∗ ∗
= N(N + 1) − N (N − 1) + N + 1 − N∗ ,
2κ κ
λN ∗ −1 + κ − φ α(N ∗ − 1) κ − φ
0=
. = + , (8.47)
κ κ κ
can be used to eliminate (κ − φ)/κ. We take the limit of large numbers of reactants
N and N ∗ ,
2κ 2
. N 2 − N ∗ − 2N ∗ N − N ∗
α
2
= N 2 − 2N ∗ N + N ∗ = (N − N ∗ )2 .
Origin of Life Aeons have passed since life emerged on earth. The scientific
discussions concerning its origin continue to be controversial and it remains spec-
ulative whether hypercycles played a central role. The basic theory of hypercycles
treated here, describing closed systems of chemical reactions, would be needed to
be extended to non-equilibrium situations, with constant in- and outflows of both
molecules and energy. In fact, a defining feature of biological activities is the buildup
of local structures, which is possible when entropy is reduced locally at the expense
of an overall increase of environmental entropy. Life, as we understand it today, is
based on open systems driven by a constant flux of energy.
trees in a 50 ha patch of
tropical rainforest in Panama 30
number of species
(bars), in comparison with the
prediction of neutral theory
(filled circles), see (8.52), 20
namely a Gamma
distribution. Data from 10
Volkov et al. (2003)
0
1 2 4 8 16 32 64 128 256 512 1024 2024
number of individuals per species
– DETERMINISTIC COMPETITION
In between species, resource competition is deterministic.
– STOCHASTIC EVENTS
Births, deaths, migration, and speciation, are treated on a stochastic level.
The birth and death processes contain both intensive and extensive terms, respec-
tively .∝ (x)0 and .∝ (x)1 ,
bx = b̃0 + b̃1 x,
. dx = d̃0 + d̃1 x , (8.50)
where the intensive contributions .b̃0 (.d̃0 ) model the cumulative effects of immi-
gration (emigration) and speciation (extinction), which occur on the level of
individuals. The extensive terms, .b̃1 (.d̃1 ), are generated by reproduction processes
actual births and deaths, which are proportional to the population size.
∂ 1 ∂2
.dx+1 px+1 − dx px dx px + dx px + . . . ,
∂x 2 ∂x 2
312 8 Darwinian Evolution, Hypercycles and Game Theory
∂p(x, t) ∂ 1 ∂2
. = dx − bx p(x, t) + dx + bx p(x, t)
∂t ∂x 2 ∂x 2
∂p(x, t) ∂ x ∂2
. = − b p(x, t) + D 2 x p(x, t) , (8.51)
∂t ∂x τ ∂x
where we defined
−1 b̃1 + d̃1
τ = d̃1 − b̃1
. > 0, b = b̃0 − d̃0 = 2b̃0 > 0, D= ,
2
when restricting to the case .b0 = −d0 > 0.
Competition vs. Diffusion The parameters introduced in (8.51) have the following
interpretations:
√
– D induces fluctuations of the order . x in a population of size x.
– b corresponds to the net influx, caused either by immigration or by speciation.
– .τ is a measure of the strength of interaction effects in the ecosystem, expressed
as the time scale the system needs to react to perturbations.
In order to understand the effect of .τ in more detail, we start with the case .b = 0 =
D, for which the Fokker–Planck equation (8.51) reduces to
τ ṗ = p + xp ,
. p ∼ e−t/T x β , −τ/T = 1 + β .
The distribution is normalizable for .β < −1, which implies .T > 0. The ecosystem
would slowly die out, on a time scale T , as a consequence of the competition
between the species, when not counterbalanced by the diffusion D and the external
source b. Note, that .τ > 0 implies that .d̃1 > b̃1 , namely that there are more deaths
than births, on average, for all species and independent of population sizes.
12 The theory behind the Fokker–Planck equation is developed in Sect. 3.5.2 of Chap. 3.
8.7 Coevolution and Game Theory 313
x ∂ x
0=
. − b p0 (x) + D xp0 (x) = + D − b p0 (x) + Dxp0 (x) ,
τ ∂x τ
which is satisfied by the solution (8.52). The steady state solution (8.52) fits the
real-world data quite well, see Fig. 8.11.
These update rules are conserving with respect to the total number of individuals.
The steady-state distribution obtained by this model is similar to the one obtained
for the neutral model defined by the birth and death rates (8.50), shown in Fig. 8.11,
with the functional form of their respective species abundance distributions differing
in details.
The average number of offspring, viz the fitness, is the single relevant reward
function within Darwinian evolution. There is hence a direct connection between
evolutionary processes and ecology with game theory, which deals with interacting
agents trying to maximize a single reward function, denoted utility. Several types
of games may be considered in this context, namely games of interacting species
giving rise to coevolutionary phenomena or games of interacting members of the
same species, pursuing distinct behavioral strategies.
314 8 Darwinian Evolution, Hypercycles and Game Theory
F(S)
x(S)
F(S)
x(S)
Fig. 8.12 Top: An evolutionary process of a single species in a static fitness landscape here with
tower-like structures, as defined in (8.24). The density of individuals in sequence space, .x(S), will
reorganize accordingly. Bottom: Coevolutionary processes are present when the adaption of one
species changes the fitness landscapes .F (S) of other species
Coevolution The larger part of this chapter has been devoted to the discussion of
the evolution of a single species. In Sect. 8.5.2 we ventured into the stabilization
of ‘ecosystems’ composed of a hypercycle of mutually supporting species, before
turning in Sect. 8.6 to general macroecological principles. We now go back to the
level of a few interdependent species.
One can view the coevolutionary process also as a change in the respective
fitness landscapes, as illustrated in Fig. 8.12. A prominent outcome of reciprocal
coevolution forcings is the “red queen” phenomenon.
RED QUEEN PHENOMENON When two or more species are interdependent then “It takes
all the running, to stay in place” (from Lewis Carroll’s children’s book “Through the
Looking Glass”).
A well-known example of the red queen phenomenon is the “arms race” between
predator and prey commonly observed in real-world ecosystems. Snakes becoming
ever more poisonous, with frogs developing successively higher resistance levels.
– UTILITY
Every participant, the agent, plays for himself, solely maximizing its own utility.
– STRATEGY
Every participant follows a set of rules of what to do when encountering an
opponent; the strategy.
– ADAPTIVE GAMES
In adaptive games, the participants change their strategy in order to maximize
future return. This change can be either deterministic or stochastic.
– ZERO-SUM GAMES
When the sum of utilities is constant, you can only win what the others lose.
– NASH EQUILIBRIUM
Any strategy change by participants leads to a reduction of its utility.
13 The basic model for a trophic cascade, the Lotka-Volterra Model for rabbits and foxes, is treated
analyzed.
316 8 Darwinian Evolution, Hypercycles and Game Theory
Hawks and Doves This simple evolutionary game tries to model competition
in terms of expected utilities between aggressive behavior (by the “hawk”) and
peaceful demeanor (by the “dove”). The rules are given in Table 8.2.
The expected returns, the utilities, are collected together in the “payoff matrix”,
1
AHH AHD 2 (V − C) V
A=
. = V . (8.53)
ADH ADD 0 2
where .xD and .xH are the densities of doves and hawks. The flux .φ is given by
Steady State Solution We are interested in the steady-state solution of (8.54), with
ẋD = 0 = ẋH . Setting
.
xH = x,
. xD = 1 − x ,
we find
x2 V V C
φ(t) =
. (V − C) + V x(1 − x) + (1 − x)2 = − x2
2 2 2 2
Table 8.2 Rules of the Hawk & Dove game, together with the entries .ADD , etc., of the payoff
matrix defined in (8.53)
Dove meets Dove .ADD = V /2 They divide the territory
Hawk meets Dove .AHD = V , .ADH = 0 The Hawk gets all the territory, the Dove
retreats and gets nothing
Hawk meets Hawk .AHH = (V − C)/2 They fight, get injured, and win half the
territory
8.7 Coevolution and Game Theory 317
for the flux. The update rule for the density of hawks is
V −C V V C 2
ẋ =
. x + V (1 − x) − φ(t) x = − x+ x −x x
2 2 2 2
C C+V V C V
= x x2 − x+ = x (x − 1) x −
2 C C 2 C
d
≡− V (x) ,
dx
where we defined the potential
x2 x3 x4
V (x) = −
. V + (V + C) − C .
4 6 8
The steady-state solution is given by
V (x) = 0,
. x = V /C ,
– For .V > C doves are eliminated from the population, it does not pay to be
peaceful.
– For .V < C hawks and doves coexists, with respective densities .x = V /C and
.1 − V /C.
Here “cooperation” between the two prisoners is implied, and not cooperation
between a suspect and the police. The prisoners are best off if both keep silent.
The standard values are
T = 5,
. R = 3, P = 1, S = 0.
318 8 Darwinian Evolution, Hypercycles and Game Theory
reward for defectors = Rd = T Nc + P (N − Nc ) /N ,
where .Nc is the number of cooperators and N the total number of agents. The
difference is
as .R − T < 0 and .S − P < 0. The reward for cooperation is always smaller than
that for defecting.
– At each generation (time step) agents evaluate their payoff together with the
payoff of the neighbors.
– Next, agents compare their payoff one by one with the payoffs obtained by the
neighbors.
– Agents switche strategy (cooperate/defect) to the strategy of the neighbor with
the highest payoff.
Despite their simplicity, this set of rules leads to surprisingly complex real-space
patterns of defectors intruding patches of cooperators, as illustrated in Fig. 8.13.
Further details depend on the value chosen for the payoff matrix.
Game Theory and Memory Standard game theory deals with an anonymous soci-
ety of agents, with agents having no memory of previous encounters. Generalizing
this standard setup it is possible to empower the agents with a memory of their past
strategies and achieved utilities. Considering additionally individualized societies,
Fig. 8.13 Time series of the spatial distribution of cooperators (gray) and defectors (black) on a
lattice of size .N = 40 × 40. Time is given as the numbers of generations (in brackets). Initial
condition: equal number of defectors and cooperators, randomly distributed. Parameters for the
payoff matrix, .{T ; R; P ; S} = {3.5; 3.0; 0.5; 0.0}. Reprinted from Schweitzer et al. (2002) with
permissions, © 2002 World Scientific Publishing Co Pte Ltd
this memory may include the names of the opponents encountered previously.
This type of games provides the basis for studying the emergence of sophisticated
survival strategies, like altruism, via evolutionary processes.
Opinion Dynamics Agents in classical game theory aim to maximize their respec-
tive utilities. However, this does not necessarily reflect daily social interactions.
When encountering somebody else, explicit maximization of rewards or utilities
is not always the priority.
An example of reward-free games is given by opinion dynamics models. In a
basic model .i = 1, . . . , N agents have continuous opinions .xi = xi (t). When two
agents interact, they change their respective opinions according to
[xi (t) + xj (t)]/2 |xi (t) − xj (t)| < θ
xi (t + 1) =
. , (8.56)
xi (t) |xi (t) − xj (t)| ≥ θ
Common Pool Resources A village may have a pond without access restrictions.
Everybody with a fishing rod may go to the pond and fish at any time. Overfishing is
the likely outcome, with the consequence that the common pool resource is depleted.
When this happens, the “tragedy of the commons” is in the making. However, free
access does not imply that fishing does not incur costs. One has to invest money, for
the equipment, and time, for the trip to the pond and for the actual activity. People
will stop fishing once returns become too small.
Diverse situations are described by the tragedy of the commons, such as the
overfishing of an ocean by fleets of trawlers, or the common exploitation of
an underground aquifer. Importantly, pollution of a common resource, like CO.2
emissions into earth’s atmosphere, fall into the same category.
Reference Model Agents invest into the commons as long as their payoffs .Ei
remain positive, where .i = 1, .., N, with N being the number of agents. The
quantity invested, .xi ≥ 0, can viewed as time or money. As usual, payoffs are given
by the difference between nominal return and investment costs,
Ei = e−xtot − ci xi ,
. xtot = xj , (8.57)
j
where .ci > 0 are per-unit investing costs. The factor .exp(−xtot ) specifies how the
productivity of the commons degrades when total investment .xtot increases.
Profitable Agents Selfish agents increase their investments .xi as long as their
gradients .dEi /dxi remain positive. The equilibrium condition is
(1 − xi )e−xtot = ci ,
. xi = 1 − ci extot . (8.58)
where .cmax is the profitability barrier. Agents with .ci > cmax would have negative
xi and returns, which is not profitable. It is hence sufficient to restrict the analysis to
.
2 N=1
N=2
N=3
1 -log(c)
0
0 0.2 0.4 0.6 0.8 1
mean investment costs c
Fig. 8.14 For the tragedy of the commons, the optimal total investment .xtot . For N agents, total
investment as a function of mean investment costs .c̄ is given, see (8.61). Limiting behaviors are
.xtot = N at .c̄ = 0 and .xtot = 0 for .c̄ = 1. Also shown is the case .N → ∞, for which .xtot =
log(1/c̄) diverges logarithmically for vanishing .c̄ → 0
Dispersion Relation For the stationary state, combining (8.59) and (8.57) leads to
ci ci 2
.Ei = cmax − ci 1 − = cmax 1 − , (8.60)
cmax cmax
which constitutes a dispersion relation .Ei = E(ci ). The profitability barrier .cmax =
exp(−xtot ) is determined in turn by the self-consistency condition
xtot −xtot 1
. 1− e = c̄, c̄ = ci , (8.61)
N N
i
which is obtained by summing (8.58) over all agents. Here, .c̄ are the average invest-
ment costs of profitable agents. The resulting .xtot is plotted in Fig. 8.14. As expected,
the productivity .exp(−xtot ) of the common resource degrades progressively with
increasing N and decreasing .c̄.
c̄ c̄ xtot log(1/c̄)
. = −x = 1 − ≈1− , (8.62)
cmax e tot N N
322 8 Darwinian Evolution, Hypercycles and Game Theory
where we used that .limN→∞ xtot = log(1/c̄). For an average agent, with .ci = c̄, the
large-N expansion of the payoff .E(c̄) is consequently
2
log(1/c̄) 1
E(c̄) ≈ cmax
. ∼ , (8.63)
N N2
where the dispersion relation (8.60) has been used. Intuitively one could have
expected that payoffs would scale instead as .1/N. The fact that the average payoff
scales worse, denoted “castastrophic poverty”, can be traced to a progressively
deteriorating state of the commons.
Oligarchs At face value, Eq. (8.63) holds only for the average agent. One can how-
ever prove that the payoffs of the majority of all agents scale as .(1/N)2 . A notable
exception are “oligarchs”, namely agents with investement costs substantially below
0
.c̄. Oligarchs have payoffs that scale as .(1/N) . Only a finite number of oligarchs
Exercises
by the transfer matrix method presented in Sect. 8.3.2. Calculate the free
energy F (T ,B), the magnetization M(T , B) and the susceptibility χ (T ) =
limB→0 ∂M(T∂B
,B)
.
(8.2) ERROR CATASTROPHE
For the prebiotic quasispecies model (8.40), consider tower-like autocatalytic
reproduction rates Wjj and mutation rates Wij (i = j ) of the form
⎧
⎨ u+ i = j + 1
1 i=1
.Wii = , Wij = u− i = j − 1 ,
1−σ i >1 ⎩
0 i = j otherwise
with σ, u± ∈ [0, 1]. Determine the error catastrophe for the two cases
u+ = u− ≡ u and u+ = u, u− = 0. Compare with the results for the
tower landscape discussed in Sect. 8.3.3.
Hint: For the stationary eigenvalue (8.40), with ẋi = 0 (i = 1, . . .), write xj +1
as a function of xj and xj −1 . This two-step recursion relation leads to a 2 × 2
matrix. Consider the eigenvalues/vectors of this matrix, the initial condition
8.7 Coevolution and Game Theory 323
for x1 , and the normalization condition i xi < ∞ valid in the adapting
regime.
(8.3) COMPETITION FOR RESOURCES
The competition for scarce resources has been modeled in the quasispecies
theory, see (8.37), by an overall constraint on population density. With
.ẋi = Wii xi Wii = f ri − d, f˙ = a − f ri xi (8.64)
i
one models the competition for the resource f explicitly. Here a and f ri
are the regeneration rates of the resource f , respectively for species i, with
d being the mortality rate. Equation (8.64) does not contain mutation terms
∼ Wij , describing a stationary ecosystem.
Which is the steady-state value of the total population density C = i xi and
of the resource level f ? Is the ecosystem stable?
(8.4) COMPETITIVE POPULATION DYNAMICS
Consider the symmetric Lotka-Volterra system
ẋ = x (1 − ky) − x ,
. ẏ = y (1 − kx) − y (8.65)
where v is the bare reward and c(x) = x γ a generic cost function. When
losing, the invested time is not recovered. Find the optimal strategy for the
war of attrition, as defined by (8.66).
(8.9) TRAGEDY OF THE COMMONS
For the analysis of the tragedy of the commons a productivity function
P (xtot ) = exp(−xtot ) had beed used in (8.57). Show that catastrophic poverty
is present for a generic P (xtot ).
Further Reading
References
Azaele, S., et. al. (2016). Statistical mechanics of ecological systems: Neutral theory and beyond.
Reviews of Modern Physics, 88, 035003.
Blythe, R. A., & McKane, A. J. (2007). Stochastic models of evolution in genetics, ecology and
linguistics. Journal of Statistical Mechanics: Theory and Experiment, P07018.
Bull, J. J., Meyers, L. A., & Lachmann, M. (2005). Quasispecies made simple. PLoS Computa-
tional Biology, 1, e61.
Drake, J. W., Charlesworth, B., & Charlesworth, D. (1998). Rates of spontaneous mutation.
Genetics, 148, 1667–1686.
Drossel, B. (2001). Biological evolution and statistical physics. Advances in Physics, 2, 209–295.
Eigen, M., & Schuster, P. (2012). The hypercycle – A principle of natural self-organization.
Springer.
Gros, H. (2023). Generic catastrophic poverty when selfish investors exploit a degradable common
resource. Royal Society Open Science, 10, 221234.
Heasley, L. R., Sampaio, N. M., & Argueso, J. L. (2021). Systemic and rapid restructuring of the
genome: A new perspective on punctuated equilibrium. Current Genetics, 67, 57–63.
Higgs, P. G., & Lehman, N. (2015). The RNA world: Molecular cooperation at the origins of life.
Nature Reviews Genetics, 16, 7–17.
Jain, K., & Krug, J. (2006). Adaptation in simple and complex fitness landscapes. In U. Bastolla,
M. Porto, H. E. Roman, & M. Vendruscolo, (Eds.), Structural approaches to sequence evolution:
Molecules, networks and populations. AG Porto.
References 325
Schweitzer, F., Behera, L., & Mühlenbein, H. (2002). Evolution of cooperation in a spatial
prisoner’s dilemma. Advances in Complex Systems, 5, 269–299.
Sigmund, K. (2017). Games of life: Explorations in ecology, evolution and behavior. Courier.
Volkov, I., Banavar, J. R., Hubbell, S. P., & Maritan, A. (2003). Neutral theory and relative species
abundance in ecology. Nature, 424, 1035–1037.
Wilkinson, D. M., & Sherratt, T. N. (2016). Why is the world green? The interactions of top-down
and bottom-up processes in terrestrial vegetation ecology. Plant Ecology & Diversity 9, 127–
140.
Synchronization Phenomena
9
Complex systems are based on interacting local computational units may show non-
trivial emerging behaviors. Examples are the time evolution of an infectious disease
in a certain city that is mutually influenced by an ongoing outbreak of the same
disease in another city, or the case of a neuron firing spontaneously while processing
the effects of afferent axon potentials.
A fundamental question is whether the time evolutions of interacting local
units remain dynamically independent of each other, or whether they will change
their states simultaneously, following identical rhythms. This is the notion of
synchronization, which we will study throughout this chapter. Starting with the
paradigmatic Kuramoto model we will learn that synchronization processes may be
driven either by averaging dynamical variables, or through causal mutual influences.
On the way, we will visit piecewise linear dynamical systems and the reference
model for infectious diseases, the SIRS model.
In this chapter, we will be dealing mostly with autonomous dynamical systems that
may synchronize spontaneously. A dynamical system can also be driven by outside
influences, being forced to follow the external signal synchronously.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 327
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_9
328 9 Synchronization Phenomena
where ‘c.c.’ stands for conjugate complex. Of relevance will be the relation between
the eigenfrequency .ω0 , damping .γ , and the driving frequency .ω. In the absence of
an external forcing, .F ≡ 0, the solution is
/
γ 1
x(t) ∼ eλt ,
. λ ± =− ± γ 2 − 4ω02 , (9.2)
2 2
which is damped/critical/overdamped for .γ < 2ω0 , .γ = 2ω0 and .γ > 2ω0 .
Frequency Locking In the limit of long times, .t → ∞, the dynamics of the system
follows the external driving. Due to the damping .γ > 0, this holds for all .F /= 0.
For the ansatz
= −a (ω + iλ+ ) (ω + iλ− ) ,
where the eigenfrequencies .λ± are given by (9.2). The solution for the amplitude a
can then be written in terms of .λ± or alternatively as
−F
a=(
. ) . (9.4)
ω2 − ω02 − iωγ
1 “Secular perturbation theory” deals with time-dependent amplitudes, .a = a(t). See Sect. 3.2 of
Chap. 3.
9.2 Coupled Oscillators and the Kuramoto Model 329
Whenever their dynamical behaviors are similar and the mutual couplings substan-
tial, sets of local dynamical systems may synchronize. It is not necessary that the
individual units show periodic dynamics on their own, coupled chaotic attractors can
also synchronize. We start by discussing the simplest non-trivial set-up, the case of
harmonically coupled uniform oscillators.
describe periodic attractors, viz limit cycles. In the following we take the radius r
as given, using the phase variable .θ = θ (t) for an effective description.
E
N
.θ̇i = ωi + Γij (θi , θj ), i = 1, . . . , N , (9.5)
j =1
Kuramoto Model A particularly tractable choice for the coupling constants .Γij
has been proposed by Kuramoto:
K
Γij (θi , θj ) =
. sin(θj − θi ) , (9.6)
N
where .K ≥ 0 is the coupling strength. The factor .1/N ensures that the model is well
behaved in the limit .N → ∞.
K K
θ̇1 = ω1 +
. sin(θ2 − θ1 ), θ̇2 = ω2 + sin(θ1 − θ2 ) , (9.7)
2 2
or
Δθ̇ = Δω − K sin(Δθ ),
. Δθ = θ2 − θ1 , Δω = ω2 − ω1 . (9.8)
330 9 Synchronization Phenomena
Δθ/π
with .Δω = 1 and a critical
coupling strength .Kc = 1. 1.5
For an undercritical coupling,
.K = 0.9, the relative phase
1
increases steadily, for an
overcritical coupling
0.5
K = 1.01
.K = 1.01 it locks
0
0 5 10 15 20
t
The system has a fixpoint .Δθ ∗ with regard to the relative phase, determined by
d Δω
. Δθ ∗ = 0, sin(Δθ ∗ ) = , (9.9)
dt K
which leads to
Δθ ∗ ∈ [−π/2, π/2],
. K > |Δω| . (9.10)
This condition is valid for attractive coupling constants .K > 0. For repulsive .K < 0
anti-phase states are stabilized. We analyze the stability of the fixpoint using .Δθ =
Δθ ∗ + δ and (9.8), obtaining
d ( ) ∗
. δ = − K cos Δθ ∗ δ, δ(t) = δ0 e−K cos Δθ t .
dt
The fixpoint is stable since .K > 0 and .cos Δθ ∗ > 0, due to (9.10). Consequently, a
bifurcation is present.
– For .K < |Δω| there is no phase coherence between the two oscillators, which
are drifting with respect to each other.
– For .K > |Δω| phase locking is observed, which means that the two oscillators
rotate together with a constant phase difference.
f ∞
.g(ω) = g(−ω), g(ω) dω = 1 , (9.11)
−∞
9.2 Coupled Oscillators and the Kuramoto Model 331
implicit in (9.11) is generally possible, as the dynamical equations (9.5) and (9.6)
are invariant under a global translation
ω → ω + Ω,
. θi → θi + Ωt ,
1 E iθj
N
r eiψ =
. e (9.12)
N
j =1
1 E i(θj −θi ) 1 E
N N
r ei(ψ−θi ) =
. e , r sin(ψ − θi ) = sin(θj − θi ) ,
N N
j =1 j =1
retaining in the second step the imaginary component. Inserting the second expres-
sion into the governing equation (9.5), we find
KE
θ̇i = ωi +
. sin(θj − θi ) = ωi + Kr sin(ψ − θi ) . (9.13)
N
j
The individual phases .θi are drawn towards the self-consistently determined
mean phase .ψ, as can be seen in the numerical simulations presented in Fig. 9.2.
Mean-field theory is exact for the Kuramoto model. It is nevertheless non-trivial to
solve, as the self-consistency condition (9.12) needs to be fulfilled.
332 9 Synchronization Phenomena
Rotating Frame of Reference The order parameter .reiψ performs a free rotation
for long times .t → ∞ in the thermodynamic limit. With
θi → θi + ψ = θi + Ωt,
. θ̇i → θi + Ω, ωi → ωi + Ω
to the rotating frame of reference. The governing equation (9.13) then becomes
θ̇i = ωi − Kr sin(θi ) .
. (9.14)
This expression is identical to the one for the case of two coupled oscillators, see
(9.8), when substituting Kr by K. It then follows directly that .ωi = Kr constitutes
a special point.
Drifting and Locked Components Eq. (9.14) has a fixpoint .θi∗ for which .θ̇i = 0,
as determined by
[ π π]
Kr sin(θi∗ ) = ωi ,
. |ωi | < Kr, θi∗ ∈ − , . (9.15)
2 2
The interval for .θi∗ is selected such that .sin(θi∗ ) ∈ [−1, 1].
9.2 Coupled Oscillators and the Kuramoto Model 333
−Kr 0 Kr ω
Fig. 9.3 The region of locked and drifting natural frequencies .ω within the Kuramoto model
– LOCKED UNITS
As we are working in the rotating frame of reference, .θ̇i = 0 means that the
participating limit cycles oscillate for .|ωi | < Kr with the average frequency .ψ,
they are “locked” to .ψ.
– DRIFTING UNITS
For .|ωi | > Kr, the participating limit cycle drifts, i.e. .θ̇i never vanishes. They do,
however, slow down when they approach the locked oscillators, compare (9.14).
The distinct dynamics of locked and drifting units is illustrated in Figs. 9.1, 9.2, and
9.3.
ρ(θ, ω) dθ
.
the fraction of oscillators with natural frequency .ω that lie between .θ and .θ + dθ .
Of relevance for the following will be the continuity equation for .ρ = ρ(θ, ω),
∂ρ ∂ ( )
. + ρ θ̇ = 0 ,
∂t ∂θ
θ̇ = ω − Kr sin(θ )
.
in the stationary case, when .ρ̇ = 0. The individual oscillators pile up at slow places,
thinning out at fast places on the circle. Hence
f π
C
ρ(θ, ω) =
. , ρ(θ, ω) dθ = 1 , (9.16)
|ω − Kr sin(θ )| −π
where the brackets .<·> denote population averages. In the last step we used the fact
that one can set the average phase .ψ to zero.
where we have assumed .g(ω) = g(−ω) for the distribution .g(ω) of the natural
frequencies within the rotating frame of reference. Using (9.15),
we obtain
f π/2
<eiθ >locked =
. cos(θ ∗ ) g(Kr sin θ ∗ ) Kr cos(θ ∗ ) dθ ∗ (9.18)
−π/2
f π/2
= Kr cos2 (θ ∗ )g(Kr sin θ ∗ ) dθ ∗ .
−π/2
vanishes. Physically this is clear: oscillators that are not locked to the mean field
cannot contribute to the order parameter. Mathematically, it follows from .g(ω) =
g(−ω), together with .ρ(θ + π, −ω) = ρ(θ, ω) and .ei(θ+π ) = −eiθ .
Critical Coupling With the drifting component vanishing, the population average
<eiθ > of the order parameter,
.
f π
2
r = <eiθ > ≡ <eiθ >locked = Kr
. cos2 (θ ) g(Kr sin θ ) dθ . (9.19)
− π2
9.2 Coupled Oscillators and the Kuramoto Model 335
0
Kc K
Expansion around Criticality For the functional dependence of the order r around
K = Kc we expand (9.19) with respect to .r << 1,
.
f [ ]
g '' (0) ( )2
1=K
. dθ cos2 (θ ) g(0) + Kr sin θ
|θ|< π2 2
π π
= K g(0) + K 3 r 2 g '' (0) (9.21)
2 2·8
which holds in light of our previous assumption, namely that integrals antisymmet-
ric in .θ vanish.3 Multiplying with .Kc /K, we rewrite (9.21) as
Kc r2 1 K 2 Kc |g '' (0)|
1−
. = 2, = ,
K R0 R02 16/π
f
2 Note,that . dx cos2 (x) = [cos(x) sin(x) + x]/2, modulo a constant.
f
3 One has . dx cos2 (x) sin2 (x) = x/8 − sin(4x)/32, plus an integration constant.
336 9 Synchronization Phenomena
where we assumed .g '' (0) < 0, namely that the frequency distribution is locally
maximal. Expanding .R0 = R0 (K) into powers of .K − Kc would lead to higher-
order corrections, one can hence set .R0 = R0 (Kc ). Together, we find
/
Kc 2
r = R0 1 −
. , Kc = , (9.22)
K πg(0)
Chimera States The original chimera, the one from Greek mythology, was a
hybrid animal, part lion, goat and snake. In the world of synchronization phenoma,
a “chimera” is a hybrid state, supporting both synchronous and asynchronous
behaviors in networks of identical coupled oscillators. A prototypical system is
E
θ̇i = ω +
. wij sin(θj − θi − α) , (9.23)
j /=i
where the .wij are coupling matrix elements of varying strength. Translation invari-
ance, namely that .wij = wi−j , is allowed, e.g. for units that are distributed regularly
in real space. Of relevance is a non-zero phase lag .α, which induces reciprocal
frustration. For large numbers N of participating units, strictly speaking when
.N → ∞, partially synchronized states are stabilized for possibly exponentially
clapping intensity
of 100 individuals. Data from
Néda et al. (2000a)
3
0
1 1.5 2 2.5 3 3.5 4
clapping frequency [Hz]
large transients. This is somewhat surprising, given that the natural frequencies are
all identical.
Kuramoto Model with Time Delays Two limit-cycle oscillators, coupled via a
time delay T are described by
K [ ]
θ̇1 (t) = ω1 +
. sin θ2 (t − T ) − θ1 (t) ,
2
K [ ]
θ̇2 (t) = ω2 + sin θ1 (t − T ) − θ2 (t) .
2
In the steady state,
θ1 (t) = ω t,
. θ2 (t) = ω t + Δθ ∗ , (9.24)
5 Anintroduction into the intricacies of time-delayed dynamical systems is given in Sect. 2.5 of
Chap. 2.
338 9 Synchronization Phenomena
2 ω
1-0.9*sin(ω) 5
1-0.9*sin(6ω)
1.5
4
1 1
0.5
3
0
0 0.5 1 1.5
2
ω
Fig. 9.6 Left: Graphical solution of the self-consistency condition (9.27), for time delays .T =
1/6 (shaded/full line), having respectively one/three intersections (filled circles) with the diagonal
(dashed line). The coupling constant is .K = 1.8. Right: An example of a directed ring, containing
five sites
K[ ]
ω = ω1 +
. − sin(ωT ) cos(Δθ ∗ ) + cos(ωT ) sin(Δθ ∗ ) , (9.25)
2
K[ ]
ω = ω2 + − sin(ωT ) cos(Δθ ∗ ) − cos(ωT ) sin(Δθ ∗ ) .
2
Taking the difference leads to
Δω = ω2 − ω1 = K sin(Δθ ∗ ) cos(ωT ) ,
. (9.26)
which generalizes (9.9) to the case of a finite time delay T . Together, (9.25) and
(9.26) determine the locking frequency .ω and the phase slip .Δθ ∗ .
Multiple Synchronization Frequencies For finite time delays T , there are gener-
ally more than one solution for the synchronization frequency .ω. For concreteness,
we consider
K
ω1 = ω2 ≡ 1,
. Δθ ∗ ≡ 0, ω =1− sin(ωT ) , (9.27)
2
compare (9.26) and (9.25). This equation can be solved graphically, as shown in
Fig. 9.6.
For .T → 0 the two oscillators are phase locked, oscillating with the original
natural frequency .ω = 1. A finite time delay leads to a change of the synchronization
frequency and, eventually, for large enough time delay T and couplings K,
to multiple solutions for the locking frequency, with every second intersection
9.4 Synchronization Mechanisms 339
shown in Fig. 9.6 being stable/unstable. The introduction of time delays induces
consequently qualitative changes regarding the structuring of phase space.
where periodic boundary conditions are implied, namely that .θN +1 ≡ θ1 . Special-
izing to the uniform case .ωj ≡ 1, the network becomes invariant under discrete
rotations, which allows for plane-wave solutions with frequency .ω and momentum
k,6
2π
θj = ω t − k j,
. k = nk , nk = 0, . . . , N − 1 , (9.29)
N
where .j = 1, . . . , N . With this ansatz, the locking frequency .ω is determined by
the self-consistency condition
ω = 1 + K sin(k − ωT ) .
. (9.30)
In analogy to (9.27), a set of solutions with distinct frequencies .ω can be found for
a given momentum k. The resulting dynamics is characterized by complex spatio-
temporal symmetries, in term of the constituent units .θj (t), which oscillate fully in
phase only for vanishing momentum .k → 0.
6 In the complex plane, .ψj (t) = eiθj (t) = ei(ωt−kj ) corresponds to a plane wave on a periodic
ring. Eq. (9.28) is then equivalent to the phase evolution of the wavefunction .ψj (t). The system
is invariant under translations .j → j + 1, which implies that the discrete momentum k is a good
quantum number, in the jargon of quantum mechanics. The periodic boundary condition .ψj +N =
ψj is satisfied for .k = 2π nk /N .
7 Initial functions for time delay systems are discussed in Sect. 2.5 of Chap. 2.
340 9 Synchronization Phenomena
The coupling term of the Kuramoto model, see (9.6), contains differences .θi − θj
in the respective dynamical variables .θi and .θj . With an appropriate sign of the
coupling constant, this coupling corresponds to a driving force towards the mean,
θ1 + θ2 θ1 + θ2
θ1 →
. , θ2 → ,
2 2
which competes with the time development of the individual oscillators whenever
the respective natural frequencies .ωi and .ωj are distinct. On their own, the
individual units would rotate with different angular velocities. As carried out in
Sect. 9.2, a detailed analysis is necessary when studying this competition between
the synchronizing effect of the coupling and the de-synchronizing influence of a
non-trivial natural frequency distribution.
The matrix elements are .Aij > 0 if the units i and j are coupled, and zero otherwise,
with .Aij representing the relative weight of the link. We define now aggregate
variables .x̄i = x̄i (t) by
E
x̄i = (1 − κi )xi + κi
. Aij xj , (9.31)
j
to a superposition of .xi with the weighted mean activity . j Aij xj of all its
neighbors.
8 The connection of the adjacency matrix to the graph spectrum is discussed in Sect. 1.2 of Chap. 1.
9.4 Synchronization Mechanisms 341
ẋi = fi (x̄i ),
. i = 1, . . . , N , (9.32)
with the .x̄i given by (9.31). The .fi describe local dynamical systems which could
be, e.g., harmonic oscillators, relaxation oscillators, or chaotic systems.
Expansion Around the Synchronized State In order to expand (9.32) around the
globally synchronized state, we first rewrite the aggregate variables as
E
x̄i = (1 − κi )xi + κi
. Aij (xj − xi + xi ) (9.33)
j
( E ) E
= xi 1 − κi + κi Aij + κi Aij (xj − xi )
j j
E
= xi + κi Aij (xj − xi ) ,
j
E
where we have used the normalization . j Aij = 1. The differences in activities
.xj − xi are small close to the synchronized state,
E
fi (x̄i ) ≈ fi (xi ) + fi' (xi ) κi
. Aij (xj − xi ) . (9.34)
j
Differential couplings .∼ (xj − xi ) between the nodes of the network are hence
equivalent, close to synchronization, to the aggregate averaging of the local
dynamics via the respective .x̄i .
General Coupling Functions One may go one step further and define with
E
ẋi = fi (xi ) + hi (xi )
. gij (xj − xi ) (9.35)
j
The equivalence of .hi (xi )gij' (0) and .fi' (xi )κi Aij , compare (9.34), is only local in
time, which is however sufficient for a local stability analysis. The synchronized
state of the system with differential couplings, see (9.35), is hence locally stable
342 9 Synchronization Phenomena
– We saw, when discussing the Kuramoto model in Sect. 9.2, that generically not
all nodes of a network participate in a synchronization process. For the Kuramoto
model the oscillators with natural frequencies far away from the average do
not lock to the time development of the order parameter, see Fig. 9.3, retaining
drifting trajectories.
– Generically, synchronization takes the form of coherent time evolution with
phase lags, we have seen an example when discussing two coupled oscillators
in Sect. 9.2. The synchronized orbit is then
xi (t) = x(t) + δi ct ,
. |c|t = eλt , (9.36)
λα ,
. (δ1α , . . . , δN
α
), α = 1, . . . , N .
One of the exponents characterizes the flow along the synchronized direction. The
synchronized state is stable if all the remaining .λj (.j = 2, . . . , N ) Lyapunov
exponents are negative when averaged over the orbit.
with
in linear order in the .δi around the synchronized orbit .x = x̄i = xi , for .i = 1, 2.
The expansion factor c corresponds to an eigenvalues of the Jacobian, the usual
situation for discrete maps. As a result, we find two local pairs of eigenvalues and
eigenvectors, namely
1
c1 = r(1 − 2x)
. (δ1 , δ2 ) = √ (1, 1)
2
1
c2 = r(1 − 2x)(1 − 2κ) (δ1 , δ2 ) = √ (1, −1)
2
As expected, .λ1 > λ2 , since .λ1 corresponds to a perturbation along the synchronized
orbit. The overall stability of the synchronized trajectory can be examined by
averaging above local Lyapunov exponents over the full time development, which
defines the “maximal Lyapunov exponent”.11
for the coupling strength .κ necessary for stable synchronization by observing that
|1 − 2x| ≤ 1 and hence
.
The synchronized orbit is stable for .|c2 | < 1. For .κ ∈ [0, 1/2] we obtain
r −1
|c2 | ≤ r(1 − 2κ),
. κ>
2r
for the lower bound for .κ. For the maximal reproduction rate, .r = 4, synchro-
nization is possible for .3/8 < κ ≤ 1/2. Given that the logistic map is chaotic for
.r > r∞ ≈ 3.57, this results proves that chaotic coupled systems may synchronize.
ẋ = f (x) − y + I f (x) = 3x − x 3 + 2
. ( ) ( ). (9.39)
ẏ = e g(x) − y g(x) = α 1 + tanh(x/β)
-2 -1 0 1 2 -2 -1 0 1 2
x x
Fig. 9.7 The .ẏ = 0 (red dashed-dotted lines), and .ẋ = 0 (blue full lines) isocline of the Terman–
Wang oscillator (9.39), here for .α = 5, .β = 0.2, .e = 0.1. Left: .I = 0.5, shown is the relaxation
cycle for .e << 1 (dotted line, arrows). Right: .I = −0.5, which has a stable fixpoint, .PI (filled
green dot)
∂ ẋ ∂ ẏ
. + = 3 − 3x 2 − e = 3(1 − x 2 ) − e .
∂x ∂y
For small .e << 1 the system takes up energy for membrane potentials .|x| < 1,
dissipating energy for .|x| > 1.
ẋ = 0 y = f (x) + I
.
ẏ = 0 y = g(x)
by the intersection of the two functions .f (x) + I and .g(x), as illustrated in Fig. 9.7.
Hence, there are two parameter regimes.
12 The divergence of the flow is equivalent to the relative contraction/expansion of phase space,
.ΔV /V , as discussed in Sect. 3.1.1 of Chap. 3.
346 9 Synchronization Phenomena
6
relaxational excitable state
I>0 4
4 I<0
y(t)
x(t), y(t)
2 y(t)
2
0 0
x(t) x(t)
-2 -2
0 20 40 60 80 0 20 40 60 80
time time
Fig. 9.8 Sample trajectories of .x(t) (blue line) and .y(t) (dashed-dotted line). Results for the
Terman–Wang oscillator (9.39), together with .α = 5, .β = 0.2, .e = 0.1. Left: Spiking behavior
for .I = 0.5, characterized by silent/active phases for negative/positive x. Right: Relaxation to a
stable fixpoint for .I = −0.5
– RELAXATION REGIME
For the .I > 0 the Terman–Wang oscillator relaxes to a periodic solution, see
Fig. 9.7, which is akin to the relaxation oscillation observed for the Van der Pol
oscillator.13
– EXCITABLE STATE
For .I < 0 the system settles into a stable fixpoint, becoming however active
again when an external input shifts I into positive region. The system is said to
be “excitable”.
Silent and Active Phases In the relaxation regime, the periodic solution jumps
rapidly, for .e << 1, between trajectories that approach closely the right branch (RB),
and the left branch (LB) of the .ẋ = 0 isocline. The times spent respectively on
the two branches may however differ substantially, as indicated in Figs. 9.7 and 9.8.
Indeed, the limit cycle is characterized by two distinct types of flows.
– SILENT PHASE
The LB (.x < 0) of the .ẋ = 0 isocline is comparatively close to the .ẏ = 0
isocline, the system spends therefore considerable time on the LB. This leads to
a “silent phase”, which is equivalent to a refractory period.
– ACTIVE PHASE
The RB (.x > 0) of the .ẋ = 0 isocline is far from the .ẏ = 0 isocline, when
.α >> 1, see Fig. 9.7. The time development .ẏ is hence fast on the RB, which
13 The van der Pol oscillator, treated in Sect. 3.2 of Chap. 3, has to regimes, corresponding
respectively to small/large adaptive term.
9.4 Synchronization Mechanisms 347
The relative rate of the time development through .ẏ between silent and active phase
is determined by the parameter .α, as defined in (9.39).
Spikes as Activity Bursts In its relaxation phase, the Terman–Wang oscillator can
be considered as a spontaneously spiking neuron, see Fig. 9.8. When .α >> 1, the
active phase undergoes a compressed burst, viz a “spike”.
. I → I + ΔI, ΔI > 0 ,
such that the second neural oscillator changes from an excitable state to the
oscillating state. This process is illustrated graphically in Fig. 9.9, it corresponds
to a signal sent from the first to the second dynamical unit. In neural terms, when
the first neuron fires, the second neuron follows suit.
. 1 ⇒ 2 ⇒ 3 ⇒ ...
which are assumed to be initially all in the excitable state, with .Ii ≡ −0.5. Inter-unit
coupling via fast threshold modulation corresponds to
where .Θ(x) is the Heaviside step function. That is, we define an oscillator i to be in
its active phase whenever .xi > 0. The resulting dynamics is shown in Fig. 9.10. The
348 9 Synchronization Phenomena
y
dy/dt = 0
CE
8
RB E
o2(t 2)
C 6 o1(t 2)
o2(0)
RB
4
o1(0) LB E
2 o2(t)
o2(t1)
LB o1(t)
o1(t 1)
−2 –1 0 1 2 x
Fig. 9.9 Fast threshold modulation for two excitatory coupled relaxation oscillators, see (9.39),
denoted symbolically as .o1 = o1 (t) and .o2 = o2 (t). When .o1 jumps at .t = t1 from the LB to the
RB, it becomes active. The cubic .ẋ = 0 isocline for .o2 is consequently raised from C to .CE . This
induces .o2 to jump as well from left to right. Note that the jumping from the right branch (.RB and
.RBE ) back to the left branches occurs in the reverse order, .o2 jumps first
chain is driven by setting the first oscillator of the chain into the spiking state for a
certain period of time. All other oscillators spike consecutively in rapid sequence.
as illustrated in Fig. 9.11. No fixpoint exists when .g0 > 1. In comparison to the
original functionality shown in Fig. 9.7, the .ẋ = 0 and .ẏ = 0 isoclines are now
discontinuous, but otherwise piecewise linear or constant functions.
9.5 Piecewise Linear Dynamical Systems 349
ΔI1(t)
0
xi(t)
-1
-2
i=1,2,3,4,5
-3
0 50 100 150
time
Fig. 9.10 As an example of synchronization via causal signaling, shown are sample trajectories
(lines) for a line of coupled Terman–Wang oscillators, as defined by (9.39). The relaxation
.xi (t)
oscillators are in excitable states, with .α = 10, .β = 0.2, .e = 0.1 and .I = −0.5. For .t ∈ [20, 100]
a driving current .ΔI1 = 1 is added to the first oscillator (dotted line). In consequence, .x1 starts to
spike, driving the other oscillators one by one via fast threshold modulation
ẋ = 1 − x − y,
. ẏ = e(g0 − y) (9.42)
Starting at .x(0) = 0, the orbit crosses the .x = 0 line the next time after half-a-
period, which is evident from Fig. 9.11, with the y-component changing in between.
The matching conditions are therefore
1 dy/dt=0 1
y(t)
x 0
-1 1
dx/dt=0 x(t)
-1
-1
10 20
time
Fig. 9.11 The piecewise linear Terman–Wang system, as described by (9.41) and (9.42). The parameters are $g_0 = 1.2$ and $e = 1$. Left: The phase space $(x, y)$. Shown are the piecewise linear $\dot{x} = 0$ isocline (blue), the piecewise constant $\dot{y} = 0$ isocline (red), and the resulting limit cycle (black). Right: As a function of time, kinks show up when the orbit crosses $x = 0$. Compare Figs. 9.7 and 9.8
The starting condition $x(0) = 0$ yields $\tilde{x} = g_0 - 1$, see (9.43), which leads to
$g_0 - 1 = \left(g_0 - 1 - \tilde{y}\,\frac{T}{2}\right) e^{-T/2}\,, \qquad 2 g_0 = -\tilde{y}\left[e^{-T/2} + 1\right]\,. \qquad (9.45)$
As a result one has two conditions for the two parameters, $\tilde{y}$ and T.
Limit Cycle Period Given .g0 , the only free model parameter, one can eliminate
either .ỹ or T from (9.45). Doing so, one obtains a self-consistency equation for the
remaining parameter that can be solved numerically.
Alternatively we may ask which $g_0$ would lead to a given period T. In this case T is fixed and one eliminates $\tilde{y}$. From
$\tilde{y} = \frac{2}{T}\left[1 - e^{T/2}\right](g_0 - 1)\,, \qquad \tilde{y}\left[1 + e^{-T/2}\right] = -2 g_0$
one finds
$-2 g_0 = \frac{2}{T}\,(g_0 - 1)\left[1 - e^{T/2}\right]\left[1 + e^{-T/2}\right]$
and hence
$g_0 = \frac{2\sinh(T/2)}{2\sinh(T/2) - T}\,, \qquad (9.46)$
a remarkably simple expression. E.g., the orbit shown in Fig. 9.11 has a period of $T \approx 7.6$, which is consistent with $g_0 = 1.2$.
border of respectively the $\dot{x} = 0$ and $\dot{y} = 0$ isoclines. From (9.46) one finds
$\lim_{T\to\infty} g_0 = 1\,, \qquad (9.47)$
an expected relation. The oscillatory behavior slows down progressively when .g0 →
1, viz when the orbit needs to squeeze through the closing gap between the two
isoclines. In the opposite limit, when $T \to 0$, the Taylor expansion $\sinh(z) \approx z + z^3/6$ leads to
$g_0 \sim \frac{3}{T^2}\,, \qquad T \sim \sqrt{\frac{3}{g_0}}\,, \qquad T \ll 1\,. \qquad (9.48)$
The flow accelerates as $1/\sqrt{g_0}$ when $g_0$ becomes large.
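The relation (9.46) between $g_0$ and the period is easily checked numerically. The sketch below inverts (9.46) by a simple root search, confirming that $g_0 = 1.2$ corresponds to $T \approx 7.6$, as quoted for Fig. 9.11.

```python
import numpy as np
from scipy.optimize import brentq

def g0_of_T(T):
    """Limit cycle relation (9.46): g0 = 2 sinh(T/2) / (2 sinh(T/2) - T)."""
    return 2.0 * np.sinh(T / 2.0) / (2.0 * np.sinh(T / 2.0) - T)

# invert (9.46): which period T corresponds to g0 = 1.2?
T = brentq(lambda T: g0_of_T(T) - 1.2, 1.0, 50.0)
print(T)   # approximately 7.6, consistent with Fig. 9.11
```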
There are illnesses, like measles, that come and go recurrently. Looking at the statistics of local measles outbreaks presented in Fig. 9.12, one can observe that
outbreaks may occur in quite regular time intervals within a given city. Interestingly
though, these outbreaks can be either in phase (synchronized) or out of phase
between different cities.
The oscillations in the number of infected persons are definitely not harmonic,
sharing instead many characteristics with relaxation oscillations, which typically
have silent and active phases, compare Sect. 9.4.2.
SIRS Model The reference model for infectious diseases is the SIRS model. It
contains three compartments, in the sense that individuals belong at any point in
time to one of three possible classes.
S : susceptible,
. I : infected,
R : recovered.
Fig. 9.12 Observation of the number of infected persons in a study on illnesses. Top: Weekly measles cases in Birmingham and Newcastle (red/blue lines). Bottom: Weekly measles cases in Cambridge and Norwich (green/brown lines). Data from He and Stone (2003)
– INFECTION PROCESS
With a certain probability, susceptibles pass to the infected state after coming
into contact with one infected individual.
– RECOVERING
Infected individuals recover from the infection after a given period of time, .τI ,
passing to the recovered state.
– IMMUNITY
For a certain period, $\tau_R$, recovered individuals are immune, returning to the susceptible state once immunity is lost.
Sum Rule The SIRS model can be implemented both for discrete and for continuous time; we start with the former. The infected phase is normally short, which allows one to use $\tau_I$ as the unit of time, setting $\tau_I = 1$. The recovery time $\tau_R$ is then a multiple of $\tau_I = 1$. We define with $x_t$ and $s_t$ the fractions of infected and susceptible individuals at time t.
state: S S S I R R R S S
time:  1 2 3 4 5 6 7 8 9
Fig. 9.13 Example of the course of an individual infection within the discrete-time SIRS model
with an infection period .τI = 1 and a recovery duration .τR = 3. The number of individuals
recovering at time t is the sum of infected individuals at times .t − 1, .t − 2 and .t − 3, compare
(9.49)
$s_t = 1 - x_t - \sum_{k=1}^{\tau_R} x_{t-k} = 1 - \sum_{k=0}^{\tau_R} x_{t-k} \qquad (9.49)$
holds, as the fraction of susceptible individuals is just one minus the number of
infected individuals minus the number of individuals in the recovery state, compare
Fig. 9.13.
Discrete Time SIRS Model We denote with a the rate of transmitting an infection
when there is a contact between an infected individual and a susceptible individual.
Using the recursion relation (9.49) one obtains with
$x_{t+1} = a\,x_t\,s_t = a\,x_t\left(1 - \sum_{k=0}^{\tau_R} x_{t-k}\right) \qquad (9.50)$
the update rule of the discrete-time SIRS model.
Relation to the Logistic Map For $\tau_R = 0$, the discrete time SIRS model (9.50) reduces to the logistic map,
$x_{t+1} = a\,x_t\,(1 - x_t)\,.$
The trivial fixpoint .xt ≡ 0 is globally attracting when .a < 1, which implies that the
illness dies out. The non-trivial steady state is
$x^{(1)} = 1 - \frac{1}{a}\,, \qquad 1 < a < 3\,.$
At $a = 3$ a Hopf bifurcation is observed, beyond which an oscillation of period two develops.
Increasing the growth rate a further, a transition to deterministic chaos takes place.
The SIRS model (9.50) shows somewhat similar behaviors. Due to the memory
terms .xt−k , the resulting oscillations may however depend on the initial condition.
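The discrete model (9.50) is straightforward to iterate; a minimal sketch is given below, with $\tau_R = 6$ as in Fig. 9.14 and the infection rate set here to $a = 2$ (the value used for Fig. 9.15). Clipping at zero is a practical safeguard, not part of the model definition.

```python
import numpy as np

def sirs_discrete(a=2.0, tau_R=6, x0=0.01, steps=100):
    """Iterate the discrete-time SIRS model (9.50),
    x_{t+1} = a * x_t * (1 - sum_{k=0}^{tau_R} x_{t-k})."""
    x = [x0] + [0.0] * tau_R           # memory of the last tau_R + 1 values
    traj = []
    for _ in range(steps):
        s = 1.0 - sum(x)               # susceptible fraction, sum rule (9.49)
        x_new = max(a * x[0] * s, 0.0)
        x = [x_new] + x[:tau_R]
        traj.append(x_new)
    return traj

traj = sirs_discrete()                 # recurrent outbreaks as in Fig. 9.14
```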
Fig. 9.14 Example of a solution to the SIRS model, see (9.50), for .τR = 6. The number of infected
individuals might drop to very low values during the silent phase in between two outbreaks, as most
of the population is first infected and then immunized during an outbreak
Two Coupled Epidemic Centers What happens when two epidemic centers are weakly coupled? We use $x_t^{(1,2)}$ and $s_t^{(1,2)}$ for the infected and susceptible fractions of the two centers. With the coupled update (9.51) one describes a situation where a small fraction e of infected individuals visits the
respective other center. In addition, one needs to apply the sum rule (9.49) to both
centers.
For .e = 1 there is no distinction between the two centers, which means that the
dynamics described by (9.51) can be merged via .xt = xt(1) +xt(2) and .st = st(1) +st(2) .
A single combined epidemic population remains. The situation is similar to the case of two coupled logistic maps, as given by (9.37). This is not surprising, since the coupling term in (9.51) is based on aggregate averaging.
[Fig. 9.15, legend of the lower panel: $a = 2$, $e = 0.100$, $\tau_R = 6$, $x_0^{(1)} = 0.01$, $x_0^{(2)} = 0$]
Fig. 9.15 Time evolution of the fraction of infected individuals .x (1) (t) and .x (2) (t) within the SIRS
model, see (9.51), for two epidemic centers .i = 1, 2 with recovery times .τR = 6 and infection rates
.a = 2, see (9.50). For a very weak coupling .e = 0.005 (top) the outbreaks occur out of phase, for
a moderate coupling .e = 0.1 (bottom) in phase
In Fig. 9.15 we present the results from a numerical simulation of the coupled
model, illustrating the typical behavior. We see that the epidemic outbreaks occur
in the SIRS model indeed in phase for moderate to large coupling constants e.
In contrast, for very small couplings e between the two centers of epidemics, the
synchronization phase flips to antiphase. This phenomenon is observed in reality,
compare Fig. 9.12.
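The in-phase versus antiphase behavior can be reproduced with a short simulation. Since the coupled update (9.51) is not reproduced above, the sketch below assumes a cross-infection coupling in which a fraction e of the infection pressure in each center stems from infected visitors of the other center; initial conditions follow the legend of Fig. 9.15.

```python
import numpy as np

def coupled_sirs(a=2.0, e=0.005, tau_R=6, steps=400):
    """Two epidemic centers; assumed coupling: a fraction e of infections is
    seeded by infected individuals visiting from the other center."""
    x1 = [0.01] + [0.0] * tau_R
    x2 = [0.0] * (tau_R + 1)
    out = []
    for _ in range(steps):
        s1, s2 = 1.0 - sum(x1), 1.0 - sum(x2)          # sum rule (9.49)
        new1 = max(a * s1 * ((1 - e) * x1[0] + e * x2[0]), 0.0)
        new2 = max(a * s2 * ((1 - e) * x2[0] + e * x1[0]), 0.0)
        x1, x2 = [new1] + x1[:tau_R], [new2] + x2[:tau_R]
        out.append((new1, new2))
    return out

weak = coupled_sirs(e=0.005)    # out-of-phase outbreaks
strong = coupled_sirs(e=0.1)    # in-phase outbreaks, compare Fig. 9.15
```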
Time Scale Separation The reason for the occurrence of out-of-phase synchronization is the emergence of two separate time scales in the limit $\tau_R \gg 1$ and $e \ll 1$. A small seed $\sim e\,a\,x^{(1)} s^{(2)}$ of infections in the second city needs substantial time to induce a full-scale outbreak, even via exponential growth, when e is too small. But in order to remain in phase with the current outbreak in the first city, the outbreak occurring in the second city may not lag too far behind. When the dynamics is symmetric under the exchange $1 \leftrightarrow 2$, the system then settles into antiphase cycles.
Using the approximation
$\dot{I} \approx \frac{I_{t+\Delta t} - I_t}{\Delta t}\,, \qquad \Delta t = 1\,,$
in (9.50) yields
$\dot{I} = a I\,(1 - I - R) - I = a I S - I\,, \qquad (9.52)$
which has a simple interpretation. The number of infected increases with a rate that is proportional to the infection probability a and to the densities I and S of infected and susceptible individuals. The term $-I$ describes recovery at unit rate.
Continuous Time SIRS Model For the discrete case, we took the duration of the illness as the unit of time, as illustrated in Fig. 9.13. Using (9.52) and rescaling time by $T = 1/\lambda$, one obtains with
$\dot{S} = -g I S + \delta R\,, \qquad \dot{I} = g I S - \lambda I\,, \qquad \dot{R} = \lambda I - \delta R \qquad (9.53)$
the standard continuous time SIRS model, where $g = \lambda a$ and $\delta = \lambda/\tau_R$. Recall that $\tau_R$ denotes the average time an individual remains immune, as measured in units of the illness duration T.
The endemic state,
$S = \frac{\lambda}{g}\,, \qquad I = \frac{\delta}{\delta + \lambda}\, \frac{g - \lambda}{g}\,, \qquad R = \frac{\lambda}{\delta + \lambda}\, \frac{g - \lambda}{g}\,, \qquad (9.54)$
which follows directly from (9.53). It exists when the infection rate g is larger than
the recovery rate $\lambda$, namely when $g > \lambda$. Otherwise, the illness dies out. One can show that the endemic state is always stable. The discrete SIRS model supports, in contrast, solutions corresponding to periodic outbreaks, as shown in Fig. 9.14.
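A minimal numerical check of (9.53) and of the endemic state (9.54) is sketched below; the rate values are chosen here for illustration and are not taken from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

g, lam, delta = 2.0, 1.0, 0.25            # infection, recovery, immunity-loss rates

def sirs(t, u):
    S, I, R = u
    return [-g * I * S + delta * R,       # Eq. (9.53)
            g * I * S - lam * I,
            lam * I - delta * R]

sol = solve_ivp(sirs, (0.0, 200.0), [0.99, 0.01, 0.0], rtol=1e-8)
S_inf, I_inf, R_inf = sol.y[:, -1]

# endemic state (9.54): S* = lam/g, with I* and R* fixed by S + I + R = 1
S_star = lam / g
I_star = delta / (delta + lam) * (g - lam) / g
R_star = lam / (delta + lam) * (g - lam) / g
print(S_inf, S_star, I_inf, I_star, R_inf, R_star)   # should nearly coincide
```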
For the SIR model, without loss of immunity, dividing $\dot{I}$ by $\dot{S}$ yields
$\frac{\dot{I}}{\dot{S}} = \frac{dI}{dS} = -1 + \frac{\lambda}{g}\, \frac{1}{S}\,, \qquad (9.55)$
[Fig. 9.16 legend: curves for $g = 1.5\,\lambda$, $g = 2.0\,\lambda$ and $g = 3.0\,\lambda$; axes: actual cases I vs. all cases X]
Fig. 9.16 For the SIR model, the current number of infected I as a function of the cumulative
number of cases .X = 1 − S. The curves are described by the analytic solution (9.56), using
.R0 = 0 = I0 and .S0 = 1. The outbreak starts at .(X, I ) = (0, 0), ending when I vanishes again
(open circles). A sizeable fraction of the population is never infected
where .S0 = S(0), .I0 = I (0) and .R0 = 1 − S0 − I0 denote the starting configuration.
The functional dependence of .I = I (X) is shown in Fig. 9.16, where .X = I + R =
1 − S is the cumulative number of cases. The illness dies out before the entire
population has been infected.
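The curves of Fig. 9.16 follow from integrating (9.55) in S, which gives $I(S) = I_0 + S_0 - S + (\lambda/g)\ln(S/S_0)$; this should correspond to the analytic solution (9.56) quoted in the caption. A short sketch:

```python
import numpy as np

def infected_vs_cases(g_over_lam, S0=1.0, I0=0.0, n=400):
    """Integrate (9.55): I(S) = I0 + S0 - S + (lam/g) ln(S/S0); X = 1 - S."""
    S = np.linspace(S0, 1e-3, n)
    I = I0 + S0 - S + np.log(S / S0) / g_over_lam
    X = 1.0 - S
    keep = I >= 0.0                        # outbreak ends where I drops to zero
    return X[keep], I[keep]

for ratio in (1.5, 2.0, 3.0):              # the three curves of Fig. 9.16
    X, I = infected_vs_cases(ratio)
    print(ratio, X[-1], I.max())           # final outbreak size and peak height
```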
Exercises
14 The methods of time delay systems are laid down in Sect. 2.5 of Chap. 2.
two coupled chaotic maps, with κ ∈ [0, 1] being the coupling strength and
T the time delay, compare (9.32). Discuss the stability of the synchronized
states x1 (t) = x2 (t) ≡ x̄(t) for general time delays T . What drives the
synchronization process?
(9.6) TERMAN–WANG OSCILLATOR
Discuss the stability of the fixpoints of the Terman–Wang oscillator, see
(9.39). Linearize the differential equations around the fixpoint solution and
consider the limit β → 0.
(9.7) PULSE COUPLED LEAKY INTEGRATOR NEURONS
The membrane potential x(t) of a leaky-integrator neuron can be thought to
increase over time like
$\dot{x} = \gamma\,(S_0 - x)\,, \qquad x(t) = S_0\left(1 - e^{-\gamma t}\right)\,. \qquad (9.58)$
$\dot{H} = -\alpha H Z\,, \qquad \dot{Z} = (2\alpha - 1) H Z\,. \qquad (9.59)$
Find and discuss the analytic solution when starting from initial densities $H_0$ and $Z_0 = 1 - H_0$. What is the difference to the SIRS model?
Further Reading
The reader may consult a textbook containing examples for synchronization pro-
cesses by Pikovsky et al. (2003), and an informative review of the Kuramoto model
by Pikovsky and Rosenblum (2015), which contains in addition historical annota-
tions. Some of the material discussed in this chapter requires a certain background
in theoretical neuroscience, see e.g. Sterrat et. al (2023). An introductory review to
chimera states is given in Panaggio and Abrams (2015).
We recommend that the interested reader takes a look at some of the original
research literature, such as the exact solution of relaxation oscillators, Terman and Wang (1995), the concept of fast threshold modulation, Somers and Kopell
(1993), the physics of synchronized clapping, Néda et al. (2000a,b), and synchro-
nization phenomena within the SIRS model of epidemics, He and Stone (2003). For
synchronization with delays see D’Huys et al. (2008).
References
D’Huys, O.,Vicente, R., Erneux, T., Danckaert, J., & Fischer, I. (2008). Synchronization properties
of network motifs: Influence of coupling delay and symmetry. Chaos, 18, 037116.
He, D., & Stone, L. (2003). Spatio-temporal synchronization of recurrent epidemics. Proceedings
of the Royal Society London B, 270, 1519–1526.
Néda, Z., Ravasz, E., Vicsek, T., Brechet, Y., & Barabási, A. L. (2000a). Physics of the rhythmic
applause. Physical Review E, 61, 6987–6992.
Néda, Z., Ravasz, E., Vicsek, T., Brechet, Y., & Barabási, A.L. (2000b). The sound of many hands
clapping. Nature, 403, 849–850.
Panaggio, M. J., & Abrams, D. M. (2015). Chimera states: coexistence of coherence and
incoherence in networks of coupled oscillators. Nonlinearity, 28, R67.
Pikovsky, A., & Rosenblum, B. (2015). Dynamics of globally coupled oscillators: Progress and
perspectives. Chaos, 25, 9.
Pikovsky, A., Rosenblum, M., & Kurths, J. (2003). Synchronization: A universal concept in
nonlinear sciences. Cambridge University Press.
Somers, D., & Kopell, N. (1993). Rapid synchronization through fast threshold modulation.
Biological Cybernetics, 68, 398–407.
Sterratt, D., Graham, B., Gillies, A., Einevoll, G., & Willshaw, D. (2023). Principles of computa-
tional modelling in neuroscience. Cambridge University Press.
Strogatz, S. H. (2001). Exploring complex networks. Nature, 410, 268–276.
Terman, D., & Wang, D. L. (1995) Global competition and local cooperation in a network of neural
oscillators. Physica D, 81, 148–176.
Complexity of Machine Learning
10
Without doubt, the brain is the most complex adaptive system known to humanity,
arguably also a complex system about which we know little. In both respects, the
brain faces increasing competition from machine learning architectures.
We present an introduction to basic neural network and machine learning con-
cepts, with a special focus on the connection to dynamical systems theory. Starting
with point neurons and the XOR problem, the relation between the dynamics of
recurrent networks and random matrix theory will be developed. The somewhat
counter-intuitive notion of continuous numbers of network layers is shown next to
lead to neural differential equations, respectively for information processing and
error backpropagation. Approaches aimed at understanding learning processes in
deep architectures often make use of the infinite-layer limit. As a result, machine
learning can be described by Gaussian processes together with neural tangent
kernels. Finally, the distinction between information processing and information
routing will be discussed, with the latter being the task of the attention mechanism,
the core component of transformer architectures.
Modern day computation devices and architectures are constructed using large
numbers of local computation units. Examples are transistors and quantum gates for
classical and quantum computers, or artificial neurons for deep learning algorithms.
Perceptron Unit We start with the classical artificial neuron, denoted here “per-
ceptron unit”. Time is discrete, with individual in- and outputs corresponding to
real numbers, often with restricted domains. Dimensionality reduction is achieved
by taking the scalar product of the input vector .x with a weight vector .w,
$y = \sigma\bigl(a\,(x - b)\bigr)\,, \qquad x = \mathbf{w}\cdot\mathbf{x}\,. \qquad (10.1)$
The output y is generated via a transfer function .σ (z). In machine learning, the
threshold b is defined most of the time with an opposite sign. When the aim is to
study a given set of weights .w, one keeps the gain a as a free parameter. If not, one
can set it to unity, .a → 1, as done in machine learning. Geometrically, the isocline
$x = b$,
$\mathbf{w}\cdot\mathbf{x} = b\,, \qquad (10.2)$
corresponds to a plane in input space. Perceptron units hence act as linear classifiers.
Planes divide space into two half-spaces, say .SA and .SB . As feature extractors,
perceptron units output information about whether the input vector is part of either
.SA or .SB .
Geometric Features Spaces When images or visual data are processed, input units
may be associated with specific geometric locations .Ri , usually corresponding to
pixels in a two-dimensional plane. After adaption, the weights .wi of a specific
classifying neuron,
wi (Ri ) → w(ΔRi ),
. ΔRi = Ri − Ry , (10.3)
will be a function of the geometric location .Ri of the afferent neuron, normally
relative to the location .Ry of the classifying neuron.
Geometric feature extraction is the prime task of the visual cortex. Examples are
linear or center-surround contrasts and gratings, both in the black-white and in the
color domain. Features can be highly non-linear geometric objects, classification is
nevertheless linear in the space of input activities.
Transfer Functions The transfer function .σ (z) entering (10.1) comes in a range of
varieties. Two classical variants are
$\sigma_{\rm sig}(z) = \frac{1}{1 + e^{-z}}\,, \qquad \sigma_{\tanh}(z) = \tanh(z)\,, \qquad (10.4)$
where the first, the “sigmoidal”, works in the range .σsig ∈ [0, 1]. For the second
we have .σtanh ∈ [−1, 1]. Both transfer functions are monotonically increasing and
bounded, which makes the output a measure for the degree to which the input has
been classified. The infinite-gain limit
⎧
⎨ 0.0 x<b
. lim σsig (a(x − b)) = 0.5 x=b (10.5)
a→∞ ⎩
1.0 x>b
is piecewise linear.1 Negative arguments are shunted. On the positive side, gradients
will not vanish for large arguments, as they do for .σsig and .σtanh , remaining finite
for all .z > 0. This helps to avoid the vanishing gradient problem when propagating
errors backwards, a subject treated further below in Sect. 10.3.
Boltzmann Units So far, deterministic updating in discrete time has been used.
Stochastic dynamics,2
$\sigma_{\rm Boltz}(z) = \begin{cases} 1 & \text{with probability } p\\ 0 & \text{with probability } 1 - p \end{cases} \qquad (10.7)$
is however also an option. In the context of “Boltzmann machines” one relates the
transition probability p to the difference between the two suitably defined energies,
1 See Sect. 9.5 of Chap. 9, for the general theory of piecewise linear dynamical systems.
2 For the general theory of stochastic dynamical systems see Chap. 3.
The computation units discussed so far correspond to “point neurons”, in the sense
that the output depends exclusively on a single aggregate quantity, namely .x = w·x,
respectively .z = a(x − b). A possible extension are detailed biological models
that are characterized by a potentially large number of biophysical compartments.
“Compartmental neurons” are of relevance in the neurosciences. Remaining on an
abstract level, we discuss here several options for internally structured computation
units.
Gated Units Gating occurs when a certain data stream controls the processing of
a second data stream. A basic example is the “gated linear unit”, GLU,
which contains two adaptable weight matrices and thresholds, .w, .v and .bw , .bv . The
output is a linear function of the input vector .x, which is however modulated by
a sigmoidal, .σsig (xv − bv ) ∈ [0, 1]. After training, .w and .v may encode distinct
features. If this happens, the processing of a given feature may proceed only when
the second feature is also present. This type of gating is hence equivalent to a
coincidence detector for self-selected features. It holds as a corollary that a single
GLU unit can express all logical gates, including the XOR gate.3
Fig. 10.2 The AND (left) and OR (middle) gates are examples of linearly separable logical gates
(dashed diagonal line). The XOR gate is not linearly separable (right). Filled/open bullets denote
logical true/false
The term “artificial intelligence” was coined at a 1956 Dartmouth workshop. Hopes
of rapidly achieving human-level AI were however crushed when it was realized
13 years later that linear classifiers,4 like point neurons, may not perform non-
linear classification tasks, such as representing an XOR gate; see Fig. 10.2. This
was considered unfortunate, given that universal computation hinges on being able
to express all logical gates, including the XOR gate. The “XOR problem”, as it
was dubbed, is thought to have initiated an extended period of disillusionment with
neuronal networks, the “AI winter”.
Hidden Layers In machine learning jargon, whatever is not part of input or output
is “hidden”. A classical solution to the XOR problem is to add a hidden layer.
As an example consider that the weights and thresholds of the two hidden units
are such that they define respectively the planes
x1 + x2 = 0.5,
. x1 + x2 = 1.5 , (10.10)
as illustrated in Fig. 10.3. For binary neurons with .σ (z) = θ (z), where .θ (z) is the
Heaviside step function, the mapping $(x_1, x_2) \to (y_1, y_2)$ reads
$y_1 = \theta(x_1 + x_2 - 0.5)\,, \qquad y_2 = \theta(x_1 + x_2 - 1.5)\,, \qquad (10.11)$
which implies that $(0, 1)$ and $(1, 0)$ are mapped to the same point in the activity
space of the two hidden-layer units, .(y1 , y2 ). At this step, the XOR configuration
becomes linearly separable.
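The construction can be verified with a few lines of code. The hidden units implement the planes (10.10); the weights of the output unit below are one possible hand-picked choice separating the mapped activities along the diagonal of Fig. 10.3, they are not specified in the text.

```python
import numpy as np

def theta(z):                       # Heaviside step function
    return (z > 0).astype(float)

def xor_net(x1, x2):
    y1 = theta(x1 + x2 - 0.5)       # first hidden unit, plane x1 + x2 = 0.5
    y2 = theta(x1 + x2 - 1.5)       # second hidden unit, plane x1 + x2 = 1.5
    return theta(y1 - y2 - 0.5)     # output: true only for (y1, y2) = (1, 0)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x[0]), np.array(x[1])))   # 0, 1, 1, 0
```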
Fig. 10.3 Left: Receiving inputs $x_1$ and $x_2$, two hidden layer neurons produce outputs $y_1$ and $y_2$, the hidden-layer activity. Specific to the hidden layer are the weight vectors $\mathbf{w}_1$ and $\mathbf{w}_2$, together with the respective thresholds (not shown). Middle: In input space $(x_1, x_2)$, the classifying planes (gray dashed lines) of the two hidden neurons; see (10.10). Right: The activity space of the hidden neurons, $(y_1, y_2)$. Shown is the mapping (10.11) for binary neurons. The XOR configuration is now linearly separable (dashed diagonal line)
Consequently, the .x3 = 0 plane can be used for the classification of the XOR
gate. The above example is constructed by hand. When using random embeddings,
a common approach, larger embedding dimensions may be necessary. The brain
makes extensive use of dimensional embedding,
nets” not. In general, the “membrane potential” .xi of a point neuron is given by
$x_i = e_i + \sum_j w_{ij}\, y_j\,, \qquad y_i = \tanh\bigl(a_i (x_i - b_i)\bigr)\,, \qquad (10.13)$
where .ei is an external driving and .wij the entries of the synaptic weight matrix .ŵ.
A .tanh-transfer function has been selected.
Lyapunov Exponents Recurrent nets feed back on themselves, which gives rise to a trailing memory. In the autonomous case, when $e_i \equiv 0$, the largest eigenvalue
of .ŵ, the largest “Lyapunov exponent” of the map defined by (10.13), determines if
activities tend to increase or decrease in linear order.6 This issue will be treated in
detail in Sect. 10.2.2.
The statistics of the entries of a matrix can be used for estimating the corresponding
spectrum of eigenvalues. When the entries are only weakly correlated, as defined
below, random matrix theory applies, providing precise results. Often, matrix
elements can be assumed to be independently distributed, in particular when the
rank of the matrix is large, the typical case for both neuroscience and machine-
learning applications.
Ensemble Average Random matrix theory deals with matrices with elements that
are drawn from a given probability distribution, normally a Gaussian. Drawing
all elements once generates a specific realization. “Ensemble average” denotes the
average over realizations. In practice one is however interested in the spectrum of
a specific matrix. For large matrices the ensemble average can be substituted by an average over all entries, the avenue taken here.
Off-Diagonal Correlations In its basic form, random matrix theory can be applied
only when the entries .wij are statistically fully independent. It holds however also
when the off-diagonal cross-correlation .Γ ,
$\Gamma = \frac{\sum_{i,j}\left(w_{ij} - \mu_w\right)\left(w_{ji} - \mu_w\right)}{\sum_{i,j}\left(w_{ij} - \mu_w\right)^2}\,, \qquad \Gamma \in [-1, 1]\,, \qquad (10.14)$
6 See Chap. 2.
Fig. 10.4 The distribution of eigenvalues of an elliptic .400 × 400 matrix. The eigenvalues
(filled dots) in the complex plane are shown for .Γ = −0.6/0.0/0.6 (left/middle/right), as
defined by (10.14). Included are the corresponding analytic results (10.15), which are valid in
the thermodynamic limit (lines)
is non-zero. For a $N \times N$ matrix, we defined the mean $\mu_w = \sum_{ij} w_{ij}/N^2$. The inequality $|\Gamma| \le 1$ follows from $(a \pm b)^2 \ge 0$, or $a^2 + b^2 \ge \mp 2ab$. Three special cases are of interest.
Elliptic Random Matrices We set $\mu_w = 0$, without loss of generality, and assume $\sigma_w = 1/\sqrt{N}$ for the standard deviation $\sigma_w$ of the matrix elements. Random matrix theory states that the spectrum of the eigenvalues $\lambda$ is uniformly distributed within an ellipse defined by
$E_\Gamma = \left\{\lambda = x + iy \,\Big|\, \frac{x^2}{(1 + \Gamma)^2} + \frac{y^2}{(1 - \Gamma)^2} \le 1\right\}\,. \qquad (10.15)$
The half widths are $1 \pm \Gamma$ respectively along the real and imaginary axis. All eigenvalues are real/imaginary for $\Gamma = \pm 1$. In Fig. 10.4 a comparison between numerical spectra and (10.15) is presented. The agreement would be perfect in the limit of infinitely large matrices.
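The elliptic law is easy to probe numerically. The sketch below generates a matrix with the cross-correlation (10.14) by mixing symmetric and antisymmetric Gaussian matrices (one possible construction, chosen here for simplicity) and checks the extent of the spectrum against (10.15).

```python
import numpy as np

def elliptic_matrix(N=400, gamma=0.6, seed=0):
    """Random N x N matrix with <w_ij w_ji> correlation Gamma, sigma_w = 1/sqrt(N)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(N, N))
    s, t = np.sqrt((1 + gamma) / 2), np.sqrt((1 - gamma) / 2)
    # symmetric/antisymmetric mixing produces the desired cross-correlation
    return (s * (a + a.T) / np.sqrt(2) + t * (a - a.T) / np.sqrt(2)) / np.sqrt(N)

lam = np.linalg.eigvals(elliptic_matrix())
print(lam.real.max(), lam.imag.max())   # close to 1 + Gamma and 1 - Gamma, Eq. (10.15)
```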
Finite Mean For .Γ = 0 the ellipse defined by (10.15) reduces to a circle and one
recovers the “circular law of random matrix theory”, namely that the eigenvalues are uniformly distributed within a circle. When the mean $\mu_w = \sum_{ij} w_{ij}/N^2$ is finite, an additional isolated eigenvalue $\lambda_\mu$ is generated. For $\Gamma = 0$ one can derive that
Spectral Radius When the eigenvalue spectrum is bounded, there exists a .Rw > 0
such that
$|\lambda_i| \le R_w\,, \quad \forall i\,. \qquad (10.17)$
The “spectral radius” .Rw determines the long-term evolution when applying the
mapping in question repeatedly. For elliptic random matrices with zero mean .Rw =
σ̃w (1 + |Γ |).
The brain is autonomously active, which implies that recurrent connections are
functionally important. Does the level of internally produced activities impact the
capability of the brain to process information? We have seen in Chap. 4, that
critical dynamical systems exhibit non-trivial properties. Indeed, it can be shown
The first relation is the standard sum rule for the variance,7 the second relation is
treated in exercise (10.2).
$\sigma_w = \frac{\tilde{\sigma}_w}{\sqrt{N}}\,, \qquad \sigma_x\,,\; \sigma_y\,,\; \sigma_{\rm ext}\,, \qquad (10.19)$
$\sigma_x^2 = \sigma_{\rm ext}^2 + N \sigma_w^2\, \sigma_y^2 = \sigma_{\rm ext}^2 + \tilde{\sigma}_w^2\, \sigma_y^2\,, \qquad (10.20)$
where we used (10.18) and (10.19) and that units receive N recurrent inputs. The
variance of the neural activity is determined by
$\sigma_y^2 + \mu_y^2 = \int dx\, \tanh^2(x - b)\, P(x)\,, \qquad (10.21)$
where $P(x)$ is the distribution of the membrane potential. We did set $a = 1$ and made use of $\langle (y - \mu_y)^2\rangle = \langle y^2\rangle - \mu_y^2$. For balanced activities the threshold vanishes, $b = 0$, together with $\mu_y = 0 = \mu_x$.
Central Limit Theorem The central limit theorem states that sums of large numbers of random variables converge to a normal distribution. We can hence assume that the probability distribution $P(x)$ of the membrane potential entering (10.21) is normally distributed with mean $\mu_x$ and variance $\sigma_x^2$,
$P(x) = \frac{1}{\sqrt{2\pi \sigma_x^2}}\, e^{-(x - \mu_x)^2/(2\sigma_x^2)}\,, \qquad 1 = \int dx\, P(x)\,. \qquad (10.22)$
A last step is needed before the integral in (10.21) can be evaluated analytically. For this purpose the tanh is replaced by a transfer function with Gaussian flanks,
$\sigma_{\rm Gauss}(z) = \pm\sqrt{\sigma_{\rm Gauss}^2}\,, \qquad \sigma_{\rm Gauss}^2(z) = 1 - e^{-z^2}\,. \qquad (10.23)$
Variance of the Neural Activity We use the Gaussian transfer function for a
balanced system, with b, .μx , .μy and .μw vanishing, together with (10.22) in (10.21).
The resulting expression,
$\sigma_y^2 = \frac{1}{\sqrt{2\pi \sigma_x^2}} \int dx \left(1 - e^{-x^2}\right) e^{-x^2/(2\sigma_x^2)}\,, \qquad (10.24)$
leads via
$1 - \sigma_y^2 = \frac{1}{\sqrt{2\pi \sigma_x^2}} \int dx\, e^{-x^2/(2\sigma_{\rm eff}^2)} = \frac{\sigma_{\rm eff}}{\sigma_x}\,, \qquad \frac{1}{\sigma_{\rm eff}^2} = \frac{1 + 2\sigma_x^2}{\sigma_x^2}$
to
$1 - \sigma_y^2 = \frac{\sigma_{\rm eff}}{\sigma_x} = \frac{1}{\sqrt{1 + 2\sigma_x^2}}\,, \qquad 2\sigma_x^2 = \frac{1}{(1 - \sigma_y^2)^2} - 1\,. \qquad (10.25)$
Using (10.20), with $\tilde{\sigma}_w \to R_w$, this results in
$2\left(\sigma_{\rm ext}^2 + R_w^2\, \sigma_y^2\right) = 2\sigma_x^2 = \frac{1}{(1 - \sigma_y^2)^2} - 1\,,$
or
$2 R_w^2\, \sigma_y^2 \left(1 - \sigma_y^2\right)^2 = 1 - \left(1 + 2\sigma_{\rm ext}^2\right)\left(1 - \sigma_y^2\right)^2\,, \qquad (10.26)$
which is quite remarkable. It states that the neural activity, in terms of the variance $\sigma_y^2$, is determined exclusively by the spectral radius $R_w$ of the weight matrix, together with a possibly present external input, $\sigma_{\rm ext}^2$. Without external driving, $\sigma_{\rm ext} = 0$, a non-trivial solution $\sigma_y^2 > 0$ emerges only for $R_w > 1$, as one finds when expanding both sides of (10.26) in powers of $\sigma_y^2$. Consequently, the system undergoes an “absorbing phase transition”8 at $R_w = 1$.
The state with vanishing neural activity is the final state of all orbits when $R_w < 1$, whatever the starting configuration. It is hence termed “absorbing”. The active state with $\sigma_y^2 > 0$ takes over for $R_w > 1$.
Criticality vs. External Driving In Fig. 10.7 numerical solutions of (10.26) are presented. The phenomenology of a second-order phase transition is reproduced,9 with the input variance $\sigma_{\rm ext}^2$ acting as an external field. Interestingly, Fig. 10.7 shows that the neural activity is substantial at criticality, $R_w = 1$, even when external stimuli are comparatively weak. This result raises some doubts with regard to the critical brain hypothesis. Information processing will still benefit quantitatively from being close to criticality even when the variance $\sigma_{\rm ext}^2$ of the data input stream is not negligibly small. On a qualitative level, the working regime of the network does however not change as a function of the spectral radius in the presence of an external driving.
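The self-consistency condition (10.26) is conveniently solved by fixed-point iteration, which amounts to iterating the layer recursion (10.27) with $\sigma_{\rm ext}$ held fixed. A minimal sketch, reproducing the absorbing transition at $R_w = 1$:

```python
import numpy as np

def sigma_y2(R_w, sigma_ext2=0.0, iters=2000):
    """Fixed-point iteration of the self-consistency condition (10.26)/(10.27)."""
    s = 0.5                                             # starting guess for sigma_y^2
    for _ in range(iters):
        sigma_x2 = sigma_ext2 + R_w**2 * s              # Eq. (10.20), sigma_w~ -> R_w
        s = 1.0 - 1.0 / np.sqrt(1.0 + 2.0 * sigma_x2)   # Eq. (10.25)
    return s

for R_w in (0.5, 1.0, 1.5, 2.0):
    print(R_w, sigma_y2(R_w), sigma_y2(R_w, sigma_ext2=0.01))
```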
$\sigma_{y,n}^2 = 1 - \frac{1}{\sqrt{1 + 2\sigma_{x,n}^2}} = 1 - \frac{1}{\sqrt{1 + 2\left(\sigma_{{\rm ext},n}^2 + \tilde{\sigma}_w^2\, \sigma_{y,n-1}^2\right)}}\,, \qquad (10.27)$
where $\sigma_{{\rm ext},n}^2$ is an additional input to layer n, if existing. The above expression is valid when all averages are zero, which would not be the case in a practical application. The generalization to non-zero $\mu_y$, $\mu_x$, $\mu_w$ is straightforward, but a bit cumbersome, see exercise (10.5).
9 See Sect. 6.1 of Chap. 6 for an introduction to the Landau theory of phase transitions. The connection is made in exercise (10.3).
The output of a network is a function of its input $\mathbf{x}$ and of its adjustable parameters $\boldsymbol{\vartheta}$,
$\mathbf{f} = \mathbf{f}(\mathbf{x}, \boldsymbol{\vartheta})\,.$
Training Data Training data consists of a set of .Nα pairs of inputs .xα and target
outputs .Fα . The overall goal is to minimize a loss function L, e.g.
$L = \sum_\alpha \left(\mathbf{F}_\alpha - \mathbf{f}_\alpha\right)^2\,, \qquad \mathbf{f}_\alpha = \mathbf{f}(\mathbf{x}_\alpha, \boldsymbol{\vartheta})\,, \qquad (10.28)$
which can be done by following the gradient, .ϑ̇ ∼ −∇ϑ L. For layered systems, as
the one illustrated in Fig. 10.3, the gradient is evaluated recursively.
$\mathbf{y}_l = \mathbf{f}(\mathbf{x}_l, \boldsymbol{\vartheta}_l)\,, \qquad \mathbf{x}_l = \mathbf{y}_{l-1}\,, \qquad (10.29)$
where .ϑ l denotes the adaptable parameters of the lth layer, like thresholds and
synaptic weights. In order to simplify the notation, we set .NL = 3. For a given
training pair .(xα , Fα ), the gradient of the loss function with respect to the parameters
.ϑ 3 of the last layer is
$\nabla_{\vartheta_3} L = \nabla_{\vartheta_3} \mathbf{f}(\mathbf{x}_3, \boldsymbol{\vartheta}_3)\cdot \mathbf{E}_3\,, \qquad \mathbf{E}_3 = (-2)\left(\mathbf{F}_\alpha - \mathbf{f}_\alpha\right)\,.$
We used here the compressed notation $\nabla \mathbf{A}\cdot\mathbf{B} \equiv \sum_i (\nabla A_i)\, B_i$. With the chain rule, we have
$\nabla_{\vartheta_2} L = \nabla_{\vartheta_2} \mathbf{f}(\mathbf{x}_2, \boldsymbol{\vartheta}_2)\cdot \mathbf{E}_2\,, \qquad \mathbf{E}_2 = \nabla_{x_3} \mathbf{f}(\mathbf{x}_3, \boldsymbol{\vartheta}_3)\, \mathbf{E}_3\,, \qquad (10.30)$
for the gradient of the loss function with respect to $\boldsymbol{\vartheta}_2$. The error $\mathbf{E}_2$ is a linear function of $\mathbf{E}_3$, the error signal of the next layer. Errors are hence propagated from top to bottom, which can be continued recursively. This is denoted “backpropagation”.
Layers process their inputs typically via (10.1), which involves a non-linear transfer function $\sigma(\cdot)$. The gradient of $\sigma(\cdot)$ vanishes for large inputs when the transfer function is bounded, as is the case, e.g., for the sigmoidal. The error backpropagated via (10.30) hence decreases in magnitude layer by layer, a phenomenon denoted the “vanishing gradient problem”.
Skip Connections A popular workaround for the vanishing gradient problem are
ReLU units, see (10.6), for which the gradient is constant for positive arguments.
An alternative are “skip connections”,
$\mathbf{y}_l = \mathbf{x}_l + \mathbf{f}(\mathbf{x}_l, \boldsymbol{\vartheta}_l)\,, \qquad \mathbf{x}_l = \mathbf{y}_{l-1}\,, \qquad (10.31)$
which corresponds to adding the identity to the forward pass, see Fig. 10.8. Learning by adapting the parameters $\boldsymbol{\vartheta}_l$ now has the task to generate the correct ‘residual’ $\mathbf{y}_l - \mathbf{x}_l$, hence the name ResNet, “residual net”.
$\mathbf{x}_{l+1} - \mathbf{x}_l = \mathbf{f}(\mathbf{x}_l, \boldsymbol{\vartheta}_l)\,, \qquad \frac{d\mathbf{x}}{dt} \approx \mathbf{f}(\mathbf{x}, \boldsymbol{\vartheta})\,, \qquad (10.32)$
where the layer index l was substituted by a pro forma time t, using $\mathbf{x}_l \to \mathbf{x}(t)$ and $\boldsymbol{\vartheta}_l \to \boldsymbol{\vartheta}(t)$. Layer time $t \in [0, T_L]$ is continuous, whereas the layer index $l = 1, 2, \ldots$ was discrete. The “neural differential equation” (10.32) corresponds to the continuum limit of the residual update (10.31).
Backpropagated errors tend to increase, given that .∇x f is normally positive. In the
continuous formulation, the error function .E = E(t) is obtained by integrating back
in layer time. When adapting parameters via gradient minimization, .Δϑ = −η∇ϑ L,
one can use (10.30) to obtain
which leads in turn to two interdependent differential equations with respect to layer
time,
$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \boldsymbol{\vartheta})\,, \qquad \dot{\mathbf{E}} = -\nabla_x \mathbf{f}(\mathbf{x}, \boldsymbol{\vartheta})\cdot\mathbf{E}\,. \qquad (10.35)$
In the adiabatic approximation, when the learning rate .η is small, one takes .ϑ as
constant when integrating .x up from .x(0) = xα . Then the error function .E is
evaluated downwards from .E(TL ) = (−2)(Fα − x(TL )). In the end, the parameters
are updated.
It may seem a bit unusual, at first sight, to work with continuously stacked layers,
which is however a viable option. Which one to use, continuous or discrete layers,
is in the end a question of performance.
$P_0(\mathbf{x}) = \prod_{i=1}^{N} \frac{e^{-(x_i - \mu_i)^2/(2\sigma_i^2)}}{\sqrt{2\pi \sigma_i^2}} \qquad (10.36)$
is a special case of
$P(\mathbf{x}) = \frac{1}{\sqrt{2\pi}^{\,N}\sqrt{\det(\Sigma)}}\; e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} \qquad (10.37)$
$\Sigma\, \mathbf{e}_i = \sigma_i^2\, \mathbf{e}_i \qquad\Leftrightarrow\qquad \frac{1}{\sigma_i^2}\, \mathbf{e}_i = \Sigma^{-1} \mathbf{e}_i\,,$
with the precondition that none of the eigenvalues vanish, viz that $\Sigma$ is non-singular. Equivalently, the relation
$U^T \Sigma^{-1} U = \mathrm{diag}\!\left(\frac{1}{\sigma_1^2}, \ldots, \frac{1}{\sigma_N^2}\right) \qquad (10.39)$
holds. For a quick check one multiplies both sides of (10.38) and (10.39).
$\mathbf{x} = U\, \mathbf{y}\,, \qquad \det(U) = 1\,, \qquad (10.40)$
where we eliminated $\boldsymbol{\mu}$ via an appropriate shift of the origin. It also follows that (10.36) and (10.37) are identical, modulo an orthogonal variable transformation.
Covariance Matrix We have to show that .Σ honors its name, namely that the
general definition of the covariance matrix,
. (xi − μi )(xj − μj ) = dx (xi − μi )(xj − μj ) P (x) , (10.41)
. (xi − μi )(xj − μj ) 0 = Σ0 ,
where the .Σij are matrices, not matrix elements. The probability of observing .x1 ,
the prediction we want to make after having observed .x2 , is denoted as
$P(\mathbf{x}_1|\mathbf{x}_2)$. The defining relation for conditional probabilities,
$P(\mathbf{x}_1|\mathbf{x}_2)\, P(\mathbf{x}_2) = P(\mathbf{x}_1, \mathbf{x}_2)\,, \qquad P(\mathbf{x}_1|\mathbf{x}_2) = \frac{P(\mathbf{x}_1, \mathbf{x}_2)}{P(\mathbf{x}_2)}\,, \qquad (10.43)$
holds. It is also denoted “Bayes’ theorem”, in particular together with the recursive substitution $P(\mathbf{x}_1, \mathbf{x}_2) = P(\mathbf{x}_2|\mathbf{x}_1)\, P(\mathbf{x}_1)$. Given the relations
$1 = \int d\mathbf{x}_1\, P(\mathbf{x}_1|\mathbf{x}_2)\,, \qquad P(\mathbf{x}_1) = \int d\mathbf{x}_2\, P(\mathbf{x}_1, \mathbf{x}_2)\,,$
one has that .P (x1 |x2 ) ≥ 0 is a properly normalized probability distribution, for
any .x2 , denoted “posterior”. Idem for the “marginal”, .P (x1 ). Posterior distributions
are the object of desire when the information gained from an observation is to be
expressed quantitatively, a process denoted “inference”.10
Multivariate Gaussians are prime candidates for variational inference. One can
show that the posterior .P (x1 |x2 ) is also a multivariate Gaussian, with mean vector
and covariance matrix, .μ1|2 and .Σ1|2 , given by
$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2)\,, \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\,. \qquad (10.44)$
$f \sim \mathcal{G}p(m, k)\,, \qquad k = k(x, x')\,, \qquad (10.45)$
defines a Gaussian process for f. The covariance matrix $k(x, x')$ has two tasks.
– $k(x, x)$ is the variance of the probability distribution of $f(x)$, for a given argument x.
– $k(x, x')$ makes sure that strong discontinuities are rare, when drawing function values for $x \approx x'$.
Fig. 10.9 The Gaussian process defined by (10.48), with a diagonal variance .k0 = 0.04. Shown is
the mean function .m(x) = x 2 (shaded line) and two realizations of the corresponding multivariate
Gaussian distribution, respectively for a correlation length .ξx = 0.5/0.05 (blue/red). The function
estimate is smoother for longer-ranged correlations between the .N = 400 arguments
Estimates are independent for each argument x when the covariance matrix is
diagonal.
$\mu_i = m(x_i)\,, \qquad \Sigma_{i,j} = k(x_i, x_j)\,, \qquad (10.46)$
is recovered, where $i, j \in [1, N]$. For every argument $x_i$, the function values $f_i$ are distributed according to (10.37),
$P(\mathbf{f}) = \frac{1}{\sqrt{2\pi}^{\,N}\sqrt{\det(\Sigma)}}\; e^{-\frac{1}{2}(\mathbf{f} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{f} - \boldsymbol{\mu})}\,. \qquad (10.47)$
Inference The expressions (10.44) for the mean and the covariance matrix of
the posterior hold in analogy for Gaussian processes. We enlarge the number of
arguments and function values as
x = (x1 , x2 ),
. f = (f1 , f2 ) ,
where .x1 is the N -dimensional vector of arguments covering the entire support. The
data vector .x2 , of length .Nd < N , includes all arguments for which training data is
available, with corresponding function values .f2 . Inference consists of predicting .f1 .
Fig. 10.10 Predictions (red line) inferred via (10.44) from a randomly drawn set of data points (blue dots). The covariance matrix is $k(x, x') = \exp(-(x - x')^2/0.2)$. The function values for the training data (blue bullets) follow $f_2(x) = x^2$. The mean function $m(x) = 0$ vanishes, in contrast to Fig. 10.9. Also included are $2\sigma$ confidence estimators (checkerboard red lines)
$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{f}_2 - \boldsymbol{\mu}_2) \;\to\; \mathbf{f}_2\,,$
given that $\boldsymbol{\mu}_1 = m(\mathbf{x}_1)$ coincides with $\boldsymbol{\mu}_2 = m(\mathbf{x}_2)$ when $\mathbf{x}_1 \to \mathbf{x}_2$. The inferred mean function value, $\boldsymbol{\mu}_{1|2}$, hence coincides with the training values $\mathbf{f}_2$. This will not be the case when the data is noisy, which is described by adding $\sigma_D^2 \delta_{ij}$ to the covariance matrix $\Sigma_{ij}$ in (10.46).
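Gaussian process inference via (10.44) takes only a few lines. The sketch below uses the setup of Fig. 10.10: vanishing mean function, the kernel $k(x,x') = \exp(-(x-x')^2/0.2)$ and training values $f_2 = x^2$; a small diagonal term is added for numerical stability, playing the role of $\sigma_D^2$.

```python
import numpy as np

def gp_posterior(x1, x2, f2, length2=0.2, sigma_d=1e-6):
    """Posterior mean and standard deviation via (10.44), zero mean function."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / length2)
    K11, K12 = k(x1, x1), k(x1, x2)
    K22 = k(x2, x2) + sigma_d**2 * np.eye(len(x2))      # noise / jitter term
    mu = K12 @ np.linalg.solve(K22, f2)                 # posterior mean
    cov = K11 - K12 @ np.linalg.solve(K22, K12.T)       # posterior covariance
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

x_train = np.random.default_rng(1).uniform(-1, 1, 8)
mu, sig = gp_posterior(np.linspace(-1, 1, 200), x_train, x_train**2)
```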
Infinite Layer Limit A basic setup is that of a single layer neural network,
$f = \sum_i v_i\, y_i\,, \qquad y_i = \sigma(x_i - b_i)\,, \qquad x_i = \sum_j w_{ij}\, x_j^\alpha\,, \qquad (10.51)$
where f is a classifying unit, the function to be computed, and .y = (y1 , y2 , ..) the
activity of the hidden layer. The training samples .xα = (x1α , x2α , ..) are specific to
the problem at hand. All internal parameters are taken to be random but frozen, viz
corresponding to a specific realization.
The components of the training vectors, like .xiα and .xjα , are not necessarily
uncorrelated, which implies that the membrane potentials .xi are independent
random variables, but possibly not normally distributed. The same holds for the $y_i$. As
a sum of independent distributions, f becomes however Gaussian for large hidden
layers. This line of arguments, denoted the “infinite layer limit”, can be extended to
deep architectures.
$f \sim \mathcal{G}p(m, k)\,, \qquad m = m(x)\,, \quad k = k(x, x')\,, \qquad (10.52)$
compare (10.45). Mean function and covariance matrix are averages over random sets $\boldsymbol{\vartheta}$ of parameters,
$m(x) = \bigl\langle f(x; \vartheta)\bigr\rangle_\vartheta\,, \qquad k(x, x') = \bigl\langle\bigl(f(x; \vartheta) - m(x)\bigr)\bigl(f(x'; \vartheta) - m(x')\bigr)\bigr\rangle_\vartheta\,. \qquad (10.53)$
The output of the network is Gaussian because the parameters .ϑ act as random
variables, hence the average over .ϑ. The functional dependencies of the covariance
matrix were central for the application of Gaussian processes discussed before. This
is however not a relevant point for the “neural network gaussian process” defined by
(10.53), which serves mainly as a conceptual background.
The network output
$f^\alpha = f(\mathbf{x}^\alpha; \boldsymbol{\vartheta})$
is compared to the target $F^\alpha$ associated with the specific training sample $\mathbf{x}^\alpha$.
The set of network parameters $\boldsymbol{\vartheta}$ is adapted by minimizing the loss function $L = \sum_\alpha (F^\alpha - f^\alpha)^2$ via gradient descent,
$\frac{d}{dt}\boldsymbol{\vartheta} = \eta \sum_\alpha \left(\nabla_\vartheta f^\alpha\right)\left(F^\alpha - f^\alpha\right)\,, \qquad (10.54)$
with $\eta$ being the learning rate. The right-hand side includes the contribution of all $N_\alpha$ training samples. Training is “supervised” because the target $F^\alpha$ is explicitly provided. The time evolution of the output $f = f(x; \vartheta)$ is then
$\frac{d}{dt} f = \nabla_\vartheta f \cdot \frac{d}{dt}\boldsymbol{\vartheta} \;\equiv\; \eta \sum_\alpha \Theta(x, x^\alpha)\left(F^\alpha - f^\alpha\right) \qquad (10.55)$
when using the gradient update rule (10.54). So far we did not make use of the
infinite layer limit. Equations (10.54) and (10.55) are valid for any least-square
minimization via gradient descent.
Tangent Kernels As a function of its arguments, .xα and .xβ , the “neural tangent
kernel” .Θ(xα , xβ ) is a symmetric matrix. With the matrix elements being given
by a scalar product, the neural tangent kernel is also positive definite.11 For large
networks the tangent kernel is independent of the specific realization of $\boldsymbol{\vartheta}$, remaining in addition constant during training.
One says that .Θ(xα , xβ ) is “deterministic”. We do not delve into the proof of these
two propositions, which involve inductive argumentation tailored to the architecture
at hand. Systems become increasingly over-parametrized when network sizes
diverge, which leads to the emergence of an exponentially large number of optimal
parameter configurations. The probability for the starting parameter configuration to
be close to an optimum will therefore increase steadily with layer width. Individual
parameters need to change just a little when their number diverges.
Training Space Given that the tangent kernels are constant during training, one
can solve the time evolution of .f = f (x; ϑ) explicitly via (10.55). First one needs
to determine the time-dependency of the .f α = f (xα ; ϑ), which is done in the space
of training data,
$\mathbf{F} = (F^1, \ldots, F^{N_\alpha})\,, \qquad \mathbf{f}_2 = (f^1, \ldots, f^{N_\alpha})\,,$
where we used the subscript ‘2’ to indicate training inputs, in equivalence to (10.49).
Specifying (10.55) to the training space yields
$\frac{d}{dt} f^\beta = \eta \sum_\alpha \Theta(\mathbf{x}^\beta, \mathbf{x}^\alpha)\left(F^\alpha - f^\alpha\right)\,, \qquad \frac{d}{dt}\mathbf{f}_2 = \eta\, \Theta_{22}\left(\mathbf{F} - \mathbf{f}_2\right)\,, \qquad (10.56)$
which constitutes a closed set of equations for .Nα dynamical variables, .f2 . The
projection of the tangent kernel to the training space, .Θ22 , is a symmetric .Nα × Nα
matrix. The solution,
$\mathbf{f}_2(t) = e^{-\eta \Theta_{22} t}\, \mathbf{f}_2(0) + \left(1 - e^{-\eta \Theta_{22} t}\right)\mathbf{F}\,, \qquad (10.57)$
holds when .Θ22 is constant. One verifies (10.57) by direct differentiation. The
constant of integration has been selected such that .f2 (t = 0) = f2 (0). Perfect
learning, .f2 (t) → F, is recovered in the limit .t → ∞.
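The closed-form dynamics (10.57) is easily checked numerically with a surrogate kernel; the kernel below is a randomly generated positive definite matrix (plus the identity, added here only to keep it well conditioned), not a kernel computed from an actual network.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
N_a, eta = 5, 0.1
A = rng.normal(size=(N_a, N_a))
Theta22 = A @ A.T + np.eye(N_a)       # symmetric, positive definite surrogate kernel
F = rng.normal(size=N_a)              # training targets
f2_0 = rng.normal(size=N_a)           # network outputs before training

def f2(t):
    """Closed-form training dynamics (10.57) on the training space."""
    E = expm(-eta * Theta22 * t)
    return E @ f2_0 + (np.eye(N_a) - E) @ F

print(np.allclose(f2(0.0), f2_0), np.allclose(f2(500.0), F, atol=1e-4))
```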
Tangent Kernel Inference We define with .f1 the vector of network outputs to be
predicted and use (10.57) to rewrite (10.55) as
$\frac{d}{dt}\mathbf{f}_1 = \eta\, \Theta_{12}\left(\mathbf{F} - \mathbf{f}_2\right) = \eta\, \Theta_{12}\, e^{-\eta \Theta_{22} t}\left(\mathbf{F} - \mathbf{f}_2(0)\right)\,,$
where the first and second arguments of .Θ12 are respectively generic inputs and
training data. Starting from .f1 (0) and .f2 (0), direct integration yields
$\mathbf{f}_1(t) = \mathbf{f}_1(0) + \Theta_{12}\, \Theta_{22}^{-1}\left(1 - e^{-\eta \Theta_{22} t}\right)\left(\mathbf{F} - \mathbf{f}_2(0)\right)\,, \qquad (10.58)$
– The output of the network is fully specified by (10.58), including the entire
training procedure.
– When training is complete, viz for .t → ∞, the predictions obtained using
the infinite layer limit coincide formally with the corresponding expression for
Bayesian inference, Eq. (10.49).
As for Bayesian inference, one has that .f1 → F when predictions are made for
training data. This is seen easily for the case that .f1 spans the entire training space,
which leads to .Θ12 → Θ22 and .f1 (0) → f2 (0).
Note that the formal equivalence of (10.58) and (10.49) does not imply that the tangent kernel $\Theta(x, x')$ and the covariance matrix $k(x, x')$ are identical. They are not. The covariance matrix is defined via (10.41) independently in terms of two-point correlations of the output $f = f(x, \vartheta)$, in contrast to the definition of the neural tangent kernel, as given by (10.55). It is however possible, though rather cumbersome, to express $k(x, x')$ in terms of $\Theta(x, x')$.
The tangent kernel predictions (10.58) depend on .f1 (0) and .f2 (0), namely on
the outputs generated before training did start, which are determined in turn by
the starting parameter configuration. Given that .ϑ is drawn randomly, confidence
intervals are obtained by sampling (10.58) with respect to .ϑ, viz by sampling .f1 (0)
and .f2 (0) with respect to the initial set of parameters.
The Linear Paradox Training is “lazy” when parameters hardly change, as in the
infinite layer limit. The first order Taylor expansion,
$f(x, \boldsymbol{\vartheta}) \approx f(x, \boldsymbol{\vartheta}_0) + \nabla_\vartheta f(x, \boldsymbol{\vartheta}_0)\cdot\Delta\boldsymbol{\vartheta}\,, \qquad \Delta\boldsymbol{\vartheta} = \boldsymbol{\vartheta} - \boldsymbol{\vartheta}_0\,, \qquad (10.59)$
becomes then exact. The “linear model” (10.59) is a bit paradoxical. It is still non-
linear in .x, but linear in parameter space. This is possible as one works in a regime
of a diverging number of parameters, one is hence dealing with overfitting.
Classical deep learning is about information processing. When data sets become
complex, as for language processing, it may become important to focus processing
resources to subsets of the input stream, the very task of “attention”. As such,
attention has long been studied both in the neurosciences and in psychology. The
first self-contained computational model was however developed from a machine
learning perspective.
The approach taken by machine learning is radically different. Here, all parts
of the system are jointly trained. This includes attention, when incorporated, as in
transformer architectures. The view is that attention can develop in a meaningful
manner only together with representation extractions and world modeling.
10.5.1 Transformer
Pairwise Attention Attention involves pairs of tokens where one token needs to
decide if another token carries relevant information. This decision cannot be carried
out directly on the level of the respective token activities, say .xi and .xj , with the
reason being that activity encodings are constrained by the nature of the overall task.
For a working attention mechanism, tokens therefore need to generate associated
objects. Given that attention is inherently asymmetric, at least two additional objects
are necessary, “query” and “key”. In the end, a given token will receive relevant
information not just from a single other token, but from a larger number. A third
object is hence needed for the alignment of the respective representations, “value”.
Every token i is endowed with three matrices,
$\hat{Q}_i\,, \qquad \hat{K}_i\,, \qquad \hat{V}_i\,,$
denoted query, key and value. The entries of these matrices are adapted via
backpropagation during training. The three associated query, key and value vectors,
$\mathbf{Q}_i = \hat{Q}_i \cdot \mathbf{x}_i\,, \qquad \mathbf{K}_i = \hat{K}_i \cdot \mathbf{x}_i\,, \qquad \mathbf{V}_i = \hat{V}_i \cdot \mathbf{x}_i\,, \qquad (10.60)$
are activity dependent. In order to decide whether token j carries information that
is relevant to i, the respective queries and keys are compared, viz .Qi with .Kj .
$e_{ij} = \mathbf{Q}_i \cdot \mathbf{K}_j\,,$
$\alpha_{ij} = \frac{1}{Z_i}\, e^{-\beta e_{ij}}\,, \qquad Z_i = \sum_{j \le i} e^{-\beta e_{ij}}\,, \qquad \sum_{j \le i} \alpha_{ij} = 1\,, \qquad (10.61)$
for the new token activity, modulo a normalization factor, where .⊗ denotes the outer
product.
shows that the inference effort scales linearly with context length .NT . This
formulation is also denoted “retentive network”.
Exercises
where σtanh has been used for the transfer function. Show that a single unit
can encode the AND and the XOR gate.
(10.2) VARIANCE OF A PRODUCT
Derive (10.18) for the product of two statistically independent random
variables. Use x = Δx + μx , where Δx = x − μx , equivalently for both
variables.
(10.3) WIGNER SEMI-CIRCULAR LAW
Show that (10.15) reduces to the Wigner semi-circular law (1.15) for the
density of state of the adjacency matrix of an Erdös–Rényi graph with
coordination number z.
(10.4) LANDAU THEORY FOR ABSORBING PHASE TRANSITIONS
The variance σy2 of the neuronal activity is small close to the critical point
Rw = 1. An effective theory can be obtained by expanding the self-
consistency condition (10.26) with respect to powers of σy2 . Show that the
corresponding expression (6.2) for the Landau theory of phase transitions is recovered when $\sigma_{\rm ext}^2$ is measured in units of $\sigma_y$.
Further Reading
References
Akjouj, I., et al. (2022). Complex systems in ecology: A guided tour with large Lotka-Volterra
models and random matrices. arXiv:2212.06136.
Biehl, M. (2023). The shallow and the deep: A biased introduction to neural networks and old
school machine learning. University of Groningen Press.
Chen, R. T., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential
equations. Advances in Neural Information Processing Systems, 31.
Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). Language modeling with gated
convolutional networks. PMLR, 70, 933–941.
Gros, C. (2021). A Devil’s advocate view on ‘Self-Organized’ brain criticality. Journal of Physics:
Complexity, 2, 2021.
Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: Convergence and generalization
in neural networks. Advances in Neural Information Processing Systems, 31, 2018.
Lindsay, G. W. (2020). Attention in psychology, neuroscience, and machine learning. Frontiers in
Computational Neuroscience, 14, 29.
Sommers, H. J., Crisanti, A., Sompolinsky, H., & Stein, Y. (1988). Spectrum of large random
asymmetric matrices. Physical Review Letters, 60, 1895.
Schubert, F., & Gros, C. (2021). Local homeostatic regulation of the spectral radius of echo-state
networks. Frontiers in Computational Neuroscience, 15, 587721.
Sun, Y. et al. (2023). Retentive network: A successor to transformer for large language models.
arXiv:2307.08621.
Vaswani, A. et al. (2017). Attention is all you need. In Advances in Neural Information Processing
Systems (vol. 30).
Williams, C. K., & Rasmussen, C. E. (2006). Gaussian processes for machine learning. MIT Press.
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A review of recurrent neural networks: LSTM cells and
network architectures. Neural Computation, 31, 1235–1270.
Solutions
11
The nature of the exercises given at the end of the respective chapters varies. Some
exercises are concerned with the extension of formulas and material, others deal
with the investigation of models through computer simulations. Follow the index to
reach the solution to any exercise.
$P(X_k = R) = e^{-\lambda_k}\, \frac{(\lambda_k)^R}{R!}\,, \qquad \lambda_k = N p_k = \langle X_k \rangle\,, \qquad (11.2)$
viz the binomial distribution. Equation (11.1) reduces for $p_k \ll 1$ to the Poisson distribution, namely to (11.2), in the thermodynamic limit $N \to \infty$.
Fig. 11.1 A network of cliques (left) and the associated network of nodes (right). Nodes
correspond to managers sitting on the board of several companies, the cliques
$k_1 = N - 1\,, \qquad k_i = 1\,, \quad i = 2, \ldots, N\,,$
2
1 (N − 2)(N − 1) N
. ≈ ∼
N23 N 23
C C0 C1
$\langle x \rangle = \begin{cases} \infty\,, & \alpha < 2\,, \; X \to \infty \\ 1/(\alpha - 2)\,, & \alpha > 2\,, \; X \to \infty \end{cases} \qquad (11.4)$
where the normalization $1 = \int p(x)\, dx$ for $\alpha > 1$ was used. With $x^2 = (x+1)^2 - 2x - 1$ we find
$\langle x^2\rangle + 2\langle x\rangle + 1 = (\alpha - 1)\int_0^X \frac{(x+1)^2}{(1+x)^\alpha}\, dx = \frac{\alpha - 1}{3 - \alpha}\,(1+x)^{3-\alpha}\Big|_0^X$
$\langle x^2\rangle = \frac{\alpha - 1}{\alpha - 3} - \frac{2}{\alpha - 2} - 1 = \frac{2}{\alpha - 3} - \frac{2}{\alpha - 2} = \frac{2}{(\alpha - 3)(\alpha - 2)}\,. \qquad (11.5)$
Denoting by $p^G(k)$ and $p^H(k)$ the respective degree distributions one can evaluate the degree distribution of K as
$p^K(k) = \sum_{l, l'} \delta_{k,\, l \cdot l'}\; p^G(l)\, p^H(l') = \int_0^\infty dl\, dl'\; \delta(k - l \cdot l')\, p^G(l)\, p^H(l') = \int_0^\infty \frac{dl}{l}\; p^G(l)\, p^H(k/l)\,,$
since a given site $x_{ij}$ in K is linked to $k_i \cdot k_j$ other sites, where $k_i$ and $k_j$ are the respective degrees of the constituent sites in G and H. In the above derivation we have used $\delta(ax) = \delta(x)/|a|$. The average degree and hence the coordination number is multiplicative,
$\langle k \rangle_K = \sum_k k\, p^K(k) = \sum_{l, l'} l \cdot l'\; p^G(l)\, p^H(l') = \langle k \rangle_G\, \langle k \rangle_H\,.$
$G_0(\omega) = \frac{1}{\omega}\,, \qquad G(\omega) = \frac{1}{\omega - \Sigma(\omega)}\,,$
$\Sigma(\omega) = z\, G(\omega)\,, \qquad G(\omega) = \frac{1}{\omega - z\, G(\omega)}\,,$
which is just the starting point for the semi-circle law, Eq. (1.14).
(1.7) PROBABILITY GENERATING FUNCTIONS
We start by evaluating the variance $\sigma_K^2$ of the distribution $p_k$ generated by $G_0(x) = \sum_k p_k x^k$, considering
$G_0'(x) = \sum_k k\, p_k\, x^{k-1}\,, \qquad G_0''(x) = \sum_k k(k-1)\, p_k\, x^{k-2}$
together with
$G_0'(1) = \langle k \rangle\,, \qquad G_0''(1) = \langle k^2 \rangle - \langle k \rangle\,.$
Hence
$\frac{d}{dx} G_C(x) = G_N'\bigl(G_0(x)\bigr)\, G_0'(x)\,, \qquad \mu_C = \langle n \rangle\, \langle k \rangle\,,$
where we have denoted with $\mu_C$ the mean of the cumulative process. For the variance $\sigma_C^2$ we find
$\frac{d^2}{dx^2} G_C(x) = G_N''\bigl(G_0(x)\bigr)\left(G_0'(x)\right)^2 + G_N'\bigl(G_0(x)\bigr)\, G_0''(x)\,,$
which leads with (11.7) to
$C = \frac{3(z - 2)}{4(z - 1)}\,,$
$\sum_{k=1}^{z-1} k = \frac{1}{2}\, z(z - 1)$
$\frac{3z}{4}\left(\frac{z}{2} - 1\right)$
connected pairs of neighbors. Thus the result for C is
$C\; \frac{1}{2}\, z(z - 1) = 3\, \frac{z}{4}\left(\frac{z}{2} - 1\right)\,, \qquad C = \frac{3(z - 2)}{4(z - 1)}\,.$
$F_0(x) = \sum_{k=0}^{\infty} p_k\, b_k\, x^k = \sum_{k=0}^{k_c} p_k\, x^k\,, \qquad F_1(x) = \frac{F_0'(x)}{z}\,,$
$F_1(1) = 1\,, \qquad F_1'(1) = \frac{1}{z} \sum_{k=2}^{k_c} p_k\, k(k - 1)\,.$
$p_k = e^{-z}\, \frac{z^k}{k!}\,, \qquad e^z = \sum_{k'=0}^{\infty} \frac{z^{k'}}{k'!}\,. \qquad (11.8)$
The network will fall below the percolation threshold when the fraction of
removed vertices exceeds the critical fraction
kc
1 zkc −1 zkc
fc = 1 −
. pk = 1 − − e−z + , (11.9)
z (kc − 1)! kc !
k=0
where we have used (11.8). The result is plotted in Fig. 11.3. The critical
fraction fc approaches unity in the limit z → ∞.
(1.10) EPIDEMIC SPREADING IN SCALE-FREE NETWORKS
This task has the form of a literature study. One defines with ρk (t) the
probability that an individual having k social connections is ill, viz infected
(i.e. k is the degree). It changes via
The first term on the right-hand-side describes the recovering process, the
second term the infection via the infection rate λ. The infection process is
proportional to the number of social contacts k, to the probability [1 − ρk ]
of having been healthy previously, and to the probability
$\Theta(\lambda) = \frac{\sum_k k\, \rho_k(t)\, p_k}{\sum_j j\, p_j} \qquad (11.11)$
Numerically one can then solve (11.10) and (11.11) self-consistently for
the stationary state ρ̇k = 0 and a given scale-invariant degree distribution
pk , as explained in the reference given. The central question then regards the
existence of a threshold: Does the infection rate λ need to be above a finite
threshold for an outbreak to occur or does an infinitesimal small λ suffice?
The result is that e2 aligns more and more with e1 in the limit r → 1 (when
normalized), viz when the two eigenvalues become degenerate.
(2.2) CATASTROPHIC SUPERLINEAR GROWTH
No Damping For γ = 0 in (2.73), the polynomial growth equation
1−β
x 1−β x
ẋ = x β ,
. ẋx −β = 1, =t+ 0
1−β 1−β
where x0 = x(t = 0). The long-time behavior is regular for sublinear growth
β < 1. For superlinear growth, β > 1, one finds instead a singularity ts with
$\lim_{t\to t_s} x(t) = \infty$,
$x(t) = \left(\frac{1/(\beta - 1)}{t_s - t}\right)^{\frac{1}{\beta - 1}}\,, \qquad t_s = \frac{1}{\beta - 1}\, \frac{1}{x_0^{\beta - 1}}\,. \qquad (11.13)$
With damping, γ = 1, the growth equation becomes
$\dot{x} = x^\beta - x = x\left(x^{\beta - 1} - 1\right)\,. \qquad (11.14)$
– For superlinear growth, β > 1, one finds $x(t) \to 0$ for all starting $x_0 \in [0, 1]$, since then $x^{\beta - 1} < 1$, approaching however again a singularity akin to (11.13) for starting resources $x_0 > 1$.
– For sublinear growth, β < 1, one finds however $x(t) \to 1$ for all starting $x_0 > 0$, since the Lyapunov exponent of the fixpoint $x^* = 1$ is $\beta - 1$.
– Note, that superlinear growth processes have been described for some
cultural phenomena, like the growth of cities, with exponents β ≈ 1.2
being slightly above unity. See, e.g., Bettencourt et al. (2007).
$\dot{x} = -x^\beta\,, \qquad \dot{x}\, x^{-\beta} = -1\,, \qquad \frac{x^{1-\beta}}{1-\beta} = -t + \frac{x_0^{1-\beta}}{1-\beta}\,,$
Fig. 11.4 Sample trajectories of the system (11.16) for ε = 0 (left) and ε = 0.2 (right). Shown are the stable manifolds (thick green lines), the unstable manifolds (thick blue lines) and the heteroclinic orbit (thick red line)
has the two fixpoints x∗± = (x ∗ , 0), where x ∗ = ±1. The eigenvectors of the
Jacobian J (x ∗ ) are aligned with the x and the y axis respectively, just as for
the system (2.18), with (−1, 0) being now a saddle and (1, 0) a stable node.
The flow diagram is illustrated in Fig. 11.4; it is invariant when inverting $x \leftrightarrow (-x)$. The system contains the $y = 0$ axis as a mirror line for a vanishing $\varepsilon = 0$, and there is a heteroclinic orbit connecting the unstable manifold of $\mathbf{x}_-^*$ to one of the stable manifolds of $\mathbf{x}_+^*$.
A finite $\varepsilon$ removes the mirror line $y = 0$ present at $\varepsilon = 0$ and destroys the heteroclinic orbit. The unstable manifold of $\mathbf{x}_-^*$ is still attracted by $\mathbf{x}_+^*$, however now as a generic trajectory.
(2.4) THE SUBCRITICAL PITCHFORK TRANSITION
The fastest way to understand the behavior of the subcritical pitchfork
transition,
$\dot{x} = a x + x^3\,, \qquad U(x) = -\frac{a}{2}\, x^2 - \frac{x^4}{4}\,, \qquad (11.17)$
is to look at the respective bifurcation potential U(x), which is illustrated in Fig. 11.5. The fixpoints $\pm\sqrt{-a}$ exist for $a < 0$ and are unstable; the trivial fixpoint is stable (unstable) for $a < 0$ ($a > 0$).
(2.5) ENERGY CONSERVATION IN LIMIT CYCLES
The numerical integration of the energy change $\int_0^t \dot{E}\, dt'$ for the Takens–Bogdanov system (2.33) is presented in Fig. 11.6. Energy is continuously taken up and dissipated even along the limit cycle, but the overall energy
balance vanishes once one cycle has been completed. Simple integrator
methods like Euler’s method may not provide adequate numerical precision.
(2.6) MAXIMAL LYAPUNOV EXPONENT
We consider the definition
$\lambda^{(\max)} = \lim_{n \gg 1} \frac{1}{n} \log\left|\frac{d f^{(n)}(x)}{dx}\right|\,, \qquad f^{(n)}(x) = f\bigl(f^{(n-1)}(x)\bigr)\,.$
Using the chain rule one obtains
$\lambda^{(\max)} = \lim_{n \gg 1} \frac{1}{n} \sum_{j=0}^{n-1} \log\left|f'\bigl(f^{(j)}\bigr)\right|\,, \qquad f^{(0)}(x) = x\,,$
which is just the definition of the time average of the local Lyapunov
exponent (2.54).
(2.7) LYAPUNOV EXPONENTS ALONG PERIODIC ORBITS
A perturbation δϕ of a solution ϕ∗(t) of (2.35) evolves in linear order as
$\frac{d}{dt}\left(\varphi^* + \delta\varphi\right) = 1 - K\cos(\varphi^*) + K\sin(\varphi^*)\,\delta\varphi\,, \qquad \frac{d}{dt}\,\delta\varphi = \lambda_\varphi\, \delta\varphi\,,$
Fig. 11.6 Left: The flow for the Takens–Bogdanov system (2.33), for μ = 0.9 > μc. The thick black line is the limit cycle, compare Fig. 2.12. Right: The integrated energy balance (2.34) along a trajectory starting at (x, y) = (1, 0.2) (in light green in the left panel), using fourth-order Runge–Kutta integration
For K < Kc the trajectory is periodic and λϕ = K sin(ϕ∗) varies along the
unit circle. The mean exponent,
$\lambda_T = \int_0^{2\pi} \lambda_\varphi\, d\varphi = \int_0^{2\pi} K \sin(\varphi)\, d\varphi = 0\,,$
$\Delta\varphi(t + T) = \Delta\varphi(t)\,, \qquad \varphi^*(t + T) = \varphi^*(t)\,. \qquad (11.18)$
$U(x, y) = \frac{y^2}{2} + \frac{x^3}{3} - x - \left(1 - x^2\right) y\,.$
The system (11.19) retains its fixpoints (±1, 0) even in the presence of
g(x, y). The respective Jacobians also remain unchanged, since
$\left.\frac{\partial g}{\partial x}\right|_{(x,y)=(\pm1,0)} = 0 = \left.\frac{\partial g}{\partial y}\right|_{(x,y)=(\pm1,0)}\,.$
$0 = V\bigl(\mathbf{x}^*(T)\bigr) - V\bigl(\mathbf{x}^*(0)\bigr) = \int_0^T \frac{dV}{dt}\, dt = \int_0^T \nabla V \cdot \dot{\mathbf{x}}\, dt = -\int_0^T \left(\dot{\mathbf{x}}\right)^2 dt\,.$
The flow ẋ must hence vanish along any closed trajectory for gradient limit
cycles, which is only possible if this trajectory consists of an isolated fixpoint.
(2.10) DELAY DIFFERENTIAL EQUATIONS
Probing for a harmonic orbit, we have
where we have used cos2 + sin2 = 1 in the last step. For a = 0 we obtain
$\omega T = \frac{\pi}{2} + \pi n\,, \qquad n = 0, \pm1, \pm2, \ldots$
and b = ±ω for even/odd values of n.
(2.11) PERIOD-3 CYCLES IN THE LOGISTIC MAP
The 3-fold iterated logistic map is illustrated in Fig. 11.7. At r = r ∗ it is
tangential to the identity, hence the name “tangent transition”. For r slightly
larger than r ∗ there are three pairs of stable/unstable fixpoints. At criticality
their values merge to
x₁ ≈ 0.5144 ,   x₂ ≈ 0.9563 ,   x₃ ≈ 0.1510 ,
Fig. 11.7 The threefold iterated logistic map g(g(g(x))) for r = 3.80, r = r∗ = 3.82842 and r = 3.86; at r = r∗ it is tangential to the identity map
which can be found from the roots of x = g(g(g(x))), with g(x) = rx(1 − x). The derivative of the flow,

(d/dx) g(g(g(x))) = g′(x₃) g′(x₂) g′(x₁) → 1 ,
becomes neutral, as one can verify numerically. As expected, the Lyapunov
exponent vanishes at criticality.
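This can also be checked with a short script (a sketch, not the book's code): for r slightly above r∗ = 1 + √8 the stable period-3 cycle exists, and its multiplier g′(x₃)g′(x₂)g′(x₁) approaches unity as r → r∗.

# Sketch: cycle multiplier of the stable period-3 orbit approaching 1 at the tangent bifurcation.
import numpy as np

r_star = 1.0 + np.sqrt(8.0)                   # tangent bifurcation point, = 3.8284...

def g(x, r):  return r * x * (1.0 - x)
def dg(x, r): return r * (1.0 - 2.0 * x)

for eps in (1e-2, 1e-3, 1e-4, 1e-5):
    r = r_star + eps
    x = 0.51                                  # generic seed
    for _ in range(200_000):                  # relax onto the stable 3-cycle
        x = g(g(g(x, r), r), r)
    x1, x2, x3 = x, g(x, r), g(g(x, r), r)
    multiplier = dg(x1, r) * dg(x2, r) * dg(x3, r)
    print(f"r - r* = {eps:.0e}:  cycle ({x1:.4f}, {x2:.4f}, {x3:.4f}),  multiplier = {multiplier:.4f}")

The cycle points converge towards the values quoted above and the multiplier towards unity, confirming that the Lyapunov exponent vanishes at criticality.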
(2.12) CAR-FOLLOWING MODEL
We consider ẋ = v0 + Δẋ in (2.76), which reads ẍ(t + T ) = α(v(t) − ẋ(t)),
and obtain
(d/dt) Δẋ(t + T) = −α Δẋ(t) ,   −λ e^{−λT} = −α ,   Δẋ(t) = e^{−λt} .
dt
The steady-state solution is stable for λ > 0, which is the case when both α
and T are positive. It is fun to simulate the model numerically for non-constant
v(t).
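A possible minimal simulation (an illustration only; the discretization and the choice of v(t) are assumptions) treats the follower's speed u(t) = ẋ(t), which obeys u̇(t) = α[v(t − T) − u(t − T)], with an Euler step and a delay buffer.

# Sketch: car-following model with delay, for a non-constant leading-car speed v(t).
import numpy as np

alpha, T, dt = 0.8, 1.0, 0.01
n_delay = int(T / dt)
t = np.arange(0.0, 60.0, dt)
v = 1.0 + 0.3 * np.sin(0.5 * t)              # speed of the leading car (assumed modulation)

u = np.ones_like(t)                          # follower's speed, constant initial history
for n in range(n_delay, len(t) - 1):
    # Euler step of  du/dt = alpha * [v(t-T) - u(t-T)]
    u[n + 1] = u[n] + dt * alpha * (v[n - n_delay] - u[n - n_delay])

print("final follower speed:", u[-1], " leader speed:", v[-1])
# plotting u(t) together with v(t) shows the delayed, damped tracking of the leader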
for the logistic map g(x) = rx(1 − x) at r = 4, which takes the form
x_{n+1} = sin²(π θ_n) ,   cos(π θ_{n+1}) = 1 − 2x_{n+1} = cos(2π θ_n) .
The arguments of this expression lead directly to the shift map (3.73), which
is shown in Fig. 11.8.
Stationary Distribution The distribution functions p(x) and p(θ ) are related
by
p(x) dx = p(θ) dθ ,   dx/dθ = (π/2) sin(π θ) ,   cos(π θ) = 1 − 2x .
With sin(π θ) = √(1 − cos²(π θ)) = 2√(x(1 − x)) we find

p(x) = p(θ)/|dx/dθ| = 1/(π √(x(1 − x))) ,   (11.20)

which is illustrated in Fig. 11.8. One can convince oneself easily that p(θ) = 1, namely that all angles θ occur with identical probabilities.
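The stationary distribution can also be sampled directly. A short sketch (not from the book) compares the histogram of a long trajectory at r = 4 with the analytic density (11.20).

# Sketch: invariant density of the logistic map at r = 4 vs. p(x) = 1/(pi*sqrt(x(1-x))).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 0.9)
samples = np.empty(100_000)
for n in range(samples.size):
    x = 4.0 * x * (1.0 - x)
    samples[n] = x

hist, edges = np.histogram(samples, bins=100, range=(0, 1), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
p_exact = 1.0 / (np.pi * np.sqrt(centers * (1.0 - centers)))
for i in (5, 25, 50, 75, 95):
    print(f"x = {centers[i]:.2f}:  simulated {hist[i]:.3f},  exact {p_exact[i]:.3f}")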
(3.2) FIXPOINTS OF THE LORENZ MODEL
Linearizing the differential equations (3.8) around the fixpoint (x ∗ , y ∗ , z∗ )
yields
x̃˙ = −σ x̃ + σ ỹ ,
ỹ˙ = (r − z∗) x̃ − ỹ − x∗ z̃ ,
z̃˙ = y∗ x̃ + x∗ ỹ − b z̃ ,
Fig. 11.8 For r = 4 the logistic map (2.46) is equivalent to the shift map (left), as defined by (3.73). The probability distribution p(x) can be evaluated exactly, see (11.20), and compared with a simulation (right)
For the trivial fixpoint (0, 0, 0) the z̃ equation decouples, with eigenvalue λ₁ = −b, while the remaining 2 × 2 block yields λ_{2,3} = [−(σ + 1) ± √((σ + 1)² + 4σ(r − 1))]/2. For r < 1 all three eigenvalues are negative and the fixpoint is stable. For r = 1 the last eigenvalue, λ₃ = (−11 + 11)/2 = 0 for σ = 10, is marginal. For r > 1 the fixpoint becomes unstable.
The stability of the non-trivial fixpoints, Eq. (3.9), for 1 < r < rc can be
proven in a similar way, leading to a cubic equation. You can either find the
explicit analytical solutions via Cardano’s method or solve them numerically,
e.g. via Mathematica, Maple or Matlab, and determine the critical rc , for
which at least one eigenvalue turns positive.
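A numerical sketch for this last step (the standard parameter values σ = 10 and b = 8/3 are an assumption here): evaluate the Jacobian at the non-trivial fixpoints, Eq. (3.9), with x∗ = y∗ = ±√(b(r − 1)) and z∗ = r − 1, and locate the value of r at which the largest real part of its eigenvalues crosses zero.

# Sketch: locate r_c for the Lorenz model by bisection on the largest real eigenvalue part.
import numpy as np

sigma, b = 10.0, 8.0 / 3.0                    # assumed standard parameter values

def max_real_eig(r):
    xs = np.sqrt(b * (r - 1.0))
    J = np.array([[-sigma, sigma, 0.0],
                  [1.0,    -1.0,  -xs],       # d/dx,y,z of x(r - z) - y at the fixpoint
                  [xs,      xs,   -b ]])
    return np.max(np.linalg.eigvals(J).real)

lo, hi = 2.0, 50.0                            # stable at lo, unstable at hi
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if max_real_eig(mid) < 0.0:
        lo = mid
    else:
        hi = mid
print("r_c ≈", 0.5 * (lo + hi))               # ≈ 24.74 = sigma(sigma + b + 3)/(sigma - b - 1)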
(3.3) HAUSDORFF DIMENSION OF THE CANTOR SET
Dimension of a Line To cover a line of length l we need one circle of diameter
l. If we reduce the diameter of the circle to l/2, we require two circles to
cover the line. Generally we require a factor of 2 more circles if we reduce the
diameter to a half. From the definition (3.12) of the Hausdorff dimension we
obtain
D_H = − log[N(l)/N(l′)] / log[l/l′] = − log[1/2] / log[2] = 1 ,

where we used N(l) = 1 and N(l′ = l/2) = 2. Therefore, the line is one-dimensional.
Dimension of the Cantor Set If we reduce the diameter of the circles from l to
l/3, we require a factor of two more circles to cover the Cantor set. Therefore
we obtain the Hausdorff dimension
D_H = − log[N(l)/N(l′)] / log[l/l′] = − log[1/2] / log[3] ≈ 0.6309 ,
.x(t) = x0 cos(ωt + φ) ,
where we have to determine the amplitude x0 and the phase shift φ. Using this
ansatz for the damped harmonic oscillator we find
The amplitude x0 and the phase shift φ can now be found by splitting above
equation into sin(ωt) terms and cos(ωt) terms and comparing the prefactors.
For the case ω = ω₀ we obtain φ = −π/2, with an amplitude x₀ that is inversely proportional to γω₀. Note that x₀ → ∞ for γ → 0.
(3.5) MARKOV CHAIN OF UMBRELLAS
Let p and q = 1 − p be the probabilities of raining and not raining, respectively, with n ∈ {0, 1, 2, 3, 4} being the number of umbrellas at the place where Lady Ann is currently staying (which could be either the office or her apartment).
process is illustrated in Fig. 11.9.
Neglecting the time indices for the stationary distribution ρ(n) for the
number of umbrellas, we have
The probability for Lady Ann to get wet is therefore p · ρ(0) = pq/(4 + q).
Fig. 11.9 Graphical representation of the Markov chain corresponding to the N = 4 umbrella
problem. Being at office/home (solid/dashed circles), there can be n = 0, 1, 2, 3, 4 umbrellas in
the respective place, being taken with probability p (when it rains) to the next location. When it
does not rain, with probability q = 1−p, no umbrella changes place. Note that the sum of outgoing
probabilities is one for all vertices
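The stationary distribution can be verified with a small script (a sketch, not the book's code) that builds the 5 × 5 transition matrix of the chain of Fig. 11.9 and extracts its stationary eigenvector.

# Sketch: stationary distribution of the N = 4 umbrella chain; the state n is the number of
# umbrellas at Lady Ann's current location before she leaves.
import numpy as np

N, p = 4, 0.3
q = 1.0 - p

P = np.zeros((N + 1, N + 1))                  # P[n, m]: probability of n -> m
for n in range(N + 1):
    if n == 0:
        P[0, N] = 1.0                         # no umbrella to take; other place holds all N
    else:
        P[n, N - n + 1] = p                   # it rains: one umbrella is carried along
        P[n, N - n]     = q                   # it does not rain: umbrellas stay put

evals, evecs = np.linalg.eig(P.T)             # stationary distribution = left eigenvector, eigenvalue 1
rho = np.real(evecs[:, np.argmax(np.real(evals))])
rho /= rho.sum()

print("rho =", rho)                           # rho(0) = q/(N+q), rho(n>0) = 1/(N+q)
print("P(getting wet) =", p * rho[0], "  pq/(4+q) =", p * q / (4 + q))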
with (1 ± α)/2 being the probabilities for a walker to jump to the right or to
the left, and hence
ṗ(x, t) = D p″(x, t) − v p′(x, t) ,   D = (Δx)²/(2Δt) ,   v = αΔx/Δt .   (11.22)

Writing

v = (2α/Δx) (Δx)²/(2Δt) = (2α/Δx) D ,   α ∝ Δx ,   (11.23)
we see that a finite propagation velocity v is recovered in the limit Δt → 0
and Δx → 0 whenever α is linear in Δx.
For a vanishing D → 0 the solution of (11.22) is given by a ballistically
travelling wave p(x, t) = u(x − vt).
(3.7) CONTINUOUS-TIME LOGISTIC EQUATION
The continuous-time logistic equation is
ẏ(t) = α y(t) [1 − y(t)] ,   ẏ/y + ẏ/(1 − y) = α ,
Fig. 11.10 Numerical simulation of (3.74), with K = 1.02 and g = 1.2, using Euler integration
with a time step Δt = 0.01. Noise, as defined by (3.75), with a standard deviation σ = 0.003/0.009
(left/right plot) has been employed. The filled/open circle denotes the locus of the stable/unstable
fixpoint, 1 = K cos(ϕ± ), compare Fig. 2.13
we find
for the next state with t = Δt. This map is stable,1 whenever
ρ∗(x, t) = 1/(1 + e^{β(x−ct)}) ,   ∂ρ∗/∂t = βc ρ∗(1 − ρ∗) ,

and, equivalently

∂ρ∗/∂x = −β ρ∗(1 − ρ∗) ,   ∂²ρ∗/∂x² = β²(1 − 2ρ∗) ρ∗(1 − ρ∗) .

R(ρ) = ∂ρ/∂t − D ∂²ρ/∂x² = ρ(1 − ρ)[βc − β²D(1 − 2ρ)]
     = βρ(1 − ρ)(c − βD + 2βDρ)
     = rρ(1 − ρ)(ρ − θ) ,

which reduces to

R(ρ) = rρ²(1 − ρ) ,   c = βD .   (11.24)
ρ̇ = u̇/u² ,   ρ′ = u′/u² ,   ρ″ = u″/u² − 2(u′)²/u³ ,

and (4.71) becomes

u̇/u² = (1/u)(1 − 1/u) + u″/u² − 2(u′)²/u³ ,

or, to linear order around u = 1,

u̇ = (u − 1) + u″ ,   v̇ = v + v″ ,

which is equivalent to the linearized Fisher equation (4.18) after shifting the zero via v = u − 1.
(4.4) TURING INSTABILITY WITH TWO STABLE NODES
For the two eigenvalues to be real and both negative (or both positive) the
determinant Δ₁ = 1 − a b needs to be positive, compare (4.25), which implies

(a − b)² > 4Δ₁ = 4 − 4 a b ,   (a + b)² > 4 .

This is easily possible, e.g. using a = 2 and b = 1/4, with the determinant Δ₁ = 1 − 2/4 = 1/2 remaining positive, and with the trace b − a remaining
negative. A Turing instability with two stable nodes is hence possible, e.g.
with the matrices
A₁ = \begin{pmatrix} -2 & 1 \\ -1 & 1/4 \end{pmatrix} ,   A₂ = \begin{pmatrix} -8 & 0 \\ 0 & -1/8 \end{pmatrix} ,   A = \begin{pmatrix} -10 & 1 \\ -1 & 1/8 \end{pmatrix} .
The eigenvalues λ±(A₁) = (−7 ± √17)/8 are both attracting, and the determinant Δ = −1/4 of A = A₁ + A₂ is negative.
(4.5) EIGENVALUES OF 2 × 2 MATRICES
For a fixpoint x∗ = (x ∗ , 0) we find for the Jacobian J
J = \begin{pmatrix} 0 & 1 \\ -V''(x∗) & -λ(x∗) \end{pmatrix} ,   ẋ = y ,   ẏ = −λ(x) y − V′(x) .
The determinant

Δ = V″(x∗) < 0
and hence
σ_n (1 + e^σ) ≈ σ e^σ ,   σ_n ≈ σ/(1 + e^{−σ}) ,   (11.25)

σ_c = σ − σ_n ≈ σ e^{−σ}/(1 + e^{−σ})
2 ∫_{−d/2}^{d/2} dy ∫ dx ρ(x + y) ρ(x − y) = − ∫_{−d}^{d} dy ∫ dx ρ(x) ρ(x + y) ,   (11.26)
compare (4.54). Using x = x̃ − y/2 and y = 2ỹ we rewrite the second term
in (11.26) as
∫_{−d}^{d} dy ∫ dx̃ ρ(x̃ − y/2) ρ(x̃ + y/2) = 2 ∫_{−d/2}^{d/2} dỹ ∫ dx̃ ρ(x̃ − ỹ) ρ(x̃ + ỹ) ,

which has exactly the same form as the first term in (11.26).
(4.8) CLASSICAL VOTER MODEL
The probability pi that a given agent changes opinion is
p_i = (1/2) [1 − (σ_i/z) Σ_j σ_j] ,
ṁ = (d/dt) σ_i = −2σ_i p_i = −σ_i + (1/z) Σ_j σ_j ,   σ_i² = 1 .
m = P₊ − P₋ = 2P₊ − 1 ,   P₊ = (1 + m)/2 ,
which we wanted to prove.
μ = G′(1) = Σ_j G′_j(1) Π_{i≠j} G_i(1) = Σ_j μ_j .
The mean of the cumulative process is just the sum of the means of the
individual processes. For the evaluation of the variance we need
G″(1) = Σ_{l, j≠l} G′_j(1) G′_l(1) Π_{i≠j,l} G_i(1) + Σ_j G″_j(1) Π_{i≠j} G_i(1)
      = Σ_{l, j≠l} μ_j μ_l + Σ_j G″_j(1)
      = (Σ_j μ_j)² − Σ_j μ_j² + Σ_j (σ_j² − μ_j + μ_j²)
      = (Σ_j μ_j)² + Σ_j (σ_j² − μ_j) = μ² + Σ_j σ_j² − μ ,
Fig. 11.11 The flat distribution, which has a variance of σ² = 1/12, shown together with the probability density of the sum of N = 2 flat distributions. Also shown is the corresponding limiting Gaussian with σ = 1/√(12N), compare (5.11). 100 bins and a sampling of 10⁵ have been used
where we have assumed that p(x) is the flat distribution, for x ∈ [0, 1],
which has the variance
σ² = ∫₀¹ (x − 1/2)² dx = 1/12 .
as extracted from historical daily data. The default would be 1/8 = 0.125.
Of interest are the ratios
p(+|−+) + p(−|−+) = 1 ,   p(+|−+)/p(−|−+) = p₋₊₊/p₋₊₋ = 1/0.77 ,
yielding p(+|−+) = 0.57 and p(−|−+) = 0.43. There was consequently, in the twentieth century, a substantially larger chance of a third-day increase in the Dow Jones index when the index had decreased on the first day and increased on the second day. This kind of
information could be used, at least as a matter of principle, for a money-
making scheme. But there are many caveats: There is no information about
the size of the prospective third-day increase in this analysis, the standard
deviation of the p±±± may be large, and there may be prolonged periods of
untypical statistics. We recommend not investing real-world money in home-made money-making schemes based on this or related statistical analyses.
(5.4) THE OR TIME SERIES WITH NOISE
Recall that OR(0, 0) = 0, together with OR(0, 1) = OR(1, 0) = OR(1, 1) =
1. For the initial condition σ1 = 0, σ0 = 0, the time series is ..0000000.., for
all other initial conditions the time series is ..1111111..,
p(1) = 1 (typical) ,   p(1) = 3/4 (average) .
. . . . 1111101111111110111111111111110000000000000 . . .
(d/dt) ȳ(t) = (1/T) ∫₀^∞ dτ y′(t − τ) e^{−τ/T} = −(1/T) ∫₀^∞ dτ e^{−τ/T} (d/dτ) y(t − τ)
            = −(1/T) [e^{−τ/T} y(t − τ)]₀^∞ − (1/T²) ∫₀^∞ dτ e^{−τ/T} y(t − τ)
            = [y(t) − ȳ(t)]/T ,   (11.27)
ȳ(t)|_{y(t)≡y₀} = y₀ ,   y_c(t)|_{y(t)≡y₀} = y₀ ,   y_s(t)|_{y(t)≡y₀} = y₀ ,
y_s(t, ω) = [(1 + (ωT)²)/(ωT²)] ∫₀^∞ dτ y(t − τ) e^{−τ/T} sin(ωτ) .
Again, the update rules can be derived using a partial integration. One finds
p(x) ∝ 2^{−λ₂ (x−μ̃)²} ∼ e^{−(x−μ̃)²/(2σ²)} ,
conditions (underlined),
and
and
p(1, 1) = 0 ,   p(1, 0) = 0 ,   p(0, 0) = p(0, 1) ≡ 1/2 ,
from (11.30) and (11.33), where we normalized the result in the last step.
This finding is independent of the noise level α, for any α ≠ 0, 1. The
marginal distribution functions are
Lastly, we find
H [p] = 1,
. H [pσ ] = 0, H [pτ ] = 1
p(x) = e^{−(x−1)} ,   q(x) = (γ − 1)/x^γ ,   x, γ > 1 ,

is

K[p; q] = ∫₁^∞ p(x) log₂[p(x)/q(x)] dx
        = −H[p] − ∫₁^∞ e^{−(x−1)} log₂[(γ − 1) x^{−γ}] dx
        = −H[p] − log₂(γ − 1) + γ ∫₁^∞ e^{−(x−1)} log₂(x) dx .

The optimal exponent follows from

∂K[p; q]/∂γ = 0 ,

viz

1/(γ − 1) = ln(2) ∫₁^∞ e^{−(x−1)} log₂(x) dx = ∫₁^∞ e^{−(x−1)} ln(x) dx .
The graphs intersect twice, q(x) > p(x) both for x = 1 and x → ∞.
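The optimal exponent can be evaluated numerically; a short sketch (assuming scipy is available):

# Sketch: optimal power-law exponent from  1/(gamma - 1) = ∫_1^∞ e^{-(x-1)} ln(x) dx.
import numpy as np
from scipy.integrate import quad

integral, _ = quad(lambda x: np.exp(-(x - 1.0)) * np.log(x), 1.0, np.inf)
gamma_opt = 1.0 + 1.0 / integral
print("integral =", integral, "  optimal gamma =", gamma_opt)   # gamma ≈ 2.68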
(5.9) CHI-SQUARED TEST
We rewrite the Kullback Leibler divergence, see (5.2.4), as
K[p; q] = Σ_i p_i log(p_i/q_i) = − Σ_i p_i log(q_i/p_i)
        = − Σ_i p_i log[(q_i − p_i + p_i)/p_i] = − Σ_i p_i log[1 + (q_i − p_i)/p_i]
        ≈ − Σ_i (q_i − p_i) + Σ_i (q_i − p_i)²/(2p_i) = Σ_i (p_i − q_i)²/(2p_i)
        ≡ χ²[p; q]/2 ,
p_k^{q−1} ≥ 1 ,   H_q[p] = (1/(1 − q)) Σ_k p_k (p_k^{q−1} − 1) ≥ 0 ,
which shows that the Tsallis entropy is extensive only in the limit q → 1.
The Tsallis entropy is maximal for an equiprobability distribution, which
follows directly from the general formulas (5.34) and (5.36).
φ₃ = 1 ,   φ₂ = −1/4 ,   φ₁ = −3/4 ,   t = 3/16 ,   h = 3/16 .

P′(φ₂) = (d/dφ)[P(φ) − hφ]|_{φ₂} = (φ₂ − φ₃)(φ₂ − φ₁) < 0 ,

f(T, φ, h) − f₀(T, h) = [(t − 1)/2] φ² + (1/4) φ⁴ = −(t − 1)²/4 ,

∂F/∂t = −V (t − 1)/2   ⇒   S = ∂F/∂T = (V/T_c)(1 − t)/2 ,

C_V = T_c ∂S/∂T = −V/(2T_c) ,   T < T_c .
"cross"
Fig. 11.12 Evolution of the pattern “cross” in the game of life: after seven steps it gets stuck in a
fixed state with four blinkers
where p is the probability for a sapling to grow. Compare Fig. 6.6. The
stationary densities xe (t + 1) = xe (t) ≡ xe∗ , etc., are
(1 + p) x_e∗ = x_f∗ ,   1 = x_f∗ + x_t∗ + x_f∗/(1 + p) ,   x_t∗ = 1 − [(2 + p)/(1 + p)] x_f∗ ,
which leads to the self-consistency condition for the stationary density x_f∗ of fires,

x_f∗ = [1 − (1 − x_f∗)^Z] [1 − (2 + p)/(1 + p) x_f∗] .   (11.34)
In general one needs to solve (11.34) numerically. For small densities of fires
we expand
1 − (1 − x_f∗)^Z ≈ 1 − [1 − Z x_f∗ + Z(Z − 1)/2 (x_f∗)²] = Z x_f∗ − Z(Z − 1)/2 (x_f∗)² ,
2
Fig. 11.13 Construction of a small-world network out of the game of life on a 2D-lattice: One
starts with a regular arrangement of vertices where each one is connected to its eight nearest
neighbors. Two arbitrarily chosen links (wiggled lines) are cut with probability p and the remaining
stubs are rewired randomly as indicated (dashed arrows). The result is a structure showing
clustering as well as a fair amount of shortcuts between far away sites, as in the Watts and Strogatz
model, Fig. 1.14, but with conserved connectivities ki ≡ 8
finding
1/Z = [1 − (Z − 1)/2 x_f∗][1 − (2 + p)/(1 + p) x_f∗] ≈ 1 − [(Z − 1)/2 + (2 + p)/(1 + p)] x_f∗
for (11.34). The minimal number of neighbors for fires to burn continuously
is Z > 1 in mean-field theory.
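A minimal sketch for solving (11.34) numerically, here by bisection for given Z and p (the parameter values are illustrative assumptions):

# Sketch: non-trivial root of the mean-field self-consistency condition (11.34).
def residual(xf, Z, p):
    return (1.0 - (1.0 - xf) ** Z) * (1.0 - (2.0 + p) / (1.0 + p) * xf) - xf

Z, p = 4, 0.2
lo, hi = 1e-6, 0.5                      # residual is positive at lo, negative at hi
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if residual(mid, Z, p) > 0.0:
        lo = mid
    else:
        hi = mid
print("stationary fire density x_f* ≈", 0.5 * (lo + hi))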
(6.6) REALISTIC SANDPILE MODEL
The variable zi should denote the true local height of a sandpile; the toppling
starts when the slope becomes too big after adding grains of sand randomly,
i.e. when the difference zi − zj between two neighboring cells exceeds a
certain threshold K. Site i then topples in the following fashion:
zi → zi − 1,
. zj → zj + 1 . (11.35)
If more than one neighbor satisfies the criteria, select one randomly.
– Repeat the first step as long as there is at least a single neighbor j of i
satisfying the condition zi ≥ zj + 1.
The toppling process mimics a local instability, being at the same time
conserving. Sand is lost only at the boundaries. This model leads to true
sandpiles, in the sense that it is highest in the center and lowest at the
Fig. 11.14 Example of a simulation of a one-dimensional realistic sandpile model, see (11.35), with 60 cells, after 500 (left) and 2000 (right) time steps
boundaries, compare Fig. 11.14. Note that there is no upper limit to zi , only to
the slope |zi − zj |.
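A possible implementation sketch (assumptions: a one-dimensional chain with open boundaries, and a site topples whenever its slope towards a neighbor exceeds K; the repeat-until-level refinement described above is easy to add):

# Sketch: one-dimensional "realistic" sandpile, compare (11.35) and Fig. 11.14.
import numpy as np

L, K, n_grains = 60, 2, 2000
rng = np.random.default_rng(0)
z = np.zeros(L + 2, dtype=int)          # sites 1..L; cells 0 and L+1 are open boundaries

for _ in range(n_grains):
    z[rng.integers(1, L + 1)] += 1      # drop a grain on a random site
    relaxing = True
    while relaxing:
        relaxing = False
        z[0] = z[L + 1] = 0             # sand falling off the ends is lost
        for i in range(1, L + 1):
            lower = [j for j in (i - 1, i + 1) if z[i] - z[j] > K]
            if lower:
                j = lower[rng.integers(len(lower))]
                z[i] -= 1               # topple one grain to a randomly chosen lower neighbor
                z[j] += 1
                relaxing = True

print(z[1:L + 1])                       # tent-shaped profile, highest in the middle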
(6.7) RECURSION RELATION FOR AVALANCHE SIZES
Substituting fn (x) = s Pn (s) x s (where we dropped the functional depen-
dence on the branching probability p), as defined by (6.17), into the recursion
relation (6.19) for the generating functional fn (x), one obtains
Σ_s P_{n+1}(s) x^s = x (1 − p) + x p [Σ_s P_n(s) x^s]² .   (11.36)
The recursion relation for the probability Pn (s) of finding an avalanche of size
s is obtained by Taylor-expanding the right-hand side of (11.36) in powers of
x s , and comparing prefactors,
x⁰ :   P_{n+1}(0) = 0 ,
x¹ :   P_{n+1}(1) = (1 − p) + p [P_n(0)]² = 1 − p ,
Q̃_{n+1} = (1 − p) + p Q̃_n²   (11.39)
is valid, in analogy to the recursion relation (6.19) for the generating function
of avalanche sizes. The case here is simpler, as one can work directly
with probabilities: The probability Q̃_{n+1} to find an avalanche of duration 1 . . . (n + 1) is the probability (1 − p) to find an avalanche of length 1, plus the probability p Q̃_n² to branch at the first time step, generating two avalanches of duration 1 . . . n.
In the thermodynamic limit we can replace the difference Q̃_{n+1} − Q̃_n by the derivative dQ̃_n/dn, leading to the differential equation

dQ̃_n/dn = 1/2 + (1/2) Q̃_n² − Q̃_n ,   for p = p_c = 1/2 ,   (11.40)
which can easily be solved by separation of variables. The derivative of the
solution Q̃n with respect to n is the probability of an avalanche to have a
duration of exactly n steps.
Q̃_n = n/(n + 2) ,   (11.41)

D(t = n) = dQ̃_n/dn = 2/(n + 2)² ∼ n^{−2} .
Check that (11.41) really solves (11.40), with the initial condition Q̃0 = 0.
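The check can be done symbolically, e.g. with sympy (a short sketch):

# Sketch: verify that Q_n = n/(n+2) solves dQ/dn = 1/2 + Q^2/2 - Q with Q(0) = 0.
import sympy as sp

n = sp.symbols('n', positive=True)
Q = n / (n + 2)
lhs = sp.diff(Q, n)
rhs = sp.Rational(1, 2) + Q ** 2 / 2 - Q
print(sp.simplify(lhs - rhs))              # 0: the ODE (11.40) is satisfied
print(Q.subs(n, 0), sp.simplify(lhs))      # initial condition 0 and derivative 2/(n+2)**2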
(6.9) GALTON–WATSON PROCESS
The generating functions G0 (x) are generically qualitatively similar to the
ones shown in Fig. 6.12, due to the normalization condition G(1) = 1.
The fixpoint condition G(q) = q, Eq. (6.26), has a solution for q < 1
whenever
G′(x)|_{x=1} > 1 ,   i.e.   W = G′(1) > 1 ,
since G′(1) is the expectation value of the distribution function, which is the mean number of offspring W.
and C2
DE → Ē D̄ → DE .
.
Fig. 11.15 Left: Solution of three K = 1, N = 3 boolean networks with a cyclic linkage tree
σ1 = f1 (σ2 ), σ2 = f2 (σ3 ), and σ3 = f3 (σ1 ). (a) All coupling functions are the identity. (b) All
coupling functions are the negation. (c) f1 and f2 = negation, f3 identity. Right: Solution for
the N = 4 Kauffman nets shown in Fig. 7.1, σ1 = f (σ2 , σ3 , σ4 ), σ2 = f (σ1 , σ3 ), σ3 = f (σ2 ),
σ4 = f (σ3 ), with all coupling functions f (. . .) being the generalized XOR functions, which count
the parity of the controlling elements
Depending on the initial state, the loops C₁ and C₂ lead to two cycles each. Cycles L₁, L₂ emerge from C₁, L₃ and L₄ from C₂,
L1 :
. 000 → 001 → 101 → 111 → 110 → 010 → 000
L2 : 100 → 011 → 100
L3 : 00 → 11 → 00
L4 : 01 → 10 → 01
The possible combinations of L1 , L2 and L3 , L4 construct all possible
attractor cycles. This leads to two attractor cycles of length two and two
attractor cycles of length six,
Fig. 11.16 Solution of the N = 3, Z = 2 network defined in Fig. 7.3, when using sequential
asynchronous updating. The cycles completely change in comparison to the case of synchronous
updating shown in Fig. 7.3
s(t) = 2φ(t) − 1 .
.
Afterwards one has to consider the probability I(t) for the output function to
be positive, which gives us the recursion equation
The relation between I (t) and φ(t) is still unknown but can be calculated via
I(t) = ∫₀^∞ P_{ξ(t)}(x) dx .
After some calculus you should finally obtain the recursion relation for s(t)
and find both its fixed points and the critical value ηc .
Fig. 11.17 Solution for the 3-site network shown in Fig. 7.3, with the AND function substituted
by an XOR
as there are 2d possibilities for the first step and 2d − 1 for each subsequent
step. Self-retracing paths do however occur, and not all of the paths constructed in this way are hence distinct.
An exponentially small (as a function of n) fraction σ_n(d) of the paths of length n starting from a given site will be covered with bonds if the bond covering probability p is lower than 1/λ(d) (because then pλ(d) < 1), and the percolation transition hence occurs at p_c(d) = 1/λ(d). We hence find the lower bound
p_c(d) = 1/λ(d) > 1/(2d − 1) .   (11.42)
leading to
Z_N(T, B) = Tr T^N = (λ₁)^N + (λ₂)^N ≈ (λ₁)^N ,
F(T, B)/N = (−kT/N) ln Z_N(T, B) ,
and the magnetization per particle by
M(T, B)/N = −(1/N) ∂F(T, B)/∂B = [1 + e^{−4βJ}/sinh²(βB)]^{−1/2} .
x_{i+1} = [(φ + σ − 1)/u] x_i − x_{i−1} ,   i > 1 ,   (11.45)

which can be cast into the 2 × 2 recursion form

\begin{pmatrix} x_i \\ x_{i+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & (φ + σ − 1)/u \end{pmatrix} \begin{pmatrix} x_{i−1} \\ x_i \end{pmatrix} .
The largest eigenvalues of the above recursion matrix determine the scaling
of xi for large i. For the determination of the error threshold you may solve
for the xi numerically, using (11.45) and the mass-normalization condition
i xi = 1 for the self-consistent determination of the flux φ.
for the steady-state case Ċ = 0 = ḟ, viz C = Σ_i x_i = a/d; no life without
regenerating resources. The growth rates Wii = f ri − d need to vanish (or to
be negative) in the steady state and hence
lim_{t→∞} f(t) → min_i (d/r_i) .
Only a single species survives the competition for the common resource (the one with the largest growth rate r_i); the ecosystem is unstable.
(8.4) COMPETITIVE POPULATION DYNAMICS
The fixpoints of the symmetric sheep and rabbits Lotka–Volterra
model (8.65) are x∗0 = (0, 0), x∗x = (1, 0), and x∗y = (0, 1), together with the
non-trivial fixpoint
x∗_k = (1/(1 + k)) (1, 1) ,   J = \begin{pmatrix} 1 − 2x − ky & −kx \\ −ky & 1 − 2y − kx \end{pmatrix} .
The non-trivial fixpoint is stable for weak competition k < 1. In this case,
sheep and rabbits coexist.
x∗k becomes however a saddle for strong competition k > 1 and the
phenomenon of spontaneous symmetry breaking occurs. Depending on the
initial conditions, the system will flow either to x∗x or to x∗y , even though the
constituting equations of motion (8.65) are symmetric under an exchange of x
and y, with x∗x and x∗y having the identical two Lyapunov exponents (−1) and
(1 − k).
The trivial fixpoint x∗0 is always unstable with a degenerate eigenvalue of
(+1).
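The symmetry breaking is easy to observe numerically. The following sketch integrates ẋ = x(1 − x − ky), ẏ = y(1 − y − kx), which is consistent with the Jacobian quoted above (the exact normalization of (8.65) may differ); for k > 1 two nearly identical initial conditions flow to different fixpoints.

# Sketch: spontaneous symmetry breaking in the symmetric competition model for k > 1.
import numpy as np

def flow(s, k):
    x, y = s
    return np.array([x * (1.0 - x - k * y), y * (1.0 - y - k * x)])

def integrate(s, k, dt=0.01, n=20_000):
    s = np.array(s, dtype=float)
    for _ in range(n):
        s = s + dt * flow(s, k)           # a simple Euler step suffices here
    return s

k = 2.0
print(integrate((0.50, 0.49), k))         # flows to approximately (1, 0)
print(integrate((0.49, 0.50), k))         # flows to approximately (0, 1)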
(8.5) HYPERCYCLES
The fixpoints (x1∗ , x2∗ ) of the reaction network (8.41) are determined by
0 = x1∗ (α + κ x2∗ − φ)
.
where φ is the flux. The last equation, see (8.42), is used to satisfy the
constraint x1 + x2 = 1, together with x1∗ , x2∗ ≥ 0.
Solving the above equations, one finds x₁∗ = (κ − α)/(2κ) and x₂∗ = (κ + α)/(2κ) for κ > α. Otherwise, only the trivial solutions (x₁∗, x₂∗) = (0, 1) and (x₁∗, x₂∗) = (1, 0)
are fixpoints. Linearizing the equations around the fixpoints leads us to the
matrix
M = \begin{pmatrix} (κ − α)x₂∗ − 4κ x₁∗ x₂∗ & (κ − α)x₁∗ − 2κ(x₁∗)² \\ κ x₂∗ − 2κ(x₂∗)² & α + κ x₁∗ − 2α x₂∗ − 4κ x₁∗ x₂∗ \end{pmatrix} .
The cooperating intruder will die and in the next step only defectors will
be present.
intruding cooperators:  3 × S + 1 × R = 3 × 0 + 3 = 3
defecting neighbors:    3 × P + 1 × T = 3 × 0.5 + 3.5 = 5
The cooperating intruders will die and in the next step only defectors will
be present.
One can go one step further and consider the case of three adjacent intruders.
Not all intruders will then survive for the case of defecting intruders and not
all intruders will die for the case of cooperating intruders.
(8.7) NASH EQUILIBRIUM
The payoff matrix of this game is given by
A = \begin{pmatrix} L & L \\ 0 & H \end{pmatrix} ,   L < H ,
for the cautious/risky player, where L signifies the low payoff and H the high
payoff. Denoting the number of cautious players by Nc we can compute the
reward for participants playing cautiously or riskily, respectively and from this
the global reward G:
R_c = [L N_c + L(N − N_c)]/N = L ,
R_r = [0 · N_c + H(N − N_c)]/N = H(N − N_c)/N ,
G(N_c) = H (N − N_c)²/N + N_c L .
Differentiation leads to
p′(x)/c′(x) − p(x) c″(x)/[c′(x)]² = −p(x)/v ,   (11.46)

which is solved by

p(x) = [c′(x)/v] e^{−c(x)/v} ,   ∫₀^∞ p(x) dx = 1 .
c_max = P(x_tot) ,   x_i = (c_max − c_i)/(−P′(x_tot)) ,   (11.49)

which leads to

E_i = (c_max − c_i)²/(−P′(x_tot)) .
Again, the dispersion relation is strictly quadratic, compare the original case,
Eq. (8.60). That it depends explicitly on xtot is not relevant, as it follows
from (11.48) that cmax −c̄ ∼ 1/N, which leads directly to catastrophic poverty.
Catastrophic poverty arises hence whenever the productivity function P (xtot )
is monotonically decaying.
with damping γ, this contribution vanishes in the limit t → ∞ and only the special solution survives.
(9.2) SELF-SYNCHRONIZATION
We use the ansatz
for the steady-state solution ∝ ωt, together with a small perturbation ∝ eλt .
The steady-state solution is stable whenever the real part of λ is negative.3
Expanding to leading order we obtain

ω = ω₀ − K sin(ωT) ,   λ = K cos(ωT) [e^{−λT} − 1] ,   (11.50)
Further on, for T K cos(ωT ) < −1, the exponent λ becomes positive and the
steady-state trajectory ∝ ωt becomes unstable.
Considering graphically the limit K → ∞ of (11.50) for the locking frequency ω, one finds that

ωT → n 2π ,   ω = n 2π/T ,   n = 0, 1, 2, . . . .

The natural frequency ω₀ becomes irrelevant and the system self-locks with respect to the delay time T.
(9.3) KURAMOTO MODEL WITH THREE OSCILLATORS
For the Kuramoto system,
θ̇_i = ω_i + (K/N) Σ_{j≠i} sin(θ_j − θ_i) ,   (11.51)
(0, 0) ,   (π, 0) ,   (0, π) ,   (π, π) ,   ±(2π/3)(1, 1) ,   (11.55)
of which (0, 0) (fully synchronized) is stable for K > 0 (attractive coupling). The last state, ±(2π/3)(1, 1), corresponds to three oscillators separated respectively by 120°, which is the stable configuration for K < 0 (repulsive coupling). The remaining three fixpoints, (0, π), (π, 0) and (π, π), are unstable; they correspond to configurations in which two oscillators are synchronized, with the third one being anti-synchronized.
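The stability statements can be verified by integrating (11.51) directly; a minimal sketch for three identical oscillators:

# Sketch: Kuramoto model (11.51) with N = 3 identical oscillators.
import numpy as np

def simulate(K, N=3, dt=0.01, n=50_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)
    omega = np.zeros(N)                       # identical natural frequencies
    for _ in range(n):
        dtheta = omega + (K / N) * np.sum(
            np.sin(theta[None, :] - theta[:, None]), axis=1)
        theta = theta + dt * dtheta
    diffs = np.angle(np.exp(1j * (theta - theta[0])))   # phase differences, wrapped
    return np.sort(np.abs(diffs))

print("K = +1:", simulate(+1.0))   # all phase differences close to 0 (synchronized)
print("K = -1:", simulate(-1.0))   # differences 0 and ~2.09 = 2*pi/3 (splay state)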
(9.4) LYAPUNOV EXPONENTS ALONG SYNCHRONIZED ORBITS
For two coupled harmonic oscillators
K K
θ̇1 = ω1 +
. sin(θ2 − θ1 ), θ̇2 = ω2 + sin(θ1 − θ2 ),
2 2
see (9.7), the Jacobian at the fixpoint (9.9) is
(K/2) cos(θ₂ − θ₁) \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} ,   sin(θ₂ − θ₁) = (ω₂ − ω₁)/K .
x₁(t) = x̄(t) + c^t ,   x₂(t) = x̄(t) − c^t ,
The solution x1 (t) = x2 (t) = x̄(t) is stable for |c| < 1, viz for κ > κc . Setting
c = 1 in (11.56) we find
1 = a − 2κ_c a ,   κ_c = (a − 1)/(2a) ,
which is actually independent of the time delay T . It is instructive to plot
x1 (t) and x2 (t), using a small script or program. Note however, that the
synchronization process may take more and more time with increasing delay
T.
The synchronization process is driven by an averaging procedure, which is
most evident for κ = 1/2 and T = 0. For this setting of parameters, perfect
synchronization is achieved in a single time step.
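A possible small script (an illustration only: the local map f(x) = ax(1 − x) and the coupling x_i(t + 1) = (1 − κ)f(x_i(t)) + κ f(x_j(t − T)) are assumptions here, and the exact map used in the exercise may differ) shows the one-step synchronization for κ = 1/2, T = 0, and the slower synchronization for finite delay.

# Sketch: two coupled maps with delayed coupling; |x1 - x2| as a function of time.
import numpy as np

def simulate(a=1.9, kappa=0.5, T=0, n=60, seed=1):
    rng = np.random.default_rng(seed)
    x1 = list(rng.uniform(0, 1, T + 1))        # history buffers of length T + 1
    x2 = list(rng.uniform(0, 1, T + 1))
    f = lambda x: a * x * (1.0 - x)
    diff = []
    for _ in range(n):
        new1 = (1 - kappa) * f(x1[-1]) + kappa * f(x2[-1 - T])
        new2 = (1 - kappa) * f(x2[-1]) + kappa * f(x1[-1 - T])
        x1.append(new1)
        x2.append(new2)
        diff.append(abs(new1 - new2))
    return diff

print(simulate(T=0)[:5])     # the difference vanishes after a single step
print(simulate(T=3)[:15])    # with delay, synchronization takes longer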
(9.6) TERMAN–WANG OSCILLATOR
We linearize (9.39) around the fixpoint (x ∗ , y ∗ ), taking at the same time the
limit β → 0,
lim_{β→0} tanh(x/β) = −1 + 2Θ(x) = { −1 (x < 0) ;  +1 (x > 0) } .
x̃˙ = 3(1 − (x∗)²) x̃ − ỹ ,   ỹ˙ = −ε ỹ ,
compare Fig. 11.18. For two coupled neurons xA (t) and xB (t) the membrane
potential of the second neuron gets a kick when the first neuron spikes, and
vice versa.
When the two neurons are synchronized, say A spikes just before B, the membrane potentials differ by ε just after the spike. Once x_A reaches the firing threshold, the difference x_A(t) − x_B(t) is smaller than ε, due to the
Fig. 11.18 Left: The time development x(t) = (1 − e^{−γt})/(1 − e^{−γ}) of a single leaky integrator (11.57), for γ = 8, 4, 2, 1. It spikes when x > 1, resetting the membrane potential x to zero. Right: For γ = 2, two coupled leaky integrators x_A(t) and x_B(t). When one neuron spikes, the membrane potential of the other neuron is increased by ε = 0.1
concavity of the solution (11.57), compare Fig. 11.18. The resulting limit cycle is stable.
If the two neurons are not synchronized, the concavity of x(t) reduces the difference of the spiking times for A and B until the limit cycle is reached.
(9.8) SIRS MODEL
The fixpoint equation reads
x∗ = a x∗ [1 − (τ_R + 1) x∗] ,
x̃_{n+1} = −a x∗ Σ_{k=0}^{τ_R} x̃_{n−k} + a x̃_n [1 − (τ_R + 1) x∗] .   (11.58)
No Immunity For τR = 0 the stability condition (11.58) for the trivial fixed
point x ∗ = 0 reduces to
x̃n+1 = a x̃n ,
. leading to the stability condition a < 1.
The analysis for the second fixed point with τR = 0 runs analogously to the
computation concerning the logistic map.4
x̃_{n+1} = (1/2)(3 − a) x̃_n + (1/2)(1 − a) x̃_{n−1} .
With the common ansatz x̃n = λn for linear recurrence relations, one finds the
conditions
1 3 1 2
. − a+ + a − 14a + 17 < 1, and a 2 − 14a + 17 > 0
4 4 4
for small perturbations to remain small and not to grow exponentially, i.e.
|λ| < 1. In consequence, a has to fulfill
1 < a < 7 − 4√2 ≈ 1.34 .
Ḣ = −αH Z,
. Ż = (2α − 1)H Z (11.59)
yields
dH/dZ = α/(1 − 2α) ,   H = [(1 − α)H₀ − α]/(1 − 2α) + [α/(1 − 2α)] Z ,   (11.60)
where the constant of integration has been chosen such that the initial
condition Z0 = 1 − H0 is respected. Whether zombies or humans are more
effective depends on the value of α
– α > 1/2
Zombies win with α/(1 − 2α) being negative. Humans go extinct when the
population of zombies reaches
Z∗ = [α − (1 − α)H₀]/α ,   (11.61)
which is positive for α > 1/2 and all H0 ∈ [0, 1].
– α < 1/2
Whether humans manage to cling on depends on the size of their initial population. For Z → 0 the formally surviving human population is

H∗ = [(1 − α)H₀ − α]/(1 − 2α)   { > 0 for H₀ > α/(1 − α) ;  < 0 for H₀ < α/(1 − α) } .   (11.62)
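Both cases are easily checked by integrating (11.59) numerically; a minimal sketch:

# Sketch: integrate (11.59) and compare with the surviving fractions (11.61)/(11.62).
def simulate(alpha, H0, dt=1e-3, n=200_000):
    H, Z = H0, 1.0 - H0
    for _ in range(n):
        dH = -alpha * H * Z
        dZ = (2.0 * alpha - 1.0) * H * Z
        H, Z = H + dt * dH, Z + dt * dZ
    return H, Z

alpha, H0 = 0.3, 0.8          # humans win: compare H with H* from (11.62)
print(simulate(alpha, H0), ((1 - alpha) * H0 - alpha) / (1 - 2 * alpha))
alpha, H0 = 0.7, 0.8          # zombies win: compare Z with Z* from (11.61)
print(simulate(alpha, H0), (alpha - (1 - alpha) * H0) / alpha)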
w = (1, 0),
. v = (0, −1), bw = 0 = bv . (11.63)
σGLU (1, 1) = σ+ σ− ,
. σGLU (1, −1) = σ+ σ+ ,
and equivalently for (−1, −1) and (−1, 1). This is, apart from an overall
normalization, the definition of the XOR gate.
For the AND gate one sets, e.g., v₁ = 0 = v₂, b_v = −1, which leads to a constant second factor, and w₁ = 1 = w₂ and b_w = 1.5.
11.10 VARIANCE OF A PRODUCT
Trivially, x = Δx + μx holds, where Δx = x − μx , and the same for y.
Consequently one has
Lastly it suffices to note that x and y are independent. One then obtains
σ²_{xy} = (σ_x² + μ_x²)(σ_y² + μ_y²) − μ_x² μ_y² ,
μ_w = z/N ,   σ_w² = (1/N)[(N − z)(z/N)² + z(1 − z/N)²] ≈ z/N .
The boundaries,
(1 + Γ) √N σ_w = 2 √z ,
or
2σ²_ext = 2(1 − R_w²) σ_y² + (4/3) σ_y⁴ .   (11.64)

Defining

h ≡ 2σ²_ext/σ_y ,   a = 1 − R_w² ,   b = 4/3 ,   φ ≡ σ_y ,   (11.65)

this becomes

h = 2aφ + bφ³ ,
which is exactly Eq. (6.2), which determines the relation between the order parameter φ and the external field h, with a = 1 − R_w² playing the role of a ∼ T − T_c.
For h = 0 the order parameter is finite when a < 0, viz when T < Tc or
Rw2 > 1.
μ_x = μ_ext + N μ_w μ_y = μ_ext + μ̃_w μ_y ,   μ_w = μ̃_w/N ,   (11.66)
where we made use of (10.16). The 1/N scaling for the mean implies that it
drops out of the relation (10.18) for the variance,
σ²_{w·y} = N [(σ_w² + μ_w²)(σ_y² + μ_y²) − μ_w² μ_y²]
        = σ̃_w² (σ_y² + μ_y²) ,   σ_w = σ̃_w/√N .   (11.67)
N
σ_y² + μ_y² = (1/√(2π σ_x²)) ∫ dx [1 − e^{−(x−b)²}] e^{−(x−μ_x)²/(2σ_x²)} ,

or

1 − σ_y² − μ_y² = √(2π σ_b²) ∫ dx N(x|μ_b, σ_b) N(x|μ_x, σ_x)   (11.68)

              = [√(2π σ_b²)/√(2π(σ_b² + σ_x²))] e^{−(μ_x−μ_b)²/(2(σ_b²+σ_x²))} ,   (11.69)
and
μ_{y,n} = ∫ dx σ_Gaus(x − b_n) N(x|μ_{x,n}, σ_{x,n}) ≈ σ_Gaus(μ_{x,n} − b_n) ,

1 − σ²_{y,n} − μ²_{y,n} = (1/√(1 + 2σ²_{x,n})) e^{−(μ_{x,n}−b_n)²/(1+2σ²_{x,n})} ,
ẋ = ϑ x ,   Ė = −ϑ E ,   x(0) = x^α ,   E(1) = 2[F^α − x(1)] ,   (11.70)
where we normalized layer time, t ∈ [0, 1]. Also included are the boundary
conditions for x(t) (starting) and E(t) (ending). Here we used a different
sign convention for E(t) than in the main text. It is possible to assume that
ϑ(t) ≡ ϑ is uniform when training starts, given that the update procedure
for ϑ is independent of layer time,
Δϑ = −η x E ,   (d/dt)(x E) = (ϑ − ϑ) x E = 0 ,
x(t) = x^α e^{ϑt} ,   E(t) = 2(F^α − f^α) e^{−ϑ(t−1)} ,   f^α = x^α e^ϑ .
which is always positive (or zero). This statement holds in particular for the
eigenvectors of Θ, which proves that the eigenvalues are positive (or zero).
(10.8) LINEAR ATTENTION
With (10.18), the variance of the entries of the query, key and value vectors
is
σ²_QKV = D σ₀² σ_x² .
σ_y² = [1/(1 − γ)]² D σ²_QKV (σ²_QKV)² = D⁴ σ₀⁶ σ_x⁶ / (1 − γ)²
for the variance of the elements of the output activity vector. It is desirable
that σy remains constant when D increases.
– When the normalization of the input activities does not scale with D, e.g. when σ_x = 1, one has σ₀ ∼ (1/D)^{2/3}.
– When input vectors are normalized, such that |x_i| = 1, one has σ_x² ∼ 1/D and hence σ₀ ∼ (1/D)^{1/6}.
References
Alemi, A. A., et al. (2015). You can run, you can hide: The epidemiology and statistical mechanics
of zombies. Physical Review E, 92, 052801.
Bettencourt, L. M. A., et al. (2007). Growth, innovation, scaling, and the pace of life in cities.
Proceedings of the National Academy of Sciences, 104, 7301–7306.
Huang, S.-Y., Zou, X.-W., Tan, Z.-J., & Jin, Z.-Z. (2003). Network-induced non-equilibrium phase
transition in the “Game of Life”. Physical Review E, 67, 026107.
Huepe, C., & Aldana-González, M. (2002). Dynamical phase transition in a neural network model
with noise: An exact solution. Journal of Statistical Physics, 108, 527–540.