
Claudius Gros

Complex and Adaptive Dynamical Systems
A Comprehensive Introduction
Fifth Edition

Claudius Gros
Institute for Theoretical Physics
Goethe University Frankfurt
Frankfurt/Main, Germany

ISBN 978-3-031-55075-1    ISBN 978-3-031-55076-8 (eBook)
https://doi.org/10.1007/978-3-031-55076-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2008, 2011, 2013, 2015, 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


May our own thoughts surprise us.
Preface

Complexity, Emergence and Self Organization Complex systems theory deals
with dynamical systems showing a large variety of emergent phenomena. Examples
relevant to all realms of science abound.

– Power-law distributions in evolving networks.
– Deterministic chaos in one-dimensional maps.
– Strange attractors in adaptive dynamical systems.
– Spontaneous pattern formation in reaction-diffusion systems.
– Self-organized criticality in cellular automata.
– Organisms living at the edge of chaos.
– The rise of life in terms of hypercycles.
– Spontaneous synchronization of interacting oscillators.
– Neural networks underpinning modern machine learning algorithms.

These are a few examples treated in this textbook. Complex systems theory is a
fascinating subject becoming increasingly important in our socio-scientific culture,
with its rapidly growing degrees of complexity. It is also a deeply interdisciplinary
subject, providing the mathematical basis for applications in fields ranging from
the social sciences and psychology on one side to the neurosciences and artificial
intelligence on the other.

Readership and Preconditions This primer is intended for students and scientists
from the natural and social sciences, engineering, informatics, experimental
psychology, or neuroscience. Technically, the reader should have a basic knowledge
of mathematics as it is used in the natural sciences or engineering. This textbook
is suitable both for studies in conjunction with teaching courses and for the
individual reader.

Course Material and the Modular Approach When used for teaching, this
primer is suitable for a course running over one or two terms, depending on the
pace and on the number of chapters covered. In general, there should be no problem
in following the mathematics involved. Considerable care has been taken to perform
the respective derivations on a step-by-step basis.


Individual chapters may be skipped whenever time considerations demand it.
The book follows a modular approach and the individual chapters are, as far as
possible, independent of each other. Notwithstanding, cross-references between the
different chapters are included throughout the text, since interrelations between
distinct topics may be helpful for a thorough understanding.

Style This interdisciplinary primer places a high value on conveying concepts and
notions within their respective mathematical settings. Believing that a concise style
helps the reader to go through the material, this textbook mostly abstains from long
text passages with general background explanations or philosophical considerations.
Widespread use has been made of paragraph headings with the intention of
facilitating scientific reading.

A Primer to Scientific Common-Sense Knowledge To a certain extent, one
can regard this textbook as a primer to a wide range of scientific common-sense
knowledge regarding complex systems. A basic expertise in life's organizational
principles, such as the notion of "life at the edge of chaos", is important in
today's world for an educated scientist. Other areas of scientific common-sense
knowledge discussed in this primer include network theory, which has applications
ranging from social networks to gene expression networks, the fundamentals of
evolution, cognitive systems theory, as well as the basic principles of dynamical
systems, self organization and information theory.

Exercises and Suggestions for Further Readings Towards the end of each
chapter, a selection of exercises is presented. The respective solutions can be found
in a dedicated chapter. In addition, the section “Further Reading” at the end of each
chapter contains references to standard introductory textbooks and review articles.
Also listed are selected articles for further in-depth studies dealing with specific
issues treated within the respective chapter.

Content of the Fifth Edition Since its first printing in 2008, this textbook has seen
several extensions. The present fifth edition has been completely revised. As a main
addition, the complex systems theory of modern machine learning architectures has
been included in a new chapter. In addition, a range of new sections have been
added to existing chapters, each dealing with an exciting topic, such as "Rate
Induced Tipping", "Partially Predictable Chaos", the "Tragedy of the Commons",
and "Piecewise Linear Dynamical Systems", to mention a few. The field of complex
systems is continuously evolving, and so is this book.

Acknowledgements I would like to thank Urs Bergmann, Christoph Bruder, Dante
Chialvo, Tejaswini Dalvi, Florian Dommert, Barbara Drossel, Rodrigo Echeveste,
Bernhard Edegger, Florian Greil, Tim Herfurth, Christoph Herold, Philipp Hövel,
Gregor Kaczor, Guillermo Ludueña, Maripola Kolokotsa, Jay Lennon, Mathias
Linkerhand, Wolfgang Mader, Dimitrije Marković, Anne Metz, Zoltán Néda, Bulcsú
Sándor, Ludger Santen, Daniel Nevermann, H.G. Schuster and DeLiang Wang for
their comments, for the preparation of figures and the reading of the manuscript.
Particular thanks go to Roser Valentí for continuing support.

Frankfurt/Main, Germany
January 2024

Claudius Gros
Contents

1 Network Theory
   1.1 Properties of Real-World Networks
      1.1.1 The Small World Effect
      1.1.2 Basic Graph-Theoretical Concepts
      1.1.3 Network Degree Distribution
   1.2 Spectral Properties
      1.2.1 Graph Laplacian
   1.3 Percolation in Generalized Random Graphs
      1.3.1 Graphs with Arbitrary Degree Distributions
      1.3.2 Probability Generating Function Formalism
      1.3.3 Distribution of Component Sizes
   1.4 Robustness of Random Networks
   1.5 Small-World Models
   1.6 Scale-Free Graphs
   Exercises
   Further Reading
   References
2 Bifurcations and Chaos in Dynamical Systems
   2.1 Basic Concepts of Dynamical Systems Theory
   2.2 Fixpoints, Bifurcations and Stability
      2.2.1 Fixpoints Classification and Jacobian
      2.2.2 Bifurcations and Normal Forms
      2.2.3 Hopf Bifurcations and Limit Cycles
   2.3 Global Bifurcations
      2.3.1 Infinite Period Bifurcation
      2.3.2 Catastrophe Theory
      2.3.3 Rate Induced Tipping
   2.4 Logistic Map and Deterministic Chaos
      2.4.1 Colliding Attractors
   2.5 Dynamical Systems with Time Delays
      2.5.1 Distributed Time Delays
   Exercises
   Further Reading
   References
3 Dissipation, Noise and Adaptive Systems
   3.1 Chaos in Dissipative Systems
      3.1.1 Phase Space Contraction and Expansion
      3.1.2 Strange Attractors and Dissipative Chaos
      3.1.3 Partially Predictable Chaos
   3.2 Adaptive Systems
      3.2.1 Conserving Adaptive Systems
   3.3 Diffusion and Transport
      3.3.1 Random Walks, Diffusion and Lévy Flights
      3.3.2 Markov Chains
   3.4 Stochastic Systems
      3.4.1 Langevin Equation
      3.4.2 Stochastic Calculus
   3.5 Noise-Controlled Dynamics
      3.5.1 Fokker–Planck Equation
      3.5.2 Stochastic Escape
      3.5.3 Stochastic Resonance
   Exercises
   Further Reading
   References
4 Self Organization
   4.1 Interplay Between Diffusion and Reaction
      4.1.1 Travelling Wavefronts in the Fisher Equation
      4.1.2 Sum Rule for the Shape of the Wavefront
      4.1.3 Self-Stabilization of Travelling Wavefronts
   4.2 Interplay Between Activation and Inhibition
      4.2.1 Turing Instability
      4.2.2 Pattern Formation
      4.2.3 Gray–Scott Reaction Diffusion System
   4.3 Collective Phenomena and Swarm Intelligence
      4.3.1 Phase Transitions in Social Systems
      4.3.2 Collective Decision Making and Stigmergy
      4.3.3 Collective Behavior and Swarms
      4.3.4 Opinion Dynamics
   4.4 Car Following Models
      4.4.1 Linear Flow and Carrying Capacity
      4.4.2 Self-Organized Traffic Congestions
   Exercises
   Further Reading
   References
5 Information Theory of Complex Systems
   5.1 Probability Distribution Functions
      5.1.1 Law of Large Numbers
      5.1.2 Bayesian Statistics
      5.1.3 Statistical Binning
      5.1.4 Time Series Characterization
   5.2 Entropy and Information
      5.2.1 Maximal Entropy Distributions
      5.2.2 Minimal Entropy Principle
      5.2.3 Mutual Information
      5.2.4 Kullback-Leibler Divergence
   5.3 Complexity Measures
      5.3.1 Complexity and Predictability
      5.3.2 Algorithmic and Generative Complexity
   Exercises
   Further Reading
   References
6 Self-Organized Criticality
   6.1 Landau Theory of Phase Transitions
   6.2 Criticality in Dynamical Systems
      6.2.1 1/f Noise
   6.3 Cellular Automata
      6.3.1 Conway's Game of Life
      6.3.2 Forest Fire Model
   6.4 Sandpile Model and Self-Organized Criticality
      6.4.1 Absorbing Phase Transitions
   6.5 Random Branching Theory
      6.5.1 Branching Theory of Self-Organized Criticality
      6.5.2 Galton–Watson Processes
   6.6 Application to Long-Term Evolution
   Exercises
   Further Reading
   References
7 Random Boolean Networks
   7.1 Introduction
   7.2 Random Variables and Networks
      7.2.1 Boolean Variables and Graph Topologies
      7.2.2 Coupling Functions
      7.2.3 Dynamics
   7.3 Dynamics of Boolean Networks
      7.3.1 Flow of Information Through a Network
      7.3.2 Mean-Field Phase Diagram
      7.3.3 Bifurcation Phase Diagram
      7.3.4 Scale-Free Boolean Networks
   7.4 Cycles and Attractors
      7.4.1 Quenched Boolean Dynamics
      7.4.2 K = 1 Kauffman Network
      7.4.3 K = 2 Kauffman Network
      7.4.4 K = N Kauffman Network
   7.5 Applications
      7.5.1 Living at the Edge of Chaos
      7.5.2 Yeast Cell Cycle
      7.5.3 Application to Neural Networks
   Exercises
   Further Reading
   References
8 Darwinian Evolution, Hypercycles and Game Theory
   8.1 Introduction
   8.2 Mutations and Fitness in a Static Environment
   8.3 Deterministic Evolution
      8.3.1 Evolution Equations
      8.3.2 Beanbag Genetics: Evolution Without Epistasis
      8.3.3 Epistatic Interactions and the Error Catastrophe
   8.4 Finite Populations and Stochastic Escape
      8.4.1 Adaptive Climbing Under Strong Selective Pressure
      8.4.2 Adaptive Climbing vs. Stochastic Escape
   8.5 Prebiotic Evolution
      8.5.1 Quasispecies Theory
      8.5.2 Hypercycles and Autocatalytic Networks
   8.6 Macroecology and Species Competition
   8.7 Coevolution and Game Theory
      8.7.1 Tragedy of the Commons
   Exercises
   Further Reading
   References
9 Synchronization Phenomena
   9.1 Frequency Locking
   9.2 Coupled Oscillators and the Kuramoto Model
   9.3 Synchronization in the Presence of Time Delays
   9.4 Synchronization Mechanisms
      9.4.1 Aggregate Averaging
      9.4.2 Causal Signaling
   9.5 Piecewise Linear Dynamical Systems
   9.6 Synchronization Phenomena in Epidemics
      9.6.1 Continuous Time SIRS Model
   Exercises
   Further Reading
   References
10 Complexity of Machine Learning
   10.1 Computation Units
      10.1.1 Structured Units
      10.1.2 The XOR Problem
   10.2 Recurrent Neural Networks
      10.2.1 Random Matrix Theory
      10.2.2 Criticality in Recurrent Networks
   10.3 Neural Differential Equations
      10.3.1 Residual Nets
   10.4 Gaussian Processes
      10.4.1 Multivariate Gaussians
      10.4.2 Correlated Stochastic Functions
      10.4.3 Machine Learning with Neural Tangent Kernels
   10.5 Attention Induced Information Routing
      10.5.1 Transformer
   Exercises
   Further Reading
   References
11 Solutions
   11.1 Solutions to the Exercises of Chap. 1
   11.2 Solutions to the Exercises of Chap. 2
   11.3 Solutions to the Exercises of Chap. 3
   11.4 Solutions to the Exercises of Chap. 4
   11.5 Solutions to the Exercises of Chap. 5
   11.6 Solutions to the Exercises of Chap. 6
   11.7 Solutions to the Exercises of Chap. 7
   11.8 Solutions to the Exercises of Chap. 8
   11.9 Solutions to the Exercises of Chap. 9
   11.10 Solutions to the Exercises of Chap. 10
   References

Index
1 Network Theory

Dynamical and adaptive networks are the backbone of many complex systems.
Examples range from ecological prey–predator networks to the gene- and protein
expression networks on which all living creatures are based. The basis of our own
identity is provided by the highly sophisticated neural network we carry in our
heads. On a social level we interact through social and technical networks like the
Internet. Networks are ubiquitous throughout the domain of all living creatures.
A good understanding of network theory is of basic importance for complex
systems theory. In this chapter we will discuss the most important notions of
graph theory, like clustering and degree distributions, together with basic network
realizations. Central concepts like percolation, the robustness of networks with
regard to failure and attacks, and the “rich-get-richer” phenomenon in evolving
social networks will be treated.

1.1 Properties of Real-World Networks


1.1.1 The Small World Effect

More than eight billion humans live on earth, and it might seem that the world is
a big place. But, as an Italian proverb states,

Tutto il mondo è paese. – The world is a village.

The network of who knows whom—the network of acquaintances—is indeed
quite densely webbed. Modern scientific investigations mirror this century-old
proverb.

Social Networks Stanley Milgram performed a by now famous experiment in the


1960s. He distributed a number of letters addressed to a stockbroker in Boston to
a random selection of people in Nebraska. The task was to send these letters to the

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 1


C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_1

Fig. 1.1 Bottom: A network with two kinds of nodes, a bipartite network. The letters A–D may stand for distinct companies, the numerals 1–8 for managers serving on the respective boards of directors. Top: Managers serving on the same boards of directors are acquainted, forming a densely webbed social network

addressee (the stockbroker) via mail to an acquaintance of the respective sender. In


other words, the letters were to be sent via a social network.
The initial recipients of the letters clearly did not know the Boston stockbroker
on a first-name basis. Their best strategy was to send their letter to someone whom
they felt was closer to the stockbroker, socially or geographically: perhaps someone
they knew in the financial industry, or a friend in Massachusetts.

Six Degrees of Separation About 20% of Milgram’s letters did eventually reach
their destination. Milgram found that it had only taken an average of six steps
for a letter to get from Nebraska to Boston. This result, dubbed “six degrees of
separation”, suggests that it should be possible to connect any two persons on earth
via social networks in a similar number of steps.

SMALL-WORLD EFFECT The “small-world effect” denotes the result that the average
distance linking two nodes belonging to the same network can be orders of magnitude
smaller than the number of nodes making up the network.

The small-world effect occurs in all kinds of networks. Milgram examined the
networks of friends, which may originate from participating in common activities.
This is the case also for other social networks, as illustrated in Fig. 1.1. An example
is given by managers serving together on the board of directors of the same company,
which results in a dense network of acquaintances between the managers.

Networks Are Everywhere Social networks are but one important example
of a communication network. Most human communication takes place directly
among individuals. The spreading of news, rumors, jokes and of diseases takes place

Fig. 1.2 A protein


interaction network, showing
a complex interplay between
highly connected hubs and
communities of subgraphs
with increased densities of
edges. Reprinted from Palla
et al. (2005) with
permissions, © 2005 Nature
Publishing Group

(Figure: protein interaction network with labeled functional modules, among them the 43S complex and protein metabolism, ribosome biogenesis/assembly, the CK2 complex and transcription regulation, chromatin silencing, DNA packaging and chromatin assembly, cell polarity and budding, pheromone response, and cytokinesis.)

by contacts between individuals. And we are all aware that rumors and epidemic
infections can spread fast in densely webbed social networks.
Communication networks are ubiquitous. Well known examples are the Internet
and the world-wide web. Inside a cell the constituent proteins form an interacting
network, as illustrated in Fig. 1.2. The same holds for artificial neural networks
and for the neural networks forming the scaffolding of our brain. It is therefore
important to understand the statistical properties of core network classes.

Fig. 1.3 Random graphs with $N = 12$ nodes and connection probabilities $p = 0.0758$ and $p = 0.3788$ (left/right). Three mutually connected vertices, such as (0,1,7), contribute to the clustering coefficient

1.1.2 Basic Graph-Theoretical Concepts

We start with some basic concepts allowing us to characterize graphs and real-world
networks. We will use the terms graph and network interchangeably, treat vertex,
site and node as synonyms, and speak of either edges or links.

Degree of a Vertex A graph is made out of N vertices connected by edges, with


the degree k of a vertex being given by the number of edges linking to it. Nodes
having a degree k substantially above the average are denoted “hubs”, they are the
VIPs of network theory.

Coordination Number The simplest type of network is the random graph. It is


characterized by only two numbers: By the number of vertices N and by the average
degree z, also called the coordination number.

COORDINATION NUMBER The coordination number z is the average number of links per
vertex, viz the average degree.

A graph with an average degree z has $Nz/2$ connections. Alternatively we can
work with the probability p to find a given edge.

CONNECTION PROBABILITY The probability that a given edge occurs is called the
connection probability p.

Erdös–Rényi Graphs We can construct a specific type of a random graph with


N nodes by drawing .Nz/2 lines, the edges, between randomly chosen pairs of
nodes, compare Fig. 1.3. This type of random graph is called an “Erdös–Rényi”
random graph after two mathematicians who studied this type of graph extensively.
In Sect. 1.3 we will introduce and study other types of random graphs.
For Erdös–Rényi random graphs we have

$$p = \frac{Nz}{2}\,\frac{2}{N(N-1)} = \frac{z}{N-1} \qquad (1.1)$$

for the relation between the coordination number z and the connection probability
p. In (1.1) we divided the total number of active links, $Nz/2$, by the total number
of possible links, $N(N-1)/2$.
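Relation (1.1) can be checked with a short numerical sketch. The generator below and its parameters ($N = 1000$, $z = 6$) are illustrative choices, not part of the text:

```python
import random

def erdos_renyi(N, p, rng):
    """Generate an Erdös-Rényi graph: each of the N(N-1)/2
    possible edges is drawn independently with probability p."""
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                edges.add((i, j))
    return edges

rng = random.Random(42)
N, z = 1000, 6.0
p = z / (N - 1)                  # connection probability, Eq. (1.1)
edges = erdos_renyi(N, p, rng)
z_measured = 2 * len(edges) / N  # every edge contributes to two degrees
print(z_measured)                # close to the target coordination number z
```

The measured coordination number fluctuates around z from one realization to the next, in line with the ensemble fluctuations discussed below.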

The Thermodynamic Limit Mathematical graph theory is often concerned with


the thermodynamic limit.

THE THERMODYNAMIC LIMIT The limit where the number of elements making up a
system diverges to infinity is called the “thermodynamic limit” in physics. A quantity is
extensive if it is proportional to the number of constituting elements, and intensive if it
scales to a constant in the thermodynamic limit.

We note that $p = z/(N-1) \to 0$ in the thermodynamic limit $N \to \infty$ for
Erdös–Rényi random graphs when the coordination number is intensive, viz when
$z \sim O(N^0)$.

For real-world networks, the thermodynamic limit makes sense for very large
numbers of nodes. This is the case for the world-wide web, the network of
hyperlinks, which is so large that its size can only be estimated. In order of
magnitude, the number of documents available online grew from $N \simeq 0.8 \times 10^9$
in 1999 to $N \simeq 3.4 \times 10^{15}$ in 2023.

Network Diameter and the Small-World Effect A basic parameter characteriz-


ing networks is the diameter D, which specifies the maximal separation between all
possible pairs of vertices. For a random network with N vertices and coordination
number z we have,

$$z^D \approx N\,, \qquad D \propto \log N / \log z\,, \qquad (1.2)$$

in order of magnitude, since nodes have z neighbors, $z^2$ next-nearest neighbors, and
so on. The logarithmic increase in the number of degrees of separation with the size
of the network is characteristic of small-world networks. A logarithm increases very
slowly with its argument, which implies that network diameters remain small even
for networks containing large numbers of nodes.
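The scaling (1.2) can be illustrated with a breadth-first search on a sampled graph. The helper below and the parameters ($N = 400$, $z = 8$) are arbitrary demonstration choices:

```python
import random
from collections import deque

def bfs_distances(adj, start):
    """Breadth-first search returning the shortest-path
    distances from `start` to all reachable nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

rng = random.Random(1)
N, z = 400, 8.0
p = z / (N - 1)
adj = {i: [] for i in range(N)}
for i in range(N):
    for j in range(i + 1, N):
        if rng.random() < p:
            adj[i].append(j)
            adj[j].append(i)

start = max(range(N), key=lambda i: len(adj[i]))  # a hub, surely in the giant component
dist = bfs_distances(adj, start)
ecc = max(dist.values())   # eccentricity: longest shortest path from `start`
print(len(dist), ecc)      # nearly all nodes are reached within a few steps
```

For these parameters $\log N/\log z \approx 2.9$, and the measured eccentricity indeed stays of this order, far below N.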
For large networks, the diameter is closely related to the average distance.

AVERAGE DISTANCE The average distance .𝓁 is the average of the minimal path length
between all pairs of nodes of a network.

Stochastic considerations will in general make use of the average distance $\ell$.

Clustering in Networks Real-world networks, as the protein network reproduced


in Fig. 1.2, contain substantial local recurrent connections. Distinct topological
elements and loops are hence frequent.

CLUSTERING COEFFICIENT The clustering coefficient C is the average fraction of pairs


of neighbors of a node that are also neighbors of each other.

The clustering coefficient is a normalized measure of loops of length three. In a


fully connected network, in which everyone knows everyone else, .C = 1.
In a random graph a typical site has $z(z-1)/2$ pairs of neighbors. The probability
of an edge to be present between a given pair of neighbors is $p = z/(N-1)$,
see (1.1). The clustering coefficient, which is just the probability of a pair of
neighbors to be interconnected, is therefore

$$C_{\rm rand} = \frac{z}{N-1} \approx \frac{z}{N}\,. \qquad (1.3)$$

The clustering coefficient scales to zero in the thermodynamic limit, viz for large
networks. In Table 1.1 the respective clustering coefficients for some real-world
networks and for the corresponding random networks are listed for comparison.

Cliques The clustering coefficient measures the normalized number of triples of


fully interconnected vertices. Cliques generalize the notion of clustering to larger
subgraphs, see Fig. 1.4.

CLIQUES A clique is a set of vertices for which (a) every node is connected by an edge
to every other member of the clique and (b) no node outside the clique is connected to all
members of the clique.

The term “clique” comes from social networks. A clique is a group of friends
where everybody knows everybody else. A clique corresponds, in terms of graph
theory, to a maximal fully connected subgraph. For Erdös–Rényi graphs with N
vertices and linking probability p, the number of cliques is
$$\binom{N}{K}\, p^{K(K-1)/2}\, \bigl(1 - p^K\bigr)^{N-K}\,,$$

for cliques of size K. Here

– $\binom{N}{K}$ is the number of sets of K vertices,
– $p^{K(K-1)/2}$ is the probability that all K vertices within the clique are mutually
interconnected and

Table 1.1 The number of nodes N, average degree of separation $\ell$, and clustering coefficient
C, for four real-world networks. The last column is the value $C_{\rm rand} = z/(N-1)$ the clustering
coefficient would take for a random graph with the same size and coordination number. From
Watts and Strogatz (1998) and Ludueña et al. (2013)

Network            N          $\ell$   C      $C_{\rm rand}$
Movie actors       225,226    3.65     0.79   0.00027
Neural network     282        2.65     0.28   0.05
Power grid         4941       18.70    0.08   0.0005
Internet domains   6,242,195  –        0.20   0.000006

Fig. 1.4 The social graph of Fig. 1.1, with the four-site clique (1,2,3,4) highlighted. The network contains three additional cliques, (3,4,5,7), (5,6) and (7,8)

– $\bigl(1 - p^K\bigr)^{N-K}$ is the probability that every one of the $N-K$ out-of-clique vertices is
not connected to all K vertices of the clique.

The only cliques occurring in random graphs in the thermodynamic limit have
size two, since $p = z/N$.

Communities Another term used is “community”. It is mathematically not as
strictly defined as “clique”, denoting subgraphs with above-average densities of
strictly defined as ‘clique’, denoting subgraphs with above-the-average densities of
edges. Communities naturally emerge when decimating a bipartite graph stochas-
tically. For the example shown in Fig. 1.1 one would assume that managers are
not automatically acquainted when serving on the same board of directors, but
with a certain probability, say 80%. Instead of cliques, the respective network
of acquaintances between managers will be characterized by a dense community
structure.

Clustering for Real-World Networks Most real-world networks have clustering


coefficients that are substantially larger than the one a random network of the
same size would have. This is immediately evident for the protein network shown
in Fig. 1.2, which is characterized by a rich community structure and an elevated
clustering coefficient.
In Table 1.1, we list C and the average distance .𝓁 for four different networks.

– A network of collaborations between movie actors.
– The neural network of the worm C. elegans.
– A power grid of the United States.
– The network of hyperlinks between domains (not webpages).

Also given in Table 1.1 are the clustering coefficients .Crand of random graphs
of identical size and coordination number. Note that the real-world value is
systematically higher than that of random graphs. The small values for the average

distances $\ell$ given in Table 1.1 for three of the four networks reflect their small-world
nature.
Erdös–Rényi random graphs obviously do not match the properties of real-world
networks well. In Sect. 1.3 we will discuss generalizations of random graphs that
approximate the properties of real-world graphs better. Before, we will discuss
several general properties of random graphs.

Correlation Effects The degree distribution .pk captures the statistical properties
of nodes as if they were all independent of each other. In general, the property
of a given node will however be dependent on the properties of other nodes, e.g.
of its neighbors. When this happens one speaks of “correlation effects”, with the
clustering coefficient C being an example.
Another example for a correlation effect is what one calls “assortative mixing”.
A network is assortatively correlated whenever large-degree nodes, the hubs, tend
to be mutually interconnected and assortatively anti-correlated when hubs are pre-
dominantly linked to low-degree vertices. Social networks tend to be assortatively
correlated, in agreement with the everyday experience that the friends of influential
persons, the hubs of social networks, tend to be VIPs themselves.

Tree Graphs Most real-world networks show strong local clustering and loops
abound. An exception are metabolic networks which contain typically only very
few loops since energy is consumed and dissipated when biochemical reactions go
around in circles.
For many types of graphs commonly considered in graph theory, like Erdös–
Rényi graphs, the clustering coefficient vanishes in the thermodynamic limit, and
loops become irrelevant. One denotes a loopless graph a “tree graph”.

Bipartite Networks Many real-world graphs have an underlying bipartite struc-


ture, see Fig. 1.1.

BIPARTITE GRAPH A bipartite graph has two kinds of vertices with links only between
vertices of unlike kinds.

Examples are networks of managers, where one kind of node stands for the
companies and the other kind for the managers belonging to the boards of directors.
When eliminating one kind of vertices, customarily the companies, one retains
a social network: the network of directors, as illustrated
in Fig. 1.1.
in Fig. 1.1. This network has a high clustering coefficient, as all boards of directors
are mapped onto cliques of the respective social network.

1.1.3 Network Degree Distribution

So far we considered mostly averaged quantities of random graphs, like the


clustering coefficient or the average coordination number z. We will now develop
tools allowing for a more sophisticated characterization of graphs.

Degree Distribution The basic description of general random and non-random


graphs is given by the degree distribution .pk .

DEGREE DISTRIBUTION If .Xk is the number of vertices having the degree k, then .pk =
Xk /N is the degree distribution, where N is the total number of nodes.

The degree distribution is a probability distribution function and hence normalized,
$\sum_k p_k = 1$.

Degree Distribution for Erdös–Rényi Graphs For an Erdös–Rényi network, the


probability of a node to have k edges is given by the binomial distribution,

$$p_k = \binom{N-1}{k}\, p^k\, (1-p)^{N-1-k}\,, \qquad (1.4)$$

where p is the link connection probability. For large .N ⪢ k we can approximate


the degree distribution .pk by

$$p_k \simeq \frac{(pN)^k}{k!}\, e^{-pN} = \frac{z^k}{k!}\, e^{-z}\,, \qquad (1.5)$$
where z is the average coordination number. We have used
  
$$\lim_{N\to\infty}\left(1-\frac{x}{N}\right)^{N} = e^{-x}\,, \qquad
\binom{N-1}{k} = \frac{(N-1)!}{k!\,(N-1-k)!} \simeq \frac{(N-1)^k}{k!}\,,$$

and $(N-1)^k p^k = z^k$, see (1.1). Equation (1.5) is a Poisson distribution with mean

$$\langle k \rangle = \sum_{k=0}^{\infty} k\, e^{-z}\, \frac{z^k}{k!}
= z\, e^{-z} \sum_{k=1}^{\infty} \frac{z^{k-1}}{(k-1)!} = z\,.$$

The variance of the Poisson distribution is also z, which implies that the relative
width $\sigma/\langle k \rangle = 1/\sqrt{z}$ becomes progressively smaller when the coordination number z
increases.
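The convergence of the empirical degree distribution to the Poisson form (1.5) can be checked by direct sampling; all parameters below ($N = 2000$, $z = 4$) are illustrative choices:

```python
import math
import random
from collections import Counter

rng = random.Random(3)
N, z = 2000, 4.0
p = z / (N - 1)

degree = [0] * N
for i in range(N):
    for j in range(i + 1, N):
        if rng.random() < p:
            degree[i] += 1
            degree[j] += 1

hist = Counter(degree)
p_emp = {k: hist[k] / N for k in hist}          # empirical degree distribution
p_poisson = {k: math.exp(-z) * z**k / math.factorial(k) for k in range(15)}

for k in range(8):   # the two distributions agree within sampling noise
    print(k, round(p_emp.get(k, 0.0), 3), round(p_poisson[k], 3))
```

The agreement improves with growing N, reflecting the large-N limit used in deriving (1.5).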

Ensemble Fluctuations In general, two specific realizations of random graphs


differ. Their properties coincide on the average, but not on the level of individual
links. With “ensemble” one denotes the set of possible realizations. The set of all
possible realization of three-site networks is shown in Fig. 1.5.

Fig. 1.5 Four networks with three nodes exist, respectively with 0/1/2/3 edges (from left to right). Given a link probability p, the probability to generate networks with 0/1/2/3 edges is $(1-p)^3$, $3p(1-p)^2$, $3p^2(1-p)$ and $p^3$. One speaks of ensemble fluctuations, compare (1.7)

In an ensemble of random graphs with fixed p and N the degree distribution
$X_k/N$ will be slightly different from one realization to the next. On the average it
will be given by

$$\frac{1}{N}\,\langle X_k \rangle = p_k\,. \qquad (1.6)$$
Here $\langle \ldots \rangle$ denotes the ensemble average. One can go one step further and calculate
the probability $P(X_k = R)$ that in a realization of a random graph the number of
vertices with degree k equals R. It is given in the large-N limit by

$$P(X_k = R) = e^{-\lambda_k}\, \frac{(\lambda_k)^R}{R!}\,, \qquad \lambda_k = \langle X_k \rangle\,. \qquad (1.7)$$
Note the similarity to (1.5), and that the mean $\lambda_k = \langle X_k \rangle$ is in general extensive,
while the mean z of the degree distribution (1.5) is intensive.

Power Law Distribution A special case are degree distributions decaying for large
degrees as a power law,

$$p_k \sim \frac{1}{k^{\alpha}}\,, \qquad k \to \infty\,. \qquad (1.8)$$

Typically, for real-world graphs, the scaling $\sim k^{-\alpha}$ holds only for large degrees
k. The divergence at $k \to 0$ is avoided by the “Tsallis–Pareto distribution” $p_k =
(\alpha-1)(1+k)^{-\alpha}$. For the normalization on $\mathbb{R}^+$ we consider $k \in [0, K]$, with
$K \to \infty$,

$$\sum_{k=0}^{K} p_k \approx (\alpha-1) \int_0^K \frac{dk}{(1+k)^{\alpha}}
= \left.\frac{-1}{(1+k)^{\alpha-1}}\right|_{k=0}^{K}
= 1 - \frac{1}{(1+K)^{\alpha-1}}\,.$$

Fig. 1.6 Log-log plot of the distribution of the in-degrees of the hyperlink network between 6,242,194 domains. To a fair degree, the degree distribution is scale invariant, $p_k \sim 1/k^{\alpha}$. The value of the exponent, $\alpha \approx 2.2$, implies that the mean degree is finite, with a diverging variance. At the time of the crawl, $\langle k \rangle = 35.8$ was found. Data from Ludueña et al. (2013)

Power-law distributions are hence normalizable only if $\alpha > 1$. The average degree
of the Tsallis–Pareto distribution is

$$\langle k \rangle = (\alpha-1) \int_0^K \frac{k+1-1}{(1+k)^{\alpha}}\, dk
= \left.\frac{\alpha-1}{2-\alpha}\, (1+k)^{2-\alpha}\right|_0^K - 1\,,$$

which diverges when $\alpha \le 2$. Otherwise, when $\alpha > 2$, one has $\langle k \rangle = 1/(\alpha-2)$.
Analogously, the variance is finite when $\alpha > 3$.

Scale-Free Graphs Power-law functional relations are “scale free”, since any
rescaling .k → a k can be reabsorbed into the normalization constant. A famous
example of a scale-invariant degree distribution is that of the in-degree of the
hyperlink network between domains, as shown in Fig. 1.6.
Scale-free functional dependencies are considered to be ‘critical’, since they
occur generally at the critical point of a phase transition. We will come back to
this issue recurrently in the following chapters.

1.2 Spectral Properties

In order to understand a given real-world network it is desirable to develop


characterization schemes relying on just a few key observables, like the
coordination number z or the clustering coefficient C. As a further step in this
direction we study now spectral properties.

Adjacency Matrix Any graph G with N nodes can be represented by a matrix


encoding the topology of the network, the adjacency matrix.

ADJACENCY MATRIX The $N \times N$ adjacency matrix $\hat{A}$ has elements $A_{ij} = 1$ if nodes i
and j are connected, with $A_{ij} = 0$ if they are not connected.

Fig. 1.7 The spectral densities (1.9) for Erdös–Rényi graphs with a coordination number $z = 25$ and $N = 100/1000$ nodes respectively. For comparison the expected result for the thermodynamic limit $N \to \infty$, the semi-circle law (1.15), is given. A broadening of $\varepsilon = 0.1$ has been used, as defined by (1.11)

The adjacency matrix is symmetric and consequently has N real eigenvalues.

SPECTRUM OF A GRAPH The spectrum of a graph G is given by the set of eigenvalues .λi
of the adjacency matrix .Â.

A graph with N nodes has N eigenvalues $\lambda_i$ and it is useful to define the
corresponding “spectral density”

$$\rho(\lambda) = \frac{1}{N} \sum_j \delta(\lambda - \lambda_j)\,, \qquad
\int d\lambda\, \rho(\lambda) = 1\,, \qquad (1.9)$$

where .δ(λ) is the Dirac delta function. In Fig. 1.7 the spectral density of an Erdös–
Rényi graph is given.

Green’s Function The spectral density $\rho(\lambda)$ can be evaluated once the Green’s
function¹ $G(\lambda)$,

$$G(\lambda) = \frac{1}{N}\, \mathrm{Tr}\!\left[\frac{1}{\lambda - \hat{A}}\right]
= \frac{1}{N} \sum_j \frac{1}{\lambda - \lambda_j}\,, \qquad (1.10)$$

is known. Here $\mathrm{Tr}[\ldots]$ denotes the trace over the matrix $(\lambda - \hat{A})^{-1} = (\lambda\hat{1} - \hat{A})^{-1}$,
where $\hat{1}$ is the identity matrix. In the complex plane, we can decompose the inverse,

$$\lim_{\varepsilon \to 0} \frac{1}{\lambda - \lambda_j + i\varepsilon}
= P\, \frac{1}{\lambda - \lambda_j} - i\pi\, \delta(\lambda - \lambda_j)\,, \qquad (1.11)$$

1 The reader without prior experience with Green’s functions may skip the following derivation
and pass directly to the result, namely to (1.15).

where P denotes the principal part, which takes into account that the positive and
the negative contributions of the $1/\lambda$ divergence cancel. We find the relation

$$\rho(\lambda) = -\frac{1}{\pi} \lim_{\varepsilon \to 0} \mathrm{Im}\, G(\lambda + i\varepsilon)\,. \qquad (1.12)$$

Semi-Circle Law The graph spectra can be evaluated for random matrices² for the
case of small link densities $p = z/N$, where z is the average connectivity. We note
that the Green’s function (1.10) can be expanded into powers of $\hat{A}/\lambda$,

$$G(\lambda) = \frac{1}{N\lambda}\, \mathrm{Tr}\!\left[1 + \frac{\hat{A}}{\lambda}
+ \left(\frac{\hat{A}}{\lambda}\right)^{2} + \ldots \right]\,. \qquad (1.13)$$

Traces over powers of the adjacency matrix . can be interpreted, for random graphs,
as random walks which need to come back to the starting vertex.
Starting from a random site we can connect on the average to z neighboring
sites and from there on to .z − 1 next-nearest neighboring sites, and so on. This
consideration allows to recast the Taylor expansion (1.13) for the Green’s function
into a recursive continued fraction,

$$G(\lambda) = \cfrac{1}{\lambda - \cfrac{z}{\lambda - \cfrac{z-1}{\lambda - \cfrac{z-1}{\lambda - \ldots}}}}
\;\approx\; \frac{1}{\lambda - z\, G(\lambda)}\,, \qquad (1.14)$$

where we used the approximation $z - 1 \approx z$ in the last step. Equation (1.14)
constitutes a self-consistency equation for $G = G(\lambda)$, with the solution

$$G^2 - \frac{\lambda}{z}\, G + \frac{1}{z} = 0\,, \qquad
G = \frac{\lambda}{2z} - \sqrt{\frac{\lambda^2}{4z^2} - \frac{1}{z}}\,,$$

since $\lim_{\lambda \to \infty} G(\lambda) = 0$. The spectral density (1.12) then takes the form

$$\rho(\lambda) = \begin{cases} \sqrt{4z - \lambda^2}\,/\,(2\pi z) & \text{if } \lambda^2 < 4z \\
0 & \text{if } \lambda^2 > 4z \end{cases} \qquad (1.15)$$

of a half-ellipse, also known as the “Wigner law” or the “semi-circle law”.
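Two consistency checks on (1.15) can be done numerically: the density must normalize to one, and its second moment must equal z, the average number of closed paths of length two per node. The integration routine below is a simple midpoint rule of our own choosing:

```python
import math

def semicircle_moment(z, order, steps=200_000):
    """Numerically integrate lambda^order * rho(lambda) over the
    support [-2*sqrt(z), 2*sqrt(z)] of the semi-circle law (1.15)."""
    R = 2.0 * math.sqrt(z)
    h = 2.0 * R / steps
    total = 0.0
    for n in range(steps):
        lam = -R + (n + 0.5) * h            # midpoint rule
        rho = math.sqrt(4.0 * z - lam * lam) / (2.0 * math.pi * z)
        total += lam**order * rho * h
    return total

z = 25.0
norm = semicircle_moment(z, 0)
m2 = semicircle_moment(z, 2)
print(norm, m2)   # normalization close to 1, second moment close to z
```

The second-moment identity anticipates the relation (1.16) between spectral moments and closed paths discussed below for general l.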

Loops and the Clustering Coefficient The total number of triangles, viz
the overall number of loops of length three in a network is, on the average,
.C(N/3)(z − 1)z/2, where C is the clustering coefficient. This relation holds for

2 Random matrix theory will be further discussed in Sect. 10.2.1 of Chap. 10.

large networks. One can relate the clustering coefficient C to the adjacency matrix
A via
$$C\, \frac{z(z-1)}{2}\, \frac{N}{3} = \text{number of triangles}
= \frac{1}{6} \sum_{i_1,i_2,i_3} A_{i_1 i_2} A_{i_2 i_3} A_{i_3 i_1}
= \frac{1}{6}\, \mathrm{Tr}\!\left[A^3\right]\,,$$

since three sites .i1 , .i2 and .i3 are interconnected only when the respective entries of
the adjacency matrix are unity. The sum on the right-hand side of the above relation is a
moment of the graph spectrum. The factors $1/3$ and $1/6$ account for overcountings.
The above formula holds only when all nodes have identical degrees $k_i \equiv z$. Otherwise
one needs to take an average, $z(z-1) \to \sum_i k_i(k_i-1)/N$.
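The identity between the trace formula and a direct triangle count can be checked on any small graph; the edge list below is an arbitrary illustrative choice:

```python
from itertools import combinations

# a small arbitrary graph chosen for illustration
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (0, 3)]
N = 5
A = [[0] * N for _ in range(N)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# (1/6) * Tr[A^3] written out as the sum over all index triples
trace_count = sum(A[i][j] * A[j][k] * A[k][i]
                  for i in range(N) for j in range(N) for k in range(N)) // 6

# direct count: test every 3-subset of vertices for mutual connections
direct_count = sum(1 for a, b, c in combinations(range(N), 3)
                   if A[a][b] and A[b][c] and A[a][c])
print(trace_count, direct_count)   # both equal the number of triangles
```

For this graph both counts give 3, the triangles being (0,1,2), (0,2,3) and (2,3,4).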

Moments of the Spectral Density The graph spectrum is directly related to certain
topological features of a graph via its moments. The lth moment of .ρ(λ) is given by

$$\int d\lambda\, \lambda^l\, \rho(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \lambda_j^l
= \frac{1}{N}\, \mathrm{Tr}\!\left[A^l\right]
= \frac{1}{N} \sum_{i_1,i_2,\ldots,i_l} A_{i_1 i_2} A_{i_2 i_3} \cdots A_{i_l i_1}\,, \qquad (1.16)$$

as one can see from (1.9). The lth moment of .ρ(λ) is therefore equivalent to the
number of closed paths of length l, the number of all paths of length l returning to
the starting point.

1.2.1 Graph Laplacian

Differential operators, like the derivative $df/dx$ of a function $f(x)$, may be
generalized to network functions. We start by recalling the definitions of the first,

$$\frac{d}{dx} f(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}\,,$$

and of the second derivative,

$$\frac{d^2}{dx^2} f(x) = \lim_{\Delta x \to 0}
\frac{f(x + \Delta x) + f(x - \Delta x) - 2 f(x)}{\Delta x^2}\,.$$

Graph Laplacians Consider a function $f_i$ on a graph with N sites, with $i =
1, \ldots, N$. One defines the graph Laplacian $\hat{\Lambda}$ via

$$\Lambda_{ij} = \Bigl(\sum_l A_{il}\Bigr)\delta_{ij} - A_{ij}
= \begin{cases} k_i & i = j \\ -1 & i \text{ and } j \text{ connected} \\
0 & \text{otherwise} \end{cases} \qquad (1.17)$$

where the $\Lambda_{ij} = (\hat{\Lambda})_{ij}$ are the elements of the Laplacian matrix, $A_{ij}$ the adjacency
matrix, and where $k_i$ is the degree of vertex i.
Apart from a sign convention, $\hat{\Lambda}$ corresponds to a straightforward generalization
of the usual Laplace operator $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$. To see this, just apply
the Laplacian matrix $\Lambda_{ij}$ to a graph-function $f = (f_1, \ldots, f_N)$. The graph Laplacian
is hence intrinsically related to diffusion processes on networks.³ Alternatively one
defines by

$$L_{ij} = \begin{cases} 1 & i = j \\ -1/\sqrt{k_i k_j} & i \text{ and } j \text{ connected} \\
0 & \text{otherwise} \end{cases} \qquad (1.18)$$

the “normalized graph Laplacian”, where $k_i = \sum_j A_{ij}$ is the degree of vertex i. The
eigenvalues of the normalized graph Laplacian have a straightforward interpretation
in terms of the underlying graph topology. One needs however to remove first all
isolated sites from the graph, which do not generate entries to the adjacency matrix.
The respective .Lij would be ill defined.
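Both Laplacians can be built directly from the adjacency matrix; the small example graph below is an arbitrary choice for illustration:

```python
import math

# a small illustrative graph: a path 0-1-2-3 plus the extra edge (1,3)
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
N = 4
A = [[0] * N for _ in range(N)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
k = [sum(row) for row in A]                     # vertex degrees

# graph Laplacian (1.17): degree on the diagonal, -1 on edges
Lam = [[(k[i] if i == j else 0) - A[i][j] for j in range(N)] for i in range(N)]

# normalized graph Laplacian (1.18)
L = [[(1.0 if i == j else (-1.0 / math.sqrt(k[i] * k[j]) if A[i][j] else 0.0))
      for j in range(N)] for i in range(N)]

# every row of Lambda sums to zero: the constant vector is a zero mode
row_sums = [sum(row) for row in Lam]

# the vector (sqrt(k_1), ..., sqrt(k_N)) of Eq. (1.19) is a zero mode of L
e0 = [math.sqrt(ki) for ki in k]
Le0 = [sum(L[i][j] * e0[j] for j in range(N)) for i in range(N)]
print(row_sums, [round(x, 12) for x in Le0])   # all entries vanish
```

The vanishing of $\hat{\Lambda}$ applied to a constant function mirrors the fact that the ordinary Laplace operator annihilates constants; the zero mode of L is the eigenfunction (1.19) discussed below.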

Eigenvalues of the Normalized Graph Laplacian Of interest are the eigenvalues


λl of the normalized graph Laplacian (1.18).
.

– The normalized graph Laplacian is positive semidefinite,

$$0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{N-1} \le 2\,.$$

– The lowest eigenvalue $\lambda_0$ is always zero, corresponding to the eigenfunction

$$e^{(\lambda_0)} = \frac{1}{\sqrt{C}} \left( \sqrt{k_1},\, \sqrt{k_2},\, \ldots,\, \sqrt{k_N} \right)\,, \qquad (1.19)$$

where C is a normalization constant and where the $k_i$ are the respective vertex
degrees.

3 See Sect. 3.3.1 of Chap. 3 for an indepth discussion of diffusion processes.



– The degeneracy of $\lambda_0$ is given by the number of disconnected subgraphs
contained in the network. The eigenfunctions of $\lambda_0$ vanish on all subclusters
besides one, on which they take the functional form (1.19).
– The largest eigenvalue is $\lambda_{N-1} = 2$ if and only if the network is bipartite.
Generally, a small value of $2 - \lambda_{N-1}$ indicates that the graph is nearly bipartite.
The degeneracy of $\lambda = 2$ is given by the number of bipartite subgraphs.
– The inequality

$$\sum_l \lambda_l \le N$$

holds generally. The equality holds for connected graphs, viz when $\lambda_0$ has
degeneracy one.

Examples of Graph Laplacians The eigenvalues of the normalized graph Lapla-


cian can be evaluated analytically for some reference graphs.

– For a complete N-site graph, viz when all sites are mutually interconnected, the
eigenvalues are

$$\lambda_0 = 0\,, \qquad \lambda_l = N/(N-1)\,, \quad l = 1, \ldots, N-1\,.$$

– For a complete bipartite graph (all sites of one subgraph are connected to all
sites of the other subgraph) the eigenvalues are

$$\lambda_0 = 0\,, \qquad \lambda_{N-1} = 2\,, \qquad \lambda_l = 1\,, \quad l = 1, \ldots, N-2\,.$$

The eigenfunction for $\lambda_{N-1} = 2$ has the form

$$e^{(\lambda_{N-1})} = \frac{1}{\sqrt{C}}
\bigl( \underbrace{\sqrt{k_A}, \ldots, \sqrt{k_A}}_{A\text{ sublattice}},\;
\underbrace{-\sqrt{k_B}, \ldots, -\sqrt{k_B}}_{B\text{ sublattice}} \bigr)\,. \qquad (1.20)$$

Denoting with .NA and .NB the number of sites in two sublattices A and B, with
NA + NB = N, the degrees .kA and .kB of vertices belonging to sublattice A and
.

B respectively are .kA = NB and .kB = NA for a complete bipartite lattice.
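The eigenfunction (1.20) can be verified directly by applying L to it; the complete bipartite graph with $N_A = 2$ and $N_B = 3$ used below is an arbitrary illustrative choice:

```python
import math

# complete bipartite graph: every A-site links to every B-site,
# so k_A = N_B = 3 and k_B = N_A = 2
NA, NB = 2, 3
N = NA + NB
A = [[0] * N for _ in range(N)]
for i in range(NA):
    for j in range(NA, N):
        A[i][j] = A[j][i] = 1
k = [sum(row) for row in A]

# normalized graph Laplacian (1.18)
L = [[(1.0 if i == j else (-1.0 / math.sqrt(k[i] * k[j]) if A[i][j] else 0.0))
      for j in range(N)] for i in range(N)]

# eigenfunction (1.20): +sqrt(k_A) on the A sublattice, -sqrt(k_B) on B
e = [math.sqrt(k[i]) if i < NA else -math.sqrt(k[i]) for i in range(N)]
Le = [sum(L[i][j] * e[j] for j in range(N)) for i in range(N)]
ratios = [Le[i] / e[i] for i in range(N)]
print([round(r, 12) for r in ratios])   # every ratio equals lambda = 2
```

Each component of $L e$ is exactly twice the corresponding component of e, confirming the eigenvalue $\lambda_{N-1} = 2$ of the bipartite lattice.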

A densely connected graph will therefore have a substantial number of eigenvalues


close to unity. For real-world graphs one may therefore plot the spectral density of
the normalized graph Laplacian in order to gain insight into their overall topological
properties. The information obtained from the spectral density of the adjacency
matrix and from the normalized graph Laplacian is somewhat distinct.

1.3 Percolation in Generalized Random Graphs

The most random of all graphs are Erdös–Rényi graphs. One can relax the extent
of randomness somewhat and construct random networks with an arbitrarily given
degree distribution. This procedure is also denoted “configurational model”. We will
use the configurational model to examine the “percolation transition”, which occurs
as a function of the link probability p. Graphs decompose into a set of disconnected
subgraphs when containing only a few links, but not at high link densities. The
respective transition is the percolation transition.

1.3.1 Graphs with Arbitrary Degree Distributions

In order to generate random graphs having non-Poisson degree distributions, we


may choose a specific set of degrees.

DEGREE SEQUENCE A degree sequence is a specified set .{ki } of the degrees for the
vertices .i = 1 . . . N .

The degree sequence is also the first information one obtains when examining
real-world networks and hence the foundation for all further analysis.

Construction of Networks with Arbitrary Degree Distribution The degree


sequence can be chosen in such a way that the fraction of vertices having degree
k will tend to the desired degree distribution in the thermodynamic limit,

pk ,
. N → ∞.

For the construction, two steps suffice.

– Assign .ki “stubs” (ends of edges emerging from a vertex) to all vertices, .i =
1, . . . , N .
– Iteratively choose pairs of stubs at random and join them together to make
complete edges.

When all stubs have been used up, the resulting graph is a random member of
the ensemble of graphs with the desired degree sequence. Figure 1.8 illustrates the
construction procedure.

Average Degree and Clustering The mean number of neighbors is the coordination
number

$$z = \langle k \rangle = \sum_k k\, p_k\,.$$

Fig. 1.8 Construction procedure of a random network with nine vertices and degrees $X_1 = 2$, $X_2 = 3$, $X_3 = 2$, $X_4 = 2$. In step A (left) the vertices with the desired number of stubs (degrees) are constructed. In step B (right) the stubs are connected randomly

The probability that one of the second neighbors of a given vertex is also a first
neighbor scales as $N^{-1}$ for random graphs, regardless of the degree distribution.
Loops can be ignored in the limit $N \to \infty$.

Degree Distribution of Neighbors Consider a given vertex A and a vertex B that


is a neighbor of A, i.e. A and B are linked by an edge.
We are now interested in the degree distribution for vertex B, viz in the degree
distribution of a neighbor vertex of A, where A is an arbitrary node of a random
network with degree distribution .pk . As a first step we consider the average degree
of a neighbor node.
A high-degree vertex has more edges connected to it. There is then a higher
chance that any given edge on the graph will be connected to it, with this chance
being directly proportional to the degree of the vertex. The probability distribution
of the degree of the vertex to which an edge leads is thus proportional to .kpk and
not just to .pk .

Excess Degree Distribution When we are interested in determining the size of


loops or the size of connected components in a random graph, we are normally
interested not in the complete degree of the vertex reached by following an edge
from A, but in the number of edges emerging from such a vertex that do not lead
back to A, because the latter contains all information about the number of second
neighbors of A.
The number of new edges emerging from B is just the degree of B minus one
and its correctly normalized distribution is therefore

$$q_{k-1} = \frac{k\, p_k}{\sum_j j\, p_j}\,, \qquad
q_k = \frac{(k+1)\, p_{k+1}}{\sum_j j\, p_j}\,, \qquad (1.21)$$

since $k\,p_k$ is the degree distribution of a neighbor, compare Fig. 1.9. The distribution
$q_k$ of the outgoing edges of a neighbor vertex is the “excess degree distribution”.

Fig. 1.9 The excess degree distribution $q_{k-1}$ is the probability of finding k outgoing links of a neighboring site. Here the site B is a neighbor of site A and has a degree $k = 5$ and $k-1 = 4$ outgoing edges, compare (1.21). The probability that B is a neighbor of site A is proportional to the degree k of B

The average number of outgoing edges of a neighbor vertex is then

$$\sum_{k=0}^{\infty} k\, q_k = \frac{\sum_{k=0}^{\infty} k(k+1)\, p_{k+1}}{\sum_j j\, p_j} = \frac{\sum_{k=1}^{\infty} (k-1)k\, p_k}{\sum_j j\, p_j} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle}, \tag{1.22}$$

where $\langle k^2 \rangle$ is the second moment of the degree distribution.
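The relations (1.21) and (1.22) are easy to check numerically. A minimal Python sketch for a Poisson degree distribution, for which $\langle k^2\rangle - \langle k\rangle = z^2$, so that the mean excess degree equals $z$; the truncation cutoff kmax is an assumption of the example:

```python
import math

def excess_degree_distribution(p):
    """q[k] = (k + 1) p[k + 1] / <k>, Eq. (1.21), for a degree
    distribution given as a list p with p[k] the probability of degree k."""
    z = sum(k * pk for k, pk in enumerate(p))            # <k>
    return [(k + 1) * p[k + 1] / z for k in range(len(p) - 1)]

# Poisson degree distribution p_k = e^{-z} z^k / k!, truncated at kmax
z, kmax = 2.0, 60
p = [math.exp(-z) * z**k / math.factorial(k) for k in range(kmax)]

q = excess_degree_distribution(p)
mean_excess = sum(k * qk for k, qk in enumerate(q))      # Eq. (1.22)

k1 = sum(k * pk for k, pk in enumerate(p))               # <k>
k2 = sum(k**2 * pk for k, pk in enumerate(p))            # <k^2>
print(mean_excess, (k2 - k1) / k1)                       # both close to z = 2
```

Both printed values agree, illustrating that the mean excess degree is $(\langle k^2\rangle - \langle k\rangle)/\langle k\rangle$.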

Number of Next-Nearest Neighbors We denote with $z_m$ the average number of $m$-nearest neighbors, with $z_1 = \langle k \rangle \equiv z$. Equation (1.22) gives the average number of vertices two steps away from the starting vertex A via a particular neighbor vertex. Multiplying this by the mean degree of A, namely $z_1 \equiv z$, we find that the mean number of second neighbors $z_2$ of a vertex is

$$z_2 = \langle k^2 \rangle - \langle k \rangle . \tag{1.23}$$

Note that $z_2$ is not given by the variance of the degree distribution,$^4$ which would be $\langle (k - \langle k \rangle)^2 \rangle = \langle k^2 \rangle - \langle k \rangle^2$.

4 Compare Sect. 5.1 of Chap. 5.



$z_2$ for the Erdös–Rényi Graph The degree distribution of an Erdös–Rényi graph is the Poisson distribution, $p_k = e^{-z} z^k / k!$, see (1.5). For the average number of second neighbors, Eq. (1.23), we obtain

$$z_2 = \sum_{k=0}^{\infty} k^2\, e^{-z} \frac{z^k}{k!} - z = z\, e^{-z} \sum_{k=1}^{\infty} \big( (k-1) + 1 \big) \frac{z^{k-1}}{(k-1)!} - z = z^2 = \langle k \rangle^2 .$$

The mean number of second neighbors of a vertex in an Erdös–Rényi random graph is just the square of the mean number of first neighbors. This is a special case, however. For most degree distributions, Eq. (1.23) will be dominated by the term $\langle k^2 \rangle$, which implies that the number of second neighbors is roughly the mean square degree, rather than the square of the mean. For broad distributions these two quantities can be very different.

Number of Far Away Neighbors The average number of edges emerging from a second neighbor, and not leading back to where it came from, is also given by (1.22), and indeed this is true at any distance $m$ away from vertex A. The average number of neighbors at a distance $m$ is then

$$z_m = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle}\, z_{m-1} = \frac{z_2}{z_1}\, z_{m-1}, \tag{1.24}$$

where $z_1 \equiv z = \langle k \rangle$ and $z_2$ are given by (1.23). Iterating this relation we find

$$z_m = \left( \frac{z_2}{z_1} \right)^{m-1} z_1 . \tag{1.25}$$

Giant Connected Cluster Depending on whether $z_2$ is greater than $z_1$ or not, Eq. (1.25) will either diverge or converge exponentially as $m$ becomes large,

$$\lim_{m \to \infty} z_m = \begin{cases} \infty & \text{if } z_2 > z_1 \\ 0 & \text{if } z_2 < z_1 \end{cases} . \tag{1.26}$$

The percolation point is therefore $z_1 = z_2$. Below the transition, the total number of neighbors is

$$\sum_{m=1}^{\infty} z_m = z_1 \sum_{m=1}^{\infty} \left( \frac{z_2}{z_1} \right)^{m-1} = \frac{z_1}{1 - z_2/z_1} = \frac{z_1^2}{z_1 - z_2} .$$

The total number of sites connected to the starting node is finite below the percolation transition, which implies that the network decomposes into non-connected components. Beyond the percolation transition one or more clusters with macroscopic numbers of nodes form.

GIANT CONNECTED COMPONENT When the largest cluster of a graph encompasses a finite fraction of all vertices in the thermodynamic limit, it is said to form a giant connected component.

If the total number of neighbors is infinite, then there must be a giant connected
component. When the total number of neighbors is finite, there can be no giant
connected component.

Percolation Threshold When a system has two or more possibly macroscopically different states, one speaks of a phase transition.

PERCOLATION TRANSITION A percolation transition occurs when the structure of an evolving graph transits between a state in which two (far away) sites are, on average, connected, and one in which they are not.

This phase transition occurs when $z_2 = z_1$. Making use of (1.23), $z_2 = \langle k^2 \rangle - \langle k \rangle$, we find that this condition is equivalent to

$$\langle k^2 \rangle - 2 \langle k \rangle = 0, \qquad \sum_{k=0}^{\infty} k(k-2)\, p_k = 0 . \tag{1.27}$$

Because of the factor $k(k-2)$, vertices of degree zero and degree two do not contribute to the sum. The number of vertices with degree zero or two therefore affects neither the phase transition nor the existence of a giant component.

– Vertices of degree zero are not connected to any other node; they do not contribute to the network topology.
– Vertices of degree one constitute the sole negative contribution to the percolation
condition (1.27). No further path is available when arriving at a site with degree
one.
– Vertices of degree two act as intermediaries between two other nodes. Removing vertices of degree two does not change the topological structure of a graph.

One can therefore remove (or add) vertices of degree two or zero without affecting
the existence of the giant component.
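The sum in (1.27) gives a quick numerical test for the existence of a giant component. A sketch for the Poisson case, where $\sum_k k(k-2)p_k = z^2 - z$ changes sign at $z = 1$; the cutoff kmax is an assumption of the example:

```python
import math

def molloy_reed(p):
    """Sum_k k (k - 2) p_k, Eq. (1.27): positive above the percolation
    transition, negative below."""
    return sum(k * (k - 2) * pk for k, pk in enumerate(p))

def poisson(z, kmax=80):
    """Poisson degree distribution, truncated at kmax."""
    return [math.exp(-z) * z**k / math.factorial(k) for k in range(kmax)]

# For Erdoes-Renyi graphs <k^2> - 2<k> = z^2 - z, so the sign changes at z = 1
print(molloy_reed(poisson(0.5)))   # negative: no giant component
print(molloy_reed(poisson(2.0)))   # positive: a giant component exists
```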

Clique Percolation Edges correspond to cliques with $Z = 2$ sites, compare Fig. 1.4. The percolation transition can hence be interpreted as a percolation of cliques having size two and larger. It is then clear that the concept of percolation can be generalized to that of percolation of cliques with $Z$ sites.

Average Vertex–Vertex Distance Below the percolation threshold the average vertex–vertex distance $\ell$ is finite and the graph decomposes into an infinite number of disconnected subclusters.

DISCONNECTED SUBCLUSTERS A disconnected subcluster or subgraph constitutes a subset of vertices for which (a) there is at least one path between all pairs of nodes making up the subcluster and (b) there is no path between a member of the subcluster and any out-of-subcluster vertex.

Well above the percolation transition, $\ell$ is given approximately by the condition $z_\ell \simeq N$, viz

$$\log(N/z_1) = (\ell - 1) \log(z_2/z_1), \qquad \ell = \frac{\log(N/z_1)}{\log(z_2/z_1)} + 1, \tag{1.28}$$

when using $z_\ell = z_1 (z_2/z_1)^{\ell-1}$, as given by (1.25). For Erdös–Rényi random graphs one has $z_1 = z$ and $z_2 = z^2$, which leads to

$$\ell = \frac{\log N - \log z}{\log z} + 1 = \frac{\log N}{\log z},$$

as derived previously, see (1.2).
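Equation (1.28) can be evaluated directly. A minimal sketch, with the Erdös–Rényi values $z_1 = z$ and $z_2 = z^2$ chosen for illustration:

```python
import math

def average_distance(N, z1, z2):
    """Average vertex-vertex distance from Eq. (1.28),
    valid well above the percolation transition (z2 > z1)."""
    return math.log(N / z1) / math.log(z2 / z1) + 1.0

N, z = 10**6, 4.0
# Erdoes-Renyi graphs: z1 = z and z2 = z^2, so (1.28) reduces to log(N)/log(z)
print(average_distance(N, z, z * z))
print(math.log(N) / math.log(z))      # identical, compare (1.2)
```

Even for a million vertices the average distance stays of the order of ten, the small-world effect.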

Clustering Coefficient of Generalized Random Graphs The clustering coefficient $C$ denotes the probability that two neighbors $i$ and $j$ of a particular vertex A have stubs that do interconnect. The probability that two given stubs are connected is $1/(zN - 1) \approx 1/(zN)$, since $zN$ is the total number of stubs. We have

$$C = \frac{\langle k_i \rangle_q \langle k_j \rangle_q}{Nz} = \frac{1}{Nz} \left[ \sum_k k\, q_k \right]^2 = \frac{1}{Nz} \left[ \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \right]^2 = \frac{z}{N} \left[ \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle^2} \right]^2, \tag{1.29}$$

where the notation $\langle \ldots \rangle_q$ indicates that the average is to be taken with respect to the excess degree distribution $q = q_k$, as given by (1.21).
As expected, the clustering coefficient vanishes in the thermodynamic limit $N \to \infty$. It may, however, have a very large prefactor, especially for degree distributions with fat tails. The differences listed in Table 1.1, between the measured clustering coefficient $C$ and the value $C_\mathrm{rand} = z/N$ for Erdös–Rényi graphs, are partly due to the fat tails in the degree distributions $p_k$ of the corresponding networks.

1.3.2 Probability Generating Function Formalism

Network theory is about the statistical properties of graphs. A powerful method from
probability theory is the generating function formalism, which we will discuss now
and apply later on.

Probability Generating Functions We define with

$$G_0(x) = \sum_{k=0}^{\infty} p_k\, x^k \tag{1.30}$$

the "generating function" $G_0(x)$ for the probability distribution $p_k$. The generating function $G_0(x)$ contains all information present in $p_k$. We can recover $p_k$ from $G_0(x)$ by differentiation,

$$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0} . \tag{1.31}$$

One says that $G_0$ "generates" the probability distribution $p_k$.

Generating Function for the Excess Degree Distribution For later use we define the generating function $G_1(x)$ for the distribution $q_k = (k+1) p_{k+1}/z$ measuring the number of non-returning edges leaving a neighboring vertex,

$$G_1(x) = \sum_{k=0}^{\infty} q_k\, x^k = \frac{\sum_{k=0}^{\infty} (k+1)\, p_{k+1}\, x^k}{z} = \frac{\sum_{k=1}^{\infty} k\, p_k\, x^{k-1}}{z} = \frac{G_0'(x)}{z}, \qquad z = \langle k \rangle, \tag{1.32}$$

where $G_0'(x)$ denotes the first derivative of $G_0(x)$ with respect to its argument.

Properties of Generating Functions Probability generating functions have a couple of important properties.

– NORMALIZATION
The normalization of $p_k$ implies

$$G_0(1) = \sum_k p_k = 1 . \tag{1.33}$$

– MEAN
Straightforward differentiation,

$$G_0'(1) = \sum_k k\, p_k = \langle k \rangle, \tag{1.34}$$

yields the average degree $\langle k \rangle$.
– MOMENTS
The $n$th moment $\langle k^n \rangle$ of the distribution $p_k$ is given by

$$\langle k^n \rangle = \sum_k k^n\, p_k = \left[ \left( x \frac{d}{dx} \right)^{\!n} G_0(x) \right]_{x=1} . \tag{1.35}$$
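These properties can be verified numerically by representing a degree distribution as a list of coefficients. A sketch for a truncated Poisson distribution; the cutoff kmax is an assumption of the example:

```python
import math

def G0(p, x):
    """Generating function G0(x) = sum_k p_k x^k, Eq. (1.30)."""
    return sum(pk * x**k for k, pk in enumerate(p))

def moment(p, n):
    """n-th moment <k^n>, Eq. (1.35), taken directly from the coefficients."""
    return sum(k**n * pk for k, pk in enumerate(p))

z, kmax = 3.0, 80
p = [math.exp(-z) * z**k / math.factorial(k) for k in range(kmax)]

print(G0(p, 1.0))      # normalization, Eq. (1.33): close to 1
print(moment(p, 1))    # mean degree <k>, Eq. (1.34): close to z = 3
print(moment(p, 2))    # second moment; for a Poisson distribution z^2 + z = 12
```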

Generating Function for Independent Random Variables Let us assume that we have two random variables, such as two dice. Throwing the two dice corresponds to two independent random events. The joint probability to obtain $k = 1, \ldots, 6$ with the first die and $l = 1, \ldots, 6$ with the second die is $p_k p_l$. This probability function is generated by

$$\sum_{k,l} p_k\, p_l\, x^{k+l} = \left( \sum_k p_k\, x^k \right) \left( \sum_l p_l\, x^l \right),$$

i.e. by the product of the individual generating functions. This is the reason why generating functions are so useful in describing combinations of independent random events.
As an application consider $n$ randomly chosen vertices. The sum $\sum_i k_i$ of the respective degrees has a cumulative degree distribution, which is generated by

$$\big[ G_0(x) \big]^n .$$

Generating Function of the Poisson Distribution The generating function $G_0(x) = \sum_k p_k x^k$ of the Poisson distribution $p_k = e^{-z} z^k/k!$ is

$$G_0(x) = e^{-z} \sum_{k=0}^{\infty} \frac{z^k}{k!}\, x^k = e^{z(x-1)},$$

with $z$ being the average degree. Compare (1.5). The generating function $G_1(x)$ for the excess degree distribution $q_k$ is

$$G_1(x) = \frac{G_0'(x)}{z} = e^{z(x-1)}, \tag{1.36}$$

compare (1.32). As expected, we find $G_1(x) = G_0(x)$ for the Poisson distribution.

Generating Function of the Exponential Distribution As a second example, consider a graph with an exponential degree distribution,

$$p_k = (1 - e^{-1/\kappa})\, e^{-k/\kappa}, \qquad \sum_{k=0}^{\infty} p_k = \frac{1 - e^{-1/\kappa}}{1 - e^{-1/\kappa}} = 1, \tag{1.37}$$

where $\kappa$ is a constant. The generating function is

$$G_0(x) = (1 - e^{-1/\kappa}) \sum_{k=0}^{\infty} e^{-k/\kappa}\, x^k = \frac{1 - e^{-1/\kappa}}{1 - x\, e^{-1/\kappa}},$$

together with

$$z = G_0'(1) = \frac{e^{-1/\kappa}}{1 - e^{-1/\kappa}}, \qquad G_1(x) = \frac{G_0'(x)}{z} = \left[ \frac{1 - e^{-1/\kappa}}{1 - x\, e^{-1/\kappa}} \right]^2 .$$

Stochastic Sum of Independent Variables Let us assume that the random variables $k_1, k_2, \ldots$ are identically distributed, which implies that they have the same generating function $G_0(x)$. The powers

$$G_0^2(x), \qquad G_0^3(x), \qquad G_0^4(x), \qquad \ldots$$

are the generating functions for

$$k_1 + k_2, \qquad k_1 + k_2 + k_3, \qquad k_1 + k_2 + k_3 + k_4, \qquad \ldots$$

and so on. Next consider that the number of times $n$ this stochastic process is executed is distributed as $p_n$. As an example consider throwing a die several times, with a probability $p_n$ of throwing $n$ times. The distribution of the results obtained is then generated by

$$\sum_n p_n\, G_0^n(x) = G_N(G_0(x)), \qquad G_N(z) = \sum_n p_n\, z^n . \tag{1.38}$$

We will make use of this relation further on.
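Relation (1.38) can be illustrated by simulation. In the sketch below a die is thrown $n = 1$ or $n = 2$ times with equal probability — both choices are illustrative assumptions; by (1.38) the mean of the compound sum is $G_N'(1)\, G_0'(1) = \langle n \rangle \langle k \rangle = 1.5 \cdot 3.5$:

```python
import random

random.seed(42)

def stochastic_sum():
    """Throw a fair die n times, with n = 1 or 2 chosen with equal probability."""
    n = random.choice([1, 2])
    return sum(random.randint(1, 6) for _ in range(n))

samples = [stochastic_sum() for _ in range(200_000)]
mean = sum(samples) / len(samples)

# By Eq. (1.38) the compound distribution is generated by G_N(G_0(x)),
# whose derivative at x = 1 gives the mean <n><k>
print(mean, 1.5 * 3.5)   # the sample mean fluctuates around 5.25
```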

1.3.3 Distribution of Component Sizes

Absence of Closed Loops For networks below the percolation transition we are interested in the distribution of the sizes of the individual subclusters. The calculations will depend on the absence of closed loops, or any other kind of clustering, for generalized random graphs in the thermodynamic limit. In physics jargon, large random graphs are "tree-like".
We recall that the number of closed loops of length three corresponds to the clustering coefficient $C$, viz to the probability that two of your friends are also friends of each other. For random networks $C = [\langle k^2 \rangle - \langle k \rangle]^2/(z^3 N)$, see (1.29), which tends to zero as $N \to \infty$.

Fig. 1.10 Graphical representation of the self-consistency condition (1.39) for the generating function $H_1(x)$, represented by the open box. A single vertex is represented by a hashed circle. The subcluster connected to an incoming vertex can be either a single vertex or an arbitrary number of subclusters of the same type connected to the first vertex. In analogy to Newman et al. (2001)

Generating Function for the Size Distribution of Components We define by

$$H_1(x) = \sum_m h_m^{(1)}\, x^m$$

the generating function for the distribution of cluster sizes containing a given vertex $j$, with the condition that $j$ is linked to a specific incoming edge, see Fig. 1.10. That is, $h_m^{(1)}$ is the probability that the cluster defined in this way contains $m$ nodes.

Self-Consistency Condition for $H_1(x)$ We note the following.

– The first vertex $j$ belongs to the subcluster with probability 1; its generating function is $x$.
– The probability that the vertex $j$ has $k$ outgoing stubs is $q_k$.
– At every stub outgoing from vertex $j$ there is a subcluster.
– The total number of vertices consists of those generated by $[H_1(x)]^k$, plus the starting vertex.

The number of outgoing edges $k$ from vertex $j$ is described by the distribution function $q_k$, see (1.21). The total size of the $k$ clusters is generated by $[H_1(x)]^k$, as a consequence of the multiplication property of generating functions discussed in Sect. 1.3.2. The self-consistency equation for the total number of vertices reachable is then

$$H_1(x) = x \sum_{k=0}^{\infty} q_k\, [H_1(x)]^k = x\, G_1(H_1(x)), \tag{1.39}$$

where we made use of (1.32) and (1.38).

Embedding Cluster Distribution Function The quantity that we actually want to know is the distribution of the sizes of the clusters to which the entry vertex belongs.

– The number of edges emanating from a randomly chosen vertex is distributed according to the degree distribution $p_k$.
– Every edge leads to a cluster whose size is generated by $H_1(x)$.

The size of a complete component is thus generated by

$$H_0(x) = x \sum_{k=0}^{\infty} p_k\, [H_1(x)]^k = x\, G_0(H_1(x)), \tag{1.40}$$

where the prefactor $x$ corresponds to the generating function of the starting vertex. The complete distribution of component sizes is given by solving (1.39) self-consistently for $H_1(x)$ and then substituting the result into (1.40).

Mean Component Size The calculation of $H_1(x)$ and $H_0(x)$ in closed form is not possible. We are, however, interested only in the first moment, viz the mean component size, see (1.34).
The component size distribution is generated by $H_0(x)$, Eq. (1.40). The mean component size below the percolation transition is therefore

$$\langle s \rangle = H_0'(1) = \Big[ G_0(H_1(x)) + x\, G_0'(H_1(x))\, H_1'(x) \Big]_{x=1} = 1 + G_0'(1)\, H_1'(1), \tag{1.41}$$

where we made use of the normalization

$$G_0(1) = H_1(1) = H_0(1) = 1$$

of generating functions, see (1.33). The value of $H_1'(1)$ can be calculated from (1.39) by differentiating,

$$H_1'(x) = G_1(H_1(x)) + x\, G_1'(H_1(x))\, H_1'(x), \qquad H_1'(1) = \frac{1}{1 - G_1'(1)} . \tag{1.42}$$

Substituting this into (1.41) we find

$$\langle s \rangle = 1 + \frac{G_0'(1)}{1 - G_1'(1)} . \tag{1.43}$$

Percolation Transition On closer inspection, the above expression for the mean component size reproduces the percolation condition $z_2 = z_1$ derived in Sect. 1.3. We make use of (1.23) and

$$G_0'(1) = \sum_k k\, p_k = \langle k \rangle = z_1, \qquad G_1'(1) = \frac{\sum_k k(k-1)\, p_k}{\sum_k k\, p_k} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} = \frac{z_2}{z_1}, \tag{1.44}$$

where $z_1$ and $z_2$ are the mean numbers of nearest and next-nearest neighbors. Substitution into (1.43) gives the average component size below the transition as

$$\langle s \rangle = 1 + \frac{z_1^2}{z_1 - z_2} . \tag{1.45}$$

This expression has a divergence at $z_1 = z_2$. The mean component size diverges at the percolation threshold, compare Sect. 1.3, beyond which the giant connected component forms.
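The prediction (1.45) can be compared with a direct simulation. A union-find sketch for Erdös–Rényi graphs, where $z_1 = z$ and $z_2 = z^2$ give $\langle s \rangle = 1/(1-z)$ below the transition; graph size and seed are arbitrary choices of the example:

```python
import random
from collections import Counter

def mean_component_size(N, z, seed=1):
    """Average size of the component containing a randomly chosen vertex,
    for an Erdoes-Renyi graph G(N, p) with p = z/(N - 1), via union-find."""
    random.seed(seed)
    parent = list(range(N))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    p = z / (N - 1)
    for i in range(N):
        for j in range(i + 1, N):
            if random.random() < p:
                parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(N))
    # averaging over vertices gives sum over components of s^2 / N
    return sum(s * s for s in sizes.values()) / N

z = 0.5                        # below the transition: z2 = z^2 < z1 = z
print(mean_component_size(2000, z))
print(1 + z**2 / (z - z**2))   # Eq. (1.45): <s> = 1/(1 - z) = 2
```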

Size of the Giant Component Above the percolation transition the network contains a giant connected component, which contains a finite fraction $S$ of all sites $N$. Formally we can decompose the generating function for the component sizes into a part generating the giant component, $H_0^*(x)$, and a part generating the remaining components,

$$H_0(x) \to H_0^*(x) + H_0(x), \qquad H_0^*(1) = S, \quad H_0(1) = 1 - S,$$

and equivalently for $H_1(x)$. The self-consistency equations (1.39) and (1.40),

$$H_0(x) = x\, G_0(H_1(x)), \qquad H_1(x) = x\, G_1(H_1(x)),$$


Fig. 1.11 For Erdös–Rényi graphs, the size $S(z)$ of the giant component vanishes linearly at the percolation transition occurring at the critical coordination number $z_c = 1$. Compare (1.47) and (1.48)

then take the form

$$1 - S = H_0(1) = G_0(u), \qquad u = H_1(1) = G_1(u) . \tag{1.46}$$

For Erdös–Rényi graphs we have $G_1(x) = G_0(x) = \exp(z(x-1))$, see (1.36). This leads to $1 - S = u$ in (1.46), which takes, with

$$1 - S = u = e^{z(u-1)} = e^{-zS}, \qquad S = 1 - e^{-zS}, \tag{1.47}$$

the form of a self-consistency condition for the mean size $S$ of the giant component. Percolation takes place for $z \ge 1$.

Critical Behaviour The Taylor expansion of the exponential,

$$S = 1 - e^{-zS} \approx zS - \frac{(zS)^2}{2}, \qquad S(z) \approx \frac{2(z-1)}{z^2}, \tag{1.48}$$

shows that $S(z)$ increases linearly in the percolating regime, as shown in Fig. 1.11. It follows from (1.47) that the giant component engulfs the entire net exponentially fast for large degrees $z$, namely that $\lim_{z \to \infty} S(z) = 1$.
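The self-consistency condition (1.47) has no closed-form solution, but a simple fixed-point iteration converges quickly; a minimal sketch:

```python
import math

def giant_component_size(z, iterations=200):
    """Solve S = 1 - exp(-z S), Eq. (1.47), by fixed-point iteration."""
    S = 1.0                            # start from the fully connected guess
    for _ in range(iterations):
        S = 1.0 - math.exp(-z * S)
    return S

for z in (0.5, 1.5, 3.0):
    print(z, giant_component_size(z))
# S vanishes for z <= 1 and approaches unity rapidly for large z, cf. Fig. 1.11
```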

1.4 Robustness of Random Networks

Degree distributions of real-world networks often have “fat tails”, a vaguely defined
notion that a given distribution decays only slowly when the degree becomes large.
In general, fat tails increase the robustness of a network. That is, the network retains
functionality even when a certain number of vertices or edges is removed. The
Internet remains functional, to give an example, even when a substantial number
of Internet routers fail.

Removal of Vertices We consider a graph model in which each vertex is either "active" or "inactive". Inactive vertices are nodes that have either been removed, or are present but non-functional. We denote with $b = b_k$ the probability that a vertex of degree $k$ is active. The generating function



$$F_0(x) = \sum_{k=0}^{\infty} p_k\, b_k\, x^k, \qquad F_0(1) = \sum_k p_k\, b_k \le 1, \tag{1.49}$$

generates the probabilities that a vertex has degree $k$ and is present. Strictly speaking, $F_0(x)$ is not a proper generating function, because the normalization $F_0(1)$ is not unity but the fraction of all vertices that are active. This shortcoming could be rectified via

$$\Big( 1 - \sum_k p_k\, b_k \Big) + \sum_{k=0}^{\infty} p_k\, b_k\, x^k,$$

i.e. by adding to $F_0(x)$ a constant that corresponds to the fraction of inactive nodes, which are in turn equivalent to nodes with degree zero. The normalization can hence be disregarded. By analogy with (1.32) we define by

$$F_1(x) = \frac{\sum_k k\, p_k\, b_k\, x^{k-1}}{\sum_k k\, p_k} = \frac{F_0'(x)}{z} \tag{1.50}$$

the generating function for the degree distribution of neighbor sites. Also $F_1(x)$ is not properly normalized.

Distribution of Connected Clusters The distributions of the sizes of connected clusters reachable from a given vertex, $H_0(x)$, or from a given edge, $H_1(x)$, are generated respectively by the normalized functions

$$H_0(x) = 1 - F_0(1) + x\, F_0(H_1(x)), \qquad H_0(1) = 1,$$
$$H_1(x) = 1 - F_1(1) + x\, F_1(H_1(x)), \qquad H_1(1) = 1, \tag{1.51}$$

which are the logical equivalents of (1.39) and (1.40).

Random Failure of Vertices First we consider the case of random failures of vertices. In this case, the probability of a vertex being present is independent of the degree $k$ and just equal to a constant $b$,

$$b_k \equiv b \le 1, \qquad F_0(x) = b\, G_0(x), \qquad F_1(x) = b\, G_1(x),$$

which means that

$$H_0(x) = 1 - b + b x\, G_0(H_1(x)), \qquad H_1(x) = 1 - b + b x\, G_1(H_1(x)), \tag{1.52}$$

where $G_0(x)$ and $G_1(x)$ are the standard generating functions for the degree of a vertex and of a neighboring vertex, see (1.30) and (1.32). This implies that the mean size of a cluster of connected and present vertices is

$$\langle s \rangle = H_0'(1) = b + b\, G_0'(1)\, H_1'(1) = b + \frac{b^2\, G_0'(1)}{1 - b\, G_1'(1)} = b \left[ 1 + \frac{b\, G_0'(1)}{1 - b\, G_1'(1)} \right],$$

where we followed the derivation presented in (1.42) in order to obtain the expression $H_1'(1) = b/(1 - b\, G_1'(1))$. With (1.44) for $G_0'(1) = z_1 = z$ and $G_1'(1) = z_2/z_1$, we obtain the generalization

$$\langle s \rangle = b + \frac{b^2 z_1^2}{z_1 - b z_2}$$

of (1.45). The model has a phase transition at the critical value of $b$,

$$b_c = \frac{z_1}{z_2} = \frac{1}{G_1'(1)} . \tag{1.53}$$

If the fraction $b$ of the vertices present in the network is smaller than the critical fraction $b_c$, then there will be no giant component. This is the point at which the network ceases to be functional in terms of connectivity. When there is no giant component, connecting paths exist only within small isolated groups of vertices, but no long-range connectivity exists. For a communication network such as the Internet, this would be fatal.
For networks with fat tails, however, we expect that the number of next-nearest neighbors $z_2$ is large compared to the number of nearest neighbors $z_1$ and that $b_c$ is consequently small. The network is robust, as one would need to take out a substantial fraction of the nodes before it would fail.
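A sketch evaluating the threshold (1.53) for two degree distributions; the truncation cutoffs are assumptions of the example:

```python
import math

def critical_fraction(p):
    """b_c = z1 / z2, Eq. (1.53), from a degree distribution list p."""
    k1 = sum(k * pk for k, pk in enumerate(p))
    k2 = sum(k * k * pk for k, pk in enumerate(p))
    return k1 / (k2 - k1)              # z2 = <k^2> - <k>

# Poisson (Erdoes-Renyi) versus a fat-tailed power law p_k ~ k^{-2.5},
# both truncated at a finite cutoff
poisson = [math.exp(-4.0) * 4.0**k / math.factorial(k) for k in range(120)]
kmax = 10_000
norm = sum(k**-2.5 for k in range(1, kmax))
power = [0.0] + [k**-2.5 / norm for k in range(1, kmax)]

print(critical_fraction(poisson))   # 1/4: a quarter of the nodes must remain
print(critical_fraction(power))     # much smaller: the fat tail adds robustness
```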

Random Failure of Vertices in Scale-Free Graphs We consider a pure power-law degree distribution

$$p_k \sim \frac{1}{k^\alpha}, \qquad \int \frac{dk}{k^\alpha} < \infty, \qquad \alpha > 1,$$

see (1.8) and Sect. 1.6. The first two moments are

$$z_1 = \langle k \rangle \sim \int dk\, \frac{k}{k^\alpha}, \qquad \langle k^2 \rangle \sim \int dk\, \frac{k^2}{k^\alpha} .$$

Noting that the number of next-nearest neighbors is $z_2 = \langle k^2 \rangle - \langle k \rangle$, we can identify three regimes:

– $1 < \alpha \le 2$, with $z_1 \to \infty$ and $z_2 \to \infty$:
$b_c = z_1/z_2$ is arbitrary in the thermodynamic limit $N \to \infty$.
– $2 < \alpha \le 3$, with $z_1 < \infty$ and $z_2 \to \infty$:
$b_c = z_1/z_2 \to 0$ in the thermodynamic limit. Any number of vertices can be randomly removed with the network remaining above the percolation limit. The network is extremely robust.
– $3 < \alpha$, with $z_1 < \infty$ and $z_2 < \infty$:
$b_c = z_1/z_2$ can acquire any value and the network has normal robustness.

Biased Failure of Vertices What happens when somebody sabotages the most important sites of a network? This is equivalent to removing vertices in decreasing order of their degrees, starting with the highest-degree vertices. The probability that a given node is active then takes the form

$$b_k = \theta(k_c - k), \tag{1.54}$$

where $\theta(x)$ is the Heaviside step function,

$$\theta(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases} . \tag{1.55}$$

This form of targeted attack corresponds to setting the upper limit of the sum in (1.49) to $k_c$. Differentiating (1.51) with respect to $x$ yields

$$H_1'(1) = F_1(H_1(1)) + F_1'(H_1(1))\, H_1'(1), \qquad H_1'(1) = \frac{F_1(1)}{1 - F_1'(1)},$$

as $H_1(1) = 1$. The phase transition occurs when $F_1'(1) = 1$,

$$\frac{\sum_{k=1}^{\infty} k(k-1)\, p_k\, b_k}{\sum_{k=1}^{\infty} k\, p_k} = \frac{\sum_{k=1}^{k_c} k(k-1)\, p_k}{\sum_{k=1}^{\infty} k\, p_k} = 1, \tag{1.56}$$

where we used the definition (1.50) for $F_1(x)$.



Fig. 1.12 For a scale-free network with a power-law degree distribution $p_k \sim 1/k^\alpha$, the critical fraction $f_c$ of vertices, as defined by (1.59). Removing a fraction greater than $f_c$ of high-degree vertices drives the network below the percolation limit. For a smaller loss of highest-degree vertices the giant connected component remains intact (shaded area)

Biased Failure of Vertices for Scale-Free Networks Scale-free networks have a power-law degree distribution, $p_k \propto k^{-\alpha}$. We can then rewrite (1.56) as

$$H_{k_c}^{(\alpha-2)} - H_{k_c}^{(\alpha-1)} = H_\infty^{(\alpha-1)}, \tag{1.57}$$

where $H_n^{(r)}$ is the $n$th harmonic number of order $r$,

$$H_n^{(r)} = \sum_{k=1}^{n} \frac{1}{k^r} . \tag{1.58}$$
k=1

The number of vertices present is $F_0(1)$, see (1.49), or $F_0(1)/\sum_k p_k$, since the degree distribution $p_k$ is normalized. If we remove a certain fraction $f_c$ of the vertices we reach the transition determined by (1.57),

$$f_c = 1 - \frac{F_0(1)}{\sum_k p_k} = 1 - \frac{H_{k_c}^{(\alpha)}}{H_\infty^{(\alpha)}} . \tag{1.59}$$

It is not possible to determine $k_c$ from (1.57) and (1.59) in closed form. One can, however, solve (1.57) numerically for $k_c$ and substitute the result into (1.59). The results are shown in Fig. 1.12, as a function of the exponent $\alpha$. The network is very susceptible with respect to a biased removal of the highest-degree vertices.

– A removal of more than about 3% of the highest-degree vertices always leads to a destruction of the giant connected component. Maximal robustness is achieved for $\alpha \approx 2.2$, which is actually close to the exponents measured in some real-world networks. Compare Fig. 1.6.
– Networks with $\alpha > \alpha_c = 3.4788$ never have a giant connected component. The critical exponent $\alpha_c$ is given by the percolation condition $H_\infty^{(\alpha-2)} = 2 H_\infty^{(\alpha-1)}$, see (1.27).

Fig. 1.13 Regular linear graphs with connectivities $z = 2$ (top) and $z = 4$ (bottom)
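The numerical procedure described above — solving (1.57) for $k_c$ and inserting the result into (1.59) — can be sketched as follows; the tail-corrected evaluation of $H_\infty^{(r)}$ is an implementation choice of the example:

```python
def H(n, r):
    """Partial harmonic number H_n^{(r)}, Eq. (1.58)."""
    return sum(k**-r for k in range(1, n + 1))

def H_inf(r, M=100_000):
    """H_infinity^{(r)} for r > 1: partial sum plus an integral tail estimate."""
    return H(M, r) + M**(1 - r) / (r - 1)

def critical_attack_fraction(alpha, kc_max=1000):
    """Smallest k_c satisfying Eq. (1.57), inserted into Eq. (1.59)."""
    target = H_inf(alpha - 1)
    for kc in range(1, kc_max):
        if H(kc, alpha - 2) - H(kc, alpha - 1) >= target:
            return 1.0 - H(kc, alpha) / H_inf(alpha)
    return None    # condition never met: no giant component to destroy

print(critical_attack_fraction(2.5))   # of the order of a percent, cf. Fig. 1.12
```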

1.5 Small-World Models

Random graphs and random graphs with arbitrary degree distribution show no
clustering in the thermodynamic limit, in contrast to real-world networks. It is
therefore important to find methods to generate graphs that have a finite clustering
coefficient and, at the same time, the small-world property.

Clustering in Lattice Models Lattice models and random graphs are two extreme cases of network models. In Fig. 1.13 we illustrate a simple one-dimensional lattice with connectivity $z$. For periodic boundary conditions, viz when the chain wraps around itself in a ring, one can evaluate the clustering coefficient $C$ analytically.

– ONE DIMENSION
For the clustering coefficient one finds

$$C = \frac{3(z-2)}{4(z-1)}, \tag{1.60}$$

which tends to $3/4$ in the limit of large $z$.


– GENERAL DIMENSION
Square and cubic lattices have dimensions $d = 2, 3$, respectively. The clustering coefficient for general dimension $d$ is

$$C = \frac{3(z - 2d)}{4(z - d)}, \tag{1.61}$$

which generalizes (1.60). We note that the clustering coefficient tends to $3/4$ for $z \gg 2d$ for regular hypercubic lattices in all dimensions.
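Formula (1.60) can be verified by counting triangles on an explicit ring; a minimal sketch, with arbitrary ring sizes chosen for the example:

```python
def ring_clustering(N, z):
    """Clustering coefficient of a 1D ring in which every site is linked
    to its z/2 nearest neighbors on either side (periodic boundaries)."""
    half = z // 2
    neigh = [{(i + d) % N for d in range(-half, half + 1) if d != 0}
             for i in range(N)]
    total = 0.0
    for i in range(N):
        # count the links that exist among the neighbors of site i
        links = sum(1 for a in neigh[i] for b in neigh[i]
                    if a < b and b in neigh[a])
        total += 2.0 * links / (z * (z - 1))
    return total / N

print(ring_clustering(100, 4))   # 3(z-2)/(4(z-1)) = 1/2 for z = 4
print(ring_clustering(120, 6))   # 3(z-2)/(4(z-1)) = 3/5 for z = 6
```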

Fig. 1.14 Crossover from a regular lattice to a small-world network via rewiring. Left: Regular lattice with periodic boundary conditions. Right: Two rewired links; original and new links are color coded (dashed/solid)

Distances in Lattice Models Regular lattices do not show the small-world effect. A regular hypercubic lattice in $d$ dimensions with linear size $L$ has $N = L^d$ vertices. The average vertex–vertex distance increases as $L$, or equivalently as

$$\ell \approx N^{1/d} .$$

The Watts and Strogatz Model Watts and Strogatz proposed a small-world model that interpolates smoothly between a regular lattice and an Erdös–Rényi random graph. The construction starts with a one-dimensional periodic lattice, see Fig. 1.14. There are several possibilities:

– One goes through all the links of the lattice and rewires each link with a given probability $p_\mathrm{new}$.
– A single edge is selected and rewired, with the procedure repeated $N_\mathrm{new}$ times.
– As a variation one may add links instead of rewiring them. The advantage is that the network cannot become disconnected.

For small $p_\mathrm{new}$ and/or $N_\mathrm{new}$, this process produces a graph that is still mostly regular, possessing however a few connections stretching long distances across the lattice, as illustrated in Fig. 1.14. The average coordination number $z$ of the lattice remains constant. The number of neighbors of any particular vertex can, however, be greater or smaller than $z$.
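A minimal sketch of the edge-by-edge rewiring variant, measuring distances by breadth-first search; the parameters and the rewiring details are illustrative assumptions, not the exact procedure of Watts and Strogatz:

```python
import random
from collections import deque

def ring_graph(N, z):
    """1D ring, each site linked to z/2 neighbors on either side."""
    return {i: {(i + d) % N for d in range(-(z // 2), z // 2 + 1) if d}
            for i in range(N)}

def rewire(g, n_rewire, seed=0):
    """Rewire n_rewire randomly chosen edges to random new targets."""
    random.seed(seed)
    N = len(g)
    for _ in range(n_rewire):
        i = random.randrange(N)
        if not g[i]:
            continue
        j = random.choice(sorted(g[i]))
        k = random.randrange(N)
        if k == i or k in g[i]:
            continue                      # skip self-links and duplicates
        g[i].discard(j); g[j].discard(i)
        g[i].add(k); g[k].add(i)
    return g

def avg_distance(g, source=0):
    """Mean distance from one source vertex, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / len(dist)

g = ring_graph(500, 4)
l0 = avg_distance(g)              # ~ N/8 for the regular ring
l1 = avg_distance(rewire(g, 50))
print(l0, l1)                     # a few shortcuts shrink distances drastically
```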

Social Network Interpretation Small-world models, such as the ones illustrated in Fig. 1.14, have an intuitive justification for social networks. Most people
are friends with their immediate neighbors. Neighbors on the same street, people
that they work with, or their relatives. However, some people are also friends with
a few far-away persons. Far away in a social sense, like people in other countries,
people from other walks of life, acquaintances from previous eras of their lives, and
so forth. These long-distance acquaintances are represented by the long-range links
in small-world models.
Fig. 1.15 The scaled clustering coefficient $C/C_0$ as a function of the scaled diameter $\ell/\ell_0$, where $C_0 = 3/4$ and $\ell_0 = N/4$ are the initial values, for a one-dimensional ring with $N = 500$ sites and $z = 4$, compare Fig. 1.14. Points correspond to the consecutive random rewiring of a single edge (five runs, stopping when the graph becomes disconnected)

In Fig. 1.15 the clustering coefficient is shown as a function of the network diameter when rewiring is performed edge by edge. The key result is that a few steps are sufficient to achieve a drastic reduction of intra-network distances, as measured by the graph diameter. At the same time, the clustering coefficient remains high, as observed in many real-world networks.

1.6 Scale-Free Graphs

Evolving Networks Most real-world networks are open, i.e. they are formed by the continuous addition of new vertices to the system. The number of vertices, $N$, increases throughout the lifetime of the network, as is the case for the world wide web, which grows exponentially by the continuous addition of new web pages. The small-world networks discussed in Sect. 1.5 are, however, constructed for a fixed number of nodes $N$; growth is not considered.

Preferential Connectivity Random network models assume that the probability that two vertices are connected is random and uniform. In contrast, most real networks exhibit the "rich-get-richer" phenomenon.

PREFERENTIAL CONNECTIVITY New vertices prefer to connect to existing nodes which already have high degrees.

A newly created web page, to give an example, will include links to well-known sites with a quite high probability. Popular web pages will therefore have both a high number of incoming links and a high growth rate for incoming links. The growth of vertices in terms of edges is therefore in general not uniform.
Fig. 1.16 Illustration of the preferential attachment model for an evolving network, with snapshots at times $t = 0, 1, 2, 3$. At $t = 0$ the system consists of $N_0 = 3$ isolated vertices. At every time step a new vertex (shaded circle) is added, which is connected to $m = 2$ vertices, preferentially to the vertices with high connectivity, determined by the rule (1.62)

Barabási–Albert Model We start at time $t = 0$ with $N(0) = N_0$ unconnected vertices. The preferential attachment growth process can then be carried out in two steps.

– GROWTH
At every time step a new vertex is added and connected with $m \le N_0$ links to existing nodes.
– PREFERENTIAL ATTACHMENT
The new links are determined by connecting the $m$ stubs of the new vertex to nodes with degrees $k_i$ with probability

$$\Pi(k_i) = \frac{k_i + C}{a(t)}, \qquad a(t) = 2m(t-1) + C(N_0 + t - 1), \tag{1.62}$$

where the normalization $a(t)$ has been chosen such that the overall probability to connect is unity, $\sum_i \Pi(k_i) = 1$.

The attachment probability $\Pi(k_i)$ is linearly proportional to the number of links already present, modulo an offset $C$. Other functional dependencies for $\Pi(k_i)$ are of course possible, but not considered here.
After $t$ time steps the above two steps lead to a network with $N(t) = t + N_0$ vertices and $mt$ edges, see Fig. 1.16. We will now show that preferential attachment leads to a scale-free degree distribution

$$p_k \sim k^{-\gamma}, \qquad \gamma = 3 + \frac{C}{m} . \tag{1.63}$$

This scaling relation is valid for large degrees $k_i$ in the limit $t \to \infty$.
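A direct simulation of the growth rule (1.62) for $C = 0$; the stub-list trick (storing every edge endpoint, so that uniform sampling is automatically proportional to the degree) is a standard implementation device, and the parameters are illustrative:

```python
import random

def barabasi_albert(t_steps, m=2, N0=3, seed=7):
    """Growth with preferential attachment for C = 0. The stub list holds
    every edge endpoint, so uniform sampling from it is proportional to
    the degree, compare Eq. (1.62)."""
    random.seed(seed)
    degree = [0] * N0
    seeds = list(range(N0))   # initial isolated vertices, attached uniformly
    stubs = []                # node indices, with multiplicity = degree
    for _ in range(t_steps):
        new = len(degree)
        degree.append(0)
        chosen = set()
        while len(chosen) < m:
            chosen.add(random.choice(stubs if stubs else seeds))
        for node in chosen:
            degree[node] += 1
            degree[new] += 1
            stubs += [node, new]
    return degree

deg = barabasi_albert(20_000)
# For p_k ~ k^{-3} the tail obeys P(k >= K) ~ (m/K)^2, i.e. roughly one
# percent of all nodes should reach k >= 10 m = 20
frac_large = sum(1 for k in deg if k >= 20) / len(deg)
print(frac_large)
```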

Time-Dependent Connectivities The time dependence of the degree of a given vertex can be calculated analytically using a mean-field approach. Only nodes with large degrees matter, as the scaling relation (1.63) applies asymptotically in the limit $k \to \infty$. We may therefore assume $k_i$ to be continuous,

$$\Delta k_i(t) \equiv k_i(t+1) - k_i(t) \approx \frac{\partial k_i}{\partial t} = A\, \Pi(k_i) = A\, \frac{k_i + C}{a(t)}, \tag{1.64}$$

where $\Pi(k_i)$ is the attachment probability. The overall number of new links is proportional to a normalization constant $A$, which is hence determined by the sum rule

$$\sum_i \Delta k_i(t) \equiv m = A \sum_i \Pi(k_i) = A,$$

since the overall probability to attach, $\sum_i \Pi(k_i)$, is unity. Making use of $a(t) = 2m(t-1) + C(N_0 + t - 1)$, we obtain

$$\frac{\partial k_i}{\partial t} = m\, \frac{k_i + C}{(2m + C)t + a_0} \approx \frac{m}{2m + C}\, \frac{k_i + C}{t}, \tag{1.65}$$

where we neglected $a_0 = C(N_0 - 1) - 2m$ for large times $t$. The solution of (1.65) is given by

$$\frac{\dot k_i}{k_i + C} = \frac{m}{2m + C}\, \frac{1}{t}, \qquad k_i(t) = \big( C + m \big) \left( \frac{t}{t_i} \right)^{m/(2m+C)} - C, \tag{1.66}$$

where $t_i$ is the time at which the vertex $i$ was added, with $k_i(t) = 0$ for $t < t_i$.

Adding Times We specialize to the case C = 0; the general result for C > 0
can be obtained subsequently from scaling considerations. A vertex added at time
ti = Ni − N0 has initially m = ki(ti) links, in agreement with (1.66), which reads

ki(t) = m (t/ti)^{0.5},    ti = t m²/ki²    (1.67)

for C = 0. Older nodes, i.e. those with smaller ti, increase their connectivity faster
than younger vertices, viz those with bigger ti, see Fig. 1.17. For social networks
this mechanism is dubbed the rich-get-richer phenomenon.
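Equation (1.67) can be evaluated with a throwaway helper (ours), making the rich-get-richer effect explicit:

```python
def k_meanfield(t, t_i, m=2):
    """Mean-field connectivity k_i(t) = m * (t / t_i)**0.5, valid for C = 0."""
    return m * (t / t_i) ** 0.5

# at observation time t = 100 an old node (t_i = 1) has outgrown a
# younger one (t_i = 25) by a factor sqrt(25) = 5
print(k_meanfield(100, 1), k_meanfield(100, 25))   # → 20.0 4.0
```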

The number of nodes N(t) = N0 + t is identical to the number of adding times,

t1, . . . , tN0 = 0,    tN0+j = j,    j = 1, 2, . . . ,

where we defined the initial N0 nodes to have adding times zero.

Fig. 1.17 Left: Time evolution of the connectivities for vertices with adding times t = 1, 2, 3, . . .
and m = 2, following (1.67). Right: The integrated probability, P (ki (t) < k) = P (ti > tm2 /k 2 ),
see (1.68)

Integrated Probabilities Using (1.67), the probability that a vertex has a
connectivity ki(t) smaller than a certain k, P(ki(t) < k), can be written as

P(ki(t) < k) = P(ti > m²t/k²) .    (1.68)

Adding times are uniformly distributed, compare Fig. 1.17, which implies that the
probability P(ti) to find an adding time ti is

P(ti) = 1/(N0 + t) ,    (1.69)

viz just the inverse of the total number of adding times, which coincides in turn with
the total number of nodes. P(ti > m²t/k²) is therefore the cumulative number of
adding times ti larger than m²t/k², multiplied by the probability P(ti) to add a
new node,

P(ti > m²t/k²) = (t − m²t/k²) · 1/(N0 + t) .    (1.70)

Scale-Free Degree Distribution The degree distribution pk follows from (1.70)
via a simple differentiation,

pk = ∂P(ki(t) < k)/∂k = ∂P(ti > m²t/k²)/∂k = (2m²t/(N0 + t)) (1/k³) ,    (1.71)

in accordance with (1.63).
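The differentiation leading from (1.70) to (1.71) can be sanity-checked numerically, e.g. with a central finite difference (our own sketch, C = 0):

```python
def cumulative(k, t, m=2, n0=3):
    """P(k_i(t) < k) from Eq. (1.70): (t - m^2 t / k^2) / (N0 + t)."""
    return (t - m**2 * t / k**2) / (n0 + t)

def p_k(k, t, m=2, n0=3):
    """Degree distribution from Eq. (1.71): 2 m^2 t / ((N0 + t) k^3)."""
    return 2 * m**2 * t / ((n0 + t) * k**3)

k, t, h = 10.0, 1000, 1e-4
p_numeric = (cumulative(k + h, t) - cumulative(k - h, t)) / (2 * h)
print(p_numeric, p_k(k, t))    # both ≈ 0.00798
```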


The degree distribution (1.71) has a well defined limit t → ∞, approaching
a stationary distribution. We note that the exponent γ = 3 is independent of
the number m of added links per new site. This result indicates that growth and
preferential attachment play an important role for the occurrence of a power-law
scaling in the degree distribution. To verify that both ingredients are really
necessary, we investigate variants of the above model.

One can repeat the above calculation for a finite offset C > 0. The exponent γ is
identical to the inverse of the scaling power of ki with respect to time t in (1.66),
plus one, which leads to γ = (2m + C)/m + 1 = 3 + C/m.

Growth with Random Attachment We examine whether growth alone can result
in a scale-free degree distribution, assuming random instead of preferential
attachment. The growth equation for the connectivity ki of a given node i,
compare (1.65), then takes the form

∂ki/∂t = m/(N0 + t) .    (1.72)

The m new edges are linked randomly at time t to the (N0 + t − 1) nodes present at
the previous time step. Solving (1.72) for ki, with the initial condition ki(ti) = m,
we obtain

ki = m [ln((N0 + t)/(N0 + ti)) + 1] ,    (1.73)

which is a logarithmic increase with time. The probability that vertex i has
connectivity ki (t) smaller than k is
 
P(ki(t) < k) = P(ti > (N0 + t) e^{1−k/m} − N0)    (1.74)
             = [t − (N0 + t) e^{1−k/m} + N0] · 1/(N0 + t) ,

where we assumed that we add the vertices uniformly in time to the system. Using

pk = ∂P(ki(t) < k)/∂k

for large times t, we find

pk = (1/m) e^{1−k/m} = (e/m) exp(−k/m) .    (1.75)

Growing networks with random attachment are hence characterized by the
connectivity scale

k* = m ,    (1.76)

which is identical to half of the average connectivity of the vertices in the
system, since 〈k〉 = 2m. Random attachment does not lead to a scale-free degree
distribution. Note that pk in (1.75) is not properly normalized, nor is pk in (1.71),
since we used a large-k approximation during the respective derivations.
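As a cross-check (our own sketch), the growth law (1.72) can be integrated with a simple Euler scheme and compared against the logarithmic solution (1.73):

```python
import math

def k_random(t, t_i, m=2, n0=3, dt=1e-3):
    """Euler integration of dk/dt = m / (N0 + t), with k(t_i) = m."""
    k, s = float(m), float(t_i)
    while s < t:
        k += dt * m / (n0 + s)
        s += dt
    return k

t, t_i, m, n0 = 50.0, 5.0, 2, 3
k_exact = m * (math.log((n0 + t) / (n0 + t_i)) + 1.0)   # Eq. (1.73)
print(k_random(t, t_i, m, n0), k_exact)                 # agree to ~1e-3
```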

Internal Growth with Preferential Attachment The original preferential attach-


ment model yields a degree distribution pk ∼ k −γ with γ = 3. Most social
networks such as the Internet and the Wikipedia network, however, have exponents
2 < γ < 3, with the exponent γ being relatively close to two. It is also observed
that new edges are mostly added in between existing nodes, albeit with internal
preferential attachment.
We can then generalize the preferential attachment model discussed above.

– At every time step a new vertex is added.


– At every time step m new edges are added.
– With probability r ∈ [0, 1] any one of the m new edges is added between the new
vertex and an existing vertex i, which is selected with a probability ∝ Π (ki ),
see (1.62).
– With probability 1 − r any one of the m new edges is added in between two
existing vertices i and j , which are selected with a probability ∝ Π (ki ) Π (kj ).

The model reduces to the original preferential attachment model in the limit r → 1.
The scaling exponent γ can be evaluated along the lines used above for the case
r = 1. One finds
pk ∼ k^{−γ} ,    γ = 1 + 1/(1 − r/2) .    (1.77)

The exponent γ = γ(r) interpolates smoothly between two and three, with γ(1) =
3 and γ(0) = 2. For most real-world graphs r is quite small, viz most links are
added internally. Note that the average connectivity 〈k〉 = 2m remains constant,
since one new vertex is added for 2m new stubs.
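The interpolation (1.77) is easy to tabulate; a small helper (ours):

```python
def gamma_internal(r):
    """Scaling exponent from Eq. (1.77): gamma = 1 + 1/(1 - r/2)."""
    assert 0.0 <= r <= 1.0
    return 1.0 + 1.0 / (1.0 - r / 2.0)

# pure preferential attachment recovers gamma = 3,
# purely internal growth gives gamma = 2
print(gamma_internal(1.0), gamma_internal(0.0))  # → 3.0 2.0
```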

Exercises

(1.1) NETWORK OF CLIQUES


Consider i = 1, . . . , 9 managers sitting on the boards of six companies with
(1,9), (1,2,3), (4,5,9), (2,4,6,7), (2,3,6) and (4,5,6,8) being the respective
board compositions. Draw the graphs for the managers and companies, by
eliminating from the bipartite manager/companies graph one type of node.
The result can be interpreted as a network of cliques. Evaluate for both
networks the average degree z, the clustering coefficient C and the graph
diameter D.

(1.2) ENSEMBLE FLUCTUATIONS
Derive (1.7) for the distribution of ensemble fluctuations. In the case of
difficulties, Albert and Barabási (2002) can be consulted. Alternatively,
check (1.7) numerically.
(1.3) CLUSTERING COEFFICIENT AND DEGREE SEQUENCE
Rewrite the expression (1.29) for the clustering coefficient C as a sum over
the degree sequence {ki }. Show that the resulting formula, valid only for
statistically uncorrelated graphs, may violate the strict bound C ≤ 1, by
considering the degree sequence of a star-shaped network with a single
central site connected to N − 1 otherwise isolated sites.
(1.4) TSALLIS–PARETO DISTRIBUTION
Evaluate mean and variance of the Tsallis–Pareto distribution,

p(x) = (α − 1)/(1 + x)^α ,    (1.78)

on the interval x ∈ [0, ∞).


(1.5) KRONECKER GRAPHS
A Kronecker graph K of two graphs G and H is defined as the outer product
K = G ⊗ H, in terms of the respective adjacency matrices, with a node Xij^K
of K being linked to a node Xkl^K of K if Xi^G is linked to Xk^G in G and
Xj^H is linked to Xl^H in H.
Start by constructing the Kronecker graph of two very small networks
of your choice. Express then generically the degree distribution pK(k) of
the Kronecker graph in terms of pG(k) and pH(k), the respective degree
distributions of G and H. How does the coordination number scale?
(1.6) SELF-RETRACING PATH APPROXIMATION
Look at Brinkman and Rice (1970) and prove (1.14). This derivation is only
suitable for readers with a solid training in physics.
(1.7) PROBABILITY GENERATING FUNCTIONS
Prove that the variance σ² of a probability distribution pk with a generating
function G0(x) = Σk pk x^k and average 〈k〉 is given by σ² = G0''(1) +
〈k〉 − 〈k〉².
Consider furthermore a cumulative process generated by GC(x) =
GN(G0(x)), compare (1.38). Calculate the mean and the variance of the
cumulative process and discuss the result.
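Before attempting the proof, the variance identity of Exercise (1.7) can be verified numerically; the sketch below (ours) uses a truncated Poisson distribution and evaluates G0''(1) = Σk k(k − 1) pk:

```python
import math

mean = 4.0
# truncated Poisson distribution p_k = e^{-mean} mean^k / k!
pk = [math.exp(-mean) * mean**k / math.factorial(k) for k in range(60)]

avg = sum(k * p for k, p in enumerate(pk))              # <k>
g2  = sum(k * (k - 1) * p for k, p in enumerate(pk))    # G0''(1)
var_formula = g2 + avg - avg**2                         # claimed identity
var_direct  = sum((k - avg)**2 * p for k, p in enumerate(pk))
print(var_formula, var_direct)    # both ≈ 4.0 (Poisson: variance = mean)
```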
(1.8) CLUSTERING COEFFICIENT OF REGULAR LATTICES
Prove (1.60) for the clustering coefficient of one-dimensional lattice graphs.
Facultatively, generalize this formula to a d-dimensional lattice with links
along the main axis.
(1.9) ROBUSTNESS OF GRAPHS AGAINST FOCUSED ATTACKS
Using the formalism of Sect. 1.4 calculate the robustness of Erdös–Rényi
networks against focused attacks with bk = θ (kc − k) being the probability
of nodes remaining active. For which values of kc , as a function of the
coordination number z, does the network remain functional? Evaluate the


corresponding critical fraction fc of inactive nodes.
(1.10) EPIDEMIC SPREADING IN SCALE-FREE NETWORKS
Consult “R. Pastor-Satorras and A. Vespignani, Epidemic spreading in scale-
free networks, Physical Review Letters, Vol. 86, 3200 (2001)”. The task is to
solve a simple molecular-field approach to the SIS model for the spreading
of diseases in scale-free networks by using the excess degree distribution
discussed in Sect. 1.3.1, where S and I stand respectively for susceptible and
infective individuals.

Further Reading

For further studies several books and review articles on general network theory are
recommended, Estrada (2012), Kadushin (2012), and Albert and Barabási (2002).
The interested reader might delve into some of the original literature, including
the original Watts and Strogatz (1998) small-world model, the mean-field solution
of the preferential attachment model, Barabasi et al. (1999), a study regarding the
community structure of real-world networks, Palla et al. (2005), or the mathematical
basis of graph theory, Erdös and Rényi (1959). A good starting point is the account
by Milgram (1967) of his by now famous experiment, which led to the law of “six
degrees of separation”, Guare (1990).

References
Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Review of Modern
Physics, 74, 47–97.
Barabasi, A. L., Albert, R., & Jeong, H. (1999). Mean-field theory for scale-free random networks.
Physica A, 272, 173–187.
Brinkman, W. F., & Rice, T. M. (1970). Single-particle excitations in magnetic insulators. Physical
Review B, 2, 1324–1338.
Erdös, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae, 6, 290–297.
Estrada, E. (2012). The structure of complex networks: Theory and applications. Oxford: Oxford
University Press.
Guare, J. (1990). Six degrees of separation: A play. New York: Vintage.
Kadushin, C. (2012). Understanding social networks: Theories, concepts, and findings. Oxford:
Oxford University Press.
Ludueña, G. A., Meixner, H., Kaczor, G., & Gros, C. (2013). A large-scale study of the world wide
web: Network correlation functions with scale-invariant boundaries. The European Physical
Journal B, 86, 1–7.
Milgram, S. (1967). The small world problem. Psychology Today, 2, 60–67.
Newman, M. E. J., Strogatz, S. H., & Watts, D. J. (2001). Random graphs with arbitrary degree
distributions and their applications. Physical Review E, 64, 026118.
Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community
structure of complex networks in nature and society. Nature, 435, 814–818.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of small world networks. Nature, 393,
440–442.
2 Bifurcations and Chaos in Dynamical Systems

Complex systems theory deals with dynamical systems containing often large
numbers of variables. It extends dynamical systems theory, which treats dynamical
systems containing a few variables. A good understanding of dynamical systems
theory is therefore a prerequisite when studying complex systems.
In this chapter we introduce core concepts, like attractors and Lyapunov expo-
nents, bifurcations, and deterministic chaos from the realm of dynamical systems
theory. An introduction to catastrophe theory will be provided together with the
notion of rate-induced tipping and colliding attractors.
Most of the chapter will be devoted to ordinary differential equations and maps,
the traditional focus of dynamical systems theory, venturing however towards the
end into the intricacies of time delay dynamical systems.

2.1 Basic Concepts of Dynamical Systems Theory

Dynamical systems theory deals with the properties of coupled differential equa-
tions, determining the time evolution of a few, typically a handful of variables. We
present a concise overview covering the most important concepts and phenomena.

Fixpoints and Limit Cycles In order to illustrate methodologies typical for


dynamical systems theory, we start with an elementary non-linear rotator. For a
two-dimensional system x = (x, y), or

x(t) = r(t) cos(ϕ(t)),    y(t) = r(t) sin(ϕ(t))    (2.1)

in polar coordinates, we postulate the non-linear differential equations

ṙ = (Γ − r²) r,    ϕ̇ = ω    (2.2)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_2


Fig. 2.1 The solution of the non-linear rotator, compare (2.1) and (2.2). For .Γ < 0 the orbit is
attracted by a stable fixpoint (left), for .Γ > 0 by a stable limit cycle (right)

govern the dynamical behavior. The typical orbits (x(t), y(t)) are illustrated in
Fig. 2.1. The limiting behavior of (2.2) is

lim_{t→∞} (x(t), y(t)) = (0, 0)    for Γ < 0,
lim_{t→∞} (x(t), y(t)) = (rc cos(ωt), rc sin(ωt)),    rc² = Γ,    for Γ > 0 .    (2.3)

In the first case, Γ < 0, trajectories are attracted by the stable fixpoint x0* = (0, 0).
In the second case, Γ > 0, the dynamics approaches a stable periodic orbit.

LIMIT CYCLE Trajectories retrace themselves with a period T for limit cycles, viz one
has x(t + T) = x(t).

Limit cycles can be attracting or repelling. In (2.3) one has an attracting limit
cycle for Γ > 0.
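Both limiting behaviors can be reproduced with a few lines of explicit Euler integration of (2.2); the angle decouples, so it suffices to integrate the radial equation (our own sketch, not from the text):

```python
def integrate_rotator(gamma, r0=0.5, dt=1e-3, t_end=50.0):
    """Euler-integrate the radial equation r' = (gamma - r^2) r of Eq. (2.2)."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += dt * (gamma - r * r) * r
    return r

r_neg = integrate_rotator(gamma=-1.0)   # attracted by the fixpoint r = 0
r_pos = integrate_rotator(gamma=+1.0)   # settles on the limit cycle r = sqrt(gamma)
print(r_neg, r_pos)
```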

Bifurcation Of particular interest are situations where the behavior of a dynamical


system changes qualitatively.

BIFURCATION The long-term limiting behavior of a dynamical system described by a set


of parameterized differential equations may change qualitatively as a function of an external
parameter. The resulting change in terms of fixpoints, limit cycles and/or chaotic attractors
constitutes a bifurcation.

The dynamical system (2.1) and (2.2) shows a bifurcation at Γ = 0, at which
a fixpoint turns into a limit cycle. One denotes this specific type of bifurcation a
“Hopf bifurcation”, as discussed later on in detail.

Fig. 2.2 A fixpoint is stable (unstable) when orbits are attracted (repelled)

Stability of Fixpoints The dynamics of orbits close to fixpoints or limiting orbits
is determined by their stability.

STABILITY CONDITION A fixpoint is stable (unstable) if nearby orbits are attracted


(repelled) by the fixpoint, and metastable if the distance does not change.

An illustration is given in Fig. 2.2. The stability of fixpoints is closely related to
their Lyapunov exponents, as discussed in Sect. 2.4.

One can examine the stability of a fixpoint x* by linearizing the equation of
motion for x ≈ x*. For the fixpoint r* = 0 of (2.2) we find

ṙ = (Γ − r²) r ≈ Γ r,    r ⪡ 1,

and r(t) decreases (increases) for Γ < 0 (Γ > 0). For a d-dimensional system
x = (x1, . . . , xd) the stability of a fixpoint x* is determined by calculating the
d eigenvalues of the linearized equations of motion. The system is stable if all
eigenvalues are negative and unstable if at least one eigenvalue is positive.

Phase Space A given dynamical systems lives in its phase space.

PHASE SPACE The phase space is the space spanned by all allowed values of the variables
entering a set of first-order differential equations defining the dynamical system.

For a two-dimensional system (x, y) the phase space is R², but in polar
coordinates one has

(r, ϕ),    r ∈ [0, ∞],    ϕ ∈ [0, 2π[ .

The representation of a phase space changes upon coordinate transformations.

Attracting States and Manifolds Attracting states play a central role in dynamical
systems theory.

ATTRACTOR A bounded region in phase space to which orbits with certain initial
conditions come arbitrarily close is called an attractor.

Attractors can be isolated points, fixpoints, limit cycles or more complex objects
like attracting manifolds, viz subsets of phase space, or chaotic attractors.

BASIN OF ATTRACTION The set of initial conditions in phase space leading to orbits
approaching a certain attractor arbitrarily closely is the basin of attraction.

Most of the time the extent of a basin of attraction can be evaluated only
numerically.

First-Order Differential Equations Let us consider the third-order differential


equation

d3
. x(t) = f (x, ẋ, ẍ) . (2.4)
dt 3
Using

x1 (t) = x(t),
. x2 (t) = ẋ(t), x3 (t) = ẍ(t) , (2.5)

we can rewrite (2.4) as a first-order differential equation,

d/dt (x1, x2, x3)ᵀ = (x2, x3, f(x1, x2, x3))ᵀ .

One can reduce any set of coupled differential equations to a set of first-order differ-
ential equations by introducing an appropriate number of additional variables. We
therefore consider in the following only first-order, ordinary differential equations.
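The reduction to first order is mechanical. A sketch (ours) for the illustrative choice f(x, ẋ, ẍ) = −ẍ, for which the third component simply decays, ẍ(t) = ẍ(0) e^{−t}:

```python
def rhs(state):
    """First-order form of x''' = f(x, x', x'') for the sample choice f = -x''."""
    x1, x2, x3 = state            # x, x', x''
    return (x2, x3, -x3)

def euler_step(state, dt):
    return tuple(s + dt * d for s, d in zip(state, rhs(state)))

state = (0.0, 0.0, 1.0)           # x(0) = 0, x'(0) = 0, x''(0) = 1
dt = 1e-4
for _ in range(int(1.0 / dt)):    # integrate up to t = 1
    state = euler_step(state, dt)
print(state)                      # x'' has decayed to roughly 1/e
```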

Autonomous Systems The generic form for a dynamical system is

dx(t)/dt = f(x(t)),    x, f ∈ Rᵈ,    t ∈ [−∞, +∞] ,    (2.6)

when time is continuous, or, equivalently, maps such as

x(t + 1) = g(x(t)),    x, g ∈ Rᵈ,    t = 0, 1, 2, . . . ,    (2.7)

when time is discrete. Together with the time evolution equation one has to set
the initial condition x0 = x(t0). An evolution equation of type (2.6) is denoted
“autonomous”, since it does not contain an explicit time dependence. A system of
type ẋ = f(t, x) is dubbed “non-autonomous”.

A particular solution x(t) of a dynamical system in phase space is denoted
“trajectory”, or “orbit”. Orbits are uniquely determined by the set of initial
conditions, x(0) ≡ x0, a consequence of dealing with first-order differential equations.

Fig. 2.3 The Poincaré map, x ↦ P(x), which maps an intersection x of the trajectory
with a hyperplane (shaded) to the consecutive intersection P(x)

Poincaré Map It is in general difficult to illustrate graphically the motion of .x(t)


in d dimensions. Our retina, as well as our print media, are two-dimensional and
it is therefore convenient at times to consider a plane .Σ in .Rd together with the
points .x(i) marking the consecutive intersections of an orbit with .Σ, as illustrated in
Fig. 2.3. As an example consider the plane

.Σ = { (x1 , x2 , 0, . . . , 0) | x1 , x2 ∈ R }

and the sequence of intersections (see Fig. 2.3)

x^(i) = (x1^(i), x2^(i), 0, . . . , 0),    i = 1, 2, . . . ,

which defines the “Poincaré map”

P : x^(i) ↦ x^(i+1) .    (2.8)

The Poincaré map is a discrete map, compare (2.7), which can be constructed for
continuous-time dynamical systems like (2.6). The Poincaré map is very useful,
since we can print and analyze it directly. In the simplest case, a periodic orbit
would show up in the Poincaré map as the identity mapping.

Constants of Motion and Ergodicity Dynamical systems derived in physics from


Hamilton- or Lagrange formalism are “mechanical systems”. The vast majority of
dynamical systems we will examine are non-mechanical. There are however several
foundational concepts from the realm of mechanical systems that are generically
relevant.

– CONSTANT OF MOTION
A function F(x) on phase space x = (x1, . . . , xd) is a “constant of motion” if it
is conserved under the time evolution of the dynamical system, i.e. when

d/dt F(x(t)) = Σ_{i=1}^{d} (∂F(x)/∂xi) ẋi(t) ≡ 0

holds for all times t. In many mechanical systems the energy is a conserved
quantity.
– ERGODICITY
A dynamical system in which orbits come arbitrarily close to any allowed point
in the phase space, irrespective of the initial condition, is called ergodic.

All conserving systems of classical mechanics are ergodic. The ergodicity of a
mechanical system is closely related to “Liouville’s theorem”.1
Ergodicity holds only modulo conserved quantities, which is the case for the
energy in many mechanical systems. Then, only points in the phase space having the
same energy as the trajectory considered are approached arbitrarily close. It is clear
that ergodicity and attractors are mutually exclusive: An ergodic system cannot have
attractors and a dynamical system with one or more attractors cannot be ergodic.

Mechanical Systems and Integrability A dynamical system of type

ẍi = fi(x, ẋ),    i = 1, . . . , f

is a mechanical system, since equations of motion in classical mechanics are of
this form, e.g. Newton’s law. Here f is the number of degrees of freedom, which
implies that mechanical systems can be written as a set of coupled first-order
differential equations with 2f variables constituting the phase space,

(x1 . . . xf , v1 . . . vf),    vi = ẋi,    i = 1, . . . , f ,

with v = (v1, . . . , vf) being the generalized velocity. A mechanical system is
“integrable” if there are α = 1, . . . , f independent constants of motion Fα(x, ẋ),
such that

d/dt Fα(x, ẋ) = 0,    α = 1, . . . , f .

The motion in the 2f-dimensional phase space (x1 . . . xf , v1 . . . vf) is then
restricted to an f-dimensional subspace, which is an f-dimensional torus, see
Fig. 2.4.

1 Liouville’s theorem will be discussed in Sect. 3.1.1 of Chap. 3.



Fig. 2.4 Left: A KAM-torus can be cut along two lines (vertical/horizontal) and unfolded. Right:
A closed orbit on the unfolded torus with .ω1 /ω2 = 3/1. The numbers indicate points that coincide
after refolding (periodic boundary conditions)

An example of an integrable mechanical system is the Kepler problem, viz the


motion of earth around the sun. Integrable systems are very rare, they constitute
however important reference points for the understanding of more complex dynam-
ical systems. A classical example of a non-integrable mechanical system is the
three-body problem, viz the combined motion of earth, moon and sun around each
other.

KAM Theorem Kolmogorov, Arnold and Moser (KAM) examined the question
of what happens to an integrable system when it is perturbed. Let us consider
a two-dimensional torus, as illustrated in Fig. 2.4. The orbit wraps around the torus
with frequencies ω1 and ω2 respectively. A key quantity is the ratio of revolution
frequencies ω1/ω2, which can be rational or irrational.

We recall that irrational numbers r may be approximated with arbitrary accuracy
by a sequence of quotients

m1/s1, m2/s2, m3/s3, . . .    with s1 < s2 < s3 < . . .

with ever larger denominators .si . A number r is “very irrational” when it is


difficult to approximate r by such a series of rational numbers, viz when very large
denominators .si are required to achieve a certain given accuracy for .|r − m/s|.
The KAM theorem states that orbits with rational ratios of revolution frequencies
.ω1 /ω2 are the most unstable under a perturbation of an integrable system and that

tori are most stable when this ratio is very irrational.

Gaps in the Saturn Rings A spectacular example for the instability of rational
KAM-tori are the gaps in the rings of the planet Saturn.
The time a hypothetical particle in Cassini’s gap (between the A- and the B-
ring, r = 118,000 km) would need to orbit Saturn is half the orbiting period of the
‘shepherd-moon’ Mimas. The quotient of these two revolving frequencies is 2:1.
Particles orbiting in Cassini’s gap are therefore unstable against perturbations caused
by Mimas and are thrown out of their orbit.

2.2 Fixpoints, Bifurcations and Stability

We start our systematic discussion by considering the stability of a fixpoint x* of a
one-dimensional dynamical system

ẋ = f(x),    f(x*) = 0 .    (2.9)

A fixpoint is the simplest invariant manifold.2 Generically, invariant manifolds
are subsets of phase space that are invariant under the dynamical flow.

Fixpoints are the only possible invariant manifolds in d = 1 dimension; in
two dimensions fixpoints and limit cycles are possible, with more complicated
objects, such as strange attractors, becoming accessible in three and higher
dimensions.

Lyapunov Exponents The stability of a fixpoint is determined by the direction
of the flow close to it, which can be determined by linearizing the time evolution
equation ẋ = f(x) around the fixpoint x*,

d/dt (x − x*) = ẋ ≈ f(x*) + f'(x*)(x − x*) + . . . ,    (2.10)

where f'() denotes the first derivative. We rewrite (2.10) as

d/dt Δx = f'(x*) Δx,    Δx = x − x*,

where we neglected terms of order (x − x*)² and higher and where we made use of
the fixpoint condition f(x*) = 0. This equation has the solution

Δx(t) = Δx(0) e^{t f'(x*)} → { ∞ for f'(x*) > 0;  0 for f'(x*) < 0 } .    (2.11)

The perturbation Δx decreases/increases with time and the fixpoint x* is hence
stable/unstable for f'(x*) < 0 and f'(x*) > 0 respectively.

LYAPUNOV EXPONENT The time evolution close to a fixpoint x* is generically
exponential, ∼ exp(λt), and one denotes by λ = f'(x*) the Lyapunov exponent.

The Lyapunov exponent controls the direction of the flow close to a fixpoint.
Orbits are exponentially repelled/attracted for λ > 0 and for λ < 0 respectively. For
more than a single variable one has to find all eigenvalues of the linearized problem,

2 Invariant manifolds are touched upon further in Sect. 3.2.1 of Chap. 3.

as discussed further below, with the fixpoint being stable only when all eigenvalues
are negative.

Fixpoints of Discrete Maps For a discrete map of type

x(t + 1) = g(x(t)),    x* = g(x*),    (2.12)

the stability of a fixpoint x* can be determined by an equivalent linear analysis,

x(t + 1) = g(x(t)) ≈ g(x*) + g'(x*)(x(t) − x*) .

Using the fixpoint condition g(x*) = x* we write the above expression as

Δx(t + 1) = x(t + 1) − x* = g'(x*) Δx(t) ,

which has the solution

Δx(t) = Δx(0) [g'(x*)]^t,    |Δx(t)| = |Δx(0)| e^{λt} .    (2.13)

The Lyapunov exponent,

λ = log|g'(x*)|  { < 0 for |g'(x*)| < 1;  > 0 for |g'(x*)| > 1 } ,    (2.14)

controls the stability of the fixpoint. Note the differences in the relation of the
Lyapunov exponent .λ to the derivatives .f ' (x ∗ ) and .g ' (x ∗ ) for differential equations
and maps respectively, as given by (2.11) and (2.14).
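As a concrete illustration (our own, not from the text), take the logistic map g(x) = r x(1 − x). Its nontrivial fixpoint is x* = 1 − 1/r with g'(x*) = 2 − r, so λ = log|2 − r|, and the fixpoint is stable for 1 < r < 3:

```python
import math

def lyapunov_logistic_fixpoint(r):
    """Lyapunov exponent log|g'(x*)| at x* = 1 - 1/r for g(x) = r x (1 - x)."""
    return math.log(abs(2.0 - r))

def iterate_logistic(r, x0=0.3, steps=200):
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

r = 2.5
print(lyapunov_logistic_fixpoint(r), iterate_logistic(r))  # λ = log 0.5 < 0, x → 0.6
```

For r = 2.5 the iteration indeed converges to x* = 0.6, while for r > 3 the exponent turns positive and the fixpoint repels.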

2.2.1 Fixpoints Classification and Jacobian

We assume that the general d-dimensional dynamical system

dx(t)
. = f(x(t)), x, f ∈ Rd , f(x∗ ) = 0 (2.15)
dt

has a fixpoint .x∗ .

Jacobian and Lyapunov Exponents For a stability analysis of the fixpoint x* one
linearizes (2.15) around the fixpoint, using xi(t) ≈ xi* + δxi(t), with small δxi(t).
One obtains

d δxi/dt = Σj Jij δxj,    Jij = ∂fi(x)/∂xj |_{x=x*} .    (2.16)

The matrix Jij of all possible partial derivatives is the “Jacobian” of the dynamical
system (2.15). The Jacobian allows one to generalize the definition of the Lyapunov
exponent for one-dimensional systems, as given previously.

LYAPUNOV SPECTRUM The set of eigenvalues {λi}, i = 1, . . . , d, of the Jacobian is the
spectrum characterizing the fixpoint x*.

Lyapunov exponents λn = λ'n + iλ''n may have real λ'n and imaginary λ''n
components, which leads to the time evolution

e^{λn t} = e^{λ'n t} e^{iλ''n t}

of infinitesimal perturbations around the fixpoint. A Lyapunov exponent λn is
attracting/neutral/repelling when λ'n is negative/zero/positive respectively.

HYPERBOLIC FIXPOINT The flow is well defined in linear order when all λ'i ≠ 0. In this
case the fixpoint is hyperbolic.

For a non-hyperbolic fixpoint at least one of the Lyapunov exponents is neutral.
All Lyapunov exponents are neutral for a vanishing Jacobian.

Pairwise Conjugate Exponents With λ = λ' + iλ'' also its conjugate λ* = λ' − iλ''
is an eigenvalue of the Jacobian, which is a real matrix. It follows that λ and λ* differ
when λ'' ≠ 0. In this case there are (at least) two eigenvalues having the same real
part λ'.

Fixpoint Classification for .d = 2 In Fig. 2.5 some example trajectories are shown
for several fixpoints in .d = 2 dimensions.

– NODE
Both eigenvalues are real and have the same sign, which is negative/positive for
a stable/unstable node.
– SADDLE
Both eigenvalues are real and have opposite signs.
– FOCUS
The eigenvalues are complex conjugate to each other. The trajectories spiral
in/out for negative/positive real parts.

Fixpoints in higher dimensions are characterized by the number of respective


attracting/neutral/repelling eigenvalues of the Jacobian, which may be in turn either
real or complex.
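In d = 2 the classification can be read off from the trace T and determinant D of the Jacobian, since λ± = (T ± √(T² − 4D))/2. A small helper (our own sketch, not from the text):

```python
import cmath

def classify_2d(J):
    """Classify a 2x2 Jacobian [[a, b], [c, d]] via its eigenvalues."""
    (a, b), (c, d) = J
    T, D = a + d, a * d - b * c          # trace and determinant
    disc = T * T - 4.0 * D
    lam1 = (T + cmath.sqrt(disc)) / 2.0
    lam2 = (T - cmath.sqrt(disc)) / 2.0
    if disc < 0:                          # complex conjugate pair
        kind = "focus"
    elif lam1.real * lam2.real < 0:       # real, opposite signs
        kind = "saddle"
    else:                                 # real, same sign
        kind = "node"
    stable = lam1.real < 0 and lam2.real < 0
    return kind, stable

print(classify_2d([[-1, 0], [0, -2]]))   # → ('node', True)
print(classify_2d([[1, 0], [0, -3]]))    # → ('saddle', False)
print(classify_2d([[1, -2], [2, 1]]))    # → ('focus', False)
```

The three test matrices reproduce the three cases shown in Fig. 2.5; degenerate cases with a vanishing eigenvalue are left out of this sketch.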

Fig. 2.5 Example trajectories for a stable node (left), with a ratio λ2/λ1 = 2, for a saddle (middle)
with λ2/λ1 = −3, and for an unstable focus (right)

Stable and Unstable Manifolds For real eigenvalues λn ≠ 0 of the Jacobian J,
with eigenvectors en and a sign sn = λn/|λn|, we define via

lim_{t→−sn∞} xn(t) = x* + e^{tλn} en,    J en = λn en,    (2.17)

trajectories xn(t) that leave/approach the fixpoint parallel to an eigendirection.
These are stable manifolds (for λn < 0) and unstable manifolds (for λn > 0).

For a neutral Lyapunov exponent with λn = 0 one defines a “center manifold”,
which we will discuss in the context of catastrophe theory in Sect. 2.3.2. The term
manifold denotes in mathematics, loosely speaking, a smooth topological object.
Stable and unstable manifolds control the flow infinitesimally close to the fixpoint
along the eigendirections of the Jacobian and may be continued to all positive and
negative times t. Typical examples are illustrated in Figs. 2.5 and 2.6.

Heteroclinic Orbits One speaks of a “heteroclinic orbit” when the unstable
manifold of one fixpoint connects to the stable manifold of another fixpoint, and
vice versa. As an example, we consider a two-dimensional dynamical system defined
by

ẋ = 1 − x²,    ẏ = yx + ϵ(1 − x²),    J(x*) = [ −2x*  0 ; 0  x* ],    (2.18)

which has two saddles x*± = (x*, 0), where x* = ±1. The eigenvectors of the
Jacobian J(x*) are aligned with the x and the y axis respectively, for all values of
the control parameter ϵ.

The flow diagram, as illustrated in Fig. 2.6, is invariant when inverting both x ↔
(−x) and y ↔ (−y). For a vanishing ϵ = 0 the system additionally contains the
y = 0 axis as a mirror line, and there is a heteroclinic orbit connecting an unstable
manifold of x*− to one of the stable manifolds of x*+. A finite ϵ removes the mirror
line y = 0, present at ϵ = 0, destroying the heteroclinic orbit. Real-world systems
are often devoid of symmetries; heteroclinic orbits are hence rare.

Fig. 2.6 Sample trajectories of the system (2.18), for ϵ = 0 (left) and ϵ = 0.2 (right). Shown
are the stable manifolds (thick green lines), the unstable manifolds (thick blue lines) and the
heteroclinic orbit (thick red line)

2.2.2 Bifurcations and Normal Forms

The nature of the solution to a dynamical system, as defined by a suitable first order
differential equation (2.15), may change qualitatively as a function of a given control
parameter. When this happens, a bifurcation occurs. Here we list the most important
classes of what is called a “local bifurcation”. The distinction to their counterpart,
“global bifurcations”, will be elucidated in Sect. 2.3.
A local bifurcation can be characterized by a simple archetypical equation, the
“normal form”, to which a more complex dynamical system will generically reduce
close to the transition point.

Saddle-Node Bifurcation We consider the dynamical system defined by

dx
. = a − x2 , (2.19)
dt
for a real variable x and a real control parameter a. The fixpoints of ẋ = 0,

x*+ = +√a,    x*− = −√a,    a > 0,    (2.20)

exist only for positive control parameters, a > 0; there are no fixpoints for negative
a < 0. For the flow we find

dx/dt < 0 for x > √a,    dx/dt > 0 for x ∈ ]−√a, √a[,    dx/dt < 0 for x < −√a,    (2.21)

Fig. 2.7 The saddle-node bifurcation, as described by (2.19). There are two fixpoints for a > 0, an unstable branch x₋∗ = −√a and a stable branch x₊∗ = +√a. Left: The phase diagram together with the flow (arrows). Right: The bifurcation potential U(x) = −ax + x³/3, compare (2.26)

when a > 0. The upper branch x₊∗ is hence stable and the lower branch x₋∗ unstable, as illustrated in Fig. 2.7.
For a saddle-node bifurcation a stable and an unstable fixpoint collide and
annihilate each other, one speaks also of a fold, tangential or blue sky bifurcation.
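As a hedged illustration (the code and parameter choices here are ours, not the book's), the normal form (2.19) is easily explored numerically; for a > 0 the flow settles onto the stable branch +√a:

```python
# Hedged sketch (not from the book): forward-Euler integration of the
# saddle-node normal form dx/dt = a - x**2, Eq. (2.19).
import math

def flow(x, a):
    """Right-hand side of the saddle-node normal form (2.19)."""
    return a - x * x

def relax(x0, a, dt=1e-3, steps=50_000):
    """Integrate dx/dt = a - x^2 with a simple forward-Euler scheme."""
    x = x0
    for _ in range(steps):
        x += dt * flow(x, a)
    return x

a = 0.25
x_star = relax(0.0, a)            # settles onto the stable branch
print(x_star, math.sqrt(a))       # both ~0.5
print(-2.0 * math.sqrt(a) < 0)    # f'(x*) = -2x* < 0 on the upper branch
```

For a < 0 the same integrator runs off to −∞, reflecting the absence of fixpoints.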

Transcritical Bifurcation We now consider the dynamical system

dx/dt = x(a − x) ,    (2.22)
again for a real variable x and a real control parameter a. The two fixpoint solutions of ẋ = 0,

x₀∗ = 0 ,   xₐ∗ = a ,   ∀a    (2.23)

exist for all values of the control parameter. The direction of the flow .ẋ is positive for
x in between the two solutions and negative otherwise, see Fig. 2.8. The respective
stabilities of the two fixpoint solutions exchange consequently at .a = 0.

Pitchfork Bifurcation The "supercritical" pitchfork bifurcation is described by

dx/dt = ax − x³ ,   x₀∗ = 0 ,   x₊∗ = +√a ,   x₋∗ = −√a .    (2.24)

The trivial fixpoint x₀∗ = 0 becomes unstable at criticality, a = 0, and two symmetric stable fixpoints appear, see Fig. 2.9.

Bifurcation Potentials In many cases one can write the dynamical system under
consideration as
dx/dt = −dU(x)/dx ,    (2.25)

Fig. 2.8 The transcritical bifurcation, see (2.22). The two fixpoints x₀∗ = 0 and xₐ∗ = a exchange stability at a = 0. Left: Phase diagram and flow (arrows). Right: The bifurcation potential U(x) = −ax²/2 + x³/3, compare (2.26)

where .U (x) is a potential in analogy to Newton’s equation of motion in classical


mechanics. Local minima of the potential correspond to stable fixpoints, compare
Fig. 2.2. The potentials for the saddle-node and the transcritical bifurcation are

Usaddle(x) = −ax + x³/3 ,   Utrans(x) = −(a/2) x² + (1/3) x³ ,    (2.26)
respectively, see the definitions (2.19) and (2.22). The bifurcation potentials, as
shown in Figs. 2.7 and 2.8, bring immediately to evidence the stability of the
respective fixpoints. The bifurcation potential of the pitchfork bifurcation,

Upitch(x) = −(a/2) x² + (1/4) x⁴ ,    (2.27)
is identical to the Landau-Ginzburg potential describing second-order phase transitions in statistical physics.³ Of relevance is also the subcritical pitchfork transition, defined by ẋ = ax + x³ .
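The correspondence between potential minima and stable fixpoints can be made concrete with a small sketch (our own illustrative code; the pitchfork potential with a = 1 is used as an example):

```python
# Hedged sketch: classify the fixpoints of the supercritical pitchfork
# via the curvature of the bifurcation potential U(x) = -a x^2/2 + x^4/4.
import math

def U(x, a):
    """Pitchfork bifurcation potential, compare (2.27)."""
    return -0.5 * a * x * x + 0.25 * x ** 4

def U2(x, a):
    """Second derivative U''(x) = -a + 3 x^2; positive at stable fixpoints."""
    return -a + 3.0 * x * x

a = 1.0
for x in (0.0, math.sqrt(a), -math.sqrt(a)):   # roots of U'(x) = 0
    print(x, "stable" if U2(x, a) > 0 else "unstable")
```

The trivial fixpoint has negative curvature (a local maximum of U), the two symmetric fixpoints positive curvature (local minima), in accordance with Fig. 2.9.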

Bifurcation Symmetries The three bifurcation scenarios discussed above, saddle-node, transcritical and pitchfork, are characterized by their symmetries close to the critical point, which has been set to x = 0 and a = 0 in all three cases. The normal forms and their respective bifurcation potentials constitute the simplest formulations consistent with the defining symmetry properties.

3 The Landau-Ginzburg theory of phase transitions will be treated at length in the context of self-organized criticality, in Sect. 6.1 of Chap. 6.

Fig. 2.9 The supercritical pitchfork bifurcation, as defined by (2.24). The trivial fixpoint x₀∗ = 0 becomes unstable for a > 0 and two new stable fixpoints, x₊∗ = +√a and x₋∗ = −√a, appear. Left: Phase diagram and flow (arrows). Right: The bifurcation potential U(x) = −ax²/2 + x⁴/4, compare (2.27)

The bifurcation potentials of the saddle-node and the pitchfork transitions are
respectively antisymmetric and symmetric under a sign change .x ↔ −x of the
dynamical variable, compare (2.26) and (2.27).

(+) ↔ (−)    saddle-node    transcritical    pitchfork
 x            anti           –                symm
 a, x         –              anti             –

The bifurcation potential of the transcritical bifurcation is, on the other hand,
antisymmetric under the combined symmetry operation .x ↔ −x and .a ↔ −a,
compare (2.26).

2.2.3 Hopf Bifurcations and Limit Cycles

Hopf Bifurcation A Hopf bifurcation occurs when a fixpoint changes its stability together with the appearance of an either stable or unstable limit cycle, e.g. as for the non-linear rotator illustrated in Fig. 2.1. The canonical equations of motion are⁴

ẋ = −y + d(Γ − x 2 − y 2 ) x
. (2.28)
ẏ = x + d(Γ − x 2 − y 2 ) y

4 In the complex plane, with z = x + iy, Eq. (2.28) takes the form of a Stuart–Landau oscillator, ż = iωz + d(Γ − |z|²)z, with ω = 1.

in Euclidean phase space .(x, y) = (r cos ϕ, r sin ϕ). The standard non-linear rotator
(2.2) is recovered when setting .d = 1. There are two steady-state solutions for
.Γ > 0,


(x₀∗, y₀∗) = (0, 0) ,   (xΓ∗, yΓ∗) = √Γ (cos(t), sin(t)) ,    (2.29)

a fixpoint and a limit cycle. The limit cycle disappears for .Γ < 0.

Super- vs. Sub-critical Hopf Bifurcation For d > 0 the bifurcation is “supercrit-
ical”. The fixpoint x∗0 = (x0∗ , y0∗ ) is stable/unstable for Γ < 0 and Γ > 0 and the
limit cycle x∗Γ = (xΓ∗ , yΓ∗ ) is stable, as illustrated in Fig. 2.1.

The direction of flow is reversed for d < 0, with the limit cycle x∗Γ becoming
repelling. The fixpoint x∗0 is then unstable/stable for Γ < 0 and Γ > 0 and one
speaks of a “subcritical” Hopf bifurcation.
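A quick numerical check (our own sketch; Euler steps and parameters are illustrative) confirms that for d > 0 and Γ > 0 an orbit of (2.28) is attracted to the limit cycle of radius √Γ:

```python
# Hedged sketch: Euler integration of the Hopf normal form (2.28), d = 1.
import math

def step(x, y, gamma, d=1.0, dt=1e-3):
    """One forward-Euler step of Eq. (2.28)."""
    r2 = x * x + y * y
    return (x + dt * (-y + d * (gamma - r2) * x),
            y + dt * (x + d * (gamma - r2) * y))

gamma = 0.5
x, y = 0.1, 0.0                  # start well inside the limit cycle
for _ in range(100_000):         # integrate up to t = 100
    x, y = step(x, y, gamma)
radius = math.hypot(x, y)
print(radius, math.sqrt(gamma))  # both ~0.707
```

Repeating the run with d = −1 would instead let the orbit spiral away from the now repelling cycle, the subcritical case.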

Hopf Bifurcation Theorem One may be interested to find out whether a generic
two dimensional system,

ẋ = fμ (x, y),
. ẏ = gμ (x, y) , (2.30)

can be reduced to the normal form (2.28) for a Hopf bifurcation, where μ is the
bifurcation parameter. Without loss of generality one can assume that the fixpoint
x∗0 stays at the origin for all values of μ and that the transition takes place for μ = 0.

To linear order the normal form (2.28) and (2.30) are equivalent if the Jacobian of (2.30) has a pair of complex conjugate eigenvalues, with the real part crossing zero at μ = 0 with a finite slope, which corresponds to a transition from a stable to an unstable focus.
Comparing (2.28) and (2.30) to quadratic order one notices that quadratic terms
are absent in the normal form (2.28) but not in (2.30). One can however show,
with the help of a suitable non-linear transformation, that it is possible to eliminate
all quadratic terms from (2.30). One finds that the nature of the bifurcation is
determined by a combination of partial derivatives up to cubic order,
 
a = ( fxxx + fxyy + gxxy + gyyy )/16
  + ( fxy (fxx + fyy) − gxy (gxx + gyy) − fxx gxx + fyy gyy )/(16ω) ,

where ω > 0 is the imaginary part of the Lyapunov exponent at the critical point μ = 0 and where partial derivatives such as fxy are to be evaluated at μ = 0 and x → x∗₀. The Hopf bifurcation is supercritical and subcritical respectively for a < 0 and a > 0.

Fig. 2.10 Left: For a fold bifurcation of limit cycles, the locations R± = √(1 ± √μ) of the unstable and stable limit cycles, R₋ and R₊, as defined by (2.31) and (2.32). At μ = 0 a fold bifurcation of limit cycles occurs and a subcritical Hopf bifurcation at μ = 1. Right: The respective flow diagram. The filled/open circles denote stable/unstable limit cycles. For μ → 1 the unstable limit cycle vanishes, a subcritical Hopf bifurcation. The stable and the unstable limit cycle collide for positive μ → 0 and annihilate each other, a fold bifurcation of limit cycles

Interplay Between Multiple Limit Cycles A dynamical system may dispose of


a number of distinct limit cycles, which can merge or disappear as a function of a
given parameter. We consider the simplest case, generalizing the non-linear rotator
(2.2) to the next order in r 2 ,

ṙ = −(r² − γ₋)(r² − γ₊) r ,   ϕ̇ = ω ,   γ₋ ≤ γ₊ .    (2.31)

Real-world physical or biological systems have bounded trajectories, which implies


that ṙ must be negative for large radii r → ∞. This requirement has been taken into
account in (2.31), the “Bautin normal form”.

For γ₋ < 0 the first factor (r² − γ₋) is strictly positive for r² ≥ 0 and does not influence the dynamics qualitatively. In this case (2.31) reduces to a standard supercritical Hopf bifurcation, as a function of the bifurcation parameter γ₊.

Phenomenological Parametrization The roots γ± of r 2 in (2.31) typically result


from some determining relation. As a possible simple assumption we consider the
form

γ± = 1 ± √μ ,    (2.32)

where μ will be our bifurcation parameter. For μ ∈ [0, 1] we have two positive roots
and consequently also two limit cycles, a stable and an unstable one. For μ → 1 the
unstable limit cycle vanishes in a subcritical Hopf bifurcation, compare Fig. 2.10.

Fold Bifurcation of Limit Cycles For a saddle-node bifurcation of fixpoints, aka


fold bifurcation, a stable and an unstable fixpoint merge and annihilate each other, as
illustrated in Fig. 2.7. The equivalent phenomenon occurs for limit cycles, as shown

in Fig. 2.10, happening in our model when μ becomes negative,


  
γ± = 1 ± i √|μ| ,   ṙ = −( (r² − 1)² + |μ| ) r .

No limit cycle exists for μ ≤ 0, only a stable fixpoint at r0∗ = 0 remains.
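The stability assignment of the two cycles follows directly from the sign of the radial flow on either side of R± = √(1 ± √μ). A minimal sketch (our own code, μ = 0.25 chosen for illustration):

```python
# Hedged sketch: radial flow of the Bautin normal form (2.31),
# with the phenomenological roots gamma_pm = 1 +- sqrt(mu) of (2.32).
import math

def rdot(r, mu):
    """Radial part of Eq. (2.31)."""
    gm, gp = 1.0 - math.sqrt(mu), 1.0 + math.sqrt(mu)
    return -(r * r - gm) * (r * r - gp) * r

mu, eps = 0.25, 1e-4
R_minus = math.sqrt(1.0 - math.sqrt(mu))
R_plus = math.sqrt(1.0 + math.sqrt(mu))
# flow points towards R+ from both sides: stable limit cycle
print(rdot(R_plus - eps, mu) > 0, rdot(R_plus + eps, mu) < 0)
# flow points away from R- on both sides: unstable limit cycle
print(rdot(R_minus - eps, mu) < 0, rdot(R_minus + eps, mu) > 0)
```

Letting μ → 0 in the sketch, the two radii merge at r = 1, the fold bifurcation of limit cycles of Fig. 2.10.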

2.3 Global Bifurcations

The bifurcations discussed in Sect. 2.2 are termed “local” as they are based on
Taylor expansions around a local fixpoint, which implies that the dynamical state
changes smoothly at the bifurcation point. There are, on the other hand, bifurcations
characterized by the properties of extended orbits. These kinds of bifurcations are
hence of “global” character.

Takens–Bogdanov System We consider a mechanical system with a cubic potential V(x) and velocity-dependent forces,

ẍ = (x − μ)ẋ − V′(x) ,   V(x) = x³/3 − x²/2 ,
ẋ = y ,   ẏ = (x − μ)y + x(1 − x) .    (2.33)

The conservative contribution to the force is .−V ' (x) = x(1 − x), as illustrated in
Fig. 2.11. The energy, defined by

E = ẋ²/2 + V(x) ,   dE/dt = ( ẍ + V′(x) ) ẋ = (x − μ) ẋ² ,    (2.34)

is dissipated when x < μ, viz when the term (x − μ)ẋ in (2.33) reduces the velocity. The energy increases however for x > μ, which means that the term (x − μ)ẋ induces an accelerated movement. This interplay between energy dissipation and uptake is typical for adaptive systems.⁵

Fixpoints and Jacobian The Takens–Bogdanov system (2.33) has two fixpoints (x∗, 0), with x∗ = 0, 1. The eigenvalues of the Jacobian,

J = [ 0 , 1 ; 1 − 2x∗ , x∗ − μ ] ,
λ±(0, 0) = −μ/2 ± √( μ²/4 + 1 ) ,
λ±(1, 0) = (1 − μ)/2 ± √( (1 − μ)²/4 − 1 ) ,

show that (0, 0) is always a saddle, a consequence of the quadratic maximum of the mechanical potential V(x) at x = 0.

5 By definition, adaptation implies that a system may both dissipate energy and increase its own reservoir, as discussed further in Sect. 3.2 of Chap. 3.


Fig. 2.11 Left: The potential V(x) of the Takens–Bogdanov system (2.33). Energy is dissipated to the environment for x < μ and taken up for x > μ. For μ < 1 the local minimum x = 1 of the potential becomes an unstable focus. Right: The flow for the critical μ = μc ≈ 0.8645. The stable and unstable manifolds form a homoclinic loop (red line)

The local minimum (1, 0) of the potential is a stable/unstable focus for μ > 1 and μ < 1 respectively, with μ = 1 being the locus of a supercritical Hopf bifurcation. We consider now μ ∈ [0, 1] and examine the further evolution of the resulting limit cycle.

Escaping the Potential Well We consider a particle starting with a vanishing velocity close to x = 1, the local minimum of the potential well.
When the particle takes up enough energy from the environment, due to the velocity-dependent force (x − μ)v, it may be able to reach the local maximum at x = 0 and escape to x → −∞. This is the case when μ < μc ≈ 0.8645, viz when the region of energy uptake has expanded sufficiently far; all orbits then escape.
The particle remains trapped in the local potential well if dissipation dominates, which is the case for μ > μc. The particle is both trapped in the local well and repelled, at the same time, from the unstable minimum at x = 1, when μc < μ < 1. The orbit is forced in this case to perform an oscillatory motion around x = 1, which is equivalent to a stable limit cycle. This limit cycle increases in size for decreasing μ, exactly touching (0, 0) for μ = μc, and breaking apart beyond, when μ < μc.

The bifurcation occurring at μ = μc depends non-locally on the overall energy balance, as illustrated in Fig. 2.12. The "homoclinic transition" taking place in the Takens–Bogdanov system is therefore an example of a global bifurcation.

Homoclinic Bifurcation With a "homocline" one denotes a loop formed by joining a stable and an unstable manifold of the same fixpoint. Homoclines may generically only occur if either forced by symmetries or for special values of bifurcation parameters, with the latter being the case for the Takens–Bogdanov system, which undergoes a homoclinic bifurcation.

An unstable and a stable manifold cross at .μ = μc , compare Figs. 2.11 and 2.12,
forming a homocline. The homocline is also the endpoint of the limit cycle present
for .μ > μc , which disappears for .μ < μc . The limit cycle is hence destroyed at the


Fig. 2.12 The flow for the Takens–Bogdanov system (2.33). Left: The flow in the subcritical region, for μ = 0.9 > μc ≈ 0.8645, together with the resulting limit cycle (thick black line). For μ → μc the unstable (red) and stable (orange) manifolds join to form a homoclinic loop, which is identical to the locus of the limit cycle for μ → μc. Right: The flow in the supercritical region, for μ = 0.8 < μc. The limit cycle has broken up after touching the saddle at (0, 0)

point of its maximal size for a homoclinic bifurcation, and not when vanishing, as
for a supercritical Hopf bifurcation.

2.3.1 Infinite Period Bifurcation

For a further example of how a limit cycle may disappear discontinuously we consider two coupled oscillators characterized by their individual phases, θ₁ and θ₂ respectively. A typical evolution equation for the phase difference ϕ = θ₁ − θ₂ is⁶

ϕ̇ = 1 − K cos(ϕ) .    (2.35)

We can interpret (2.35) via

ṙ = (1 − r²) r ,   x = r cos(ϕ) ,   y = r sin(ϕ)    (2.36)

within the context of a two-dimensional limit cycle, compare (2.2).


The phase difference ϕ continuously increases for |K| < 1, with the system settling into a limit cycle. For |K| > 1 two fixpoints with ϕ̇ = 0 appear, a saddle and a stable node, as illustrated in Fig. 2.13. At this point, which corresponds to a saddle-node bifurcation on an invariant cycle, the limit cycle breaks up.

6 This formulation parallels the Kuramoto model, which is treated in detail in Sect. 9.1 of Chap. 9.

Fig. 2.13 The flow for a system showing an infinite period bifurcation, as defined by (2.35) and (2.36). Here K = 1.1. The stable fixpoint (filled circle) merges with the saddle (open circle) in the limit K → 1, which leads to a saddle-node bifurcation on an invariant cycle

Infinite Period Bifurcation The limit cycle for |K| < 1 has a revolution period T of

T = ∫₀ᵀ dt = ∫₀²π (dt/dϕ) dϕ = ∫₀²π dϕ/ϕ̇ = ∫₀²π dϕ/( 1 − K cos(ϕ) ) = 2π/√(1 − K²) ,

which diverges in the limit |K| → 1. The transition occurring at |K| = 1 is hence an "infinite period bifurcation", being characterized by a diverging time scale.
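The closed-form period can be cross-checked by direct quadrature; the following sketch (trapezoidal rule, our own code) agrees with 2π/√(1 − K²) to high accuracy and shows the divergence as |K| → 1:

```python
# Hedged sketch: revolution period of (2.35) by numerical quadrature.
import math

def period(K, n=10_000):
    """Trapezoidal rule for T = int_0^{2 pi} dphi / (1 - K cos(phi));
    the integrand is periodic, so all nodes carry equal weight."""
    h = 2.0 * math.pi / n
    return h * sum(1.0 / (1.0 - K * math.cos(i * h)) for i in range(n))

K = 0.9
print(period(K), 2.0 * math.pi / math.sqrt(1.0 - K * K))  # ~14.41 each
print(period(0.999) > period(0.9))    # the period grows as |K| -> 1
```

The trapezoidal rule converges spectrally here, since the integrand is smooth and periodic.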

2.3.2 Catastrophe Theory

A catchy terminology for potentially discontinuous bifurcations in dynamical


systems is “catastrophe theory”, especially when placing emphasis on a geometric
interpretation. Catastrophe theory is interested in bifurcations with “codimension”
two or higher.

CODIMENSION The degrees of freedom characterizing a bifurcation diagram.

The codimension corresponds, colloquially speaking, to the number of parame-


ters one may vary such that something interesting happens. The bifurcation normal
forms discussed in Sect. 2.2.2 have a codimension of one.

Symmetry Broken Pitchfork Bifurcation We consider the one-dimensional system

ẋ = h + ax − x³ .    (2.37)

For .h = 0 the system reduces to the pitchfork normal form of (2.24), becoming
invariant under the parity transformation .x ↔ −x.

Parity is broken whenever h ≠ 0. Equation (2.37) can hence be used to study the influence of symmetry breaking on a bifurcation diagram. There are two free parameters, h and a; the codimension is two.


Fig. 2.14 Left: The self-consistency condition x³ = h + ax for the fixpoints x∗ of the symmetry broken pitchfork system (2.37), for various fields h and a positive a > 0. The unstable fixpoint at x = 0 would become stable for a < 0, compare Fig. 2.9. Right: The hysteresis loop (1) → (2) → (3) → (4) → (1) occurring for a > 0 as a function of the field h

Thermodynamic Phases The generalized pitchfork system (2.37) has a close


relation to the theory of thermodynamic phase transitions, when assuming that

a = a(T ) = a0 (Tc − T ),
. a0 > 0 ,

where T is the temperature. In the absence of an external field h, only the trivial
fixpoint .x ∗ = 0 exists for .T > Tc , viz for temperatures above the critical
temperature .Tc . In the ordered state, for .T < Tc , there are two possible phases,
characterized by the positive and negative stable fixpoints .x ∗ .

Hysteresis and Memory The behavior of the phase transition changes when an
external field h is present. Switching the sign of the field is accompanied, in the
ordered state for T < Tc, by a "hysteresis loop"

(1) → (2) → (3) → (4) → . . . ,

as illustrated in Fig. 2.14.

– The field h changes from negative to positive values along .(1) → (2), with the
fixpoint .x ∗ remaining negative.
– At (2) the negative stable fixpoint disappears and the system makes a rapid
transition to (3), the catastrophe.
– Lowering eventually again the field h, the system moves to (4), jumping in the
end back to (1).

The system retains its state, x∗ being positive or negative, to a certain extent, and one speaks of a memory in the context of catastrophe theory.
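The hysteresis loop can be traced numerically by sweeping the field h adiabatically and relaxing x at every step. A sketch (our own code; a = 1, sweep resolution and relaxation times are illustrative):

```python
# Hedged sketch: adiabatic field sweep for x' = h + a x - x^3, Eq. (2.37).
def relax(x, h, a=1.0, dt=1e-2, steps=2_000):
    """Let the fast variable x settle while the slow field h is held fixed."""
    for _ in range(steps):
        x += dt * (h + a * x - x ** 3)
    return x

n = 201
hs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]   # h from -1 to +1
x, up, down = -1.0, {}, {}
for h in hs:             # upward sweep: x follows the lower branch
    x = relax(x, h)
    up[h] = x
for h in reversed(hs):   # downward sweep: x follows the upper branch
    x = relax(x, h)
    down[h] = x
print(up[0.0], down[0.0])   # distinct states at the same field: memory
```

The jumps between the branches occur at the catastrophic fields h ≈ ±2(a/3)^(3/2), compare (2.41) below.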

Center Manifold A d-dimensional dynamical system with a fixpoint x∗ and a Jacobian J,

ẋ = f(x) ,   Jij = ∂fi/∂xj |_{x=x∗} ,   J eₙ = λₙ eₙ ,

may have a number of neutral Lyapunov exponents, with vanishing eigenvalues λᵢ = 0 of the Jacobian.

CENTER MANIFOLD The space spanned by the neutral eigenvectors .ei is the center
manifold.

Catastrophe theory deals with fixpoints having a non-vanishing center manifold.

Center Manifold of the Pitchfork System The Lyapunov exponent

λ = a − 3 (x∗)²

of the pitchfork system (2.37) vanishes at the jump-points (2) and (4) at which the catastrophe occurs, compare Fig. 2.14. At the jump-points h + ax is by definition tangent to x³, with identical slopes,

(d/dx) x³ = (d/dx) (h + ax) ,   3x² = a .    (2.38)
At these transition points the autonomous flow becomes hence infinitesimally slow,
since .λ → 0, a phenomenon called “critical slowing down” in the context of the
theory of thermodynamic phase transitions.

Center Manifold Normal Forms The program of the catastrophe theory consists
of finding and classifying the normal forms for the center manifolds of stationary
points .x∗ , by expanding to the next, non-vanishing order in .δx = x − x∗ . The
aim is hence to classify the types of dynamical behavior potentially leading to
discontinuous transitions.

Catastrophic Fixpoints A generic fixpoint x∗ = x∗(c) may depend on control parameters c = (c₁, .., c_S) of the equations of motion, with S being the codimension. The flow is however smooth around a generic fixpoint and a finite center manifold arises only for certain sets {cᵢᶜ} of control parameters.
The controlling parameters of the pitchfork system (2.37) are h and a, in our example system, and a center manifold exists only when (2.38) is fulfilled, viz when

3 ( x∗(hc, ac) )² = ac ,   x∗(hc, ac) = xc∗

holds, which determines the set of "catastrophic fixpoints" xc∗.



Changing the Controlling Parameters How does the location x∗ of a fixpoint change upon variations δc around the set of parameters cc determining the catastrophic fixpoint x∗c? With

x∗c = x∗(cc) ,   x∗ = x∗(c) ,   δc = c − cc

we expand the fixpoint condition f(x, c) = 0 and obtain

J δx∗ + P δc = 0 ,   Jij = ∂fi/∂xj ,   Pij = ∂fi/∂cj ,    (2.39)

which we can invert if the Jacobian J is non-singular,

δx∗ = −J⁻¹ P δc ,   if |J| ≠ 0 .    (2.40)

The fixpoint may change however in a discontinuous fashion whenever the determinant |J| of the Jacobian vanishes, viz in the presence of a center manifold. This is precisely what happens at a catastrophic fixpoint x∗c(cc).

Catastrophic Manifold The set x∗c = x∗c(cc) of catastrophic fixpoints is determined by two conditions, by the fixpoint condition f = 0 and by |J| = 0. For the pitchfork system (2.37) we find,

ac = 3 (xc∗)² ,   hc = (xc∗)³ − ac xc∗ = −2 (xc∗)³ ,

when using (2.38). With (xc∗)² = ac/3 we can eliminate xc∗ and obtain

hc = −2 (ac/3)^(3/2) ,   (hc/2)² = (ac/3)³ ,    (2.41)

which determines the “catastrophic manifold” .(ac , hc ) for the pitchfork transition,
as illustrated in Fig. 2.15.
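The cusp relation (2.41) holds identically along the catastrophic manifold; a minimal numerical check (our own sketch, x_c = 0.7 picked arbitrarily):

```python
# Hedged sketch: any x_c generates a catastrophic point (a_c, h_c)
# where the flow (2.37) and its derivative vanish simultaneously.
x_c = 0.7
a_c = 3.0 * x_c ** 2        # center-manifold condition, a = 3 x^2
h_c = -2.0 * x_c ** 3       # fixpoint condition, h = x^3 - a x

flow = h_c + a_c * x_c - x_c ** 3           # f(x_c) = 0
slope = a_c - 3.0 * x_c ** 2                # f'(x_c) = 0
cusp = (h_c / 2.0) ** 2 - (a_c / 3.0) ** 3  # cusp relation (2.41)
print(flow, slope, cusp)                    # all vanish up to rounding
```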

Classification of Perturbations The control parameters .(ac , hc ) of the pitchfork


transition may be changed in two qualitatively distinct ways, namely along the
catastrophic manifold (2.41), or perpendicular to it.
It would hence be useful to dispose of a list of canonical perturbations characterizing all possible distinct routes to change catastrophic fixpoints xc∗ = x∗(cc) qualitatively, upon small changes δc = c − cc of the control parameters c. It is the aim of catastrophe theory to develop such a canonical classification scheme for the perturbations of center manifolds.

Gradient Dynamical Systems At times the flow .f(x) of a dynamical system may
be represented as a gradient of a bifurcation potential .U (x),

ẋ = f(x) = −∇U(x) .    (2.42)

Fig. 2.15 The fixpoints x∗ (upper folded plane) of the symmetry broken pitchfork system (2.37), as a function of the control parameters a and h. The catastrophic manifold (ac, hc), compare (2.41), has a cusp-like form (green lines)

This is generically the case for one-dimensional systems, as discussed in Sect. 2.2.2,
but otherwise not. For the gradient representation

ẋ = g(x, y) ,   g = −Ux(x, y) ,
ẏ = h(x, y) ,   h = −Uy(x, y) ,

of a two-dimensional system to be valid, to give an example, the cross-derivatives gy = −Uxy and hx = −Uyx would need to coincide. This is however normally not the case.

For gradient dynamical systems one needs to discuss only the properties of the bifurcation potential U(x), a scalar quantity; they are hence somewhat easier to investigate than a generic dynamical system of the form ẋ = f(x). Catastrophe theory is mostly limited to gradient systems.

2.3.3 Rate Induced Tipping

For the bifurcation scenarios studied hitherto, we took as a precondition that


controlling parameters changed slowly, on time scales substantially exceeding that
of the primary dynamics. This is the “adiabatic limit”.

Bifurcation vs. Tipping A rapid transfer between distinct stable manifolds may
occur when systems reach a bifurcation point upon adiabatic parameter changes, as
illustrated in Fig. 2.14. Systems undergoing self-accelerating instabilities are said
“to tip”, which is a somewhat broader terminology. Instead of a description in terms


Fig. 2.16 Rate-induced tipping. Left: The fixpoints x∗ = a/b are stable for the individual flows (2.43), here for a = 5 and b = −5. The stationary points shift to ±√(5² − 1) ≈ ±4.9 when the two flows are superimposed, as given by (2.45). Right: The orbits x(t) under the influence of a forcing ȧ = r. For small rates the system manages to adapt, as for r = 0.4, for larger rates not. For r = 0.5/0.6/1.2 the system tips towards the alternative steady state

of an abstract bifurcation, ‘tipping’ emphasizes the role of positive feedback effects.


This view is particularly important in climatology and ecology.

Rate Induced Tipping Individual components of a dynamical system may evolve on vastly different time scales. In climatology, a widely discussed slow component is the Greenland ice sheet, which takes millennia to grow or melt. In ecology, adaptation to changing environmental conditions via Darwinian evolution is likewise a slow process, extending over geological periods.⁷ Systems with slow components may show a reduced resilience against shocks in terms of rapid parameter changes, as typical for climatic and ecological systems facing anthropogenic forcing. When the rate at which control parameters change matters, "rate-induced tipping" may occur.

Competing Fixpoints Close to an attracting state the flow is linear. At larger distances z, the magnitude of the flow f = f(z) may however fade away, f.i. as described by

ẋ = f(x − a) ,   f(z) = −z/(1 + z²) .    (2.43)

In this example the flow decays inversely with the distance to the fixpoint .x ∗ = a,
as illustrated in Fig. 2.16, after being maximal in magnitude at .x = a ± 1. For two
attracting states the dynamics takes the form

ẋ = f (x − a) + f (x − b) ,
. (2.44)

7 See Chap. 8.

which makes the system bistable. The two flows compete with each other, which
shifts the two stable fixpoints,
   
(x − a)( 1 + (x − b)² ) = −(x − b)( 1 + (x − a)² ) ,

as determined by ẋ = 0, or

(x − a)(x − b) = −1 ,   x = ±√(a² − 1) ,    (2.45)

where we specialized in the last step to b = −a. The two stable fixpoints are separated by an unstable fixpoint at x = 0. Equation (2.44) describes a situation in which the influence of attracting states decays in phase space, which is typical for dynamical systems composed of interacting real-world entities. An example would be physical objects, whose sphere of influence is often limited to the immediate surroundings.

Adaptation Catastrophe In (2.44) the dynamical variable x = x(t) reflects the state of the system. In the following we take x ≈ a to correspond to a healthy system, say a pristine ecosystem, and states x ≈ b to a disrupted system. External impacts may change the control parameter a = a(t), e.g. with a constant rate r,

ȧ = r .

The orbit cannot follow when a moves to the right too fast, which happens when ẋ < ȧ = r. Given that f(x − a) is bounded, this necessarily happens with increasing r. At this point the system fails to adapt to the changing environment, with the orbit reverting to a disrupted state close to b. Rate-induced tipping occurs in the form of an adaptation catastrophe. Extended transients are observed close to the critical rate r, as shown in Fig. 2.16.
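The adaptation catastrophe is easy to reproduce numerically. The sketch below (our own forward-Euler code; rates, times and thresholds are illustrative) drives a(t) = 5 + rt and checks whether the state keeps tracking the moving attractor:

```python
# Hedged sketch: rate-induced tipping in the bistable system (2.44),
# with a(t) moving at rate r and b = -5 held fixed.
def f(z):
    """Localized attraction of Eq. (2.43)."""
    return -z / (1.0 + z * z)

def evolve(r, t_max=100.0, dt=0.01):
    x, a, b = 4.9, 5.0, -5.0     # start on the healthy branch near a
    t = 0.0
    while t < t_max:
        x += dt * (f(x - a) + f(x - b))
        a += dt * r              # external forcing, da/dt = r
        t += dt
    return x, a

x_slow, a_slow = evolve(r=0.1)   # tracks: small, steady lag a - x
x_fast, a_fast = evolve(r=1.2)   # tips: x reverts towards b = -5
print(a_slow - x_slow, x_fast)
```

Since |f| ≤ 1/2 for this flow, rates well above that bound must lead to tipping, in line with the runs shown in Fig. 2.16.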

2.4 Logistic Map and Deterministic Chaos

The notion of “chaos” plays an important role in dynamical systems theory. A


chaotic system is defined as a system that cannot be predicted within a given
numerical accuracy. At first sight this seems to be a surprising concept, since
differential equations of the type of (2.15), which do not contain any noise or
randomness, are perfectly deterministic. Once the starting point is known, the
resulting trajectory can be calculated for all times. Chaotic behavior can arise
nevertheless, due to an exponential sensitivity to initial conditions.

DETERMINISTIC CHAOS A deterministic dynamical system is chaotic when it shows exponential sensitivity with respect to initial conditions.


Fig. 2.17 The average accuracy of weather forecasting, normalized to .[0, 1], decreases rapidly
with increasing prediction timespan, due to the chaotic nature of weather dynamics. Increasing
the resources devoted for improving the prediction accuracy results in decreasing returns close to
the resulting complexity barrier. Reprinted from Gros (2012) under CC-BY-4.0 license, © 2012
Complex Systems Publications, Inc.

This means that a very small change in the initial setting may blow up after even a short time. For real-world applications, where models need to be determined from measurements containing inherent errors and limited accuracies, an exponential sensitivity can result in unpredictability. A well known example is the problem of long-term weather prediction, as shown in Fig. 2.17.

Logistic Map One of the most cherished models in the field of deterministic chaos
is the logistic map of the interval .[0, 1] onto itself,

.xn+1 = g(xn ) ≡ r xn (1 − xn ), xn ∈ [0, 1], r ∈ [0, 4] , (2.46)

where we made use of the notation .xn = x(n), for discrete times .n = 0, 1, .. . The
functional dependence is illustrated in Fig. 2.18. Despite its apparent simplicity, the
logistic map shows an infinite series of bifurcations that culminate in a transition to
chaos.

Biological Interpretation We may consider .xn ∈ [0, 1] as standing for the


population density of a reproducing species in the year n. In this case the factor
.r(1 − xn ) ∈ [0, 4] corresponds to the number of offspring per year and animal,

which is limited for high population densities .x → 1, viz when resources become
scarce. The classical example is that of a herd of reindeer on an island.
Knowing the population density .xn in a given year n we may predict via (2.46)
the population density for all subsequent years exactly; the system is deterministic.

Fig. 2.18 The logistic map g(x) = rx(1 − x) (red), and the iterated logistic map g(g(x)) (green); for r = 2.5 (left) and r = 3.3 (right). Also shown are iterations of g(x), starting from x = 0.1 (thin solid line). Note that the fixpoint g(x) = x is stable/unstable for r = 2.5 and r = 3.3, respectively. The orbit is attracted to a fixpoint of g(g(x)) for r = 3.3, corresponding to a cycle of period two for g(x)

Nevertheless the population shows irregular behavior for certain values of r, which
is hence chaotic.

Fixpoints of the Logistic Map We start with the fixpoints of .g(x),

x = rx(1 − x)   ⇐⇒   x = 0  or  1 = r(1 − x) .

The non-trivial fixpoint is

1/r = 1 − x ,   x^(1) = 1 − 1/r ,   r₁ < r ,   r₁ = 1 .    (2.47)

It is present for r₁ < r, with r₁ = 1, due to the restriction x^(1) ∈ [0, 1].

Stability Analysis For maps, fixpoints are stable when the derivative .g ' of the flow
is smaller than one in magnitude, see (2.14). For .x (1) = 1 − 1/r this translates to
 
g′(x) = r(1 − 2x) ,   g′(x^(1)) = 2 − r .    (2.48)

Noting that r ∈ [1, 4], we obtain

|2 − r| < 1   ⇐⇒   r₁ < r < r₂ ,   with r₁ = 1, r₂ = 3 ,    (2.49)

for the region of stability of x^(1).
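A direct iteration illustrates the stability window (2.49); our minimal sketch with r = 2.5:

```python
# Hedged sketch: iterating the logistic map (2.46) inside the window
# r1 < r < r2, where the orbit converges to x^(1) = 1 - 1/r.
def g(x, r):
    return r * x * (1.0 - x)

r, x = 2.5, 0.1
for _ in range(1_000):
    x = g(x, r)
print(x, 1.0 - 1.0 / r)   # both ~0.6
```

The geometric convergence rate is |g′(x^(1))| = |2 − r| = 0.5 per iteration, compare (2.48).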



Fixpoints of Period Two For .r > 3 a fixpoint of period two appears. This fixpoint
is a fixpoint of the iterated function

g(g(x)) = r g(x)( 1 − g(x) ) = r² x (1 − x)( 1 − rx(1 − x) ) .

The fixpoint equation .x = g(g(x)) leads to the cubic equation

1 = r²(1 − rx + rx²) − r² x (1 − rx + rx²) ,
0 = r³x³ − 2r³x² + (r³ + r²) x + 1 − r² .    (2.50)

In order to find the roots of (2.50) we use the fact that x = x^(1) = 1 − 1/r is a stationary point of both g(x) and g(g(x)), see Fig. 2.18. Dividing (2.50) by the root (x − x^(1)) = (x − 1 + 1/r) one obtains

( r³x³ − 2r³x² + (r³ + r²)x + 1 − r² ) : (x − 1 + 1/r) = r³x² − (r³ + r²)x + (r² + r) .

The two fixpoints of g(g(x)) are therefore the roots of

x² − (1 + 1/r) x + (1/r + 1/r²) = 0 ,

which leads to

x±^(2) = (1/2)(1 + 1/r) ± √( (1/4)(1 + 1/r)² − (1/r + 1/r²) ) .    (2.51)

Period Doubling Bifurcation We have three fixpoints of period two for .r > 3
(two stable ones and one unstable), and only a single fixpoint for .r < 3. What
happens at .r = 3?

x_±^{(2)}(r = 3) = (1/2)(4/3) ± √[ (1/4)(4/3)^2 − 4/9 ] = 2/3 = 1 − 1/3 = x^{(1)}(r = 3) .
At .r = 3 the fixpoint splits into two stable and one unstable branch, see Fig. 2.19,
akin to the pitchfork bifurcations discussed in Sect. 2.2. Given that a period doubling
occurs at .r = 3, the transition can be seen alternatively as the discrete analog of a
supercritical Hopf bifurcation.
2.4 Logistic Map and Deterministic Chaos 75


Fig. 2.19 Left: The values x_n for the iterated logistic map (2.46). For r < r_∞ ≈ 3.57 the x_n go through cycles with finite but progressively longer periods. For r > r_∞ the plot would be fully covered in most regions if all x_n were shown. Right: The corresponding maximal Lyapunov exponents, as defined by (2.55). Positive Lyapunov exponents λ indicate chaotic behavior

Bifurcation Cascade One may carry out a stability analysis for x_±^{(2)}, just as for x^{(1)}. One finds a critical value r_3 > r_2 such that

x_±^{(2)}(r) stable   ⇐⇒   r_2 < r < r_3 .   (2.52)

Going further on one finds an r_4 such that there are four fixpoints of period four, that is of g(g(g(g(x)))), for r_3 < r < r_4. In general there are critical values r_n and r_{n+1} such that there are

2^{n−1} fixpoints x^{(n)} of period 2^{n−1}   ⇐⇒   r_n < r < r_{n+1} .

The logistic map therefore shows iterated bifurcations. This, however, is not yet
chaotic behavior.

Chaos in the Logistic Map There exists a critical r_∞ at which the period-doubling sequence converges,

lim_{n→∞} r_n = r_∞ ,   r_∞ = 3.5699456 . . .

No stable fixpoint of even period exists in the region

r_∞ < r < 4 .

In order to characterize the sensitivity of (2.46) with respect to the initial conditions, we consider two slightly different starting points x_1 and x_1':

x_1 − x_1' = y_1 ,   |y_1| ≪ 1 .

The key question is then whether the difference

y_m = x_m − x_m' ,   y_{m+1} ≈ (dg/dx)|_{x=x_m} y_m   (2.53)

between the two respective orbits is still small after m iterations. For (2.53) we used x_m' = x_m − y_m, neglecting terms ∼ y_m^2. We hence obtain

y_{m+1} = r(1 − 2x_m) y_m ≡ ϵ y_m .

For .|ϵ| < 1 the map is stable, as two initially different populations close in with
time passing. For .|ϵ| > 1 they diverge; the map is chaotic.

Lyapunov Exponents We remind ourselves of the definition

|ϵ| = e^λ ,   λ = log |dg(x)/dx|   (2.54)

the Lyapunov exponent .λ = λ(r) of a map, as introduced in Sect. 2.2. For positive
Lyapunov exponents the time development is exponentially sensitive to the initial
conditions and shows chaotic features,

λ < 0 ⇔ stability ,   λ > 0 ⇔ instability .

This is indeed observed in nature, e.g. for populations of reindeer on isolated islands,
as well as for the logistic map for .r∞ < r < 4, compare Fig. 2.19.

Maximal Lyapunov Exponent The Lyapunov exponent, as defined by (2.54), provides a description of the short time behavior. For a corresponding characterization of the long time dynamics one defines the “maximal Lyapunov exponent”

λ^{(max)} = lim_{n≫1} (1/n) log |dg^{(n)}(x)/dx| ,   g^{(n)}(x) = g(g^{(n−1)}(x)) .   (2.55)

Using (2.54) for the short time evolution we can decompose λ^{(max)} into an averaged sum of short time Lyapunov exponents; λ^{(max)} is the “global Lyapunov exponent”. One needs to select the number of iterations n in (2.55) advisedly. On the one hand, n should be large enough that short-term fluctuations of the Lyapunov exponent are averaged out. The available phase space is however generically finite; for the logistic map we have x ∈ [0, 1]. Two initially close orbits can therefore not diverge ad infinitum. One hence needs to avoid phase-space restrictions, evaluating λ^{(max)} for large but finite numbers of iterations n.
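Equation (2.55) translates directly into a numerical estimate. A minimal sketch (transient length and iteration count are illustrative choices) that time-averages the local exponents log |g'(x_k)|, with g'(x) = r(1 − 2x):

```python
import math

def lyapunov_logistic(r, x0=0.3, n_transient=1000, n=10000):
    """Maximal Lyapunov exponent of the logistic map, estimated as the
    time average of log|g'(x_k)| along the orbit, g'(x) = r (1 - 2x)."""
    x = x0
    for _ in range(n_transient):      # let the orbit settle onto the attractor
        x = r * x * (1.0 - x)
    acc = 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return acc / n

print(lyapunov_logistic(2.5))   # negative: orbit attracted to a fixpoint
print(lyapunov_logistic(3.9))   # positive: chaotic regime, compare Fig. 2.19
```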

Routes to Chaos The chaotic regime .r∞ < r < 4 of the logistic map connects to
the regular regime .0 < r < r∞ with increasing period doubling. One speaks of a
“route to chaos via period-doubling”. The study of chaotic systems is a wide field of
research and a series of routes leading from regular to chaotic behavior have been
found. Two important alternative routes to chaos are:

– INTERMITTENCY ROUTE TO CHAOS
The trajectories are almost periodic, interspersed with regimes of irregular behavior. The frequency of these irregular bursts increases until the system becomes fully irregular.
– RUELLE–TAKENS–NEWHOUSE ROUTE TO CHAOS
A strange attractor appears in a dissipative system after two (Hopf) bifurca-
tions. As a function of an external parameter a fixpoint evolves into a limit cycle
(Hopf bifurcation), which then turns into a limiting torus, which subsequently
turns into a strange attractor.

2.4.1 Colliding Attractors

Bifurcations are generically the result of colliding invariant manifolds, viz of stable
and unstable fixpoints, limit cycles and chaotic sets. Examples are the saddle-node
bifurcation (2.19), at which a stable and an unstable fixpoint merge, and the collision
between a limit cycle and a saddle within the Takens–Bogdanov system (2.33).

Odd Logistic Map The collision between two chaotic attractors can be studied
within the odd logistic map
 
x_{n+1} = g(x_n) = x_n (λ − x_n^2) ,   λ > 0 ,   (2.56)

which is invariant under inversion x ↔ (−x). The local maximum g_m = g(x_m) is given by

g'(x_m) = 0 ,   x_m = √(λ/3) ,   g_m = 2 √(λ^3/3^3) ,   (2.57)

as illustrated in Fig. 2.20.

Merging Chaotic Attractors For λ < 1 only the trivial fixpoint exists. Afterwards, for 1 < λ < λ_s, there are two attracting states that are related via inversion symmetry, as shown in Fig. 2.20. Both branches undergo a period-doubling transition to chaos, equivalent to the one observed for the standard logistic map, compare


Fig. 2.20 Left: For λ = 2, the odd logistic map (2.56). The local maximum g_m is reached for x_m = √(λ/3) ≈ 0.82. Right: The bifurcation diagram of the odd logistic map as a function of λ. One has a unique fixpoint x_n = 0 (black) for λ < 1 and two equivalent attractors (blue/green) for 1 < λ < λ_s, which merge at λ_s = √(27)/2 ≈ 2.6. Inversion symmetry is restored at λ_s (red). A period-doubling transition to chaos is observable in the symmetry broken phase

Fig. 2.19. At λ_s, determined by

g(g_m) = 0 ,   λ_s = 4 λ_s^3 / 3^3 ,   λ_s = √(27)/2 ,   (2.58)

the two attractors merge. Only a single chaotic attractor remains for λ > λ_s.
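The merging condition (2.58) has a simple numerical interpretation: the sign of g(g_m), the second image of the local maximum, decides whether the positive attractor is self-contained. A small sketch checking both sides of λ_s:

```python
import math

def g(x, lam):
    """Odd logistic map g(x) = x (lam - x^2), Eq. (2.56)."""
    return x * (lam - x * x)

def image_of_maximum(lam):
    """g(g_m) for g_m = g(x_m), x_m = sqrt(lam/3), compare Eqs. (2.57)/(2.58)."""
    x_m = math.sqrt(lam / 3.0)
    return g(g(x_m, lam), lam)

lam_s = math.sqrt(27.0) / 2.0          # merging point, about 2.598

print(image_of_maximum(lam_s - 0.1))   # positive: two separate attractors
print(image_of_maximum(lam_s + 0.1))   # negative: orbits cross zero, merged
```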

Chaotic Attractors in Crisis In general, different types of attractors may collide. A Hopf bifurcation, for instance, occurs when a limit cycle contracts to a point, which is equivalent to the merger of a limit cycle with a fixpoint. One speaks of a “crisis” when a chaotic attractor collides with an unstable fixpoint or limit cycle. For the logistic map (2.46), a crisis occurs at r = 4, namely when the chaotic attractor hits the unstable fixpoint x = 0. Orbits diverge for r > 4.

2.5 Dynamical Systems with Time Delays

The dynamical systems we considered so far all had instantaneous dynamics, being
of the type

(d/dt) y(t) = f(y(t)) ,   t > 0 ,   y(t = 0) = y_0 ,   (2.59)

when denoting with y_0 the initial condition. This is the simplest case: one-dimensional (a single dynamical variable only), autonomous (f(y) is not an explicit function of time), and deterministic (no noise).

Time Delays In many real-world applications the couplings between different subsystems and dynamical variables are not instantaneous. Signals and physical
interactions need a certain time to travel from one subsystem to the next. Time
delays are therefore encountered commonly and become important when the delay
time T is comparable to the intrinsic time scales of the dynamical system. We
consider here the simplest case, a noise-free one-dimensional dynamical system
with a single delay time,

(d/dt) y(t) = f(y(t), y(t − T)) ,   t > 0 ,   (2.60)

y(t) = φ(t) ,   t ∈ [−T, 0] .

Due to the delayed coupling we need now to specify an entire initial function .φ(t).
Differential equations containing one or more time delays need to be considered
carefully, with the time delay introducing additional dimensions to the problem. We
discuss several basic examples.

Linear Couplings We start with the linear delay equation

(d/dt) y(t) = −a y(t) − b y(t − T) ,   a, b > 0 .   (2.61)

The only constant solution for a + b ≠ 0 is the trivial state y(t) ≡ 0. The trivial solution is stable in the absence of time delays, T = 0, whenever a + b > 0. The question is whether a finite T may change this.
We may expect the existence of a certain critical T_c, such that y(t) ≡ 0 remains stable for small time delays 0 ≤ T < T_c. In this case the initial function φ(t) will affect the orbit only transiently; in the long run the motion would be damped out, approaching the trivial state asymptotically for t → ∞.

Delay Induced Hopf Bifurcation Trying our luck with the usual exponential ansatz y(t) = y_0 e^{λt}, with λ = p + iq, we find

λ = −a − b e^{−λT} .   (2.62)

Separating into a real and an imaginary part we obtain

p + a = −b e^{−pT} cos(qT) ,   q = b e^{−pT} sin(qT) .   (2.63)

For T = 0 the solution is p = −(a + b), q = 0, as expected, and the trivial solution y(t) ≡ 0 is stable. A numerical solution is shown in Fig. 2.21 for a = 0.1 and b = 1.

Fig. 2.21 For a = 0.1 and b = 1, the components p and q of the solution e^{(p+iq)t} of the delay system given by (2.61), as functions of the delay time T. The state y(t) ≡ 0 becomes unstable whenever the real part p turns positive. The imaginary part q is given in units of π

The crossing point p = 0 is determined by

a = −b cos(qT) ,   q = b sin(qT) .   (2.64)

The first condition in (2.64) can be satisfied only for a < b. Taking the squares in (2.64) and eliminating qT one has

q = √(b^2 − a^2) ,   T ≡ T_c = arccos(−a/b)/q .

One therefore finds a Hopf bifurcation at T = T_c, which implies that the trivial solution becomes unstable for T > T_c. For a = 0 the transition point is defined by q = b, together with T_c = π/(2b). Note that there is a Hopf bifurcation only for a < b, viz whenever the time delay dominates, and that q becomes non-zero well before the bifurcation point, compare Fig. 2.21. One has therefore a region of damped oscillatory behavior with q ≠ 0 and p < 0.
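The closed expressions for the crossing point are easily verified. A quick check (a = 0.1, b = 1, the parameters of Fig. 2.21) that the resulting pair (q, T_c) satisfies both conditions (2.64):

```python
import math

def critical_delay(a, b):
    """q = sqrt(b^2 - a^2) and T_c = arccos(-a/b)/q for Eq. (2.61);
    a delay-induced Hopf bifurcation requires a < b."""
    q = math.sqrt(b * b - a * a)
    return q, math.acos(-a / b) / q

q, T_c = critical_delay(0.1, 1.0)
print(q, T_c)                        # about 0.995 and 1.68
print(-1.0 * math.cos(q * T_c))      # recovers a = 0.1, first condition (2.64)
print(1.0 * math.sin(q * T_c))       # recovers q, second condition (2.64)
```

For a = 0 the same routine returns T_c = π/2, in agreement with T_c = π/(2b).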

Discontinuities For differential equations with time delays one may specify arbitrary initial functions φ(t), also initial functions with discontinuities, which will induce discontinuities in the derivatives of the respective trajectories. As an example we consider the case a = 0, b = 1 of (2.61), with a non-zero constant initial function,

(d/dt) y(t) = −y(t − T) ,   φ(t) ≡ 1 .   (2.65)
The solution can be evaluated by stepwise integration,

y(t) − y(0) = ∫_0^t dt' ẏ(t') = −∫_0^t dt' y(t' − T) = −∫_0^t dt' = −t ,   0 < t < T .

The first derivative is consequently discontinuous at t = 0,

lim_{t→0^−} (d/dt) y(t) = 0 ,   lim_{t→0^+} (d/dt) y(t) = −1 .

In analogy one shows that the second derivative has a discontinuity at t = T, the third derivative at t = 2T, and so on.
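The stepwise construction above is also how delay equations are integrated in practice: the delayed value is looked up in the stored trajectory. A minimal Euler sketch for (2.65) (step size and horizon are illustrative choices):

```python
def integrate_delay(T=1.0, t_max=2.0, dt=1e-3):
    """Euler integration of dy/dt = -y(t - T) with initial function phi = 1.
    List index i corresponds to time (i - n_delay) * dt."""
    n_delay = int(round(T / dt))
    n_steps = int(round(t_max / dt))
    y = [1.0] * (n_delay + 1)             # the initial function on [-T, 0]
    for _ in range(n_steps):
        y.append(y[-1] - dt * y[-1 - n_delay])
    return y[n_delay:]                    # the trajectory on [0, t_max]

traj = integrate_delay()
# on the first interval 0 < t < T the exact solution is y(t) = 1 - t
print(traj[500], traj[1000])              # 0.5 and 0.0, up to rounding
```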

Injectivity Ordinary differential equations are injective in the sense that distinct
initial conditions lead to distinct trajectories. This holds regardless of the presence
of attracting manifolds, which determine solely the long-term behavior.
Delay systems are not necessarily injective with respect to the initial function.
Consider logistic growth with a delayed growth rate,

(d/dt) y(t) = y(t − T) [y(t) − 1] ,   φ(t = 0) = 1 .   (2.66)

For any φ(t) with φ(0) = 1 the solution is y(t) ≡ 1 for all t ∈ [0, ∞). Distinct initial functions lead to identical orbits.

Non-constant Time Delays Things may become rather weird when the time delays
are not constant, as for
 
(d/dt) y(t) = y(t − T(y)) + 1/2 ,   T(y) = 1 + |y(t)| ,   (2.67)

with the initial function φ(t) = 1 for t < −1 and φ(t) = 0 for t ∈ [−1, 0].

It is easy to see that the two functions

y(t) = t/2 ,   y(t) = 3t/2 ,   t ∈ [0, 2] ,

are both solutions of (2.67), with appropriate continuations for t > 2. Two different solutions of the same differential equation with identical initial conditions: this cannot happen for ordinary differential equations. It is evident that special care must be taken when examining dynamical systems with time delays numerically.

2.5.1 Distributed Time Delays

Basic delay differential systems contain a single time delay T , like (2.61), which
corresponds to an instantaneous memory process. In general, the memory .yM (t) of
past trajectories will be a convolution,
y_M(t) = ∫_0^∞ K(τ) y(t − τ) dτ ,   ∫_0^∞ K(τ) dτ = 1 ,   (2.68)

where K(τ) denotes the delay kernel. For a sharply peaked delay kernel, K(τ) = δ(τ − T), one recovers y_M(t) = y(t − T).
.

Exponentially Distributed Delays Explicitly we consider with

K(τ) = (1/T) e^{−τ/T}   (2.69)
exponentially distributed time delays. One has
(d/dt) y_M(t) = ∫_0^∞ dτ K(τ) (d/dt) y(t − τ) = −∫_0^∞ dτ K(τ) (d/dτ) y(t − τ) ,   (2.70)

which allows to integrate the last expression by parts when using (2.69). The
resulting closed expression,
ẏ_M = −(1/T) ∫_0^∞ dτ e^{−τ/T} (d/dτ) y(t − τ) = (y − y_M)/T ,   (2.71)

corresponds to a trailing average, with the memory variable .yM trying to approach
y.

Kernel Series Framework Equation (2.71) implies that a delay differential equa-
tion with exponentially distributed delays can be mapped exactly to a system of
ordinary differential equations by adding an additional variable, namely .yM (t).
For generic kernels K(τ) in (2.68) one can generalize this concept by adding a diverging number of memory variables. With this approach, denoted “kernel series framework”, one can map any time delay system to an N-dimensional system of ordinary differential equations. For the case of a single time delay one speaks of the “linear chain trick”. In general one has N → ∞, which reflects the notion that delay
systems are formally infinite dimensional.
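For the exponential kernel the mapping is just the single additional equation (2.71). A small sketch (the coupled dynamics dy/dt = −y_M and all parameter values are illustrative assumptions, not taken from the book) that integrates the pair (y, y_M) and cross-checks y_M against the convolution (2.68):

```python
import math

def linear_chain(T=0.5, t_max=5.0, dt=1e-3):
    """Euler integration of dy/dt = -yM together with the memory
    equation dyM/dt = (y - yM)/T, Eq. (2.71); empty memory at t = 0."""
    y, y_mem = 1.0, 0.0
    ys = [y]
    for _ in range(int(round(t_max / dt))):
        y, y_mem = y - dt * y_mem, y_mem + dt * (y - y_mem) / T
        ys.append(y)
    return ys, y_mem

T, dt = 0.5, 1e-3
ys, y_mem = linear_chain(T=T, dt=dt)
# the memory as an explicit kernel average over the stored past trajectory
conv = sum(math.exp(-k * dt / T) / T * ys[-1 - k] * dt for k in range(len(ys)))
print(y_mem, conv)       # the ODE variable tracks the convolution
```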

Exercises

(2.1) JACOBIANS WITH DEGENERATE EIGENVALUES
      Non-symmetric Jacobians with degenerate eigenvalues may not dispose of a corresponding number of eigenvectors. Consider the linear system

ẋ = −x + y,
. ẏ = −ry, r > 0, (2.72)

and determine the eigenvalues of the Jacobian for the fixpoint (0, 0) and its
eigenvectors. What happens in the limit r → 1?
(2.2) CATASTROPHIC SUPERLINEAR GROWTH
The growth of a resource variable x(t) ≥ 0, as defined by

ẋ = x^β − γ x ,   γ = 0, 1 ,   (2.73)

is sub/super-linear for β < 1 and β > 1, respectively. Start by solving (2.73) analytically for γ = 0. Is there a singularity, viz a finite time t_s < ∞ for which lim_{t→t_s} x(t) diverges catastrophically? Continue then by discussing qualitatively the behavior for γ = 1.
Next consider sublinear decay, viz ẋ = −x^β, with β < 1. Can the stable
fixpoint x ∗ = 0 be reached in finite time?
(2.3) HETEROCLINIC ORBITS AND SYMMETRIES
Discuss, in analogy to (2.18), the fixpoints and the stable and unstable
manifolds of
ẋ = 1 − x^2 ,   ẏ = −y + ϵ (1 − x^2)^2 .   (2.74)

Is there a heteroclinic orbit?


(2.4) THE SUBCRITICAL PITCHFORK TRANSITION
Discuss, in analogy to Sect. 2.2, the fixpoints, and their stability, of the
subcritical pitchfork transition described by the canonical equation of motion
ẋ = ax + x^3 .
(2.5) ENERGY CONSERVATION IN LIMIT CYCLES
Evaluate, for the Takens–Bogdanov system (2.33) and μ ∈ [μ_c, 1], the energy
balance, as defined by (2.34), by integrating numerically dE/dt. How large
is the overall energy dissipation and uptake when completing one cycle?
Good numerical accuracy is important, you will probably need to use the
Runge-Kutta method.
(2.6) MAXIMAL LYAPUNOV EXPONENT
Show, that the maximal Lyapunov exponent (2.55) can be written as the time-
averaged local exponent, see (2.54).
(2.7) LYAPUNOV EXPONENTS ALONG PERIODIC ORBITS
Evaluate the longitudinal Lyapunov exponent λϕ along the r = 1 trajectory
of (2.35). Consider both the case when the coupling K is smaller or larger
than the critical Kc = 1 of the infinite period bifurcation. For K < Kc the
orbit is periodic and the integral of the longitudinal Lyapunov exponent over
one period T vanishes. Why is this generically the case for periodic orbits?
(2.8) GRADIENT DYNAMICAL SYSTEMS
Is it possible to add a term to the evolution equation ẋ = . . . of the dynamical
system (2.74), such that it becomes a gradient dynamical system, as defined
in (2.42)?
(2.9) LIMIT CYCLES IN GRADIENT SYSTEMS
Show that gradient dynamical systems, compare (2.42), have no limit cycles.
(2.10) DELAY DIFFERENTIAL EQUATIONS
The delay equation (2.61) allows for harmonically oscillating solutions for
certain sets of parameters a and b. Which are the conditions? In addition,
specialize for the case a = 0.
(2.11) PERIOD-3 CYCLES IN THE LOGISTIC MAP
       For the logistic map, g(x) = rx(1 − x), a window of stability for period-3 cycles starts at r* = 1 + 2√2 ≈ 3.8284, viz inside the otherwise chaotic
84 2 Bifurcations and Chaos in Dynamical Systems

regime. Compare Fig. 2.19. Find the values for

x_1 = g(x_3) ,   x_2 = g(x_1) ,   x_3 = g(x_2) ,   (2.75)

together with the Lyapunov exponent. Plot the iterated map for r = r* and for values a bit smaller/larger. The occurring saddle-node bifurcation is also called a ‘tangent transition’. Why?
(2.12) CAR-FOLLOWING MODEL
A car moving with velocity ẋ(t) follows another car driving with velocity
v(t) via
 
ẍ(t + T) = α [v(t) − ẋ(t)] ,   α > 0 ,   (2.76)

with T > 0 being the reaction time of the driver. Prove the stability of the
steady-state solution for a constant velocity v(t) ≡ v0 of the preceding car.8

Further Reading

For further studies we refer to introductory texts to classical dynamical systems,


Goldstein (2002), to bifurcation theory, Kielhöfer (2012), to chaos theory, Devaney
(2018), Poston and Stewart (2014), and to the generalization of chaos to quantum
mechanics, Gutzwiller (2013). Readers interested in the KAM theorem may consult
Ott (2002).
An easy to follow introduction to time delay systems can be found in Wernecke
et al. (2019), for more details see Kharitonov (2012). The kernel series framework
for delay dynamical systems is elaborated in Nevermann and Gros (2023), rate
induced tipping in Ritchie et al. (2023). The interplay between complexity barriers
and chaotic dynamics is discussed in Gros (2012).

References
Devaney, R. (2018). An introduction to chaotic dynamical systems. London: CRC Press.
Goldstein, H. (2002). Classical mechanics (3rd ed.). Reading: Addison-Wesley.
Gros, C. (2012). Pushing the complexity barrier: Diminishing returns in the sciences. Complex
Systems, 21, 183.
Gutzwiller, M. C. (2013). Chaos in classical and quantum mechanics. Berlin: Springer Science &
Business Media.
Kharitonov, V. (2012). Time-delay systems: Lyapunov functionals and matrices. Berlin: Springer
Science & Business Media.
Kielhöfer, H. (2012). Bifurcation theory. Berlin: Springer.

8 More realistic car-following models are discussed in Sect. 4.4 of Chap. 4.



Nevermann, D. H., & Gros, C. (2023). Mapping dynamical systems with distributed time delays
to sets of ordinary differential equations. Journal of Physics A: Mathematical and Theoretical,
56, 345702.
Ott, E. (2002). Chaos in dynamical systems. Cambridge: Cambridge University Press.
Poston, T., & Stewart, I. (2014). Catastrophe theory and its applications. Chelmsford: Courier
Corporation.
Ritchie, P. D., Alkhayuon, H., Cox, P. M., & Wieczorek, S. (2023). Rate-induced tipping in natural
and human systems. Earth System Dynamics, 14, 669–683.
Wernecke, H., Sandor, B., & Gros, C. (2019). Chaos in time delay systems, an educational review.
Physics Reports, 824, 1.
3 Dissipation, Noise and Adaptive Systems

Most dynamical systems are not isolated, but interacting with an embedding
environment that may add stochastic components to the evolution equations. The
internal dynamics slows down when energy is dissipated to the outside world,
approaching attracting states which may be regular, such as fixpoints or limit cycles,
or irregular, such as chaotic attractors. Adaptive systems alternate between phases of
energy dissipation and uptake, until a balance between these two opposing processes
is achieved.
In this chapter an introduction to adaptive, dissipative and stochastic systems
will be given together with important examples from the realm of noise controlled
dynamics, like diffusion, random walks and stochastic escape and resonance. We
will discuss to which extent chaos, a regular guest of adaptive systems, may remain
predictable.

3.1 Chaos in Dissipative Systems

The time evolution of deterministic systems can be computed exactly, at least as a matter of principle, once the initial conditions are known.1 We now turn to
“stochastic systems”, i.e. dynamical systems that are influenced by noise and
fluctuations. On a mean level, the impact of noise shows up as “dissipation”.

3.1.1 Phase Space Contraction and Expansion

Friction and Dissipation Friction plays an important role in real-world systems,


causing energy to be dissipated to the environment.

1 For reference, Chap. 2 is devoted to deterministic dynamics.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_3

Energy is conserved when the contributions from all constituent parts of the
overall system are taken into account. Friction just stands for a transfer process
of energy; when energy is transferred from a system we observe, like a car on a
motorway with the engine turned off, to a system not under observation, such as the
surrounding air. In this case the combined kinetic energy of the car and the thermal
energy of the air body remains constant; the air heats up a little bit while the car
slows down.

Mathematical Pendulum As an example we consider the damped mathematical pendulum,

φ̈ + γ φ̇ + ω_0^2 sin φ = 0 ,   (3.1)

which describes a pendulum with a rigid bar, capable of turning over completely,
with φ corresponding to the angle between the bar and the vertical.

The mathematical pendulum reduces to the damped harmonic oscillator for small angles, φ ≈ sin φ, which is underdamped, critically damped, or overdamped for γ < 2ω_0, γ = 2ω_0 and γ > 2ω_0, respectively. In the absence of damping, γ = 0, the energy

E = φ̇^2/2 − ω_0^2 cos φ   (3.2)
is conserved,

(d/dt) E = φ̇ φ̈ + ω_0^2 φ̇ sin φ = φ̇ (φ̈ + ω_0^2 sin φ) = −γ φ̇^2 ,

when using (3.1).

Normal Coordinates Transforming the damped mathematical pendulum (3.1) to a


set of coupled first-order differential equations via x = φ and φ̇ = y one gets

ẋ = y ,   ẏ = −γ y − ω_0^2 sin x .   (3.3)

The phase space for x = (x, y) is ℝ^2. For all γ > 0 the motion approaches for t → ∞ one of the equivalent global fixpoints (2πn, 0), where n ∈ ℤ.

Phase Space Contraction By definition, phase space contracts close to an


attractor. For a three-dimensional phase space, x = (x, y, z), the quantity

ΔV(t) = Δx(t) Δy(t) Δz(t) = (x(t) − x'(t)) (y(t) − y'(t)) (z(t) − z'(t))




Fig. 3.1 Simulation of the mathematical pendulum φ̈ = − sin(φ) − γ φ̇, illustrating the evolution
of the phase space volume for consecutive times (shaded regions), starting with t = 0 (top). Left:
The dissipationless case γ = 0. The energy, see (3.2), is conserved as well as the phase space
volume (Liouville’s theorem). Shown are trajectories for E = 1 and E = −0.5 (solid/dashed line).
Right: For γ = 0.4. Note the consecutive contraction of the phase space volume

corresponds to a small volume in phase space when |x − x' | is small. Its time
evolution is given by

(d/dt) ΔV = Δẋ Δy Δz + Δx Δẏ Δz + Δx Δy Δż ,
or

ΔV̇ / ΔV = Δẋ/Δx + Δẏ/Δy + Δż/Δz = ∇ · ẋ ,   (3.4)

with the right-hand side corresponding to the trace of the Jacobian.

In Fig. 3.1 the time evolution of a phase space volume is illustrated for the
case of the mathematical pendulum. Volumes of the phase space remain connected
under the effect of the time evolution, undergoing however at times substantial
deformations.

DISSIPATIVE AND CONSERVING SYSTEMS A dynamical system is dissipative if its phase space volume contracts continuously, viz when ∇ · ẋ < 0 for all x(t). The system is said to be conserving if the phase space volume is a constant of motion, viz if ∇ · ẋ ≡ 0.

Mechanical systems, i.e. systems described by Hamiltonian mechanics, are conserving in the above sense. One denotes this result from classical mechanics as “Liouville’s theorem”.
90 3 Dissipation, Noise and Adaptive Systems

Depending on the energy, mechanical systems may have bounded or non-bounded orbits. Planets run through bounded orbits around the sun, to give an example, with certain comets leaving the solar system for ever on unbounded
example, with certain comets leaving the solar system for ever on unbounded
trajectories. One can easily deduce from Liouville’s theorem, i.e. from phase space
conservation, that bounded orbits are “ergodic”. They come arbitrarily close to all
points in phase space having the identical conserved energy.

Global Dissipation Dissipation may occur everywhere in phase space, or only locally. The first holds for the damped mathematical pendulum (3.3),

∂ẋ/∂x = 0 ,   ∂ẏ/∂y = ∂[−γy − ω_0^2 sin x]/∂y = −γ ,

which implies

∇ · ẋ = −γ < 0 .

The damped pendulum is consequently globally dissipative. The basin of attraction of the fixpoint (0, 0) covers the full phase space (modulo 2π). Selected trajectories are illustrated in Fig. 3.1 together with phase space contraction.

Local Dissipation For the non-linear rotator defined by (2.2),

ṙ = (Γ − r^2) r ,   ϕ̇ = ω ,   (3.5)

we have

∂ṙ/∂r + ∂ϕ̇/∂ϕ = Γ − 3r^2 ,  which is
  < 0 for Γ < 0 ,
  < 0 for Γ > 0 and r > r_c/√3 ,
  > 0 for Γ > 0 and 0 < r < r_c/√3 ,   (3.6)

where r_c = √Γ is the radius of the limit cycle existing when Γ > 0. The system might either dissipate or take up energy, which is typical behavior of adaptive systems, as we will discuss further in Sect. 3.2.

Phase space contracts globally when Γ < 0, and locally, close to the limit cycle,
when Γ > 0. In the latter case, when Γ > 0, phase space expands around the
unstable fixpoint (0, 0).

Coordinate Transformation The time development of a small phase space vol-


ume, as defined by (3.4), depends on the coordinate system chosen to represent the
variables. As an example we reconsider the non-linear rotator (3.5) in terms of the
Cartesian coordinates x = r cos ϕ and y = r sin ϕ.

The respective infinitesimal phase space volumes are related via the Jacobian,

.dx dy = r dr dϕ ,

and we find

ΔV̇ / ΔV = (ṙ Δr Δϕ + r Δṙ Δϕ + r Δr Δϕ̇)/(r Δr Δϕ) = ṙ/r + ∂ṙ/∂r + ∂ϕ̇/∂ϕ = 2Γ − 4r^2 .

Comparing with (3.6) we see that the amount and even the sign of phase space contraction can depend on the choice of the coordinate system. However, phase space will always contract close to an attractor, regardless of the coordinate system selected. This holds for (0, 0) when Γ < 0 and for the limit cycle at r = √Γ when Γ > 0.

Divergence of the Flow and Lyapunov Exponents For a dynamical system ẋ = f(x) the local change in phase space volume is given by the divergence of the flow,

ΔV̇ / ΔV = ∇ · f = ∑_i ∂f_i/∂x_i = ∑_i λ_i ,   (3.7)

and hence by the trace of the Jacobian J_ij = ∂f_i/∂x_j, as mentioned before. The trace of a matrix corresponds to the sum ∑_i λ_i of its eigenvalues λ_i. Phase space hence contracts when the sum of the local Lyapunov exponents is negative.

3.1.2 Strange Attractors and Dissipative Chaos

Lorenz Model A rather natural question regards the possible existence of attractors
with irregular behaviors, i.e. which are different from stable fixpoints, periodic or
quasi-periodic motion. For this question we examine the Lorenz model

ẋ = −σ(x − y) ,
ẏ = −xz + rx − y ,   (3.8)
ż = xy − bz .

The classical values are σ = 10 and b = 8/3, with r being the control variable.

Fig. 3.2 A typical trajectory of the Lorenz system (3.8), for the classical set of parameters, σ =
10, b = 8/3 and r = 28. The chaotic orbit loops around the remnants of the two fixpoints (3.9),
which are unstable for the selected set of parameters. Color coding with respect to z, projected to
the z = 0 plane

Fixpoints of the Lorenz Model A trivial fixpoint is (0, 0, 0). The non-trivial fixpoints are determined by

0 = −σ(x − y) ,   x = y ,
0 = −xz + rx − y ,   z = r − 1 ,
0 = xy − bz ,   x^2 = y^2 = b(r − 1) .

It is easy to see by linear analysis that the fixpoint (0, 0, 0) is stable for r < 1. For
r > 1 it becomes unstable via a pitchfork bifurcation and two new fixpoints appear,

C_± = ( ±√(b(r − 1)) , ±√(b(r − 1)) , r − 1 ) .   (3.9)

These are stable for r < rc = 24.74 (for σ = 10 and b = 8/3), at which point a
subcritical Hopf bifurcation occurs. Generally one has rc = σ (σ +b+3)/(σ −b−1).
For r > rc the behavior becomes more complicated and generally non-periodic.

Strange Attractors One can show that the Lorenz model has a positive Lyapunov exponent for r > r_c. It is chaotic with sensitive dependence on the initial conditions.
A typical orbit is illustrated in Fig. 3.2. The Lorenz model is at the same time
globally dissipative, since

∂ẋ/∂x + ∂ẏ/∂y + ∂ż/∂z = −(σ + 1 + b) < 0 ,   σ, b > 0 .   (3.10)
∂x ∂y ∂z

The consequence is that the attractor of the Lorenz system cannot be a smooth
surface. Close to the attractor phase space contracts. At the same time two nearby
orbits are repelled due to the positive Lyapunov exponents. One finds a self-similar
structure for the Lorenz attractor with a fractal dimension 2.06 ± 0.01, as defined
further below. Such a structure is called a “strange attractor”.
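Both statements — a bounded chaotic orbit and a constant negative divergence — can be checked with a few lines. A sketch with a hand-rolled fourth-order Runge-Kutta step, for the classical parameters:

```python
def lorenz_rhs(x, y, z, sigma=10.0, b=8.0 / 3.0, r=28.0):
    """Right-hand side of the Lorenz system (3.8)."""
    return sigma * (y - x), -x * z + r * x - y, x * y - b * z

def rk4_step(x, y, z, dt=0.01):
    """One classical Runge-Kutta step for the Lorenz flow."""
    k1 = lorenz_rhs(x, y, z)
    k2 = lorenz_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], z + 0.5 * dt * k1[2])
    k3 = lorenz_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], z + 0.5 * dt * k2[2])
    k4 = lorenz_rhs(x + dt * k3[0], y + dt * k3[1], z + dt * k3[2])
    return (x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0,
            z + dt * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0)

def divergence(sigma=10.0, b=8.0 / 3.0):
    """Trace of the Jacobian, Eq. (3.10): constant and negative."""
    return -(sigma + 1.0 + b)

x, y, z = 1.0, 1.0, 1.0
for _ in range(5000):                 # integrate up to t = 50
    x, y, z = rk4_step(x, y, z)
print(divergence())                   # -13.666..., phase space contracts
print(abs(x) < 100 and abs(z) < 100)  # the orbit stays on the bounded attractor
```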

Fig. 3.3 The Sierpinski carpet and its iterative construction

Dissipative Chaos and Strange Attractors Strange attractors can only occur in dynamical systems of dimension three and higher; in one dimension fixpoints are the only possible attracting states, and one needs at least two dimensions for limit cycles.
The Lorenz model has an important historical relevance for the development of chaos theory. It is considered a paradigmatic model, since chaos in dissipative and deterministic dynamical systems is closely related to the emergence of strange attractors. Chaos may arise in one-dimensional maps, as we have seen in Sect. 2.4, but continuous-time dynamical systems need to be at least three dimensional in order to show chaotic behavior.

Hyperchaos An attractor is chaotic when at least a single Lyapunov exponent is positive. As a function of the position on the attracting manifold, local Lyapunov exponents may vary substantially, which implies that the occurrence of chaos is determined by the sign of the maximal Lyapunov exponent, compare the corresponding definition (2.55) for maps. Several Lyapunov exponents may become positive for systems with larger numbers of dynamical variables, such as dynamical networks. One speaks of “hyperchaos” when this happens.

Fractals Strange attractors often show a high degree of self-similarity, being fractal. Fractals can be defined on an abstract level by recurrent geometric rules; prominent examples are the Cantor set, the Sierpinski triangle and the Sierpinski carpet illustrated in Fig. 3.3. Strange attractors are normally “multifractal”, i.e. fractals with non-uniform self-similarity.

Hausdorff Dimension An important notion in the theory of fractals is the “Hausdorff dimension”. We consider a geometric structure defined by a set of points in d dimensions and the number N(l) of d-dimensional spheres of diameter l needed to cover this set. If N(l) scales like

N(l) ∝ l^{−D_H}   for l → 0 ,   (3.11)
94 3 Dissipation, Noise and Adaptive Systems

Fig. 3.4 The fundamental unit of the Sierpinski carpet, compare Fig. 3.3. It contains eight squares (labeled 1–8) which can be covered by discs of appropriate diameter

then D_H is called the Hausdorff dimension of the set. Alternatively we can rewrite
(3.11) as

N(l)/N(l′) = (l/l′)^{−D_H} ,   D_H = − log[N(l)/N(l′)] / log[l/l′] , (3.12)

which is useful for self-similar structures (fractals).

The d-dimensional spheres necessary to cover a given geometrical structure will


generally overlap. The overlap does not affect the value of the fractal dimension
as long as the degree of overlap does not change qualitatively with decreasing
diameter l.

Hausdorff Dimension of the Sierpinski Carpet For the Sierpinski carpet we
increase the number of points N(l) by a factor of eight when we decrease the length
scale l by a factor of three,

D_H → − log[8/1] / log[1/3] = log 8 / log 3 ≈ 1.8928 .

See Fig. 3.4 for an illustration.
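The covering-count argument can be checked with a few lines of code. The sketch below (the construction step n is a free, illustrative choice) applies definition (3.12) to the covering counts N = 8^n, l = 3^{−n} of the Sierpinski carpet:

```python
import math

# Covering counts of the Sierpinski carpet: after n construction steps the
# carpet is covered by N(l) = 8**n squares (discs) of diameter l = 3**(-n)
def carpet_covering(n):
    return 8**n, 3.0**(-n)

# Hausdorff dimension from two successive scales, Eq. (3.12):
#   D_H = -log[N(l)/N(l')] / log[l/l']
N, l = carpet_covering(4)
Np, lp = carpet_covering(5)
D_H = -math.log(N / Np) / math.log(l / lp)
print(D_H)   # log 8 / log 3
```

Any pair of successive steps gives the same value, as expected for an exactly self-similar fractal.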

3.1.3 Partially Predictable Chaos

The strange attractor of the Lorenz model has the form of a ‘butterfly’, but otherwise
seemingly no further internal structure. Orbits exponentially diverge close to the
attractor, remaining however bounded by its overall size. The only feature remaining
predictable is that orbits will not diverge indefinitely. The situation is more complex
for most real-world chaotic systems.
As an example of a real-world system with substantial chaotic components we
may take weather forecasting. When a hurricane approaches a coastline, one may


Fig. 3.5 The evolution of two initially close trajectories (blue/magenta lines). The respective
initial states (shaded circles) are evolved for the identical period t (filled circles). Shown are orbits
for the non-linear rotator (3.5), for .Γ < 0 (left panel) and .Γ > 0 (middle panel), which converge
respectively to a stable focus and limit cycle (black ring). For the Lorenz model (right panel),
trajectories diverge on the attractor, with one orbit circling one of the unstable fixpoints (grey
circles) an additional time. Parameters as in Fig. 3.2

not be able to predict the location of landfall precisely, nor the strength of the
hurricane at that point. Nobody will doubt however, that a full-blown hurricane
is approaching. Certain features can remain predictable even in otherwise chaotic
systems. This is the notion of “partially predictable chaos”, which we will examine
in the following.
For a precise definition of partially predictable chaos one needs an observational
tool allowing to determine the presence of chaos directly from the properties of
typical orbits, without resorting to the evaluation of Lyapunov exponents. This tool,
a “0-1 test for chaos”, is developed first.

Initially Close Orbits For a given dynamical system we examine the long-term
fate of two initially close orbits, .x = x(t) and .x' = x' (t), with

ΔX_0 = ||x − x'||_{t=0}

being small. We are interested in the average long-term inter-orbit distance,

ΔX_∞ = 〈 ||x − x'|| 〉_{t>T} , (3.13)

which is defined as the average distance between .x and .x' for times .t > T , where T
is larger than the time scales of the defining dynamical system.

0-1 Test for Chaos In Fig. 3.5 the evolution of two initially close orbits is illustrated
for the case that both orbits approach the same attractor, which may be a fixpoint,
limit cycle, or a chaotic attractor.


Fig. 3.6 Chaotic dynamics is partially predictable (black curve) when the distance .ΔXt of two
initially close orbits remains at a temporary plateau after the starting divergence. The fully
decorrelated final state is reached only after a substantial delay. Information is lost within a single
process for classical chaos (orange curve)

– FIXPOINT
  One has ΔX_∞ → 0 independently of ΔX_0.
– LIMIT CYCLE
  When ΔX_0 is small, substantially smaller than the size of the attracting limit
  cycle, one observes ΔX_∞ ∝ ΔX_0. The two orbits will follow each other
  indefinitely after having entered the limit cycle at slightly different points, with
  the average distance ΔX_∞ scaling linearly with ΔX_0.
– CHAOTIC ATTRACTOR
  Orbits fully decorrelate on the attractor, with ΔX_∞ approaching the average two-
  point distance irrespectively of ΔX_0.

The above rules constitute a “0-1 test” for chaos. Operationally, one integrates the
system in question for pairs of initially close orbits. The resulting average final
distance ΔX_∞ is plotted relative to its initial value. The attracting state is a limit
cycle if ΔX_∞ scales with ΔX_0, otherwise a fixpoint or a chaotic attractor.
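As a sketch of how this test can be run in practice, the snippet below integrates pairs of initially close orbits of the Lorenz model with a hand-written RK4 stepper; the parameters σ = 10, r = 28, b = 8/3, the initial point, and the integration times are illustrative choices, not prescribed here. For the chaotic butterfly attractor the final distance ΔX_∞ is of the order of the attractor size, irrespective of ΔX_0:

```python
import numpy as np

def lorenz(s, sigma=10.0, r=28.0, b=8.0/3.0):
    x, y, z = s
    return np.array([sigma*(y - x), x*(r - z) - y, x*y - b*z])

def rk4_step(s, dt):
    k1 = lorenz(s)
    k2 = lorenz(s + 0.5*dt*k1)
    k3 = lorenz(s + 0.5*dt*k2)
    k4 = lorenz(s + dt*k3)
    return s + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

def mean_final_distance(dx0, dt=0.01, t_total=80.0, t_transient=40.0):
    """Average distance of two orbits started dx0 apart, for t > t_transient."""
    a = np.array([1.0, 1.0, 20.0])
    b = a + np.array([dx0, 0.0, 0.0])
    dists = []
    for i in range(int(t_total/dt)):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        if i*dt > t_transient:
            dists.append(np.linalg.norm(a - b))
    return np.mean(dists)

# Fully decorrelated final distance, essentially independent of dx0
d_a = mean_final_distance(1e-6)
d_b = mean_final_distance(1e-9)
```

For a limit cycle one would instead find d_b ≈ d_a · 10^{−3}, the linear scaling with ΔX_0 of the second rule above.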

Partially Predictable Chaos Decorrelation is characterized by a single time scale


for classical chaotic attractors, such as the butterfly attractor illustrated in Fig. 3.2.
The incurring loss of information is captured by the time development of the inter-
orbit distance, .ΔXt , as illustrated in Fig. 3.6. Chaos remains however partially
predictable when decorrelation is limited for prolonged periods. In this case, when
two orbits retain a certain finite distance, one can evaluate their relative development
with an accuracy given by the respective .ΔXt . One can still predict what is going to
happen for extended time spans, albeit with a finite accuracy. The whole process is
shown in Fig. 3.6.

As a function of the control parameter r, a variety of attracting states appear


in the Lorenz model. For .r = 28 the classical butterfly attractor is observed, see
Fig. 3.2. For larger values of r, windows with period-doubling limit cycles appear.
Interestingly, for certain values of .r, like .r = 180.78, a distinct type of chaotic
attractors is observed, in the form of chaotic braids. Topologically, these braids

correspond to chaotically broadened limit cycles. Decorrelation slows down by


orders of magnitude once .ΔXt becomes comparable to the width of the braids. The
subsequent diffusive decorrelation along the former limit cycle is very slow. This is
a typical example of partially predictable chaos.

3.2 Adaptive Systems

In general, complex systems are neither fully conserving nor fully dissipative.
Adaptive systems will have phases where they take up energy and periods where
they give energy back to the environment. An example is the non-linear rotator
defined in (3.5), see also (3.6).
The term “adaptive system” is often associated with the notions of complexity
and adaptation. Strictly speaking, any dynamical system is adaptive if ∇ · ẋ may take
both positive and negative values. In practice, however, it is usual to reserve the
term adaptive system for dynamical systems showing a certain complexity, such as
emergent behavior.

Van der Pol Oscillator Circuits or mechanisms built for the purpose of controlling
an engine or machine are intrinsically adaptive. An example is the Van der Pol
oscillator,

ẍ − ϵ(1 − x²)ẋ + x = 0 ,     ẋ = y ,   ẏ = ϵ(1 − x²)y − x , (3.14)

where ϵ > 0 and where we used the phase space variables x = (x, y). The
evolution of the phase space volume is

∇ · ẋ = ϵ (1 − x²) .

The oscillator takes up/dissipates energy for x² < 1 and x² > 1, respectively. A
simple mechanical example for a system with similar properties is illustrated in
Fig. 3.7.

Secular Perturbation Theory We consider a perturbation expansion in ϵ. For
ϵ = 0, the solution of (3.14) describes harmonic oscillations,

x₀(t) = a e^{i(ω₀t+φ)} + c.c.,   ω₀ = 1 . (3.15)

Amplitude a and phase .φ are arbitrary, as usual for harmonic oscillators. The
perturbation .ϵ(1 − x 2 )ẋ may change both the amplitude and the unperturbed
frequency .ω0 = 1 by an amount .∝ ϵ. In order to account for this “secular
perturbation” we make the ansatz
 
x(t) = A(T) e^{it} + A*(T) e^{−it} + ϵx₁ ,   A(T) = A(ϵt) , (3.16)

Fig. 3.7 The seesaw with a water container at one end; an example of an oscillator
that takes up and disperses energy periodically

which differs from the usual expansion .x(t) → x0 (t) + ϵx ' (t) of the full solution
.x(t) of a dynamical system with respect to a small parameter .ϵ. The yet to be

determined time-dependent complex prefactor A allows for an adaption of the


original frequency. The ansatz .A = A(ϵt) ensures that the prefactor is constant
when .ϵ → 0.

Expansion The goal is to expand the Van der Pol oscillator together with (3.16)
with respect to the perturbation.² We start by evaluating several expressions involving
x(t) up to order O(ϵ¹), namely

x² ≈ A² e^{2it} + 2|A|² + (A*)² e^{−2it} + 2ϵx₁ (A e^{it} + A* e^{−it}) ,

ϵ(1 − x²) ≈ ϵ(1 − 2|A|²) − ϵ (A² e^{2it} + (A*)² e^{−2it}) ,

ẋ ≈ (ϵA_T + iA) e^{it} + c.c. + ϵẋ₁ ,   A_T = ∂A(T)/∂T ,

ϵ(1 − x²)ẋ = ϵ(1 − 2|A|²) (iA e^{it} − iA* e^{−it})
           − ϵ (A² e^{2it} + (A*)² e^{−2it}) (iA e^{it} − iA* e^{−it})

and

ẍ = (ϵ²A_{TT} + 2iϵA_T − A) e^{it} + c.c. + ϵẍ₁
  ≈ (2iϵA_T − A) e^{it} + c.c. + ϵẍ₁ .

2 The following derivations are informative, but somewhat advanced. If desired, the reader may
skip directly to the result, Eq. (3.18).

Substituting these expressions into (3.14), we obtain

ẍ₁ + x₁ = (−2iA_T + iA − i|A|² A) e^{it} − iA³ e^{3it} + c.c. , (3.17)

to order O(ϵ¹).

Solvability Condition The time dependencies

. ∼ eit and ∼ e3it

of the two terms on the right-hand side of (3.17) are proportional to the unperturbed
frequency .ω0 = 1 and to .3ω0 , respectively.

Equation (3.17) is identical to that of a driven harmonic oscillator.3 The term


.∼ eit is therefore exactly at resonance and would induce a diverging response
.x1 → ∞, in contradiction to the perturbative assumption made by ansatz (3.16).

Its prefactor must therefore vanish,

A_T = ∂A/∂T = (1 − |A|²) A/2 ,   ∂A/∂t = ϵ (1 − |A|²) A/2 , (3.18)

where we used T = ϵt. The solvability condition (3.18) can be written as

ȧ e^{iφ} + i φ̇ a e^{iφ} = ϵ (1 − a²) a e^{iφ}/2 ,

in phase-magnitude representation A(t) = a(t) e^{iφ(t)}, or

ȧ = ϵ (1 − a²) a/2 ,   φ̇ ∼ O(ϵ²) . (3.19)

The system takes up energy for .a < 1 and the amplitude a increases till the
saturation limit .a → 1, the conserving point. For .a > 1 the system dissipates
energy to the environment and the amplitude a decreases, approaching unity for
.t → ∞, just as we discussed in connection with (3.5).

In the limit .ϵ → 0 the long-term solution of the Van der Pol oscillator is
.x(t) ≈ 2 a cos(t), compare (3.16) and (3.19), which constitutes an amplitude-

regulated oscillation. This behavior, compare Fig. 3.8, is relevant for technical
control tasks.
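The amplitude stabilization predicted by (3.19) is easy to verify numerically. The sketch below integrates the Van der Pol oscillator (3.14) for a small ϵ (here ϵ = 0.1, an illustrative choice) and checks that the long-time oscillation amplitude approaches 2a = 2, for both a small and a large initial displacement:

```python
import numpy as np

def vdp(s, eps):
    # Van der Pol oscillator (3.14): x' = y, y' = eps (1 - x^2) y - x
    x, y = s
    return np.array([y, eps*(1.0 - x*x)*y - x])

def rk4_step(s, dt, eps):
    k1 = vdp(s, eps)
    k2 = vdp(s + 0.5*dt*k1, eps)
    k3 = vdp(s + 0.5*dt*k2, eps)
    k4 = vdp(s + dt*k3, eps)
    return s + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

def late_amplitude(x0, eps=0.1, dt=0.01, t_total=400.0):
    s = np.array([x0, 0.0])
    peak = 0.0
    for i in range(int(t_total/dt)):
        s = rk4_step(s, dt, eps)
        if i*dt > t_total - 50.0:   # look only at the settled oscillation
            peak = max(peak, abs(s[0]))
    return peak

amp_small = late_amplitude(0.1)   # starts inside the limit cycle
amp_large = late_amplitude(4.0)   # starts outside the limit cycle
```

Both runs settle on the same amplitude-regulated oscillation, the behavior shown in Fig. 3.8.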

3 The harmonic oscillator is resonant only when the frequency of the perturbation matches the
internal frequency. Non-harmonic oscillators may however be unstable against rational frequency
ratios, as discussed in Sect. 2.1 of Chap. 2 in the context of the KAM theorem, with regard to
the gaps in the Saturn rings.

Fig. 3.8 Two solutions x(t) of the Van der Pol oscillator (3.14), for small ϵ and two
different initial conditions. Note the self-generated amplitude stabilization

Liénard Variables For large ϵ it is convenient to define with

ϵ (d/dt) Y(t) = −x(t) = ẍ(t) − ϵ (1 − x²(t)) ẋ(t) (3.20)

the Liénard variable Y(t), where the second equality is just the definition of the Van
der Pol oscillator, see (3.14). With X(t) ≡ x(t) we rewrite (3.20) as

ϵ Ẏ = Ẍ − ϵ (1 − X²) Ẋ ,   X(t) = x(t) ,

which we integrate with respect to t,

ϵ Y = Ẋ − ϵ (X − X³/3) ,

where we did set the integration constant to zero. Together with (3.20) we obtain

Ẋ = c (Y − f(X)) ,   f(X) = X³/3 − X ,
Ẏ = −X/c , (3.21)

with c ≡ ϵ, as we are now interested in the case c ⪢ 1.

Relaxation Oscillations For a large driving c, we can discuss the solution of the
Van der Pol oscillator (3.21) graphically, as illustrated in Fig. 3.9. Of relevance is the
flow .(Ẋ, Ẏ ) in phase space .(X, Y ). For .c ⪢ 1 there is a separation of time scales,

(Ẋ, Ẏ ) ∼ (c, 1/c),


. Ẋ ⪢ Ẏ ,


Fig. 3.9 Van der Pol oscillator for a large driving .c ≡ ϵ. Left: Relaxation oscillations with respect
to the Liénard variables (3.21). Indicated is the flow .(Ẋ, Ẏ ) (arrows), for .c = 3, see (3.21). Also
shown is the .Ẋ = 0 isocline .Y = −X + X 3 /3 (solid line) and the limit cycle, which includes a
non-constant part (dashed line) and a section of the isocline. Right: The limit cycle in terms of the
original variables .(x, y) = (x, ẋ) = (x, v). Note that .X(t) = x(t)

which leads to the following dynamical behavior:

– Starting at a general .(X(t0 ), Y (t0 )) the orbit develops very fast .∼ c and nearly
horizontally until it hits the “isocline”4

. Ẋ = 0, Y = f (X) = −X + X3 /3 . (3.22)

– Once the orbit is close to the .Ẋ = 0 isocline .Y = −X + X3 /3 the motion slows
down, proceeding with a velocity .∼ 1/c close-to (but not exactly on) the isocline
(3.22).
– Once the slow motion reaches one of the two local extrema .X = ±a0 = ±1 of
the isocline, it cannot follow the isocline any more and makes a rapid transition
towards the other branch of the Ẋ = 0 isocline, with Y ≈ const. Note that
trajectories may cross the isocline vertically, which implies that Ẏ|_{X=±1} = ∓1/c
is small but finite right at the extrema.

The orbit therefore relaxes rapidly towards a limiting oscillatory trajectory,


illustrated in Fig. 3.9, with the time needed to perform a whole oscillation depending
on the relaxation constant c; therefore the term “relaxation oscillation”. Relaxation
oscillators represent an important class of cyclic attractors, allowing to model
systems going through several distinct and well characterized phases during the
course of one cycle.5

4 The term isocline stands for “equal slope” in ancient Greek.


5 We will discuss relaxation oscillators further in Sect. 9.4.2 of Chap. 9.
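A minimal numerical sketch of the relaxation oscillations uses the Liénard form (3.21) with an illustrative c = 10 and a simple explicit Euler step; after transients, X jumps between the two branches of the isocline, oscillating between roughly −2 and +2:

```python
# Explicit Euler integration of the Lienard form (3.21), a simple sketch;
# c = 10, the time step and the initial point are illustrative choices
c, dt = 10.0, 1e-4
X, Y = 0.5, 0.0
xs = []
for i in range(int(100.0/dt)):
    f = X**3/3.0 - X                      # f(X) = X^3/3 - X
    X, Y = X + dt*c*(Y - f), Y - dt*X/c   # Eq. (3.21)
    if i*dt > 30.0 and i % 10 == 0:       # record only after transients
        xs.append(X)

x_max, x_min = max(xs), min(xs)   # jumps carry X between about +2 and -2
```

The extremes ±2 follow from the isocline: the jump starting at X = ±1 (where f = ∓2/3) lands on the opposite branch at the other solution of f(X) = ∓2/3, namely X = ∓2.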

3.2.1 Conserving Adaptive Systems

Per definition, conserving dynamical systems conserve the volume of phase space
enclosed by a set of trajectories, as illustrated in Fig. 3.1. In contrast, phase space
expands and contracts along a given orbit when the system is adaptive, as we
discussed for the case of the Van der Pol oscillator, as defined by (3.14), and for the
Takens–Bogdanov system.⁶ Adaptive systems cannot conserve phase space volume,
but they may conserve other quantities, like a generalized energy functional.

Energy Conservation in Mechanical Systems Newton’s equation

ẋ = v ,   v̇ = −∇V(x) ,   E(x, v) = v²/2 + V(x) (3.23)

for a mechanical system with a potential V(x) conserves energy, E = E(x, v),

dE/dt = (∂E/∂x) · ẋ + (∂E/∂v) · v̇ = (∇V + v̇) · v = 0 .

Energy is an instance of a “constant of motion”, viz of a conserved quantity.

Lotka–Volterra Model for Rabbits and Foxes Evolution equations for one or
more interacting species are termed “Lotka–Volterra” models. A basic example is
that of a prey (rabbit) with population density x being hunted by a predator (fox)
with population density y,

ẋ = Ax − Bxy
. . (3.24)
ẏ = −Cy + Dxy

The population x of rabbits can grow by itself, but the foxes need to eat rabbits
in order to multiply. All constants A, B, C and D are positive.

Fixpoints The Lotka–Volterra equation has two fixpoints,

x*₀ = (0, 0) ,   x*₁ = (C/D, A/B) , (3.25)

with the respective Jacobians,⁷

J₀ = | A   0  |        J₁ = |  0     −BC/D |
     | 0  −C  | ,           | AD/B    0    | .

6 See Sect. 2.3 of Chap. 2 for an in-depth treatment of the Takens–Bogdanov equations.
7 We recall that the Jacobian is the matrix of all possible partial derivatives, see Sect. 2.2.1 of
Chap. 2.

Fig. 3.10 In the space of population densities .(x, y), the flow of the fox and rabbit Lotka–Volterra
model, see (3.24) and (3.26). The flow expands (contracts) for .x > y (.x < y), as indicated
(blue/green orbits). Trajectories coincide with the iso-energy lines of the conserved function E, as
defined by (3.27), with .(0, 0) being a saddle and .(1, 1) a neutral focus (open red circle). Lyapunov
exponents are real in the shaded region and complex otherwise

The trivial fixpoint x*₀ is hence a saddle and x*₁ a neutral focus with purely imaginary
Lyapunov exponents λ = ±i√(CA). The trajectories circling the focus close onto
themselves, as illustrated in Fig. 3.10 for A = B = C = D = 1.

Phase Space Evolution We now consider the evolution of phase space volume, as
defined by (3.4),

∂ẋ/∂x + ∂ẏ/∂y = A − By − C + Dx . (3.26)

The phase space expands/contracts for y smaller/larger than (A + Dx − C)/B, the
telltale sign of an adaptive system.

Constant of Motion The function

E(x, y) = A log(y) + C log(x) − By − Dx (3.27)

on phase space (x, y) is a constant of motion for the Lotka–Volterra model (3.24),
since

dE/dt = Aẏ/y + Cẋ/x − Bẏ − Dẋ
      = A(−C + Dx) + C(A − By) − B(−C + Dx)y − D(A − By)x
      = 0 .

The prey–predator system (3.24) does hence dispose of a non-trivial constant of
motion. Note that E(x, y) has no biological significance that would be evident
per se.
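The conservation of E(x, y) can be verified along a numerically integrated orbit. The sketch below uses A = B = C = D = 1, a fourth-order Runge–Kutta step, and an arbitrary starting point (all illustrative choices):

```python
import numpy as np

def lv(s):
    # Lotka-Volterra (3.24) with A = B = C = D = 1
    x, y = s
    return np.array([x - x*y, -y + x*y])

def rk4_step(s, dt):
    k1 = lv(s)
    k2 = lv(s + 0.5*dt*k1)
    k3 = lv(s + 0.5*dt*k2)
    k4 = lv(s + dt*k3)
    return s + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

def energy(s):
    # Constant of motion (3.27) with A = B = C = D = 1
    x, y = s
    return np.log(y) + np.log(x) - y - x

s = np.array([2.0, 1.0])
E0 = energy(s)
for _ in range(20000):          # integrate for 100 time units
    s = rk4_step(s, 0.005)
drift = abs(energy(s) - E0)     # numerical drift of the conserved quantity
```

The residual drift is a pure discretization artifact; along the exact flow, dE/dt vanishes identically, so the orbit stays on one of the closed iso-energy lines of Fig. 3.10.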

Iso-Energy Manifolds The flow of a d-dimensional dynamical system ẋ = f(x)
disposing of a conserved functional E(x), which we call here a generalized energy, is
always restricted to an iso-energy manifold defined by E(x) = const,

dE/dt = ∇E · ẋ = 0 ,   ∇E ⊥ ẋ . (3.28)

The flow ẋ is consequently perpendicular to the gradient ∇E of the conserved
generalized energy. Orbits are therefore confined to an iso-energy manifold, which is
one-dimensional, given that phase space is two-dimensional. Trajectories coincide
hence with the iso-energy manifolds for the fox and rabbit Lotka–Volterra model, as
illustrated in Fig. 3.10.

Lyapunov Exponent We evaluate the Jacobian for a generic point (x, y) in phase
space,

J = | 1 − y   −x    |
    |  y      x − 1 | ,   λ± = (x − y)/2 ± √[(x − y)² − 4(x + y − 1)]/2 ,

where we did set A = B = C = D = 1. The Lyapunov exponents λ± are real close
to the axes and complex further away, with the separatrix given by

(x − y)² = 4(x + y − 1) ,   y = x + 2 ± √(8x) . (3.29)

There is no discernible change in the flow dynamics across the separatrix (3.29),
which we included in Fig. 3.10, viz when the Lyapunov exponents acquire finite
imaginary components.

Invariant Manifolds Fixpoints and limit cycles are examples of invariant subsets
of phase space.

INVARIANT MANIFOLD A subset of phase space invariant under the flow for all times
.t∈ [−∞, ∞] is denoted an invariant manifold.

All trajectories of the fox and rabbit Lotka–Volterra model, apart from the stable
and the unstable manifolds of the saddle .(0, 0), are closed and constitute hence
invariant manifolds.

Fixpoints and limit cycles are invariant manifolds with dimensions zero and one
respectively, strange attractors, see Sect. 3.1.1, have generically a fractal dimension.

Closed Invariant Manifolds The evolution of phase space on an invariant man-


ifold (viz inside the manifold) with dimension m is determined by m Lyapunov
exponents whenever the manifold has a smooth topology (which is not the case for
fractals).

Phase space cannot expand or contract forever for orbits on closed invariant
manifolds M, which are finite. This is the case for all trajectories of the fox and
rabbit Lotka–Volterra model and manifestly evident for the case A = B = C =
D = 1 discussed above, for which the real part of the Lyapunov exponent λ± is
anti-symmetric under the exchange x ↔ y.

Lotka–Volterra System with Resource Limitation The reproduction of the prey
in the original Lotka–Volterra model (3.24) is not limited. In practice, the population
density x of the rabbits will be bounded by the carrying capacity x_max of the
supporting environment, such that x < x_max. The modified model,

ẋ = Ax (1 − x/x_max) − Bxy ,   ẏ = (Dx − C) y ,

has the non-trivial fixpoint

x* = C/D ,   y* = (A/B) (1 − C/(D x_max)) , (3.30)

which exists for .C < Dxmax . The population of rabbits is never large enough to
support a finite population of foxes in the opposite case, when .C > Dxmax , which
leads to .x → xmax and .y → 0. When existing, the steady state defined by (3.30) is
stable.

The stability of (3.30) can be shown via a direct evaluation of the respective
Jacobian. On a general level one can argue that the resource-limiting factor .1 −
x/xmax adds a contracting element. It is hence not surprising that the previously
neutral fixpoint is stabilized.
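The claimed stability can indeed be checked by evaluating the Jacobian at the fixpoint (3.30); the parameter values below are an illustrative choice satisfying C < D·x_max, not values taken from the text:

```python
import numpy as np

# Illustrative parameters, chosen with C < D*xmax so that (3.30) exists
A, B, C, D, xmax = 1.0, 1.0, 1.0, 1.0, 4.0

# Fixpoint (3.30)
x_s = C/D
y_s = (A/B)*(1.0 - C/(D*xmax))

# Jacobian of (Ax(1 - x/xmax) - Bxy, (Dx - C)y) at the fixpoint
J = np.array([[A*(1.0 - 2.0*x_s/xmax) - B*y_s, -B*x_s],
              [D*y_s, D*x_s - C]])
max_re = max(ev.real for ev in np.linalg.eigvals(J))   # < 0: stable
```

The trace of J is −A·x_s/x_max < 0 and the determinant is positive, so both eigenvalues have negative real parts: the resource-limiting factor turns the neutral focus into a stable one.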

3.3 Diffusion and Transport

Deterministic vs. Stochastic Time Evolution So far we discussed concepts of


deterministic dynamical systems, governed by sets of coupled differential equations
without noise or randomness. At the other extreme are diffusion processes for which
the random process dominates the dynamics.

Dissemination of information through social networks is one of many examples
where diffusion processes play a paramount role. The simplest model of diffusion
is Brownian motion, which describes the erratic movement of grains suspended in
a liquid, observed by the botanist Robert Brown as early as 1827. Brownian motion
became the prototypical example of a stochastic process with the seminal works of
Einstein and Langevin at the beginning of the twentieth century.

3.3.1 Random Walks, Diffusion and Lévy Flights

One-Dimensional Diffusion We start with the random walk of particles along a
line, with each particle having an equal probability 1/2 to move left/right at every
time step. The probability

p_t(x) ,   x = 0, ±1, ±2, . . . ,   t = 0, 1, 2, . . .

to find the particle at time t at position x obeys the master equation

p_{t+1}(x) = [p_t(x − 1) + p_t(x + 1)] / 2 . (3.31)
Next we take the limit of continuous time and space by generalizing (3.31) to
discrete steps Δx and Δt in space and time,

[p_{t+Δt}(x) − p_t(x)] / Δt = [(Δx)²/(2Δt)] · [p_t(x + Δx) + p_t(x − Δx) − 2p_t(x)] / (Δx)² , (3.32)

where we subtracted on both sides the current distribution p_t(x). Taking the limit
Δx, Δt → 0 in such a way that (Δx)²/(2Δt) remains finite, we obtain the diffusion
equation

∂p(x, t)/∂t = D ∂²p(x, t)/∂x² ,   D = (Δx)²/(2Δt) , (3.33)

with D being the diffusion constant. Note that the diffusion equation can be cast
into the form of a continuity equation,

ṗ + ∇ · j = 0,
. j = −D ∇p , (3.34)

with the diffusion current .j encoding the diffusive transport of particle from high- to
low concentrations.

Solution of the Diffusion Equation The solution Φ(x, t) of the diffusion equation
(3.33) is given by

Φ(x, t) = exp(−x²/(4Dt)) / √(4πDt) ,   ∫_{−∞}^{∞} dx Φ(x, t) = 1 , (3.35)

which holds⁸ for a localized initial state Φ(x, t = 0) = δ(x). For the derivation one
enters the appropriate derivatives,

Φ̇ = −Φ/(2t) + x²Φ/(4Dt²) ,   Φ′ = −xΦ/(2Dt) ,   Φ″ = −Φ/(2Dt) + x²Φ/(4D²t²) ,

into the diffusion equation (3.33).

Diffusive Transport As a function of the coordinate x, the solution (3.35) of the
diffusion equation corresponds to a Gaussian with variance σ² = 2Dt. One hence
concludes that the variance of the displacement follows diffusive behavior, i.e.

〈x²(t)〉 = 2D t ,   x̄ = √〈x²(t)〉 = √(2D t) , (3.36)

where we assumed that the mean 〈x〉 = 0 vanishes. Diffusive transport is therefore
characterized by transport sublinear in time, in contrast to ballistic transport
following x = vt. Compare Fig. 3.11.
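The diffusive scaling (3.36) can be reproduced with a few lines of simulation. For the discrete ±1 walk one has D = (Δx)²/(2Δt) = 1/2, so the prediction is 〈x²(t)〉 = t; the walker number, time span and random seed below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# N walkers, T steps of +-1; with dx = dt = 1 one has D = 1/2, so <x^2> = t
N, T = 20000, 400
x = rng.choice([-1, 1], size=(N, T)).sum(axis=1)
var = np.mean(x.astype(float)**2)   # should be close to T = 2 D T
```

A ballistic process would instead give 〈x²〉 ∝ T², growing quadratically rather than linearly in time.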

Green’s Function for Diffusion For general initial distributions p₀(x) = p(x, 0)
of walkers the diffusion equation (3.33) is solved by

p(x, t) = ∫ dy Φ(x − y, t) p₀(y) , (3.37)

8 Note that ∫ dx e^{−x²/a} = √(aπ), together with lim_{a→0} exp(−x²/a)/√(aπ) = δ(x).

Fig. 3.11 Examples of random walkers with scale-free distributions ∼ 1/|Δx|^{1+β} for real-space
jumps, see (3.39). Left: β = 3, which falls into the universality class of standard Brownian motion.
Right: β = 0.5, a typical Lévy flight. Note the occurrence of longer-ranged jumps in conjunction
with local walking

since limt→0 Φ(x − y, t) = δ(x − y). An integral kernel allowing to construct


the solution of a differential equation for arbitrary initial conditions is a “Green’s
function”.

First Passage Time When starting from the origin, what is the typical time t_y a
random walker needs to reach a certain distance y > 0 for the first time? We
define the survival probability

S_y(t) = ∫_{−∞}^{y} dx [Φ(x, t) − Φ(x − 2y, t)] ,   S_y(0) = 1 ,   S_y(∞) = 0 ,

which denotes the probability that the walker is below y at time t, without having
ever crossed the ‘cliff’ x = y. Here we used the solution (3.35) of the diffusion
equation, as describing walkers starting respectively at x = 0 and x = 2y.
Importantly, the kernel Φ(x, t) − Φ(x − 2y, t) vanishes for x = y. The survival
probability is monotonically decreasing with increasing time. The first passage time
t = t_F is not fixed, but distributed as

F_y(t) = −(d/dt) S_y(t) = −D ∫_{−∞}^{y} dx (d²/dx²) [Φ(x, t) − Φ(x − 2y, t)] ,

where we used the diffusion equation Φ̇ = DΔΦ in one dimension below the
integral. Direct integration yields

F_y(t) = (−D/√(4πDt)) ∫_{−∞}^{y} dx (d²/dx²) [e^{−x²/(4Dt)} − e^{−(x−2y)²/(4Dt)}]
       = (−D/√(4πDt)) (d/dx) [e^{−x²/(4Dt)} − e^{−(x−2y)²/(4Dt)}] |_{x=y}
       = y exp(−y²/(4Dt)) / √(4πDt³) , (3.38)

which is a “Lévy distribution”.⁹

For fixed y, one has that (3.38) scales as ∼ t −α for large t, with α = 3/2. The
first moment of the first passage time density Fy (t) diverges consequently.10 An
average first passage time is not defined.

Lévy Flights One can generalize the concept of a random walker, which is at the
basis of ordinary diffusion, and consider a random walk with distributions ρ(Δt)
and ρ(Δx) for waiting times Δt_i and jumps Δx_i, at time step i = 1, 2, . . . of the
walk, as illustrated in Fig. 3.12. One may assume scale-free distributions

ρ(Δt) ∼ 1/(Δt)^{1+α} ,   ρ(Δx) ∼ 1/(Δx)^{1+β} ,   α, β > 0 . (3.39)

If α > 1 (finite mean waiting time) and β > 2 (finite variance), nothing special
happens. In this case the central limit theorem for well-behaved distribution
functions is valid for the spatial component and one obtains standard Brownian
diffusion. Relaxing the above conditions one finds four regimes: normal Brownian
diffusion, “Lévy flights”, fractional Brownian motion, also denoted “subdiffusion”,
and generalized Lévy flights termed “ambivalent processes”. Their respective
scaling laws are listed in Table 3.1, with two examples being shown in Fig. 3.11.

Lévy flights occur for a wide range of processes, such as for the flight patterns
of wandering albatrosses. Human travel habits seem to be characterized by a
generalized Lévy flight with α, β ≈ 0.6.


9 The Lévy distribution √(c/(2π)) exp(−c/(2t))/t^{3/2} is normalized on the interval t ∈ [0, ∞].
10 The moments of powerlaw distributions are discussed in Sect. 1.1.3 of Chap. 1.

Fig. 3.12 A random walker with distributed waiting times Δt_i and jumps Δx_i may
become a generalized Lévy flight, compare (3.39)

3.3.2 Markov Chains

For many common stochastic processes .x1 → x2 → x3 → . . . the probability to


visit a state .xt+1 = y depends solely on the current state .xt = x.

MARKOV PROPERTY A stochastic process is “markovian” if it has no memory.

A memory would be present, on the other hand, if the transition rule .xt → xt+1
would be functionally dependent on earlier .xt−1 , xt−2 , . . . elements of the process.

Absorbing States The transition probabilities p(x, y) to visit a state x_{t+1} = y,
when being at x_t = x, are normalized,

1 = Σ_y p(x, y) ,   p(x, y) ≥ 0 , (3.40)

since one always arrives at some state x_{t+1} = y when starting from a given x_t = x.
A process may stay in place, which occurs with the probability p(x, x). A state x*
is “absorbing” whenever

p(x*, x*) = 1 ,   p(x*, y) = 0 ,   ∀y ≠ x* . (3.41)

A stochastic process can be viewed as being terminated, or “extinct”, when reaching
an absorbing state. The extinction probability is then the probability to hit x* when
starting from a given state x₀. A famous example is the Galton–Watson process,
which describes the extinction probabilities of family names.¹¹

Table 3.1 The four regimes of a generalized walker with distribution functions, Eq. (3.39),
characterized by scalings ∼ (Δt)^{−1−α} and ∼ (Δx)^{−1−β} for the waiting times Δt and jumps
Δx, as depicted in Fig. 3.12

α > 1        β > 2        x̄ ∼ t^{1/2}   Ordinary diffusion
α > 1        0 < β < 2    x̄ ∼ t^{1/β}   Lévy flights
0 < α < 1    β > 2        x̄ ∼ t^{α/2}   Subdiffusion
0 < α < 1    0 < β < 2    x̄ ∼ t^{α/β}   Ambivalent processes

Master Equation We consider density distributions ρ_t(x) of walkers, with each
walker having the same transition probabilities p(x, y). For discrete times t =
0, 1, . . . , the evolution of the density of walkers is given by the “master equation”

ρ_{t+1}(y) = ρ_t(y) + Σ_x [ρ_t(x) p(x, y) − ρ_t(y) p(y, x)]
           = ρ_t(y) + Σ_x ρ_t(x) p(x, y) − ρ_t(y)
           = Σ_x ρ_t(x) p(x, y) , (3.42)

where we took into account that the number of walkers is conserved, namely that
Σ_y p(x, y) = 1. Random walks, and any other stochastic time series, are described
by their defining master equations.

Stationarity A Markov process becomes stationary when the distribution of
walkers does not change anymore with time, viz when

ρ*(y) = ρ_{t+1}(y) = ρ_t(y) ,   Σ_x ρ*(x) p(x, y) = ρ*(y) ,   ρ* P = ρ* ,

where P is the transition matrix p(x, y). The stationary distribution of walkers ρ*
is consequently a left eigenvector of P.

General Two-State Markov Process As an example we define with

P = |  α    1−α |
    | 1−β    β  | ,   α, β ∈ [0, 1] (3.43)

the transition matrix P for the general two-state Markov process, compare Fig. 3.13.
The eigenvalues λ of the left eigenvectors ρ* = (ρ₁, ρ₂) of P are determined by

α ρ₁ + (1 − β) ρ₂ = λ ρ₁ ,   (1 − α) ρ₁ + β ρ₂ = λ ρ₂ , (3.44)

which has the solutions

λ₁ = 1 ,   ρ*_{λ₁} = (1 − β, 1 − α)/N₁ ,   N₁ = √[(1 − α)² + (1 − β)²]

11 The Galton-Watson process will be treated in Sect. 6.5.2 of Chap. 6.



Fig. 3.13 The general two-state Markov chain, as defined by (3.43)

and

λ₂ = α + β − 1 ,   ρ*_{λ₂} = (1, −1)/√2 .

The first eigenvalue dominates generically, λ₁ = 1 > |α + β − 1| = |λ₂|, with the
contribution of ρ*_{λ₂} dying out. Absorbing states are present whenever α = 1 and/or
β = 1, see Fig. 3.13.
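A short sketch, for illustrative values of α and β, iterates ρ → ρP and compares with the analytic stationary state ∝ (1 − β, 1 − α), normalized here as a probability rather than to unit length:

```python
import numpy as np

alpha, beta = 0.9, 0.6            # illustrative values
P = np.array([[alpha, 1 - alpha],
              [1 - beta, beta]])  # transition matrix (3.43)

rho = np.array([1.0, 0.0])
for _ in range(200):              # contribution of lambda_2 dies out
    rho = rho @ P                 # left multiplication: rho P

rho_star = np.array([1 - beta, 1 - alpha])
rho_star = rho_star / rho_star.sum()   # stationary state, sum normalized
err = np.max(np.abs(rho - rho_star))
```

The residual of λ₂ decays like |α + β − 1|^t, here 0.5^200, so the iterate is indistinguishable from the left eigenvector within machine precision.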

Random Surfer Model A famous diffusion process is the “random surfer model”
which tries to capture the behavior of Internet users. This model is at the basis of
the original Page & Brin Google page-rank algorithm.

Consider a network of i = 1, . . . , N Internet hosts connected by directed
hyperlinks characterized by the adjacency matrix A_ij.¹² We denote with

ρ_i(t) ,   Σ_{i=1}^{N} ρ_i(t) = 1

the probability of finding an Internet surfer visiting host i at time t. The surfers are
assumed to perform a markovian walk on the Internet by clicking randomly any
available out-going hyperlink, giving rise to the master equation

ρ_i(t + 1) = c/N + (1 − c) Σ_j [A_ij / Σ_l A_lj] ρ_j(t) . (3.45)

Normalization is conserved,

Σ_i ρ_i(t + 1) = c + (1 − c) Σ_j [Σ_i A_ij / Σ_l A_lj] ρ_j(t) = c + (1 − c) Σ_j ρ_j(t) .

Hence Σ_i ρ_i(t + 1) = 1 whenever Σ_j ρ_j(t) = 1.

12 More about the adjacency matrix in Sect. 1.2 of Chap. 1.



Google Page Rank The parameter c in the random surfer model regulates the
probability to randomly enter the Internet:

– For .c = 1 the adjacency matrix and hence the hyperlinks are irrelevant. We can
interpret therefore c as the uniform probability to enter the Internet.
– For .c = 0 a surfer never enters or leaves the Internet, continuing to click around
forever, c is hence also the probability to stop clicking hyperlinks.

The random surfer model (3.45) can be solved iteratively. Convergence is fast for
not too small c. At every iteration authority is transferred from one host j to other
hosts i through its outgoing hyperlinks .Aij . The steady-state density .ρi of surfers
can hence be considered as a measure of host authority and is equivalent to the
orginal Google page rank, which was an important score for ranking search results
at the dawn of the Internet.
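The iterative solution of (3.45) can be sketched in a few lines of code. The three-host network below and the value c = 0.15 are illustrative choices, not taken from the text:

```python
import numpy as np

def surfer_rank(A, c=0.15, n_iter=100):
    """Iterate the random surfer model (3.45):
    rho_i <- c/N + (1-c) sum_j A_ij rho_j / k_j, with out-degrees k_j = sum_l A_lj."""
    N = A.shape[0]
    k = A.sum(axis=0)            # column sums: out-degree of each host j
    rho = np.full(N, 1.0 / N)    # uniform start distribution
    for _ in range(n_iter):
        rho = c / N + (1 - c) * A @ (rho / k)
    return rho

# toy network; A[i, j] = 1 encodes a hyperlink from host j to host i,
# host 2 is linked to by both other hosts and accumulates authority
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
rho = surfer_rank(A)
print(rho, rho.sum())            # normalization is conserved: the sum stays 1
```

Convergence is geometric, with contraction factor (1 − c) per iteration, which is why not too small values of c converge fast.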

Relation to Graph Laplacian The continuous time version of the random surfer
model can be derived, for the case .c = 0, from

[ ρi(t + Δt) − ρi(t) ] / Δt = Σ_j (Aij/kj) ρj(t) − ρi(t),    kj = Σ_l Alj ,

where .kj is the out-degree of host j and .Δt the time step. Taking the limit .Δt → 0
yields

d/dt ρ = Λ̃ ρ,    Λ̃ij = −Λij/kj,    Λij = kj δij − Aij , (3.46)

where .Λij is the Graph Laplacian.13 Equation (3.46) corresponds to a generalization


of the diffusion equation (3.33) to networks.
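Probability conservation carries over to (3.46), since the columns of Λ, and hence of Λ̃, sum to zero. A small numerical check (the toy network and the Euler step size are illustrative):

```python
import numpy as np

# directed toy network; A[i, j] = 1 for a hyperlink from host j to host i
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
k = A.sum(axis=0)                 # out-degrees k_j
Lam = np.diag(k) - A              # graph Laplacian, Lam_ij = k_j delta_ij - A_ij
Lam_t = -Lam / k                  # tilde-Lambda_ij = -Lam_ij / k_j (column-wise)

assert np.allclose(Lam_t.sum(axis=0), 0)   # columns sum to zero

# Euler integration of d rho/dt = Lam_t rho conserves the total probability
rho = np.array([1.0, 0.0, 0.0])
dt = 0.01
for _ in range(1000):
    rho = rho + dt * Lam_t @ rho
print(rho.sum())                  # stays 1
```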

3.4 Stochastic Systems


3.4.1 Langevin Equation

Diffusion as a Stochastic Process Langevin proposed to describe the diffusion of


a particle by the stochastic differential equation,

m v̇ = −m γ v + ξ(t),    〈ξ(t)〉 = 0,    〈ξ(t)ξ(t')〉 = Q δ(t − t') , (3.47)

where v(t) is the velocity of the particle and m > 0 its mass.

13 The Graph Laplacian is treated in Sect. 1.2.1 of Chap. 1.


114 3 Dissipation, Noise and Adaptive Systems

– The term −mγ v on the right-hand side of (3.47) corresponds to a damping term, the friction being proportional to γ > 0.
– ξ(t) is a stochastic variable, viz noise. The brackets 〈. . .〉 denote ensemble averages, i.e. averages over different noise realizations.
– “White noise” (in contrast to “colored noise”) denotes noise with a flat power spectrum (as for white light), viz 〈ξ(t)ξ(t')〉 ∝ δ(t − t').
– The constant Q is a measure for the strength of the noise.

All stochastic systems have a noise term on the right-hand side.

Solution of the Langevin Equation Considering a specific noise realization ξ(t), one finds

v(t) = v0 e^{−γt} + (e^{−γt}/m) ∫_0^t dt' e^{γt'} ξ(t') (3.48)

for the formal solution of the Langevin equation (3.47), where v0 ≡ v(0).

Mean Velocity Taking the ensemble average 〈v(t)〉 of the velocity leads to

〈v(t)〉 = v0 e^{−γt} + (e^{−γt}/m) ∫_0^t dt' e^{γt'} 〈ξ(t')〉 = v0 e^{−γt} , (3.49)

since 〈ξ(t')〉 = 0. The average velocity hence decays exponentially to zero.

Mean Square Velocity For the ensemble average 〈v²(t)〉 of the velocity squared one finds

〈v²(t)〉 = v0² e^{−2γt} + (2 v0 e^{−2γt}/m) ∫_0^t dt' e^{γt'} 〈ξ(t')〉
         + (e^{−2γt}/m²) ∫_0^t dt' ∫_0^t dt'' e^{γt'} e^{γt''} 〈ξ(t')ξ(t'')〉 ,

where the first integral vanishes, as 〈ξ(t')〉 = 0, and where 〈ξ(t')ξ(t'')〉 = Q δ(t' − t''). This yields

〈v²(t)〉 = v0² e^{−2γt} + (Q e^{−2γt}/m²) ∫_0^t dt' e^{2γt'},    ∫_0^t dt' e^{2γt'} = (e^{2γt} − 1)/(2γ) ,

and finally

〈v²(t)〉 = v0² e^{−2γt} + Q/(2γm²) (1 − e^{−2γt}) . (3.50)

For long times the average squared velocity

lim_{t→∞} 〈v²(t)〉 = Q/(2γm²) (3.51)

becomes, as expected, independent of the initial velocity v0. Equation (3.51) shows explicitly that the dynamics is driven exclusively by the stochastic process ∝ Q on long time scales.

Langevin Equation and Diffusion The Langevin equation is formulated in terms of the particle velocity. In order to make the connection with the time evolution of a real-space random walker, see (3.36), we multiply the Langevin equation by x and take at the same time the ensemble average,

〈x v̇〉 = −γ 〈x v〉 + (1/m) 〈x ξ〉 . (3.52)

We note that

x v = x ẋ = (d/dt) (x²/2),    x v̇ = x ẍ = (d²/dt²) (x²/2) − ẋ²

and

〈x ξ〉 = 〈ξ(t) ∫_0^t v(t') dt'〉 = ∫_0^t dt' ∫_0^{t'} dt'' (e^{−γ(t'−t'')}/m) 〈ξ(t)ξ(t'')〉 = 0 ,

with 〈ξ(t)ξ(t'')〉 = Q δ(t − t''), where we used (3.48) in the limit of large times, together with the fact that t'' < t. We then find

(1/2) d²〈x²〉/dt² − 〈v²〉 = −(γ/2) d〈x²〉/dt

for (3.52), or

d²〈x²〉/dt² + γ d〈x²〉/dt = 2〈v²〉 = Q/(γm²) , (3.53)

the latter with the help of the long-time result (3.51) for 〈v²〉. The solution of (3.53) is

〈x²〉 = (Q/(γ³m²)) ( γt − 1 + e^{−γt} ) . (3.54)

For long times we find

lim_{t→∞} 〈x²〉 ≃ (Q/(γ²m²)) t ≡ 2Dt,    D = Q/(2γ²m²) , (3.55)

viz that the solutions of the Langevin equation show diffusive behavior, compare (3.36). This result, that D ∝ Q, underpins the notion that diffusion is microscopically due to a stochastic process.
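Both long-time results, (3.51) and (3.55), can be checked with a direct Euler–Maruyama integration of (3.47); over a step Δt the integrated noise is drawn as a Gaussian of variance QΔt. The parameter values below are arbitrary, the block is a sketch:

```python
import numpy as np

# Euler-Maruyama integration of the Langevin equation (3.47)
rng = np.random.default_rng(0)
gamma, m, Q = 1.0, 1.0, 2.0
dt, n_steps, n_ens = 1e-3, 20000, 4000
t_final = n_steps * dt                 # = 20, i.e. many relaxation times 1/gamma

v = np.zeros(n_ens)                    # ensemble of trajectories
x = np.zeros(n_ens)
for _ in range(n_steps):
    xi = np.sqrt(Q * dt) * rng.standard_normal(n_ens)  # integrated noise
    v += -gamma * v * dt + xi / m
    x += v * dt

print(np.mean(v**2), Q / (2 * gamma * m**2))   # (3.51): both close to 1
print(np.mean(x**2) / t_final)                 # (3.55): close to 2D = Q/(gamma^2 m^2)
```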

Massless Limit We add an external force F (x) to the Langevin equation (3.47),

ẋ = v,
. m v̇ = −m γ v + F (x) + ξ(t) , (3.56)

where we included the definition v = ẋ of the velocity. Keeping Γ = mγ constant while taking the limit m → 0 leads to a well-defined diffusion constant D = Q/(2Γ²), as given by (3.55). In this limit, the Langevin equation reduces to

Γ ẋ = F (x) + ξ(t) .
. (3.57)

For stochastic processes in non-physical settings, like in finance, one usually starts
with (3.57), which may be further adapted to the problem at hand.

3.4.2 Stochastic Calculus

Stochastic effects may depend functionally on the location x, e.g. via

Γ ẋ = F (x) + b(x)ξ(t) ,
. (3.58)

which is called the “non-linear Langevin equation”. While looking like a fairly innocuous term, b(x)ξ is actually not uniquely defined. The random kicks the system receives at time t depend via b(x) = b(x(t)) on a yet undefined position x(t). Equation (3.58) needs hence to be supplemented with a rule on how to treat the last term. This is usually done by looking at an integral over a small time interval [t, t + Δt].

Ito Stochastic Calculus Assuming that the strength of the kicks received is determined by the position of the system immediately before the kick takes place is consistent with the substitution

∫_t^{t+Δt} dt' b(x(t')) ξ(t') → b(x(t)) ∫_t^{t+Δt} dt' ξ(t') , (3.59)

which is known as the “Ito stochastic calculus”.



Stratonovich Stochastic Calculus Alternatively, one may assume that the strength of the individual kicks depends on a suitable average of the position. A possibility to model this situation is to substitute ∫_t^{t+Δt} dt' b(x(t')) ξ(t') by

[ (b(x(t)) + b(x(t + Δt)))/2 ] ∫_t^{t+Δt} dt' ξ(t') , (3.60)

which is known as the “Stratonovich stochastic calculus”. The two formalisms,


Stratonovich and Ito, lead in general to different results. In physics, Stratonovich
is usually the correct choice, with Ito describing what happens in finance.
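A minimal numerical illustration of the difference, assuming b(x) = x, F = 0 and Γ = 1 in (3.58): the Ito update evaluates the kick strength before the kick, while the Stratonovich midpoint rule can be realized with a Heun-type predictor–corrector. For this multiplicative noise the ensemble mean 〈x(t)〉 stays at x(0) in the Ito case, but grows as e^{Qt/2} in the Stratonovich case:

```python
import numpy as np

rng = np.random.default_rng(1)
Q, dt, n_steps, n_ens = 1.0, 1e-3, 1000, 50000
t_final = n_steps * dt                      # = 1

x_ito = np.ones(n_ens)
x_str = np.ones(n_ens)
for _ in range(n_steps):
    dW = np.sqrt(Q * dt) * rng.standard_normal(n_ens)
    x_ito += x_ito * dW                     # Ito: b(x) evaluated before the kick
    x_pred = x_str + x_str * dW             # Stratonovich: midpoint (Heun) rule
    x_str += 0.5 * (x_str + x_pred) * dW

print(np.mean(x_ito))   # stays near x(0) = 1
print(np.mean(x_str))   # grows towards exp(Q t / 2)
```

The same noise realization dW is used in both updates; the two calculi nevertheless produce different ensemble statistics.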

3.5 Noise-Controlled Dynamics

Stochastic Systems A set of first-order differential equations with a stochastic


term is generally denoted a “stochastic system”. The Langevin equation (3.47) is
a prominent example. Depending on the circumstances, the stochastic component
may determine the long-term dynamical behavior altogether. Some examples:

– NEURAL NETWORKS
Networks of interacting neurons are responsible for the cognitive information
processing in the brain. They must remain functional also in the presence of
noise and be stable as stochastic systems. In this case the introduction of a noise
term to the evolution equation should not change the dynamics qualitatively. This
postulate should be valid for the vast majority of biological networks.
– DIFFUSION
The Langevin equation reduces, in the absence of noise, to a damped motion without an external driving force, with v = 0 acting as a global attractor. The stochastic term is therefore essential in the long-time limit, leading to diffusive behavior, with 〈x²〉 ∝ t.
– STOCHASTIC ESCAPE AND STOCHASTIC RESONANCE
A particle trapped in a local minimum may escape the minimum by a noise-
induced diffusion process; a phenomenon denoted “stochastic escape”. Stochas-
tic escape in a driven bistable system leads to an even more subtle consequence
of noise-induced dynamics, “stochastic resonance”.

In the following we discuss both stochastic escape and stochastic resonance in detail.

3.5.1 Fokker–Planck Equation

Drift Velocity We add an external potential .V (x) to the Langevin equation (3.47),

m v̇ = −m γ v + F(x) + ξ(t),    F(x) = −V'(x) = −(d/dx) V(x) , (3.61)
118 3 Dissipation, Noise and Adaptive Systems

where v and m are the velocity and the mass of the particle, with 〈ξ(t)〉 = 0 and 〈ξ(t)ξ(t')〉 = Q δ(t − t'). In the absence of damping and noise, when γ = 0 = Q, Eq. (3.61) reduces to Newton's law.

We consider for a moment a constant force F(x) = F and the absence of noise, ξ(t) ≡ 0. The system reaches an equilibrium for t → ∞ when relaxation and force cancel each other:

m v̇D = −m γ vD + F ≡ 0,    vD = F/(γm) . (3.62)

vD is called the “drift velocity”. A typical example is the motion of electrons in a


metallic wire. As described by Ohm’s law, an applied voltage leads to an electric
field along the wire, which induces in turn an electrical current. As a result one
has drifting electrons that are continuously accelerated by the electrical field, while
bumping into lattice imperfections or colliding with the lattice vibrations, i.e. with
phonons.

Continuity Equation Generalizing to an ensemble of particles diffusing in an


external potential, we denote with P (x, t) the density of particles at location x and
time t. Particle number conservation defines the particle current density J (x, t) via
the continuity equation

∂P(x,t)/∂t + ∂J(x,t)/∂x = 0 . (3.63)
There are two contributions, JD and Jξ, to the total particle current density, J = JD + Jξ, induced respectively by the drift and by stochastic motion. We derive these two contributions in two steps.

Drift and Diffusion Currents In a first step we disregard noise in (3.61) and set
Q = 0. In the stationary limit particles move in this case uniformly, with the drift
velocity vD . The respective current density is

JD = vD P(x,t) .

In a second step we derive the contribution Jξ of the noise term ∼ ξ(t) to the particle
current density by setting the force term to zero, F = 0. For this purpose we rewrite
the diffusion equation (3.33) with

∂P(x,t)/∂t = D ∂²P(x,t)/∂x² ≡ −∂Jξ(x,t)/∂x,    ∂P(x,t)/∂t + ∂Jξ(x,t)/∂x = 0
3.5 Noise-Controlled Dynamics 119

as a continuity equation, which allows us to determine the functional form of Jξ ,

Jξ = −D ∂P(x,t)/∂x . (3.64)

Fokker–Planck Equation We recall the relation D = Q/(2γ 2 m2 ) between the


diffusion constant D and the amplitude of the noise term Q, see (3.55). Adding both
current contributions, we obtain

J(x,t) = vD P(x,t) − D ∂P(x,t)/∂x (3.65)
       = (F/(γm)) P(x,t) − (Q/(2γ²m²)) ∂P(x,t)/∂x

for the total current density J = JD + Jξ . Substituting this expression for the
total particle current density into the continuity equation, see (3.63), one obtains
the “Fokker–Planck” or “Smoluchowski” equation

∂P(x,t)/∂t = −∂[vD P(x,t)]/∂x + ∂²[D P(x,t)]/∂x² , (3.66)

aka the master equation of the density distribution P (x, t). The first and second term
on the right-hand side correspond respectively to ballistic and diffusive transport.
Without going into details, we note that above expression for the Fokker–Planck
equation is consistent with the Ito stochastic calculus, as defined by (3.59).

3.5.2 Stochastic Escape

Harmonic Potential One can solve the Fokker–Planck equation (3.66) analytically
for a harmonic confining potential,

V(x) = (f/2) x²,    F(x) = −f x .
We are interested in particular in the stationary density distribution,

dP(x,t)/dt = 0   =⇒   dJ(x,t)/dx = 0 ,
dt dx
where the second equation follows from the continuity condition (3.63). With (3.65)
for the total current density we find
   
(d/dx) [ fx/(γm) + (Q/(2γ²m²)) (d/dx) ] P(x) = 0 = (d/dx) [ βf x + (d/dx) ] P(x) ,
120 3 Dissipation, Noise and Adaptive Systems

V(x)
ΔV
P(x)
xmax xmin

Fig. 3.14 Left: Stationary distribution .P (x) of diffusing particles in a harmonic potential .V (x).
Right: Stochastic escape from a local minimum, with .ΔV = V (xmax )−V (xmin ) being the potential
barrier height and J the escape current

where we used F = −fx in (3.65), and where we defined β = 2γm/Q, together with the stationary limit P(x) = lim_{t→∞} P(x,t) for the distribution of random walkers.
The system is confined, which implies that the steady-state current vanishes, which
implies in turn that
 
0 = ( βf x + d/dx ) P(x) (3.67)

holds. The corresponding solution is

P(x) = A e^{−βf x²/2} = A e^{−βV(x)},    A = 1/√(2πσ²) , (3.68)

where σ² = 1/(βf) = Q/(2fγm), with the normalization condition ∫ dx P(x) = 1
being fulfilled. The density of diffusing particles in a harmonic trap is Gaussian-
distributed, see Fig. 3.14.
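The Gaussian steady state can be checked by integrating the overdamped Langevin equation (3.57) with F = −fx; the sampled variance should match σ² = Q/(2fγm), with Γ = γm. Parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
f, Gamma, Q = 1.0, 1.0, 0.5        # force constant, friction Gamma = gamma*m, noise
dt, n_steps, n_ens = 1e-3, 5000, 2000

x = np.zeros(n_ens)
for _ in range(n_steps):           # overdamped step: Gamma dx = -f x dt + xi
    xi = np.sqrt(Q * dt) * rng.standard_normal(n_ens)
    x += (-f * x * dt + xi) / Gamma

print(np.var(x), Q / (2 * f * Gamma))   # sampled vs predicted variance sigma^2
```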

Escape Current We now consider particles in a local minimum, as depicted in


Fig. 3.14. A typical partially confining potential has a functional form like

V(x) ∼ −x + x³ . (3.69)

Without noise, the particle will oscillate around the local minimum, eventually coming to a standstill under the influence of friction, with x → xmin. With noise, the particle will have a small but finite probability

∝ e^{−βΔV},    ΔV = V(xmax) − V(xmin)

to reach the next saddle, where ΔV is the potential difference between the saddle
and the local minimum, see Fig. 3.14.
3.5 Noise-Controlled Dynamics 121

The solution (3.68) for the stationary particle distribution in a confining potential
V (x) has a vanishing total current J . For non-confining potentials, like (3.69), the
particle current J (x, t) never vanishes. Stochastic escape occurs when starting with
a density of diffusing particles close to the local minimum, as illustrated in Fig. 3.14.
The escape current will be nearly constant whenever the escape probability is small.
In this case the escape current


J(x,t)|_{x=xmax} ∝ e^{−β [V(xmax) − V(xmin)]} ,

will be proportional to the probability a particle has to reach the saddle. Functionally, we approximated P(x) by the form valid for a perfect harmonic potential, as given by (3.68).

Kramer’s Escape When the escape current is finite, there is a finite probability per unit of time for the particle to escape the local minimum, the “Kramer’s escape rate” rK,

rK = (ωmax ωmin)/(2πγ) exp( −β (V(xmax) − V(xmin)) ) , (3.70)

where β = 2γm/Q and where the prefactors can be derived from a more detailed calculation, with ωmin² = V''(xmin)/m and ωmax² = |V''(xmax)|/m.
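The exponential dependence of the escape probability on βΔV can be illustrated with an overdamped simulation (massless limit, Γ = 1) in the partially confining potential (3.69): the fraction of particles escaping within a fixed time window grows rapidly with the noise level Q. All parameter values are illustrative:

```python
import numpy as np

def escape_fraction(Q, t_max=50.0, dt=0.005, n_ens=500, seed=3):
    """Overdamped dynamics x' = -V'(x) + xi(t) in V(x) = -x + x^3 (Gamma = 1).
    Particles start at the local minimum x = 1/sqrt(3) and count as escaped
    once they pass well beyond the local maximum at -1/sqrt(3)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_ens, 1 / np.sqrt(3))
    escaped = np.zeros(n_ens, dtype=bool)
    for _ in range(int(t_max / dt)):
        xi = np.sqrt(Q * dt) * rng.standard_normal(n_ens)
        x += (1 - 3 * x**2) * dt + xi     # F(x) = -V'(x) = 1 - 3 x^2
        escaped |= x < -1.0
        x[escaped] = -1.0                 # park escaped particles
    return escaped.mean()

print(escape_fraction(0.3), escape_fraction(0.6))
# escaping becomes exponentially easier with increasing noise level Q
```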

Stochastic Escape in Evolution Stochastic escape occurs in many real-world


systems. Noise allows the system to escape from a local minimum where it would
otherwise remain stuck for eternity. As an example, we mention stochastic escape
from a local fitness maximum (in evolution fitness is to be maximized) by random
mutations that play the role of noise.14

3.5.3 Stochastic Resonance

Driven Double-Well Potential We consider diffusive dynamics in a driven double-well potential, see Fig. 3.15,

ẋ = −V'(x) + A0 cos(Ωt) + ξ(t),    V(x) = −x²/2 + x⁴/4 . (3.71)
Several remarks.

14 Stochastic escape from local fitness maxima will be discussed in more detail in Sect. 8.4 of
Chap. 8.
122 3 Dissipation, Noise and Adaptive Systems

Fig. 3.15 The driven double-well potential, V(x) − A0 cos(Ωt) x, compare (3.71). The driving force is small enough to retain the two local minima

– Equation (3.71) is an example of a Langevin equation in the massless limit (3.57), here with Γ = mγ = 1. Friction has hence been taken to diverge.
– For the potential V(x) a normal form has been taken, which one can always achieve by rescaling variables appropriately.
– The potential has two stable minima x0,

  −V'(x) = 0 = x − x³ = x(1 − x²),    x0 = ±1 .

  The local maximum at x0 = 0 is unstable.
– We assume that the periodic driving ∝ A0 is small enough that the effective potential V(x) − A0 cos(Ωt) x retains two minima at all times, compare Fig. 3.15.

Transient State Dynamics The system will stay close to one of the two minima,
x ≈ ±1, for most of the time when both A0 and the noise strength are weak, as
illustrated in Fig. 3.16. The dynamics is therefore characterized by rapid transitions
between transiently stable states.

Switching Times An important question is then: How often does the system switch
between the two preferred states x ≈ 1 and x ≈ −1? There are two time scales
present.

– STOCHASTIC ESCAPE
In the absence of external driving, A0 ≡ 0, the transitions are noise driven and
irregular, with the average switching time given by Kramer’s lifetime TK =
1/rK, see Fig. 3.16. The system is translationally invariant with respect to time
and the ensemble averaged expectation value

. 〈x(t)〉 = 0

therefore vanishes in the absence of an external force.


– EXTERNAL FORCING
When A0 ≠ 0 the external force induces a reference time together with a non-zero response x̄,

〈x(t)〉 = x̄ cos(Ωt − φ̄) , (3.72)

which follows the time evolution of the driving potential with a certain phase
shift φ̄, see Fig. 3.17.
3.5 Noise-Controlled Dynamics 123

Fig. 3.16 Example trajectories x(t) for the driven double-well potential. The strength and the period of the driving potential are A0 = 0.3 and 2π/Ω = 100, respectively. The noise level Q is 0.05, 0.3 and 0.8 (top/middle/bottom), see (3.71)

The phenomenon of stochastic resonance regards the size of the response, as measured by x̄.

Resonance Condition When the time scale 2TK = 2/rK to switch back and forth
due to the stochastic process equals the period 2π/Ω, we expect a large response x̄,
see Fig. 3.17. The time-scale matching condition

2π/Ω ≈ 2/rK

depends via (3.70) on the noise-level Q, which enters the Kramer’s escape rate
rK . For otherwise constant parameters, the response x̄ first increases with rising Q,
decreasing however for elevated noise levels Q. This is the telltale characteristic of
“stochastic resonance”, as shown in Fig. 3.17.
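The non-monotonic dependence of x̄ on Q can be reproduced by simulating (3.71) directly and projecting the ensemble-averaged trajectory onto the driving frequency. The discretization below is a sketch; the driving parameters match Figs. 3.16 and 3.17, the ensemble size and integration window are arbitrary choices:

```python
import numpy as np

def gain(Q, A0=0.3, Omega=2 * np.pi / 100, dt=0.01, n_per=8, n_ens=100, seed=4):
    """Response x_bar of the driven double well (3.71),
    x' = x - x^3 + A0 cos(Omega t) + xi, from a projection of the
    ensemble average <x(t)> onto exp(-i Omega t)."""
    rng = np.random.default_rng(seed)
    n_steps = int(n_per * 2 * np.pi / Omega / dt)
    x = np.where(rng.random(n_ens) < 0.5, 1.0, -1.0)   # start in either well
    acc, T = 0.0, 0.0
    for n in range(n_steps):
        t = n * dt
        xi = np.sqrt(Q * dt) * rng.standard_normal(n_ens)
        x += (x - x**3 + A0 * np.cos(Omega * t)) * dt + xi
        if n >= n_steps // 4:                          # discard the transient
            acc += np.mean(x) * np.exp(-1j * Omega * t) * dt
            T += dt
    return 2 * abs(acc) / T

gains = [gain(Q) for Q in (0.05, 0.3, 0.8)]
print(gains)   # small - large - smaller: the fingerprint of stochastic resonance
```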

Ice Ages The average temperature Te of the earth differs by about ΔTe ≈ 10 ◦ C
between a typical ice age and an interglacial period. Both states of the climate are
locally stable.

– ICE AGE
A substantial ice covering increases the albedo of the earth, which leads in turn
to a larger part of sunlight to be reflected back to space. Earth remains cool.

Fig. 3.17 The gain x̄, see (3.72), as a function of the noise level Q. The strength of the driving amplitude A0 is 0.1, 0.2 and 0.3 (bottom/middle/top curves), see (3.71), with period 2π/Ω = 100. The response x̄ is very small for vanishing noise Q = 0, when the system performs only small-amplitude oscillations in one of the local minima

– INTERGLACIAL PERIOD
The ice covering is reduced. A larger portion of the sunlight is absorbed by the
oceans and land, with earth remaining warm.

A parameter of the orbit of planet earth, the eccentricity, varies slightly with a period T = 2π/Ω ≈ 10⁵ years. The intensity of the incoming radiation from the sun
therefore varies with the same period. Long-term climate changes can therefore be
modeled by a driven two-state system, i.e. by (3.71). The driving force, viz the
variation of the energy flux the earth receives from the sun, is however small. The
increase in the amount of incident sunlight is too weak to pull the earth out of
an ice age into an interglacial period or vice versa. Random climatic fluctuations, like variations in the strength of ocean circulations, are needed to finish the job.
The alternation of ice ages with interglacial periods may therefore be modeled as a
stochastic resonance phenomenon.

Beyond Stochastic Resonance Resonance phenomena generally occur when two


frequencies, or two time scales, match as a function of some control parameter. For
the case of stochastic resonance these two time scales correspond to the period of the
external driving and to the average waiting time for Kramer’s escape respectively,
with the latter depending directly on the level of the noise. The phenomenon is
denoted as stochastic resonance since one of the time scales involved is controlled
by noise.

One generalization of stochastic resonance is “coherence resonance”. In this case


one has a dynamical system with two internal time scales, say t1 and t2 . These two
time scales can be affected to a different degree by an additional source of noise. An
additional stochastic term may therefore change the ratio t1 /t2 , leading to internal
resonance phenomena.

Exercises

(3.1) SHIFT MAP


Using the representation xn = (1 − cos(π θn ))/2 show that the logistic map
(2.46) at r = 4 is equivalent to the shift map

θ_{n+1} = (2θn)%1 = { 2θn for 0 < θn < 0.5;  2θn − 1 for 0.5 < θn < 1 } (3.73)

where the %-sign denotes the modulus operation. This representation can be
used to evaluate analytically the distribution p(x) of finding x when iterating
the logistic map at r = 4 ad infinitum.
(3.2) FIXPOINTS OF THE LORENZ MODEL
Perform the stability analysis of the fixpoint (0, 0, 0) and of C+,− = (±√(b(r − 1)), ±√(b(r − 1)), r − 1) for the Lorenz model (3.8) with r, b > 0.
(3.3) HAUSDORFF DIMENSION OF THE CANTOR SET
Calculate the Hausdorff dimension of the Cantor set, which is generated
by removing consecutively the middle-1/3 segment of a line having a given
initial length. Start with the Hausdorff dimension of a simple straight line.
(3.4) DRIVEN HARMONIC OSCILLATOR
Solve the driven, damped harmonic oscillator

ẍ + γ ẋ + ω0² x = ϵ cos(ωt)

in the long-time limit. Discuss the behavior close to the resonance ω → ω0 .


(3.5) MARKOV CHAIN OF UMBRELLAS
Lady Ann has four umbrellas which she uses whenever it rains, to go from
work to home, or vice versa. She takes only an umbrella with her whenever it
rains, leaving the umbrellas otherwise in the office and at home respectively.
It rains with probability p ∈ [0, 1]. How often does Lady Ann get wet?
(3.6) BIASED RANDOM WALK
Generalize the derivation of the diffusion equation (3.33) for a random
walker jumping with probabilities (1 ± α)/2 either to the right or to the
left, with α ∈ [−1 : 1]. How does α need to scale such that a non-trivial
contribution is retained in the limit Δt → 0 and Δx → 0? What kind of
solutions does one find for a vanishing diffusion constant D → 0?
(3.7) CONTINUOUS-TIME LOGISTIC EQUATION
Consider the continuous-time logistic equation
 
ẏ(t) = α y(t) ( 1 − y(t) ) .

Find the general solution. Furthermore, compare to the logistic map (2.46)
for discrete times t = 0, Δt, 2Δt, ...

(3.8) NOISE-INDUCED QUASI-PERIODIC ORBITS
Consider with

ṙ(t) = g(1 − r),    φ̇ = 1 − K cos(φ) (3.74)

a system showing a saddle-node bifurcation on a limit cycle, as illustrated in Fig. 2.13. Write a script and demonstrate that noise may allow the orbit to pass from the stable to the unstable fixpoint. Use the case of noisy measurements, which corresponds to substituting

r → r(1 + ξ),    φ → φ(1 + ξ), (3.75)

on the right-hand side of (3.74). Here ξ = ξ(t) is normally distributed with mean zero and standard deviation σ ≪ 1.

Further Reading

For further studies we refer to introductory texts, on dynamical systems theory,


Datseris and Parlitz (2022), containing Julia code snippets, on chaos theory and
fractals, Layek (2015), and to textbooks on stochastic systems, Karatzas and Shreve
(2012), Kulkarni (2016).
Overview articles potentially of interest for the reader are Benzi (2010), for
stochastic resonance as a real-world phenomenon, Ginoux and Letellier (2012) for
the history of Van der Pol and relaxation oscillators, and Baronchelli and Radicchi
(2013) for the occurrence of Lévy flights. Partially predictable chaos is treated in
Wernecke et al. (2017).
For insights into some of the original literature we recommend Einstein (1905)
and Langevin (1908) on Brownian motion, or the first formulation and study of the
Lorenz (1963) model.

References
Baronchelli, A., & Radicchi, F. (2013). Lévy flights in human behavior and cognition. Chaos,
Solitons & Fractals, 56, 101–105.
Benzi, R. (2010). Stochastic resonance: From climate to biology. Nonlinear Processes in Geo-
physics, 17, 431–441.
Datseris, G., & Parlitz, U. (2022). Nonlinear dynamics: A concise introduction interlaced with
code. Springer.
Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme geforderte
Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik, 17, 549.
Ginoux, J. M., & Letellier, C. (2012). Van der Pol and the history of relaxation oscillations: Toward
the emergence of a concept. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22,
023120.
Karatzas, I., & Shreve, S. (2012). Brownian motion and stochastic calculus. Springer.
Kulkarni, V. G. (2016). Modeling and analysis of stochastic systems. CRC Press.

Langevin, P. (1908). Sur la théorie du mouvement brownien. Comptes Rendus, 146, 530–532.
Layek, G. C. (2015). An introduction to dynamical systems and chaos. Springer.
Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20,
130–141.
Wernecke, H., Sándor, B., & Gros, C. (2017). How to test for partially predictable chaos. Scientific
Reports, 7, 1087.
4 Self Organization

Self-organized pattern formation occurs when complex spatio-temporal structures


result from seemingly simple dynamical evolution processes. The formation of
animal markings can be understood in this context in terms of a Turing instability.
Zebra stripes, and other biological patterns, emerge in reaction-diffusion systems,
which will be discussed together with the notion of self-stabilizing wavefronts, as
observed for the Fisher equation.
Further prominent examples of self-organizing processes treated in this chapter
involve collective decision making and swarm intelligence, as occurring in social
insects and flocking birds, information offloading in terms of stigmergy, opinion
dynamics and the physics of traffic flows, including the ubiquitous phenomenon of self-organized traffic congestion.

4.1 Interplay Between Diffusion and Reaction

Processes characterized by the random motion of particles or agents are said to be


diffusive. Consequently, they are described by the diffusion equation1

∂ρ(x,t)/∂t = D Δ ρ(x,t),    Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² , (4.1)

with D > 0 denoting the diffusion constant, Δ the Laplace operator and ρ(x,t) the density of walkers at any given point (x,t) in space and time.

1 Diffusion is treated in depth in Sect. 3.3 of Chap. 3.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_4

Reaction-Diffusion Systems The diffusion Eq. (4.1) describes a conserving process, which implies that the overall number of particles, ∫ dx ρ(x,t), remains constant. With

∂ρ(x,t)/∂t = R(ρ) + D Δ ρ(x,t) (4.2)
∂t

a “reaction-diffusion system” is formulated, where .R(ρ) constitutes the reaction of


the system to the current state .ρ = ρ(x, t).
In biological settings .ρ typically stands for the population density and .R(ρ) for
reproduction processes, in the context of chemical reactions .ρ is a measure for the
relative concentration of a given substance whereas .R(ρ) functionally describes
effective reaction rates. Reaction-diffusion systems are fundamental for describing
the self-organized formation of spatial-temporal patterns.

Fisher Equation Considering a one-dimensional system and logistic growth for


the reaction term, one obtains the “Fisher equation”

ρ̇ = rρ(1 − ρ) + Dρ'' ,    ρ ∈ [0, 1] , (4.3)

which describes a reproductive and diffusive species in an environment with


spatially local resource limitation.

Normal Form One-component reaction-diffusion equations can be cast into a dimensionless normal form by rescaling the time and space coordinates appropriately via

t = α t̃,    x = β x̃,    ∂/∂t = (1/α) ∂/∂t̃,    ∂²/∂x² = (1/β²) ∂²/∂x̃² .

The rescaled Fisher equation takes the form

∂ρ/∂t̃ = αr ρ(1 − ρ) + (αD/β²) ∂²ρ/∂x̃²,    αr = 1,    αD/β² = 1 .

It is hence sufficient to consider the normal form

ρ̇ = ρ(1 − ρ) + ρ ''
. (4.4)

of the Fisher equation.

Saturation and Wavefront Propagation The reaction term R(ρ) = ρ(1 − ρ) of the Fisher equation is strictly positive for ρ ∈ (0, 1), with the consequence that

lim_{t→∞} ρ(x,t) = 1,    ∀x ,
Fig. 4.1 Simulation of the Fisher reaction-diffusion equation (4.4). Plotted is .ρ(x, t) for .t =
0, . . . , 8. For the initial distribution .ρ(x, 0) a Gaussian has been taken, which does not correspond
to the natural line-form, but already at .t = 1 the system has relaxed. The wavefronts propagate
asymptotically with velocities .±2

viz that the system saturates. The question of interest is however how saturation is achieved when starting from a local population ρ(x, 0); a simulation is presented in Fig. 4.1. The system is seen to develop wavefronts with a characteristic shape and
velocity. In a biological setting this corresponds to an expansion wave allowing
an initial local population to invade ballistically the uninhabited regions of the
surrounding ecosystem.
This is an interesting observation, since diffusion alone, viz in the absence of a reaction term, would lead to a sublinear expansion ∼ √t. In contrast, ballistic propagation is linear in time.
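The ballistic front propagation can be reproduced with an explicit finite-difference integration of the normal-form Fisher equation (4.4); the grid, time step and domain below are illustrative choices. The measured front velocity comes out close to the minimal value c = 2 derived in the next section:

```python
import numpy as np

# explicit finite-difference integration of the Fisher equation (4.4),
# rho_t = rho (1 - rho) + rho_xx, for a localized initial population
dx, dt = 0.2, 0.01                      # explicit scheme: dt < dx^2 / 2
x = np.arange(-30.0, 90.0, dx)
rho = np.exp(-x**2)                     # Gaussian seed

def front(rho, x):
    """Rightmost position where rho crosses the level 1/2."""
    return x[np.where(rho > 0.5)[0][-1]]

for n in range(1, 2501):                # integrate up to t = 25
    lap = np.empty_like(rho)
    lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / dx**2
    lap[0], lap[-1] = lap[1], lap[-2]   # zero-flux boundaries
    rho += dt * (rho * (1 - rho) + lap)
    if n == 1000:
        x10 = front(rho, x)             # front position at t = 10
x25 = front(rho, x)                     # front position at t = 25

c_measured = (x25 - x10) / 15.0
print(c_measured)                       # close to the minimal velocity c = 2
```

For localized initial data the front speed approaches 2 only slowly from below, so the measured value sits slightly under the asymptotic velocity.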

4.1.1 Travelling Wavefronts in the Fisher Equation

In order to describe the propagation of wavefronts in a reaction-diffusion system,


we examine the properties of plane-wave solutions of the form

ρ(x, t) = u(x − ct),    ρ̇ = −c u',    ρ'' = u'' . (4.5)

For the Fisher equation (4.4), this ansatz leads to the two-component ordinary
differential equation
 
u'' + c u' + u(1 − u) = 0,    u' = v,    v' = −c v − u(1 − u) , (4.6)
132 4 Self Organization

Fig. 4.2 Phase space trajectories of the travelling-wave solution ρ(x, t) = u(x − ct), with v = u', as given by (4.6). The shape of the propagating wavefront is determined by the heteroclinic trajectory (red line) emerging from the saddle (1, 0) and leading to the stable fixpoint (0, 0)

with fixpoints u* = (u*, v*),

u*_0 = (0, 0),    u*_1 = (1, 0) . (4.7)

Minimal Propagation Velocity For the stability of the trivial fixpoint u*_0 = (0, 0) one expands (4.6) for small (u, v),

d/dz (u, v)ᵀ = [[0, 1], [−1, −c]] (u, v)ᵀ,    λ(λ + c) + 1 = 0 ,

where λ is an eigenvalue of the Jacobian,2 given by

λ± = ( −c ± √(c² − 4) )/2,    c ≥ 2 . (4.8)

A complex eigenvalue would lead to a spiral around u*_0 = (0, 0), which is not admissible since u ∈ [0, 1] needs to be strictly positive. Hence c = 2 is the minimal occurring propagation velocity. The trivial fixpoint (0, 0) is stable, since √(c² − 4) < c for c ≥ 2 and hence λ± < 0. The situation is illustrated in Fig. 4.2.

Saddle Restpoint The eigenvalues of the u*_1 = (1, 0) fixpoint are given by

d/dz (u − 1, v)ᵀ = [[0, 1], [1, −c]] (u − 1, v)ᵀ,    λ± = ( −c ± √(c² + 4) )/2 ,

when denoting u = u(z). The fixpoint (1, 0) is hence a saddle, with λ− < 0 and λ+ > 0.

2 The entries of Jacobian matrix are the derivatives of the flow, see Sect. 2.2 of Chap. 2.

Fig. 4.3 Numerical result for the minimal velocity (c = 2) wavefront solution u*(z), compare (4.9), of the Fisher equation (4.4). For comparison, the wavefront for a slightly larger velocity c = 5/√6 ≈ 2.041 has been plotted, for which the shape can be obtained analytically, see (4.10). Indicated is u = 1/4 (dashed lines), which has been used to align the respective horizontal offsets

The unstable direction u*(z), emerging from the saddle and leading to the stable fixpoint (0, 0), viz the heteroclinic trajectory, is the only trajectory in phase space fulfilling the conditions

lim_{z→−∞} u*(z) = 1,    lim_{z→∞} u*(z) = 0,    lim_{z→±∞} v*(z) = 0 (4.9)

characterizing a propagating wavefront. The lineshape u*(z) of the wavefront can be evaluated numerically, see Fig. 4.3.

Exact Particular Solution of the Fisher Equation A special travelling-wave solution of the Fisher equation (4.4) is given by

ρ*(x, t) = σ²(x − ct),    σ(z) = 1/(1 + e^{βz}) , (4.10)

where σ(z) is the “sigmoidal” or “Fermi function”. Its derivatives are

σ' = −β e^{βz}/(1 + e^{βz})² = −β σ(1 − σ),    σ'' = β²(1 − 2σ) σ(1 − σ) , (4.11)

which leads to

∂ρ*/∂t = 2cβ σ²(1 − σ)

and

∂²ρ*/∂x² = (∂/∂x)( −2β σ²(1 − σ) ) = β²(4σ − 6σ²) σ(1 − σ) .
134 4 Self Organization

The solution $\rho^*$ fulfills the Fisher equation,

$$\frac{\partial\rho^*}{\partial t} - \frac{\partial^2\rho^*}{\partial x^2} = \sigma^2(1-\sigma)\underbrace{\left[2c\beta - \beta^2(4-6\sigma)\right]}_{\stackrel{!}{=}\,(1+\sigma)} \stackrel{!}{=} \sigma^2(1-\sigma^2) \equiv \rho^*(1-\rho^*)\,,$$

where $1-\sigma^2 = (1-\sigma)(1+\sigma)$, if

$$1 = 6\beta^2, \qquad 1 = 2c\beta - 4\beta^2 = 2c\beta - \frac{4}{6}, \qquad 2c\beta = \frac{10}{6} = \frac{5}{3}\,.$$
This last condition determines the two free parameters $\beta$ and $c$ as

$$\beta = \frac{1}{\sqrt{6}}, \qquad c = \frac{5}{6\beta} = \frac{5}{\sqrt{6}} \approx 2.041\,, \qquad (4.12)$$

which shows that the propagation velocity $c$ of $\rho^*(x, t)$ is very close to the lower
bound $c = 2$ for the allowed velocities. The lineshape of the exact particular
solution is nearly identical to the numerically-obtained minimal-velocity shape for
the propagating wavefront, as shown in Fig. 4.3.
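The algebra above can be cross-checked by inserting $\rho^*$ into the Fisher equation with finite-difference derivatives (a quick numerical sketch; the grid and step size `h` are arbitrary choices):

```python
# Sketch: verify that rho* = sigma²(x - ct), with beta = 1/sqrt(6) and
# c = 5/sqrt(6), solves rho_t = rho(1 - rho) + rho_xx to discretization accuracy.
import numpy as np

beta = 1.0 / np.sqrt(6.0)
c = 5.0 / np.sqrt(6.0)
sigma = lambda z: 1.0 / (1.0 + np.exp(beta * z))     # Fermi function, eq. (4.10)
rho = lambda x, t: sigma(x - c * t)**2

h = 1e-4                                             # finite-difference step
x, t = np.linspace(-10.0, 10.0, 201), 0.7
rho_t = (rho(x, t + h) - rho(x, t - h)) / (2 * h)                  # d rho / dt
rho_xx = (rho(x + h, t) - 2 * rho(x, t) + rho(x - h, t)) / h**2    # d² rho / dx²
residual = rho_t - rho_xx - rho(x, t) * (1.0 - rho(x, t))
```

The residual vanishes up to the $O(h^2)$ discretization error, confirming (4.12).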

Exponential Falloff of Wavefronts The population density $\rho(x, t) = u(z)$
becomes very small for large $z = x - ct$, with the consequence that the quadratic
term in (4.6) may be neglected. The resulting linear equation,

$$u'' + cu' + u \approx 0\,,$$

has the solution

$$u(z) = e^{-az}, \qquad a^2 - ca + 1 = 0, \qquad c = a + \frac{1}{a}\,. \qquad (4.13)$$

The forward tails of all propagating solutions are hence exponential, with the
minimal velocity $c$ corresponding to $a = 1$. Relation (4.13) holds also for the exact
particular solution (4.10), for which $a = 2\beta$,

$$c = \frac{1}{2\beta} + 2\beta = \frac{1 + 4\beta^2}{2\beta} = \frac{6 + 4}{2\sqrt{6}} = \frac{5}{\sqrt{6}}\,,$$

in agreement with (4.12), when using $\beta = 1/\sqrt{6}$.

4.1.2 Sum Rule for the Shape of the Wavefront

The travelling-wave Eq. (4.6),

$$c\,(u')^2 = -\left(u(1-u) + u''\right)u', \qquad \lim_{z\to\infty} u(z) = 0, \qquad \lim_{z\to-\infty} u(z) = 1\,, \qquad (4.14)$$

may be used to derive a sum rule for the shape $u(z)$ of the wavefront. For this
purpose one integrates (4.14),

$$c\int_{-\infty}^{z} \left(u'(w)\right)^2 dw = A + \frac{u^3(z)}{3} - \frac{u^2(z)}{2} - \frac{\left(u'(z)\right)^2}{2}\,. \qquad (4.15)$$

The integration constant $A$ is determined by considering $z \to -\infty$ and taking into
account that $\lim_{z\to\pm\infty} u'(z) = 0$,

$$A = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}, \qquad \int_{-\infty}^{\infty}\left(u'(w)\right)^2 dw = \frac{1}{6c}\,, \qquad (4.16)$$

where the second equation is the sum rule for $u'$. The wavefront is steepest for the
minimal velocity $c = 2$.

Sum Rule for Generic Reaction Diffusion Systems Sum rules equivalent
to (4.16) can be derived for any integrable reaction term $R(\rho)$ in (4.2). One finds,
by generalizing the derivation leading to (4.16),

$$\int_{-\infty}^{\infty}\left(u'(w)\right)^2 dw = \frac{1}{c}\int_0^1 R(\rho)\,d\rho\,. \qquad (4.17)$$

In a biological setting the reaction term corresponds to the reproduction rate; it is
hence generically non-negative, $R(\rho) \ge 0$, which makes the right-hand side of the
sum rule (4.17) positive. Interestingly, the diffusion constant $D$ has no influence on
the sum rule.

Sum Rule for the Exact Particular Solution We verify that the sum rule (4.16)
for the Fisher equation is satisfied by the particular solution (4.10),

$$u^*(z) = \sigma^2(z), \qquad \sigma(z) = \frac{1}{1 + e^{\beta z}}, \qquad \sigma' = -\beta\sigma(1-\sigma)\,.$$

With $u^{*\prime} = -2\beta\sigma^2(1-\sigma)$ we integrate

$$\int_{-\infty}^{\infty}\left(u^{*\prime}(w)\right)^2 dw = \int_{-\infty}^{\infty}(-4\beta)\,\sigma^3(1-\sigma)\underbrace{(-\beta)\sigma(1-\sigma)}_{\sigma'}\,dw = 4\beta\left(\frac{1}{4} - \frac{1}{5}\right) = \frac{\beta}{5} = \frac{\beta^2}{5\beta} = \frac{1}{6c}\,,$$

where we used $\beta^2 = 1/6$ and $c = 5\beta$, see (4.12). Note that only the lower bound
$z \to -\infty$ contributes to the above integral.
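The same check can be done on a grid (a numerical sketch; the integration domain and resolution are ad hoc choices):

```python
# Sketch: check the sum rule  ∫ (u*')² dz = 1/(6c)  for the exact solution (4.10).
import numpy as np

beta = 1.0 / np.sqrt(6.0)
c = 5.0 * beta                                  # = 5/sqrt(6), see (4.12)
z = np.linspace(-60.0, 60.0, 200001)
sigma = 1.0 / (1.0 + np.exp(beta * z))
du = -2.0 * beta * sigma**2 * (1.0 - sigma)     # u*'(z)
integral = np.sum(du**2) * (z[1] - z[0])        # simple Riemann sum
```

The value agrees with $1/(6c) = \sqrt{6}/30 \approx 0.0816$ to high accuracy, since the integrand decays exponentially well inside the chosen domain.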

4.1.3 Self-Stabilization of Travelling Wavefronts

The Fisher equation $\dot\rho = \rho(1-\rho) + \rho''$ supports travelling wavefront solutions
$\rho(x, t) = u(x - ct)$ for any velocity $c \ge 2$. The observed speed of a wavefront
may either depend on the starting configuration $\rho(x, 0)$, or may self-organize to a
universal value. We use perturbation theory in order to obtain a heuristic insight
into this issue.

Solution of the Linearized Fisher Equation The solution of the linearized
Fisher equation,

$$\dot\rho = r\rho + \rho'', \qquad r = 1\,, \qquad (4.18)$$

can be constructed for arbitrary initial conditions $p_0(x) = \rho(x, 0)$, using

$$\rho_0(x, t) = \int dy\,\Phi(x - y, t)\,p_0(y), \qquad \Phi(z, t) = \frac{e^{-z^2/(4t)}}{\sqrt{4\pi t}}\,, \qquad (4.19)$$

the Green's function $\Phi(z, t)$ for diffusion systems,3 which obeys $\lim_{t\to 0}\Phi(z, t) = \delta(z)$. Note that $\Phi(z, t)$ is the generic solution of the diffusion equation, viz of (4.18)
with $r \to 0$. One can easily verify that

$$\rho(x, t) = e^{rt}\rho_0(x, t), \qquad \rho(x, 0) = p_0(x) \qquad (4.20)$$

is the solution of the linearized Fisher equation (4.18), propagating the initial
distribution $p_0(x)$ to finite times. Equation (4.20) describes exponentially growing
diffusive behavior.

3 Diffusion Green’s functions are introduced in Sect. 3.3 of Chap. 3.



Velocity Stabilization and Self Organization Using (4.19) for $\rho_0(x, t)$ in (4.20)
leads to terms of the type

$$e^t\,\Phi(z, t) \propto e^{t - z^2/(4t)} = e^{-(z^2 - 4t^2)/(4t)} = e^{-(z-2t)(z+2t)/(4t)} \qquad (4.21)$$

in the kernel, viz inside the integral, with $z = x - y$. Equation (4.21) describes
left- and right propagating fronts, travelling with propagation speeds $c = \pm 2$. The
envelopes of the respective wavefronts are time dependent and do not show the
simple exponential tail (4.13), as exponential falloffs are observed only for solutions
$\rho(x, t) = u(x - ct)$ characterized by a single velocity $c$.

– Ballistic Transport
The expression (4.21) shows that the interplay between diffusion and exponential
growth, the reaction of the system, leads to ballistic transport.
– Velocity Selection
The perturbative result (4.21) indicates that the minimal velocity $|c| = 2$ is
achieved for the wavefront when starting from an arbitrary localized initial state
$p_0(x)$, since $\lim_{t\to 0} e^t\Phi(x, t) = \delta(x)$.
– Self Organization
Propagating wavefronts with any $c \ge 2$ are stable solutions of the Fisher
equation, but the system settles to $c = 2$ for localized initial conditions. The
stabilization of a non-trivial dynamical property for a wide range of starting
conditions is an example of a self-organizing process.

The above considerations concern the speed $c$ of the travelling wavefront, but do not
make any direct statement regarding the lineshape.
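The velocity selection can be observed in a direct simulation of the Fisher equation, starting from a localized initial state and measuring the front speed (a minimal explicit-Euler sketch; grid, time step, and the $\rho = 1/2$ front criterion are ad hoc choices):

```python
# Sketch: the front emerging from a localized initial condition of the
# Fisher equation self-organizes to the minimal velocity c = 2.
import numpy as np

L, dx = 200.0, 0.1
x = np.arange(0.0, L, dx)
rho = np.where(x < 5.0, 1.0, 0.0)         # localized initial population
dt = 0.2 * dx**2                          # well below the stability limit dx²/2

def step(rho):
    lap = (np.roll(rho, 1) - 2.0 * rho + np.roll(rho, -1)) / dx**2
    lap[0] = lap[-1] = 0.0                # crude boundary treatment
    return rho + dt * (rho * (1.0 - rho) + lap)

front = lambda rho: x[np.argmax(rho < 0.5)]   # position where rho crosses 1/2

n = int(30.0 / dt)                        # measure the speed over Delta t = 30
for _ in range(n):
    rho = step(rho)
x1 = front(rho)
for _ in range(n):
    rho = step(rho)
speed = (front(rho) - x1) / 30.0
```

The measured speed settles close to $c = 2$ (slightly below, due to the slow logarithmic transient of the front position), independently of the details of the localized seed.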

Stability Analysis of the Wavefront In order to examine the stability of the shape
of the wavefront we consider

$$\rho(x, t) = u(z) + \epsilon\,\psi(z)\,e^{-cz/2}\,e^{-\lambda t}, \qquad z = x - ct\,, \qquad (4.22)$$

where the second term with $\epsilon \ll 1$ is a perturbation to the solution $u(z) = u(x - ct)$
of the travelling wave Eq. (4.6). The particular form of the above ansatz is motivated by
the final expression, which is yet to be derived.
With the help of the derivatives,

$$\dot\rho = -cu' + \epsilon\left[(c^2/2 - \lambda)\psi - c\psi'\right]e^{-cz/2}e^{-\lambda t}\,,$$
$$\rho' = u' + \epsilon\left[-c\psi/2 + \psi'\right]e^{-cz/2}e^{-\lambda t}\,,$$
$$\rho'' = u'' + \epsilon\left[c^2\psi/4 - c\psi' + \psi''\right]e^{-cz/2}e^{-\lambda t}\,,$$


we obtain

$$\dot\rho - \rho'' = -\left(cu' + u''\right) + \epsilon\left[(c^2/4 - \lambda)\psi - \psi''\right]e^{-cz/2}e^{-\lambda t}\,,$$
$$\rho(1-\rho) \approx u(1-u) + \epsilon\left(1 - 2u\right)\psi\,e^{-cz/2}e^{-\lambda t}\,.$$

Wavefront Self-Stabilization With the above results the Fisher equation $\dot\rho - \rho'' = \rho(1-\rho)$ reduces, with

$$\left[-\frac{d^2}{dz^2} + V(z)\right]\psi(z) = \lambda\psi(z), \qquad V(z) = 2u(z) + \frac{c^2}{4} - 1\,, \qquad (4.23)$$

to a differential equation for $\psi(z)$, which holds to order $O(\epsilon)$. This expression
corresponds to a time-independent one-dimensional Schrödinger equation4 for a
particle with a mass $m = \hbar^2/2$ moving in a potential $V(z)$ that is non-negative
since $u(z) \in [0, 1]$,

$$V(z) \ge 0, \qquad \text{for} \quad c \ge 2\,.$$

The eigenvalues $\lambda$ are hence also positive. The perturbation term in (4.22) consequently decays with $t \to \infty$, which implies that the wavefront self-stabilizes.
This result would be trivial if it were known a priori that the wavefront equation
$u'' + cu' + u(1-u) = 0$ has a unique solution $u(z)$. In this case, all states of the form
of (4.22) would need to contract to $u(z)$. The stability condition (4.23) indicates that
the travelling-wavefront solution to the Fisher equation is indeed unique.
Finally, we remark that the procedure followed here, deriving a differential
equation for a perturbation containing a yet unknown wavefunction, here $\psi(z)$, is
analogous to secular perturbation theory, which we got to know in the context of the
Van der Pol oscillator.5
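The positivity of the spectrum of (4.23) can be checked by discretizing the operator $-d^2/dz^2 + V(z)$, using the exact wavefront (4.10) for $u(z)$ (a finite-difference sketch; box size, grid, and Dirichlet boundaries are ad hoc choices):

```python
# Sketch: smallest eigenvalue of -d²/dz² + V(z), with V = 2u + c²/4 - 1,
# for the exact wavefront u = sigma²; all eigenvalues come out positive.
import numpy as np

beta = 1.0 / np.sqrt(6.0)
c = 5.0 * beta
z = np.linspace(-30.0, 30.0, 1200)
dz = z[1] - z[0]
u = (1.0 / (1.0 + np.exp(beta * z)))**2          # exact wavefront (4.10)
V = 2.0 * u + c**2 / 4.0 - 1.0                   # potential of eq. (4.23)

H = (np.diag(2.0 / dz**2 + V)                    # finite-difference operator
     + np.diag(-np.ones(len(z) - 1) / dz**2, 1)
     + np.diag(-np.ones(len(z) - 1) / dz**2, -1))
lam_min = np.linalg.eigvalsh(H)[0]
```

Since the kinetic part is positive semidefinite, the smallest eigenvalue is bounded from below by $\min_z V(z) = c^2/4 - 1 = 1/24$ here, consistent with the decay of the perturbation in (4.22).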

4.2 Interplay Between Activation and Inhibition

In chemical reaction systems one reagent may activate or inhibit the production of
the other components, leading to non-trivial chemical reaction dynamics. Chemical
reagents typically also diffuse spatially and the interplay between the diffusion
process and the chemical reaction dynamics may give rise to the development to
spatially structured patterns.

4 Recall the expression $i\hbar\,\partial\psi/\partial t = -\frac{\hbar^2}{2m}\Delta\psi + V\psi$ for the time-dependent one-dimensional
Schrödinger equation. Equation (4.23) is recovered for an exponential time dependency $\sim \exp(-i\lambda t/\hbar)$ of the wavefunction $\psi$.
5 For the Van der Pol oscillator, secular perturbation theory is developed in Sect. 3.2 of Chap. 3.

4.2.1 Turing Instability

The reaction-diffusion system (4.2) contains two additive terms, the reaction and
the diffusion term. Diffusion alone leads generically to a homogeneous steady
state.
For the reaction terms considered in Sect. 4.1 the reference state $\rho = 0$ was
unstable against perturbations. We will now consider reaction terms for which the
homogeneous reference state is, in contrast, stable. Naively one would expect a further
stabilization of the reference state, but this is not necessarily the case.

TURING INSTABILITY The interaction of two processes, which separately would stabilize
a given homogeneous reference state, can lead to an instability.

The Turing instability is thought to be the driving force behind the formation of
spatio-temporal patterns observed in many physical and biological settings, such as
the stripes of a zebra.

Turing Instability of Two Stable Foci As a first example we consider a linear two-dimensional dynamical system composed of two subsystems, which are per se stable
foci6

$$\dot{\mathbf{x}} = A\mathbf{x}, \qquad A_1 = \begin{pmatrix} -\epsilon_1 & 1 \\ -a & -\epsilon_1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} -\epsilon_2 & -a \\ 1 & -\epsilon_2 \end{pmatrix}\,. \qquad (4.24)$$

Here $0 < a < 1$ and $\epsilon_\alpha > 0$, for $\alpha = 1, 2$. The eigenvalues for the matrices $A_\alpha$ and
$A = A_1 + A_2$ are

$$\lambda_\pm(A_\alpha) = -\epsilon_\alpha \pm i\sqrt{a}, \qquad \lambda_\pm(A) = -(\epsilon_1 + \epsilon_2) \pm (1 - a)\,.$$

The origin $\mathbf{x} = 0$ of the system $\dot{\mathbf{x}} = (A_1 + A_2)\mathbf{x}$ is consequently a saddle if
$1 - a > \epsilon_1 + \epsilon_2$, an instance of a Turing instability: Superposing two stable vector
fields may generate unstable directions.
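The statement is verified with a few lines of linear algebra (a sketch; the parameter values are illustrative choices satisfying $1 - a > \epsilon_1 + \epsilon_2$):

```python
# Sketch: two stable foci (all eigenvalue real parts negative) whose
# superposition A1 + A2 has an unstable direction, eq. (4.24).
import numpy as np

eps1, eps2, a = 0.1, 0.1, 0.5        # 1 - a = 0.5 > eps1 + eps2 = 0.2
A1 = np.array([[-eps1, 1.0], [-a, -eps1]])
A2 = np.array([[-eps2, -a], [1.0, -eps2]])

l1 = np.linalg.eigvals(A1)           # -eps1 ± i sqrt(a): stable focus
l2 = np.linalg.eigvals(A2)           # -eps2 ± i sqrt(a): stable focus
lA = np.linalg.eigvals(A1 + A2)      # -(eps1 + eps2) ± (1 - a): saddle
```

For these values $\lambda_+(A) = -0.2 + 0.5 = 0.3 > 0$, although both subsystems are individually stable.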

Eigenvalues of Two-Dimensional Matrices We remind ourselves that the eigenvalues $\lambda_\pm$ of a $2\times 2$ matrix $A$ can be written in terms of the trace $a + b$ and of the
determinant $\Delta = ab - cd$,

$$A = \begin{pmatrix} a & d \\ c & b \end{pmatrix}, \qquad \lambda_\pm = \frac{a+b}{2} \pm \frac{1}{2}\sqrt{(a+b)^2 - 4\Delta}\,. \qquad (4.25)$$

For negative determinants $\Delta < 0$ the system is a saddle, having both an attracting
and a repelling eigenvalue.

6 As a reminder, note that a node/focus has real/complex Lyapunov exponents, as defined in Sect. 2.2
of Chap. 2.

Superposing a Stable Node and a Stable Focus We start by investigating what
may happen when superimposing a stable focus and a stable node, with the focus
being defined as

$$A_1 = \begin{pmatrix} -\epsilon_a & 1 \\ -1 & \epsilon_b \end{pmatrix}, \qquad \Delta_1 = 1 - \epsilon_a\epsilon_b > 0\,, \qquad (4.26)$$

with

$$(\epsilon_b - \epsilon_a)^2 < 4\Delta_1, \qquad \epsilon_a > \epsilon_b > 0\,.$$

The system $\dot{\mathbf{x}} = A_1\mathbf{x}$ has a stable focus at $\mathbf{x} = 0$ when the trace $\epsilon_b - \epsilon_a < 0$ is
negative, compare (4.25), viz when it has two complex conjugate eigenvalues with
negative real parts. Possible values are, e.g. $\epsilon_a = 1/2$ and $\epsilon_b = 1/4$.
Next we add a stable node

$$A_2 = \begin{pmatrix} -a & 0 \\ 0 & -b \end{pmatrix}, \qquad A = \begin{pmatrix} -\epsilon_a - a & 1 \\ -1 & \epsilon_b - b \end{pmatrix}\,. \qquad (4.27)$$

Can we select $a, b > 0$ such that $A = A_1 + A_2$ becomes a saddle? In this case, the
determinant $\Delta_{12}$ of $A$,

$$\Delta_{12} = 1 - (\epsilon_a + a)(\epsilon_b - b)\,,$$

should become negative, compare (4.25). This is clearly possible for $b < \epsilon_b$
and a large enough $a$.

Turing Instability and Activator-Inhibitor Systems The interplay of one activator and one inhibitor often results in a stable focus, as described by (4.26). Diffusion
processes correspond, on the other hand, to stable nodes, such as $A_2$ in (4.27). The
Turing instability possible when superimposing a stable node and a stable focus is
hence of central relevance for chemical reaction-diffusion systems.

4.2.2 Pattern Formation

We now consider a generic reaction-diffusion system of type (4.2),

$$\dot\rho = f(\rho, \sigma) + D_\rho\Delta\rho\,, \qquad \dot\sigma = g(\rho, \sigma) + D_\sigma\Delta\sigma\,, \qquad (4.28)$$

containing two interacting components, $\rho = \rho(\mathbf{x}, t)$ and $\sigma = \sigma(\mathbf{x}, t)$, characterized
by respective diffusion constants $D_\rho, D_\sigma > 0$. We assume that the reaction system
$R = (f, g)$ has a stable homogeneous solution with steady-state densities $\rho_0$ and $\sigma_0$

respectively, together with the stability matrix

$$A_1 = \begin{pmatrix} f_\rho & f_\sigma \\ g_\rho & g_\sigma \end{pmatrix}, \qquad g_\rho f_\sigma < 0, \qquad f_\rho + g_\sigma < 0\,, \qquad (4.29)$$

which we assume to model an activator-inhibitor system characterized by $g_\rho f_\sigma < 0$.
Here $f_\rho = \partial f/\partial\rho$, etc., denote the relevant partial derivatives. We may always
rescale the fields $\rho$ and $\sigma$ such that $|g_\rho| = 1 = |f_\sigma|$, and (4.29) is then identical with
the previously used representation, namely (4.26).

Fourier Expansion of Perturbations The reaction term $R = (f, g)$ is independent
of the spatial coordinate $\mathbf{x}$. We may hence expand the perturbation of the fields
around the equilibrium state in terms of a spatial Fourier analysis, with

$$\rho(\mathbf{x}, t) = \rho_0 + e^{-i\mathbf{k}\cdot\mathbf{x}}\,\delta\rho(t)\,, \qquad \sigma(\mathbf{x}, t) = \sigma_0 + e^{-i\mathbf{k}\cdot\mathbf{x}}\,\delta\sigma(t) \qquad (4.30)$$

describing the deviation from the equilibrium state $(\rho_0, \sigma_0)$ by a single Fourier
component. We obtain

$$\begin{pmatrix} \delta\dot\rho \\ \delta\dot\sigma \end{pmatrix} = A\begin{pmatrix} \delta\rho \\ \delta\sigma \end{pmatrix}, \qquad A_2 = \begin{pmatrix} -D_\rho k^2 & 0 \\ 0 & -D_\sigma k^2 \end{pmatrix}\,, \qquad (4.31)$$

where the overall stability matrix $A = A_1 + A_2$ is given by the linear superposition
of a stable focus $A_1$ and a stable node $A_2$.

Spatio-Temporal Turing Instability For concreteness we assume

$$f_\rho < 0, \qquad g_\sigma > 0, \qquad 0 > g_\sigma f_\rho = -|g_\sigma f_\rho|\,,$$

in analogy with (4.26). The determinant $\Delta$ of $A$ is then

$$\Delta = |f_\sigma g_\rho| - \left(|f_\rho| + D_\rho k^2\right)\left(|g_\sigma| - D_\sigma k^2\right)\,. \qquad (4.32)$$

A Turing instability occurs when the determinant $\Delta$ becomes negative, which
is possible if the respective diffusion constants $D_\rho$ and $D_\sigma$ are different in
magnitude.

– When the determinant (4.32) is reduced, the homogeneous fixpoint solution
changes first from a stable focus to a stable node, as evident from expression (4.25). The instability at $\Delta = 0$ is hence a transition from a stable node
to a saddle, as illustrated in Fig. 4.4.
– The contribution $A_2$ of the diffusion induces the transition and can hence be
regarded as a bifurcation parameter.
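For a concrete activator-inhibitor matrix, e.g. (4.26) with $\epsilon_a = 1/2$ and $\epsilon_b = 1/4$, the dispersion (4.32) can be evaluated over a range of wavevectors (a sketch; the diffusion constants are illustrative choices with $d = D_\sigma/D_\rho$ small):

```python
# Sketch: determinant of A1 + A2 as a function of k, eq. (4.32); a band of
# wavevectors with Delta < 0 (Turing unstable) opens for small D_sigma/D_rho.
import numpy as np

fr, fs, gr, gs = -0.5, 1.0, -1.0, 0.25      # stability matrix (4.26)
Dr, Ds = 1.0, 0.01                           # d = Ds/Dr = 0.01

k = np.linspace(0.0, 10.0, 2001)
a11 = fr - Dr * k**2                         # diagonal entries of A = A1 + A2
a22 = gs - Ds * k**2
Delta = a11 * a22 - fs * gr                  # determinant, cf. eq. (4.32)
trace = a11 + a22                            # stays negative for all k
band = k[Delta < 0.0]                        # Turing-unstable wavevectors
```

The determinant is positive at $k = 0$ and for large $k$, but dips below zero for a finite band of intermediate wavevectors, exactly the scenario sketched in Fig. 4.4; since the trace remains negative throughout, the instability proceeds via $\Delta < 0$.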

Fig. 4.4 The determinant $\Delta$ of the Turing bifurcation matrix, see (4.32), for a range $d = D_\sigma/D_\rho$ of ratios of the two diffusion constants. For $d \to 1$ the determinant remains positive, becoming negative for a finite interval of wavevectors $k$ when $d$ becomes small enough. With decreasing size of the determinant the fixpoint changes from a stable focus (two complex conjugate eigenvalues with negative real components) to a stable node (two negative real eigenvalues) and to a saddle (a real positive and a real negative eigenvalue), compare (4.25)

Diffusion processes may act as bifurcation parameters also within other bifurcation
scenarios, like a Hopf bifurcation, as we will discuss in more detail in Sect. 4.2.3.

Mode Competition For a real-world chemical reaction system the parameters
are fixed. A range of Fourier modes with negative determinants (4.32), and
corresponding positive Lyapunov exponents (4.25), will start to grow and compete
with each other. The shape of the steady-state pattern, if any, will be determined in
the end by the non-linearities of the reaction term.

4.2.3 Gray–Scott Reaction Diffusion System

Several known examples of reaction-diffusion models showing instabilities towards
the formation of complex spatio-temporal patterns are characterized by contributions $\pm\rho\sigma^2$ to the reaction term,

$$\dot\rho = -\rho\sigma^2 + F(1-\rho) + D_\rho\Delta\rho\,, \qquad \dot\sigma = \rho\sigma^2 - (F+k)\sigma + D_\sigma\Delta\sigma\,, \qquad (4.33)$$

corresponding to a chemical reaction needing two $\sigma$-molecules and one $\rho$-molecule.
The remaining contributions to the reaction term in (4.33) serve for overall mass
balance, which can be implemented in various ways. The replenishment rate for
the reactant $\rho < 1$ is determined in (4.33) by $F$, with $\sigma$ particles being lost to
the environment at a rate $K = F + k$. A slightly different choice for the mass
conservation terms would lead to the “Brusselator model”; the parametrization used
in (4.33) corresponds to the “Gray–Scott system”.

Saddle-Node Bifurcation The reaction term of the Gray–Scott system (4.33)
supports three fixpoints $\mathbf{p}_i^* = (\rho_i^*, \sigma_i^*)$, $i = 0, 1, 2$. The first one is $\mathbf{p}_0^* = (1, 0)$,
with the other two being determined by

$$\rho_i^*\sigma_i^* = K, \qquad F\rho_i^* + K\sigma_i^* = F, \qquad i = 1, 2\,, \qquad (4.34)$$

where $K = F + k$. The resulting steady-state densities are

$$\rho_{1,2}^* = \frac{1 \pm a}{2}, \qquad \sigma_{1,2}^* = \frac{1 \mp a}{2K}F, \qquad a^2 = 1 - \frac{4K^2}{F}\,.$$

The trivial fixpoint $\mathbf{p}_0^*$ has a diagonal Jacobian with eigenvalues $-F$ and $-K$
respectively. It is always stable, even in the presence of diffusion.

Critical Loss of Particles The two non-trivial fixpoints $\mathbf{p}_{1,2}^*$ merge when $a = 0$,
which happens at

$$F = 4K^2 = 4(k + F)^2, \qquad k_c = \sqrt{F_c}/2 - F_c\,. \qquad (4.35)$$

We recall that two fixpoints merge via a saddle-node bifurcation,7 which implies
that the two non-trivial fixpoints $\mathbf{p}_{1,2}^*$ cease to exist when the loss rate $K = F + k$
of $\sigma$-particles becomes too large. Beyond, $\mathbf{p}_0^* = (1, 0)$ remains as a global attractor.
The critical line $(k_c, F_c)$ is shown in Fig. 4.5, where the phase diagram of the Gray–Scott system is presented.
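The fixpoint structure is easily evaluated numerically (a sketch; $F = 0.01$ matches the flow plots of Fig. 4.6):

```python
# Sketch: non-trivial Gray-Scott fixpoints, eq. (4.34), and the critical
# loss rate k_c = sqrt(F)/2 - F of eq. (4.35).
import numpy as np

def fixpoints(k, F):
    K = F + k
    disc = 1.0 - 4.0 * K**2 / F            # a² of the steady-state densities
    if disc < 0.0:                         # beyond the saddle-node line
        return []
    a = np.sqrt(disc)
    return [((1.0 + s * a) / 2.0, (1.0 - s * a) * F / (2.0 * K))
            for s in (+1.0, -1.0)]

F = 0.01
kc = np.sqrt(F) / 2.0 - F                  # = 0.04, as quoted for Fig. 4.6
inside = fixpoints(0.035, F)               # two fixpoints for k < kc
outside = fixpoints(0.045, F)              # none for k > kc
```

Both fixpoints returned inside the saddle-node line satisfy the conditions (4.34), $\rho^*\sigma^* = K$ and $F\rho^* + K\sigma^* = F$, while for $k > k_c$ only the trivial attractor $\mathbf{p}_0^* = (1, 0)$ remains.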

Saddle Fixpoint $\mathbf{p}_1^*$ The Jacobians and determinants of the non-trivial restpoints
$\mathbf{p}_{1,2}^*$ are

$$\begin{pmatrix} -\left(F + (\sigma_{1,2}^*)^2\right) & -2K \\ (\sigma_{1,2}^*)^2 & K \end{pmatrix}, \qquad \Delta_{1,2} = K\left((\sigma_{1,2}^*)^2 - F\right) = KF\left(\frac{(\sigma_{1,2}^*)^2}{F} - 1\right)\,. \qquad (4.36)$$

Inserting the steady-state densities (4.34) leads to

$$\frac{\Delta_{1,2}}{KF} = \left(\frac{1 \mp \sqrt{1 - 4\gamma^2}}{2\gamma}\right)^2 - 1, \qquad \gamma = \frac{K}{\sqrt{F}} \in [0, 1/2]\,. \qquad (4.37)$$

The restpoint corresponding to the minus sign in the above expression, $\mathbf{p}_1^*$, is always
a saddle, given that $\Delta_1$ is negative for all $\gamma \in [0, 1/2]$, viz when $\mathbf{p}_1^*$ exists.
Compare (4.25).

7 Bifurcation theory is discussed in Sect. 2.2.2 of Chap. 2.



Fig. 4.5 Phase diagram for the reaction term of the Gray–Scott model (4.33). Inside the saddle-node bifurcation line (solid red line), see (4.35), there are three fixpoints $\mathbf{p}_i^*$ ($i = 0, 1, 2$); outside the saddle-node bifurcation line only the stable trivial restpoint $\mathbf{p}_0^*$ is present. $\mathbf{p}_1^*$ is a saddle and $\mathbf{p}_2^*$ a stable node (checkerboard brown area), a stable focus (light shaded green area) or an unstable focus (shaded brown area), the latter two regions being separated by the focus transition line (solid blue line), as defined by (4.39). Also included is the conjunction point $\mathbf{p}_c = (1, 1)/16$ (black filled circle) and the parameters used for the simulation presented in Fig. 4.7 (open diamonds)

Focus Transition for $\mathbf{p}_2^*$ Eigenvalues are complex when the discriminant $D = (\mathrm{trace})^2 - 4\Delta$ is negative, compare (4.25). For the Gray–Scott system one has

$$D = \left(k - (\sigma_{1,2}^*)^2\right)^2 - 4K(\sigma_{1,2}^*)^2 + 4KF\,,$$

where we used $k = K - F$. The discriminant is positive for small and large densities
$\sigma_{1,2}^*$, but negative in between. At the transition, $\mathbf{p}_2^*$ changes from a stable node to a
stable focus. The explicit analytic expression is not of relevance for the following;
the transition line is however shown in Fig. 4.5.

Stability of $\mathbf{p}_2^*$ The focus $\mathbf{p}_2^*$ changes stability when the trace of its Jacobian (4.36),

$$K_f - F_f - (\sigma_2^*)^2 = 0, \qquad \sigma_2^* = \sqrt{K_f - F_f} = \sqrt{k_f}\,, \qquad (4.38)$$

changes sign. The respective focus line $(k_f, F_f)$ has been included in Fig. 4.5.
Inserting this relation, $\sigma_2^* = \sqrt{k_f}$, into the fixpoint conditions (4.34),

$$\frac{K}{\sigma_2^*} = \rho_2^* = 1 - \frac{K}{F}\sigma_2^*, \qquad K\,\frac{F + (\sigma_2^*)^2}{F} = \sigma_2^*\,,$$


Fig. 4.6 Flow of the reaction term of the Gray–Scott model (4.33) in phase space $(\rho, \sigma)$, for $F = 0.01$. The stable fixpoint $\mathbf{p}_0^* = (1, 0)$ (black filled circle) is globally attracting. The focus transition (4.39) occurs at $k_f = 0.0325$ and the saddle-node transition (4.35) at $k_c = 0.04$. Left: $k_f < k = 0.035 < k_c$, with $\mathbf{p}_1^*$ and $\mathbf{p}_2^*$ (red circles) being a saddle and an unstable focus, respectively. Right: $k_c < k = 0.045$. The locus of the attractor relict (shaded circle) corresponds to the local minimum of $q = \mathbf{f}^2$, see (4.41), which evolves into the bifurcation point for $k \to k_c$

we obtain

$$(F_f + k_f)^2 = F_f\sqrt{k_f}\,, \qquad (4.39)$$

where we also used $K = F + k$. One can verify that the Lyapunov exponents are
complex along (4.39), becoming however real for lower values of $k$, as illustrated in
Fig. 4.5. The endpoint of (4.39), determined by $\partial k_f/\partial F_f = 0$, is

$$\mathbf{p}_c = (1/16, 1/16) = (0.0625, 0.0625) \qquad (4.40)$$

in the $(k, F)$ plane, which coincides with the turning point of the saddle-node
line (4.35).

Merging of an Unstable Focus and a Saddle In Fig. 4.6 we illustrate the flow of
the reaction term of the Gray–Scott model for two sets of parameters $(k, F)$. The
unstable focus $\mathbf{p}_2^*$ and the saddle $\mathbf{p}_1^*$ annihilate each other for $k \to k_c$. One observes
that the large-scale features of the flow are remarkably similar for $k < k_c$ and
$k > k_c$, as all trajectories, apart from the stable manifolds of the saddle, flow to the
global attractor $\mathbf{p}_0^* = (1, 0)$.

Attractor Relict Dynamics Close to the outside of the saddle-node line $(k_c, F_c)$
the dynamics slows down when approaching a local minimum of

$$q(\mathbf{x}) = \mathbf{f}^2(\mathbf{x}), \qquad \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x})\,, \qquad (4.41)$$

with $q$ being a measure for the velocity of the flow, which vanishes at a fixpoint.

Fig. 4.7 Dynamical patterns for the Gray–Scott model (4.33). Shown is $\rho(x, y, t)$; the diffusion constants are $D_\rho/2 = D_\sigma = 10^{-5}$. The simulation parameters are $(k, F) = (0.062, 0.03)$ (left) and $(k, F) = (0.06, 0.037)$ (right), as indicated in the phase diagram, Fig. 4.5 (Illustrations courtesy P. Bastani)

ATTRACTOR RELICT A local, non-zero minimum of .q(x), close to a bifurcation point,


and which turns into a fixpoint at the bifurcation, is denoted an “attractor relict” or a “slow
point”.

Beyond the transition, for .k > kc , the attractor relict still influences the flow
strongly, as indicated in Fig. 4.6, determining a region in phase space where the
direction of the flow turns sharply.

Dynamical Pattern Formation The Gray–Scott system shows a wide range of
complex spatio-temporal structures close to the saddle-node line, as illustrated in
Fig. 4.7, ranging from dots growing and dividing in a cell-like fashion to more
regular stripe-like patterns.
The generation of non-trivial patterns occurs even, though not exclusively, when
only the trivial fixpoint $(1, 0)$ exists, with

$$A_1 = \begin{pmatrix} -F & 0 \\ 0 & -(F + k) \end{pmatrix} \qquad (4.42)$$

being the respective Jacobian. There is no way to add a diffusion term $A_2$, compare
expression (4.31), such that $A_1 + A_2$ would have a positive eigenvalue. Pattern
formation in the Gray–Scott system is hence not due to a Turing instability.
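The pattern dynamics of Fig. 4.7 can be reproduced qualitatively with a few lines of code (a rough sketch, not the original simulation; grid size, time step, seeding, and periodic boundaries are ad hoc choices):

```python
# Sketch: explicit-Euler integration of the Gray-Scott system (4.33)
# on a periodic 64x64 grid, parameters as for the right panel of Fig. 4.7.
import numpy as np

n, dx, dt = 64, 1.0 / 64, 1.0
Drho, Dsig = 2e-5, 1e-5                   # Drho/2 = Dsig = 1e-5
k, F = 0.06, 0.037

rng = np.random.default_rng(0)
rho = np.ones((n, n))
sig = np.zeros((n, n))
c0 = n // 2                               # localized seed in the center
rho[c0-5:c0+5, c0-5:c0+5] = 0.5
sig[c0-5:c0+5, c0-5:c0+5] = 0.25 + 0.02 * rng.random((10, 10))

def lap(f):                               # periodic five-point Laplacian
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx**2

for _ in range(4000):
    r = rho * sig**2                      # the ±rho sigma² reaction term
    rho = rho + dt * (-r + F * (1.0 - rho) + Drho * lap(rho))
    sig = sig + dt * (r - (F + k) * sig + Dsig * lap(sig))
```

Plotting `rho` at successive times reveals the developing pattern; the fields remain bounded, with $\rho \in [0, 1]$ and $\sigma \ge 0$ throughout the run.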

4.3 Collective Phenomena and Swarm Intelligence

When a system is composed of many similar or identical constituent parts,8 their
mutual interaction may give rise to interesting phenomena. Several distinct concepts
have been developed in this context, each carrying its own specific connotation.

– Collective Phenomena
Particles like electrons obey relatively simple microscopic equations of motions,
like Schrödinger’s equation, interacting pairwise. Their mutual interactions may
lead to phase transitions and to emergent macroscopic collective properties,
like superconductivity or magnetism, not explicitly present in the underlying
microscopic description.
– Emergence
At times heavily loaded with philosophical connotations, “weak emergence” is
equivalent to collective behavior, such as occurring in physics. There is
also “strong emergence”, which denotes the emergence of higher-level properties
that cannot be traced back causally to microscopic laws. Strong emergence is
a popular plaything in philosophy; it transcends however the realm of scientific
investigation.
– Self Organization
When generic generative principles give rise to complex behavior for a wide
range of environmental and/or starting conditions, one speaks of self organization
in the context of complex systems theory. The resulting properties may be
interesting, biologically relevant or emergent.
– Swarm Intelligence
In biological settings, with the agents being individual animals, one speaks of
swarm intelligence whenever the resulting collective behavior is of behavioral
relevance. Given that evolutionarily optimized behavior is performance oriented,
one can use the principles underlying swarm intelligence as an algorithm to solve
certain computational problems.

Collective phenomena arise, loosely speaking, when “the sum is more than the
parts”, just as mass psychology transcends the psychology of the constituent
individuals.

4.3.1 Phase Transitions in Social Systems

Phase transitions occur in many physical systems,9 being of central importance also
in biology, climatology and sociology. A well known psychological phenomenon is,

8 A prototypical example are the boolean networks discussed in Chap. 7.
9 The generic framework for phase transitions, the Ginzburg-Landau theory, is developed in
Sect. 6.1 of Chap. 6.

in this context, the transition from normal crowd behavior to collective hysteria. As
a basic example we consider here the nervous rats problem.

Calm and Nervous Rats Consider $N$ rats constrained to live in an area $A$, with an
overall population density $\rho = N/A$. There are $N_c$ calm and $N_n$ nervous rats with
$N_c + N_n = N$, together with the respective densities $\rho_c = N_c/A$ and $\rho_n = N_n/A$.

Comfort Zone Each rat has a zone $a = \pi r^2$ around it, with $r$ being a characteristic
radius. A calm rat will get nervous if at least one nervous rat comes too close,
entering its comfort zone $a$, with a nervous rat calming down when having its
comfort zone all for itself.

Master Equation The time evolution for the density of nervous rats is given by

$$\dot\rho_n = P_{c\to n}\,\rho_c - P_{n\to c}\,\rho_n\,, \qquad (4.43)$$

with transition probabilities

$$P_{c\to n} = 1 - \left(1 - \frac{a}{A}\right)^{N_n}, \qquad P_{n\to c} = \left(1 - \frac{a}{A}\right)^{N-1}\,, \qquad (4.44)$$

since $1 - a/A$ is the probability for a rat to be out of a given comfort zone. A calm
rat becomes nervous when there is at least one other nervous rat in her comfort zone,
calming down when being all alone.

Thermodynamic Limit For constant comfort areas $a$ we take the thermodynamic
limit $A \to \infty$ of (4.44) with

$$\lim_{A\to\infty}\left(1 - \frac{a}{A}\right)^{\rho_n A} = e^{-\rho_n a}, \qquad \lim_{A\to\infty}\left(1 - \frac{a}{A}\right)^{\rho A} = e^{-\rho a}\,,$$

where we made use of $(N-1) \approx N$, $N_n = \rho_n A$ and $N = \rho A$.

Stationary Solution For the stationary solution $\dot\rho_n = 0$ of the master equation (4.43) we obtain

$$\left(1 - e^{-\rho_n a}\right)\rho_c = e^{-\rho a}\rho_n, \qquad \rho_c = \rho - \rho_n\,,$$

which has a trivial solution $\rho_n = 0$ for all population densities $\rho$ and comfort
zones $a$. Multiplying with $a$ yields

$$\sigma_n = e^{\sigma}\left(1 - e^{-\sigma_n}\right)(\sigma - \sigma_n), \qquad \sigma = \rho a, \qquad \sigma_n = \rho_n a\,, \qquad (4.45)$$

where the dimensionless densities $\sigma$ and $\sigma_n$ correspond to the average numbers of
rats within a comfort area $a$.

Fig. 4.8 Solution of the nervous rats problem (4.43). Shown is the left- and the right-hand side of the self-consistency condition (4.45), for various numbers of rats $\sigma$ in the comfort zone $a$ ($\sigma > \sigma^*$, $\sigma = \sigma^*$, $\sigma < \sigma^*$), as function of the average number of nervous rats $\sigma_n$ per $a$. The transition occurs at $\sigma^*$ (dashed line)

Critical Rat Density The graphical representation of (4.45) is given in Fig. 4.8. A
non-trivial solution $\sigma_n > 0$ is possible only above a certain critical number $\sigma^*$ of
rats per comfort zone. At $\sigma = \sigma^*$ the right-hand side of (4.45) has unitary slope
with respect to $\sigma_n$, for $\sigma_n \to 0$,

$$1 = e^{\sigma^*}\sigma^*, \qquad \sigma^* \approx 0.56713\,, \qquad (4.46)$$

with $\sigma^*$ corresponding to roughly half a rat per comfort zone. A finite fraction
of nervous rats is present roughly whenever a rat has less than two comfort
areas for itself, whereas all rats eventually calm down whenever the average
population density $\rho$ is smaller than $\rho^* = \sigma^*/a$. Note that one can rewrite (4.46)
approximately as

$$\sigma^* = e^{-\sigma^*} \approx 1 - \sigma^*, \qquad \sigma^* \approx 1/2\,,$$

which yields a first-order estimate for $\sigma^*$.
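Condition (4.46) defines $\sigma^*$ implicitly; it can be solved, e.g., by iterating $\sigma \leftarrow e^{-\sigma}$, starting from the first-order estimate $\sigma^* \approx 1/2$ (a minimal sketch):

```python
# Sketch: solve sigma* e^{sigma*} = 1, eq. (4.46), by fixed-point iteration
# of sigma <- exp(-sigma); the map is a contraction near the solution.
import numpy as np

s = 0.5                       # first-order estimate from the text
for _ in range(200):
    s = np.exp(-s)
sigma_star = float(s)
```

The iteration converges to $\sigma^* \approx 0.56713$, the value quoted in (4.46) (this number is also known as the omega constant, $W(1)$ of the Lambert $W$ function).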

4.3.2 Collective Decision Making and Stigmergy

Herding animals and social insects are faced at times with the problem of taking
decisions collectively. When selecting a new nest for swarming honey bees or a
good foraging site for ants, no single animal will compare two prospective target
sites. Individual opinions regarding the quality of prospect sites are instead pooled
together into groups of alternative decisions with a competitive dynamical process
reducing the number of competing opinions until a single target site remains.

Swarming Bees Bees communicate locally by dancing, both when advertising
prospective foraging sites in the nest, and when a new queen leaves
the old nest together with typically about 10,000 workers, searching for a suitable
location for building a new nest.
The swarm stays huddled together, with a small number of typically 5% of the bees
scouting in the meantime for prospective new nest sites. Scouts coming back to

Fig. 4.9 Illustration of


binary decision making for
foraging ants starting from
the nest (to the left). The
colony needs to decide which
of the two food sources
(upper/lower paths) is more
profitable

the waiting swarm will advertise new locations they found by dancing, with the
duration of the dance being proportional to the estimated quality of the prospective
nest location.
New scouts ready to fly will observe the dancing returning scouts. The probability that departing scouts will target advertised sites is consequently proportional
to their quality. This mechanism leads to a competitive process, with lower quality
prospective nest sites receiving fewer consecutive visits by scouting bees.
Opinion pooling is inherent in this process as there are many scout bees flying
out for site inspection. The whole consensus process is an extended affair, taking up
to a few days.

Foraging Ants Social animals gain in survival efficiency by making collective
decisions when it comes to key tasks. A celebrated example are ant colonies, which
profit from exploiting the richest food sources available. The corresponding decision
problem is illustrated schematically in Fig. 4.9.
The quality of a food resource is given by its distance to the nest and its utility
for the species. When an ant returns to the nest it lays down a path of pheromones,
with the amount of pheromone released being proportional to the quality of the food
site discovered.
Ants leaving the nest in search for food will tend to follow the pheromone
gradient and thus arrive predominantly at high quality food sites. Returning, they
will reinforce the existing pheromone path with an intensity appropriate to their
own assessment of the food site. With a large number of ants going back and forth,
this process leads to an accurate weighting of foraging sites, with the best site
eventually winning over all other prospective food resources.

Information Encoding and Stigmergy Bees and ants need an observable scalar
quantity in order to exchange information about the quality of prospective sites.
This quantity is time for the case of bees and pheromone intensity for the ants.

STIGMERGY When information is off-loaded to the environment for the purpose of


communication one speaks of “stigmergy”.

By producing pheromone traces, ants manipulate the environment for the purpose of
information exchange.

Ant and Pheromone Dynamics For the binary decision process of Fig. 4.9 we
denote with $\rho_{1,2}$ the densities of travelling ants along the two routes, and with $\phi_{1,2}$
and $Q_{1,2}$ the respective pheromone densities and site qualities. The process

$$T\dot\rho_1 = (\rho - \rho_1)\phi_1 - \Psi\,, \qquad \dot\phi_1 = -\Gamma\phi_1 + Q_1\rho_1\,,$$
$$T\dot\rho_2 = (\rho - \rho_2)\phi_2 - \Psi\,, \qquad \dot\phi_2 = -\Gamma\phi_2 + Q_2\rho_2 \qquad (4.47)$$

models the dynamics of $\rho = \rho_1 + \rho_2$ ants foraging, with $T$ setting the real-world
time needed for the foraging and $\Gamma$ being the pheromone decay constant. The update
rates $\dot\rho_i$ for the individual numbers of ants in (4.47) are relative to an average update
rate $\Psi/T$.

Ant Number Conservation The updating rules .ρ̇1,2 need to obey conservation of
the overall number .ρ = ρ1 + ρ2 of ants. This can be achieved by selecting the flux
.Ψ in (4.47) appropriately as

    2\Psi = \rho(\phi_1+\phi_2) - \rho_1\phi_1 - \rho_2\phi_2 = \rho_2\phi_1 + \rho_1\phi_2, \qquad
    2T\dot\rho_1 = \rho_2\phi_1 - \rho_1\phi_2, \quad
    2T\dot\rho_2 = \rho_1\phi_2 - \rho_2\phi_1,        (4.48)

which results from demanding .ρ̇1 + ρ̇2 to vanish.

Pheromone Conservation Considering with .Φ = Q2 ϕ1 + Q1 ϕ2 a weighted


pheromone concentration, we find

    \dot\Phi = -\Gamma\Phi + Q_1 Q_2(\rho_1+\rho_2), \qquad \Phi \to \rho\,Q_1 Q_2/\Gamma,

when using (4.47). Any initial weighted pheromone concentration .Φ will hence
relax fast to .ρQ1 Q2 /Γ . It is hence enough to consider the time evolution in the
subspace spanned by .ϕ2 = (Φ − Q2 ϕ1 )/Q1 ,

    2T\dot\rho_1 = (\rho-\rho_1)\phi_1 - \rho_1(\Phi - Q_2\phi_1)/Q_1, \qquad
    \dot\phi_1 = -\Gamma\phi_1 + Q_1\rho_1,        (4.49)

which leaves us with two dynamical variables, .ρ1 and .ϕ1 .

Binary Decision Making There are two fixpoints of (4.49), given by

    \rho_1 = \rho, \quad \rho_2 = 0, \qquad \phi_1 = Q_1\rho/\Gamma, \quad \phi_2 = 0,

152 4 Self Organization

and vice versa, with 1 \leftrightarrow 2. The Jacobian of (4.49) for the fixpoint \rho_1 = 0 = \phi_1 is

    \begin{pmatrix} -\Phi/Q_1 & \rho \\ Q_1 & -\Gamma \end{pmatrix}, \qquad
    \Delta = \Phi\Gamma/Q_1 - Q_1\rho = (Q_2 - Q_1)\rho,

when setting .2T = 1 for simplicity. The trace .−(Γ + Φ/Q1 ) of the Jacobian
is negative and the fixpoint is hence stable/unstable, compare (4.25), when the
determinant .Δ is positive/negative, hence when .Q2 > Q1 and .Q1 > Q2
respectively.
The dynamics (4.49) hence leads to a binary decision process, with all ants proceeding in the end along the path with the higher quality factor Q_j.

Ant Colony Optimization Algorithm One can generalize (4.48) to a network of


connections with the quality factors .Qi being proportional to the inverse travelling
times. This setup, the “ant colony optimization algorithm”, has a close connection
to the travelling salesman problem. It can be used to find shortest paths on networks.
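The binary decision dynamics (4.47)/(4.48) can be illustrated with a short numerical sketch. All parameter values and initial conditions below (T, Γ, the qualities Q_{1,2}, the starting densities) are illustrative assumptions, not taken from the text:

```python
# Euler integration of the pheromone-mediated binary decision process,
# eqs. (4.47)/(4.48); all parameter values are illustrative assumptions.
T, Gamma = 1.0, 0.5        # foraging timescale, pheromone decay constant
Q1, Q2 = 1.0, 2.0          # site qualities, Q2 > Q1
r1, r2 = 0.5, 0.5          # ants on routes 1/2, r1 + r2 = rho = 1
p1, p2 = 0.1, 0.1          # pheromone densities phi_1, phi_2
dt = 0.02
for _ in range(100_000):
    dr1 = (r2 * p1 - r1 * p2) / (2 * T)   # eq. (4.48), conserves r1 + r2
    dp1 = -Gamma * p1 + Q1 * r1           # eq. (4.47)
    dp2 = -Gamma * p2 + Q2 * r2
    r1, r2 = r1 + dt * dr1, r2 - dt * dr1
    p1, p2 = p1 + dt * dp1, p2 + dt * dp2

print(round(r1, 3), round(r2, 3))  # the higher-quality route 2 wins
```

Starting from a fully symmetric configuration, the pheromone on the higher-quality route grows faster, the symmetry breaks, and the whole colony ends up on route 2, in accordance with the fixpoint analysis above.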

4.3.3 Collective Behavior and Swarms

In a large body of moving agents, like a flock of birds in the air, a school of fish in the ocean, or cars on a motorway, agents individually adapt their velocities according to the perceived positions and movements of other close-by agents. For modelling purposes one can consider the behavior of the individual agents, as we will do here. Alternatively, for a large enough number of agents, one may use a hydrodynamic description together with a corresponding master equation for the density of agents.

Individual Decision Making In a swarm of birds, the individual birds take their decisions independently; an explicit mechanism for collective decision making, e.g. regarding the overall direction the swarm should head to, is not present. The resulting group behavior may nevertheless have high biological relevance, such as predator avoidance.

Newton's Equations for Birds The motion of i = 1, \ldots, N birds with positions x_i and velocities v_i can be modelled in analogy to the dynamics of point masses. An example is

    \dot{\mathbf{x}}_i = \mathbf{v}_i,
    \qquad
    \dot{\mathbf{v}}_i = \gamma\left[(v_i^0)^2 - (v_i)^2\right]\mathbf{v}_i
    + \sum_{j\ne i} \mathbf{f}(\mathbf{x}_j, \mathbf{x}_i)
    + \sum_{j\ne i} \mathbf{g}(\mathbf{x}_j, \mathbf{x}_i\,|\,\mathbf{v}_j, \mathbf{v}_i),        (4.50)

Fig. 4.10 Examples of two typical Mexican-hat potentials V(x) (4.51), as generated by superposing two Gaussians or two exponentials. Also indicated are the respective positions of maximal positive slope (thin vertical lines)

with the first term in .v̇i modelling the preference for moving with a preferred
velocity .vi0 . The collective behavior is generated by the pairwise interaction terms
.f and .g. Reacting to observations needs time and time delays are inherently present

in .f and .g.

Distance Regulation Animals and humans alike have preferred distances. The regulation of the distance to neighbors, to the side, front, back, above and below, is the job of the inter-agent force f(x_j, x_i), which may be taken as the derivative of a "Mexican hat potential" V(z),

    \mathbf{f}(\mathbf{x}_j, \mathbf{x}_i) = \mathbf{f}(\mathbf{x}_j - \mathbf{x}_i)
    = -\nabla V(|\mathbf{x}_j - \mathbf{x}_i|), \qquad
    V(z) = A_1\,\kappa(z/R_1) - A_2\,\kappa(z/R_2),        (4.51)

where \kappa(z) is a monotonically decreasing function of its argument, such as an exponential \sim \exp(-z), a Gaussian \sim \exp(-z^2), or a power law \sim 1/z^\alpha. For a suitable selection of the amplitudes A_i and of the characteristic distances R_i one can achieve that the potential V(z) is repelling at short distances, while having at the same time a stable minimum, as shown in Fig. 4.10.

Alignment of Velocities The function .g(xj , xi |vj , vi ) in (4.50) expresses the


tendency to align one’s own velocity with the speed and the direction of the
movement of other nearby members of the flock. A suitable functional form
would be

    \mathbf{g}(\mathbf{x}_j, \mathbf{x}_i\,|\,\mathbf{v}_j, \mathbf{v}_i)
    = \left[\mathbf{v}_j - \mathbf{v}_i\right] A_v\,\kappa(|\mathbf{x}_j - \mathbf{x}_i|/R_v),        (4.52)

with the tendency to align vanishing both for identical velocities .vj and .vi and for
large inter-agent distances .|xj − xi | ⪢ Rv .
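The effect of the alignment term (4.52) acting alone can be sketched numerically. In the following minimal sketch the choice κ(z) = exp(−z), the values of N, A_v, R_v and dt, and all initial conditions are assumptions for illustration only:

```python
import numpy as np

# The alignment force (4.52) acting alone, with kappa(z) = exp(-z);
# N, A_v, R_v, dt and the initial conditions are illustrative choices.
rng = np.random.default_rng(0)
N, Av, Rv, dt = 20, 1.0, 1.0, 0.05
x = rng.uniform(0.0, 1.0, size=(N, 2))     # positions x_i
v = rng.normal(0.0, 1.0, size=(N, 2))      # velocities v_i

def spread(vel):
    # root-mean-square deviation from the mean velocity
    return np.linalg.norm(vel - vel.mean(axis=0))

spread0 = spread(v)
for _ in range(400):
    dist = np.linalg.norm(x[None, :, :] - x[:, None, :], axis=-1)
    w = Av * np.exp(-dist / Rv)            # kappa(|x_j - x_i|/R_v)
    np.fill_diagonal(w, 0.0)
    g = (w[:, :, None] * (v[None, :, :] - v[:, None, :])).sum(axis=1)
    v += dt * g                            # velocities align
    x += dt * v
print(spread0, spread(v))                  # velocity spread collapses
```

Since the pairwise weights are symmetric, the mean velocity of the flock is conserved while the spread of the individual velocities around it contracts, i.e. the flock settles into a common direction of motion.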

Hiking Humans Numerical simulations of equations like (4.50) describe nicely the observed flocking behavior of birds and schools of fish. For a specific application we ask when a group of humans hiking along a one-dimensional trail will break apart due to differences in the individual preferred hiking speeds v_i^0.

    \Leftarrow\; x_0, v_0 \qquad \Leftarrow\; x_1, v_1 \qquad \Leftarrow\; x_2, v_2

The distance alignment force f is asymmetric when walkers pay attention only to the person in front, but not to the one following behind. In the simplest setting we may furthermore assume that g vanishes, namely that the hikers are primarily concerned with walking close to their own preferred velocities v_i^0. The group stays together in this case if everybody walks with the speed v \equiv v_0 of the leader; we consider here the case that v_i^0 < v_0 for i = 1, 2, \ldots.

The restraining potential illustrated in Fig. 4.10 has a maximal positive slope f_{max} when the walker lags behind, which implies that a maximal speed difference v_0 - v_i^0 exists, as determined through

    \gamma\left[(v_i^0)^2 - v^2\right] v = -f_{\max}, \qquad
    (v_i^0)^2 = v^2 - f_{\max}/(\gamma v).

The group is more likely to break apart when the desire \gamma to walk with one's own preferred velocity v_i^0 is large. An analogous argument holds for v_i^0 > v_0, for which the maximal negative slope of V(z) would enter the above expression.

4.3.4 Opinion Dynamics

Opinion dynamics research deals with agents having real-valued opinions .xi , which
may change through various processes, like conviction or consensus building.

Bounded Confidence We consider a basic process for consensus building. When


meeting, two agents .(xi , xj ) agree on the consensus opinion .x̄ij = (xi + xj )/2,
however only when their initial opinions are not too different,

    (x_i, x_j) \;\to\; \begin{cases}
    (\bar{x}_{ij}, \bar{x}_{ij}) & \text{if } |x_i - x_j| < d \\
    (x_i, x_j) & \text{if } |x_i - x_j| > d
    \end{cases}        (4.53)

The two agents do not agree to a common opinion if they initially differ beyond the
confidence bound d. As a consequence, distinct sets of attracting opinions tend to
form, as illustrated in Fig. 4.11.
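The update rule (4.53) is straightforward to simulate. In the sketch below the agent number, the confidence interval and the number of pairwise updates are illustrative choices; the mean opinion is conserved exactly by the rule, and distinct opinion clusters emerge:

```python
import numpy as np

# Pairwise bounded-confidence updates (4.53): random pairs average their
# opinions when these differ by less than d; N and the number of updates
# are illustrative choices.
rng = np.random.default_rng(1)
N, d = 2000, 0.1
x = rng.uniform(0.0, 1.0, N)
mean0 = x.mean()                     # the update rule conserves the mean
for i, j in rng.integers(0, N, size=(400_000, 2)):
    if abs(x[i] - x[j]) < d:
        x[i] = x[j] = 0.5 * (x[i] + x[j])
xs = np.sort(x)
clusters = 1 + np.count_nonzero(np.diff(xs) > d)   # gaps > d separate
print(clusters)                      # a handful of opinion attractors
```

Counting gaps larger than the confidence bound d in the sorted opinions gives the number of mutually non-interacting opinion clusters, of the kind visible in Fig. 4.11.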

Master Equation For large populations of agents we may define with .ρ(x, t) the
density of agents having opinion x, with the time evolution given by the master

Fig. 4.11 For a confidence interval d = 0.1, a simulation of 60 \cdot 10^3 agents with 1/4/16/64 million pairwise updatings of type (4.53); plotted is log(1 + number of agents) versus opinion. Opinion attractors with a large number of supporters tend to stabilize faster than attracting states containing smaller numbers of agents

equation
    \tau\dot\rho(x) = 2\int_{-d/2}^{d/2} \rho(x+y)\,\rho(x-y)\,dy
    \;-\; \rho(x)\int_{-d}^{d} \rho(x+y)\,dy,        (4.54)

with .τ setting the time scale of the consensus dynamics and with the time
dependency of .ρ(x, t) being suppressed. The first term results from two agents
agreeing on x, the second term describes the flow of agents leaving opinion x by
agreeing with other agents within the confidence interval d.

Infinitesimal Confidence It is possible to turn (4.54) into an ordinary differential


equation. For this purpose we take the limit .d → 0 via a Taylor expansion

    \rho(x+y) \approx \rho(x) + \rho'(x)\,y + \rho''(x)\,\frac{y^2}{2} + \ldots.        (4.55)

Substituting (4.55) into (4.54) one notices that the terms proportional to .y 0 cancel, as
the overall number of agents is conserved. The terms .∼ y 1 also vanish by symmetry
and we obtain
    \tau\dot\rho = 2\int_{-d/2}^{d/2}\left[\rho\rho'' - (\rho')^2\right] y^2\,dy
    \;-\; \rho\rho''\int_{-d}^{d}\frac{y^2}{2}\,dy
    = \frac{d^3}{6}\left[\rho\rho'' - (\rho')^2\right] - \frac{d^3}{3}\,\rho\rho''
    = -\frac{d^3}{6}\left[\rho\rho'' + (\rho')^2\right].

With \partial^2\rho^2/\partial x^2 = 2\left[\rho'\rho' + \rho\rho''\right] we find

    \frac{\partial\rho}{\partial t} = -\frac{d^3}{12\tau}\,\frac{\partial^2\rho^2}{\partial x^2},
    \qquad \dot\rho = -D_o\,\Delta\rho^2,        (4.56)

with .Δ denoting the Laplace operator in one dimension. .ρ 2 enters the evolution
equation as two agents need to interact for the dynamics to proceed.

One can keep D_o = d^3/(12\tau) in (4.56) constant in the limit d \to 0 by appropriately rescaling the time constant \tau.^{10}

Opinion Current Recasting (4.56) in terms of the continuity equation

    \dot\rho + \nabla\cdot j_o = 0, \qquad j_o = D_o\,\nabla\rho^2,        (4.57)

defines the opinion current j_o. With D_o and \rho being positive, this implies that the current strictly flows uphill, an example of a "rich-get-richer" dynamics, which is evident also in the simulation shown in Fig. 4.11.

Voter Models Large varieties of models describing opinion dynamics have been
examined, such as networks of agents having a discrete number of opinions, like 0/1
for conservative/progressive, which can be described by “voter models”.11 Voters
change opinion when being ‘infected’ by the opinion of one or more neighbors.
The properties of the underlying social network, like the clustering coefficient, will
determine the overall outcome.

4.4 Car Following Models

The flow of cars on a motorway can be modelled by “car following models”, akin
to the bird flocking model discussed in Sect. 4.3.3. Of central importance is here the
interplay between velocity dependent forces and human reaction times.

Chain of Cars We denote with x_j(t) the positions of cars on a one-dimensional motorway, j = 0, 1, \ldots. The acceleration of the following car is given by

    \ddot{x}_{j+1}(t+T) = g(x_j, x_i\,|\,\dot{x}_j, \dot{x}_i)
    = \lambda\left[\dot{x}_j(t) - \dot{x}_{j+1}(t)\right],        (4.58)

where \lambda is a reaction constant and T the reaction time.^{12} Actions are delayed by T relative to the perception of sensory input. Drivers tend to brake when closing in on the car in front, and to accelerate when the distance grows. In car following models, using the notation of (4.50), one considers mostly velocity-dependent forces.

10 An analogous rescaling is done when deriving the diffusion equation \dot{p} = D\Delta p, as discussed in Sect. 3.3.1 of Chap. 3.
11 The classical voter model is treated in exercise (4.8).
12 The general theory of dynamical systems with time delays is developed in Sect. 2.5 of Chap. 2.

4.4.1 Linear Flow and Carrying Capacity

One of the most important questions for traffic planning regards the carrying
capacity of a road, namely the maximal possible number of cars per hour, q. When
all cars advance with identical velocities u, the optimal situation, one would like to
evaluate .q(u).

Carrying Capacity for the Linear Model Integrating the linear model (4.58) and assuming a steady state, with identical velocities \dot{x}_j \equiv u and inter-vehicle distances x_j - x_{j+1} \equiv s, one obtains

    \dot{x}_{j+1} = \lambda\left[x_j - x_{j+1}\right] + c_0, \qquad u = \lambda(s - s_0),        (4.59)

where we wrote the constant of integration as c_0 = -\lambda s_0, with s_0 being the minimal allowed distance between the cars. The carrying capacity q, the number of cars transiting per unit of time, is given by the product of the mean velocity u and the density 1/s of cars,

    q = \frac{u}{s} = \lambda\left(1 - \frac{s_0}{s}\right) = \frac{u}{s_0 + u/\lambda},
    \qquad s = s_0 + \frac{u}{\lambda},        (4.60)

where we expressed q either as a function of the inter-vehicle distance s, or as a function of the mean velocity u. The above expression suggests that the carrying capacity should be maximal for empty streets, viz when s \to \infty, dropping monotonically with increasing traffic until the maximal density 1/s_0 of cars is reached.

Maximal Velocity The linear model (4.58) cannot be correct, as the velocity .u =
λ(s − s0 ), as given by (4.59), would diverge for empty highways, namely when the
inter-vehicle distances diverge, .s → ∞. The circumstance that real-world cars have
an upper velocity .umax implies that the carrying capacity must vanish instead as
.umax /s for large inter-vehicle spacings s.

A natural way to overcome this deficiency of the basic model (4.58) would be to add terms expressing, like for the bird flocking model (4.50), the preference to drive with a certain velocity. An alternative avenue, pursued normally when modelling traffic flow, is to consider non-trivial distance dependencies for the velocity dependent force g(x_j, x_i\,|\,\dot{x}_j, \dot{x}_i).

Non-Linear Model Driving a car, one reacts more strongly when the car in front is closer, an observation which can be modeled as

    \ddot{x}_{j+1}(t+T) = \lambda\,
    \frac{\dot{x}_j(t) - \dot{x}_{j+1}(t)}{\left[x_j(t) - x_{j+1}(t)\right]^{1+\alpha}},
    \qquad \alpha > 0,        (4.61)

Fig. 4.12 For \lambda = 3, the number of cars per hour passing on a highway, for the linear (4.60) and for the non-linear car following model (4.63) with \alpha = 1 (cars/time versus velocity). The carrying capacity vanishes when the road is congested and the mean velocity vanishes, u \to 0. Arbitrarily large cruising velocities are possible for the linear model in the limit of empty streets

where we postulated a scale-invariant dependency of the reaction strength with respect to the inter-vehicle distance x_j - x_{j+1}. Integrating (4.61) one obtains, in analogy to (4.59),

    \dot{x}_{j+1} = \frac{\lambda}{-\alpha}\left[x_j - x_{j+1}\right]^{-\alpha} + d_0,
    \qquad u = \frac{\lambda}{\alpha}\left(\frac{1}{s_0^\alpha} - \frac{1}{s^\alpha}\right),        (4.62)

with s_0 denoting again the minimal inter-vehicle distance and d_0 = \lambda/(\alpha s_0^\alpha). For \alpha > 0 the mean velocity u takes the finite value

    u_{\max} = \frac{\lambda}{\alpha s_0^\alpha}, \qquad
    u = u_{\max} - \frac{\lambda}{\alpha s^\alpha}, \qquad
    \frac{1}{s} = \left(\frac{\alpha}{\lambda}\right)^{1/\alpha}\left(u_{\max} - u\right)^{1/\alpha}

for near to empty streets with s \to \infty. The carrying capacity q = u/s is then given by the parabola

    q(u, \alpha = 1) = u\,(u_{\max} - u)/\lambda        (4.63)

for \alpha = 1, as illustrated in Fig. 4.12. The traffic volume is maximal for an intermediate mean velocity u, in accordance with everyday observations.
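The two carrying capacities can be compared directly. A small numerical check (λ and s₀ are illustrative values) confirms that the non-linear capacity (4.63) peaks at the intermediate velocity u_max/2, while the linear capacity (4.60) grows monotonically with u:

```python
import numpy as np

# Carrying capacities of the linear model (4.60) and of the non-linear
# model (4.63) for alpha = 1; lambda and s0 are illustrative values.
lam, s0 = 3.0, 1.0
umax = lam / s0                       # u_max = lambda/(alpha*s0^alpha)
u = np.linspace(0.0, umax, 1001)
q_lin = u / (s0 + u / lam)            # eq. (4.60), monotonic in u
q_non = u * (umax - u) / lam          # eq. (4.63), a parabola
u_best = u[np.argmax(q_non)]
print(u_best, umax / 2)               # traffic volume peaks at u_max/2
```

The maximum of the parabola, q = u_max²/(4λ), is reached at half the maximal cruising velocity, which is why traffic flow on a highway is best at intermediate speeds.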

4.4.2 Self-Organized Traffic Congestions

A steady flow of cars with \dot{x}_j(t) \equiv u may become unstable when fluctuations propagate along the line of vehicles, a phenomenon that can induce traffic congestion even in the absence of external influences, such as an accident.

Moving Frame of Reference The linear car following model (4.58) is the minimal model for analyzing the dynamics at intermediate to high densities of vehicles. We are interested in the evolution of the relative deviations z_j = 1 - \dot{x}_j/u from the

steady-state velocity u,

    \dot{z}_{j+1}(t+T) = \lambda\left[z_j - z_{j+1}\right], \qquad
    \dot{x}_i(t) = u\left[1 - z_i(t)\right],        (4.64)

which corresponds to a moving frame of reference.

Slow Perturbations The steady-state configuration \dot{x}_i(t) \equiv u is perturbed when the leading car changes its cruising speed, e.g. via

    z_0(t) \to e^{\gamma t}, \qquad \gamma = 1/\tau + i\omega, \quad \tau > 0.        (4.65)

For evaluating the exact time evolution of the following cars one would need to
integrate (4.64) piecewise, step by step for the intervals .t ∈ [nT , (n + 1)T ].
The situation simplifies when assuming that the delay interval T is substantially
smaller than the timescale of the perturbation, .τ . In this case the system will follow
the behavior of the lead car smoothly.

Recursion We make the ansatz that the column follows the leading car with the
same time-dependency, as given by (4.65), albeit with individual amplitudes .aj ,

    z_j(t) = a_j\,e^{\gamma t}, \qquad \gamma\,e^{\gamma T} a_{j+1} = \lambda\left(a_j - a_{j+1}\right).

Solving for .aj +1 one obtains a linear recursion,


    a_{j+1} = \frac{\lambda}{\lambda + \gamma\,e^{T\gamma}}\; a_j, \qquad
    a_n = \left(\frac{\lambda}{\lambda + \gamma\,e^{T\gamma}}\right)^{n} a_0.        (4.66)

The recursion is stable for real exponents .γ = 1/τ .

Delay Induced Instabilities We consider harmonic oscillations of the lead car


velocity, which corresponds to imaginary exponents .γ = iω in (4.65). Evaluating
the norm
    \left|\lambda + \gamma\,e^{T\gamma}\right|^2 = \left|\lambda + i\omega\,e^{iT\omega}\right|^2
    = \left[\lambda - \omega\sin(T\omega)\right]^2 + \omega^2\cos^2(T\omega)

we obtain
    \left|\frac{\lambda}{\lambda + \gamma\,e^{T\gamma}}\right|
    = \frac{\lambda}{\left[\lambda^2 + \omega^2 - 2\lambda\omega\sin(T\omega)\right]^{1/2}}        (4.67)

for the norm of the prefactor in the recursion (4.66). The recursion is unstable if

    \lambda^2 + \omega^2 - 2\lambda\omega\sin(T\omega) < \lambda^2, \qquad
    \omega < 2\lambda\sin(T\omega).        (4.68)

Suitably large time delays, with T\omega \approx \pi/2, induce an instability whenever \omega < 2\lambda.^{13} Non-constant driving may hence induce traffic jams.
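The stability criterion (4.68) can be checked by evaluating the norm (4.67) of the amplification factor in the recursion (4.66); the values of λ, ω and T below are illustrative choices:

```python
import numpy as np

# Amplification factor of the recursion (4.66) for gamma = i*omega; the
# perturbation grows down the line of cars when the norm exceeds one,
# i.e. when omega < 2*lambda*sin(T*omega), eq. (4.68). Values illustrative.
lam, w = 1.0, 0.5

def amp(T):
    g = 1j * w                        # gamma = i*omega
    return abs(lam / (lam + g * np.exp(T * g)))

a_short = amp(0.2)    # T < 1/(2*lam): stable for every omega
a_long = amp(3.0)     # T*w ~ pi/2: the perturbation is amplified
print(a_short, a_long)
```

Since sin(Tω) ≤ Tω, the instability condition can never be met for T < 1/(2λ), in agreement with (4.69) below; for the long delay, on the other hand, the amplification factor exceeds unity and the oscillation grows from car to car.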

Self-Organized Traffic Jams A traffic instability occurs also in the limit of infinitesimally small frequencies \omega \to 0, namely when

    1/(2\lambda) < T, \qquad \sin(T\omega) \approx T\omega.        (4.69)

When reactions are strong and slow (.λ and T large), self-organized traffic jams are
likely to emerge.

Propagating Perturbations Lastly, we examine how perturbations grow when \gamma = 1/\tau is real. The deviations z_n(t) from the steady state are

    z_n(t) = a_0\, C^n e^{t/\tau} = a_0\, e^{n\log(C)}\, e^{t/\tau}, \qquad
    C = \frac{\lambda\tau}{\lambda\tau + e^{T/\tau}}.

We are interested in the speed v characterizing the propagation of the perturbation along the line of cars,

    z_n(t) = a_0\, e^{(n - vt)\log(C)}, \qquad v = \frac{-1}{\tau\log(C)}.        (4.70)

The speed is positive, v > 0, since 0 < C < 1 and \log(C) < 0. Large reaction times T limit the propagation speed, since

    \log(C) \to -T/\tau, \qquad v \to 1/T

in the limit T \gg \tau.

Exercises

(4.1) INSTABILITY OF EULER'S METHOD
Euler's method is in general not suitable for numerically integrating partial differential equations. Discretizing the diffusion equation (4.1) in space and time, one obtains in one dimension

    \frac{\rho(x, t+\Delta t) - \rho(x, t)}{\Delta t}
    = D\,\frac{\rho(x+\Delta x, t) + \rho(x-\Delta x, t) - 2\rho(x, t)}{(\Delta x)^2}.

13 This result is in accordance with our discussion in Sect. 2.5 of Chap. 2, regarding the influence
of time delays in ordinary differential equations.

Prove that the resulting explicit time evolution map becomes unstable for
DΔt/(Δx)2 > 1/2 by considering ρ(x, 0) = cos(π x/Δx) as a particular
initial state.
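A numerical check of this instability threshold (grid size, step count and periodic boundaries are arbitrary choices): on the grid, the alternating initial mode is multiplied by the factor 1 − 4r each step, with r = DΔt/(Δx)², so its amplitude stays bounded for r < 1/2 and explodes for r > 1/2:

```python
import numpy as np

# The alternating initial state rho(x, 0) = cos(pi*x/dx), i.e. (-1)^i on
# the grid, is multiplied by (1 - 4r) in every explicit Euler step, with
# r = D*dt/(dx)^2; grid size and step count are arbitrary choices.
def evolve(r, steps=50, n=64):
    rho = np.cos(np.pi * np.arange(n))     # the critical mode (-1)^i
    for _ in range(steps):
        rho = rho + r * (np.roll(rho, 1) + np.roll(rho, -1) - 2 * rho)
    return np.abs(rho).max()

print(evolve(r=0.4), evolve(r=0.6))   # bounded vs. exploding amplitude
```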
(4.2) EXACT PROPAGATING WAVEFRONT SOLUTION
Find the reaction-diffusion system (4.2) for which the Fermi function

    \rho^*(x, t) = \frac{1}{1 + e^{\beta(x - ct)}}

is an exact particular solution for a solitary propagating wavefront. Determine


the reaction term R(ρ) by evaluating the derivatives of ρ ∗ . Consider appropri-
ate special values for β and for the propagation velocity c.
(4.3) LINEARIZED FISHER EQUATION
Consider the modified reaction-diffusion system

    \dot\rho = \rho(1-\rho) + \rho'' + \frac{2}{1-\rho}\,(\rho')^2        (4.71)

and show that it is equivalent to the linearized Fisher equation (4.18) using the
transformation ρ = 1 − 1/u.
(4.4) TURING INSTABILITY WITH TWO STABLE NODES
Is it possible that the matrix A1 entering the Turing instability and defined
in (4.26), with positive ϵa , ϵb > 0, describes a stable node with both λ± <
0? If yes, show that the superposition of two stable nodes may generate an
unstable direction.
(4.5) EIGENVALUES OF 2 × 2 MATRICES
Use the standard expression (4.25) for the eigenvalues of a 2 × 2 matrix
and show that local maxima of the potential V (x) of a one-dimensional
mechanical system

    \dot{x} = y, \qquad \dot{y} = -\lambda(x)\,y - V'(x)

with a space-dependent damping factor λ(x) are always saddles.


(4.6) LARGE DENSITIES OF NERVOUS RATS
Evaluate, using the stationarity condition (4.45), the number of nervous rats
σn per comfort zone for large values of σ . When do all rats become nervous?
(4.7) AGENTS IN THE BOUNDED CONFIDENCE MODEL
Show that the total number of agents \int \rho(x)\,dx is conserved for the bounded confidence model (4.54) for opinion dynamics.
(4.8) CLASSICAL VOTER MODEL
The basic asynchronous update step,
“A random voter adapts the opinion of a random neighbor.”
is repeated until consensus is reached. For a state with N_\pm voters having voting preferences \sigma_i = \pm 1, one defines the magnetization m = (N_+ - N_-)/(N_+ + N_-). Show that m is conserved on the average, and that the \sigma_i \equiv +1 consensus is reached with probability (1 + m)/2.

Further Reading

The interested reader is encouraged to take a look at Ellner and Guckenheimer (2011) for an in-depth treatise on mathematical modelling in biology and ecology, at Walgraef (2012) and Landge et al. (2020) for the generic mathematics of pattern formation in reaction-diffusion systems, and at Petrovskii and Bai-Lian (2010) for a discussion of exactly solvable models in the field.
We further suggest Hassanien et al. (2018) for a comprehensive textbook on
swarm intelligence, Kerner (2004) and Jusup et al. (2022) for traffic modelling and
for reviews on opinion dynamics and flocking behaviors. A review of voter models
is given in Redner (2019).

References
Ellner, S. P., & Guckenheimer, J. (2011). Dynamic models in biology. Princeton University Press.
Hassanien, A. E., & Emary, E. (2018). Swarm intelligence: Principles, advances, and applications.
CRC Press.
Jusup, M., et al. (2022). Social physics. Physics Reports, 948, 148.
Kerner, B. S. (2004). The physics of traffic: Empirical freeway pattern features, engineering applications, and theory. Springer.
Landge, A. N., Jordan, B. M., Diego, X., & Müller, P. (2020). Pattern formation mechanisms of
self-organizing reaction-diffusion systems. Developmental Biology, 460, 460.
Petrovskii, S. V., & Bai-Lian, L. (2010). Exactly solvable models of biological invasion. CRC
Press.
Redner, S. (2019). Reality-inspired voter models: A mini-review. Comptes Rendus Physique, 20,
275–292.
Walgraef, D. (2012). Spatio-temporal pattern formation: with examples from physics, chemistry,
and materials science. Springer Science & Business Media.
5 Information Theory of Complex Systems

What do we mean when we say that a given system shows "complex behavior"? Can we provide precise measures for the degree of complexity? This chapter offers an account of several common measures of complexity, together with the relation of complexity to predictability and emergence.
Following a self-contained introduction to information theory and statistics,
we will learn about probability distribution functions, Bayesian inference, the
law of large numbers, and the central limit theorem. Next, Shannon entropy and
mutual information will be discussed, two concepts that play central roles both in
the context of time series analysis, and as starting points for the formulation of
quantitative measures of complexity. The chapter concludes with a short overview
regarding generative approaches to complexity.

5.1 Probability Distribution Functions

Statistics is ubiquitous in everyday life. We are used to chatting, e.g., about the probability that our child will have blue or brown eyes, the chances to win a lottery, or those of a candidate to win the presidential elections. Statistics is ubiquitous also throughout the realms of science. Indeed, basic statistical concepts are used abundantly all over these lecture notes.

Variables and Symbols Probability distribution functions may be defined for


continuous or discrete variables, as well as for sets of symbols,

x ∈ [0, ∞],
. xi ∈ {1, 2, 3, 4, 5, 6}, α ∈ {blue, brown, green} .

For example, we may define with .p(x) the probability distribution of human life
expectancy x, with .p(xi ) the chances to obtain .xi when throwing a dice, or with
.p(α) the probability to meet somebody having eyes of color .α. Probabilities are

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 163
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_5

positive definite and the respective distribution functions normalized,

    p(x),\; p(x_i),\; p(\alpha) \ge 0, \qquad
    1 = \int_0^\infty p(x)\,dx = \sum_i p(x_i) = \sum_\alpha p(\alpha).

The notation used for a given variable will indicate in the following its nature, i.e. whether it is a continuous or discrete variable, or a symbol. For continuous variables the distribution p(x) represents a probability density function (PDF).

Continuous vs. Discrete Stochastic Variables When discretizing a stochastic


variable, e.g. when approximating an integral by a Riemann sum,

    \int_0^\infty p(x)\,dx \approx \sum_{i=0}^\infty p(x_i)\,\Delta x, \qquad
    x_i = \Delta x\,(0.5 + i),        (5.1)

the resulting discrete distribution function p(x_i) is not anymore normalized; the properly normalized discrete distribution function is p(x_i)\Delta x. The two notations, p_i and p(x_i), are both used for discrete distributions.^1

Mean, Median and Standard Deviation Common symbols for the average \langle x\rangle are \mu and \bar{x}. Average and standard deviation \sigma are given by

    \langle x\rangle = \int x\,p(x)\,dx, \qquad
    \sigma^2 = \int (x - \bar{x})^2\, p(x)\,dx.        (5.2)

Mean and expectation value are synonyms for \bar{x}, with \sigma^2 being the variance.^2 For everyday life situations the median \tilde{x}, defined by

    \int_{x<\tilde{x}} p(x)\,dx = \frac{1}{2} = \int_{x>\tilde{x}} p(x)\,dx,        (5.3)

is somewhat more intuitive than the mean. We have a 50 % chance to meet somebody
being smaller/taller than the median height.

1 The expression .p(xi ) is context specific and can denote both a properly normalized discrete
distribution function as well as the value of a continuous probability distribution function.
2 In formal texts on statistics and information theory, the notation .μ = E(X) is used, where X

stands for an abstract random variable, with x denoting a particular value and .pX (x) the probability
density.

Fig. 5.1 Left: For an average waiting time T = 1, the exponential distribution \exp(-t/T)/T. With 50% probability, waiting times are below the median \ln(2) (shaded area). Right: For a standard deviation \sigma = 1, the normal distribution \exp(-x^2/2)/\sqrt{2\pi}. The probability to draw a result within one/two standard deviations of the mean, x \in [-1, 1] and x \in [-2, 2] respectively (shaded regions), is 68 and 95%

Exponential Distribution A first example is the exponential distribution, which


describes, e.g. the distribution of waiting times for radioactive decay,
    p(t) = \frac{1}{T}\, e^{-t/T}, \qquad \int_0^\infty p(t)\,dt = 1.        (5.4)

The mean waiting time is

    \langle t\rangle = \frac{1}{T}\int_0^\infty t\, e^{-t/T}\,dt
    = \left. -t\,e^{-t/T}\right|_0^\infty + \int_0^\infty e^{-t/T}\,dt = T.

Median \tilde{t} and standard deviation \sigma are evaluated readily as

    \tilde{t} = T\ln(2), \qquad \sigma = T.

In 50% of cases one has to wait less than \tilde{t} \approx 0.69\,T, which is smaller than the average waiting time T, compare Fig. 5.1.
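These values are easily confirmed by sampling (the sample size below is an arbitrary choice):

```python
import numpy as np

# Sampling check of the exponential distribution (5.4), T = 1: the mean
# and standard deviation equal T, the median equals T*ln(2).
rng = np.random.default_rng(2)
T = 1.0
t = rng.exponential(T, size=1_000_000)
print(t.mean(), np.median(t), t.std())
```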

Standard Deviation and Bell Curve The standard deviation .σ measures the size
of the fluctuations around the mean. The standard deviation is especially intuitive
for the Gaussian distribution

    p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad
    \langle x\rangle = \mu, \quad \langle (x - \bar{x})^2\rangle = \sigma^2.        (5.5)

“Gaussian”, “Bell curve”, and “normal distribution” all denote (5.5). Bell curves
are ubiquitous in daily life, characterizing cumulative processes, as detailed out in
Sect. 5.1.1.
A Gaussian falls off rapidly with the distance from the mean \mu, compare Fig. 5.1. The probability to draw a value within n standard deviations of the mean, viz the probability that x \in [\mu - n\sigma, \mu + n\sigma], is 68, 95, 99.7% for n = 1, 2, 3. These numbers are valid only for Gaussians, not for general probability distributions.

Probability Generating Functions We recall the basic properties of generating


functions,3

    G_0(x) = \sum_k p_k\, x^k,        (5.6)

which are defined for discrete distributions .pk , where .k = 0, 1, 2, .. . For the
normalization and the mean .k̄ = 〈k〉 one evaluates
 
    G_0(1) = \sum_k p_k = 1, \qquad G_0'(1) = \sum_k k\,p_k = \langle k\rangle.        (5.7)

The second moment \langle k^2\rangle,

    \langle k^2\rangle = \left.\sum_k k^2 p_k\, x^k\right|_{x=1}
    = \left. x\,\frac{d}{dx}\left(x\,G_0'(x)\right)\right|_{x=1},        (5.8)

allows to express the variance \sigma^2 = \langle (k - \bar{k})^2\rangle as

    \sigma^2 = \langle k^2\rangle - \bar{k}^2
    = \left.\frac{d}{dx}\left(x\,G_0'(x)\right)\right|_{x=1} - \left[G_0'(1)\right]^2
    = G_0''(1) + G_0'(1) - \left[G_0'(1)\right]^2.        (5.9)

The importance of probability generating functions lies in the fact that the distribution of the sum k = \sum_\alpha k^\alpha of independent stochastic variables k^\alpha is generated by the product of the generating functions G_0^\alpha(x) of the respective individual processes p_k^\alpha, viz

    G_0(x) = \sum_k p_k\, x^k = \prod_\alpha G_0^\alpha(x), \qquad
    G_0^\alpha(x) = \sum_k p_k^\alpha\, x^k.

This relation is easily verified, f.i. for the case of two random variables, by multiplying out G_0^1(x)\,G_0^2(x).
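As a concrete check, consider a fair die with p_k = 1/6 for k = 1, …, 6. Mean and variance follow from (5.7)–(5.9), and the coefficients of G₀(x)² — obtained here by convolving the coefficient lists — generate the distribution of the sum of two dice:

```python
import numpy as np

# Generating-function bookkeeping for a fair die, p_k = 1/6 for k = 1..6,
# so that G0(x) = (x + x^2 + ... + x^6)/6.
p = np.zeros(7)
p[1:] = 1 / 6                                   # p_k for k = 0..6
Gp  = lambda x: sum(k * pk * x**(k - 1) for k, pk in enumerate(p))
Gpp = lambda x: sum(k * (k - 1) * pk * x**(k - 2) for k, pk in enumerate(p))
mean = Gp(1.0)                                  # <k> = G0'(1), eq. (5.7)
var = Gpp(1.0) + Gp(1.0) - Gp(1.0) ** 2         # eq. (5.9)
p_sum = np.convolve(p, p)                       # coefficients of G0(x)^2
print(mean, var, p_sum[7])                      # 3.5, 35/12, 1/6
```

The convolution of the two coefficient lists is exactly the multiplication of the two generating functions; the entry p_sum[7] reproduces the well-known probability 6/36 of throwing a total of seven with two dice.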

3 Generating functions are discussed further in Sect. 1.3.2 of Chap. 1.



5.1.1 Law of Large Numbers

Throwing a dice many times and adding up the results obtained, the resulting sum will be close to 3.5 N, where N is the number of throws. This is the typical outcome for cumulative stochastic processes.^4

LAW OF LARGE NUMBERS Repeating N times a stochastic process with mean \bar{x} and standard deviation \sigma, the mean and the standard deviation of the cumulative result will approach \bar{x}N and \sigma\sqrt{N} respectively in the thermodynamic limit N \to \infty.

The law of large numbers implies that one obtains \bar{x} as the averaged result, with a standard deviation \sigma/\sqrt{N} for the averaged process. One needs to increase the number of trials by a factor of four in order to improve the accuracy by a factor of two.
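A Monte Carlo sketch with dice sums (trial numbers below are arbitrary choices) illustrates both scalings:

```python
import numpy as np

# Dice sums: for N throws the cumulative mean is 3.5*N, while the
# standard deviation grows only like sqrt(N); trial counts are arbitrary.
rng = np.random.default_rng(3)
N, trials = 100, 50_000
sums = rng.integers(1, 7, size=(trials, N)).sum(axis=1)
sigma_single = np.sqrt(35 / 12)       # standard deviation of one throw
print(sums.mean() / N)                # close to 3.5
print(sums.std() / np.sqrt(N))        # close to sigma_single ~ 1.708
```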

Proof We prove the law of large numbers for a discrete process p_k described by the generating functional G_0(x). This is not really a restriction, since probability densities of continuous variables can be discretized with arbitrary accuracy. The generating function of the cumulative process is G_0^N(x), which allows to express the mean as

    \bar{k}^{(N)} = \left.\frac{d}{dx}\, G_0^N(x)\right|_{x=1}
    = \left. N G_0^{N-1}(x)\, G_0'(x)\right|_{x=1} = N\bar{k},

with the help of (5.7). For the standard deviation .σ (N ) of the cumulative process we
obtain
    \left[\sigma^{(N)}\right]^2
    = \left.\frac{d}{dx}\left(x\,\frac{d}{dx}\, G_0^N(x)\right)\right|_{x=1} - \left[N\bar{k}\right]^2        (5.10)
    = \left.\frac{d}{dx}\left(x\, N G_0^{N-1}(x)\, G_0'(x)\right)\right|_{x=1} - N^2\left[G_0'(1)\right]^2
    = N G_0'(1) + N(N-1)\left[G_0'(1)\right]^2 + N G_0''(1) - N^2\left[G_0'(1)\right]^2
    = N\left[G_0''(1) + G_0'(1) - \left(G_0'(1)\right)^2\right] \equiv N\sigma^2,

viz the law of large numbers, where (5.9) was used twice.

4 Please take note of the difference between a cumulative stochastic process, when adding the results of individual trials, and the "cumulative PDF", F(x), defined by F(x) = \int_{-\infty}^{x} p(x')\,dx'.

Fig. 5.2 The flat distribution, with variance \sigma^2 = 1/12, and the probability density of the sum of N = 3 flat distributions. The latter approximates remarkably well the limit Gaussian with \sigma = 1/\sqrt{3\cdot 12} = 1/6, compare (5.11), in accordance with the central limit theorem. 10^5 random samples, with N_{bin} = 100 bins

Central Limit Theorem The law of large numbers states that the variance \sigma^2 is additive for cumulative processes, not the standard deviation \sigma. The "central limit theorem" tells us in addition that the limit distribution is a Gaussian, as illustrated in Fig. 5.2.

CENTRAL LIMIT THEOREM Given are i = 1, \ldots, N independent random variables x_i, distributed with means \mu_i and standard deviations \sigma_i. For N \to \infty, the cumulative distribution of x = \sum_i x_i is described by a Gaussian with mean \mu = \sum_i \mu_i and variance \sigma^2 = \sum_i \sigma_i^2.

In most cases one is not interested in the cumulative result, but in averaged quantities, which are obtained by rescaling variables,

    y = x/N, \qquad \bar\mu = \mu/N, \qquad \bar\sigma = \sigma/N, \qquad
    p(y) = \frac{1}{\bar\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\bar\mu)^2}{2\bar\sigma^2}}.

The rescaled standard deviation scales with 1/\sqrt{N}. To see this, just consider identical processes with \sigma_i \equiv \sigma_0,

    \bar\sigma = \frac{1}{N}\sqrt{\sum_i \sigma_i^2} = \frac{\sigma_0}{\sqrt{N}},        (5.11)

in accordance with the law of large numbers.
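A quick numerical illustration (sample sizes are arbitrary choices), averaging N flat distributions on [0, 1] with σ₀² = 1/12:

```python
import numpy as np

# Averaging N flat distributions on [0, 1], sigma0^2 = 1/12: the rescaled
# spread is sigma0/sqrt(N), eq. (5.11), and roughly 68% of the samples lie
# within one standard deviation of the mean 1/2; sizes are arbitrary.
rng = np.random.default_rng(4)
N, trials = 12, 100_000
y = rng.uniform(0.0, 1.0, size=(trials, N)).mean(axis=1)
sigma_bar = np.sqrt(1 / 12) / np.sqrt(N)
within = np.mean(np.abs(y - 0.5) < sigma_bar)
print(y.std(), sigma_bar, within)
```

The empirical spread of the averaged variable matches σ₀/√N, and the one-sigma coverage is already very close to the Gaussian value of 68%, in accordance with the central limit theorem.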

Is Everything Boring Then? One might be tempted to draw the conclusion that systems containing large numbers of variables are boring, since everything seems to average out. This is actually not the case: the law of large numbers holds only for statistically independent processes. Subsystems of distributed complex systems are however dynamically dependent, and it is often the case that dynamical correlations lead to highly non-trivial properties in the thermodynamic limit.

5.1.2 Bayesian Statistics

The notions of statistics considered so far can be easily generalized to the case of
more than one random variable. Whenever a certain subset of random variables is
considered to be the causing event for the complementary subset of variables one
speaks of inference, a domain of the Bayesian approach.

Conditional Probability Events and processes may have dependencies upon each
other. A physician will typically have to know, to give an example, the probability
that a patient has a certain illness, given that the patient shows a specific symptom.

CONDITIONAL PROBABILITY The probability that an event x occurs, given that an event
y has happened, is the “conditional probability” .p(x|y).

Throwing a die twice, the probability that the first throw resulted in a 1, given that the total result was .4 = 1 + 3 = 2 + 2 = 3 + 1, is 1/3. One defines with

.p(x) = ∫ p(x|y) p(y) dy     (5.12)

the ‘marginal’ distribution .p(x). The probability of finding x is given by the


probability of finding x given y, .p(x|y), integrated over the probability of finding y
in the first place.

Bayes Theorem The probability distribution of throwing x in the first throw and
y in the second throw is determined by the joint distribution .p(x, y), which obeys
 
.1 = ∫∫ p(x, y) dx dy,   p(x) = ∫ p(x, y) dy .     (5.13)

Together, the two expressions (5.12) and (5.13) for the marginal are equivalent to

.p(x, y) = p(x|y)p(y) = p(y|x)p(x) ,

or
.p(y|x) = p(x|y)p(y)/p(x) = p(x|y)p(y) / ∫ p(x|y)p(y) dy ,     (5.14)

where (5.12) was used in the second step. This relation is denoted “Bayes theorem”.
The conditional probability .p(x|y) of x happening, given that y had occurred, is the “likelihood”.

Bayesian Statistics As an exemplary application of Bayes theorem (5.14) consider


a medical test.
170 5 Information Theory of Complex Systems

– The probability of being ill/healthy is given by .p(y), .y = ill/healthy.


– The likelihood of passing the test is .p(x|y), with .x = positive/negative.

Let’s consider an epidemic outbreak with 1 % of the population being infected on average. Furthermore we assume that the test has an accuracy of 99 %,

p(positive|ill) = 0.99,   p(positive|healthy) = 0.02 ,

with the latter being the rate of false positives. The probability that a positively tested person is actually infected is then just 33 %,

p(ill|pos) = p(pos|ill)p(ill) / [ p(pos|ill)p(ill) + p(pos|healthy)p(healthy) ]
           = (0.99 · 0.01) / (0.99 · 0.01 + 0.02 · 0.99) = 1/3 ,

where Bayes theorem (5.14) was used. A second follow-up test is hence necessary.
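The computation can be reproduced in a few lines; a Python sketch with the numbers used in the text (variable names are our own):

```python
# The medical-test example, evaluated via Bayes theorem (5.14).
p_ill = 0.01
p_healthy = 1.0 - p_ill
p_pos_ill = 0.99       # p(positive|ill)
p_pos_healthy = 0.02   # p(positive|healthy), rate of false positives

# marginal probability of a positive test, compare (5.12)
p_pos = p_pos_ill * p_ill + p_pos_healthy * p_healthy

# probability that a positively tested person is infected
p_ill_pos = p_pos_ill * p_ill / p_pos   # close to 1/3
```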

Statistical Inference We consider again a medical test, but in a slightly different situation. A series of tests is performed in a city where an outbreak has occurred, but now with the purpose of estimating the percentage of people being infected.
We can then use expression (5.12) for the marginal probability .p(positive) of obtaining positive test results,

p(positive) = 0.99 p(ill) + 0.02 (1 − p(ill))     (5.15)

and solve for our estimate .p(ill) of infections. In addition one needs to estimate the
confidence of the obtained result, viz the expected fluctuations due to the limited
number of tests actually carried out.

Bayesian Inference We start by noting that both sides of Bayes theorem (5.14) are
properly normalized,
 
∫ p(y|x) dy = 1 = ∫ p(x|y)p(y) dy / p(x) .

For a given x, the probability that any y happens is unity, and vice versa. For a given
x we may hence interpret the left-hand side as the probability that y is true. We change the notation slightly,

.p1 (y) ≡ p(x|y)p0 (y) / ∫ p(x|y)p0 (y) dy .     (5.16)

One denotes

– .p1 (y) = p(y|x) the “posterior” distribution,
– .p(x|y) the “likelihood”, and
– .p0 (y) the “prior”.

Equation (5.16) constitutes the basis of Bayesian inference. In this setting one is not
interested in finding a self-consistent solution .p0 (y) = p1 (y) = p(y). Instead it is
premised that one disposes of prior information, viz knowledge and expectations,
about the status of the world, .p0 (y). Performing an experiment, a new result x is obtained, which is then used to improve the expectations about the status of the world through .p1 (y), using (5.16).

Bayesian Learning The most common application of Bayesian inference is a


situation where inference from a given set of experimental data needs to be drawn,
using (5.16) a single time.
Alternatively, one can consider (5.16) as the basis of cognitive learning processes, updating the knowledge about the world iteratively with every further observation .x1 , x2 , . . . , xn ,

pi (y) ∝ p(xi |y) pi−1 (y),   ∀y .

This update procedure of the knowledge .pi (y) about the world is independent of
the grouping of observations .xi , viz

p0 → p1 → · · · → pn   and   p0 → pn

yield the same result, due to the multiplicative nature of the likelihood .p(x|y), viz
when considering in the last relation all consecutive observations .{x1 , . . . , xn } as a
single event.
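The grouping independence can be illustrated with a toy inference problem; the following Python sketch (the coin-bias setting is our own illustrative choice, not from the text) updates a discretized prior sequentially and in a single batch:

```python
# Bayesian updating of a discretized prior p0(y) over the unknown bias y
# of a coin. Sequential updates p0 -> p1 -> ... -> pn and a single batch
# update p0 -> pn yield the same posterior.

ys = [i / 100 for i in range(101)]      # hypotheses for the bias y
p0 = [1.0 / len(ys)] * len(ys)          # flat prior
obs = [1, 1, 0, 1, 0, 1, 1, 1]          # observations x_1, ..., x_n

def likelihood(x, y):
    # p(x|y): probability of outcome x for a coin with bias y
    return y if x == 1 else 1.0 - y

def update(p, x):
    # one Bayesian step, Eq. (5.16): p_i(y) proportional to p(x_i|y) p_{i-1}(y)
    post = [likelihood(x, y) * py for y, py in zip(ys, p)]
    norm = sum(post)
    return [q / norm for q in post]

p_seq = p0
for x in obs:                            # sequential updates
    p_seq = update(p_seq, x)

p_batch = list(p0)                       # single update, joint likelihood
for i, y in enumerate(ys):
    l = 1.0
    for x in obs:
        l *= likelihood(x, y)
    p_batch[i] *= l
norm = sum(p_batch)
p_batch = [q / norm for q in p_batch]
```

Both posteriors coincide, due to the multiplicative nature of the likelihood.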

5.1.3 Statistical Binning

Beyond elementary parameters, like mean and variance, one is in many cases interested in estimating the probability distribution itself, in particular for data generated by some known or unknown process, like the temperature measurements of a weather station. When doing so, it is important to keep a few caveats in mind.

Fig. 5.3 For the logistic map with .r = 3.9 and .x0 = 0.6, two statistical analyses of the identical time series .{xn |n = 0, . . . , N }, with .N = 10^6. Left: The distribution .p(x) of the .xn . Plotted at the midpoints of the respective bins is .p(x)(Nbin /N ), for .Nbin = 10 (square symbols) and .Nbin = 100 (vertical bars). Right: The joint probabilities .p±± , as defined by (5.20), of consecutive increases/decreases of .xn . The probability .p−− that the data decreases consecutively twice vanishes

Binning of Variables Data in the form of a time series of observations is typically generated by a dynamical system. As an example we examine the statistical properties of the logistic map,5

xn+1 = r xn (1 − xn ),   xn ∈ [0, 1],   r ∈ [0, 4] .     (5.17)

For systems with continuous readings, as for the logistic map, one needs to bin
observations in order to estimate the respective probability distribution. In Fig. 5.3
the statistics of a time series in the chaotic regime is given, here for .r = 3.9.
Apart from the overall number of bins, .Nbin , a choice has to be made regarding
the positions and the widths of the individual bins. When the data is not uniformly
distributed, one may place more bins in a region of interest, generalizing the
relation (5.1) through .Δx → Δxi , with the .Δxi being the width of the individual
bins.
For the example shown in Fig. 5.3, we selected .Nbin = 10/100 equidistant
bins. Note that the average number of observations per bin scales with .1/Nbin . Rescaling the count per bin with .Nbin therefore allows one to compare distributions obtained for different .Nbin , as done in Fig. 5.3.
The selection of the binning procedure is in general an intricate choice. Fine
structure will be lost when .Nbin is too low, but statistical noise will dominate for
large numbers of bins.

5 A detailed account of period doubling and chaos in the logistic map can be found in Sect. 2.4 of Chap. 2.

Adaptive Binning Classically, the number and positions of the set .{Bi } of bins are predetermined,
 
.Bi = { x | x ∈ [bi− , bi+ ] } ,     (5.18)

where .bi± is the upper/lower border of the ith bin. Observations .xn ∈ Bj are added
to the count of the j th bin. Results are assigned either to the bare midpoint .(bi+ +
bi− )/2, or to a weighted average.
For adaptive binning, the desired number .Nobs of observations per bin is
predetermined, not the .Bi themselves. For the following we assume that the
observations .xn > 0 are ordered, with .xn+1 ≥ xn . Starting with .b1− = 0, the
following steps are repeated.

– For the ith bin, add data points until .Nobs is reached. Say that the last addition is the mth observation.
– Set the border of the current bin halfway to the next data point, .bi+ = (xm+1 +
xm )/2.

– Repeat for the next bin, with matching bin borders, .b(i+1)− = bi+ .

Per construction, the statistical accuracy of all bins is identical when adaptive binning is used. Adaptive binning improves the quality of estimates substantially
when the data is distributed unequally over large ranges. This is in particular the
case for data showing powerlaw behavior, as for scale-free graphs.6
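The steps above can be sketched as follows (function name, return format, and the example data are our own choice):

```python
# Adaptive binning with a fixed number n_obs of observations per bin,
# following the procedure described above for ordered positive data.

def adaptive_bins(data, n_obs):
    """Return a list of (b_minus, b_plus, count) tuples with matching
    borders, b^-_{i+1} = b^+_i, starting from b^-_1 = 0."""
    xs = sorted(data)
    bins = []
    b_minus = 0.0
    m = 0
    while m + n_obs <= len(xs):
        last = xs[m + n_obs - 1]
        if m + n_obs < len(xs):
            # border halfway to the next data point
            b_plus = 0.5 * (last + xs[m + n_obs])
        else:
            # last bin ends at the final observation
            b_plus = last
        bins.append((b_minus, b_plus, n_obs))
        b_minus = b_plus          # matching bin borders
        m += n_obs
    return bins

bins = adaptive_bins([0.1, 0.2, 0.25, 0.3, 0.9, 1.5, 1.6, 2.0], n_obs=2)
```

Every bin then contains the same number of observations, so all bins carry the same statistical weight, regardless of how unevenly the data is spread.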

5.1.4 Time Series Characterization

Till now, we implicitly assumed that the statistical evaluation of a given set of observations is done directly, without further preprocessing. This is however not always the optimal approach.

Symbolization One denotes with “symbolization” the construction of a finite number of symbols suitable for the statistical characterization of a set of data of interest. For a given time series .{xt }, a standard preprocessing procedure is to extract stepwise changes, such as .xt − xt−1 . Of interest is also whether the data increases or decreases,

δt = sign(xt − xt−1 ) = { 1 if xt > xt−1 ;  −1 if xt < xt−1 } .     (5.19)

6 The data for the in-degree of Internet domains presented in Fig. 1.6 of Chap. 1 has been processed using adaptive binning with .Nobs = 100.



The consecutive development of the .δt may also be encoded, using higher-level
symbolic stochastic variables. For example, one might be interested in the joint
probabilities
   
p++ = 〈p(δt = 1, δt−1 = 1)〉t ,   p+− = 〈p(δt = 1, δt−1 = −1)〉t ,
p−+ = 〈p(δt = −1, δt−1 = 1)〉t ,   p−− = 〈p(δt = −1, δt−1 = −1)〉t ,     (5.20)

where .p++ gives the probability that the data increases at least twice consecutively,
etc., and where .〈. . . 〉t denotes the time average. In Fig. 5.3 the values for the joint
probabilities .p±± are given for a selected time series of the logistic map in the
chaotic regime. The data never decreases twice consecutively, .p−− = 0, a somewhat
unexpected result. It tells us that certain properties of an otherwise chaotic system
may be predictable,7 at times even at a 100% level.
The symbolization procedure selected to analyze a given time series determines
the type of information one may hope to extract, as evident from the results
presented in Fig. 5.3. The selection of symbolization procedures is given further
attention in Sect. 5.2.2.

Self Averaging By definition, a time series produced by a dynamical system


depends on the initial condition. The resulting statistical properties may hence also
vary, e.g. when several distinct attracting states are present. We start with a basic
example, the XOR series,8

.σt+1 = XOR(σt , σt−1 ),   σt = 0, 1 .     (5.21)

The four initial conditions 00, 01, 10 and 11 give rise to the following time series,

. . . 000000000     . . . 101101101
. . . 110110110     . . . 011011011     (5.22)

where time runs from right to left. In (5.22) the initial conditions .σ1 and .σ0 have been
underlined. The typical time series, occurring for 75 % of the initial conditions, is
.. . . 011011011011 . . ., with .p(0) = 1/3 and .p(1) = 2/3 for the probability to find

respectively .0/1. When averaging over all four initial conditions, we have on the
other hand .(2/3)(3/4) = 1/2 for the probability to find a 1. Then

.p(1) = { 2/3 typical ;  1/2 average } .

7 General aspects of predictability in chaotic systems are developed in Sect. 3.1.3 of Chap. 3.
8 Remember, that .XOR(0, 0) = 0 = XOR(1, 1) and .XOR(0, 1) = 1 = XOR(1, 0).

When observing a single time series, we are likely to obtain the typical probability; analyzing many time series will result, on the other hand, in the average probability.

SELF AVERAGING When the statistical properties of a time series generated by a


dynamical process are independent of the respective initial conditions, one says the time
series is “self averaging”.

The XOR series is not self averaging, and one can generally not assume self averaging to occur. This is an inconvenient situation whenever only a single time series is available, as is the case for most historical data, e.g. of past climatic conditions.
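The typical vs. average distinction for the XOR series can be verified directly; a minimal Python sketch enumerating the four initial conditions (the series length, a multiple of the period 3, is our own choice):

```python
# The deterministic XOR series (5.21), evaluated for all four initial
# conditions (sigma_0, sigma_1).

def xor_series(s0, s1, n):
    seq = [s0, s1]
    while len(seq) < n:
        seq.append(seq[-1] ^ seq[-2])
    return seq

n = 3 * 1000
p1 = {}
for s0 in (0, 1):
    for s1 in (0, 1):
        seq = xor_series(s0, s1, n)
        p1[(s0, s1)] = sum(seq) / n      # probability of finding a 1

p1_typical = p1[(0, 1)]               # one of the three 011-type series
p1_average = sum(p1.values()) / 4     # average over initial conditions
```

The initial condition 00 yields the all-zero series, the three others the 011-type series with .p(1) = 2/3; averaging over all four initial conditions gives .p(1) = 1/2.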

XOR Series with Noise Most real-world processes involve a certain degree of
noise. It is therefore tempting to presume that noise could effectively restart the dynamics, leading to an implicit averaging over initial conditions. This assumption is not generally valid; it holds however for the XOR process with noise,

σt+1 = { XOR(σt , σt−1 ) with probability 1 − ξ ;  ¬XOR(σt , σt−1 ) with probability ξ } ,   0 ≤ ξ ≪ 1 .

For low levels of noise, .ξ → 0, the time series

. . . . 000000001101101101011011011011101101101100000000 . . .

has stretches of regular behavior interspersed with four types of noise-induced dynamics (underlined, time running from right to left). Denoting with .p000 and
.p011 the probability of finding regular dynamics of type “.. . . 000000000 .. . .” and

“.. . . 011011011 .. . .” respectively, we obtain the master equation

ṗ011 = ξp000 − ξp011 /3 = −ṗ000     (5.23)

for the noise-induced transition probabilities. In the stationary case, .p000 = p011 /3 for the XOR process with noise, the same ratio one would obtain for the deterministic XOR series averaged over the initial conditions.
The introduction of noise generally gives rise to complex dynamics akin to (5.23),
which will lead in most cases to self-averaging time series. This is also the case for
the OR time series,9 for which the small noise limit does however not coincide with
the time series obtained in the absence of noise.
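A Python sketch of the noisy XOR process (noise level and seed are our own choices); self averaging implies that the fraction of 1's approaches 1/2 for any initial condition:

```python
import random

# The XOR series with noise xi: for any xi > 0 the process is self
# averaging, and the probability of finding a 1 approaches 1/2 for
# every initial condition.

random.seed(7)
xi, n = 0.05, 200_000

def noisy_xor(s0, s1, n, xi):
    seq = [s0, s1]
    while len(seq) < n:
        v = seq[-1] ^ seq[-2]
        if random.random() < xi:   # noise flips the XOR outcome
            v = 1 - v
        seq.append(v)
    return seq

p1_from_00 = sum(noisy_xor(0, 0, n, xi)) / n
p1_from_01 = sum(noisy_xor(0, 1, n, xi)) / n
```

Both estimates agree with the stationary value 1/2 within sampling fluctuations, in contrast to the deterministic series, for which the initial condition 00 would give .p(1) = 0.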

Time Series Analysis and Cognition Time series analysis is a tricky business
whenever the fundamentals of the generative process are unknown, e.g. whether
noise is important or not. This is however the setting in which cognitive systems
are operative. Our sensory organs, eyes and ears, provide us with a continuous time

9 For the noisy OR series see exercise (5.4).



series encoding environmental information. Performing an informative and fast time series analysis is paramount for survival.

ONLINE VS. OFFLINE ANALYSIS If one performs an analysis of a previously recorded


time series one speaks of “offline” analysis. An analysis performed on-the-fly, while
observing, corresponds to “online” processing.

Animals need to perform online analysis of their sensory data input streams,
otherwise they would not survive long enough to react. Training of most machine
learning algorithms is however offline.

Trailing Averages Online characterization of a time series in terms of its basic


statistical properties, like mean and standard deviation, is quite straightforward.
For a continuous-time input stream .x(t) we define with
μt = (1/T) ∫₀^∞ dτ x(t − τ ) e^(−τ/T) ,     (5.24)

σt² = (1/T) ∫₀^∞ dτ (x(t − τ ) − μt )² e^(−τ/T)     (5.25)

respectively the “trailing average” .μt and the trailing variance .σt2 . Trailing expec-
tation values exponentially discount older data, with the respective moments of the
input stream .x(t) being recovered in the limit .T → ∞. The factor .1/T in (5.24)
and (5.25) normalizes the respective trailing averages. For the case of a constant,
time independent input .x(t) ≡ x̄, we obtain correctly
.μt → (1/T) ∫₀^∞ dτ x̄ e^(−τ/T) = x̄ .

The trailing average can be evaluated by a simple online update rule; there is no need to store past data .x(t − τ ). To see this, we evaluate the time dependence

μ̇t = (1/T) ∫₀^∞ dτ e^(−τ/T) (d/dt) x(t − τ ) = (−1/T) ∫₀^∞ dτ e^(−τ/T) (d/dτ ) x(t − τ ) .

The last expression can be evaluated by direct partial integration.10 One obtains

.μ̇t = (x(t) − μt )/T ,     (5.26)

10 The identical procedure can be used in the context of dynamical systems with distributed time
delays, as shown in Sect. 2.5.1 of Chap. 2.

together with an analogous update rule for the variance .σt2 , by substituting .x →
(x − μ)2 . Expression (5.26) is an archetypical example of an online updating rule
for a time averaged quantity, here the trailing average .μt .
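In discrete time, the update rule (5.26) becomes .μ ← μ + (x − μ)Δt/T ; a Python sketch with a constant input plus small jitter (the discretization and all parameters are our own illustrative choices):

```python
import random

# Discrete-time version of the trailing-average update (5.26):
#   mu <- mu + (x - mu) * dt / T
# For a constant input x_bar the trailing average converges to x_bar,
# with no need to store past data.

random.seed(0)
T, dt = 10.0, 0.1
x_bar = 1.7

mu = 0.0
for _ in range(20_000):                        # total time 2000 >> T
    x = x_bar + 0.1 * (random.random() - 0.5)  # constant input plus jitter
    mu += (x - mu) * dt / T
```

The same one-line update, with .x replaced by .(x − μ)², yields the trailing variance.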

5.2 Entropy and Information

Entropy is a venerated concept from physics encoding the amount of disorder


present in a thermodynamic system at a given temperature. The “Second Law
of Thermodynamics” states, that entropy can only increase in an isolated, viz
closed system. The second law has far reaching consequences, e.g. determining
the maximal efficiency of engines and power plants, together with philosophical
implications for our understanding of the fundamentals underpinning the nature of
life as such.

Entropy and Life Living organisms have a body, which means that they are
capable of creating ordered structures from basic chemical constituents. As a
consequence, living entities decrease entropy locally, with their bodies, seemingly
in violation of the second law. In reality, the local entropy depressions are created
at the expense of corresponding entropy increases in the environment, in agreement
with the second law of thermodynamics. All living beings need to be capable of
manipulating entropy.

Information Entropy and Predictability Entropy is a central concept in information theory, where it is commonly denoted “Shannon entropy” or “information
entropy”. In this context, one is interested in the amount of information encoded by
a sequence of symbols

. . . . σt+2 , σt+1 , σt , σt−1 , σt−2 , . . . ,

e.g. when transmitting a message. Typically, in everyday computers, the .σt are
words of bits. Let us consider two time series of bits, e.g.

. . . . 101010101010 . . . , . . . 1100010101100 . . . . (5.27)

The first example is predictable, from the perspective of a time series, and ordered, from the perspective of a one-dimensional alignment of bits. The second example is, respectively, unpredictable and disordered.
Information can be transmitted through a time series of symbols only when this
time series is not predictable. Talking to a friend, to illustrate this statement, we will
not learn anything new when capable of predicting his next joke. We have therefore
the following two perspectives,

high entropy ≙ { large disorder (physics) ;  high information content (information theory) }

and vice versa. Only seemingly disordered sequences of symbols are unpredictable
and thus potential carriers of information. Note that the predictability of a given time series, or its degree of disorder, may not necessarily be as self-evident as in the above example, Eq. (5.27), depending generally on the analysis procedure used, see
Sect. 5.2.2.

Extensive Information In complex systems theory, as well as in physics, we are


often interested in properties of systems composed of many subsystems.

EXTENSIVE AND INTENSIVE PROPERTIES For systems composed of N subsystems a


property is denoted “extensive” if it scales as .O(N^1) and “intensive” when it scales with .O(N^0).

A typical extensive property is the mass, a typical intensive property the density.
When lumping together two chunks of clay, their mass adds, but the density does
not change.
One demands, both in physics and in information theory, that the entropy should
be an extensive quantity. The information content of two independent transmission
channels should be just the sum of the information carried by the two individual
channels.

Shannon Entropy The Shannon entropy .H [p] is defined as



H [p] = −Σxi p(xi ) logb (p(xi )) = −〈 logb (p) 〉,   H [p] ≥ 0 ,     (5.28)

where .p(xi ) is a normalized discrete probability distribution function and where the
brackets in .H [p] denote the functional dependence.11 Note that .−p log(p) ≥ 0 for .0 ≤ p ≤ 1, see Fig. 5.4; the entropy is therefore non-negative.

b is the base of the logarithm used in (5.28). Common values of b are 2, Euler’s
number e and 10. The corresponding units of entropy are then termed “bit” for
.b = 2, “nat” for .b = e and “digit” for .b = 10. In physics the natural logarithm is

always used and there is an additional constant (the Boltzmann constant .kB ) in front
of the definition of the entropy. Here we will use .b = 2 and drop the index b in the following.

Extensiveness of the Shannon Entropy The .log-dependence in the definition of


the information entropy in (5.28) is necessary for obtaining an extensive quantity.
To see this, let us consider a system composed of two independent subsystems. The

11 A function .f (x) is a function of a variable x; a functional .F [f ] is, on the other hand, functionally

dependent on a function .f (x). In formal texts on information theory the notation .H (X) is often
used for the Shannon entropy of a random variable X with probability distribution .pX (x).

Fig. 5.4 Left: Plot of .−x log2 (x). Right: The logarithm .log2 (x) (full line) is concave, every chord (dashed line) lies below the graph

joint probability distribution is multiplicative,

.p(xi , yj ) = pX (xi )pY (yj ),   log(p(xi , yj )) = log(pX (xi )) + log(pY (yj )) .

The logarithm is the only function which maps a multiplicative input onto an
additive output. Consequently,

H [p] = −Σxi ,yj p(xi , yj ) log(p(xi , yj ))
      = −Σxi ,yj pX (xi )pY (yj ) [ log(pX (xi )) + log(pY (yj )) ]
      = −Σxi pX (xi ) Σyj pY (yj ) log(pY (yj )) − Σyj pY (yj ) Σxi pX (xi ) log(pX (xi ))
      = H [pY ] + H [pX ] ,

as necessary for the extensiveness of .H [p]. Hence the .log-dependence in (5.28).

Degrees of Freedom We specialise to a discrete system with .xi ∈ [1, . . . , n],


having n “degrees of freedom” in physics’ slang. If all values are equally likely, as is the case for a thermodynamic system at infinite temperature, the entropy is
H = −Σxi p(xi ) log(p(xi )) = −n (1/n) log(1/n) = log(n) ,     (5.29)

a celebrated result. The entropy grows logarithmically with the number of degrees
of freedom.

Shannon’s Source Coding Theorem So far we showed that (5.28) is the only possible definition, modulo renormalizing factors, for an extensive quantity depending exclusively on the probability distribution. The operative significance of the entropy .H [p] in terms of informational content is given by Shannon’s theorem.

SOURCE CODING THEOREM Given is a random variable x with a PDF .p(x) and entropy
.H [p]. The cumulative entropy .N H [p] is then, for .N → ∞, a lower bound for the number
of bits necessary when compressing N independent processes drawn from .p(x).

If we compress more, we will lose information; the entropy .H [p] is therefore a measure of information content.

Entropy and Compression Let’s make an example. Consider the four-letter alphabet .{A, B, C, D}. Suppose that these four letters do not occur with the same probability, the relative frequencies being instead

p(A) = 1/2,   p(B) = 1/4,   p(C) = 1/8 = p(D) .
When transmitting a long series of words using this alphabet we will have the
entropy

H [p] = −(1/2) log(1/2) − (1/4) log(1/4) − (1/8) log(1/8) − (1/8) log(1/8)
      = 1/2 + 2/4 + 3/8 + 3/8 = 1.75 ,     (5.30)
since we are using the logarithm with base .b = 2. The most naive bit encoding,

A → 00,   B → 01,   C → 10,   D → 11 ,

would use exactly 2 bits, which is larger than the Shannon entropy. An optimal
encoding would be, on the other hand,

. A → 1, B → 01, C → 001, D → 000 , (5.31)

leading to an average length of words transmitted of

p(A) + 2p(B) + 3p(C) + 3p(D) = 1/2 + 2/4 + 3/8 + 3/8 = 1.75 ,     (5.32)
which is the same as the information entropy .H [p]. The encoding given in (5.31) is
actually “prefix-free”. When we read the words from left to right, we know where a

new word starts and stops,

110000010101 ←→ AADCBB ,

without ambiguity. Fast algorithms for optimal, or close to optimal encoding are
clearly of importance in the computer sciences and for the compression of audio
and video data.
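Both the entropy (5.30) and the prefix-free decoding can be checked with a short Python sketch (the `decode` helper is our own):

```python
import math

# Entropy of the four-letter alphabet, average code length of the
# prefix-free encoding (5.31), and decoding of the bit string above.

p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
H = -sum(q * math.log2(q) for q in p.values())     # Eq. (5.30), 1.75 bit

code = {"A": "1", "B": "01", "C": "001", "D": "000"}
avg_len = sum(p[s] * len(code[s]) for s in p)      # Eq. (5.32), 1.75 bit

def decode(bits, code):
    # prefix-free: a codeword is recognized as soon as it is complete
    inv = {v: k for k, v in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inv:
            out.append(inv[word])
            word = ""
    return "".join(out)

message = decode("110000010101", code)   # -> "AADCBB"
```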

Discrete vs. Continuous Variables When defining the entropy, we considered


hitherto discrete variables. The information entropy can also be defined for con-
tinuous variables. We should be careful though, being aware that the transition from
continuous to discrete stochastic variables, and vice versa, is slightly non-trivial,
compare (5.1),
  

H [p]|con = −∫ p(x) log(p(x)) dx ≈ −Σi p(xi ) log(p(xi )) Δx
          = −Σi pi log(pi /Δx) = −Σi pi log(pi ) + Σi pi log(Δx)
          = H [p]|dis + log(Δx) ,     (5.33)

where .pi = p(xi )Δx is the properly normalized discretized PDF, see (5.1). The difference .log(Δx) between the continuous-variable entropy .H [p]|con and the discretized version .H [p]|dis diverges as .Δx → 0, the transition is hence discontinuous.

Entropy of a Continuous PDF It follows from (5.33) that the Shannon entropy .H [p]|con can be negative for a continuous probability distribution function. As an example consider a flat distribution in a small interval, .x ∈ [0, ϵ],



p(x) = { 1/ϵ for x ∈ [0, ϵ] ;  0 otherwise } ,

which leads to the entropy


H [p]|con = −∫₀^ϵ (1/ϵ) log(1/ϵ) dx = log(ϵ) < 0,   for ϵ < 1.

The absolute value of the entropy is hence not meaningful for continuous probability densities, only entropy differences are. One therefore refers to .H [p]|con as the “differential entropy”.

5.2.1 Maximal Entropy Distributions

Which kind of distributions maximize entropy, viz information content? Remembering that

lim p→0,1 p log(p) = 0,   log(1) = 0 ,

see Fig. 5.4, it is intuitive that a flat distribution might be optimal. This is indeed correct in the absence of any constraint other than the normalization condition .∫ p(x) dx = 1.

Variational Calculus We turn to the task of maximizing the functional

H [p] = ∫ f (p(x)) dx,   f (p) = −p log(p) ,     (5.34)

where the notation used will be of use later on. Maximizing a functional like .H [p]
is a typical task of variational calculus, which examines the variation .δp(x) around
an optimal function .popt (x),

p(x) = popt (x) + δp(x),   δp(x) arbitrary .

At optimality, the dependence of .H [p] on the variation .δp should be stationary,



0 ≡ δH [p] = ∫ f ' (p) δp dx,   0 = f ' (p) ,     (5.35)

where .f ' (p) = 0 follows from the fact that .δp is an arbitrary function.
For the entropy functional .f (p) = −p log(p) we then find, with

f ' (p) = − log(p) − 1 = 0,   p(x) = const.     (5.36)

the expected flat distribution.

Maximal Entropy Distributions with Constraints Under the constraint of a fixed


average .μ, the maximal entropy distribution is determined by

f (p) = −p log(p) − λxp,   μ = ∫ x p(x) dx ,     (5.37)

where .λ is a “Lagrange parameter”. It is used to enforce a given condition, here that .μ takes a predefined value. The stationarity condition .f ' (p) = 0 leads to

f ' (p) = − log(p) − 1 − λx = 0,   p(x) ∝ 2^(−λx) ∼ e^(−x/μ) .     (5.38)

For a given mean .μ, the exponential distribution (5.38) maximizes entropy. The Lagrange parameter .λ is determined such that the condition (5.37) is satisfied. For a support .x ∈ [0, ∞], as assumed above, we have .λ loge (2) = 1/μ.
One can generalize this procedure and consider distributions maximizing the entropy under the constraint of a given mean .μ and variance .σ²,

μ = ∫ x p(x) dx,   σ² = ∫ (x − μ)² p(x) dx .     (5.39)

Generalizing the derivation leading to (5.38), one sees that the maximal entropy
distribution constrained by (5.39) is a Gaussian,12 as given by (5.5).
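A brief sketch of the generalized derivation (worked out in detail in exercise (5.6); the Lagrange parameters .λ1 , λ2 are our notation):

```latex
f(p) = -p\log(p) - \lambda_1\, x\, p - \lambda_2\,(x-\mu)^2\, p ,
\qquad
f'(p) = -\log(p) - 1 - \lambda_1 x - \lambda_2 (x-\mu)^2 = 0 ,
```

so that .p(x) ∝ 2^(−λ1 x − λ2 (x−μ)²). Completing the square shows that this is a Gaussian; the mean condition in (5.39) then enforces .λ1 = 0, while the variance condition fixes .λ2 loge (2) = 1/(2σ²), reproducing (5.5).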

Pairwise Constraints We consider a joint distribution function .p(x1 , . . . , xn ) for


n variables .xi with pairwise correlations

.〈xi xj 〉 = ∫ dx^n xi xj p(x1 , . . . , xn ) .     (5.40)

Pair correlations can be measured experimentally in many instances; it is hence natural to consider them as constraints for modelling. One can adjust in the
maximal entropy distribution

p(x1 , . . . , xn ) = e^(−H) /N ,   H = Σij Jij xi xj + Σi λi xi     (5.41)

the .n(n − 1)/2 variational parameters .Jij in order to reproduce given .n(n − 1)/2
pairwise correlations .〈xi xj 〉, and the Lagrange multipliers .λi for regulating the respective individual averages .〈xi 〉.
The maximal entropy distribution (5.41) has the form of a Boltzmann factor of statistical mechanics, with H representing a Hamiltonian, viz the energy function. It contains coupling constants .Jij encoding the strength of pairwise interactions.

5.2.2 Minimal Entropy Principle

The Shannon entropy is a very powerful concept in information theory. The


encoding rules are typically known in practical applications, where there is no
ambiguity regarding the symbolization procedure to employ when receiving a
message via a given technical communication channel. This is however not the
case when we are interested in determining the information content of real-world

12 The derivation of the maximal entropy distribution constrained by .μ and .σ is treated in exercise (5.6).

processes, such as the time series of certain financial data or the data stream
produced by our sensory organs.

Symbolization and Information Content The result obtained for the information
content of a real-world time series .{σt } depends in general on the symbolization
procedure used. Let us consider, as an example, the first time series of (5.27),

. . . . 101010101010 . . . . (5.42)

When using a 1-bit symbolization procedure, we have

p(0) = 1/2 = p(1),   H [p] = −2 (1/2) log(1/2) = 1 ,
as expected. If, on the other hand, we use a 2-bit symbolization, we find

p(00) = p(11) = p(01) = 0,   p(10) = 1,   H [p] = − log(1) = 0 .

When 2-bit encoding is presumed, the time series is predictable and carries no
information. This seems intuitively the correct result and the question is: Can we
formulate a general guiding principle which tells us which symbolization procedure
would yield the more accurate result for the information content of a time series at
hand?

Minimal Entropy Principle The Shannon entropy constitutes a lower bound for
the number of bits per symbol necessary when compressing the data without infor-
mation loss. Trying various symbolization procedures, the symbolization procedure
yielding the lowest information entropy consequently allows us to represent a given time series losslessly with the least number of bits.

MINIMAL ENTROPY PRINCIPLE The information content of a time series with unknown
encoding is given by the minimum (actually the infimum) of the Shannon entropy over all
possible symbolization procedures.

The minimal entropy principle then gives us a definite answer with respect to
the information content of the time series in (5.42). We have seen that at least one
symbolization procedure yields a vanishing entropy, the lowest possible value since
.H [p] ≥ 0. This is the expected result, since .. . . 01010101 .. . . is predictable.
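The two symbolizations can be compared with a short Python sketch (non-overlapping k-bit words, our own implementation choice):

```python
import math

# Shannon entropy of the periodic series ...010101... under 1-bit and
# 2-bit symbolization, as discussed above.

series = "10" * 500   # ...101010...

def block_entropy(series, k):
    """Entropy per word of non-overlapping k-bit words."""
    words = [series[i:i + k] for i in range(0, len(series) - k + 1, k)]
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    n = len(words)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

H1 = block_entropy(series, 1)   # 1 bit: the series looks random
H2 = block_entropy(series, 2)   # 0 bit: fully predictable
```

The 2-bit symbolization yields the lower entropy and is hence, per the minimal entropy principle, the more accurate estimate of the information content.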

Information Content of a Predictable Time Series Note that a vanishing information content .H [p] = 0 only implies that the time series is strictly predictable, not that it is constant. One therefore needs only a finite amount of information to
not that it is constant. One therefore needs only a finite amount of information to
encode the full time series, viz for arbitrary lengths .N → ∞. When the time series
is predictable, the information necessary to encode the series is intensive and not
extensive.

Symbolization and Time Horizons The minimal entropy principle is rather


abstract. In practice, one may not be able to try out more than a handful of different
symbolization procedures. It is therefore important to gain an understanding of the
time series at hand.
A core defining aspect of many time series is the intrinsic time horizon .τ .
Most dynamical processes have characteristic time scales, with the consequence
that memories of past states are effectively lost for times exceeding these intrinsic
time scales. The symbolization procedure used should therefore match the time
horizon .τ .
This is what happened when analyzing the time series of (5.42), for which
.τ = 2. A 1-bit symbolization procedure implicitly presumes that .σt and .σt+1 are

statistically independent, missing the intrinsic time scale .τ = 2, in contrast to a 2-bit


symbolization procedure.

5.2.3 Mutual Information

So far, we have been concerned mostly with individual stochastic processes as well
as the properties of cumulative processes generated by the sum of stochastically
independent random variables. In order to understand complex systems, we need to
develop tools for the description of a large number of interdependent processes. As a
first step towards this direction, we study in the following the case of two stochastic
processes, which may now be statistically correlated.

Two Channel Markov Process We start with an illustrative example of two


correlated channels .σt and .τt , with

.σt+1 = XOR(σt , τt )    (5.43)

.τt+1 = { XOR(σt , τt )   with probability 1 − ξ
        { ¬XOR(σt , τt )  with probability ξ

This dynamics is markovian, as the value for the state .{σt+1 , τt+1 } depends only on
the state at the previous time step,13 viz on .{σt , τt }.

MARKOV PROCESS A discrete-time memory-less dynamical process is denoted a Markov


process. The likelihood of future states depends only on the present state, and not on any
past states.

When state space is finite, as in our example, one has a Markov chain.

13 Markov chains are the subject of Sect. 3.3.2 of Chap. 3.



Joint Probabilities A typical instance of the Markov chain specified in (5.43) is

. . . σt+1 σt . . . : 00010000001010...
.
. . . τt+1 τt . . . : 00011000001111...

where the loci of noise-induced transitions have been underlined. For .ξ = 0 the stationary state is .{σt , τt } = {0, 0}, which is fully correlated. We now calculate
the joint probabilities .p(σ, τ ) for general values of noise .ξ , using the transition
probabilities

.pt+1 (0, 0) = (1 − ξ ) [pt (1, 1) + pt (0, 0)]
.pt+1 (1, 1) = (1 − ξ ) [pt (1, 0) + pt (0, 1)]
.pt+1 (1, 0) = ξ [pt (0, 1) + pt (1, 0)]
.pt+1 (0, 1) = ξ [pt (0, 0) + pt (1, 1)]

for the ensemble averaged joint probability distributions .pt (σ, τ ) = 〈p(σt , τt )〉ens ,
where the average .〈..〉ens denotes the average over an ensemble of time series. For
the solution in the stationary case, .pt+1 (σ, τ ) = pt (σ, τ ) ≡ p(σ, τ ), we use the
normalization

.p(1, 1) + p(0, 0) + p(1, 0) + p(0, 1) = 1 ,

finding

p(1, 1) + p(0, 0) = 1 − ξ,
. p(1, 0) + p(0, 1) = ξ ,

by adding the terms .∝ (1 − ξ ) and .∝ ξ respectively. It then follows immediately


that

.p(0, 0) = (1 − ξ )^2 ,    .p(1, 0) = ξ^2 ,
.p(1, 1) = (1 − ξ )ξ ,     .p(0, 1) = ξ (1 − ξ )    (5.44)

For .ξ = 1/2 the two channels become fully uncorrelated, as the .τ -channel is then fully random. The dynamics of the Markov process given in (5.43) is self-averaging, which allows one to verify (5.44) by a straightforward numerical simulation.
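Such a simulation can be sketched as follows (Python; the function name, step count and seed are illustrative choices). It time-averages the state of the chain and compares the result with the analytic stationary values:

```python
import random
from collections import Counter

def simulate_joint(xi, steps=200000, seed=42):
    """Time-average estimate of the stationary joint probabilities
    p(sigma, tau) for the two-channel XOR Markov chain of (5.43)."""
    rng = random.Random(seed)
    s, t = 0, 1                      # arbitrary initial state
    counts = Counter()
    for _ in range(steps):
        x = s ^ t                    # XOR(sigma_t, tau_t)
        s = x                        # sigma_{t+1}
        t = x if rng.random() > xi else 1 - x   # tau_{t+1}, flipped with prob. xi
        counts[(s, t)] += 1
    return {state: c / steps for state, c in counts.items()}

xi = 0.2
p_sim = simulate_joint(xi)
p_exact = {(0, 0): (1 - xi)**2, (1, 1): (1 - xi) * xi,
           (1, 0): xi**2,       (0, 1): xi * (1 - xi)}   # Eq. (5.44)
```

Due to self-averaging, the time average converges to the ensemble-averaged stationary distribution.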

Marginal Distributions Using the notation


 
.pσ (σ ' ) = Σ_{τ '} p(σ ' , τ ' ),    pτ (τ ' ) = Σ_{σ '} p(σ ' , τ ' )

Fig. 5.5 For the two-channel XOR-Markov chain .{σt , τt } with noise .ξ , see (5.43), the entropy
.H [p] of the combined process (full line), see (5.47), the entropies .H [pσ ] and .H [pτ ] of the
individual channels (dashed lines), see (5.46), together with the sum of the marginal entropies
.H [pσ ] + H [pτ ] (dot-dashed line). Note the positiveness of the mutual information, .I (σ, τ ) > 0, with .I (σ, τ ) = H [pσ ] + H [pτ ] − H [p]

for the marginal distributions .pσ and .pτ , we find from (5.44)

.pσ (0) = 1 − ξ ,    .pτ (0) = 1 − 2ξ (1 − ξ ) ,
.pσ (1) = ξ ,        .pτ (1) = 2ξ (1 − ξ ) ,    (5.45)

for the distributions of the two individual channels.

Joint and Marginal Entropy We evaluate now two entropies, that of the individual
channels, .H [pσ ] and .H [pτ ], the “marginal entropies”, viz

.H [pσ ] = −〈log(pσ )〉, H [pτ ] = −〈log(pτ )〉 , (5.46)

as well as the entropy of the combined process, termed “joint entropy”,



.H [p] = − Σ_{σ ', τ '} p(σ ' , τ ' ) log(p(σ ' , τ ' )) .    (5.47)

In Fig. 5.5 the respective entropies are plotted as a function of noise strength .ξ .
Some observations.

– In the absence of noise, .ξ = 0, both the individual channels, as well as the


combined process, are predictable and all three entropies, .H [p], .H [pσ ] and
.H [pτ ], vanish.

– For maximal noise .ξ = 0.5, the information content of both individual chains is
1 bit and of the combined process 2 bits, implying statistical independence.

– For general noise strengths .0 < ξ < 0.5, the two channels are statistically corre-
lated. The information content of the combined process .H [p] is consequently
smaller than the sum of the information contents of the individual channels,
.H [pσ ] + H [pτ ].
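The three entropies plotted in Fig. 5.5 follow directly from (5.44) and (5.45). A short sketch (Python, with illustrative function names; logarithms are base 2, so all entropies are in bits):

```python
from math import log2

def entropies(xi):
    """Joint entropy H[p], marginal entropies H[p_sigma], H[p_tau], and
    mutual information for the two-channel XOR Markov chain with noise xi."""
    def h(ps):
        return -sum(p * log2(p) for p in ps if p > 0)
    joint = [(1 - xi)**2, (1 - xi) * xi, xi**2, xi * (1 - xi)]   # Eq. (5.44)
    p_sigma = [1 - xi, xi]                                       # Eq. (5.45)
    p_tau = [1 - 2 * xi * (1 - xi), 2 * xi * (1 - xi)]
    H, Hs, Ht = h(joint), h(p_sigma), h(p_tau)
    return H, Hs, Ht, Hs + Ht - H        # last entry: mutual information (5.48)
```

Evaluating the function for .ξ = 0, .ξ = 0.5 and intermediate values reproduces the three observations above.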

Mutual Information The degree of statistical dependency of two channels can be


measured by comparing the joint entropy with the respective marginal entropies.

MUTUAL INFORMATION For two stochastic processes .σt and .τt , the difference

.I (σ, τ ) = H [pσ ] + H [pτ ] − H [p] (5.48)

between the sum of the marginal entropies .H [pσ ] + H [pτ ] and the joint entropy .H [p] is
the mutual information .I (σ, τ ).

When two dynamical processes become correlated, information is lost, and this
information loss is given by the mutual information. Note that .I (σ, τ ) = I [p] is
a functional of the joint probability distribution p only, the marginal distribution
functions .pσ and .pτ being themselves functionals of p.

Positiveness In the following we refer to the general case of two stochastic


processes described by the joint distribution .p(x, y) and the respective marginal densities .pX (x) = ∫ p(x, y) dy and .pY (y) = ∫ p(x, y) dx. The mutual information

.I (X, Y ) = 〈log(p)〉 − 〈log(pX )〉 − 〈log(pY )〉    (5.49)

          = ∫ p(x, y) [ log(p(x, y)) − log(pX (x)) − log(pY (y)) ] dxdy

is strictly positive, .I (X, Y ) ≥ 0, as we will show now. Rewriting the mutual


information as

.I (X, Y ) = ∫ p(x, y) log[ p(x, y) / (pX (x) pY (y)) ] dxdy

          = − ∫ p log( pX pY / p ) dxdy ,    (5.50)

the positiveness of .I (X, Y ) follows from the concaveness of the logarithm,

. log(p1 x1 + p2 x2 ) ≥ p1 log(x1 ) + p2 log(x2 ), ∀x1 , x2 ∈ [0, ∞] , (5.51)

as illustrated in Fig. 5.4. The inequality (5.51) holds for .p1 , p2 ∈ [0, 1], with .p1 + p2 = 1, which expresses that any chord of a concave function lies below the graph.
We can regard .p1 and .p2 as the coefficients of a distribution function and generalize,

p1 δ(x − x1 ) + p2 δ(x − x2 )
. −→ p(x) ,

where .p(x) is now a generic, properly normalized probability density. The concave-
ness condition (5.51) then becomes
 
. log( ∫ p(x) x dx ) ≥ ∫ p(x) log(x) dx ,    Φ(〈x〉) ≥ 〈Φ(x)〉 .    (5.52)

This is the Jensen inequality, which holds for any concave function .Φ(x). It remains
valid when substituting .x → pX pY /p for the argument of the logarithm.14 For the
mutual information (5.50) we then obtain
 
.I (X, Y ) = − ∫ p log( pX pY / p ) dxdy ≥ − log( ∫ p (pX pY / p) dxdy )

           = − log( ∫ pX (x) dx ∫ pY (y) dy ) = − log(1) = 0 ,

viz that .I (X, Y ) is non-negative. Information can only be lost, and not gained, when
correlating two previously independent processes.

Conditional Entropy There are various ways to rewrite the mutual information,
using Bayes theorem .p(x, y) = p(x|y)pY (y) between the joint density .p(x, y), the
conditional probability distribution .p(x|y) and the marginal .pY (y), e.g.
  
.I (X, Y ) = 〈 log( p / (pX pY ) ) 〉 = ∫ p(x, y) log( p(x|y) / pX (x) ) dxdy

           ≡ H (X) − H (X|Y ) ,    (5.53)

where we used the notation .H (X) = H [pX ] for the marginal entropy, together with
the “conditional entropy”

.H (X|Y ) = − ∫ p(x, y) log(p(x|y)) dxdy .    (5.54)

The conditional entropy is positive for discrete processes, since

. − p(xi , yj ) log(p(xi |yj )) = −p(xi |yj )pY (yj ) log(p(xi |yj ))

is positive, given that .−p log(p) ≥ 0 holds in the interval .p ∈ [0, 1]. Compare (5.33)
for changing from continuous to discrete variables. Several variants for the condi-
tional entropy can be used to define statistical complexity measures, as discussed in
Sect. 5.3.1.

14 For a proof consider the generic substitution .x → q(x) and a transformation of variables .x → q
via .dx = dq/q ' , with .q ' = dq(x)/dx, for the integration in (5.52).

Causal Dependencies For independent processes one has .p(x, y) = p(x)p(y) =


p(x|y)p(y) and hence

p(x|y) = p(x),
. H (X|Y ) → H (X) .

The opposite extreme is realized when the first channel is just a function of the
second channel, viz when

xi = f (yi ),
. p(xi |yi ) = δxi ,f (yi ) , p(xi , yi ) = δxi ,f (yi ) p(yi ) .

The conditional entropy (5.54) then vanishes,


  
.H (X|Y ) = − Σ_{xi ,yj} δxi ,f (yj ) pY (yj ) log( δxi ,f (yj ) ) = 0 ,

since .δxi ,f (yj ) is either unity, in which case .log(δ) = log(1) = 0, or zero, in
which case .0 log(0) vanishes as a limiting process. The conditional entropy .H (X|Y )
measures hence the amount of information present in the stochastic process X which
is not causally related to the process Y . As a corollary, the mutual information reduces to the marginal entropy,

.I (X, Y ) → H (X) ,

for the case that X is fully determined by Y , compare (5.53).
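The two limiting cases can be checked numerically. The sketch below (Python; the distribution over Y and the function f are arbitrary illustrative choices) evaluates the conditional entropy (5.54) for a discrete pair of channels:

```python
from math import log2

def cond_entropy(p_joint):
    """H(X|Y) = -sum p(x,y) log2 p(x|y), for a dict {(x, y): probability}."""
    p_y = {}
    for (x, y), p in p_joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * log2(p / p_y[y]) for (x, y), p in p_joint.items() if p > 0)

# X fully determined by Y: x = f(y), with the illustrative choice f(0)=1, f(1)=0
p_y = {0: 0.3, 1: 0.7}
p_det = {(1, 0): 0.3, (0, 1): 0.7}        # p(x,y) = delta_{x,f(y)} p(y)
# statistically independent processes: p(x,y) = p(x) p(y), with a flat p(x)
p_ind = {(x, y): 0.5 * p_y[y] for x in (0, 1) for y in (0, 1)}

h_det = cond_entropy(p_det)    # vanishes: X carries no information beyond Y
h_ind = cond_entropy(p_ind)    # equals the marginal entropy H(X) = 1 bit
```

The deterministic case yields .H (X|Y ) = 0, the independent case .H (X|Y ) = H (X).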

5.2.4 Kullback-Leibler Divergence

One is often interested in comparing two distribution functions .p(x) and .q(x)
with respect to their similarity. When trying to construct a measure for the degree
of similarity one is facing the dilemma that probability distributions are positive
definite and one can hence not define a scalar product as for vectors; two probability
densities cannot be orthogonal. It is nevertheless possible to define a positive definite
measure.

KULLBACK-LEIBLER DIVERGENCE Given two probability distribution functions .p(x)


and .q(x), the functional

.K[p; q] = ∫ p(x) log( p(x)/q(x) ) dx ≥ 0    (5.55)

is a non-symmetric measure for the difference between .p(x) and .q(x).

The Kullback-Leibler divergence .K[p; q] is also denoted “relative entropy.” The


proof for .K[p; q] ≥ 0 is analogous to the one for the mutual information given

in Sect. 5.2.3. The Kullback-Leibler divergence vanishes for identical probability


distributions, viz when .p(x) ≡ q(x).

Relation to the .χ 2 Test We consider the case that the two distribution functions p
and q are nearly identical,

q(x) = p(x) + δp(x),


. δp(x) ⪡ 1 ,

and expand .K[p; q] in powers of .δp(x), using

. log(q) = log(p + δp) ≈ log(p) + δp/p − (1/2) (δp/p)^2 + . . . ,

obtaining

.K[p; q] ≈ ∫ dx p [ log(p) − log(p) − δp/p + (1/2) (δp/p)^2 ]

         = (1/2) ∫ dx (δp)^2 /p = (1/2) ∫ dx (p − q)^2 /p ,    (5.56)

since .∫ δp dx = 0, as a consequence of the normalization conditions .∫ p dx = 1 = ∫ q dx. This measure for the similarity of two distribution functions is, up to the factor 1/2, the .χ^2 test. It is actually symmetric under exchanging .q ↔ p, up to order .(δp)^2 .
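The quadratic approximation can be verified numerically. In the sketch below (Python; the distributions and the perturbation size are arbitrary illustrative choices) the Kullback-Leibler divergence approaches .χ^2 /2 for nearly identical distributions, the form stated in exercise (5.9), and becomes symmetric to this order:

```python
from math import log

def kl(p, q):
    """Discrete Kullback-Leibler divergence sum_i p_i log(p_i / q_i), in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi2(p, q):
    """Chi-squared statistic sum_i (p_i - q_i)^2 / p_i."""
    return sum((pi - qi)**2 / pi for pi, qi in zip(p, q))

p = [0.3, 0.5, 0.2]
eps = 1e-3
q = [0.3 + eps, 0.5 - 2 * eps, 0.2 + eps]   # normalized perturbation of p
```

The deviations between kl(p, q), kl(q, p) and chi2(p, q)/2 are of third order in the perturbation.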

Example As a simple example we take two distributions .p(σ ) and .q(σ ) for a
binary variable .σ = 0/1,

p(0) = 1/2 = p(1),


. q(0) = α, q(1) = 1 − α , (5.57)

with .p(σ ) being flat and .α ∈ [0, 1]. The Kullback-Leibler divergence,

.K[p; q] = Σ_{σ =0,1} p(σ ) log( p(σ )/q(σ ) ) = −(1/2) log(2α) − (1/2) log(2(1 − α))

         = −(1/2) log( 4α(1 − α) ) ≥ 0 ,
is unbounded, since .limα→0,1 K[p; q] → ∞. Interchanging .p ↔ q yields

.K[q; p] = α log(2α) + (1 − α) log(2(1 − α))

         = log(2) + α log(α) + (1 − α) log(1 − α) ≥ 0 ,



Fig. 5.6 For two distributions p and q parametrized by .α, see (5.57), the respective Kullback-Leibler divergences .K[p; q] (dashed line) and .K[q; p] (full line), as functions of .α. Note the maximal asymmetry for .α → 0, 1, where .limα→0,1 K[p; q] = ∞
which remains finite in the limits .α → 0, 1. The Kullback-Leibler divergence is hence highly asymmetric, compare Fig. 5.6.
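The two closed-form expressions derived above can be coded directly (Python; logarithms base 2, so divergences are in bits):

```python
from math import log2

def K_pq(alpha):
    """K[p;q] = -(1/2) log2(4 alpha (1 - alpha)), in bits, for 0 < alpha < 1."""
    return -0.5 * log2(4 * alpha * (1 - alpha))

def K_qp(alpha):
    """K[q;p] = 1 + alpha log2(alpha) + (1 - alpha) log2(1 - alpha), in bits."""
    def h(a):
        return a * log2(a) if a > 0 else 0.0
    return 1.0 + h(alpha) + h(1 - alpha)
```

Both vanish at .α = 1/2, where .p ≡ q, but for .α close to 0 or 1 K_pq grows without bound while K_qp stays bounded by 1 bit, reproducing Fig. 5.6.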

Kullback-Leibler Divergence vs. Mutual Information The mutual information,


as defined by (5.50), is a special case of the Kullback-Leibler divergence. We first write (5.55) for the case that p and q depend on two variables x and y,

.K[p; q] = ∫ p(x, y) log( p(x, y)/q(x, y) ) dxdy .    (5.58)

This expression is identical to the mutual information (5.50) for the case that .q(x, y)
is the product of the two marginal distributions of .p(x, y),
 
.q(x, y) = p(x)p(y),    p(x) = ∫ p(x, y) dy,    p(y) = ∫ p(x, y) dx .

Two independent processes are described by the product of their probability


distributions. The mutual information hence measures the distance between a joint
distribution .p(x, y) and the product of its marginals, viz the distance between
correlated and independent processes.
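This identity can be verified for the XOR chain of Sect. 5.2.3 (a Python sketch; natural logarithms, so the result is in nats, and the noise value is an arbitrary illustrative choice):

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i), in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

xi = 0.2
p_joint = {(0, 0): (1 - xi)**2, (1, 1): (1 - xi) * xi,
           (1, 0): xi**2,       (0, 1): xi * (1 - xi)}           # Eq. (5.44)
p_s = {s: sum(v for (a, b), v in p_joint.items() if a == s) for s in (0, 1)}
p_t = {t: sum(v for (a, b), v in p_joint.items() if b == t) for t in (0, 1)}
product = {k: p_s[k[0]] * p_t[k[1]] for k in p_joint}    # independent reference

states = sorted(p_joint)
mi = kl([p_joint[k] for k in states], [product[k] for k in states])
```

The result is the mutual information of the two channels, positive for .0 < ξ < 1/2.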

Fisher Information The Fisher information .F (θ ) measures the sensitivity of a


distribution function .p(y, θ ) with respect to a given parametric dependence .θ ,
.F (θ ) = ∫ ( ∂ ln(p(y, θ ))/∂θ )^2 p(y, θ ) dy .    (5.59)

In typical applications, the parameter .θ is a hidden observable that one may be interested in estimating.

Kullback-Leibler Divergence vs. Fisher Information The infinitesimal


Kullback-Leibler divergence between .p(y, θ ) and .p(y, θ + δθ ) is
 
.K = ∫ dy p(y, θ ) log( p(y, θ ) / p(y, θ + δθ ) ) ≈ − ∫ dy p log( (p + p' δθ )/p )

   = −δθ ∫ dy ∂p(y, θ )/∂θ + ((δθ )^2 /2) ∫ dy (1/p(y, θ )) ( ∂p(y, θ )/∂θ )^2    (5.60)

with .p = p(y, θ ) and .p' = ∂p(y, θ )/∂θ . The first term in (5.60),

.(−δθ ) (∂/∂θ ) ∫ dy p(y, θ ) = (−δθ ) (∂/∂θ ) 1 ≡ 0,
∂θ ∂θ

vanishes. The second term in (5.60) contains the Fisher information (5.59). Hence

.K[ p(y, θ ); p(y, θ + δθ ) ] = F (θ ) (δθ )^2 /2 ,    (5.61)
2
which establishes the role of the Fisher information as a metric.
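Relation (5.61) can be checked numerically. The sketch below (Python, plain midpoint integration; the grid parameters are arbitrary choices) uses a Gaussian with mean .θ and unit variance, for which the Fisher information of the location parameter is .F (θ ) = 1/σ^2:

```python
from math import exp, log, pi, sqrt

def gauss(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-(y - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def kl_numeric(mu, dmu, sigma, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule estimate of K[p(., mu); p(., mu + dmu)]."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        p, q = gauss(y, mu, sigma), gauss(y, mu + dmu, sigma)
        total += p * log(p / q) * dy
    return total

sigma, dtheta = 1.0, 0.05
F = 1.0 / sigma**2                    # Fisher information, Gaussian location family
K = kl_numeric(0.0, dtheta, sigma)    # should approach F * dtheta**2 / 2, Eq. (5.61)
```

For a Gaussian location family the relation even holds exactly, not only to second order in .δθ.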

5.3 Complexity Measures

Can we provide a single measure, or a small number of measures, suitable for


characterizing the ‘degree of complexity’ of any dynamical system at hand?
This rather philosophical question has fascinated researchers for decades, yet no
definitive answer is known.
The quest for complexity measures touches a range of interesting topics in dynamical systems theory and has led to a number of powerful tools for studying dynamical systems. The original goal of developing a one-size-fits-all measure for complexity, however, no longer seems a scientifically valid target. Complex dynamical systems can show an extended variety of qualitatively different behaviors, one of the reasons why complex system theory is fascinating, and it is not appropriate to shove all complex systems into a single basket for the purpose of measuring their degree of complexity with a single yardstick.

Intuitive Complexity The task of developing a mathematically well defined


measure for complexity is handicapped by the lack of a precisely defined goal. In
the following we discuss selected prerequisites and constraints one may postulate for
valid complexity measures. In the end it is, however, up to our intuition to decipher
whether these requirements are appropriate or not.
An example of a process to which one may intuitively attribute a high degree of complexity is given by the intricate spatio-temporal patterns generated by the forest fire model, with

Fig. 5.7 The degree of complexity (full line) should be minimal both in the fully ordered and in the fully disordered regime. For some applications it may however be meaningful to consider complexity measures that are maximal for random states (dashed line)

perpetually changing fronts of fires burning through a continuously regrowing


forest.15

Complexity vs. Randomness A popular proposal for a complexity measure is the


information entropy .H [p], as defined by (5.28). It vanishes when the system is
regular, which agrees with our intuitive presumption that complexity is low when
nothing happens. The entropy is however maximal for random dynamics.
It is a question of viewpoints to which extent one should consider random systems as complex, compare Fig. 5.7. For some considerations, e.g. when dealing with 'algorithmic complexity', which will be treated in Sect. 5.3.2, it makes sense to attribute maximal complexity degrees to fully random sets of objects. In general,
however, complexity measures should be concave, attaining minimal values for
regular behavior as well as for purely random sequences.

Complexity of Multi-component Systems Complexity should be a positive


quantity, like entropy. But what about being extensive or intensive? This is a non-
trivial question.
Intuitively one may demand complexity to be intensive, as one would not expect to gain complexity when lumping together N independent dynamical systems. On the other side we cannot rule out that a set of strongly interacting dynamical systems could show more and more complex behavior with an increasing number of subsystems, along the lines of the saying 'quantity has its own quality'. This is purposely a feature of massive machine learning architectures, or of human brains.
There is no simple way out of this quandary when searching for a single one-
size-fits-all complexity measure. Both intensive and extensive complexity measures
have their areas of validity.

15 States of the forest fire model are presented in Fig. 6.7, see Chap. 6.

Complexity and Behavior The search for complexity measures is not just an
abstract academic quest. As an example consider how bored we are when our envi-
ronment is repetitive, having low complexity, and how stressed when complexity
overwhelms our sensory organs. There are indeed indications that a valid behavioral
strategy for highly developed cognitive systems may consist in optimizing the
degree of complexity. Well defined complexity measures are necessary in order to
quantify this intuitive statement mathematically.

5.3.1 Complexity and Predictability

Interesting complexity measures can be constructed using statistical tools, general-


izing concepts like information entropy and mutual information. We consider here
time series generated from a finite set of symbols. One may, however, interchange
time with space whenever one is concerned with studying the complexity of spatial
structures.

Stationary Dynamical Processes As a prerequisite for the analysis of complexity


we need stationary dynamical processes, viz dynamical processes which do not
change their behavior and their statistical properties qualitatively over time. In
practice, this implies that the time series considered has a finite time horizon .τ .
The system might have several time scales .τi ≤ τ , but for large times .t ⪢ τ all
correlation functions need to fall off exponentially. This is the case for ‘normal’
systems, but not for critical dynamical systems characterized by dynamical and
statistical correlations that do not decay exponentially, but as power laws.16

Measuring Joint Probabilities For times .t0 , t1 , . . ., a set of symbols X, and a


time series containing n elements,

.xn , xn−1 , . . . , x2 , x1 ,    xi = x(ti ), xi ∈ X ,    (5.62)

we may define the joint probability distribution

.pn : p(xn , . . . , x1 ) . (5.63)

The joint probability .p(xn , . . . , x1 ) is not given a priori, it needs to be measured


from an ensemble of time series. This is a demanding task, as .p(xn , . . . , x1 ) has .(Ns )^n components, where .Ns is the number of symbols in X.

It makes no sense to evaluate joint probabilities .pn for time differences .tn ⪢ τ ,
as all joint distributions factorize when time lags become large. For finite values of
n large numbers of subsets of length n can be cut out of a complete time series,

16 An analogous discussion for the autocorrelation function of critical vs. non-critical system is
presented in Sect. 6.2. of Chap. 6.

providing the basis for reliable statistical estimates. This is an admissible procedure
for stationary dynamical processes.

Entropy Density We recall the definition of the Shannon entropy



.H [pn ] = − Σ_{xn ,...,x1 ∈X} p(xn , . . . , x1 ) log(p(xn , . . . , x1 )) = −〈 log(pn ) 〉pn ,    (5.64)

which needs to be measured for an ensemble of time series of length n or greater.


Of interest is the entropy density in the limit of large times,

.h∞ = lim_{n→∞} H [pn ]/n ,    (5.65)

which exists for stationary dynamical processes with finite time horizons. The
entropy density is the mean number of bits per time step needed for encoding the
time series statistically.

Excess Entropy We define the “excess entropy” E as


 
. E = lim H [pn ] − n h∞ ≥ 0 . (5.66)
n→∞

The excess entropy is equivalent to the non-extensive part of the entropy, being the
coefficient of the term .∝ n0 when expanding the entropy in powers of .1/n,

H [pn ] = n h∞ + E + O(1/n),
. n → ∞, (5.67)

compare Fig. 5.8. The excess entropy E is positive as long as .H [pn ] is concave as a
function of n, which is the case for stationary dynamical processes.17 For practical
purposes, the excess entropy can be approximated using finite differences,

.h∞ = lim_{n→∞} hn ,    hn = H [pn+1 ] − H [pn ] ,    (5.68)

since .h∞ corresponds to the asymptotic slope of .H [pn ], compare Fig. 5.8.

– One may use (5.68) together with (5.54) for expressing entropy density .hn in
terms of an appropriately generalized conditional entropy.

17 To prove that the excess entropy is positive is the task of exercise (5.10).

Fig. 5.8 For a time series of length n, the entropy .H [pn ] (full line) increases monotonically, with a limiting slope .h∞ (dashed line). For .n → ∞ the entropy scales as .H [pn ] ≈ E + h∞ n, with the excess entropy E given by the intercept of the asymptote with the y-axis

– Equation (5.67) allows to rewrite the excess entropy as

.E = lim_{n→∞} n ( H [pn ]/n − h∞ ) .

In this form the excess entropy is known as the "effective measure complexity" (EMC) or "Grassberger entropy".
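The finite-difference estimate (5.68) is straightforward to implement, using overlapping n-blocks to estimate .H [pn ] (a Python sketch; the series lengths and the random seed are arbitrary choices):

```python
import random
from collections import Counter
from math import log2

def H_blocks(series, n):
    """Block entropy H[p_n]: Shannon entropy (bits) of overlapping n-blocks."""
    blocks = [tuple(series[i:i + n]) for i in range(len(series) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * log2(c / total) for c in counts.values())

rng = random.Random(0)
random_series = [rng.randint(0, 1) for _ in range(20000)]
periodic_series = [0, 1] * 10000

h2_random = H_blocks(random_series, 3) - H_blocks(random_series, 2)        # ~1 bit
h2_periodic = H_blocks(periodic_series, 3) - H_blocks(periodic_series, 2)  # ~0
```

For a random binary series the entropy density is close to 1 bit per step, for the strictly predictable period-2 series it vanishes.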

Excess Entropy and Predictability The excess entropy vanishes both for a
random and for an ordered system. For a random system

H [pn ] = n H [pX ] ≡ n h∞ ,
.

where .pX is the marginal probability. The excess entropy (5.66) vanishes conse-
quently. For an example of ordered dynamics we can take a system generating only
two types of sequences, say

. . . . 000000000000000 . . . , . . . 111111111111111 . . . ,

respectively with probabilities .α and .1 − α. This kind of dynamics is the natural


output of logical AND or OR rules. The joint probability distribution has only two
non-zero components,

p(0, . . . , 0) = α,
. p(1, . . . , 1) = 1 − α, ∀n ,

all other .p(xn , . . . , x1 ) vanish, which leads to

.H [pn ] ≡ −α log(α) − (1 − α) log(1 − α),    ∀n .

The entropy density .h∞ vanishes for .α → 0, 1, viz in the deterministic limit, with
the excess entropy E becoming .H [pn ].

The excess entropy therefore fulfills the concaveness criteria illustrated in


Fig. 5.7, vanishing both in the absence of predictability (random states), and for
the case of strong predictability (i.e. for deterministic systems). The excess entropy
does however not vanish in the above example for .0 < α < 1, when two predictable states are superimposed statistically in an ensemble of time series. Whether this behavior is compatible with our intuitive notion of complexity is, to a certain extent, a matter of taste.
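The n-independence of .H [pn ] for this two-sequence ensemble can be verified by direct sampling (a Python sketch; the sample size and seed are arbitrary choices):

```python
import random
from collections import Counter
from math import log2

def ensemble_block_entropy(alpha, n, samples=20000, seed=1):
    """H[p_n] measured over an ensemble of sequences which are all zeros
    with probability alpha, and all ones otherwise."""
    rng = random.Random(seed)
    counts = Counter(tuple([0] * n) if rng.random() < alpha else tuple([1] * n)
                     for _ in range(samples))
    return -sum(c / samples * log2(c / samples) for c in counts.values())

alpha = 0.3
H2 = ensemble_block_entropy(alpha, 2)
H8 = ensemble_block_entropy(alpha, 8)
# H[p_n] is independent of n: the entropy density vanishes and E = H[p_n]
```

Since the block entropy does not grow with n, the asymptotic slope .h∞ vanishes and the excess entropy equals .H [pn ] itself.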

5.3.2 Algorithmic and Generative Complexity

So far we discussed descriptive approaches using statistical methods for the


construction of complexity measures. One may, on the other hand, be interested in
modelling the generative process. The question is then, which is the simplest model
able to explain the observed data?

Individual Objects For the statistical analysis of a time series we have been
concerned with ensembles of time series, as generated by the identical underlying
dynamical system, together with the limit of infinitely long times. In this section we
will be dealing with individual objects composed of a finite number of n symbols,
like

0000000000000000000000,
. 0010000011101001011001 .

The question is then: which dynamical model can generate the given string of
symbols? One is interested, in particular, in strings of bits and in computer codes
capable of reproducing them.

Turing Machine In theoretical informatics, the reference computer code is the set
of instructions needed for a “Turing machine” to carry out a given computation. The
exact definition of a Turing machine is not of relevance here, it is essentially a finite-
state machine working on a set of instructions called code. The Turing machine
plays a central role in the theory of computability, e.g. when one is interested in
examining how hard it is to find the solution to a given set of problems.

Algorithmic Complexity The notion of algorithmic complexity tries to find an


answer to the question of how hard it is to reproduce a given time series, in the
absence of prior knowledge.

ALGORITHMIC COMPLEXITY The algorithmic complexity of a string of bits is the length


of the shortest program that prints the given string of bits and then halts.

Algorithmic complexity is equivalent to "Kolmogorov complexity". Note that the involved computer or Turing machine is supposed to start with a blank memory, viz with no prior knowledge.

Algorithmic Complexity and Randomness Algorithmic complexity is a powerful


concept for theoretical considerations in the context of optimal computability. It
comes however with two drawbacks, being not computable and attributing maximal
complexity to random sequences.
Random number generators can only be approximated by finite state machines
like the Turing machine, which would need an infinite code length to produce
perfectly decorrelated symbols. This is the reason why real-world codes for
random number generators generate ‘pseudo random numbers’, with the degree
of randomness to be tested statistically. Algorithmic complexity conflicts therefore
with the common postulate for complexity measures to vanish for random states,
compare Fig. 5.7.

Deterministic Complexity There is a vast line of research trying to understand


the generative mechanism of complex behavior not algorithmically, but from the
perspective of dynamical systems theory, in particular for deterministic systems.
The question is then: in the absence of noise, which features are needed to produce intricate trajectories?
Of interest is in this context the sensitivity to initial conditions for systems having
a transition between chaotic and regular states in phase space,18 as well as the effects
of bifurcations and non-trivial attractors like strange attractors.19 Also of relevance
are the consequences of feedback and tendencies towards synchronization.20 This
line of research is embedded in the general quest to understand both the properties
and the generative causes of complex and adaptive dynamical systems.

Complexity and Emergence Intuitively, we attribute a high degree of complexity


to ever changing structure emerging from possibly simple underlying rules, an
example being the forest fires burning their way through the forest along self-
organized fire fronts. This link between complexity and ‘emergence’ is, however,
not easy to mathematize, as no precise measure for emergence has been proposed to
date.

Weak and Strong Emergence On a final note one needs to mention that a
vigorous distinction is being made in philosophy between the concept of ‘weak
emergence’, which we treated here, and the scientifically irrelevant notion of ‘strong
emergence’. Properties of a complex system generated via weak emergence result
from the underlying microscopic laws, whereas strong emergence leads to top-level
properties which are strictly novel, in the sense that they cannot, as if by magic, be linked causally to the underlying microscopic laws of nature.

18 Transitions between extended chaotic and regular phases occur in boolean networks, see Chap. 7.
19 For strange attractors and the like consult Chap. 3.
20 The Kuramoto model is the standard reference for globally synchronized states, as detailed in Chap. 9.

Exercises

(5.1) THE LAW OF LARGE NUMBERS


Generalize the derivation for the law of large numbers given in Sect. 5.1.1 for the case of i = 1, . . . , N independent discrete stochastic processes p_k^(i) , described by their respective generating functionals G_i (x) = Σ_k p_k^(i) x^k .
(5.2) CUMULATIVE DISTRIBUTION FUNCTIONS
Evaluate the distribution function of the sum of N = 2 flat distributions and
compare the result with the Gaussian (5.11) from the central limit theorem,
in analogy to Fig. 5.2.
(5.3) SYMBOLIZATION OF FINANCIAL DATA
Generalize the symbolization procedure defined for the joint probabilities
p±± defined by (5.20) to joint probabilities p±±± . For example, p+++
would measure the probability of three consecutive increases. Download
from the Internet the historical data for your favorite financial asset, like
the Dow Jones or the Nasdaq stock indices, and analyze it with this
symbolization procedure. Discuss, whether it would be possible, as a matter
of principle, to develop in this way a money-making scheme.
(5.4) THE OR TIME SERIES WITH NOISE
Consider the time series generated by a logical OR, akin to (5.21). Evaluate
the probability p(1) for finding a 1, with and without averaging over initial
conditions, both without noise and in presence of noise. Discuss the result.
(5.5) TRAILING AVERAGES
For a time series y(t), with continuous time t ∈ R, the trailing average is
given by the exponentially discounted mean of its past values,
ȳ(t) = (1/T ) ∫_0^∞ y(t − τ ) e^{−τ/T} dτ .

Show that ȳ(t) can be evaluated by the local updating rule d ȳ/dt = (y − ȳ)/T .
Is there also a local updating rule for the trailing cos-transform
yc (t, ω) = ((1 + (ωT )^2)/T ) ∫_0^∞ y(t − τ ) e^{−τ/T} cos(ωτ ) dτ ,

and, analogously, for the trailing sin-transform ys (t, ω)?


(5.6) MAXIMAL ENTROPY DISTRIBUTION FUNCTION
Determine the probability distribution function p(x), having a given mean
μ and a given variance σ 2 , compare (5.39), which maximizes the Shannon
entropy.

(5.7) TWO-CHANNEL MARKOV PROCESS


Consider, in analogy to (5.43) the two-channel Markov process {σt , τt },

σt+1 = AND(σt , τt ),    τt+1 = { OR(σt , τt )   with probability 1 − α
                                { ¬OR(σt , τt )  with probability α .

Evaluate the joint and marginal distribution functions, the respective


entropies and the resulting mutual information. Discuss the result as a
function of noise strength α.
(5.8) KULLBACK-LEIBLER DIVERGENCE
Try to approximate an exponential distribution function by a scale-invariant density, considering the Kullback-Leibler divergence K[p; q], see (5.55), between the two normalized distributions

p(x) = e^{−(x−1)} ,    q(x) = (γ − 1) x^{−γ} ,    x, γ > 1 .

Which exponent γ minimizes K[p; q]? How many times do the graphs for
p(x) and q(x) cross?
(5.9) CHI-SQUARED TEST
The quantity


χ^2 [p; q] = Σ_{i=1}^N (pi − qi )^2 / pi    (5.69)

measures the similarity of two normalized probability distribution functions


pi and qi . Show that the Kullback-Leibler divergence K[p; q], Eq. (5.55), reduces to χ^2 [p; q]/2 if the two distributions are quite similar.
(5.10) EXCESS ENTROPY
Use the representation
 
E = lim_{n→∞} En ,    En ≈ H [pn ] − n ( H [pn+1 ] − H [pn ] ) ,

to prove that E ≥ 0, compare (5.66) and (5.68), as long as H [pn ] is concave


as a function of n.
(5.11) TSALLIS ENTROPY
The “Tsallis Entropy”

Hq [p] = (1/(1 − q)) Σ_k ( (pk )^q − pk ) ,    0 < q ≤ 1    (5.70)

of a probability distribution function p is a non-extensive generalization of


the Shannon entropy H [p]. Prove that

. lim_{q→1} Hq [p] = H [p],    Hq [p] ≥ 0,

and the non-extensiveness

Hq [p] = Hq [pX ] + Hq [pY ] + (1 − q) Hq [pX ] Hq [pY ],    p = pX pY

for two statistically independent systems X and Y . For which distribution


function p is Hq [p] maximal?

Further Reading

For further readings we recommend introductions to information theory, Pierce


(2012), to Bayesian statistics, Bolstad and Curran (2016), and to algorithmic and
algebraic complexity, Zenil (2020) and Bürgisser et al. (2012). For the case that the
reader may be inclined to delve into the philosophical intricacies of strong and weak
emergence, we recommend Wilson (2016).
Of further interest are several review articles, on complexity and predictability,
Boffetta et al. (2002), as well as regarding holographic complexity, Alishahiha
(2015). A critical assessment of various complexity measures can be found in
Olbrich et al. (2008).

References
Alishahiha, M. (2015). Holographic complexity. Physical Review D, 92, 126009.
Boffetta, G., Cencini, M., Falcioni, M., & Vulpiani, A. (2002). Predictability: A way to characterize
complexity. Physics Reports, 356, 367–474.
Bolstad, W. M., & Curran, J. M. (2016). Introduction to Bayesian statistics. John Wiley & Sons.
Bürgisser, P., Clausen, M., & Shokrollahi, M. A. (2012). Algebraic complexity theory. Springer
Science & Business Media.
Olbrich, E., Bertschinger, N., Ay, N., & Jost, J. (2008). How should complexity scale with system
size? The European Physical Journal B, 63, 407–415.
Pierce, J. R. (2012). An introduction to information theory: Symbols, signals and noise. Courier
Corporation.
Wilson, J. (2016). Metaphysical emergence: Weak and strong. In Metaphysics in contemporary
physics (pp. 345–402). Brill.
Zenil, H. (2020). A review of methods for estimating algorithmic complexity: Options, challenges,
and new directions. Entropy, 22, 612.
6 Self-Organized Criticality

Classically, a phase transition occurs when the properties of a system change upon
tuning an external parameter, like the temperature. Is it possible that a complex
system regulates an internal parameter on its own, self-organized, such that it
approaches a critical point all by itself? This is the central question discussed in
this chapter.
Starting with an introduction to the Landau theory of phase transitions, particular
attention will be devoted to cellular automata, an important and popular class of
standardized dynamical systems. Cellular automata allow for an intuitive
construction of models, such as the forest fire model, the game of life, and the
sandpile model, which exhibits “self-organized criticality”. Mathematically, a further
understanding will be attained with the help of random branching theory. The chapter
concludes with a discussion of whether self-organized criticality occurs in the most
adaptive dynamical system of all, namely in the context of long-term evolution.

6.1 Landau Theory of Phase Transitions

One may describe the physics of thermodynamic phases either microscopically with
the tools of statistical physics, or by considering the general properties close to a
phase transition. The Landau theory of phase transitions does the latter, providing a
general framework valid irrespective of the microscopic details of the material.

Second-Order Phase Transitions  Phase transitions occur regularly in physical
systems when the number of components diverges, viz in “macroscopic” systems.
Phases have distinct properties; the key property is the “order parameter”, which
distinguishes one phase from another. Mathematically, one can classify the type of
ordering according to the symmetry that the ordering breaks.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 203
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_6


Fig. 6.1  Phase diagram of a magnet in an external magnetic field h. Left: The order parameter
M, here the magnetization, as a function of temperature across the phase transition. Illustrated
are typical arrangements of the local moments (arrows). In the ordered phase there is a net
magnetic moment, viz magnetization. For h = 0 / h > 0 the transition disorder-order is a sharp
transition/crossover. Right: The T−h phase diagram. A sharp transition occurs only for vanishing
external fields h

ORDER PARAMETER  In a continuous or “second-order” phase transition, the
high-temperature phase has a higher symmetry than the low-temperature phase. The
degree of symmetry breaking is characterized by the order parameter φ.

Note that all matter is disordered at high enough temperatures; in physical
systems, ordered phases occur at low to moderate temperatures.

Ferromagnetism in Iron  The classical example for a phase transition is that of a
magnet like iron. Above the Curie temperature of Tc = 1,043 K, the elementary
magnets are disordered, see Fig. 6.1 for an illustration. They fluctuate strongly and
point in random directions. The net magnetic moment vanishes. Below the Curie
temperature the moments point on average in a certain direction, creating a
macroscopic magnetic field. Since magnetic fields are generated by circulating
currents, with current directions depending on time, the magnetic state of
ferromagnets like iron breaks “time-reversal symmetry”. Further examples of order
parameters characterizing phase transitions in physical systems are listed in Table 6.1.

Table 6.1  Examples of common types of phase transitions in physical systems. When the
transition is continuous/discontinuous one speaks of a second-/first-order phase transition. Note
that order parameters can be non-intuitive. The superconducting state, notable for its ability to
carry electrical current without dissipation, breaks what one calls the U(1)-gauge invariance of
the normal (non-superconducting) metallic state

    Transition           Type                  Order parameter φ
    Superconductivity    Second-order          U(1)-gauge
    Magnetism            Mostly second-order   Magnetization
    Ferroelectricity     Mostly second-order   Polarization
    Bose–Einstein        Second-order          Amplitude of k = 0 state
    Liquid–gas           First-order           Density


Fig. 6.2  Landau-Ginzburg theory for t > 1 (dotted green line) and t < 1 (dashed blue line),
corresponding respectively to the disordered and ordered phase, where a = (t − 1)/2. Left: The
functional dependence of f(T, φ, h) − f0(T, h) = −h φ + a φ² + b φ⁴, including the case h = 0
(full red line). Right: For a non-vanishing field h ≠ 0, the graphical solution of (6.5), where φ0 is
the order parameter in the disordered phase, φ1 and φ3 the stable solutions in the ordered state, and
φ2 the unstable solution for t < 1

Free Energy  A statistical mechanical system takes the configuration with the
lowest energy at zero temperature. At finite temperatures T > 0, it is however
not the energy E which is minimized, but the free energy F, which differs from
the energy by a term proportional to the entropy and to the temperature. This
difference is however not directly relevant for what follows.

Close to the transition temperature Tc, the order parameter φ is small and one
assumes within the Landau-Ginzburg model that the free energy density,

    f = f(T, φ, h) ,    f = F/V ,

can be expanded for a small order parameter φ and a small external field h,

    f(T, φ, h) = f0(T, h) − h φ + a φ² + b φ⁴ + . . .               (6.1)

where the parameters a = a(T) and b = b(T) are functions of the temperature
T. The external field h, e.g. a magnetic field for the case of magnetic systems, is
presumed to couple linearly to the order parameter φ. For (6.1) to be stable for large
φ one needs b > 0, compare Fig. 6.2.

Spontaneous Symmetry Breaking  Odd terms ∼ φ^{2n+1} vanish in the expansion
(6.1) for f(T, φ, h), which is valid for temperatures close to Tc. Odd
contributions vanish when the disordered high-temperature state is invariant under
inversion symmetry,

    f(T, φ, h) = f(T, −φ, −h) ,    φ ↔ −φ ,    h ↔ −h ,

which is generically the case. Inversion symmetry is broken when the temperature
is lowered below .Tc , viz when the order parameter .φ acquires a finite expectation
value. This is the phenomenon of “spontaneous symmetry breaking”, which is
inherent to second-order phase transitions.1

Variational Approach  The Landau-Ginzburg functional (6.1) expresses the free
energy for all potentially possible values of φ. The true physical state, namely the
“thermodynamically stable state”, is obtained by finding the value of φ minimizing
f(T, φ, h),

    0 = δf = ( −h + 2aφ + 4bφ³ ) δφ ,
    0 = −h + 2aφ + 4bφ³ ,                                          (6.2)

where δf and δφ denote small variations respectively of the free energy and of the
order parameter. The stationary condition (6.2) corresponds to a minimum of the
free energy if

    δ²f > 0 ,    δ²f = ( 2a + 12bφ² ) (δφ)² .                      (6.3)

In this case the solution is “locally stable”, since any change in .φ from its optimal
value would increase the free energy.

Solutions for h = 0  When there is no external field, for h = 0, the solution of (6.2)
is

    φ = 0                 for a > 0 ,
    φ = ±√( −a/(2b) )     for a < 0 .                              (6.4)

The trivial solution φ = 0 is stable,

    δ²f |_{φ=0} = 2a (δφ)² ,

if a > 0. The nontrivial solutions are given by φ² = −a/(2b); they are stable for
a < 0,

    δ²f |_{φ≠0} = −4a (δφ)² .

¹ Spontaneous symmetry breaking is not present for first-order transitions, at which properties
change discontinuously, see Table 6.1.

Graphically this is immediately evident,² see Fig. 6.2. For a > 0 there is a single
global minimum at φ = 0, whereas one has two symmetric minima when a < 0.

Continuous Phase Transition  Given (6.4), one has that the Ginzburg-Landau
functional (6.1) describes a continuous phase transition when a = a(T) changes
sign at the critical temperature Tc. We may expand a(T) for small T − Tc,

    a(T) ∼ T − Tc ,    a = a0 (t − 1) ,    t = T/Tc ,    a0 > 0 ,

where we used a(Tc) = 0. For T < Tc, in the ordered phase, the solution takes the
form

    φ = ±√( a0 (1 − t)/(2b) ) ,    t < 1 ,    T < Tc .

The square-root increase of the order parameter is a telltale characteristic of a
second-order phase transition on a mean-field level.

Simplification by Rescaling  One can always rescale the order parameter φ, the
external field h and the free energy density f, such that a0 = 1/2 and b = 1/4. This
leads to

    a = (t − 1)/2 ,    f(T, φ, h) − f0(T, h) = −h φ + (t − 1)/2 φ² + (1/4) φ⁴

and

    φ = ±√(1 − t) ,    t = T/Tc

for the non-trivial solution.
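As a quick numerical cross-check of the rescaled solution, one can minimize the rescaled free energy on a grid; below a minimal NumPy sketch (the helper name `landau_f` is ours, not from the text):

```python
import numpy as np

def landau_f(phi, t, h=0.0):
    """Rescaled Landau free energy density f - f0, with a0 = 1/2, b = 1/4."""
    return -h * phi + 0.5 * (t - 1.0) * phi**2 + 0.25 * phi**4

t = 0.5                                   # below Tc, ordered phase
phi = np.linspace(-2.0, 2.0, 400001)
phi_min = phi[np.argmin(landau_f(phi, t))]
print(abs(phi_min))                       # ≈ sqrt(1 - t) ≈ 0.7071
```

The grid minimum agrees with φ = ±√(1 − t) up to the grid spacing.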

Solutions for h ≠ 0  The solutions of (6.2) are determined in rescaled form by

    h = (t − 1) φ + φ³ ≡ P(φ) ,                                    (6.5)

as illustrated in Fig. 6.2. In general one finds three solutions φ1 < φ2 < φ3, and one
can show that the intermediate solution is always locally unstable,³ with φ3 (φ1)
being globally stable for h > 0 (h < 0).

² From a dynamical systems point of view, the transition shown in Fig. 6.2 is equivalent to a
pitchfork bifurcation, as detailed in Sect. 2.2.2 of Chap. 2.
³ The stability of the three solutions is treated in exercise (6.1).
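The three solutions of (6.5) and their stability can also be found numerically; a sketch using `numpy.roots`, with local stability read off from the sign of P′(φ) = (t − 1) + 3φ² (the helper name is ours):

```python
import numpy as np

def solutions(t, h):
    """Real solutions of h = (t-1)*phi + phi^3, with local stability flags.

    A solution is locally stable iff P'(phi) = (t-1) + 3*phi^2 > 0,
    the rescaled form of the condition delta^2 f > 0.
    """
    roots = np.roots([1.0, 0.0, t - 1.0, -h])      # phi^3 + (t-1)*phi - h = 0
    real = np.sort(roots[np.abs(roots.imag) < 1e-8].real)
    stable = [(t - 1.0) + 3.0 * r**2 > 0 for r in real]
    return real, stable

phis, stable = solutions(t=0.5, h=0.1)
print(stable)  # [True, False, True]: the intermediate solution is unstable
```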

First-Order Phase Transition  We note, see Fig. 6.2, that the solution φ3 for h > 0
remains locally stable when we vary the external field slowly, viz adiabatically,

    (h > 0) → (h = 0) → (h < 0) ,

in the ordered state T < Tc. At a certain critical field, see Fig. 6.3, the order
parameter changes sign abruptly, jumping from the branch corresponding to φ3 > 0
to the branch φ1 < 0. One speaks of hysteresis, a phenomenon typical for first-order
phase transitions.

Susceptibility When the system approaches the phase transition from above in the
disordered state, it has an increased sensitivity towards ordering under the influence
of an external field h.

SUSCEPTIBILITY  The susceptibility χ of a system denotes its response to an external
field,

    χ = ( ∂φ/∂h )_T ,

where the subscript T indicates that the temperature is kept constant.

The susceptibility is a prime candidate for experimental investigations, as it
measures the increase of ordering, φ = φ(h), induced by an external field h.

Diverging Response  Taking the derivative with respect to the external field h
in (6.5), h = (t − 1) φ + φ³, we find

    1 = [ (t − 1) + 3φ² ] ∂φ/∂h ,    χ(T) |_{h→0} = 1/(t − 1) = Tc/(T − Tc)    (6.6)

for the disordered phase, since φ(h = 0) = 0 for T > Tc. The susceptibility diverges
at the phase transition for h = 0, see Fig. 6.3. This divergence is a typical precursor
of ordering for a second-order phase transition. Exactly at Tc, viz at criticality, the
response of the system is formally infinite.

A non-vanishing external field h ≠ 0 induces a finite amount of ordering φ ≠ 0
at all temperatures, which masks the phase transition, compare Fig. 6.1. In this case,
the susceptibility is a smooth function of temperature, see (6.6) and Fig. 6.3.
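Equation (6.6) can be cross-checked with a symmetric finite difference for ∂φ/∂h; a sketch (the globally stable branch is selected by comparing free energies; helper names are ours):

```python
import numpy as np

def phi_of_h(t, h):
    """Globally stable order parameter: real root of h = (t-1)*phi + phi^3
    with the lowest rescaled free energy."""
    roots = np.roots([1.0, 0.0, t - 1.0, -h])
    real = roots[np.abs(roots.imag) < 1e-8].real
    f = -h * real + 0.5 * (t - 1.0) * real**2 + 0.25 * real**4
    return real[np.argmin(f)]

t, dh = 1.5, 1e-6                    # disordered phase, t > 1
chi = (phi_of_h(t, dh) - phi_of_h(t, -dh)) / (2 * dh)
print(chi, 1.0 / (t - 1.0))          # both ≈ 2, in agreement with (6.6)
```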


Fig. 6.3  Left: Discontinuous phase transition and hysteresis in the Landau model. Plotted is the
solution φ = φ(h) of h = (t − 1)φ + φ³ in the ordered phase, viz for t < 1, when changing the
field h. Right: The susceptibility χ = ∂φ/∂h for h = 0 (solid line) and h > 0 (dotted line). The
susceptibility diverges in the absence of an external field, compare (6.6)

6.2 Criticality in Dynamical Systems

Length Scales  Physical or complex systems normally dispose of well-defined time
and space scales. As an example we take a look at the Schrödinger equation for the
hydrogen atom,

    iħ ∂Ψ(t, r)/∂t = H Ψ(t, r) ,    H = −ħ²Δ/(2m) − Ze²/|r| ,

where

    Δ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

is the Laplace operator. One does not need to know the physical significance of the
parameters to realize that one can rewrite the differential operator H, namely the
“Hamilton operator”, as

    H = −E_R ( a0² Δ + 2a0/|r| ) ,    E_R = mZ²e⁴/(2ħ²) ,    a0 = ħ²/(mZe²) .

The length scale a0 = 0.53 Å/Z is called the “Bohr radius”, with the energy scale
E_R = 13.6 eV being the “Rydberg energy”. The energy scale E_R determines both
the ground state energy and the excitation energies. The Bohr radius a0 sets the scale
for the mean radius of the ground state wavefunction and all other radius-dependent
properties.
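The quoted scales follow directly from the physical constants; a small sketch in SI units (CODATA values typed in by hand, so treat the numbers as approximate):

```python
import math

# CODATA 2018 values (SI units)
hbar = 1.054571817e-34    # J s
m_e  = 9.1093837015e-31   # kg
e    = 1.602176634e-19    # C
eps0 = 8.8541878128e-12   # F/m

k = 1.0 / (4.0 * math.pi * eps0)           # Coulomb constant
a0 = hbar**2 / (m_e * k * e**2)            # Bohr radius, Z = 1
E_R = m_e * k**2 * e**4 / (2.0 * hbar**2)  # Rydberg energy, Z = 1

print(a0 * 1e10)   # ≈ 0.529 Angstrom
print(E_R / e)     # ≈ 13.6 eV
```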

Similar length scales can be defined for essentially all dynamical systems defined
by a set of differential equations. The damped harmonic oscillator and the diffusion
equation, e.g., are given respectively by

    ẍ(t) + γ ẋ(t) + ω² x(t) = 0 ,    ∂ρ(t, r)/∂t = D Δρ(t, r) .    (6.7)

The parameters 1/γ and 1/ω determine respectively the time scales for relaxation
and oscillation, with D being the diffusion constant.

Correlation Function  A suitable quantity to measure and discuss the properties
of dynamical systems like the ones defined by (6.7) is the equal-time correlation
function S(r), which is the expectation value

    S(r) = 〈ρ(t0, x) ρ(t0, y)〉 ,    r = |x − y| .                  (6.8)

Here ρ(t0, x) denotes the particle density, for the case of the diffusion equation
or when considering a statistical mechanical system of interacting particles. The
exact expression for ρ(t0, x) in general depends on the type of dynamical system
considered; for the Schrödinger equation ρ(t, x) = Ψ*(t, x)Ψ(t, x), i.e. the
probability to find the particle at time t at the point x.

The equal-time correlation function measures the probability to find a particle


at position x when there is one at y. Often one is interested in the deviation of the
correlation from the average behaviour. In this case one considers 〈ρ(x)ρ(y)〉 −
〈ρ(x)〉〈ρ(y)〉 instead of (6.8).

Correlation Length  Of interest is the behavior of the equal-time correlation
function S(r) for large distances r → ∞. In general we have two possibilities,

    S(r) ∼ e^{−r/ξ}         (non-critical)
    S(r) ∼ 1/r^{d−2+η}      (critical),      for r → ∞ .           (6.9)

In normal, viz non-critical systems, correlations over arbitrarily large distances
cannot be built up, which implies that the correlation function decays exponentially,
over a scale determined by the correlation length ξ. The notation d − 2 + η > 0 for the
decay exponent of the critical system is a convention from statistical physics, where
d = 1, 2, 3, . . . is the dimensionality of the embedding system.

Scale-Invariance and Self-Similarity If a control parameter of a physical system,


often the temperature, is tuned such that it sits exactly at the point of a phase
transition, the system is said to be critical. At this point there are no characteristic
length scales.

SCALE INVARIANCE If a measurable quantity, like the correlation function, decays like a
power of the distance ∼ (1/r)δ , with a critical exponent δ, the system is said to be critical
or scale-invariant.

Fig. 6.4  Simulation of the 2D-Ising model on a square lattice, H = Σ_{〈i,j〉} σi σj, with nearest-
neighbor interactions 〈i, j〉. There are two magnetization orientations σi = ±1 (dark/light dots).
Shown is the ordered state T < Tc (left), the critical state T ≈ Tc (middle), and the disordered
state T > Tc (right). Note the occurrence of fluctuations at all length scales at criticality, reflecting
self-similarity

Power laws have no scale, they are self-similar,⁴ because

    S(r) = c0 (r0/r)^δ ≡ c1 (r1/r)^δ ,    c0 r0^δ = c1 r1^δ ,

holds for arbitrary distance scales r0 and r1.

Universality at the Critical Point The equal-time correlation function S(r) is


scale invariant at criticality, compare (6.9). This is a surprising statement, since we
have seen before that the differential equations determining dynamical systems have
well defined time and length scales. How then does the solution of a dynamical
system become effectively independent of the parameters entering its governing
equations?

Scale invariance implies that fluctuations occur over all length scales, albeit
with varying probabilities. This can be seen by observing snapshots of statistical
mechanical simulations of simple models. The case of the two-dimensional Ising
model is given in Fig. 6.4.
The scale invariance of the correlation function at criticality is a central result of
the theory of phase transitions within statistical physics. Properties of systems close
to a phase transition are not determined by the exact values of their parameters, but
by the structure of the governing equations and their symmetries. This circumstance
is denoted “universality”, it constitutes one of the reasons for classifying phase
transitions according to the symmetry of their order parameters, see Table 6.1.

4 Power laws in terms of scale-free degree distributions are a cornerstone of network theory, please
consult Sect. 1.6 of Chap. 1.

Autocorrelation Function  The equal-time correlation function S(r) measures
real-space correlations. The corresponding quantity in the time domain is the
autocorrelation function

    Γ(t) = ( 〈A(t + t0) A(t0)〉 − 〈A〉² ) / ( 〈A²〉 − 〈A〉² ) ,       (6.10)

which can be defined for any time-dependent measurable quantity A, such as A(t) =
ρ(t, r). Note that observations are defined relative to 〈A〉², viz relative to the mean
fluctuations. The denominator in (6.10) is a normalization convention; it leads to
Γ(0) ≡ 1.

In the non-critical state, that is in the diffusive regime, long-term memory is
absent. As a consequence, information about the initial state is lost exponentially,

    Γ(t) ∼ e^{−t/τ} ,    t → ∞ ,                                   (6.11)

where τ is the relaxation or autocorrelation time. It corresponds to the time scale of
the diffusion process.
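The exponential loss of memory is easy to illustrate with a toy process; the sketch below (our construction, not from the text) estimates Γ(t) for a discrete AR(1) process x_{i+1} = a x_i + noise, whose autocorrelation decays exactly as a^t = e^{−t/τ} with τ = −1/ln a:

```python
import numpy as np

rng = np.random.default_rng(0)
a, n = 0.9, 100000
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):               # AR(1): exponentially decaying memory
    x[i] = a * x[i - 1] + rng.normal()

x = x - x.mean()

def gamma(k):
    """Empirical autocorrelation at lag k, normalized so that gamma(0) = 1."""
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

print(gamma(1), gamma(5))  # ≈ 0.9 and ≈ 0.9**5 ≈ 0.59
```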

Dynamical Critical Exponent  The relaxation time entering (6.11) diverges at
criticality, as does the real-space correlation length ξ entering (6.9). One can then
relate the two power laws for τ and ξ via an appropriate exponent z, dubbed the
“dynamical critical exponent”,

    τ ∼ ξ^z ,    ξ = 1/|T − Tc|^ν ,    ξ → ∞ for T → Tc .

The autocorrelation time is divergent in the critical state T → Tc.

Self-Organized Criticality  We have seen that phase transitions can be characterized
by a set of exponents describing the respective power laws of various quantities,
like the correlation function or the autocorrelation function. The phase transition
occurs generally at a single point, viz T = Tc for a thermodynamical system. At
the phase transition the system becomes effectively independent of the details of its
governing equations, being determined by symmetries.

It may then come as a surprise that there should exist complex dynamical
systems that attain a critical state for a finite range of parameters. This possibility,
denoted “self-organized criticality”, is somewhat counter-intuitive. We can regard
the parameters entering the evolution equation as given externally. Self-organized
criticality then implies that the system adapts to changes in the external parameters,
e.g. to changes in the given time and length scales, in such a way that the stationary
state becomes effectively independent of those changes.

6.2.1 1/f Noise

So far we have discussed the occurrence of critical states in classical
thermodynamics and statistical physics. We now ask for experimental evidence that
criticality might play a central role in certain time-dependent phenomena.

1/f Noise  The power spectrum of the noise generated by many real-world
dynamical processes falls off inversely with frequency f. This 1/f noise has been
observed for various biological activities, like heart beat rhythms, for functioning
electrical devices, and for meteorological data series. The origin of “pink noise”,
another word for a 1/f power spectrum, could be self-organized phenomena. Within
this view, one describes pink noise as being generated by a continuum of weakly
coupled damped oscillators representing the environment.

Power Spectrum of a Single Damped Oscillator  A system with a single relaxation
time τ, see (6.7), and eigenfrequency ω0, has a Lorentzian power spectrum

    S(ω, τ) = Re ∫₀^∞ dt e^{iωt} e^{−iω0 t − t/τ} = Re [ −1/( i(ω − ω0) − 1/τ ) ]
            = τ/( 1 + τ²(ω − ω0)² ) .

For large frequencies ω ≫ 1/τ, the power spectrum S(ω, τ) falls off like 1/ω².
Being interested in the large-f behavior, we neglect ω0 in the following.
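The Lorentzian result can be verified by direct quadrature; a sketch with a trapezoidal rule and a cutoff at t = 50τ, where the integrand is negligible:

```python
import numpy as np

tau, w0, w = 2.0, 0.3, 1.5
t = np.linspace(0.0, 50.0 * tau, 400001)
integrand = np.exp(1j * w * t - 1j * w0 * t - t / tau)

# trapezoidal rule for Re \int_0^infty dt e^{i w t} e^{-i w0 t - t/tau}
dt = t[1] - t[0]
S_num = np.real(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dt)

S_formula = tau / (1.0 + tau**2 * (w - w0)**2)
print(S_num, S_formula)  # both ≈ 0.296
```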

Distribution of Oscillators  The combined power or frequency spectrum of a
continuum of oscillators is determined by the distribution D(τ) of relaxation times
τ. For a critical system relaxation occurs over all time scales, as discussed in
Sect. 6.2. We may hence assume a scale-invariant distribution

    D(τ) ≈ 1/τ^α                                                   (6.12)

for the relaxation times τ. This distribution of relaxation times leads to a frequency
spectrum

    S(ω) = ∫ dτ D(τ) τ/( 1 + (τω)² ) ∼ ∫ dτ τ^{1−α}/( 1 + (τω)² )
         = (1/ω^{2−α}) ∫ d(ωτ) (ωτ)^{1−α}/( 1 + (ωτ)² ) ∼ 1/ω^{2−α} .    (6.13)

For α = 1 we obtain 1/ω, the typical behavior of 1/f noise.

The question is then how assumption (6.12) can be justified. The widespread
appearance of 1/f noise can only happen when scale-invariant distributions of
relaxation times are ubiquitous, viz if they are self-organized. The 1/f noise
therefore constitutes an interesting motivation for the search for possible
mechanisms leading to self-organized criticality.
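The scaling (6.13) can be checked by superposing Lorentzians numerically; a sketch (integration range and quadrature are our choices):

```python
import numpy as np

def S(omega, alpha, taus):
    """Superposition of Lorentzians tau/(1+(tau*omega)^2), weights D(tau)=tau^-alpha."""
    y = taus ** (1.0 - alpha) / (1.0 + (taus * omega) ** 2)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(taus))  # trapezoidal rule

alpha = 1.0
taus = np.logspace(-4, 4, 200001)      # relaxation times over eight decades
w1, w2 = 0.1, 1.0                      # frequencies well inside the cutoff range
slope = np.log(S(w1, alpha, taus) / S(w2, alpha, taus)) / np.log(w2 / w1)
print(slope)  # ≈ 2 - alpha = 1: pink noise
```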

6.3 Cellular Automata

Cellular automata are finite-state lattice systems with discrete local update rules,

    zi → fi(zi, z_{i+δ}, . . .) ,    zi ∈ {0, 1, . . . , n} ,       (6.14)

where i + δ denote neighboring sites of site i. Each site or “cell” of the lattice
follows a prescribed rule evolving in discrete time steps. At each step the new value
for a cell depends only on the current state of itself and on the state of its neighbors.
Cellular automata differ from general dynamical networks in two aspects.⁵

– The update functions are all identical: fi() ≡ f(), viz they are translationally
  invariant.
– The number n of states per cell is usually larger than two, the boolean case.

Cellular automata can give rise to exceedingly complex behavior despite their
deceptively simple dynamical structure. We note that cellular automata are always
updated synchronously, never sequentially or randomly. The state of all cells is
renewed simultaneously.

Number of Update Rules  The number of possible update rules is exponentially
large. For a two-dimensional model, the square lattice, we consider cells taking one
of two possible states,

    zi = 0 (dead) ,    zi = 1 (alive) .

For a basic type of rules, the evolution of a given cell to the next time step depends
only on the current state of the cell and on the values of each of its eight nearest
neighbors. In this case there are

    2^9 = 512 configurations ,    2^512 ≈ 1.3 × 10^154 possible rules ,

since any one of the 512 configurations can be mapped independently to ‘live’ or
‘dead’. For comparison, note that the age of the universe is of the order of 3 × 10^17
seconds.

⁵ A particularly important class of dynamical networks is studied in Chap. 7.

Totalistic Update Rules  It clearly does not make sense to explore systematically
the consequences of arbitrary updating rules. One simplification is to consider a
mean-field approximation that results in a subset of rules termed “totalistic”. For
mean-field rules, the new state of a cell depends only on the total number of living
neighbors, besides its own state. An eight-cell neighborhood has

    9 possible total occupancy states of neighboring sites,
    2 · 9 = 18 configurations ,    2^18 = 262,144 totalistic rules .

This is a large number, but it is exponentially smaller than the number of all possible
update rules for the same neighborhood.
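The rule counts are quick to verify, since Python integers have arbitrary precision:

```python
n_conf = 2 ** 9                  # 512 neighborhood configurations
n_rules = 2 ** n_conf            # one live/dead choice per configuration
print(n_conf, float(n_rules))    # 512, ~1.34e154

n_tot = 2 ** (2 * 9)             # 18 totalistic configurations
print(n_tot)                     # 262144
```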

6.3.1 Conway’s Game of Life

The “game of life” takes its name because it attempts to simulate the reproductive
cycle of a species. It is formulated on a square lattice with an update rule involving
the eight-cell neighborhood. A new offspring needs exactly three parents in its
neighborhood. A living cell dies of loneliness if it has less than two live neighbors,
and of overcrowding if it has more than three live neighbors. A living cell feels
comfortable with two or three live neighbors; in this case it survives. The complete
set of updating rules is listed in Table 6.2.

Living Isolated Sets  The time evolution of an initial set of a cluster of living cells
can show varied types of behavior. Fixpoints of the updating rules, such as a square

    {(0, 0), (1, 0), (0, 1), (1, 1)}

of four neighboring live cells, survive unaltered. There are many configurations of
living cells which oscillate, such as three live cells in a row or column,

    {(−1, 0), (0, 0), (1, 0)} ,    {(0, −1), (0, 0), (0, 1)} .

Table 6.2  Updating rules for the game of life, with zi = 0, 1 corresponding to empty and living
cells. An “x” as an entry denotes what is going to happen for the respective number of living
neighbors

                            Number of living neighbors
    zi(t)    zi(t+1)    0    1    2    3    4 . . . 8
    0        1                         x
    0        0          x    x    x              x
    1        1                    x    x
    1        0          x    x                   x


Fig. 6.5  Time evolution of some living configurations for the game of life, see Table 6.2. (a) The
“block”; it quietly survives. (b) The “blinker”; it oscillates with period 2. (c) The “glider”; it shifts
by (−1, 1) after four time steps

It constitutes a fixpoint of f(f(·)), alternating between a vertical and a horizontal
bar. The configuration

    {(0, 0), (0, 1), (0, 2), (1, 2), (2, 1)}

is dubbed “glider”, since it returns to its initial shape after four time steps but is
displaced by (−1, 1), see Fig. 6.5. It constitutes a fixpoint of f(f(f(f(·)))) times
the translation by (−1, 1). The glider continues to propagate until it encounters a
cluster of other living cells.
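The updating rules of Table 6.2 translate into a few lines of NumPy; a sketch with periodic boundaries (which differ from an infinite plane, but leave the small patterns discussed above intact):

```python
import numpy as np

def life_step(grid):
    """One synchronous update of the game of life."""
    # number of living neighbors, via shifted copies of the lattice
    n = sum(np.roll(np.roll(grid, dx, axis=0), dy, axis=1)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0))
    # birth with exactly 3 living neighbors, survival with 2 or 3
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

g = np.zeros((5, 5), dtype=int)
g[2, 1:4] = 1                                      # the "blinker"
print(np.array_equal(life_step(life_step(g)), g))  # True: period 2
```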

Game of Life as a Universal Computer It is interesting to investigate, from an


engineering point of view, all possible interactions between initially distinct sets of
living cells in the game of life. In this context one finds that it is possible to employ
gliders for the propagation of information over arbitrary distances. One can prove
that arbitrary calculations can be performed by the game of life, when identifying
the gliders with bits. Suitable and complicated initial configurations are necessary
for this purpose, in addition to dedicated living subconfigurations performing logical
computations, in analogy to electronic gates, when hit by one or more gliders.

6.3.2 Forest Fire Model

The forest fire automaton is a simplified model of real-world forest fires. It is
formulated on a square lattice with three possible states per cell,

    zi = 0 (empty) ,    zi = 1 (tree) ,    zi = 2 (fire) .

Fig. 6.6  For the forest fire model, the time evolution (from left to right) of a configuration of
living trees (green), burning trees (red) and of places burnt down (grey). Places burnt down can
regrow spontaneously with a small rate; the fire always spreads to nearest neighboring trees

A tree sapling can grow on every empty cell, with probability p < 1. There is no
need for nearby parent trees, as seeds are carried by the wind over wide distances.
Trees do not die in this model, but they catch fire from any burning nearest-neighbor
tree. The rules are illustrated in Fig. 6.6.
The forest fire automaton differs from typical rules, such as Conway’s game of
life, because it has a stochastic component. In order to have an interesting dynamics
one needs to adjust the growth rate p as a function of system size, so as to keep the
fire burning continuously. The fires burn down the whole forest when trees grow too
fast. When the growth rate is too low, on the other hand, the fires, being surrounded
by ashes, may die out completely.

    zi(t)    zi(t+1)    Condition
    Empty    Tree       With probability p < 1
    Tree     Tree       No fire close by
    Tree     Fire       At least one fire close by
    Fire     Empty      Always

When adjusting the growth rate properly, one reaches a steady state, the system
having fire fronts continually sweeping through the forest, as it is observed for
real-world forest fires, see Fig. 6.7 for an illustration. In large systems stable spiral
structures form and set up a steady rotation.
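The rules above translate directly into code; a sketch with NumPy (synchronous update, periodic boundaries, no lightning; function names are ours):

```python
import numpy as np

EMPTY, TREE, FIRE = 0, 1, 2

def forest_step(grid, p, rng):
    """One synchronous update of the forest fire model."""
    burning_nb = np.zeros(grid.shape, dtype=bool)
    for axis in (0, 1):
        for shift in (1, -1):
            burning_nb |= np.roll(grid, shift, axis=axis) == FIRE
    new = grid.copy()
    new[grid == FIRE] = EMPTY                    # fire -> empty, always
    new[(grid == TREE) & burning_nb] = FIRE      # tree with burning neighbor
    sprout = (grid == EMPTY) & (rng.random(grid.shape) < p)
    new[sprout] = TREE                           # empty -> tree with prob. p
    return new

rng = np.random.default_rng(0)
g = np.full((11, 11), TREE)
g[5, 5] = FIRE                                   # ignite the center
g = forest_step(g, p=0.0, rng=rng)
print(np.sum(g == FIRE))  # 4: the fire spread to the four nearest neighbors
```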

Criticality and Lightning  As defined above, the forest fire model is not critical,
since the characteristic time scale 1/p for the regrowth of trees governs the
dynamics. This time scale translates into a characteristic length scale 1/p, which
can be observed in Fig. 6.7, via the propagation rule for the fire.

Self-organized criticality can, however, be induced in the forest fire model when
introducing an additional rule, namely that a tree might ignite spontaneously with a
small probability f , when struck by lightning, causing also small patches of forest

Fig. 6.7  Simulations of the forest fire model. Left: Fires burn in characteristic spirals for a growth
probability p = 0.005 and no lightning, f = 0. Reprinted from Clar et al. (1996) with permissions,
© 1996 IOP Publishing Ltd. Right: A snapshot of the forest fire model with a growth probability
p = 0.06 and a lightning probability f = 0.0001. Note the characteristic fire fronts with trees in
front and ashes behind

to burn. We will not discuss this mechanism in detail here, treating instead in the
next section the occurrence of self-organized criticality in the sandpile model on a
firm mathematical basis.

6.4 Sandpile Model and Self-Organized Criticality

Self-Organized Criticality  The notion of ‘life at the edge of chaos’ states⁶ that
certain dynamical and organizational aspects of living organisms may be critical.
Normal physical and dynamical systems show criticality however only for selected
parameters, e.g. T = Tc, see Sect. 6.1. For criticality to be biologically relevant, the
system must evolve into a critical state starting from a wide range of initial states,
the defining trait of “self-organized criticality”.

Sandpile Model  Per Bak and coworkers introduced a simple cellular automaton,
the BTW model, that mimics the properties of sandpiles. Every cell is characterized
by a force

    zi = z(i, j) = 0, 1, 2, . . . ,    i, j = 1, . . . , L ,

on a finite L × L lattice. There is no one-to-one correspondence of the sandpile
model to real-world sandpiles. Loosely speaking, one may identify the force zi with
the slope of real-world sandpiles. But this analogy is not rigorous, as the slope
⁶ The concept of life at the edge of chaos was first developed in the context of information-processing
networks, as discussed in Chap. 7.



of a real-world sandpile is a continuous variable. The slopes belonging to two


neighboring cells should therefore be similar, whereas the values of zi and zj on
two neighboring cells can differ by an arbitrary amount within the sandpile model.

The sand begins to topple when the slope gets too big,

$$z_j \to z_j - \Delta_{ij}\,, \qquad \text{if } z_i > K\,,$$

where K is the threshold slope and $\Delta_{ij}$ the toppling matrix,

$$\Delta_{ij} = \begin{cases} 4 & i = j \\ -1 & i, j \ \text{nearest neighbors} \\ 0 & \text{otherwise} \end{cases} \tag{6.15}$$

This update rule is valid for the four-cell neighborhood {(0, ±1), (±1, 0)}. The
threshold K is arbitrary, a shift in K simply shifts zi . It is customary to consider
K = 3. Any initial random configuration will eventually relax into a steady-state
with

$$z_i = 0, 1, 2, 3 \qquad \text{(stable state)}\,.$$

Phases with zi ≤ K are absorbing, the subject of Sect. 6.4.1.

Open Boundary Conditions Without boundaries, the update rule (6.15) is conserving, with the number of grains remaining constant.

CONSERVING QUANTITY If there is a quantity that is not changed by the update rule, it
is said to be conserving.

The sandpile model is locally conserving and hence said to be “abelian”. The total height $\sum_j z_j$ is constant under topplings, due to $\sum_j \Delta_{ij} = 0$. Globally, however, the model is not conserving, as one uses open boundaries, at which excess sand is lost, toppling down from the table. Sand is lost whenever a site at the boundary topples.

However, there is only a tenuous relation of automata models to real-world sandpiles. The conserving nature of the sandpile model mimics the fact that sand grains cannot be lost in real-world sandpiles, but this interpretation contrasts with the previously assumed correspondence of z_i with the slope of real-world sandpiles.

Avalanches When starting from a random initial state with z_i ≤ K, the system settles into a steady dynamical state when “grains of sand” are added for a while. When a grain is dropped onto a site with z_i = K,

$$z_i \to z_i + 1\,, \qquad z_i = K\,,$$

[Figure 6.8: four 5 × 5 grids of sand heights, Step 1 to Step 4, with the cells participating in the avalanche shaded.]

Fig. 6.8 The progress of an avalanche, with duration t = 3 and size s = 13, for a sandpile configuration on a 5 × 5 lattice with K = 3. The height of the sand in each cell is indicated by the numbers. Also shown is the avalanche progression (shaded region). The avalanche stops after step 3

a toppling event is induced, which may in turn lead to a whole series of topplings.
The resulting avalanche is characterized by its duration t and the size s of affected
sites. It continues until a new stable configuration is reached. In Fig. 6.8 a small
avalanche is shown.
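The toppling dynamics is straightforward to simulate. The following sketch (the lattice size, the number of added grains, and the convention of measuring the avalanche size by the number of distinct toppled sites are choices made here for illustration, not taken from the text) drives an open-boundary L × L lattice grain by grain and records avalanche sizes:

```python
import random

def topple(z, L, K=3):
    """Relax all supercritical sites (z > K); return the avalanche
    size (number of distinct toppled sites) and its duration."""
    toppled = set()
    duration = 0
    active = [(i, j) for i in range(L) for j in range(L) if z[i][j] > K]
    while active:
        duration += 1
        nxt = []
        for i, j in active:
            if z[i][j] <= K:          # site may already have relaxed
                continue
            z[i][j] -= 4              # diagonal element of the toppling matrix
            toppled.add((i, j))
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < L and 0 <= nj < L:  # open boundaries: sand is lost
                    z[ni][nj] += 1
                    if z[ni][nj] > K:
                        nxt.append((ni, nj))
            if z[i][j] > K:           # site may need to topple again
                nxt.append((i, j))
        active = nxt
    return len(toppled), duration

def drive(L=20, grains=5000, K=3, seed=1):
    """Add single grains at random sites and record avalanche sizes."""
    rng = random.Random(seed)
    z = [[0] * L for _ in range(L)]
    sizes = []
    for _ in range(grains):
        z[rng.randrange(L)][rng.randrange(L)] += 1
        s, t = topple(z, L, K)
        if s:
            sizes.append(s)
    return z, sizes
```

Histogramming the recorded sizes for larger lattices should reproduce the power law (6.16) discussed below.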

Distribution of Avalanches We define with D(s) and D(t) the distributions of the
size and of the duration of avalanches. One finds that they are scale free,

$$D(s) \sim s^{-\alpha_s}\,, \qquad D(t) \sim t^{-\alpha_t}\,, \tag{6.16}$$

as we will discuss in the next section. The scaling relations (6.16) express the
essence of self-organized criticality, which we expect to be valid for a wide range
of cellular automata with conserving dynamics, independent of the special values of
the parameters entering the respective update functions. Numerical simulations and
analytic approximations for d = 2 dimensions yield

$$\alpha_s \approx \frac{5}{4}\,, \qquad \alpha_t \approx \frac{3}{4}\,.$$
Conserving Dynamics and Self-Organized Criticality Given that toppling events are locally conserving, avalanches of arbitrarily large sizes must occur, as sand can be lost only at the boundary of the system. Vice versa, only avalanches reaching the
boundary contribute to the powerlaw scaling expressed by (6.16). Self-organized
criticality breaks down as soon as there is a small but non-vanishing probability to
lose sand somewhere inside the system.

Features of the Critical State The empty board, with all cells initially empty, z_i ≡ 0, is not critical. The system remains in the frozen phase when sand is added, as long as most z_i < K. Adding one grain of sand after the other, the critical state is slowly approached. There is no way to avoid the critical state.

Once the critical state is achieved the system remains critical. The critical state is
paradoxically also the point at which the system is dynamically most unstable. It has

an unlimited susceptibility to the external driving, viz to the addition of grains, using the terminology of Sect. 6.1: a single added grain of sand can trigger avalanches of arbitrary size.

6.4.1 Absorbing Phase Transitions

For a variation of the original sandpile model, we disregard both dissipation


and external driving. Instead of losing sand at the boundaries, periodic boundary
conditions are implemented, with the consequence that the number of grains is
conserved both locally as well as globally. Starting with a random configuration
.{zi }, there will be a rush of initial avalanches, following the toppling rules (6.15),

until the system settles into either an active or an absorbing state.7

– All topplings will stop eventually whenever the average number of grains is too
small. The resulting inactive configuration is called “absorbing state”.
– For a large average number of grains, the redistribution of grains will never terminate, resulting in a continuously active state.

Adding a single grain externally to an absorbing state will lead at most to a single avalanche, with the transient activity terminating in another absorbing state. In this picture, the added grain of sand has been absorbed.

Transition from Absorbing to Active State The average number of particles ρ = ⟨z_i⟩ controls the transition from the absorbing to the active phase. The active state is characterized by the mean number of sites with heights z_i greater than the threshold K, the active sites. To give an example, the avalanche shown in Fig. 6.8 has 1/2/2/0 active sites at time steps 1/2/3/4, respectively. The transition from the absorbing to the active state is of second order, as illustrated in Fig. 6.9, with the number of active sites acting as an order parameter.8

Self-Organization Towards the Critical Density Based on a separation of time scales, there is a deep relation between absorbing state transitions and the concept of self-organized criticality.
The external drive, the addition of grains of sand, one by one, is infinitesimally slow in the sandpile model. The reason is that the external drive is stopped once an avalanche starts, and resumed only once the avalanche has terminated. Relative to the time scale of the external drive, the avalanche is hence instantaneous.
Slowly adding one particle after another continuously increases the mean particle

7 Generically, absorbing states are final configurations of Markov chains, as discussed in Sect. 3.3.2

of Chap. 3.
8 A fully self-consistent mean-field theory of absorbing phase transitions in the presence of uncorrelated external driving is presented in Sect. 10.2.2 of Chap. 10.



[Figure 6.9: sketch of the density of active sites κ as a function of the mean particle density ρ. A slow external drive (adding particles) pushes ρ up towards the critical density ρc, while fast internal dissipation (particle removal) acts above it.]
Fig. 6.9 Changing the mean particle density ρ, an absorbing phase transition may occur, with the density of active sites acting as an order parameter. The system self-organizes towards the critical particle density ρc through the balancing of a slow external drive, realized by adding grains in the sandpile model, and a fast internal dissipative process, when losing grains of sand at the boundaries

number, driving the system towards criticality from below, as shown in Fig. 6.9.
Particles surpassing the critical density are instantaneously dissipated through large avalanches reaching the boundaries of the system; the mean particle density is hence pinned at criticality.

6.5 Random Branching Theory

Branching theory deals with the growth of networks via branching. Networks
generated by branching processes are loopless; they typically arise in theories of
evolutionary processes.

6.5.1 Branching Theory of Self-Organized Criticality

Avalanches have an intrinsic relation to branching processes; at every time step the
avalanche can either continue or stop. Random branching theory is hence a suitable
method for studying self-organized criticality.

Branching in Sandpiles A typical update during an avalanche is of the form

$$\begin{aligned} \text{time } 0:&\quad z_i \to z_i - 4\,, \quad z_j \to z_j + 1 \\ \text{time } 1:&\quad z_i \to z_i + 1\,, \quad z_j \to z_j - 4 \end{aligned}$$


Fig. 6.10 Branching processes. Left: The two possible processes of order .n = 1. Right: A
generic process of order .n = 3 with an avalanche of size .s = 7

when two neighboring cells i and j initially have .zi = K + 1 and .zj = K.
This implies that an avalanche typically intersects with itself. However, for a d-
dimensional hypercubic lattice with .K = 2d − 1, self-interaction of avalanches
becomes unimportant when .1/d → 0. In large dimensions, avalanches can be
mapped rigorously to a random branching process.9

Binary Random Branching The notion of neighbors loses meaning in large dimensions. All that matters when d → ∞ is whether an avalanche continues, increasing its size continuously, or whether it stops. This can be expressed by binary branching, the case that toppling events create two new active sites.

BINARY BRANCHING An active site of an avalanche topples with the probability p


creating two new active sites.

For p < 1/2 the number of new active sites decreases on average and avalanches die out; p_c = 1/2 is the critical state with, on average, conserving dynamics. See Fig. 6.10 for some examples of branching processes.

Distribution of Avalanche Sizes The properties of avalanches are determined by


the probability distribution,

$$P_n(s, p)\,, \qquad \sum_{s=1}^{\infty} P_n(s, p) = 1\,,$$

9 An analogous situation occurs in the context of high-dimensional and/or random graphs, as discussed in Sect. 1.3.3 of Chap. 1, which are also loopless in the thermodynamic limit.

Fig. 6.11 Branching processes of order n = 2 with avalanches of sizes s = 3, 5, 7 (left/middle/right) and boundaries σ = 0, 2, 4

describing the probability to find an avalanche of size s for a branching process of order n. Here s is the (odd) number of sites inside the avalanche, see Figs. 6.10 and 6.11 for some examples.

Small Avalanches For small s and large n one can evaluate the probability for
small avalanches to occur by hand,

$$P_n(1, p) = 1 - p\,, \qquad P_n(3, p) = p(1-p)^2\,, \qquad P_n(5, p) = 2p^2(1-p)^3\,,$$

compare Figs. 6.10 and 6.11. Note that .Pn (1, p) is the probability to find an
avalanche of just one site.
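The small-avalanche probabilities can be checked by direct Monte Carlo sampling of the binary branching process. The following Python sketch (sample size, seed and the size cutoff are arbitrary choices made here) estimates P(1, p) and P(3, p) for a subcritical p:

```python
import random

def avalanche_size(p, rng, cap=10**6):
    """One realization of the binary branching process: every active
    site topples with probability p, creating two new active sites."""
    active, size = 1, 1
    while active and size < cap:
        new = 0
        for _ in range(active):
            if rng.random() < p:
                new += 2
        active = new
        size += new
    return size

rng = random.Random(42)
p, N = 0.4, 200_000
sizes = [avalanche_size(p, rng) for _ in range(N)]
p1 = sizes.count(1) / N        # estimate of P(1, p) = 1 - p
p3 = sizes.count(3) / N        # estimate of P(3, p) = p (1 - p)^2
```

For p = 0.4 the estimates should come out close to 0.6 and 0.4 · 0.6² = 0.144, in agreement with the formulas above.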

Generating Function Formalism The generating function formalism for probability distributions is very useful when one has to deal with independent stochastic processes, as the joint probability of two independent stochastic processes is equivalent to the simple multiplication of the corresponding generating functions.10
We define with

$$f_n(x, p) = \sum_s P_n(s, p)\, x^s\,, \qquad f_n(1, p) = \sum_s P_n(s, p) = 1 \tag{6.17}$$

the generating functional .fn (x, p) for the probability distribution .Pn (s, p). We note
that

$$P_n(s, p) = \frac{1}{s!} \left. \frac{\partial^s f_n(x, p)}{\partial x^s} \right|_{x=0}\,, \qquad n, p \ \text{fixed}\,. \tag{6.18}$$

Recursion Relation For generic n the recursion relation

$$f_{n+1}(x, p) = x\,(1 - p) + x\, p\, f_n^2(x, p) \tag{6.19}$$

10 For an introduction to generating functions for probability distributions we refer to Sect. 1.3.2 of Chap. 1.

is valid. To see why, one builds the branching network backwards, adding a site at
the top.

– With probability (1 − p) one adds a single-site avalanche, described by the generating functional x.
– With probability p one adds a site, described by the generating functional x, which generates two new active sites, described each by the generating functional f_n(x, p).

In the terminology of branching theory, one speaks of the decomposition of the branching process after its first generation, a standard procedure.
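The recursion (6.19) can also be iterated numerically on the coefficients of x^s. A minimal Python sketch (the truncation degree smax is a convenience chosen here; it does not affect the retained coefficients, since the coefficient of x^s depends only on lower orders):

```python
def next_gen(f, p, smax=41):
    """One iteration of f_{n+1}(x) = x(1-p) + x p f_n(x)^2, acting
    on a dict {s: coefficient of x^s}; degrees above smax are dropped."""
    out = {1: 1.0 - p}                 # the x (1 - p) term
    for s1, c1 in f.items():
        for s2, c2 in f.items():
            s = s1 + s2 + 1            # +1 from the factor x in front of p f^2
            if s <= smax:
                out[s] = out.get(s, 0.0) + p * c1 * c2
    return out

p = 0.4
f = {1: 1.0}                           # f_0(x) = x: a single site, no toppling yet
for _ in range(12):
    f = next_gen(f, p)
# f[s] now holds P(s, p) for small s
```

After a few iterations the coefficients of x¹, x³ and x⁵ settle at 1 − p, p(1 − p)² and 2p²(1 − p)³, reproducing the small-avalanche probabilities quoted above.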

Self-Consistency Condition For large n and finite x the generating functionals f_n(x, p) and f_{n+1}(x, p) become identical, leading to the self-consistency condition

$$f_n(x, p) = f_{n+1}(x, p) = x\,(1-p) + x\,p\,f_n^2(x, p)\,,$$

with the solution

$$f(x, p) \equiv f_n(x, p) = \frac{1 - \sqrt{1 - 4x^2\, p(1-p)}}{2xp} \tag{6.20}$$

for the steady-state generating functional f(x, p). The normalization condition

$$f(1, p) = \frac{1 - \sqrt{1 - 4p(1-p)}}{2p} = \frac{1 - \sqrt{(1-2p)^2}}{2p} = 1$$

is fulfilled for p ∈ [0, 1/2]. For p > 1/2 the last step in the above equation would not be correct, since then √((1 − 2p)²) = 2p − 1.

Subcritical Solution Expanding (6.20) in powers of x² we find terms like

$$\frac{1}{p}\,\frac{1}{x}\left[4p(1-p)\,x^2\right]^k = \frac{1}{p}\left[4p(1-p)\right]^k x^{2k-1}\,.$$

Comparing this with the definition of the generating functional (6.17) we note that s = 2k − 1, which is equivalent to k = (s + 1)/2. This implies

$$P(s, p) \sim \frac{1}{p}\sqrt{4p(1-p)}\,\big[4p(1-p)\big]^{s/2} \sim e^{-s/s_c(p)} \tag{6.21}$$

for the probability to find an avalanche of size s, where we have used the relation

$$a^{s/2} = e^{\ln(a^{s/2})} = e^{-s(\ln a)/(-2)}\,, \qquad a = 4p(1-p)\,,$$

together with the definition

$$s_c(p) = \frac{-2}{\ln[4p(1-p)]}\,, \qquad \lim_{p \to 1/2} s_c(p) \to \infty$$

of the avalanche correlation length .sc (p). For .p < 1/2, the correlation length .sc (p)
is finite and the avalanche is consequently not scale-free, compare Sect. 6.2. The
characteristic size of an avalanche .sc (p) diverges for .p → pc = 1/2. Note that
.sc (p) > 0 for .p ∈]0, 1[.
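The divergence of the correlation size for p → 1/2 is immediate to verify numerically; a two-line Python check (the sample values of p are arbitrary):

```python
import math

def s_c(p):
    """Avalanche correlation size, s_c(p) = -2 / ln[4 p (1 - p)]."""
    return -2.0 / math.log(4 * p * (1 - p))

vals = [s_c(p) for p in (0.3, 0.4, 0.45, 0.49, 0.499)]   # grows towards p = 1/2
```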

Critical Solution We now consider the critical case with

$$p = 1/2\,, \qquad 4p(1-p) = 1\,, \qquad f(x, p) = \frac{1 - \sqrt{1 - x^2}}{x}\,.$$

The expansion of $\sqrt{1 - x^2}$ with respect to x is

$$\sqrt{1 - x^2} = \sum_{k=0}^{\infty} \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)\cdots\left(\frac{1}{2}-k+1\right)}{k!}\left(-x^2\right)^k$$

in (6.20) and therefore

$$P_c(k) \equiv P(s = 2k-1,\ p = 1/2) = \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)\cdots\left(\frac{1}{2}-k+1\right)}{-k!}\,(-1)^k\,.$$

This expression is still unwieldy. However, we are only interested in the asymptotic behavior for large avalanche sizes s. For this purpose we consider the recursion

$$P_c(k+1) = \frac{1/2 - k}{k+1}\,(-1)\,P_c(k) = \frac{1 - 1/(2k)}{1 + 1/k}\,P_c(k)$$

in the limit of large .k = (s + 1)/2, where .1/(1 + 1/k) ≈ 1 − 1/k,


  
$$P_c(k+1) \approx \left(1 - \frac{1}{2k}\right)\left(1 - \frac{1}{k}\right)P_c(k) \approx \left(1 - \frac{3}{2k}\right)P_c(k)\,.$$

This asymptotic relation leads to

$$\frac{P_c(k+1) - P_c(k)}{1} = \frac{-3}{2k}\,P_c(k)\,, \qquad \frac{\partial P_c(k)}{\partial k} = \frac{-3}{2k}\,P_c(k)\,,$$

with the solution


$$P_c(k) \sim k^{-3/2}\,, \qquad D(s) = P_c(s) \sim s^{-3/2}\,, \qquad \alpha_s = \frac{3}{2}\,, \tag{6.22}$$

for large k, s, since s = 2k − 1.
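The asymptotic scaling (6.22) can be confirmed by iterating the exact ratio P_c(k+1)/P_c(k) derived above and extracting an effective exponent; a Python sketch (the range of k is an arbitrary choice):

```python
import math

# P_c(1) = P(s = 1, p = 1/2) = 1 - p = 1/2; iterate the exact ratio
Pc = [0.0, 0.5]                       # Pc[k] holds P_c(k); index 0 is unused
for k in range(1, 200_000):
    Pc.append(Pc[k] * (1 - 1 / (2 * k)) / (1 + 1 / k))

def local_exponent(k):
    """Effective power-law exponent of P_c between k and 2k."""
    return math.log(Pc[2 * k] / Pc[k]) / math.log(2)
```

For large k the effective exponent approaches −3/2, with 1/k corrections.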

Distribution of Avalanche Duration The distribution of the duration n of avalanches can be evaluated in a similar fashion. For this purpose one defines the probability distribution function

$$Q_n(\sigma, p)$$

for an avalanche of duration n to have .σ cells at the boundary, see Fig. 6.11. One can
derive a recursion relation for the corresponding generating functional, in analogy
to (6.19), and solve it self-consistently.11

The distribution of avalanche durations is determined from the generating function via Q_n = Q_n(σ = 0, p = 1/2), i.e. by the probability that the avalanche stops after n steps. One finds

$$Q_n \sim n^{-2}\,, \qquad D(t) \sim t^{-2}\,, \qquad \alpha_t = 2\,. \tag{6.23}$$

Tuned or Self-Organized Criticality? The random branching model discussed in this section has only one free parameter, the probability p. This model is critical
only for .p → pc = 1/2, giving rise to the impression that one has to fine tune the
parameters in order to obtain criticality, just like for ordinary phase transitions.
This is actually not the case. As an example we could generalize the sandpile
model to continuous forces .zi ∈ [0, ∞], together with the update rule

$$z_j \to z_j - \Delta_{ij}\,, \qquad \text{if } z_i > K\,,$$

and

$$\Delta_{ij} = \begin{cases} K & i = j \\ -c\,K/4 & i, j \ \text{nearest neighbors} \\ -(1-c)\,K/8 & i, j \ \text{next-nearest neighbors} \\ 0 & \text{otherwise} \end{cases} \tag{6.24}$$

11 The evaluation of avalanche durations is the topic of exercise (6.8).



[Figure 6.12: left, a reproduction tree with branching probabilities p1, p2, p3; right, graphical solution of the fixpoint equation for W = 1, W = 2 and W > 1.]

Fig. 6.12 Galton–Watson processes. Left: Example of a reproduction tree, with p_m being the probabilities of having m = 0, 1, . . . offspring. Right: Graphical solution for the fixpoint equation (6.27), for various average numbers of offspring W

for a square lattice with four nearest neighbors and eight next-nearest neighbors. The update rules are conserving,

$$\sum_j \Delta_{ij} = 0\,, \qquad \forall\, c \in [0, 1]\,.$$

For c = 1 this model corresponds to the continuous field generalization of the original sandpile model. The model defined by (6.24) can be expected to map in the
limit .d → ∞ to an appropriate random branching model with .p = pc = 1/2; it will
be critical for all values of the parameters K and c, due to its conserving dynamics.

6.5.2 Galton–Watson Processes

Galton–Watson processes are generalizations of the binary branching processes considered so far, with interesting applications in evolution theory and some
everyday experiences.

History of Family Names Family names are handed down traditionally from
father to son. Family names regularly die out, leading over the course of time
to a substantial reduction of the pool of family names. This effect is especially pronounced in countries looking back on millennia of cultural continuity, like China, where 22 % of the population share only three family names.
The evolution of family names is described by a Galton–Watson process, with a
key quantity of interest being the extinction probability, viz the probability that the
last person bearing a given family name dies without descendants.

Galton–Watson Process The basic reproduction statistics determines the evolu-


tion of family names, see Fig. 6.12.

We denote with p_m the probability that an individual has m offspring, and with p_D^{(n)} the probability of finding a total of D descendants in the n-th generation. The respective generating functions,

$$G_0(x) = \sum_m p_m\, x^m\,, \qquad G^{(n)}(x) = \sum_D p_D^{(n)}\, x^D\,,$$

obey the recursion relation

$$G^{(n+1)}(x) = \sum_D p_D^{(n)} \left[G_0(x)\right]^D = G^{(n)}\big(G_0(x)\big)\,.$$

We rewrite the recursion relation as

$$G^{(n)}(x) = G_0\big(G_0(\ldots G_0(x) \ldots)\big) = G_0\big(G^{(n-1)}(x)\big)\,, \tag{6.25}$$

which can be solved together with the starting point G^{(0)}(x) = x. This representation is the basis for all further considerations; we consider here the extinction probability q.

Extinction Probability The reproduction process dies out when there is a generation with zero members. The probability of having zero persons bearing a specific family name in the n-th generation is

$$q = p_0^{(n)} = G^{(n)}(0) = G_0\big(G^{(n-1)}(0)\big) = G_0(q)\,, \tag{6.26}$$

where we have used the recursion relation (6.25) and the stationarity condition
.G(n) (0) ≈ G(n−1) (0). The extinction probability q is hence given by the fixpoint
.q = G0 (q) of the generating functional .G0 (x) of the reproduction probability.

Binary Branching as a Galton–Watson Process As an example we take the case that

$$G_0(x) = 1 - \frac{W}{2} + \frac{W}{2}\,x^2\,, \qquad G_0'(1) = W\,,$$
viz that people may have either zero or two sons, with probabilities 1 − W/2 and W/2 < 1, respectively. The expected number of offspring W corresponds to the fitness of an individual in evolution theory.12 This setting corresponds to the case of binary branching, see Fig. 6.10, with W/2 being the branching probability, e.g. as it would hold for the reproductive dynamics of unicellular organisms.

12 See Chap. 8.

The self-consistency condition (6.26) for the extinction probability q = q(W) then reads

$$q = 1 - \frac{W}{2} + \frac{W}{2}\,q^2\,, \qquad q(W) = \frac{1}{W} \pm \sqrt{\frac{1}{W^2} - \frac{2-W}{W}}\,, \tag{6.27}$$

with the smaller root being here of relevance. The extinction probability vanishes
for a reproduction rate of two,

$$q(W) = \begin{cases} 0 & W = 2 \\ q \in\, ]0, 1[ & 1 < W < 2 \\ 1 & W \leq 1 \end{cases}$$

and is unity for a fitness below one, compare Fig. 6.12.
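The fixpoint q = G0(q) is conveniently found by iterating q → G0(q) starting from q = 0, which converges to the smallest fixpoint in [0, 1]. A Python sketch for the binary case (the iteration count is an arbitrary choice):

```python
def extinction_probability(G0, iters=10_000):
    """Iterate q -> G0(q) from q = 0; converges to the smallest
    fixpoint of q = G0(q) in the interval [0, 1]."""
    q = 0.0
    for _ in range(iters):
        q = G0(q)
    return q

def G0_binary(W):
    """Generating function for zero or two offspring, G0'(1) = W."""
    return lambda x: 1 - W / 2 + (W / 2) * x * x

q = extinction_probability(G0_binary(1.5))   # smaller root of (6.27): 1/3
```

For 1 < W < 2 the iteration settles at the smaller root (2 − W)/W, at q = 0 for W = 2, and at q = 1 for W ≤ 1, in agreement with the case distinction above.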

6.6 Application to Long-Term Evolution

The techniques developed in this chapter can be used to study a model for the evolution of species proposed by Bak and Sneppen.

Fitness Landscapes Evolution deals with the adaption of species and their fitness
relative to the ecosystem they live in.

FITNESS LANDSCAPES The function that determines the chances of survival of a species,
its fitness, is the fitness landscape.

In Fig. 6.13 a simple fitness landscape is illustrated, in which there is only one
dimension in the genotype and/or phenotype space.13

Due to the presence of fitness barriers between adjacent local peaks, the
population will spend most of its time in a local fitness maximum whenever the
mutation rate is low with respect to the selection rate, see Fig. 6.13. Mutations
are random processes that induce evolutionary transitions from one local fitness
maximum to the next through stochastic escape.14

Coevolution It is important to keep in mind for the following discussion that an


ecosystem, and with it the respective fitness landscapes, is not static on long time
scales. The ecosystem is the result of the combined action of geophysical factors,

13 The term “genotype” denotes the ensemble of genes. The actual form of an organism, the

“phenotype”, is determined by the genotype plus environmental factors, like food supply during
growth.
14 Stochastic escape is discussed in Sect. 3.5.2 of Chap. 3.

Fig. 6.13 Illustration of a one-dimensional fitness landscape, plotting species fitness against genotype. A species evolving from an adaptive peak P to a new adaptive peak Q needs to overcome the fitness barrier B

such as average rainfall and temperature, and biological influences, namely the
properties and actions of the constituting species. The evolutionary progress of
one species will therefore be likely to trigger adaption processes in other species
appertaining to the same ecosystem, a process denoted “coevolution”.

Evolutionary Time Scales In the model of Bak and Sneppen there are no explicit
fitness landscapes like the one illustrated in Fig. 6.13. Instead, the model proposes
that a single number, the “fitness barrier”, can be used as a proxy for the influence of
all other species making up the ecosystem. The time needed for a stochastic escape
from one local fitness optimum increases exponentially with barrier height. We may
therefore assume that the average time t it takes to mutate across a fitness barrier of
height B scales as

$$t = t_0\, e^{B/T}\,, \tag{6.28}$$

where .t0 and T are appropriate constants. The value of .t0 is arbitrary, as it merely
sets the time scale. The parameter T depends on the mutation rate, and the
assumption that mutation is low implies that T is small compared to the typical
barrier heights B in the landscape. In this case the time scales t for crossing slightly
different barriers are distributed over many orders of magnitude and only the lowest
barrier is relevant.

Bak–Sneppen Model The Bak–Sneppen model is a phenomenological model for


the evolution of barrier heights for a fixed number N of species. Each species has a
barrier,

$$B_i = B_i(t) \in [0, 1]\,, \qquad t = 0, 1, 2, \ldots\,,$$

for its further evolution. The initial .Bi (0) are drawn randomly from .[0, 1]. The
evolutionary dynamics generated by the model consists of the repetition of two
steps.

[Figure 6.14: three panels plotting the barrier values of 100 species (vertical axis 0 to 1) against the species index, after 50, 200 and 1600 simulation steps.]

Fig. 6.14 The barrier values (dots) for a 100 species one-dimensional Bak–Sneppen model with
.K = 2 after 50, 200 and 1600 steps of a simulation. Included in each frame is the approximate
position of the upper edge of the “gap” (horizontal line). A few species have barriers below this
level, indicating that they were involved in an avalanche at the moment when the snapshot of the
system was taken

– The times for a stochastic escape are exponentially distributed, see (6.28). It
is therefore reasonable to assume that the species with the lowest barrier .Bi
mutates and escapes first. After escaping, it will adapt quickly to a new local
fitness maximum and acquire a new barrier for mutation, which is assumed to be
uniformly distributed in .[0, 1].
– The fitness function for a species i is determined by the ecological environment
it lives in, which is made up of the other species. When a given species
mutates it therefore influences the fitness landscape for a certain number of
other species. Within the Bak–Sneppen model this translates into assigning new
random barriers .Bj for .K − 1 neighbors of the mutating species i.

With these two rules, the Bak–Sneppen model tries to capture two essential ingre-
dients of long-term evolution: The exponential distribution of successful mutations
and the interaction of species when one constituting species evolves.

Random Neighbor Model The topology of the interaction between species in the
Bak–Sneppen model is unclear. It might be chosen to be two-dimensional, if the
species are thought to live geographically separated, or one-dimensional in a toy
model. In reality, the topology is complex and can be assumed to be random, in first
approximation, resulting in the soluble random neighbor model.
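The random neighbor model is easily simulated. The following Python sketch (system size, step number and seed are choices made here for illustration) relaxes the barrier distribution towards the self-organized gap at 1/K:

```python
import random

def bak_sneppen_random_neighbor(N=300, K=2, steps=30_000, seed=3):
    """Random neighbor Bak-Sneppen model: at every step the smallest
    barrier and K - 1 randomly chosen barriers are replaced by fresh
    random values drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    B = [rng.random() for _ in range(N)]
    for _ in range(steps):
        imin = min(range(N), key=B.__getitem__)   # species with lowest barrier
        B[imin] = rng.random()
        for _ in range(K - 1):
            B[rng.randrange(N)] = rng.random()
    return B

B = bak_sneppen_random_neighbor()
gap_fraction = sum(1 for b in B if b < 0.4) / len(B)   # barriers well below 1/K
mean_barrier = sum(B) / len(B)
```

After equilibration, almost all barriers lie above 1/K and the mean barrier approaches (1 + 1/K)/2, in accordance with the mean-field result (6.36) derived below for the stationary distribution.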

Evolution of Barrier Distribution We start with a qualitative discussion of the redistribution of barrier heights under the sequential repetition of the two steps above, as shown in Fig. 6.14. The initial barrier heights are uniformly distributed over the interval [0, 1], and the lowest barrier, removed in the first step, is small. The heights reassigned subsequently will therefore lead, on average, to an increase of the average barrier height with passing time.

With increasing average barrier height the characteristic lowest barrier is also
raised, and eventually a steady state will be reached, just as in the sandpile model
discussed previously. It turns out that the characteristic value for the lowest barrier
is about 1/K at equilibrium when the mean-field approximation is used, and that the steady state is critical.

Molecular Field Theory The barrier distribution,

$$p(x, t)\,,$$

is the probability to find a barrier of height x ∈ [0, 1] at time step t = 1, 2, . . . . In addition, we define with Q(x) the probability to find a barrier above x:

$$Q(x) = \int_x^1 dx'\, p(x')\,, \qquad Q(0) = 1\,, \quad Q(1) = 0\,. \tag{6.29}$$

The dynamics is governed by the size of the smallest barrier. The distribution
function .p1 (x) for the lowest barrier is

$$p_1(x) = N\, p(x)\, Q^{N-1}(x)\,, \tag{6.30}$$

it is given by the probability p(x) for one barrier (out of the N barriers) to have the barrier height x, while all the other N − 1 barriers are larger. p_1(x) is normalized,

$$\int_0^1 dx\, p_1(x) = (-N)\int_0^1 dx\, Q^{N-1}(x)\,\frac{\partial Q(x)}{\partial x} = -Q^N(x)\Big|_{x=0}^{x=1} = 1\,,$$

where we used .p(x) = −Q' (x), .Q(0) = 1 and .Q(1) = 0.

Time Evolution of Barrier Distribution The time evolution for the barrier
distribution consists in taking away one (out of N) barrier, the lowest, via

$$p(x, t) - \frac{1}{N}\, p_1(x, t)\,,$$
and by removing randomly .K − 1 barriers from the remaining .N − 1 barriers, and
adding K random barriers,

$$p(x, t+1) = p(x, t) - \frac{1}{N}\, p_1(x, t) - \frac{K-1}{N-1}\left[p(x, t) - \frac{1}{N}\, p_1(x, t)\right] + \frac{K}{N}\,. \tag{6.31}$$

We note that p(x, t + 1) is normalized whenever p(x, t) and p_1(x, t) were normalized correctly,

$$\int_0^1 dx\, p(x, t+1) = 1 - \frac{1}{N} - \frac{K-1}{N-1}\left(1 - \frac{1}{N}\right) + \frac{K}{N} = \frac{N-K}{N} + \frac{K}{N} \equiv 1\,.$$

Stationary Distribution Iterating (6.31), the barrier distribution will approach a


stationary solution .p(x, t + 1) = p(x, t) ≡ p(x), as can be observed from the
numerical simulation shown in Fig. 6.14. The stationary distribution corresponds to
the fixpoint condition

$$0 = \frac{1}{N}\,p_1(x)\left[\frac{K-1}{N-1} - 1\right] - p(x)\,\frac{K-1}{N-1} + \frac{K}{N}$$

of (6.31). With .p1 = Np QN −1 we obtain

$$0 = N\,p(x)\, Q^{N-1}(x)\,(K-N) - p(x)\,(K-1)N + K(N-1)\,,$$

which can be further simplified using .p(x) = −∂Q(x)/∂x,

$$0 = N(N-K)\, Q^{N-1}(x)\, \frac{\partial Q(x)}{\partial x} + (K-1)N\, \frac{\partial Q(x)}{\partial x} + K(N-1)\,,$$
or, equivalently,

$$0 = N(N-K)\, Q^{N-1}\, dQ + (K-1)N\, dQ + K(N-1)\, dx\,.$$

One can integrate the last expression with respect to x,

$$0 = (N-K)\, Q^N(x) + (K-1)N\, Q(x) + K(N-1)\,(x-1)\,, \tag{6.32}$$

where we took care of the boundary condition .Q(1) = 0, .Q(0) = 1.

Solution in the Thermodynamic Limit The polynomial (6.32) simplifies in the


thermodynamic limit, with .N → ∞ and .K/N → 0, to

$$0 = Q^N(x) + (K-1)\, Q(x) - K\,(1-x)\,. \tag{6.33}$$

We note that .Q(x) ∈ [0, 1] and that .Q(0) = 1, .Q(1) = 0. There must therefore be
some .x ∈]0, 1[ for which .0 < Q(x) < 1. Then

$$Q^N(x) \to 0\,, \qquad Q(x) \approx \frac{K}{K-1}\,(1-x)\,. \tag{6.34}$$

[Figure 6.15: Q(x) as a function of x ∈ [0, 1], comparing a random (uniform) barrier distribution with the equilibrium distribution, which stays at Q = 1 up to x = 1/K.]

Fig. 6.15 The distribution .Q(x) to find a fitness barrier larger than .x ∈ [0, 1] for the Bak–Sneppen
model, for the case of random barrier distribution (dotted blue line) and the stationary distribution
(dashed red line), compare (6.35)

This relation remains valid as long as .Q < 1, or .x > xc :

$$1 = \frac{K}{K-1}\,(1 - x_c)\,, \qquad x_c = \frac{1}{K}\,.$$

We then have in the limit N → ∞

$$\lim_{N \to \infty} Q(x) = \begin{cases} 1 & \text{for } x < 1/K \\ \dfrac{K}{K-1}\,(1-x) & \text{for } x > 1/K \end{cases}\,, \tag{6.35}$$

compare Fig. 6.15, which translates to

$$\lim_{N \to \infty} p(x) = \begin{cases} 0 & \text{for } x < 1/K \\ K/(K-1) & \text{for } x > 1/K \end{cases}\,, \tag{6.36}$$

when using .p(x) = −∂Q(x)/∂x. This result compares qualitatively well with
the numerical results presented in Fig. 6.14. Note, however, that the mean-field
solution (6.36) does not predict the exact critical barrier height for non-random
neighbors, which is somewhat larger for a one-dimensional arrangement of neigh-
bors, as in Fig. 6.14.

Distribution of the Lowest Barrier Equation (6.36) cannot be rigorously true for N < ∞, since there is a finite probability for barriers with B_i < 1/K to reappear at every step. If the barrier distribution is zero below the self-organized threshold x_c = 1/K and constant above, then the lowest barrier is uniformly distributed below x_c:

$$p_1(x) \to \begin{cases} K & \text{for } x < 1/K \\ 0 & \text{for } x > 1/K \end{cases}\,, \qquad \int_0^1 dx\, p_1(x) = 1\,. \tag{6.37}$$

Fig. 6.16 A time series of evolutionary activity in a simulation of the one-dimensional Bak–Sneppen model with K = 2, showing coevolutionary avalanches interrupting the punctuated equilibrium. Shown are the new barrier values (dots), viz the evolving species, with species plotted horizontally and time vertically

Coevolution and Avalanches When the species with the lowest barrier mutates,
we assign new random barrier heights to it and to its .K − 1 neighbors. This
causes an avalanche of evolutionary adaptations whenever one of the new barriers
becomes the new lowest fitness barrier. One calls this phenomenon “coevolution”
since the evolution of one species drives the adaption of other species belonging to
the same ecosystem. In Fig. 6.16 this process is illustrated for the one-dimensional
model. The avalanches in the system are clearly visible and well separated in
time. In between the individual avalanches the barrier distribution does not change
appreciably; one speaks of a “punctuated equilibrium”.

Critical Coevolutionary Avalanches In Sect. 6.5 we discussed the connection


between avalanches and random branching. The branching process is critical when
it continues with a probability of 1/2. To see whether the coevolutionary avalanches
within the Bak–Sneppen model are critical we calculate the probability .pbran that at
least one of the K new, randomly selected, fitness barriers will be the new lowest
barrier.

With probability x one of the new random barriers is in .[0, x] and below the
actual lowest barrier, which is distributed with .p1 (x), see (6.37). We then have

$$p_{\mathrm{bran}} = K \int_0^1 p_1(x)\, x\, dx = K \int_0^{1/K} K\, x\, dx = \frac{K^2}{2}\,x^2\Big|_0^{1/K} \equiv \frac{1}{2}\,,$$

viz the avalanches are critical. The distribution of the size s of the coevolutionary
avalanches is then
$$D(s) \sim \left(\frac{1}{s}\right)^{3/2}\,,$$

as evaluated within the random branching approximation, see (6.22), and indepen-
dent of K. The size of a coevolutionary avalanche can be arbitrarily large and
involve, in extremis, a finite fraction of the ecosystem, compare Fig. 6.16.

Features of the Critical State The sandpile model evolves into a critical state under the influence of an external driving by adding one grain of sand after another. The critical state is characterized by a distribution of slopes (or heights) zi that has a discontinuity; there is a finite fraction of slopes with zi = Z − 1, but no slope with zi = Z, apart from some of the sites transiently participating in an avalanche.

In the Bak–Sneppen model the same process occurs, but without external driving. At criticality, the barrier distribution p(x) = ∂Q(x)/∂x has a discontinuity at xc = 1/K, see Fig. 6.15. One could say, cum grano salis, that the system has developed an “internal phase transition”, namely a transition in the barrier distribution p(x), which is an internal variable. The emergent state for p(x) is a many-body or collective effect, since it results from the reciprocal interactions of the species participating in the formation of the ecosystem.
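This self-organization of the barrier distribution is easy to observe numerically. The sketch below (plain Python, parameter values illustrative) runs the one-dimensional Bak–Sneppen model on a ring, where the mutating species carries its two nearest neighbors along, and checks that the stationary barriers pile up above a finite threshold:

```python
import random

def bak_sneppen(N=200, steps=20_000, seed=1):
    """One-dimensional Bak-Sneppen model on a ring: at every step the
    species with the lowest fitness barrier and its two neighbors
    receive new random barriers."""
    rng = random.Random(seed)
    barriers = [rng.random() for _ in range(N)]
    for _ in range(steps):
        i = min(range(N), key=barriers.__getitem__)  # lowest barrier
        for j in (i - 1, i, (i + 1) % N):            # i-1 = -1 wraps around
            barriers[j] = rng.random()
    return barriers

b = bak_sneppen()
# after a transient, almost no barriers remain below the
# self-organized threshold (roughly 0.67 in one dimension)
frac_low = sum(x < 0.3 for x in b) / len(b)
```

Only the handful of sites participating in the current avalanche carry low barriers; the bulk of the distribution sits above the threshold, exactly the punctuated-equilibrium picture of Fig. 6.16.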

Exercises

(6.1) SOLUTIONS OF THE LANDAU–GINZBURG FUNCTIONAL
Determine the order parameter for h ≠ 0 and t < 1 via (6.5) and Fig. 6.2. Discuss the local stability condition (6.3) for the three possible solutions. You may select special values for h and t, such that P(φ) − h factorizes.
(6.2) THERMODYNAMICS OF THE LANDAU–GINZBURG MODEL
Determine the entropy S(T) = −∂F/∂T and the specific heat cV = T ∂S/∂T within the Landau–Ginzburg theory (6.1) for phase transitions.


(6.3) THE GAME OF LIFE
Consider the evolution of the following states, see Fig. 6.5, under the rules for Conway’s game of life:

{(0, 0), (1, 0), (0, 1), (1, 1)}        {(0, −1), (0, 0), (0, 1)}
{(0, 0), (0, 1), (1, 0), (−1, 0), (0, −1)}        {(0, 0), (0, 1), (0, 2), (1, 2), (2, 1)}        (6.38)

The predictions can be checked with online simulation applications.
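Alternatively, a few lines of code suffice. A minimal sketch (plain Python, not from the text; live cells stored as a set of (x, y) tuples) that verifies the first state of (6.38) to be a still life and the second a period-2 blinker:

```python
from collections import Counter

def life_step(live):
    """One synchronous update of Conway's game of life: a dead cell with
    exactly 3 live neighbors is born; a live cell survives with 2 or 3
    live neighbors."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

block = {(0, 0), (1, 0), (0, 1), (1, 1)}   # first state of (6.38): still life
blinker = {(0, -1), (0, 0), (0, 1)}        # second state: oscillates, period 2
```

Counting neighbors only around live cells keeps the grid unbounded, so gliders like the last state of (6.38) can travel indefinitely.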


(6.4) GAME OF LIFE ON A SMALL-WORLD NETWORK
Write a program to simulate the game of life on a 2D lattice. Consider this
lattice as a network with every site having edges to its eight neighbors. Rewire
the network such that (a) the local connectivities zi ≡ 8 are retained for every
site and (b) a small-world network is obtained. This can be achieved by cutting

two arbitrary links with probability p and rewiring the four resulting stubs
randomly.15
Define an appropriate dynamical order parameter and characterize the changes
as a function of the rewiring probability.
(6.5) FOREST FIRE MODEL
Develop a mean-field theory for the forest fire model by introducing appro-
priate probabilities to find cells with trees, fires and ashes. Find the critical
number of nearest neighbors Z for fires to continue burning.
(6.6) REALISTIC SANDPILE MODEL
Propose a cellular automata model that simulates the physics of real-world
sandpiles somewhat more realistically than the original model. The cell value
z(x, y) should correspond to the local height of sand. Write a program to
simulate the model.
(6.7) RECURSION RELATION FOR AVALANCHE SIZES
Use the definition (6.17) for the generating functional fn (x, p) of avalanche
sizes in (6.19) and derive a recursion relation for the probability Ps (n, p)
of finding an avalanche of size s in the nth generation, given a branching
probability p. How does this recursion relation change when the branching is
not binary but, as illustrated in Fig. 6.12, determined by the probability pm of
generating m offspring?
(6.8) RANDOM BRANCHING MODEL
Derive the distribution of avalanche durations (6.23) in analogy to the steps explained in Sect. 6.5, by considering a recursion relation for the integrated duration probability $\tilde{Q}_n = \sum_{n'=0}^{n} Q_{n'}(0, p)$, viz for the probability that an avalanche lasts maximally n time steps.
(6.9) GALTON–WATSON PROCESS
Use the fixpoint condition (6.26) and show that the extinction probability is
unity if the average reproduction rate is smaller than one.

Further Reading

Regarding cellular automata in general, and the game of life in particular, we recommend Wolfram (1986), Creutz (2004), and Rennard (2002), with the latter text specifying how to implement logical functions in the game of life. It is pointed out in Drossel (2000) that scaling results from avalanches reaching the boundary, with the relation to random branching theory being treated in Zapperi et al. (1995). In Calcaterra (2022), Lenia is discussed, a generalization of the game of life to continuous systems.
For a review of the forest fire and several related models, see Clar et al. (1996),
for a review on absorbing phase transitions Hinrichsen (2000). The forest fire model

15 This procedure corresponds to a 2D Watts and Strogatz model, which is discussed in Sect. 1.5 of Chap. 1.

with lightning was introduced by Drossel and Schwabl (1992). A comprehensive


review on power laws in theory and nature is presented in Marković and Gros (2014),
for a textbook on statistical physics and phase transitions see Sinai (2014). The
formulation of the Bak and Sneppen (1993) model for long-term coevolutionary
processes and its mean-field solution are discussed by Flyvbjerg et al. (1993).

References
Bak, P., & Sneppen, K. (1993). Punctuated equilibrium and criticality in a simple model of
evolution. Physical Review Letters, 71, 4083–4086.
Calcaterra, C. (2022). Existence of life in Lenia. arXiv:2203.14390.
Clar, S., Drossel, B., & Schwabl, F. (1996). Forest fires and other examples of self-organized
criticality. Journal of Physics: Condensed Matter, 8, 6803–6824.
Creutz, M. (2004). Playing with sandpiles. Physica A, 340, 521–526.
Drossel, B. (2000). Scaling behavior of the Abelian Sandpile model. Physical Review E, 61, R2168.
Drossel, B., & Schwabl, F. (1992). Self-organized critical forest-fire model. Physical Review
Letters, 69, 1629–1632.
Flyvbjerg, H., Sneppen, K., & Bak, P. (1993). Mean field theory for a simple model of evolution.
Physical Review Letters, 71, 4087–4090.
Hinrichsen, H. (2000). Non-equilibrium critical phenomena and phase transitions into absorbing
states. Advances in Physics, 49, 815–958.
Marković, D., & Gros, C. (2014). Power laws and self-organized criticality in theory and nature.
Physics Reports, 536, 41–74.
Rennard, J. P. (2002). Implementation of logical functions in the game of life. In Collision-based
computing (pp. 491–512). Springer.
Sinai, Y. G. (2014). Theory of phase transitions: Rigorous results. Elsevier.
Wolfram, S. (Ed.). (1986). Theory and applications of cellular automata. World Scientific.
Zapperi, S., Lauritsen, K. B., & Stanley, H. E. (1995). Self-organized branching processes: Mean-
field theory for avalanches. Physical Review Letters, 75, 4071–4074.
Random Boolean Networks
7

Complex systems theory deals with dynamical systems containing a very large number of variables. The resulting dynamical behavior can be arbitrarily complex and sophisticated. It is therefore important to have well controlled benchmarks, dynamical systems which can be investigated and understood in a controlled way for large numbers of variables.
Networks of interacting binary variables, i.e. boolean networks, constitute such canonical complex dynamical systems. They allow the formulation and investigation of important concepts, among others regarding information retention and loss and the occurrence of phase transitions in the resulting dynamical states. Boolean networks are recognized to be starting points for the modeling of gene expression and protein regulation networks, the fundamental networks at the basis of life.

7.1 Introduction

Boolean Networks In this chapter, we describe the dynamics of sets of N “binary variables”, namely of variables having only two possible values, typically 0 and 1. The actual values are irrelevant; ±1 is an alternative popular choice. The elements
interact with each other according to given interaction rules, the coupling functions.

BOOLEAN COUPLING FUNCTION A boolean function {0, 1}^K → {0, 1} maps K boolean variables onto a single one.

The dynamics of the system is considered to be discrete, t = 0, 1, 2, . . . . The values of the variables at the next time step are determined by the choice of boolean coupling functions.

BOOLEAN NETWORK The set of boolean coupling functions interconnecting the N boolean variables can be represented graphically by a directed network, the boolean network.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 241
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_7

Fig. 7.1 A boolean network with N = 4 sites. σ1(t+1) is determined by σ2(t), σ3(t) and σ4(t), σ2(t+1) by σ1(t) and σ3(t), etc. The number Ki of controlling elements of σi is 3/2/1/1 (color coded)

In Fig. 7.1 a small boolean network is illustrated. Boolean networks at first sight seem to be quite esoteric, devoid of practical significance for real-world phenomena. Why are they then studied so intensively?

Cell Differentiation in Terms of Stable Attractors The field of boolean networks was given a first big boost by a seminal study by Kauffman in the late 1960s. Kauffman cast the problem of gene expression in terms of a gene regulation network, introducing the N–K model in this context. The question concerns the origin of cell differentiation, i.e. the fact that a skin cell differs from a muscle cell despite having identical genetic makeup. This can be due only to differences in gene activity patterns in the respective cells. Kauffman proposed that different stable attractors, viz cycles in gene expression networks, would correspond to distinct cells in the bodies of animals.

The notion would hence be that cell types correspond to dynamical states of a
complex system, i.e. of the gene expression network. This proposal by Kauffman
has received on one side strong support from experimental studies. In Sect. 7.5.2 the
case of the yeast cell division cycle will be discussed, supporting the notion that gene
regulation networks constitute the underpinnings of life. Regarding the mechanisms
of cell differentiation in multicellular organisms, the situation is, on the other side,
less clear. Cell types are mostly determined by DNA methylation, which affects the
respective gene expression on long time scales.

Boolean Networks Are Everywhere Kauffman’s original work on gene expression networks has been generalized to a wide spectrum of applications, such as, to
give a few examples, the modeling of neural networks by random boolean networks
and of the “punctuated equilibrium” in long-term evolution.1

Classical dynamical systems theory deals with dynamical systems containing a comparatively small number of variables. Dynamical systems with large
numbers of variables need in part new toolsets for analysis and control. In this

1 Punctuated equilibrium will feature in Chap. 8.



field, random boolean networks are of prototypical importance, as they provide


a well defined class of dynamical systems for which the thermodynamical limit
N → ∞ can be taken. They show chaotic as well as regular behavior, despite their
apparent simplicity, along with other typical phenomena of dynamical systems. In
the thermodynamic limit a phase transition between chaotic and regular regimes
appears, with distinct consequences for information flow. These are the issues
studied in this chapter.

N–K Networks There are several types of random boolean networks. The simplest realization is the N–K model. It is made up of N boolean variables, each variable interacting exactly with K other randomly chosen variables. The respective coupling functions are also chosen randomly from the set of all possible boolean functions mapping K boolean inputs onto a single boolean output.

An ideal realization of the N–K model in nature is not known. All real physical
or biological problems have specific couplings determined by the structure and the
physical and biological interactions of the system in question. In many instances, the topology of the couplings is complex and mostly unknown; it is then often a good starting point to model a real-world system using a generic framework, like
the N–K model.

Binary Variables Modeling real-world systems by a collection of interacting


binary variables is often a simplification, as real-world variables are mostly
continuous. For the case of the gene expression network, one just keeps two possible
states for every single gene, active or inactive.

Thresholds, viz parameter regimes at which the dynamical behavior changes


qualitatively, are widespread in biological systems. Examples are neurons, which
fire or do not fire depending on the total strength of presynaptic activity. Similar
thresholds occur in metabolic networks in the form of activation potentials for
the chemical reactions involved. Modeling real-world systems based on threshold
dynamics with binary variables is a viable first step towards an understanding.

7.2 Random Variables and Networks

Boolean networks have a rich variety of possible concrete model realizations. The
most important types are discussed in the following.

7.2.1 Boolean Variables and Graph Topologies

State Space The N binary variables

σi ∈ {0, 1},     i = 1, 2, . . . , N

constitute the state of the system Σt at time t,

Σt = {σ1(t), σ2(t), . . . , σN(t)} ,

which can be thought of as a vector pointing to one of the Ω = 2^N vertices of an N-dimensional hypercube, where Ω is the number of possible configurations. For numerical implementations and simulations, it can be useful to encode Σt as the binary representation of an integer number 0 ≤ Σt < 2^N.

Controlling Elements Time is assumed to be discrete,

σi = σi(t),     t = 1, 2, . . . .

The value of a given boolean element σi at the next time step is determined by the
values of K controlling variables.

CONTROLLING ELEMENTS The controlling elements σj1(i), σj2(i), . . . , σjKi(i) of a boolean variable σi determine its time evolution.

The next state of an element is hence a boolean function of its controlling elements,

$$\sigma_i(t+1) \,=\, f_i\big(\sigma_{j_1(i)}(t), \sigma_{j_2(i)}(t), \ldots, \sigma_{j_{K_i}(i)}(t)\big)\,. \qquad (7.1)$$

Here fi is a boolean function associated with σi. The set of controlling elements might include σi itself. Some exemplary boolean functions are listed in Table 7.1.

Table 7.1 Examples of boolean functions of three arguments. Given is a particular random
function and a canalizing function of the first argument. For the latter, the function value is 1
when σ1 = 0. If σ1 = 1, the output can be either 0 or 1. For the additive function shown, the
output is 1 (active) if at least two inputs are active. The generalized XOR is true when the number
of true-bits is odd
f (σ1 , σ2 , σ3 )
σ1 σ2 σ3 Random Canalizing Additive Gen. XOR
0 0 0 0 1 0 0
0 0 1 1 1 0 1
0 1 0 1 1 0 1
0 1 1 0 1 1 0
1 0 0 1 0 0 1
1 0 1 0 1 1 0
1 1 0 1 0 1 0
1 1 1 1 0 1 1

Fig. 7.2 Translational invariant linkages for a completely ordered one-dimensional lattice with connectivities K = 2, 4, 6

Model Definition For a complete definition of the system several parameters need
to be specified.

– CONNECTIVITY
The first step is to select the connectivity Ki of each element, i.e. the number of its controlling elements. With

$$\langle K \rangle \,=\, \frac{1}{N} \sum_{i=1}^{N} K_i$$

the average connectivity is defined. Mostly we will consider the case in which the connectivity is identical for all nodes, Ki = K.
– LINKAGES
The second step is to select the specific set of controlling elements
 
. σj1 (i) , σj2 (i) , . . . , σjKi (i)

on which the element σi depends. See Fig. 7.1 for an illustration.


– EVOLUTION RULE
The third step is to choose the boolean function fi determining the value of σi(t + 1) from the values of the linkages {σj1(i)(t), σj2(i)(t), . . . , σjKi(i)(t)}.

Geometry of the Network The way the linkages are assigned determines the
topology of the network and networks can have highly diverse topologies. It is
custom to consider two special cases.

LATTICE ASSIGNMENT The boolean variables σi are assigned to the nodes of a regular lattice. The K controlling elements σj1(i), σj2(i), . . . , σjK(i) are then chosen in a regular, translational invariant manner, as in Fig. 7.2.

Completely regular contrasts with fully random, viz uniform.

UNIFORM ASSIGNMENT For a uniform assignment, the set of controlling elements is randomly drawn from all N sites of the network. This is the case for the N–K model, the original Kauffman net. In terms of graph theory one also speaks of an Erdös–Rényi random graph.

Intermediate cases are possible. Small-world networks, to give an example, with regular short-distance links and random long-distance links, are popular models in network theory, as are scale-free topologies.2

7.2.2 Coupling Functions

Number of Coupling Functions The coupling function

$$f_i : \big(\sigma_{j_1(i)}, \ldots, \sigma_{j_K(i)}\big) \to \sigma_i$$

has 2^K different arguments. To each argument value one can assign either 0 or 1. Thus, there are a total of

$$N_f \,=\, 2^{2^K} \,=\, \begin{cases} 2 & K = 0\\ 4 & K = 1\\ 16 & K = 2\\ 256 & K = 3 \end{cases}$$

possible coupling functions. In Table 7.1 several examples are presented for the case K = 3, out of the $2^{2^3} = 256$ distinct K = 3 boolean functions.

Classification of Coupling Functions For small numbers of connectivity K one can completely classify all possible coupling functions.

– K = 0
There are only two constant functions, f = 1 and f = 0.
– K = 1
Apart from the two constant functions, forming the class A, there is the identity and the negation, lumped together into class B.

σ   Class A   Class B
0    0   1     0   1
1    0   1     1   0

2 Erdös–Rényi graphs are discussed in conjunction with small-world and scale-free networks in
Chap. 1.

– K=2
There are four classes of functions f (σ1 , σ2 ), as listed in Table 7.2, with each
class being invariant under the interchange 0 ↔ 1 in either the arguments or the
value of f .

For K = 2 the classes are

A: Constant functions.
B1: Fully canalizing functions.
B2: Normal canalizing functions, see also Table 7.1.
C: Non-canalizing functions, also denoted “reversible functions”.

For fully canalizing functions, one of the arguments determines the output deterministically.

Types of Coupling Ensembles A range of distinct choices for the probability distribution of the coupling functions is possible. Here are the most important examples.

– UNIFORM DISTRIBUTION
As introduced originally by Kauffman, the uniform distribution specifies all
possible coupling functions to occur with the same probability 1/Nf .
– MAGNETIZATION BIAS3
The probability of a coupling function to occur is proportional to p if the outcome
is 0 and proportional to 1 − p if the outcome is 1.
– FORCING FUNCTIONS
For forcing or canalizing functions, the function value is determined when one of
its arguments, say m ∈ {1, . . . , K}, has a specific value, say σm = 0. In contrast,
the function value is not specified if the forcing argument has a different value,
here when σm = 1. Compare Table 7.1.

Table 7.2 The 16 boolean functions for K = 2. For the definition of the various classes see
Sect. 7.2.2 and Aldana-Gonzalez et al. (2003)
σ1 σ2 Class A Class B1 Class B2 Class C
0 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 1 0
0 1 1 0 0 1 1 0 0 1 0 0 1 0 1 1 0 1
1 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 1
1 1 1 0 1 0 1 0 0 0 0 1 1 1 1 0 1 0

3 Magnetic moments may have only two possible directions, up or down in the language of spin-1/2

particles. A compound is hence magnetic when more moments point into one of the two possible
directions, viz if the two directions are populated unequally.

– ADDITIVE FUNCTIONS
In order to simulate the additive properties of inter-neural synaptic activities one can choose

$$\sigma_i(t+1) \,=\, \Theta\big(\tilde{f}_i(t)\big)\,, \qquad \tilde{f}_i(t) \,=\, -h + \sum_{j=1}^{N} w_{ij}\, \sigma_j(t)\,,$$

where Θ(x) is the Heaviside step function, h the threshold for activating the neuron and wij the synaptic weight connecting the pre- and post-synaptic neurons j and i. The value of σi(t + 1) depends only on a weighted sum of its controlling elements at time t.
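As a sketch (plain Python, not from the text; the convention Θ(x) = 1 for x > 0 is assumed here), the additive rule reads:

```python
def additive_update(sigma, w, h):
    """Additive (threshold) update: element i becomes active when the
    weighted sum of its inputs exceeds the threshold h."""
    return [1 if sum(w[i][j] * s for j, s in enumerate(sigma)) - h > 0 else 0
            for i in range(len(sigma))]

# three units with all-to-all weights 1 and threshold 1.5: a unit becomes
# active when at least two inputs are active, compare the additive
# function of Table 7.1
w = [[1, 1, 1] for _ in range(3)]
```

With this choice, the state [0, 1, 1] activates every unit, while [1, 0, 0] deactivates them all.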

7.2.3 Dynamics

Model Realizations A given set of linkages and boolean functions {fi} defines
what one calls a “realization” of the model, with the dynamics following from (7.1).
For the updating of the elements during a time step one has several choices.

– SYNCHRONOUS UPDATE
All variables σi(t) are updated simultaneously.
– ASYNCHRONOUS UPDATING
A single variable is updated at every step. This variable may be picked at random
or by some predefined ordering scheme. Hence also the name “serial update”.

The choice of updating does not affect thermodynamic properties, like the phase diagram discussed in Sect. 7.3.2. The occurrence and the properties of cycles and attractors, as discussed in Sect. 7.4, depend however crucially on the form of update.

Selection of the Model Realization There are several alternatives for choosing the
model realization during numerical simulations.

– QUENCHED MODEL4
One specific realization of coupling functions is selected at the beginning and
kept throughout all time.

4 An alloy made up of two or more substances is said to be “quenched” when it is cooled so

quickly that it remains stuck in a specific atomic configuration, which does not change anymore
subsequently.

– ANNEALED MODEL5
A new realization is randomly selected at each time step. Then either the linkages
or the coupling functions or both change with every update, depending on the
choice of the algorithm.
– GENETIC ALGORITHM
If the network is thought to approach a predefined goal, one may employ
a genetic algorithm in which the system slowly modifies its realization with
passing time.

Real-world systems are normally modeled by quenched systems with synchronous


updating. Interactions are fixed for all times.

Cycles and Attractors Boolean dynamics corresponds to a trajectory within a finite state space of size Ω = 2^N. Any trajectory generated by a dynamical system with immutable dynamical update rules, as for the quenched model, will eventually lead to cyclical behavior. No trajectory can generate more than Ω distinct states in a row. Once a state is revisited,

$$\Sigma_t \,=\, \Sigma_{t-T}\,, \qquad T < \Omega\,,$$

part of the original trajectory is retraced and cyclic behavior follows. The resulting
cycle acts as an attractor for a set of initial conditions.

Cycles of length one are fixpoint attractors. The fixpoint condition σi(t + 1) = σi(t) (i = 1, . . . , N) is independent of the updating rules, viz synchronous vs. asynchronous. The order of updating the individual σi is irrelevant when none of them changes.

An Example In Fig. 7.3 a network with N = 3 and K = 2 is fully defined. The time evolution of the 2^3 = 8 states Σt is given for synchronous updating. One can observe one cycle of length 2 and two cycles of length 1 (fixpoints).
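Assuming the couplings of Fig. 7.3 decode as σ1(t+1) = σ2 AND σ3, σ2(t+1) = σ1 OR σ3 and σ3(t+1) = σ1 OR σ2, the full flow over all eight states can be enumerated in a few lines (plain Python, not from the text):

```python
def net_step(state):
    """Synchronous update of the N = 3 network of Fig. 7.3:
    sigma1 <- AND(sigma2, sigma3), sigma2 <- OR(sigma1, sigma3),
    sigma3 <- OR(sigma1, sigma2)."""
    s1, s2, s3 = state
    return (s2 & s3, s1 | s3, s1 | s2)

# follow every one of the 2**3 = 8 initial states into its attractor
attractors = set()
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, seen = (a, b, c), []
            while s not in seen:
                seen.append(s)
                s = net_step(s)
            attractors.add(frozenset(seen[seen.index(s):]))  # periodic part
```

The enumeration yields the two fixpoints 000 and 111 plus the 2-cycle 001 ↔ 010, in agreement with the count stated above.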

7.3 Dynamics of Boolean Networks

We will now examine how we can characterize the dynamical state of boolean
networks in general and of N–K nets in particular. Two concepts will turn out to
be of central importance, the relation of robustness to the flow of information and
the characterization of the overall dynamical state, which we will find to be either
frozen, critical or chaotic.

5 A compound is said to be “annealed” when it has been kept long enough at elevated temperatures such that the thermodynamically stable configuration has been achieved.

Fig. 7.3 A boolean network with N = 3 sites and connectivities Ki ≡ 2. Left: Definition of the network linkage and coupling functions (σ1 receives an AND, σ2 and σ3 an OR of their respective two inputs). Right: The complete network dynamics over the eight states 000 . . . 111. Reprinted from Luque and Sole (2000) with permissions, © 2000 Elsevier Science B.V

7.3.1 Flow of Information Through a Network

Response to Changes Biological systems need to be robust. A gene regulation


network, to give an example, for which even small damage routinely results in the
death of the cell, will be at an evolutionary disadvantage with respect to a more
robust set-up. Here we will examine the sensitivity of the dynamics with regard to
initial conditions. A system is robust if two similar initial conditions lead to similar
long-time behavior. This approach is well defined for quenched models.

Hamming Distance Between Orbits For two distinct initial states,

$$\Sigma_0 = \{\sigma_1(0), \sigma_2(0), \ldots, \sigma_N(0)\}\,, \qquad \tilde{\Sigma}_0 = \{\tilde{\sigma}_1(0), \tilde{\sigma}_2(0), \ldots, \tilde{\sigma}_N(0)\}\,,$$

we are typically interested in the case when Σ0 and Σ̃0 are close, viz when they differ in the values of only a few elements. A suitable measure for the distance is the “Hamming distance” D(t) ∈ [0, N],


$$D(t) \,=\, \sum_{i=1}^{N} \big(\sigma_i(t) - \tilde{\sigma}_i(t)\big)^2\,, \qquad (7.2)$$

which is just the number of elements that differ in Σ0 and Σ̃0. As an example take

Σ1 = {1, 0, 0, 1},     Σ2 = {0, 1, 1, 0},     Σ3 = {1, 0, 1, 1} .

We have 4 for the Hamming distance Σ1–Σ2 and 1 for the Hamming distance Σ1–Σ3. If the system is robust, two close-by initial conditions will never move far apart with time passing, in terms of the Hamming distance.

Normalized Overlap The normalized overlap a(t) ∈ [0, 1] between two configurations is defined as

$$a(t) \,=\, 1 - \frac{D(t)}{N} \,=\, 1 - \frac{1}{N}\sum_{i=1}^{N} \Big(\sigma_i^2(t) - 2\,\sigma_i(t)\,\tilde{\sigma}_i(t) + \tilde{\sigma}_i^2(t)\Big) \,\approx\, \frac{2}{N}\sum_{i=1}^{N} \sigma_i(t)\,\tilde{\sigma}_i(t)\,, \qquad (7.3)$$

where we did assume the absence of any magnetization bias, namely that

$$\frac{1}{N}\sum_i \sigma_i^2 \;\approx\; \frac{1}{2} \;\approx\; \frac{1}{N}\sum_i \tilde{\sigma}_i^2\,,$$

in the last step. The normalized overlap (7.3) corresponds to a rescaled scalar
product between Σ and Σ̃. On the average, two arbitrary states have a Hamming
distance of N/2, which translates to a normalized overlap a = 1 − D/N of 1/2.
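Both quantities are one-liners in code. The sketch below (plain Python, not from the text) reproduces the Hamming distances quoted above for Σ1, Σ2 and Σ3:

```python
def hamming(a, b):
    """Hamming distance (7.2): the number of differing elements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def overlap(a, b):
    """Normalized overlap a(t) = 1 - D(t)/N of (7.3)."""
    return 1 - hamming(a, b) / len(a)

S1, S2, S3 = [1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]
```

S1 and S2 differ in all four elements (distance 4, overlap 0), S1 and S3 in a single one (distance 1, overlap 3/4).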

Information Retention for Long Times The difference between two initial states
Σ and Σ̃ can also be interpreted as an information for the system. One then has two
possible behaviors.

– LOSS OF INFORMATION
limt→∞ a(t) → 1, which implies that two states are identical in the thermody-
namic limit, or that they differ only by a finite number of elements. This happens
when two states are attracted by the same cycle. All information about the starting
states is lost.
– INFORMATION RETENTION
limt→∞ a(t) = a ∗ < 1, which means that the system ‘remembers’ that the
two configurations were initially different, with the difference measured by the
respective Hamming distance.

The system is robust when information is routinely lost, which holds for a ∗ = 1.
Robustness depends on the value of a ∗ when information is kept. If a ∗ > 0 then two
trajectories retain a certain similarity for all time scales.

Percolation of Information for Short Times Above we discussed how information present in initial states evolves over extended time scales. Alternatively one may ask a typical question of dynamical systems theory, namely how information is processed for short times. We write

$$D(t) \,\approx\, D(0)\, e^{\lambda t}\,, \qquad (7.4)$$

where 0 < D(0) ≪ N is the initial Hamming distance, with λ being the Lyapunov
exponent.6

The question is then whether two initially close trajectories converge or diverge
initially. One may generally distinguish between three different types of behaviors
or phases.

– CHAOTIC PHASE
λ > 0 : The Hamming distance grows exponentially, i.e. information is
transferred to an exponentially large number of elements. Two initially close
orbits soon become different. This behavior, found for large connectivities K,
is not suitable for real-world biological systems.
– FROZEN PHASE
λ < 0 : Two close trajectories typically converge, as they are attracted by the
same attractor. This behavior arises for small connectivities K. The system is
locally robust.
– CRITICAL PHASE
λ = 0 : When present, exponential time dependencies dominate all other contri-
butions. There is no exponential time dependence when the Lyapunov exponent
λ vanishes. In this case, the Hamming distance will depend algebraically on time,
D(t) ∝ t γ .

All three phases can be found in the N–K model when N → ∞. This is our next
step.

7.3.2 Mean-Field Phase Diagram

Within a mean-field approach, aka “molecular-field theory”, a microscopic model is treated by averaging the influence of distinct components, lumping them together into a single molecular field. Mean-field theories are ubiquitous.7

Mean-Field Theory We recall that we are working with two initial states,

$$\Sigma_0\,, \qquad \tilde{\Sigma}_0\,, \qquad D(0) \,=\, \sum_{i=1}^{N} \big(\sigma_i - \tilde{\sigma}_i\big)^2\,,$$

and that the Hamming distance D(t) measures the number of elements differing in Σt and Σ̃t. An illustration of the following arguments is presented in Fig. 7.4.

6 Stabilityin terms of Lyapunov exponents is treated in Sect. 2.2 of Chap. 2.


7A reference molecular-field framework is the Landau theory of phase transitions, which is
discussed in detail in Sect. 6.1 of Chap. 6.

Fig. 7.4 The time evolution of the overlap between two states Σt and Σ̃t (left/right panels). The vertices (squares) have values 0 or 1. Vertices with identical values in both states, Σt and Σ̃t, are highlighted (gray background). The values of vertices at the next time step, t + 1, can only differ if the corresponding arguments are not identical. It is indicated whether vertices at time t + 1 must have the same value in both states (grey), or whether they can be different (star)

– For the N–K model, every boolean coupling function fi is as likely to occur.
– On average, variables are controlling elements for K other variables.
– The variables differing in Σt and Σ̃t affect on the average KD(t) coupling functions.
– In the absence of a magnetization bias, coupling functions change their value with probability 1/2.

Taken together, we conclude that the number of elements differing in Σt+1 and Σ̃t+1, viz the Hamming distance D(t + 1), is given by

$$D(t+1) \,=\, \frac{K}{2}\, D(t)\,, \qquad D(t) \,=\, \left(\frac{K}{2}\right)^{\!t} D(0) \,=\, D(0)\, e^{t \ln(K/2)}\,, \qquad (7.5)$$

which identifies λ = ln(K/2) as the local Lyapunov exponent.

Classification of Phases The connectivity K then determines the phase of the N–K
network.

– CHAOTIC
K > 2 : Two initially close orbits diverge; the number of different elements, i.e. the relative Hamming distance, grows exponentially with time t.
– FROZEN
K < 2 : The two orbits approach each other exponentially. All initial information contained in D(0) is lost.
– CRITICAL
K = Kc = 2 : The evolution of Σt relative to Σ̃t is driven by fluctuations. The power laws typical for critical regimes cannot be deduced within mean-field theory, which discards fluctuations.

The mean-field theory takes only average quantities into account. The evolution law D(t + 1) = (K/2) D(t) holds only on the average. Fluctuations, viz the deviation of the evolution from the mean-field prediction, are however of importance only close to a phase transition, i.e. close to the critical point K = 2.
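The one-step evolution law D(t+1) = (K/2) D(t) can be tested directly: flip a single element of a random quenched realization, update both configurations synchronously, and average the resulting Hamming distance over many realizations. A minimal sketch (plain Python; parameter values illustrative):

```python
import random

def one_step_growth(N=500, K=3, trials=300, seed=0):
    """Average Hamming distance D(1) after one synchronous update of two
    configurations that initially differ in a single element; the
    mean-field prediction is D(1) = (K/2) D(0) = K/2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        links = [rng.sample(range(N), K) for _ in range(N)]
        tables = [[rng.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]
        sigma = [rng.randint(0, 1) for _ in range(N)]
        tilde = list(sigma)
        tilde[rng.randrange(N)] ^= 1            # flip one element: D(0) = 1
        def step(s):
            return [tables[i][sum(s[j] << b for b, j in enumerate(links[i]))]
                    for i in range(N)]
        s1, s2 = step(sigma), step(tilde)
        total += sum(a != b for a, b in zip(s1, s2))
    return total / trials
```

On average K elements read the flipped site, and each of their random truth tables changes its output with probability 1/2, so the estimate comes out close to K/2 = 1.5 for K = 3.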

The mean-field approximation generally works well for lattice physical systems
in high spatial dimensions and fails in low dimensions. The Kauffman network has
no dimension per se, but the connectivity K plays an analogous role.

Phase Transitions in Dynamical Systems and the Brain The notion of a ‘phase
transition’ originally comes from physics, where it denotes the transition between
two or more different physical phases, like ice, water and gas, which are well
characterized by their respective order parameters.

Classically, the term phase transition therefore denotes a transition between two stationary states. The phase transition discussed here involves the characterization of the overall behavior of a dynamical system. It is a well defined phase transition in the sense that 1 − a* plays the role of an order parameter; its value uniquely characterizes the frozen phase and the chaotic phase in the thermodynamic limit.
An interesting, completely open and unresolved question is in this context
whether dynamical phase transitions play a role in the most complex information
processing system known, the mammalian brain. It is tempting to speculate,
e.g., that the phenomenon of consciousness may result from a dynamical state
characterized by a yet unknown order parameter. In case, consciousness would be
an ‘emergent’ state.

7.3.3 Bifurcation Phase Diagram

In deriving (7.5), we assumed that the coupling functions fi of the system acquire the values 0 and 1 with identical probabilities p = 1/2. We generalize this approach to the case of a magnetic bias, as defined by

    fi = 0 with probability p,    fi = 1 with probability 1 − p .

For a given value of the bias p and connectivity K there are critical values

    Kc(p),   pc(K) ,

such that for K < Kc (K > Kc) the system is in the frozen phase (chaotic phase). Vice versa, keeping the connectivity fixed and varying p, the critical bias pc(K) separates the chaotic and the frozen phase.

Evolution of the Overlap We note that the overlap .a(t) = 1 − D(t)/N between
two states .Σt and .Σ̃t at time t is the probability that two vertices have the same
value both in .Σt and in .Σ̃t .
7.3 Dynamics of Boolean Networks 255

Fig. 7.5 Solution of the self-consistency condition a* = 1 − [1 − (a*)^K]/Kc, see (7.9). Left: graphical solution, equating both sides of (7.9). Right: numerical result for a* for Kc = 3. The fixpoint a* = 1 becomes unstable for K > Kc = 3

– The probability that all arguments of the function fi are the same in both configurations is

      ρK = [a(t)]^K .                                                    (7.6)

– As illustrated by Fig. 7.4, the values at the next time step differ with probability 2p(1 − p), but only if the arguments of the coupling functions are not identical in the two configurations.
– The probability that at least one controlling element has different values in Σt and Σ̃t is 1 − ρK. This gives the probability (1 − ρK) 2p(1 − p) of the values being different at the next time step.

We then have

    a(t + 1) = 1 − (1 − ρK) 2p(1 − p) = 1 − (1 − [a(t)]^K)/Kc ,          (7.7)

where Kc is given in terms of p as

    Kc = 1/[2p(1 − p)] ,    pc^{1,2} = 1/2 ± √(1/4 − 1/(2K)) .           (7.8)

The fixpoint a* of (7.7) obeys

    a* = 1 − (1 − [a*]^K)/Kc .                                           (7.9)

This self-consistency condition for the normalized overlap can be solved graphically
or numerically by simple iterations, compare Fig. 7.5.
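A minimal numerical sketch of this iteration (assuming p = 1/2, i.e. Kc = 2; for K = 3 the fixpoint can even be checked analytically, since a³ − 2a + 1 = 0 gives a* = (√5 − 1)/2):

```python
def iterate_overlap(K, Kc=2.0, a0=0.5, steps=2000):
    """Iterate a(t+1) = 1 - (1 - a(t)**K)/Kc, Eq. (7.7), to its fixpoint a*."""
    a = a0
    for _ in range(steps):
        a = 1.0 - (1.0 - a**K) / Kc
    return a

# frozen phase, K < Kc: the overlap returns to unity
print(iterate_overlap(K=1))   # -> 1.0 (up to numerical precision)
# chaotic phase, K > Kc: stable fixpoint a* < 1
print(iterate_overlap(K=3))   # -> (sqrt(5)-1)/2, about 0.618
```

The iteration converges whenever the slope of the right-hand side of (7.9) at the fixpoint is smaller than unity in magnitude, which is the case here.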

Stability Analysis The trivial fixpoint

    a* = 1

always constitutes a solution of (7.9). We examine its stability under the time evolution, Eq. (7.7), by considering a small deviation 0 < Δa_t ≪ 1 from the fixpoint solution, a_t = a* − Δa_t:

    1 − Δa_{t+1} = 1 − (1 − [1 − Δa_t]^K)/Kc ,    Δa_{t+1} ≈ K Δa_t / Kc .   (7.10)

The trivial fixpoint a* = 1 therefore becomes unstable for K/Kc > 1, viz when K > Kc = [2p(1 − p)]^{−1}.

Bifurcation Equation (7.9) has two solutions for K > Kc: a stable fixpoint a* < 1 and the unstable solution a* = 1. One speaks of a bifurcation, which is shown in Fig. 7.5. We note that

    Kc |_{p=1/2} = 2 ,

in agreement with our previous mean-field result, Eq. (7.5). For large connectivities one finds

    lim_{K→∞} a* = lim_{K→∞} [1 − (1 − [a*]^K)/Kc] = 1 − 1/Kc = 1 − 2p(1 − p) ,

since a* < 1 for K > Kc, compare Fig. 7.5. Note that a* = 1/2 for p = 1/2 corresponds to the average normalized overlap of two completely unrelated states in the absence of a magnetization bias. Two initially similar states hence become completely uncorrelated for t → ∞ when the connectivity K diverges.
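The large-K limit a* → 1 − 2p(1 − p) is easy to verify numerically; a short sketch (the values p = 0.5, 0.3 and K = 500 are arbitrary illustrative choices):

```python
def fixpoint(K, p, steps=1000, a0=0.5):
    """Fixpoint of a(t+1) = 1 - 2p(1-p)(1 - a**K),
    i.e. Eq. (7.7) with Kc = 1/(2p(1-p))."""
    a = a0
    for _ in range(steps):
        a = 1.0 - 2*p*(1 - p) * (1.0 - a**K)
    return a

for p in (0.5, 0.3):
    # for K large, a**K is negligible and a* approaches 1 - 2p(1-p)
    print(p, fixpoint(K=500, p=p))
```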

Rigidity of the Kauffman Net We can connect the results for the phase diagram
of the N–K network illustrated in Fig. 7.6 with our discussion on robustness, see
Sect. 7.3.1.

– CHAOTIC PHASE
  K > Kc: The infinite-time normalized overlap a* is less than unity even when two trajectories Σt and Σ̃t start out very close to each other. However, a* always remains above the value expected for two completely unrelated states. The reason is that the Hamming distance remains constant, modulo small-scale fluctuations that do not contribute in the thermodynamic limit N → ∞, once the two orbits have entered two distinct attractors.

Fig. 7.6 Phase diagram for the N–K model in the K–p plane. Shown is the separation Kc = [2p(1 − p)]^{−1} between the chaotic and the ordered/frozen phase (solid line). The insets are simulations for N = 50 networks with K = 3 and p = 0.60 (chaotic phase), p = 0.79 (on the critical line) and p = 0.90 (frozen phase). The site index runs horizontally, the time vertically. Note the fluctuations for p = 0.79. Reprinted from Luque and Sole (2000) with permissions, © 2000 Elsevier Science B.V

– FROZEN PHASE
  K < Kc: The infinite-time overlap a* is exactly one. All trajectories approach essentially the same configuration, independently of the starting point, apart from fluctuations that vanish in the thermodynamic limit. The system is said to "order".

In the frozen phase, close-by orbits are attracted by the same cyclic attractor; in the chaotic state, by different attracting states.

Lattice Versus Random Networks The complete loss of information in the ordered phase observed for the Kauffman net does not occur for lattice networks, for which a* < 1 for any K > 0. The finite range of the linkages in lattice systems allows information about the initial data to be stored within spatially finite subsections of the system, specific to the initial state. For Kauffman graphs, every region of the network is equally close to any other, and local storage of information is impossible.

Percolation Transition in Lattice Networks For lattice boolean networks, the frozen and chaotic phases cannot be distinguished by examining the value of the long-term normalized overlap a*, as it is always smaller than unity. The lattice topology allows, however, a connection with percolation theory. One considers a finite system, e.g. a 100 × 100 square lattice, and two states Σ0 and Σ̃0 that differ only along one edge.

Fig. 7.7 Normalized Hamming distance D(t)/N for a Kauffman net with N = 10,000 variables, connectivity K = 4 and D(0) = 100, viz D(0)/N = 0.01. Top: frozen (p = 0.05), critical (pc ≃ 0.1464) and chaotic (p = 0.4) phases, plotted on a logarithmic scale. Bottom: Hamming distance at criticality (p = pc), on a linear scale. Reprinted from Aldana-Gonzalez et al. (2003) with permissions, © 2003 Springer-Verlag New York, Inc

– If the damage, viz the difference between Σt and Σ̃t, spreads over long times to the opposite edge, then the system is said to be percolating and in the chaotic phase.
– If the damage never reaches the opposite edge, then the system is in the frozen phase.

Numerical simulations indicate a critical pc ≃ 0.298 for the two-dimensional square lattice with connectivity K = 4.

Numerical Simulations The results of the mean-field solution for the Kauffman net are confirmed by numerical solutions of finite-size networks. In Fig. 7.7 the normalized Hamming distance, D(t)/N, is plotted for a Kauffman graph with connectivity K = 4 containing N = 10,000 elements. The results are shown for parameters corresponding to the frozen phase and to the chaotic phase, in addition to a parameter close to the critical line. Note that 1 − a(t) = D(t)/N → 0 in the frozen phase.

7.3.4 Scale-Free Boolean Networks

The Kauffman model is a reference model which can be generalized in various ways,
e.g. by considering small-world or scale-free networks.

Scale-Free Connectivity Distributions Scale-free connectivity distributions,8

    P(K) = K^{−γ}/ζ(γ) ,    ζ(γ) = Σ_{K=1}^{∞} K^{−γ} ,    γ > 1 ,       (7.11)

abound in real-world networks. Here P(K) denotes the probability to draw a coupling function fi(·) having K arguments. The distribution (7.11) is normalizable for γ > 1. The average connectivity 〈K〉 is

    〈K〉 = Σ_{K=1}^{∞} K P(K) = ∞  for 1 < γ ≤ 2 ,
    〈K〉 = ζ(γ − 1)/ζ(γ) < ∞  for γ > 2 ,                                (7.12)

where ζ(γ) is the Riemann zeta function.
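Eq. (7.12) can be evaluated with truncated zeta sums; a quick sketch (γ = 3 is an arbitrary example; the truncation is adequate for exponents of two or larger):

```python
def zeta(s, terms=500_000):
    """Truncated Riemann zeta function; adequate here for s >= 2."""
    return sum(k**-s for k in range(1, terms + 1))

gamma = 3.0
mean_K = zeta(gamma - 1) / zeta(gamma)   # <K> = zeta(gamma-1)/zeta(gamma)
print(mean_K)                            # -> about 1.3684
```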

Annealed Approximation As before, we examine two states Σt and Σ̃t together with the respective normalized overlap,

    a(t) = 1 − D(t)/N ,

which is identical to the probability that two vertices in Σ and Σ̃ have the same value. For a magnetization bias p we derived in Sect. 7.3.3 that

    a(t + 1) = 1 − (1 − ρK) 2p(1 − p)                                    (7.13)

holds for the time evolution of a(t), where

    ρK = [a(t)]^K  →  Σ_{K=1}^{∞} [a(t)]^K P(K)                          (7.14)

is the average probability that the K = 1, 2, . . . controlling elements of the coupling function fi() are all identical. In (7.14) we generalized (7.6) to a non-constant connectivity distribution P(K), which leads to

    a(t + 1) = 1 − 2p(1 − p) [1 − Σ_{K=1}^{∞} a^K(t) P(K)] ≡ F(a(t)) ,   (7.15)

compare (7.7). The statistical averaging used in (7.14) corresponds effectively to an annealed model.

8 The theory of scale free graphs is developed in Sect. 1.6 of Chap. 1.



Fixpoints Within the Annealed Approximation In the limit t → ∞, Eq. (7.15) reduces to the self-consistency equation

    a* = F(a*)

for the fixpoint a*, where F(a) is defined as the right-hand side of (7.15). Again, a* = 1 is always a fixpoint of (7.15), since Σ_K P(K) = 1 per definition.
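The fixpoint a* = F(a*) is again found by simple iteration. A minimal sketch, assuming a power-law distribution P(K) ∝ K^(−γ) truncated at some kmax, and a bias p = 1/2:

```python
def annealed_fixpoint(gamma, p=0.5, kmax=2000, steps=300, a0=0.5):
    """Iterate a(t+1) = F(a(t)), Eq. (7.15), for P(K) ~ K**(-gamma) (truncated)."""
    weights = [k**-gamma for k in range(1, kmax + 1)]
    norm = sum(weights)
    a = a0
    for _ in range(steps):
        s = sum(w * a**k for k, w in enumerate(weights, start=1)) / norm
        a = 1.0 - 2*p*(1 - p) * (1.0 - s)
    return a

print(annealed_fixpoint(gamma=3.0))   # frozen: iterates back to a* = 1
print(annealed_fixpoint(gamma=2.2))   # chaotic: stable fixpoint a* < 1
```

The truncation at kmax is harmless for the fixpoint itself, since the factor a^K suppresses large-K contributions whenever a < 1.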

Stability of the Trivial Fixpoint We repeat the stability analysis of the trivial fixpoint a* = 1, as in Sect. 7.3.3, and assume a small deviation Δa > 0 from a*:

    a* − Δa = F(a* − Δa) = F(a*) − F′(a*)Δa ,    Δa = F′(a*)Δa .

The fixpoint a* becomes unstable if F′(a*) > 1. The critical point is determined by

    1 = lim_{a→1⁻} dF(a)/da = 2p(1 − p) Σ_{K=1}^{∞} K P(K) = 2p(1 − p) 〈K〉 .   (7.16)

For the classical N–K model all elements have the same connectivity, Ki = 〈K〉 = K, with (7.16) reducing to (7.10).

Frozen and Chaotic Phases for the Scale-Free Model For 1 < γ ≤ 2 the average connectivity is infinite, see (7.12). F′(1) = 2p(1 − p)〈K〉 is then always larger than unity and a* = 1 unstable, as illustrated in Fig. 7.8. Equation (7.15) then has a stable fixpoint a* ≠ 1; the system is in the chaotic phase for all p ∈ ]0, 1[.

For γ > 2 the first moment of the connectivity distribution P(K) is finite and the phase diagram is identical to that of the N–K model shown in Fig. 7.6, with K replaced by ζ(γc − 1)/ζ(γc). The phase diagram in γ–p space is presented in Fig. 7.8. One finds that γc ∈ [2, 2.5] for any value of p; there is no chaotic scale-free network for γ > 2.5. It is interesting to note that γ ∈ [2, 3] for many real-world scale-free networks.
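The critical line γc(p) follows from the condition 2p(1 − p) ζ(γc − 1)/ζ(γc) = 1, which a bisection solves directly; a sketch (the Euler–Maclaurin tail correction keeps the truncated zeta accurate for arguments close to one):

```python
def zeta(s, M=10_000):
    """Riemann zeta via truncated sum plus leading Euler-Maclaurin tail terms."""
    return sum(k**-s for k in range(1, M + 1)) + M**(1 - s)/(s - 1) - 0.5*M**-s

def gamma_c(p, lo=2.01, hi=2.5, tol=1e-6):
    """Critical exponent: bisect 2p(1-p) zeta(g-1)/zeta(g) = 1 in g.
    The left-hand side decreases with g, so f > 0 means 'still chaotic'."""
    f = lambda g: 2*p*(1 - p)*zeta(g - 1)/zeta(g) - 1.0
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5*(lo + hi)

print(gamma_c(0.5))   # close to 2.5, the maximal critical exponent
print(gamma_c(0.7))   # smaller, bias-dependent gamma_c
```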

7.4 Cycles and Attractors

We emphasized so far the general properties of boolean networks, such as the phase
diagram. We now turn to a more detailed inspection of the dynamics, in particular
regarding the structure of the attractors.

Fig. 7.8 Phase diagram, in the γ–p plane, for a scale-free boolean network with connectivity distribution ∝ K^{−γ}, as given by (7.16). The average connectivity diverges for γ < 2 and the network is then chaotic for all magnetization bias p; an ordered phase appears for larger exponents γ

7.4.1 Quenched Boolean Dynamics

Self-Retracting Orbits From now on we consider quenched systems, for which the coupling functions fi(σi1, . . . , σiK) are fixed for all times. Orbits eventually retrace themselves, at least partly, since the state space Ω = 2^N is finite. Long-term trajectories are therefore cyclic.

ATTRACTOR An attractor A0 of a discrete dynamical system is a region {Σt} ⊂ Ω in phase space that maps completely onto itself under the time evolution, At+1 = At ≡ A0.

In practice, this means that

    Σ(1) → Σ(2) → . . . → Σ(1) ,

see Figs. 7.3 and 7.9 for some examples. Fixed points are cycles of length 1.

ATTRACTION BASIN The attraction basin B of an attractor A0 is the set {Σt } ⊂ Ω for
which there is a time T < ∞ such that ΣT ∈ A0 .

For randomly drawn initial conditions, the probability of ending up in a given cycle is directly proportional to the size of its basin of attraction. The three-site network illustrated in Fig. 7.3 is dominated by the fixpoint {1, 1, 1}, which is reached with probability 5/8 for random initial starting states.
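For small N, attractors and basin sizes can be enumerated exhaustively by following every initial state to its cycle. A minimal sketch (the three-site loop with identity couplings is chosen for illustration; it is not the network of Fig. 7.3):

```python
def attractors(step, N):
    """Map every state of {0,1}^N (bit-coded integers) to its attractor;
    return a dict {attractor (frozenset): basin size}."""
    basins = {}
    for s0 in range(2**N):
        index, seq, s = {}, [], s0
        while s not in index:          # follow the orbit until it repeats
            index[s] = len(seq)
            seq.append(s)
            s = step(s)
        cycle = frozenset(seq[index[s]:])
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins

def rotate(s):
    """Three-site loop, all couplings the identity: (A,B,C) -> (C,A,B)."""
    a, b, c = s & 1, (s >> 1) & 1, (s >> 2) & 1
    return c | (a << 1) | (b << 2)

print(attractors(rotate, 3))   # two fixpoints and two 3-cycles
```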

Attractors are Everywhere Attractors and fixpoints are generic features of dynamical systems and core to their characterization, as they dominate the time evolution in state space within their respective basins of attraction. Random boolean networks allow for detailed studies of the structure of attractors and of

Fig. 7.9 Cycles and linkages. Left: sketch of the state space, where every bold point stands for a state Σt = {σ1, . . . , σN}. The state space decomposes into distinct attractor basins for each cyclic or fixpoint attractor. Right: linkage loops for an N = 20 model with K = 1. The controlling elements are listed in the center column. Each arrow points from the controlling element toward the direct descendant. There are three modules of uncoupled variables. Reprinted from Aldana-Gonzalez et al. (2003) with permissions, © 2003 Springer-Verlag New York, Inc

the connection to network topology. Of special interest is how various properties of the attractors, like the cycle length and the size of the attractor basins, relate to the thermodynamic differences between the frozen and the chaotic phase. These are the issues that we shall discuss now.

Linkage Loops, Ancestors and Descendants Any variable σi may appear as an argument in the coupling functions of other elements; it is then said to act as a controlling element. Graphically, the collection of all linkages corresponds to a directed graph, as illustrated in Figs. 7.1, 7.3 and 7.9, with the vertices representing the individual binary variables. A given element σi can potentially influence a large number of other elements during the continued time evolution.

ANCESTORS AND DESCENDANTS The elements that a vertex affects consecutively via the coupling functions are its descendants. Going backwards in time, one finds the ancestors of an element.

In the 20-site network illustrated in Fig. 7.9 the descendants of σ11 are σ11 , σ12
and σ14 .
When an element is its own descendant (and ancestor) it is said to be part of a
“linkage loop”. Different linkage loops can overlap, as is the case for the linkage
loops

.σ1 → σ2 → σ3 → σ4 → σ1 , σ1 → σ2 → σ3 → σ1

shown in Fig. 7.1. Linkage loops are disjoint for K = 1, compare Fig. 7.9.

Modules and Time Evolution The set of ancestors and descendants determines
the overall dynamical dependencies.

MODULE The collection of all ancestors and descendants of a given element σi is the
module (or component) to which σi belongs.

If we go through all variables σi, i = 1, . . . , N, we find all modules, with every element belonging to one and only one specific module. Otherwise stated, disjoint modules correspond to disjoint subgraphs, the set of all modules making up the full linkage graph. The time evolution is block-diagonal in terms of modules; for all times t, an element σi(t) is independent of all variables not belonging to its own module.
In lattice networks the clustering coefficient, viz the density of three-site loops, is large, and closed linkage loops occur frequently. For big lattice systems with a small mean linkage K we expect far-away spatial regions to evolve independently, due to the lack of long-range connections.
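Modules are exactly the weakly connected components of the linkage graph, which a union-find pass extracts directly; a sketch (the example linkage table below is made up for illustration):

```python
def modules(parents):
    """Group nodes into modules: connected components of the (undirected)
    linkage graph; parents[i] lists the controlling elements of node i."""
    root = list(range(len(parents)))
    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path compression
            i = root[i]
        return i
    for i, ps in enumerate(parents):
        for j in ps:
            root[find(i)] = find(j)   # union node i with each of its parents
    comps = {}
    for i in range(len(parents)):
        comps.setdefault(find(i), set()).add(i)
    return sorted(comps.values(), key=min)

# two linkage loops 0->1->2->0 and 3->4->3, with node 5 enslaved by node 0
print(modules([[2], [0], [1], [4], [3], [0]]))   # -> [{0, 1, 2, 5}, {3, 4}]
```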

Relevant Nodes and Dynamic Core Taking a look at the dynamics of the 20-site model illustrated in Fig. 7.9, we notice that the elements σ12 and σ14 just follow the dynamics of σ11; they are "enslaved" by σ11. These two elements do not control any other element, and one can delete them from the system without qualitative changes to the overall dynamics.

RELEVANT NODE A node is termed relevant if its state is not constant and if it controls at least one other relevant element (possibly itself).

An element is constant if it evolves, independently of the initial conditions, always to the same state, and is not constant otherwise. The set of relevant nodes, the dynamic core, controls the overall dynamics. The dynamics of all other nodes can
dynamic core, controls the overall dynamics. The dynamics of all other nodes can
be disregarded without changing the attractor structure. The node σ13 of the 20-site
network illustrated in Fig. 7.9 is relevant if the boolean function connecting it to
itself is either the identity or the negation.
The concept of a dynamic core is relevant for practical applications. Gene
expression networks may be composed of thousands of nodes, with a relatively
small dynamic core controlling overall network dynamics. This is the case for the
gene regulation network controlling the yeast cell cycle discussed in Sect. 7.5.2.

Lattice Nets vs. Kauffman Nets Linkages are short-ranged for lattice systems, and whenever a given element σj acts as a controlling element for another element σi, there is a high probability that the reverse also holds, viz that σi is an argument of fj.

The linkages are generally non-reciprocal for the Kauffman net; the probability
for reciprocity is K/N, which vanishes in the thermodynamic limit for finite K. The
number of disjoint modules in a random network therefore grows more slowly than
the system size. For lattice systems, on the other hand, the number of modules is
proportional to the size of the system. The differences between lattice and Kauffman
networks translate to different cycle structures, as every periodic orbit for the full
system is constructed out of the individual attractors of all modules present in the
network.

7.4.2 K = 1 Kauffman Network

We start our discussion of the cycle structure of Kauffman nets with the case K = 1, which can be solved exactly. The maximal length lmax of a linkage loop is on average of the order of

    lmax ∼ N^{1/2} .                                                     (7.17)

At each of the l steps of a sequence of length l, the next node σi could be one of the l sites visited previously. The typical loop length is hence reached when the probability to close, ∼ l(l/N), becomes of order one, which leads to (7.17). This derivation can be made rigorous along the line of arguments we will develop in Sect. 7.4.4 for large-K nets.

Three-Site Linkage Loop with Identities Linkage loops determine the cycle structure together with the choice of the coupling ensemble. As an example we take the case N = 3.

For K = 1 there are only two non-constant coupling functions, i.e. the identity I and the negation ¬. When all three coupling functions are the identity, we have

    ABC → CAB → BCA → ABC → . . . ,

where we denote with A, B, C the values of the binary variables σi, i = 1, 2, 3. There are two cycles of length one, in which all elements are identical. When the three elements are not identical, the cycle length is three. The complete dynamics is then

    000 → 000 ,   111 → 111 ,
    100 → 010 → 001 → 100 ,   011 → 101 → 110 → 011 ,

as illustrated in Fig. 7.10.



Fig. 7.10 A K = 1 linkage loop with N = 3 sites and identities as boolean coupling functions

Three-Site Linkage Loops with Negations Let us consider now the case that all three coupling functions are negations:

    ABC → C̄ĀB̄ → BCA → ĀB̄C̄ → . . . ,    Ā = ¬A, etc.

The cycle length is two if all elements are identical,

    000 → 111 → 000 ,

and of length 6 if they are not,

    100 → 101 → 001 → 011 → 010 → 110 → 100 .

The complete state space Ω = 2³ = 8 decomposes into two cycles, one of length 6 and one of length 2.
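Both cycle lengths are readily verified by direct simulation; a brief sketch:

```python
def neg_rotate(state):
    """Three-site loop with all couplings negations: (A,B,C) -> (notC, notA, notB)."""
    A, B, C = state
    return (1 - C, 1 - A, 1 - B)

def cycle_length(state, step):
    """Period of `state`; all states lie on cycles here, the map being invertible."""
    s, n = step(state), 1
    while s != state:
        s, n = step(s), n + 1
    return n

print(cycle_length((0, 0, 0), neg_rotate))   # -> 2
print(cycle_length((1, 0, 0), neg_rotate))   # -> 6
```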

Three-Site Linkage Loops with a Constant Function Let us see what happens if any of the coupling functions is constant. For illustration purposes we consider the case of two constant functions, 0 and 1, together with the identity,

    ABC → 0A1 → 001 → 001 .                                              (7.18)

Generally it holds that the cycle length is one whenever one of the coupling functions in the loop is constant; in this case only a single fixpoint attractor exists. Equation (7.18) holds for all A, B, C ∈ {0, 1}; the basin of attraction for 001 is hence the whole state space, with 001 being a global attractor.

The Kauffman net contains very large linkage loops for K = 1, see (7.17). The probability that a given linkage loop contains at least one constant function is consequently very high, which implies that the average cycle length remains short.

Loops and Attractors Attractors are made up of the set of linkage loops, which we illustrate by means of a five-site network with two linkage loops,

    A → B → C → A ,    D → E → D ,

with all coupling functions being the identity I. The states

    00000, 00011, 11100, 11111

are fixpoints in phase space Σ = ABCDE. Examples of cyclic attractors, of length 3 and 6, are

    10000 → 01000 → 00100 → 10000

and

    10010 → 01001 → 00110 → 10001 → 01010 → 00101 → 10010 .

In general, the length of an attractor is given by the least common multiple of the periods of the constituent loops. This relation holds for K = 1 boolean networks; for general K the attractors are composed of the cycles of the constituent set of modules.
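The least-common-multiple rule is easily checked for the five-site example above; a short sketch:

```python
from math import lcm

def step(state):
    """Loops A->B->C->A and D->E->D, all couplings the identity."""
    A, B, C, D, E = state
    return (C, A, B, E, D)

def period(state):
    """Length of the cycle that `state` lies on."""
    s, n = step(state), 1
    while s != state:
        s, n = step(s), n + 1
    return n

print(period((0, 0, 0, 0, 0)))   # fixpoint           -> 1
print(period((1, 0, 0, 0, 0)))   # 3-loop active      -> 3
print(period((1, 0, 0, 1, 0)))   # both loops active  -> lcm(3, 2) = 6
```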

Critical K = 1 Boolean Networks When the coupling ensemble is selected uniformly, as defined in Sect. 7.2.2, the K = 1 network is in the frozen state. If we do however restrict our coupling ensemble, allowing only the identity and the negation, the value of one node is just copied or inverted and passed to exactly one other node. Information is retained, and not lost, when the two constant K = 1 coupling functions are absent. The information is not multiplied either, being transmitted to exactly one node and not more. The network is hence critical in terms of the framework developed in Sect. 7.3.1.

7.4.3 K = 2 Kauffman Network

The K = 2 Kauffman net is critical, as discussed in Sects. 7.3.1 and 7.3.2. When physical systems undergo a second-order phase transition, response functions become scale free right at the point of transition, following power laws. It is therefore natural to expect the same for critical dynamical systems, such as a random boolean network.

Initially, this expectation was borne out by a series of mostly numerical investigations, which indicated that both the typical cycle lengths and the mean number of different attractors grow algebraically with N, namely like √N. It was therefore tempting to relate power laws seen in natural organisms to the behavior of critical random boolean networks.

Undersampling of State Space However, the problem of determining the number and the length of cycles turned out to be numerically exceedingly demanding. In order to extract power laws one has to simulate systems with large N, which implies dealing with an exponentially growing state space Ω = 2^N. An exhaustive enumeration of all cycles is hence impossible and one has to resort to a weighted sampling of state space for a given network realization, extrapolating subsequently from the small fraction of states sampled to the full state space. This method yielded the √N dependence referred to above.

As a matter of principle, weighted sampling may however undersample state space. The number of cycles found in the sampled part of state space may not be representative of the overall number of cycles, as there might be small fractions of state space with a very high number of attractors that dominate the total count. This is indeed the case. One can prove rigorously that the number of attractors grows faster than any power for the K = 2 Kauffman net. However, one may still argue that results for the "average state space" are relevant for biological applications, as biological systems are not too big anyway. Hormone and gene regulation networks of mammals contain of the order of 100 and 20,000 elements, respectively.

Observational Scale Invariance Experimental observations of a dynamical system are typically equivalent to a random sampling of its phase space. Experimental results will hence reflect the properties of the attractors with large basins of attraction, which dominate phase space. For the case of the K = 2 Kauffman net, an external observer would hence find scale invariance. Given that the response of the system to a random perturbation will also involve the dominating attractors, one may consider the K = 2 net as "effectively scale free".

7.4.4 K = N Kauffman Network

Within mean-field theory, which holds for the fully connected .K = N network,
one can evaluate the average number and the length of cycles using probability
arguments.

Random Walk Through Configuration Space We examine orbits starting from an arbitrary configuration Σ0 at time t = 0. The time evolution generates a series of states

    Σ0, Σ1, Σ2, . . .

through configuration space, which has size Ω = 2^N. For a large connectivity, as in the limit K = N, one can assume the Σt to be uncorrelated, forming a random walk.

Closing the Random Walk The walk through configuration space continues until it hits a previously visited point, as illustrated in Fig. 7.11. We define

– qt: the probability that the trajectory remains unclosed after t steps;
– Pt: the probability of terminating the excursion exactly at time t.

Including .Σ0 and .Σt , .t + 1 different sites have been visited if the trajectory is still
open at time t. Therefore, there are .t + 1 ways of terminating the walk at the next
time step and the relative probability of termination is .ρt = (t + 1)/Ω. The overall

Fig. 7.11 A random walk in configuration space. The relative probability ρt = (t + 1)/Ω of closing the loop at time t is the probability that Σt+1 ≡ Σt′, for a certain 0 ≤ t′ ≤ t

probability Pt+1 to terminate the random walk at time t + 1 is

    P_{t+1} = ρt qt = [(t + 1)/Ω] qt .

In analogy, the probability for the trajectory to remain open after t + 1 steps is given by

    q_{t+1} = qt (1 − ρt) = qt [1 − (t + 1)/Ω] = ∏_{i=1}^{t+1} (1 − i/Ω) ,    q0 = 1 .
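For finite Ω the recursion can be evaluated exactly, and the telescoping structure ρt qt = qt − qt+1 guarantees that the closure probabilities sum to one; a sketch (Ω = 2^12 is an arbitrary choice):

```python
def closure_probabilities(Omega):
    """P_{t+1} = rho_t * q_t with rho_t = (t+1)/Omega; the walk must close
    after at most Omega steps, since only Omega distinct states exist."""
    q, Ps = 1.0, []
    for t in range(Omega):
        rho = (t + 1) / Omega
        Ps.append(rho * q)       # P_{t+1}
        q *= 1.0 - rho           # q_{t+1}
    return Ps

Ps = closure_probabilities(2**12)
print(sum(Ps))                                     # -> 1.0 (total closure probability)
mean_t = sum((t + 1)*P for t, P in enumerate(Ps))
print(mean_t)                                      # about sqrt(pi*Omega/2)
```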

Diverging Phase Space In the thermodynamic limit, N → ∞, phase space diverges exponentially, Ω = 2^N → ∞, which implies that the approximation

    qt = ∏_{i=1}^{t} (1 − i/Ω) ≈ ∏_{i=1}^{t} e^{−i/Ω} = e^{−Σᵢ i/Ω} = e^{−t(t+1)/(2Ω)}   (7.19)

becomes exact. For large times t we may use t(t + 1)/(2Ω) ≈ t²/(2Ω). The overall probability

    Σ_{t=1}^{Ω} Pt ≃ ∫_0^∞ dt (t/Ω) e^{−t²/(2Ω)} = 1

for the random walk to close at all is unity.

Cycle Length Distribution The average number 〈Nc(L)〉 of cycles of length L is

    〈Nc(L)〉 = (q_{t=L−1}/Ω)(Ω/L) = exp[−L²/(2Ω)]/L ,                     (7.20)

where we used (7.19), with 〈· · ·〉 denoting an ensemble average over realizations. In deriving (7.20) we used the following considerations.

– The probability that Σ_{t+1} is identical to Σ0 is 1/Ω.
– There are Ω possible starting points (factor Ω).
– The factor 1/L corrects for the overcounting of cycles when considering the L possible starting sites of the L-cycle.

Average Number of Cycles From (7.20) the mean number N̄c of cycles can be extracted,

    N̄c = Σ_{L=1}^{∞} 〈Nc(L)〉 ≃ ∫_1^∞ dL 〈Nc(L)〉 .                        (7.21)

When going from the sum Σ_L to the integral ∫dL in (7.21) we neglected terms of order unity. Rescaling variables by u = L/√(2Ω), one obtains

    N̄c = ∫_1^∞ dL exp[−L²/(2Ω)]/L
       = ∫_{1/√(2Ω)}^{1} du e^{−u²}/u + ∫_{1}^{∞} du e^{−u²}/u ≡ I1 + I2 .

Here the integral ∫_{1/√(2Ω)}^{∞} = ∫_{1/√(2Ω)}^{c} + ∫_{c}^{∞} was separated, for simplicity with c = 1; any other finite value for c would also do the job. The second integral, I2, does not diverge as Ω → ∞. For I1 we have

    I1 = ∫_{1/√(2Ω)}^{1} (du/u) e^{−u²} = ∫_{1/√(2Ω)}^{1} (du/u) [1 − u² + u⁴/2 − . . .] ≈ ln(√(2Ω)) ,

since all further terms ∝ ∫_{1/√(2Ω)}^{1} du u^{n−1} < ∞ for n = 2, 4, . . . and Ω → ∞. The average number of cycles is then

    N̄c = ln(√(2^N)) + O(1) ≃ (N ln 2)/2 ,                                (7.22)

which holds for N = K Kauffman nets approaching the thermodynamic limit N → ∞.
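The asymptotic result (7.22) can be checked by summing (7.20) directly; a quick numerical sketch (the cutoff is chosen such that the neglected tail is negligible):

```python
from math import exp, log

def mean_cycle_number(N, cutoff=6.0):
    """Sum <N_c(L)> = exp(-L^2/(2 Omega))/L over L, with Omega = 2^N."""
    Omega = 2.0**N
    Lmax = int(cutoff * (2*Omega)**0.5)   # summand is negligible beyond
    return sum(exp(-L*L/(2*Omega))/L for L in range(1, Lmax + 1))

for N in (16, 24, 32):
    # ratio to N ln2 / 2 slowly approaches unity from above
    print(N, mean_cycle_number(N) / (N*log(2)/2))
```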

Mean Cycle Length On average, the length L̄ of a random cycle is

    L̄ = (1/N̄c) Σ_{L=1}^{∞} L 〈Nc(L)〉 ≈ (1/N̄c) ∫_1^∞ dL L exp[−L²/(2Ω)]/L
       = (1/N̄c) ∫_1^∞ dL e^{−L²/(2Ω)} = (√(2Ω)/N̄c) ∫_{1/√(2Ω)}^{∞} du e^{−u²} ,   (7.23)

after rescaling with u = L/√(2Ω) and using (7.20). The last integral on the right-hand side of (7.23) converges for Ω → ∞, with the consequence that the mean cycle length L̄ scales as

    L̄ ∼ Ω^{1/2}/N = 2^{N/2}/N ,                                          (7.24)

where we used (7.22), namely that N̄c ∼ N.

7.5 Applications
7.5.1 Living at the Edge of Chaos

Gene Expression Networks and Cell Differentiation Kauffman introduced the N–K model in the late 1960s for the purpose of modeling the dynamics and time evolution of networks of interacting genes, i.e. the gene expression network. In this model an active gene might influence the expression of any other gene, e.g. when the protein transcribed from the first gene influences the expression of the second gene.

Gene expression networks of real-world cells are not random. However, the web of linkages and connectivities among the genes of a living organism is intricate, and one may consider it a good zeroth-order approximation to model gene–gene interactions as random. The purpose is to gain generic insights into the properties of gene expression networks, namely results that are independent of the particular set of linkages and connectivities realized in a particular living cell.

Dynamical Cell Differentiation Whether random or not, in order to keep the cell functioning, the dynamics resulting from a gene expression network needs to be stable. Humans have only a few hundred different cell types in their bodies. Considering that every single cell contains the full, identical genetic material, Kauffman proposed that cell types correspond to distinct dynamical states of the underlying gene expression network. It is natural to assume that these states correspond to attractors, viz in general to cycles. In the chaotic phase the average length L̄ of a cycle in an N–K Kauffman net is

    L̄ ∼ 2^{αN} ,

with α = 1/2 for N = K, see (7.24). Considering that N ≈ 20,000 for the human genome, an exponentially large mean cycle length L̄ is somewhat unsettling; a single cell would take longer than the lifetime of the universe to complete just a single cycle. It then follows that operational gene expression networks of living organisms cannot be in the chaotic phase.

Living at the Edge of Chaos There are but two possibilities left if the gene expression network cannot operate in the chaotic phase: the frozen phase, together with the critical point. In the frozen phase, the average cycle length is short and the dynamics stable, see Sect. 7.4.2. The system is consequently resistant to damage of linkages.

But what about Darwinian evolution? Is too much stability good for the adaptability of cells in a changing environment? Kauffman suggested that gene expression networks would operate "at the edge of chaos", an expression that became legendary. By this Kauffman meant that networks close to criticality benefit from the stability properties of the close-by frozen phase, while being at the same time sensitive to changes in the network structure, such that Darwinian adaptation remains possible.

But how can a system reach criticality by itself? For the N–K network there is no extended critical phase, only a single critical point, K = 2. One speaks of "self-organized criticality" when internal mechanisms allow adaptive systems to evolve autonomously in such a way that they approach the critical point.9 This would be the case if Darwinian evolution were to trim gene expression networks towards criticality. Cells close to the critical point would have the highest fitness: cells in the chaotic phase die because they are operationally unstable, while cells deep in the frozen phase are selected out in the course of time, being unable to adapt to environmental changes.

7.5.2 Yeast Cell Cycle

Cell Division Process Cells have two tasks: to survive and to multiply. When a
living cell grows too big, a cell division process starts. The cell cycle has been
studied intensively for budding yeast. In the course of the division process, the cell
goes through a distinct set of states

G1 → S → G2 → M → G1 ,

with G1 being the “ground state” in physics slang, viz the normal cell state; the
chromosome division takes place during the M phase. These states are characterized
by distinct gene activities, i.e. by the kinds of proteins active in the cell. All
eukaryote cells have similar cell division cycles.

9 See Chap. 6.
272 7 Random Boolean Networks

Fig. 7.12 The N = 11 core network responsible for the yeast cell cycle. Nodes are the
proteins Cln3, SBF, MBF, Sic1, Cln1,2, Clb5,6, Clb1,2, Cdh1, Mcm1, Cdc20 and Swi5;
acronyms denote protein names, arrows excitatory/inhibitory connections (blue/red). Cln3
is inactive in the resting state G1, becoming active when the cell reaches a certain size
(top), which initiates the cell division process. Data from Li et al. (2004)

Yeast Gene Expression Network Of the ∼ 800 genes involved, only 11–13 core
genes actually regulate the part of the gene expression network responsible for the
division process; all other genes are more or less just descendants of the core genes.
The cell dynamics contains certain checkpoints, where the cell division process can
be stopped if something were to go wrong. Eliminating the checkpoints leaves a
core network with only 11 elements, as shown in Fig. 7.12.

Boolean Dynamics The full dynamical dependencies are not known in detail for
the yeast gene expression network. The simplest model is to assume

σi(t + 1) = 1 if ai(t) > 0, and 0 if ai(t) ≤ 0 ,        ai(t) = Σ_j wij σj(t) ,    (7.25)

i.e. a boolean dynamics10 for the binary variables σi (t) = 0/1 representing the
activation/deactivation of protein i, with couplings wij = ±1 for an excitatory/inhibitory functional relation.

10 Genes are boolean variables in the sense that they are either expressed or not. The quantitative

amount of proteins produced by a given active gene is regulated via a separate mechanism involving
microRNA, small RNA snippets.
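A minimal sketch of the threshold rule (7.25); the three-node weight matrix below is made up for illustration and is not the yeast wiring of Fig. 7.12:

```python
# Synchronous boolean threshold dynamics, eq. (7.25):
# sigma_i(t+1) = 1 if sum_j w_ij sigma_j(t) > 0, else 0.
import numpy as np

def step(sigma, W):
    """One synchronous update of all binary variables."""
    a = W @ sigma                      # activations a_i(t) = sum_j w_ij sigma_j(t)
    return (a > 0).astype(int)

# Toy network with excitatory (+1) and inhibitory (-1) couplings
# (illustrative only).
W = np.array([[ 0,  1, -1],
              [ 1,  0,  0],
              [-1,  1,  0]])

sigma = np.array([1, 0, 0])            # initial activation pattern
for t in range(5):
    sigma = step(sigma, W)
print(sigma)                           # state after five updates
```

Iterating the map long enough, the state settles into a cycle or a fixpoint, exactly as for the yeast network.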

Fig. 7.13 The yeast cell cycle as an attractor trajectory of the gene expression network. Out of the
2^11 = 2,048 states in phase space, shown are the 1764 states (green dots) that make up the basin
of attraction of the biologically stable G1 state (bottom). After starting with the excited G1 normal
state (the first state in the biological pathway), compare Fig. 7.12, the boolean dynamics runs
through the known intermediate states (blue arrows), until the G1 attractor state is reached again,
this time representing the two daughter cells. Reprinted from Li et al. (2004) with permissions,
© 2004 by The National Academy of Sciences, U.S.A

Fixpoints The 11-site network has 7 attractors, all cycles of length one, viz
fixpoints. The dominating fixpoint has an attractor basin of 1764 states, representing
about 86% of the state space Ω = 2^11 = 2,048. Remarkably, the protein activation
pattern of the dominant fixpoint corresponds exactly to that of the experimentally
determined G1 ground state of the living yeast cell.

Cell Division Cycle In the G1 ground state the protein Cln3 is inactive. When
the cell reaches a certain size it becomes expressed, i.e. it becomes active. For the
network model one then just starts the dynamics by setting

σ_Cln3 → 1    at t = 0

in the G1 state. The ensuing simple boolean dynamics, induced by (7.25), is depicted
in Fig. 7.13.

The remarkable result is that the system follows an attractor pathway that runs
through all experimentally known intermediate cell states, reaching the ground state
G1 in 12 steps.
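Basin sizes like the 1764 states quoted above follow from an exhaustive enumeration over all 2^N states. The procedure can be sketched as follows, with a made-up three-node toy network rather than the Li et al. (2004) wiring:

```python
# Enumerate all attractors and basin sizes of a small boolean threshold
# network by following the deterministic dynamics from every initial state.
import itertools
import numpy as np

W = np.array([[0, -1,  1],     # illustrative toy couplings, not the yeast net
              [1,  0, -1],
              [0,  1,  0]])

def step(sigma):
    return tuple(int(a > 0) for a in W @ np.array(sigma))

def attractor_of(sigma):
    """Iterate until a state repeats; return a canonical cycle representative."""
    seen = []
    while sigma not in seen:
        seen.append(sigma)
        sigma = step(sigma)
    cycle = seen[seen.index(sigma):]    # states forming the attractor
    return min(cycle)

basins = {}
for sigma in itertools.product((0, 1), repeat=3):
    basins.setdefault(attractor_of(sigma), []).append(sigma)

for rep, states in basins.items():
    print(rep, "basin size", len(states))
```

For this toy wiring one finds a fixpoint with a basin of one state and a 3-cycle attracting the remaining seven states; the same bookkeeping, applied to the 11-node network, yields the dominant G1 basin.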

Comparison with Random Networks The properties of the boolean network


depicted in Fig. 7.12 can be compared with those of a random boolean network.
A random network of the same size and average connectivity would have more
attractors with correspondingly smaller basins of attraction. Living cells clearly need
a robust protein network to survive in harsh environments.

Nevertheless, the yeast protein network shows more or less the same susceptibility to damage as a random network. The core yeast protein network has an average
connectivity of 〈K〉 = 27/11 ≃ 2.46. The core network has only N = 11 sites, a
number far too small to allow comparison with the properties of N–K networks in
the thermodynamic limit N → ∞. Nevertheless, an average connectivity of 2.46 is
remarkably close to K = 2, i.e. the critical connectivity for N–K networks.

Life as an Adaptive Network Living beings are complex and adaptive dynamical
systems.11 The insights into the yeast gene expression network discussed here indicate
that this statement is not just an abstract notion. Adaptive regulatory networks
constitute the core of all living beings.

7.5.3 Application to Neural Networks

Time Encoding by Random Neural Networks There is a certain debate in
neuroscience whether, and to what extent, time encoding is used in neural
processing.

– ENSEMBLE ENCODING
Ensemble encoding entails that the activity of a sensory input is transmitted
via the firing of certain ensembles of neurons. Distinct sensory inputs, like the
different smells sensed by the nose, are encoded by dedicated neural ensembles.
– TIME ENCODING
Time encoding is present if the same neurons transmit more than one piece of
sensory information by changing their respective firing patterns.

Cyclic attractors are an obvious tool to generate time encoded information. One
would envision that appropriate initial conditions corresponding to certain activity
patterns of the primary sensory organs start the dynamics of neural and/or random
boolean networks, which will settle eventually into a cycle, as discussed in Sect. 7.4.
The random network may then be used to encode initial firing patterns by the time

11 In this context see Chap. 3, together with Chap. 8.



Fig. 7.14 Illustration of ensemble and time encoding (left/right). Left (a): Neuronal receptors
corresponding to the same class of input signals are combined into ensembles of neurons, as
occurring in the nose for different odors. Right (b): The primary input signals are mixed together
by a random boolean network with cycles and attractors, close to criticality; the relative weights
of odor components are subsequently time encoded by the output signal, the time-dependent
output cycles depending on the input

sequence of neural activities resulting from the firing patterns of the corresponding
limiting cycle, see Fig. 7.14.

Critical Sensory Processing The processing of incoming information is qualitatively different in the various phases of the N–K model, as discussed in Sect. 7.3.1.

The chaotic phase is unsuitable for information processing: any input results
in an unbounded response and saturation. The response in the frozen phase is
strictly proportional to the input and is therefore well behaved, but also relatively
uninteresting. The critical state, on the other hand, offers the possibility of nonlinear
signal amplification.

Sensory organs in animals routinely process physical stimuli, such as light,
sound, pressure, or odorant concentrations, which vary by many orders of magnitude
in intensity. The primary sensory cells, e.g. the light receptors in the retina, have,
however, a linear sensitivity to the intensity of the incident light, with a relatively
small dynamical range. It is therefore conceivable that the huge dynamical range
of sensory information processing in animals is a collective effect, as it occurs in a
random neural network close to criticality. This mechanism, which is plausible from
the view of possible genetic encoding mechanisms, is illustrated in Fig. 7.15.

Fig. 7.15 The primary response of sensory cells to stimuli can be enhanced by many orders of
magnitude using the non-linear amplification properties of a random neural network close to
criticality

Exercises

(7.1) K = 1 KAUFFMAN NET


Analyze selected N = 3 Kauffman nets connected via a K = 1 cyclic linkage
tree, σ1 = f1 (σ2 ), σ2 = f2 (σ3 ), σ3 = f3 (σ1 ). Consider

– f1 = f2 = f3 = identity,
– f1 = f2 = f3 = negation, and
– f1 = f2 = negation, f3 = identity.

Construct all cycles and their attraction basins.


(7.2) N = 4 KAUFFMAN NET
Consider the N = 4 graph illustrated in Fig. 7.1. Assume all coupling
functions to be generalized XOR-functions (1/0 if the number of input-1’s
is odd/even). Find all cycles.
(7.3) SYNCHRONOUS VS. ASYNCHRONOUS UPDATING
Consider the dynamics of the three-site network illustrated in Fig. 7.3 under
sequential asynchronous updating. At every time step first update σ1 then σ2
and then σ3 . Determine the full network dynamics, find all cycles and fixpoints
and compare with the results for synchronous updating shown in Fig. 7.3.
(7.4) LOOPS AND ATTRACTORS
Consider, as in Sect. 7.4.2, a K = 1 network with two linkage loops,
A →^I B →^¬ C →^I A ,        D →^¬ E →^¬ D ,

with I denoting the identity coupling and ¬ the negation, compare Sect. 7.2.2.
Find all attractors by considering first the dynamics of the individual linkage
loops. Is there any state in phase space which is not part of any cycle?

(7.5) RELEVANT NODES AND DYNAMIC CORE


How many constant nodes does the network shown in Fig. 7.3 have? Then
replace the AND function with XOR and calculate the complete dynamics. How
many relevant nodes are there now?
(7.6) THE HUEPE AND ALDANA NETWORK
Solve the boolean neural network with uniform coupling functions and noise,
σi(t + 1) = +sign( Σ_{j=1}^{K} σ_{i_j}(t) )  with probability 1 − η ,
σi(t + 1) = −sign( Σ_{j=1}^{K} σ_{i_j}(t) )  with probability η ,

via mean-field theory, where σi = ±1, by considering the order parameter


Ψ = lim_{T→∞} (1/T) ∫_0^T |s(t)| dt ,        s(t) = lim_{N→∞} (1/N) Σ_{i=1}^{N} σi(t) .

See Huepe and Aldana-González (2002) and additional hints in the solutions
section.
(7.7) LOWER BOUND FOR BOND PERCOLATION
When covering the individual bonds of a d-dimensional hypercubic lattice
with a probability p, there is a giant connected component above a certain
critical pc(d) and none below. Define with

λ(d) = lim_{n→∞} [ σn(d) ]^{1/n}    (7.26)

the connectivity constant λ(d) of the original lattice, where σn(d) is the number
of paths of length n starting from a given site. Find an upper bound for σn(d)
and hence for the connectivity constant. For percolation to happen, the bond
probability p must be at least as large as 1/λ(d). Why? Use this argument to
find a lower bound for pc(d).

Further Reading

The interested reader may want to take a look at Kauffman’s seminal work
on random boolean networks, Kauffman (1969), or to study his well-known
book, Kauffman (1993). For reviews on boolean networks please consult Aldana-
Gonzalez et al. (2003) and Schwab et al. (2020). See also Hopfensitz et al. (2013),
for a tutorial on attractors in boolean nets, and Wang et al. (2012), for boolean
network modeling of biological systems.
Original studies of potential interest include a numerical investigation of Kauffman nets, Bastolla and Parisi (1998), together with an investigation of the scaling
of the number of attractors with size, Samuelsson and Troein (2003). The modeling

of the yeast reproduction cycle by boolean networks is given in Li et al. (2004). For
the concept of observational scale invariance, and of nonlinear signal amplification
close to criticality, see respectively Marković and Gros (2013) and Kinouchi and
Copelli (2006).

References
Aldana-Gonzalez, M., Coppersmith, S., & Kadanoff, L. P. (2003). Boolean dynamics with random
couplings. In E. Kaplan, J. E. Marsden, & K. R. Sreenivasan (Eds.), Perspectives and problems
in nonlinear science. A celebratory volume in honor of Lawrence Sirovich. Springer Applied
Mathematical Sciences Series (pp. 23–89). Springer.
Bastolla, U., & Parisi, G. (1998). Relevant elements, magnetization and dynamical properties in
Kauffman Networks: A numerical study. Physica D, 115, 203–218.
Hopfensitz, M., Müssel, C., Maucher, M., & Kestler, H. A. (2013). Attractors in Boolean networks:
a tutorial. Computational Statistics, 28, 19–36.
Huepe, C., & Aldana-González, M. (2002). Dynamical phase transition in a neural network model
with noise: An exact solution. Journal of Statistical Physics, 108, 527–540.
Kauffman, S. A. (1969). Metabolic stability and epigenesis in randomly constructed nets. Journal
of Theoretical Biology, 22, 437–467.
Kauffman, S. A. (1993). The origins of order: Self-organization and selection in evolution. Oxford
University Press.
Kinouchi, O., & Copelli, M. (2006). Optimal dynamical range of excitable networks at criticality.
Nature Physics, 2, 348–352.
Li, F., Long, T., Lu, Y., Ouyang, Q., & Tang, C. (2004). The yeast cell-cycle network is robustly
designed. Proceedings of the National Academy of Sciences, 101, 4781–4786.
Luque, B., & Sole, R.V. (2000). Lyapunov exponents in random Boolean networks. Physica A,
284, 33–45.
Marković, D., & Gros, C. (2013). Criticality in conserved dynamical systems: Experimental
observation vs. exact properties. Chaos, 23, 013106.
Samuelsson, B., & Troein, C. (2003). Superpolynomial growth in the number of attractors in
Kauffman networks. Physical Review Letters, 90, 098701.
Schwab, J. D., et al. (2020). Concepts in Boolean network modeling: What do they all mean?
Computational and Structural Biotechnology Journal, 18, 571–582.
Wang, R. S., Saadatpour, A., & Albert, R. (2012). Boolean modeling in systems biology: An
overview of methodology and applications. Physical Biology, 9, 055001.
8 Darwinian Evolution, Hypercycles and Game Theory

Adaptation and evolution are quasi synonymous in popular language; it is hence
hardly surprising that the theory of Darwinian evolution is infused with complex
systems concepts. We will discuss when a species will manage to successfully mutate
while escaping its local fitness optimum, or when it may spiral towards extinction
in the wake of an “error catastrophe”. Venturing into the mysteries surrounding the
origin of life, we will investigate the possible advent of a quasispecies in terms
of mutually supporting hypercycles. The basic theory of evolution is furthermore
closely related to game theory, the mathematical theory of socially interacting
agents, viz of rationally acting economic persons. The tragedy of the commons
occurring in this context describes the over-exploitation of resources.

We will learn in this chapter that complex systems come with respective
individual characteristics. For Darwinian evolution the core notions are fitness,
selection and mutation. On this basis, generic concepts from dynamical systems
theory unfold, like the phenomenon of stochastic escape. Furthermore we will show
that evolutionary processes affect not only the welfare of an individual species, but
also the pattern of species abundances of entire ecosystems, which will be discussed
in the context of the neutral theory of macroecology.

8.1 Introduction

Population Genetics The ecosystem of the earth is a complex and adaptive system.
It formed via Darwinian evolution through species differentiation and adaptation
to a changing environment. Reproductive success is based on a set of inheritable
traits, the genome, which is passed from parent to offspring, with the interplay
between random mutations and natural selection playing a central role. This process

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 279
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_8

is denoted “microevolution”.1 Gene frequency shifts may also be induced by


stochastic fluctuations, like “genetic drift”.

POPULATION GENETICS Population genetics involves the evolution of the frequency of


distinct gene variants, as induced primarily by microevolution and genetic drift.

Our focus will be on microevolution, with some excursions into noise-induced


processes.

Basic Terminology Let us introduce some basic variables and concepts.

– POPULATION
Mostly we will assume that the number of individuals M does not change with
time, which models the competition for a limited supply of resources.
– GENOME
A genome of size N encodes the inheritable traits by a set of N binary variables,

s = (s1, s2, . . . , sN) ,        si = ±1 .

Here, N is considered fixed.


– GENERATIONS
For simplicity, time sequences of non-overlapping generations will be assumed,
like in a wheat field. The population present at time t is replaced by their
offspring at generation t + 1.
– ASEXUAL REPRODUCTION
Asexual reproduction is present when individuals have just one parent, our basic
assumption.

We will treat predominantly models for asexual reproduction, though most concepts
can be easily generalized to the case of sexual reproduction. In Table 8.1 some

Table 8.1 Genome size N and the spontaneous mutation rates μ, compare (8.2), per base and
per genome, for two bacteriophages, a bacterium and several eukaryotes. From Jain and Krug
(2006) and Drake et al. (1998).
Organism Genome size Rate per base Rate per genome
Bacteriophage Qβ 4.5 × 103 1.4 × 10−3 6.5
Bacteriophage λ 4.9 × 104 7.7 × 10−8 0.0038
E. Coli 4.6 × 106 5.4 × 10−10 0.0025
C. Elegans 8.0 × 107 2.3 × 10−10 0.018
Mouse 2.7 × 109 1.8 × 10−10 0.49
Human 3.2 × 109 5.0 × 10−11 0.16

1 Evolutionary
processes transcending a single species, like the radiation of lineages, would be the
domain of “macroevolution”.

Fig. 8.1 A simple form of epistatic interaction occurs when the influence of one gene builds on
the outcome of another: enzyme A (coded by gene A) turns substance X (white) into Y (brown),
and enzyme B (coded by gene B) turns Y into Z (black). In this fictitious example black hair can
only be realized when the gene for brown hair is also present

typical values for the size N of genomes are listed. Note the three orders of
magnitude between simple eukaryotic life forms and the human genome.
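A quick consistency check of Table 8.1, using the values quoted there: the rate per genome is, to a good approximation, the genome size times the rate per base.

```python
# Per-genome mutation rate ~ N * mu (per base); values as listed in Table 8.1.
table = {
    "Bacteriophage Q-beta": (4.5e3, 1.4e-3, 6.5),
    "E. Coli":              (4.6e6, 5.4e-10, 0.0025),
    "Human":                (3.2e9, 5.0e-11, 0.16),
}
for organism, (N, per_base, per_genome) in table.items():
    print(f"{organism:22s} N*mu = {N * per_base:.2g}  (table: {per_genome})")
```

The product N·μ reproduces the tabulated per-genome rates to within rounding.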

State of the Population The state of the population at time t can be described by
specifying the genomes of all the individuals,

{s^α(t)},    α = 1 . . . M ,        s = (s1, . . . , sN) ,

which leads with



{Xs(t)} ,        Σ_s Xs(t) = M ,    (8.1)

to the definition of the number Xs of individuals with genome s for each of the
2^N points s in genome space. Typically, most of these occupation numbers vanish;
biological populations are extremely sparse in genome space.

Combinatorial Genetics of Alleles Classical genetics focuses on the presence/absence of a few characteristic traits. These traits are determined by specific
sites in the genome, the “loci”. The genetic realizations of these specific loci are
“alleles”. Popular examples are alleles for blue, brown, and green eyes.

Combinatorial genetics in the form of “Mendelian inheritance” deals with the


evolution of the frequency of appearance of a given allele, as resulting from
environmental changes during the adjustment process. Most visible evolutionary
changes are due to a remixing of alleles, as mutation induced changes in the genome
are rare. For a comparison, we list typical mutation rates in Table 8.1.

Beanbag Genetics Without Epistatic Interactions “Epistasis” is present when


a specific allele affects the expression of other alleles, or genes in general, as
illustrated in Fig. 8.1. Classical genetics neglects epistatic interactions. The resulting
picture is often called “beanbag genetics”, as if the genome was nothing but a bag
with a selection of different alleles.

Genotype and Phenotype The physical appearance of an organism is not exclusively determined by gene expression. One distinguishes between genotype and
phenotype.

– GENOTYPE
The genotype of an organism is the class to which that organism belongs as
determined by the DNA that was passed to the organism by its parents at
conception.
– PHENOTYPE
The phenotype of an organism is the class to which that organism belongs as
determined by the physical and behavioral characteristics of the organism, for
example its size and shape, its metabolic activities and its pattern of movement.

Strictly speaking, selection acts upon phenotypes, but only the genotype is
bequeathed. Variations in phenotypes act therefore as a source of noise for the
selection process. Pheno- and genotypes are set to be identical in the following.

Speciation “Speciation” is the process leading to the differentiation of an initial


species into one or more distinct species. Speciation occurs most often due to
adaptation to different ecological niches, in particular in distinct geographical
environments. We will not treat the various theories proposed for speciation here.

8.2 Mutations and Fitness in a Static Environment

Constant Environment On short time scales, the environment can be thought of as
static, our standard setting, which is assumed to include predictable variations,
like the day-night cycle. This presumption breaks down on long time scales, since
the evolutionary change of one species might lead to repercussions all over the
ecosystem to which it appertains.

Independent Individuals An important issue in the theory of evolution is the


emergence of specific kinds of social behavior, which arises when individuals of
the same population interact. Several phenomena arising in a social context will be
discussed within a game theoretical framework in Sect. 8.7. Until then, we assume
non-interacting individuals, which implies that the fitness of a given genetic trait
is independent of the frequency of this and other alleles, apart from the overall
competition for resources.

Fig. 8.2 Illustration of a basic reproduction process proceeding from generation t − 1 to t, with
individuals 1, 3, 6 having 1, 3 and 2 descendants respectively

Constant Mutation Rates We furthermore assume that the mutation rates are

– Constant over time,


– Independent of the locus in the genome, and
– Not subject to genetic control.

Alternative frameworks would require refined modeling, which is beyond our scope.

Stochastic Evolution The evolutionary process can be modeled as a three-stage


stochastic process.

– REPRODUCTION
The individual α at generation t is the offspring of an individual α′ living at
generation t − 1. Reproduction is thus represented as a stochastic map

α −→ α′ = Gt(α) ,

where Gt(α) is the parent of the individual α, which is chosen at random
among the M individuals living at generation t − 1. For the reproduction process
illustrated in Fig. 8.2, one has Gt(1) = 1, Gt(2) = 3, and so on.
– MUTATION
The genomes of the offspring differ from the respective genomes of their parents
through random changes.
– SELECTION
The number of surviving offspring of an individual depends on its genome via the
“fitness” associated with the genome, being directly proportional to the fitness.

Most mutations are neutral in real life, which leads to a “neutral regime” when
selection pressure is low, as further discussed in Sect. 8.4.
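The three-stage process can be sketched in a few lines; population size, genome length, mutation rate and the count-the-plus-loci fitness landscape below are illustrative assumptions, not values from the text:

```python
# Sketch of one generation of reproduction, mutation and selection for a
# fixed population of M individuals with genomes s_i = +/-1 of length N.
import math
import random

random.seed(1)
M, N, mu, k = 50, 8, 0.02, 1.0

def fitness(s):
    # toy fitness landscape: F(s) = number of +1 loci (illustrative)
    return math.exp(k * sum(1 for x in s if x == 1))

pop = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(M)]

def generation(pop):
    weights = [fitness(s) for s in pop]                      # selection ~ W(s)
    parents = random.choices(pop, weights=weights, k=M)      # stochastic map G_t
    return [[-x if random.random() < mu else x for x in s]   # point mutations
            for s in parents]

for t in range(30):
    pop = generation(pop)

mean_plus = sum(sum(1 for x in s if x == 1) for s in pop) / M
print(f"mean number of +1 loci after 30 generations: {mean_plus:.1f}")
```

Fitness-proportional parent sampling implements selection; after a few tens of generations the population concentrates near the fitness maximum, up to a mutation-selection balance.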

Point Mutations and Mutation Rate The basic theory is based on independent
point mutations, namely that every element of the genome is modified independently
of the other elements,
s_i^α(t) = −s_i^{G_t(α)}(t − 1)    with probability μ ,    (8.2)

where the parameter μ ∈ [0,1/2] is the microscopic “mutation rate”.2 In real


organisms, more complex phenomena take place, like global rearrangements of the
genome, copies of some part of the genome, displacements of blocks of elements
from one location to another, and so on. The values for the real-world mutation rates
μ for various species listed in Table 8.1 are therefore to be considered as effective
mutation rates.
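The point-mutation rule (8.2) amounts to independent spin flips; a minimal sketch:

```python
# Point mutations: every locus s_i = +/-1 flips independently with
# probability mu, eq. (8.2).
import numpy as np

rng = np.random.default_rng(0)

def mutate(s, mu):
    """Return the offspring genome, each spin flipped with probability mu."""
    flips = rng.random(s.size) < mu
    return np.where(flips, -s, s)

s = np.ones(10_000, dtype=int)              # parent genome, all +1
child = mutate(s, mu=0.05)
print("fraction of loci flipped:", np.mean(child != s))
```

For a long genome the fraction of flipped loci concentrates sharply around μ, here 0.05.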

Fitness and Fitness Landscape The fitness W (s), also called “Wrightian fitness”,
of a genotype trait s is proportional to the average number of offspring an individual
possessing the trait s has. It is strictly positive and can therefore be written as

W(s) = e^{kF(s)} ∝ average number of offspring of s .    (8.3)

Selection acts in first place upon phenotypes, but we neglect here the difference,
considering the variations in phenotypes as a source of noise, as discussed above.
Several key parameters and functional dependencies enter (8.3).

W (s) : Wrightian fitness


F (s) : Fitness landscape
k : Inverse selection temperature
w(s) : Malthusian fitness

The inverse selection temperature takes its name from physics slang.3 For the
Malthusian fitness, one rewrites (8.3) as W(s) = e^{w(s)Δt}, where Δt is the generation
time. We will work here with discrete time, viz with non-overlapping generations,
making use only of the Wrightian fitness W(s).

Distribution of Fitness Effects In general, the “distribution of fitness effects”


describes if new mutations are advantageous, neutral or deleterious, and to which
proportions. For modeling frameworks based on concrete mutation processes, our
approach here, there is no need to treat the distribution of fitness effects separately.

2 Point mutations can be thought to correspond to “single-nucleotide mutations” (SNM).


3 The probability to find a state with energy E in a thermodynamic system with temperature T is
proportional to the Boltzmann factor exp(−β E). The inverse temperature is β = 1/(kB T ), with
kB being the Boltzmann constant.

Fig. 8.3 Illustration of idealized (smooth) one-dimensional model fitness landscapes F (s). In
contrast, real-world fitness landscapes are likely to contain discontinuities. Left: A fitness
landscape with peaks and valleys, metaphorically denoted “rugged landscape”. Right: A fitness
landscape containing a single smooth peak, as described by (8.19)

We remark that fitness is a concept defined at the level of individuals in a


homogeneous population. The resulting fitness of a species, as an averaged quantity,
needs to be explicitly evaluated and is model dependent.

Fitness Ratios The assumption of a constant population size makes the reproduc-
tive success a relative notion. Only the ratios

W(s1)/W(s2) = e^{kF(s1)} / e^{kF(s2)} = e^{k[F(s1)−F(s2)]}    (8.4)

are important. It follows that the quantity W (s) is defined up to a proportionality


constant and, accordingly, the fitness landscape F (s) only up to an additive constant,
much like the energy in physics.

Fitness Landscape A graphical representation of the fitness function F(s) is not
truly possible for real-world fitness functions, due to the high dimensionality of the
genome space, which contains 2^N states. It is nevertheless customary to draw fitness
landscapes, like the ones shown in Fig. 8.3. However, one must bear in mind that
these illustrations are not to be taken at face value, apart from model considerations.

Fundamental Theorem of Natural Selection The fundamental theorem of


natural selection, first stated by Fisher in 1930, deals with adaptation in the absence
of mutations when the population size is large, close to the thermodynamic limit
M → ∞, which allows one to neglect fluctuations.

The theorem states that the average fitness of the population cannot decrease in
time under these circumstances, and that the average fitness becomes stationary only
when all individuals in the population have the maximal reproductive fitness.
The proof is straightforward. We define with

⟨W⟩_t ≡ (1/M) Σ_{α=1}^{M} W(s^α(t)) = (1/M) Σ_s W(s) Xs(t) ,    (8.5)

the average fitness of the population, with Xs being the number of individuals
having the genome s. Note that the sum Σ_s in (8.5) contains 2^N terms. The
evolution equations are given in the absence of mutations by

Xs(t + 1) = [ W(s) / ⟨W⟩_t ] Xs(t) ,    (8.6)

where W(s)/⟨W⟩_t is the relative reproductive success. The overall population size
remains constant,

Σ_s Xs(t + 1) = (1/⟨W⟩_t) Σ_s Xs(t) W(s) = M ,

where we have used (8.5) for ⟨W⟩_t. Then,

⟨W⟩_{t+1} = (1/M) Σ_s W(s) Xs(t + 1)
          = [ (1/M) Σ_s W²(s) Xs(t) ] / [ (1/M) Σ_{s′} W(s′) Xs′(t) ]
          = ⟨W²⟩_t / ⟨W⟩_t ≥ ⟨W⟩_t ,

since ⟨W²⟩_t − ⟨W⟩_t² = ⟨ΔW²⟩_t ≥ 0. The steady state,

⟨W⟩_{t+1} = ⟨W⟩_t ,        ⟨W²⟩_t = ⟨W⟩_t² ,

is only realizable when all individuals 1 . . . M in the population have the same
fitness. Modulo degeneracies, this implies identical genotypes.

Baldwin Effect Variations in the phenotype may be induced not only via stochastic
influences of the environment, but also through adaption of the phenotype itself to
the environment, viz through cognitive learning. Learning can speed up evolution
whenever the underlying fitness landscape is very rugged, by smoothing it out and
providing a stable gradient towards the genotype with the maximal fitness. One
speaks of the “Baldwin effect”.

8.3 Deterministic Evolution

Mutations are random events, which implies that evolution is inherently a stochastic
process. This does not hold in the limit of infinite population size M → ∞,
for which stochastic fluctuations average out, becoming irrelevant. In this limit
the equations governing evolution are deterministic, governed only by average
transition rates. This allows one to study in detail the conditions necessary for adaptation
to occur for various mutation rates.

8.3.1 Evolution Equations

Mutation Matrix The mutation matrix

Q_μ(s′ → s) ,        Σ_s Q_μ(s′ → s) = 1    (8.7)

denotes the probabilities of obtaining a genotype s when attempting to reproduce
an individual with genotype s′. The mutation rates Q_μ(s′ → s) may depend on a
parameter μ determining the overall mutation rate. The mutation matrix includes
the absence of any mutation, viz the transition Q_μ(s′ → s′). It is normalized.

Deterministic Evolution with Mutations We generalize (8.6), which is valid in
the absence of mutations, by including the effect of mutations via the mutation
matrix Q_μ(s′ → s),

Xs(t + 1)/M = [ Σ_{s′} Xs′(t) W(s′) Q_μ(s′ → s) ] / [ Σ_{s′} Ws′ Xs′(t) ] ,

or

xs(t + 1) = [ Σ_{s′} xs′(t) W(s′) Q_μ(s′ → s) ] / ⟨W⟩_t ,        ⟨W⟩_t = Σ_s Ws xs(t) ,    (8.8)

where we have introduced the normalized population variables

xs(t) = Xs(t)/M ,        Σ_s xs(t) = 1 .

The evolution dynamics (8.8) retains the overall size Σ_s Xs(t) of the population,
due to the normalization (8.7) of the mutation matrix Q_μ(s′ → s).

Hamming Distance The Hamming distance

d_H(s, s′) = Σ_{i=1}^{N} (s_i − s_i′)²/4 = N/2 − (1/2) Σ_{i=1}^{N} s_i s_i′    (8.9)

measures the number of units that are different in two genome configurations s and
s′, e.g. before and after the effect of a mutation event.

Mutation Matrix for Point Mutations We examine the effects of the simplest
mutation process, namely the case that a genome of fixed length N is affected by
random transcription errors afflicting only individual loci. For this case, namely
point mutations, the overall mutation probability

Q_μ(s′ → s) = μ^{d_H} (1 − μ)^{N − d_H}    (8.10)

is the product of the independent mutation probabilities for all loci i = 1, . . . , N,
with d_H denoting the Hamming distance d_H(s, s′) given by (8.9), and μ the
mutation rate defined in (8.2). One has

Σ_s Q_μ(s′ → s) = Σ_{d_H} C(N, d_H) (1 − μ)^{N − d_H} μ^{d_H} = (1 − μ + μ)^N ≡ 1 ,

the mutation matrix defined by (8.10) is consequently normalized. Discarding a
normalization factor, we rewrite the mutation matrix as

Q_μ(s′ → s) ∝ exp( [log(μ) − log(1 − μ)] d_H ) ∝ exp( β Σ_i s_i s_i′ ) ,    (8.11)

where we denoted by β an effective inverse temperature, defined by

β = (1/2) log( (1 − μ)/μ ) .

The relation of the evolution equation (8.11) to the partition function of a thermo-
dynamical system, hinted at by the terminology “inverse temperature” will become
evident below.
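That the prefactor discarded in (8.11) is indeed independent of the genome pair can be checked numerically; N = 6 and \mu = 0.1 are illustrative values:

```python
import math

# Check that Q_mu from (8.10) equals exp(beta * sum_i s_i s'_i) up to a
# common, s-independent prefactor, as claimed in (8.11).
N, mu = 6, 0.1
beta = 0.5 * math.log((1 - mu) / mu)          # effective inverse temperature

def Q(dH):                                    # (8.10); depends on s, s' via d_H only
    return mu ** dH * (1 - mu) ** (N - dH)

# sum_i s_i s'_i = N - 2 d_H, by the Hamming-distance identity (8.9)
ratios = [Q(dH) / math.exp(beta * (N - 2 * dH)) for dH in range(N + 1)]
```

The common ratio works out to (\mu(1-\mu))^{N/2}, precisely the normalization factor dropped in (8.11).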

Evolution Equations for Point Mutations With the help of the exponential representations for both the fitness, W(s) = exp[kF(s)], see (8.3), and for the mutation matrix Q_\mu(s' \to s), we may express the evolution equation (8.8) as

x_s(t+1) = \frac{1}{N_t} \sum_{s'} x_{s'}(t) \exp\Big( \beta \sum_i s_i s_i' + k F(s') \Big)  (8.12)

in a form that is suggestive of a statistical mechanics analogy. The normalization N_t in (8.12) takes care of both the average fitness \langle W \rangle_t, as appearing in (8.8), and of the normalization of the mutation matrix, compare (8.11).

Evolution Equations in Linear Form The evolution equation (8.12) is non-linear in the dynamical variables x_s(t), which determine the average fitness \langle W \rangle_t entering the normalization N_t. A suitable change of variables allows, however, to cast the evolution equation into a linear form. For this purpose we introduce non-normalized variables y_s(t) implicitly via

x_s(t) = \frac{y_s(t)}{\sum_{s'} y_{s'}(t)}.  (8.13)

The idea is that the normalization \sum_s y_s(t) can be selected freely for every generation t = 1, 2, 3, .... The evolution (8.12) becomes

y_s(t+1) = Z_t \sum_{s'} y_{s'}(t) \exp\Big( \beta \sum_i s_i s_i' + k F(s') \Big),  (8.14)

where

Z_t = \frac{1}{N_t} \, \frac{\sum_s y_s(t+1)}{\sum_s y_s(t)}.

Choosing a different normalization for y_s(t) and for y_s(t+1), we may achieve Z_t \equiv 1. Equation (8.14) is then linear in y_s(t).

Evolution Hamiltonian In the following we will make use of analogies to


notations commonly used in statistical mechanics. Readers unfamiliar with the
mathematics of the one-dimensional Ising model may skip the mathematical details
and concentrate on the interpretation of the results.

The form

y_s(t+1) = \sum_{s'} e^{\beta H[s, s']} \, y_{s'}(t)

of the linear evolution equation (8.14) involves an effective "Hamiltonian"4

\beta H[s, s'] = \beta \sum_i s_i s_i' + k F(s'),  (8.15)

which is a function of the binary variables s and s'. Given that s and s' run over the genome space of subsequent generations, we may rename variables, s \to s(t+1) and s' \to s(t), which leads to the notation

y_{s(t+1)} = \sum_{s(t)} e^{\beta H[s(t+1), s(t)]} \, y_{s(t)}.  (8.16)

4 In classical thermodynamics, the Hamiltonian H(\alpha), or energy function, determines the probability that a given state \alpha is populated. This probability is proportional to the Boltzmann factor exp(-\beta H(\alpha)), where \beta = 1/(k_B T) is the inverse temperature.

Statistical Mechanics of the Ising Model Evolution progresses generation by generation, which is equivalent to an iterative solution of (8.16),

y_{s(t+1)} = \sum_{s(t), ..., s(0)} e^{\beta H[s(t+1), s(t)]} \cdots e^{\beta H[s(1), s(0)]} \, y_{s(0)},  (8.17)

which defines in turn a two-dimensional Ising-type Hamiltonian5

\beta H = \beta \sum_{i,t} s_i(t+1)\, s_i(t) + k \sum_t F(s(t)).  (8.18)

The two dimensions are genome space, i = 1, ..., N, and time t, viz the sequence of subsequent generations. This expression will be used in the next sections.

8.3.2 Beanbag Genetics: Evolution Without Epistasis

Fujiyama Landscape The fitness function

F(s) = \sum_{i=1}^{N} h_i s_i, \qquad W(s) = e^{k \sum_{i=1}^{N} h_i s_i},  (8.19)

is denoted the "Fujiyama landscape", since it corresponds to a single smooth peak, as illustrated in Fig. 8.3. To see why, we consider the case h_i > 0 and rewrite (8.19) as

F(s) = s_0 \cdot s, \qquad s_0 = (h_1, h_2, ..., h_N).

The fitness of a given genome s is directly proportional to the scalar product with the master sequence s_0, with a well defined gradient pointing towards the optimal genome.

The Fujiyama Hamiltonian Epistatic interactions are absent in the smooth peak landscape (8.19). In terms of the corresponding Hamiltonian, see (8.18), this fact expresses itself as

\beta H = \beta \sum_{i=1}^{N} H_i, \qquad H_i = \sum_t s_i(t+1)\, s_i(t) + \frac{k h_i}{\beta} \sum_t s_i(t).  (8.20)

5 Any system of binary variables is equivalent to a system of interacting Ising spins, which retains

only the classical contribution to the energy of interacting quantum mechanical spins (the magnetic
moments).

Every locus i is independent, corresponding to the one-dimensional Ising model \beta H_i in an effective uniform magnetic field k h_i / \beta. Loci are hence propagated independently in generation time, t = 1, 2, ....

Transfer Matrix The Hamiltonian (8.20) does not contain interactions between different loci of the genome. It is enough, as a consequence, to examine the evolution of a single gene locus i, together with the associated Hamiltonian H_i. We define with

y(t) = \begin{pmatrix} y_+(t) \\ y_-(t) \end{pmatrix}

the non-normalized densities of individuals having gene states s_i = \pm 1, for a specific locus i. The iterative solution (8.17) takes the form

y(t+1) = \Big( \prod_{t'=0}^{t} T \Big)\, y(0) = T^{t+1} y(0),  (8.21)

where we defined the 2 \times 2 "transfer matrix"

T = e^{\beta H_i[s_i(t+1), s_i(t)]}, \qquad T = \begin{pmatrix} e^{\beta + k h_i} & e^{-\beta} \\ e^{-\beta} & e^{\beta - k h_i} \end{pmatrix}.  (8.22)

For the last expression, s, s' = \pm 1 was used, together with the symmetrized form

\beta H_i = \beta \sum_t s_i(t+1)\, s_i(t) + \frac{k h_i}{2} \sum_t \big[ s_i(t+1) + s_i(t) \big]

of the one-dimensional Ising model.

of the one-dimensional Ising model. The matrix T transfers the state of the system
from one generation to the next, from time t to t + 1, hence the naming.

Eigenvalues of the Transfer Matrix For simplicity we take

h_i \equiv 1, \qquad s_0 = (1, 1, ..., 1),

and evaluate the eigenvalues \omega of the transfer matrix,

\omega^2 - 2 \omega\, e^{\beta} \cosh(k) + e^{2\beta} - e^{-2\beta} = 0.

The solutions are

\omega_{1,2} = e^{\beta} \cosh(k) \pm \sqrt{ e^{2\beta} \cosh^2(k) - e^{2\beta} + e^{-2\beta} }.

In the absence of a fitness landscape, namely when k = 0, the eigenvalues are e^{\beta} \pm e^{-\beta}. Adaptation does not occur, with the eigenvectors y = (1, \pm 1) of the transfer matrix attributing equal weight to all states in genome space. In the following we consider with k > 0 non-trivial fitness landscapes.

Dominance of the Largest Eigenvalue The genome distribution y is determined by large powers of the transfer matrix, \lim_{t \to \infty} T^t, as entering (8.21). The difference between the two eigenvalues of the transfer matrix is

\Delta\omega = \omega_1 - \omega_2 = 2 \sqrt{ e^{2\beta} \sinh^2(k) + e^{-2\beta} },  (8.23)

which vanishes only if k = 0 and \beta \to \infty. For \Delta\omega > 0 the larger eigenvalue \omega_1 dominates in the limit t \to \infty and y = (y_+, y_-) is given by the eigenstate corresponding to \omega_1, with y_+ > y_-.
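The closed-form eigenvalues and the gap (8.23) can be cross-checked against a direct numerical diagonalization of (8.22); the values of \beta and k are arbitrary illustrative choices:

```python
import math
import numpy as np

# Compare the analytic transfer-matrix eigenvalues (h_i = 1) with numerics.
beta, k = 0.8, 0.3                                         # illustrative values
T = np.array([[math.exp(beta + k), math.exp(-beta)],
              [math.exp(-beta),    math.exp(beta - k)]])   # eq. (8.22)
num = sorted(np.linalg.eigvalsh(T), reverse=True)

root = math.sqrt(math.exp(2 * beta) * math.cosh(k) ** 2
                 - math.exp(2 * beta) + math.exp(-2 * beta))
w1 = math.exp(beta) * math.cosh(k) + root                  # closed-form omega_1
w2 = math.exp(beta) * math.cosh(k) - root                  # closed-form omega_2
gap = 2 * math.sqrt(math.exp(2 * beta) * math.sinh(k) ** 2
                    + math.exp(-2 * beta))                 # eq. (8.23)
```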

Adaptation in the Absence of Epistatic Interactions The sequence s_0 = (1, ..., 1) is the state with maximal fitness. The probability to find individuals with genome s at a Hamming distance d_H from s_0 is proportional to

(y_+)^{N - d_H} (y_-)^{d_H} \propto \Big( \frac{y_-}{y_+} \Big)^{d_H}.

The entire population is hence within a finite distance of the optimal genome sequence s_0 whenever y_-/y_+ < 1, viz for \Delta\omega > 0. We recall that

\beta = \frac{1}{2} \log\Big( \frac{1-\mu}{\mu} \Big), \qquad W(s) = e^{kF(s)},

where \mu is the mutation rate for point mutations.

where μ is the mutation rate for point mutations. Thus we see that there is some
degree of adaptation whenever the fitness landscape does not vanish (k > 0). This is
the case even for a maximal mutation rate μ → 1/2, which corresponds to β → 0.

8.3.3 Epistatic Interactions and the Error Catastrophe

The result of the previous Sect. 8.3.2, i.e. the occurrence of adaptation in a smooth
fitness landscape for any non-trivial model parameter, is due to the absence of
epistatic interactions in the smooth fitness landscape. Epistatic interactions intro-
duce a phase transition to a non-adapting regime once the mutation rate becomes
too high.

Sharp Peak Landscape One possibility to study this phenomenon is the limiting case of strong epistatic interactions. In this limit, individual elements of the genotype do not, per se, provide information on the value of the fitness, which is defined by

W(s) = \begin{cases} 1 & \text{if } s = s_0 \\ 1-\sigma & \text{otherwise} \end{cases},  (8.24)

with \sigma > 0. In this case, all genome sequences but one have the identical fitness, which is lower than the fitness of the master sequence s_0. The corresponding landscape F(s), defined by W(s) = e^{kF(s)}, is equally discontinuous, with (8.24) corresponding to a fitness landscape with a "tower". This landscape is devoid of gradients pointing towards the master sequence of maximal fitness.

At first sight, the presence of epistatic interactions in (8.24) may not be evident. For this we recall that the fitness landscape F(s) of beanbag genetics is strictly additive, which makes W(s) = e^{kF(s)} strictly multiplicative. Epistatic interactions are hence present whenever the fitness W(s) is not strictly multiplicative, which is the case for the sharp-peak model (8.24).

Distance from Optimality We define by x_k the fraction of the population whose genotype has a Hamming distance k from the preferred genotype,

x_k(t) = \frac{1}{M} \sum_s \delta_{d_H(s, s_0),\, k} \, X_s(t).

The evolution equations can be formulated entirely in terms of the x_k, which correspond to the fraction of the population being k point mutations away from the master sequence.

Infinite Genome Limit For the limit N \to \infty we rescale the rate for point mutations, \mu, as

\mu = u/N,  (8.25)

compare (8.2). In this limit, the mean number of mutations

u = N\mu

occurring per genome remains finite. Experimentally, this approximation holds well for microbial taxa, as "Drake's rule" states, with u \approx 0.003 for base substitutions.

Absence of Back Mutations Starting from the optimal genome s_0 we examine the effect of successive mutations. Successful mutations increase the distance k from the optimal genome s_0. Assuming u \ll 1 in (8.25) leads to several consequences.

Fig. 8.4 The linear chain model for the tower landscape, as defined by (8.26), (8.27), and (8.28), with k denoting the number of point mutations necessary to reach the optimal genome. The population fraction x_k(t+1) is influenced only by x_{k-1}(t) and x_k(t), its own value at time t

– Per generation, multiple mutations do not appear.
– One can neglect back mutations, viz mutations reducing the value of k.

The relative probability of back mutations,

\frac{k}{N-k} \ll 1,

vanishes in the infinite-genome limit N \to \infty for genomes within a finite distance k from the master sequence. As a consequence, the tower model has the structure of a linear chain, with k = 0 being the starting point of the chain.

Linear Chain Model We have two parameters: u, which measures the mutation rate, and \sigma, setting the strength of the selection. Remembering that the fitness W(s) is proportional to the number of offspring, see (8.24), we find

x_0(t+1) = \frac{1}{\langle W \rangle} \, x_0(t)\, (1-u),  (8.26)

x_1(t+1) = \frac{1}{\langle W \rangle} \big[ u\, x_0(t) + (1-u)(1-\sigma)\, x_1(t) \big],  (8.27)

x_k(t+1) = \frac{1}{\langle W \rangle} \big[ u\, x_{k-1}(t) + (1-u)\, x_k(t) \big] (1-\sigma), \qquad k > 1,  (8.28)

where \langle W \rangle is the average fitness. These equations describe a linear chain model, as illustrated in Fig. 8.4. The population x_0 of individuals with the optimal genome is constantly losing members, due to mutations. But it also has a higher number of offspring than all other populations, due to its larger fitness.

Stationary Solution The average fitness of the population is given by

\langle W \rangle = x_0 + (1-\sigma)(1-x_0) = 1 - \sigma(1-x_0).  (8.29)

When looking for the stationary distribution \{x_k^*\}, one notes that the equation for x_0^* does not involve the x_k^* with k > 0,

x_0^* = \frac{x_0^* (1-u)}{1 - \sigma(1 - x_0^*)}, \qquad 1 - \sigma(1 - x_0^*) = 1 - u,

which leads to

x_0^* = \begin{cases} 1 - u/\sigma & \text{if } u < \sigma \\ 0 & \text{if } u \ge \sigma \end{cases},  (8.30)

due to the normalization condition 0 \le x_0^* \le 1. For u > \sigma the model becomes ill defined. The stationary solutions for the x_k^* follow; for k = 1 one has

x_1^* = \frac{u}{1 - \sigma(1 - x_0^*) - (1-u)(1-\sigma)} \, x_0^*,

compare (8.27) and (8.29). The stationary probability x_k^* to find genomes at a generic distance from optimality, k > 1, is

x_k^* = \frac{(1-\sigma)\, u}{1 - \sigma(1 - x_0^*) - (1-u)(1-\sigma)} \, x_{k-1}^*,  (8.31)

which follows from (8.28) and (8.29).
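The stationary solution can also be obtained by iterating (8.26)–(8.28) directly; the truncation at kmax mutation classes and the parameter values used in the checks are illustrative assumptions:

```python
# Iterate the linear chain (8.26)-(8.28); x0 converges to (8.30).
def stationary_x0(u, sigma, kmax=200, steps=4000):
    x = [1.0] + [0.0] * kmax                       # population starts at s0
    for _ in range(steps):
        W = x[0] + (1 - sigma) * (1 - x[0])        # average fitness, eq. (8.29)
        new = [x[0] * (1 - u) / W]                                   # (8.26)
        new.append((u * x[0] + (1 - u) * (1 - sigma) * x[1]) / W)    # (8.27)
        for j in range(2, kmax + 1):                                 # (8.28)
            new.append((u * x[j - 1] + (1 - u) * x[j]) * (1 - sigma) / W)
        x = new
    return x[0]
```

Below the error threshold, stationary_x0(0.2, 0.5) converges to 1 - u/\sigma = 0.6; above it, stationary_x0(0.6, 0.5) vanishes.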

Distinct Phases Two regimes can be distinguished, as determined by the magnitude of the overall mutation rate u = N\mu relative to the fitness parameter \sigma, with

u = \sigma

being the transition point. The density x_0^* of optimal genomes can be taken as an order parameter,6 being zero for u \ge \sigma, and finite otherwise. In physics language, epistasis corresponds to many-body interactions, which can be thought to induce a phase transition in the sharp peak model. In this view, the absence of many-body interactions in the smooth fitness landscape model is the underlying reason for the likewise absence of distinct phases, compare Sect. 8.3.2.

Quasispecies Formation in the Adaptive Regime In the regime of small mutation rates, u < \sigma, one has x_0^* > 0; in fact the whole population lies within a finite distance from the preferred genotype. To see why, we note that

\sigma(1 - x_0^*) = \sigma(1 - 1 + u/\sigma) = u

6 Order parameters characterize the difference between ordered and disordered phases, as explained in Sect. 6.1 of Chap. 6.

Fig. 8.5 Quasispecies formation within the sharp peak fitness landscape, as defined by (8.24). The stationary population density x_k^*, see (8.31), is peaked around the genome with maximal fitness, located at k = 0. The population tends to spread out in genome space when the overall mutation rate u approaches the critical point u \to \sigma. Shown are the distributions for \sigma = 0.5 and u = 0.30, 0.40, 0.45, 0.49, as functions of k

and take a look at (8.31),

\frac{(1-\sigma)\, u}{1 - u - (1-u)(1-\sigma)} = \frac{1-\sigma}{1-u} \, \frac{u}{\sigma} \le 1, \qquad \text{for } u < \sigma.

The x_k^* therefore form a geometric series,

x_k^* \sim \Big( \frac{1-\sigma}{1-u} \, \frac{u}{\sigma} \Big)^k,

which is summable whenever u < \sigma. In this adaptive regime the population forms what Manfred Eigen denoted a "quasispecies", see Fig. 8.5.

which is summable whenever .u < σ . In this adaptive regime the population forms
what Manfred Eigen denoted a “quasispecies”, see Fig. 8.5.

QUASISPECIES A quasispecies is a population of genetically close, but not identical, individuals.

The quasispecies concept can be thought to contrast the classical view of genetics, as dealing with a limited number of competing alleles. It shapes current views on how primordial life may have evolved, a topic treated in Sect. 8.5.1. Equally, present-day pathogenic viral populations are often spread out thinly in genome space, a key factor for overcoming the resistances of potential hosts.

Wandering Regime and Error Threshold In the regime of a large mutation rate, u > \sigma, we have x_k^* = 0, \forall k, which is equivalent to a situation where the population is distributed in an essentially uniform way over genotype space. Fitness will drop to very low levels, given that the whole population lies far away from the preferred genotype. The size of the population will drop equally, until the preconditions of the above analysis are no longer valid and a "wandering regime" is reached. In this state, the effects of finite population size are prominent.

ERROR CATASTROPHE The transition from the adaptive (quasispecies) regime to the
wandering regime is denoted the “error threshold” or “error catastrophe”.

Quasispecies theory is intrinsically linked to the notion of an error catastrophe, independent of the exact nature of the fitness landscape containing epistatic interactions. A quasispecies will not be able to adapt when mutation rates become too large. Most of the time, an error catastrophe implies extinction in the real world.

Drift Barrier Hypothesis So far, our line of arguments has been based on short-
term adaptive pressures, namely on a fitness landscape that is defined in relation
to a static environment. In this situation, the more individuals possess the optimal
genome, the better. However, environments change on longer time scales, which
is bad news for a well adapted species. Too few variations in the extant gene pool
severely restricts adaptability to new environmental forcings.

In order to be able to adapt to changing environments a species needs, importantly, only an intensive number of individuals with suitable genomes, not a finite fraction of the population. The overall degree of genetic variability is hence proportional to

u M = N \mu M,
where .μ is the mutation rate per base pair, in analogy to (8.2). If genetic variability
can be assumed to be similar between species, it follows that optimal mutation rates
scale as .1/M, viz inversely with population size. This is indeed seen in multicellular
eukaryotic species. There is, however, a catch: u \sim 1/M will cross the "drift barrier", as the error threshold is called in this context, when the population size M becomes too small. This is the "drift barrier hypothesis", which can be seen as an instance of Kauffman's notion of "life at the edge of chaos".7 Both concepts deal with the tradeoff between short- and long-term survivability.

8.4 Finite Populations and Stochastic Escape

Punctuated Equilibrium Evolution is not a steady process, there are regimes of


rapid increase of the fitness and phases of relative stasis. This interplay of dynamical
behaviors is denoted “punctuated equilibrium”. In this context, adaptation can result
either from local optimization of the fitness of a single species or via coevolutionary

7 The notion of life at the edge of chaos features prominently in Chap. 7.



avalanches,8 which occur in an ecosystem when the evolution of one species induces successive adaptive changes in a series of other species.

NEUTRAL REGIME The stage where evolution is driven essentially by random mutations
is called the neutral, wandering, or drifting regime.

The quasispecies model is inconsistent in the neutral regime, in which the


population spreads out in genome space. In this situation, fluctuations of the
reproductive process in a finite population have to be taken into account.

Deterministic Versus Stochastic Evolution Evolution is driven by stochastic


processes, since mutations are random events. Nevertheless, randomness averages
out and the evolution process becomes deterministic in the thermodynamic limit, as
discussed in Sect. 8.3, when the number M of individuals diverges, M → ∞.

Evolutionary processes in populations with a finite number of individuals differ


from deterministic evolution quantitatively and sometimes also qualitatively, the
latter being here our focus of interest.

STOCHASTIC ESCAPE Random mutations in a finite population might lead to a decrease


in the fitness with the fixation of potentially detrimental traits.

We will discuss in some detail under which circumstances stochastic escape is


important for evolutionary processes in small populations.9

8.4.1 Adaptive Climbing Under Strong Selective Pressure

Our first inroad into the population dynamics of finite populations is based on three
basic assumptions.

– Finite populations.
– Strong selective pressure.
– Small mutation rates.

Strong selective pressure implies that one can represent the population by a single
point in genome space, namely that the genomes of all individuals are taken to be
equal.

Adaptive Walks The evolutionary dynamics is taken to consist of three basic


steps.

8 Coevolutionary avalanches are likely to lead to a self-organized critical state, as shown in Sect. 6.6

of Chap. 6.
9 A general account of the theory of stochastic escape is given in Sect. 3.5.2 of Chap. 3.

Fig. 8.6 Local fitness optima in a one-dimensional random fitness distribution. The illustration shows that random distributions may exhibit enormously large numbers of local optima (filled circles). For these, all sequences with a Hamming distance of one (single point mutations) have lower fitness values

– At each time step, only a single locus of the genome of a certain individual
mutates.
– If the mutation leads to a genotype with higher fitness, the new genotype spreads
rapidly throughout the entire population, which moves altogether to the new
position in genome space.
– If the fitness of the new genotype is lower, the mutation is rejected and the
population remains at the old position.

Physicists would call this type of dynamics a Monte Carlo process at zero temperature. Further below we will relax the condition that unfavorable mutations are always rejected.

Random Energy Model As formulated so far, our rules do not allow to traverse a valley with low fitness, viz to pass from one local optimum to the next. As a first step it is hence important to investigate the statistical properties of local optima, which depend on the properties of the fitness landscape. A suitable approach is to assume that fitness values are randomly distributed.

RANDOM ENERGY MODEL The fitness landscape .F (s) is uniformly distributed between
zero and one.

The random energy model is illustrated in Fig. 8.6. It captures an ingredient


expected for real-world fitness landscapes, namely a large number of local fitness
optima close to the global fitness maximum.

Local Optima in the Random Energy Model Let us denote by N the number of genome elements. The probability that a sequence with fitness F(s) has a higher fitness than all of its N neighbors is given by

F^N = F^N(s),

since we have to impose that every one of the N nearest neighbors,

(s_1, ..., -s_i, ..., s_N), \quad i = 1, ..., N, \qquad s = (s_1, ..., s_N),

has a fitness less than F. The probability that a point in genome space is a local optimum is consequently given by

P_{\text{local optimum}} = \int_0^1 F^N \, dF = \frac{1}{N+1},

since the fitness F is equally distributed in [0, 1]. There are therefore many local optima, namely 2^N/(N+1). A schematic picture of the large number of local optima in a random distribution is given in Fig. 8.6.
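The 1/(N+1) result lends itself to a quick Monte-Carlo check; the genome length N = 12, the seed, and the sample size are arbitrary choices:

```python
import random

# Estimate the probability that a random-energy genome beats its N neighbours.
random.seed(2)
N, trials = 12, 200_000
hits = 0
for _ in range(trials):
    F = random.random()                               # fitness of central genome
    if all(random.random() < F for _ in range(N)):    # all N neighbours lower
        hits += 1
estimate = hits / trials                              # should approach 1/(N+1)
```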

Average Fitness of a Local Optimum The typical fitness of a local optimum is

F_{\text{typ}} = \frac{1}{1/(N+1)} \int_0^1 F \, F^N \, dF = \frac{N+1}{N+2} = \frac{1 + 1/N}{1 + 2/N} \approx 1 - \frac{1}{N},  (8.32)

viz very close to the global optimum of 1, when the genome length N is large. At every successful step, a mutation followed by selection, the distance from the top is on average divided by a factor of two.

Successful Mutations The typical fitness attained after \ell successful steps is therefore of the order of

1 - \frac{1}{2^{\ell+1}},

when starting with \ell = 0 from an average initial fitness of 1/2. It follows that the typical number of successful mutations after which an optimum is approximately attained is

F_{\text{typ}} = 1 - \frac{1}{N} \approx 1 - \frac{1}{2^{\ell_{\text{typ}}+1}}, \qquad \ell_{\text{typ}} + 1 = \frac{\log N}{\log 2},  (8.33)

which is logarithmically small. The climbing process is illustrated in Fig. 8.7.

Time Needed for One Successful Mutation Even though the number of successful mutations (8.33) needed to arrive at the local optimum is small, the time needed to climb to the local peak can be very long. In the first place, a given mutation must be successful before it can contribute to the climbing process. For the random energy

Fig. 8.7 Adaptive climbing vs. stochastic escape. The higher the fitness, the more difficult it becomes to climb further. With the escape probability p_esc the population jumps somewhere else, escaping from the local optimum. (Axes: fitness F vs. different genotypes)

model this is only rarely the case close to the top, because mutations lead to a
randomly distributed fitness of the offspring.

We define with

t_F = \sum_n n \, P_n, \qquad n: \text{number of generations},

the average number of generations necessary for a population with fitness F to attain a single successful mutation. We denoted with

P_n = (1 - F)\, F^{n-1}

the probability that it takes exactly n generations, which is the case if n-1 mutation attempts with lower fitness are followed by one with fitness in [F, 1]. The climbing time is evaluated as

t_F = \frac{1-F}{F} \sum_{n=0}^{\infty} n F^n = \frac{1-F}{F} \sum_{n=0}^{\infty} F \frac{\partial}{\partial F} F^n = (1-F) \frac{\partial}{\partial F} \frac{1}{1-F} = \frac{1}{1-F}.  (8.34)

The average number of generations necessary to increase fitness further by a


successful mutation diverges close to the global optimum .F → 1.
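The geometric waiting-time statistics behind (8.34) can be sampled directly; F = 0.9, the seed, and the number of trials are illustrative choices:

```python
import random

# Sample the number of generations until one successful mutation occurs;
# P_n = (1 - F) F^(n-1), so the sample mean should approach 1/(1 - F) = 10.
random.seed(7)
F, trials = 0.9, 100_000
total = 0
for _ in range(trials):
    n = 1
    while random.random() < F:     # offspring fitness falls below F: try again
        n += 1
    total += n
mean_wait = total / trials
```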

Total Climbing Time Every successful mutation decreases the distance 1 - F to the top by 1/2. Vice versa, the factor 1/(1-F) increases on average by a factor of two. We define with

T_{\text{opt}} = \sum_{\ell=0}^{\ell_{\text{typ}}} t_F \, 2^{\ell}

the expected total number of generations to arrive at a local optimum, where \ell_{\text{typ}} is the typical number of successful mutations needed to arrive at a local optimum, see (8.33). Using (8.34) for the average number of generations necessary for a single successful mutation, t_F, we obtain

T_{\text{opt}} = t_F \, \frac{1 - 2^{\ell_{\text{typ}}+1}}{1 - 2} \approx t_F \, 2^{\ell_{\text{typ}}+1} = t_F \, e^{(\ell_{\text{typ}}+1) \log 2} \approx t_F \, e^{\log N} = \frac{N}{1-F} \approx 2N,  (8.35)

where we used \ell_{\text{typ}} + 1 = \log N / \log 2, setting F \approx 1/2 in the last step, our choice for a typical starting fitness. When starting randomly, the number of generations needed to climb to a local maximum in the random fitness landscape is hence proportional to the length of the genome.
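The linear scaling with N can be probed by simulating the climbing process; the stopping criterion F \ge 1 - 1/N, the typical local-optimum fitness of (8.32), and all parameter values below are illustrative assumptions:

```python
import random

# Zero-temperature adaptive walk in the random energy model: each generation
# one mutation draws a fresh uniform fitness, accepted only if it climbs.
random.seed(11)
N, trials = 60, 3000
total = 0
for _ in range(trials):
    F, t = 0.5, 0                        # typical starting fitness of the text
    while F < 1 - 1 / N:                 # typical local-optimum fitness (8.32)
        t += 1                           # one generation, one mutation attempt
        candidate = random.random()      # random-energy fitness of the offspring
        if candidate > F:
            F = candidate                # selection accepts the climb
    total += t
mean_time = total / trials               # grows linearly with N, as in (8.35)
```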

8.4.2 Adaptive Climbing vs. Stochastic Escape

The focus of Sect. 8.4.1 has been on the average properties of adaptive climbing. We now take fluctuations in the reproductive process into account. Our aim is to compare two typical time scales, the number of generations needed for a stochastic escape and for adaptive climbing.

Escape Probability In the strong selection limit, as assumed in our model, favorable mutations spread instantaneously into the whole population. To first order, it is therefore sufficient to restrict the analysis to populations of size M in which all individuals have an identical fitness.

Mutations occur with probability u per genome, leading with probability F to descendants having a lower fitness. The probability for this to happen for the entire population, viz the probability p_esc of stochastic escape, is consequently

p_{\text{esc}} \propto (F u)^M \approx u^M, \qquad F \approx 1.

The escape can only happen when an adverse mutation occurs for every member of the population within the same generation. A single individual not mutating retains a higher fitness, the one of the present local optimum, with the consequence that all other mutations are discarded when the selective pressure is strong, as assumed here. The same holds whenever a single positive mutation in [F, 1] shows up. Compare also Fig. 8.7.

Stochastic Escape and Stasis As in the previous section, we consider a population that is close to a local optimum, but still climbing. The probability that the fitness of a given individual increases is (1-F)u. It needs to mutate with a probability u, and to achieve a higher fitness when mutating, which happens a fraction 1-F of the time. We denote with

a = 1 - (1-F)u

the probability that the fitness of an individual does not increase with respect to the current fitness F of the population. The probability q_pos that at least one better genotype is found is then given by

q_{\text{pos}} = 1 - a^M.

Real-world populations are typically close to a local optimum. In this situation, we can then distinguish between two evolutionary regimes.

– ADAPTIVE WALK
The escape probability p_esc is much smaller than the probability to increase the fitness, q_pos \gg p_esc. The population continuously increases its fitness via small mutations.
– WANDERING REGIME
The adaptive dynamics slows down close to a local optimum. The probability of stochastic escape p_esc may then become comparable to that of an adaptive process, p_esc \approx q_pos.

In the "drifting regime", as the second case is also called, the population wanders around in genome space, starting a new adaptive walk after every successful escape.

Typical Escape Fitness The fitness F increases steadily during the adaptive walk regime, until it reaches a certain typical fitness, F_esc, for which the probability of stochastic escape becomes substantial, i.e. when p_esc \approx q_pos,

p_{\text{esc}} = (F_{\text{esc}}\, u)^M = 1 - \big[ 1 - (1 - F_{\text{esc}})\, u \big]^M = q_{\text{pos}}.

As 1 - F_esc is small, we can set (F_{\text{esc}}\, u)^M \approx u^M and expand the right-hand side of the above expression with respect to 1 - F_esc,

u^M \approx 1 - \big[ 1 - M (1 - F_{\text{esc}})\, u \big] = M (1 - F_{\text{esc}})\, u,

obtaining

1 - F_{\text{esc}} = u^{M-1}/M.  (8.36)

The fitness F_esc necessary for the stochastic escape to become relevant is exponentially close to the global optimum F = 1 for large populations M.
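The expansion (8.36) can be checked against an exact bisection solution of p_esc = q_pos; u = 0.3 and M = 8 are illustrative parameter values:

```python
# Solve (F u)^M = 1 - [1 - (1 - F) u]^M for F by bisection and compare the
# exact root with the small-(1 - F) result 1 - F_esc = u^(M-1) / M.
u, M = 0.3, 8

def diff(F):
    return (F * u) ** M - (1 - (1 - (1 - F) * u) ** M)   # p_esc - q_pos

lo, hi = 0.5, 1.0 - 1e-15       # diff < 0 at lo, diff > 0 just below F = 1
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if diff(mid) < 0:
        lo = mid
    else:
        hi = mid
F_esc = 0.5 * (lo + hi)
delta_exact  = 1 - F_esc
delta_approx = u ** (M - 1) / M          # eq. (8.36)
```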

Relevance of Stochastic Escape In general, stochastic escape occurs when the population is at a local optimum, or close to one. We may estimate the importance of the escape process relative to that of the adaptive walk by comparing the typical fitness F_typ of a local optimum achieved by a typical climbing process with the typical fitness F_esc needed for the escape process to become important,

F_{\text{typ}} = 1 - \frac{1}{N} \equiv F_{\text{esc}} = 1 - \frac{u^{M-1}}{M}, \qquad \frac{1}{N} = \frac{u^{M-1}}{M},

where we have used (8.32) for F_typ. The last expression is independent of the details of the fitness landscape, containing only the measurable parameters N, M and u. This condition can be fulfilled only when the number of individuals M is much smaller than the genome length N, as u < 1. The phenomenon of stochastic escape occurs only for very small populations.

8.5 Prebiotic Evolution

Prebiotic evolution deals with the questions surrounding the origin of life. Is it
possible to define chemical autocatalytic networks in the primordial soup having
properties akin to those of the metabolic reaction networks powering the workings
of every living cell?
In this scenario, a key point regards the cell membrane, which acts as a defining
boundary, separating body and environment on a physical level. Precursors of life
would not have membranes, but one may hypothesize that chemical regulation
networks emerging within a primordial soup of macromolecules did evolve into
the protein regulation networks of living cells once enclosed by a membrane.

8.5.1 Quasispecies Theory

In Sect. 8.3.3 we discussed the concept of a quasispecies as a collection of


closely related genomes in which no single genotype is dominant. This situation
is presumably also typical for prebiotic evolutionary processes. In this context,
Eigen formulated the quasispecies theory for a system of information carrying
macromolecules through a set of equations for chemical kinetics,

ẋi = Wii xi +
. Wij xj − xi φ(t) , (8.37)
j =i

where the .xi denote the concentrations of .i = 1 . . . N molecules. .Wii is the


(autocatalytic) self-replication rate, with the off-diagonal terms .Wij (.i = j )
corresponding to effective mutation rates.
In a first step, we will generalize the concept of a quasispecies to a collection of macromolecules using (8.37). Next, in Sect. 8.5.2, we will turn to a somewhat

more realistic model of interacting chemical reactions, which will be, hopefully,
functionally closer to the precursors of the protein regulation network of living cells.

Mass Conservation We can choose the flux -x\phi(t) in Eigen's equation (8.37) for prebiotic evolution such that the total concentration

C = \sum_i x_i, \qquad \dot{C} = \sum_{ij} W_{ij}\, x_j - C \phi,

is conserved for long times. Demanding \dot{C} \to 0 and C \to 1 for large times suggests

\phi(t) = \sum_{ij} W_{ij}\, x_j(t)  (8.38)

as a suitable choice for the field \phi(t), which leads in turn to

\dot{C} = \phi\, (1 - C), \qquad \frac{d}{dt}(C - 1) = -\phi\, (C - 1).  (8.39)

The total concentration C(t) will therefore approach 1 for t \to \infty whenever \phi > 0, which we assume here to be the case, implying total mass conservation. The flux is positive when the autocatalytic rates W_{ii} dominate with respect to the transmolecular mutation rates W_{ij} (i \ne j).
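A minimal Euler integration of (8.37) with the flux choice (8.38); the random rate matrix W, the initial concentrations, and the step size are arbitrary illustrative choices:

```python
import numpy as np

# Integrate Eigen's equation (8.37); with phi from (8.38) the total
# concentration C(t) = sum_i x_i(t) relaxes to 1, as implied by (8.39).
rng = np.random.default_rng(0)
n = 5
W = 0.02 * rng.random((n, n))              # small transmolecular rates W_ij
np.fill_diagonal(W, 1.0 + rng.random(n))   # dominant autocatalytic rates W_ii
x = 0.1 * rng.random(n)                    # arbitrary initial concentrations
dt = 0.001
for _ in range(50_000):
    phi = float(W.sum(axis=0) @ x)         # phi(t) = sum_ij W_ij x_j, eq. (8.38)
    x = x + dt * (W @ x - phi * x)         # eq. (8.37) in matrix form
C = float(x.sum())                         # approaches 1: mass conservation
```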

Eigenstates We can write the evolution equation (8.37) in matrix form,

\frac{d}{dt} x(t) = \big( W - \hat{1}\phi \big)\, x(t), \qquad x = (x_1, ..., x_N)^T,  (8.40)

where W is the matrix W_{ij}. Assuming for simplicity a symmetric mutation matrix, W_{ij} = W_{ji}, the solution of the linear differential equation (8.40) is given in terms of the respective eigenvectors e_\lambda,

W e_\lambda = \lambda\, e_\lambda, \qquad x(t) = \sum_\lambda a_\lambda(t)\, e_\lambda, \qquad \dot{a}_\lambda = [\lambda - \phi(t)]\, a_\lambda.

The eigenvector $\mathbf{e}_{\lambda_{\max}}$ with the largest eigenvalue $\lambda_{\max}$ will dominate for $t \to \infty$, due to the overall mass conservation, as enforced by (8.39). The flux will likewise adapt to the largest eigenvalue,

$$\lim_{t \to \infty}\, \big[\lambda_{\max} - \phi(t)\big] \to 0\,,$$

leading to the stationary condition $\dot x_i = 0$ for the evolution equation (8.40) in the long time limit.

Quasispecies The eigenvectors $\mathbf{e}_\lambda$ will contain only a single non-zero entry when $W$ is diagonal in (8.40), viz when mutations are absent, such as $(0, \dots, 1, \dots, 0)$. In this case, a single macromolecule remains in the primordial soup for $t \to \infty$.

EXTENSIVE/INTENSIVE VECTORS A normalized vector of length $N$ is extensive, respectively intensive, when its entries scale as $1/\sqrt{N}$, respectively $\sim N^0$.

Intensive vectors are dominated by a finite number of entries, or by entries that are summable.10 A quasispecies of distinct but closely related macromolecules emerges when $\mathbf{e}_{\lambda_{\max}}$ is intensive, which is the case if the mutation rates $W_{ij}$ ($i \neq j$) are either absent, or finite but small.
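The dominance of $\mathbf{e}_{\lambda_{\max}}$ can be probed numerically. The following minimal sketch (an illustrative three-molecule example of ours; the matrix entries are assumptions, not values from the text) integrates (8.37) with the mass-conserving flux (8.38) by an explicit Euler scheme:

```python
# Illustrative three-molecule quasispecies: integrate Eigen's equation
# (8.37) with the mass-conserving flux (8.38). The matrix W below is a
# hypothetical example (strong self-replication, small mutation rates).
N = 3
W = [[2.0, 0.1, 0.1],
     [0.1, 1.0, 0.1],
     [0.1, 0.1, 0.5]]

x = [1.0 / N] * N            # uniform start, total concentration C = 1
dt = 0.01
for _ in range(20000):
    # flux (8.38): phi = sum_ij W_ij x_j
    phi = sum(W[i][j] * x[j] for i in range(N) for j in range(N))
    dx = [sum(W[i][j] * x[j] for j in range(N)) - x[i] * phi
          for i in range(N)]
    x = [x[i] + dt * dx[i] for i in range(N)]

C = sum(x)
print(round(C, 6), round(phi, 3))
```

The total concentration stays at one, while the flux $\phi$ converges towards the largest eigenvalue of $W$ as the concentration vector aligns with $\mathbf{e}_{\lambda_{\max}}$.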

Error Catastrophe Eigen's model is consistent when a quasispecies of macromolecules stabilizes. $\mathbf{e}_{\lambda_{\max}}$ is intensive in this case and the flux $\phi$ finite, which means that the mass conservation condition (8.39) can be fulfilled. One finds, in close analogy to the error catastrophe discussed in Sect. 8.3.3, that these two properties break down at a certain point when the off-diagonal matrix elements $W_{ij}$ ($i \neq j$) are successively increased.11 At the threshold, $\mathbf{e}_{\lambda_{\max}}$ becomes extensive and the population disperses, which is the telltale sign of an error catastrophe. The flux $\phi$ diverges at the same time, with the consequence that total mass conservation is no longer possible, see (8.39). The underlying mechanism occurs also in models for the emergence of hypercycles, which we will examine next within an analytically tractable framework.
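The dispersal of the dominant eigenvector can be made visible numerically, e.g. for the tower-like rates of Exercise (8.2). In the sketch below (all parameter values are our illustrative choices), power iteration yields the dominant eigenvector, and the inverse participation ratio $Y = \sum_i (e_i)^4$ of the normalized vector distinguishes an intensive (localized, $Y = O(1)$) from an extensive (dispersed, $Y \sim 1/N$) population:

```python
# Localization of the dominant eigenvector: a single fit "master
# sequence" (W_00 = 1, all other replication rates 1 - sigma) coupled
# by a nearest-neighbour mutation rate u. Parameters are illustrative.
def dominant_eigvec(N, sigma, u, steps=1500):
    W = [[0.0] * N for _ in range(N)]
    for i in range(N):
        W[i][i] = 1.0 if i == 0 else 1.0 - sigma
        if i + 1 < N:
            W[i][i + 1] = u
            W[i + 1][i] = u
    v = [1.0] * N
    for _ in range(steps):                       # power iteration
        w = [sum(W[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

N = 40
Y_small = sum(c ** 4 for c in dominant_eigvec(N, sigma=0.5, u=0.01))
Y_large = sum(c ** 4 for c in dominant_eigvec(N, sigma=0.5, u=2.0))
print(round(Y_small, 3), round(Y_large, 3))
```

For small mutation rates the inverse participation ratio stays of order one, an intensive quasispecies; for large mutation rates it drops to the order of $1/N$: the population has dispersed.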

8.5.2 Hypercycles and Autocatalytic Networks

RNA World The macromolecular evolution equations (8.37) do not contain terms
describing the catalysis of molecule i by molecule j . This process is, however,
important both for the prebiotic evolution, as stressed by Manfred Eigen, as well
as for the protein reaction network in living cells.

HYPERCYCLES Two or more molecules may form a stable catalytic (hyper) cycle when
the respective intermolecular catalytic rates are large enough to mutually support their
respective synthesis.

An illustration of selected hypercycles is given in Figs. 8.8 and 8.9. A likely chemical candidate for the constituent molecules is RNA, functioning both enzymatically and as a precursor of the genetic material. One speaks also of an "RNA world".

10 A sequence of elements is summable if the sum does not diverge when the length is successively increased. For vectors, the sum of the squared entries is of relevance.

11 This is detailed in Exercise (8.2).

Fig. 8.8 The simplest hypercycle. Here, A/B are self-replicating molecules, with A acting as a catalyst for B, and vice versa. The replication rate of one species hence increases with the concentration of the other

Reaction Networks Concentrating on the properties of catalytic reaction equations, we disregard mutations. Our basic reaction network is

$$\dot x_i = x_i \Big( \lambda_i + \sum_j \kappa_{ij}\, x_j - \phi \Big)\,, \qquad (8.41)$$

where $x_i$ is the concentration of reactant $i$. The autocatalytic growth rate is $\lambda_i$, with $\kappa_{ij}$ encoding the transmolecular catalytic rates. The total concentration $C = \sum_i x_i$ remains constant if the flux $\phi$ is set to

$$\phi = \sum_k x_k \Big( \lambda_k + \sum_j \kappa_{kj}\, x_j \Big)\,. \qquad (8.42)$$

In analogy to (8.39) one has

$$\dot C = \sum_i \dot x_i = \sum_i x_i \Big( \lambda_i + \sum_j \kappa_{ij}\, x_j \Big) - C\,\phi = (1 - C)\,\phi \;\to\; 0\,,$$

which implies that $C \to 1$ in equilibrium.

Homogeneous Network An analytically solvable realization of (8.41) is given when the interactions $\kappa_{i \neq j}$ are homogeneous,

$$\kappa_{i \neq j} = \kappa\,, \qquad \kappa_{ii} = 0\,, \qquad \lambda_i = \alpha\, i\,, \qquad (8.43)$$

Fig. 8.9 A hypercycle of order n consisting of n cyclically coupled self-replicating molecules, with each molecule providing catalytic support for the subsequent molecule. Parasitic self-replicating molecules, indicated by 'par', receive catalytic support from the hypercycle without contributing to it

where we also presumed uniformly distributed autocatalytic growth rates $\lambda_i$. The reaction equations take the form

$$\dot x_i = x_i \Big( \lambda_i + \kappa \sum_{j \neq i} x_j - \phi \Big) = x_i \big( \lambda_i + \kappa - \kappa\, x_i - \phi \big)\,, \qquad (8.44)$$

where we have used $\sum_i x_i = 1$. The stationary concentrations $\{x_i^*\}$ of (8.44) are

$$x_i^* = \begin{cases} (\lambda_i + \kappa - \phi)/\kappa \\ 0 \end{cases}, \qquad \lambda_i = \alpha,\, 2\alpha,\, \dots,\, N\alpha\,, \qquad (8.45)$$

as illustrated in Fig. 8.10, where the non-zero solution is valid when the respective $x_i^*$ is positive, namely for $\lambda_i + \kappa - \phi > 0$. The flux $\phi$ is determined self-consistently via (8.42).

Self-Consistent Quasispecies Formation Dynamically, the $x_i(t)$ with the largest growth rates $\lambda_i$ will dominate and obtain a non-zero steady-state concentration $x_i^*$, as given by (8.45). However, total mass $1 = C = \sum_i x_i$ needs to be conserved. We may therefore assume that there exists an $N^* \in [1, N]$ such that

$$x_i^* = \begin{cases} (\lambda_i + \kappa - \phi)/\kappa & \text{for } N^* \le i \le N \\ 0 & \text{for } 1 \le i < N^* \end{cases}, \qquad (8.46)$$

as shown in Fig. 8.10, where $N^*$ and $\phi$ are determined by the normalization condition $1 = \sum_i x_i^*$. Remarkably, the self-organization of the reaction network leads to a clear distinction. Part of the reactants die out, with the remainder forming a stable, mutually supporting catalytic network.

Fig. 8.10 The autocatalytic growth rates $\lambda_i$ (left axis), as in (8.43) with $\alpha = 1$, together with the stationary concentrations $x_i^*$ (right axis), which constitute a prebiotic quasispecies, compare (8.46). Shown are results for various mean transcatalytic rates, $\kappa = 50, 200, 450$. The individual molecules $i = 1, \dots, 50$ are indexed along the horizontal axis

Termination Condition Mass conservation dictates

$$1 = \sum_{i=N^*}^{N} x_i^* = \sum_{i=N^*}^{N} \frac{\lambda_i + \kappa - \phi}{\kappa} = \frac{\alpha}{\kappa} \sum_{i=N^*}^{N} i \,+\, \frac{\kappa - \phi}{\kappa} \big( N + 1 - N^* \big)$$

$$= \frac{\alpha}{2\kappa} \Big[ N(N+1) - N^*(N^*-1) \Big] + \frac{\kappa - \phi}{\kappa} \big( N + 1 - N^* \big)\,,$$

where $\lambda_i = \alpha i$ has been used. The termination condition, $x_i^* = 0$ for $i = N^* - 1$,

$$0 = \frac{\lambda_{N^*-1} + \kappa - \phi}{\kappa} = \frac{\alpha (N^* - 1)}{\kappa} + \frac{\kappa - \phi}{\kappa}\,, \qquad (8.47)$$

can be used to eliminate $(\kappa - \phi)/\kappa$. We take the limit of large numbers of reactants $N$ and $N^*$,

$$\frac{2\kappa}{\alpha} \;\simeq\; N^2 - \big(N^*\big)^2 - 2 N^* \big(N - N^*\big) = N^2 - 2 N^* N + \big(N^*\big)^2 = \big(N - N^*\big)^2\,.$$

The number of surviving reactants $N - N^*$ is therefore

$$N - N^* \;\simeq\; \sqrt{\frac{2\kappa}{\alpha}}\,, \qquad (8.48)$$

which is non-zero for a finite and positive inter-molecular catalytic rate κ. A


hypercycle of mutually supporting species (or molecules) has formed.
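The estimate (8.48) can be checked by integrating the reaction equations (8.44) directly. In the following sketch (all parameters are illustrative choices of ours), $N = 20$, $\alpha = 1$ and $\kappa = 8$ predict $N - N^* \simeq \sqrt{2\kappa/\alpha} = 4$ surviving reactants:

```python
# Homogeneous catalytic network (8.44) with lambda_i = alpha * i:
# explicit Euler integration; parameters are illustrative.
N, alpha, kappa = 20, 1.0, 8.0
lam = [alpha * (i + 1) for i in range(N)]   # growth rates alpha, 2 alpha, ...

x = [1.0 / N] * N                            # uniform initial concentrations
dt = 1e-3
for _ in range(30000):
    C = sum(x)
    # flux (8.42) with kappa_{ij} = kappa for i != j, kappa_{ii} = 0
    phi = sum(x[k] * (lam[k] + kappa * (C - x[k])) for k in range(N))
    x = [xi + dt * xi * (lam[i] + kappa * (C - xi) - phi)
         for i, xi in enumerate(x)]

survivors = sum(1 for xi in x if xi > 1e-3)
print(survivors)   # -> 4, matching sqrt(2*kappa/alpha) = 4
```

The reactants with the smallest growth rates die out, while the top four form the mutually supporting network, in agreement with (8.48).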

Origin of Life Aeons have passed since life emerged on earth. The scientific discussions concerning its origin continue to be controversial, and it remains speculative whether hypercycles played a central role. The basic theory of hypercycles treated here, describing closed systems of chemical reactions, would need to be extended to non-equilibrium situations, with constant in- and outflows of both molecules and energy. In fact, a defining feature of biological activities is the buildup of local structures, which is possible when entropy is reduced locally at the expense of an overall increase of environmental entropy. Life, as we understand it today, is based on open systems driven by a constant flux of energy.

It is nevertheless interesting to point out that (8.48) implies a clear division between the molecules $i = N^*, \dots, N$, which can be considered to form a primordial "life form", and the remainder molecules $i = 1, \dots, N^* - 1$. The latter reactants are 'outside', viz they belong to the environment, as their concentrations drop to zero 'inside'. This clear separation between participating and non-participating substances is a result of the non-linearity of the reaction equation (8.41). The linear evolution equation (8.37) would, on the other hand, result in a continuous density distribution, as illustrated in Fig. 8.5 for the case of the sharp peak fitness landscape. One could then conclude that life is possible only via cooperation, resulting from non-linear evolution equations.

8.6 Macroecology and Species Competition

In Macroecology one disregards both the genetic basis of evolutionary processes and specific species interdependencies. One is interested in formulating general principles and models describing the overall properties of extended communities of species.

Neutral Theory A central topic in ecology is to explain the distribution of species abundances, as illustrated in Fig. 8.11 for the count of trees in a tropical rainforest plot. In this example, most species are represented by around 32 individual trees in the patch examined, with fewer species having more or less individuals. Similar species abundance distributions are found in virtually all ecosystems studied.

The neutral theory, as formulated originally by Hubbell, proposes that a handful of universal principles are enough to explain the species abundance distributions observed in nature. The framework is "neutral" as it abstracts from attributes that are specific to the members of the taxa examined. The two central principles of the neutral theory can be implemented mathematically in various fashions.

Fig. 8.11 The abundance of trees in a 50 ha patch of tropical rainforest in Panama (bars), in comparison with the prediction of neutral theory (filled circles), see (8.52), namely a Gamma distribution. The number of species is plotted against the number of individuals per species. Data from Volkov et al. (2003)

– DETERMINISTIC COMPETITION
In between species, resource competition is deterministic.
– STOCHASTIC EVENTS
Births, deaths, migration, and speciation, are treated on a stochastic level.

Stochasticity is thought to lead to random fluctuations in population sizes.

Stochastic Walk Through Population Space We consider a species performing a random walk in population space. The master equation for the probability $p_x(t)$ to observe $x$ individuals of the species at time $t$ is given by birth and death events, proportional to $b_x$ and $d_x$ respectively,

$$\dot p_x(t) = b_{x-1}\, p_{x-1}(t) + d_{x+1}\, p_{x+1}(t) - \big( b_x + d_x \big)\, p_x(t)\,. \qquad (8.49)$$

The birth and death processes contain both intensive and extensive terms, respectively $\propto x^0$ and $\propto x^1$,

$$b_x = \tilde b_0 + \tilde b_1\, x\,, \qquad d_x = \tilde d_0 + \tilde d_1\, x\,, \qquad (8.50)$$

where the intensive contributions $\tilde b_0$ ($\tilde d_0$) model the cumulative effects of immigration (emigration) and speciation (extinction), which occur on the level of individuals. The extensive terms, $\tilde b_1$ ($\tilde d_1$), are generated by reproduction processes, actual births and deaths, which are proportional to the population size.

Fokker–Planck Equation of Macroecology For large populations we may treat $x$ as a continuous variable and approximate the difference $d_{x+1} p_{x+1} - d_x p_x$ occurring on the right-hand side of (8.49) through a Taylor expansion,

$$d_{x+1}\, p_{x+1} - d_x\, p_x \;\simeq\; \frac{\partial}{\partial x}\big( d_x\, p_x \big) + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\big( d_x\, p_x \big) + \dots\,,$$

where $x + 1 = x + \Delta x$ was used implicitly, with $\Delta x = 1$. An analogous relation holds for the birth processes. Using the notation $p(x, t) = p_x(t)$ we obtain with

$$\frac{\partial p(x,t)}{\partial t} = \frac{\partial}{\partial x}\Big[ \big( d_x - b_x \big)\, p(x,t) \Big] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\Big[ \big( d_x + b_x \big)\, p(x,t) \Big]$$

the "Fokker–Planck equation of macroecology",12 which we rewrite as

$$\frac{\partial p(x,t)}{\partial t} = \frac{\partial}{\partial x}\Big[ \Big( \frac{x}{\tau} - b \Big)\, p(x,t) \Big] + D\, \frac{\partial^2}{\partial x^2}\Big[ x\, p(x,t) \Big]\,, \qquad (8.51)$$

where we defined

$$\tau^{-1} = \tilde d_1 - \tilde b_1 > 0\,, \qquad b = \tilde b_0 - \tilde d_0 = 2 \tilde b_0 > 0\,, \qquad D = \frac{\tilde b_1 + \tilde d_1}{2}\,,$$

when restricting to the case $\tilde b_0 = -\tilde d_0 > 0$.

Competition vs. Diffusion The parameters introduced in (8.51) have the following interpretations:

– $D$ induces fluctuations of the order $\sqrt{x}$ in a population of size $x$.
– $b$ corresponds to the net influx, caused either by immigration or by speciation.
– $\tau$ is a measure of the strength of interaction effects in the ecosystem, expressed as the time scale the system needs to react to perturbations.

In order to understand the effect of $\tau$ in more detail, we start with the case $b = 0 = D$, for which the Fokker–Planck equation (8.51) reduces to

$$\tau\, \dot p = p + x\, p'\,, \qquad p \sim e^{-t/T}\, x^\beta\,, \qquad -\tau/T = 1 + \beta\,.$$

The distribution is normalizable for $\beta < -1$, which implies $T > 0$. The ecosystem would slowly die out, on a time scale $T$, as a consequence of the competition between the species, when not counterbalanced by the diffusion $D$ and the external source $b$. Note that $\tau > 0$ implies $\tilde d_1 > \tilde b_1$, namely that there are more deaths than births, on average, for all species and independent of population sizes.

Solution of the Fokker–Planck Equation The steady-state solution $\dot p(x,t) = 0$ of (8.51) is the Gamma distribution

$$p_0(x) = A\, x^{\,b/D - 1}\, e^{-x/(D\tau)}\,, \qquad (8.52)$$

12 The theory behind the Fokker–Planck equation is developed in Sect. 3.5.2 of Chap. 3.

where $A$ is a normalization constant. For a verification note that the population current $J$, defined via the continuity equation $\dot p(x,t) = -\nabla J$, vanishes in the steady state, $J = 0$, and hence

$$0 = \Big( \frac{x}{\tau} - b \Big)\, p_0(x) + D\, \frac{\partial}{\partial x}\big[ x\, p_0(x) \big] = \Big( \frac{x}{\tau} + D - b \Big)\, p_0(x) + D\, x\, p_0'(x)\,,$$

which is satisfied by the solution (8.52). The steady-state solution (8.52) fits the real-world data quite well, see Fig. 8.11.
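As a quick numerical cross-check (the parameter values below are illustrative assumptions of ours), one can verify that the population current $J$ indeed vanishes identically for the Gamma distribution (8.52):

```python
# Verify that the population current J vanishes for the Gamma steady
# state p0(x) = x^(b/D - 1) exp(-x/(D tau)); normalization A = 1,
# parameter values illustrative.
import math

b, D, tau = 0.8, 0.5, 2.0

def p0(x):
    return x ** (b / D - 1.0) * math.exp(-x / (D * tau))

def current(x):
    # J = (x/tau - b) p0 + D d/dx [x p0], with x p0 = x^(b/D) e^(-x/(D tau))
    d_xp0 = (b / D) * x ** (b / D - 1.0) * math.exp(-x / (D * tau)) \
            - x ** (b / D) * math.exp(-x / (D * tau)) / (D * tau)
    return (x / tau - b) * p0(x) + D * d_xp0

residual = max(abs(current(x)) for x in (0.5, 1.0, 3.0, 7.5))
print(residual)   # vanishes up to floating-point rounding
```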

Microscopic Models The master equation (8.49) of macroevolution can be derived


from microscopic models. A specific microscopic update rule will lead to particular
expressions regarding the dependence of the birth and death rates .bx and .dx on
population size, and other microscopic parameters, which may be similar, but not
identical, to the relations postulated in (8.50).

An example of a microscopic model would be an ecosystem of S species containing an overall number of N individuals, with a pairwise relation between the species. For every pair, one of the species dominates the other with probability $\rho$; with probability $1 - \rho$ their relation is neutral. At every time step a pair of individuals belonging to two different species $S$ and $S'$ is considered.

– Stochastic process. With probability $\mu$ the number of individuals in species $S$ ($S'$) is increased by one (decreased by one).
– Competition. With probability $1 - \mu$ the number of individuals in the dominating (inferior) species is increased by one (reduced by one). No update is performed if their relation is neutral.

These update rules are conserving with respect to the total number of individuals.
The steady-state distribution obtained by this model is similar to the one obtained
for the neutral model defined by the birth and death rates (8.50), shown in Fig. 8.11,
with the functional form of their respective species abundance distributions differing
in details.
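The update rules above can be sketched as follows (a minimal implementation of ours; the sizes and the probabilities $\mu$ and $\rho$ are illustrative assumptions):

```python
# Pairwise-interaction ecosystem: S species, fixed total population.
# mu governs the stochastic moves, rho the fraction of dominance
# relations; all values are illustrative.
import random

random.seed(1)
S, N_total, mu, rho = 20, 400, 0.2, 0.5

# dom[a][b] = +1 if a dominates b, -1 if b dominates a, 0 if neutral
dom = [[0] * S for _ in range(S)]
for a in range(S):
    for b in range(a + 1, S):
        if random.random() < rho:
            sign = random.choice((1, -1))
            dom[a][b], dom[b][a] = sign, -sign

pop = [N_total // S] * S
for _ in range(50000):
    a, b = random.sample(range(S), 2)       # two distinct species
    if pop[a] == 0 or pop[b] == 0:
        continue
    if random.random() < mu:                # stochastic process
        pop[a] += 1
        pop[b] -= 1
    elif dom[a][b] == 1:                    # competition
        pop[a] += 1
        pop[b] -= 1
    elif dom[a][b] == -1:
        pop[a] -= 1
        pop[b] += 1

print(sum(pop), min(pop))   # total number of individuals is conserved
```

Every applied update transfers exactly one individual between the two species, so the total population is conserved by construction.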

8.7 Coevolution and Game Theory

The average number of offspring, viz the fitness, is the single relevant reward function within Darwinian evolution. There is hence a direct connection between evolutionary processes and ecology on one side, and game theory on the other, which deals with interacting agents trying to maximize a single reward function, denoted utility. Several types of games may be considered in this context, namely games of interacting species, giving rise to coevolutionary phenomena, or games of interacting members of the same species, pursuing distinct behavioral strategies.

Fig. 8.12 Top: An evolutionary process of a single species in a static fitness landscape $F(S)$, here with tower-like structures, as defined in (8.24). The density of individuals in sequence space, $x(S)$, will reorganize accordingly. Bottom: Coevolutionary processes are present when the adaption of one species changes the fitness landscapes $F(S)$ of other species

Coevolution The larger part of this chapter has been devoted to the discussion of
the evolution of a single species. In Sect. 8.5.2 we ventured into the stabilization
of ‘ecosystems’ composed of a hypercycle of mutually supporting species, before
turning in Sect. 8.6 to general macroecological principles. We now go back to the
level of a few interdependent species.

COEVOLUTION When two or more species form an interdependent ecosystem the


evolutionary progress of part of the ecosystem will produce new evolutionary forcings for
other species.

One can view the coevolutionary process also as a change in the respective
fitness landscapes, as illustrated in Fig. 8.12. A prominent outcome of reciprocal
coevolution forcings is the “red queen” phenomenon.

RED QUEEN PHENOMENON When two or more species are interdependent then “It takes
all the running, to stay in place” (from Lewis Carroll’s children’s book “Through the
Looking Glass”).

A well-known example of the red queen phenomenon is the "arms race" between predator and prey commonly observed in real-world ecosystems, with, e.g., snakes becoming ever more poisonous and frogs developing successively higher resistance levels.

Green World Hypothesis Plants abound in real-world ecosystems, which are, geology and climate permitting, rich and green. Naively one may expect that herbivores should proliferate when food is plentiful, keeping vegetation constantly down. This does not seem to happen in the world, as a result of coevolutionary interdependencies. The likely culprits are "trophic cascades", where predators keep

herbivores substantially below the support level of the bioproductivity of the


plants.13 This “green world hypothesis” arises naturally in evolutionary models
featuring coevolutionary interdependencies.

Avalanches and Punctuated Equilibrium We discussed previously,14 that coevo-


lutionary avalanches may be observed within a state of punctuated equilibrium.

PUNCTUATED EQUILIBRIUM Most of the time ecosystems are in a steady-state neutral


phase. Outside influences, or rare stochastic processes may induce periods of rapid
coevolutionary processes.

The term punctuated equilibrium was coined to describe a characteristic feature


of the evolution of simple traits observed in fossil records. In contrast to the
gradualistic view of evolutionary changes, these traits typically show long periods
of stasis interrupted by sequences of rapid changes.

Strategies and Game Theory In contrast to the mostly stochastic considerations


discussed so far, one is often interested in the evolutionary processes giving rise to
specific survival strategies. These are questions that can be addressed within game
theory, which deals with strategically interacting agents in economics and beyond.
When an animal meets another animal it has to decide, to give an example, whether confrontation, cooperation, or defection is the best strategy. Some basic elements:

– UTILITY
Every participant, the agent, plays for itself, solely maximizing its own utility.
– STRATEGY
Every participant follows a set of rules of what to do when encountering an opponent; the strategy.
– ADAPTIVE GAMES
In adaptive games, the participants change their strategy in order to maximize future return. This change can be either deterministic or stochastic.
– ZERO-SUM GAMES
When the sum of utilities is constant, you can only win what the others lose.
– NASH EQUILIBRIUM
Any strategy change by a participant leads to a reduction of its utility.

13 The basic model for a trophic cascade, the Lotka-Volterra Model for rabbits and foxes, is treated

in Sect. 3.2.1 of Chap. 3.


14 See Sect. 6.6 of Chap. 6, where the Bak and Sneppen model of coevolutionary cascades is

analyzed.

Hawks and Doves This simple evolutionary game tries to model competition in terms of expected utilities between aggressive behavior (by the "hawk") and peaceful demeanor (by the "dove"). The rules are given in Table 8.2. The expected returns, the utilities, are collected together in the "payoff matrix",

$$A = \begin{pmatrix} A_{HH} & A_{HD} \\ A_{DH} & A_{DD} \end{pmatrix} = \begin{pmatrix} \frac{1}{2}(V - C) & V \\ 0 & \frac{V}{2} \end{pmatrix}\,, \qquad (8.53)$$

which specifies what happens if agents meet. Does it pay to be peaceful, or aggressive, and under which conditions? This is the question.

Adaptation by Evolution The introduction of reproductive capabilities for the participants turns the Hawks and Doves game into an evolutionary game. In this context one presumes that behavioral strategies are generated by distinct alleles. The average number of offspring of a player is proportional to its fitness, which is given in turn by the expected utility,

$$\dot x_H = \big[ A_{HH}\, x_H + A_{HD}\, x_D - \phi(t) \big]\, x_H\,, \qquad \dot x_D = \big[ A_{DH}\, x_H + A_{DD}\, x_D - \phi(t) \big]\, x_D\,, \qquad (8.54)$$

where $x_D$ and $x_H$ are the densities of doves and hawks. The flux

$$\phi(t) = x_H A_{HH}\, x_H + x_H A_{HD}\, x_D + x_D A_{DH}\, x_H + x_D A_{DD}\, x_D$$

ensures an overall constant population, $x_H + x_D = 1$.

Steady State Solution We are interested in the steady-state solution of (8.54), with $\dot x_D = 0 = \dot x_H$. Setting

$$x_H = x\,, \qquad x_D = 1 - x\,,$$

we find

$$\phi(t) = \frac{x^2}{2}\,(V - C) + V\, x\,(1 - x) + \frac{V}{2}\,(1 - x)^2 = \frac{V}{2} - \frac{C}{2}\, x^2$$

Table 8.2 Rules of the Hawk & Dove game, together with the entries $A_{DD}$, etc., of the payoff matrix defined in (8.53)

Dove meets Dove    $A_{DD} = V/2$                They divide the territory
Hawk meets Dove    $A_{HD} = V$, $A_{DH} = 0$    The Hawk gets all the territory, the Dove retreats and gets nothing
Hawk meets Hawk    $A_{HH} = (V - C)/2$          They fight, get injured, and win half the territory

for the flux. The update rule for the density of hawks is

$$\dot x = \Big[ \frac{V - C}{2}\, x + V\,(1 - x) - \phi(t) \Big]\, x = \Big[ \frac{V}{2} - \frac{V + C}{2}\, x + \frac{C}{2}\, x^2 \Big]\, x$$

$$= \frac{C}{2}\, x \Big[ x^2 - \frac{C + V}{C}\, x + \frac{V}{C} \Big] = \frac{C}{2}\, x\,(x - 1)\Big( x - \frac{V}{C} \Big) \;\equiv\; -\frac{d}{dx}\, V(x)\,,$$

where we defined the potential

$$V(x) = -V\, \frac{x^2}{4} + (V + C)\, \frac{x^3}{6} - C\, \frac{x^4}{8}\,.$$

The steady-state solution is given by

$$V'(x) = 0\,, \qquad x = V/C\,,$$

apart from the trivial states $x = 0/1$ (no/only hawks).
apart from the trivial states .x = 0/1 (no/only hawks).

– For $V > C$ doves are eliminated from the population; it does not pay to be peaceful.
– For $V < C$ hawks and doves coexist, with respective densities $x = V/C$ and $1 - V/C$.

A population consisting exclusively of cooperating doves, $x = 0$, is unstable against the intrusion of hawks.
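The convergence to the mixed steady state can be verified by integrating the update rule for the hawk density (the values for $V$ and $C$ are illustrative):

```python
# Hawk-dove replicator dynamics: integrate
#   dx/dt = (C/2) x (x - 1)(x - V/C)
# for V < C; the trajectory approaches the mixed state x* = V/C.
V, C = 2.0, 5.0
x, dt = 0.9, 0.01          # start from a hawk-dominated population
for _ in range(10000):
    x += dt * 0.5 * C * x * (x - 1.0) * (x - V / C)
print(round(x, 4))         # -> 0.4 = V/C
```

Starting from any density $0 < x < 1$, the flow is attracted by $x^* = V/C$, confirming the stability of the mixed hawk–dove population for $V < C$.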

Prisoner's Dilemma The payoff matrix of the prisoner's dilemma is

$$A = \begin{pmatrix} R & S \\ T & P \end{pmatrix}, \qquad \begin{array}{l} T > R > P > S \\ 2R > S + T \end{array} \qquad \begin{array}{l} \text{cooperator} \;\hat=\; \text{dove} \\ \text{defector} \;\hat=\; \text{hawk} \end{array} \qquad (8.55)$$

Here "cooperation" between the two prisoners is implied, and not cooperation between a suspect and the police. The prisoners are best off if both keep silent. The standard values are

$$T = 5\,, \qquad R = 3\,, \qquad P = 1\,, \qquad S = 0\,.$$

The maximal global utility $NR$ is obtained when everybody cooperates. However, in a situation where agents interact independently, the only stable Nash equilibrium is defection, with a global utility $NP$. The expected rewards are

$$\text{reward for cooperators} = R_c = \big[ R\, N_c + S\,(N - N_c) \big]/N\,,$$
$$\text{reward for defectors} = R_d = \big[ T\, N_c + P\,(N - N_c) \big]/N\,,$$

where $N_c$ is the number of cooperators and $N$ the total number of agents. The difference is

$$R_c - R_d \,\sim\, (R - T)\, N_c + (S - P)\,(N - N_c) \,<\, 0\,,$$

as $R - T < 0$ and $S - P < 0$. The reward for cooperation is always smaller than that for defecting.
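For the standard values, the two mean-field rewards can be compared directly (a small check of ours):

```python
# Mean-field rewards for the prisoner's dilemma with the standard values:
# R_c - R_d = [(R - T) Nc + (S - P)(N - Nc)] / N is negative for every
# possible number of cooperators Nc.
T, R, P, S = 5, 3, 1, 0
N = 100
gaps = [((R - T) * Nc + (S - P) * (N - Nc)) / N for Nc in range(N + 1)]
print(max(gaps))   # -> -1.0: defecting always pays better individually
```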

Evolutionary Games on a Lattice The adaptive dynamics of evolutionary games can change completely when the individual agents are placed on a regular lattice. The possibility to adapt strategies based on past observations also has a strong impact. We give a possible set of rules:

– At each generation (time step) agents evaluate their payoff, together with the payoffs of their neighbors.
– Next, agents compare their payoff one by one with the payoffs obtained by the neighbors.
– Agents switch strategy (cooperate/defect) to the strategy of the neighbor with the highest payoff.

Despite their simplicity, this set of rules leads to surprisingly complex real-space
patterns of defectors intruding patches of cooperators, as illustrated in Fig. 8.13.
Further details depend on the value chosen for the payoff matrix.
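A minimal implementation of these rules might look as follows (our own sketch: grid size, four-neighbor geometry, random seed, and synchronous updating are assumptions; the payoff values are those quoted for Fig. 8.13):

```python
# Agents on an L x L torus play the prisoner's dilemma with their four
# neighbours, then imitate the best-scoring site in their neighbourhood
# (themselves included). All sizes and the seed are illustrative.
import random

random.seed(7)
L = 20
T, R, P, S = 3.5, 3.0, 0.5, 0.0
payoff = {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}   # 1 = cooperate

grid = [[random.randint(0, 1) for _ in range(L)] for _ in range(L)]

def neighbours(i, j):
    return [((i - 1) % L, j), ((i + 1) % L, j),
            (i, (j - 1) % L), (i, (j + 1) % L)]

for _ in range(20):                                     # generations
    score = [[sum(payoff[(grid[i][j], grid[a][b])]
                  for a, b in neighbours(i, j))
              for j in range(L)] for i in range(L)]
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            best, s_best = grid[i][j], score[i][j]
            for a, b in neighbours(i, j):
                if score[a][b] > s_best:
                    best, s_best = grid[a][b], score[a][b]
            new[i][j] = best
    grid = new

frac_coop = sum(map(sum, grid)) / (L * L)
print(frac_coop)   # fraction of cooperators after 20 generations
```

Running this sketch for different payoff values reproduces the qualitative behavior discussed above: spatial patterns of cooperators and defectors that depend sensitively on the entries of the payoff matrix.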

Nash Equilibria vs. Coevolutionary Avalanches Coevolutionary games on a


lattice eventually lead to an equilibrium state, which by definition has to be a
Nash equilibrium. If such a state is perturbed from the outside, a self-critical
coevolutionary avalanche may follow, in close relation to the sandpile model.15

Game Theory and Memory Standard game theory deals with an anonymous soci-
ety of agents, with agents having no memory of previous encounters. Generalizing
this standard setup it is possible to empower the agents with a memory of their past
strategies and achieved utilities. Considering additionally individualized societies,

15 The occurrence of self-organized criticality in the sandpile model is discussed in Chap. 6.



Fig. 8.13 Time series of the spatial distribution of cooperators (gray) and defectors (black) on a lattice of size $N = 40 \times 40$, for generations 0, 1, 2, 3, 4, 5, 10, and 20. Initial condition: equal number of defectors and cooperators, randomly distributed. Parameters for the payoff matrix, $\{T; R; P; S\} = \{3.5; 3.0; 0.5; 0.0\}$. Reprinted from Schweitzer et al. (2002) with permissions, © 2002 World Scientific Publishing Co Pte Ltd

this memory may include the names of the opponents encountered previously. This type of game provides the basis for studying the emergence of sophisticated survival strategies, like altruism, via evolutionary processes.

Opinion Dynamics Agents in classical game theory aim to maximize their respec-
tive utilities. However, this does not necessarily reflect daily social interactions.
When encountering somebody else, explicit maximization of rewards or utilities
is not always the priority.
An example of reward-free games is given by opinion dynamics models. In a basic model, $i = 1, \dots, N$ agents have continuous opinions $x_i = x_i(t)$. When two agents interact, they change their respective opinions according to

$$x_i(t+1) = \begin{cases} \big[ x_i(t) + x_j(t) \big]/2 & |x_i(t) - x_j(t)| < \theta \\ x_i(t) & |x_i(t) - x_j(t)| \ge \theta \end{cases} \qquad (8.56)$$

where $\theta$ is the confidence interval. This is the "bounded confidence model" we discussed before.16 Consensus can be reached step by step only when the initial opinions are not too contrarian. Relative to the initial scatter of opinions, global consensus will be reached for large confidence intervals $\theta$. Clusters of distinct opinions emerge, on the other hand, for a small confidence interval.
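A minimal simulation of the rule (8.56) shows the two regimes (our sketch; population size, number of steps, and the cluster-counting tolerance are illustrative assumptions, and both interaction partners move, the symmetric variant of the model):

```python
# Bounded confidence dynamics (8.56): random pairs average their
# opinions when these differ by less than theta. Illustrative sizes.
import random

random.seed(3)

def n_clusters(theta, N=100, steps=40000):
    x = [random.random() for _ in range(N)]
    for _ in range(steps):
        i, j = random.sample(range(N), 2)
        if abs(x[i] - x[j]) < theta:
            x[i] = x[j] = 0.5 * (x[i] + x[j])
    xs = sorted(x)
    # count groups of opinions separated by gaps larger than 0.01
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > 0.01)

n_big, n_small = n_clusters(theta=0.6), n_clusters(theta=0.05)
print(n_big, n_small)   # consensus vs. several opinion clusters
```

A large confidence interval leads to global consensus, while a small one freezes the population into several mutually unreachable opinion clusters.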

16 Bounded confidence opinion dynamics is treated in detail in Sect. 4.3.4 of Chap. 4.



8.7.1 Tragedy of the Commons

Ecological settings correspond in many instances to a game theoretical problem.


Competition between agents can be either direct, as for the Hawks and Doves
game, or indirect. The latter occurs when agents compete for a common resource, a
situation described by the celebrated “tragedy of the commons”.

Common Pool Resources A village may have a pond without access restrictions.
Everybody with a fishing rod may go to the pond and fish at any time. Overfishing is
the likely outcome, with the consequence that the common pool resource is depleted.
When this happens, the “tragedy of the commons” is in the making. However, free
access does not imply that fishing does not incur costs. One has to invest money, for
the equipment, and time, for the trip to the pond and for the actual activity. People
will stop fishing once returns become too small.
Diverse situations are described by the tragedy of the commons, such as the overfishing of an ocean by fleets of trawlers, or the common exploitation of an underground aquifer. Importantly, the pollution of a common resource, like CO$_2$ emissions into earth's atmosphere, falls into the same category.

Reference Model Agents invest into the commons as long as their payoffs $E_i$ remain positive, where $i = 1, \dots, N$, with $N$ being the number of agents. The quantity invested, $x_i \ge 0$, can be viewed as time or money. As usual, payoffs are given by the difference between nominal return and investment costs,

$$E_i = \big( e^{-x_{\rm tot}} - c_i \big)\, x_i\,, \qquad x_{\rm tot} = \sum_j x_j\,, \qquad (8.57)$$

where $c_i > 0$ are per-unit investing costs. The factor $\exp(-x_{\rm tot})$ specifies how the productivity of the commons degrades when the total investment $x_{\rm tot}$ increases.

Profitable Agents Selfish agents increase their investments $x_i$ as long as their gradients $dE_i/dx_i$ remain positive. The equilibrium condition is

$$(1 - x_i)\, e^{-x_{\rm tot}} = c_i\,, \qquad x_i = 1 - c_i\, e^{x_{\rm tot}}\,, \qquad (8.58)$$

which can be rewritten as

$$x_i = 1 - \frac{c_i}{c_{\max}}\,, \qquad c_{\max} = e^{-x_{\rm tot}}\,, \qquad (8.59)$$

where $c_{\max}$ is the profitability barrier. Agents with $c_i > c_{\max}$ would have negative $x_i$ and returns, which is not profitable. It is hence sufficient to restrict the analysis to cost-efficient agents, with $c_i \le c_{\max}$.



Fig. 8.14 For the tragedy of the commons, the optimal total investment $x_{\rm tot}$ as a function of the mean investment costs $\bar c$, for $N = 1, 2, 3$ agents, see (8.61). Limiting behaviors are $x_{\rm tot} = N$ at $\bar c = 0$ and $x_{\rm tot} = 0$ for $\bar c = 1$. Also shown is the case $N \to \infty$, for which $x_{\rm tot} = \log(1/\bar c)$ diverges logarithmically for vanishing $\bar c \to 0$

Dispersion Relation For the stationary state, combining (8.59) and (8.57) leads to

$$E_i = \big( c_{\max} - c_i \big) \Big( 1 - \frac{c_i}{c_{\max}} \Big) = c_{\max} \Big( 1 - \frac{c_i}{c_{\max}} \Big)^2\,, \qquad (8.60)$$

which constitutes a dispersion relation $E_i = E(c_i)$. The profitability barrier $c_{\max} = \exp(-x_{\rm tot})$ is determined in turn by the self-consistency condition

$$\Big( 1 - \frac{x_{\rm tot}}{N} \Big)\, e^{-x_{\rm tot}} = \bar c\,, \qquad \bar c = \frac{1}{N} \sum_i c_i\,, \qquad (8.61)$$

which is obtained by summing (8.58) over all agents. Here, $\bar c$ are the average investment costs of profitable agents. The resulting $x_{\rm tot}$ is plotted in Fig. 8.14. As expected, the productivity $\exp(-x_{\rm tot})$ of the common resource degrades progressively with increasing $N$ and decreasing $\bar c$.

Catastrophic Poverty We expand the self-consistency condition (8.61) for large numbers of agents $N$,

$$\frac{\bar c}{c_{\max}} = \frac{\bar c}{e^{-x_{\rm tot}}} = 1 - \frac{x_{\rm tot}}{N} \approx 1 - \frac{\log(1/\bar c)}{N}\,, \qquad (8.62)$$

where we used that $\lim_{N \to \infty} x_{\rm tot} = \log(1/\bar c)$. For an average agent, with $c_i = \bar c$, the large-$N$ expansion of the payoff $E(\bar c)$ is consequently

$$E(\bar c) \approx c_{\max} \Big( \frac{\log(1/\bar c)}{N} \Big)^2 \,\sim\, \frac{1}{N^2}\,, \qquad (8.63)$$

where the dispersion relation (8.60) has been used. Intuitively one could have expected that payoffs would scale instead as $1/N$. The fact that the average payoff scales worse, denoted "catastrophic poverty", can be traced to a progressively deteriorating state of the commons.
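The $1/N^2$ scaling can be checked numerically by solving the self-consistency condition (8.61) by bisection, here for equal costs $c_i = \bar c$ (the parameter values are illustrative):

```python
# Equal-cost commons: solve the self-consistency condition (8.61),
#   (1 - x_tot/N) * exp(-x_tot) = cbar,
# by bisection and evaluate the payoff (8.60) of an average agent.
import math

def payoff_average(N, cbar):
    lo, hi = 0.0, float(N)                  # the lhs is monotone in x_tot
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1.0 - mid / N) * math.exp(-mid) > cbar:
            lo = mid
        else:
            hi = mid
    x_tot = 0.5 * (lo + hi)
    cmax = math.exp(-x_tot)                 # profitability barrier (8.59)
    return cmax * (1.0 - cbar / cmax) ** 2  # dispersion relation (8.60)

ratio = payoff_average(100, 0.5) / payoff_average(200, 0.5)
print(round(ratio, 2))   # close to 4: payoffs scale as 1/N^2
```

Doubling the number of agents reduces the average payoff by roughly a factor of four, the signature of catastrophic poverty.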

Oligarchs At face value, Eq. (8.63) holds only for the average agent. One can however prove that the payoffs of the majority of all agents scale as $(1/N)^2$. A notable exception are "oligarchs", namely agents with investment costs substantially below $\bar c$. Oligarchs have payoffs that scale as $(1/N)^0$. Only a finite number of oligarchs may however be present.

Exercises

(8.1) THE ONE-DIMENSIONAL ISING MODEL
Solve the one-dimensional Ising model

$$H = J \sum_i s_i s_{i+1} + B \sum_i s_i$$

by the transfer matrix method presented in Sect. 8.3.2. Calculate the free energy $F(T, B)$, the magnetization $M(T, B)$ and the susceptibility $\chi(T) = \lim_{B \to 0} \partial M(T, B)/\partial B$.
(8.2) ERROR CATASTROPHE
For the prebiotic quasispecies model (8.40), consider tower-like autocatalytic reproduction rates $W_{jj}$ and mutation rates $W_{ij}$ ($i \neq j$) of the form

$$W_{ii} = \begin{cases} 1 & i = 1 \\ 1 - \sigma & i > 1 \end{cases}, \qquad W_{ij} = \begin{cases} u_+ & i = j + 1 \\ u_- & i = j - 1 \\ 0 & i \neq j \text{ otherwise} \end{cases},$$

with $\sigma, u_\pm \in [0, 1]$. Determine the error catastrophe for the two cases $u_+ = u_- \equiv u$ and $u_+ = u$, $u_- = 0$. Compare with the results for the tower landscape discussed in Sect. 8.3.3.
Hint: For the stationary eigenvalue equation (8.40), with $\dot x_i = 0$ ($i = 1, \dots$), write $x_{j+1}$ as a function of $x_j$ and $x_{j-1}$. This two-step recursion relation leads to a $2 \times 2$ matrix. Consider the eigenvalues/vectors of this matrix, the initial condition for $x_1$, and the normalization condition $\sum_i x_i < \infty$ valid in the adapting regime.
(8.3) COMPETITION FOR RESOURCES
The competition for scarce resources has been modeled in the quasispecies
theory, see (8.37), by an overall constraint on population density. With

.ẋi = Wii xi ,    Wii = f ri − d,    ḟ = a − f Σi ri xi (8.64)

one models the competition for the resource f explicitly. Here a and f ri
are the regeneration rates of the resource f and of species i, respectively, with
d being the mortality rate. Equation (8.64) does not contain mutation terms
∼ Wij , describing a stationary ecosystem.
What is the steady-state value of the total population density C = Σi xi and
of the resource level f ? Is the ecosystem stable?
(8.4) COMPETITIVE POPULATION DYNAMICS
Consider the symmetric Lotka-Volterra system

.ẋ = x (1 − ky) − x² ,    ẏ = y (1 − kx) − y² (8.65)

modeling populations of sheep and rabbits, with population densities x and
y, competing for the same food resource. Find the fixpoints and consider the
case of weak competition, k < 1, and of strong competition, k > 1. When do
sheep and rabbits live happily together?
Equation (8.65) is invariant under an exchange x ↔ y. Can this symmetry be
spontaneously broken?
(8.5) HYPERCYCLES
Examine a reaction system for N = 2 molecules and a homogeneous network,
as defined by (8.41) and (8.42). Find the fixpoints and discuss their stability.
(8.6) PRISONER’S DILEMMA ON A LATTICE
Consider the stability of intruders in the prisoner’s dilemma (8.55) on a square
lattice, as the one illustrated in Fig. 8.13. Presume either a single intruder
or two adjacent intruders, both for defectors/cooperators in a background of
cooperators/defectors. Who survives?
(8.7) NASH EQUILIBRIUM
Examine the Nash equilibrium and its optimality for the following two-player
game:
Each player acts either cautiously or riskily. A player acting cautiously always
receives a low payoff. A player playing riskily gets a high payoff if the other
player also takes a risk. Otherwise, the risk-taker obtains no reward.
(8.8) WAR OF ATTRITION
When the primary investment is time, like in the competition of animals for
nesting grounds, the agent staying longest may win. A strategy S is defined in
terms of the probability p(x) to invest x ∈ [0, ∞]. Agents bidding m against

S have the expected payoff

.Em = v ∫₀^m p(x) dx − c(m) ∫_m^∞ p(x) dx − ∫₀^m c(x) p(x) dx , (8.66)

where v is the bare reward and c(x) = x^γ a generic cost function. When
losing, the invested time is not recovered. Find the optimal strategy for the
war of attrition, as defined by (8.66).
(8.9) TRAGEDY OF THE COMMONS
For the analysis of the tragedy of the commons a productivity function
P (xtot ) = exp(−xtot ) had been used in (8.57). Show that catastrophic poverty
is present for a generic P (xtot ).

Further Reading

There are excellent review articles on statistical approaches to Darwinian evolution


and ecology, see Azaele et al. (2016), Blythe and McKane (2007), and Drossel
(2001). The interested reader may want to consult Bull et al. (2005) regarding the
notion of quasispecies, Higgs and Lehman (2015) with respect to the RNA world,
Heasley et al. (2021) for the punctuated equilibrium, and Wilkinson and Sherratt
(2016) for the green world hypothesis. General textbooks on evolution, game theory and
hypercycles are Sigmund (2017) and Eigen and Schuster (2012). The tragedy of the
commons is treated together with the notion of catastrophic poverty in Gros (2023).

References
Azaele, S., et al. (2016). Statistical mechanics of ecological systems: Neutral theory and beyond.
Reviews of Modern Physics, 88, 035003.
Blythe, R. A., & McKane, A. J. (2007). Stochastic models of evolution in genetics, ecology and
linguistics. Journal of Statistical Mechanics: Theory and Experiment, P07018.
Bull, J. J., Meyers, L. A., & Lachmann, M. (2005). Quasispecies made simple. PLoS Computa-
tional Biology, 1, e61.
Drake, J. W., Charlesworth, B., & Charlesworth, D. (1998). Rates of spontaneous mutation.
Genetics, 148, 1667–1686.
Drossel, B. (2001). Biological evolution and statistical physics. Advances in Physics, 50, 209–295.
Eigen, M., & Schuster, P. (2012). The hypercycle – A principle of natural self-organization.
Springer.
Gros, C. (2023). Generic catastrophic poverty when selfish investors exploit a degradable common
resource. Royal Society Open Science, 10, 221234.
Heasley, L. R., Sampaio, N. M., & Argueso, J. L. (2021). Systemic and rapid restructuring of the
genome: A new perspective on punctuated equilibrium. Current Genetics, 67, 57–63.
Higgs, P. G., & Lehman, N. (2015). The RNA world: Molecular cooperation at the origins of life.
Nature Reviews Genetics, 16, 7–17.
Jain, K., & Krug, J. (2006). Adaptation in simple and complex fitness landscapes. In U. Bastolla,
M. Porto, H. E. Roman, & M. Vendruscolo, (Eds.), Structural approaches to sequence evolution:
Molecules, networks and populations. AG Porto.

Schweitzer, F., Behera, L., & Mühlenbein, H. (2002). Evolution of cooperation in a spatial
prisoner’s dilemma. Advances in Complex Systems, 5, 269–299.
Sigmund, K. (2017). Games of life: Explorations in ecology, evolution and behavior. Courier.
Volkov, I., Banavar, J. R., Hubbell, S. P., & Maritan, A. (2003). Neutral theory and relative species
abundance in ecology. Nature, 424, 1035–1037.
Wilkinson, D. M., & Sherratt, T. N. (2016). Why is the world green? The interactions of top-down
and bottom-up processes in terrestrial vegetation ecology. Plant Ecology & Diversity 9, 127–
140.
9 Synchronization Phenomena

Complex systems, being based on interacting local computational units, may show
non-trivial emergent behaviors. Examples are the time evolution of an infectious disease
in a certain city that is mutually influenced by an ongoing outbreak of the same
disease in another city, or the case of a neuron firing spontaneously while processing
the effects of afferent axon potentials.
A fundamental question is whether the time evolutions of interacting local
units remain dynamically independent of each other, or whether they will change
their states simultaneously, following identical rhythms. This is the notion of
synchronization, which we will study throughout this chapter. Starting with the
paradigmatic Kuramoto model we will learn that synchronization processes may be
driven either by averaging dynamical variables, or through causal mutual influences.
On the way, we will visit piecewise linear dynamical systems and the reference
model for infectious diseases, the SIRS model.

9.1 Frequency Locking

In this chapter, we will be dealing mostly with autonomous dynamical systems that
may synchronize spontaneously. A dynamical system can also be driven by outside
influences, being forced to follow the external signal synchronously.

Driven Harmonic Oscillator For a start, we recapitulate the dynamical states of


the driven harmonic oscillator,
.ẍ + γ ẋ + ω₀² x = (F e^{iωt} + c.c.) ,    γ > 0, (9.1)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_9

where ‘c.c.’ stands for the complex conjugate. Of relevance will be the relation between
the eigenfrequency .ω0 , damping .γ , and the driving frequency .ω. In the absence of
an external forcing, .F ≡ 0, the solution is
.x(t) ∼ e^{λt} ,    λ± = −γ/2 ± ½ √(γ² − 4ω₀²) , (9.2)
which is underdamped/critically damped/overdamped for .γ < 2ω0 , .γ = 2ω0 and .γ > 2ω0 , respectively.

Frequency Locking In the limit of long times, .t → ∞, the dynamics of the system
follows the external driving. Due to the damping .γ > 0, this holds for all .F ≠ 0.
For the ansatz

.x(t) = a e^{iωt} + c.c. , (9.3)

the amplitude a may contain, as a matter of principle,1 an additional time-


independent phase. Inserting the ansatz into (9.1) yields
.F = a (−ω² + iωγ + ω₀²) = −a (ω² − iωγ − ω₀²)
   = −a (ω + iλ₊) (ω + iλ₋) ,

where the eigenfrequencies .λ± are given by (9.2). The solution for the amplitude a
can then be written in terms of .λ± or alternatively as

.a = −F / (ω² − ω₀² − iωγ) . (9.4)

The response becomes divergent, viz .a → ∞, at resonance .ω = ω0 and small


dampings .γ → 0.

General Solution The driven, damped harmonic oscillator (9.1) is an inhomo-


geneous linear differential equation. The general solution is hence given by the
superposition of the special solution (9.4) with the general solution of the homo-
geneous system (9.2). The latter dies out for .t → ∞ and the system synchronizes
with the external driving frequency .ω.

1 “Secular perturbation theory” deals with time-dependent amplitudes, .a = a(t). See Sect. 3.2 of
Chap. 3.
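The steady-state amplitude (9.4) is easy to cross-check numerically. The sketch below (parameter values and the hand-rolled fourth-order Runge-Kutta stepper are illustrative choices, not from the text) integrates the real form of (9.1), where the driving term reads 2F cos(ωt), and compares the late-time peak of x(t) with the modulus 2|a| following from (9.4):

```python
import math

# Parameter values chosen for illustration only
w0, g, w, F = 2.0, 0.5, 1.3, 0.5   # eigenfrequency, damping, driving frequency, amplitude

def rhs(t, x, v):
    # x'' + g x' + w0^2 x = 2 F cos(w t), written as a first-order system
    return v, 2.0 * F * math.cos(w * t) - g * v - w0 ** 2 * x

def rk4_step(t, x, v, dt):
    k1x, k1v = rhs(t, x, v)
    k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
    k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
    k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
    return (x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

x, v, t, dt, peak = 0.0, 0.0, 0.0, 0.001, 0.0
while t < 60.0:
    x, v = rk4_step(t, x, v, dt)
    t += dt
    if t > 50.0:                    # the homogeneous solution (9.2) has died out by then
        peak = max(peak, abs(x))

# steady-state amplitude 2|a| of x(t) = a exp(i w t) + c.c., from Eq. (9.4)
amp = 2 * F / math.sqrt((w ** 2 - w0 ** 2) ** 2 + (w * g) ** 2)
print(peak, amp)
```

Once the transient (9.2) has decayed, the two values agree to the accuracy of the integrator.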

9.2 Coupled Oscillators and the Kuramoto Model

Whenever their dynamical behaviors are similar and the mutual couplings substan-
tial, sets of local dynamical systems may synchronize. It is not necessary that the
individual units show periodic dynamics on their own; coupled chaotic attractors can
also synchronize. We start by discussing the simplest non-trivial set-up, the case of
harmonically coupled uniform oscillators.

Limit Cycles In suitable coordinates, free rotations such as


.x(t) = r (cos(θ(t)), sin(θ(t))) ,    θ(t) = ωt + θ₀ ,    θ̇ = ω ,

describe periodic attractors, viz limit cycles. In the following we take the radius r
as given, using the phase variable .θ = θ (t) for an effective description.

Coupled Dynamical Systems We study the behavior of a collection of individual


dynamical systems .i = 1, . . . , N, each characterized by a limit cycle with natural
frequency .ωi . The coupled system obeys

.θ̇i = ωi + Σ_{j=1}^{N} Γij (θi , θj ),    i = 1, . . . , N , (9.5)

where .Γij are suitable coupling constants.

Kuramoto Model A particularly tractable choice for the coupling constants .Γij
has been proposed by Kuramoto:

.Γij (θi , θj ) = (K/N) sin(θj − θi ) , (9.6)
where .K ≥ 0 is the coupling strength. The factor .1/N ensures that the model is well
behaved in the limit .N → ∞.

Two Coupled Oscillators We consider first the case .N = 2,

.θ̇1 = ω1 + (K/2) sin(θ2 − θ1 ),    θ̇2 = ω2 + (K/2) sin(θ1 − θ2 ) , (9.7)
or

Δθ̇ = Δω − K sin(Δθ ),
. Δθ = θ2 − θ1 , Δω = ω2 − ω1 . (9.8)

Fig. 9.1 The relative phase .Δθ(t)/π of two coupled oscillators, obeying (9.8), with .Δω = 1
and a critical coupling strength .Kc = 1. For an undercritical coupling, .K = 0.9, the relative
phase increases steadily; for an overcritical coupling, .K = 1.01, it locks

The system has a fixpoint .Δθ ∗ with regard to the relative phase, determined by

.(d/dt) Δθ∗ = 0,    sin(Δθ∗ ) = Δω/K , (9.9)
which leads to

.Δθ∗ ∈ [−π/2, π/2],    K > |Δω| . (9.10)

This condition is valid for attractive coupling constants .K > 0. For repulsive .K < 0
anti-phase states are stabilized. We analyze the stability of the fixpoint using .Δθ =
Δθ ∗ + δ and (9.8), obtaining

.(d/dt) δ = −K cos(Δθ∗ ) δ,    δ(t) = δ0 e^{−K cos(Δθ∗) t} .

The fixpoint is stable since .K > 0 and .cos Δθ ∗ > 0, due to (9.10). Consequently, a
bifurcation is present.

– For .K < |Δω| there is no phase coherence between the two oscillators, which
are drifting with respect to each other.
– For .K > |Δω| phase locking is observed, which means that the two oscillators
rotate together with a constant phase difference.

The situation is illustrated in Fig. 9.1.
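The locking transition of (9.8) can be reproduced in a few lines; the sketch below (simple Euler stepping, with illustrative step sizes) integrates the relative phase for an undercritical and an overcritical coupling, as in Fig. 9.1:

```python
import math

# Euler integration of (9.8) for the relative phase of two coupled oscillators,
# with Δω = 1, so that the critical coupling is Kc = 1
def relative_phase(K, dw=1.0, dt=1e-3, t_max=200.0):
    dtheta = 0.0
    for _ in range(int(t_max / dt)):
        dtheta += dt * (dw - K * math.sin(dtheta))
    return dtheta

locked = relative_phase(K=1.01)    # overcritical: converges to Δθ* = arcsin(Δω/K)
drifting = relative_phase(K=0.9)   # undercritical: Δθ keeps increasing
print(locked, math.asin(1.0 / 1.01), drifting)
```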

Natural Frequency Distribution We now examine what happens if we have a


larger number of coupled oscillators, .N → ∞. We select the probability distribution
.g(ω) of the natural frequencies .ωi , to be symmetric,

.g(ω) = g(−ω),    ∫_{−∞}^{∞} g(ω) dω = 1 , (9.11)

The choice of a zero average frequency,


.∫_{−∞}^{∞} ω g(ω) dω = 0 ,

implicit in (9.11) is generally possible, as the dynamical equations (9.5) and (9.6)
are invariant under a global translation

.ω → ω + Ω,    θi → θi + Ωt ,

with .Ω being the initial non-zero mean frequency.

Order Parameter The complex order parameter

.r e^{iψ} = (1/N) Σ_{j=1}^{N} e^{iθj} (9.12)

is a macroscopic quantity that can be interpreted as the collective rhythm produced


by the assembly of the interacting oscillating systems. The radius .r(t) measures the
degree of phase coherence, with .ψ(t) corresponding to the average phase.

Molecular Field Representation We rewrite the order parameter defined in (9.12)


as

.r e^{i(ψ−θi)} = (1/N) Σ_{j=1}^{N} e^{i(θj −θi)} ,    r sin(ψ − θi ) = (1/N) Σ_{j=1}^{N} sin(θj − θi ) ,

retaining in the second step the imaginary component. Inserting the second expres-
sion into the governing equation (9.5), we find

.θ̇i = ωi + (K/N) Σj sin(θj − θi ) = ωi + K r sin(ψ − θi ) . (9.13)

The motion of individual oscillators .i = 1, . . . , N is coupled to the other oscillators


exclusively through the mean-field phase .ψ, with a coupling strength that is
proportional to the mean-field amplitude r.

The individual phases .θi are drawn towards the self-consistently determined
mean phase .ψ, as can be seen in the numerical simulations presented in Fig. 9.2.
Mean-field theory is exact for the Kuramoto model. It is nevertheless non-trivial to
solve, as the self-consistency condition (9.12) needs to be fulfilled.

Fig. 9.2 Spontaneous synchronization in a network of limit cycle oscillators with distributed
individual frequencies. Color coding with slowest–fastest (red–violet) natural frequencies. With
respect to (9.5), an additional distribution of individual radii .ri (t) has been assumed. These are
slowly relaxing. Also shown is the mean field .r e^{iψ} = Σi ri e^{iθi}/N (asterisk), compare
(9.12). Reprinted from Strogatz (2001) with permissions, © 2001 Macmillan Magazines Ltd

Rotating Frame of Reference The order parameter .reiψ performs a free rotation
for long times .t → ∞ in the thermodynamic limit. With

.r(t) → r, ψ(t) → Ωt, N → ∞,

one can transform via

.θi → θi + ψ = θi + Ωt,    θ̇i → θ̇i + Ω,    ωi → ωi + Ω

to the rotating frame of reference. The governing equation (9.13) then becomes

.θ̇i = ωi − Kr sin(θi ) . (9.14)

This expression is identical to the one for the case of two coupled oscillators, see
(9.8), when substituting K by Kr. It then follows directly that .ωi = Kr constitutes
a special point.

Drifting and Locked Components Eq. (9.14) has a fixpoint .θi∗ for which .θ̇i = 0,
as determined by
.Kr sin(θi∗ ) = ωi ,    |ωi | < Kr,    θi∗ ∈ [−π/2, π/2] . (9.15)

The interval for .θi∗ is selected such that .sin(θi∗ ) ∈ [−1, 1].

Fig. 9.3 The regions of locked (.|ω| < Kr) and drifting (.|ω| > Kr) natural frequencies .ω within
the Kuramoto model

– LOCKED UNITS
As we are working in the rotating frame of reference, .θ̇i = 0 means that the
participating limit cycles oscillate for .|ωi | < Kr with the average frequency,
being "locked" to the mean phase .ψ.
– DRIFTING UNITS
For .|ωi | > Kr, the participating limit cycle drifts, i.e. .θ̇i never vanishes. They do,
however, slow down when they approach the locked oscillators, compare (9.14).

The distinct dynamics of locked and drifting units is illustrated in Figs. 9.1, 9.2, and
9.3.

Stationary Frequency Distribution We denote with

.ρ(θ, ω) dθ

the fraction of oscillators with natural frequency .ω that lie between .θ and .θ + dθ .
Of relevance for the following will be the continuity equation for .ρ = ρ(θ, ω),

.∂ρ/∂t + ∂(ρ θ̇)/∂θ = 0 ,

where .ρ θ̇ is a current density; it generalizes the generic expression "density times


velocity”. The frequency distribution .ρ(θ, ω) will be inversely proportional to the
speed

θ̇ = ω − Kr sin(θ )
.

in the stationary case, when .ρ̇ = 0. The individual oscillators pile up at slow places,
thinning out at fast places on the circle. Hence
.ρ(θ, ω) = C / |ω − Kr sin(θ )| ,    ∫_{−π}^{π} ρ(θ, ω) dθ = 1 , (9.16)

for .ω > 0, where C is an appropriate normalization constant.



Formulation of the Self-Consistency Condition We rewrite the self-consistency


condition (9.12) for the molecular field as

.<e^{iθ}> = <e^{iθ}>locked + <e^{iθ}>drifting = r e^{iψ} ≡ r , (9.17)

where the brackets .<·> denote population averages. In the last step we used the fact
that one can set the average phase .ψ to zero.

Locked Contribution The locked contribution to (9.17) is


.<e^{iθ}>locked = ∫_{−Kr}^{Kr} e^{iθ∗(ω)} g(ω) dω = ∫_{−Kr}^{Kr} cos(θ∗(ω)) g(ω) dω ,

where we have assumed .g(ω) = g(−ω) for the distribution .g(ω) of the natural
frequencies within the rotating frame of reference. Using (9.15),

.ω = Kr sin(θ∗ ),    dω = Kr cos(θ∗ ) dθ∗ ,

we obtain
.<e^{iθ}>locked = ∫_{−π/2}^{π/2} cos(θ∗ ) g(Kr sin θ∗ ) Kr cos(θ∗ ) dθ∗ (9.18)
              = Kr ∫_{−π/2}^{π/2} cos²(θ∗ ) g(Kr sin θ∗ ) dθ∗ .

We will make use of this expression further on.

Drifting Contribution The drifting contribution to the order parameter


.<e^{iθ}>drifting = ∫_{−π}^{π} dθ ∫_{|ω|>Kr} dω e^{iθ} ρ(θ, ω) g(ω) = 0

vanishes. Physically this is clear: oscillators that are not locked to the mean field
cannot contribute to the order parameter. Mathematically, it follows from .g(ω) =
g(−ω), together with .ρ(θ + π, −ω) = ρ(θ, ω) and .e^{i(θ+π)} = −e^{iθ} .

Critical Coupling With the drifting component vanishing, the population average
.<e^{iθ}> of the order parameter,

.r = <e^{iθ}> ≡ <e^{iθ}>locked = Kr ∫_{−π/2}^{π/2} cos²(θ ) g(Kr sin θ ) dθ , (9.19)

Fig. 9.4 The solution (9.22) for the order parameter in the Kuramoto model, .r = R0 √(1 − Kc/K),
here with .R0 = 1

is given by the locked contribution (9.18). We simplified notation, using .θ ∗ → θ .


When the coupling strength K is small, below a certain critical value .Kc , only the
trivial solution .r = 0 exists. Above, for .K > Kc , a finite order parameter .r > 0 is
stabilized, as illustrated in Fig. 9.4.

The critical coupling strength .Kc is obtained considering the limit .r → 0+ in
(9.19),

.1 = Kc g(0) ∫_{|θ|<π/2} cos²(θ ) dθ = Kc g(0) π/2 ,    Kc = 2/(π g(0)) , (9.20)

where .limr→0 g(Kr sin θ ) = g(0) has been used.2

Expansion around Criticality For the functional dependence of the order parameter r
around .K = Kc we expand (9.19) with respect to .r << 1,

.1 = K ∫_{|θ|<π/2} dθ cos²(θ ) [ g(0) + (g''(0)/2) (Kr sin θ )² ]
   = K g(0) π/2 + K³ r² g''(0) π/(2·8) (9.21)
which holds in light of our previous assumption, namely that integrals antisymmet-
ric in .θ vanish.3 Multiplying with .Kc /K, we rewrite (9.21) as

.1 − Kc/K = r²/R0² ,    1/R0² = K² Kc |g''(0)| / (16/π) ,

2 Note that .∫ dx cos²(x) = [cos(x) sin(x) + x]/2, modulo a constant.
3 One has .∫ dx cos²(x) sin²(x) = x/8 − sin(4x)/32, plus an integration constant.

where we assumed .g '' (0) < 0, namely that the frequency distribution is locally
maximal. Expanding .R0 = R0 (K) into powers of .K − Kc would lead to higher-
order corrections, one can hence set .R0 = R0 (Kc ). Together, we find
.r = R0 √(1 − Kc/K) ,    Kc = 2/(π g(0)) , (9.22)

as illustrated in Fig. 9.4.



– One observes square-root scaling, namely that .r ∼ √(K − Kc) , which is the
classical mean-field result.4
– For large K, the order parameter .r = <eiθ > will approach unity, .limK→∞ r = 1.
For (9.22), the limit .K → ∞ would be recovered for .R0 = 1, generally this is
however not the case.
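The transition (9.22) can also be probed by direct simulation. The sketch below (system size, step sizes and seed are illustrative choices; Euler integration of the mean-field form (9.13)) uses a standard normal frequency distribution, for which g(0) = 1/√(2π) and hence Kc = 2/(πg(0)) ≈ 1.6:

```python
import cmath, math, random

# Euler integration of the mean-field form (9.13) for N Kuramoto oscillators
# with standard normal natural frequencies; parameters chosen for illustration
def order_parameter(K, N=500, dt=0.05, steps=2000, seed=7):
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 1.0) for _ in range(N)]
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
    r_late = []
    for step in range(steps):
        z = sum(cmath.exp(1j * t) for t in theta) / N     # r e^{i psi}, Eq. (9.12)
        r, psi = abs(z), cmath.phase(z)
        for i in range(N):                                # Eq. (9.13)
            theta[i] += dt * (omega[i] + K * r * math.sin(psi - theta[i]))
        if step >= steps - 500:
            r_late.append(r)                              # average over late times
    return sum(r_late) / len(r_late)

r_below = order_parameter(K=0.5)   # K < Kc: only finite-size fluctuations
r_above = order_parameter(K=4.0)   # K > Kc: substantial phase coherence
print(r_below, r_above)
```

Below Kc the time-averaged order parameter stays at the finite-size noise level ∼ 1/√N; above Kc it approaches the self-consistent mean-field value.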

Physics of Rhythmic Applause A well known application of the Kuramoto model


concerns the synchronization of the clapping of an audience after a performance, which
happens when everybody claps at a slow frequency and in unison. In this case the
distribution of “natural clapping frequencies” is quite narrow, and .K > Kc ∝
1/g(0).

When an individual wants to express particular satisfaction with the performance


he/she increases the clapping frequency by about a factor of two, as measured
experimentally. This increases the noise level, which depends primarily on the
clapping frequency. Measurements have shown, see Fig. 9.5, that the distribution
of natural clapping frequencies is broader when the clapping is fast. The resulting
drop in .g(0) induces .K < Kc ∝ 1/g(0), which implies in turn that synchronized
clapping is dynamically unstable when the applause is intense.

Chimera States The original chimera, the one from Greek mythology, was a
hybrid animal, part lion, goat and snake. In the world of synchronization phenomena,
a “chimera” is a hybrid state, supporting both synchronous and asynchronous
behaviors in networks of identical coupled oscillators. A prototypical system is
.θ̇i = ω + Σ_{j≠i} wij sin(θj − θi − α) , (9.23)

where the .wij are coupling matrix elements of varying strength. Translation invari-
ance, namely that .wij = wi−j , is allowed, e.g. for units that are distributed regularly
in real space. Of relevance is a non-zero phase lag .α, which induces reciprocal
frustration. For large numbers N of participating units, strictly speaking when
.N → ∞, partially synchronized states are stabilized for possibly exponentially

4 More about critical scaling in Sect. 6.1 of Chap. 6.



Fig. 9.5 Normalized distribution for the clapping frequencies of 100 individuals, clapping
intensity versus clapping frequency [Hz]. Data from Néda et al. (2000a)

large transients. This is somewhat surprising, given that the natural frequencies are
all identical.

9.3 Synchronization in the Presence of Time Delays

Synchronization phenomena need the exchange of signals from one subsystem to


another. Real-world information exchange typically needs a certain time, which
introduces time delays that become important when they are comparable to the
intrinsic time scales of the individual subsystems.5 Here we discuss the effect of
time delays on the synchronization process.

Kuramoto Model with Time Delays Two limit-cycle oscillators, coupled via a
time delay T are described by

.θ̇1 (t) = ω1 + (K/2) sin[θ2 (t − T ) − θ1 (t)] ,
.θ̇2 (t) = ω2 + (K/2) sin[θ1 (t − T ) − θ2 (t)] .
In the steady state,

.θ1 (t) = ω t,    θ2 (t) = ω t + Δθ∗ , (9.24)

5 An introduction into the intricacies of time-delayed dynamical systems is given in Sect. 2.5 of
Chap. 2.

Fig. 9.6 Left: Graphical solution of the self-consistency condition (9.27), plotting .1 − 0.9 sin(ωT )
against .ω for time delays .T = 1/6 (shaded/full line), having respectively one/three intersections
(filled circles) with the diagonal (dashed line). The coupling constant is .K = 1.8. Right: An
example of a directed ring, containing five sites

there is a synchronous oscillation with a yet to be determined locking frequency .ω


and phase slip .Δθ ∗ . Using .sin(α + β) = sin(α) cos(β) + cos(α) sin(β), we find

.ω = ω1 + (K/2) [ −sin(ωT ) cos(Δθ∗ ) + cos(ωT ) sin(Δθ∗ ) ] , (9.25)
.ω = ω2 + (K/2) [ −sin(ωT ) cos(Δθ∗ ) − cos(ωT ) sin(Δθ∗ ) ] .
Taking the difference leads to

.Δω = ω2 − ω1 = K sin(Δθ∗ ) cos(ωT ) , (9.26)

which generalizes (9.9) to the case of a finite time delay T . Together, (9.25) and
(9.26) determine the locking frequency .ω and the phase slip .Δθ ∗ .

Multiple Synchronization Frequencies For finite time delays T , there is gener-
ally more than one solution for the synchronization frequency .ω. For concreteness,
we consider
.ω1 = ω2 ≡ 1,    Δθ∗ ≡ 0,    ω = 1 − (K/2) sin(ωT ) , (9.27)
compare (9.26) and (9.25). This equation can be solved graphically, as shown in
Fig. 9.6.

For .T → 0 the two oscillators are phase locked, oscillating with the original
natural frequency .ω = 1. A finite time delay leads to a change of the synchronization
frequency and, eventually, for large enough time delay T and couplings K,
to multiple solutions for the locking frequency, with every second intersection

shown in Fig. 9.6 being stable/unstable. The introduction of time delays induces
consequently qualitative changes regarding the structuring of phase space.
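The graphical construction of Fig. 9.6 can be automated; the following sketch (frequency window and grid resolution are illustrative choices) counts the locking frequencies of (9.27) as sign changes of ω − 1 + (K/2) sin(ωT):

```python
import math

# Count the solutions of (9.27), ω = 1 − (K/2) sin(ωT), as the sign changes of
# f(ω) = ω − 1 + (K/2) sin(ωT) on a fine grid; K = 1.8 as in Fig. 9.6
def count_roots(T, K=1.8, w_max=2.0, n=4000):
    f = lambda ww: ww - 1.0 + 0.5 * K * math.sin(ww * T)
    grid = [w_max * i / n for i in range(n + 1)]
    return sum(1 for a, b in zip(grid, grid[1:]) if f(a) * f(b) < 0)

print(count_roots(T=1.0))   # a single locking frequency
print(count_roots(T=6.0))   # several coexisting locking frequencies
```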

Rings of Delayed-Coupled Oscillators An example illustrating the complexity


potential arising from delayed couplings is given by rings of N oscillators, as
illustrated in Fig. 9.6. Unidirectional couplings correspond to
.θ̇j = ωj + K sin[θj−1 (t − T ) − θj (t)] ,    j = 1, . . . , N , (9.28)

where periodic boundary conditions are implied, namely that .θN +1 ≡ θ1 . Special-
izing to the uniform case .ωj ≡ 1, the network becomes invariant under discrete
rotations, which allows for plane-wave solutions with frequency .ω and momentum
k,6

.θj = ω t − k j ,    k = (2π/N) nk ,    nk = 0, . . . , N − 1 , (9.29)
where .j = 1, . . . , N . With this ansatz, the locking frequency .ω is determined by
the self-consistency condition

.ω = 1 + K sin(k − ωT ) . (9.30)

In analogy to (9.27), a set of solutions with distinct frequencies .ω can be found for
a given momentum k. The resulting dynamics is characterized by complex spatio-
temporal symmetries, in terms of the constituent units .θj (t), which oscillate fully in
phase only for vanishing momentum .k → 0.

The plane-wave ansatz (9.29) describes phase-locked oscillations. Additional


solutions with drifting units cannot be excluded; they may show up in numerical
simulations. It is important to remember in this context that initial conditions in the
entire interval .t ∈ [−T , 0] need to be provided.7
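The plane-wave states can be checked by direct simulation; the sketch below (Euler stepping with an explicit history buffer, parameters chosen for illustration) integrates the ring (9.28) for N = 5, uniform ωj = 1 and nk = 1, comparing the measured locking frequency with the corresponding root of (9.30):

```python
import math
from collections import deque

# Unidirectional delayed ring (9.28) with uniform natural frequencies:
# Euler stepping with a history buffer supplying θ_{j-1}(t − T)
N, K, T, dt = 5, 0.5, 1.0, 0.01
k = 2.0 * math.pi / N                  # momentum (9.29), nk = 1
lag = int(round(T / dt))

# initial history on t ∈ [−T, 0]: a plane wave rotating with the bare frequency 1
hist = [deque([(m - lag) * dt - k * j for m in range(lag + 1)], maxlen=lag + 1)
        for j in range(N)]

steps, theta_ref = 30000, 0.0
for step in range(steps):
    now = [h[-1] for h in hist]        # θ_j(t)
    delayed = [h[0] for h in hist]     # θ_j(t − T)
    for j in range(N):                 # j − 1 wraps around the ring for j = 0
        hist[j].append(now[j] + dt * (1.0 + K * math.sin(delayed[j - 1] - now[j])))
    if step == steps - 1001:
        theta_ref = hist[0][-1]
omega_meas = (hist[0][-1] - theta_ref) / (1000 * dt)

# root of the self-consistency condition (9.30) by bisection;
# the root is unique here since K T < 1
lo, hi = 0.5, 1.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mid - 1.0 - K * math.sin(k - mid * T) > 0.0:
        hi = mid
    else:
        lo = mid
print(omega_meas, lo)
```

The measured frequency is shifted away from the natural frequency 1 and matches the root of (9.30).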

9.4 Synchronization Mechanisms

The synchronization of limit cycle oscillators discussed in Sect. 9.2 is mediated by a


molecular field, viz by an averaged quantity. Averaging plays a central role in many
synchronization processes and may act both on a local basis and on a global level.

6 In the complex plane, .ψj (t) = eiθj (t) = ei(ωt−kj ) corresponds to a plane wave on a periodic
ring. Eq. (9.28) is then equivalent to the phase evolution of the wavefunction .ψj (t). The system
is invariant under translations .j → j + 1, which implies that the discrete momentum k is a good
quantum number, in the jargon of quantum mechanics. The periodic boundary condition .ψj +N =
ψj is satisfied for .k = 2π nk /N .
7 Initial functions for time delay systems are discussed in Sect. 2.5 of Chap. 2.

Alternatively, synchronization can be driven by the causal influence of temporally


well defined events, a route to synchronization we will discuss in Sect. 9.4.2.

9.4.1 Aggregate Averaging

The coupling term of the Kuramoto model, see (9.6), contains differences .θi − θj
in the respective dynamical variables .θi and .θj . With an appropriate sign of the
coupling constant, this coupling corresponds to a driving force towards the mean,

.θ1 → (θ1 + θ2)/2 ,    θ2 → (θ1 + θ2)/2 ,
which competes with the time development of the individual oscillators whenever
the respective natural frequencies .ωi and .ωj are distinct. On their own, the
individual units would rotate with different angular velocities. As carried out in
Sect. 9.2, a detailed analysis is necessary when studying this competition between
the synchronizing effect of the coupling and the de-synchronizing influence of a
non-trivial natural frequency distribution.

Aggregate Variables Generalizing the above considerations, we consider a generic set


of dynamical variables .xi . When isolated, the evolution rule .ẋi = fi (xi ) holds. The
geometry of the couplings will be determined by the normalized weighted adjacency
matrix,8
.Aij ,    Σj Aij = 1 .

The matrix elements are .Aij > 0 if the units i and j are coupled, and zero otherwise,
with .Aij representing the relative weight of the link. We define now aggregate
variables .x̄i = x̄i (t) by
.x̄i = (1 − κi ) xi + κi Σj Aij xj , (9.31)

where .κi ∈ [0, 1] is a local coupling strength. The aggregate variables .x̄i correspond
to a superposition of .xi with the weighted mean activity .Σj Aij xj of all its
neighbors.

8 The connection of the adjacency matrix to the graph spectrum is discussed in Sect. 1.2 of Chap. 1.

Coupling via Aggregate Averaging A general class of dynamical networks can be


formulated in terms of aggregate variables, through

.ẋi = fi (x̄i ),    i = 1, . . . , N , (9.32)

with the .x̄i given by (9.31). The .fi describe local dynamical systems which could
be, e.g., harmonic oscillators, relaxation oscillators, or chaotic systems.

Expansion Around the Synchronized State In order to expand (9.32) around the
globally synchronized state, we first rewrite the aggregate variables as
.x̄i = (1 − κi ) xi + κi Σj Aij (xj − xi + xi ) (9.33)
    = xi (1 − κi + κi Σj Aij ) + κi Σj Aij (xj − xi )
    = xi + κi Σj Aij (xj − xi ) ,

where we have used the normalization .Σj Aij = 1. The differences in activities
.xj − xi are small close to the synchronized state,

.fi (x̄i ) ≈ fi (xi ) + fi′ (xi ) κi Σj Aij (xj − xi ) . (9.34)

Differential couplings .∼ (xj − xi ) between the nodes of the network are hence
equivalent, close to synchronization, to the aggregate averaging of the local
dynamics via the respective .x̄i .

General Coupling Functions One may go one step further and define with
.ẋi = fi (xi ) + hi (xi ) Σj gij (xj − xi ) (9.35)

a general system of .i = 1, . . . , N dynamical units interacting via coupling functions


.gij (xj − xi ), which are respectively modulated through the .hi (xi ). Expanding (9.35)
close to the synchronized orbit yields


.ẋi ≈ fi (xi ) + hi (xi ) Σj gij′ (0) (xj − xi ),    hi (xi ) gij′ (0) ≙ fi′ (xi ) κi Aij .

The equivalence of .hi (xi )gij' (0) and .fi' (xi )κi Aij , compare (9.34), is only local in
time, which is however sufficient for a local stability analysis. The synchronized
state of the system with differential couplings, see (9.35), is hence locally stable

whenever the corresponding system with aggregate couplings is equally stable


against perturbations, compare (9.32).

Synchronization via Aggregated Averaging The equivalence of (9.32) and (9.35)


tells us that the driving forces leading to synchronization are aggregated averaging
processes of neighboring dynamical variables.

Till now we considered exclusively globally synchronized states. Synchronization
processes may however also lead to more complicated states; we mention alternative
possibilities. For all cases, the above discussion concerning the role of
aggregate averaging retains its validity.

– We saw, when discussing the Kuramoto model in Sect. 9.2, that generically not
all nodes of a network participate in a synchronization process. For the Kuramoto
model the oscillators with natural frequencies far away from the average do
not lock to the time development of the order parameter, see Fig. 9.3, retaining
drifting trajectories.
– Generically, synchronization takes the form of coherent time evolution with
phase lags, we have seen an example when discussing two coupled oscillators
in Sect. 9.2. The synchronized orbit is then

.xi (t) = x(t) + Δxi ,    Δxi const. ,

which implies that all elements .i = 1, . . . , N are locked.

Partially synchronized states involving an aggregated averaging over particularly


large numbers of distinct units are “chimera states”.9

Stability Analysis of the Synchronized State The stability of a globally synchro-


nized state, .xi (t) = x(t) for .i = 1, . . . , N, can be determined by considering small
perturbations, viz

.xi (t) = x(t) + δi c^t ,    |c|^t = e^{λt} , (9.36)

where .λ is a Lyapunov exponent. The eigenvector .(δ1 , . . . , δN ) of the perturbation


is determined by the equations of motion linearized at a given point around the
synchronized trajectory. There is one Lyapunov exponent for every eigenvector, N
in all,

.λα ,    (δ1^α , . . . , δN^α ),    α = 1, . . . , N .

9 Chimera states are shortly discussed on page 336.



One of the exponents characterizes the flow along the synchronized direction. The
synchronized state is stable if all the remaining .λj (.j = 2, . . . , N ) Lyapunov
exponents are negative when averaged over the orbit.

Coupled Logistic Maps As an example for aggregate averaging we consider two coupled logistic maps,10

x_i(t + 1) = r x̄i(t) (1 − x̄i(t)) ,    i = 1, 2 ,    r ∈ [0, 4] ,    (9.37)

with

x̄1 = (1 − κ) x1 + κ x2 ,    x̄2 = (1 − κ) x2 + κ x1 ,

where κ ∈ [0, 1] is the coupling strength. Using (9.36) as an ansatz, one obtains in linear order in the δi around the synchronized orbit x = x̄i = xi (for i = 1, 2)

c δ1 = r (1 − 2x(t)) [ (1 − κ) δ1 + κ δ2 ] ,
c δ2 = r (1 − 2x(t)) [ κ δ1 + (1 − κ) δ2 ] .

The expansion factor c corresponds to an eigenvalue of the Jacobian, the usual
situation for discrete maps. As a result, we find two local pairs of eigenvalues and
eigenvectors, namely

c1 = r (1 − 2x) ,            (δ1, δ2) = (1, 1)/√2 ,
c2 = r (1 − 2x)(1 − 2κ) ,    (δ1, δ2) = (1, −1)/√2 ,

corresponding to the respective local Lyapunov exponents, λ = log |c|,

λ1 = log |r (1 − 2x)| ,    λ2 = log |r (1 − 2x)(1 − 2κ)| .    (9.38)

As expected, λ1 ≥ λ2 for κ ∈ [0, 1/2], since λ1 corresponds to a perturbation along the synchronized orbit. The overall stability of the synchronized trajectory can be examined by averaging the above local Lyapunov exponents over the full time development, which defines the “maximal Lyapunov exponent”.11

Synchronization of Coupled Chaotic Maps The Lyapunov exponents defined by


(9.38) need to be evaluated numerically. However, one can obtain a lower bound

10 For an illustration of the logistic map see Fig. 2.18 in Chap. 2.


11 Maximal Lyapunov exponents are discussed together with the theory of discrete maps in Chap. 2.

for the coupling strength κ necessary for stable synchronization, by observing that |1 − 2x| ≤ 1 and hence

|c2| ≤ r |1 − 2κ| .

The synchronized orbit is stable for |c2| < 1. For κ ∈ [0, 1/2] we obtain

|c2| ≤ r (1 − 2κ) ,    κ > (r − 1)/(2r)
for the lower bound for κ. For the maximal reproduction rate, r = 4, synchronization is possible for 3/8 < κ ≤ 1/2. Given that the logistic map is chaotic for r > r∞ ≈ 3.57, this result proves that coupled chaotic systems may synchronize.

Interestingly, synchronization through aggregate averaging is achieved in a single step for κ = 1/2, as x̄1 = x̄2 holds in this case.
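This contraction is easy to verify numerically. The following sketch iterates the two coupled maps (9.37) at the maximal reproduction rate r = 4; the initial values and the coupling κ = 0.45, which lies above the bound 3/8, are illustrative choices.

```python
def coupled_logistic(r, kappa, x1, x2, steps):
    """Iterate two coupled logistic maps, Eq. (9.37)."""
    for _ in range(steps):
        m1 = (1 - kappa) * x1 + kappa * x2   # aggregate averages
        m2 = (1 - kappa) * x2 + kappa * x1
        x1, x2 = r * m1 * (1 - m1), r * m2 * (1 - m2)
    return x1, x2

r = 4.0                                      # fully chaotic logistic map
a, b = coupled_logistic(r, 0.45, 0.3, 0.7, 100)
print(abs(a - b))                            # difference has contracted away

# for kappa = 1/2 the averages coincide: synchronization in a single step
c, d = coupled_logistic(r, 0.5, 0.3, 0.7, 1)
print(abs(c - d))
```

For κ = 1/2 the aggregate averages coincide, x̄1 = x̄2, so the two trajectories agree exactly after one iteration.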

9.4.2 Causal Signaling

The synchronization of limit cycle oscillators discussed in Sect. 9.2 is in general slow, see Fig. 9.2, as the information between distinct oscillators is exchanged only indirectly, namely via a molecular field, which is in turn an averaged quantity. Synchronization may be substantially faster when local dynamical units influence each other with precisely timed signals, the route to synchronization discussed here.

Relaxation oscillators, as studied in the next section, have non-uniform cycles, which implies that the timing of the stimulation of one element by another is important. This is a characteristic property of real-world neurons, which emit short pulses, the spikes, along the efferent axon. The corresponding artificial neuron models are called “integrate-and-fire” neurons. Relaxation oscillators are hence well suited to study the phenomenon of synchronization via causal signaling.

Terman–Wang Oscillators There are many variants of relaxation oscillators relevant for describing integrate-and-fire neurons, starting from the classical Hodgkin–Huxley equations. We discuss here a particularly transparent dynamical system, as introduced originally by Terman and Wang, namely

ẋ = f(x) − y + I ,    f(x) = 3x − x³ + 2 ,
ẏ = e (g(x) − y) ,    g(x) = α (1 + tanh(x/β)) .    (9.39)

In this setting, x corresponds in neural terms to the membrane potential, with


I representing an external stimulation of the neural oscillator. The amount of

Fig. 9.7 The ẏ = 0 (red dashed-dotted lines) and ẋ = 0 (blue full lines) isoclines of the Terman–Wang oscillator (9.39), here for α = 5, β = 0.2, e = 0.1. Left: I = 0.5; shown is the relaxation cycle for e ≪ 1 (dotted line, arrows). Right: I = −0.5, for which a stable fixpoint PI exists (filled green dot)

dissipation is given by12

∂ẋ/∂x + ∂ẏ/∂y = 3 − 3x² − e = 3 (1 − x²) − e .

For small e ≪ 1 the system takes up energy for membrane potentials |x| < 1, dissipating energy for |x| > 1.

Fixpoints The fixpoints of (9.39) are determined via

ẋ = 0 :  y = f(x) + I ,
ẏ = 0 :  y = g(x) ,

viz by the intersection of the two functions f(x) + I and g(x), as illustrated in Fig. 9.7. Hence, there are two parameter regimes.

– For I ≥ 0 a single unstable fixpoint (x*, y*) exists, with x* = 0.
– For I < 0, and |I| large enough, two additional fixpoints show up, as given by the crossing of the sigmoid α(1 + tanh(x/β)) with the left branch (LB) of the cubic f(x) = 3x − x³ + 2. One of the new fixpoints is stable.

The stable fixpoint .PI is indicated in Fig. 9.7.

12 The divergence of the flow is equivalent to the relative contraction/expansion of phase space,
.ΔV /V , as discussed in Sect. 3.1.1 of Chap. 3.


Fig. 9.8 Sample trajectories of x(t) (blue line) and y(t) (dashed-dotted line) for the Terman–Wang oscillator (9.39), with α = 5, β = 0.2, e = 0.1. Left: Spiking behavior for I = 0.5, characterized by silent/active phases for negative/positive x. Right: Relaxation to a stable fixpoint for I = −0.5

– RELAXATION REGIME
  For I > 0 the Terman–Wang oscillator relaxes to a periodic solution, see Fig. 9.7, which is akin to the relaxation oscillations observed for the Van der Pol oscillator.13
– EXCITABLE STATE
  For I < 0 the system settles into a stable fixpoint, becoming however active again when an external input shifts I into the positive region. The system is said to be “excitable”.

The presence of an excitable state is a precondition for neuronal models.
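Both regimes can be reproduced with a few lines of explicit integration. The sketch below uses a plain Euler scheme for (9.39), with α = 5, β = 0.2, e = 0.1 as in Fig. 9.8; the step size and initial conditions are illustrative assumptions.

```python
import math

def terman_wang(I, x=0.1, y=0.0, alpha=5.0, beta=0.2, eps=0.1,
                dt=0.002, t_max=200.0):
    """Euler integration of the Terman-Wang oscillator, Eq. (9.39)."""
    xs = []
    for _ in range(int(t_max / dt)):
        dx = 3 * x - x ** 3 + 2 - y + I
        dy = eps * (alpha * (1 + math.tanh(x / beta)) - y)
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
    return xs

osc = terman_wang(0.5)         # relaxation regime: sustained limit cycle
fix = terman_wang(-0.5)        # excitable regime: one spike, then rest
late = osc[len(osc) // 2:]
print(max(late), min(late))    # x visits both the right and the left branch
print(fix[-1])                 # settles on the left branch, near x = -1.4
```

For I = 0.5 the membrane potential keeps jumping between the two branches of the ẋ = 0 isocline, while for I = −0.5 the trajectory performs at most a single excursion before settling into the stable fixpoint.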

Silent and Active Phases In the relaxation regime, the periodic solution jumps rapidly, for e ≪ 1, between trajectories that closely approach the right branch (RB) and the left branch (LB) of the ẋ = 0 isocline. The times spent respectively on the two branches may however differ substantially, as indicated in Figs. 9.7 and 9.8. Indeed, the limit cycle is characterized by two distinct types of flows.

– SILENT PHASE
  The LB (x < 0) of the ẋ = 0 isocline is comparatively close to the ẏ = 0 isocline; the system spends therefore considerable time on the LB. This leads to a “silent phase”, which is equivalent to a refractory period.
– ACTIVE PHASE
  The RB (x > 0) of the ẋ = 0 isocline is far from the ẏ = 0 isocline when α ≫ 1, see Fig. 9.7. The time development of y is hence fast on the RB, which defines the “active phase”.

13 The van der Pol oscillator, treated in Sect. 3.2 of Chap. 3, has two regimes, corresponding respectively to small/large adaptive terms.

The relative rate of the time development of y between the silent and the active phase is determined by the parameter α, as defined in (9.39).

Spikes as Activity Bursts In its relaxation phase, the Terman–Wang oscillator can be considered as a spontaneously spiking neuron, see Fig. 9.8. When α ≫ 1, the active phase is compressed into a short burst, viz a “spike”.

The differential equations defined in (9.39) are an example of a standard technique within dynamical systems theory, the coupling of a slow variable to a fast variable, which results in a separation of time scales. When the slow variable y = y(t) relaxes below a certain threshold, see Fig. 9.8, the fast variable x = x(t) responds rapidly, resetting on the way the slow variable.

Synchronization via Fast Threshold Modulation Limit cycle oscillators can synchronize, albeit slowly, via a common molecular field, as discussed in Sect. 9.2. Substantially faster synchronization can be achieved for networks of interacting relaxation oscillators via “fast threshold modulation”.

The idea is simple. Relaxation oscillators are characterized by distinct phases


during their cycle, which we denoted respectively silent and active states. These
two states take functionally distinct roles when assuming that a neural oscillator is
able to influence the threshold I of other units during its active phase, namely by
emitting a short burst of activity, viz by spiking. For our basic equations (9.39), this
concept translates into

I → I + ΔI ,    ΔI > 0 ,

such that the second neural oscillator changes from the excitable state to the oscillating state. This process is illustrated graphically in Fig. 9.9; it corresponds to a signal sent from the first to the second dynamical unit. In neural terms, when the first neuron fires, the second neuron follows suit.

Activity Propagation A basic model illustrating the propagation of activity is given by a line of i = 1, . . . , N coupled oscillators (xi(t), yi(t)),

1 ⇒ 2 ⇒ 3 ⇒ . . . ,

which are assumed to be initially all in the excitable state, with Ii ≡ −0.5. Inter-unit coupling via fast threshold modulation corresponds to

ΔIi(t) = Θ(x_{i−1}(t)) ,    (9.40)

where .Θ(x) is the Heaviside step function. That is, we define an oscillator i to be in
its active phase whenever .xi > 0. The resulting dynamics is shown in Fig. 9.10. The


Fig. 9.9 Fast threshold modulation for two excitatorily coupled relaxation oscillators, see (9.39), denoted symbolically as o1 = o1(t) and o2 = o2(t). When o1 jumps at t = t1 from the LB to the RB, it becomes active. The cubic ẋ = 0 isocline for o2 is consequently raised from C to CE. This induces o2 to jump as well from left to right. Note that the jumping from the right branches (RB and RBE) back to the left branches occurs in the reverse order; o2 jumps first

chain is driven by setting the first oscillator of the chain into the spiking state for a
certain period of time. All other oscillators spike consecutively in rapid sequence.
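A minimal sketch of this cascade, for three oscillators coupled via (9.40), reads as follows; the integration scheme, initial conditions and driving protocol mirror Fig. 9.10 but are otherwise illustrative assumptions.

```python
import math

alpha, beta, eps, I0 = 10.0, 0.2, 0.1, -0.5
dt, steps, N = 0.002, 75000, 3            # integrate up to t = 150
x = [-1.4] * N                            # start near the stable fixpoint
y = [0.0] * N
peaks = [-10.0] * N                       # maximal x_i encountered so far

for n in range(steps):
    t = n * dt
    xs_old = x[:]                         # synchronous update of the chain
    for i in range(N):
        I = I0
        if i == 0 and 20 <= t <= 100:
            I += 1.0                      # external drive of the first unit
        elif i > 0 and xs_old[i - 1] > 0:
            I += 1.0                      # fast threshold modulation, Eq. (9.40)
        dx = 3 * x[i] - x[i] ** 3 + 2 - y[i] + I
        dy = eps * (alpha * (1 + math.tanh(x[i] / beta)) - y[i])
        x[i] += dt * dx
        y[i] += dt * dy
        peaks[i] = max(peaks[i], x[i])

print(peaks)    # every oscillator has entered its active phase, x_i > 0
```

Each unit spikes only because its predecessor raised its effective threshold during the active phase; after the drive ends, all units relax back to the excitable fixpoint.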

9.5 Piecewise Linear Dynamical Systems

For an understanding of a dynamical system of interest, it is often desirable to study a simplified version. A standard technique involves approximating the non-linearities in the flow by piecewise linear functions. At times, this approach allows to obtain analytic expressions for the orbits, which is a highly non-trivial outcome. For the case of limit-cycle oscillators, this program can be carried out for our reference system, as defined by (9.39).

Piecewise Linear Terman–Wang Oscillators We approximate the nullclines of (9.39) as

f(x) → 1 − x for x > 0 ,    f(x) → −1 − x for x < 0 ,
g(x) → g0 for x > 0 ,       g(x) → −g0 for x < 0 ,    (9.41)

as illustrated in Fig. 9.11. No fixpoint exists when g0 > 1. In comparison to the original functions shown in Fig. 9.7, the ẋ = 0 and ẏ = 0 isoclines are now discontinuous, but otherwise piecewise linear or constant.


Fig. 9.10 As an example of synchronization via causal signaling, shown are sample trajectories xi(t) (lines) for a line of coupled Terman–Wang oscillators, as defined by (9.39). The relaxation oscillators are in excitable states, with α = 10, β = 0.2, e = 0.1 and I = −0.5. For t ∈ [20, 100] a driving current ΔI1 = 1 is added to the first oscillator (dotted line). In consequence, x1 starts to spike, driving the other oscillators one by one via fast threshold modulation

Piecewise Linear Dynamics With I = 0, the system becomes inversion symmetric. It is then sufficient to consider positive x > 0, with

ẋ = 1 − x − y ,    ẏ = e (g0 − y) .    (9.42)

As a further simplification we set e = 1. The solution of (9.42) takes the form

x = (x̃ − ỹ t) e^{−t} + 1 − g0 ,    y = ỹ e^{−t} + g0 ,    (9.43)

which can be verified by direct differentiation. The trajectory is determined by three free parameters, x̃ and ỹ, together with the yet unknown period T of the limit cycle.

Matching Conditions The individual segments of a piecewise linear system


need to be joined once explicit expressions for the solutions have been found.
The resulting “matching conditions” determine the parameters appearing in the
respective analytic solution.

Starting at x(0) = 0, the orbit crosses the x = 0 line the next time after half a period, as is evident from Fig. 9.11, with the y-component changing sign in between. The matching conditions are therefore

x(0) = 0 = x(T/2) ,    y(T/2) = −y(0) .    (9.44)

Fig. 9.11 The piecewise linear Terman–Wang system, as described by (9.41) and (9.42), for g0 = 1.2 and e = 1. Left: The phase space (x, y). Shown are the piecewise linear ẋ = 0 isocline (blue), the piecewise constant ẏ = 0 isocline (red), and the resulting limit cycle (black). Right: As a function of time, kinks show up when the orbit crosses x = 0. Compare Figs. 9.7 and 9.8

The starting condition x(0) = 0 yields x̃ = g0 − 1, see (9.43), which leads to

g0 − 1 = (g0 − 1 − ỹ T/2) e^{−T/2} ,    2 g0 = −ỹ [e^{−T/2} + 1] .    (9.45)

As a result, one has two conditions for the two parameters ỹ and T.

Limit Cycle Period Given g0, the only free model parameter, one can eliminate either ỹ or T from (9.45). Doing so, one obtains a self-consistency equation for the remaining parameter that can be solved numerically.

Alternatively, we may ask which g0 would lead to a given period T. In this case T is fixed and one eliminates ỹ. From

ỹ = (2/T) [1 − e^{T/2}] (g0 − 1) ,    ỹ [1 + e^{−T/2}] = −2 g0

one finds

−2 g0 = (2/T) (g0 − 1) [1 − e^{T/2}] [1 + e^{−T/2}]

and hence

g0 = 2 sinh(T/2) / (2 sinh(T/2) − T) ,    (9.46)
2 sinh(T /2) − T

a remarkably simple expression. E.g., the orbit shown in Fig. 9.11 has a period of T ≈ 7.6, which is consistent with g0 = 1.2.
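The quoted consistency is easy to check: the sketch below solves (9.46) for the period T at g0 = 1.2 by bisection, and also probes the small-T limit; the bracketing interval is an illustrative choice.

```python
import math

def g0_of_T(T):
    """Right-hand side of Eq. (9.46)."""
    s = 2.0 * math.sinh(T / 2.0)
    return s / (s - T)

def period(g0, lo=1.0, hi=20.0, iters=100):
    """Bisection for the period T with g0_of_T(T) = g0; g0_of_T is decreasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g0_of_T(mid) > g0:
            lo = mid          # g0 still too large -> increase T
        else:
            hi = mid
    return 0.5 * (lo + hi)

T = period(1.2)
print(T)                          # close to the T of about 7.6 quoted above
print(g0_of_T(0.01) * 0.01 ** 2)  # small-T limit of g0 T^2
```

The same routine can be used for any g0 > 1, since g0(T) decreases monotonically from its small-T divergence toward unity.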

Slow/Fast Oscillations The advantage of an analytic result is that it allows to study limiting behaviors. On the positive y-axis, the orbit has to cross x = 0 in the interval [1, g0], as evident from Fig. 9.11, given that y = 1 and y = g0 mark the upper/lower border of respectively the ẋ = 0 and ẏ = 0 isoclines. From (9.46) one finds

lim_{T→∞} g0 = 1 ,    (9.47)

an expected relation. The oscillatory behavior slows down progressively when g0 → 1, viz when the orbit needs to squeeze through the closing gap between the two isoclines. In the opposite limit, when T → 0, the Taylor expansion sinh(z) ≈ z + z³/6 leads to

g0 ∼ 24/T² ,    T ∼ √(24/g0) ,    T ≪ 1 .    (9.48)

The flow accelerates as 1/√g0 when g0 becomes large.

9.6 Synchronization Phenomena in Epidemics

There are illnesses, like measles, that come and go recurrently. Looking at the statistics of local measles outbreaks presented in Fig. 9.12, one observes that outbreaks may occur in quite regular time intervals within a given city. Interestingly though, these outbreaks can be either in phase (synchronized) or out of phase between different cities.

The oscillations in the number of infected persons are definitely not harmonic, sharing instead many characteristics with relaxation oscillations, which typically have silent and active phases, compare Sect. 9.4.2.

SIRS Model The reference model for infectious diseases is the SIRS model. It contains three compartments, in the sense that individuals belong at any point in time to one of three possible classes:

S : susceptible,    I : infected,    R : recovered.

Fig. 9.12 Observation of the number of infected persons in a study on illnesses. Top: Weekly measles cases in Birmingham and Newcastle (red/blue lines). Bottom: Weekly measles cases in Cambridge and Norwich (green/brown lines). Data from He and Stone (2003)

The dynamics is governed by the following rules:

– INFECTION PROCESS
  With a certain probability, susceptibles pass to the infected state after coming into contact with an infected individual.
– RECOVERING
  Infected individuals recover from the infection after a given period of time, τI, passing to the recovered state.
– IMMUNITY
  For a certain period, τR, recovered individuals are immune, returning to the susceptible state once immunity is lost.

These three steps complete the infection cycle S → I → R → S. When τR → ∞ (lifelong immunity) the model reduces to the SIR model.

Sum Rule The SIRS model can be implemented both for discrete and for continuous time; we start with the former. The infected phase is normally short, which allows to use τI as the unit of time, setting τI = 1. The recovery time τR is then a multiple of τI = 1. We denote with

xt : the fraction of infected individuals at time t = 1, 2, 3, . . . ,
st : the fraction of susceptible individuals at time t.

state:  S  S  S  I  R  R  R  S  S
time:   1  2  3  4  5  6  7  8  9

Fig. 9.13 Example of the course of an individual infection within the discrete-time SIRS model, with an infection period τI = 1 and a recovery duration τR = 3. The number of individuals recovering at time t is the sum of infected individuals at times t − 1, t − 2 and t − 3, compare (9.49)

Normalizing the total number of individuals to one, the sum rule

st = 1 − xt − Σ_{k=1}^{τR} x_{t−k} = 1 − Σ_{k=0}^{τR} x_{t−k}    (9.49)

holds, as the fraction of susceptible individuals is just one minus the fraction of infected individuals, minus the fraction of individuals in the recovery state, compare Fig. 9.13.

Discrete Time SIRS Model We denote with a the rate of transmitting an infection when there is a contact between an infected and a susceptible individual. Using the sum rule (9.49), one obtains with

x_{t+1} = a xt st = a xt ( 1 − Σ_{k=0}^{τR} x_{t−k} )    (9.50)

the discrete-time SIRS model.

Relation to the Logistic Map For τR = 0, the discrete-time SIRS model (9.50) reduces to the logistic map,

x_{t+1} = a xt (1 − xt) .

The trivial fixpoint xt ≡ 0 is globally attracting when a < 1, which implies that the illness dies out. The non-trivial steady state is

x^(1) = 1 − 1/a ,    for 1 < a < 3 .

At a = 3 a period-doubling bifurcation is observed, beyond which an oscillation of period two develops. Increasing the growth rate a further, a transition to deterministic chaos takes place.
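The transition at a = 3 is readily observed numerically; the sketch below (parameter values are illustrative choices) iterates the logistic map past a long transient.

```python
def logistic_orbit(a, x=0.2, transient=1000, keep=4):
    """Iterate x -> a x (1 - x) and return the last few orbit points."""
    for _ in range(transient):
        x = a * x * (1 - x)
    out = []
    for _ in range(keep):
        out.append(x)
        x = a * x * (1 - x)
    return out

below = logistic_orbit(2.5)      # fixpoint regime: x* = 1 - 1/a
above = logistic_orbit(3.2)      # period-two oscillation
print(below)
print(above)
```

Below the transition the orbit settles onto the steady state 1 − 1/a; above it, the last values alternate between two distinct levels.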

The SIRS model (9.50) shows somewhat similar behaviors. Due to the memory
terms .xt−k , the resulting oscillations may however depend on the initial condition.

Fig. 9.14 Example of a solution of the SIRS model (9.50) for τR = 6 and a = 2.2, viz of the iteration x_{t+1} = 2.2 xt (1 − xt − x_{t−1} − . . . − x_{t−6}). The number of infected individuals might drop to very low values during the silent phase in between two outbreaks, as most of the population is first infected and then immunized during an outbreak

For extended immunities, τR ≫ τI, features characteristic of relaxation oscillators are found, compare Fig. 9.14.
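A few lines suffice to reproduce this behavior; the sketch below iterates (9.50) with the parameters of Fig. 9.14, starting from a small, illustrative seed.

```python
def sirs(a=2.2, tau_r=6, x0=0.01, steps=100):
    """Discrete-time SIRS model, Eq. (9.50), with sum rule (9.49)."""
    hist = [0.0] * tau_r + [x0]              # x_{t-tau_r}, ..., x_t
    xs = [x0]
    for _ in range(steps):
        s = 1.0 - sum(hist[-(tau_r + 1):])   # susceptible fraction, Eq. (9.49)
        x_new = a * hist[-1] * s
        hist.append(x_new)
        xs.append(x_new)
    return xs

xs = sirs()
peaks = [t for t in range(1, len(xs) - 1)
         if xs[t] > xs[t - 1] and xs[t] > xs[t + 1] and xs[t] > 0.05]
print(len(peaks), max(xs), min(xs[5:]))   # recurrent outbreaks, deep troughs
```

The outbreaks recur with a period set essentially by the immunity duration, with the infected fraction dropping to very small values in between, as in Fig. 9.14.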

Two Coupled Epidemic Centers What happens when two epidemic centers are weakly coupled? We use

s_t^{(1,2)} ,    x_t^{(1,2)}

to denote the fractions of susceptible/infected individuals in the respective cities. Different dynamical couplings are conceivable, via exchange or visits of susceptible or infected individuals. With
x_{t+1}^{(1)} = a ( x_t^{(1)} + e x_t^{(2)} ) s_t^{(1)} ,
x_{t+1}^{(2)} = a ( x_t^{(2)} + e x_t^{(1)} ) s_t^{(2)} ,    (9.51)

one describes a situation where a small fraction e of infected individuals visits the respective other center. In addition, one needs to apply the sum rule (9.49) to both centers.

For e = 1 there is no distinction between the two centers, which means that the dynamics described by (9.51) can be merged via xt = x_t^{(1)} + x_t^{(2)} and st = s_t^{(1)} + s_t^{(2)}. A single combined epidemic population remains. The situation is similar to the case of two coupled logistic maps, as given by (9.37). This is not surprising, since the coupling term in (9.51) is based on aggregate averaging.

Fig. 9.15 Time evolution of the fractions of infected individuals x^{(1)}(t) and x^{(2)}(t) within the SIRS model, see (9.51), for two epidemic centers i = 1, 2 with recovery times τR = 6, infection rates a = 2, see (9.50), and initial conditions x_0^{(1)} = 0.01, x_0^{(2)} = 0. For a very weak coupling e = 0.005 (top) the outbreaks occur out of phase, for a moderate coupling e = 0.1 (bottom) in phase

In Phase Versus Out of Phase Synchronization We have seen in Sect. 9.4.2 that relaxation oscillators coupled strongly during their active phases synchronize rapidly. Here the active phase corresponds to an outbreak of the illness, which is however not pre-given, but a self-organized event. In this sense, (9.51) implements a coupling equivalent to a fast threshold modulation, albeit based on a self-organized process.

In Fig. 9.15 we present the results from a numerical simulation of the coupled
model, illustrating the typical behavior. We see that the epidemic outbreaks occur
in the SIRS model indeed in phase for moderate to large coupling constants e.
In contrast, for very small couplings e between the two centers of epidemics, the
synchronization phase flips to antiphase. This phenomenon is observed in reality,
compare Fig. 9.12.
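The coupled dynamics (9.51) extends the single-center iteration directly. The following sketch uses the seeds of Fig. 9.15 and otherwise illustrative choices; it verifies that the epidemic invades the initially healthy second center.

```python
def coupled_sirs(a=2.0, e=0.1, tau_r=6, steps=60):
    """Two coupled epidemic centers, Eq. (9.51), with the sum rule (9.49)."""
    h1 = [0.0] * tau_r + [0.01]       # center 1 carries a small seed
    h2 = [0.0] * tau_r + [0.0]        # center 2 starts healthy
    for _ in range(steps):
        s1 = 1.0 - sum(h1[-(tau_r + 1):])
        s2 = 1.0 - sum(h2[-(tau_r + 1):])
        n1 = a * (h1[-1] + e * h2[-1]) * s1   # simultaneous update
        n2 = a * (h2[-1] + e * h1[-1]) * s2
        h1.append(n1)
        h2.append(n2)
    return h1[tau_r:], h2[tau_r:]

x1, x2 = coupled_sirs(e=0.1)          # moderate coupling, in-phase regime
print(max(x1), max(x2))               # both centers show full outbreaks
```

Rerunning the same code with e = 0.005 reproduces, per the discussion above, the antiphase pattern shown in the top panel of Fig. 9.15.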

Time Scale Separation The reason for the occurrence of out-of-phase synchronization is the emergence of two separate time scales in the limits τR ≫ 1 and e ≪ 1. A small seed ∼ e a x^{(1)} s^{(2)} of infections in the second city needs substantial time to induce a full-scale outbreak, even via exponential growth, when e is too small. But in order to remain in phase with the current outbreak in the first city, the outbreak occurring in the second city may not lag too far behind. When the dynamics is symmetric under the exchange 1 ↔ 2, the system then settles into antiphase cycles.

9.6.1 Continuous Time SIRS Model

Changing notation slightly, we use S, I, and R = 1 − I − S for the densities of susceptible, infected, and recovered individuals.

Continuous Time Limit Substituting the discrete-time approximation for the derivative,

İ ≈ (I_{t+Δt} − I_t)/Δt ,    Δt = 1 ,

in (9.50) yields

İ = a I (1 − I − R) − I = a I S − I ,    (9.52)

which has a simple interpretation. The number of infected increases with a rate that is proportional to the infection probability a and to the densities I and S of infected and susceptibles. The term −I describes recovery, here at unit rate.

Continuous Time SIRS Model For the discrete case, we took the duration of the illness as the unit of time, as illustrated in Fig. 9.13. Using (9.52) and rescaling time by T = 1/λ, one obtains with

Ṡ = −g I S + δ R ,    İ = g I S − λ I ,    Ṙ = λ I − δ R    (9.53)

the standard continuous-time SIRS model, where g = λa and δ = λ/τR. Recall that τR denotes the average time an individual remains immune, as measured in units of the illness duration T.

Endemic Fixpoint The endemic state is defined by the fixpoint condition

S = λ/g ,    I = (δ/(δ + λ)) (g − λ)/g ,    R = (λ/(δ + λ)) (g − λ)/g ,    (9.54)

which follows directly from (9.53). The endemic state exists when the infection rate g is larger than the recovery rate λ, namely when g > λ. Otherwise, the illness dies out. One can show that the endemic state is always stable. The discrete SIRS model supports, in contrast, solutions corresponding to periodic outbreaks, as shown in Fig. 9.14.
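The endemic fixpoint can be confirmed by direct integration of (9.53). The sketch below uses an Euler scheme with the illustrative parameters g = 2, λ = 1, δ = 0.2, for which (9.54) predicts S = 0.5, I = 1/12 and R = 5/12.

```python
def sirs_flow(S, I, R, g=2.0, lam=1.0, delta=0.2):
    """Right-hand side of the continuous-time SIRS model, Eq. (9.53)."""
    return (-g * I * S + delta * R,
            g * I * S - lam * I,
            lam * I - delta * R)

S, I, R = 0.99, 0.01, 0.0            # small initial seed of infected
dt = 0.01
for _ in range(50000):               # integrate up to t = 500
    dS, dI, dR = sirs_flow(S, I, R)
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR

print(S, I, R)   # approaches the endemic fixpoint, Eq. (9.54)
```

The trajectory spirals into the endemic state via damped oscillations, in line with the statement that the fixpoint is always stable.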

Analytic Solution of the SIR Model Lifelong immunity, which corresponds to setting δ = 0 in (9.53), reduces the SIRS model to the SIR model. Dividing the equation for İ by the one for Ṡ yields

İ/Ṡ = dI/dS = −1 + (λ/g)(1/S) ,    (9.55)


Fig. 9.16 For the SIR model, the current number of infected I as a function of the cumulative number of cases X = 1 − S, shown for g = 1.5λ, g = 2.0λ and g = 3.0λ. The curves are described by the analytic solution (9.56), using R0 = 0 = I0 and S0 = 1. The outbreak starts at (X, I) = (0, 0), ending when I vanishes again (open circles). A sizeable fraction of the population is never infected

which has the solution

I(t) = 1 − R0 − S(t) + (λ/g) log( S(t)/S0 ) ,    (9.56)

where S0 = S(0), I0 = I(0) and R0 = 1 − S0 − I0 denote the starting configuration. The functional dependence I = I(X) is shown in Fig. 9.16, where X = I + R = 1 − S is the cumulative number of cases. The illness dies out before the entire population has been infected.
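The solution (9.56) also determines the final size of the outbreak. Setting I = 0, with S0 = 1 and R0 = I0 = 0, yields the condition 1 − S∞ + (λ/g) log S∞ = 0 for the fraction S∞ of never-infected individuals. The sketch below solves it by bisection for g = 2λ; the bracketing interval is an illustrative choice.

```python
import math

def final_size_condition(S, ratio):
    """I = 0 in Eq. (9.56), with S0 = 1 and R0 = I0 = 0; ratio = lambda/g."""
    return 1.0 - S + ratio * math.log(S)

def surviving_fraction(ratio, lo=1e-6, hi=0.999, iters=200):
    """Bisection for the nontrivial root S_inf of the final-size condition."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if final_size_condition(mid, ratio) > 0.0:
            hi = mid          # still on the epidemic branch, root is smaller
        else:
            lo = mid
    return 0.5 * (lo + hi)

S_inf = surviving_fraction(0.5)             # g = 2 lambda
I_peak = 0.5 + 0.5 * math.log(0.5)          # maximum of I(X), at S = lambda/g
print(S_inf, 1.0 - S_inf)                   # roughly 20% escape infection
print(I_peak)
```

The peak height and the final size agree with the g = 2.0λ curve of Fig. 9.16, with about one fifth of the population never being infected.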

Exercises

(9.1) THE DRIVEN HARMONIC OSCILLATOR


Solve the driven harmonic oscillator, see (9.1), for all times t and compare it
with the long time solution t → ∞, Eqs. (9.3) and (9.4).
(9.2) SELF-SYNCHRONIZATION
Consider an oscillator with time delay feedback,

.θ̇(t) = ω0 + K sin[θ (t − T ) − θ (t)] .

Discuss the self-synchronization in analogy to the stability of the steady-


state solutions of time delay systems.14 Which is the auto-locking frequency
in the limit of strong self-coupling, K → ∞?

14 The methods of time delay systems are laid down in Sect. 2.5 of Chap. 2.

(9.3) KURAMOTO MODEL WITH THREE OSCILLATORS


Consider the Kuramoto model, as defined by (9.5) and (9.6), for three
identical oscillators. Start by showing that the natural frequencies can be
disregarded when ωi ≡ ω. Use relative variables α = θ2 −θ1 and β = θ3 −θ2 ,
for the determination of the fixpoints. Which ones are stable/unstable for
K > 0 and/or K < 0?
(9.4) LYAPUNOV EXPONENTS ALONG SYNCHRONIZED ORBITS
Evaluate, for two coupled oscillators, see (9.7), the longitudinal and the
orthogonal Lyapunov exponents, with respect to the synchronized orbit.
Does the average of the longitudinal Lyapunov exponent over one period
vanish?
(9.5) SYNCHRONIZATION OF CHAOTIC MAPS
The Bernoulli shift map f (x) = ax mod 1 with x ∈ [0, 1] is chaotic for
a > 1. Consider with
x1(t + 1) = f( [1 − κ] x1(t) + κ x2(t − T) ) ,
x2(t + 1) = f( [1 − κ] x2(t) + κ x1(t − T) )    (9.57)

two coupled chaotic maps, with κ ∈ [0, 1] being the coupling strength and
T the time delay, compare (9.32). Discuss the stability of the synchronized
states x1 (t) = x2 (t) ≡ x̄(t) for general time delays T . What drives the
synchronization process?
(9.6) TERMAN–WANG OSCILLATOR
Discuss the stability of the fixpoints of the Terman–Wang oscillator, see
(9.39). Linearize the differential equations around the fixpoint solution and
consider the limit β → 0.
(9.7) PULSE COUPLED LEAKY INTEGRATOR NEURONS
The membrane potential x(t) of a leaky-integrator neuron can be thought to
increase over time like
ẋ = γ (S0 − x) ,    x(t) = S0 (1 − e^{−γt}) .    (9.58)

When the membrane potential reaches a certain threshold, say xθ = 1, it


spikes and resets its membrane potential to x → 0 immediately after spiking.
For two pulse coupled leaky integrators xA (t) and xB (t) the spike of one
neuron induces an increase in the membrane potential of the other neuron by
a finite amount e. Discuss and find the limiting behavior for long times.
(9.8) SIRS MODEL
Find the fixpoints xt ≡ x ∗ of the discrete-time SIRS model, see (9.50), for
all τR . As a function of a, study the stability for τR = 0, 1.
(9.9) EPIDEMIOLOGY OF ZOMBIES
If humans and zombies meet, there are only two possible outcomes: either the zombie bites the human, or the human kills the zombie. Defining with α ∈ [0, 1] and 1 − α the respective probabilities, the evolution equations for

the two population densities H and Z are

Ḣ = −α H Z ,    Ż = (2α − 1) H Z .    (9.59)

Find and discuss the analytic solution when starting from initial densities H0 and Z0 = 1 − H0. What is the difference to the SIRS model?

Further Reading

The reader may consult a textbook containing examples for synchronization processes by Pikovsky et al. (2003), and an informative review of the Kuramoto model by Pikovsky and Rosenblum (2015), which contains in addition historical annotations. Some of the material discussed in this chapter requires a certain background in theoretical neuroscience, see e.g. Sterratt et al. (2023). An introductory review of chimera states is given in Panaggio and Abrams (2015).
We recommend that the interested reader takes a look at some of the original research literature, such as the exact solution of relaxation oscillators, Terman and Wang (1995), the concept of fast threshold modulation, Somers and Kopell (1993), the physics of synchronized clapping, Néda et al. (2000a,b), and synchronization phenomena within the SIRS model of epidemics, He and Stone (2003). For synchronization with delays see D'Huys et al. (2008).

References
D'Huys, O., Vicente, R., Erneux, T., Danckaert, J., & Fischer, I. (2008). Synchronization properties of network motifs: Influence of coupling delay and symmetry. Chaos, 18, 037116.
He, D., & Stone, L. (2003). Spatio-temporal synchronization of recurrent epidemics. Proceedings
of the Royal Society London B, 270, 1519–1526.
Néda, Z., Ravasz, E., Vicsek, T., Brechet, Y., & Barabási, A. L. (2000a). Physics of the rhythmic
applause. Physical Review E, 61, 6987–6992.
Néda, Z., Ravasz, E., Vicsek, T., Brechet, Y., & Barabási, A.L. (2000b). The sound of many hands
clapping. Nature, 403, 849–850.
Panaggio, M. J., & Abrams, D. M. (2015). Chimera states: coexistence of coherence and
incoherence in networks of coupled oscillators. Nonlinearity, 28, R67.
Pikovsky, A., & Rosenblum, M. (2015). Dynamics of globally coupled oscillators: Progress and perspectives. Chaos, 25, 9.
Pikovsky, A., Rosenblum, M., & Kurths, J. (2003). Synchronization: A universal concept in
nonlinear sciences. Cambridge University Press.
Somers, D., & Kopell, N. (1993). Rapid synchronization through fast threshold modulation.
Biological Cybernetics, 68, 398–407.
Sterratt, D., Graham, B., Gillies, A., Einevoll, G., & Willshaw, D. (2023). Principles of computa-
tional modelling in neuroscience. Cambridge University Press.
Strogatz, S. H. (2001). Exploring complex networks. Nature, 410, 268–276.
Terman, D., & Wang, D. L. (1995). Global competition and local cooperation in a network of neural oscillators. Physica D, 81, 148–176.
10 Complexity of Machine Learning

Without doubt, the brain is the most complex adaptive system known to humanity, yet arguably also a complex system about which we know little. In both respects, the brain faces increasing competition from machine learning architectures.
We present an introduction to basic neural network and machine learning concepts, with a special focus on the connection to dynamical systems theory. Starting with point neurons and the XOR problem, the relation between the dynamics of recurrent networks and random matrix theory will be developed. The somewhat counter-intuitive notion of a continuous number of network layers is shown next to lead to neural differential equations, respectively for information processing and error backpropagation. Approaches aimed at understanding learning processes in deep architectures often make use of the infinite-layer limit. As a result, machine learning can be described by Gaussian processes together with neural tangent kernels. Finally, the distinction between information processing and information routing will be discussed, with the latter being the task of the attention mechanism, the core component of transformer architectures.

10.1 Computation Units

Modern day computation devices and architectures are constructed using large
numbers of local computation units. Examples are transistors and quantum gates for
classical and quantum computers, or artificial neurons for deep learning algorithms.

Dimensionality Reduction Information processing by local computation units


intrinsically involves dimensionality reduction, as illustrated in Fig. 10.1. Per output,
only a single feature of the input data stream can be analyzed. Local units are hence
feature extractors.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 361
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_10

Fig. 10.1 Artificial neurons and other local computation units receive many inputs, with the subsequent processing reducing dimensionality. Only a few output streams are produced, in most cases only a single one. Computation units are hence equivalent to local feature extractors

Commonly used computation units generate just a single output stream. An exception is provided by quantum gates, which need to be unitary. Quantum gates hence have identical numbers of inputs and outputs, normally just two.

Perceptron Unit We start with the classical artificial neuron, denoted here “per-
ceptron unit”. Time is discrete, with individual in- and outputs corresponding to
real numbers, often with restricted domains. Dimensionality reduction is achieved
by taking the scalar product of the input vector .x with a weight vector .w,

y = σ(a(x − b)) ,    x = w · x .    (10.1)

The output y is generated via a transfer function .σ (z). In machine learning, the
threshold b is defined most of the time with an opposite sign. When the aim is to
study a given set of weights .w, one keeps the gain a as a free parameter. If not, one
can set it to unity, .a → 1, as done in machine learning. Geometrically, the isocline
.x = b,

w · x = b ,    (10.2)

corresponds to a plane in input space. Perceptron units hence act as linear classifiers.

Planes divide space into two half-spaces, say .SA and .SB . As feature extractors,
perceptron units output information about whether the input vector is part of either
.SA or .SB .
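The half-space classification of (10.1)–(10.2) can be illustrated with a minimal sketch; the weights and threshold below are hand-picked for illustration, not prescribed by the text:

```python
import math

def perceptron(x, w, b, a=1.0):
    """Perceptron unit (10.1): y = sigma(a*(w.x - b)), sigmoidal transfer."""
    xw = sum(wi * xi for wi, xi in zip(w, x))    # scalar product w.x
    return 1.0 / (1.0 + math.exp(-a * (xw - b)))

# the plane w.x = b of (10.2) separates input space into two half-spaces
w, b = (1.0, 1.0), 1.5                   # hand-picked weights and threshold
above = perceptron((1.0, 1.0), w, b)     # w.x = 2.0 > b, output above 0.5
below = perceptron((0.0, 0.0), w, b)     # w.x = 0.0 < b, output below 0.5
```

The output crosses 0.5 exactly on the classifying plane, so thresholding the sigmoidal at 0.5 reads off the half-space membership.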

Geometric Feature Spaces When images or visual data are processed, input units
may be associated with specific geometric locations .Ri , usually corresponding to
pixels in a two-dimensional plane. After adaption, the weights .wi of a specific
classifying neuron,

wi(Ri) → w(ΔRi) ,    ΔRi = Ri − Ry ,    (10.3)

will be a function of the geometric location .Ri of the afferent neuron, normally
relative to the location .Ry of the classifying neuron.

Geometric feature extraction is the prime task of the visual cortex. Examples are
linear or center-surround contrasts and gratings, both in the black-white and in the
color domain. Features can be highly non-linear geometric objects, classification is
nevertheless linear in the space of input activities.

Transfer Functions The transfer function .σ (z) entering (10.1) comes in a range of
varieties. Two classical variants are
σsig(z) = 1/(1 + e−z) ,    σtanh(z) = tanh(z) ,    (10.4)

where the first, the “sigmoidal”, works in the range .σsig ∈ [0, 1]. For the second
we have .σtanh ∈ [−1, 1]. Both transfer functions are monotonically increasing and
bounded, which makes the output a measure for the degree to which the input has
been classified. The infinite-gain limit

lim_{a→∞} σsig(a(x − b)) =  0.0 (x < b),  0.5 (x = b),  1.0 (x > b)    (10.5)

leads to binary classification. The same holds for .σtanh .

ReLU Units The transfer function of “rectified linear units”,



σReLU(z) =  0 (z < 0),  z (z > 0) ,    (10.6)

is piecewise linear.1 Negative arguments are shunted. On the positive side, gradients
will not vanish for large arguments, as they do for .σsig and .σtanh , remaining finite
for all .z > 0. This helps to avoid the vanishing gradient problem when propagating
errors backwards, a subject treated further below in Sect. 10.3.
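The contrast between bounded and rectified transfer functions can be made concrete with a short sketch; the sample points and gain values are illustrative:

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))    # sigmoidal (10.4), range [0, 1]

def relu(z):
    return max(0.0, z)                   # rectified linear unit (10.6)

# increasing the gain a sharpens sig(a*(x - b)) towards the binary step (10.5)
b = 0.3
ys = [sig(a * (0.5 - b)) for a in (1.0, 10.0, 1000.0)]

# for large arguments the sigmoidal gradient vanishes, the ReLU one does not
eps = 1e-6
grad_relu = (relu(10.0 + eps) - relu(10.0)) / eps    # stays at 1.0
grad_sig = (sig(10.0 + eps) - sig(10.0)) / eps       # nearly zero
```

The persisting ReLU gradient on the positive side is what mitigates the vanishing gradient problem mentioned above.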

Boltzmann Units So far, deterministic updating in discrete time has been used.
Stochastic dynamics,2

σBoltz(z) =  1 with probability p,  0 with probability 1 − p ,    (10.7)

is however also an option. In the context of “Boltzmann machines” one relates the
transition probability p to the difference between the two suitably defined energies,

1 See Sect. 9.5 of Chap. 9, for the general theory of piecewise linear dynamical systems.
2 For the general theory of stochastic dynamical systems see Chap. 3.

respectively of starting/final states. For

p = p(z) = σsig(z)    (10.8)

one recovers on average the sigmoidal transfer function.

10.1.1 Structured Units

The computation units discussed so far correspond to “point neurons”, in the sense
that the output depends exclusively on a single aggregate quantity, namely .x = w·x,
respectively .z = a(x − b). A possible extension are detailed biological models
that are characterized by a potentially large number of biophysical compartments.
“Compartmental neurons” are of relevance in the neurosciences. Remaining on an
abstract level, we discuss here several options for internally structured computation
units.

Internal Memories In general there is a tradeoff between increasing either the


number of units or the complexity of the constituent elements. The latter is a viable
option in particular when qualitative new features are added. A prime example is the
introduction of an internal variable acting as a local memory, viz a memory cell. An
influential version in this respect is LSTM, a unit with “long short-term memory”.

When an internal memory is present, the unit needs a mechanism deciding to which extent past activities influence the new output, together with the updated
value of the memory cell. For an LSTM unit, the corresponding mechanisms are
encoded as gates, respectively for input, output and forgetting. Gating parameters
are adaptable, viz learned during training.

Gated Units Gating occurs when a certain data stream controls the processing of
a second data stream. A basic example is the “gated linear unit”, GLU,

σGLU(x) = (xw − bw) σsig(xv − bv) ,    xw = w · x ,    xv = v · x ,    (10.9)

which contains two adaptable weight vectors and thresholds, .w, .v and .bw , .bv . The
output is a linear function of the input vector .x, which is however modulated by
a sigmoidal, .σsig (xv − bv ) ∈ [0, 1]. After training, .w and .v may encode distinct
features. If this happens, the processing of a given feature may proceed only when
the second feature is also present. This type of gating is hence equivalent to a
coincidence detector for self-selected features. It holds as a corollary that a single
GLU unit can express all logical gates, including the XOR gate.3

3 See Exercise (10.1).
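The corollary can be checked numerically. The following sketch realizes an XOR gate with a single GLU unit; the parameters are hand-picked and by no means unique, cf. exercise (10.1):

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def glu(x, w, bw, v, bv):
    """Gated linear unit (10.9): (w.x - bw) * sig(v.x - bv)."""
    xw = sum(wi * xi for wi, xi in zip(w, x))
    xv = sum(vi * xi for vi, xi in zip(v, x))
    return (xw - bw) * sig(xv - bv)

# hand-picked parameters (one of many solutions): the linear branch
# counts active inputs, the gate shuts when both inputs are on
w, bw = (1.0, 1.0), 0.0
v, bv = (-10.0, -10.0), -15.0

# thresholding the output at 0.5 reproduces the XOR truth table
xor = {(x1, x2): glu((x1, x2), w, bw, v, bv) > 0.5
       for x1 in (0, 1) for x2 in (0, 1)}
```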



Fig. 10.2 The AND (left) and OR (middle) gates are examples of linearly separable logical gates (dashed diagonal line). The XOR gate is not linearly separable (right). Filled/open bullets denote logical true/false

10.1.2 The XOR Problem

The term “artificial intelligence” was coined at a 1956 Dartmouth workshop. Hopes
of rapidly achieving human-level AI were however crushed when it was realized
13 years later that linear classifiers,4 like point neurons, cannot perform non-linear classification tasks, such as representing an XOR gate; see Fig. 10.2. This
was considered unfortunate, given that universal computation hinges on being able
to express all logical gates, including the XOR gate. The “XOR problem”, as it
was dubbed, is thought to have initiated an extended period of disillusionment with
neural networks, the “AI winter”.

Hidden Layers In machine learning jargon, whatever is not part of input or output
is “hidden”. A classical solution to the XOR problem is to add a hidden layer.

As an example consider that the weights and thresholds of the two hidden units
are such that they define respectively the planes

x1 + x2 = 0.5 ,    x1 + x2 = 1.5 ,    (10.10)

as illustrated in Fig. 10.3. For binary neurons with .σ (z) = θ (z), where .θ (z) is the
Heaviside step function, the mapping .(x1 , x2 ) → (y1 , y2 ) reads

(0, 0) → (0, 0) ,    (1, 1) → (1, 1) ,
(0, 1) → (1, 0) ,    (1, 0) → (1, 0) ,    (10.11)

which implies that .(0, 1) and .(1, 0) are mapped to the same point in the activity
space of the two hidden-layer units, .(y1 , y2 ). At this step, the XOR configuration
becomes linearly separable.

4 M. Minsky, S. Papert, “Perceptrons: An Introduction to Computational Geometry” (1969).
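The construction of (10.10)–(10.11) can be verified directly. A minimal sketch with binary Heaviside neurons follows; the output unit's separating plane .y1 − y2 = 0.5 is one possible hand-picked choice, not prescribed by the text:

```python
def theta(z):
    # Heaviside step function
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # hidden layer: the two classifying planes of (10.10)
    y1 = theta(x1 + x2 - 0.5)
    y2 = theta(x1 + x2 - 1.5)
    # output unit: (y1, y2) is now linearly separable; the separating
    # plane y1 - y2 = 0.5 is one possible (hand-picked) choice
    return theta(y1 - y2 - 0.5)

mapping = {(x1, x2): xor_net(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```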



Fig. 10.3 Left: Receiving inputs .x1 and .x2 , two hidden layer neurons produce outputs .y1 and .y2 , the hidden-layer activity. Specific to the hidden layer are the weight vectors .w1 and .w2 , together with the respective thresholds (not shown). Middle: In input space .(x1 , x2 ), the classifying planes (gray dashed lines) of the two hidden neurons; see (10.10). Right: The activity space of the hidden neurons, .(y1 , y2 ). Shown is the mapping (10.11) for binary neurons. The XOR configuration is now linearly separable (dashed diagonal line)

Dimensional Embedding Classification becomes simpler when a given problem


is embedded in an enlarged state space. We keep a single point neuron, extending
however the input space via .(x1 , x2 ) → (x1 , x2 , x3 ). As an example we set .x3 =
±Δ, together with

(0, 0) → (0, 0, +Δ) ,    (1, 1) → (1, 1, +Δ) ,
(0, 1) → (0, 1, −Δ) ,    (1, 0) → (1, 0, −Δ) .    (10.12)

Consequently, the .x3 = 0 plane can be used for the classification of the XOR gate. The above example is constructed by hand. When using random embeddings, a common approach, larger embedding dimensions may be necessary. The brain makes extensive use of dimensional embedding.

Universal Function Approximators Solutions to the XOR problem abound. An


option mentioned further above are gated units, see (10.9), which can do the job
all alone. Neural networks are “Turing complete” when logical gates can be repre-
sented, acting as “universal function approximators”. Given enough units, arbitrarily complex functional dependencies can be modeled with increasing accuracy.

10.2 Recurrent Neural Networks

Point neurons connected via synaptic weights form a network of interconnected


units.5 Networks corresponding to “feedforward nets” are free of loops, “recurrent

5 Generic network theory is developed in Chap. 1.



nets” not. In general, the “membrane potential” .xi of a point neuron is given by

xi = ei + Σj wij yj ,    yi = tanh(ai(xi − bi)) ,    (10.13)

where .ei is an external driving and .wij the entries of the synaptic weight matrix .ŵ.
A .tanh-transfer function has been selected.

Lyapunov Exponents Recurrent nets feed back on themselves, which gives rise
to a trailing memory. In the autonomous case, when .ei ≡ 0, the largest eigenvalue
of .ŵ, the largest “Lyapunov exponent” of the map defined by (10.13), determines if
activities tend to increase or decrease in linear order.6 This issue will be treated in
detail in Sect. 10.2.2.

10.2.1 Random Matrix Theory

The statistics of the entries of a matrix can be used for estimating the corresponding
spectrum of eigenvalues. When the entries are only weakly correlated, as defined
below, random matrix theory applies, providing precise results. Often, matrix
elements can be assumed to be independently distributed, in particular when the
rank of the matrix is large, the typical case for both neuroscience and machine-
learning applications.

Ensemble Average Random matrix theory deals with matrices with elements that
are drawn from a given probability distribution, normally a Gaussian. Drawing
all elements once generates a specific realization. “Ensemble average” denotes the
average over realizations. In practice one is however interested in the spectrum of
a specific matrix. For large matrices the ensemble average can be substituted by an
average of all entries, the venue taken here.

Off-Diagonal Correlations In its basic form, random matrix theory can be applied
only when the entries .wij are statistically fully independent. It holds however also
when the off-diagonal cross-correlation .Γ ,
Γ = [ Σ_{i,j} (wij − μw)(wji − μw) ] / [ Σ_{i,j} (wij − μw)² ] ,    Γ ∈ [−1, 1] ,    (10.14)

6 See Chap. 2.

Fig. 10.4 The distribution of eigenvalues of an elliptic .400 × 400 matrix. The eigenvalues (filled dots) in the complex plane are shown for .Γ = −0.6/0.0/0.6 (left/middle/right), as defined by (10.14). Included are the corresponding analytic results (10.15), which are valid in the thermodynamic limit (lines)


is non-zero. For an .N × N matrix, we defined the mean .μw = Σij wij /N² . The
inequality .|Γ | ≤ 1 follows from .(a ± b)2 ≥ 0, or .a 2 + b2 ≥ ∓2ab. Three special
cases are of interest.

– .Γ = 0 : Fully uncorrelated, complex eigenvalues.
– .Γ = 1 : Symmetric matrix with .wij = wji ; real spectrum.
– .Γ = −1 : Antisymmetric matrix with .wij = −wji ; imaginary eigenvalues.

One speaks of “elliptic matrices” when .Γ ≠ 0.

Elliptic Random Matrices We set .μw = 0, without loss of generality, and assume .σw = 1/√N for the standard deviation .σw of the matrix elements. Random matrix

theory states that the spectrum of the eigenvalues .λ is uniformly distributed within
an ellipse defined by

EΓ = { λ = x + iy ,   x²/(1 + Γ)² + y²/(1 − Γ)² ≤ 1 } .    (10.15)

The half widths are .1 ± Γ respectively along the real and imaginary axis. All
eigenvalues are real/imaginary for .Γ = ±1. In Fig. 10.4 a comparison between
numerical spectra and (10.15) is presented. The agreement would be perfect in the
limit of infinite-large matrices.

Finite Mean For .Γ = 0 the ellipse defined by (10.15) reduces to a circle and one recovers the “circular law of random matrix theory”, namely that the eigenvalues are uniformly distributed within a circle. When the mean .μw = Σij wij /N² is finite, an additional isolated eigenvalue .λμ is generated. For .Γ = 0 one can derive that

λμ = μ̃w + σ̃w²/μ̃w ,    μw = μ̃w/N ,    σw² = σ̃w²/N .    (10.16)

Fig. 10.5 Recurrent nets are the centerpiece of reservoir computing, mapping data input streams into a high-dimensional reservoir of non-linear features. Only the weights of linear output units are trained

The additional contribution, .σ̃w2 /μ̃w , can be interpreted as an interaction between


mean and continuum.

Spectral Radius When the eigenvalue spectrum is bounded, there exists a .Rw > 0
such that

|λi| ≤ Rw ,    ∀i .    (10.17)

The “spectral radius” .Rw determines the long-term evolution when applying the
mapping in question repeatedly. For elliptic random matrices with zero mean .Rw =
σ̃w (1 + |Γ |).

Variance Control An interesting point is that the spectral radius .Rw scales with .σ̃w = σw√N, which is directly proportional to √N and to the standard deviation .σw of the matrix elements. Standard learning algorithms, like Hebbian plasticity in biology or backpropagation in machine learning, treat each synaptic weight however as an independent adaptable parameter. The variance .σw² resulting from the learning process is hence neither optimized nor controlled.

Reservoir Computing Spectral radius regulation is central for “reservoir comput-


ing”, as illustrated in Fig. 10.5. A recurrent net with otherwise fixed synaptic weights
is used to transform the input into a multitude of random non-linear features. Only
the weights of a subsequent linear output layer are adapted during training, typically
for prediction tasks.
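A minimal reservoir-computing sketch follows, here for a delayed-recall task; the spectral radius of 0.9, the reservoir size, and the task itself are illustrative choices, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, washout = 200, 2000, 100

# fixed random reservoir, rescaled to a spectral radius of 0.9
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.normal(size=N)

u = rng.uniform(-1.0, 1.0, size=T)        # scalar input stream
X = np.zeros((T, N))                      # recorded reservoir states
y = np.zeros(N)
for t in range(T):
    y = np.tanh(W @ y + w_in * u[t])      # recurrent update, cf. (10.13)
    X[t] = y

# only the linear readout is trained: here, recall the input two steps back
target = np.roll(u, 2)
A, b = X[washout:], target[washout:]
w_out, *_ = np.linalg.lstsq(A, b, rcond=None)
mse = np.mean((A @ w_out - b) ** 2)
```

The trailing memory of the recurrent net allows the purely linear readout to reconstruct past inputs.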

10.2.2 Criticality in Recurrent Networks

The brain is autonomously active, which implies that recurrent connections are
functionally important. Does the level of internally produced activities impact the
capability of the brain to process information? We have seen in Chap. 4 that
critical dynamical systems exhibit non-trivial properties. Indeed, it can be shown

that the long-range correlations characteristic of systems poised on a bifurcation


point improve the processing of incoming data streams. This is captured by the
“critical brain hypothesis”. The quantitative impact of being close to criticality can
be examined via a self-consistent theory for recurrent neural nets. We start with
some preliminaries.

Self-Consistent Variance Theory We have seen that the variance of a random


matrix determines its spectrum (10.15). It is hence clear that self-consistent
approaches to recurrent nets will be on the level of variances.

Starting, we recall elementary relations for two statistically independent random


variables .z1 /.z2 with means .μ1 /.μ2 and standard deviations .σ1 /.σ2 . The variances of
the sum .z1 + z2 and respectively of the product .z1 z2 are given by
 
σ1+2² = σ1² + σ2² ,    σ1·2² = (σ1² + μ1²)(σ2² + μ2²) − μ1²μ2² .    (10.18)

The first relation is the standard sum rule for the variance,7 the second relation is
treated in exercise (10.2).
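Both relations of (10.18) can be checked by direct sampling; the parameter values below are arbitrary:

```python
import random

random.seed(42)
mu1, s1, mu2, s2 = 0.5, 1.0, -1.0, 2.0
n = 200_000
z1 = [random.gauss(mu1, s1) for _ in range(n)]
z2 = [random.gauss(mu2, s2) for _ in range(n)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

var_sum = var([a + b for a, b in zip(z1, z2)])
var_prod = var([a * b for a, b in zip(z1, z2)])

# predictions of (10.18)
pred_sum = s1 ** 2 + s2 ** 2
pred_prod = (s1 ** 2 + mu1 ** 2) * (s2 ** 2 + mu2 ** 2) - mu1 ** 2 * mu2 ** 2
```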

Statistically Independent Units For a network with N units we define with

σw = σ̃w/√N ,    σx , σy , σext ,    (10.19)

the standard deviations of respectively the synaptic weights, of the membrane


potential x, of the neural activity y, and of the external input e, see (10.13). In
reality, the respective distributions would be specific to the individual unit. The
central approximation used here is that all units are statistically independent, and
otherwise equivalent. It is then sufficient to work with (10.19).

Balanced Activities As a further simplification we consider networks with bal-


anced activity and weight distributions, which implies vanishing means .μw , .μx , .μy and .μext . The definition (10.13) of the membrane potential, .xi = ei + Σj wij yj ,
leads to

σx² = σext² + Nσw²σy² = σext² + σ̃w²σy² ,    (10.20)

where we used (10.18) and (10.19) and that units receive N recurrent inputs. The
variance of the neural activity is determined by

σy² + μy² = ∫ dx tanh²(x − b) P(x) ,    (10.21)

7 More about random variables in Chap. 5.



Fig. 10.6 Comparison between .tanh(z) and the Gaussian transfer function ±√(1 − exp(−z²)) defined in (10.23)

where .P (x) is the distribution of the membrane potential. We did set .a = 1 and
made use of .⟨(y − μy)²⟩ = ⟨y²⟩ − μy² . For balanced activities the threshold vanishes,
.b = 0, together with .μy = 0 = μx .

Central Limit Theorem The central limit theorem states that sums of large numbers of random variables converge to a normal distribution. We can hence assume that the probability distribution .P(x) of the membrane potential entering (10.21) is normally distributed with mean .μx and variance .σx² ,

P(x) = (1/√(2πσx²)) e^{−(x−μx)²/(2σx²)} ,    1 = ∫ dx P(x) .    (10.22)

A last step is needed before the integral in (10.21) can be evaluated analytically.

Gaussian Transfer Function We define with

σGauss(z) = ±√(σGauss²(z)) ,    σGauss²(z) = 1 − e^{−z²} ,    (10.23)

a “Gaussian transfer function”. The comparison between the .tanh-transfer function


and .σGauss presented in Fig. 10.6 shows that both transfer functions are functionally
similar, with identical slopes at .z = 0. One can therefore use one as an approxima-
tion for the other.

Variance of the Neural Activity We use the Gaussian transfer function for a
balanced system, with b, .μx , .μy and .μw vanishing, together with (10.22) in (10.21).
The resulting expression,

σy² = (1/√(2πσx²)) ∫ dx (1 − e^{−x²}) e^{−x²/(2σx²)} ,    (10.24)

leads via

1 − σy² = (1/√(2πσx²)) ∫ dx e^{−x²/(2σeff²)} = σeff/σx ,    1/σeff² = (1 + 2σx²)/σx²

to

1 − σy² = σeff/σx = 1/√(1 + 2σx²) ,    2σx² = 1/(1 − σy²)² − 1 .    (10.25)

Note that .y 2 < 1 and hence .σy2 < 1.

Variance Self-Consistency For .Γ = 0, the case considered here, one can


substitute .σ̃w by the spectral radius .Rw = σ̃w . Together, (10.20) and (10.25) lead to
the self-consistency condition

2(σext² + Rw²σy²) = 2σx² = 1/(1 − σy²)² − 1 ,

or

2Rw²σy²(1 − σy²)² = 1 − (1 + 2σext²)(1 − σy²)² ,    (10.26)

which is quite remarkable. It states that the neural activity, in terms of the variance .σy² , is determined exclusively by the spectral radius .Rw of the weight matrix, together with the external input, if present, via .σext² .

Absorbing Phase Transition The system is autonomous when .σext = 0. In this


case one obtains

. 2Rw σy = 2σy2 + higher powers


2 2

when expanding both sides of (10.26) in powers of .σy2 . Consequently, the system
undergoes an “absorbing phase transition”8 at .Rw = 1.

– The activity dies out for .Rw < 1.


– The activity steadily increases for .Rw > 1, until limited by the non-linearity of
the transfer function.

The state with vanishing neural activity is the final state of all orbits when .Rw < 1,
whatever the starting configuration. It is hence termed “absorbing”. The active state

8 Absorbing phase transitions are treated in depth in Sect. 6.4.1 of Chap. 6.



Fig. 10.7 The solution of the variance self-consistency condition (10.26), shown as .σy versus the spectral radius .Rw for .σext = 0.01 and .σext = 0.001. An absorbing phase transition occurs at .Rw = 1 in the absence of external stimuli, viz when .σext = 0

is chaotic when performing simulations with a specific weight matrix. This is a


consequence of .Rw > 1, as discussed in Chap. 2.
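The absorbing transition can be reproduced by iterating (10.25) together with (10.20) to a fixed point; the iteration scheme below is one possible numerical route:

```python
def activity_variance(rw, sig_ext=0.0, iters=10_000):
    """Fixed point of (10.25) with (10.20): sy2 = 1 - 1/sqrt(1 + 2*sx2),
    where sx2 = sig_ext**2 + rw**2 * sy2 (one possible iteration scheme)."""
    sy2 = 0.5
    for _ in range(iters):
        sx2 = sig_ext ** 2 + rw ** 2 * sy2
        sy2 = 1.0 - 1.0 / (1.0 + 2.0 * sx2) ** 0.5
    return sy2

low = activity_variance(0.9)           # absorbing phase, activity dies out
high = activity_variance(1.1)          # active phase above the transition
driven = activity_variance(1.0, 0.01)  # external field smears the transition
```

Below the transition the iteration collapses to the absorbing state .σy² = 0, above it a finite activity variance survives, in line with Fig. 10.7.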

Criticality vs. External Driving In Fig. 10.7 numerical solutions of (10.26) are
presented. The phenomenology of a second-order phase transition is reproduced,9
with the input variance .σext² acting as an external field.

Interestingly, Fig. 10.7 shows that the neural activity is substantial at criticality, .Rw = 1, even when external stimuli are comparatively weak. This result raises some doubts with regard to the critical brain hypothesis. Information processing will still benefit quantitatively from being close to criticality even when the variance .σext² of the data input stream is not negligibly small. On a qualitative level, however, the working regime of the network does not change as a function of the spectral radius in the presence of an external driving.

Feed-Forward Networks Eq. (10.25) can be interpreted as a mapping of .σx to .σy


across a non-linearity. Adding (10.20) one obtains a mapping from .σy,n−1 to .σy,n ,
when adding layer indices, which holds also across a feed-forward layer,

σy,n² = 1 − 1/√(1 + 2σx,n²) = 1 − 1/√(1 + 2(σext,n² + σ̃w²σy,n−1²)) ,    (10.27)

where .σext,n² is an additional input to layer n, if present. The above expression is valid when all averages are zero, which would not be the case in a practical application.
when all averages are zero, which would not be the case in a practical application.
The generalization to non-zero .μy , .μx , .μw is straightforward, but a bit cumbersome,
see exercise (10.5).

9 SeeSect. 6.1 of Chap. 6 for an introduction to the Landau theory of phase transitions. The
connection is made in exercise (10.3).

10.3 Neural Differential Equations

Deep networks correspond to a mapping

f = f(x, ϑ)
.

of an input .x = (x1 , x2 , ..) to an output .f = (f1 , f2 , ..). The parameters .ϑ =


(ϑ1 , ϑ2 , ..) are adaptable, viz determined during training.

Training Data Training data consists of a set of .Nα pairs of inputs .xα and target
outputs .Fα . The overall goal is to minimize a loss function L, e.g.
L = Σα (Fα − fα)² ,    fα = f(xα , ϑ) ,    (10.28)

which can be done by following the gradient, .ϑ̇ ∼ −∇ϑ L. For layered systems, as
the one illustrated in Fig. 10.3, the gradient is evaluated recursively.

Error Backpropagation As a basic deep learning architecture we consider a


system made up of .NL identical layers,

yl = f(xl , ϑl) ,    xl = yl−1 ,    (10.29)

where .ϑ l denotes the adaptable parameters of the lth layer, like thresholds and
synaptic weights. In order to simplify the notation, we set .NL = 3. For a given
training pair .(xα , Fα ), the gradient of the loss function with respect to the parameters
.ϑ 3 of the last layer is


∇ϑ3 L = ∇ϑ3 f(x3 , ϑ3) · E3 ,    E3 = (−2)(Fα − fα) .

We used here the compressed notation .∇A · B ≡ Σi (∇Ai)Bi . With the chain rule,
we have

∇ϑ2 L = ∇ϑ2 f(x2 , ϑ2) · E2 ,    E2 = ∇x3 f(x3 , ϑ3) E3    (10.30)
. E2 = ∇x3 f(x3 , ϑ 3 )E3 (10.30)

for the gradient of the loss function with respect to .ϑ2 . The error .E2 is a linear function of .E3 , the error signal of the next layer. Errors are hence propagated from the top down, which can be continued recursively. This is denoted “backpropagation”.
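The recursion (10.30) can be validated against a finite-difference gradient. A sketch for a two-layer tanh network with randomly drawn weights; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(3, 2))   # input -> hidden
W2 = rng.normal(size=(2, 3))   # hidden -> output
x_in, F = rng.normal(size=2), rng.normal(size=2)

def loss(W1, W2):
    f = np.tanh(W2 @ np.tanh(W1 @ x_in))
    return np.sum((F - f) ** 2)

# forward pass
y1 = np.tanh(W1 @ x_in)
f = np.tanh(W2 @ y1)

# errors are propagated from the top layer down, cf. (10.30)
E2 = -2.0 * (F - f) * (1.0 - f ** 2)      # tanh'(z) = 1 - tanh(z)^2
grad_W2 = np.outer(E2, y1)
E1 = (W2.T @ E2) * (1.0 - y1 ** 2)        # error of the hidden layer
grad_W1 = np.outer(E1, x_in)

# finite-difference check of a single entry
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
num = (loss(W1p, W2) - loss(W1, W2)) / eps
```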

10.3.1 Residual Nets

Layers process their inputs typically via (10.1), which involves a non-linear transfer
function .σ (·). The gradient of .σ (·) vanishes for large inputs when the transfer

Fig. 10.8 Skip connections short-circuit the information processing by one or more layers. The input .xl is added to the output .yl of a given block, .xl+1 = yl + xl

function is bounded, as is the case, e.g., for the sigmoidal. The error backpropagated via (10.30) hence decreases in magnitude layer by layer, a phenomenon denoted the “vanishing gradient problem”.

Skip Connections A popular workaround for the vanishing gradient problem are
ReLU units, see (10.6), for which the gradient is constant for positive arguments.
An alternative are “skip connections”,

yl = xl + f(xl , ϑl) ,    xl = yl−1 ,    (10.31)

which corresponds to adding the identity to the forward pass, see Fig. 10.8. Learning by adapting the parameters .ϑl now has the task of generating the correct ‘residual’
.yl − xl , hence the name ResNet, “residual net”.

Neural Differential Equation It is tempting, with .xl+1 = yl , to rewrite (10.31) as

xl+1 − xl = f(xl , ϑl) ,    dx/dt ≈ f(x, ϑ) ,    (10.32)

where the layer index l was substituted by a pro forma time t, using .xl → x(t)
and .ϑ l → ϑ(t). Layer time .t ∈ [0, TL ] is continuous, whereas the layer index
.l = 1, 2 . . . was discrete. The “neural differential equation” (10.32) corresponds

hence to a continuous layer representation of a residual network.

Continuous Backpropagation As such, the neural differential equation (10.32)


contains an unknown function, .ϑ(t), which specifies the value of the parameters for
any layer time t. With skip connections, the recursive relation (10.30) for the error
becomes
El−1 = [∇xl f(xl , ϑl) + 1] El ,    dE/dt ≈ −∇x f(x, ϑ) · E .    (10.33)
dt

Backpropagated errors tend to increase, given that .∇x f is normally positive. In the
continuous formulation, the error function .E = E(t) is obtained by integrating back
in layer time. When adapting parameters via gradient minimization, .Δϑ = −η∇ϑ L,
one can use (10.30) to obtain

. Δϑ = −η∇ϑ f(x, ϑ) · E , (10.34)

which leads in turn to two interdependent differential equations with respect to layer
time,

ẋ = f(x, ϑ) ,    Ė = −∇x f(x, ϑ) · E .    (10.35)

In the adiabatic approximation, when the learning rate .η is small, one takes .ϑ as
constant when integrating .x up from .x(0) = xα . Then the error function .E is
evaluated downwards from .E(TL ) = (−2)(Fα − x(TL )). In the end, the parameters
are updated.

It may seem a bit unusual, at first sight, to work with continuously stacked layers,
which is however a viable option. Which one to use, continuous or discrete layers,
is in the end a question of performance.
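The continuum limit (10.32) can be illustrated by scaling the residual update with a step .dt = TL/NL; the layer function f below is an arbitrary smooth example, not taken from the text:

```python
import math

def f(x, theta):
    # an arbitrary smooth layer function (illustrative choice)
    return math.tanh(theta * x) - 0.5 * x

def resnet_forward(x, theta, n_layers, T=1.0):
    """Residual stack x_{l+1} = x_l + dt*f(x_l), dt = T/n_layers,
    cf. (10.31)/(10.32): forward Euler integration in layer time."""
    dt = T / n_layers
    for _ in range(n_layers):
        x = x + dt * f(x, theta)
    return x

coarse = resnet_forward(1.0, 2.0, 10)     # 10 residual blocks
mid = resnet_forward(1.0, 2.0, 100)
fine = resnet_forward(1.0, 2.0, 1000)     # approaches the ODE solution
```

Refining the layer stack converges towards the solution of the neural differential equation, with the residual update playing the role of an Euler integration step.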

10.4 Gaussian Processes

In Sect. 10.2.2 we made extensive use of normally distributed random variables. Gaussian processes also involve normally distributed objects, however in the form of functions, not variables. Starting with the basics, we recapitulate “multivariate” normal distributions, viz the case that more than one degree of freedom is relevant.

10.4.1 Multivariate Gaussians

The probability density for a set of N independent normally distributed random variables,

P0(x) = Π_{i=1}^{N} (1/√(2πσi²)) e^{−(xi−μi)²/(2σi²)} ,    (10.36)

is a special case of

P(x) = 1/((2π)^{N/2} √det(Σ)) · e^{−(1/2)(x−μ)ᵀ Σ⁻¹ (x−μ)} ,    (10.37)

where .Σ is an .N × N symmetric matrix denoted “covariance matrix”. The vector of


means is .μ, with .Σ −1 denoting the inverse, .ΣΣ −1 = Σ −1 Σ = 1.

Diagonalization We assume that the covariance matrix is positive definite, with


eigenvalues .σi2 . There exists hence an orthogonal transformation U such that
Uᵀ Σ U = diag(σ1² , . . . , σN²) ≡ Σ0 ,    U⁻¹ = Uᵀ ,    (10.38)

with .det(Σ0) = Πi σi² = det(Σ). The columns of U are the eigenvectors of .Σ. We
recall that the eigenvectors .ei of a non-singular matrix are also eigenvectors of the
inverse,

Σ ei = σi² ei    ⇔    Σ⁻¹ ei = (1/σi²) ei ,

with the precondition that none of the eigenvalues vanish, viz that .Σ is non-singular.
Equivalently, the relation
Uᵀ Σ⁻¹ U = diag(1/σ1² , . . . , 1/σN²)    (10.39)

holds. For a quick check one multiplies both sides of (10.38) and (10.39).

Normalization The properties of a multivariate Gaussian distribution can be


evaluated using U for transforming the variables,

x = U y,
. det(U ) = 1 . (10.40)

The normalization of .P (x) follows directly,


∫ dx P(x) = ∫ dy Πi (1/√(2πσi²)) e^{−yi²/(2σi²)} = 1 ,

where we eliminated .μ via an appropriate shift of the origin. It also follows that
(10.36) and (10.37) are identical, modulo an orthogonal variable transformation.
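Sampling from (10.37) is conveniently done via a Cholesky factor .Σ = LLᵀ, a standard route not spelled out in the text: applying L to unit-variance Gaussian vectors produces samples with covariance .Σ.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_samples = 3, 200_000

A = rng.normal(size=(N, N))
Sigma = A @ A.T + N * np.eye(N)     # symmetric and positive definite
mu = np.array([1.0, -2.0, 0.5])

# with Sigma = L L^T (Cholesky), x = mu + L z maps unit-variance
# Gaussian vectors z onto samples with covariance Sigma
L = np.linalg.cholesky(Sigma)
z = rng.normal(size=(n_samples, N))
x = mu + z @ L.T

emp_mu = x.mean(axis=0)
emp_Sigma = np.cov(x.T)
```

The empirical mean and covariance recover .μ and .Σ up to sampling noise.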

Covariance Matrix We have to show that .Σ honors its name, namely that the
general definition of the covariance matrix,

 
⟨(xi − μi)(xj − μj)⟩ = ∫ dx (xi − μi)(xj − μj) P(x) ,    (10.41)

coincides with .Σij . For statistically independent variables, distributed according to .P0 , see (10.36), the covariance matrix is diagonal,

⟨(xi − μi)(xj − μj)⟩₀ = (Σ0)ij ,

with .Σ0 defined in (10.38). Applying the orthogonal transformation .Uᵀ to both sides maps .P0(x) to .P(x), and .Σ0 to .Σ, the proposition we wanted to prove.

Regression A central machine learning task is regression, namely to extrapolate or


interpolate training data. The goal is to make an educated guess about something not
yet seen. This can be quantified in particular when assuming that values are drawn
from multivariate Gaussians. We enlarge the set of variables,
     
x = (x1 , x2) ,    μ = (μ1 , μ2) ,    Σ = [[Σ11 , Σ12], [Σ21 , Σ22]] ,    (10.42)

where the .Σij are matrices, not matrix elements. The probability of observing .x1 ,
the prediction we want to make after having observed .x2 , is denoted as

P (x1 |x2 ) ,
.

reading, “the probability of observing .x1 given .x2 ”.

Bayes Theorem Per definition, the relation

P(x1|x2) P(x2) = P(x1 , x2) ,    P(x1|x2) = P(x1 , x2)/P(x2) ,    (10.43)

holds. It is also denoted “Bayes theorem”, in particular together with the recursive
substitution .P (x1 , x2 ) = P (x2 |x1 )P (x1 ). Given the relations,

1 = ∫ dx1 P(x1|x2) ,    P(x1) = ∫ dx2 P(x1 , x2) ,

one has that .P (x1 |x2 ) ≥ 0 is a properly normalized probability distribution, for
any .x2 , denoted “posterior”. Idem for the “marginal”, .P (x1 ). Posterior distributions

are the object of desire when the information gained from an observation is to be
expressed quantitatively, a process denoted “inference”.10

Variational Inference For practical calculations, the distributions involved in


Bayesian inference need to be parameterized. One speaks of “variational inference”
when a certain functional form is stipulated together with appropriate variational
parameters. The approach is “variational” when the parameters are determined by
minimizing an information-theoretical objective function.

Multivariate Gaussians are prime candidates for variational inference. One can
show that the posterior .P (x1 |x2 ) is also a multivariate Gaussian, with mean vector
and covariance matrix, .μ1|2 and .Σ1|2 , given by

μ1|2 = μ1 + Σ12 Σ22⁻¹ (x2 − μ2) ,    Σ1|2 = Σ11 − Σ12 Σ22⁻¹ Σ21 .    (10.44)

The derivation involves skilled manipulations of Gaussian distributions. A motivation using neural tangent kernels is given in Sect. 10.4.3. One can generalize (10.44) to the case that observations are noisy rather than exact.
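Equation (10.44) can be checked empirically for a bivariate Gaussian, comparing the predicted conditional mean and variance with samples conditioned on a narrow slab of .x2; the slab width and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def gaussian_conditional(mu, Sigma, x2):
    """Eq. (10.44) for the bivariate case: posterior mean/variance of x1 given x2."""
    m = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
    v = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
    return m, v

# empirical check: condition on samples with x2 in a narrow slab around 0.5
L = np.linalg.cholesky(Sigma)
x = mu + rng.normal(size=(500_000, 2)) @ L.T
sel = x[np.abs(x[:, 1] - 0.5) < 0.02, 0]
m_th, v_th = gaussian_conditional(mu, Sigma, 0.5)
```

The mean and variance of the conditioned samples agree with (10.44) up to sampling noise.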

10.4.2 Correlated Stochastic Functions

In machine learning, a key objective is to generate increasingly accurate function


approximations. Say we have a first guess, denoted “prior”, which we want to refine
during training. Being a guess, the prior cannot be accurately defined. Therefore, an
algorithm for modeling stochastically broadened functional dependencies is needed.
That’s what stochastic processes are about.

Smooth Stochastic Functions We denote with f the stochastic function to be


modeled. For every argument x from the support, the function is stochastically
distributed around a mean value, .m(x), e.g. .m(x) = x 2 . The notation

f \sim \mathcal{G}_p(m, k) , \qquad k = k(x, x') ,   (10.45)

defines a Gaussian process for f. The covariance matrix k(x, x') has two tasks.

– k(x, x) is the variance of the probability distribution of f(x), for a given argument x.
– k(x, x') makes sure that strong discontinuities are rare when drawing function values for x ≠ x'.

10 More about Bayesian statistics in Sect. 5.1 of Chap. 5.



Fig. 10.9 The Gaussian process defined by (10.48), with a diagonal variance k_0 = 0.04. Shown is the mean function m(x) = x^2 (shaded line) and two realizations of the corresponding multivariate Gaussian distribution, for correlation lengths \xi_x = 0.5 (blue) and \xi_x = 0.05 (red). The function estimate is smoother for longer-ranged correlations between the N = 400 arguments

Estimates are independent for each argument x when the covariance matrix is
diagonal.

Discrete Supports In practice, function estimates are evaluated on a discrete set


of arguments, .xi , as in the example presented in Fig. 10.9. On a given set of N
arguments, .x = (x1 , . . . , xN ), a multivariate Gaussian distribution,

\mu_i = m(x_i) , \qquad \Sigma_{i,j} = k(x_i, x_j) ,   (10.46)

is recovered, where .i, j ∈ [1, N]. For every argument .xi , the function values .fi are
distributed according to (10.37),

P(\mathbf{f}) = \frac{1}{\sqrt{2\pi}^{\,N}\sqrt{\det(\Sigma)}}\; e^{-\frac{1}{2}(\mathbf{f}-\mu)^T \Sigma^{-1} (\mathbf{f}-\mu)}   (10.47)

where .f = (f1 , . . . , fN ) and with .μ and .Σ given by (10.46). In Fig. 10.9 we


illustrate a quadratic mean function,

m(x) = x^2 , \qquad k(x, x') = k_0\, e^{-|x-x'|/\xi_x} ,   (10.48)

together with an exponentially decaying covariance matrix. The stochastic function


approximation becomes smoother with increasing correlation length .ξx .
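Realizations like those of Fig. 10.9 are obtained by drawing once from the multivariate Gaussian (10.47) on a discrete support. A minimal sketch with the parameters of (10.48), numpy assumed:

```python
import numpy as np

N, k0, xi = 400, 0.04, 0.5                     # support size, variance, corr. length
x = np.linspace(-1.0, 1.0, N)
mu = x**2                                      # mean function m(x) = x^2
K = k0 * np.exp(-np.abs(x[:, None] - x[None, :]) / xi)   # covariance (10.48)

rng = np.random.default_rng(1)
f = rng.multivariate_normal(mu, K)             # one realization of the process
```

Smaller \xi_x yields rougher realizations, as the off-diagonal correlations decay faster.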

Inference The expressions (10.44) for the mean and the covariance matrix of
the posterior hold in analogy for Gaussian processes. We enlarge the number of
arguments and function values as

x = (x_1, x_2) , \qquad f = (f_1, f_2) ,

where .x1 is the N -dimensional vector of arguments covering the entire support. The
data vector .x2 , of length .Nd < N , includes all arguments for which training data is
available, with corresponding function values .f2 . Inference consists of predicting .f1 .

In analogy to (10.44), the parameters of the posterior .P (f1 |f2 ) are


\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\,(f_2 - \mu_2) , \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} .   (10.49)

On a formal level, there is a difference regarding the term .f2 − μ2 , as we are


describing the distribution of function values .fi = m(xi ), and not of the arguments.

Radial Basis Functions Correlations decay exponentially for non-critical pro-


cesses, the reason we used an exponentially decaying covariance matrix in (10.48).
Quadratic exponents,

m(x) \to 0 , \qquad k(x, x') = k_0\, e^{-(x-x')^2/(2\sigma_k^2)} ,   (10.50)

are however more convenient for frameworks based on multivariate Gaussians.


Covariance matrices of this type are denoted “radial basis functions”. Also included
in (10.50) is a standard selection, namely a vanishing mean function .m(x). An
alternative would be the average of the training data. When inference is performed,
the mean of the posterior will be updated via (10.49), viz based on the training data.
A non-trivial prior may bias this process.

In Fig. 10.10 an example of inference with Gaussian processes is presented.


The variance .[σ1|2 (x)]2 of the prediction is given by .Σ1|2 (x, x), which has been
included in terms of a confidence interval of two standard deviations. It is evident
that inference with Gaussian processes is much better at interpolating data than at extrapolating, the latter because predictions revert to the prior away from the data, here to m(x) = 0. The estimated confidence intervals depend sensitively on the positioning of the training data.
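The prediction of Fig. 10.10 is reproduced, up to the random draw of training points, by a direct implementation of (10.49) with the radial basis function (10.50). A sketch with made-up training arguments, k_0 = 1 and 2\sigma_k^2 ≈ 0.2 as in Fig. 10.10, numpy assumed; a tiny jitter term stabilizes the matrix inversion:

```python
import numpy as np

def rbf(a, b, k0=1.0, sig=0.316):
    """Radial basis function kernel (10.50); parameter values assumed."""
    return k0 * np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * sig**2))

x2 = np.array([-0.8, -0.3, 0.1, 0.6])          # training arguments (made up)
f2 = x2**2                                     # observed function values
x1 = np.linspace(-1.0, 1.0, 201)               # prediction support

S22 = rbf(x2, x2) + 1e-10 * np.eye(len(x2))    # jitter for numerical stability
S12 = rbf(x1, x2)
mu_post = S12 @ np.linalg.solve(S22, f2)       # mu_{1|2}, with mu1 = mu2 = 0
S_post = rbf(x1, x1) - S12 @ np.linalg.solve(S22, S12.T)
sigma = np.sqrt(np.clip(np.diag(S_post), 0.0, None))   # confidence band
```

At the training arguments the posterior mean interpolates the data and the confidence band closes, while away from the data the prediction reverts toward the vanishing prior.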

Intra-Domain Inference As a matter of principle, inference can be performed


between different domains. Out-of-domain inference requires however knowledge about semantic cross-correlations, which cannot be modeled with radial basis functions or another generic ansatz for the covariance matrix.

Fig. 10.10 Predictions (red line) inferred via (10.44) from a randomly drawn set of data points (blue dots). The covariance matrix is k(x, x') = exp(−(x − x')^2/0.2). The function values for the training data (blue bullets) follow f_2(x) = x^2. The mean function m(x) = 0 vanishes, in contrast to Fig. 10.9. Also included are 2σ confidence estimators (checkerboard red lines)

Here we work exclusively with intra-domain inference, such as predicting a


function value when observing another, as in Fig. 10.10. The corollary is that
all components of the covariance matrix have identical dependencies on their
arguments.
Intra-domain inference includes the case of predicting the value for an observed
position, viz that .x1 → x2 . When this happens, one would expect that the prediction
concurs with the observation, as it happens for the example shown in Fig. 10.10.
This is easily confirmed analytically when both .x1 and .x2 are scalars, which implies
.Σ11 = Σ12 = Σ21 = Σ22 ≡ k(0) when .x1 = x2 . The predicted mean is then

\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\,(x_2 - \mu_2) \;\to\; x_2 ,

given that .μ1 = m(x1 ) coincides with .μ2 = m(x2 ) when .x1 → x2 . The inferred
mean function value, .μ1|2 , coincides hence with the training value .x2 . This will
not be the case when the data is noisy, which is described by adding .σD2 δij to the
covariance matrix .Σij in (10.46).

10.4.3 Machine Learning with Neural Tangent Kernels

Deep learning architectures typically involve large numbers of computational units.


It is hence natural to assume that Gaussian distributions abound. This is indeed the
case.

Infinite Layer Limit A basic setup is that of a single layer neural network,

f = \sum_i v_i\, y_i , \qquad y_i = \sigma(x_i - b_i) , \qquad x_i = \sum_j w_{ij}\, x_j^\alpha ,   (10.51)

where f is a classifying unit, the function to be computed, and .y = (y1 , y2 , ..) the
activity of the hidden layer. The training samples .xα = (x1α , x2α , ..) are specific to
the problem at hand. All internal parameters are taken to be random but frozen, viz
corresponding to a specific realization.

The components of the training vectors, like x_i^\alpha and x_j^\alpha, are not necessarily uncorrelated. The membrane potentials x_i are nevertheless independent random variables, as each involves an independent set of weights w_{ij}; they are, however, possibly not normally distributed. The same holds for the y_i. As a sum of independent contributions, f becomes however Gaussian for large hidden layers. This line of arguments, denoted the “infinite layer limit”, can be extended to deep architectures.
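The central limit argument can be illustrated by Monte-Carlo sampling over frozen random parameter draws. A toy sketch, numpy assumed; the 1/√width scaling of the readout is an assumption that keeps the variance of f of order one:

```python
import numpy as np

rng = np.random.default_rng(2)
x_in = rng.normal(size=8)                      # one fixed input sample

def output(width):
    """f = sum_i v_i tanh(x_i - b_i) for one frozen random parameter draw."""
    w = rng.normal(size=(width, 8)) / np.sqrt(8)
    b = rng.normal(size=width)
    v = rng.normal(size=width) / np.sqrt(width)
    return float(v @ np.tanh(w @ x_in - b))

samples = np.array([output(500) for _ in range(2000)])
z = (samples - samples.mean()) / samples.std()
skew = float(np.mean(z**3))                    # small for a Gaussian
kurt = float(np.mean(z**4) - 3.0)              # excess kurtosis, also small
```

Both the skewness and the excess kurtosis of the sampled outputs shrink toward zero as the hidden layer widens, the defining signature of a Gaussian.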

Learning Correlations Learning is all about extracting information from data


samples. Inputs can be differentiated, viz classified, only when differences can
be quantified in terms of correlations. In leading order this regards two-point
correlations. Given that the output is Gaussian distributed in the infinite-layer limit,
the output f of a deep learning architecture can be modeled in leading order by a
Gaussian process over input space,

f \sim \mathcal{G}_p(m, k) , \qquad m = m(x) , \quad k = k(x, x') ,   (10.52)

compare (10.45). Mean function and covariance matrix are averages over random
sets .ϑ of parameters,
 
m(x) = \langle f(x; \vartheta) \rangle_\vartheta   (10.53)

k(x, x') = \langle (f(x; \vartheta) - m(x))(f(x'; \vartheta) - m(x')) \rangle_\vartheta .

The output of the network is Gaussian because the parameters .ϑ act as random
variables, hence the average over .ϑ. The functional dependencies of the covariance
matrix were central for the application of Gaussian processes discussed before. This
is however not a relevant point for the “neural network Gaussian process” defined by
(10.53), which serves mainly as a conceptual background.

Supervised Learning During training, the output of the network,

f^\alpha = f(x^\alpha; \vartheta)

is compared to the target .F α associated with the specific training sample .xα .
The set of network parameters \vartheta is adapted by minimizing the loss function L = \sum_\alpha (F^\alpha - f^\alpha)^2 via gradient descent,

\frac{d}{dt}\vartheta = \eta \sum_\alpha \left( \nabla_\vartheta f^\alpha \right) \left( F^\alpha - f^\alpha \right) ,   (10.54)

with \eta being the learning rate. The right-hand side includes the contribution of all N_\alpha training samples. Training is “supervised” because the target F^\alpha is explicitly provided. For a general input x, the output f = f(x; \vartheta) evolves as

\frac{d}{dt} f = \nabla_\vartheta f \cdot \frac{d}{dt}\vartheta \equiv \eta \sum_\alpha \Theta(x, x^\alpha) \left( F^\alpha - f^\alpha \right)

\Theta(x, x^\alpha) = \nabla_\vartheta f(x; \vartheta) \cdot \nabla_\vartheta f(x^\alpha; \vartheta)   (10.55)

when using the gradient update rule (10.54). So far we did not make use of the
infinite layer limit. Equations (10.54) and (10.55) are valid for any least-square
minimization via gradient descent.
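For the special case of a linear model f(x; \vartheta) = \vartheta \cdot x, one has \nabla_\vartheta f = x and the tangent kernel reduces to \Theta(x, x') = x \cdot x'. The sketch below (made-up data, numpy assumed) iterates a discretized version of the parameter flow (10.54) and, in parallel, the function-space flow (10.55); for a linear model the two evolutions coincide step by step:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 3))                    # 5 training samples x^alpha
F = rng.normal(size=5)                         # targets F^alpha
eta = 0.05

Theta = X @ X.T                                # tangent kernel on the data
theta = np.zeros(3)
f_kernel = X @ theta                           # function-space copy
for _ in range(200):
    f = X @ theta
    theta = theta + eta * X.T @ (F - f)        # parameter step, (10.54)
    f_kernel = f_kernel + eta * Theta @ (F - f_kernel)   # kernel step, (10.55)
```

The kernel here is a Gram matrix of scalar products, hence symmetric and positive semi-definite, as stated below for the general case.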

Tangent Kernels As a function of its arguments, .xα and .xβ , the “neural tangent
kernel” .Θ(xα , xβ ) is a symmetric matrix. With the matrix elements being given
by a scalar product, the neural tangent kernel is also positive definite.11 For large
networks the tangent kernel is

– constant during training and


– independent of the initial distribution of network parameters.

One says that \Theta(x^\alpha, x^\beta) is “deterministic”. We do not delve into the proof of these two propositions, which involves inductive argumentation tailored to the architecture
at hand. Systems become increasingly over-parametrized when network sizes
diverge, which leads to the emergence of an exponentially large number of optimal
parameter configurations. The probability for the starting parameter configuration to
be close to an optimum will therefore increase steadily with layer width. Individual
parameters need to change just a little when their number diverges.

Training Space Given that the tangent kernels are constant during training, one
can solve the time evolution of .f = f (x; ϑ) explicitly via (10.55). First one needs
to determine the time-dependency of the .f α = f (xα ; ϑ), which is done in the space
of training data,

F = (F^1, \ldots, F^{N_\alpha}) , \qquad f_2 = (f^1, \ldots, f^{N_\alpha}) ,

11 See exercise (10.7).



where we used the subscript ‘2’ to indicate training inputs, in equivalence to (10.49).
Specifying (10.55) to the training space yields

\frac{d}{dt} f^\beta = \eta \sum_\alpha \Theta(x^\beta, x^\alpha) \left( F^\alpha - f^\alpha \right) , \qquad \frac{d}{dt} f_2 = \eta\, \Theta_{22} \left( F - f_2 \right) ,   (10.56)

which constitutes a closed set of equations for .Nα dynamical variables, .f2 . The
projection of the tangent kernel to the training space, .Θ22 , is a symmetric .Nα × Nα
matrix. The solution,

f_2(t) = e^{-\eta \Theta_{22} t}\, f_2(0) + \left( 1 - e^{-\eta \Theta_{22} t} \right) F ,   (10.57)

holds when .Θ22 is constant. One verifies (10.57) by direct differentiation. The
constant of integration has been selected such that .f2 (t = 0) = f2 (0). Perfect
learning, .f2 (t) → F, is recovered in the limit .t → ∞.
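The closed form (10.57) can be checked against a direct Euler integration of (10.56); the matrix exponential of the symmetric \Theta_{22} is evaluated via its eigendecomposition. Toy data, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 6))
Theta22 = X @ X.T                              # symmetric tangent kernel
F = rng.normal(size=4)                         # targets
f0 = rng.normal(size=4)                        # outputs at t = 0
eta, T, dt = 0.01, 3.0, 1e-4

lam, U = np.linalg.eigh(Theta22)
E = U @ np.diag(np.exp(-eta * lam * T)) @ U.T  # exp(-eta Theta22 T)
f_closed = E @ f0 + (np.eye(4) - E) @ F        # Eq. (10.57) at t = T

f_num = f0.copy()                              # Euler integration of (10.56)
for _ in range(int(round(T / dt))):
    f_num = f_num + dt * eta * Theta22 @ (F - f_num)
```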

Tangent Kernel Inference We define with .f1 the vector of network outputs to be
predicted and use (10.57) to rewrite (10.55) as

\frac{d}{dt} f_1 = \eta\, \Theta_{12} \left( F - f_2 \right) = \eta\, \Theta_{12}\, e^{-\eta \Theta_{22} t} \left( F - f_2(0) \right) ,
where the first and second arguments of .Θ12 are respectively generic inputs and
training data. Starting from .f1 (0) and .f2 (0), direct integration yields
f_1(t) = f_1(0) + \Theta_{12}\, \Theta_{22}^{-1} \left( 1 - e^{-\eta \Theta_{22} t} \right) \left( F - f_2(0) \right) ,   (10.58)

which is a quite remarkable result.

– The output of the network is fully specified by (10.58), including the entire
training procedure.
– When training is complete, viz for .t → ∞, the predictions obtained using
the infinite layer limit coincide formally with the corresponding expression for
Bayesian inference, Eq. (10.49).

As for Bayesian inference, one has that .f1 → F when predictions are made for
training data. This is seen easily for the case that .f1 spans the entire training space,
which leads to .Θ12 → Θ22 and .f1 (0) → f2 (0).
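This limit is easy to confirm numerically: taking the prediction points to be the training points themselves, \Theta_{12} \to \Theta_{22}, the prediction (10.58) converges to the targets F for large t. Toy numbers, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(4, 6))
Theta22 = X @ X.T
Theta12 = Theta22.copy()                       # f1 spans the training space
F = rng.normal(size=4)
f0 = rng.normal(size=4)
eta, T = 0.05, 1e5                             # effectively t -> infinity

lam, U = np.linalg.eigh(Theta22)
E = U @ np.diag(np.exp(-eta * lam * T)) @ U.T  # decays to the zero matrix
f1 = f0 + Theta12 @ np.linalg.solve(Theta22, (np.eye(4) - E) @ (F - f0))
```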

Note that the formal equivalence of (10.58) and (10.49) does not imply that the tangent kernel \Theta(x, x') and the covariance matrix k(x, x') are identical. They are not. The covariance matrix is defined via (10.41) independently in terms of two-point correlations of the output f = f(x, \vartheta), in contrast to the definition of the neural tangent kernel, as given by (10.55). It is however possible, though rather cumbersome, to express k(x, x') in terms of \Theta(x, x').

Confidence Intervals The tangent kernel approach generates precisely defined


predictions for inputs .x not seen during training. Given that tangent kernels are
deterministic in the infinite layer limit, this result may seem at odds with the
statement that the output is Gaussian distributed, which holds for the same limit.
Indeed, there is no contradiction.

The tangent kernel predictions (10.58) depend on .f1 (0) and .f2 (0), namely on
the outputs generated before training did start, which are determined in turn by
the starting parameter configuration. Given that .ϑ is drawn randomly, confidence
intervals are obtained by sampling (10.58) with respect to .ϑ, viz by sampling .f1 (0)
and .f2 (0) with respect to the initial set of parameters.

The Linear Paradox Training is “lazy” when parameters hardly change, as in the
infinite layer limit. The first order Taylor expansion,

f(x, \vartheta) \approx f(x, \vartheta_0) + \nabla_\vartheta f(x, \vartheta_0)\, \Delta\vartheta , \qquad \Delta\vartheta = \vartheta - \vartheta_0 ,   (10.59)

becomes then exact. The “linear model” (10.59) is a bit paradoxical. It is still non-linear in x, but linear in parameter space. This is possible because one works in a regime with a diverging number of parameters; one is hence dealing with overfitting.

10.5 Attention Induced Information Routing

Classical deep learning is about information processing. When data sets become
complex, as for language processing, it may become important to focus processing resources on subsets of the input stream, the very task of “attention”. As such,
attention has long been studied both in the neurosciences and in psychology. The
first self-contained computational model was however developed from a machine
learning perspective.

Top-Down Attention Signals Classically, attention is mediated in the neuro-


sciences by a top-down signal. An example would be the task to concentrate on red
objects in the visual stream. This can be done f.i. by gain control, viz by modulating
the gain a of the neural transfer mapping .σ (a(x − b)), see (10.1). The gain of
neurons processing red/non-red objects would be increased/decreased.

Another example for neural filtering is biased competition, which involves


mutually inhibiting clusters of neurons. Clusters encoding red objects will dominate
the competition when receiving an additional boost, e.g. in the form of additional
top-down inputs.

The Homunculus Problem From a computational perspective, neuroscience mod-


els like gain control or biased competition have two core shortcomings.

Generally, attention signals are taken to be generated by some unknown process


in higher brain areas, which corresponds on a functional level to presuming the existence of an entity, a “homunculus”, deciding whether or not to pay attention
to a certain type of objects or data. This is a problem, since theories not including the
generation of attention signals are incomplete. Related is the question of how higher
brain areas know about lower-area specializations, f.i. with regard to the color of
objects within the input stream.

Training Attention The homunculus problem is related conceptually to an assumption often made implicitly in the neurosciences, as well as in psychology.
The view is that advanced cognitive systems first develop an understanding of the
world, via data acquisition and world modeling, with attention being an add-on. In
this view, a cognitive system can pay attention to red objects only once the system
understands concepts like ‘red’ and ‘object’.

The approach taken by machine learning is radically different. Here, all parts
of the system are jointly trained. This includes attention, when incorporated, as in
transformer architectures. The view is that attention can develop in a meaningful
manner only together with representation extractions and world modeling.

10.5.1 Transformer

Functionally, attention corresponds to information routing. Only part of the incom-


ing data is processed further. We have seen that gated units, as described by (10.9),
are capable of regulating the processing on a local level. Attention works in contrast on a higher level, deciding whether substantial chunks of data are of interest or not.

Tokenization We have seen previously that it may not be desirable to work


directly with raw data.12 For time series, this amounts to extracting meaningful symbols, a procedure denoted “tokenization” in the context of machine learning. For text processing, the most widely used tokens are word equivalents, with token vocabularies having typically about 5 \cdot 10^4 entries. Tokens are mapped to high-dimensional activity states x_i in R^D, e.g. with D = 2^9. Further processing will
benefit if embedding schemes respect semantic relations between tokens.

Pairwise Attention Attention involves pairs of tokens where one token needs to
decide if another token carries relevant information. This decision cannot be carried
out directly on the level of the respective token activities, say .xi and .xj , with the
reason being that activity encodings are constrained by the nature of the overall task.
For a working attention mechanism, tokens therefore need to generate associated

12 See the discussion on time series characterization, Sect. 5.1.4 of Chap. 5.



objects. Given that attention is inherently asymmetric, at least two additional objects
are necessary, “query” and “key”. In the end, a given token will receive relevant
information not just from a single other token, but from a larger number. A third
object is hence needed for the alignment of the respective representations, “value”.

Query, Key, Value The constituent units of attention-based machine learning


models, such as transformers, are tokens, with high-dimensional activity vectors
.xi . Tokens come with three matrices,

\hat{Q}_i , \qquad \hat{K}_i , \qquad \hat{V}_i ,

denoted query, key and value. The entries of these matrices are adapted via
backpropagation during training. The three associated query, key and value vectors,

Q_i = \hat{Q}_i \cdot x_i , \qquad K_i = \hat{K}_i \cdot x_i , \qquad V_i = \hat{V}_i \cdot x_i ,   (10.60)

are activity dependent. In order to decide whether token j carries information that
is relevant to i, the respective queries and keys are compared, viz .Qi with .Kj .

Dot Product Attention As a matter of principle, many methods would be suitable


for comparing query and key vectors, say .Qi with .Kj . Given that the involved query
and key matrices are anyhow adapted during training, the respective scalar product,

e_{ij} = Q_i \cdot K_j ,

is sufficient. For a measure of how relevant j is to i, we need to map .eij to a value


αij ∈ [0, 1], e.g. via
.

\alpha_{ij} = \frac{1}{Z_i}\, e^{-\beta e_{ij}} , \qquad Z_i = \sum_{j \le i} e^{-\beta e_{ij}} , \qquad \sum_{j \le i} \alpha_{ij} = 1 ,   (10.61)

where .β is an effective inverse temperature. The Boltzmann distribution (10.61) is


denoted “softmax” in machine learning jargon.

Causal Attention In (10.61) attention is paid only to past events .j ≤ i, the


causality condition. Attention is not an all-or-nothing affair, but a graded process,
with weights that are given by the entries .αij of the attention matrix. The output
activity .yi of token i is determined via

y_i = \sum_{j \le i} \alpha_{ij}\, V_j   (10.62)

Fig. 10.11 Attention in machine learning. Tokens with activities x_i generate query/key/value vectors as Q_i = \hat{Q}_i \cdot x_i, K_i = \hat{K}_i \cdot x_i and V_i = \hat{V}_i \cdot x_i. Causality implies attending only to previous words in a sentence. For a given token, here ‘joke’, the scalar product of its query vector with all previous key vectors generates weights \alpha_i. The sum of the weighted values is the output of the attention layer. See (10.61) and (10.62)

by the amount of attention paid to what happened before. An illustration is given


in Fig. 10.11. The past becomes irrelevant in the limiting case .αij = δij , which is
included in (10.62).
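A minimal causal attention head along the lines of (10.60)-(10.62) can be sketched as follows. For simplicity the query, key and value matrices are shared between tokens, as in standard transformers, and the softmax is taken with the conventional positive exponent, corresponding to β = −1 in (10.61); all sizes are illustrative, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(6)
NT, D = 5, 8                                   # context length, embedding dim
x = rng.normal(size=(NT, D))                   # token activities
Wq = rng.normal(size=(D, D)) / np.sqrt(D)      # shared query projection
Wk = rng.normal(size=(D, D)) / np.sqrt(D)      # shared key projection
Wv = rng.normal(size=(D, D)) / np.sqrt(D)      # shared value projection

Q, K, V = x @ Wq, x @ Wk, x @ Wv               # Eq. (10.60), all tokens at once
e = Q @ K.T                                    # alignments e_ij = Q_i . K_j
mask = np.tril(np.ones((NT, NT), dtype=bool))  # causality: attend to j <= i only
e = np.where(mask, e, -np.inf)
alpha = np.exp(e - e.max(axis=1, keepdims=True))
alpha = alpha / alpha.sum(axis=1, keepdims=True)   # normalized rows, (10.61)
y = alpha @ V                                  # attention output, (10.62)
```

The first token can attend only to itself, so its output equals its own value vector.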

Transformer The primary task of attention is information routing. Once selected,


information needs also to be processed. For transformer models this is done on a
token-per-token level, by feeding .yi into a perceptron with a single hidden layer.
Transformer layers consist hence of two sublayers, attention and a classical token-
wise neural net.

Linearized Attention Attention is a non-linear process scaling quadratically with


the number of tokens per layer, N_T, the context length. There are N_T^2/2 scalar products Q_i \cdot K_j to be computed, which are individually quadratic in the respective
token activities, .xi and .xj . Linearized versions of attention forfeit the mapping
(10.61) to a normalized distribution, using instead
  
y_i^{\rm lin} = \sum_{j \le i} \left( Q_i \cdot K_j \right) V_j = Q_i \cdot \sum_{j \le i} K_j \otimes V_j   (10.63)

for the new token activity, modulo a normalization factor, where .⊗ denotes the outer
product.
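The identity between the two forms in (10.63), summing weighted values versus contracting the query with a sum of outer products, can be confirmed directly. Illustrative sizes, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(7)
NT, D = 6, 4
Q = rng.normal(size=(NT, D))
K = rng.normal(size=(NT, D))
V = rng.normal(size=(NT, D))

i = 4                                          # an arbitrary token position
left = sum((Q[i] @ K[j]) * V[j] for j in range(i + 1))     # weighted values
S = sum(np.outer(K[j], V[j]) for j in range(i + 1))        # sum of outer products
right = Q[i] @ S                               # query contracted with S
```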

Recurrent Linear Attention We rewrite (10.63) as


 
y_i^{\rm lin} = Q_i \cdot \hat{S}_i , \qquad \hat{S}_i = \sum_{j \le i} \gamma^{\,i-j}\, K_j \otimes V_j ,

where we introduced with .γ ∈ [0, 1] an additional decay factor. It plays a role


equivalent to the spectral radius of recurrent networks, see Sect. 10.2.1. Past events
are explicitly discounted when .γ < 1. The update relation

\hat{S}_{i+1} = K_{i+1} \otimes V_{i+1} + \gamma\, \hat{S}_i   (10.64)

shows that the inference effort scales linearly with context length .NT . This
formulation is also denoted “retentive network”.

One can interpret (10.64) as an abstract recurrent network unfolding in ‘context


time’, with the entries of the key-value matrix .Ŝi corresponding to neuronal
activities. The new network activity .Ŝi+1 is given by the sum of two terms, the
input .Ki+1 ⊗ Vi+1 and the old activity .Ŝi , with the latter being discounted by the
spectral radius .γ of the network.
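The recurrent update (10.64) can be checked against the explicit discounted sum it generates; the cost per token is independent of the context length. A sketch with assumed sizes and γ = 0.9, numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(8)
NT, D, gamma = 8, 3, 0.9                       # context length, dim, decay
K = rng.normal(size=(NT, D))
V = rng.normal(size=(NT, D))
Q = rng.normal(size=(NT, D))

S = np.zeros((D, D))                           # running key-value matrix
y_rec = []
for i in range(NT):
    S = np.outer(K[i], V[i]) + gamma * S       # recurrent update (10.64)
    y_rec.append(Q[i] @ S)

last = NT - 1                                  # explicit discounted reference sum
S_ref = sum(gamma**(last - j) * np.outer(K[j], V[j]) for j in range(last + 1))
```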

Exercises

(10.1) GATED LINEAR UNITS


A symmetrized version of (10.9) is

\sigma_{\rm GLU}(x) = \tanh(x_w - b_w)\, \tanh(x_v - b_v) ,   (10.65)

where σtanh has been used for the transfer function. Show that a single unit
can encode the AND and the XOR gate.
(10.2) VARIANCE OF A PRODUCT
Derive (10.18) for the product of two statistically independent random
variables. Use x = Δx + μx , where Δx = x − μx , equivalently for both
variables.
(10.3) WIGNER SEMI-CIRCULAR LAW
Show that (10.15) reduces to the Wigner semi-circular law (1.15) for the density of states of the adjacency matrix of an Erdös–Rényi graph with coordination number z.
(10.4) LANDAU THEORY FOR ABSORBING PHASE TRANSITIONS
The variance σy2 of the neuronal activity is small close to the critical point
Rw = 1. An effective theory can be obtained by expanding the self-
consistency condition (10.26) with respect to powers of σy2 . Show that the
corresponding expression (6.2) for the Landau theory of phase transitions is
recovered when \sigma_{\rm ext}^2 is measured in units of \sigma_y.

(10.5) LAYER VARIANCE MAPPING


Generalize the mapping (10.27) of the variance of unit activity from one
layer to the next to non-zero means μx , μy and μw .
(10.6) NEURAL DIFFERENTIAL EQUATIONS
Analytically solve the neural differential equations (10.35) for the case of
a single linear unit per layer, with f (x, ϑ) = ϑx. A single training pair
(x α , F α ) is given. Is it consistent with the update rule (10.34) that ϑ is
independent of layer time?
(10.7) DOT PRODUCTS ARE KERNEL FUNCTIONS
Kernels are (semi-) positive definite symmetric functions of two inputs.
Prove that scalar products are kernels.
(10.8) LINEAR ATTENTION
Assume that the elements of the D × D query, key and value matrices
are randomly initialized with zero mean and standard deviation σ0 . Use
(10.63) to evaluate the variance of the activity yi when inputs are random,
having zero mean and standard deviation σx . How should σ0 scale with the
embedding dimension D?

Further Reading

We recommend reviews and textbooks on the notion of brain criticality, Gros


(2021), classical deep learning concepts, Biehl (2023), and the defining features
of attention in psychology and the neurosciences, Lindsay (2020). Of interest to
the reader may be furthermore introductions to Gaussian processes, Williams and
Rasmussen (2006), to random matrix theory, Akjouj et al. (2022), and recurrent
neural architectures, Yu et al. (2019).
Original literature relevant to the subjects treated concerns gated linear units,
Dauphin et al. (2017), elliptic random matrices, Sommers et al. (1988), and the notion of flow control, Schubert and Gros (2021). Neural tangent kernels are treated
in Jacot et al. (2018), neural ordinary differential equations in Chen et al. (2018).
Transformer models were introduced in the seminal ‘Attention is All You Need’
paper Vaswani et al. (2017), retentive networks in Sun et al. (2023).

References
Akjouj, I., et al. (2022). Complex systems in ecology: A guided tour with large Lotka-Volterra
models and random matrices. arXiv:2212.06136.
Biehl, M. (2023). The shallow and the deep: A biased introduction to neural networks and old
school machine learning. University of Groningen Press.
Chen, R. T., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential
equations. Advances in Neural Information Processing Systems, 31.
Dauphin, Y. N., Fan, A., Auli, M., & Grangier, D. (2017). Language modeling with gated
convolutional networks. PMLR, 70, 933–941.

Gros, C. (2021). A Devil’s advocate view on ‘Self-Organized’ brain criticality. Journal of Physics:
Complexity, 2, 2021.
Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: Convergence and generalization
in neural networks. Advances in Neural Information Processing Systems, 31, 2018.
Lindsay, G. W. (2020). Attention in psychology, neuroscience, and machine learning. Frontiers in
Computational Neuroscience, 14, 29.
Sommers, H. J., Crisanti, A., Sompolinsky, H., & Stein, Y. (1988). Spectrum of large random
asymmetric matrices. Physical Review Letters, 60, 1895.
Schubert, F., & Gros, C. (2021). Local homeostatic regulation of the spectral radius of echo-state
networks. Frontiers in Computational Neuroscience, 15, 587721.
Sun, Y. et al. (2023). Retentive network: A successor to transformer for large language models.
arXiv:2307.08621.
Vaswani, A. et al. (2017). Attention is all you need. In Advances in Neural Information Processing
Systems (vol. 30).
Williams, C. K., & Rasmussen, C. E. (2006). Gaussian processes for machine learning. MIT Press.
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A review of recurrent neural networks: LSTM cells and
network architectures. Neural Computation, 31, 1235–1270.
Solutions
11

The nature of the exercises given at the end of the respective chapters varies. Some
exercises are concerned with the extension of formulas and material, others deal
with the investigation of models through computer simulations. Follow the index to
reach the solution to any exercise.

11.1 Solutions to the Exercises of Chap. 1,


Network Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
11.2 Solutions to the Exercises of Chap. 2,
Bifurcations and Chaos in Dynamical Systems . . . . . . 401
11.3 Solutions to the Exercises of Chap. 3,
Dissipation, Noise and Adaptive Systems . . . . . . . . . . . 407
11.4 Solutions to the Exercises of Chap. 4,
Self Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
11.5 Solutions to the Exercises of Chap. 5,
Information Theory of Complex Systems . . . . . . . . . . . 416
11.6 Solutions to the Exercises of Chap. 6,
Self-Organized Criticality . . . . . . . . . . . . . . . . . . . . . . . . . 424
11.7 Solutions to the Exercises of Chap. 7,
Random Boolean Networks . . . . . . . . . . . . . . . . . . . . . . . 430
11.8 Solutions to the Exercises of Chap. 8,
Darwinian Evolution, Hypercycles and
Game Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
11.9 Solutions to the Exercises of Chap. 9,
Synchronization Phenomena. . . . . . . . . . . . . . . . . . . . . . .440
11.10 Solutions to the Exercises of Chap. 10,
Complexity of Machine Learning . . . . . . . . . . . . . . . . . . 446

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 393
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8_11

11.1 Solutions to the Exercises of Chap. 1

(1.1) NETWORK OF CLIQUES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394


(1.2) ENSEMBLE FLUCTUATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
(1.3) CLUSTERING COEFFICIENT AND DEGREE SEQUENCE . . . . . . . 395
(1.4) TSALLIS–PARETO DISTRIBUTION . . . . . . . . . . . . . . . . . . . . . . . . . . 396
(1.5) KRONECKER GRAPHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
(1.6) SELF-RETRACING PATH APPROXIMATION . . . . . . . . . . . . . . . . . . 397
(1.7) PROBABILITY GENERATING FUNCTIONS . . . . . . . . . . . . . . . . . . . 397
(1.8) CLUSTERING COEFFICIENT OF REGULAR LATTICES . . . . . . . . 398
(1.9) ROBUSTNESS OF GRAPHS AGAINST FOCUSED ATTACKS . . . . . 399
(1.10) EPIDEMIC SPREADING IN SCALE-FREE NETWORKS . . . . . . . . . 400

(1.1) NETWORK OF CLIQUES


The respective networks are shown in Fig. 11.1. The network diameter D is
2 (3) for the network of companies (managers), the average degree 18/6 = 3
(36/9 = 4) and the clustering coefficient 5/12 ≈ 0.42 (163/270 ≈ 0.6). For
comparison, the clustering would have been Crand = z/(N − 1), see (1.3),
for random graphs, viz 3/5 = 0.6 (4/8 = 0.5). The network of managers
contains the respective board compositions as cliques, which explains the
large value of C.
(1.2) ENSEMBLE FLUCTUATIONS
The probability that a given vertex has degree k is provided by a Binomial
distribution, see (1.4). Therefore, the probability that R vertices have degree
k, viz Xk = R, is given by
 
P(X_k = R) = \binom{N}{R}\, p_k^R\, (1 - p_k)^{N-R} .   (11.1)

Considering the large-N limit with N \gg R, and \binom{N}{R} \approx N^R/R!, we find

P(X_k = R) = \frac{(\lambda_k)^R}{R!}\, e^{-\lambda_k} , \qquad \lambda_k = N p_k = \langle X_k \rangle ,   (11.2)

viz the Poisson distribution. The binomial distribution (11.1) reduces for p_k \ll 1 to (11.2) in the thermodynamic limit N \to \infty.
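The limit can be illustrated numerically with the standard library alone; N = 10^4 and p_k = 3 \cdot 10^{-4}, i.e. \lambda_k = 3, are assumed values:

```python
import math

def binom_pmf(N, p, R):
    """Binomial probability, Eq. (11.1)."""
    return math.comb(N, R) * p**R * (1.0 - p)**(N - R)

def poisson_pmf(lam, R):
    """Poisson probability, Eq. (11.2)."""
    return lam**R * math.exp(-lam) / math.factorial(R)

N, p = 10_000, 3e-4                            # lambda_k = N p = 3
diff = max(abs(binom_pmf(N, p, R) - poisson_pmf(N * p, R)) for R in range(20))
```

The maximal pointwise difference between the two distributions is already of order p, i.e. tiny, for these parameters.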

Fig. 11.1 A network of cliques (left) and the associated network of nodes (right). Nodes correspond to managers sitting on the board of several companies, the cliques

(1.3) CLUSTERING COEFFICIENT AND DEGREE SEQUENCE


The expression (1.29) for the clustering coefficient has the form
C = \frac{1}{Nz} \Big( \sum_k k\, q_k \Big)^2 = \frac{1}{Nz^3} \Big( \sum_k k(k+1)\, p_{k+1} \Big)^2

where we have used qk = (k + 1)pk+1 /z for the excess degree distribution


qk . We hence find
C = \frac{1}{Nz^3} \Big( \sum_k (k-1)k\, p_k \Big)^2 = \frac{1}{Nz^3} \Big( \frac{1}{N} \sum_i (k_i - 1)\, k_i \Big)^2 ,   (11.3)

when expressing C as a function of the degree sequence {ki }. For a


homogeneous network, with ki = z, for all sites, the formula (11.3) yields
N 2 (z − 1)2 z2 /(N 3 z3 ) ≈ z/N, in agreement with (1.3).
A simple star-shaped network with a single central site has the degree
sequence

    k_1 = N − 1 ,   k_i = 1 ,   i = 2, …, N ,

with an intensive coordination number z = 2(N − 1)/N ≈ 2 for a large
number of sites N. The statistical formula (11.3) for the clustering coefficient
would, on the other hand, diverge like

    \frac{1}{N\,2^3}\left[\frac{(N-2)(N-1)}{N}\right]^2 ≈ \frac{N}{2^3} ∼ N

in the thermodynamic limit N → ∞. When deriving the expression (11.3),
all dependencies between the degrees of nearest-neighbor sites were
disregarded. A large experimental value of C therefore indicates the presence
of strong inter-site degree correlations.
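This divergence is easily checked directly. The following sketch (our own illustration, not part of the text) evaluates formula (11.3) for the star degree sequence; the true clustering coefficient of a star vanishes, while the estimate grows linearly with N.

```python
def c_statistical(degrees):
    """Clustering estimate of Eq. (11.3) from a degree sequence alone."""
    n = len(degrees)
    z = sum(degrees) / n                      # coordination number
    s = sum((k - 1) * k for k in degrees)     # sum_i (k_i - 1) k_i
    return s ** 2 / (n ** 3 * z ** 3)

def star_degrees(n):
    return [n - 1] + [1] * (n - 1)            # one hub, n - 1 leaves

# doubling N roughly doubles the estimate, although the true C is zero
print(c_statistical(star_degrees(1000)), c_statistical(star_degrees(2000)))
```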

Fig. 11.2 The Kronecker graph K = G ⊗ H (right) is the outer product of G (left) and
H (middle). A pair of sites in K is linked only if the constituent sites of G and H are
both connected

(1.4) TSALLIS–PARETO DISTRIBUTION
The mean μ of the Tsallis–Pareto distribution (1.78) was already evaluated
in Sect. 1.1.3 as

    μ = (α−1) \int_0^X \frac{x+1-1}{(1+x)^α}\, dx = \frac{α-1}{2-α}\Big[(1+x)^{2-α}\Big]_0^X − 1
      = ∞ for α < 2,   1/(α−2) for α > 2   (X → ∞),   (11.4)

where the normalization 1 = ∫ p(x)dx for α > 1 was used. With x² =
(x + 1)² − 2x − 1 we find

    ⟨x²⟩ + 2⟨x⟩ + 1 = (α−1) \int_0^X \frac{(x+1)^2}{(1+x)^α}\, dx = \frac{α-1}{3-α}\Big[(1+x)^{3-α}\Big]_0^X

for the second moment, which simplifies for α > 3 to

    ⟨x²⟩ = \frac{α-1}{α-3} − \frac{2}{α-2} − 1 = \frac{2}{α-3} − \frac{2}{α-2} = \frac{2}{(α-3)(α-2)} .   (11.5)

The variance σ² is then

    σ² = ⟨x²⟩ − ⟨x⟩² = \frac{α-1}{(α-3)(α-2)^2} .   (11.6)

(1.5) KRONECKER GRAPHS
An illustrative example of the Kronecker graph K = G ⊗ H of a 3-site graph
G and a 2-site graph H is given in Fig. 11.2. The resulting network K has
3 · 2 = 6 sites.
Denoting by p^G(k) and p^H(k) the respective degree distributions, one can
evaluate the degree distribution of K as

    p^K(k) = \sum_{l,l'} δ_{k,l·l'}\, p^G(l)\, p^H(l') = \int_0^∞ dl\, dl'\, δ(k − l·l')\, p^G(l)\, p^H(l')
           = \int_0^∞ \frac{dl}{l}\, p^G(l)\, p^H(k/l) ,

since a given site x_{ij} in K is linked to k_i · k_j other sites, where k_i and k_j are the
respective degrees of the constituent sites in G and H. In the above derivation we
have used δ(ax) = δ(x)/|a|. The average degree, and hence the coordination
number, is multiplicative,

    ⟨k⟩_K = \sum_k k\, p^K(k) = \sum_{l,l'} l·l'\, p^G(l)\, p^H(l') = ⟨k⟩_G ⟨k⟩_H .
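The multiplicativity of the mean degree can be verified on a small example; the two graphs below are hypothetical, chosen only for illustration.

```python
def kron(a, b):
    """Kronecker (tensor) product of two adjacency matrices."""
    n, m = len(a), len(b)
    return [[a[i][j] * b[p][q] for j in range(n) for q in range(m)]
            for i in range(n) for p in range(m)]

def mean_degree(adj):
    return sum(map(sum, adj)) / len(adj)

G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # 3-site graph
H = [[0, 1], [1, 0]]                    # 2-site graph
K = kron(G, H)                          # 3 * 2 = 6 sites

print(mean_degree(K), mean_degree(G) * mean_degree(H))   # both 4/3
```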

(1.6) SELF-RETRACING PATH APPROXIMATION


This exercise needs a background in Green’s functions. One needs to find
the one-particle Green’s function G(ω) of single particle hopping with
amplitude t = 1 (the entries of the adjacency matrix) on a random lattice.
We denote by

    G_0(ω) = \frac{1}{ω} ,   G(ω) = \frac{1}{ω − Σ(ω)} ,

the single-site Green's function G_0(ω), viz the Green's function on an
isolated vertex, and with Σ(ω) the respective one-particle self-energy.
We may now expand the self-energy in terms of hopping processes, with
the lowest-order process being to hop to a neighboring site and back. Once the
neighboring site has been reached, the process can be iterated. We then have

    Σ(ω) = z\, G(ω) ,   G(ω) = \frac{1}{ω − z\, G(ω)} ,

which is just the starting point for the semi-circle law, Eq. (1.14).
(1.7) PROBABILITY GENERATING FUNCTIONS
We start by evaluating the variance σ_K² of the distribution p_k generated by
G_0(x) = Σ_k p_k x^k, considering

    G_0'(x) = \sum_k k\, p_k x^{k-1} ,   G_0''(x) = \sum_k k(k-1)\, p_k x^{k-2} ,

together with

    G_0'(1) = ⟨k⟩ ,   G_0''(1) = ⟨k²⟩ − ⟨k⟩ .

Hence

    σ_K² = ⟨k²⟩ − ⟨k⟩² = G_0''(1) + ⟨k⟩ − ⟨k⟩² .   (11.7)

For a cumulative process generated by G_C(x) = G_N(G_0(x)) we have

    \frac{d}{dx} G_C(x) = G_N'(G_0(x))\, G_0'(x) ,   μ_C = ⟨n⟩⟨k⟩ ,

where we have denoted with μ_C the mean of the cumulative process. For
the variance σ_C² we find

    \frac{d^2}{dx^2} G_C(x) = G_N''(G_0(x))\,\big[G_0'(x)\big]^2 + G_N'(G_0(x))\, G_0''(x) ,

which leads with (11.7) to

    σ_C² = G_C''(1) + μ_C − μ_C²
         = \big(σ_N² − ⟨n⟩ + ⟨n⟩²\big)⟨k⟩² + ⟨n⟩\big(σ_K² − ⟨k⟩ + ⟨k⟩²\big) + ⟨n⟩⟨k⟩ − ⟨n⟩²⟨k⟩²
         = ⟨k⟩² σ_N² + ⟨n⟩ σ_K² .
If pn is deterministic, e.g. if we throw the dice always exactly N times,


then σN2 = 0 and σC2 = NσK2 , in agreement with the law of large numbers,
Eq. (5.10).
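The relation σ_C² = ⟨k⟩²σ_N² + ⟨n⟩σ_K² can be verified exactly by composing the generating functions as polynomials; the two distributions below are arbitrary examples.

```python
def pmul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def padd(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]

def compose(pn, p0):
    """Coefficients of G_N(G_0(x)), the cumulative distribution."""
    out, power = [pn[0]], [1.0]
    for w in pn[1:]:
        power = pmul(power, p0)             # p0(x)**n
        out = padd(out, [w * c for c in power])
    return out

def mean_var(p):
    mu = sum(k * w for k, w in enumerate(p))
    return mu, sum(k * k * w for k, w in enumerate(p)) - mu * mu

pk = [0.0, 0.2, 0.5, 0.3]                   # distribution of single draws
pn = [0.1, 0.3, 0.6]                        # distribution of the number of draws
muK, varK = mean_var(pk)
muN, varN = mean_var(pn)
muC, varC = mean_var(compose(pn, pk))
print(varC, muK**2 * varN + muN * varK)     # the two values agree
```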
(1.8) CLUSTERING COEFFICIENT OF REGULAR LATTICES
One Dimension We start by proving (1.60),

    C = \frac{3(z-2)}{4(z-1)} ,

which holds for the clustering coefficient of a one-dimensional lattice with
coordination number z. In this case, a site is connected to all sites within a
distance of z/2 (included), both to the left and to the right.
The clustering coefficient C is defined as the average fraction of pairs of
neighbors of a vertex which are also neighbors of each other. Therefore, we
remind ourselves that

    \sum_{k=1}^{z-1} k = \frac{1}{2} z(z-1)

is the number of pairs of neighbors of a given vertex. Next, we evaluate the
connected pairs of neighbors of a given node. Ordering the neighbors of a
given node linearly, compare Fig. 1.13, in a chain of length z + 1, one has
3z/4 starting nodes, each connected to z/2 − 1 sites (in one direction, left or
right), which results in a total of

    \frac{3z}{4}\left(\frac{z}{2} - 1\right)

connected pairs of neighbors. Thus the result for C is

    C\, \frac{1}{2} z(z-1) = \frac{3z}{4}\left(\frac{z}{2} - 1\right) ,   C = \frac{3(z-2)}{4(z-1)} .

General Dimensions The arguments above can be generalized to lattices in
arbitrary dimension d using relatively simple arguments. Consider that we
are now dealing with d coordinate or lattice lines traversing a certain node.
Thus, in order to calculate the clustering coefficient, we confine ourselves to a
one-dimensional subspace and substitute z by z/d, the connectivity on a line,
yielding

    C_d = \frac{3(z/d - 2)}{4(z/d - 1)} = \frac{3(z - 2d)}{4(z - d)} .
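A direct count on a ring lattice confirms the one-dimensional formula. The sketch below builds the neighborhood of a single site (the lattice is homogeneous) and counts connected pairs.

```python
def ring_clustering(n, z):
    """Clustering coefficient of a ring; each site links to z/2 sites per side."""
    half = z // 2
    def linked(i, j):
        d = abs(i - j) % n
        return 0 < min(d, n - d) <= half
    neigh = [j for j in range(n) if linked(0, j)]
    pairs = [(a, b) for a in neigh for b in neigh if a < b]
    return sum(linked(a, b) for a, b in pairs) / len(pairs)

for z in (4, 6, 8):
    print(z, ring_clustering(120, z), 3 * (z - 2) / (4 * (z - 1)))
```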

(1.9) ROBUSTNESS OF GRAPHS AGAINST FOCUSED ATTACKS


Following Sect. 1.4, we define with

    F_0(x) = \sum_{k=0}^{∞} p_k b_k x^k = \sum_{k=0}^{k_c} p_k x^k ,   F_1(x) = \frac{F_0'(x)}{z}

the non-normalized generating function for the degree distribution of the
active nodes. The percolation threshold occurs for

    F_1'(1) = 1 ,   F_1'(1) = \frac{1}{z} \sum_{k=2}^{k_c} p_k\, k(k-1) .

Using the degree distribution for Erdős–Rényi graphs we obtain, see
Fig. 11.3,

    p_k = e^{-z} \frac{z^k}{k!} ,   e^{z} = z \sum_{k'=0}^{k_c-2} \frac{z^{k'}}{k'!} .   (11.8)

Fig. 11.3 Critical fraction f_c for Erdős–Rényi graphs to retain functionality, as a
function of the coordination number z. Inset: Critical degree k_c. See (11.9) and (11.8)
respectively

The network will fall below the percolation threshold when the fraction of
removed vertices exceeds the critical fraction

    f_c = 1 − \sum_{k=0}^{k_c} p_k = 1 − \frac{1}{z} − e^{-z}\left(\frac{z^{k_c-1}}{(k_c-1)!} + \frac{z^{k_c}}{k_c!}\right) ,   (11.9)

where we have used (11.8). The result is plotted in Fig. 11.3. The critical
fraction f_c approaches unity in the limit z → ∞.
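The curve of Fig. 11.3 is easily reproduced; the sketch below locates the critical degree k_c at which F_1'(1) reaches unity and evaluates the corresponding critical fraction.

```python
import math

def poisson(k, z):
    return math.exp(-z) * z**k / math.factorial(k)

def critical_fraction(z):
    kc = 2
    while sum(poisson(k, z) * k * (k - 1) for k in range(2, kc + 1)) / z < 1:
        kc += 1                      # keep low degrees while still percolating
    fc = 1.0 - sum(poisson(k, z) for k in range(kc + 1))
    return kc, fc

for z in (2, 5, 10):
    print(z, *critical_fraction(z))  # f_c grows toward unity with z
```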
(1.10) EPIDEMIC SPREADING IN SCALE-FREE NETWORKS
This task has the form of a literature study. One defines with ρ_k(t) the
probability that an individual having k social connections is infected (i.e. k
is the degree). It changes via

    ρ̇_k(t) = −ρ_k(t) + λ k \big[1 − ρ_k\big]\, Θ(λ) .   (11.10)

The first term on the right-hand side describes the recovery process, the
second term the infection with infection rate λ. The infection process is
proportional to the number of social contacts k, to the probability [1 − ρ_k]
of having been healthy previously, and to the probability

    Θ(λ) = \frac{\sum_k k\, ρ_k(t)\, p_k}{\sum_j j\, p_j}   (11.11)

that a given social contact is with an infected individual, where p_k is the
degree distribution. The arguments leading to the above functional form for
Θ(λ) are very similar to those for the excess degree distribution q_k discussed
in Sect. 1.3.1. Note that all edges leading to the social contact enter Θ(λ),
whereas only the outgoing edges contributed to q_k.

Numerically, one can then solve (11.10) and (11.11) self-consistently for
the stationary state ρ̇_k = 0 and a given scale-invariant degree distribution
p_k, as explained in the reference given. The central question then regards the
existence of a threshold: Does the infection rate λ need to be above a finite
threshold for an outbreak to occur, or does an infinitesimally small λ suffice?

11.2 Solutions to the Exercises of Chap. 2

(2.1) JACOBIANS WITH DEGENERATE EIGENVALUES . . . . . . . . . . . . . . . 401


(2.2) CATASTROPHIC SUPERLINEAR GROWTH . . . . . . . . . . . . . . . . . . . . . 401
(2.3) HETEROCLINIC ORBITS AND SYMMETRIES . . . . . . . . . . . . . . . . . . 403
(2.4) THE SUBCRITICAL PITCHFORK TRANSITION . . . . . . . . . . . . . . . . . 403
(2.5) ENERGY CONSERVATION IN LIMIT CYCLES . . . . . . . . . . . . . . . . . . 403
(2.6) MAXIMAL LYAPUNOV EXPONENT . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
(2.7) LYAPUNOV EXPONENTS ALONG PERIODIC ORBITS . . . . . . . . . . . 404
(2.8) GRADIENT DYNAMICAL SYSTEMS . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
(2.9) LIMIT CYCLES IN GRADIENT SYSTEMS . . . . . . . . . . . . . . . . . . . . . . 406
(2.10) DELAY DIFFERENTIAL EQUATIONS . . . . . . . . . . . . . . . . . . . . . . . . . .406
(2.11) PERIOD-3 CYCLES IN THE LOGISTIC MAP . . . . . . . . . . . . . . . . . . . . 406
(2.12) CAR-FOLLOWING MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407

(2.1) JACOBIANS WITH DEGENERATE EIGENVALUES


The Jacobian and the eigenvalues of (2.72) are

    J = \begin{pmatrix} -1 & 1 \\ 0 & -r \end{pmatrix} ,   λ_1 = −1 ,   λ_2 = −r ,   e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} ,

with e_1 being the eigenvector for λ_1. In order to determine e_2, we consider the
ansatz

    e_2 = \begin{pmatrix} a \\ 1 \end{pmatrix} ,   −a + 1 = −r a ,   a = \frac{1}{1-r} .

The result is that e_2 (when normalized) aligns more and more with e_1 in the
limit r → 1, viz when the two eigenvalues become degenerate.
(2.2) CATASTROPHIC SUPERLINEAR GROWTH
No Damping For γ = 0 in (2.73), the polynomial growth equation

    ẋ = x^β ,   ẋ x^{-β} = 1 ,   \frac{x^{1-β}}{1-β} = t + \frac{x_0^{1-β}}{1-β}

has the solution

    x(t) = \left[(1-β)t + x_0^{1-β}\right]^{1/(1-β)} = \left[\frac{1}{1/x_0^{β-1} - (β-1)t}\right]^{1/(β-1)} ,   (11.12)

where x_0 = x(t = 0). The long-time behavior is regular for sublinear growth,
β < 1. For superlinear growth, β > 1, one finds instead a singularity at a finite
time t_s, with lim_{t→t_s} x(t) = ∞,

    x(t) = \left[\frac{1/(β-1)}{t_s - t}\right]^{1/(β-1)} ,   t_s = \frac{1}{β-1}\, \frac{1}{x_0^{β-1}} .   (11.13)
0

With Damping A finite γ = 1 in (2.73),

    ẋ = x^β − x = x\,(x^{β-1} − 1) ,   (11.14)

may change the flow qualitatively.

– For superlinear growth, β > 1, one finds x(t) → 0 for all starting points
  x_0 ∈ [0, 1], since then x^{β−1} < 1. For starting resources x_0 > 1, however,
  one approaches again a singularity akin to (11.13).
– For sublinear growth, β < 1, one finds x(t) → 1 for all starting x_0 > 0,
  since the Lyapunov exponent of the fixpoint x* = 1 is β − 1 < 0.
– Note that superlinear growth processes have been described for some
  cultural phenomena, like the growth of cities, with exponents β ≈ 1.2
  being slightly above unity. See, e.g., Bettencourt et al. (2007).

Sublinear Decay Finally we consider a sublinear decay process,

    ẋ = −x^β ,   ẋ x^{-β} = −1 ,   \frac{x^{1-β}}{1-β} = −t + \frac{x_0^{1-β}}{1-β} ,

which has the solution

    x(t) = (1-β)^{1/(1-β)} \big(t_0 - t\big)^{1/(1-β)} ,   t_0 = \frac{x_0^{1-β}}{1-β} .   (11.15)

For β ∈ [0, 1], the resource is fully depleted when t reaches t_0.
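The analytic solutions are easy to cross-check against a direct integration; the sketch below uses plain Euler steps for a sublinear exponent.

```python
def euler_growth(beta, x0, t_end, dt=1e-4):
    """Euler integration of dx/dt = x**beta (no damping)."""
    x, t = x0, 0.0
    while t < t_end:
        x += dt * x**beta
        t += dt
    return x

beta, x0, t_end = 0.5, 1.0, 2.0
analytic = ((1 - beta) * t_end + x0 ** (1 - beta)) ** (1 / (1 - beta))
print(euler_growth(beta, x0, t_end), analytic)   # close agreement
```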



Fig. 11.4 Sample trajectories of the system (11.16) for ε = 0 (left) and ε = 0.2 (right). Shown
are the stable manifolds (thick green lines), the unstable manifolds (thick blue lines) and the
heteroclinic orbit (thick red line)

(2.3) HETEROCLINIC ORBITS AND SYMMETRIES


The system

    ẋ = 1 − x² ,   ẏ = −y + ε\,(1 − x²)² ,   J(x^*) = \begin{pmatrix} -2x^* & 0 \\ 0 & -1 \end{pmatrix}   (11.16)

has the two fixpoints x*_± = (x*, 0), where x* = ±1. The eigenvectors of the
Jacobian J(x*) are aligned with the x and the y axis respectively, just as for
the system (2.18), with (−1, 0) now being a saddle and (1, 0) a stable node.
The flow diagram is illustrated in Fig. 11.4; it is invariant when inverting x ↔
(−x). The system contains the y = 0 axis as a mirror line for a vanishing
ε = 0, and there is a heteroclinic orbit connecting the unstable manifold of x*_−
to one of the stable manifolds of x*_+.
A finite ε removes the mirror line y = 0 present at ε = 0 and destroys
the heteroclinic orbit. The unstable manifold of x*_− is still attracted by x*_+,
however now as a generic trajectory.
(2.4) THE SUBCRITICAL PITCHFORK TRANSITION
The fastest way to understand the behavior of the subcritical pitchfork
transition,

    ẋ = a x + x³ ,   U(x) = −\frac{a}{2} x² − \frac{x^4}{4} ,   (11.17)

is to look at the respective bifurcation potential U(x), which is illustrated in
Fig. 11.5. The fixpoints ±√(−a) exist for a < 0 and are unstable; the trivial
fixpoint is stable (unstable) for a < 0 (a > 0).
(2.5) ENERGY CONSERVATION IN LIMIT CYCLES
The numerical integration of the energy change ∫_0^t Ė dt' for the Takens–
Bogdanov system (2.33) is presented in Fig. 11.6. Energy is continuously
taken up and dissipated even along the limit cycle, but the overall energy

Fig. 11.5 The bifurcation potential U(x) of the subcritical pitchfork transition, as defined
by (11.17). The trivial fixpoint x* = 0 becomes unstable for a > 0

balance vanishes once one cycle has been completed. Simple integrator
methods like Euler’s method may not provide adequate numerical precision.
(2.6) MAXIMAL LYAPUNOV EXPONENT
We consider the definition

    λ^{(max)} = \lim_{n≫1} \frac{1}{n} \log\left|\frac{d f^{(n)}(x)}{dx}\right| ,   f^{(n)}(x) = f\big(f^{(n-1)}(x)\big)

of the maximal Lyapunov exponent (2.55), together with

    \frac{d f^{(n)}(x)}{dx} = \frac{d f\big(f^{(n-1)}(x)\big)}{dx} = f'\big(f^{(n-1)}(x)\big)\, \frac{d f^{(n-1)}(x)}{dx}
                            = f'\big(f^{(n-1)}(x)\big)\, \frac{d f\big(f^{(n-2)}(x)\big)}{dx} .

We hence obtain

    λ^{(max)} = \lim_{n≫1} \frac{1}{n} \sum_{j=0}^{n-1} \log\left|f'\big(f^{(j)}(x)\big)\right| ,   f^{(0)}(x) = x ,

which is just the definition of the time average of the local Lyapunov
exponent (2.54).
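For the fully chaotic logistic map, f(x) = 4x(1 − x), this time average is known to equal ln 2, which can be checked numerically:

```python
import math

def lyapunov(x, n=100000):
    """Time average of log|f'(x)| along the orbit of f(x) = 4x(1-x)."""
    total = 0.0
    for _ in range(n):
        if x in (0.0, 0.5, 1.0):        # avoid the measure-zero singular points
            x = 0.3141592653589793
        total += math.log(abs(4 - 8 * x))
        x = 4 * x * (1 - x)
    return total / n

print(lyapunov(0.3), math.log(2))       # both close to 0.693
```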
(2.7) LYAPUNOV EXPONENTS ALONG PERIODIC ORBITS
A perturbation δϕ of a solution ϕ*(t) of (2.35) evolves in linear order as

    \frac{d}{dt}\big(ϕ^* + δϕ\big) = 1 − K\cos(ϕ^*) + K\sin(ϕ^*)\,δϕ ,   \frac{d}{dt}\, δϕ = λ_ϕ\, δϕ ,

with the longitudinal (local) Lyapunov exponent being λ_ϕ = K sin(ϕ*).
There are two fixpoints for a supercritical K > 1, compare Fig. 2.13,
with the upper/lower one being unstable/stable, in agreement with λ_ϕ being
positive/negative.

Fig. 11.6 Left: The flow for the Takens–Bogdanov system (2.33), for μ = 0.9 > μ_c. The thick
black line is the limit cycle, compare Fig. 2.12. Right: The integrated energy balance (2.34) along
a trajectory starting at (x, y) = (1, 0.2) (in light green in the left panel), using fourth-order Runge–
Kutta integration

For K < K_c the trajectory is periodic and λ_ϕ = K sin(ϕ*) varies along the
unit circle. The mean exponent,

    λ_T = \int_0^{2π} λ_ϕ\, dϕ = K \int_0^{2π} \sin(ϕ)\, dϕ = 0 ,

vanishes however. λ_T is the "global" or "maximal" Lyapunov exponent. This
result is not a coincidence. Two periodic solutions ϕ*(t + t_{1,2}) with starting
angles ϕ*_{1,2} = ϕ*(t_{1,2}) will have a separation Δϕ = ϕ*_2 − ϕ*_1 which oscillates
with the same period T, viz

    Δϕ(t + T) = Δϕ(t) ,   ϕ^*(t + T) = ϕ^*(t) .   (11.18)

Linearizing (11.18) results in a vanishing global Lyapunov exponent λ_T.


(2.8) GRADIENT DYNAMICAL SYSTEMS
For the two-dimensional system (11.16),

    ẋ = f_x(x, y) = 1 − x² + g(x, y) ≡ −\frac{∂U(x,y)}{∂x} ,
    ẏ = f_y(x, y) = −y + ε\,(1 − x²)² ≡ −\frac{∂U(x,y)}{∂y} ,   (11.19)

to be a gradient system, as defined by (2.42), we need to select the term g(x, y)
such that

    \frac{∂f_x}{∂y} = \frac{∂f_y}{∂x} = −4ε\, x(1 − x²) ,   g(x, y) = −4ε\, x(1 − x²)\, y .

The potential U is then

    U(x, y) = \frac{y²}{2} + \frac{x³}{3} − x − ε\,(1 − x²)²\, y .

The system (11.19) retains its fixpoints (±1, 0) even in the presence of
g(x, y). The respective Jacobians also remain unchanged, since

    \frac{∂g}{∂x}\Big|_{(±1,0)} = 0 = \frac{∂g}{∂y}\Big|_{(±1,0)} .

An additional fixpoint is however generated by g(x, y). Which one is it?


(2.9) LIMIT CYCLES IN GRADIENT SYSTEMS
Let us denote with x*(t) a closed trajectory with period T and x*(T) = x*(0).
The flow is given by ẋ = −∇V(x), which leads to

    0 = V\big(x^*(T)\big) − V\big(x^*(0)\big) = \int_0^T \frac{dV}{dt}\, dt
      = \int_0^T ∇V · ẋ\, dt = −\int_0^T \big(ẋ\big)²\, dt .

The flow ẋ must hence vanish along any closed trajectory; a limit cycle in a
gradient system is therefore only possible if the trajectory consists of an
isolated fixpoint.
(2.10) DELAY DIFFERENTIAL EQUATIONS
Probing for a harmonic orbit, we have

    ẏ(t) = −a\,y(t) − b\,y(t − T) ,   y(t) = e^{−iωt} ,   −iω = −a − b\,e^{iωT} .

Taking the real and the imaginary part leads to

    −\frac{a}{b} = \cos(ωT) ,   \frac{ω}{b} = \sin(ωT) ,   a² + ω² = b² ,

where we have used cos² + sin² = 1 in the last step. For a = 0 we obtain

    ωT = \frac{π}{2} + π n ,   n = 0, ±1, ±2, …

and b = ±ω for even/odd values of n.
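The solution branches can be checked by plugging them back into the characteristic equation:

```python
import cmath, math

def residual(a, b, omega, T):
    """-i*omega + a + b*exp(i*omega*T) must vanish for a harmonic orbit."""
    return -1j * omega + a + b * cmath.exp(1j * omega * T)

T = 1.0
for n in range(4):
    omega = (math.pi / 2 + math.pi * n) / T
    b = omega if n % 2 == 0 else -omega    # b = +/- omega for even/odd n
    print(n, abs(residual(0.0, b, omega, T)))   # ~0 in all cases
```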
(2.11) PERIOD-3 CYCLES IN THE LOGISTIC MAP
The 3-fold iterated logistic map is illustrated in Fig. 11.7. At r = r* it is
tangential to the identity, hence the name "tangent transition". For r slightly
larger than r* there are three pairs of stable/unstable fixpoints. At criticality
their values merge to

    x_1 ≈ 0.5144 ,   x_2 ≈ 0.9563 ,   x_3 ≈ 0.1510 ,

Fig. 11.7 The functional dependence of the 3-fold iterated logistic map g(g(g(x))), for
r = 3.80, r = r* = 3.82842, and r = 3.86. A saddle-node bifurcation occurs at
r = r* = 1 + 2√2. At this point the iterated map is tangential to the identity map

which can be found from the roots of x = g(g(g(x))), with g(x) = rx(1 − x).
The derivative of the flow,

    \frac{d}{dx}\, g(g(g(x))) = g'(x_3)\, g'(x_2)\, g'(x_1) → 1 ,

becomes neutral, as one can verify numerically. As expected, the Lyapunov
exponent vanishes at criticality.
(2.12) CAR-FOLLOWING MODEL
We consider ẋ = v_0 + Δẋ in (2.76), which reads ẍ(t + T) = α(v(t) − ẋ(t)),
and obtain

    \frac{d}{dt}\, Δẋ(t + T) = −α\, Δẋ(t) ,   −λ e^{−λT} = −α ,   Δẋ(t) = e^{−λt} .

The steady-state solution is stable for λ > 0, which is the case when both α
and T are positive. It is fun to simulate the model numerically for non-constant
v(t).

11.3 Solutions to the Exercises of Chap. 3

(3.1) SHIFT MAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408


(3.2) FIXPOINTS OF THE LORENZ MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . 408
(3.3) HAUSDORFF DIMENSION OF THE CANTOR SET . . . . . . . . . . . . . . . . 409
(3.4) DRIVEN HARMONIC OSCILLATOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410


(3.5) MARKOV CHAIN OF UMBRELLAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410


(3.6) BIASED RANDOM WALK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
(3.7) CONTINUOUS-TIME LOGISTIC EQUATION . . . . . . . . . . . . . . . . . . . . . 411
(3.8) NOISE-INDUCED QUASI-PERIODIC ORBITS . . . . . . . . . . . . . . . . . . . . 412

(3.1) SHIFT MAP
Transformation With x = [1 − cos(πθ)]/2 we find

    4x(1 − x) = \big(1 − \cos(πθ)\big)\big(1 + \cos(πθ)\big) = 1 − \cos²(πθ) = \sin²(πθ)

for the logistic map g(x) = rx(1 − x) at r = 4, which takes the form

    x_{n+1} = \sin²(πθ_n) ,
    \cos(πθ_{n+1}) = 1 − 2\sin²(πθ_n) = \cos²(πθ_n) − \sin²(πθ_n) = \cos(2πθ_n) .

The arguments of this expression lead directly to the shift map (3.73), which
is shown in Fig. 11.8.
Stationary Distribution The distribution functions p(x) and p(θ) are related
by

    p(x)\,dx = p(θ)\,dθ ,   \frac{dx}{dθ} = \frac{π}{2}\sin(πθ) ,   \cos(πθ) = 1 − 2x .

With sin(πθ) = \sqrt{1 − \cos²(πθ)} we find

    p(x) = \frac{p(θ)}{dx/dθ} = \frac{2}{π}\, \frac{p(θ)}{\sqrt{1 − (1 − 2x)²}} = \frac{p(θ)}{π\sqrt{x(1 − x)}} ,   (11.20)

which is illustrated in Fig. 11.8. One can convince oneself easily that p(θ) =
1, namely that all angles θ occur with identical probabilities.
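Numerically, the conjugacy is conveniently checked in the equivalent representation x = sin²(πθ) with the doubling map θ_{n+1} = 2θ_n mod 1 (this variant is our choice; it follows from the relations above):

```python
import math

theta, x = 0.1234, math.sin(math.pi * 0.1234) ** 2
err = 0.0
for _ in range(20):                      # chaos doubles rounding errors,
    x = 4 * x * (1 - x)                  # so only a few steps are checked
    theta = (2 * theta) % 1.0
    err = max(err, abs(x - math.sin(math.pi * theta) ** 2))
print(err)                               # remains tiny over 20 iterations
```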
(3.2) FIXPOINTS OF THE LORENZ MODEL
Linearizing the differential equations (3.8) around the fixpoint (x*, y*, z*)
yields

    x̃̇ = −σ x̃ + σ ỹ ,
    ỹ̇ = (−z* + r) x̃ − ỹ − x* z̃ ,
    z̃̇ = y* x̃ + x* ỹ − b z̃ ,


Fig. 11.8 For r = 4 the logistic map (2.46) is equivalent to the shift map (left), as defined by
(3.73). The probability distribution p(x) can be evaluated exactly, see (11.20), and compared with
a simulation (right)

where x̃ = x − x*, ỹ = y − y*, and z̃ = z − z* are small perturbations around
the fixpoint, here for σ = 10 and b = 8/3. Using the ansatz x̃ ∼ ỹ ∼ z̃ ∼ e^{λt}
we can determine the eigenvalues λ_i of the above equations.
For the fixpoint (x*, y*, z*) = (0, 0, 0) we find the eigenvalues

    (λ_1, λ_2, λ_3) = \left(−\frac{8}{3},\; \frac{−11 − \sqrt{81 + 40r}}{2},\; \frac{−11 + \sqrt{81 + 40r}}{2}\right) .

For r < 1 all three eigenvalues are negative and the fixpoint is stable. For
r = 1 the last eigenvalue, λ_3 = (−11 + 11)/2 = 0, is marginal. For r > 1 the
fixpoint becomes unstable.
The stability of the non-trivial fixpoints, Eq. (3.9), for 1 < r < r_c can be
proven in a similar way, leading to a cubic equation. You can either find the
explicit analytical solutions via Cardano's method or solve them numerically,
e.g. via Mathematica, Maple or Matlab, and determine the critical r_c, for
which at least one eigenvalue turns positive.
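For the trivial fixpoint, the z-direction decouples and the remaining 2 × 2 block can be diagonalized by hand; a few lines suffice to check the sign change of λ₃ at r = 1:

```python
import math

def lorenz_eigenvalues(r, sigma=10.0, b=8.0 / 3.0):
    """Eigenvalues of the Lorenz Jacobian at the origin."""
    # characteristic polynomial of the (x, y) block: l^2 + (s+1)l + s(1-r) = 0
    disc = math.sqrt((sigma + 1) ** 2 + 4 * sigma * (r - 1))
    return (-b, (-(sigma + 1) - disc) / 2, (-(sigma + 1) + disc) / 2)

for r in (0.5, 1.0, 1.5):
    print(r, lorenz_eigenvalues(r))   # third eigenvalue: negative, zero, positive
```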
(3.3) HAUSDORFF DIMENSION OF THE CANTOR SET
Dimension of a Line To cover a line of length l we need one circle of diameter
l. If we reduce the diameter of the circle to l/2, we require two circles to
cover the line. Generally, we require a factor of two more circles whenever we
halve the diameter. From the definition (3.12) of the Hausdorff dimension we
obtain

    D_H = −\frac{\log[N(l)/N(l')]}{\log[l/l']} = −\frac{\log[1/2]}{\log[2]} = 1 ,

where we used N(l) = 1 and N(l' = l/2) = 2. Therefore, the line is one-
dimensional.
Dimension of the Cantor Set If we reduce the diameter of the circles from l to
l/3, we require a factor of two more circles to cover the Cantor set. Therefore
we obtain the Hausdorff dimension

    D_H = −\frac{\log[N(l)/N(l')]}{\log[l/l']} = −\frac{\log[1/2]}{\log[3]} ≈ 0.6309 ,

where we used N(l) = 1 and N(l' = l/3) = 2.


(3.4) DRIVEN HARMONIC OSCILLATOR
In the long-time limit, the system oscillates with the frequency of the driving
force. Hence, we can use the ansatz

    x(t) = x_0 \cos(ωt + φ) ,

where we have to determine the amplitude x_0 and the phase shift φ. Using this
ansatz for the damped harmonic oscillator we find

    (ω_0² − ω²)\, x_0 \cos(ωt + φ) − γ\, x_0\, ω \sin(ωt + φ) = ε \cos(ωt) .

The amplitude x_0 and the phase shift φ can now be found by splitting the above
equation into sin(ωt) and cos(ωt) terms and comparing the prefactors.
For the case ω = ω_0 we obtain φ = −π/2 and x_0 = ε/(γω_0). Note that
x_0 → ∞ for γ → 0.
(3.5) MARKOV CHAIN OF UMBRELLAS
Let p and q = 1 − p be the probabilities of raining and not raining respectively,
with n ∈ {0, 1, 2, 3, 4} being the number of umbrellas at the place Lady Ann
is currently staying (which could be either the office or her apartment). This
process is illustrated in Fig. 11.9.
Suppressing the time indices for the stationary distribution ρ(n) of the
number of umbrellas, we have

    0 = ρ̇(0) = q ρ(4) − ρ(0)
    0 = ρ̇(4) = ρ(0) + p ρ(1) − (p + q) ρ(4) = p\,\big[ρ(1) − ρ(4)\big]
    0 = ρ̇(1) = p ρ(4) + q ρ(3) − (p + q) ρ(1) = q\,\big[ρ(3) − ρ(1)\big]
    0 = ρ̇(3) = q ρ(1) + p ρ(2) − (p + q) ρ(3) = p\,\big[ρ(2) − ρ(3)\big]
    0 = ρ̇(2) = p\,\big[ρ(3) − ρ(2)\big]

and hence ρ(4) = ρ(1) = ρ(3) = ρ(2), which leads to

    1 = ρ(0) + 4ρ(4) = ρ(0) + 4ρ(0)/q ,   ρ(0) = \frac{q}{q + 4} .   (11.21)

The probability for Lady Ann to get wet is therefore p · ρ(0) = pq/(4 + q).
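The stationary distribution can also be obtained by simply iterating the transition rule; the sketch below confirms ρ(0) = q/(q + 4) for p = 0.3:

```python
def stationary(p, steps=5000):
    """Power iteration of the N = 4 umbrella chain."""
    q = 1.0 - p
    rho = [0.2] * 5                     # state: umbrellas at the current place
    for _ in range(steps):
        new = [0.0] * 5
        new[4] += rho[0]                # no umbrella here: the other place has 4
        for n in range(1, 5):
            new[5 - n] += p * rho[n]    # raining: take one umbrella along
            new[4 - n] += q * rho[n]    # dry: leave them all behind
        rho = new
    return rho

p = 0.3
print(stationary(p)[0], (1 - p) / ((1 - p) + 4))   # the two values agree
```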

Fig. 11.9 Graphical representation of the Markov chain corresponding to the N = 4 umbrella
problem. Being at office/home (solid/dashed circles), there can be n = 0, 1, 2, 3, 4 umbrellas in
the respective place, being taken with probability p (when it rains) to the next location. When it
does not rain, with probability q = 1−p, no umbrella changes place. Note that the sum of outgoing
probabilities is one for all vertices

(3.6) BIASED RANDOM WALK


The master equation (3.31) generalizes to
 
pt+1 (x) = (1 + α) pt (x − 1) + (1 − α) pt (x + 1) /2 ,
.

with (1 ± α)/2 being the probabilities for a walker to jump to the right or to
the left, and hence

pt+Δt (x) − pt (x) (Δx)2 pt (x + Δx) + pt (x − Δx) − 2pt (x)


. =
Δt 2Δt (Δx)2
αΔx pt (x + Δx) − pt (x − Δx)
− ,
Δt 2Δx
which results in

(Δx)2 αΔx
ṗ(x, t) = Dp (x, t) − vp (x, t),
. D= , v= .
2Δt Δt
(11.22)

Rewriting the expression for the ballistic contribution v as

2α (Δx)2 2α
v=
. → D, α ∝ Δx (11.23)
Δx 2Δt Δx
we see that a finite propagation velocity v is recovered in the limit Δt → 0
and Δx → 0 whenever α is linear in Δx.
For a vanishing D → 0 the solution of (11.22) is given by a ballistically
travelling wave p(x, t) = u(x − vt).
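Evolving the biased master equation directly shows the mean position drifting with velocity v = α per time step (Δx = Δt = 1):

```python
def step(p, alpha):
    """One update of the biased master equation (interior sites only)."""
    n = len(p)
    new = [0.0] * n
    for x in range(1, n - 1):
        new[x] = ((1 + alpha) * p[x - 1] + (1 - alpha) * p[x + 1]) / 2
    return new

alpha, size, T = 0.2, 401, 50
p = [0.0] * size
p[200] = 1.0                       # walker starts at the center
for _ in range(T):
    p = step(p, alpha)
mean = sum(x * w for x, w in enumerate(p)) - 200
print(mean, alpha * T)             # drift = alpha * T
```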
(3.7) CONTINUOUS-TIME LOGISTIC EQUATION
The continuous-time logistic equation is

    ẏ(t) = α\, y(t)\,\big[1 − y(t)\big] ,   \frac{ẏ}{y} + \frac{ẏ}{1 − y} = α ,

Fig. 11.10 Numerical simulation of (3.74), with K = 1.02 and g = 1.2, using Euler integration
with a time step Δt = 0.01. Noise, as defined by (3.75), with a standard deviation σ = 0.003/0.009
(left/right plot) has been employed. The filled/open circle denotes the locus of the stable/unstable
fixpoint, 1 = K cos(ϕ± ), compare Fig. 2.13

with the solution

    \log(y) − \log(1 − y) = αt + c ,   y(t) = \frac{1}{e^{−αt−c} + 1} ,

where c is a constant of integration. Clearly lim_{t→∞} y(t) = 1. For a
comparison with the logistic map, Eq. (2.46), we discretize time via

    \frac{y(t + Δt) − y(t)}{Δt} = α\, y(t)\,\big[1 − y(t)\big] ,
    y(t + Δt) = (αΔt + 1)\, y(t)\,\Big[1 − \frac{αΔt}{αΔt + 1}\, y(t)\Big] ,
    x(t + Δt) = r\, x(t)\,\big[1 − x(t)\big] ,

when renormalizing x(t) = (αΔt)y(t)/(αΔt + 1). Then x → 0 for y → 1 in
the limit Δt → 0, which implies that the continuous-time logistic differential
equation corresponds to r = (αΔt + 1) → 1, compare (2.47).
(3.8) NOISE-INDUCED QUASI-PERIODIC ORBITS
Two illustrative numerical orbits are shown in Fig. 11.10. The orbit will make
another revolution once it has been able to jump from the stable to the unstable
fixpoint, which corresponds to a stochastic escape process (see Sect. 3.5.2 of
Chap. 3).

11.4 Solutions to the Exercises of Chap. 4

(4.1) INSTABILITY OF EULER’S METHOD . . . . . . . . . . . . . . . . . . . . . . . . . . . 413


(4.2) EXACT PROPAGATING WAVEFRONT SOLUTION . . . . . . . . . . . . . . . . 413
(4.3) LINEARIZED FISHER EQUATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
(4.4) TURING INSTABILITY WITH TWO STABLE NODES . . . . . . . . . . . . . . 414
(4.5) EIGENVALUES OF 2 × 2 MATRICES . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
(4.6) LARGE DENSITIES OF NERVOUS RATS . . . . . . . . . . . . . . . . . . . . . . . . 415
(4.7) AGENTS IN THE BOUNDED CONFIDENCE MODEL . . . . . . . . . . . . . . 415
(4.8) CLASSICAL VOTER MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416

(4.1) INSTABILITY OF EULER'S METHOD
The discretized diffusion equation reads

    ρ(x, t + Δt) = (1 − 2r)\, ρ(x, t) + r\,\big[ρ(x + Δx, t) + ρ(x − Δx, t)\big] ,

where r = DΔt/(Δx)². Considering the periodic initial state

    ρ(x, 0) = \cos(πx/Δx) = (−1)^n ,   x = nΔx ,

we find

    ρ(nΔx, Δt) = (1 − 2r)(−1)^n + 2r\,(−1)^{n±1} = (1 − 4r)(−1)^n

for the next state, with t = Δt. This map is stable¹ whenever

    |1 − 4r| < 1 ,   0 ≤ r ≤ 1/2 .

You may verify this result numerically.
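A minimal numerical verification: the alternating mode is damped by a factor (1 − 4r) per step, so the scheme decays for r = 0.4 and blows up for r = 0.6.

```python
def amplitude_after(r, steps):
    """Amplitude of the alternating mode after a number of Euler steps."""
    n = 64
    rho = [(-1) ** k for k in range(n)]              # cos(pi x / dx)
    for _ in range(steps):
        rho = [(1 - 2 * r) * rho[k]
               + r * (rho[(k + 1) % n] + rho[(k - 1) % n])
               for k in range(n)]
    return max(abs(v) for v in rho)

print(amplitude_after(0.4, 20), amplitude_after(0.6, 20))
# decays for r = 0.4, blows up for r = 0.6
```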


(4.2) EXACT PROPAGATING WAVEFRONT SOLUTION
The Fermi function and its first-order derivatives are

    ρ*(x, t) = \frac{1}{1 + e^{β(x−ct)}} ,   \frac{∂ρ*}{∂t} = βc\, ρ*(1 − ρ*) ,

and, equivalently,

    \frac{∂ρ*}{∂x} = −β\, ρ*(1 − ρ*) ,   \frac{∂²ρ*}{∂x²} = β²\,(1 − 2ρ*)\, ρ*(1 − ρ*) .

1 See Sect. 2.2 of Chap. 2.



Setting ρ* → ρ and using the definition (4.2) of a reaction-diffusion system,
we find with

    R(ρ) = \frac{∂ρ}{∂t} − D\frac{∂²ρ}{∂x²} = ρ(1 − ρ)\big[βc − β²D(1 − 2ρ)\big]
         = βρ(1 − ρ)\big[(c − βD) + 2βDρ\big]
         = rρ(1 − ρ)(ρ − θ)

the "Zeldovich equation", where we have defined r = 2β²D and the threshold
θ = (βD − c)/(2β). For θ = 0 one finds

    R(ρ) = rρ²(1 − ρ) ,   c = βD .   (11.24)

The result, R(ρ) = rρ²(1 − ρ), describes a specific type of bisexual logistic
growth, which can be seen by considering a population composed of αρ
females and (1 − α)ρ males. The number of offspring will be ∼ αρ
(logistic growth) if mating strategies lead to all females being reproductive,
and ∼ α(1 − α)ρ² if males and females meet randomly.
(4.3) LINEARIZED FISHER EQUATION
With ρ = 1 − 1/u the derivatives are

    ρ̇ = \frac{u̇}{u²} ,   ρ' = \frac{u'}{u²} ,   ρ'' = −\frac{2(u')²}{u³} + \frac{u''}{u²} ,

and (4.71) becomes

    \frac{u̇}{u²} = \Big(1 − \frac{1}{u}\Big)\frac{1}{u} + \frac{u''}{u²} − \frac{2(u')²}{u³} + \frac{2}{1/u}\,\frac{(u')²}{u⁴} ,

or

    u̇ = (u − 1) + u'' ,   v̇ = v + v'' ,

which is equivalent to the linearized Fisher equation (4.18) when shifting the
zero via v = u − 1.
(4.4) TURING INSTABILITY WITH TWO STABLE NODES
For the two eigenvalues to be real and both negative (or both positive), the
determinant Δ₁ = 1 − ab needs to be positive, compare (4.25), which implies

    (a − b)² > 4Δ₁ = 4 − 4ab ,   (a + b)² > 4 .

This is easily possible, e.g. using a = 2 and b = 1/4, with the determinant
Δ₁ = 1 − 2/4 = 1/2 remaining positive, and with the trace b − a remaining
negative. A Turing instability with two stable nodes is hence possible, e.g.
with the matrices

    A_1 = \begin{pmatrix} -2 & 1 \\ -1 & 1/4 \end{pmatrix} ,   A_2 = \begin{pmatrix} -8 & 0 \\ 0 & -1/8 \end{pmatrix} ,   A = \begin{pmatrix} -10 & 1 \\ -1 & 1/8 \end{pmatrix} .

The eigenvalues λ_±(A_1) = (−7 ± √17)/8 are both attracting and the
determinant of A = A_1 + A_2 is, with Δ = −1/4, negative.
(4.5) EIGENVALUES OF 2 × 2 MATRICES
For a fixpoint x* = (x*, 0) we find for the Jacobian J

    J = \begin{pmatrix} 0 & 1 \\ -V''(x^*) & -λ(x^*) \end{pmatrix} ,   ẋ = y ,   ẏ = −λ(x)\,y − V'(x) .

The determinant

    Δ = V''(x^*) < 0

is negative for a local maximum, and x* is hence a saddle, compare (4.25).


(4.6) LARGE DENSITIES OF NERVOUS RATS
The stationarity condition (4.45) reads

    σ_n = e^{σ}\,\big(1 − e^{−σ_n}\big)(σ − σ_n) ≈ e^{σ}(σ − σ_n) ,   σ ≫ 1 ,   σ_n → σ ,

and hence

    σ_n\,\big(1 + e^{σ}\big) ≈ σ e^{σ} ,   σ_n ≈ \frac{σ}{1 + e^{−σ}} .   (11.25)

The number of calm rats per comfort zone,

    σ_c = σ − σ_n ≈ \frac{σ e^{−σ}}{1 + e^{−σ}} ,

becomes exponentially small, but never vanishes.


(4.7) AGENTS IN THE BOUNDED CONFIDENCE MODEL
The total number of agents does not change when ∫ρ̇(x)dx = 0, viz when

    2\int_{−d/2}^{d/2} dy \int dx\; ρ(x + y)\,ρ(x − y) = \int_{−d}^{d} dy \int dx\; ρ(x)\,ρ(x + y) ,   (11.26)

compare (4.54). Using x = x̃ − y/2 and y = 2ỹ we rewrite the second term
in (11.26) as

    \int_{−d}^{d} dy \int dx̃\; ρ(x̃ − y/2)\,ρ(x̃ + y/2) = 2\int_{−d/2}^{d/2} dỹ \int dx̃\; ρ(x̃ − ỹ)\,ρ(x̃ + ỹ) ,

which has exactly the same form as the first term in (11.26).
(4.8) CLASSICAL VOTER MODEL
The probability p_i that a given agent changes opinion is

    p_i = \frac{1}{2}\Big(1 − \frac{σ_i}{z} \sum_j σ_j\Big) ,

where z is the coordination number of the underlying network, viz the average
number of neighbors. The sum Σ_j runs over the neighbors of the i-th agent.
Averaging over all realizations one has

    ṁ = \frac{d}{dt}⟨σ_i⟩ = ⟨−2σ_i p_i⟩ = −⟨σ_i⟩ + \frac{1}{z}\sum_j ⟨σ_j⟩ ,   σ_i² = 1 .

Whenever a voter flips, with probability p_i, its preference reverses, changing
by −2σ_i. On average, ⟨σ_i⟩ = ⟨σ_j⟩, which implies that the magnetization m =
⟨σ_i⟩ remains constant when using z = Σ_j 1 in the above expression for ṁ.
Magnetization is conserved only as an average over realizations. For a specific
realization of consecutive opinion flips, consensus will eventually be reached,
either with σ_i ≡ 1 or σ_i ≡ −1, respectively with probabilities P₊ and P₋ =
1 − P₊. As the magnetization remains constant on average,

    m = P₊ − P₋ = 2P₊ − 1 ,   P₊ = \frac{1 + m}{2} ,

which we wanted to prove.
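For the voter model on a complete graph, P₊ = (1 + m)/2 translates into an absorption probability k/N when starting with k up-spins; a value iteration over the exact transition probabilities (our own sketch) confirms this.

```python
def consensus_prob(N, sweeps=20000):
    """h[k]: probability of reaching all-up consensus from k up-spins."""
    h = [0.0] * (N + 1)
    h[N] = 1.0
    for _ in range(sweeps):
        new = h[:]
        for k in range(1, N):
            up = (N - k) / N * k / (N - 1)   # a down-spin copies an up-neighbor
            dn = k / N * (N - k) / (N - 1)   # an up-spin copies a down-neighbor
            new[k] = up * h[k + 1] + dn * h[k - 1] + (1 - up - dn) * h[k]
        h = new
    return h

h = consensus_prob(8)
print([round(v, 4) for v in h])   # h[k] approaches k/N
```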

11.5 Solutions to the Exercises of Chap. 5

(5.1) THE LAW OF LARGE NUMBERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417


(5.2) CUMULATIVE DISTRIBUTION FUNCTIONS . . . . . . . . . . . . . . . . 418
(5.3) SYMBOLIZATION OF FINANCIAL DATA . . . . . . . . . . . . . . . . . . . . . . 418
(5.4) THE XOR TIME SERIES WITH NOISE . . . . . . . . . . . . . . . . . . . . . . 419


(5.5) TRAILING AVERAGES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419


(5.6) MAXIMAL ENTROPY DISTRIBUTION FUNCTION . . . . . . . . . . . . . . 420
(5.7) TWO-CHANNEL MARKOV PROCESS . . . . . . . . . . . . . . . . . . . . . . . . . 420
(5.8) KULLBACK-LEIBLER DIVERGENCE . . . . . . . . . . . . . . . . . . . . . . . . . 422
(5.9) CHI-SQUARED TEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
(5.10) EXCESS ENTROPY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
(5.11) TSALLIS ENTROPY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

(5.1) THE LAW OF LARGE NUMBERS


For the generating function G(x) of a cumulative of stochastically independent variables we have

$$G(x) = \prod_i G_i(x)\,, \qquad G_i(x) = \sum_k p_k^{(i)} x^k\,, \qquad \sigma_i^2 = G_i''(1) + G_i'(1) - \left(G_i'(1)\right)^2$$

and $\mu_i = G_i'(1)$, see (5.7) and (5.9). Then

$$\mu = G'(1) = \sum_j G_j'(1)\prod_{i\ne j} G_i(1) = \sum_j \mu_j\,.$$

The mean of the cumulative process is just the sum of the means of the individual processes. For the evaluation of the variance we need

$$G''(1) = \sum_{l,\,j\ne l} G_j'(1)\,G_l'(1)\prod_{i\ne j,l} G_i(1) + \sum_j G_j''(1)\prod_{i\ne j} G_i(1)$$
$$= \sum_{l,\,j\ne l}\mu_j\mu_l + \sum_j G_j''(1) = \Big(\sum_j \mu_j\Big)^2 - \sum_j\mu_j^2 + \sum_j\left(\sigma_j^2 - \mu_j + \mu_j^2\right) = \mu^2 + \sum_j\sigma_j^2 - \mu\,,$$

Fig. 11.11 The flat distribution, which has a variance of $\sigma^2 = 1/12$, shown together with the probability density of the sum of N = 2 flat distributions. Also shown is the corresponding limiting Gaussian with $\sigma = 1/\sqrt{12N}$, compare (5.11). 100 bins and a sampling of $10^5$ have been used.

where we have used $\sigma_i^2 = G_i''(1) + \mu_i - \mu_i^2$. Then

$$\sigma^2 = G''(1) + \mu - \mu^2 = \sum_j \sigma_j^2\,,$$

which is the law of large numbers: the variance is additive.


(5.2) CUMULATIVE DISTRIBUTION FUNCTIONS
The distribution $p_2(x)$ for the sum of two drawings from a given PDF p(x) is

$$p_2(y) = \int dx_1 dx_2\, \delta(y-x_1-x_2)\,p(x_1)\,p(x_2) = \int_0^1 dx_1 dx_2\, \delta(y-x_1-x_2)\,,$$

where we have assumed that p(x) is the flat distribution for $x\in[0,1]$, which has the variance

$$\sigma^2 = \int_0^1 \left(x-\tfrac{1}{2}\right)^2 dx = \frac{1}{12}\,.$$

Changing to $x_1+x_2$ and $x_1-x_2$ as integration variables one finds that $p_2(y)$ is a triangle, as illustrated in Fig. 11.11.
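Both the variance of the flat distribution and the additivity of variances for the N = 2 sum can be checked by direct sampling; the following Python sketch (sample size and bin choices are arbitrary) is a minimal numerical illustration, not part of the original solution:

```python
import random

random.seed(0)
# sum of two independent drawings from the flat distribution on [0, 1]
samples = [random.random() + random.random() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# the triangle shape: the density peaks at y = 1 and vanishes at the edges
center = sum(0.9 < s < 1.1 for s in samples) / len(samples)
edge = sum(s < 0.2 for s in samples) / len(samples)
```

The sampled variance approaches $2\cdot(1/12) = 1/6$, confirming that the variances of the two drawings add.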
(5.3) SYMBOLIZATION OF FINANCIAL DATA
For the Dow Jones index in the twentieth-century, from 1900 to 1999, the
joint probabilities are

$$p_{+++} = 0.15\,,\quad p_{+-+} = 0.11\,,\quad p_{-++} = 0.13\,,\quad p_{--+} = 0.12$$
$$p_{++-} = 0.13\,,\quad p_{+--} = 0.12\,,\quad p_{-+-} = 0.10\,,\quad p_{---} = 0.11\,,$$

as extracted from historical daily data. The default would be 1/8 = 0.125. Of interest are the ratios

$$p_{++-}/p_{+++} = 0.87\,,\quad p_{+--}/p_{+-+} = 1.09\,,\quad p_{-+-}/p_{-++} = 0.77\,,\quad p_{---}/p_{--+} = 0.92\,.$$

The conditional probabilities $p(+|-+)$ and $p(-|-+)$ of a third-day increase/decrease, given that the Dow Jones index had decreased on the first day and increased on the second day, are evaluated via

$$p(+|-+) + p(-|-+) = 1\,, \qquad \frac{p(+|-+)}{p(-|-+)} = \frac{p_{-++}}{p_{-+-}} = \frac{1}{0.77}\,,$$

yielding $p(+|-+) = 0.57$ and $p(-|-+) = 0.43$. In the twentieth century there was consequently a substantially larger chance of a third-day increase in the Dow Jones index when the index had decreased on the first day and increased on the second day. This kind of information could be used, at least as a matter of principle, for a money-making scheme. But there are many caveats: this analysis contains no information about the size of the prospective third-day increase, the standard deviation of the $p_{\pm\pm\pm}$ may be large, and there may be prolonged periods of untypical statistics. We recommend not to invest real-world money using home-made money-making schemes based on this or related statistical analyses.
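The extraction of conditional probabilities from the joint three-day probabilities can be sketched in a few lines of Python (the dictionary keys encode the signs of the three consecutive days; the helper function is not part of the original solution):

```python
# joint three-day probabilities of the Dow Jones index, 1900-1999
p = {'+++': 0.15, '+-+': 0.11, '-++': 0.13, '--+': 0.12,
     '++-': 0.13, '+--': 0.12, '-+-': 0.10, '---': 0.11}

def conditional(third, first_second):
    """p(third | first, second) from the joint symbol probabilities."""
    num = p[first_second + third]
    den = p[first_second + '+'] + p[first_second + '-']
    return num / den

p_up = conditional('+', '-+')   # 0.13 / 0.23, about 0.57
```

By construction the two conditional probabilities of an increase and of a decrease sum to one.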
(5.4) THE OR TIME SERIES WITH NOISE
Recall that OR(0, 0) = 0, together with OR(0, 1) = OR(1, 0) = OR(1, 1) =
1. For the initial condition σ1 = 0, σ0 = 0, the time series is ..0000000.., for
all other initial conditions the time series is ..1111111..,

$$p(1) = \begin{cases} 1 & \text{typical} \\ 3/4 & \text{average} \end{cases}$$

The time series always returns to the typical sequence,

. . . . 1111101111111110111111111111110000000000000 . . .

for finite but low levels of noise, where we underlined noise-induced


transitions (time runs from right to left). We have therefore p(1) → 1 when
noise is present at low levels. For larger amounts of noise the dynamics
becomes, on the other hand, completely random, with p(1) → 1/2.
(5.5) TRAILING AVERAGES
Using partial integration one can derive local (in time) updating rules,

$$\frac{d}{dt}\bar y(t) = \frac{1}{T}\int_0^\infty d\tau\, y'(t-\tau)\, e^{-\tau/T} = -\frac{1}{T}\int_0^\infty d\tau\, e^{-\tau/T}\,\frac{d}{d\tau}\, y(t-\tau)$$
$$= -\frac{1}{T}\Big[e^{-\tau/T}\, y(t-\tau)\Big]_0^\infty - \frac{1}{T^2}\int_0^\infty d\tau\, e^{-\tau/T}\, y(t-\tau) = \frac{y(t)-\bar y(t)}{T}\,, \tag{11.27}$$

which proves the usual equivalence between an exponential kernel and (11.27). The trailing averages are normalized, which can be seen when specializing to constant inputs,

$$\bar y(t)\Big|_{y(t)\equiv y_0} = y_0\,, \qquad y_c(t)\Big|_{y(t)\equiv y_0} = y_0\,, \qquad y_s(t)\Big|_{y(t)\equiv y_0} = y_0\,,$$

which determines the respective prefactors. The normalized trailing sin-transform is given by

$$y_s(t,\omega) = \frac{1+(\omega T)^2}{\omega T^2}\int_0^\infty d\tau\, y(t-\tau)\, e^{-\tau/T}\sin(\omega\tau)\,.$$

Again, the update rules can be derived using a partial integration. One finds

$$\dot y_c(t,\omega) = \frac{y(t) - y_c(t,\omega)}{T} + (\omega T)^2\, \frac{y(t) - y_s(t,\omega)}{T}\,, \tag{11.28}$$
$$\dot y_s(t,\omega) = \frac{y_c(t,\omega) - y_s(t,\omega)}{T}\,. \tag{11.29}$$
These update rules become numerically unstable for large frequencies, due
to the factor (ωT )2 in (11.28). They are however useful when dealing with
low-pass filters.
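The local updating rule (11.27) is straightforward to discretize; the following Python sketch uses a simple Euler step (step size and input signals are arbitrary illustrative choices, not part of the original solution):

```python
def trailing_average(ys, T, dt=1.0, y0=0.0):
    """Euler discretization of d/dt ybar = (y - ybar)/T, see (11.27)."""
    ybar = y0
    out = []
    for y in ys:
        ybar += (y - ybar) * dt / T
        out.append(ybar)
    return out

# a constant input is reproduced exactly, confirming the normalization
const = trailing_average([2.0] * 50, T=10.0, y0=2.0)
# a step input relaxes exponentially towards the new value
step = trailing_average([1.0] * 300, T=10.0, y0=0.0)
```

The constant-input case mirrors the normalization argument above, while the step response exhibits the exponential relaxation with time constant T.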
(5.6) MAXIMAL ENTROPY DISTRIBUTION FUNCTION
The two conditions (5.39), of a given mean μ and a given variance $\sigma^2$, can be enforced by two Lagrange parameters $\lambda_1$ and $\lambda_2$ respectively, via

$$f(p) = -p\log(p) - \lambda_1 x\, p - \lambda_2 (x-\mu)^2\, p\,.$$

The stationarity condition (5.35), $f'(p) = 0$, leads to

$$-\log(p) - 1 - \lambda_1 x - \lambda_2(x-\mu)^2 = 0\,, \qquad \log(p) = \log(\mathrm{const.}) - \lambda_2 (x-\tilde\mu)^2\,,$$

where $\tilde\mu = \mu - \lambda_1/(2\lambda_2)$. Therefore

$$p(x) \propto 2^{-\lambda_2(x-\tilde\mu)^2} \sim e^{-\frac{(x-\tilde\mu)^2}{2\sigma^2}}\,,$$

with $\sigma^2 = 1/(2\lambda_2\log_e(2))$, which determines the Lagrange parameter $\lambda_2$. The first condition in (5.39) demands the mean to equal μ, viz $\tilde\mu \equiv \mu$ and consequently $\lambda_1 = 0$.
(5.7) TWO-CHANNEL MARKOV PROCESS
We remind ourselves that OR(0, 1) = OR(1, 0) = OR(1, 1) = 1, together with OR(0, 0) = 0, and that AND(0, 1) = AND(1, 0) = AND(0, 0) = 0, with AND(1, 1) = 1. We consider first α = 0 (no noise), and the four initial conditions (underlined),

    . . . σ_{t+1}σ_t :   ...000000   ...111111
    . . . τ_{t+1}τ_t :   ...000000   ...111111

and

    . . . σ_{t+1}σ_t :   ...000000   ...000001
    . . . τ_{t+1}τ_t :   ...111111   ...111110

In consequence, one finds three types of stationary states. The master equations for the respective joint probabilities are

$$p_{t+1}(0,0) = (1-\alpha)\,p_t(0,0) + \alpha\left[p_t(1,0) + p_t(0,1)\right], \tag{11.30}$$
$$p_{t+1}(1,1) = (1-\alpha)\,p_t(1,1)\,, \tag{11.31}$$

and

$$p_{t+1}(1,0) = \alpha\, p_t(1,1)\,, \tag{11.32}$$
$$p_{t+1}(0,1) = (1-\alpha)\left[p_t(1,0) + p_t(0,1)\right] + \alpha\, p_t(0,0)\,. \tag{11.33}$$

The stationary condition is $p_{t+1} = p_t \equiv p$. From (11.31) and (11.32) it then follows, for 0 < α < 1,

$$p(1,1) = 0\,, \qquad p(1,0) = 0\,.$$

Using this result we obtain

$$p(0,0) = p(0,1) \equiv \frac{1}{2}$$

from (11.30) and (11.33), where we normalized the result in the last step. This finding is independent of the noise level α, for any α ≠ 0, 1. The marginal distribution functions are

$$p_\sigma(0) = 1\,,\quad p_\sigma(1) = 0\,, \qquad p_\tau(0) = 1/2\,,\quad p_\tau(1) = 1/2\,.$$

Lastly, we find

$$H[p] = 1\,, \qquad H[p_\sigma] = 0\,, \qquad H[p_\tau] = 1$$

for the respective entropies. The mutual information $I(\sigma,\tau) = H[p_\sigma] + H[p_\tau] - H[p]$ vanishes since the σ-channel becomes deterministic for any finite noise level α.
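The master equations (11.30)-(11.33) can be iterated directly; the following Python sketch (the value α = 0.3 and the uniform initial condition are arbitrary choices) confirms the stationary state and the entropy H[p] = 1 bit:

```python
import math

def iterate(alpha, steps=2000):
    """Iterate the master equations (11.30)-(11.33) for the joint
    probabilities p(sigma, tau) of the two-channel process."""
    p00, p01, p10, p11 = 0.25, 0.25, 0.25, 0.25
    for _ in range(steps):
        p00, p11, p10, p01 = (
            (1 - alpha) * p00 + alpha * (p10 + p01),
            (1 - alpha) * p11,
            alpha * p11,
            (1 - alpha) * (p10 + p01) + alpha * p00,
        )
    return p00, p01, p10, p11

p00, p01, p10, p11 = iterate(0.3)
H = -sum(x * math.log2(x) for x in (p00, p01, p10, p11) if x > 0)
```

For any 0 < α < 1 the iteration converges to p(0,0) = p(0,1) = 1/2, with p(1,1) and p(1,0) decaying to zero, in accordance with the analysis above.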

(5.8) KULLBACK-LEIBLER DIVERGENCE


The Kullback-Leibler divergence for the two normalized PDFs

$$p(x) = e^{-(x-1)}\,, \qquad q(x) = (\gamma-1)\,x^{-\gamma}\,, \qquad x, \gamma > 1$$

is

$$K[p;q] = \int_1^\infty p(x)\log_2\left(\frac{p(x)}{q(x)}\right) dx = -H[p] - \int_1^\infty e^{-(x-1)}\log_2\left((\gamma-1)\,x^{-\gamma}\right) dx$$
$$= -H[p] - \log_2(\gamma-1) + \gamma\int_1^\infty e^{-(x-1)}\log_2(x)\, dx\,,$$

which is stationary for

$$\frac{\partial K[p;q]}{\partial\gamma} = 0\,,$$

viz

$$\frac{1}{\gamma-1} = \ln(2)\int_1^\infty e^{-(x-1)}\log_2(x)\, dx = \int_1^\infty e^{-(x-1)}\ln(x)\, dx\,,$$

where we have used

$$\log_2(x) = \ln(x)/\ln(2)\,, \qquad \frac{d}{dx}\log_2(x) = \frac{1}{\ln(2)}\,\frac{d}{dx}\ln(x) = \frac{1}{\ln(2)}\,\frac{1}{x}\,.$$

Numerically one finds

$$\int_1^\infty e^{-(x-1)}\ln(x)\, dx \approx 0.596\,, \qquad \gamma = \frac{1}{0.596} + 1 \approx 2.678\,.$$

The graphs intersect twice, with q(x) > p(x) both for x = 1 and x → ∞.
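The integral determining the optimal γ is easily evaluated numerically; a minimal sketch using a composite Simpson rule (the truncation point x = 60 and the number of subintervals are arbitrary accuracy choices) reads:

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# the exponential tail beyond x = 60 is smaller than e^(-59), hence negligible
integral = simpson(lambda x: math.exp(-(x - 1)) * math.log(x), 1.0, 60.0)
gamma = 1.0 / integral + 1.0
```

The result reproduces the values quoted above, integral ≈ 0.596 and γ ≈ 2.678.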
(5.9) CHI-SQUARED TEST
We rewrite the Kullback-Leibler divergence, see (5.2.4), as

$$K[p;q] = \sum_i p_i\log\left(\frac{p_i}{q_i}\right) = -\sum_i p_i\log\left(\frac{q_i}{p_i}\right) = -\sum_i p_i\log\left(\frac{q_i - p_i + p_i}{p_i}\right) = -\sum_i p_i\log\left(1 + \frac{q_i-p_i}{p_i}\right)$$
$$\approx -\sum_i (q_i - p_i) + \sum_i\frac{(q_i-p_i)^2}{2p_i} = \sum_i\frac{(p_i-q_i)^2}{2p_i} \equiv \frac{\chi^2[p;q]}{2}\,,$$

where we used the Taylor expansion $\log(1+x)\approx x - x^2/2$ together with the normalization conditions $\sum_i p_i = 1 = \sum_i q_i$. Note the relation to the Fisher information, see (5.59) and (5.61).
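The quadratic approximation $K[p;q] \approx \chi^2/2$ can be checked numerically for two close-by distributions; the specific values below are arbitrary illustrative choices, not part of the original solution:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence (natural logarithm)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def chi2_half(p, q):
    """Half the chi-squared distance, sum (p_i - q_i)^2 / (2 p_i)."""
    return sum((pi - qi) ** 2 / (2 * pi) for pi, qi in zip(p, q))

p = [0.3, 0.3, 0.4]
q = [0.31, 0.29, 0.40]   # a small perturbation of p
```

For such nearby distributions the two measures agree to within higher-order corrections in the differences $q_i - p_i$.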
(5.10) EXCESS ENTROPY
We show that $E_n$,

$$E_n \approx H[p_n] - n\left(H[p_{n+1}] - H[p_n]\right),$$

is monotonically increasing with n as long as $H[p_n]$ is concave,

$$E_n - E_{n-1} = H[p_n] - n\left(H[p_{n+1}] - H[p_n]\right) - H[p_{n-1}] + (n-1)\left(H[p_n] - H[p_{n-1}]\right)$$
$$= n\left(2H[p_n] - H[p_{n+1}] - H[p_{n-1}]\right) = 2n\left(H[p_n] - \frac{1}{2}\left(H[p_{n+1}] + H[p_{n-1}]\right)\right),$$

where the last term in the brackets is positive as long as $H[p_n]$ is concave as a function of n, compare Fig. 5.4.
(5.11) TSALLIS ENTROPY
For small 1 − q we may expand the Tsallis entropy $H_q[p]$, as defined by (5.70), as

$$H_q[p] = \frac{1}{1-q}\sum_k p_k\left(p_k^{q-1} - 1\right) = \frac{1}{1-q}\sum_k p_k\left(e^{(q-1)\ln(p_k)} - 1\right) \approx -\sum_k p_k\ln(p_k) \equiv H[p]\,.$$

With $p_k\in[0,1]$, $q\in\,]0,1]$ and $q-1\le 0$ we have

$$p_k^{q-1} \ge 1\,, \qquad H_q[p] = \frac{1}{1-q}\sum_k p_k\left(p_k^{q-1} - 1\right) \ge 0\,.$$

The joint distribution function p for two statistically independent systems is just the product of the individual distributions, $p = p_Xp_Y$, which leads to

$$\sum_{x_i,y_j}\left(p_X(x_i)\,p_Y(y_j)\right)^q = \sum_{x_i} p^q(x_i)\,\sum_{y_j} p^q(y_j) = \left((1-q)H_q[p_X]+1\right)\left((1-q)H_q[p_Y]+1\right),$$

where a substitution equivalent to $\sum_k p_k = 1$ was used for $H_q[p_X]$ and $H_q[p_Y]$. Hence we have

$$H_q[p] = \frac{1}{1-q}\left(\sum_k p_k^q - 1\right) = H_q[p_X] + H_q[p_Y] + (1-q)\,H_q[p_X]\,H_q[p_Y]\,,$$

which shows that the Tsallis entropy is extensive only in the limit q → 1. The Tsallis entropy is maximal for an equiprobability distribution, which follows directly from the general formulas (5.34) and (5.36).
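Both the non-extensivity relation and the q → 1 limit can be verified numerically; the distributions and the value of q in the following Python sketch are arbitrary illustrative choices:

```python
import math

def tsallis(p, q):
    """Tsallis entropy H_q[p] = (sum_k p_k^q - 1)/(1 - q), see (5.70)."""
    return (sum(pk ** q for pk in p) - 1.0) / (1.0 - q)

px = [0.2, 0.8]
py = [0.5, 0.3, 0.2]
q = 0.7
# joint distribution of the two statistically independent systems
pxy = [a * b for a in px for b in py]

lhs = tsallis(pxy, q)
rhs = tsallis(px, q) + tsallis(py, q) + (1 - q) * tsallis(px, q) * tsallis(py, q)

# for q close to one the Shannon entropy (in nats) is recovered
shannon = -sum(pk * math.log(pk) for pk in px)
```

The left- and right-hand sides of the composition rule agree to machine precision, and $H_q[p_X]$ approaches the Shannon entropy as q → 1.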

11.6 Solutions to the Exercises of Chap. 6

(6.1) SOLUTIONS OF THE LANDAU–GINZBURG FUNCTIONAL . . . . . . . 424


(6.2) THERMODYNAMICS OF THE LANDAU-GINZBURG MODEL . . . . . . 425
(6.3) THE GAME OF LIFE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
(6.4) GAME OF LIFE ON A SMALL-WORLD NETWORK . . . . . . . . . . . . . . 425
(6.5) FOREST FIRE MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
(6.6) REALISTIC SANDPILE MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
(6.7) RECURSION RELATION FOR AVALANCHE SIZES . . . . . . . . . . . . . . . 428
(6.8) RANDOM BRANCHING MODEL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
(6.9) GALTON–WATSON PROCESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

(6.1) SOLUTIONS OF THE LANDAU–GINZBURG FUNCTIONAL


We want that the cubic equation (6.5) factorizes,

0 = P (φ) − h = φ 3 + (t − 1)φ − h ≡ (φ − φ3 )(φ − φ2 )(φ − φ1 ) .


.
11.6 Solutions to the Exercises of Chap. 6 425

This is the case, e.g., selecting

−1 −3 3 3
φ3 = 1,
. φ2 = , φ1 = , t= , h= .
4 4 16 16

Local stability is determined by the derivative P (φ), as illustrated in Fig. 6.2.


An example is

d  
P (φ2 ) =
. P (φ) − h φ = (φ2 − φ3 )(φ2 − φ1 ) < 0 ,
dφ 2

which implies that φ2 is an unstable fixpoint. In analogy, φ1 and φ3 are locally


stable.
(6.2) THERMODYNAMICS OF THE LANDAU-GINZBURG MODEL
In the ordered state, the free energy density is given by

$$f(T,\phi,h) - f_0(T,h) = \frac{t-1}{2}\phi^2 + \frac{1}{4}\phi^4 = -\frac{(t-1)^2}{4}\,,$$

where we used $\phi^2 = 1-t$. It follows that

$$\frac{\partial F}{\partial t} = -V\,\frac{t-1}{2} \qquad\Rightarrow\qquad S = -\frac{\partial F}{\partial T} = \frac{V}{T_c}\,\frac{t-1}{2}$$

for the entropy S, with $t = T/T_c$; the entropy of the ordered state is reduced. The specific heat $C_V$ is

$$C_V = T_c\,\frac{\partial S}{\partial T} = \frac{V}{2T_c}\,, \qquad T < T_c\,.$$

For $T > T_c$ this contribution to the specific heat vanishes; there is a jump at $T = T_c$.


(6.3) THE GAME OF LIFE
The solutions have already been given in Fig. 6.5, apart from the cross
{(0,0), (0,1),(1,0),(−1,0),(0,−1)}. For an illustration of its development see
Fig. 11.12.
(6.4) GAME OF LIFE ON A SMALL-WORLD NETWORK
The construction of a small-world net with conserved local connectivities $k_i \equiv 8$ is shown and explained in Fig. 11.13. An appropriate dynamical order parameter would be the density ρ(t) of live cells at time t.2
(6.5) FOREST FIRE MODEL
We define by xt , xf and xe the densities of cells with trees, fires and ashes
(empty), with xt + xf + xe = 1. A site burns if there is at least one fire on
one of the Z nearest-neighbor cells. The probability that none of Z cells is

2 This problem has been surveyed in detail by Huang et al. (2003).




Fig. 11.12 Evolution of the pattern “cross” in the game of life: after seven steps it gets stuck in a
fixed state with four blinkers

burning is $(1-x_f)^Z$, the probability that at least one out of Z is burning is $1 - (1-x_f)^Z$. We have then the update rules

$$x_f(t+1) = \left[1 - (1-x_f(t))^Z\right] x_t(t)\,, \qquad x_e(t+1) = x_f(t) - p\,x_e(t)\,,$$

where p is the probability for a sapling to grow. Compare Fig. 6.6. The stationary densities $x_e(t+1) = x_e(t) \equiv x_e^*$, etc., obey

$$(1+p)\,x_e^* = x_f^*\,, \qquad 1 = x_f^* + x_t^* + x_f^*/(1+p)\,, \qquad x_t^* = 1 - \frac{2+p}{1+p}\,x_f^*\,,$$

which leads to a self-consistency condition for the stationary density $x_f^*$ of fires,

$$x_f^* = \left[1 - (1-x_f^*)^Z\right]\left(1 - \frac{2+p}{1+p}\,x_f^*\right). \tag{11.34}$$

In general one needs to solve (11.34) numerically. For small densities of fires we expand

$$1 - (1-x_f^*)^Z \approx 1 - \left(1 - Zx_f^* + \frac{Z(Z-1)}{2}(x_f^*)^2\right) = Zx_f^* - \frac{Z(Z-1)}{2}(x_f^*)^2\,,$$

Fig. 11.13 Construction of a small-world network out of the game of life on a 2D-lattice: One
starts with a regular arrangement of vertices where each one is connected to its eight nearest
neighbors. Two arbitrarily chosen links (wiggled lines) are cut with probability p and the remaining
stubs are rewired randomly as indicated (dashed arrows). The result is a structure showing
clustering as well as a fair amount of shortcuts between far away sites, as in the Watts and Strogatz
model, Fig. 1.14, but with conserved connectivities ki ≡ 8

finding

$$\frac{1}{Z} = \left(1 - \frac{Z-1}{2}\,x_f^*\right)\left(1 - \frac{2+p}{1+p}\,x_f^*\right) \approx 1 - \left(\frac{Z-1}{2} + \frac{2+p}{1+p}\right)x_f^*$$

for (11.34). The minimal number of neighbors for fires to burn continuously is Z > 1 in mean-field theory.
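The self-consistency condition (11.34) is solved readily by bisection; the parameter values Z = 4 and p = 0.3 in the following Python sketch are arbitrary illustrative choices:

```python
def stationary_fire_density(Z, p, tol=1e-12):
    """Solve the self-consistency condition (11.34) by bisection:
    x_f = [1 - (1 - x_f)^Z] * (1 - (2 + p)/(1 + p) * x_f)."""
    g = lambda x: x - (1 - (1 - x) ** Z) * (1 - (2 + p) / (1 + p) * x)
    lo, hi = 1e-9, (1 + p) / (2 + p)   # g(lo) < 0 < g(hi) for Z > 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xf = stationary_fire_density(Z=4, p=0.3)
```

The bracket follows from the small-$x_f$ expansion above: for Z > 1 the right-hand side of (11.34) exceeds $x_f$ near zero, so a non-trivial root exists.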
(6.6) REALISTIC SANDPILE MODEL
The variable zi should denote the true local height of a sandpile; the toppling
starts when the slope becomes too big after adding grains of sand randomly,
i.e. when the difference zi − zj between two neighboring cells exceeds a
certain threshold K. Site i then topples in the following fashion:

– Look at the neighbor j of site i for which zi − zj is biggest (and positive)


and transfer one grain of sand from i to j ,

$$z_i \to z_i - 1\,, \qquad z_j \to z_j + 1\,. \tag{11.35}$$

If more than one neighbor satisfies the criterion, select one randomly.
– Repeat the first step as long as there is at least a single neighbor j of i
satisfying the condition zi ≥ zj + 1.

The toppling process mimics a local instability, being at the same time
conserving. Sand is lost only at the boundaries. This model leads to true
sandpiles, in the sense that it is highest in the center and lowest at the

Fig. 11.14 Example of a simulation of a one-dimensional realistic sandpile model, see (11.35), with 60 cells, after 500 (left) and 2000 (right) time steps

boundaries, compare Fig. 11.14. Note that there is no upper limit to zi , only to
the slope |zi − zj |.
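A minimal one-dimensional version of this model can be sketched as follows. The sketch simplifies the toppling rule slightly: a grain is transferred to any neighbor violating the slope condition, rather than to the steepest one, and virtual boundary sites of height zero absorb off-lattice sand; lattice size, grain number and threshold are arbitrary choices:

```python
import random

def realistic_sandpile(n_sites=40, grains=800, K=2, seed=7):
    """Drop grains at random sites; whenever a local slope exceeds K,
    move one grain downhill, compare (11.35)."""
    random.seed(seed)
    z = [0] * n_sites
    height = lambda j: z[j] if 0 <= j < n_sites else 0  # absorbing boundary
    for _ in range(grains):
        z[random.randrange(n_sites)] += 1
        unstable = True
        while unstable:
            unstable = False
            for i in range(n_sites):
                for j in (i - 1, i + 1):
                    if z[i] - height(j) > K:
                        z[i] -= 1
                        if 0 <= j < n_sites:
                            z[j] += 1
                        unstable = True
    return z

z = realistic_sandpile()
```

After relaxation all local slopes respect the threshold K, and the resulting profile is a true sandpile in the sense discussed above, highest in the interior and lowest at the boundaries.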
(6.7) RECURSION RELATION FOR AVALANCHE SIZES
Substituting $f_n(x) = \sum_s P_n(s)\,x^s$ (where we dropped the functional dependence on the branching probability p), as defined by (6.17), into the recursion relation (6.19) for the generating functional $f_n(x)$, one obtains

$$\sum_s P_{n+1}(s)\, x^s = x\left[(1-p) + p\Big(\sum_s P_n(s)\,x^s\Big)^2\right]. \tag{11.36}$$

The recursion relation for the probability $P_n(s)$ of finding an avalanche of size s is obtained by Taylor-expanding the right-hand side of (11.36) in powers of $x^s$, and comparing prefactors,

$$x^0:\quad P_{n+1}(0) = 0\,, \qquad x^1:\quad P_{n+1}(1) = (1-p) + p\left(P_n(0)\right)^2 = 1-p\,,$$

for s = 0, 1. There is at least one site, hence $P_n(0) = 0$. We find

$$P_{n+1}(s) = p\sum_{s'} P_n(s')\,P_n(s - s' - 1)\,, \qquad s > 1 \tag{11.37}$$

for general avalanche sizes. This expression is easy to evaluate numerically by computing all $P_n(s)$ with increasing n; the numerical effort increases like $O(s^2)$.

For a general branching process, described by the probability $p_m$ of having m offspring, the recursion relation (6.19) takes the form

$$f_{n+1}(x) = x\sum_m p_m\left(f_n(x)\right)^m. \tag{11.38}$$

Also in this case a recursion relation, in analogy to (11.37), could be derived, involving however multiple summations, with the computational effort scaling like $O(s^M)$, where M is the maximal number of descendants. One has $p_m = 0$ for m > M.
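The recursion (11.37) can be iterated directly; the following Python sketch (cutoff size and number of generations are arbitrary accuracy choices) reproduces the exact small-s probabilities of the binary branching process:

```python
def avalanche_distribution(p, s_max, n_gen):
    """Iterate the recursion (11.37) for P_n(s), the probability of
    an avalanche of size s in a binary branching process."""
    P = [0.0] * (s_max + 1)
    P[1] = 1.0 - p                  # P_n(1) = 1 - p for all n
    for _ in range(n_gen):
        new = [0.0] * (s_max + 1)
        new[1] = 1.0 - p
        for s in range(2, s_max + 1):
            new[s] = p * sum(P[sp] * P[s - sp - 1] for sp in range(s))
        P = new
    return P

P = avalanche_distribution(p=0.5, s_max=21, n_gen=40)
```

Only odd avalanche sizes occur for binary branching, and one finds, e.g., $P(3) = p(1-p)^2$ and $P(7) = 5p^3(1-p)^4$, in agreement with the direct counting of branching trees.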
(6.8) RANDOM BRANCHING MODEL
For the probability $\tilde Q_n = \sum_{n'=0}^{n} Q_{n'}(0,p)$ for the avalanche to last $1\ldots n$ time steps, the recursion ansatz

$$\tilde Q_{n+1} = (1-p) + p\,\tilde Q_n^2 \tag{11.39}$$

is valid, in analogy to the recursion relation (6.19) for the generating function of avalanche sizes. The case here is simpler, as one can work directly with probabilities: the probability $\tilde Q_{n+1}$ to find an avalanche of duration $1\ldots(n+1)$ is the probability $(1-p)$ to find an avalanche of length 1, plus the probability $p\,\tilde Q_n^2$ to branch in the first time step, generating two avalanches of length $1\ldots n$.

In the thermodynamic limit we can replace the difference $\tilde Q_{n+1} - \tilde Q_n$ by the derivative $\frac{d}{dn}\tilde Q_n$, leading to the differential equation

$$\frac{d\tilde Q_n}{dn} = \frac{1}{2} + \frac{1}{2}\tilde Q_n^2 - \tilde Q_n\,, \qquad \text{for } p = p_c = \frac{1}{2}\,, \tag{11.40}$$

which can easily be solved by separation of variables. The derivative of the solution $\tilde Q_n$ with respect to n is the probability of an avalanche to have a duration of exactly n steps,

$$\tilde Q_n = \frac{n}{n+2}\,, \qquad D(t=n) = \frac{d\tilde Q_n}{dn} = \frac{2}{(n+2)^2} \sim n^{-2}\,. \tag{11.41}$$

Check that (11.41) really solves (11.40), with the initial condition $\tilde Q_0 = 0$.
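The agreement between the discrete recursion (11.39) and the continuum solution (11.41) can be checked by direct iteration; the number of iterations in the following sketch is an arbitrary choice:

```python
def duration_cdf(n_max, p=0.5):
    """Iterate (11.39): Qtilde_{n+1} = (1 - p) + p * Qtilde_n^2,
    starting from Qtilde_0 = 0."""
    Q = [0.0]
    for _ in range(n_max):
        Q.append((1 - p) + p * Q[-1] ** 2)
    return Q

Q = duration_cdf(2000)
# continuum solution (11.41): Qtilde_n = n/(n+2), i.e. 1 - Qtilde_n ~ 2/n
tail = (1 - Q[2000]) * 2000
```

At criticality the survival probability $1-\tilde Q_n$ indeed decays like 2/n, up to subleading corrections of the discrete map.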
(6.9) GALTON–WATSON PROCESS
The generating functions $G_0(x)$ are generically qualitatively similar to the ones shown in Fig. 6.12, due to the normalization condition G(1) = 1. The fixpoint condition G(q) = q, Eq. (6.26), has a solution for q < 1 whenever

$$\frac{dG(x)}{dx}\bigg|_{x=1} > 1\,, \qquad W = G'(1) > 1\,,$$

since G′(1) is the expectation value of the distribution function, which is the mean number of offspring W.

11.7 Solutions to the Exercises of Chap. 7

(7.1) K = 1 KAUFFMAN NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430


(7.2) N = 4 KAUFFMAN NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
(7.3) SYNCHRONOUS VS. ASYNCHRONOUS UPDATING . . . . . . . . . . . . . . 430
(7.4) LOOPS AND ATTRACTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
(7.5) RELEVANT NODES AND DYNAMIC CORE . . . . . . . . . . . . . . . . . . . . . 431
(7.6) THE HUEPE AND ALDANA NETWORK . . . . . . . . . . . . . . . . . . . . . . . . 432
(7.7) LOWER BOUND FOR BOND PERCOLATION . . . . . . . . . . . . . . . . . . . . 433

(7.1) K = 1 KAUFFMAN NET


The solutions are illustrated in Fig. 11.15.
(7.2) N = 4 KAUFFMAN NET
The attractors of the N = 4 graph illustrated in Fig. 7.1 are given in Fig. 11.15.
(7.3) SYNCHRONOUS VS. ASYNCHRONOUS UPDATING
Asynchronous updating of the three-site network shown in Fig. 7.3 leads to
the attractors illustrated in Fig. 11.16.
(7.4) LOOPS AND ATTRACTORS
The attractors are defined by the set of linkage loops. There are two loops; the first, C1, writes out to

ABC → CAB̄ → B̄CĀ → ĀB̄C̄ → C̄ĀB → BC̄A → ABC ,

and the second, C2, to

DE → ĒD̄ → DE .

In its general form the attractor cycle is

.ABCDE → CAB̄ Ē D̄ → B̄C ĀDE → ĀB̄ C̄ Ē D̄


→ C̄ ĀBDE → B C̄AĒ D̄ → ABCDE .

Fig. 11.15 Left: Solution of three K = 1, N = 3 boolean networks with a cyclic linkage tree
σ1 = f1 (σ2 ), σ2 = f2 (σ3 ), and σ3 = f3 (σ1 ). (a) All coupling functions are the identity. (b) All
coupling functions are the negation. (c) f1 and f2 = negation, f3 identity. Right: Solution for
the N = 4 Kauffman nets shown in Fig. 7.1, σ1 = f (σ2 , σ3 , σ4 ), σ2 = f (σ1 , σ3 ), σ3 = f (σ2 ),
σ4 = f (σ3 ), with all coupling functions f (. . .) being the generalized XOR functions, which count
the parity of the controlling elements

Depending on the initial state, the loops C1 and C2 lead to two cycles each. Cycles L1, L2 emerge from C1, and L3, L4 from C2,

L1 :
. 000 → 001 → 101 → 111 → 110 → 010 → 000
L2 : 100 → 011 → 100
L3 : 00 → 11 → 00
L4 : 01 → 10 → 01
   
The possible combinations of {L1, L2} and {L3, L4} construct all possible attractor cycles. This leads to two attractor cycles of length two and two attractor cycles of length six,

.L1 L3 : 00000 → 00111 → 10100 → 11111 → 11000 → 01011 → 00000


L1 L4 : 00001 → 00110 → 10101 → 11110 → 11001 → 01010 → 00001
L2 L3 : 10000 → 01111 → 10000
L2 L4 : 10001 → 01110 → 10001

(7.5) RELEVANT NODES AND DYNAMIC CORE


The attractors of the network illustrated in Fig. 7.3 are (000), (111) and
(010) ↔ (001). None of the elements σi, i = 1, 2, 3, has the same value in all three attractors; there are no constant nodes.

Fig. 11.16 Solution of the N = 3, Z = 2 network defined in Fig. 7.3, when using sequential
asynchronous updating. The cycles completely change in comparison to the case of synchronous
updating shown in Fig. 7.3

The dynamics obtained when substituting the AND by an XOR is shown


in Fig. 11.17. One obtains two fixpoints, (000) and (011). The first element, σ1, is hence constant, with the dynamic core being made up of σ2 and σ3.
(7.6) THE HUEPE AND ALDANA NETWORK
As the exact solution can be found in the paper Huepe and Aldana-González
(2002), we confine ourselves to some hints. You should start with the fraction
of elements φN (t) with +1 at time t, which reduces to the probability φ(t) for
$\sigma_i = +1$ in the N → ∞ limit. You will then find that

$$s(t) = 2\phi(t) - 1\,.$$

Afterwards one has to consider the probability I(t) for the output function to be positive, which gives us the recursion equation

$$\phi(t+1) = I(t)\,(1-\eta) + \left(1 - I(t)\right)\eta\,.$$

The relation between I(t) and φ(t) is still unknown but can be calculated via

$$I(t) = \int_0^\infty P_{\xi(t)}(x)\, dx\,,$$

with $P_{\xi(t)}$ being the probability density function of the sum $\xi(t) = \sum_{j=1}^K \sigma_{j_i}(t)$, which can be represented as the K-fold convolution of $P_{\sigma(t)}$, viz as a product in Fourier space,

$$\hat P_{\xi(t)} = \left(\hat P_{\sigma(t)}\right)^K.$$

For the probability density of σ(t) the proper ansatz is

$$P_{\sigma(t)} = \phi(t)\,\delta(x-1) + \left[1-\phi(t)\right]\delta(x+1)\,.$$

After some calculus you should finally obtain the recursion relation for s(t)
and find both its fixed points and the critical value ηc .


Fig. 11.17 Solution for the 3-site network shown in Fig. 7.3, with the AND function substituted
by an XOR

(7.7) LOWER BOUND FOR BOND PERCOLATION


An upper bound for the number $\sigma_n(d)$ of paths of length n on a d-dimensional hypercubic lattice is

$$\sigma_n(d) < 2d\,(2d-1)^{n-1}\,, \qquad \lambda(d) = \lim_{n\to\infty}\left(\sigma_n(d)\right)^{1/n} < 2d-1\,,$$

as there are 2d possibilities for the first step and 2d − 1 for each subsequent step. Self-retracing paths do however occur, and not all of the paths constructed in this way are hence distinct.

An exponentially small (as a function of n) fraction of the $\sigma_n(d)$ paths of length n starting from a given site will be covered with bonds if the bond covering probability p is lower than $1/\lambda(d)$ (because then $p\,\lambda(d) < 1$); the percolation transition hence occurs at $p_c(d) = 1/\lambda(d)$. We hence find the lower bound

$$p_c(d) = \frac{1}{\lambda(d)} > \frac{1}{2d-1}\,. \tag{11.42}$$

11.8 Solutions to the Exercises of Chap. 8

(8.1) THE ONE-DIMENSIONAL ISING MODEL . . . . . . . . . . . . . . . . . . . . . . . 434


(8.2) ERROR CATASTROPHE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
(8.3) COMPETITION FOR RESOURCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
(8.4) COMPETITIVE POPULATION DYNAMICS . . . . . . . . . . . . . . . . . . . . . . . 436
(8.5) HYPERCYCLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
(8.6) PRISONER’S DILEMMA ON A LATTICE . . . . . . . . . . . . . . . . . . . . . . . . 437
(8.7) NASH EQUILIBRIUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438


(8.8) WAR OF ATTRITION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438


(8.9) TRAGEDY OF THE COMMONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439

(8.1) THE ONE-DIMENSIONAL ISING MODEL


For the one-dimensional Ising system the energy is

$$E = -J\sum_i S_iS_{i+1} - B\sum_i S_i\,, \tag{11.43}$$

with $S_i = \pm1$. The partition function

$$Z_N = Z_N(T,B) = \sum_n e^{-\beta E_n} = \sum_{S_1}\cdots\sum_{S_N} T_{S_1,S_2}\,T_{S_2,S_3}\cdots T_{S_N,S_1}$$

can be evaluated with the help of the 2 × 2 transfer matrix

$$T = \begin{pmatrix} e^{\beta(J+B)} & e^{-\beta J}\\ e^{-\beta J} & e^{\beta(J-B)}\end{pmatrix}. \tag{11.44}$$

It has the eigenvalues

$$\lambda_{1,2} = e^{\beta J}\cosh\beta B \pm \sqrt{e^{2\beta J}\cosh^2\beta B - 2\sinh 2\beta J}\,,$$

leading to

$$Z_N(T,B) = \mathrm{Tr}\,T^N = (\lambda_1)^N + (\lambda_2)^N \approx (\lambda_1)^N,$$

for large N and $\lambda_1 > \lambda_2$. The free energy per particle is given by

$$\frac{F(T,B)}{N} = \frac{-kT}{N}\ln Z_N(T,B)\,,$$

and the magnetization per particle by

$$\frac{M(T,B)}{N} = -\frac{1}{N}\frac{\partial F(T,B)}{\partial B} = \left(1 + \frac{e^{-4\beta J}}{\sinh^2\beta B}\right)^{-1/2}.$$
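The closed-form results can be cross-checked numerically; the following Python sketch (the parameter values β = 1, J = 0.5, B = 0.2 are arbitrary) compares the transfer-matrix eigenvalue with its closed form and verifies the magnetization against a numerical derivative of ln λ₁:

```python
import math

def lambda_max(beta, J, B):
    """Largest transfer-matrix eigenvalue, closed form."""
    c = math.exp(beta * J) * math.cosh(beta * B)
    return c + math.sqrt(c * c - 2.0 * math.sinh(2.0 * beta * J))

def magnetization(beta, J, B):
    """Closed-form magnetization per particle."""
    return (1.0 + math.exp(-4.0 * beta * J) / math.sinh(beta * B) ** 2) ** -0.5

beta, J, B = 1.0, 0.5, 0.2

# eigenvalue directly from the 2x2 matrix (11.44)
t00, t01 = math.exp(beta * (J + B)), math.exp(-beta * J)
t11 = math.exp(beta * (J - B))
tr2, det = (t00 + t11) / 2, t00 * t11 - t01 * t01
lam_matrix = tr2 + math.sqrt(tr2 ** 2 - det)

# m = (1/beta) d ln(lambda_1)/dB, via a central difference
h = 1e-6
m_numeric = (math.log(lambda_max(beta, J, B + h))
             - math.log(lambda_max(beta, J, B - h))) / (2 * h * beta)
```

Both checks succeed to high precision, confirming the derivation above.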

(8.2) ERROR CATASTROPHE


Case $u_- = 0$, $u_+ = u$: The fixpoint conditions read

$$0 = (1-\sigma)x_i + u\,x_{i-1} - \phi\,x_i\,, \quad i > 1\,, \qquad 0 = x_1 - \phi\,x_1\,,$$

where the $x_i$ are the respective concentrations. Hence we can immediately write down the N × N reproduction rate matrix W:

$$W = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots\\ u & (1-\sigma) & 0 & 0 & \cdots\\ 0 & u & (1-\sigma) & 0 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix},$$

whose diagonal elements obviously represent the eigenvalues. The largest eigenvalue is $1 = W_{11}$, with the corresponding eigenvector

$$e_1 = \frac{1}{\sqrt{N}}\left(1,\, \frac{u}{\sigma},\, \Big(\frac{u}{\sigma}\Big)^2, \cdots, \Big(\frac{u}{\sigma}\Big)^{N-1}\right).$$

This eigenvector is normalizable only for u < σ, viz u = σ is the error threshold.
Case $u_- = u_+ = u$: The fixpoint conditions are

$$0 = (1-\sigma)x_i + u\,x_{i-1} + u\,x_{i+1} - \phi\,x_i\,, \quad i > 1\,, \qquad 0 = x_1 + u\,x_2 - \phi\,x_1\,.$$

The first equation is equivalent to

$$x_{i+1} = \frac{\phi+\sigma-1}{u}\,x_i - x_{i-1}\,, \qquad i > 1\,, \tag{11.45}$$

which can be cast into the 2 × 2 recursion form

$$\begin{pmatrix} x_i \\ x_{i+1}\end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & \frac{\phi+\sigma-1}{u}\end{pmatrix}\begin{pmatrix} x_{i-1} \\ x_i\end{pmatrix}.$$

The largest eigenvalue of the above recursion matrix determines the scaling of $x_i$ for large i. For the determination of the error threshold one may solve for the $x_i$ numerically, using (11.45) and the mass-normalization condition $\sum_i x_i = 1$ for the self-consistent determination of the flux φ.

(8.3) COMPETITION FOR RESOURCES


Summing in (8.64) over the species i one obtains
 
. Ċ = f ri xi − dC, f˙ = a − f ri xi , 0 = a − dC ,
i i


for the steady-state case Ċ = 0 = f˙, viz C = i xi = a/d; no life without
regenerating resources. The growth rates Wii = f ri − d need to vanish (or to
be negative) in the steady state and hence
 
d
. lim f (t) → min .
t→∞ i ri

Only a single species survives the competition for the common resource (the
one with the largest growth rate ri ), the ecosystem is unstable.
(8.4) COMPETITIVE POPULATION DYNAMICS
The fixpoints of the symmetric sheep-and-rabbits Lotka–Volterra model (8.65) are $x_0^* = (0,0)$, $x_x^* = (1,0)$, and $x_y^* = (0,1)$, together with the non-trivial fixpoint

$$x_k^* = \frac{1}{1+k}\left(1,\,1\right), \qquad J = \begin{pmatrix} (1-2x-ky) & -kx \\ -ky & (1-2y-kx) \end{pmatrix}.$$

For $x = x_k^*$ the Jacobian J and the Lyapunov exponents $\lambda_\pm$ are

$$J = \frac{1}{1+k}\begin{pmatrix} -1 & -k \\ -k & -1\end{pmatrix}, \qquad \lambda_\pm = \frac{-1\pm k}{1+k}\,.$$

The non-trivial fixpoint is stable for weak competition k < 1. In this case, sheep and rabbits coexist.

For strong competition k > 1, $x_k^*$ becomes however a saddle, and the phenomenon of spontaneous symmetry breaking occurs. Depending on the initial conditions, the system will flow either to $x_x^*$ or to $x_y^*$, even though the constituting equations of motion (8.65) are symmetric under an exchange of x and y, with $x_x^*$ and $x_y^*$ having the identical two Lyapunov exponents (−1) and (1−k).

The trivial fixpoint $x_0^*$ is always unstable, with a degenerate eigenvalue of (+1).
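The symmetry breaking for k > 1 can be observed by direct integration. The following Python sketch assumes the standard symmetric competition form of (8.65), $\dot x = x(1-x-ky)$, $\dot y = y(1-y-kx)$; the step size, integration time and initial conditions are arbitrary choices:

```python
def integrate(k, x0, y0, dt=0.01, steps=40000):
    """Euler integration of the symmetric competition model
    (assumed form of (8.65)): xdot = x(1-x-ky), ydot = y(1-y-kx)."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = (x + dt * x * (1 - x - k * y),
                y + dt * y * (1 - y - k * x))
    return x, y

weak = integrate(k=0.5, x0=0.1, y0=0.2)     # coexistence at (2/3, 2/3)
strong = integrate(k=2.0, x0=0.6, y0=0.4)   # symmetry broken: x wins
```

For weak competition both populations flow to the symmetric fixpoint $1/(1+k)$, whereas for strong competition the initially larger population excludes the other.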
(8.5) HYPERCYCLES
The fixpoints $(x_1^*, x_2^*)$ of the reaction network (8.41) are determined by

$$0 = x_1^*\left(\alpha + \kappa\, x_2^* - \phi\right), \qquad 0 = x_2^*\left(2\alpha + \kappa\, x_1^* - \phi\right), \qquad \phi = \alpha x_1^* + 2\alpha x_2^* + 2\kappa x_1^*x_2^*\,,$$

where φ is the flux. The last equation, see (8.42), is used to satisfy the constraint $x_1 + x_2 = 1$, together with $x_1^*, x_2^* \ge 0$.

Solving the above equations, one finds $x_1^* = \frac{\kappa-\alpha}{2\kappa}$ and $x_2^* = \frac{\kappa+\alpha}{2\kappa}$ for κ > α. Otherwise, only the trivial solutions $(x_1^*, x_2^*) = (0,1)$ and $(x_1^*, x_2^*) = (1,0)$ are fixpoints. Linearizing the equations around the fixpoints leads us to the matrix

$$M = \begin{pmatrix} (\kappa-\alpha)x_2^* - 4\kappa x_1^*x_2^* & (\kappa-\alpha)x_1^* - 2\kappa(x_1^*)^2 \\ \kappa x_2^* - 2\kappa(x_2^*)^2 & \alpha + \kappa x_1^* - 2\alpha x_2^* - 4\kappa x_1^*x_2^* \end{pmatrix}.$$

For $(x_1^*, x_2^*) = (1,0)$ the biggest eigenvalue of M is κ + α, which is positive for positive growth rates, so the fixpoint is unstable. For (0,1) one finds the condition κ < α, which guarantees that all eigenvalues are negative. For the analysis of $(\frac{\kappa-\alpha}{2\kappa}, \frac{\kappa+\alpha}{2\kappa})$, computer algebra systems like Maple, Mathematica or Matlab may be used.
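The non-trivial fixpoint can be verified directly by inserting it back into the fixpoint conditions; the parameter values α = 1, κ = 3 in the following sketch are arbitrary choices satisfying κ > α:

```python
alpha, kappa = 1.0, 3.0    # kappa > alpha: non-trivial fixpoint exists

x1 = (kappa - alpha) / (2 * kappa)
x2 = (kappa + alpha) / (2 * kappa)
phi = alpha * x1 + 2 * alpha * x2 + 2 * kappa * x1 * x2

# both growth terms of the fixpoint conditions must vanish
g1 = x1 * (alpha + kappa * x2 - phi)
g2 = x2 * (2 * alpha + kappa * x1 - phi)
```

The constraint $x_1^* + x_2^* = 1$ holds by construction, and both fixpoint conditions are satisfied to machine precision.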
(8.6) PRISONER’S DILEMMA ON A LATTICE
We use first a general payoff matrix and then, specifically, {T ; R; P ; S} =
{3.5; 3.0; 0.5; 0.0} as in Fig. 8.13. We consider the four cases separately:

– ONE DEFECTOR IN THE BACKGROUND OF COOPERATORS
The payoffs are:

    intruding defector:      4 × T = 4 × 3.5 = 14
    cooperating neighbors:   3 × R + 1 × S = 3 × 3 + 0 = 9

Therefore, the neighboring cooperators will become defectors in the next step.
– TWO ADJACENT DEFECTORS IN THE BACKGROUND OF COOPERATORS
The payoffs are:

    intruding defectors:     3 × T + 1 × P = 3 × 3.5 + 0.5 = 11
    cooperating neighbors:   3 × R + 1 × S = 3 × 3 + 0 = 9

Therefore, the neighboring cooperators will become defectors in the next step.
– ONE COOPERATOR IN THE BACKGROUND OF DEFECTORS
The payoffs are:

    intruding cooperator:    4 × S = 4 × 0 = 0
    defecting neighbors:     3 × P + 1 × T = 3 × 0.5 + 3.5 = 5

The cooperating intruder will die and in the next step only defectors will be present.
– TWO ADJACENT COOPERATORS IN THE BACKGROUND OF DEFECTORS
The payoffs are:

    intruding cooperators:   3 × S + 1 × R = 3 × 0 + 3 = 3
    defecting neighbors:     3 × P + 1 × T = 3 × 0.5 + 3.5 = 5

The cooperating intruders will die and in the next step only defectors will be present.

One can go one step further and consider the case of three adjacent intruders. Not all intruders will then survive for the case of defecting intruders, and not all intruders will die for the case of cooperating intruders.
(8.7) NASH EQUILIBRIUM
The payoff matrix of this game is given by

$$A = \begin{pmatrix} L & L \\ 0 & H \end{pmatrix}, \qquad L < H\,,$$

for the cautious/risky player, where L signifies the low payoff and H the high payoff. Denoting the number of cautious players by $N_c$ we can compute the rewards for participants playing cautiously or riskily, respectively, and from these the global reward G:

$$R_c = \left[LN_c + L(N-N_c)\right]/N = L\,,$$
$$R_r = \left[0\cdot N_c + H(N-N_c)\right]/N = H(N-N_c)/N\,,$$
$$G(N_c) = \frac{(N-N_c)^2}{N}\,H + N_cL\,.$$

The function $G(N_c)$ has two local maxima, at $N_c = 0$ and $N_c = N$, representing the Nash equilibria. The first case is optimal for all players, with the maximal global utility being NH.
(8.8) WAR OF ATTRITION
For the stationary solution, the payoffs $E_m$ in (8.66) must be independent of the investment m, a condition which holds generically for games with continuous investments. The outcome is a "mixed evolutionary stable strategy", as determined by $\partial E_m/\partial m = 0$,

$$0 = v\,p(m) - c'(m)\int_m^\infty p(x)\,dx\,, \qquad \frac{p(m)}{c'(m)} = \frac{1}{v}\int_m^\infty p(x)\,dx\,.$$

Differentiation leads to

$$\frac{p'(x)}{c'(x)} - \frac{p(x)}{(c'(x))^2}\,c''(x) = \frac{-1}{v}\,p(x)\,, \tag{11.46}$$

when changing variables from m to x. The solution of (11.46) is

$$p(x) = \frac{c'(x)}{v}\,e^{-c(x)/v}\,, \qquad \int_0^\infty p(x)\,dx = 1\,.$$

A linear cost function c(x) = x corresponds to an exponentially distributed evolutionary stable strategy.
(8.9) TRAGEDY OF THE COMMONS
For a generic productivity of the commons, the payoff function is
 
Ei = xi P (xtot ) − ci ,
. P (0) = 1, P (xtot ) < 0 .

Payoffs are optimal when

.P (xtot ) − ci = −xi P (xtot ), Ei = −P (xtot ) xi2 , (11.47)

where xtot is determined by averaging the first equation over agents,


xtot
.P (xtot ) − c̄ = − P (xtot ) . (11.48)
N
Maximal investments costs cmax are derived from the limit xi → 0 in (11.47),

cmax − ci
cmax = P (xtot ),
. xi = , (11.49)
−P (xtot )

which leads to

(cmax − ci )2
.Ei = .
−P (xtot )

Again, the dispersion relation is strictly quadratic, compare the original case, Eq. (8.60). That it depends explicitly on x_{tot} is not relevant, as it follows from (11.48) that c_{max} - \bar c \sim 1/N, which leads directly to catastrophic poverty. Catastrophic poverty hence arises whenever the productivity function P(x_{tot}) is monotonically decaying.
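The 1/N scaling of c_max − c̄ can be illustrated numerically. The sketch below assumes a specific productivity function, P(x_tot) = exp(−x_tot), and a mean cost c̄ = 1/2; both choices are illustrative and not from the text:

```python
import math

# Solve the averaged optimality condition (11.48),
#   P(x_tot) - c_bar = -(x_tot/N) * P'(x_tot),
# which for P(x) = exp(-x) reads exp(-x)*(1 - x/N) = c_bar, and check
# that the gap c_max - c_bar = P(x_tot) - c_bar scales as 1/N.

c_bar = 0.5

def x_tot(N):
    f = lambda x: math.exp(-x) * (1.0 - x / N) - c_bar
    lo, hi = 0.0, 50.0                       # f(lo) > 0 > f(hi)
    for _ in range(200):                     # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gap(N):                                  # c_max - c_bar
    return math.exp(-x_tot(N)) - c_bar

ratio = gap(100) / gap(200)                  # doubling N halves the gap
assert 1.9 < ratio < 2.1
```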

11.9 Solutions to the Exercises of Chap. 9

(9.1) THE DRIVEN HARMONIC OSCILLATOR
(9.2) SELF-SYNCHRONIZATION
(9.3) KURAMOTO MODEL WITH THREE OSCILLATORS
(9.4) LYAPUNOV EXPONENTS ALONG SYNCHRONIZED ORBITS
(9.5) SYNCHRONIZATION OF CHAOTIC MAPS
(9.6) TERMAN–WANG OSCILLATOR
(9.7) PULSE COUPLED LEAKY INTEGRATOR NEURONS
(9.8) SIRS MODEL
(9.9) EPIDEMIOLOGY OF ZOMBIES

(9.1) THE DRIVEN HARMONIC OSCILLATOR


For all times, the solution can be obtained by combining the homogeneous solution (no external force) with one special solution, such as the long-time ansatz from (9.3) and (9.4). Since the homogeneous solution is given by

    x(t) \sim e^{\lambda t} , \qquad \lambda_\pm = -\frac{\gamma}{2} \pm \sqrt{\frac{\gamma^2}{4} - \omega_0^2} ,

with damping \gamma > 0, this contribution vanishes in the limit t \to \infty and only the special solution survives.
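The decay of the homogeneous contribution can be observed directly by integrating the driven oscillator; the sketch below (with illustrative parameters, not from the text) compares the late-time amplitude with the standard steady-state result A = 1/sqrt((ω₀² − ω²)² + (γω)²):

```python
import math

# Integrate x'' + g*x' + w0^2*x = cos(w*t) with a fourth-order
# Runge-Kutta scheme; after the transient ~ exp(-g*t/2) has died out,
# only the special (steady-state) solution of amplitude A remains.

g, w0, w = 0.5, 1.0, 1.5                     # illustrative parameters

def rhs(t, x, v):
    return v, math.cos(w * t) - g * v - w0 ** 2 * x

x, v, t, dt, amp = 1.0, 0.0, 0.0, 0.001, 0.0
while t < 100.0:
    k1x, k1v = rhs(t, x, v)
    k2x, k2v = rhs(t + dt/2, x + dt/2 * k1x, v + dt/2 * k1v)
    k3x, k3v = rhs(t + dt/2, x + dt/2 * k2x, v + dt/2 * k2v)
    k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
    x += dt/6 * (k1x + 2*k2x + 2*k3x + k4x)
    v += dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
    t += dt
    if t > 80.0:                             # transient has decayed
        amp = max(amp, abs(x))

A = 1.0 / math.sqrt((w0**2 - w**2) ** 2 + (g * w) ** 2)
assert abs(amp - A) < 1e-3
```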
(9.2) SELF-SYNCHRONIZATION
We use the ansatz

    \dot\theta(t) = \omega_0 + K \sin[\theta(t-T) - \theta(t)] , \qquad \theta(t) = \omega t + \epsilon\, e^{\lambda t}

for the steady-state solution \propto \omega t, together with a small perturbation \propto \epsilon\, e^{\lambda t}. The steady-state solution is stable whenever the real part of \lambda is negative.3 Expanding for \epsilon \ll 1 we obtain

    \omega = \omega_0 - K \sin(\omega T) , \qquad \lambda = K \cos(\omega T) \left( e^{-\lambda T} - 1 \right) , \qquad (11.50)

for the contributions O(\epsilon^0) and O(\epsilon^1), respectively, which specializes (9.27) to the case of self-synchronization. There is always a solution \lambda = 0 to (11.50), indicating marginal stability. A graphical analysis of the second
3 Compare Sect. 2.5 of Chap. 2.



equation of (11.50) shows that \lambda = 0 remains the only solution for K \cos(\omega T) > 0, and that a solution with \lambda < 0 appears for

    -1 < T K \cos(\omega T) < 0 .

Further on, for T K \cos(\omega T) < -1, the exponent \lambda becomes positive and the steady-state trajectory \propto \omega t becomes unstable.
Considering graphically the limit K \to \infty of the locking frequency \omega in (11.50), one finds that

    \omega T \to n\, 2\pi , \qquad \omega = n\,\frac{2\pi}{T} , \qquad n = 0, 1, 2, \ldots .

The natural frequency \omega_0 becomes irrelevant and the system self-locks with respect to the delay time T.
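The large-K locking can be checked by solving ω = ω₀ − K sin(ωT) numerically on the branches with cos(ωT) > 0; the parameter values below are illustrative:

```python
import math

# For large K, the locking frequencies with cos(w*T) > 0 approach
# w = 2*pi*n/T, independently of the natural frequency w0.

w0, T, K = 1.3, 1.0, 1.0e4
F = lambda w: w0 - K * math.sin(w * T) - w

def root(lo, hi):
    # bisection; F is monotonically decreasing on each bracket used below
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if F(lo) * F(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for n in (1, 2, 3):
    wn = 2.0 * math.pi * n / T
    w = root(wn - 0.5, wn + 0.5)
    assert math.cos(w * T) > 0               # branch discussed in the text
    assert abs(w - wn) < 0.01                # locking to multiples of 2*pi/T
```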
(9.3) KURAMOTO MODEL WITH THREE OSCILLATORS
For the Kuramoto system,

    \dot\theta_i = \omega_i + \frac{K}{N} \sum_{j \neq i} \sin(\theta_j - \theta_i) , \qquad (11.51)

with identical natural frequencies \omega_i \equiv \omega, one can set \omega = 0 without loss of generality. This is achieved by transforming to the rotating frame of reference, via \theta_i \to \theta_i + \omega t. Setting \tau = 3/K one obtains

    \tau\dot\theta_1 = \sin(\theta_2 - \theta_1) + \sin(\theta_3 - \theta_1)
    \tau\dot\theta_2 = \sin(\theta_1 - \theta_2) + \sin(\theta_3 - \theta_2)
    \tau\dot\theta_3 = \sin(\theta_1 - \theta_3) + \sin(\theta_2 - \theta_3)

for the case of N = 3 oscillators. Using relative variables \alpha = \theta_2 - \theta_1 and \beta = \theta_3 - \theta_2, which implies \theta_3 - \theta_1 = \alpha + \beta, leads to a closed set of equations for \alpha and \beta:

    \tau\dot\alpha = -2\sin(\alpha) + \sin(\beta) - \sin(\alpha + \beta) , \qquad (11.52)
    \tau\dot\beta = -2\sin(\beta) + \sin(\alpha) - \sin(\alpha + \beta) . \qquad (11.53)

For the fixpoints, determined by \dot\alpha = 0 = \dot\beta, we add the above equations after multiplying one of them by a factor of two:

    \sin(\alpha) = -\sin(\alpha + \beta) , \qquad \sin(\beta) = -\sin(\alpha + \beta) . \qquad (11.54)



The fixpoints are

    (0, 0), \quad (\pi, 0), \quad (0, \pi), \quad (\pi, \pi), \quad \pm\frac{2\pi}{3}\,(1, 1) , \qquad (11.55)

of which (0, 0) (fully synchronized) is stable for K > 0 (attractive coupling). The last state, \pm(1, 1)\,2\pi/3, corresponds to three oscillators separated respectively by 120°, which is the stable configuration for K < 0 (repulsive coupling). The remaining three fixpoints, (0, \pi), (\pi, 0) and (\pi, \pi), are unstable; they correspond to configurations in which two oscillators are synchronized, with the third one being anti-synchronized.
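A short integration of (11.52)/(11.53) illustrates the two stable configurations; the time step and initial angles below are illustrative, and the sign variable mimics attractive (K > 0) versus repulsive (K < 0) coupling:

```python
import math

# Euler integration of the reduced three-oscillator equations for
# a = th2 - th1 and b = th3 - th2. Attractive coupling flows to (0, 0),
# repulsive coupling to the 120-degree state |a| = |b| = 2*pi/3.

def flow(a, b):
    da = -2.0 * math.sin(a) + math.sin(b) - math.sin(a + b)
    db = -2.0 * math.sin(b) + math.sin(a) - math.sin(a + b)
    return da, db

def integrate(a, b, sign, dt=0.01, steps=20_000):
    for _ in range(steps):
        da, db = flow(a, b)
        a += sign * dt * da                  # sign = +1: K > 0, -1: K < 0
        b += sign * dt * db
    return a, b

wrap = lambda x: abs(math.atan2(math.sin(x), math.cos(x)))

a, b = integrate(0.4, -0.7, +1.0)            # attractive: full synchrony
assert wrap(a) < 1e-6 and wrap(b) < 1e-6

a, b = integrate(0.4, -0.7, -1.0)            # repulsive: 120-degree spacing
third = 2.0 * math.pi / 3.0
assert abs(wrap(a) - third) < 1e-3 and abs(wrap(b) - third) < 1e-3
```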
(9.4) LYAPUNOV EXPONENTS ALONG SYNCHRONIZED ORBITS
For two coupled harmonic oscillators,

    \dot\theta_1 = \omega_1 + \frac{K}{2}\sin(\theta_2 - \theta_1) , \qquad \dot\theta_2 = \omega_2 + \frac{K}{2}\sin(\theta_1 - \theta_2) ,

see (9.7), the Jacobian at the fixpoint (9.9) is

    \frac{K}{2}\cos(\theta_2 - \theta_1) \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} , \qquad \sin(\theta_2 - \theta_1) = \frac{\omega_2 - \omega_1}{K} .

The eigenvalues and eigenvectors are then

    \lambda_1 = 0 , \quad v_1 = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} , \qquad \lambda_2 = -\kappa , \quad v_2 = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} ,

where \kappa = K \cos(\theta_2 - \theta_1) > 0, since \theta_2 - \theta_1 \in [-\pi/2, \pi/2] at the attracting fixpoint. The first Lyapunov exponent describes a neutral flow
along the synchronized orbits, with Δθ = θ2 − θ1 remaining constant. The
second Lyapunov exponent is negative, describing the attraction along the
orthogonal direction. The periodicity condition, namely that the average of
the first Lyapunov exponent over one period vanishes, is fulfilled.
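The two exponents follow from the trace and determinant of the 2×2 Jacobian; a minimal numerical confirmation (K and the phase difference are illustrative values):

```python
import math

# Jacobian c*[[-1, 1], [1, -1]] with c = (K/2)*cos(th2 - th1): the
# eigenvalues are 0 (neutral flow along the orbit) and -2c = -kappa.

K, dth = 2.0, 0.3                            # illustrative values
c = 0.5 * K * math.cos(dth)
J = [[-c, c], [c, -c]]

tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
l1, l2 = 0.5 * (tr + disc), 0.5 * (tr - disc)

assert abs(l1) < 1e-12                       # neutral mode, v1 = (1, 1)
assert abs(l2 + K * math.cos(dth)) < 1e-12   # attracting mode, v2 = (1, -1)
```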
(9.5) SYNCHRONIZATION OF CHAOTIC MAPS
For (9.57), we are looking for solutions of the type

    x_1(t) = \bar x(t) + \epsilon\, c^t , \qquad x_2(t) = \bar x(t) - \epsilon\, c^t ,

with \epsilon \ll 1. That is, we are looking for perturbations perpendicular to the synchronized trajectory x_1(t) = x_2(t). For the Bernoulli shift map, f(x) = ax \bmod 1 with x \in [0, 1], one obtains

    \bar x(t+1) = f\!\left( (1-\kappa)\bar x(t) + \kappa\,\bar x(t-T) \right) , \qquad c = (1-\kappa)a - \kappa a c^{-T} . \qquad (11.56)

The solution x_1(t) = x_2(t) = \bar x(t) is stable for |c| < 1, viz for \kappa > \kappa_c. Setting c = 1 in (11.56) we find

    1 = a - 2\kappa_c a , \qquad \kappa_c = \frac{a-1}{2a} ,

which is actually independent of the time delay T. It is instructive to plot x_1(t) and x_2(t), using a small script or program. Note, however, that the synchronization process may take longer and longer with increasing delay T.
The synchronization process is driven by an averaging procedure, which is most evident for \kappa = 1/2 and T = 0. For this setting of parameters, perfect synchronization is achieved in a single time step.
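The suggested small script may look as follows; the value a = 1.9 and the two couplings tested are illustrative (a = 2 is best avoided here, since the pure doubling map degenerates in floating-point arithmetic):

```python
# Two symmetrically coupled Bernoulli shift maps (T = 0),
#   x1(t+1) = f((1-k)*x1 + k*x2),   x2(t+1) = f((1-k)*x2 + k*x1),
# synchronize for k > kc = (a-1)/(2a) and stay desynchronized below.

a = 1.9
f = lambda x: (a * x) % 1.0

def mean_distance(k, steps=4000, tail=200):
    x1, x2 = 0.3412, 0.7651                  # arbitrary initial values
    dist = []
    for i in range(steps):
        x1, x2 = f((1-k)*x1 + k*x2), f((1-k)*x2 + k*x1)
        if i >= steps - tail:
            dist.append(abs(x1 - x2))
    return sum(dist) / len(dist)             # late-time average distance

kc = (a - 1.0) / (2.0 * a)                   # critical coupling
assert mean_distance(kc + 0.15) < 1e-10      # synchronized
assert mean_distance(kc - 0.15) > 1e-2       # not synchronized
```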
(9.6) TERMAN–WANG OSCILLATOR
We linearize (9.39) around the fixpoint (x^*, y^*), taking at the same time the limit \beta \to 0,

    \lim_{\beta \to 0} \tanh(x/\beta) = -1 + 2\Theta(x) = \begin{cases} -1 & (x < 0) \\ \phantom{-}1 & (x > 0) \end{cases} .

We find, since x^* < 0 (compare Fig. 9.7),

    \dot{\tilde x} = 3\left(1 - x^{*\,2}\right)\tilde x - \tilde y
    \dot{\tilde y} = -\epsilon\,\tilde y ,

where \tilde x = x - x^* and \tilde y = y - y^* are small perturbations around the fixpoint. The ansatz \tilde x \sim \tilde y \sim e^{\lambda t} determines the eigenvalues \lambda of the above set of equations. We obtain \lambda_1 = 3(1 - x^{*\,2}) and \lambda_2 = -\epsilon. The fixpoint x^* \approx 0 is unstable, since \lambda_1 \approx 3 > 0 in this case. The fixpoint at |x^*| > 1 is stable, since \lambda_1 < 0 and \lambda_2 < 0, which implies that \tilde x \sim \tilde y \sim e^{\lambda t} decays in the long-time limit.
(9.7) PULSE COUPLED LEAKY INTEGRATOR NEURONS
To be specific, we select x_A(t = 1) = 1 in (9.58), which implies S_0 = 1/(1 - e^{-\gamma}). For the first neuron, the equation of motion then reads

    \dot x_A = \gamma\,(S_0 - x_A) , \qquad x_A(t) = \left(1 - e^{-\gamma t}\right)/\left(1 - e^{-\gamma}\right) , \qquad (11.57)

compare Fig. 11.18. For two coupled neurons x_A(t) and x_B(t), the membrane potential of the second neuron gets a kick \epsilon when the first neuron spikes, and vice versa.
Fig. 11.18 Left: The time development x(t) = (1 - e^{-\gamma t})/(1 - e^{-\gamma}) of a single leaky integrator (11.57), for \gamma = 8, 4, 2, 1. It spikes when x > 1, resetting the membrane potential x to zero. Right: Two coupled leaky integrators, for \gamma = 2. When one neuron spikes, the membrane potential of the other neuron is increased by \epsilon = 0.1.

When the two neurons are synchronized, say A spikes just before B, the membrane potentials differ by \epsilon just after the spike. Once x_A reaches the firing threshold, the difference x_A(t) - x_B(t) is smaller than \epsilon, due to the concaveness of the solution (11.57), compare Fig. 11.18. The resulting limit cycle is stable.
If the two neurons are not synchronized, the concaveness of x(t) reduces the difference of the spiking times for A and B until the limit cycle is reached.
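An event-driven simulation (a sketch; γ, ε and the initial condition are illustrative) shows how the kicks pull the two spike times together until both neurons fire simultaneously:

```python
import math

# Two pulse-coupled leaky integrators dx/dt = g*(S0 - x), with
# S0 = 1/(1 - exp(-g)); at x = 1 the neuron resets to zero and the
# partner receives a kick eps. Evolution between spikes is exact.

g, eps = 2.0, 0.1
S0 = 1.0 / (1.0 - math.exp(-g))

def time_to_threshold(x):                    # time until x(t) = 1
    return math.log((S0 - x) / (S0 - 1.0)) / g

def evolve(x, t):                            # exact solution after time t
    return S0 + (x - S0) * math.exp(-g * t)

xA, xB, together = 0.0, 0.6, False           # start out of sync
for _ in range(100):                         # at most 100 spike events
    tA, tB = time_to_threshold(xA), time_to_threshold(xB)
    t = min(tA, tB)
    xA, xB = evolve(xA, t), evolve(xB, t)
    if tA <= tB:
        xA, xB = 0.0, xB + eps               # A spikes, B gets the kick
    else:
        xA, xB = xA + eps, 0.0               # B spikes, A gets the kick
    if xA >= 1.0 or xB >= 1.0:               # kick pushes the partner
        together = True                      # over threshold: joint firing
        break

assert together                              # the limit cycle is reached
```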
(9.8) SIRS MODEL
The fixpoint equation reads

    x^* = a x^* \left[ 1 - (\tau_R + 1) x^* \right] ,

which leads to the solutions

    x^* = 0 \qquad \text{or} \qquad x^* = \frac{a-1}{a(\tau_R + 1)} ,

where \tau_R = 0, 1, 2, \ldots is the recovery time. We examine the stability of x^* against a small perturbation \tilde x_n by linearization, using x_n = x^* + \tilde x_n,

    \tilde x_{n+1} = -a x^* \sum_{k=0}^{\tau_R} \tilde x_{n-k} + a \tilde x_n \left[ 1 - (\tau_R + 1) x^* \right] . \qquad (11.58)

No Immunity. For \tau_R = 0, the stability condition (11.58) for the trivial fixed point x^* = 0 reduces to

    \tilde x_{n+1} = a \tilde x_n , \qquad \text{leading to the stability condition } a < 1 .

The analysis for the second fixed point with \tau_R = 0 runs analogously to the computation concerning the logistic map.4

4 See Sect. 2.4 of Chap. 2.



Immunity of One Time Step. For \tau_R = 1, the situation becomes somewhat more complicated,

    \tilde x_{n+1} = \frac{1}{2}(3-a)\tilde x_n + \frac{1}{2}(1-a)\tilde x_{n-1} .

With the common ansatz \tilde x_n = \lambda^n for linear recurrence relations, one finds the conditions

    -\frac{a}{4} + \frac{3}{4} + \frac{1}{4}\sqrt{a^2 - 14a + 17} < 1 , \qquad \text{and} \qquad a^2 - 14a + 17 > 0

for small perturbations to remain small and not to grow exponentially, i.e. |\lambda| < 1. In consequence, a has to fulfill

    1 < a < 7 - 4\sqrt{2} \approx 1.34 .
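For a value of a inside this window, the nonlinear map indeed settles at the nontrivial fixpoint; a minimal numerical illustration (the initial values are arbitrary):

```python
import math

# Iterate the tau_R = 1 map x_{n+1} = a*x_n*(1 - x_n - x_{n-1}) for
# a = 1.2, which lies inside the window 1 < a < 7 - 4*sqrt(2), and
# check convergence to the fixpoint x* = (a-1)/(2a).

a = 1.2
assert 1.0 < a < 7.0 - 4.0 * math.sqrt(2.0)

x_prev, x = 0.05, 0.06                       # small initial infection
for _ in range(5000):
    x_prev, x = x, a * x * (1.0 - x - x_prev)

x_star = (a - 1.0) / (2.0 * a)
assert abs(x - x_star) < 1e-10
```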

(9.9) EPIDEMIOLOGY OF ZOMBIES


Dividing the two equations

    \dot H = -\alpha H Z , \qquad \dot Z = (2\alpha - 1) H Z \qquad (11.59)

yields

    \frac{dH}{dZ} = \frac{\alpha}{1 - 2\alpha} , \qquad H = \frac{(1-\alpha)H_0 - \alpha}{1 - 2\alpha} + \frac{\alpha}{1 - 2\alpha}\, Z , \qquad (11.60)

where the constant of integration has been chosen such that the initial condition Z_0 = 1 - H_0 is respected. Whether zombies or humans are more effective depends on the value of \alpha.

– \alpha > 1/2
Zombies win, with \alpha/(1 - 2\alpha) being negative. Humans go extinct when the population of zombies reaches

    Z^* = \frac{\alpha - (1-\alpha)H_0}{\alpha} , \qquad (11.61)

which is positive for \alpha > 1/2 and all H_0 \in [0, 1].
– \alpha < 1/2
Whether humans manage to cling on depends on the size of their initial population. For Z \to 0, the formally surviving human population is

    H^* = \frac{(1-\alpha)H_0 - \alpha}{1 - 2\alpha} \;\gtrless\; 0 \qquad \text{for} \qquad H_0 \gtrless \frac{\alpha}{1-\alpha} . \qquad (11.62)

Humans always survive when the initial zombie population Z_0 = 1 - H_0 is small.
Zombies do not recover by themselves; in contrast to the infected population within the SIRS model, they need to be eliminated by humans (the susceptibles). For a further discussion of the epidemiology of zombies see Alemi (2015).
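Both regimes can be reproduced by direct integration of (11.59); the sketch below checks the surviving human population (11.62) for illustrative parameters in the α < 1/2 regime:

```python
# Euler integration of dH/dt = -al*H*Z, dZ/dt = (2*al - 1)*H*Z.
# Note that the Euler update conserves H - al*Z/(1 - 2*al) exactly,
# mirroring the integral of motion behind (11.60).

al, H0 = 0.4, 0.8                            # H0 > al/(1-al) = 2/3
H, Z, dt = H0, 1.0 - H0, 1e-3
for _ in range(200_000):                     # integrate up to t = 200
    dH = -al * H * Z
    dZ = (2.0 * al - 1.0) * H * Z
    H += dt * dH
    Z += dt * dZ

H_star = ((1.0 - al) * H0 - al) / (1.0 - 2.0 * al)   # Eq. (11.62)
assert Z < 1e-3                              # zombies die out
assert abs(H - H_star) < 1e-2                # humans survive at H*
```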

11.10 Solutions to the Exercises of Chap. 10

(10.1) GATED LINEAR UNITS
(10.2) VARIANCE OF A PRODUCT
(10.3) WIGNER SEMI-CIRCULAR LAW
(10.4) LANDAU THEORY FOR ABSORBING PHASE TRANSITIONS
(10.5) LAYER VARIANCE MAPPING
(10.6) NEURAL DIFFERENTIAL EQUATIONS
(10.7) DOT PRODUCTS ARE KERNEL FUNCTIONS
(10.8) LINEAR ATTENTION

(10.1) GATED LINEAR UNITS


The input vector of (10.65) is taken to be two dimensional, x = (x1 , x2 ),
with xi = ±1. We set

w = (1, 0),
. v = (0, −1), bw = 0 = bv . (11.63)

With σ± ≡ tanh(±1) we have

σGLU (1, 1) = σ+ σ− ,
. σGLU (1, −1) = σ+ σ+ ,

and equivalently for (−1, −1) and (−1, 1). This is, apart from an overall
normalization, the definition of the XOR gate.
For the AND gate one sets, e.g., v1 = 0 = v2 , bv = −1, which leads to
constant second factor, and w1 = 1 = w2 and bw = 1.5.
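The truth tables can be verified directly. The XOR parameters below are those of (11.63); for the AND gate the sketch uses a sign convention in which a positive output marks the AND condition (b_w = −1.5, b_v = +1, a variant of the parameters quoted above):

```python
import math

# sigma_GLU(x) = tanh(w.x + bw) * tanh(v.x + bv) acting as a logic gate.

def glu(x, w, v, bw, bv):
    wx = sum(wi * xi for wi, xi in zip(w, x)) + bw
    vx = sum(vi * xi for vi, xi in zip(v, x)) + bv
    return math.tanh(wx) * math.tanh(vx)

inputs = [(x1, x2) for x1 in (-1.0, 1.0) for x2 in (-1.0, 1.0)]

# XOR with w = (1, 0), v = (0, -1), bw = bv = 0: output positive iff x1 != x2
for x1, x2 in inputs:
    out = glu((x1, x2), (1.0, 0.0), (0.0, -1.0), 0.0, 0.0)
    assert (out > 0.0) == (x1 != x2)

# AND with constant second factor (v = 0, bv = 1) and w = (1, 1), bw = -1.5
for x1, x2 in inputs:
    out = glu((x1, x2), (1.0, 1.0), (0.0, 0.0), -1.5, 1.0)
    assert (out > 0.0) == (x1 > 0.0 and x2 > 0.0)
```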
(10.2) VARIANCE OF A PRODUCT
Trivially, x = \Delta x + \mu_x holds, where \Delta x = x - \mu_x, and the same for y. Consequently one has

    xy - \mu_x\mu_y = (\Delta x + \mu_x)(\Delta y + \mu_y) - \mu_x\mu_y = \Delta x\,\Delta y + \Delta x\,\mu_y + \Delta y\,\mu_x .

Per definition \langle\Delta x\rangle = 0 = \langle\Delta y\rangle, which leads to

    \left\langle (xy - \mu_x\mu_y)^2 \right\rangle = \left\langle \left( \Delta x\Delta y + \Delta x\mu_y + \Delta y\mu_x \right)^2 \right\rangle
    = \left\langle (\Delta x\Delta y)^2 \right\rangle + \left\langle (\Delta x\mu_y)^2 \right\rangle + \left\langle (\Delta y\mu_x)^2 \right\rangle .

Lastly it suffices to note that x and y are independent. One then obtains

    \sigma_{xy}^2 = \left( \sigma_x^2 + \mu_x^2 \right)\left( \sigma_y^2 + \mu_y^2 \right) - \mu_x^2\mu_y^2 ,
the formula to be proven.
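A Monte Carlo sanity check of the formula, with arbitrary test values for the means and variances:

```python
import random

# Sample independent Gaussians x and y and compare the empirical
# variance of the product x*y with
#   sigma_xy^2 = (sx^2 + mx^2)*(sy^2 + my^2) - mx^2*my^2.

random.seed(1)
mx, sx, my, sy = 1.5, 0.7, -0.8, 1.2
n = 400_000

prods = [random.gauss(mx, sx) * random.gauss(my, sy) for _ in range(n)]
mean = sum(prods) / n
var = sum((p - mean) ** 2 for p in prods) / n

exact = (sx**2 + mx**2) * (sy**2 + my**2) - mx**2 * my**2
assert abs(var - exact) / exact < 0.02       # within sampling error
```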


(10.3) WIGNER SEMI-CIRCULAR LAW
Adjacency matrices are symmetric, \Gamma = 1, with entries that are either zero or unity. With z being the coordination number, we have

    \mu_w = \frac{z}{N} , \qquad \sigma_w^2 = \frac{1}{N}\left[ (N-z)\left(\frac{z}{N}\right)^2 + z\left(1 - \frac{z}{N}\right)^2 \right] \approx \frac{z}{N} .

The boundaries,

    (1 + \Gamma)\sqrt{N}\,\sigma_w = 2\sqrt{z} ,

do hence agree. Given that (10.15) describes a uniformly filled ellipse, it follows that the density of states is also an ellipse.
(10.4) LANDAU THEORY FOR ABSORBING PHASE TRANSITIONS
Making use of the Taylor series 1/(1-x)^2 \approx 1 + 2x + 3x^2 + \ldots, the self-consistency condition (10.26) takes the form

    1 + 2\sigma_y^2 + 3\sigma_y^4 \approx 1 + 2\sigma_{ext}^2 + 2R_w^2\sigma_y^2 , \qquad \sigma_x^2 = \sigma_{ext}^2 + R_w^2\sigma_y^2 ,

or

    2\sigma_{ext}^2 = 2\left(1 - R_w^2\right)\sigma_y^2 + 3\sigma_y^4 . \qquad (11.64)

Substituting the relations

    h \equiv \frac{2\sigma_{ext}^2}{\sigma_y} , \qquad a = 1 - R_w^2 , \qquad b = \frac{3}{4} , \qquad \phi \equiv \sigma_y \qquad (11.65)

into (11.64) leads to

    h = 2a\phi + 4b\phi^3 ,

which is exactly Eq. (6.2), determining the relation between the order parameter \phi and the external field h, with a = 1 - R_w^2 playing the role of a \sim T - T_c.

For h = 0 the order parameter is finite when a < 0, viz when T < T_c or R_w^2 > 1.
Interestingly, the correspondence to the Landau theory of phase transitions is recovered only when \sigma_{ext}^2 is measured in units of \sigma_y, as expressed by (11.65), with the relation h = 2\sigma_{ext}^2/\sigma_y involving a self-consistency loop. In practice, one determines first \sigma_{ext}^2 from (11.64).
(10.5) LAYER VARIANCE MAPPING
Disregarding layer indices, the mean of the membrane potential is

    \mu_x = \mu_{ext} + N\mu_w\mu_y = \mu_{ext} + \tilde\mu_w\mu_y , \qquad \mu_w = \frac{\tilde\mu_w}{N} , \qquad (11.66)

where we made use of (10.16). The 1/N scaling for the mean implies that it drops out of the relation (10.18) for the variance,

    \sigma_{w\cdot y}^2 = N\left[ \left(\sigma_w^2 + \mu_w^2\right)\left(\sigma_y^2 + \mu_y^2\right) - \mu_w^2\mu_y^2 \right] \approx \tilde\sigma_w^2\left(\sigma_y^2 + \mu_y^2\right) , \qquad \sigma_w = \frac{\tilde\sigma_w}{\sqrt N} . \qquad (11.67)

The probability distribution P(x) is therefore a Gaussian with mean \mu_x and variance \sigma_x^2 = \sigma_{ext}^2 + \sigma_{w\cdot y}^2. Equation (10.24) hence takes the form

    \sigma_y^2 + \mu_y^2 = \frac{1}{\sqrt{2\pi\sigma_x^2}} \int dx \left( 1 - e^{-(x-b)^2} \right) e^{-(x-\mu_x)^2/(2\sigma_x^2)} ,

or

    1 - \sigma_y^2 - \mu_y^2 = \sqrt{2\pi\sigma_b^2} \int dx\, N(x|\mu_b, \sigma_b)\, N(x|\mu_x, \sigma_x) \qquad (11.68)
    = \frac{\sqrt{2\pi\sigma_b^2}}{\sqrt{2\pi(\sigma_b^2 + \sigma_x^2)}}\, e^{-\frac{(\mu_x - \mu_b)^2}{2(\sigma_b^2 + \sigma_x^2)}} , \qquad (11.69)

when writing the integrand as the product of two Gaussians, where \mu_b = b and 2\sigma_b^2 = 1. In the last step we used the common formula for the product of two Gaussians.
Adding layer indices, we obtain a mapping (\sigma_{y,n-1}, \mu_{x,n-1}) \to (\sigma_{y,n}, \mu_{x,n}), with

    \mu_{x,n} = \mu_{ext,n} + \tilde\mu_{w,n}\,\mu_{y,n-1}
    \sigma_{x,n}^2 = \sigma_{ext,n}^2 + \tilde\sigma_{w,n}^2\left( \sigma_{y,n-1}^2 + \mu_{y,n-1}^2 \right)

and

    \mu_{y,n} = \int dx\, \sigma_{Gaus}(x - b_n)\, N(x|\mu_{x,n}, \sigma_{x,n}) \approx \sigma_{Gaus}(\mu_{x,n} - b_n)

    1 - \sigma_{y,n}^2 - \mu_{y,n}^2 = \frac{1}{\sqrt{1 + 2\sigma_{x,n}^2}}\, e^{-\frac{(\mu_{x,n} - b_n)^2}{1 + 2\sigma_{x,n}^2}} ,

where \sigma_{Gaus}(z) is the Gaussian transfer function (10.23).


(10.6) NEURAL DIFFERENTIAL EQUATIONS
For a single linear unit per layer, the neural differential equations (10.35) read

    \dot x = \vartheta x , \qquad \dot E = -\vartheta E , \qquad x(0) = x^\alpha , \qquad E(1) = 2[F^\alpha - x(1)] , \qquad (11.70)

where we normalized layer time, t \in [0, 1]. Also included are the boundary conditions for x(t) (starting) and E(t) (ending). Here we used a different sign convention for E(t) than in the main text. It is possible to assume that \vartheta(t) \equiv \vartheta is uniform when training starts, given that the update procedure for \vartheta is independent of layer time,

    \Delta\vartheta = -\eta x E , \qquad \frac{d}{dt}\left( xE \right) = (\vartheta - \vartheta)xE = 0 ,

when using (11.70) and (10.34), which implies that xE = 2f^\alpha(F^\alpha - f^\alpha) for all layers. Here f^\alpha = x(1). The layer dependencies of the activity x = x(t) and of the error E = E(t) are

    x(t) = x^\alpha e^{\vartheta t} , \qquad E(t) = 2(F^\alpha - f^\alpha)\,e^{-\vartheta(t-1)} , \qquad f^\alpha = x^\alpha e^{\vartheta} .

Training stops once \vartheta = \log(F^\alpha/x^\alpha), the expected outcome.
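The convergence of ϑ can be verified with a few lines of gradient descent; x^α, F^α and the learning rate are illustrative values:

```python
import math

# Single linear unit: output f = x0*exp(theta), squared error (F - f)^2.
# The gradient-descent update, proportional to (F - f)*f, drives theta
# towards log(F/x0), where training stops.

x0, F = 0.5, 2.0
theta, eta = 0.0, 0.05
for _ in range(2000):
    f = x0 * math.exp(theta)
    theta += eta * 2.0 * (F - f) * f

assert abs(theta - math.log(F / x0)) < 1e-8
```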


(10.7) DOT PRODUCTS ARE KERNEL FUNCTIONS
A feature function g(x) maps an input vector x to a real vector. The dot product of the discretized function is

    \Theta_{\alpha,\beta} = \sum_i g_i^\alpha g_i^\beta , \qquad g^\alpha = g(x^\alpha) , \qquad \alpha = 1, \ldots, N_\alpha .

For the tangent kernel, the index i runs over all adaptable parameters, with N_\alpha being the size of the training data set. The expectation value of any vector h = (h_1, \ldots, h_{N_\alpha}) is

    \langle \Theta \rangle_h = \sum_{\alpha,\beta} h_\alpha \Theta_{\alpha,\beta} h_\beta = \sum_i c_i^2 , \qquad c_i = \sum_\alpha h_\alpha g_i^\alpha ,

which is always positive (or zero). This statement holds in particular for the
eigenvectors of Θ, which proves that the eigenvalues are positive (or zero).
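The positivity of the quadratic form can be probed numerically for an arbitrary feature function; the map g below is an illustrative choice:

```python
import random

# Gram matrix Theta[a][b] = g(xa).g(xb) of a feature map g; the
# quadratic form h.Theta.h = sum_i (sum_a h_a g_i(xa))^2 is never negative.

random.seed(0)
g = lambda x: [x, x * x, 1.0, abs(x)]        # illustrative feature map

xs = [random.uniform(-2.0, 2.0) for _ in range(6)]
G = [g(x) for x in xs]
Theta = [[sum(u * w for u, w in zip(G[a], G[b])) for b in range(6)]
         for a in range(6)]

for _ in range(200):                         # random test vectors h
    h = [random.gauss(0.0, 1.0) for _ in range(6)]
    q = sum(h[a] * Theta[a][b] * h[b]
            for a in range(6) for b in range(6))
    assert q >= -1e-9                        # PSD up to rounding error
```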
(10.8) LINEAR ATTENTION
With (10.18), the variance of the entries of the query, key and value vectors is

    \sigma_{QKV}^2 = D\sigma_0^2\sigma_x^2 .

An individual key-query scalar product has hence the variance D\left(\sigma_{QKV}^2\right)^2 = D^3\sigma_0^4\sigma_x^4. Multiplied with a value vector, the geometric sum (10.63) yields

    \sigma_y^2 = \frac{1}{1-\gamma}\, D\left(\sigma_{QKV}^2\right)^2 \sigma_{QKV}^2 = \frac{1}{1-\gamma}\, D^4\sigma_0^6\sigma_x^6

for the variance of the elements of the output activity vector. It is desirable that \sigma_y remains constant when D increases.

– When the normalization of the input activities does not scale with D, e.g. when \sigma_x = 1, one has \sigma_0 \sim (1/D)^{2/3}.
– When input vectors are normalized, such that |x_i| = 1, one has \sigma_x^2 \sim 1/D and hence \sigma_0 \sim (1/D)^{1/6}.

References
Alemi, A. A., et al. (2015). You can run, you can hide: The epidemiology and statistical mechanics
of zombies. Physical Review E, 92, 052801.
Bettencourt, L. M. A., et al. (2007). Growth, innovation, scaling, and the pace of life in cities.
Proceedings of the National Academy of Sciences, 104, 7301–7306.
Huang, S.-Y., Zou, X.-W., Tan, Z.-J., & Jin, Z.-Z. (2003). Network-induced non-equilibrium phase
transition in the “Game of Life”. Physical Review E, 67, 026107.
Huepe, C., & Aldana-González, M. (2002). Dynamical phase transition in a neural network model
with noise: An exact solution. Journal of Statistical Physics, 108, 527–540.
Index

Symbols Autocorrelation function, 212


1/f noise, 213 Avalanche
coevolution, 236
critical, 226
A length distribution, 220
Absorbing phase transition, 221, 372 sandpile, 219
self-organized criticality, 221 size distribution, 220, 223
time-scale seperation, 221 subcritical, 225
Adaptation/adaption
catastrophy, 71
game theory, 315, 316 B
time scale, 301 Backpropagation, 374
Adaptive Bak, Per
binning, 173 sandpile model, 218
climbing, 302 Bak–Sneppen model, 231
regime, 295 Baldwin effect, 286
system, 97 Basin of attraction, 47
walk, 298, 303 cycle, 261
Adiabatic limit, 69 Bautin normal form, 61
Adjacency matrix, 11 Bayesian
Algorithm, genetic, 249 inference, 170, 379
Alleles, 281 statistics, 169, 169
Annealed likelihood, 169
approximation, 259 posterior, 171
Asexual reproduction, 280 prior, 171
Asynchronous updating, 248 Bayes theorem, 169, 378
Attention, 386 Beanbag genetics, 281, 290
causal, 388 Bernoulli shift map, 358
dot product, 388 Bifurcation, 46, 56
key, 388 blue sky, 57
query, 388 codimension, 65
value, 388 fold, 57
Attractor, 47 global, 62, 63
basin, 47, 261 homoclinic, 63
boolean network, 261 Hopf, 59
collision, 77 infinite period, 64, 65
cyclic, 249, 261 local, 62
relict, 146 logistic map, 74
strange, 93 normal form, 56

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 451
C. Gros, Complex and Adaptive Dynamical Systems,
https://doi.org/10.1007/978-3-031-55076-8

pitchfork, 57 Central limit theorem, 168


potential, 57 Chaos
saddle-node, 56, 143 0-1 test, 95
symmetry, 58 crisis, 78
tangent, 84 deterministic, 71
tangential, 57 hyperchaos, 93
transcritical, 57 life at the edge of, 271
Binning logistic map, 75
probability distribution, 171 partially predictable, 94
time series, 172 route to chaos, 77
Boolean dynamics Chemical reactions, 304
ancestor, 262 Chi-squared test, 201
coupling function, 241 Clique, 6
descendant, 262 Clustering
variable, 243 coefficient, 5
Boolean network, 241, 241 lattice models, 34
annealed, 249 loop, 13
connectivity, 245 random graph, 22
controlling element, 244 Codimension, 65
dynamic core, 263 Coevolution, 230, 314
dynamics, 248 arms race, 314
evolution rule, 245 avalanche, 236, 318
geometry, 245 green world hypothesis, 314
lattice assignment, 245 red queen, 314
linkage, 245, 263 Cognitive system
mean-field theory, 252 time series analysis, 175
model realization, 248 Coherence resonance, 124
module, 263 Collective phenomena, 147
percolation of information, 251 Complexity
quenched, 248 algorithmic, 198, 198
relevant node, 263 behavior, 195
scale-free, 258 deterministic, 199
state space, 243 emergence, 199
time evolution, 263 excess entropy, 196
uniform assignment, 245 extensive, 194
Bose–Einstein condensation, 204 generative, 198
Grassberger, 197
intensive, 194
C Kolmogorov, 198
Cantor set, 125 measure, 193
Car following model, 156 Computation unit, 361
linear flow, 157 Boltzmann, 363
Catastrophe theory, 65 gating, 364
hysteresis, 66 Gaussian, 371
Cell GLU, 364
differentiation, 242, 270 LSTM, 364
division, 271 perceptron, 362
yeast cycle, 273 ReLU, 363
Cellular automata, 214 Conditional probability, 169
game of life, 215 Connection probability, 4
updating rules, 214, 215 Conserving system, 89
Center manifold, 67 Lotka–Volterra, 103
catastrophe theory, 67 Constant of motion, 49, 102
pitchfork, 67 Continuity equation, 118

Kuramoto model, 333 evolution, 286


Macroecology, 313 Differential equation
opinion dynamics, 156 first-order, 48
Coordinates injecivity, 81
normal, 88 neural, 374, 375
polar, 45 Diffusion, 106
Coordination number, 4 equation, 106, 129
Correlation Green’s function, 107
autocorrelation, 212 one dimensional, 106
function, 210 ordinary, 110
length, 210 stochastic process, 113
scale invariance, 210 subdiffusion, 110
Coupling ensemble, 246, 247 Dimensionality reduction, 361
additive functions, 248 Dimension, Hausdorff, 93
classification, 246 Dissipation, 87
forcing functions, 247 Dissipative system, 87, 91
magnetization bias, 247 conserving, 89
uniform distribution, 247 phase space contraction, 88
Coupling functions, 246 Distance
Covariance matrix, 378 average, 5
Critical Hamming, 250, 287
avalanche, 226 percolation, 22
coevolutionary avalanches, 236 Distribution
driven harmonic oscillator, 328 avalanche, 220
phase, 252, 253 component sizes, 25
sensory processing, 275 cycle length, 268
slowing down, 67 fitness barrier, 232, 234
Criticality fitness effects, 284
dynamical system, 209 Gaussian, 120
scale invariance, 210 natural frequencies, 330
universality, 211 power law, 10
Current scale free, 10
diffusion, 118 Tsallis–Pareto, 10
drift, 118 Drift velocity, 117
escape, 120 Dynamical system
Cycle, 249 adaptive, 97
attractor, 261 autonomous, 48
average number, 269 basic concepts, 45
length distribution, 268 conserving, 89
criticality, 209
deterministic, 106
D dissipative, 87, 89, 91
Degree ergodic, 50
average, 17 gradient, 68
sequence, 17 integrable, 50
Degree distribution, 9 invariant manifold, 52
arbitrary, 17 Jacobian, 53
Erdös–Rényi, 9 Lotka–Volterra, 102
excess, 18, 19 mechanical, 50
neighbors, 18 noise-controlled, 117
scale-free, 39 phase transition, 254
Delay differential equations, 78 reaction-diffusion, 130
Deterministic stochastic, 106, 117
chaos, 71 Taken–Bogdanov, 62

time delays, 78 prebiotic evolution, 306


Dynamics threshold, 296
adaptive climbing, 298 Escape
Bak–Sneppen model, 233 current, 120
boolean network, 248 Kramer’s, 121
conserving, 219 Evolution
continuous time, 48 adaptive regime, 295
discrete time, 48 barrier distribution, 232
evolution, 283 deterministic, 287
macromolecules, 304 epistasis, 290, 292
opinion, 319 fitness barrier, 231
fundamental theorem, 285
generation, 280
E linear equation, 288
Eigen, Manfred long-term, 230
hypercycle, 306 mixed stable strategy, 438
quasispecies theory, 304 mutation, 283
Eigenvector neutral regime, 298
extensive, 306 point mutations, 288
intensive, 306 population genetics, 279
Emergence, 147 prebiotic, 304
strong, 147, 199 punctuated equilibrium, 236, 297
weak, 147, 199 quasispecies, 296
Ensemble random energy model, 299
average, 9, 114, 367 selection, 283
fluctuations, 9, 10 speciation, 282
Entropy, 177 stochastic, 283
conditional, 179, 189 time scales, 231
density, 196 wandering regime, 296
differential, 181 Excess
excess, 196 degree distribution, 18, 19
information, 177 entropy, 196
joint, 187 Excitable state, 346
life, 177 Exponent
marginal, 187 dynamical, 212
minimal, principle, 184 scaling, 10
relative, 190 Extensive (property)
Shannon, 177 complexity, 194
Tsallis, 201 entropy, 178
Environment macroecology, 311
constant, 282 thermodynamic limit, 5, 178
resources, 320 vector, 306
Epistatic interactions, 281, 292 Extinction probability
Equation Galton–Watson, 229
chemical kinetic, 304 Markov chain, 110
continuity, 118, 156
deterministic evolution, 287
diffusion, 106 F
Fisher, 130 Fast threshold modulation, 347
Fokker–Planck, 117, 119 Feature extraction, 361
Langevin, 113 geometric, 362
Newton, 118 Fermi function, 133
Erdös–Rényi graph, 4 Ferroelectricity, 204
Error catastrophe, 292, 297 Ferromagnetism, 204

First passage time, 108 glider, 216


Fisher equation, 130 universal computing, 216
velocity selection, 137 Game theory, 313
Fisher information, 192 lattice, 318
Fitness Nash equilibrium, 315
barrier, 230 payoff matrix, 316
Fujiyama, 290 strategy, 315
landscape, 230, 284 utility, 315
maximum, 230 zero-sum, 315
ratio, 285 Gaussian
sharp peak, 293 distribution, 120, 165
Wrightian, 284 multivariate, 376
Fixpoint, 45, 52 process, 376
catastrophic, 67 Gene expression network, 242
classification, 54 Genetic algorithm, 249
flow of information, 255 Genetics
focus, 54 beanbag, 281
hyperbolic, 54 combinatorial, 281
logistic map, 73 Mendelian, 281
Lorenz model, 92 Genome, 280
Lotka–Volterra, 102 genotype, 282
map, 53 mutation, 283
node, 54 phenotype, 282
period two, 74 size, 280
saddle, 54 Genotype, 282
stability, 47, 52 Giant connected component,
Flow of information, 250 20
Fokker–Planck equation, 117, 119 percolation, 28
harmonic potential, 119 size, 28
macroecology, 311 Google page rank, 113
particle current, 118 Gradient descent, 383
Forest fire model, 216 Graph
lightning, 217 Barabási–Albert, 37
Fractal, 93 bipartite, 2
Free energy, 205 clique, 6
Frequency locking, 328 clustering, 7
Friction, 87 community, 7
damping term, 114 diameter, 5
large damping, 122 Erdös–Rényi, 4
Fujiyama landscape, 290 Kronecker, 42
Function Laplacian, 14, 113
Fermi, 133 normalized Laplacian,
sigmoidal, 133, 363 15
scale-free, 11, 36
spectrum, 12, 14
G tree, 8
Galton–Watson process, 228 Green’s function, 12
Game diffusion, 107
Hawks and Doves, 316 Growth rate
Prisoner’s dilemma, 317 delay, 81
tragedy of the commons, 320 forest fire, 216
Game of life, 215 nodes, 37
blinker, 215 reaction network, 307
block, 215 SIRS model, 353

H rigidity, 256
Hamming distance, 250, 287 Kolmogorov complexity, 198
Harmonic oscillator Kramer’s escape, 121
driven, 327 Kronecker graph, 42
Hausdorff dimension, 93 Kullback-Leibler divergence, 190
Sierpinski carpet, 94 2
.χ test, 191
Hawks and Doves game, 316 Fisher information, 193
Heteroclinic orbit, 55 mutual information, 192
Homocline, 63 Kuramoto model, 329, 329
Hopf bifurcation, 46, 59, 80 critical coupling, 334
subcritical, 60 drifting component, 332, 334
supercritical, 60 locked component, 332, 334
theorem, 60 rhythmic applause, 336
Huepe–Aldana network, 277 time delay, 337
Hydrogen atom, 209
Hypercycle, 306, 306
prebiotic evolution, 309 L
Hysteresis, 66 Lagrange parameter, 182
Landau-Ginzburg model, 205
Langevin equation, 113
I diffusion, 115
Information massless, 116
loss, 251 non-linear, 116
mutual, 185, 188 solution, 114
retention, 251 Laplace operator, 129, 209
routing, 387 Law
theory, 163 circular, random matrices, 368
Intensive (property) large numbers, 167
excess entropy, 196 Ohm, 118
macroecology, 311 power, 210
thermodynamic limit, 5, 178 second, thermodynamics, 177
vector, 306 semi-circle, 13
Ising model, 211 Leaky integrator neuron, 358
deterministic evolution, 290 Learning
transfer matrix, 291 Bayesian, 171
Isocline offline vs. online, 176
perceptron, 362 supervised, 383
Terman–Wang oscillator, 345 Lévy flight, 109
Van der Pol oscillator, 101 Liénard variable, 100
Life
adaptive system, 274
J edge of chaos, 271
Jacobian, 53 origin, 304, 310
Jensen inequality, 189 Limit cycle, 45, 46, 329
Linkage, 245
loop, 262
K Liouville’s theorem, 89
KAM Liquid-gas transition, 204
theorem, 51 Logistic map, 71
torus, 51 bifurcation, 74
Kauffman network, 243 chaos, 75
K.=1, 264 odd, 77
K.=2, 266 Loop
K.=N, 267 linkage, 262
Index 457

network, 13 random surfer, 111


Lorenz model, 91 Mathematical pendulum, 88
Lotka–Volterra model, 102 Matrix
rabbits and foxes, 102 adjacency, 11
sheeps and rabbits, 323 elliptic, 368
Lyapunov mutation, 287
exponent, 52, 76 payoff, 316
exponent, hyperchaos, 93 transfer, 291
exponent, maximal, 76 Mean-field theory
spectrum, 54 Bak–Sneppen model, 233
boolean network, 252
Kuramoto model, 331
M Mechanical system, 102
Machine learning Microevolution, 280
attention, transformer, 386 Model
backpropagation, 374 Bak–Sneppen, 231
continuous layers, 374 car following, 156
Gaussian process, 376 forest fire, 216
neural tangent kernel, 382 Gray–Scott, 142
recurrent networks, 366 Ising, 211
regression, 378 Kuramoto, 329
residual networks, 374 Lorenz, 91
softmax, 388 random energy, 299
tokenization, 387 random neighbors, 232
XOR problem, 365 random surfer, 112
Macroecology, 310 SIRS, 351
neutral theory, 310 small-world network, 34
Magnetism, 204 Watts–Strogatz, 35
Malthusian fitness, 284 Module, boolean network, 263
Manifold Mutation
center, 55, 67 matrix, 287
invariant, 52, 104 point, 287
iso-energy, 104 rate, 283, 284
stable, 55 time scale, 300
unstable, 55 Mutual information, 185, 188
Map
logistic, 71
Poincaré, 49 N
shift, 125 Nash equilibrium, 315, 318
Markov Natural frequencies, 330
process, 185 Network
property, 110 actors, 2
two channel, 185 assortative mixing, 8
Markov chain, 110, 185 autocatalytic, 306
absorbing states, 110 bipartite, 2, 8
extinction probability, 110 boolean, 241
master equation, 111 communication, 3
stationary, 111 correlation effects, 8
Mass conservation, 305 diameter, 5
Master equation, 111 diffusion, 113
bounded confidence, 154 evolving, 36
diffusion, 106 gene expression, 242, 270
Fokker–Planck, 119 metabolic, 8
macroecology, 311 N–K, 243
458 Index

preferential attachment, 37 threshold, 21


protein interaction, 3 transition, 21, 257
reaction, 307 Periodic driving, 122
retentive, 390 Perturbation theory
robustness, 29 Fischer equation, 136
social, 1 secular, 97
yeast, 272 Phase (dynamic)
Neural chaotic, 252–253
activity, variance, 371 critical, 252–253
network, recurrent, 366 frozen, 252, 253
tangent kernel, 384 Phase diagram
Neuron bifurcation, 254
compartmental, 364 Gray–Scott, 144
excitable, 346 N–K model, 256
point, 364 Phase space, 47
Neutral regime, 298 contraction, 88
Newton’s law, 118 Phase transition, 66, 147
Noise continuous, 207
colored, 114 dynamical system, 254
pink, 213 first-order, 208
stochastic system, 117 Landau theory, 203
white, 114 second-order, 203
Phenotype, 282
Piecewise linear system, 348
O matching conditions, 349
Ohm’s law, 118 Pitchfork bifurcation, 57
Open boundary conditions, 219 catastrophe theory, 65
Opinion dynamics, 154, 319 Poincaré map, 49
bounded confidence, 154 Point mutation, 284, 287
opinion current, 156 Poisson distribution, 9
Orbit Potential
boolean network, 253 double-well, 121
chaotic, 92 harmonic, 119
closed, 51 Mexican hat, 153
matching conditions, 349 Power spectrum, 213
relaxation oscillations, 100 Prebiotic evolution, 304
self-retracting, 261 RNA world, 306
synchronized, 342 Prisoner’s dilemma, 317
Order parameter, 203 Probability distribution, 163
Kuramoto model, 331 binning, 171
Origin of life, 310 cumulative, 167
Oscillator exponential, 165
coupled, 329 Gamma, 312
mathematical, 88 Gaussian, 165
Stuart–Landau, 59 generating functional, 166
Terman–Wang, 344 marginal, 186
Van der Pol, 97 mean, 164
median, 164
PDF, 164
P standard deviation, 164, 165
Payoff matrix, 316 Tsallis–Pareto, 42
Percolation, 17 variance, 164
cliques, 21 Probability generating function
information, 251 binary branching, 224

Galton–Watson, 228 Schrödinger equation, 209


Poisson distribution, 24 Self organization, 147
properties, 23, 166 Fisher equation, 137
Punctuated equilibrium, 236, 297, 315 reaction network, 308
traffic jam, 160
Self-organized criticality, 212
Q branching theory, 222
Quasispecies, 296, 304 Semi-circle law, 13
Quenched dynamics, 261 Serial updating, 248
Shannon entropy, 178
predictability, 177
R source coding theorem, 180
Radial basis functions, 381 Sharp peak landscape, 293
Random branching, 222 linear chain model, 294
binary, 223 stationary solution, 294
decomposition, 225 Shift map, 125
sandpile, 222 Sierpinski carpet, 94
Random graph, 4 Sigmoidal, 133
construction, 17 SIRS model, 351
generalized, 17 continuous time, 356
robustness, 29 coupled, 354
Random matrix theory, 367 discrete time, 353
Random walk, 106 endemic state, 356
configuration space, 267 logistic map, 353
Lévy flight, 109 on a network, 43
Reaction-diffusion system, 130 Skip connections, 375
Gray Scott, 142 Small-world
normal form, 130 effect, 2
sum rule, 135 graph, 34
Regression, 378 Speciation, 282
Relaxation oscillator Species
synchronization, 347 fitness, 284
Terman–Wang, 344 quasispecies, 296
Van der Pol, 100 Spectral radius, 369
Relaxation time, 212 Spontaneous symmetry breaking, 205
distribution, 227 State space
Reproduction, asexual, 280 boolean network, 243
Reservoir computing, 369 population, 281
RNA world, 306 undersampling, 266
Stationary solution
Hawks and Doves, 316
S sharp peak landscape, 294
Saddle-node bifurcation, 56, 143 Stigmergy, 150
invariant cycle, 65 Stochastic calculus, 116
Sandpile model, 218 Ito, 116
abelian, 219 Stratonovich, 117
boundary conditions, 219 Stochastic escape, 117, 302
self-organized criticality, 218 evolution, 298, 304
updating rule, 218 probability, 302
Scale invariance, 210 typical fitness, 303
criticality, 211 Stochastic resonance, 117, 121
distribution, 259 ice ages, 123
graph, 36 resonance condition, 123
observation, 267 switching times, 122
Stochastic system, 113
  continuous, 164
  discrete, 164
  evolution, 283
Strange attractor, 93
Stuart–Landau oscillator, 59
Superconductivity, 204
Susceptibility, 208
Swarm intelligence, 147
Synchronization
  aggregate averaging, 340
  applause, 336
  causal signaling, 344
  chimera, 336
  driven oscillator, 328
  Kuramoto model, 333
  mechanisms, 339
  relative phase, 355
  relaxation oscillator, 347
  time delay, 337
Synchronous updating, 248

T
Takens–Bogdanov system, 62
Tangent transition, 406
Terman–Wang oscillator, 344
  active phase, 346
  silent phase, 346
Theorem
  Bayes, 169
  central limit, 168
  fundamental of natural selection, 285
  KAM, 51
  Liouville, 89
  source coding, Shannon, 180
Thermodynamic limit, 5
  Bak–Sneppen model, 234
  clustering coefficient, 6
  deterministic evolution, 298
  giant connected component, 21
  graph spectrum, 12
  N–K network, 268
  random matrix theory, 368
Time
  adaptive climbing, 301
  encoding, 274
  first passage, 108
  relaxation, 212
Time delay, 78
  delay kernel, 81
  distributed, 81
  kernel series framework, 82
  Kuramoto model, 337
  linear chain trick, 82
  synchronization, 337
Time scale separation
  SIRS model, 355
  Van der Pol oscillator, 100
Time series, 173
  analysis, 176
  self averaging, 174, 175
  symbolization, 173
  trailing average, 176
  XOR, 174
Tipping instability, 69
Tragedy of the commons, 320
  catastrophic poverty, 321
  oligarchs, 322
Training
  lazy, 386
Transcritical bifurcation, 57
Transfer matrix, 291
Transformer, 389
Transport, 106
  ballistic, 107
  diffusive, 107
Tsallis entropy, 201
Tsallis–Pareto distribution, 42
Turing
  complete, 366
  instability, 139
  machine, 198

U
Updating
  asynchronous, 248
  offline vs. online, 176
  serial, 248
  synchronous, 248

V
Van der Pol oscillator, 97
  Liénard variable, 100
  secular perturbation theory, 97
Vanishing gradient problem, 375
Variable
  aggregate, 340
  boolean, 243
  Liénard, 100
  rotating frame, 332
Variational
  calculus, 182
  inference, 379
Voter model, 156
W
Walk
  adaptive, 298, 303
  macroecology, 311
  N–K network, 267
  random, 106
Wandering regime, 303
Watts–Strogatz model, 35
Wigner law, 13
Wrightian fitness, 284

X
XOR problem, 365

Y
Yeast cell cycle, 271, 273

Z
Zeldovich equation, 414