Poly X MB 2024

MICHAEL BENZAQUEN
www.econophysiX.com
[email protected]
Contents

Foreword

9 Financial engineering
9.1 Optimal portfolios
9.2 Optimal trading
9.3 Options
9.3.1 Bachelier's fair price
9.3.2 Black and Scholes' extravaganza
9.3.3 Residual risk beyond Black-Scholes
9.3.4 The volatility smile
9.3.5 Model-generated crashes
9.4 The Financial Modelers' Manifesto

Appendices

C Fish markets
C.1 Why fish markets?
C.2 The Marseille fish market
C.3 Trading relationships and loyalty formation
C.3.1 A simple model
C.3.2 Mean Field Analysis
C.3.3 Beyond Mean Field
C.3.4 Heterogeneities and real data
C.4 The impact of market organisation
C.4.1 The Ancona fish market
C.4.2 Similarities and differences

References

Tutorial sheets
1. Time series simulation and analysis
2. Randomness in complex systems
3. Stylized facts in financial time series
4. Mesoscopic models in finance
5. The Random Field Ising model
6. Herd behavior and aggregate fluctuations
7. The ant recruitment model
8. The latent order book model
9. Optimal portfolios
Foreword
Ever since Bachelier's PhD thesis in 1900 [1], which proposed a theory of Brownian motion five years before Einstein, our understanding of financial markets has progressed considerably. Over the past decades, financial engineering has grown tremendously and has regrettably outgrown our understanding. The inadequacy of the models used to describe financial markets is often responsible for the worst financial crises, with significant impact on the everyday economy. From a physicist's perspective, understanding the price formation mechanisms – namely how markets absorb and process the information of thousands of individual agents to come up with a "fair" price – is a truly fascinating and challenging problem. Fortunately, modern financial markets provide enormous amounts of data that can now be used to test scientific theories at levels of precision comparable to those achieved in the physical sciences.
This course presents the approach adopted by physicists to analyse and model financial markets (see e.g. [2–5]). Our analysis shall, insofar as possible, always be grounded in real financial data.1 Rather than sticking to a rigorous mathematical formalism, we will seek to foster critical thinking and develop intuition on the "mechanics" of financial markets, the orders of magnitude, and certain open problems. By the end of this course, students should be able to:
• Answer simple questions on the subject of this course (without course notes).
• Rigorously analyse real financial data and empirical results.
• Conduct logical reasoning to answer modelling problems (ABM).
• Carry out simple calculations similar to those presented in class and during the tutorials (PDEs,
Langevin equations, dimensional analysis, etc.).
The content of this course is largely inspired by references [2, 3], together with my own experience
in quantitative finance as a physicist. Students interested in going the extra mile are encouraged to
consult references [6–14].
1 The tutorials (in Python) will focus on the analysis of real financial data and numerical simulations of some of the models presented in the course.
1 Empirical time series

In this Chapter we introduce some important ideas and methods for the description of one-dimensional time series and the analysis of empirical data.

Throughout this course we shall restrict ourselves to the analysis of one-dimensional time series, denoted x(t), or equivalently x_t ∈ R.
1.2 Variogram
Most of the time, we shall also restrict ourselves to the analysis of stationary processes. Most of the time we shall consider detrended time series, that is ⟨Δ_τ x⟩ = 0, where Δ_τ x := x(t + τ) − x(t). The most natural quantity that comes to mind to characterise by how much x(t) varies over a given timescale τ is the standard deviation:

σ(τ) := √V(τ) ,  with  V(τ) := ⟨(Δ_τ x)²⟩ .    (1.1)

The plot of V(τ) against τ is known as the variogram. Frequently, one rather refers to the signature plot, which is the plot of V(τ)/τ against τ.
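As a minimal numerical illustration of these definitions (our own sketch, in the spirit of the Python tutorials; function names and parameters are illustrative, assuming only numpy), the variogram and signature plot of a sampled series can be estimated as follows:

```python
import numpy as np

def variogram(x, taus):
    """Empirical V(tau) = <(x_{t+tau} - x_t)^2> for each lag tau."""
    return np.array([np.mean((x[tau:] - x[:-tau]) ** 2) for tau in taus])

# Illustration on a pure random walk, for which V(tau) grows linearly in tau
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(100_000))
taus = np.unique(np.logspace(0, 3, 30).astype(int))
V = variogram(x, taus)
signature = V / taus  # signature plot: roughly flat for a diffusive signal
```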
V_OU(τ) = (σ₀²/ω) (1 − e^{−ω|τ|}) .    (1.2)
Typical examples of OU processes are the velocity of a large Brownian particle in a viscous fluid, or interest rates, see the Vasicek interest rate model [15].
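A quick numerical check of Eq. (1.2) can be done by simulating the discrete-time OU recursion used in footnote 1 below (a sketch of ours, assuming numpy; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
omega, sigma0, n = 0.05, 1.0, 200_000

# Discrete-time OU: x_{t+1} = (1 - omega) x_t + eta_t, with <eta^2> = 2 sigma0^2
x = np.empty(n)
x[0] = 0.0
eta = np.sqrt(2.0) * sigma0 * rng.standard_normal(n)
for t in range(n - 1):
    x[t + 1] = (1.0 - omega) * x[t] + eta[t]

taus = np.arange(1, 200)
V = np.array([np.mean((x[tau:] - x[:-tau]) ** 2) for tau in taus])
# V(tau) saturates exponentially at rate ~ omega towards twice the stationary
# variance, in line with Eq. (1.2) up to convention-dependent factors of two
```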
We also define the correlogram C(τ) := ⟨x_{t+τ} x_t⟩ − ⟨x_t⟩², which can be expressed as a function of the variogram: for a stationary process, V(τ) = 2[C(0) − C(τ)].
In most cases, the variogram does not suffice to fully characterise the time series. One should compute the whole probability distribution P_τ(Δ_τ x), or equivalently all of its moments M_q(τ) = ⟨(Δ_τ x)^q⟩. There is however a remarkable exception: scale invariant processes.
Process x_t is said to be scale invariant if and only if there exists a function f such that:

P_τ(Δ_τ x) = (1/σ(τ)) f(Δ_τ x / σ(τ)) .    (1.3)
Fractional Brownian motions are a good example of scale invariant processes; in particular one has f_fBm(u) = (1/√(2π)) e^{−u²/2} for all Hurst exponents H. Lévy flights are also scale invariant.
If x_t is a scale invariant process, then all the relevant information is contained in σ(τ). Indeed, all the moments M_q(τ) are proportional to the qth power of σ(τ):2

M_q(τ) = σ(τ)^q ∫ du u^q f(u) .    (1.4)
In addition to bringing substantial mathematical simplifications, scale invariance often bears witness to a physical property of the system (see Chapter 5).
1 To compute this result one may proceed in discrete time: x_{t+1} = (1 − ω)x_t + η_t ⇒ x_t − x_0 = Σ_{t′=0}^{t−1} (1 − ω)^{t−t′−1} η_{t′}; then, using ⟨η_{t′} η_{t″}⟩ = 2σ₀² δ(t′ − t″), one can compute ⟨(x_τ − x_0)²⟩ = 2σ₀² Σ_{t′=0}^{τ−1} [(1 − ω)^{τ−t′−1}]² = 2σ₀² [1 − (1 − ω)^{2τ}]/[1 − (1 − ω)²], which recovers Eq. (1.2) in the small-ω limit.
1.4 Intermittency
In many situations, higher order moments carry essential information. Indeed, while the variogram of price returns x_t in financial markets is remarkably linear, financial time series are quite far from Bachelier's random walks, or even Lévy flights. Indeed, strong long range correlations appear at the level of x_t², which measures the activity or volatility of the market. Very much like in fluid turbulence, one observes intermittency or volatility clustering, that is calm periods interspersed with more agitated episodes of all sizes. While the correlogram of price changes doesn't reveal such effects, the correlogram of squared returns displays very long range correlations, decaying as a slow power law of the lag (see Chapter 2):

g(τ) := ⟨x²_{t+τ} x²_t⟩ − ⟨x²_t⟩² ∼ τ^{−ν} ,  with ν ≈ 0.2 .    (1.5)
As a result, moments no longer trivially scale as in Eq. (1.4). One speaks of a multifractal time series when:

M_q(τ) ∼ σ(τ)^{ξ(q)} ,  with ξ(q) ≠ q .    (1.6)
A rather good model for both fluid turbulence and finance is the so-called log-normal cascade [16, 17]. In a nutshell, one has ξ(q) = q + λ² q(q − 2), where λ is called the intermittency parameter. The scale invariant or monofractal case corresponds to λ = 0.
When the variance ⟨x_t²⟩ is itself a stochastic process, one speaks of heteroskedasticity, see Chapter 2.
Skewness and kurtosis are commonly used to further describe the shape of a probability distribution. The skewness ζ is a measure of the asymmetry of a probability distribution. The kurtosis κ is a measure of the "tailedness" of a probability distribution. They are given by the 3rd and 4th standardised moments:

ζ := ⟨[(x − ⟨x⟩)/σ]³⟩ = M₃ᶜ/σ³ ,  κ := ⟨[(x − ⟨x⟩)/σ]⁴⟩ − 3 = M₄ᶜ/σ⁴ − 3 ,    (1.7)
with M₃ᶜ and M₄ᶜ the 3rd and 4th central moments respectively.3 The Gaussian distribution has κ = ζ = 0, and more generally all cumulants of higher order are identically zero. One speaks of negative skew (ζ < 0) when the left tail is longer, and positive skew (ζ > 0) when the right tail is longer, see Fig. 1.2. A distribution with κ = 0 is said to be mesokurtic, while κ > 0 (resp. κ < 0) is referred to as leptokurtic (resp. platykurtic). A leptokurtic (resp. platykurtic) distribution has fatter (resp. thinner) tails than the Gaussian.
3 Note that sometimes κ, as defined in Eq. (1.7), is called excess kurtosis while κ + 3 is called kurtosis.
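These standardised moments are straightforward to estimate in practice; here is a small sketch of ours (assuming numpy) illustrating Eq. (1.7) on Gaussian and Student-t samples:

```python
import numpy as np

def skew_kurt(x):
    """Standardised 3rd moment and excess kurtosis, as in Eq. (1.7)."""
    c = x - x.mean()
    s = c.std()
    return np.mean(c**3) / s**3, np.mean(c**4) / s**4 - 3.0

rng = np.random.default_rng(2)
print(skew_kurt(rng.standard_normal(10**6)))        # ~ (0, 0): mesokurtic
print(skew_kurt(rng.standard_t(df=5, size=10**6)))  # ~ (0, 6): leptokurtic
```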
Conclusions
While the variogram is often the first quantity one will consider to analyse empirical time series, one should bear in mind that the mean and standard deviation of a time series are most often not the whole story. In certain cases, the mean and variance may not even be defined (e.g. Lévy flights); and yet, empirically one can always compute a mean and a variance, only these will be completely irrelevant and reflect boundary effects only. To avoid this, one should always compute the whole probability distribution, and more generally look at the time series directly! Intermittency or Lévy flights are generally visible to the naked eye (see Fig. 1.3). Lévy flights resemble a Brownian motion with occasional large jumps.
Figure 1.3: Intermittent or heteroskedastic signal (top), and Levy stable random walk (bottom).
2 Statistics of real prices

In this Chapter we present some important features and stylised facts of financial time series.
1. Each transaction involves a buyer and a seller, which means that there must be as many people who think the price will rise as people who think it will decline. Therefore, price changes are unpredictable, or in the modern language, prices are martingales.1

2. Further, if one considers that price returns at a given timescale, say daily, are none other than the sum of a large number N of small price changes,

p_N − p_0 = Σ_{t=0}^{N−1} r_t ,  with r_t = p_{t+1} − p_t ,

then, for large N, the Central Limit Theorem (CLT) ensures that daily price changes are Gaussian random variables, and that prices thus follow Gaussian random walks.
While his first conclusion is rather accurate, the second is quite wrong, as we shall see below. Note however that such reasoning is quite remarkable for that time! Be that as it may, on such grounds Bachelier derived a series of very interesting results, such as Bachelier's first law, which states that the price variogram grows linearly with the time lag τ:

V(τ) = σ² τ ,    (2.1)

but also results on first passage times2 and option pricing (a precursor of Black-Scholes).
It should be noted that the CLT also applies beyond this constraint, only the aggregate distribution no longer converges to a Gaussian but to a Lévy stable law.

Most importantly, for the error to be negligible everywhere one needs N → ∞, or equivalently here, continuous time.4 This is never the case in real life, and thus empirically the CLT only applies to a central region of width w_N, and nothing can be said for the tails of the distribution beyond this region (see Fig. 2.1). If the return distribution is a power law, say ρ(r) ∼ 1/|r|^{1+µ} with µ > 2 such that the variance is still finite, the width of the central region scales as w_N ∼ √(N log N), which is only √(log N) times wider than the natural width σ√N. The probability to fall in the tail region decays slowly, as 1/N^{µ/2−1}. In fact the tail behaviour of the aggregate distribution is the very same power law as that of ρ(r). In other words, far away from the central Gaussian region, the power-law tail survives even when N is very large.

One should thus carefully refrain from invoking the central limit theorem to describe the probability of extreme events – in most cases, the error made in estimating their probabilities is orders of magnitude large.
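The following sketch of ours (numpy/scipy assumed; µ and N are illustrative) makes the warning concrete: after aggregating N heavy-tailed returns, the far tail remains a power law and is underestimated by the Gaussian by many orders of magnitude:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
mu, N, trials = 3.5, 64, 200_000

# iid symmetric returns with power-law tails ~ 1/|r|^{1+mu} (here mu > 2)
r = rng.pareto(mu, size=(trials, N)) * rng.choice([-1, 1], size=(trials, N))
s = r.sum(axis=1)
z = s / s.std()

# Far in the tails, the empirical probability of the aggregate exceeds the
# Gaussian prediction by orders of magnitude
for thresh in (3, 5, 8):
    print(thresh, np.mean(np.abs(z) > thresh), 2 * norm.sf(thresh))
```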
One commonly works with relative returns:

x_t := (p_{t+1} − p_t)/p_t ≈ log p_{t+1} − log p_t .    (2.2)
In addition, the price of stocks is rather arbitrary, as it depends on the number of stocks in circulation
and one may very well decide to split each share into n, thereby dividing their price by n, without a
priori changing any fundamental properties (splitting invariance or dilation symmetry). Another sym-
metry exists in foreign exchange (FX) markets. The idea is that there should be no distinction between
using the price π of currency A in units of B, or 1/π for currency B in units of A. Relative returns satisfy
such a property: x = δp/p = −δ(1/p)/(1/p).
In some cases, however, an additive description in terms of absolute price changes is preferred:

r_t := p_{t+1} − p_t .    (2.3)

Indeed, the fact that prices are discrete and quoted in ticks, which are fixed fractions of dollars (e.g. 0.01$ for US stocks), introduces a well-defined $-scale for price changes and breaks the dilation symmetry.
4 Considering Gaussian iid returns in continuous time, one is left with the standard geometric Brownian motion model, well established in mathematical finance since the 1960s.
5 On a practical note, relative price changes are also more convenient since asset prices can span a few $ to a few M$.
Other examples in favour of an additive price process are some contracts for which the dilation argument does not work, such as volatility, which is traded on option markets, and for which there is no reason why volatility changes should be proportional to volatility itself.
• The unconditional distributions of returns have fat power-law tails. Recall that power-law functions are scale invariant, which here corresponds to micro-crashes of all sizes.

• The empirical probability distribution function of returns on short to medium timescales (from a few minutes to a few days) is best fitted by a symmetric Student t-distribution, or simply the t-distribution (a fitting sketch is given below):

P(x) := (1/√π) [Γ((1+µ)/2)/Γ(µ/2)] a^µ/(x² + a²)^{(1+µ)/2} ∼ a^µ/|x|^{1+µ} ,    (2.4)

with typically 3 < µ < 5. Its variance, given by σ² = a²/(µ − 2), diverges as µ ↓ 2 from above.
• On longer timescales (months, years) the returns distribution becomes quite asymmetric. While the CLT starts to kick in (very slowly, see Section 2.2) for the positive tail, the negative tail remains very fat. In other words, downward price jumps are on average larger than their upward counterparts.
6 Excluding week-ends and public holidays, there are ≈ 250 trading days per year.
7 For real (non-Gaussian) markets, it is even worse. An obvious improvement is diversification.
• In extreme markets, one can have µ < 3 or even µ < 2 (e.g. MXN/$ rate) such that σ = ∞!
The daily returns of the MXN/$ rate are actually very well fitted by a pure Levy distribution with no
obvious truncation (µ ≈ 1.4). The case of short term interest rates is also of interest. The latter (say
3-month rates) are strongly correlated to the decision of central banks to increase or decrease the day-
to-day rate; kurtosis is rather high as a result of the short rate often not changing at all but sometimes
changing a lot.
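As an illustration of Eq. (2.4), one may fit a Student t-distribution by maximum likelihood; the sketch below (ours) uses scipy's t distribution on a synthetic sample standing in for real returns. Note that scipy's scale parameter relates to the a of Eq. (2.4) by a² = µ × scale²:

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(4)
# Stand-in for real daily returns: a synthetic t-distributed sample with mu = 4
returns = student_t.rvs(df=4, scale=0.01, size=50_000, random_state=rng)

df, loc, scale = student_t.fit(returns)
a = scale * np.sqrt(df)  # parametrisation of Eq. (2.4): a^2 = mu * scale^2
print(f"fitted tail exponent mu ~ {df:.2f}")
print("variance a^2/(mu-2):", a**2 / (df - 2), "vs empirical:", returns.var())
```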
Price returns are, to a good approximation, linearly uncorrelated:

⟨x_t x_{t′}⟩ ≈ σ² δ(t − t′) ,    (2.5)

for timescales above a few minutes and below a few days,8 or else statistical arbitrage would be possible. Yet, one observes activity intermittency (see Chapter 1).
Actually, the volatility is itself a dynamic variable evolving on short and long timescales (multiscale). One says that price returns are heteroskedastic random variables, from ancient Greek hetero: different, and skedasis: dispersion. A common model is given by:

x_t = σ_t ξ_t ,    (2.6)

where the ξ_t are centred iid random variables with unit variance, encoding the sign of returns and their unpredictability (⟨ξ_t⟩ = 0), while σ_t is a positive random variable with fast and slow components, see Eq. (1.5).
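A minimal simulation of Eq. (2.6) (ours; the slow AR(1) log-volatility is just one convenient choice, and all parameters are illustrative) reproduces the key stylised fact: vanishing linear correlations together with long-lived correlations of squared returns:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Slowly varying log-volatility (AR(1) with a long correlation time),
# multiplying iid unit-variance noise: x_t = sigma_t xi_t, Eq. (2.6)
logvol = np.zeros(n)
for t in range(n - 1):
    logvol[t + 1] = 0.999 * logvol[t] + 0.05 * rng.standard_normal()
x = 0.01 * np.exp(logvol) * rng.standard_normal(n)

def autocorr(y, tau):
    y = y - y.mean()
    return np.mean(y[:-tau] * y[tau:]) / np.mean(y * y)

print([round(autocorr(x, k), 3) for k in (1, 10, 100)])     # ~ 0: no linear memory
print([round(autocorr(x**2, k), 3) for k in (1, 10, 100)])  # slow decay: clustering
```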
The squared-volatility variogram is given by:

V_{σ²}(τ) ≃ A − B τ^{−ν} ,  with ν ≈ 0.2 .    (2.7)

To validate such a scaling, one would need 1/ν ≈ 5 decades of data, which is inaccessible. Actually, it is difficult to be sure that V_{σ²}(τ) converges to a finite value A at all. Multifractal models suggest instead a logarithmic growth of the log-volatility variogram:

V_{log σ}(τ) ∝ λ² log τ .    (2.8)
Volatility appears to be marginally stationary as its long term average can hardly be defined. The very
nature of the volatility process is still a matter of debate (see [19] for recent insights) highlighting the
complexity of price change statistics.
8 At very high frequency, prices are mechanically mean reverting. At long timescales, systematic inefficiencies exist (trend, value).
κ_N = (1/N) [κ_ξ + (3 + κ_ξ) g(0) + 6 Σ_{τ=1}^{N} (1 − τ/N) g(τ)] ,    (2.9)

with κ_ξ the kurtosis of ξ_t. Interestingly enough, κ₁ = κ_ξ + (3 + κ_ξ) g(0) > κ_ξ, which means that – even within a Gaussian model (κ_ξ = 0) – a fluctuating volatility suffices to create a little kurtosis. For large N, the CLT kicks in and κ_N decays to zero, but it does so extremely slowly, as κ_N ∼ κ₁/N^ν.
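One can check this slow decay numerically; in the following sketch (ours, with Gaussian ξ so that κ_ξ = 0, and an illustrative slow volatility), the kurtosis of N-aggregated returns decays much more slowly than the naive 1/N:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400_000

# Gaussian xi (kappa_xi = 0) with slow stochastic volatility
logvol = np.zeros(n)
for t in range(n - 1):
    logvol[t + 1] = 0.999 * logvol[t] + 0.05 * rng.standard_normal()
x = np.exp(logvol) * rng.standard_normal(n)

def excess_kurtosis(y):
    c = y - y.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

# Kurtosis of N-aggregated returns decays slowly with N, cf. Eq. (2.9)
for N in (1, 4, 16, 64, 256):
    print(N, round(excess_kurtosis(x[: n // N * N].reshape(-1, N).sum(axis=1)), 2))
```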
This is the so-called leverage effect. Consistently, the response function ⟨ξ_t σ_{t+τ}⟩ is negative for τ > 0, and vanishes for τ < 0, see Fig. 2.4.
The leverage effect has direct implications for the skewness of the return distribution. Analogous to Eq. (2.9), one can show that the skewness of the aggregate return writes:

ζ_N = (1/√N) [ζ_ξ h(0) + 3 Σ_{τ=1}^{N} (1 − τ/N) h(τ)] ,    (2.10)

where h(τ) = ⟨x_t σ²_{t+τ}⟩ ≤ 0 is the return-volatility response function. Here again, even within a Gaussian model (ζ_ξ = 0), the leverage effect suffices to create a little negative skewness, consistent with the empirical return distributions, see Section 2.5. Actually, one can show that ζ_N increases (in magnitude) with N and reaches a maximum value at the typical timescale of the leverage effect, before the CLT kicks in.
Conclusion
Real financial time series display a number of properties not accounted for within the standard geometric (continuous time) Brownian motion framework. Accounting for all these effects is of utmost importance for risk control and derivatives pricing. Different assets differ in the value of their higher cumulants (skewness, kurtosis); for this reason a description where the volatility is the only parameter is bound to miss a great deal of reality.
3 Why do prices change?

In this Chapter we confront two rather opposing views of prices in financial markets: price discovery and price formation.
“I can’t figure out why anyone invests in active management [...]. Since I think everything
is appropriately priced, my advice would be to avoid high fees. So you can forget about hedge
funds.”
– Eugene Fama
The market is seen as an objective measuring instrument which provides a reliable assessment p_t of the fundamental value v_t of the exchanged assets.1 Consistently, in most Economics 101 textbooks one shall find the following equation:

p_t = E[v_t | F_t] ,    (3.1)

with F_t the common knowledge.2 Immediate consequences of the EMH are as follows.
• Prices can only change with the arrival of new exogenous information (e.g. a new iPhone release, the discovery of a new gold mine, a diplomatic crisis). As a result, price moves are unpredictable because news is, by definition, unpredictable. While consistent with Bachelier's findings discussed in Chapter 2, nothing says that the EMH is the only possible explanation.
• Large price moves should be due to important news that change significantly the fundamental
value of the asset. Crashes must be exogenous.
• Markets are fundamentally efficient. Small mispricings are rapidly corrected by "those who know
the fundamental price" (whoever that is).
“Professor Fama is the father of modern efficient-markets theory, which says financial prices
efficiently incorporate all available information and are in that sense perfect. In contrast, I have
argued that the theory makes little sense, except in fairly trivial ways. Of course, prices reflect
available information. But they are far from perfect. [...] I emphasise the enormous role played
in markets by human error.”
– Robert Shiller 3
1 The fundamental value or intrinsic value is, according to Wikipedia, the "true, inherent, and essential value" of a financial asset. Other definitions vary with the asset class, but clearly, it is a very ill-defined concept.
2 Clearly, also a very ill-defined concept.
3 Fama and Shiller shared the 2013 Nobel prize in Economics...
The latter puzzle suggests that a significant fraction of the volatility is of endogenous nature, in
contradiction with Fama’s theory. To a physicist, nontrivial endogenous dynamics is a natural feature
of a complex system made of a large number of interacting agents, very much like a bird flock or a
fish school. Imitation and feedback loops induce instabilities and intricate behavior consistent with the
statistical anomalies described in Chapter 2.
Empirical data actually suggests that over 90% of the volatility is of endogenous nature. Indeed, restricting to large price jumps (> 4σ) and using a large news database, Joulin et al. [20] showed that only 5 to 10% of 4σ-jumps can be explained by news. One may rightfully argue that, however large, there is no database which contains "all the news". Interestingly enough, one can show that exogenous jumps are less persistent than endogenous jumps, and thus cross-validate the jumps identified as endogenous/exogenous. In particular, the volatility decay after a jump at time t₀ follows (see Omori law):

σ(t) − σ(∞) ∝ (t − t₀)^{−a} ,    (3.2)

with a = 1 for an exogenous jump and a = 1/2 for an endogenous jump. To note, slow relaxation is a characteristic feature of complex systems.
4 Fama's arguments disregard the way in which markets operate and how the trading is organised.
5 High frequency liquidity providers acting near the trade price are called market makers.
We define the midprice p_t := (a_t + b_t)/2 and the bid-ask spread s_t := a_t − b_t.6 The price axis is discrete and the step size is coined the tick size, typically 0.01$ for US stocks. When the average spread is of the order of (resp. larger than) the tick size, one speaks of a large tick (resp. small tick) asset.

Be that as it may, trades consume liquidity and impact prices; this is called market impact, or price impact, or simply impact, commonly denoted I. It corresponds to the average price move induced by a trade of sign ε (ε = +1 for buy trades, and −1 for sell trades):

I := ⟨ε_t · (p_{t+1} − p_t)⟩ .    (3.3)

Note that I > 0 since, on average, buy trades push prices up while sell trades drag prices down.
At this point, it should be stressed that the available volume in the order book at a given instant in time (the instantaneous liquidity) is a very small fraction of the total daily traded volume, typically < 1%.7 As a result, large trades must necessarily be cut in small pieces (order splitting), and can take hours, days or even weeks to get executed. Such large orders executed sequentially over long periods of time are coined metaorders.
From the perspective of the EMH, market impact is a substantial paradigm shift: prices appear to move mostly because of trades themselves, and very little because of new public information. One speaks of price formation, rather than price discovery. Actually, because of the small outstanding liquidity, private information (if any) can only be incorporated in prices very slowly.
Of interest for academics and practitioners, market impact is indeed both of fundamental and prac-
tical relevance. Indeed, in addition to being at the very heart of the price formation process, it is also the
source of substantial trading costs due to price slippage8 – also called execution shortfall.9 Significant
progress has been made in understanding market impact during the past decades [22–25].
C_ε(τ) := ⟨ε_t ε_{t+τ}⟩ ∼ τ^{−γ} ,  with γ ≈ 0.5 ,    (3.4)

encoding that the order flow has long range predictability. But if trades are indeed the reason for price changes, how is this compatible with the fact that returns are nearly unpredictable? This a priori paradox, coined the diffusivity puzzle [23, 26], refers to the a priori incompatibility of diffusive prices and super-diffusive order flow. Indeed, if we are to believe that prices are driven by trades, one would naïvely expect that the impact of correlated orders should result in persistent price dynamics.
6 The spread represents the cost of an immediate round trip: buying then selling a small quantity results in a cost per share of s_t; it also turns out to set the order of magnitude of the volatility per trade, that is the scale of price changes per transaction [21].
7 The daily traded volume is itself also very small compared to the total market capitalization.
8 The slippage is the difference between the expected price of a trade and the price at which the trade is actually executed.
9 Slippage is usually of the order of a few basis points (1 bp = 10⁻⁴).
Bachelier's first law discussed in Chapter 2 holds for timescales typically spanning from a few minutes to a few days. Below and above, several market "anomalies" arise. At very short timescales, prices tend to mean revert.10 At longer timescales (a few weeks to a few months) price returns tend to be positively autocorrelated (trend), and at even longer timescales (a few years) they mean-revert. Actually, on such timescales the log-price π is well described by an Ornstein-Uhlenbeck process driven by a positively correlated (trending) noise:

dπ/dt = −κπ_t + η_t ,  with ⟨η_t η_{t′}⟩ ∼ e^{−γ|t−t′|} ,    (3.5)

where γ⁻¹ ≈ a few months and κ⁻¹ ≈ a few years. The intuitive explanation of this phenomenon is that when trend signals become very strong, it is very likely that the price is far away from the fundamental value. Fundamentalists (investors believing in value) then become more active, causing price mean-reversion and overriding the influence of chartists or trend-followers.
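A minimal Euler discretisation of Eq. (3.5) (our own sketch; the parameters and time units are arbitrary illustrative choices) shows the different regimes in the variogram of the log-price:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
kappa, gam = 1.0 / 5000, 1.0 / 50  # kappa^-1 >> gamma^-1 (arbitrary units)

pi = np.zeros(n)   # log-price
eta = np.zeros(n)  # trending noise: OU with correlation time 1/gam
for t in range(n - 1):
    eta[t + 1] = (1.0 - gam) * eta[t] + 1e-3 * rng.standard_normal()
    pi[t + 1] = pi[t] - kappa * pi[t] + eta[t]  # Euler step of Eq. (3.5), dt = 1

# Variogram per unit lag: super-diffusive (trending) at lags ~ 1/gam,
# saturating (mean-reverting) at lags ~ 1/kappa
for tau in (10, 1000, 50_000):
    print(tau, np.mean((pi[tau:] - pi[:-tau]) ** 2) / tau)
```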
Figure 3.2: Price/value distortions on a US stock index for over two centuries, from [28].

10 For market makers, mean-reversion is favourable while trending is detrimental (adverse selection).
11 Artificial market experiments show that even when the fundamental value is known to all, one is tempted to forecast the behaviour of one's fellow traders, which ends up creating trends, bubbles and crashes. The temptation to outsmart one's peers is too strong to resist.
1. Within the EMH or price discovery framework, prices are exogenous. Prices reflect fundamental values, up to small and short-lived mispricings (quick and efficient digestion). Market impact is none other than information revelation: the order flow adjusts to changes in fundamental value, regardless of how the trading is organised.
While consistent with martingale prices and fundamentally efficient markets (by definition new information cannot be anticipated), this view of markets comes with some major puzzles. In addition to the whole idea of markets almost immediately digesting the information content of news being rather hard to believe, one counts in particular the excess-trading, excess-volatility and trend-following puzzles. The concept of high frequency non-rational agents, noise traders, was artificially introduced in the 1980s to cope with excess trading and excess volatility. But however noisy, noise traders cannot account for excess long-term volatility and trend-following.
2. Within the order-driven prices or price formation framework, prices are endogenous, mostly affected by the process of trading itself. The main driver of price changes is the order flow, regardless of its information content. Impact is a mechanical statistical effect, very much like the response of a physical system.
Here, prices are thus perfectly allowed to err away from the fundamentals (if any). Further, excess
volatility is a direct consequence of excess trading! This scenario is also consistent with self-
exciting feedback effects, expected to produce clustered volatility (see Chapter 2): the activity
of the market itself leads to more trading which, in turn, impacts prices and generates more
volatility and so on and so forth. As we shall see in Chapter 5, such mechanisms are also expected
to produce power law tailed returns.
While probably more convincing – and more in line with real data – than the EMH view, two caveats remain at this stage: the diffusivity puzzle12 and market efficiency. Indeed, let us recall that despite the anomalies discussed above, on reasonable timescales prices are approximately martingales. How can the order-driven prices perspective explain why signature plots are so universally flat? Fundamental efficiency is replaced with statistical efficiency, the idea being that efficiency results from competition: traders seek to exploit statistical arbitrage opportunities which, as a result, mechanically disappear,13 thereby flattening the signature plot.
Finally, note that the very meaning of what a good trading strategy is varies with one’s view. Within
the EMH, a good strategy is one which predicts well moves in the fundamental value. With mechanical
impact, a good strategy aims at anticipating the actions of fellow traders, the order flow, rather than
fundamentals. Further empirical support of the "order-driven prices" view is given in Chapters 4 and 7.
12 In Chapter 4, we will provide an econometric resolution of the diffusivity puzzle.
13 See e.g. the Minority Game presented in Chapter 5.
4 Econometric models for price changes

In this Chapter, we present some econometric models1 aimed at reproducing (sometimes predicting) some of the stylised facts unveiled in Chapters 2 and 3. Such models need to be calibrated on real data, but one should always be extra careful and look at their conclusions with a critical eye. Indeed, anything can be calibrated on data,2 and, despite the natural temptation to do so, the fact that one can put a number on it should not give extra credit to the model whatsoever.
4.1 The propagator model

The simplest model writes returns as linear in the trade sign:

r_t = G₁ ε_t + η_t ,    (4.1)

with G₁ a constant measuring the impact amplitude and η_t a noise term accounting for non-trade related price changes (exogenous quote changes). We assume ⟨η_t⟩ = 0, ⟨η_t η_{t′}⟩ = σ₀² δ(t − t′) and ⟨r_t η_{t′}⟩ = 0. One can easily check that such a model violates market efficiency and does not solve the diffusivity puzzle: ⟨r_t r_{t′}⟩ = G₁² C_ε(t − t′) ≁ δ(t − t′). The resulting price dynamics is super-diffusive, with a Hurst exponent H = 1 − γ/2 > 1/2. Using Eq. (4.1) and writing:
p_t = p_0 + Σ_{t′=0}^{t−1} r_{t′} = p_0 + G₁ Σ_{t′=0}^{t−1} ε_{t′} + Σ_{t′=0}^{t−1} η_{t′} ,    (4.2)
allows one to identify the problem. Here, each trade suffices to shift the supply and demand curves permanently, which seems a bit too extreme. There is no reason why the impact of each trade should be imprinted in the price forever.
1 As opposed to microscopic or microfounded models, see Chapter 5.
2 For example, one can always calibrate a Gaussian random walk on a given time series and output a standard deviation σ, but this does not prove that the data are Gaussian!
Market efficiency seems to impose that a trade's impact must relax over time. Precisely, in order to compensate for order flow correlation and restore price diffusivity, the impact of a trade must be transient, and can be described by a time-decaying kernel or propagator G(t):

p_t = p_0 + Σ_{t′=0}^{t−1} G(t − t′) ε_{t′} + Σ_{t′=0}^{t−1} η_{t′} .    (4.3)
Non-parametric calibration of this model on real data indicates that G decays as a power law. Inserting G(t) ∼ t^{−β} into Eq. (4.3) and enforcing ⟨(p_t − p_0)²⟩ ∼ t yields:3

β = (1 − γ)/2 ,    (4.4)

which formalises that persistent order flow and diffusive prices can make peace provided the impact of single trades decays as a slow power law of time with a particular exponent β ≈ 0.25. Impact decay must be fine-tuned to compensate the long memory of order flow, and allow the price to be close to a martingale. The very slow decay of impact (so slow that its sum is divergent) is sometimes referred to as long-range resilience. Note also that this model predicts zero permanent impact: lim_{t→∞} G(t) = 0.
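The mechanism can be checked numerically. The sketch below (ours; numpy/scipy assumed, with the order-splitting run-length construction as one simple way to generate long-memory signs) applies a power-law propagator with β = (1 − γ)/2 as in Eq. (4.4) and verifies that the signature plot is roughly flat:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(8)
gam, n = 0.5, 400_000
beta = (1 - gam) / 2  # Eq. (4.4): beta = 0.25

# Long-memory order flow via order splitting: runs of identical signs with
# run lengths P(L) ~ L^{-(2+gamma)}, giving C_eps(tau) ~ tau^{-gamma}
eps = np.empty(n)
i, s = 0, 1
while i < n:
    L = 1 + int(rng.pareto(1 + gam))
    eps[i:i + L] = s
    s, i = -s, i + L

# Power-law propagator and resulting price path, Eq. (4.3) (noise omitted)
G = np.arange(1, n + 1, dtype=float) ** (-beta)
p = fftconvolve(eps, G)[:n]

# Signature plot V(tau)/tau: roughly flat, i.e. diffusive despite persistent flow
for tau in (10, 100, 1000):
    print(tau, np.mean((p[tau:] - p[:-tau]) ** 2) / tau)
```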
• Calibrating Eq. (4.3) on real data reveals indeed that β ≈ (1 − γ)/2, and that up to ≈ 80% of
price moves can be explained by trades!
• Using trade volumes q_t instead of trade signs ε_t is also common, in particular for pricing purposes, see below. Actually, the most statistically significant proxy for the order flow is ε_t log|q_t|, indicating that the most important feature is the sign, with only a residual dependence on volume.
• In practice, more complex multi-event propagator models are used, often coined generalized prop-
agator models, with several kernels and feedback on limit orders, cancellations etc.
• In some situations, it is more convenient to write the propagator model for single returns. Using Eq. (4.3), one has:

r_t = p_{t+1} − p_t = G(1) ε_t + Σ_{t′=0}^{t−1} [G(t + 1 − t′) − G(t − t′)] ε_{t′} + η_t .

Defining the discrete derivative Ġ(t) = G(t + 1) − G(t) and using the convention G(0) = 0, such that Ġ(0) = G(1), yields:

r_t = Σ_{t′≤t} Ġ(t − t′) ε_{t′} + η_t .    (4.5)
Most importantly, bear in mind that the propagator model is a solution to the diffusivity puzzle, not the solution. Further, at this stage the model is purely econometric, and finding a deeper reason for Eq. (4.4), possibly from a microfounded game-theoretic scenario, appears essential to improve our understanding of price formation.
3 First, note that ⟨(p_t − p_0)²⟩ ∼ t is tantamount to C_r(t − t′) ∼ δ(t − t′); indeed ⟨(p_t − p_0)²⟩ = Σ_{t′,t″=0}^{t−1} ⟨r_{t′} r_{t″}⟩ ∼ Σ_{t′,t″=0}^{t−1} δ(t′ − t″) ∼ t. Then, using Eq. (4.3), ⟨(p_t − p_0)²⟩ = Σ_{ℓ,s=1}^{t} G(ℓ)G(s) C_ε(s − ℓ) ∼ ∫∫ dℓ ds (ℓs)^{−β} |ℓ − s|^{−γ} ∼ t^{2−2β−γ} ∫∫ du dv (uv)^{−β} |u − v|^{−γ}, where the rescaled integral is a constant; enforcing 2 − 2β − γ = 1 yields Eq. (4.4).
one can show that inter-asset price impact, coined cross-impact, is significant, and that transactions me-
diate a significant part of the cross-correlation between different instruments. The intuition is that in
two related products the order flow of one may reveal information, or communicate excess supply/de-
mand regarding the other. The analysis of cross-impact effects falls beyond the scope of this course, for
more details see e.g. [32–38].
Using a linear propagator model, as that of Eq. (4.3), but replacing ε_t with v_t yields:

C_slip = Σ_{t=1}^{T} Σ_{t′=1}^{t} v_t G(t − t′) v_{t′} = (1/2) Σ_{t,t′=1}^{T} v_t G(|t − t′|) v_{t′} ,    (4.8)

which is a convenient quadratic form in v_t. Note that here we have omitted spread costs arising from the difference between midprice and transaction price. In full generality one should write:

C_slip = (1/2) Σ_{t,t′=1}^{T} v_t G(|t − t′|) v_{t′} + Σ_{t=1}^{T} v_t s_t / 2 .    (4.9)
Solving the optimal execution problem amounts to finding the optimal schedule {v_t^⋆} which minimises C_slip under the constraint Σ_t v_t = Q, with Q the total order volume. Assuming a constant spread s_t = s,5 one can disregard the spread term (linear in v_t) in the optimisation problem. Switching to continuous time, v_t → v_t dt and Σ → ∫, one is left with a standard problem of variational calculus, where one shall introduce a Lagrange multiplier λ to enforce the constraint ∫ v_t dt = Q. Setting δC_slip/δv_t = 0, it follows that for all t:

∫₀ᵀ dt′ G(|t − t′|) v_{t′} = λ .    (4.10)
In the general case this equation is difficult to solve.6 For an exponentially decaying kernel G(t) = G₀ e^{−ωt}, one can show that v_t^⋆ is given by the bucket-shaped function:

v_t^⋆ = Q [δ(t) + δ(T − t) + ω/2] / (1 + ωT/2) ,    (4.11)
4 Bold lower (resp. upper) case symbols are vectors (resp. matrices).
5 If the spread is non-constant, traders try to take advantage of moments where the spread is small, and as a result the two terms on the RHS of Eq. (4.9) interact in a non-trivial way, making the optimisation problem much more complicated.
6 One can prove that v_t must be symmetric about T/2: v_t = v_{T−t}. Indeed, one can easily check that v_{T−t} also solves Eq. (4.10) (change of variables t′ → T − t′).
with δ(t) such that ∫₀ᵀ dt δ(t) = 1/2, see Fig. 4.1. One finds that a fraction 1/(2 + ωT) should be executed at the open and at the close, while the rest should be executed at a constant speed throughout the trading interval.7
The corresponding slippage writes:

C_slip^⋆ = G₀ Q² (1 + ωT)/(2 + ωT) + Qs/2 ,    (4.12)

which is a decreasing function of T, in favour of the slowest possible execution,8 as could be expected from the quadratic nature of the cost model.
In the more realistic case of a power-law kernel G(t) ∼ (1 + ωt)^{−β}, the problem can be solved numerically (see the sketch below). The optimal profile is well approximated by the following U-shaped function (see Fig. 4.1):

v_t^⋆ ≈ Q T^{1−2β} [Γ(2β)/Γ²(β)] t^{β−1} (T − t)^{β−1} .    (4.13)
Note that minimising the variance of the expected slippage can also be achieved with an extra penalty term (see the famous Almgren-Chriss problem [39]).
An important warning is that, throughout this section, we have implicitly assumed that our order flow {v_t^⋆} is uncorrelated with the order flow from the rest of the market, and that in the absence of our investor, the price is a martingale. These assumptions might very well fail, since the information used to decide on such a trade could well be shared with other investors, who may trade in the same direction.
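For the power-law kernel, the optimality condition Eq. (4.10) can be solved directly after discretisation. A sketch of ours (illustrative parameters) solving the resulting bordered linear system and comparing with Eq. (4.13):

```python
import numpy as np
from math import gamma as Gamma

# Discretised Eq. (4.10): sum_{t'} G(|t - t'|) v_{t'} = lambda for all t,
# plus the volume constraint sum_t v_t = Q, solved as one linear system
T, Q, beta, omega = 200, 1.0, 0.25, 0.01
idx = np.arange(T)
G = (1.0 + omega * np.abs(idx[:, None] - idx[None, :])) ** (-beta)

A = np.zeros((T + 1, T + 1))
A[:T, :T] = G
A[:T, T] = -1.0  # the unknown lambda enters each stationarity equation
A[T, :T] = 1.0   # volume constraint row
b = np.zeros(T + 1)
b[T] = Q

v_opt = np.linalg.solve(A, b)[:T]  # numerically U-shaped schedule

# Closed-form approximation of Eq. (4.13), evaluated at grid mid-points
tt = (idx + 0.5) / T
v_approx = Q * Gamma(2 * beta) / Gamma(beta) ** 2 * (tt * (1 - tt)) ** (beta - 1) / T
print(v_opt[:3], v_approx[:3])
```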
7 In the limit ωT → ∞ (rapid impact decay), the optimal profile is v_t^⋆ = Q/T, called a time-weighted average price (TWAP).
8 In practice execution cannot be infinitely slow as the expected gain that motivates trading generally has a finite prediction horizon.

Fat tails The idea that large realised returns in the past create extra volatility in the present can be encoded as:

σ²_{t+1} = σ₀² + κ [α x_t² + (1 − α) σ_t²] ,    (4.14)
with σ₀ the bare volatility level, κ the feedback intensity, and α the relative weight of the realised volatility to the true underlying volatility in the feedback process. With the specification x_t = σ_t ξ_t (see Chapter 2), one obtains σ²_{t+1} = σ₀² + κσ_t² [1 + α(ξ_t² − 1)].
In the case α = 1, provided the volatility reaches a stationary state with mean ⟨σ_t²⟩ = σ̄², one has for κ < 1:

σ̄² = σ₀²/(1 − κ) → +∞ as κ ↑ 1 .    (4.15)

For κ > 1 the volatility diverges. For κ < 1, one can show that such a model leads to power-law distributed returns:9

P(x) ∼ 1/|x|^{1+2ζ} .    (4.16)
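A minimal simulation of the α = 1 feedback (ours; this is the standard ARCH(1)-type recursion, and the excess-kurtosis value quoted in the comment is the textbook ARCH(1) result, an assumption not stated in the notes) illustrates how volatility feedback fattens the tails:

```python
import numpy as np

rng = np.random.default_rng(9)
sigma0, kappa, n = 0.01, 0.5, 500_000

# Eq. (4.14) with alpha = 1: sigma^2_{t+1} = sigma0^2 + kappa x_t^2
x = np.empty(n)
s2 = sigma0**2 / (1 - kappa)  # start at the stationary mean, Eq. (4.15)
for t in range(n):
    x[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = sigma0**2 + kappa * x[t] ** 2

c = x - x.mean()
# Gaussian xi alone would give 0; the feedback yields an excess kurtosis of
# roughly 6 kappa^2 / (1 - 3 kappa^2) = 6 for kappa = 0.5 (ARCH(1) result)
print("excess kurtosis:", np.mean(c**4) / np.mean(c**2) ** 2 - 3.0)
```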
Volatility clustering Multiplying both sides of Eq. (4.14) by σ²_{t−τ} (still with α = 1) and taking the average yields g(τ + 1) = σ₀² − (1 − κ)σ̄² + κ g(τ), where we have defined σ̄² g(τ) := ⟨σ_t² σ_{t+τ}²⟩ − σ̄⁴. Using Eq. (4.15) one obtains g(τ + 1) = κ g(τ), and thus g(τ) ∼ exp(−τ/τ_c) with τ_c = 1/|log κ|. While the decay time of the exponential volatility correlation diverges as κ ↑ 1, there is a single characteristic timescale, whereas empirical data clearly shows the existence of multiple time scales, encoded in a (scale invariant) power-law decay g(τ) ∼ τ^{−ν} with ν ≈ 0.2 (see Chapter 2).
Generalising the model to take into account the past realised returns over many time scales as:

σ²_{t+1} = σ₀² + κ Σ_{τ≥1} K(τ) X²_{t−τ,τ} ,    (4.17)

with X_{t,τ} := log p_{t+τ} − log p_t = Σ_{t′=t}^{t+τ−1} x_{t′} and K(τ) ∼ τ^{−δ}, yields instead g(τ) ∼ τ^{−ν} with ν = 3 − 2δ.
Leverage effect To capture the leverage effect one needs to include a sign-sensitive term to dissymmetrise the dynamics. The simplest model encodes that volatility should increase following negative past returns:

σ²_{t+1} = σ₀² + κ x_t² − κ_lev x_t ,    (4.18)

with κ > 0 and κ_lev < 2κσ₀ to ensure positivity of the RHS.
The interesting regime is τ ≪ α⁻¹, where α⁻¹ ≫ 1 is a large cut-off time scale beyond which the correlations of ω_t vanish; λ is the intermittency parameter. One can show that σ̄² = σ₀², and that the rescaled correlation g(τ) of the squared volatilities, as defined above, writes:

g(τ) ∼ (ατ)^{−4λ²} .    (4.19)

Furthermore, all even moments of price changes can be computed, and one finds a multifractal behaviour of the form of Eq. (1.6).
• The empirical distribution of volatility is compatible with log-normality. But this does not mean
that it is the right model.10
• For qλ2 > 1, the moments of price changes diverge, suggesting power-law tailed returns with tail
exponent µ = 1/λ2 .
• Fitting Eq. (4.19) to real data yields λ ≈ 0.01 to 0.1 and α−1 ≈ a few years. This yields µ ≈ 10
to 100, much larger than the empirical tails (3 < µ < 5) shown in Chapter 2.
Conclusions
The danger of overconfidence in econometric models, such as those presented in this Chapter, cannot be overstated. Indeed, when rather simple equations give non-trivial results consistent with empirical data, it is quite human to adopt them as if they were the one and only truth, forgetting that there is no microfounded justification whatsoever for postulating them in the first place.
10 An inverse-gamma distribution also fits the data very well, and is actually motivated by simple models of volatility feedback such as GARCH.
5 Microscopic (agent-based) models for price changes

If, as suggested in Chapter 3, volatility is indeed mostly endogenous, then seeking a microscopic interpretation is probably the best way to go. Imitation, risk aversion and feedback loops (trading impacts prices, which in turn influence trading, and so on and so forth) are good candidates to account for some of the non-Gaussian statistics revealed by empirical data. Let us stress that, to this day, there exists no complete agent-based model (ABM), simple and universal, able to account for all of the empirical features of price statistics. In this Chapter, we provide a few instructive toy examples or toy models to analyse the possible effects of interactions, feedback or heterogeneities, to name a few, and to help develop an intuition about the complex systems underlying market dynamics.
Let us also mention that, while such ideas were mainly developed by statistical physicists, they do not belong to them only. For example, as Poincaré commented on Bachelier's thesis:

"When men are in close touch with each other, they no longer decide independently of each other, they each react to the others. Multiple causes come into play which trouble them and pull them from side to side, but there is one thing that these influences cannot destroy and that is their tendency to behave like Panurge's sheep."

– Henri Poincaré
Nature counts many examples of collective behaviour, from the dynamics of bird flocks or fish
schools to the synchronisation of fireflies (see [42] for a great holiday reading). Interactions are key
in understanding collective phenomena. For example, by no means could anyone pretend to account
for the complex dynamics of a bird flock by simply extrapolating the behaviour of one bird. Only inter-
actions can explain how over a thousand birds can change direction in a fraction of a second, without
Figure 5.1: Examples of collective phenomena in the animal world and human systems.
having a leader giving directions (nonlinear amplification of local fluctuations or avalanches). Further,
it appears that the features of the individual may be inessential to understand aggregate behaviour.
Indeed, while a bird and a fish are two quite different specimens of the animal world, bird flocks and
fish schools display numerous similarities. Human systems naturally display collective behaviour as
well, for the better or for the worse, see Fig. 5.1. Other examples are clapping, fads and fashion, mass
panics, vaccination campaigns, protests, or stock market crashes.
Power laws are interesting because they are scale invariant functions.1 This means that, contrary
to e.g. exponential functions, systems described by power laws have no characteristic length scales or
timescales. Many examples can be found in the physics of phase transitions. At the critical temperature
of the paramagnetic-ferromagnetic phase transition, Weiss magnetic domains become scale invariant.
So do the fluctuations of the interface at the critical point of the liquid-gas phase transition. Of par-
ticular interest to our purpose is the percolation phase transition, see below. Fluid turbulence, already
mentioned in Chapters 1 and 2 for its similarities with financial time series statistics, displays a number
of power laws. In particular the statistics of the velocity field in a turbulent flow are often made of scale
invariant power laws, independent of the fluid’s nature, the geometry of the flow and even the injected
energy. As we have seen, and shall see in Chapters 7 and 6, power laws are also very present in finance:
probability distribution of returns (3 < µ < 5), correlations of volatility (ν ≈ 0.2), correlations of the
order flow (γ ≈ 0.5), volatility decay after endogenous shocks (a ≈ 0.5), etc. Their universality means
that they are independent of the asset, the asset class, the time period, the market venue, etc.
All this indicates that financial markets can probably benefit from the sort of analysis and modelling conducted in physics to understand complex collective systems. In a nutshell, this amounts to modelling the system at the microscopic level with its interactions and heterogeneities, most often through agent-based modelling (ABM), and carefully scaling up to the aggregate level where the generic power laws arise. One remark is in order here: this approach goes against the whole representative agent idea, which nips in the bud all heterogeneities, often essential to account for in the description of the phenomena at hand, see [43]. Actually, simplifying the world to representative agents poses a real dimensionality issue: while there is only one way to be the same, there are an infinity of ways to be different.
• The return is proportional to the aggregate demand:

r = (1/λ) Σ_{i=1}^{N} ϕ_i := φ/λ ,    (5.1)

where ϕ_i ∈ {−1, 0, 1} signifies agent i selling, being inactive, or buying, and λ is a measure of market depth.

• Agents i and j are connected (or interact) with probability p/N (and ignore each other with probability 1 − p/N), such that the average number of connections per agent is Σ_{j≠i} p/N ≈ p.

• If two agents i and j are connected, they agree on their strategy: ϕ_i = ϕ_j.
1 A function f is said to be scale invariant if and only if there exists a function g such that for all x, y: f(x)/f(y) = g(x/y). One can easily show that the only scale invariant functions are power laws.
Percolation theory teaches us that the population clusters into connected groups sharing the same opinion, see e.g. [44]. Denoting by n_α the size of cluster α, one has:

r = (1/λ) Σ_α n_α ϕ_α ,    (5.2)

where ϕ_α is the common strategy within cluster α. Price statistics thus conveniently reduce to the statistics of cluster sizes, for which many results are known. In particular, one can distinguish three regimes.
1. As long as p < 1, all clusters are small compared to the total number of agents N, and the probability distribution of cluster sizes scales as:

P(n) ∼ n^{−5/2} exp(−ε² n) ,    (5.3)

with ε = 1 − p ≪ 1. The market is unbiased, ⟨r⟩ = 0 (as long as ϕ = ±1 play identical roles).

2. At the critical point p = 1, the exponential cut-off disappears and cluster sizes become power-law distributed, P(n) ∼ n^{−5/2}; the aggregate return is then distributed according to a Lévy stable law of index µ = 3/2.

3. If p > 1, there exists a percolation cluster of size O(N); in other words, there exists a finite fraction of agents with the same strategy, ⟨φ⟩ ≠ 0 ⇒ ⟨r⟩ ≠ 0, and the market crashes. A spontaneous symmetry-breaking occurs.
This quite instructive model gives the "good" distribution of returns (µ = 3/2 was observed in Chapter 2 for the MXN/$ rate) and the possibility of crashes when the connectivity of the interaction network increases. However, this story holds only if the system sits below, but very close to, the instability threshold. But what ensures such self-organised criticality, where the value of p would stabilise near p ≲ 1? In Section 5.5, we give a stylised example illustrating why such systems could be naturally attracted to the critical point. Further, note that this model is static, and thus not relevant to account for volatility dynamics. How do these clusters evolve with time? How should one model opinion dynamics? An interesting extension of this model was proposed in [45]: by allowing each agent to probe the opinion of a subset of other agents, and either conform to the dominant opinion or go against it if the majority is too strong, one obtains a richer variety of market behaviours, from periodic to chaotic.
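A numerical version of this toy model (ours; it samples Erdős–Rényi edges via a Poisson approximation and relies on scipy's connected components, choices not prescribed by the notes) shows fat-tailed aggregate returns already in the subcritical regime p ≲ 1:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(10)
N, p, lam, n_samples = 10_000, 0.9, 1.0, 1000  # p < 1: subcritical regime

returns = []
for _ in range(n_samples):
    # Erdos-Renyi graph with link probability p/N per pair: draw a Poisson
    # number of edges (~ pN/2 on average), then random endpoints
    m = rng.poisson(p * N / 2)
    i, j = rng.integers(0, N, size=(2, m))
    adj = coo_matrix((np.ones(m), (i, j)), shape=(N, N))
    _, labels = connected_components(adj, directed=False)
    sizes = np.bincount(labels)
    phi = rng.choice([-1, 0, 1], size=sizes.size)  # one strategy per cluster
    returns.append((sizes * phi).sum() / lam)      # Eq. (5.2)

r = np.asarray(returns, dtype=float)
c = r - r.mean()
print("excess kurtosis:", np.mean(c**4) / np.mean(c**2) ** 2 - 3.0)  # fat tails
```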
2 The incentive u_i is the difference between the utilities of choices +1 and −1.
1. The personal inclination, which we take to be time independent, and which is measured by f_i ∈ R with distribution ρ(f). Large positive (resp. negative) f_i indicates a strong a priori tendency to decide s_i = +1 (resp. s_i = −1).
2. Public information, affecting all agents equally, such as objective information on the scope
of the vote, the price of the product agents want to buy, or the advance of technology, etc.
The influence of this exogenous signal is measured by the incentive field, h(t) ∈ R.
3. Social pressure or imitation effects. Each agent i is influenced by the previous decisions made by a certain number of other agents j. The influence of j on i is measured by the connectivity matrix J_ij. If J_ij > 0, the decision of agent j to e.g. buy reinforces the attractiveness of the product for agent i, who is now more likely to buy. This reinforcing effect, if strong enough, can lead to an unstable feedback loop.
• Agents decide according to the so-called logit rule, or quantal response in the choice theory literature [48] (see Appendix A), which makes the decision a random variable, with probability:

P(s_i = +1 | u_i) = 1/(1 + e^{−βu_i}) ,  P(s_i = −1 | u_i) = 1 − P(s_i = +1 | u_i) ,    (5.5)

where β quantifies the level of irrationality in the decision process, analogous to the inverse temperature in physics. When β → 0, incentives play no role and the choice is totally random/unbiased, whereas β → ∞ corresponds to deterministic behaviour.
We focus on the mean-field case where J_ij := J/N for all i, j,3 and f_i = f = 0 for all i. Note that this does not mean that each agent consults all the others, but rather that the average opinion m := N⁻¹ Σ_i s_i, or e.g. the total demand, is public information and influences the behaviour of each individual agent equally.4 Defining the average incentive u := N⁻¹ Σ_i u_i and the fraction φ = N₊/N of agents choosing +1, one can easily show that:

m(t) = 2φ(t) − 1 ,  and  u(t) = h(t) + J [2φ(t − 1) − 1] .    (5.6)
Using Eq. (5.5) and denoting ζ = e^{βu} yields the following updating rules:

P(N₊ → N₊ + 1) = [ζ/(1 + ζ)] (1 − φ) ,  P(N₊ → N₊ − 1) = φ/(1 + ζ) ,    (5.7)

which naturally lead to the following evolution equation: ⟨N₊⟩_{t+1} = ⟨N₊⟩_t + 1 × P(N₊ → N₊ + 1) − 1 × P(N₊ → N₊ − 1) + 0 × P(N₊ → N₊), that is:

d⟨N₊⟩ = ζ/(1 + ζ) − φ .    (5.8)
The equilibrium state(s) of the system are such that d⟨N₊⟩ = 0, which yields:

φ⋆ = ζ⋆/(1 + ζ⋆) ,  with ζ⋆ = e^{β[h + J(2φ⋆ − 1)]} ,    (5.9)

or equivalently, in terms of the average opinion m⋆ = 2φ⋆ − 1:

m⋆ = tanh[β(h + J m⋆)/2] .    (5.10)
The solutions of Eq. (5.10) are well known (see Fig. 5.2). When h = 0, there is a critical value β_c = 2/J separating a high temperature (equivalently weak interactions) regime β < β_c, where agents shift randomly between the two choices, with φ⋆ = 1/2; this is the paramagnetic phase. A spontaneous polarization (symmetry-breaking) of the population occurs in the low temperature (equivalently strong interactions) regime β > β_c, that is φ⋆ ≠ 1/2; this is the ferromagnetic phase.5 When h ≠ 0, one of the two equilibria becomes exponentially more probable than the other.

Figure 5.2: Average opinion (or aggregate demand, or overall trust etc.) as a function of the external incentive field in the high and low temperature limits.
To summarise the results, let us consider the following gedankenexperiment. Suppose that one starts at t = 0 from a euphoric state (e.g. confidence in economic growth), where h ≫ J, such that φ⋆ = 1 (everybody wants to buy). As confidence is smoothly decreased, the question is: will sell orders appear progressively, or will there be a sudden panic where a large fraction of agents want to sell? One finds that for small enough influence, the average opinion varies continuously (until φ⋆ = 0 for h ≪ −J), whereas for strong imitation a discontinuity appears around a crash time, when a finite fraction of the population simultaneously changes opinion (see the sketch below). Empirical evidence of such nonlinear opinion shifts can be found in the adoption of cell phones in different countries in the 90s [49], the drop of birth rates at the end of the Glorious Thirty [50], crime statistics in different US states [51], or the way clapping dies out at the end of music concerts [49].
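This hysteresis experiment is easy to reproduce numerically; the sketch below (ours, with illustrative parameters) follows the fixed point of Eq. (5.9) adiabatically while h is decreased, for β on either side of β_c:

```python
import numpy as np

def phi_fixed_point(h, beta, J, phi):
    """Iterate Eq. (5.9) to a fixed point, starting from phi."""
    for _ in range(2000):
        zeta = np.exp(beta * (h + J * (2.0 * phi - 1.0)))
        phi = zeta / (1.0 + zeta)
    return phi

J = 1.0
for beta in (1.0, 4.0):  # compare with beta_c = 2/J = 2
    phi, steps = 1.0, []
    for h in np.linspace(2.0, -2.0, 201):  # slowly remove confidence
        new = phi_fixed_point(h, beta, J, phi)  # follow the branch adiabatically
        steps.append(phi - new)
        phi = new
    # beta < beta_c: largest single drop is small (smooth crossover);
    # beta > beta_c: a finite fraction flips at once (crash)
    print(beta, round(max(steps), 3))
```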
For β close to βc , one finds that opinion swings (e.g. sell orders) organise as avalanches of various
sizes, distributed as a power-law with an exponential cut-off which disappears as β → βc . The power
law distribution indicates that most avalanches are small, but some may involve an extremely large
number of individuals, without any particularly large change of external conditions. In this framework,
it is easy to understand that, provided the external confidence field h(t) fluctuates around zero, bursts
of activity and power-laws (e.g. in the distribution of returns) are natural outcomes. In other words,
a slowly oscillating h(t) leads to a succession of bull and bear markets, with a strongly non-Gaussian,
intermittent behaviour.
choices of the followers are initialised with equal probability and we assume m odd to avoid
draws.
• We denote by q t (resp. π t ) the probability that a follower (resp. an agent chosen at random)
makes the right choice at time t. We repeat this process until it converges: q t , π t → q, π.
An equal-time equation can be easily obtained by noting that agents are either fundamentalists or
followers, such that:
π t = z p + (1 − z)q t . (5.11)
A dynamical equation can be obtained by noting that the probability that a follower sees right at time t + 1 is equal to the probability that the majority among m saw right at time t, which is in turn equal to the probability that at least (m + 1)/2 agents saw right at time t, such that:
q_{t+1} = Σ_{ℓ=(m+1)/2}^{m} C_m^ℓ π_t^ℓ (1 − π_t)^{m−ℓ} .  (5.12)
Combining Eqs. (5.11) and (5.12) yields a dynamical equation of the form:
q t+1 = Fz (q t ) , (5.13)
from which the fixed points q*(z), π*(z) can be computed, see Fig. 5.3.
• For z large, there is only one attractive fixed point q* = q_> ≥ p. Followers actually increase their probability of being right: herding is efficient, as it yields more accurate predictions than information seeking. Further, the performance of followers increases with their number! This a priori counterintuitive result comes from the fact that while fundamentalists do not interact, followers benefit from the observation of aggregate behavior. Herders use the information of other herders, who themselves have a higher performance than the information seekers.
• However, below a certain critical point z_c, two additional solutions appear, one stable q_< < 1/2 and one unstable. The upper solution q_> keeps increasing as z decreases, until it drops abruptly towards 1/2 at z = 0. The lower solution q_< is always very bad: there is a substantial probability that the initial condition will drive the system towards q_<, i.e. the probability to be right is actually lower than a fair coin toss. If herders are trapped in the bad outcome, adding more herders will only self-reinforce the effect, thereby making things even worse (see the numerical sketch below).
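The two regimes can be checked directly by iterating Eq. (5.13); here is a minimal sketch, where the values p = 0.6 and m = 7 are illustrative assumptions.

from math import comb

def F(q, z, p=0.6, m=7):
    """One step of Eq. (5.13): Eq. (5.11) for pi_t, then the majority
    rule of Eq. (5.12) over m randomly chosen agents."""
    pi = z * p + (1 - z) * q
    return sum(comb(m, l) * pi**l * (1 - pi)**(m - l)
               for l in range((m + 1) // 2, m + 1))

def fixed_point(z, q0, n_iter=10_000, tol=1e-12):
    q = q0
    for _ in range(n_iter):
        q_new = F(q, z)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q

for z in (0.8, 0.1):
    print(f"z = {z}: from q0=0.9 -> {fixed_point(z, 0.9):.3f}, "
          f"from q0=0.1 -> {fixed_point(z, 0.1):.3f}")

For large z both initial conditions converge to the same q_> > p, while for small z the low start gets trapped in the bad fixed point q_< < 1/2.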
Quite naturally, the next step is to allow agents to choose whether to be followers or fundamentalists, that is allowing for z to depend on time: z → z_t. We consider selfish agents following game theory and aiming at reaching a correct forecast. Further, given that information processing has a cost, as long as ⟨q⟩ > p, agents will prefer switching from the fundamentalist strategy to the follower strategy (z_t ↓). Conversely, z_t ↑ when ⟨q⟩ < p, and hence we expect the population to self-organize towards a state z† in which no agent has an incentive to change his strategy, that is ⟨q(z†)⟩ = p. The state z† is called a Nash equilibrium. One can show that z† ∼ N^{−1/2}, see [52]. Most importantly, here, such
an equilibrium is the result of the simple dynamics of adaptive agents with limited rationality, in con-
trast with the standard interpretation with forward looking rational agents, who correctly anticipate
the behavior of others, and respond optimally given the rules of the game.
This model captures very well the balance between private information seeking and exploiting information gathered by others (herding). When few agents herd, information aggregation is highly efficient. Herding is actually the choice taken by nearly the whole population, setting the system in a phase coexistence region (z† < z_c) where the population as a whole adopts either the right or the wrong forecast. See [52] for the details, and in particular for the effects of including heterogeneity in the agents' characteristics.
• Consider a large number N of agents who must make a binary choice, say join a riot or not.
• Call Nt+ the number of agents deciding to join at time t and φ t = Nt+ /N .
• Each agent i makes up his mind according to his conformity threshold c_i ∈ [0, 1], heterogeneous across agents and distributed according to ρ(c).
• If the number of agents Nt+ exceeds N (1 − ci ), then agent i joins at t + 1.
In mathematical terms the last point translates into N_{t+1}^+ = Σ_i 1_{N_t^+ > N(1−c_i)}, or equivalently

φ_{t+1} = (1/N) Σ_i 1_{φ_t > 1−c_i} = (1/N) Σ_i 1_{c_i > 1−φ_t} .  (5.14)
Such asymmetric exploitation does not seem to correspond to the equilibrium of a representative ant with rational expectations. The explanation is rather to be sought in the interactions, or as biologists put it: recruitment dynamics. Kirman proposed a simple and insightful model [56] based on tandem recruitment to account for such behavior.
• Consider N ants and denote by n(t) ∈ [0, N ] the number of ants feeding on source A at time t.
• When two ants meet, the first one converts the other with probability 1 − δ.6
• Each ant can also change its mind spontaneously with probability ε.7
The probability p_n^+ for n → n + 1 can be computed as follows. Provided there are N − n ants feeding on B, either one of them changes its mind with probability ε or she meets one of the n ants from A and gets recruited with probability 1 − δ. The exact same reasoning can be held to compute the probability p_n^− for n → n − 1. Mathematically this translates into:

p_n^+ = (1 − n/N) [ε + (1 − δ) n/(N − 1)]  (5.16a)
p_n^− = (n/N) [ε + (1 − δ) (N − n)/(N − 1)] ,  (5.16b)
the probability of n → n being given by 1 − pn+ − pn− . Two interesting limit cases can be addressed.
• In the ε = 1/2, δ = 1 (no interaction) case, the problem at hand is tantamount to the Ehrenfest urn model or dog-flea model [57], proposed in the 1900's to illustrate certain results of the emerging statistical mechanics theory. In this limit, n follows a binomial distribution at equilibrium: P(n) = C_N^n ε^n (1 − ε)^{N−n} = C_N^n / 2^N.
• When δ = ε = 0, the first ant always adopts the position of the second, and since first/second are
drawn with equal probability, the n process is a martingale with absorption at n = 0 or n = N .
Indeed, once all the ants are at the same food source, nothing can convert them (ε = 0).8
In the general case and large N limit, one can show (see [56]) that there exists an order parameter
O = εN /(1−δ) such that for O < 1, the distribution is bimodal (corresponding to the situation observed
in the experiments), for O = 1 the distribution is uniform, and for O > 1 the distribution is unimodal,
see Fig. 5.5.9 Note that the interesting O < 1 regime can be obtained even for weakly persuasive agents (δ ≲ 1) provided the self-conversion rate ε is low enough.
6 Of course who “the first one" is, is unimportant since they could have been drawn in the other order with the same probability.
7 In the framework of trading, ε can represent either exogenous news, or the replacement of the trader by another one.
8 The probability of absorption at n = N is simply given by n_0/N with n_0 the number of ants feeding on A at t = 0.
9 In the O < 1 regime with δ = 2ε and in the N → ∞ limit one can prove that the distribution of the fraction x = n/N is given by a symmetric Beta distribution P(x) ∼ x^{α−1}(1 − x)^{α−1} with α = εN.
Figure 5.5: (Left) Switching behavior in the O < 1 regime. (Right) Ant population distributions for three different values of the order parameter O = εN/(1 − δ).
When O > 1 the system fluctuates around n = N /2. When O < 1, while the average value of n is
also N /2, this value has little relevance since the system spends most of the time close to the extremes
n = 0, N , regularly switching from one to the other.10
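The switching behavior is easy to reproduce numerically; in the following minimal sketch of Eqs. (5.16) the parameter values (chosen such that O = εN/(1 − δ) = 0.25 < 1) are illustrative assumptions.

import numpy as np

def kirman(N=100, eps=0.002, delta=0.2, T=200_000, seed=3):
    """Simulate the recruitment dynamics of Eqs. (5.16)."""
    rng = np.random.default_rng(seed)
    n = N // 2
    path = np.empty(T, dtype=int)
    for t in range(T):
        p_up = (1 - n / N) * (eps + (1 - delta) * n / (N - 1))
        p_dn = (n / N) * (eps + (1 - delta) * (N - n) / (N - 1))
        x = rng.random()
        if x < p_up:
            n += 1
        elif x < p_up + p_dn:
            n -= 1
        path[t] = n
    return path

path = kirman()
print("order parameter O =", 0.002 * 100 / (1 - 0.2))
print("fraction of time near the extremes:",
      np.round(np.mean((path < 10) | (path > 90)), 3))

Despite ⟨n⟩ = N/2, the trajectory spends most of its time near n = 0 or n = N, with endogenous switches between the two.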
The most important point is that in the O < 1 regime, none of the n states is, itself, an equilibrium.
While the system can spend a long time at n = 0, N (locally stationary) these states are by no means
equilibria. It is not a situation with multiple equilibria, all the states are always revisited, and there is
no convergence to any particular state. In other words, there is perpetual change, the system’s natural
endogenous dynamics is out of equilibrium.11 Most economic models focus on finding the equilibrium to which the system will finally converge, such that the system can only be knocked off its path by large exogenous shocks. Yet, financial markets, or even entire economies, display a number of regular large switches (correlations, mood of investors etc.) which do not seem to be always driven by exogenous shocks. In the stylised setting presented here such switches are understood endogenously.
Several extensions of the model have been proposed. In particular, the version presented here does
not take into account the influence of the proximity of agents; but one can easily limit the scope of
possible encounters according to a given communication network Ji j (see RFIM model above).
The interpretation of each term on the right-hand side of this equation is as follows.
10 Check https://rf.mokslasplius.lt/kirman-ants to play with the model.
11 The only thing that can be said is that there exists an equilibrium distribution.
• a > 0 accounts for trend following, past returns amplify the average propensity to buy or sell.
• b > 0 accounts for risk aversion. Negative returns have a larger effect than positive returns.
Indeed the two first terms can be re-written as (a − br t )r t such that it becomes clear that the
effective trend following effect increases when r t < 0. The term −br t2 also accounts for the effect
of short term volatility, the increase of which (apparent risk) is expected to decrease demand.
• a′ > 0 accounts for the market clearing mechanism. It is a stabilising term: price moves clear orders and reduce imbalance.
• k > 0 accounts for mean-reversion towards a hypothetical fundamental value p_F; if the price wanders too far above (resp. below) it, sell (resp. buy) orders will be generated.
• χ > 0 accounts for the sensitivity to random exogenous news ξ_t, with ⟨ξ_t⟩ = 0 and ⟨ξ_t ξ_{t'}⟩ = 2ς_0² δ(t − t').
Combining Eqs. (5.17) and (5.18) and taking the continuous time limit r_t ≈ u = ∂_t p, one obtains a Langevin equation for the price velocity u of the form:

du/dt = −∂V/∂u + ξ̃_t ,  with V(u) = κ(p_t − p_F) u + α u²/2 + β u³/3 ,  (5.19)

where we have introduced α = (a′ − a)/λ, β = b/λ, κ = k/λ, and ξ̃_t = χξ_t/λ. The variable u thus follows the dynamics of a damped fictitious particle evolving in a potential V(u) with a random forcing ξ̃_t.
• β > 0, α > 0 – Risk aversion is responsible for a local maximum at u = u* = −α/β < 0 and a local minimum at u = 0 in the potential V(u) (see left panel of Fig. 5.6). A potential barrier V* := V(u*) − V(0) = αu*²/6 separates a metastable region around u = 0 from an unstable region u < u*. Starting from p_0 = p_F the particle oscillates around u = 0 until an activated event driven by ξ̃_t brings the particle to u*, after which u → −∞: a crash induced by the amplification of sell orders due to the risk aversion term. The typical time before the crash scales as exp(V*/D). Note that, here, a crash occurs due to a succession of unfavourable events which add up to push the system over the edge, and not due to a single large event in particular. Also note that, in practice, volatility feedback effects would increase fluctuations before the crash, thereby increasing D and thus lowering further the crash time (see the simulation sketch after this list).
Figure 5.6: Langevin potential V (u) for different values of the parameters.
• β > 0, α < 0 – Taking now trend following to be large compared with the stabilising effects while keeping risk aversion yields a very interesting situation. Starting at t = 0 from p_0 = p_F, the potential V(u) displays a local maximum at u = 0 and a local minimum at u = u† = −α/β > 0, with V† = V(u†). In the beginning the particle oscillates around u† > 0 and the price increases linearly on average, ⟨p_t − p_F⟩ ∼ u†t, with no economic justification whatsoever. This is a speculative bubble: growth is self-sustained by the trend following effect. However, as time passes the potential is modified (due to the increasing slope of the linear term κ(p_t − p_F)u) and V† decreases accordingly (see right panel of Fig. 5.6). When the local minimum ceases to exist, the bubble bursts, that is u → −∞. The bubble lifetime t† is reached when the local minimum disappears, namely when κ⟨p_{t†} − p_F⟩ = α²/(4β), which gives t† = −α/(4κ); as expected it increases with the amplitude of trend following and decreases with that of mean reversion.
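Here is a minimal Langevin integration of Eq. (5.19) in the risk-aversion case (α, β > 0, starting from p_0 = p_F); the parameter values, the noise amplitude D and the stopping criterion are illustrative assumptions.

import numpy as np

def crash_time(alpha=1.0, beta=1.0, kappa=0.0, D=0.05, dt=0.01,
               T_max=1e4, seed=0):
    """Integrate du = -V'(u) dt + noise from u = 0 and return the first
    time u falls below the barrier u* = -alpha/beta (onset of the crash)."""
    rng = np.random.default_rng(seed)
    u, p = 0.0, 0.0                       # p measured relative to p_F
    u_star = -alpha / beta
    for i in range(int(T_max / dt)):
        grad = kappa * p + alpha * u + beta * u * u   # V'(u)
        u += -grad * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
        p += u * dt
        if u < u_star:
            return i * dt
    return np.inf

times = [crash_time(seed=s) for s in range(20)]
print(f"mean crash time ~ {np.mean(times):.0f}, "
      f"Arrhenius factor exp(V*/D) ~ {np.exp((1/6)/0.05):.0f}")  # V* = alpha u*^2/6

The escape times follow the Arrhenius-like scaling exp(V*/D): the crash is triggered by an accumulation of small shocks rather than by one large event.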
For a deeper analysis of destabilising feedback loops in financial markets, see our recent work on
endogenous liquidity crises [58].
• Consider a large number N of agents who must make a binary choice, say ±1.
• At each time step, a given agent wins if and only if he chooses the option chosen by the minority of his fellow players. By definition, the number of winners is thus always < N/2.
• At the beginning of the game, each agent is given a set of strategies fixed in time (he cannot try
new ones or slightly modify the ones he has in order to perform better). A strategy takes as input
the string of, say, the M past outcomes of the game, and maps it into a decision. The total number of possible strategies is 2^{2^M} (the number of strings is 2^M and to each of them can be associated +1 or −1). Each agent's set of strategies is randomly drawn from this set. While some strategies may by chance be shared, for moderately large M the chance of repetition is exceedingly small.
• Agents make their decisions based on past history. Each agent ranks his strategies according to their past performance, e.g. by giving them scores: +1 every time a strategy gives the correct result, −1 otherwise. A crucial point here is that he assigns these scores to all his strategies depending on the outcome of the game, as if these strategies had been effectively played, thereby neglecting the fact that the outcome of the game is in fact affected by the strategy that the agent is actually playing.12
The game displays very interesting dynamics which fall beyond the scope of this course, see [59–61].
Here, we focus on the most striking and generic result. We introduce the degree of predictability H
12 Note that this is tantamount to neglecting impact and crowding effects when backtesting an investment strategy.
H = (1/2^M) Σ_{h=1}^{2^M} ⟨w|h⟩² ,  (5.20)
where ⟨w|h⟩ denotes the average winning choice conditioned to a given history (M-string) h. One can show that the number of strategies does not really affect the results of the model, and that in the limit N, M ≫ 1 the only relevant parameter is α = 2^M/N. Further, one finds that there exists a critical point α_c (≈ 0.34 when the number of strategies is equal to 2) such that for α < α_c the game is unpredictable (H = 0), whereas for α > α_c the game becomes predictable in the sense that, conditioned to a given history h, the winning choice w is statistically biased towards +1 or −1 (H > 0). In the vocabulary of financial markets, the unpredictable and predictable phases can be called the efficient and inefficient phases respectively.
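A bare-bones implementation of the game makes the transition at α_c visible; in this sketch the sizes, run lengths and the empirical estimator of H (an average over visited histories) are illustrative choices.

import numpy as np

def minority_game(N, M, T=20_000, T_eq=5_000, seed=0):
    """Play the Minority Game with 2 strategies per agent and return the
    predictability H of Eq. (5.20), estimated over visited histories."""
    rng = np.random.default_rng(seed)
    P = 2 ** M                                    # number of histories
    strat = rng.choice([-1, 1], size=(N, 2, P))   # fixed random strategies
    scores = np.zeros((N, 2))
    h = int(rng.integers(P))                      # current history as an integer
    sum_w, counts = np.zeros(P), np.zeros(P)
    for t in range(T):
        best = scores.argmax(axis=1)              # each agent plays his best strategy
        A = strat[np.arange(N), best, h].sum()    # aggregate action
        w = -np.sign(A) if A != 0 else rng.choice([-1, 1])   # minority side wins
        scores += strat[:, :, h] * w              # virtual scoring of all strategies
        if t >= T_eq:
            sum_w[h] += w
            counts[h] += 1
        h = (2 * h) % P | (w == 1)                # append outcome to the history string
    seen = counts > 0
    return np.mean((sum_w[seen] / counts[seen]) ** 2)

for N in (101, 11):                               # alpha = 2^M / N with M = 5
    print(f"alpha = {2**5 / N:.2f}, H = {minority_game(N, M=5):.3f}")

With these sizes, α ≈ 0.32 < α_c gives H ≈ 0 (efficient phase), while α ≈ 2.9 > α_c gives H > 0 (predictable phase).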
At this point, it is easy to see that, by allowing the number of players to vary, the system self-organises so as to lie in the immediate vicinity of the critical point α_c. Indeed, for α > α_c, corresponding to a relatively small number of agents N, the game is to some extent predictable and thus exploitable. This is an incentive for new agents to join the game, that is N ↑ or equivalently α ↓. On the other hand, for α < α_c, corresponding to a large number of agents N, the game is unpredictable and thus uninteresting for extracting profits. Agents leave the game, that is N ↓ or equivalently α ↑. This mechanism spontaneously tunes α → α_c.
By adapting the rules the Minority Game can be brought closer to real financial markets, see [61].
The critical nature of the problem around α = αc leads to interesting properties, such as fat tails and
clustered volatility. Most importantly, the conclusion of an attractive critical point (or self-organised
criticality) is extremely insightful: it suggests that markets operate close to criticality, where they can
only be marginally efficient!
6
Dimensional analysis in finance
Benoit Mandelbrot was the first to propose the idea of scaling in the context of financial markets [62], a
concept that blossomed in statistical physics well before getting acceptance in economics, for a review
see [63]. In the last thirty years, many interesting scaling laws have been reported, concerning different
aspects of price and volatility dynamics. In particular, the relation between volatility and trading activity
has been the focus of many studies, see e.g. [64–70] and more recently [21, 71, 72].
Let us assume there exists an equation relating these n = 5 variables of the form:
f (p, σ2 , Q, V, C) = 0 . (6.1)
The number m of linearly independent units can be computed as the rank of the following matrix:
        $   shares   T
p       1    −1      0
σ²      0     0     −1
Q       0     1      0
V       0     1     −1
C       1     0      0
Here m = 3, such that according to the Vaschy-Buckingham π-theorem there exists an equation equivalent to Eq. (6.1) involving n − m = 2 dimensionless variables. One can for example choose:

pQ/C = g(Qσ²/V) ,  (6.2)
with g a dimensionless function that cannot be determined on the basis of dimensional analysis only.
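The dimension-counting step is easily checked numerically; a minimal sketch:

import numpy as np

#                 $  shares   T
dims = np.array([[1, -1,  0],    # p
                 [0,  0, -1],    # sigma^2
                 [0,  1,  0],    # Q
                 [0,  1, -1],    # V
                 [1,  0,  0]])   # C
m = np.linalg.matrix_rank(dims)
print(f"rank m = {m}, dimensionless groups n - m = {dims.shape[0] - m}")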
Invoking the Modigliani-Miller theorem, which argues that capital restructuring between debt and equity should keep p × σ constant while not affecting the other variables, suggests that g(x) ∼ x^{−1/2}, finally leading to the so-called 3/2 law:

W ∼ C N^{3/2} ,  (6.3)

where we introduced the trading activity, or exchanged risk, W := pVσ, and the trading rate N := V/Q.
Figure 6.1: Trading activity against trading frequency for (a) 12 futures contracts and (b) 12 US stocks,
from [75]. The insets show the slopes α obtained from linear regression of the data, all clustered around
3/2.
tick size, among other things, could play an important role – like the molecular size in the ideal gas
analogy. Besides, note that while the Modigliani-Miller theorem is intended for stocks, the 3/2 law
unexpectedly does hold also for futures (see Fig. 6.1) for which the Modigliani-Miller argument does
not have any theoretical grounds.
Conclusions
Dimensional analysis is a powerful tool that has proved its worth in physics, but that is yet under-
exploited in finance and economics. The prediction that exchanged risk W, also coined trading activity,
namely W = price × volume × volatility, scales like the 3/2 power of trading frequency N , is well sup-
ported by empirical data, both at the trade-by-trade and metaorder levels. The dimensionless quantity
W /(C N 3/2 ), where C denotes the trading costs, is a good candidate for a trading invariant across as-
sets and time. Finally, let us stress that unveiling the mysteries of the enigmatic 3/2 law from the
microscopic 3/2 law is yet to be done: is the 3/2 an approximate happy coincidence, or is there a deep
principle behind it?
7
Market impact of metaorders
As discussed in Chapter 3, the way trades affect prices (market impact) stands at the very heart of the
price formation process. Here we address the impact of metaorders.
with q t the volumes of the child orders, m t the execution rate, and T the execution horizon.
The ideal experiment to measure the impact of a metaorder would be to compare two different
versions of history: one in which the metaorder was executed, and one in which it was not, all other
things being equal. Of course this is not possible in reality and one estimates the impact as the mean
price difference between the beginning (t = 0) and the end (t = T ) of the metaorder:
I(Q, T ) := 〈ε · (p T − p0 )|Q〉 . (7.2)
This quantity is often called the peak impact. As argued in Chapter 3, one expects I > 0. It is also
convenient to define the impact path as:
I_t := I(Q_t, t) ,  with Q_t = ∫_0^t m_{t'} dt' ,  (7.3)
and where one can also have t > T . Indeed the impact path after the execution is an important quantity
showing nontrivial behavior on which we shall comment below. We define the permanent impact as:
I_∞ := lim_{t→∞} I(Q, t) .  (7.4)
Note that the permanent impact receives two types of contribution: one depending on the motivation
to execute the metaorder in the first place (often called prediction impact), and the other coming from
the possibly permanent mechanical reaction of the market to the trades. Here we are interested in
the latter. Finally, we also define the execution cost (also called slippage cost or execution shortfall,1 see
Chapter 4) as the volume weighted average premium paid by the trader executing the metaorder:
C(Q, T) := ⟨Σ_{t=1}^T q_t (p_t − p_0) | Q⟩ ≈ ⟨∫_0^T m_t (p_t − p_0) dt | Q⟩ .  (7.5)
1 The execution shortfall is often defined as C/Q.
Figure 7.1: Average impact path during a metaorder execution (red) and after (blue).
Note that one would expect all these quantities to depend on the whole execution profile {q_t} and not just on the aggregate volumes Q or Q_t. However, in practice one observes that impact is only very weakly affected by the precise shape of the execution schedule.
Also note that measuring metaorder impact requires proprietary data listing which child orders
belong to which metaorders, together with q t , p t . In addition, to be safe from possible idiosyncratic
biases one would need metaorders from a large number of different market participants (broker data).
Available data sets with such rich and detailed data are very scarce (and quite pricey).
I(Q, T) = Y σ_T (Q/V_T)^δ ,  (7.6)

where Y is a numerical factor of order one, σ_T and V_T respectively denote the volatility on scale T and the total traded volume during the execution, and δ ≈ 0.5 < 1 bears witness to the concave nature of price impact. Actually, the impact does not depend on T.2 Indeed, taking σ_T = σ_0√T and V_T = V_0 T yields:

σ_T (Q/V_T)^δ = σ_0 (Q/V_0)^δ T^{1/2−δ} ,
so that with δ = 1/2 all time dependence disappears. Conventionally the square root law is written
with daily volatility σd and daily traded volume Vd :
I(Q) = Y σ_d (Q/V_d)^δ .  (7.7)
Equation (7.7) is surprisingly universal: it is found to be to a large degree independent of details such
as the asset class (including equities, futures, FX, options, and even Bitcoin), market venue, execution
style (limit orders or market orders or both), microstructure (small and large ticks), and time period.
2 As we shall see in the following Chapter, there is a residual dependence on T in the very small participation ratio regime.
In particular, the advent of electronic markets and High Frequency Trading (HFT) has not altered the
square-root behaviour, in spite of radical changes: before 2005 liquidity was mostly provided by market
makers, and after dominated by HFT. A few additional remarks are in order:
• The square root law is everything but intuitive. Indeed, it means that impact is not additive, or in other words that "2Q ≠ Q + Q" (the first half of the trade has a much larger impact than the second).
• Concave impact also means that small orders have a very large impact (relatively speaking). A metaorder taking 0.1% of the daily volume moves the price by √0.001 ≈ 3% of the daily volatility.
• For reasons that shall become clear in the following Chapter, the square root is expected to fail
when Q < Vbest (with Vbest the average volume at the best quotes) and T < a few seconds (the
typical microstructure time) and T > a few days.
• As we shall see in Chapter 8, stylised agent-based models (ABM) are of great value to gain insight
into the origins of the square root law.
Q* = (μ̃/(Y σ_d))² V_d ,  Q_max = (9/4) Q* .  (7.10)
3 With σ_d ≈ 1% and a participation ratio of say 10^{−3} to 10^{−2}, one obtains using Eq. (7.8) that the average slippage is as large as 2 to 6 basis points. Taking an institutional investor with say $10B AUM, a turnover of say 10 days and a leverage factor of 10, this means that by just trading randomly one statistically loses $500M to $1.5B per year (there are ≈ 250 trading days in one year).
Conclusions
The square root impact law contradicts the predictions of many models. While it took several years for it to gain acceptance (there are still a few dissenting voices), the square root law is an interesting example in which empirical data compelled the community to accept that reality was fundamentally different from theory.
4 The ANcerno Ltd (formerly the Abel Noser Corporation) database is a very large dataset containing more than 10 million metaorders, executed on the US equity market and issued by a diversified set of institutional investors, see www.ancerno.com for details.
8
Latent order book models for price changes
Statistical physics aims at bridging the gap between microscopic dynamics and aggregate behavior (e.g. deriving state equations such as the ideal gas law from kinetic theory). Analysing the dynamics of the order book is precisely looking at the system at the microscopic level, from which we may model agents' actions (order flow events), carefully upscale to the aggregate level, and hopefully understand stylised facts of price dynamics such as the square root impact law (see Chapter 7).
8.1 Coarse-graining
Let us recall that while the price impact of individual trades is non universal and strongly depends on
the microstructure, the impact of metaorders is highly universal and quite insensitive to microstructural
changes.
This indicates that microscopic details are likely irrelevant to account for the square root impact law,
and suggests a coarse-grained approach.1 The continuum Navier-Stokes equations in hydrodynamics
can be obtained by coarse-graining over the microscopic molecular degrees of freedom of the liquid
molecules; one obtains that one emergent scalar parameter – the viscosity – encodes all the complexity
of the microscopic scale and suffices to describe the dynamics of the macroscopic systems. Here, we
apply a similar approach by coarse-graining over the microscopic degrees of freedom of the order book
in order to build a "hydrodynamic model" for low frequency market dynamics.
price is such that the probability to get executed is large enough to warrant posting one’s order. The
insensitivity of the square-root law to the high frequency dynamics of prices suggests that its origin
should lie in some general properties of the low frequency, large scale dynamics of latent liquidity,
rather than in its short-lived revealed counterpart.
I(Q) ∼ √(2Q/L) ,  (8.1)

where L denotes the latent liquidity of the market. Indeed the steeper the latent order book (large L) the weaker the impact.
• Latent orders diffuse with diffusion constant D (random re-evaluation of the reservation price).
• Latent orders are canceled with multiplicative rate ν (participants reduce their trading intentions
or leave the market).
• New intentions are deposited with additive rate λ (new arrivals).
• When a buy intention meets a sell intention they are instantaneously matched: A + B → ∅. We implicitly assume that latent orders are revealed in the vicinity of the trade price p_t.
• The trade price p_t is conventionally defined through the equation ρ_B(p_t, t) = ρ_A(p_t, t).
According to this set of rules, the reduced latent order book density φ(x, t) = ρ_B(x, t) − ρ_A(x, t) solves:

∂_t φ = D ∂_{xx} φ − νφ + λ sgn(p_t − x) ,  (8.2)

where the sign function sgn(x) = 1_{x≥0} − 1_{x<0} indicates that buy (resp. sell) latent orders can only be deposited on the bid (resp. ask) side of the book. Setting ξ = x − p_t, the resulting stationary latent order book reads:

φ_st(ξ) = −(λ/ν) sgn(ξ) (1 − e^{−|ξ|/ξ_c}) ,  (8.4)
2 Note that the variable x denotes the reservation price relative to the informational price component p̂_t, such that the true reservation price reads p = p̂_t + x. We here assume that p̂_t encodes all informational aspects of prices and itself performs an additive random walk, see [97] for a detailed discussion.
Figure 8.1: (a) Reaction-diffusion setup for the latent order book. (b) Stationary latent order book
densities.
where ξ_c = √(D/ν) denotes the typical length scale below which the order book can be considered to be linear (see Fig. 8.1b):

φ_st(ξ) ≈ −L ξ .  (8.5)

The slope L = λ/√(νD) is directly related to the total transaction rate J through:

J := D ∂_ξ φ_st(ξ)|_{ξ=0} = DL .  (8.6)
Below, we focus on the infinite memory limit, namely ν, λ → 0 while keeping L ∼ λν−1/2 constant, such
that the latent order book becomes exactly linear since in that limit ξc → ∞. This limit considerably
simplifies the mathematical analysis.
where ∗ denotes the space and time convolution product, φ_0(x) = φ(x, 0) is the initial condition, and G_ν(x, t) = e^{−νt} G(x, t) with G the diffusion kernel:

G(x, t) = (1_{t>0}/√(4πDt)) exp(−x²/(4Dt)) .  (8.11)
The price trajectory can then be computed from the combination of Eqs. (8.3) and (8.10). With infinite memory ν, λ → 0, and taking φ_0(x) = φ_st(x), one can show that the impact path I_t = p_t − p_0 solves the following self-consistent equation:

I_t = (1/L) ∫_0^{t∧T} m_{t'}/√(4πD(t − t')) exp(−(I_t − I_{t'})²/(4D(t − t'))) dt' .  (8.12)
Further, focusing on the case of constant participation rate (m_t = m_0 = Q/T) one can show that market impact reduces to:

I(Q) = √(Q/L) F(η) ,  (8.13)

where η := m_0/J is the participation ratio and the scaling function F(η) ≈ √(η/π) for low participation (η ≪ 1) and ≈ √2 for high participation (η ≫ 1),3 with a smooth crossover at η* ∼ 1. Hence, recalling Q = m_0 T, I(Q) is linear in Q for small Q at fixed T, and crosses over to a square-root for large Q.
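The self-consistent equation (8.12) can be solved numerically by time stepping combined with a fixed-point iteration; a minimal sketch, in which the discretisation, the regularisation of the kernel singularity and the parameter values are illustrative choices:

import numpy as np

def impact_path(m0=0.1, L=1.0, D=1.0, T=10.0, nt=400, n_iter=30):
    """Iteratively solve Eq. (8.12) for the impact path during execution."""
    dt = T / nt
    t = (np.arange(nt) + 0.5) * dt
    I = np.zeros(nt)
    for _ in range(n_iter):                 # fixed-point iteration on I_t
        I_new = np.empty(nt)
        for i in range(nt):
            tau = t[i] - t[:i + 1] + 0.5 * dt          # regularised time lags
            kern = np.exp(-(I[i] - I[:i + 1])**2 / (4 * D * tau)) \
                   / np.sqrt(4 * np.pi * D * tau)
            I_new[i] = (m0 / L) * kern.sum() * dt
        I = I_new
    return t, I

t, I = impact_path()
eta = 0.1 / (1.0 * 1.0)                     # eta = m0 / (D L)
pred = np.sqrt(0.1 * 10.0 / 1.0) * np.sqrt(eta / np.pi)   # sqrt(Q/L) F(eta), eta << 1
print(f"I(T) = {I[-1]:.3f}, low-participation prediction = {pred:.3f}")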
Provided νT ≪ 1, finite memory corrections (ν ≠ 0) are easily computed, see [98]. For νT ≫ 1 the latent liquidity is very short-lived (Markovian limit) and the impact becomes linear regardless of the participation ratio.
I_∞ = ξ_c Q/(2Q_lin) ,  (8.14)
consistent with some theoretical predictions [77, 93, 94]. In the Markovian limit (νT ≫ 1) all memory is already lost at the end of the execution and the permanent impact trivially matches the peak impact.
3 This is precisely the regime obtained with the simple static geometrical arguments above.
Using Eq. (8.12) one obtains that C can be identically rewritten as a quadratic form:

C = (1/2) ∫∫_0^T m_t H(t, t') m_{t'} dt dt' ,  (8.15)

where H is a non-negative operator (see [97]), thereby showing that C ≥ 0 for any execution schedule m_t.
The kernel G(t) decays as t^{−β} with β = 1/2 ≠ (1 − γ)/2 (see Chapter 4). As a result, the model generates mean-reverting price dynamics, inconsistent with real data. Introducing timescale heterogeneities for the renewal of liquidity – in particular fractional diffusion instead of normal diffusion for latent orders – allows one to cure such deficiencies, see [99]. We assume the waiting times for the diffusion of latent orders to be distributed according to a power law with tail exponent α, of the form Ψ(t) ∼ 1/t^{1+α}. For α > 1 one recovers normal diffusion, but for α < 1 the mean waiting time diverges and Eq. (8.2) becomes:
∂_t φ = K D_t^{1−α} (∂_{xx} φ − ν̃φ) + s(x, t) ,  (8.17)

where K is a generalised diffusion coefficient, ν̃ is a reduced cancellation rate, and where D_t^{1−α} = ∂_t D_t^{−α} with D_t^{−α} the fractional Riemann-Liouville operator [100, 101].4 Similar to Eq. (8.2), Eq. (8.17) can
be solved in Fourier space in the infinite memory limit to obtain the corresponding stationary order
book and market impact of metaorders, see [99]. In particular, Eq. (8.16) becomes:
p_t = p_0 + (1/L_α) ∫_0^t m_{t'}/√(4πK(t − t')^α) dt' ,  (8.18)
with L_α a liquidity parameter, analogous to L in the normal diffusion case. Equation (8.18) allows one to identify the propagator decay exponent β = min(1/2, α/2). Thus, for α < 1 the equality β = (1 − γ)/2 can be achieved by the choice α = 1 − γ; recalling γ ≈ 0.5 implies α ≈ 0.5. In other words, a
4 The fractional Riemann-Liouville operator is defined as D_t^{−α} f(t) = Γ[α]^{−1} ∫_0^t du (t − u)^{α−1} f(u).
fractional latent order book model enables the price to be diffusive in the presence of a persistent order
flow thereby solving the diffusivity puzzle.
Market participants are indeed highly heterogeneous, and display a broad spectrum of volumes
and timescales, from low frequency institutional investors to High Frequency Traders (HFT). Timescale
heterogeneity is often a crucial ingredient in complex systems.
which is consistent with the empirical autocorrelation of the order flow, see Chapter 3.
Conclusions
A microfounded theory based on a linear (latent) order book, inspired by diffusion-reaction models
from physics and chemistry, is able to account for the square-root impact law. This is without relying
on any equilibrium or fair-pricing conditions, but rather on purely statistical considerations.
9
Financial engineering
In a sense, financial engineering could be defined as the art of chiselling PnL (Profit & Loss) distributions. Indeed, the work of financial engineers is to devise products that suit investors' needs: more or less risk, more or less skew, etc. In this Chapter we present different ways to optimise investing, including optimal portfolio composition, dynamical optimisation, and option hedging.
Restricting to the case of Gaussian statistics, consider N assets with arbitrary correlations described by their correlation matrix:

C_ij := ⟨r_i r_j⟩ − ⟨r_i⟩⟨r_j⟩ ,  (9.1)
with r_i, r_j the returns of assets i, j ∈ [1, N]. Since C is a symmetric matrix, it can be diagonalised (principal component analysis or PCA). In particular, returns can be written as a weighted sum of uncorrelated Gaussian variables {e_a}_{a∈[1,N]} (often called explicative factors or principal components) with zero mean and variance given by the eigenvalues of C, denoted σ_a². One has:

r_i = ⟨r_i⟩ + Σ_{a=1}^N v_{ia} e_a ,  with ⟨e_a e_b⟩ = σ_a² δ_ab .  (9.2)
This decomposition often has a simple economic interpretation. For stocks, the eigenvector associated to the highest eigenvalue, coined the market mode, is ∼ (1/√N){1, . . . , 1}. The next modes correspond to one or several economic sectors moving against the others. The returns of a global portfolio with weights {w_i}_{i∈[1,N]} are also Gaussian with mean ⟨r_p⟩ = Σ_i w_i⟨r_i⟩ and variance:

σ_p² = Σ_{i,j=1}^N w_i C_ij w_j .  (9.3)
Minimising the portfolio variance σ_p² for a given average return G (Markowitz optimisation) introduces a Lagrange multiplier λ and yields:

w*_i = (λ/2) Σ_{j=1}^N C^{−1}_{ij} ⟨r_j⟩ .  (9.4)
The Lagrange multiplier λ is determined by the constraint Σ_i w*_i⟨r_i⟩ = G. One finally obtains:

w*_i = G Σ_{j=1}^N C^{−1}_{ij}⟨r_j⟩ / Σ_{i,j=1}^N C^{−1}_{ij}⟨r_i⟩⟨r_j⟩ .  (9.5)
Making use of Eq. (9.4), the average return and variance of the optimal portfolio write:

⟨r_p⟩ = Σ_i w*_i⟨r_i⟩ = (λ/2) Σ_{i,j} C^{−1}_{ij}⟨r_i⟩⟨r_j⟩ ,  σ_p² = Σ_{i,j} w*_i C_ij w*_j = (λ²/4) Σ_{k,ℓ} C^{−1}_{kℓ}⟨r_k⟩⟨r_ℓ⟩ .
Eliminating λ yields that the set of optimal portfolios is described by a parabola, called the efficient frontier, in the risk-return plane (σ_p², ⟨r_p⟩), see Fig. 9.1. This line separates 'possible' portfolios (below) from 'impossible' ones (above). The Sharpe ratio S := ⟨r_p⟩/σ_p is constant and maximal along this line, such that all optimal portfolios have the same Sharpe. The Lagrange multiplier λ sets the risk (or equivalently the average return) along this line and can be interpreted as the typical drawdown (see Chap. 2) ∆* := σ_p²/⟨r_p⟩ = λ/2.
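In practice the computation of Eqs. (9.4)–(9.5) is a one-liner; a minimal sketch with illustrative two-asset inputs:

import numpy as np

G = 0.10                                  # target average return
mu = np.array([0.08, 0.12])               # expected returns <r_i>
C = np.array([[0.04, 0.01],
              [0.01, 0.09]])              # covariance matrix

Cinv_mu = np.linalg.solve(C, mu)
w = G * Cinv_mu / (mu @ Cinv_mu)          # Eq. (9.5)

ret, var = w @ mu, w @ C @ w
print(f"weights = {w.round(3)}, <r_p> = {ret:.3f}, "
      f"sigma_p = {np.sqrt(var):.3f}, Sharpe = {ret/np.sqrt(var):.3f}")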
• As we have seen above, the correlation matrix C is a crucial input for managing portfolio risk.
However the empirical determination of C is very noisy due to the length of the available time
series not being very large compared to the number of assets in the portfolio, see e.g. [103,
104]. In Fig. 9.2 we plot a typical density of eigenvalues of C together with the corresponding
Marchenko-Pastur distribution (pure random matrix) [105]. Anything below this curve must be
considered as noise. Typically, only 5 to 10% of the eigenvalues are outside the noise band; but
they account for 20 to 30% of the total volatility.1 Here, given that the above formulas involve
the inverse of the correlation matrix, one is concerned with the smallest eigenvalues of C, the
high noise level of which leads to large numerical errors in C −1 . In practice this leads to strong
underestimation of risk of optimal Markowitz-like portfolios by over-investing in artificially low
risk modes. Several methods to clean correlation matrices and improve risk estimation have been
proposed [106].
• Often there can be other constraints on portfolio composition. In particular, non-linear constraints make the optimisation procedure highly nontrivial. Interesting similarities with the physics of spin glasses can be drawn. Examples of such constraints are long-only portfolios, in which w*_i ≥ 0 for all i (no short positions), or margin calls on futures markets where a certain deposit is required regardless of the position (long or short), that is Σ_i |w_i| = f with f the fraction of wealth invested.
• The hypothesis of stationarity underlies the use of the correlation matrix for trading optimal
portfolios. But we do not live in a stationary world, described by a time-invariant covariance
1 These values correspond to an estimation of the correlation matrix of daily returns of a few hundred stocks over a few years.
matrix. This induces the out-of-sample risk to be larger than expected. The covariance between assets evolves not only because the volatility of each asset changes over time [107] and reacts to the recent market trend [108–110], but also because correlations themselves increase or decrease, depending on market conditions [111–113]. Sometimes these correlations jump quite suddenly, due to an unpredictable geopolitical event. The archetypal example of such a scenario is the Asian crisis in the fall of 1997, when the correlation between bond and stock indexes abruptly changed sign and became negative – a flight-to-quality mode that has prevailed ever since [114, 115].2
• Markowitz optimal portfolios are all proportional to one another (they only differ by the choice of λ) and since the problem is linear, a superposition of optimal portfolios is still optimal. If all market participants trade Markowitz optimal portfolios, the market portfolio made of the aggregate positions of all the agents is also optimal, and thus satisfies Eq. (9.4). Recall the equation of the optimal portfolio line (or efficient frontier) σ_p²/⟨r_p⟩ = λ/2, and eliminate λ with the inverted Eq. (9.4) applied to a portfolio made of two 'assets': the optimal market portfolio + an infinitesimal fraction of asset i (w_i = 1 − w_mkt = ε → 0), that is ⟨r_i⟩ = (λ/2) C_{i,mkt}. One obtains an expression of the average return of asset i as a function of its covariance with the market portfolio C_{i,mkt}:

⟨r_i⟩ = β_i ⟨r_mkt⟩ ,  with β_i = C_{i,mkt}/σ_p² ,  (9.6)
where βi is called the beta of asset i. This is the famous Capital Asset Pricing Model (CAPM). Of
course this does not work very well because not all agents have the same definition of an optimal
portfolio, nor do they have the same estimates for average returns and risks.
• If price statistics aren’t Gaussian, the variance may not be an adequate measure of risk, in the
sense that minimising variance is not equivalent to an optimal control of large fluctuations. With
power-law tailed returns, the Value-at-Risk (VaR) is more suited and the corresponding optimisa-
tion problem can be tracked analytically, see e.g. [2].3
• Finally, slippage costs can be very large when trading large portfolios. Accounting cross-impact
effects (see Chapter 4) is essential, as not doing so leads to an incorrect estimation of liquidity
which results in suboptimal execution strategies, see e.g. [32].
2 A flight to quality, or flight to safety, is the action of investors moving their capital from riskier assets to safer ones, such as treasuries and other bonds.
3 The Value-at-Risk or VaR corresponds to the level of loss associated to a certain probability of loss, say p = 1%, over a certain time interval τ. Mathematically this translates into:

∫_{−∞}^{−VaR} P_τ(x) dx = p ,

with P_τ(x) the probability distribution of returns on timescale τ. In other words: over a time τ, the probability that I lose more than the VaR is p.
with r_n = p_{n+1} − p_n assumed to be iid with mean ⟨r_n⟩ = mτ and variance ⟨r_n²⟩ − ⟨r_n⟩² = σ²τ. The optimal strategy in this setting is the set of successive positions φ*_n(g_n) which ensure ⟨g_N⟩ = G while minimising the variance of the final wealth:

R² = ⟨(g_N − G)²⟩ .  (9.8)
To determine φ*_n(g_n) one can work backwards in time; this is Bellman's method. Optimising the last position φ_{N−1} is done by noting that R² = ⟨(g_{N−1} + φ_{N−1} r_{N−1} − G)²⟩, and that r_{N−1} is independent of the value of g_{N−1}, such that:

R² = φ²_{N−1}⟨r²_{N−1}⟩ + 2φ_{N−1}⟨r_{N−1}⟩(g_{N−1} − G) + (g_{N−1} − G)² .  (9.9)

∂_{φ_{N−1}} R² = 0 yields:

φ*_{N−1} = m/(m²τ + σ²) (G − g_{N−1}) ,  (9.10)
and proceeding similarly in a recursive way for all n ∈ [0, N − 1] leads to:

φ*_n = m/(m²τ + σ²) (G − g_n) ,  (9.11)
where we have used the already known strategy for ℓ > n.4 The optimal strategy thus consists in taking positions proportional to the distance to the target: invest more when one is far from the target, and reduce the investment when the gains approach the target. One can show that the resulting risk in the limit T ≫ τ is exponentially small in T.5 This result is to be compared to that of the naive constant execution rate strategy φ_n = φ_0 = G/(mT) for all n, for which the variance R² ∼ 1/T (according to the CLT), which is much larger than that of the optimal strategy at large T, see Fig. 9.3.
Two remarks are in order. The first is that the optimal strategy allows for huge losses at intermediate times (which will on average be compensated by the positive trend), but this is not acceptable for several obvious reasons. The second is that all this relies – in addition to the assumptions of iid returns, Gaussian statistics etc. – on the drift m being perfectly known, which is also not the case in practice.6 Notwithstanding, this simple model shows that trading strategies can be tailored to shape the gains distribution, a useful concept for the following sections.
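A minimal Monte Carlo comparison of the optimal strategy (9.11) with the constant-rate benchmark; the parameter values are illustrative assumptions.

import numpy as np

def final_gains(N=1000, m=0.01, sigma=0.1, tau=1.0, G=1.0,
                n_paths=10_000, seed=0):
    """Simulate both strategies on the same iid return paths."""
    rng = np.random.default_rng(seed)
    g_opt = np.zeros(n_paths)
    g_cst = np.zeros(n_paths)
    phi_cst = G / (m * N * tau)                  # naive constant execution rate
    for _ in range(N):
        r = m * tau + sigma * np.sqrt(tau) * rng.standard_normal(n_paths)
        phi_opt = m * (G - g_opt) / (m**2 * tau + sigma**2)   # Eq. (9.11)
        g_opt += phi_opt * r
        g_cst += phi_cst * r
    return g_opt, g_cst

g_opt, g_cst = final_gains()
print(f"optimal : <g_N> = {g_opt.mean():.4f}, R^2 = {g_opt.var():.2e}")
print(f"constant: <g_N> = {g_cst.mean():.4f}, R^2 = {g_cst.var():.2e}")

The optimal strategy's residual variance is orders of magnitude below the ∼ 1/T variance of the constant-rate strategy.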
4 Assuming that {φ*_ℓ}_{ℓ>n} have been determined, one can write:

R² = ⟨(g_n + φ_n r_n + Σ_{ℓ=n+1}^{N−1} φ*_ℓ r_ℓ − G)²⟩ = φ_n²⟨r_n²⟩ + 2φ_n⟨r_n⟩(g_n − G) + 2φ_n Σ_{ℓ=n+1}^{N−1} ⟨φ*_ℓ r_n r_ℓ⟩ + O(φ_n⁰) .

Recalling that ⟨r_n r_ℓ⟩ ∝ δ_{n,ℓ} and taking ∂_{φ_n} R² = 0 one obtains Eq. (9.11).
5 Taking the continuous time limit τ → 0 and assuming the price X(t) to be a continuous time Brownian motion, one obtains dg = φ* dX and dX = m dt + σ dW, with W(t) a Brownian motion of zero drift and unit volatility, to be interpreted in the Itô sense since the price increment is posterior to the determination of the optimal strategy φ*. Noting that dg = −d(G − g) one obtains:

d(G − g) = −(G − g) [(m²/σ²) dt + (m/σ) dW] ,
which reveals that for large T drift uncertainty becomes the major source of risk.
9.3 Options
Options are another way to chisel one’s PnL distribution.
• A put option protects the owner against a potential drawdown of the price of a given asset (called the underlying). More precisely, the put covers losses larger than L = W_0 (p_0 − p_<)/p_0, where p_< is called the strike.
• Symmetrically, a call option protects the owner against the increase of the price of the underlying
asset that he will need to buy in the future, by warranting a maximum buy price p> , also called
the strike (e.g. you need to buy wheat in a year but you are worried that the price will go up
due to global warming; if you are ready to pay up to $10, but not more, you can buy a one-year
maturity call option with a strike at p> = $10).
We here restrict to the analysis of European options (or plain vanilla) which have a well defined maturity
(or expiry date) T at which the option can be exercised.
Let us take the example of a put option. The owner of the option has clearly shaped his PnL
distribution to cover losses larger than L but one must not forget that he also had to buy the option.
Denoting C< the cost of the option, his actual losses covered are those beyond L + C< (see the left
panel of Fig. 9.4). The natural question is thus, what should be the price of the option contract? This
problem, at the very origin of the derivatives pricing science, was first solved by Bachelier in 1900 with a
fair game argument: the cost of the option contract should be such that on average no party is favoured.
Noting that a put option pays X := (p< − p T )1 pT <p< per underlying asset, this is C< = 〈X |(p T , p0 )〉,
and assuming that prices follow additive continuous time random walks:
C_< = ∫_0^{p_<} (p_< − p) P(p, t = T |p_0, t = 0) dp .  (9.12)
Similarly, noting that a call option pays Y := (p_T − p_>)1_{p_T>p_>}, one has C_> = ⟨Y|(p_T, p_0)⟩, or:

C_> = ∫_{p_>}^∞ (p − p_>) P(p, t = T |p_0, t = 0) dp .  (9.13)
Further, in the Gaussian case P(p, t = T |p_0, t = 0) = (1/√(2πσ²T)) exp(−(p − p_0)²/(2σ²T)).7
There exists a variety of different option contracts. The common American options are similar to the
European kind with the difference that they can be exercised at any time before the maturity. Other
kinds can become very complex, and therefore hard to price, in particular for the buyer who usually
7 Note that the price of an at-the-money (ATM) option (p_0 = p_>) is simply given by C_>^ATM = √(σ²T/(2π)).

Barrier put options cover losses only between L = W_0 (p_0 − p_<)/p_0 and L̃ = W_0 (p_0 − p̃)/p_0 with p̃ < p_<, such that if the price swings below p̃ extreme losses are left unhedged (see the right panel
of Fig. 9.4). These contracts are particularly toxic: they are tantamount to a property insurance policy covering flooding, but only if the water level does not exceed 10 cm below the ceiling. The thing is that, even if the insurer tells you this is very unlikely, chances are that if the water reaches such a level it will go all the way to the top. In the same vein, all products for which there is a large probability of making a small profit and a very small probability of a huge loss (say 99.99% chances to make $1 and 0.01% chances to lose $10,000) are highly vicious, as estimating such small probabilities often relies on unreliable models that most often underestimate the occurrence of rare events (for one thing, because they did not occur in the past). Complex, unrealistic and hardly calibratable models are often used to bury risk.
g = C> − Y + φ0 (p T − p0 ) . (9.14)
Finding the optimal hedging strategy amounts to finding the φ*_0 which minimises the variance while remaining consistent with Bachelier's argument ⟨g⟩ = 0. In the zero drift case m = 0 – to which we shall stick in the following – Bachelier's argument yields again Eq. (9.13), regardless of φ_0.9 The variance writes:

R² = R_0² − 2φ_0⟨Y(p_T − p_0)⟩ + φ_0² σ²T ,  (9.15)

where R_0² = ⟨(C_> − Y)²⟩ is the unhedged risk of the option. ∂_{φ_0} R² = 0 yields:

φ*_0 = ⟨Y(p_T − p_0)⟩/(σ²T) .  (9.16)
This shows that there is an optimal number of underlying shares that one should hold to minimise the
risk. On the one hand holding shares reduces the risk because part of the potential loss at maturity due
8 For the sake of simplicity, we restrict to the simplest case where the hedging strategy is constrained to be static, but one can naturally do better with dynamical optimisation {φ_n}_{n∈[0,N−1]} (see Sect. 9.2).
9 In full generality C_> = ⟨Y|(p_T, p_0)⟩ − φ_0 mT.
to the option being exercised is covered, but on the other hand holding too many shares is bad because
one gets exposed to the fluctuations of the underlying’s price. In the Gaussian case a simplification
appears:
φ*_0 = (1/(σ²T)) ∫_{p_>}^∞ (p − p_>)(p − p_0) (1/√(2πσ²T)) exp(−(p − p_0)²/(2σ²T)) dp = ∂_{p_0} C_> ,  (9.17)
often called the Black-Scholes Delta hedge. The derivatives of C_> are called the Greeks because they are denoted by Greek letters; in particular ∆ := ∂_{p_0} C_>, hence the name of the hedging strategy.10 Injecting φ*_0 in Eq. (9.15) and using Eq. (9.16) yields R² = R_0² − σ²T∆².
Black and Scholes considered the continuous time limit where the price follows a continuous time random walk.11 From here on we no longer make the assumption of zero drift and allow for m ≠ 0. Still from the perspective of the insurer, consider a portfolio short one option and long φ_t stocks. Its value Π follows dΠ = −dC_> + φ_t dp.12 Further, using Itô's formula yields dC_> = ∂_t C_> dt + ∂_p C_> dp + (σ²/2) ∂_pp C_> dt.
The Black-Scholes Delta hedge follows, quite miraculously, by noting that, the only source of uncertainty in the value of the portfolio being dp, the evolution of Π becomes purely deterministic (zero-risk portfolio) if one chooses φ*_t = ∂_p C_>. The Black-Scholes solution is thus the perfect hedging strategy: the PnL distribution is, regardless of the risk criterion, a δ distribution at g = 0 (impossible to do better, see Fig. 9.5). As we shall see below, this strange (and dangerous) property does not survive real-life conditions.
Figure 9.5: (left) R² as a function of φ. (right) Shaped PnL distributions of the insurer.
For the model to be arbitrage-free requires dΠ = 0, which leads to a backwards diffusion equation,13 sometimes coined the Black-Scholes PDE:

∂_t C_> + (σ²/2) ∂_pp C_> = 0 .  (9.19)
Note that the average return m does not appear in Eq. (9.19) and will thus not condition the solution
either, or in other words, the cost of insuring a security in a bear market is, oddly enough, the same
10 Note that the option's cost depends on the maturity T, the strike p_>, p_0 and p_T, and the way it varies with these parameters is the main focus of the options trading community. The ∆ of the option is positive given that the higher p_0, the more likely the strike is to be reached at maturity, and the more expensive the option. The other Greeks are the Gamma Γ := ∂_{p_0}∆, the Theta Θ := −∂_T C_> < 0, and the Vega V := ∂_σ C_> > 0. For m = 0, σ and T only appear through the combination σ²T, and one has V = −2TΘ/σ.
11 While the results derived here can also be painfully derived within the previous framework, the formalism of stochastic differential calculus wedded to the unrealistic continuous limit (on which most of mathematical finance relies) appears to be extremely convenient.
12 The sign of the first term on the RHS is negative because the insurer has sold the option, and thus loses money when the option's price increases.
13 A backwards diffusion equation is a diffusion equation with the wrong sign. The 'proper' diffusion equation is recovered by letting t → −t, which allows the same physical interpretation, only backwards in time.
as in a bull market! Because time flows backwards, one needs a boundary condition in the future (instead of an initial condition). Here it is given by the price of the option at maturity, obvious in all circumstances: C_>(t = T) = Y = (p_T − p_>)1_{p_T>p_>} – to be propagated from t = T back to t = 0. It is then easy to solve Eq. (9.19) and show that it leads again to Bachelier's fair price, Eq. (9.13).
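For concreteness, here is a minimal numerical sketch of Bachelier's fair price (9.13) and the Delta hedge (9.17) in the additive Gaussian model; the inputs are illustrative assumptions.

import numpy as np
from math import erf, exp, pi, sqrt

def call_price(p0, strike, sigma, T):
    """C_> of Eq. (9.13) with p_T Gaussian of mean p0 and variance sigma^2 T."""
    s = sigma * sqrt(T)
    M = (strike - p0) / s                  # rescaled moneyness
    # closed form of the Gaussian integral
    return s * (exp(-M**2 / 2) / sqrt(2 * pi) - M * 0.5 * (1 - erf(M / sqrt(2))))

p0, K, sigma, T = 100.0, 105.0, 2.0, 30.0
C = call_price(p0, K, sigma, T)
delta = (call_price(p0 + 1e-2, K, sigma, T)
         - call_price(p0 - 1e-2, K, sigma, T)) / 2e-2     # Delta = dC/dp0, Eq. (9.17)

rng = np.random.default_rng(0)
pT = p0 + sigma * sqrt(T) * rng.standard_normal(1_000_000)
print(f"C_> = {C:.3f}, Delta = {delta:.3f}, "
      f"Monte Carlo price = {np.maximum(pT - K, 0.0).mean():.3f}")

The Monte Carlo average reproduces the closed-form fair price, and at the money the formula reduces to the √(σ²T/2π) of footnote 7.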
• In such a case, the optimal strategy is not simply given by the Black-Scholes Delta hedge. Corrections can be computed as an expansion involving the cumulants of the returns and the higher order derivatives ∂_p^n C_>. However, the resulting φ* is still an increasing function of p which varies from 0 to 1. Most importantly, the difference with ∆ is actually numerically quite small, and so is the resulting increase of residual risk.
• Other risk objectives might be more suited to non-Gaussian statistics. Optimal strategies in terms of VaR display similar features, with a slower variation of φ* with p, leading to less portfolio re-balancing and thus lower trading costs.
• The independence of the ∆ hedge from m is at first sight counterintuitive, to say the least. With m very large and positive, one would expect an increased average pay-off of the option, while for m large and negative, the option should be worthless. However, such a reasoning does not take into account the impact of the hedging strategy on the global wealth balance, which is itself proportional to m. In other words, the average PnL associated to trading the underlying partly compensates for the modified option's pay-off due to m in the fair game argument; the compensation is perfect in the Gaussian case. In the general case, the corrections associated to m ≠ 0 remain in practice numerically quite small, at least for maturities up to a few months.
The Black-Scholes zero-risk miracle actually comes from the continuous time hypothesis, combined with Gaussian statistics. Indeed, writing R² in discrete time, one finds that the ∆ hedge only perfectly compensates for R_0² in the limit τ → 0, see e.g. [2]. In real life the limit τ → 0 is unachievable: first because re-hedging in continuous time is physically impossible, and second because trading costs increase with the re-hedging frequency!14 For finite τ, the residual risk is not negligible (even with Gaussian increments). At leading order in τ the residual risk is given by:

R*² = (σ²τ/2) ρ(1 − ρ) + O(τ^{3/2}) ,  (9.20)
with ρ the probability at t = 0 that the option is exercised at expiry t = T. To evaluate the quality of a hedging strategy one commonly defines a quality ratio Q := C_>/R*. Here, in the case of an at-the-money option (for which ρ = 1/2) with maturity T = Nτ and assuming Gaussian fluctuations, one has Q ≈ √(4N/π), which increases only slowly with N. An option of maturity T = 1 month, re-hedged daily (N ≈ 25), yields Q ≈ 5, which means that the residual risk is one-fifth of the price of the option itself. Even if one re-hedges every 30 minutes the quality ratio is only Q ≈ 20.
manifold Σ(p> , T ), commonly coined the volatility surface, appears to increase with the difference be-
tween p0 and p> , which is called the smile effect, and flattens with increasing maturity (as the Gaussian
approximation improves), see Fig. 9.6.
Figure 9.6: Volatility smile for increasing kurtosis, absolute skewness and maturity.
To understand this, and in particular the shape of the volatility surface, a cumulant expansion of Eq. (9.13) can be performed to account for non-zero skewness and kurtosis. Denoting by ζ_T = ζ/√N and κ_T = κ/N respectively the skewness and kurtosis of the price increments on the scale of the maturity of the option, one has:

C_> = C_>^G + σ √(T/(2π)) e^{−M²/2} [ (ζ_T/6) M + (κ_T/24) (M² − 1) ] + O(M³) ,  (9.21)

with C_>^G the Gaussian pricing formula, as given by Eq. (9.13), and M := (p_> − p_0)/(σ√T) the rescaled
moneyness. On the other hand, the variation of the Gaussian pricing corresponding to a variation in
volatility δσ² = 2σδσ can be easily computed and writes:

C_>^G(σ + δσ) = C_>^G(σ) + δC_>^G ,  with δC_>^G = δσ √(T/(2π)) e^{−M²/2} ,  (9.22)
Identifying Eqs. (9.21) and (9.22) to set δσ shows that the effect of non-zero skewness and kurtosis can thus be reproduced (to first order) by a Gaussian pricing formula with an effective volatility given by Σ = σ + δσ:

Σ(p_>, T) = σ [1 + (ζ_T/6) M + (κ_T/24) (M² − 1)] + O(M³) ,  (9.23)
consistent with the shape and amplitude of the smile, see Fig. 9.6. Kurtosis reduces the effective at-the-money volatility, and increases it out-of-the-money.15 For κ_T = 1 (typical of one-month options), the kurtosis correction to strongly out-of-the-money options, say M = 3, is ≈ 30%. Negative skew shifts the minimum to the right such that, around the money, the implied volatility follows a straight line with negative slope (well observed for options on stock indices, which have a large historical skew). Finally, note that the heteroskedasticity of financial time series (see Chapter 2) leads to an anomalously slow decay of the kurtosis (less than 1/N), and thus a very slow flattening of the smile with maturity.
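Equation (9.23) is straightforward to evaluate; a minimal sketch, where the skewness and kurtosis values are illustrative of a roughly one-month index option:

import numpy as np

def implied_vol(M, sigma=0.2, zeta_T=-0.3, kappa_T=1.0):
    """Effective volatility surface of Eq. (9.23), to first order in the cumulants."""
    return sigma * (1 + zeta_T / 6 * M + kappa_T / 24 * (M**2 - 1))

for M in np.linspace(-2, 2, 5):
    print(f"moneyness M = {M:+.1f}: Sigma = {implied_vol(M):.4f}")

The negative skew tilts the smile downwards around the money, and the kurtosis term lowers the at-the-money volatility while raising the wings.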
a single day. Ironically enough, it was the very use of a crash-free model (Gaussian tails, no extreme events) that helped trigger a crash.16 Let us quickly go through this instructive event, which should have been a better-remembered lesson.
The idea of portfolio insurance, principally due to Leland, O’Brien and Rubinstein (LOR) [118, 119],
consisted in using Black-Scholes theory to guide trading such as to set a floor below which the value
of an investment portfolio cannot fall. One may wonder, if the objective was to ensure the possibility
to sell the assets at a guaranteed price level, why not just buy a put option? As mentioned above, organised options exchanges had existed since 1973, but they were actually quite limited [120]. Chiefly because the SEC was back then rather suspicious of derivatives, options were restricted to short maturities and to individual stocks (no stock indices), and there were limits on the size of the positions that could be accumulated, making them unsuitable for the insurance of large diversified portfolios. Black and Scholes had shown that it was possible to mirror, or replicate, perfectly the returns on an option by continuously adjusting a position on the underlying. While BS had used the idea to compute options' costs, LOR focused on the interest of the replicating portfolio itself to manufacture a synthetic put, and implement large scale portfolio insurance.
Such products were proposed to investors as a substitute for genuine put options, and in the mid-80s portfolio insurance became a big business. Theoretically the idea was a good one, but that was without counting on Black-Scholes' flaws: the fact that liquidity is finite and that market impact is real. In 1987 LOR managed over $50B when the daily liquidity back then was of the order of $5B only. With such orders of magnitude, it is fairly easy to understand that with abnormal price swings, rebalancing the replicating portfolio without totally swamping the market would take a long time. This sets the real system quite far from Black-Scholes' continuous time hypothesis... Further, the strategy, which basically consisted in buying stocks as prices rose and selling them as the value of the portfolio fell towards its floor, was a clearly unstable feedback loop which would do nothing but amplify the swings. And this is precisely what happened on October 19, 1987.17 See e.g. [120–124].
9.4 The Financial Modelers' Manifesto

The Modelers' Hippocratic Oath, from Derman and Wilmott's Financial Modelers' Manifesto (2009), reads:
• I will remember that I didn't make the world, and it doesn't satisfy my equations.
• Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
• I will never sacrifice reality for elegance without explaining why I have done so.
• Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make
explicit its assumptions and oversights.
• I understand that my work may have enormous effects on society and the economy, many of them
beyond my comprehension.
16 A very similar story took place in 2008. Pricing models were again fundamentally flawed, as they underestimated the probability of global systemic risk (multiple borrowers defaulting on their loans simultaneously). By neglecting the very possibility of a global crisis, they contributed to triggering one [117].
17 While some may argue that portfolio insurance did not trigger the crash, it is clear that it was the catalyst, responsible for the snowball effect that exacerbated it.
“There are always implicit assumptions behind a model and its solution method. But human
beings have limited foresight and great imagination, so that, inevitably, a model will be used in
ways its creator never intended. This is especially true in trading environments [...] but it’s also a
matter of principle: you just cannot foresee everything. So, even a correct model, correctly solved,
can lead to problems. The more complex the model, the greater this possibility.”
– Emanuel Derman, 1996
“Unfortunately, as the mathematics of finance reaches higher levels so the level of common
sense seems to drop. There have been some well publicised cases of large losses sustained by com-
panies because of their lack of understanding of financial instruments [...]. It is clear that a major
rethink is desperately required if the world is to avoid a mathematician-led market meltdown.”
– Paul Wilmott, 2000
Appendices
Appendix A

Choice theory and decision rules
Here we introduce some important ideas on classical choice theory and discuss its grounds.
The logit choice rule states that an agent facing a set A of alternatives picks alternative α with probability:
\[
p_\alpha = \frac{1}{Z}\, e^{\beta u_\alpha}, \qquad \text{with} \quad Z = \sum_{\gamma \in \mathcal{A}} e^{\beta u_\gamma}, \tag{A.1}
\]
where u_α denotes the utility of alternative α,1 and β is a parameter that allows one to interpolate between deterministic utility maximization (β → ∞) and equiprobable choices or full indifference (β = 0).2 Indeed, in the β = 0 limit one obtains ∀α, p_α = 1/N with N = card(A), regardless of the utilities. In the β → ∞ limit, one obtains that all p_α are zero except for p_{α_max} = 1, where α_max = argmax_{α∈A}(u_α). In analogy with statistical physics, β := 1/T is often called the inverse temperature. In the 2D case A = {1, 2} one has:
\[
p_1 = \frac{1}{1 + e^{-\beta \Delta u}}, \qquad \Delta u := u_1 - u_2,
\]
and p_2 = 1 − p_1. See Fig. A.1 for an illustration of the limit cases discussed above.
Figure A.1: Probability p_1 as a function of ∆u = u_1 − u_2 for three different values of the temperature T = 1/β.
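As a quick numerical illustration of these limits (a sketch with arbitrary utilities):

import numpy as np

# Logit choice probabilities, Eq. (A.1), for three values of beta.
u = np.array([1.0, 0.3, -0.5])            # hypothetical utilities
for beta in [0.0, 1.0, 50.0]:
    w = np.exp(beta * (u - u.max()))      # shift by max for numerical stability
    p = w / w.sum()
    print(beta, np.round(p, 3))
# beta = 0 gives p = 1/N for all alternatives; large beta concentrates on argmax.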
1 Utility maximisation in economics is tantamount to energy minimisation in physics; one might define an energy scale for the alternatives as e_γ = −u_γ.
2 There is no reason to think that the inverse temperature β should be the same for all agents. Some may be more or less rational than others, such that there could be a distribution of β's. This, however, falls beyond the scope of this course.
A.1.3 Is it justified?
Well, not really. Typical justifications found in the literature are given below.
• Axiomatic [48].
• If one considers that the εαi are iid random variables distributed according to a Gumbel law
(double-exponential), it is possible to show that the probability is indeed given by Eq. (A.1) [127].
The deep reason for choosing a Gumbel distribution remains rather elusive.
• Rational choice theory goes with the assumption that the agent considers all available choices presented to him, weighs their utilities against one another, and then makes his choice. A number of criticisms of this view of human behaviour have emerged, with e.g. Simon [128] as a key figure. Simon highlighted that individuals may be “satisficers” rather than pure optimisers, in the sense that there is both a computational cost and a cognitive bias related to considering the universe of available choices. This led to the idea of bounded rationality as a way to model real agents [129–132]. Schwartz [133] also observed that while standard economic theory advocates for the largest number of options possible, “more can be less” due to both cognitive and processing limitations. In other words, agents maximizing their utility take into account the information cost (or the entropy), which according to Shannon [134] writes S = −∑_α p_α log p_α. In this exploration-exploitation setting, maximizing F = U + TS with U = ∑_α p_α u_α naturally yields p_α ∼ exp(βu_α), see [135]. Although clearly more sound than the previous argument, there are no solid behavioral grounds supporting it either.
• The ultimate (and honest) argument is that the logit rule is mathematically very convenient. Indeed, it is none other than the Boltzmann-Gibbs distribution used in statistical physics, for which a number of analytical results are known.
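For completeness, the constrained maximisation behind the entropy argument is a one-liner (sketched here with a Lagrange multiplier µ enforcing normalisation):
\[
\frac{\partial}{\partial p_\alpha}\Big[\sum_\gamma p_\gamma u_\gamma - T\sum_\gamma p_\gamma \log p_\gamma - \mu\Big(\sum_\gamma p_\gamma - 1\Big)\Big]
= u_\alpha - T(\log p_\alpha + 1) - \mu = 0
\quad\Longrightarrow\quad
p_\alpha = \frac{e^{\beta u_\alpha}}{Z},
\]
with β = 1/T and Z fixed by normalisation, i.e. exactly Eq. (A.1).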
Figure A.2: Probability distribution function (pdf) of noise differences ∆ε and corresponding ccdf.
Note that the idea of locality would translate into the restriction of the sums to a subset of “neighbouring” choices only.
for all α, γ. Equation (A.4) is called the detailed balance condition; it expresses the idea, borrowed from statistical physics, that at equilibrium each microscopic process should be balanced by its reverse process (microscopic reversibility) [136]. For Eq. (A.1) to be the equilibrium solution of Eq. (A.3), it thus suffices that R_>(−∆u) = e^{β∆u} R_>(∆u). While the logistic function R_>(∆u) = 1/[1 + exp(β∆u)] is a natural solution, there might very well be other possibilities. In particular, provided the distribution of noise differences is symmetric, there exists a function F(∆u) = 1/2 − R_>(∆u) such that F(0) = 0, F(−∆u) = −F(∆u) and F′(0) ≥ 0. A Taylor expansion at small utility differences ∆u reveals:
\[
\log \frac{R_>(-\Delta u)}{R_>(\Delta u)} = 4 F'(0)\, \Delta u + O(\Delta u^3), \tag{A.5}
\]
such that detailed balance is always satisfied to first order with β = 4F′(0),4 but may be violated at higher orders in the utility differences.
Very little is known about detailed-balance-violating models. In statistical physics this question is relevant for the dynamics of out-of-equilibrium systems. But when it comes to the decisions of people, it is a highly relevant question even for equilibrium systems, since there are no solid grounds supporting detailed balance (and the logit rule). In all that follows we will assume the logit rule as a sound and simple model for decision making, but one should always remain critical and refrain from drawing quantitative conclusions.
4 Interestingly enough, this argument allows one to relate the temperature to the distribution of noise differences. If r(∆ε) is Gaussian with standard deviation σ, one finds β ∼ 1/σ.
Appendix B

Imitation of the past
In the previous Chapter we focused on the effects of interactions with the peers. Here, we explore
interactions with the past and their consequences.
“There is a steadily accumulating body of evidence that people, even in carefully set up ex-
perimental conditions, do not behave as they are supposed to do in theory. Heaven alone knows
what they do when they are let loose in the real world with all its distractions. (...) This said,
it seems reasonable to assume that people are inclined to move towards preferable situations in
some more limited sense and not to perversely choose outcomes which make them feel worse off.
But, one can think of many ways in which this could be expressed and one does not need to impose
the formal structure on preferences that we have become used to. People may use simple rules of
thumb and may learn what it is that makes them feel better off, they may have thresholds which
when attained, push them to react.”
– Alan Kirman
In this line of thought, an interesting idea is that the utility, or well-being, associated with a certain decision may depend on whether it has already been made in the past, see e.g. [139].
This encodes that the perceived utility is initially blurred by some estimation error that decays to zero as experience with alternative α accumulates; Γ_α^{-1} is the typical learning time. Note that this is somewhat tantamount to having an inverse temperature β which increases with time.
where the first term on the RHS is the intrinsic utility of choice α, while the second accounts for memory effects. In other terms, the utility assigned to a choice is the sum of a “bare” component indicating the choice's objective worth, plus a memory term affecting the utility of that choice whenever the individual has picked it in the past (see Fig. B.1).2 Here φ is a decaying memory kernel, encoding that more recent choices have a stronger effect, and γ(t) indicates the choice of the individual at time t.
The sign of the kernel separates two different cases: φ < 0 indicates a situation where an individual grows weary of his past choices, while φ > 0 corresponds to the case where an individual becomes increasingly attached to them. The former case leads to an exploration of all the choices. The latter presents an interesting transition to a self-trapping regime when the feedback is strong enough and memory decays slowly enough. Memory effects hinder the exploration of all choices by the agent, and may even cause him to leave the optimal choices unexplored: the agent remains stuck with an a priori suboptimal choice (out of equilibrium), see [140]. A minimal simulation sketch is given below.
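A minimal simulation sketch of this self-trapping mechanism (assuming, for illustration, an exponentially decaying memory kernel and the logit rule of Appendix A; all parameters are arbitrary):

import numpy as np

# Logit agent whose utility for a choice grows each time it is picked
# (positive, exponentially decaying kernel): illustration of self-trapping.
rng = np.random.default_rng(1)
u_bare = np.array([0.0, 0.2])        # choice 1 is objectively better
beta, phi0, lam = 2.0, 0.5, 0.99     # lam = memory decay per time step
memory, counts = np.zeros(2), np.zeros(2)
for t in range(5000):
    p = np.exp(beta * (u_bare + memory))
    p /= p.sum()
    choice = rng.choice(2, p=p)
    counts[choice] += 1
    memory *= lam                    # old choices fade away
    memory[choice] += phi0           # positive feedback on the picked choice
print("fraction of visits:", counts / counts.sum())
# With a strong enough kernel the agent may lock onto choice 0,
# the a priori suboptimal alternative.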
B.2 Self-fulfilling prophecies

Such a narrative then convinces more agents that the effect is real, and their resulting behaviour creates and reinforces the effect. In other terms, a large consensus among agents about the correlations between a piece of information and the system's reaction can be enough to establish these correlations. Keynes called such a commonly shared representation of the world, on which uncertain agents can rely, a convention; he noted that [141]:
1 Hotel and restaurant chains rely on this strong universal principle: people often tend to prefer things they know.
2 One may also think, in the physicist's language, of an energy landscape (akin to minus the utility) where the energy of a given site or configuration increases or decreases if the system has already visited that site.
A concrete example of a self-fulfilling prophecy with a sudden change of convention is that of the correlation between bond and stock markets as a function of time. The sign of this correlation has switched many times in the past; the last switch took place during the 1997 Asian crisis. Before 1997 the correlation was positive, consistent with the belief that low long-term interest rates should favor stocks (since bonds move opposite to rates, an increase in bond prices should trigger an increase in stock prices). But another story suddenly took over when a fall in stock markets triggered increased anxiety among operators, who sold their risky equity to replace it with non-risky government bonds (flight to quality), which then became the dominant pattern.
Appendix C
Fish markets
Financial markets are among the most sophisticated and scrutinised of markets. They differ from other markets on many grounds. To highlight their peculiarities, in this appendix we present a very different kind of market: fish markets.
On a general note, interactions and transactions most often take place in markets. While this might
seem rather obvious once said, it is surprising to see how more often than not markets are completely
ignored in standard models.
“ It is a peculiar fact that the literature on economics and economic history contains so little
discussion of the central institution that underlies neoclassical economics – the market.”
– Douglass North, 1977
• Transactions are public and available to us, notably thanks to the famous economist Alan Kirman [142–147], who thoroughly recorded them in different marketplaces.
• Fish markets are simple, notably because fish is a perishable good: as a consequence there is no stock management to deal with from one day to the next.
• They can be of different sorts, from peer-to-peer (P2P) such as Marseille, to centralised auctions
such as Ancona, Tokyo or Sydney. We can thus expect to learn something from the differences
and similarities of their outcomes.
As we shall see, perhaps surprisingly, from all the a priori disorder (several buyers and sellers with different needs and prices), aggregate coordination emerges, making the whole thing rather efficient. Such aggregate behaviour displays a number of regularities, but the latter are clearly not the result of isolated optimisers: they cannot be attributed to individual rationality, nor can they be accounted for in the standard competitive framework. The aim of this appendix is precisely to understand and model aggregate coordination from the perspective of agents who learn from their past experience, rather than optimise estimates of their future utilities.
“ What is the meaning of having preferences over future bundles of goods? How do I know
what my preferences will be when I arrive at a future point in time? In particular, if my experiences
influence my tastes how can I know what I will turn out to prefer. [...] There was an advertisement
for Guinness which said, ‘I don’t like Guinness. That’s why I have never tried it’. This seems
absurd to most people but is perfectly consistent with an economist’s view of preferences. Since
my preferences are well defined I do, in fact, know whether I like Guinness or not. Therefore there
is no reason for me to try it if I happen not to like it.”
– Alan Kirman
“ The habits and relationships that people have developed over time seem to correspond much
more to things learnt by the force of experience rather than to conscious calculation. ”
– Alan Kirman
where π_ij(t) denotes the accumulated profits and γ is a discount factor accounting for finite memory effects (the typical memory time scale is γ^{-1}).
• We choose logit (or Boltzmann) statistics:
\[
p_{ij}(t) = \frac{1}{Z_i}\, e^{\beta J_{ij}(t)}, \qquad \text{with} \quad Z_i = \sum_k e^{\beta J_{ik}(t)}. \tag{C.2}
\]
We further note that 〈π_ij〉 relies on (i) seller j still having some quantity q_j of fish on his stand, and (ii) i visiting j, such that one can write:
\[
\Delta = \frac{\bar\pi}{\gamma}\, \frac{e^{\beta\Delta} - 1}{e^{\beta\Delta} + 1} = \frac{\bar\pi}{\gamma} \tanh\frac{\beta\Delta}{2}. \tag{C.5}
\]
As in the RFIM, Eq. (C.5) can be solved graphically to obtain the results displayed in Fig. C.1.
Figure C.1: Loyalty formation bifurcation and order parameter in the Weisbuch-Kirman model [148].
For β < β_c := 2γ/π̄, there is only one solution, ∆ = 0, corresponding to J_1 = J_2 = π̄/(2γ): the buyer visits both sellers with equal probability. For β > β_c, the symmetric solution becomes unstable and two nonzero symmetric solutions ∆_± appear. In particular, for β ≳ β_c, ∆_± ∼ ±√(β − β_c). In this regime the buyer effectively prefers one of the two sellers and visits him more frequently: he becomes loyal. Loyalty formation is more likely when memory is deep: β_c decreases as γ^{-1} increases. Let us stress that loyalty emerges spontaneously and is accompanied by symmetry breaking: neither of the two sellers is objectively better than the other. A numerical sketch of this bifurcation is given below.
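Eq. (C.5) is easily explored numerically (a sketch, taking π̄ = γ = 1 so that β_c = 2):

import numpy as np

# Fixed points of Delta = (pi_bar/gamma) * tanh(beta*Delta/2),
# obtained by iterating the map from a small positive initial condition.
pi_bar, gamma = 1.0, 1.0             # illustrative values; beta_c = 2*gamma/pi_bar
for beta in [1.0, 1.9, 2.1, 3.0]:
    d = 1e-3
    for _ in range(100_000):
        d = (pi_bar / gamma) * np.tanh(beta * d / 2.0)
    print(f"beta = {beta}: Delta+ = {d:.4f}")
# Below beta_c = 2 the iteration collapses to 0; above it, a nonzero
# loyal solution Delta+ appears, growing like sqrt(beta - beta_c).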
In the disorganised phase, in which buyer i chooses among the M sellers with equal probability (J_ij independent of j), one has y_i = M J_ij²/(M J_ij)² = 1/M. In the fully organised phase, in which buyer i is loyal to seller k only (J_ij ∼ δ_jk), one finds y_i = 1. In particular, 1/y_i represents the number of sellers visited by buyer i. The simulated data is consistent with a transition from y_i ≈ 1/M to 1 when β is increased beyond a certain β_c (see Fig. C.1 and supporting documents).
1 In full generality π̄_ij depends on both i and j: is the fish going to be sold in a mall's food court or in a fancy restaurant? For the sake of simplicity, we here focus on the symmetric case π̄_ij = π̄ for all i, j.
2 In the general case buyers interact, since P(q_j > 0) depends on other buyers having visited seller j before i.
Interestingly, fluctuations (e.g. of the number of visitors per seller) vanish in the fully organised phase (β ≫ β_c): the buyer-seller network becomes deterministic. One might thus argue that the loyal phase is “Pareto superior”, given that the deterministic character of the interactions allows sellers to better estimate the quantity of fish that they will sell and to avoid unsold fish being wasted at the end of the day, which translates into higher profits than in the disorganised phase (β < β_c) for both customers and sellers. This provides a nice example in which spontaneous aggregate coordination is beneficial.
Looking at the pdf of the Gini indices over all buyers reveals that loyalty also exists in Ancona (G ≈ 0.4 > 0), but the distribution is unimodal, in contrast with Marseille where it is bimodal. One can argue that the central auction mechanism erases the behavioural bimodality.
Concluding remarks
Markets are a fundamental element of every economy. Fish markets show that aggregate regularities and coordination arise from the interactions between highly heterogeneous agents. While in Marseille nothing prevents buyers from wandering around and just picking the cheapest seller, as would be required by the standard model, this is not what happens. Most stylised facts revealed by the data cannot reasonably be accounted for with isolated representative agents; they can instead be reproduced in simple agent-based models with memory but limited intelligence.3 Finally, differences in market organisation can lead to differences in the aggregate outcomes.
“ Aggregate regularity should not be considered as corresponding to individual rationality.
(...) The fact that we observe an aggregate result which conforms to the predictions of a particular
model is not enough to justify the conclusion that individuals are actually behaving as they are
assumed to do in that model.”
– Alan Kirman
3 While bridging the gap between micro and macro behaviour is not straightforward, one does not need to take into account all the complexity at the individual scale to understand aggregate behaviour.
Bibliography
[1] L. Bachelier, Théorie de la Spéculation, Ph.D. thesis, Ecole Normale Supérieure (1900).
[2] J.-P. Bouchaud and M. Potters, Theory of financial risk and derivative pricing: from statistical
physics to risk management (Cambridge university press, 2003).
[3] J.-P. Bouchaud, J. Bonart, J. Donier, and M. Gould, Trades, quotes and prices: financial markets
under the microscope (Cambridge University Press, 2018).
[5] B. Sharma, S. Agrawal, M. Sharma, D. Bisen, and R. Sharma, arXiv preprint arXiv:1108.0977
(2011).
[6] J. Hasbrouck, Empirical market microstructure: The institutions, economics, and econometrics of
securities trading (Oxford University Press, 2007).
[7] P. Wilmott, Paul Wilmott on Quantitative Finance-3 Volume Set (John Wiley & Sons, 2006).
[8] S. Laruelle and C.-A. Lehalle, Market microstructure in practice (World Scientific, 2018).
[10] P. Wilmott et al., Derivatives: The theory and practice of financial engineering (Wiley Chichester,
1998).
[12] M. Levinson et al., The economist guide to financial markets: Why they exist and how they work
(The Economist, 2014).
[13] A. Ilmanen, Expected returns: An investor’s guide to harvesting market rewards (John Wiley &
Sons, 2011).
[14] I. Tulchinsky, Finding Alphas: A quantitative approach to building trading strategies (John Wiley
& Sons, 2019).
[17] J.-P. Bouchaud and J.-F. Muzy, in The Kolmogorov Legacy in Physics (Springer, 2003) pp. 229–246.
[19] J. Gatheral, T. Jaisson, and M. Rosenbaum, Quantitative Finance 18, 933 (2018).
[20] A. Joulin, A. Lefevre, D. Grunberg, and J.-P. Bouchaud, Wilmott Magazine September-October,
1 (2008).
[21] M. Wyart, J.-P. Bouchaud, J. Kockelkoren, M. Potters, and M. Vettorazzo, Quantitative finance
8, 41 (2008).
[22] J. Hasbrouck, Empirical market microstructure: The institutions, economics, and econometrics of
securities trading. (Oxford University Press, 2007).
[23] J.-P. Bouchaud, J. D. Farmer, and F. Lillo, How markets slowly digest changes in supply and demand
(Elsevier: Academic Press, 2008).
[25] J.-P. Bouchaud, Price impact. Encyclopedia of quantitative finance. (Wiley, 2010).
[26] F. Lillo and J. D. Farmer, Studies in Nonlinear Dynamics & Econometrics 8, 1 (2004).
[27] Y. Lempérière, C. Deremble, P. Seager, M. Potters, and J.-P. Bouchaud, arXiv preprint
arXiv:1404.3274 (2014).
[28] A. Majewski, S. Ciliberti, and J.-P. Bouchaud, Journal of Economic Dynamics and Control, 103791 (2019).
[29] J.-P. Bouchaud, S. Ciliberti, Y. Lemperiere, A. Majewski, P. Seager, and K. Sin Ronia, Available at
SSRN 3070850 (2017).
[30] J.-P. Bouchaud, Y. Gefen, M. Potters, and M. Wyart, Quantitative Finance 4, 176 (2004).
[31] M. Benzaquen, I. Mastromatteo, Z. Eisler, and J.-P. Bouchaud, J. Stat. Mech. 2017, 023406
(2017).
[32] I. Mastromatteo, M. Benzaquen, Z. Eisler, and J.-P. Bouchaud, Risk July 2017 (2017).
[34] J. Hasbrouck and D. J. Seppi, Journal of financial Economics 59, 383 (2001).
[35] A. Boulatov, T. Hendershott, and D. Livdan, The Review of Economic Studies 80, 35 (2013).
[40] J.-F. Muzy, J. Delour, and E. Bacry, The European Physical Journal B-Condensed Matter and
Complex Systems 17, 537 (2000).
[42] S. H. Strogatz, Sync: How order emerges from chaos in the universe, nature, and daily life (Hachette
UK, 2012).
[44] D. Stauffer and A. Aharony, Introduction to percolation theory (Taylor & Francis, 2018).
[45] J.-P. Eckmann and E. Moses, Proceedings of the national academy of sciences 99, 5825 (2002).
[48] S. P. Anderson, A. De Palma, and J.-F. Thisse, Discrete choice theory of product differentiation
(MIT press, 1992).
[49] Q. Michard and J.-P. Bouchaud, The European Physical Journal B-Condensed Matter and Com-
plex Systems 47, 151 (2005).
[50] J. Bongaarts and S. C. Watkins, Population and Development Review, 639 (1996).
[51] E. L. Glaeser, B. Sacerdote, and J. A. Scheinkman, The Quarterly Journal of Economics 111,
507 (1996).
[52] P. Curty and M. Marsili, Journal of Statistical Mechanics: Theory and Experiment 2006, P03013
(2006).
[54] D. S. Scharfstein, J. C. Stein, et al., American Economic Review 80, 465 (1990).
[55] R. J. Shiller, J. Pound, et al., Survey evidence on diffusion of interest among institutional investors,
Tech. Rep. (Cowles Foundation for Research in Economics, Yale University, 1986).
[57] P. Ehrenfest and T. Ehrenfest-Afanassjewa, Über zwei bekannte Einwände gegen das Boltzmannsche
H-Theorem (Hirzel, 1907).
[58] A. Fosset, J.-P. Bouchaud, and M. Benzaquen, Available at SSRN 3496148 (2019).
[59] D. Challet and Y.-C. Zhang, Physica A: Statistical Mechanics and its Applications 246, 407
(1997).
[60] D. Challet, M. Marsili, and Y.-C. Zhang, Physica A: Statistical Mechanics and its Applications
294, 514 (2001).
[62] B. B. Mandelbrot, Fractals and scaling in finance: Discontinuity, concentration, risk (Springer,
1997).
[65] C. M. Jones, G. Kaul, and M. L. Lipson, Review of financial studies 7, 631 (1994).
[75] M. Benzaquen, J. Donier, and J.-P. Bouchaud, Market Microstructure and Liquidity 2, 1650009
(2016).
[78] M. G. Daniels, J. D. Farmer, L. Gillemot, G. Iori, and E. Smith, Physical review letters 90, 108102
(2003).
[79] R. C. Grinold and R. N. Kahn, Active Portfolio Management (McGraw Hill New York, 2000).
[82] J. Donier and J. Bonart, Market Microstructure and Liquidity 1, 1550008 (2015).
[85] I. Mastromatteo, B. Tóth, and J.-P. Bouchaud, Phys. Rev. E 89, 042805 (2014).
[86] X. Brokmann, E. Serie, J. Kockelkoren, and J.-P. Bouchaud, Market Microstructure and Liquidity
1, 1550007 (2015).
[87] E. Bacry, A. Iuga, M. Lasnier, and C.-A. Lehalle, Market Microstructure and Liquidity 1, 1550009
(2015).
[89] J. Farmer, A. Gerig, F. Lillo, and H. Waelbroeck, Quant. Finance 13, 1743 (2013).
[90] E. Zarinelli, M. Treccani, J. D. Farmer, and F. Lillo, Market Microstructure and Liquidity 1,
1550004 (2015).
[97] J. Donier, J. F. Bonart, I. Mastromatteo, and J.-P. Bouchaud, Quantitative Finance 15, 1109
(2015).
[98] M. Benzaquen and J.-P. Bouchaud, Quantitative Finance 18, 1781 (2018).
[100] W. R. Schneider, Fractional diffusion. In Dynamics and Stochastic Processes Theory and Applications
(Springer Berlin Heidelberg, 1990) pp. 276–286.
[103] J.-P. Bouchaud and M. Potters, (Oxford: Oxford University Press) (2011).
[104] J. Bun, J.-P. Bouchaud, and M. Potters, Physics Reports 666, 1 (2017).
[106] J. Bun, J.-P. Bouchaud, and M. Potters, Physics Reports 666, 1 (2017).
[108] G. Bekaert and G. Wu, The review of financial studies 13, 1 (2000).
[109] J.-P. Bouchaud, A. Matacz, and M. Potters, Phys. Rev. Lett. 87, 228701 (2001).
[110] Q. Li, J. Yang, C. Hsiao, and Y.-J. Chang, Journal of Empirical Finance 12, 650 (2005).
[111] A. Ang and J. Chen, Journal of financial Economics 63, 443 (2002).
[113] E. Balogh, I. Simonsen, B. Z. Nagy, and Z. Néda, Physical Review E 82, 066113 (2010).
[114] M. Wyart and J.-P. Bouchaud, Journal of Economic Behavior & Organization 63, 1 (2007).
[116] F. Black and M. Scholes, Journal of political economy 81, 637 (1973).
[117] D. Colander, H. Föllmer, A. Haas, M. D. Goldberg, K. Juselius, A. Kirman, T. Lux, and B. Sloth,
Univ. of Copenhagen Dept. of Economics Discussion Paper (2009).
[121] D. Bates and R. Craine, Valuing the futures market clearinghouse’s default exposure during the
1987 crash, Tech. Rep. (National Bureau of Economic Research, 1998).
[123] D. MacKenzie, An engine, not a camera: How financial models shape markets (Mit Press, 2008).
[127] D. McFadden, Conditional logit analysis of qualitative choice behavior, in Frontiers in Econometrics, edited by P. Zarembka (Academic Press, 1974) pp. 105–142.
[130] R. Selten, Journal of Institutional and Theoretical Economics (JITE)/Zeitschrift für die gesamte
Staatswissenschaft 146, 649 (1990).
[132] G. Gigerenzer and R. Selten, Bounded rationality: The adaptive toolbox (MIT press, 2002).
[135] J.-P. Nadal, G. Weisbuch, O. Chenevez, and A. Kirman, Advances in self-organization and evo-
lutionary economics, Economica, London , 149 (1998).
[136] N. G. Van Kampen, Stochastic processes in physics and chemistry, Vol. 1 (Elsevier, 1992).
[138] D. Kahneman and R. H. Thaler, Journal of Economic Perspectives 20, 221 (2006).
[139] D. Kahneman, in Choices, Values, and Frames (Cambridge University Press, 2000) pp. 673–692.
[141] J. M. Keynes, “The general theory of interest, employment and money,” (1936).
[142] A. Kirman and A. Vignes, in Issues in contemporary economics (Springer, 1991) pp. 160–185.
[144] A. P. Kirman and N. J. Vriend, in Interaction and Market structure (Springer, 2000) pp. 33–56.
[145] A. P. Kirman and N. J. Vriend, Journal of Economic Dynamics and Control 25, 459 (2001).
[147] M. Gallegati, G. Giulioni, A. Kirman, and A. Palestrini, Journal of Economic Behavior & Orga-
nization 80, 20 (2011).
[148] G. Weisbuch, A. Kirman, and D. Herreiner, The economic journal 110, 411 (2000).
[149] B. Derrida, in Chance and Matter, edited by J. Souletie, J. Vannimenus, and R. Stora (North-Holland, 1986).
Tutorial sheets
The following tutorials were designed by teaching assistants Jérôme Garnier-Brun, Ruben Zakine, Théo
Dessertaine and José Moran.
8. Compute the volatility signature plots V (τ)/τ for the two fBMs you generated and compare.
Comments?
4. Now show that 〈x(t)〉 → 0 as t → ∞. Taking x(0) = 0 for simplicity, compute the product x(t)x(t + τ) and average it over ξ to recover the correlation function:
\[
C(t, \tau) = \langle x(t)\, x(t+\tau) \rangle = \frac{\sigma^2}{2\omega} \left[ e^{-\omega|\tau|} - e^{-\omega(2t+\tau)} \right]. \tag{9.13}
\]
Comment on its behavior at long times.
5. Create a DataFrame df where each row is a time point, with 100 columns corresponding to instances of an Ornstein-Uhlenbeck process with σ = 1 and your choice of ω (hint: use pd.DataFrame(x) where x is the output of a function you have defined). Take again T = 10^5 and dt = 10^{-3}.
6. Create a function that computes the correlation x(t)x(t + τ dt), averages it over t for a given realization, and then averages over all the realizations in df for a single value of τ. Compute this correlation function over a set of points (at least up to τ dt = 2) and compare it with σ²/(2ω) e^{−ω dt τ}. (A possible starting point is sketched at the end of this exercise.)
7. Compute the variogram V (τ) of the Ornstein-Uhlenbeck process, as well as the volatility signa-
ture plot V (τ)/τ. Compare it to the fBM. What do you notice?
8. Recasting the equation in a Langevin form
\[
\frac{dx(t)}{dt} = -U'(x(t)) + \sigma \xi(t), \tag{9.14}
\]
can you provide an intuitive explanation for the variogram you just obtained? Draw a sketch!
Bonus: can you give the steady state probability distribution of the particle’s position at a glance?
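As mentioned in Question 6, here is a possible starting point (a sketch with illustrative ω, and a shorter series than in the text to keep the run fast):

import numpy as np
import pandas as pd

# Euler-Maruyama simulation of 100 independent Ornstein-Uhlenbeck paths,
# stored as columns of a DataFrame (Questions 5-6).
rng = np.random.default_rng(0)
sigma, omega, dt, n_steps, n_paths = 1.0, 1.0, 1e-2, 20_000, 100
x = np.zeros((n_steps, n_paths))
noise = rng.standard_normal((n_steps - 1, n_paths))
for t in range(n_steps - 1):
    x[t + 1] = x[t] - omega * x[t] * dt + sigma * np.sqrt(dt) * noise[t]
df = pd.DataFrame(x)

def corr(tau):
    # <x(t) x(t + tau*dt)>, averaged over t and over all realizations
    return (df.values[:-tau] * df.values[tau:]).mean()

taus = np.arange(1, 300)
c_emp = np.array([corr(tau) for tau in taus])
c_th = sigma**2 / (2 * omega) * np.exp(-omega * dt * taus)
print(np.abs(c_emp - c_th).max())    # small, up to sampling noise and transients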
can be rewritten as a function of σ(τ) = √(M₂(τ)) in this case, where the probability to move by ∆x in time τ is given by
\[
P_\tau(\Delta x) = \frac{1}{\sigma(\tau)}\, f\!\left(\frac{\Delta x}{\sigma(\tau)}\right), \qquad f(u) = \frac{e^{-u^2/2}}{\sqrt{2\pi}}. \tag{9.16}
\]
Explain why the fBM is “scale-invariant”.
2. Verify your finding numerically.
3. You should have found that for the fBM, M_q(τ) ∝ σ(τ)^q. Such a scaling is referred to as monofractal, while a process with M_q(τ) ∝ σ(τ)^{ζ(q)}, with ζ(q) a non-linear function of q, is called multifractal. You should have a file gen_heliumjet_R89.npy describing the velocity of a turbulent helium jet (synthetically generated). Simply looking at the time series (note the file includes 4 time series), do you think it can be adequately described by a monofractal process? What about a multifractal one?
4. Plot M_q(τ) calculated from the experimental data as a function of τ for different values of q. Overlaying aσ(τ)^q on the data (with a a scaling constant), can you confirm or refute the monofractal nature of the signal? (A sketch for computing M_q(τ) is given below.)
5. Motivated by turbulent jets and financial time series (see PC3), Bacry, Muzy & Delour introduced the Multifractal Random Walk (MRW), for which it can be shown that ζ(q) is exactly quadratic in q, with a coefficient set by an intermittency parameter λ. Confront this expression with the results from the previous question and comment. What do you think is a good guess of λ?
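A possible way to compute the structure functions M_q(τ) = 〈|x(t+τ) − x(t)|^q〉 for Questions 3-5 (a sketch; the placeholder signal is a plain random walk, to be replaced by the data file above):

import numpy as np

def structure_function(x, taus, q):
    return np.array([np.mean(np.abs(x[tau:] - x[:-tau])**q) for tau in taus])

x = np.cumsum(np.random.randn(100_000))       # placeholder monofractal signal
taus = np.unique(np.logspace(0, 3, 30).astype(int))
log_sigma = 0.5 * np.log(structure_function(x, taus, 2))   # log sigma(tau)
for q in [1, 2, 4]:
    slope = np.polyfit(log_sigma, np.log(structure_function(x, taus, q)), 1)[0]
    print(q, round(slope, 2))        # slope ~ q for a monofractal process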
where the X_i are iid random variables with mean 〈X_i〉 = 0 and variance 〈X_i²〉 = σ².
1. State the central limit theorem.
2. Consider the characteristic function G_X(k) = 〈e^{ikX}〉. Show that if the X_i are Gaussian, then Y_N is also Gaussian, with variance Nσ².
3. Generalize the result to Z_N = Y_N/√N for all distributions of X_i with bounded variance, thus proving the central limit theorem.
4. Mentally prepare yourself to approximate any large sum of random variables as a constant plus
Gaussian white noise until the end of the course.
5. (Bonus) Prove the central limit theorem in a statistical physics style, starting from
\[
P_{Y_N}(y) = \int_{-\infty}^{\infty} \prod_{i=1}^{N} dx_i\; P_{X_1,\dots,X_N}(x_1, \dots, x_N)\, \delta\Big(y - \sum_i x_i\Big). \tag{9.19}
\]
Hint: use the integral representation δ(u) = ∫_ℝ (dλ/2π) e^{iλu} and the fact that N is large.
Does the CLT appear to survive? If so, provide a heuristic expression for the variance from your
numerical experiments.
5. Repeat, but with variables with longer-range correlations, for example
\[
\langle X_i X_j \rangle = \frac{\sigma^2}{|i-j|^2}. \tag{9.22}
\]
(A sketch of one way to generate such variables is given below.)
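One way to generate such correlated variables without building the full covariance matrix (a sketch: a causal linear filter of white noise with coefficients a_k ∼ (1+k)^{−3/2}, whose correlations decay as 1/|i−j|² at large lags):

import numpy as np

rng = np.random.default_rng(0)
K = 256
a = (1.0 + np.arange(K))**(-1.5)     # filter coefficients
for N in [512, 2048]:
    xi = rng.standard_normal((500, N + K - 1))
    X = np.stack([np.convolve(row, a, mode="valid") for row in xi])
    Y = X.sum(axis=1)
    print(N, Y.var() / N)
# Var(Y_N)/N tends to a constant: the correlations are summable, so the
# CLT survives with a renormalized variance.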
where x_m is the minimum (necessarily positive) value taken by X, and α > 0 is a shape parameter.
1. Calculate the mean and variance of X. What range(s) of values of α should we focus on?
2. Rather than the sum of N random variables, let us now consider the maximum value of the draw, which we will write M_N = max{X_1, . . . , X_N}. Write down the general expression for its PDF, P_{M_N}(m), as a function of the PDF and the cumulative distribution of X, and check that it is correctly normalized.
3. Based on P_{M_N}(m), show that the most probable value of the maximum, M_N^∗, satisfies
6. In light of this last result, discuss the contribution of the largest term to the fluctuations of the
sum YN and relate this to your answer to Question 1.
7. It turns out that the CLT generalizes to heavy-tailed random variables with diverging variance (and average). Based on the result for the maximum, can you hazard a guess at the tail behavior of the empirical mean of N → ∞ heavy-tailed random variables? Verify your intuition with a numerical simulation in the Student-t case. Hint: the pdf of a Student-t of parameter ν decays as ∼ x^{−(ν+1)}. (A sketch checking the maximum's scaling is given after this exercise.)
8. This exercise provided a very coarse introduction to extreme value statistics, which has become a cornerstone of statistical mechanics and complexity science. Based on your very first result in the field, could you explain to a friend why comparing countries' medal counts normalized by their population sizes at the Olympics might not be such a smart idea?
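A quick numerical check of the scaling of the maximum (a sketch; α, x_m and the sample sizes are illustrative):

import numpy as np

# Typical value of the maximum of N Pareto(alpha) draws vs x_m * N^(1/alpha).
rng = np.random.default_rng(0)
alpha, x_m = 1.5, 1.0
for N in [100, 1_000, 10_000]:
    samples = x_m * (rng.pareto(alpha, size=(5000, N)) + 1.0)   # numpy's convention
    M = samples.max(axis=1)
    print(N, np.median(M), x_m * N**(1 / alpha))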
network), but may also be an essential tool to clean noisy real-world data. In this exercise, we will focus on the latter, and more specifically on how random matrix theory may help us estimate the covariance matrix between N time-varying variables X_1, . . . , X_N from simultaneous realizations (e.g. stocks). Assuming these variables are centered and of unit variance, a standard estimator for the covariance matrix is given by
\[
E_{ij} = \frac{1}{T} \sum_{t=1}^{T} X_i^t X_j^t, \tag{9.27}
\]
where T is the number of timesteps we have access to.
1. Show that E is symmetric and positive semi-definite by rewriting it with the N × T data matrix
H.
2. Let q = N /T . What is an immediate consequence of trying to estimate the matrix when q > 1,
i.e. when there are more variables than samples in time?
3. In the context of random matrices, the natural extension of the expected value is the normalized
trace operator,
1
τ(A) = 〈Tr A〉 , (9.28)
N
such that the kth moment of A is τ(Ak ). To convince ourselves that the empirical covariance
matrix will be distorted by noise whenever q > 0, calculate the first two moments of E, assuming
that the data is uncorrelated in time i.e. 〈X it X sj 〉 = Ci j δ t,s . Hint: use Wick’s theorem,
\[
\langle X_1 X_2 \cdots X_{2n} \rangle = \sum_{\text{pairings}} \prod_{\text{pairs}} \langle X_i X_j \rangle, \tag{9.29}
\]
where we sum over all distinct pairings of {X 1 , . . . , X 2n } and each summand is the product of the
n pairs.
4. A common approach to compute the eigenvalue distribution of a random matrix is to go through
the Stieltjes transform,
\[
g_A(z) = \tau\big[ (z \mathbf{1} - A)^{-1} \big], \tag{9.30}
\]
\[
\rho(x) = \frac{1}{\pi} \lim_{\eta \to 0^+} \operatorname{Im}\, g_A(x - i\eta), \tag{9.31}
\]
where ρ(x) is the probability density function of the eigenvalues when N → ∞. In the simplest
possible case C = 1, E is referred to as the Wishart matrix W, the Stieltjes transform of which
can be shown to be given by
\[
g_W(z) = \frac{z + q - 1 \pm \sqrt{(z + q - 1)^2 - 4qz}}{2qz}. \tag{9.32}
\]
Show that the zeros of the argument under the square root are given by λ_± = (1 ± √q)², and subsequently choose the correct branch of g_W(z) in order to satisfy g_W(z) → 1/z as z → ∞.
5. Given that only the square root contributes to the imaginary part of the Stieltjes transform when η → 0⁺, express the eigenvalue density as a function of λ_±. This density is known as the Marčenko-Pastur law. For what values of q is it correctly normalized?
6. Run numerical simulations to test the eigenvalue density that you have found, for q < 1 and N = {20, 50, 100}. What do you observe when the density is no longer analytically normalized? (A sketch of the comparison is given below.)
7. How do you think one could use the Marčenko-Pastur density to improve the measurement of empirical covariance matrices, for example on the stock market?
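The comparison of Question 6 can be set up as follows (a sketch for q < 1):

import numpy as np

# Empirical eigenvalues of E = H H^T / T vs the Marchenko-Pastur edges
# for C = 1 and q = N/T < 1.
rng = np.random.default_rng(0)
N, q = 200, 0.5
T = int(N / q)
H = rng.standard_normal((N, T))
eigs = np.linalg.eigvalsh(H @ H.T / T)
lam_m, lam_p = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
x = np.linspace(lam_m + 1e-9, lam_p - 1e-9, 200)
rho = np.sqrt((lam_p - x) * (x - lam_m)) / (2 * np.pi * q * x)
print(eigs.min(), lam_m, eigs.max(), lam_p)   # spectrum edges vs lambda_+-
# A histogram of eigs, compared with rho(x), displays the Marchenko-Pastur law.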
4. Write a function that computes the series for r_t^∆. Compute the mean and std of r_t^1, as well as that of r_t^{250} = log(p_t/p_{t−250}), the yearly returns. Comments?
5. In mathematical finance, people often resort to the following model, “à la Black-Scholes”, to study price dynamics:
\[
p_t = (1 + \mu)\, p_{t-1} + \sigma \eta_t\, p_{t-1}, \tag{9.33}
\]
where η_t is a Gaussian random variable with 〈η_t η_{t′}〉 = δ(t − t′) and 〈η_t〉 = 0 (white noise), and with σ, µ ≪ 1. Interpret the terms µ and σ. How does one write r_t within this framework?
6. Use np.random.randn to draw random Gaussian numbers r̃_t of the same length as r_t, and multiply them by your estimate of σ to get the correct variance. Plot r̃_t. Comments?
7. We define the survival function (recall PC2) of the returns as
\[
F(x) := \int_x^{\infty} dr\, \rho(r), \tag{9.34}
\]
where ρ is the density function associated with the returns. Plot the survival functions of the daily returns r_t on the right (use r_t on positive-valued bins) and left (−r_t on positive-valued bins) tails. Hint: look up how to do this on the Numpy Cheatsheet.
8. After importing the correct module with import scipy.stats, use the functions

y_normal = scipy.stats.norm.sf(x=x, scale=sigma)
y_student = scipy.stats.t.sf(x=x, scale=sigma, df=nu)

to compute the survival functions of a normal distribution of std σ and of a Student-t distribution of std σ and tail parameter ν at x (which can be a numpy array). Estimate σ from the data, but play with ν. Which parameter best fits the right and left tails (try ν ∈ [2, 6])? (A sketch of the full comparison is given below.)
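Putting things together, the comparison might look as follows (a sketch; “returns.txt” is a hypothetical file standing for the daily return series loaded earlier):

import numpy as np
import scipy.stats

returns = np.loadtxt("returns.txt")                 # hypothetical data file
r = np.sort(returns[returns > 0])                   # right tail
sf_emp = 1.0 - np.arange(1, len(r) + 1) / len(r)    # empirical survival function
sigma = returns.std()
y_normal = scipy.stats.norm.sf(x=r, scale=sigma)
y_student = scipy.stats.t.sf(x=r, scale=sigma, df=3)
# Plot sf_emp, y_normal and y_student on log-log axes (and likewise for -r):
# the Gaussian tail decays far too fast, while nu in [2, 6] can match the data.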
9. Do the same as in the two previous questions, but for ∆ ∈ {30, 60, 90}. Qualitatively, what is happening? (Remember to adjust with the corresponding value of σ_∆.)
10. What comments can you make about the model proposed in Eq. (9.33)?
Part 3 : Correlations
1. Before beginning, center the daily returns by removing the mean, i.e. r t ← r t − 〈r t 〉. This is
standard when working with correlations.
2. Write a function to compute the correlation function
\[
C_{r,r}(\tau) = \frac{\langle r_t\, r_{t+\tau} \rangle}{\langle r_t^2 \rangle}, \tag{9.35}
\]
and compute it for −200 ≤ τ < 200. What do you expect to see? Same question for the absolute returns |r_t|.
3. Compute and plot C_{|r|,|r|}(τ) and C_{r²,r²}(τ) for 1 ≤ τ < 1000. Choose a log-log scale when plotting. Comments?
4. What does this imply for the proper modeling of return dynamics?
where η_t is a white noise of zero mean and variance 〈η_t η_{t′}〉 = 2σ₀² δ(t − t′), and where 𝒢(t) = G(t + 1) − G(t) is the discrete derivative of the propagator G. In this case, the mid-price m_t = m_{t−1} + r_{t−1} reads
\[
m_t = m_0 + \sum_{t' < t} G(t - t')\, \varepsilon_{t'} + \sum_{t' < t} \eta_{t'}. \tag{9.37}
\]
1. We are interested first in the response function of the returns to sign fluctuations. Show that
\[
R(\ell) = \sum_{0 \le i < \ell} S(i). \tag{9.42}
\]
\[
V(\tau) = \big\langle (m_{t+\tau} - m_t)^2 \big\rangle_t, \tag{9.43}
\]
again computing first for each day and then averaging over all days in January. Compute for 0 ≤ τ < 2000 and plot in log-log scale, then plot the volatility signature plot V(τ)/τ.
5. How can you interpret the previous figures?
\[
U(x) = \frac{a}{3} x^3 + \frac{b}{2} x^2 + c x.
\]
Discuss the influence of the parameters on the number and position of the extrema of U.
3. In field theory, the simplest model in which to study critical phenomena is the “mexican hat” or ϕ⁴ free-energy functional, at a relative “distance” t from the critical temperature T_c,
\[
F[\varphi(x)] = \int_\Omega \left[ \frac{t}{2}\, \varphi^2(x) + \varphi^4(x) \right] dx, \qquad t = \frac{T - T_c}{T_c}, \tag{9.45}
\]
that is quartic and not cubic. Do you have any idea why?
where λ represents the “market depth” (the larger the market, the smaller the effect of imbalance on the return). We can then write a generic model for the dynamics of the demand-supply imbalance:
with ξ_t a Gaussian white noise (zero mean, unit variance), and p_F the fundamental value, assuming it exists (or the market's belief of the fundamental value if it doesn't).
1. Discuss the mechanisms that the different terms in equation (9.46) are attempting to model.
We now take the continuous limit, so as to approximate the return as r_t ≃ u = ∂_t p_t.
2. Show (with a physicist's level of rigor) that equation (9.46) may be rewritten as
\[
\frac{du}{dt} = -U'(u(t)) + \tilde{\xi}_t, \tag{9.47}
\]
with the potential
\[
U(u) = \kappa (p_t - p_F)\, u + \frac{\alpha}{2} u^2 + \frac{\beta}{3} u^3.
\]
What are the expressions of κ, α, β and ξ̃_t as functions of the initial parameters?
3. Based on our previous experience with a cubic potential, how do you think the different pa-
rameters will affect the evolution of the price? Notice that the price alters the potential as it
evolves!
\[
\frac{d^2 p_t}{dt^2} = -\alpha \frac{dp_t}{dt} - \kappa (p_t - p_F) + \tilde{\xi}_t. \tag{9.48}
\]
If we are close to the fundamental value of the price, can you say anything about the fluctuations of the price ‘velocity'? Think of the other SDEs we have seen so far!
3. Suppose now that β > 0, α > 0 and p = p F (t = 0). Locate the system’s equilibria, study their
stability and calculate the height of the potential barrier ∆U between the equilibria. How do
you think this quantity impacts the system’s evolution?
with, at this stage, p_t = p fixed and independent of u_t. Take χ = 1 and try p = ±1; what difference do you observe?
3. Now introduce the feedback by updating p_t as
\[
p_{t+\delta t} = p_t + r_t\, \delta t.
\]
(A simulation sketch is given below.)
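A minimal simulation sketch of Questions 2-3 (Euler-Maruyama scheme; all parameter values are illustrative):

import numpy as np

# du/dt = -U'(u) + noise, with the price feedback p -> p + u*dt switched on.
rng = np.random.default_rng(0)
kappa, alpha, beta, pF = 1.0, 1.0, 0.5, 0.0
dt, T = 1e-3, 200_000
u, p = 0.0, pF
for t in range(T):
    Uprime = kappa * (p - pF) + alpha * u + beta * u**2
    u += -Uprime * dt + np.sqrt(dt) * rng.standard_normal()
    p += u * dt                                  # the return moves the price
    if abs(u) > 1e3:                             # runaway beyond the barrier
        print("crash-like runaway at step", t)
        break
# With beta > 0, a rare noise excursion over the potential barrier lets u
# run away: a model-generated crash in this toy setting.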
where h_i is an idiosyncratic bias proper to each agent. We suppose that the h_i are iid random variables with a density ρ(h) centered at zero and of characteristic width σ.
1. Interpret the different terms in the equation.
2. From now on, we take the mean-field limit J_ij = J_0/N ∀(i, j), J_0 > 0. What does Eq. (9.49) look like now? Rewrite it in terms of the average opinion m(t) = (1/N) Σ_i S_i(t).
3. What are the values of the Si and m variables in the limits F = ±∞? How about when J0 =
F = 0?
4. How do you think the average opinion evolves with F (t) for J0 /σ = 0? Same question but now
for J0 /σ positive but small, and finally for J0 /σ large.
2. Now take the generic case where ρ(h) is a unimodal distribution with a finite characteristic width. We introduce φ, the fraction of agents with opinion +1. Show that in the N → ∞ limit we can rewrite the dynamics as
where P_> is the survival function of h. Using this expression, write the equation for a fixed point of the dynamics in terms of the average opinion.
3. Writing m^∗ for the average opinion at the fixed point, study its equation graphically to determine the number of solutions. (Hint: you may fix F = 0 and consider the case where h is Gaussian.) Can you now see why J_0/σ is the relevant parameter to characterize the system?
4. Show that the number of solutions changes at a critical point reached for

\[
P(s) = \frac{e^{-\lambda(s)}\, \lambda(s)^{s-1}}{(s-1)!}, \tag{9.55}
\]
The model
We consider N agents trading a single asset. Each agent i has a trading intention φi (t) ∈ {−1, 0, 1},
where −1 represents a sell intention, 1 a buy and 0 an agent doing nothing. At each time t, these
intentions are chosen randomly with probability a for ±1 and 1 − 2a for 0. The underlying process by which an agent chooses one of the intentions is not specified.
Furthermore, we assume that agents display herding behavior: (1) they cluster with each other and share a common trading intention within one cluster; (2) clusters are independent. The clustering procedure is as follows. Each agent may form a link with each of the N − 1 others; a link is formed with probability p, independently of the other links. The average number of links around one agent is p(N − 1). For this number to remain finite in the limit N ≫ 1, we choose p = c/N. Such a procedure for constructing the graph was first brought up by Erdős and Rényi. Interestingly enough, these graphs display a phase transition for the size of their largest connected component S: if c > 1 then f := S/N > 0, otherwise f → 0. One can show that the size W of one randomly chosen connected component of the graph is such that
\[
P(W = w) \underset{w \to \infty}{\sim} \frac{A}{w^{5/2}} \exp\left[ -(c-1)\, \frac{w}{w_0} \right], \tag{9.57}
\]
which becomes a pure power-law for c = 1.
Finally, as a first approximation, we assume that the price variation associated with the trading
intentions {φα } depends linearly on the intentions. As a result, we model the price change of the asset
as follows
\[
\Delta p(t) = p(t+1) - p(t) = \frac{1}{\lambda} \sum_{\alpha=1}^{n_c} W_\alpha\, \varphi_\alpha(t) := \frac{1}{\lambda} \sum_{\alpha=1}^{n_c} X_\alpha(t), \tag{9.58}
\]
with nc the (random) number of clusters, Wα the size of cluster α and φα the associated trading inten-
tion. The parameter λ is called market depth and measures how much trading intention is needed if
one wants to move the price by one unit.
1. Argue as to why the X_α's are independent identically distributed random variables. Generically, how is the distribution of the sum of N i.i.d. random variables related to the distribution of a single draw?
2. Denoting by F(x) = P(X_α ≤ x | φ_α ≠ 0), show that
\[
F(z) = \sum_{k=1}^{N} P(n_c = k)\, \big[ 1 - 2a + 2a \tilde{f}(z) \big]^k = \exp\Big( \psi\big( \log\big[ 1 - 2a + 2a \tilde{f}(z) \big] \big) \Big),
\]
with ψ the cumulant generating function of n_c. We denote by γ(z) the quantity 1 − 2a + 2a f̃(z). (Hint: the cumulant generating function of a random variable X is given by log E[e^{zX}].)
6. Using graph theory, one can show that
\[
\psi(z) = N z + \frac{N c}{2} \left( e^{-z} - 1 \right).
\]
Deduce
\[
F(z) = \gamma^N \exp\left[ \frac{N c}{2} \left( \frac{1}{\gamma} - 1 \right) \right].
\]
7. What is the average number n_o of trading agents in the market? How can it remain finite in the limit N → ∞?
8. With this prescription, show that in the limit N → ∞ one gets
\[
F(z) \approx \exp\Big[ n_o \Big( 1 - \frac{c}{2} \Big) \big( \tilde{f}(z) - 1 \big) \Big].
\]
9. We give the second moment µ₂(α) = A(c)(1 − c)^{−1} and the fourth moment µ₄(α) = A(c)(1 + 2c)(1 − c)^{−5} of W_α. Deduce the excess kurtosis of the distribution of the price variation λ∆p:
\[
\kappa(\lambda \Delta p) = \frac{1 + 2c}{n_o \left( 1 - \frac{c}{2} \right) A(c)\, (1 - c)^3}.
\]
10. Comments?
Part 2: Simulations
Using the networkx package (imported as nx), one can easily generate random Erdős-Rényi graphs using

nx.fast_gnp_random_graph(N,p)

1. For N = 10^4, n_o = 2000, λ = 10, generate a series of returns of length T = 10^4 for c = 1. (A sketch is given below.)
2. Plot the histogram of positive and negative log-returns. Compare it to a Gaussian distribution. What is the exponent of the tail?
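A possible end-to-end implementation (a sketch; the graph and its cluster sizes are computed once, and intentions are redrawn independently at each time step):

import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
N, n_o, lam, c, T = 10_000, 2000, 10.0, 1.0, 10_000
a = n_o / (2 * N)                    # so that 2aN = n_o agents trade on average
g = nx.fast_gnp_random_graph(N, c / N, seed=0)
sizes = np.array([len(comp) for comp in nx.connected_components(g)])
returns = np.empty(T)
for t in range(T):
    phi = rng.choice([-1, 0, 1], size=len(sizes), p=[a, 1 - 2 * a, a])
    returns[t] = (sizes * phi).sum() / lam
print("excess kurtosis:", (returns**4).mean() / (returns**2).mean()**2 - 3)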
Introduction
In a series of experiments carried out on harvester ants, the entomologists Deneubourg and Pasteels stumbled upon a curious phenomenon. Provided with two a priori identical food sources, ants tend to prefer one over the other, but sometimes switch without any apparent reason. Ants display an asymmetrical behavior in a symmetrical situation. In the realm of social sciences, we can relate this behavior to the phenomenon known as “herding”. Herd instinct in finance is the phenomenon where investors follow what they perceive other investors to be doing, rather than their own analysis. Herd instinct has a history of starting large, unfounded market rallies and sell-offs that are often based on a lack of fundamental support to justify either. In his article Ants, rationality and recruitment, the economist Alan Kirman proposes a simple model to explain the ants' behavior and, by extension, “herding”.
The model
Consider N ants divided between two identical and always-full food sources, denoted by F₁ and F₂. We denote by k(t) the number of ants in zone F₁ at time t. Between time t and time t + 1, one ant is randomly drawn and can switch food source, subject to two influences:
• a spontaneous switch to the other food source with probability ε ∈ ]0, 1]. We exclude 0 to avoid all the ants getting stuck at one food source;
• the first ant follows another randomly chosen ant with probability µ/N ∈ [0, 1].
The transition probabilities of the process then write:
\[
W(k \to k+1) = \Big( 1 - \frac{k}{N} \Big) \left( \varepsilon + \frac{\mu}{N}\, \frac{k}{N-1} \right), \tag{9.59}
\]
\[
W(k \to k-1) = \frac{k}{N} \left( \varepsilon + \frac{\mu}{N}\, \frac{N-k}{N-1} \right). \tag{9.60}
\]
3. (a) Show that the probability to have k ants in F₁ at time t obeys the so-called Master Equation.
(b) Denoting by P(t) the vector P(t)_k = P(k, t), show that there exists a stochastic matrix T such that
P(t + 1) = T P(t).
(c) Deduce the existence of a stationary probability measure P_s for the distribution of ants between the two food sources.
4. If you had to guess, how do you think the stationary distribution looks when spontaneous switches dominate, or when following dominates?
5. For the stationary probability measure P_s, establish the global balance condition
\[
\sum_{\ell \ne k} W(\ell \to k)\, P_s(\ell) = \sum_{\ell \ne k} W(k \to \ell)\, P_s(k), \tag{9.62}
\]
and provide an interpretation. What further condition on the stationary distribution directly satisfies this global balance condition?
\[
f_s(x) = \frac{\Gamma(2\alpha)}{\Gamma^2(\alpha)} \big[ x(1-x) \big]^{\alpha - 1}, \tag{9.63}
\]
with α = ε/µ and Γ(x) = ∫₀^∞ dt t^{x−1} e^{−t}. We give the identity B(α, β) := ∫₀¹ dx x^{α−1}(1 − x)^{β−1} = Γ(α)Γ(β)/Γ(α + β).
3. Sketch this density and interpret it for different values of α. (A simulation sketch of the chain follows.)
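A direct simulation of the chain with the rates (9.59)-(9.60) (a sketch; N, ε, µ and the run length are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, eps, mu, T = 50, 2e-4, 1.0, 2_000_000
k, ks = N // 2, np.empty(T, dtype=int)
for t in range(T):
    w_up = (1 - k / N) * (eps + (mu / N) * k / (N - 1))
    w_dn = (k / N) * (eps + (mu / N) * (N - k) / (N - 1))
    u = rng.random()
    if u < w_up:
        k += 1
    elif u < w_up + w_dn:
        k -= 1
    ks[t] = k
# With these illustrative values, k/N spends most of its time near 0 or 1
# and occasionally switches: the herding regime. Histogram ks/N and compare
# with the stationary density of Question 2.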
The model
In this model, we assume that orders may be placed at any value of the price, thereby taking a continuous limit. We denote by ρ_A(x, t) and ρ_B(x, t) the latent volume densities on the ask and bid sides. These quantities evolve according to a set of rules:
• When a buy intention meets a sell intention, they are instantaneously matched and thus removed from the LOB. We implicitly assume that latent orders are revealed in the vicinity of the trade price p_t.
• The trade price p_t is conventionally defined through the equation ρ_B(p_t, t) = ρ_A(p_t, t).
We finally define the reduced latent order book density by φ(x, t) = ρ_B(x, t) − ρ_A(x, t).
where both β_i and η_{i,t} are random variables with densities P_β and P_η.
1. Interpret the different terms of the update. What can you say about the distributions of β and η?
2. Show that the density of latent orders ρ(x, t) at price x at time t follows the evolution
\[
\rho(x, t + \delta t) = \int d\beta\, d\eta\, dx'\; P_\beta(\beta)\, P_\eta(\eta)\, \rho(x', t)\, \delta(x - x' - \beta f_t - \eta),
\]
and that, in the appropriate limit, it obeys
\[
\partial_t \rho = -V_t\, \partial_x \rho + D\, \partial_x^2 \rho.
\]
5. Show that, with the appropriate change of variable x, we get the diffusion equation with diffusion constant D:
\[
\partial_t \rho = D\, \partial_x^2 \rho. \tag{9.67}
\]
In the following, we will consider the case of uninformed impact by setting V_t = 0.
6. Show that
\[
G(x, t) = \frac{1}{\sqrt{4\pi D t}} \exp\left( -\frac{x^2}{4Dt} \right)
\]
is a solution of the above equation. Hint: ∀a > 0, ∀b ∈ ℂ, ∫_{−∞}^{+∞} e^{−ax²/2 + bx} dx = √(2π/a) e^{b²/(2a)}.
2. (a) Justify why the deposition rate can be modelled by an additive action. If the execution price is p_t, are buy (resp. sell) orders, on average, placed above or below p_t?
(b) Deduce that (9.68) is modified as follows:
\[
\partial_t \rho_A = D\, \partial_x^2 \rho_A - \nu \rho_A + \lambda\, \Theta(x - p_t), \qquad
\partial_t \rho_B = D\, \partial_x^2 \rho_B - \nu \rho_B + \lambda\, \Theta(p_t - x). \tag{9.69}
\]
\[
\varphi_{st}(\xi) = -\frac{\lambda}{\nu}\, \mathrm{sign}(\xi) \left( 1 - e^{-|\xi|/\xi_c} \right),
\]
with ξ = x − p_∞ and ξ_c := √(D/ν).
2. Draw the shape of the stationary order book and exhibit a length scale below which the order book can be considered linear. Take the infinite-memory limit: what is the shape of φ_st(ξ)?
Since meta-orders of size Q are too big to be executed at once, they are usually broken down and placed within a time span T, with a rate m_t such that
\[
Q = \int_0^T ds\, m_s.
\]
We can model this as an extra source term m_t δ(x − p_t) 1_{[0,T]}(t) in the reaction-diffusion equation. Using results from partial differential equations, one can show that the solution of this equation in the infinite-memory limit reads
with t ∧ T = min(t, T).
3. We assume that the execution rate is constant (m_t := m₀), and we place ourselves at the end of the execution (t = T). Show that
\[
p_T = \frac{m_0}{L} \int_0^T \frac{ds}{\sqrt{4\pi D (T - s)}}\; \exp\left( -\frac{(p_T - p_s)^2}{4 D (T - s)} \right).
\]
4. It is straightforward to check that p_s = A√(Ds) is a solution, provided A solves a particular algebraic equation.
(a) Show that
\[
A = \frac{m_0}{L D} \int_0^1 \frac{du}{\sqrt{4\pi (1 - u)}}\; \exp\left( -\frac{A^2 (1 - \sqrt{u})}{4 (1 + \sqrt{u})} \right).
\]
(b) In the limit m₀ ≪ L D, derive A ≈ m₀/(L D √π).
(c) In the limit m₀ ≫ L D, derive A ∝ √(m₀/(L D)).
5. Recalling that p_T is the impact of the meta-order when assuming p₀ = 0, show that I(Q) ∝ √Q.
1. Show that if
\[
Y_i = -\frac{1}{\lambda} \log(U_i), \tag{9.71}
\]
with U_i ∼ Unif[0, 1], then the Y_i are exponentially distributed with rate λ. Hint: use the cumulative distribution function.
2. Consider a particle on a lattice that jumps to the right with rate D_r and to the left with rate D_ℓ. How do you select the next particle action, and how do you find the time of the jump?
3. Consider N particles on the same site. For a single particle, the rate to jump to the right is D_r and to jump to the left is D_ℓ; each particle disappears with rate ν. In addition, a new particle can be deposited on the site with rate λ. What is the total event rate on the site? How do you choose the upcoming event?
4. Consider a discrete LOB with a large number of sites S. Each site k gathers n_k orders placed at a given price (on the US market, the intersite spacing is $0.01). How do you determine the next event in the whole LOB?
5. Consider now two types of orders (or particles): buy limit orders (piling up to the left of the mid-price), and sell limit orders (piling up to the right of the mid-price). List all events that can change the mid-price.
6. Using the previous questions, suggest an algorithm that simulates the LOB dynamics. (One possible skeleton is sketched below.)
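One possible skeleton (a sketch of the Gillespie scheme on a line of sites, with crude reflecting boundaries; separating buy/sell sides and tracking the mid-price are left as the exercise asks):

import numpy as np

rng = np.random.default_rng(0)
S = 200                           # number of price sites
n = np.zeros(S, dtype=int)        # orders per site
Dr, Dl, nu, lam = 0.5, 0.5, 0.1, 1.0

def gillespie_step(n, t):
    site_rates = n * (Dr + Dl + nu) + lam        # total event rate per site
    R = site_rates.sum()
    t += rng.exponential(1.0 / R)                # waiting time of the next event
    k = rng.choice(S, p=site_rates / R)          # pick the site...
    r = rng.random() * site_rates[k]             # ...then the event on that site
    if r < lam:
        n[k] += 1                                # deposition
    elif r < lam + n[k] * nu:
        n[k] -= 1                                # cancellation
    elif r < lam + n[k] * (nu + Dr):
        n[k] -= 1; n[min(k + 1, S - 1)] += 1     # jump right
    else:
        n[k] -= 1; n[max(k - 1, 0)] += 1         # jump left
    return t

t = 0.0
for _ in range(10_000):
    t = gillespie_step(n, t)
print(t, n.sum())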
\[
w^* = \frac{1}{2}\, C^{-1} (\lambda \mu + \gamma \mathbf{1}), \tag{9.72}
\]
where λ and γ are Lagrange multipliers, whose values are fixed by solving the system of constraints. For the single-index risk model considered here, the covariance matrix reads
\[
C = \mathbb{I}_N + \beta \beta^\top. \tag{9.74}
\]
1. Now that riskier assets can no longer be balanced with both long and short positions, do you think the optimal portfolio should still have all nonzero weights?
2. We introduce a vector of “spins” θ_i ∈ {0, 1}, i = 1, . . . , N, representing the exclusion or inclusion of an asset in the portfolio. A heuristic algorithm to construct a long-only optimal portfolio is as follows:
1. Start from θ = 1,
2. Compute the Markowitz optimal portfolio w^*,
3. Whenever w^*_i < 0, set θ_i = 0,
4. Go back to step 2, computing w^* → w̃^* with a reduced matrix C̃ including only entries with θ_i = 1,
and iterate until w̃^*_i > 0 ∀i with θ_i = 1. Code this procedure in a function gen_Markowitz_LO(C). (One possible implementation is sketched below.)
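One possible implementation (a sketch; for concreteness the Markowitz step is taken to be the minimum-variance solution w* ∝ C^{-1}1, i.e. the γ-term of Eq. (9.72) alone; including the expected-gain term only changes the function markowitz):

import numpy as np

def markowitz(C):
    w = np.linalg.solve(C, np.ones(len(C)))      # w* ~ C^{-1} 1
    return w / w.sum()

def gen_Markowitz_LO(C):
    N = len(C)
    theta = np.ones(N, dtype=bool)               # start with all assets included
    while True:
        idx = np.where(theta)[0]
        w_red = markowitz(C[np.ix_(idx, idx)])   # reduced-matrix portfolio
        neg = w_red < 0
        if not neg.any():
            w = np.zeros(N)
            w[idx] = w_red                       # excluded assets get weight 0
            return w, theta
        theta[idx[neg]] = False                  # exclude negative-weight assets

# Example on the single-index model C = I + beta beta^T:
rng = np.random.default_rng(0)
N, sigma = 500, 0.05
beta = 1.0 + sigma * rng.standard_normal(N)
C = np.eye(N) + np.outer(beta, beta)
w, theta = gen_Markowitz_LO(C)
print("magnetization m =", theta.mean())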
3. We introduce the average “magnetization”
\[
m = \frac{1}{N} \sum_i \theta_i. \tag{9.76}
\]
Provide an interpretation for this quantity in the portfolio context. What should we take the average over?
4. Going back to the single-index risk model, compute the average magnetization of the long-only Markowitz portfolios for fixed values of σ (for example σ = {0.01, 0.05, 0.1}) and N ranging from 100 to 10^4. Plot m as a function of N. How about as a function of σN? Provide an interpretation for the effect of σ.
5. What do you think is the consequence for an investor of having a very sparse portfolio? Keep in
mind that in reality the vector β will vary in time!
\[
\beta^+ = \frac{\sum_j \beta_j^2\, \theta_j + 1}{\sum_j \beta_j\, \theta_j}. \tag{9.78}
\]
How can you relate the magnetization m to the mean threshold when N → ∞?
3. Using a (loose) CLT argument, show that the mean threshold should satisfy
\[
\beta^+ = \frac{\langle \beta^2 \rangle_c}{\langle \beta \rangle_c} + \frac{1}{N \langle \beta \rangle_c}, \tag{9.79}
\]
with the notation 〈g(β)〉_c = ∫_{−∞}^{β⁺} dβ g(β)ρ(β). Can you guess why the scaling in σN found earlier requires us to keep the O(1/N) term? What next steps would you take to try to go further?