
Introduction to stochastic calculus

Rough lecture notes

Master of mathematics
Université Paris-Dauphine – PSL

2020 – 2021
Corrected version
Compiled August 27, 2024

Course home page on https://djalil.chafai.net/


Suggested schedule of the lectures.

We recommend that the in-class or oral lectures differ from the lecture notes: ideally they should contain
fewer details and should focus on the essential aspects, the structure, the culture, and the intuition.

• Lecture 1 (2 x 1.5h)
Chapter 1 (Preliminaries)

• Lecture 2 (2 x 1.5h)
Chapter 2 (Processes, filtrations, stopping times, martingales)

• Lecture 3 (2 x 1.5h)
Chapter 3 (Brownian motion)

• Lecture 4 (2 x 1.5h)
Chapter 3 (Brownian motion)

• Lecture 5 (2 x 1.5h)
Chapter 4 (More on martingales)

• Lecture 6 (2 x 1.5h)
Chapter 5 (Itô stochastic integral with respect to BM)

• Lecture 7 (2 x 1.5h)
Chapter 6 (Itô stochastic integral and semi-martingales)

• Lecture 8 (2 x 1.5h)
Chapter 7 (Itô formula and applications)

• Lecture 9 (2 x 1.5h)
Chapter 7 (Itô formula and applications)

• Lecture 10 (2 x 1.5h)
Chapter 8 (Stochastic differential equations)

• Lecture 11 (2 x 1.5h)
Chapter 8 (Stochastic differential equations)

• Lecture 12 (2 x 1.5h)
Chapter 9 (More links with partial differential equations)

• Exam

There are also separate exercise sessions (séances de travaux dirigés).

These are the lecture notes of an introductory course on stochastic calculus, given at Université Paris-Dauphine – PSL, for
second year master students in mathematics¹. The prerequisite is a probability theory course based on the Lebesgue integral,
including conditional expectation, Gaussian random vectors, and the standard notions of convergence. The initial version of these
lecture notes was based on a course given by Halim Doss, inspired by the book by Nobuyuki Ikeda and Shinzo Watanabe [20]. The
current version is also inspired in part by the books by Fabrice Baudoin [4] and Jean-François Le Gall [31], and by plenty of other
sources. Some bits are truly original. Beware that these lecture notes are designed to constitute a rich written reference for the
live course. The live course covers only a strict subpart, focused on intuition and selected as essential for understanding the
concepts and techniques. At the time of writing, here are the main differences from the written lecture notes by Halim Doss before 2018:
• More on probability basics, uniform integrability, Lebesgue – Stieltjes integral
• More on martingales and local martingales
• More on examples and applications everywhere
• More on history, intuition, link with physics, programming
• Properties of Brownian motion, Dubins – Schwarz theorem, Feynman – Kac formula, Langevin processes
• More on semi-martingales, stochastic integral, and Itô formula
These lecture notes do not cover several important topics related to stochastic calculus, such as: fine analysis of Brownian motion
(regularity, excursions, zeros, recurrence and transience, etc.), random time change, Euler – Maruyama schemes for the numerical
analysis of stochastic differential equations, applications of stochastic calculus to finance, physics, biology, statistics, stochastic
control, and Monte Carlo methods, Malliavin calculus, Stroock – Varadhan support theorems, local times and the Tanaka formula,
the Schilder large deviation principle, additive functionals (law of large numbers, ergodic theorems, central limit theorems, large
deviation principles, link with entropy and the Poisson equation), Doob H-transforms, the Freidlin – Wentzell large deviation principle
for perturbations of dynamical systems, Feller branching diffusions, branching Brownian motion, the Fisher – Wright diffusion, diffusions
with jumps, space/time white noise, the Bakry – Émery non-explosion criterion and its link with Poincaré, logarithmic Sobolev, and
isoperimetric functional inequalities, diffusions on manifolds, the Eyring – Kramers formula, etc. On the other hand, some topics are
considered in the exams, such as Cox – Ingersoll – Ross and Bessel processes, the Lévy area of planar Brownian motion, etc.
There are many other references on the subject. Accessible introductions are the books by Lawrence Craig Evans [16] and
by Bernt Øksendal [49]. The books by Richard Durrett [13], Philip Protter [42], and Hui-Hsiung Kuo [28] are also accessible. More
advanced references include the books by Michel Métivier [36], Chris Rogers and David Williams [44, 45], Daniel Stroock and
Srinivasa Varadhan [47], Ioannis Karatzas and Steven Shreve [24], Daniel Revuz and Marc Yor [43], Jean Jacod [21], Iosif Gikhman
and Anatoli Skorokhod [18], and Claude Dellacherie and Paul-André Meyer [9, 10]. Finally, accessible references with exercises
include the books by Francis Comets and Thierry Meyre [8] (in French) and by Paolo Baldi [3].

Contributors.

• 2018 – 2022 : Djalil Chafaï

• – 2018 : Halim Doss

Glitches hunters.

• 2023 – 2024 : Aniss Fares

• 2021 – 2022 : Pauline Amrouche, Faniriana Rakoto Endor, Justin Salez

• 2020 – 2021 : Oskar Bataillon, Yi Han, Qiaoyu Luo, Gabriel Moreira-Nogueira, Diego Alejandro Murillo
Taborda, Lyes Tifoun, Walid El Wahabi

• 2019 – 2020 : Oscar Cosserat, Łukasz Mądry, Alejandro Rosales Ortiz, Ziyu Zhou

• 2018 – 2019 : Clément Berenfeld

1 MASEF (Mathématiques pour l’économie et la finance) and MATH (Mathématiques appliquées et théoriques).

Notation.

R+ [0, +∞)
BM Brownian motion
O, o Landau notation
iff if and only if
a.s. almost surely
u.i. uniformly integrable
w.r.t. with respect to
1A indicator of A
x · y or 〈x, y〉 x₁y₁ + · · · + x_d y_d if x, y ∈ Rᵈ
|x| √(x₁² + · · · + x_d²) if x ∈ Rᵈ
BE Borel σ-algebra of E
e exponential
d differential element
i the complex number (0, 1)
d , i , j , k, m, n, ℓ integer numbers
p, q, r, s, t , u, v, α, β, ε real numbers
s ∧ t and s ∨ t min(s, t ) and max(s, t )
f is increasing f (y) ≥ f (x) if y ≥ x
Lᵖ_Rᵈ(Ω, P) X : Ω → Rᵈ measurable with E(∥X∥ᵖ) < ∞
〈x, y〉H scalar product in the Hilbert space H
〈M , N 〉 angle bracket of local martingales M , N
〈M 〉 〈M , M 〉
[M , N ] square bracket of local martingales M , N
[M ] [M , M ]
X ∼µ X has law µ

Some of the scientists related to Brownian motion and stochastic calculus

Lifetime Scientist


(1975 – ) Martin Hairer
(1968 – ) Wendelin Werner
(1959 – ) Jean-François Le Gall
(1955 – ) Alain-Sol Sznitman
(1954 – ) Dominique Bakry
(1953 – ) Terry Lyons
(1951 – ) David Nualart
(1949 – 2014) Marc Yor
(1947 – ) Shige Peng
(1947 – ) Étienne Pardoux
(1944 – ) Nicole El Karoui
(1944 – ) Jean Jacod
(1942 – 2004) Catherine Doléans-Dade
(1940 – ) S. R. Srinivasa Varadhan
(1940 – ) Daniel W. Stroock
(1938 – ) Mark Iosifovich Freidlin
(1938 – 1995) Fischer Black
(1935 – ) Shinzo Watanabe
(1934 – ) Albert Shiryaev
(1934 – 2003) Paul-André Meyer
(1930 – ) Henry McKean
(1930 – 2011) Anatoliy Skorokhod
(1930 – 1997) Ruslan Stratonovich
(1927 – 2013) Donald Burkholder
(1925 – 2010) Paul Malliavin
(1924 – 2014) Eugene Dynkin
(1923 – 2020) Freeman Dyson
(1916 – 2008) Gilbert Hunt
(1915 – 2008) Kiyosi Itô
(1915 – 1940) Wolfgang Doeblin
(1914 – 1984) Mark Kac
(1911 – 2004) Shizuo Kakutani
(1910 – 2004) Joseph Leo Doob
(1908 – 1989) Robert Horton Cameron
(1906 – 1970) William Feller
(1903 – 1987) Andrey Kolmogorov
(1900 – 1988) George Uhlenbeck
(1896 – 1971) Paul Lévy
(1894 – 1964) Norbert Wiener
(1879 – 1955) Albert Einstein
(1875 – 1941) Henri Lebesgue
(1872 – 1946) Paul Langevin
(1872 – 1917) Marian Smoluchowski
(1871 – 1956) Émile Borel
(1870 – 1942) Jean Baptiste Perrin
(1870 – 1946) Louis Bachelier
(1856 – 1922) Andrey Markov
(1856 – 1894) Thomas Joannes Stieltjes
(1773 – 1858) Robert Brown

Nor can one fix a tangent, even approximately, at any point of the trajectory, and this is a case
where it is truly natural to think of those continuous functions without derivatives that mathe-
maticians have imagined, and which one would wrongly regard as mere mathematical curiosities,
since nature suggests them just as much as differentiable functions.

Jean Perrin (1870 – 1942), Les Atomes (1913), Chapter 4, part 68, [39].

Uhlenbeck’s attitude to Wiener’s work was brutally pragmatic and it is summarized at the end
of footnote 9 in his paper (written jointly with Ming Chen Wang) “On the Theory of Brownian
Motion II” (1945): the authors are aware of the fact that in the mathematical literature, especially
in papers by N. Wiener, J. L. Doob, and others [cf. for instance Doob (Annals of Mathematics 43,
351 1942) also for further references], the notion of a random (or stochastic) process has been de-
fined in a much more refined way. This allows [us], for instance, to determine in certain cases the
probability that the random function y(t) is of bounded variation or continuous or differentiable,
etc. However it seems to us that these investigations have not helped in the solution of problems
of direct physical interest and we will therefore not try to give an account of them.

Mark Kac (1914 – 1984) about George Uhlenbeck (1900 – 1988),
in Enigmas of Chance: an autobiography (1984).
This was before the completion of the theory of stochastic processes and stochastic calculus,
its numerical applications, and the rise of modern mathematical finance, which is based on it.
About Brownian motion across physics and mathematics, the reader may take a look at [23, 48, 12, 38, 7].

“... Thus the Itô integral and Itô processes, distant descendants of Bachelier's theory of speculation, return
to financial speculation. They deserve in every respect to be part of the general culture of mathematicians.”

Jean-Pierre Kahane, Le mouvement brownien. Un essai sur les origines de la théorie mathématique,
Société Mathématique de France, 1998.

Contents

0 Motivation  1

1 Preliminaries  3
  1.0 Sigma-algebras, random variables, and probabilities  3
  1.1 Expectation and law  3
  1.2 Independence  4
  1.3 Markov, Cauchy – Schwarz, Hölder, Jensen, convergence, Borel – Cantelli, LLN, LIL, CLT, ...  4
  1.4 Uniform integrability  6
  1.5 Conditioning  7
  1.6 Gaussian random vectors  10
  1.7 Bounded variation and Lebesgue – Stieltjes integral  10
  1.8 Monotone class theorem and Carathéodory extension theorem  13

2 Processes, filtrations, stopping times, martingales  15
  2.1 Measurability  15
  2.2 Completeness  18
  2.3 Stopping times  18
  2.4 Martingales, sub-martingales, super-martingales  22
  2.5 Doob stopping theorem and maximal inequalities  23

3 Brownian motion  29
  3.1 Characterizations and martingales  32
  3.2 Variation of trajectories and quadratic variation  34
  3.3 Blumenthal zero-one law and its consequences on the trajectories  35
  3.4 Strong law of large numbers, invariance by time inversion, law of iterated logarithm  38
  3.5 Strong Markov property, reflection principle, hitting time  40
  3.6 A construction of Brownian motion  42
  3.7 Wiener integral  45
  3.8 Wiener measure, canonical Brownian motion, Cameron – Martin formula  47

4 More on martingales  53
  4.1 Quadratic variation, square integrable martingales, increasing process  53
  4.2 Local martingales and localization by stopping times  58
  4.3 Convergence in L² and the Hilbert space M₀²  62
  4.4 Convergence in L¹, closedness, uniform integrability  64

5 Itô stochastic integral with respect to Brownian motion  69
  5.1 Itô versus Stratonovich stochastic integrals in a nutshell  69
  5.2 Itô stochastic integral with respect to Brownian motion  70
  5.3 Brownian semi-martingales and Itô formula  74

6 Itô stochastic integral and semi-martingales  77
  6.1 Stochastic integral with respect to continuous martingales bounded in L²  77
  6.2 Stochastic integral with respect to continuous local martingales  81
  6.3 Notion of semi-martingale and stochastic integration  83
  6.4 Summary of stochastic integrals and involved spaces  86

7 Itô formula and applications  87
  7.1 Itô formula  87
  7.2 Lévy characterization of Brownian motion  92
  7.3 Doléans-Dade exponential  93
  7.4 Dubins – Schwarz theorem  95
  7.5 Girsanov theorem for Itô integrals  97
  7.6 Sub-Gaussian tail bound and exponential square integrability for local martingales  99
  7.7 Burkholder – Davis – Gundy inequalities  100
  7.8 Representation of Brownian functionals and martingales as stochastic integrals  103

8 Stochastic differential equations  107
  8.1 Stochastic differential equations with general coefficients  107
  8.2 Ornstein – Uhlenbeck, Bessel, and Langevin processes  113
  8.3 Markov property, Markov semi-group, weak uniqueness  119
  8.4 Martingale, generator, Kolmogorov equations, strong Markov property, Girsanov theorem  121
  8.5 Locally Lipschitz coefficients and explosion time  130

9 More links with partial differential equations  133
  9.1 Feynman – Kac formula  133
  9.2 Kakutani probabilistic formulation of Dirichlet problems  136

Bibliography  141

Chapter 0

Motivation

In this introductory course on stochastic calculus, the goal is to define integrals of the form
I_t = ∫₀ᵗ X_s dY_s,   t ≥ 0,

where (X t )t ≥0 and (Y t )t ≥0 are stochastic processes. The result is a stochastic process (I t )t ≥0 .


If we sub-divide the interval [0, t] into [t₀, t₁] ∪ · · · ∪ [t_{n−1}, t_n] with 0 = t₀ and t_n = t, then the hypothetical
quantity I_t is naturally approximated by the (random) Riemann sum

∑_{i=0}^{n−1} X̃_i (Y_{t_{i+1}} − Y_{t_i}),   where X̃_i ∈ [X_{t_i}, X_{t_{i+1}}], i = 0, . . . , n − 1.

Following Riemann, Stieltjes, and L. C. Young among others, the convergence of this quantity when n → ∞
with max_i (t_{i+1} − t_i) → 0 is guaranteed when the integrator and the integrand are regular enough. Unfortu-
nately, this does not work, for instance, when both X and Y are Brownian motions, due to the fact that the
sample paths of this stochastic process are of infinite variation on all finite intervals. The solution found by
Itô is to take advantage of the stochastic nature of Brownian motion and to consider the limit of the Riemann
sums in L² or in probability. Taking¹ X̃_i = X_{t_i} and following this idea leads to what is known as the Itô inte-
gral, for which (I_t)_{t≥0} is typically a martingale. This approach remains valid far beyond Brownian motion,
for a remarkable class of integrators called semi-martingales, which are sums of a local martingale and a
finite variation process. There is also a fundamental formula of calculus² called the Itô formula or lemma.
The set of semi-martingales is stable by stochastic integration and by composition with smooth functions.
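To make the role of the evaluation point X̃_i concrete, here is a small numerical sketch (not part of the original notes; Python with NumPy is assumed) comparing the left-point and midpoint Riemann sums for ∫₀ᵗ B_s dB_s along one simulated Brownian path:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 100_000, 1.0
dt = t / n
# Brownian path on a regular grid: B_0 = 0, independent N(0, dt) increments
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Left-point choice (Itô) versus midpoint choice (Stratonovich)
ito = np.sum(B[:-1] * dB)
strato = np.sum(0.5 * (B[:-1] + B[1:]) * dB)

# Itô limit:          int_0^t B dB = (B_t^2 - t)/2  (a martingale, mean zero)
# Stratonovich limit: int_0^t B o dB = B_t^2/2      (the ordinary chain rule)
print(abs(ito - (B[-1] ** 2 - t) / 2))  # small
print(abs(strato - B[-1] ** 2 / 2))     # zero up to rounding
```

The gap between the two limits, t/2 here, is half the quadratic variation of B on [0, t], precisely the quantity that vanishes for finite variation integrators.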
The Itô integral allows us to define and compute in particular I t when both X and Y are Brownian
motion, and more generally to solve stochastic differential equations of the form
X_t = X₀ + ∫₀ᵗ σ(X_s) dB_s + ∫₀ᵗ b(X_s) ds,   t ≥ 0,

where for instance B = (B t )t ≥0 is a Brownian motion, and where σ and b are regular enough coefficients,
typically locally Lipschitz, as in the classical Cauchy – Lipschitz theorem for ordinary differential equations.
In the right hand side, the first integral is an Itô stochastic integral which is a limit in probability of Riemann
sums while the second integral is a Lebesgue – Stieltjes integral which is an almost sure limit of Riemann
sums. Both integrals are special cases of the integral with respect to a semi-martingale. We study various
properties of stochastic differential equations, including the strong Markov property, the relation to martingales
and partial differential equations (Duhamel formula), and the relative absolute continuity of the
distribution of the solutions for different choices of coefficients (Girsanov formulas).
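Although these notes leave numerical schemes aside, the shape of such an equation is easy to grasp by discretizing it: replace dt by a small step and dB_s by a Gaussian increment. The sketch below (Python with NumPy assumed; the function name and parameters are ours, not from the notes) uses the Ornstein – Uhlenbeck choice b(x) = −x, σ(x) = 1, for which E(X_t) = X₀ e^{−t}:

```python
import numpy as np

def euler_scheme(x0, b, sigma, t, n, rng):
    """One path of X_t = X_0 + int sigma(X_s) dB_s + int b(X_s) ds,
    via the naive discretization X_{k+1} = X_k + b(X_k) dt + sigma(X_k) dB."""
    dt = t / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))  # Gaussian increment of B
        x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dB
    return x

rng = np.random.default_rng(1)
# Average 1000 simulated endpoints X_1, to be compared with exp(-1)
ends = np.array([euler_scheme(1.0, lambda x: -x, lambda x: 1.0, 1.0, 200, rng)[-1]
                 for _ in range(1000)])
print(ends.mean())  # close to exp(-1), the exact mean of X_1
```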
Finally we give a probabilistic interpretation of real Schrödinger operators, known as the Feynman – Kac
formula, and a probabilistic representation of the solution of Dirichlet type problems, due to Kakutani.
Stochastic processes are essential in the modelling of phenomena in physics, computer science, biology,
chemistry, finance, etc. Stochastic calculus is essential in the computation and estimation of distributions
of stochastic objects of interest such as stopping times and solutions of stochastic differential equations.
Beyond utilitarianism, stochastic calculus also provides deep and aesthetic mathematics, essential for your
happiness. A great thank you to Kolmogorov, Doob, Lévy, Itô, and all the others for this wonderful universe.
¹ Taking X̃_i = ½(X_{t_i} + X_{t_{i+1}}) leads to the Strato(novich) integral, with advantages and drawbacks.
² A smooth function is the integral of its derivative.


(Ω, F, (F_t)_{t≥0}, P)

Unless otherwise stated, the random variables
and stochastic processes considered in this course
are defined on this enormous filtered probability space
and moreover the filtration is complete and right-continuous.

Chapter 1

Preliminaries

We refer to [14] and to [19] for the essential basic notions of probability theory (and more).

1.0 Sigma-algebras, random variables, and probabilities

A σ-field or σ-algebra – tribu in French – on a set Ω is a collection of subsets A ⊂ P (Ω) such that

• Ω∈A

• for all A ∈ A , we have A c ∈ A

• for all at most countable family (A n )n in A , we have ∩n A n ∈ A

where Aᶜ = Ω \ A. By combining these properties we also get ∅ ∈ A and ∪_n A_n ∈ A. We say that the pair
(Ω, A) is a measurable space. Extreme examples of σ-algebras are P(Ω) and {∅, Ω}.

• The intersection of an arbitrary family of σ-algebras is a σ-algebra.

• The σ-algebra generated by a subset of P(Ω) is the intersection of all the σ-algebras containing it.

• If Ω is equipped with a topology T, the σ-algebra generated by T is called the Borel σ-algebra B¹.

A map f : Ω → E, where (Ω, F) and (E, E) are measurable spaces, is measurable when f⁻¹(B) ∈ F for all B ∈ E.
A (positive) measure on a measurable space (Ω, A ) is a map µ : A → [0, +∞] such that

• µ(∅) = 0

• for all at most countable family (A_n)_n of pairwise disjoint elements of A, we have µ(∪_n A_n) = ∑_n µ(A_n).

The triplet (Ω, A, µ) is a measure space. The measure µ is a probability measure when µ(Ω) = 1, and in this
case the triplet (Ω, A, µ) is then a probability space.
A random variable X taking values in a measurable space (E , E ) is a measurable map defined on a prob-
ability space (Ω, A , P). By default we always assume that there is an underlying probability space (Ω, A , P).

1.1 Expectation and law

If X = 1_A then E(X) = P(A); by linearity and monotone convergence, this allows us to define E(X) ∈ [0, +∞]
when X takes its values in [0, +∞]. Next, L¹ is the set of random variables such that E(|X|) < ∞. If X = X⁺ − X⁻
then |X| = X⁺ + X⁻ ∈ L¹ if and only if X± ∈ L¹, and then we have E(|X|) = E(X⁺) + E(X⁻) and E(X) = E(X⁺) − E(X⁻).

The law P X of a real random variable X is characterized by

• distribution: P X (B ) = P(X −1 (B )) for all B ∈ BR

• cumulative distribution function: F_X(x) = P_X((−∞, x]) = P(X ≤ x) for all x ∈ R

• characteristic function: ϕ X (t ) = EPX (eit • ) = E(eit X ) for all t ∈ R,


1 Further reading: https://djalil.chafai.net/blog/2016/03/21/integration-alpha-et-omega/


• Laplace transform (when X ≥ 0): L X (t ) = EPX (e−t • ) = E(e−t X ) for all t ≥ 0.

More generally, for a random variable X : (Ω, A ) → (E , B), the law P X = P ◦X −1 of X is a probability mea-
sure on (E , B). This infinite dimensional dual functional object is characterized by considering its values
on a sufficiently large family of test functions such as, when (E , B) = (R, B), 1(−∞,x] , x ∈ R, or eit • , t ∈ R, etc.

1.2 Independence

1. A family (Ai )i ∈I of sub-σ-algebras of A is independent when for all finite J ⊂ I and all A i ∈ Ai we have

P(∩_{i∈J} A_i) = ∏_{i∈J} P(A_i).

2. We say that a family (X_i)_{i∈I} of random variables is independent, X_i : (Ω, A) → (E_i, B_i), when the
family of sub-σ-algebras (σ(X_i))_{i∈I} is independent, where

σ(X_i) = {X_i⁻¹(B) : B ∈ B_i}

is the σ-algebra generated by X i . Thus (X i )i ∈I is independent iff for all J ⊂ I finite,

P_{(X_i : i∈J)} = ⊗_{i∈J} P_{X_i}   on   (∏_{i∈J} E_i, ⊗_{i∈J} B_i).

It follows that if X₁, X₂, . . . , X_n are integrable and independent real random variables then

∏_{i=1}^n X_i ∈ L¹   and   E(∏_{i=1}^n X_i) = ∏_{i=1}^n E(X_i).

1.3 Markov, Cauchy – Schwarz, Hölder, Jensen, convergence, Borel – Cantelli, LLN, LIL, CLT, . . .

Markov inequality. If U ≥ 0 is non-decreasing then for all r > 0 with U(r) > 0,

P(X ≥ r) ≤ E(U(X))/U(r).

This allows us to control tails with moments. Conversely, for U non-decreasing and C¹ on [0, +∞), we can
control moments by tails via

E(U(|X|)) = U(0) + ∫₀^∞ U′(t) P(|X| ≥ t) dt.
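Both directions can be checked numerically. The sketch below (Python with NumPy assumed; an illustration, not part of the notes) uses an exponential random variable with E(X) = 1, the choice U(x) = x for the Markov bound, and U(x) = x² for the tail formula:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=200_000)  # E(X) = 1, P(X >= t) = e^{-t}

# Markov with U(x) = x: the empirical tail sits below E(X)/r for each r > 0
for r in (1.0, 2.0, 4.0):
    print(np.mean(x >= r), "<=", 1.0 / r)

# Moments from tails with U(x) = x^2, so U(0) = 0 and U'(t) = 2t:
# E(X^2) = int_0^infty 2 t P(X >= t) dt  (here = 2)
t = np.linspace(0.0, 20.0, 401)
tails = np.array([np.mean(x >= s) for s in t])  # empirical tail function
f = 2 * t * tails
riemann = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))  # trapezoid rule
print(riemann, np.mean(x ** 2))  # two estimates of E(X^2), in agreement
```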

Cauchy – Schwarz inequality. In [0, +∞], with equality if and only if X and Y are colinear,

E(|XY|) ≤ E(|X|²)^{1/2} E(|Y|²)^{1/2}.

Hölder inequality. If p ∈ [1, ∞] and q = 1/(1 − 1/p) = p/(p − 1) then, in [0, +∞],

E(|XY|) ≤ E(|X|ᵖ)^{1/p} E(|Y|^q)^{1/q}.

Jensen inequality. If U : Rᵈ → R is convex and X ∈ L¹ with U(X) ∈ L¹ then

U(E(X)) ≤ E(U(X)),

moreover when U is strictly convex then equality is achieved only if X is (almost surely) constant. Useful
examples include U(x) = xᵖ, p ≥ 1, U(x) = e^{cx}, c ∈ R, U(x) = +∞·1_{x<0} + x log(x)·1_{x≥0}.
Convergences. Below (X n )n≥1 , (Yn )n≥1 , X , Y are real random variables on a probability space (Ω, A , P),
of law µn , νn , µ, ν and cumulative distribution function F n ,G n , F,G respectively.
Almost sure convergence. We say that X_n −→a.s. X when

P(lim_{n→∞} X_n = X) = 1,


in other words P({ω ∈ Ω : limn→∞ X n (ω) = X (ω)}) = 1. This is the notion of convergence in the SLLN.
Convergence in probability. We say that X_n −→P X when

∀ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0,

which means that ∀ε > 0, limn→∞ P({ω ∈ Ω : |X n (ω) − X (ω)| ≥ ε}) = 0. This is used in the weak LLN.
Mean convergence. For all p ∈ [1, ∞), we say that X_n −→Lᵖ X when

X ∈ Lᵖ and lim_{n→∞} E(|X_n − X|ᵖ) = 0.

The most useful cases are p ∈ {1, 2, 4}.


Convergence in law. The following properties are equivalent, and we then say that X_n −→law X, or X_n −→d µ
(convergence in distribution), or µ_n −→nar. µ (narrow convergence). This is used in the CLT.

1. limn→∞ E( f (X n )) = E( f (X )) for all bounded and continuous f : R → R

2. limn→∞ E( f (X n )) = E( f (X )) for all C ∞ and compactly supported f : R → R

3. Cumulative distribution function. lim_{n→∞} E(f(X_n)) = E(f(X)) for all f = 1_{(−∞,x]} with x a continuity
point of P(X ≤ •), in other words F_n(x) = P(X_n ≤ x) → F(x) = P(X ≤ x) as soon as F is continuous at x

4. Fourier transform or characteristic function. limn→∞ E( f (X n )) = E( f (X )) for all f = eit • , t ∈ R

5. Laplace transform. (on R+ ) limn→∞ E( f (X n )) = E( f (X )) for all f = e−t • , t ≥ 0.

Contrary to the other modes of convergence, the convergence in law does not depend on the law of the couple
(X_n, X) and uses only the marginal laws. The Fourier and Laplace transforms convert sums of independent
random variables into products, for which the expectation is the product of expectations.
Apart from the convergence in law, the other modes of convergence are stable by finite linear combinations.
The almost sure convergence, the convergence in probability, and the convergence in law are stable by com-
position with continuous functions, which is sometimes referred to as the continuous mapping theorem.
The notions of convergence extend naturally to random vectors by using a distance/norm/scalar prod-
uct, for instance for the characteristic function by replacing itX by i〈t, X〉.

Lᵖ CV ⇒ L¹ CV ⇒ CV in P ⇒ CV in law,   and   a.s. CV ⇒ CV in P.

If X is constant then the convergence in law implies the convergence in probability. The convergence in
L1 is equivalent to uniform integrability and convergence in probability.
Monotone convergence theorem. If (X_n)_{n≥1} takes its values in [0, +∞] and is non-decreasing then

E(lim_{n→∞} X_n) = lim_{n→∞} E(X_n) ∈ [0, +∞].

Fatou lemma. If (X_n)_{n≥1} takes its values in [0, +∞] then

E(lim inf_{n→∞} X_n) ≤ lim inf_{n→∞} E(X_n) ∈ [0, +∞].

Dominated convergence theorem. If X_n −→a.s. X and sup_n |X_n| ≤ Y with E(Y) < ∞, then

lim_{n→∞} E(X_n) = E(lim_{n→∞} X_n) = E(X).

The domination hypothesis is an easy-to-check criterion of uniform integrability.


Scheffé lemma. If X_n, X ∈ L¹ and X_n −→a.s. X, then X_n −→L¹ X if and only if E(|X_n|) → E(|X|).


Slutsky lemma. If X_n −→law X and Y_n −→law Y and Y is constant then (X_n, Y_n) −→law (X, Y). In particular
X_n Y_n −→law XY, X_n + Y_n −→law X + Y, and X_n/Y_n −→law X/Y if Y ≠ 0.
Fubini – Tonelli theorem. Let (Ω₁, A₁, µ₁) and (Ω₂, A₂, µ₂) be two σ-finite measure spaces, and let f : Ω₁ × Ω₂ → R
be a measurable function. If f ≥ 0 or if f ∈ L¹(µ₁ ⊗ µ₂) then

∫ f(x, y) d(µ₁ ⊗ µ₂)(x, y) = ∫ (∫ f(x, y) dµ₁(x)) dµ₂(y).
Borel – Cantelli lemma. Let (A_n)_n be events in a probability space (Ω, A, P). We define

lim inf_n A_n = ∪_n ∩_{m≥n} A_m = {ω ∈ Ω : ω ∈ A_n for n large enough},
lim sup_n A_n = ∩_n ∪_{m≥n} A_m = {ω ∈ Ω : ω ∈ A_n for infinitely many values of n}.

We have (lim inf_n A_nᶜ)ᶜ = lim sup_n A_n, and lim inf_n 1_{A_n} = 1_{lim inf_n A_n} and lim sup_n 1_{A_n} = 1_{lim sup_n A_n}.

1. (Cantelli) if ∑_n P(A_n) < ∞ then P(lim sup_n A_n) = 0

2. (Borel zero-one law) if ∑_n P(A_n) = ∞ and the (A_n)_n are independent then P(lim sup_n A_n) = 1.

The Borel – Cantelli lemma is a great provider of almost sure convergence. Note that if X takes its values in
[0, +∞] then E(X) < ∞ implies P(X < ∞) = 1, and this allows us to prove the Cantelli part:

∑_n P(A_n) = ∑_n E(1_{A_n}) = E(∑_n 1_{A_n})   and   {∑_n 1_{A_n} = ∞} = lim sup_n A_n.
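A small simulation (Python with NumPy assumed; not part of the notes) illustrates the dichotomy with independent uniforms U_n and the events A_n = {U_n ≤ p_n}: for p_n = 1/n² the sum of probabilities converges and almost surely only finitely many A_n occur, while for p_n = 1/n the sum diverges and infinitely many A_n occur:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
u = rng.uniform(size=N)
n = np.arange(1, N + 1)

# sum P(A_n) = sum 1/n^2 < infinity: Cantelli predicts finitely many occurrences
occ_summable = int(np.sum(u <= 1.0 / n ** 2))
# sum P(A_n) = sum 1/n = infinity, independent events: Borel predicts infinitely many
occ_divergent = int(np.sum(u <= 1.0 / n))

print(occ_summable)   # a handful, stabilizing as N grows
print(occ_divergent)  # keeps growing with N, roughly like log N
```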

Strong Law of Large Numbers (SLLN). If X ∈ L¹ and X₁, X₂, . . . are i.i.d.² copies of X then, with m = E(X),

(X₁ + · · · + X_n)/n −→a.s. m   and   (X₁ + · · · + X_n)/n −→L¹ m,   as n → ∞.

Central limit theorem (CLT). If moreover X ∈ L², then with σ² = Var(X) = E((X − m)²) = E(X²) − m²,

(√n/σ)((X₁ + · · · + X_n)/n − m) = (X₁ − m + · · · + X_n − m)/(σ√n) −→law N(0, 1),   as n → ∞.

Law of iterated logarithm (LIL). Under the assumptions and with the notation of the CLT, almost surely

lim sup_{n→∞} (√n/(σ√(2 log log n))) ((X₁ + · · · + X_n)/n − m) = lim sup_{n→∞} (X₁ − m + · · · + X_n − m)/(σ√(2n log log n)) = 1

and

lim inf_{n→∞} (√n/(σ√(2 log log n))) ((X₁ + · · · + X_n)/n − m) = lim inf_{n→∞} (X₁ − m + · · · + X_n − m)/(σ√(2n log log n)) = −1.
Note that the CLT gives (X₁ + · · · + X_n)/n − m −→P 0 as n → ∞, which is a weak form of the LLN.
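The SLLN and the CLT can be observed on simulated data. The sketch below (Python with NumPy assumed; illustrative only) uses uniform variables on [0, 1], for which m = 1/2 and σ² = 1/12:

```python
import numpy as np

rng = np.random.default_rng(3)
trials, n = 1000, 4000
x = rng.uniform(0.0, 1.0, size=(trials, n))
means = x.mean(axis=1)

# SLLN: every empirical mean is close to m = 1/2
print(np.abs(means - 0.5).max())

# CLT: the fluctuations, rescaled by sqrt(n)/sigma, look standard Gaussian
z = np.sqrt(n) * (means - 0.5) / np.sqrt(1.0 / 12.0)
print(z.mean(), z.std())  # close to 0 and 1
```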

1.4 Uniform integrability

For any family (X i )i ∈I ⊂ L1 , the following three properties are equivalent3 . When one (and thus all) of
these properties holds true, we say that the family (X i )i ∈I is uniformly integrable (u.i.) or equi-integrable4 .
The first property can be seen as a natural definition of uniform integrability.

1. (definition of uniform integrability) limr →+∞ supi ∈I E(|X i |1|X i |≥r ) = 0


2 Independent and identically distributed, in French “indépendantes et identiquement distribuées”.
3 Further reading: https://djalil.chafai.net/blog/2014/03/09/de-la-vallee-poussin-on-uniform-integrability/
4 The terminology comes from the fact that by dominated convergence, we have X ∈ L1 if and only if lim
r →∞ E(|X |1|X |≥r ) = 0.


2. (epsilon-delta criterion) the family is bounded in L¹, in the sense that

sup_{i∈I} E(|X_i|) < ∞,

and moreover ∀ε > 0, ∃δ > 0, ∀A ∈ A, P(A) ≤ δ ⇒ sup_{i∈I} E(|X_i| 1_A) ≤ ε

3. (de la Vallée Poussin⁵ boundedness in L_U criterion) there exists a non-decreasing convex U : R₊ → R₊
such that lim_{x→+∞} U(x)/x = +∞ and such that (U(|X_i|))_{i∈I} is bounded in L¹, namely

sup_{i∈I} E(U(|X_i|)) < ∞.

Note that this implies boundedness in L¹, and is implied by boundedness in Lᵖ with p > 1.

Here are examples for uniformly integrable families:

• every finite subset of L1 is uniformly integrable. In particular if X ∈ L1 then there exists a non-decreasing
convex and super-linear U such that U (|X |) ∈ L1 , but beware that this U depends on X .

• if (X i )i ∈I is bounded in Lp with p > 1 then it is u.i.

• if supi ∈I |X i | ∈ L1 (domination: |X i | ≤ X ∈ L1 for all i ∈ I ) then (X i )i ∈I is u.i.

• if T ∈ {N, R₊} and X_t −→L¹ X ∈ L¹ as t → ∞, then (X_t)_{t∈T}, (X_t)_{t∈T} ∪ {X}, and (X_t − X)_{t∈T} are u.i.

• if X ∈ L1 and X i = E(X | Fi ) for all i ∈ I for σ-algebras (Fi )i ∈I then (X i )i ∈I is uniformly integrable.

The notion of uniform integrability leads to a stronger version of the dominated convergence theorem:
for any p ≥ 1, and for any random variables X and (X_t)_{t∈T}, T ∈ {N, R₊}, we have, as t → ∞,

X_t, X ∈ Lᵖ and X_t −→Lᵖ X   if and only if   (|X_t|ᵖ)_{t∈T} is u.i. and X_t −→P X.

In particular the convergence in probability together with u.i. implies X ∈ L¹, which is remarkable!
The dominated convergence theorem corresponds to the special case sup_{t∈T} |X_t| ∈ L¹.

1.5 Conditioning

1. Orthogonal projection in a Hilbert space. Let H be a Hilbert space and F ⊂ H be a closed sub-space.
For all x ∈ H there exists a unique y ∈ F , called the orthogonal projection of x on F , which satisfies
one (and thus all) the following equivalent properties:

• (orthogonality) for all z ∈ F , x − y ⊥ z, namely 〈x, z〉 = 〈y, z〉

• (variational: least squares) for all z ∈ F , ∥x − y∥ ≤ ∥x − z∥, namely ∥x − y∥ = min_{z∈F} ∥x − z∥.

2. Let (Ω, A , P) be a probability space and F be a sub-σ-algebra of A . Let us consider the Hilbert space
H = L2 (Ω, A , P). The set F = L2 (Ω, F , P) is a closed sub-space of H . If X ∈ H , it is natural to consider
the best (least squares) approximation of X by an element of F , denoted Y . The random variable Y is
the orthogonal projection of X on F , characterized by the following:

Y ∈ L2 (Ω, F , P) and, for all Z ∈ L2 (Ω, F , P), E(|X − Y |2 ) ≤ E(|X − Z |2 ).

Using the relation to scalar product, the second property is equivalent to

• for all Z ∈ L2 (Ω, F , P), E(X Z ) = E(Y Z ), or even for all B ∈ F , E(X 1_B) = E(Y 1_B).

We denote Y = E(X | F ) and we call it the conditional expectation of X given F . It is the best approximation in L2 (in a least squares sense) of X by an F -measurable square integrable random variable.
5 After Charles-Jean Étienne Gustave Nicolas de la Vallée Poussin (1866 – 1962), Belgian mathematician.


3. If now X ∈ L1 (Ω, A , P), we define by extension Y = E(X | F ), a real random variable characterized by

(a) Y ∈ L1 (Ω, F , P)
(b) for all Z bounded and F measurable, E(X Z ) = E(Y Z ), or for all B ∈ F , E(X 1B ) = E(Y 1B ).

Proof. Let µ be the bounded signed measure on (Ω, F ) defined by µ(B) = E(X 1_B), B ∈ F . Set ν = P|_F , the restriction of P to F . For all B ∈ F , if ν(B) = 0 then µ(B) = 0. From the Radon – Nikodym theorem, there exists a unique Y ∈ L1 (Ω, F , ν) such that ∫_B Y dν = µ(B) for all B ∈ F , in other words E(Y 1_B) = E(X 1_B) for all B ∈ F . ■

The expectation and the variance of square integrable random variables have a variational interpretation. Namely if X ∈ L2 then var(X) is the squared distance in L2 of X to the sub-space of constant random variables, namely

var(X) = inf_{c∈R} E((X − c)²) = inf_{c∈R} (E(X²) − 2c E(X) + c²).

This infimum is a minimum, achieved for c = E(X), which is therefore the orthogonal projection of X in L2 on the sub-space of constant random variables, and

var(X) = E((X − E(X))²) = E(X²) − 2E(X E(X)) + (E(X))² = E(X²) − (E(X))²

which follows in fact from the Pythagoras theorem in L2. More generally we have

var(X) = E(X²) − (E(X))²
       = E(X²) − E((E(X | F ))²) + E((E(X | F ))²) − (E(X))²
       = E(var(X | F )) + var(E(X | F ))

where var(X | F ) = E(X² | F ) − (E(X | F ))². Note that by definition of E(X | F ),

inf_{Y : σ(Y )⊂F } E((X − Y)²) = E((X − E(X | F ))²)
                               = E(X²) − 2E(X E(X | F )) + E((E(X | F ))²)
                               = E(X²) − E((E(X | F ))²)
                               = E(var(X | F )).

Note that E = E(· | T ) where T = {∅, Ω}. The conditional expectation generalizes the expectation and
has all the properties of an expectation, and more. Namely, for all sub-σ-algebra F of A :

• Linearity. for all α, β ∈ R and X , Y ∈ L1 , E(αX + βY | F ) = αE(X | F ) + βE(Y | F )

• Independence. If X is independent of F (always the case when X is constant) then E(X | F ) = E(X )

• Factorization. If X is F -measurable, Y ∈ L1 , X Y ∈ L1 , then E(X Y | F ) = X E(Y | F ), in particular we


recover the “projection property” E(X | F ) = X if X ∈ L1 (Ω, F , P) which is the case when X is constant

• Composed “projections” or “tower property”. For all sub-σ-algebras F , G with G ⊂ F and all X ∈ L1 ,

E(E(X | F ) | G ) = E(E(X | G ) | F ) = E(X | G ),

and in particular for all X ∈ L1 , E(E(X | F )) = E(X ), and if X is constant then E(X | F ) = X .

• Normalization. E(1Ω | F ) = 1Ω (follows from some of the properties above)

• Positivity or monotonicity. For all X , Y ∈ L1 , if X ≤ Y then E(X | F ) ≤ E(Y | F ), or equivalently for all
X ∈ L1 if X ≥ 0 then E(X | F ) ≥ 0. In particular for all X ∈ L1 ,

|E(X | F )| ≤ E(|X | | F )


• Convexity. Jensen inequality: for all non-negative convex U : Rd → R and all X ∈ L1,

   U(E(X | F )) ≤ E(U(X) | F ).

   In particular, for all p ∈ [1, ∞), |E(X | F )|^p ≤ E(|X|^p | F ). Moreover for all X ∈ Lp and Y ∈ Lq with 1 ≤ p, q < ∞, 1/p + 1/q = 1 (q = p/(p − 1)), we have the Hölder inequality

   |E(X Y | F )| ≤ E(|X|^p | F )^{1/p} E(|Y|^q | F )^{1/q}.

   The Cauchy – Schwarz inequality corresponds to the special case p = q = 2.

• Monotone convergence. If X n ≥ 0, X n ↗ X , X ∈ L1 , then E(X n | F ) ↗ E(X | F ). This allows to define


E(X | F ) for all non-negative random variable X taking values in [0, +∞].

Theorem 1.5.1. Transfer or the meaning of being measurable.

If T : Ω → (F, F ) and Y : Ω → (R, BR ) are random variables then Y is σ(T ) measurable if and only if there exists g : (F, F ) → (R, BR ) measurable such that Y = g ◦ T .

Proof. If Y = g ◦ T with g measurable then Y is σ(T ) measurable since Y^{−1}(B) = T^{−1}(g^{−1}(B)) ∈ σ(T ), which gives one implication. Conversely, if Y = 1_A for A ∈ σ(T ), then A = T^{−1}(B) for some B ∈ F , and therefore Y = 1_B ◦ T . If Y = Σ_{i∈I} a_i 1_{A_i} with I finite and A_i = T^{−1}(B_i), B_i ∈ F , then Y = (Σ_{i∈I} a_i 1_{B_i}) ◦ T . The property is thus satisfied when Y is a step function. Now, if Y is non-negative and σ(T ) measurable, then there exists a sequence (Y_n)_n of step functions, σ(T ) measurable, such that Y_n ↗ Y , and Y_n = g_n ◦ T . By setting g = lim sup_n g_n (which is measurable, and coincides with lim_n g_n on the range of T ), we get Y = g ◦ T . Finally, if Y is just σ(T ) measurable, then it suffices to write Y = Y₊ − Y₋. ■

Let X ∈ L1 (Ω, A , P) and let T : (Ω, A ) → (F, F ) be a random variable. The conditional expectation of X
given T , denoted E(X | T ), is defined by E(X | T ) = E(X | σ(T )). It is characterized by the following properties:

1. There exists g : (F, F ) → (R, BR ) with E(X | T ) = g (T ) and g (T ) ∈ L1

2. For all h : (F, F ) → (R, BR ) measurable and bounded,

E(X h(T )) = E(g (T )h(T )).

If X ∈ L2 then, thanks to the transfer theorem (Theorem 1.5.1), the conditional expectation E(X | T ) is
the best approximation in L2 (least squares!) of X by a measurable function of T .
For a probability space (Ω, F , P), an event A ∈ F , and a sub-σ-algebra A ⊂ F , the quantity P(A | A ) = E(1_A | A ) is a random variable taking its values in [0, 1]. Similarly, conditioning with respect to an event makes sense in the sense that E(X | A) = E(X | 1_A = 1), and

E(X | 1_A) = (E(X 1_A)/P(A)) 1_A + (E(X 1_{A^c})/P(A^c)) 1_{A^c}
           = E(X | 1_A = 1) 1_A + E(X | 1_A = 0) 1_{A^c}.

Finally, when X and Y take their values in an at most countable set then

E(X | Y ) = F(Y ) where F(y) = E(X | Y = y) = Σ_x x P(X = x | Y = y).
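In the at most countable setting this formula is a finite computation. As a toy illustration of ours (not from the notes), take two independent fair dice, X the first die and Y the sum: by exchangeability of the two dice, E(X | Y) = Y/2, which the defining formula recovers.

```python
from itertools import product

# E(X | Y) = F(Y) with F(y) = sum_x x P(X = x | Y = y), computed here for
# X = first of two independent fair dice and Y = X + second die.
def F(y):
    # Given Y = y, each admissible pair (a, b) with a + b = y is equally likely.
    pairs = [(a, b) for a, b in product(range(1, 7), repeat=2) if a + b == y]
    return sum(a for a, _ in pairs) / len(pairs)

# By exchangeability of the two dice, E(X | Y) = Y / 2:
for y in range(2, 13):
    assert F(y) == y / 2
print(F(7), F(4))  # 3.5 2.0
```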

Remark 1.5.2. Conditional expectation as averaging of residual randomness.

Let X and Y be random variables defined on a probability space (Ω, F , P), and let A be a sub-σ-
algebra of F . If X is independent of A and if Y is A -measurable, then, using the monotone class
theorem, for all measurable and bounded or positive f : R × R → R, we get

E( f (X , Y ) | A ) = E( f (X , Y ) | Y ) = g (Y ) where g (y) = E( f (X , y)).

This suggests to interpret intuitively the conditional expectation as an averaging of residual random-
ness, and not only as the best approximation in the sense of least squares.
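This averaging interpretation is easy to test by Monte Carlo; here is a hedged sketch with a hypothetical example of ours: X ∼ N(0, 1) independent of Y, f(x, y) = (x + y)², so that g(y) = E((X + y)²) = 1 + y² exactly.

```python
import random

random.seed(1)

# If X is independent of A and Y is A-measurable, then
# E(f(X, Y) | A) = g(Y) with g(y) = E(f(X, y)): conditioning averages out
# the residual randomness X. Here f(x, y) = (x + y)^2 and X ~ N(0, 1),
# so g(y) = 1 + y^2.
def g_monte_carlo(y, n=200_000):
    return sum((random.gauss(0.0, 1.0) + y) ** 2 for _ in range(n)) / n

for y in (0.0, 1.0, 2.0):
    print(y, round(g_monte_carlo(y), 2))  # close to 1 + y^2
```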


Let X and Y be two random variables taking values in the measurable spaces (E , E ) and (F, F ) respec-
tively. The conditional law of X given Y is a family (N (y, ·)) y∈F of probability measures on (E , E ), in other
words a transition kernel, such that for all A ∈ E , the map y ∈ F 7→ N (y, A) ∈ [0, 1] is measurable, and for all
bounded (or positive) measurable test function h : E → R,

E(h(X) | Y ) = ∫_E h(x) N(Y , dx).

For all y ∈ F , we also say that N(y, ·) is the conditional law of X given Y = y, in other words

E(h(X) | Y = y) = ∫_E h(x) N(y, dx).

In particular P(X ∈ A | Y ) = N (Y , A) for all A ∈ E . We sometimes speak about disintegration of measure.


The random variables X and Y are independent if and only if N (y, ·) does not depend on y in the sense
that for almost all y ∈ F , N (y, ·) = P X where P X is the law of X .
If (X, Y) has Lebesgue density f_{X,Y} then X and Y have densities f_X = ∫ f_{X,Y}(·, y) dy and f_Y = ∫ f_{X,Y}(x, ·) dx, and the conditional law Law(X | Y = y) has density f_{X|Y=y} = f_{X,Y}(·, y)/f_Y(y), in such a way that

f_{X,Y}(x, y) = f_{X|Y=y}(x) f_Y(y) = f_X(x) f_{Y|X=x}(y).

1.6 Gaussian random vectors

A random vector X = (X 1 , . . . , X n ) of Rn is a Gaussian random vector when every linear combination of its
components is Gaussian, namely for all α1 , . . . , αn ∈ R the real random variable α1 X 1 +· · ·+αn X n is Gaussian.
Let X be a random vector with mean vector and covariance matrix

m = E(X) = (E(X_1), . . . , E(X_n)) and Σ = (E((X_j − m_j)(X_k − m_k)))_{1≤j,k≤n}.

Then X is Gaussian iff its characteristic function is given for all t ∈ Rn by

φ_X(t) = E(e^{i〈t,X〉}) = e^{i〈t,m〉 − (1/2)〈Σt,t〉}.
We denote this law N (m, Σ). Beware that when n = 1, we denote Σ = σ².
We say that N (0, I_n) is the standard Gaussian.
The law N (m, Σ) has a density iff Σ is invertible, given by

x ∈ Rn ↦ exp(−(1/2)〈Σ^{−1}(x − m), x − m〉) / √((2π)^n det(Σ)),

otherwise N (m, Σ) is supported by a strict affine sub-space of Rn.
If (X_1, . . . , X_n) is a Gaussian random vector, then X_1, . . . , X_n are independent iff Σ is diagonal.
If Z ∼ N (0, I_n), m ∈ Rd, and A ∈ M_{d,n}(R), then m + AZ ∼ N (m, A A^⊤) is a Gaussian random vector of Rd.

Coding in action 1.6.1. Simulation.

Write a Pythona or Juliab program for the simulation of a sample of N (m, Σ) knowing m and Σ. What
is the best way to reduce to the one-dim. case? What is the best way to find A such that A A ⊤ = Σ?
a https://en.wikipedia.org/wiki/Python_(programming_language)
b https://en.wikipedia.org/wiki/Julia_(programming_language)
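One possible answer sketch in Python (the helper names are ours, and we hand-roll the Cholesky factorization to stay dependency-free; in practice numpy.linalg.cholesky or numpy.random.Generator.multivariate_normal would do the work): reduce to independent one-dimensional standard Gaussians via m + AZ ∼ N(m, AA^⊤) with Z ∼ N(0, I_n) and AA^⊤ = Σ.

```python
import math
import random

def cholesky(Sigma):
    """Lower triangular L with L L^T = Sigma, for Sigma symmetric positive definite."""
    n = len(Sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(Sigma[i][i] - s)
            else:
                L[i][j] = (Sigma[i][j] - s) / L[j][j]
    return L

def sample_gaussian(m, Sigma, rng=random):
    """One sample of N(m, Sigma): if Z ~ N(0, I_n) and A A^T = Sigma,
    then m + A Z ~ N(m, Sigma); we take A lower triangular (Cholesky)."""
    n = len(m)
    A = cholesky(Sigma)
    Z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. standard Gaussians
    return [m[i] + sum(A[i][k] * Z[k] for k in range(n)) for i in range(n)]

random.seed(0)
m = [1.0, -2.0]
Sigma = [[2.0, 0.5], [0.5, 1.0]]
xs = [sample_gaussian(m, Sigma) for _ in range(20_000)]
means = [sum(x[i] for x in xs) / len(xs) for i in range(2)]
print(means)  # close to m = [1.0, -2.0]
```

Any A with AA^⊤ = Σ works (for instance the symmetric square root Σ^{1/2}), but the Cholesky factor is the cheapest standard choice.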

1.7 Bounded variation and Lebesgue – Stieltjes integral


Definition 1.7.1. p-variation of a function on a finite interval.

Let [a, b] ⊂ R be a finite interval. For all p ≥ 1, the p-variation of a function f : [a, b] → R is defined by

∥f∥_{p-var} = sup_{(t_k)} (Σ_k |f(t_{k+1}) − f(t_k)|^p)^{1/p} ∈ [0, +∞]

where the supremum runs over all finite partitions or sub-divisions of the interval [a, b], namely the finite sequences (t_k)_{0≤k≤n+1} in [a, b] such that n ≥ 0 and a = t_0 < · · · < t_{n+1} = b.

• ∥ f ∥_{1-var} is sometimes called the total variation of f

• if f : [a, b] → R has finite 1-variation, we say that f has finite variation or is of bounded variation

• if f : [a, b] → R is of bounded variation then f is bounded (the boundedness of [a, b] plays a role here).

• if f : [a, b] → R is of bounded variation and is differentiable with integrable derivative then

   ∥f∥_{1-var} = ∫_a^b |f′(t)| dt.

• if f is continuously differentiable then f has bounded variation and the latter holds true.
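To build intuition, here is a small Python sketch (the helper is ours): the variation sum along a fixed uniform subdivision is a lower bound for ∥f∥_{p-var}, and for a continuously differentiable f and p = 1 it approaches ∫_a^b |f′(t)| dt as the mesh shrinks.

```python
import math

def variation_on_grid(f, a, b, p=1.0, n=10_000):
    """Variation sum (sum_k |f(t_{k+1}) - f(t_k)|^p)^(1/p) along the uniform
    n-step subdivision of [a, b]; a lower bound for the true p-variation,
    which is the supremum over all subdivisions."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(ts[k + 1]) - f(ts[k])) ** p for k in range(n)) ** (1.0 / p)

# For f = sin on [0, 2π] the total variation is the integral of |cos| over
# [0, 2π], namely 4:
print(variation_on_grid(math.sin, 0.0, 2 * math.pi))  # close to 4.0
```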

Theorem 1.7.2. Representation of bounded variation functions on a finite interval.

Let [a, b] ⊂ R be a finite interval. For all f : [a, b] 7→ R, the following properties are equivalent:

1. f is of bounded variation

2. f is the difference of two positive increasing functions [a, b] → R.

Such a decomposition is not unique in general.

Proof. 1 ⇒ 2. Let f be a function of bounded variation on [a, b]. For all t ∈ [a, b], let

F(t) = sup_δ Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)|

where the supremum runs over the set of partitions or sub-divisions δ : a = t_0 < · · · < t_n = t of [a, t], n = n_δ ≥ 1. Now F is increasing (and bounded) by definition. It suffices now to show that G = F − f is increasing. We observe that for all t_1 < t_2 in [a, b], we have F(t_1) + f(t_2) − f(t_1) ≤ F(t_1) + |f(t_2) − f(t_1)| ≤ F(t_2), and thus

G(t_2) − G(t_1) = F(t_2) − f(t_2) − F(t_1) + f(t_1) ≥ 0.

2 ⇒ 1. If f and g have bounded variation on [a, b], then it is also the case for f − g. On the other hand, if f is monotonic on [a, b] then it is of bounded variation since for all sub-divisions a = t_0 < · · · < t_n = b, n ≥ 1,

Σ_{k=0}^{n−1} |f(t_{k+1}) − f(t_k)| = |f(b) − f(a)|. ■

The notion of bounded variation is used for the Lebesgue – Stieltjes integral in stochastic calculus.

Theorem 1.7.3. Lebesgue – Stieltjes integral of continuous finite variation integrators.

Let [a, b] ⊂ R be a finite interval. Let f : [a, b] → R be right continuous and of bounded variation.


Then there exists a unique finite signed Borel measure µ_f on ([a, b], B_{[a,b]}) such that

µ_f({a}) = 0, and for all t ∈ [a, b], µ_f((a, t]) = f(t) − f(a).

It is customary to denote dµ_f = df, and for all measurable g : [a, b] → R, positive or in L1(|µ_f|),

∫_a^b g(t) df(t) = ∫ g dµ_f.

Moreover, for all bounded and continuous g : [a, b] → R, and for all sequences (δ_n)_{n≥1} of partitions or sub-divisions of [a, b], δ_n : a = t_0^{(n)} < · · · < t_{m_n}^{(n)} = b, m_n ≥ 1, with lim_{n→∞} max_k (t_{k+1}^{(n)} − t_k^{(n)}) = 0, we have

∫_a^b g(t) df(t) = lim_{n→∞} Σ_k g(t_k^{(n)}) (f(t_{k+1}^{(n)}) − f(t_k^{(n)})).

Furthermore, h : t ∈ [a, b] ↦ h(t) = ∫_a^t g(s) df(s) is continuous and of bounded variation, and µ_h = g µ_f, in other words dh(t) = g(t) df(t), in the sense that for all bounded and measurable k : [a, b] → R,

∫_a^b k(t) dh(t) = ∫_a^b k(t) d(∫_a^t g(s) df(s)) = ∫_a^b k(t) g(t) df(t).

In particular when f(t) = t for all t ≥ 0 then on all [a, b] ⊂ [0, ∞), the measure µ_f is the Lebesgue measure, and for all measurable g : R+ → R which is locally bounded or positive, we have, for all t ≥ 0,

∫_0^t g(s) df(s) = ∫ g 1_{[0,t]} dµ_f = ∫_0^t g(s) ds.

Theorem 1.7.3 is used in stochastic calculus with f (t ) = Vt (ω), t ≥ 0, and for almost all fixed ω ∈ Ω where
V = (Vt )t ≥0 is a finite variation process, for instance V = 〈M 〉 where M is a continuous local martingale. In
particular when M = B is Brownian motion then Vt = t is deterministic and we recover the example above.
Theorem 3.2.1 says that Brownian motion has a.s. sample paths of infinite variation on any interval. In
particular the assumptions of Theorem 1.7.3 are not satisfied when f (t ) = B t (ω), t ∈ [a, b] ⊂ [0, +∞).
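The Riemann-type sums of Theorem 1.7.3 are easy to play with numerically; here is a hedged Python sketch (our helper, with a smooth increasing integrator f, for which dµ_f = f′(t) dt).

```python
def stieltjes_sum(g, f, a, b, n=10_000):
    """Left-point sum sum_k g(t_k)(f(t_{k+1}) - f(t_k)) along the uniform
    subdivision of [a, b]; for continuous g and f of bounded variation it
    converges, as the mesh tends to 0, to the Lebesgue-Stieltjes integral."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(g(ts[k]) * (f(ts[k + 1]) - f(ts[k])) for k in range(n))

# With f(t) = t^2 on [0, 1] (so df = 2t dt) and g(t) = t, the limit is the
# integral of 2 t^2 over [0, 1], namely 2/3.
print(stieltjes_sum(lambda t: t, lambda t: t * t, 0.0, 1.0))  # close to 0.6667
```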

Proof. First part. Theorem 1.7.2 gives f = f + − f − where f ± ≥ 0 are bounded and increasing. This reduces
the problem to the case where f is increasing and µ f is a positive Borel measure. In this case, the result
follows from the Carathéodory extension theorem (Theorem 1.8.5). Note: µ f is unique even if f ± are not.
Second part. For all n ≥ 1, set g^{(n)}(a) = g(a), and for all t ∈ (a, b], g^{(n)}(t) = g(t_k^{(n)}) if t ∈ (t_k^{(n)}, t_{k+1}^{(n)}] for some k ∈ {0, . . . , m_n − 1}. Then g^{(n)} is measurable, we have lim_{n→∞} g^{(n)}(t) = g(t) for all t ∈ [a, b], and moreover sup_n sup_{t∈[a,b]} |g^{(n)}(t)| ≤ sup_{t∈[a,b]} |g(t)| < ∞. By dominated convergence in L1(|µ_f|), we obtain

Σ_k g(t_k^{(n)}) (f(t_{k+1}^{(n)}) − f(t_k^{(n)})) = ∫ g^{(n)} dµ_f −→ ∫ g dµ_f = ∫_a^b g(t) df(t) as n → ∞.

Note that if g is measurable and not continuous, then g^{(n)} → g as n → ∞ almost everywhere on [a, b], which is suitable for the Lebesgue measure but not necessarily for the measure |µ_f| which is of interest here.
Third part. First of all, for all s ∈ [a, b], we have µ_{f|[a,s]} = µ_f|_{[a,s]}, so that

∫_a^s g(t) df(t) = ∫ g 1_{[a,s]} dµ_f.

The continuity of h follows now by dominated convergence. For the 1-variation, we write

Σ_k |h(t_{k+1}) − h(t_k)| ≤ Σ_k ∫ |g| 1_{(t_k, t_{k+1}]} d|µ_f| = ∫ |g| d|µ_f| < ∞.

Finally, to prove the formula, it suffices to check it for k = 1_{[a,c]} for c ∈ [a, b]. This writes µ_h([a, c]) = ∫_a^c g(t) dµ_f(t) = h(c) − h(a), which is the definition of µ_h. Note that by construction we have h(a) = 0. ■


Remark 1.7.4. Riemann – Stieltjes – Young integral.

Following L.C. Young, it can be shown that if f, g : [a, b] → R are continuous with f of finite p-variation and g of finite q-variation with 1/p + 1/q > 1, then the Riemann – Stieltjes integral is well defined:

∫_a^b f(t) dg(t) = lim_{n→∞} Σ_{k=0}^{m_n − 1} f(t_k^{(n)}) (g(t_{k+1}^{(n)}) − g(t_k^{(n)})),

where (δ_n)_{n≥1} is an arbitrary sequence of partitions of [a, b], δ_n : a = t_0 < · · · < t_{m_n} = b, m_n ≥ 1, with mesh tending to 0 as n → ∞.

1.8 Monotone class theorem and Carathéodory extension theorem

Definition 1.8.1. π-systems and λ-systems.

• We say that C ⊂ P (Ω) is a π-system when A ∩ B ∈ C for all A, B ∈ C

• We say that S ⊂ P (Ω) is a λ-system (or monotone class or Dynkin6 system) when

   – ∪_n A_n ∈ S for all (A_n)_n such that A_n ⊂ A_{n+1} and A_n ∈ S for all n

   – A \ B ∈ S for all A, B ∈ S such that B ⊂ A.

6 Named after Eugene Dynkin (1924 – 2014), Soviet and American mathematician.

Basic examples of π-systems are given by the class of singletons {{x} : x ∈ R} ∪ {∅}, the class of product
subsets {A × B : A, B ∈ P (Ω)}, and the class of intervals {(−∞, x] : x ∈ R}.
A basic yet important example of λ-system is given by {A ∈ A : P(A) = Q(A)} where P and Q are proba-
bility measures on (Ω, A ), see Corollary 1.8.4 for an application.

Lemma 1.8.2. σ-algebras.

A λ-system that contains Ω and which is a π-system is a σ-algebra.

Note that conversely, a σ-algebra is always a π-system, and it is also a λ-system, since it is stable by countable unions and by set differences (A \ B = A ∩ B^c).

Proof. If a λ-system S ⊂ P (Ω) contains Ω and is a π-system then for all A, B ∈ S we have

A ∪ B = Ω \ ((Ω \ A) ∩ (Ω \ B)),

which means that S is stable by finite union. This allows to drop the non-decreasing condition in the stability of S by countable union, which simply means finally that S is a σ-algebra. ■

Theorem 1.8.3. Dynkin π-λ Theorem.

If S ⊂ P (Ω) is a λ-system containing Ω and including a π-system C , then S also contains the σ-algebra σ(C ) generated by C .

Proof. The λ-system generated by a subset of P (Ω) is by definition the intersection of all λ-systems which include this subset. This family of λ-systems is not empty since it contains P (Ω), and we can check that its intersection is again a λ-system. It is the smallest (for the inclusion) λ-system containing the initial subset of P (Ω).
Let S ′ be the λ-system generated by C and Ω. It suffices to show that S ′ is a σ-algebra. For that, and thanks to Lemma 1.8.2, it suffices to show that S ′ is a π-system. To do so, let us define

S_1 = {A ∈ S ′ : A ∩ B ∈ S ′ for all B ∈ C },


which is a λ-system containing Ω and including C , hence S ′ ⊂ S_1 by minimality of S ′, and thus S_1 = S ′. Now,

S_2 = {A ∈ S ′ : A ∩ B ∈ S ′ for all B ∈ S ′ }

is a λ-system containing Ω and including C (since S_1 = S ′), and thus S_2 = S ′, hence S ′ is a π-system. ■

Corollary 1.8.4. Sierpińskia – Dynkin (functional) monotone class theorem.


a Named after Wacław Sierpiński (1882 – 1969), Polish mathematician.

1. For all probability measures P and Q on a measurable space (Ω, A ), if P(A) = Q(A) for all A ∈ C
where C is a π-system such that σ(C ) = A , then P = Q

2. Let H be a vector space of bounded measurable functions (Ω, A ) → (R, BR ) such that

(a) H is stable by monotone convergence namely if ( f n )n is a sequence in H such that f n ↗ f


pointwise with f bounded then f ∈ H
(b) H contains constant functions namely 1Ω ∈ H , is stable by product namely if f , g ∈ H
then f g ∈ H , and contains all 1 A for all A in a π-system C on Ω such that σ(C ) = A

then H contains all A -measurable bounded functions Ω → R.

Note that H is an algebra in the sense that it is a vector space stable by product.
The second statement can be seen as some sort of Stone – Weierstrass theorem of measure theory.

Proof.

1. Take S = {A ∈ A : P(A) = Q(A)} and use Theorem 1.8.3.

2. Take S = {A ∈ A : 1_A ∈ H } and use Theorem 1.8.3 to get 1_A ∈ H for all A ∈ A ; conclude by writing any A -measurable bounded function as the difference of two monotone limits of step functions and using the stability of H by monotone convergence. ■

Theorem 1.8.5. Carathéodory extension theorem.

Let Ω ̸= ∅, A ⊂ P (Ω), and µ : A → R+ . Let σ(A) be the σ-algebra generated by A. If

1. Ω ∈ A

2. (stability by complement) for all A ∈ A , we have A c = Ω \ A ∈ A

3. (stability by intersection) for all A, B ∈ A , we have A ∩ B ∈ A

4. µ is σ-additive and σ-finite

then there exists a unique σ-additive measure µext on (Ω, σ(A)) such that µext = µ on A .

Proof. See for instance [4]. The uniqueness can be deduced from Corollary 1.8.4. ■

Chapter 2

Processes, filtrations, stopping times, martingales

A stochastic process or process is a family of random variables X = (X t )t ≥0 , indexed by a parameter


t ∈ R+ interpreted as a time, defined on a probability space (Ω, F , P), and taking values in some measurable
space (G, B). By default a process takes real values. In general G is a metric space, with distance denoted d ,
complete, separable, and B is its Borel σ-algebra.

2.1 Measurability

The natural filtration of a process (X t )t ≥0 is the increasing family (Ft )t ≥0 of sub-σ-algebras of F defined
for all t ≥ 0 by Ft = σ(X s : 0 ≤ s ≤ t ). More generally, an increasing family (Ft )t ≥0 of sub-σ-algebras of F is
called a filtration. For a given filtration (Ft )t ≥0 on (Ω, F , P), we say that the process X is. . .

• real when G = R in other words X takes real values (this is the default in this course)

• d -dimensional when G = Rd in other words X takes its values in Rd , d ≥ 1

• issued from the origin when X 0 = 0 (makes sense when G is a vector space)

• adapted when for all t ≥ 0, X t is Ft measurable

• measurable when for all t ≥ 0, (s, ω) ∈ [0, t ] × Ω 7→ X s (ω) is B[0,t ] ⊗ F measurable

• progressive when for all t ≥ 0, (s, ω) ∈ [0, t ] × Ω 7→ X s (ω) is B[0,t ] ⊗ Ft measurable

• right-continuous (respectively left-continuous, continuous) when for almost all ω ∈ Ω, the sample
path t ∈ R+ 7→ X t (ω) ∈ G is right-continuous (respectively left-continuous, continuous)

• square integrable when for all t ≥ 0, E(X_t²) < ∞

• bounded in Lp , p ≥ 1, when supt ≥0 E(|X t |p ) < ∞

• bounded when there exists a finite C > 0 such that almost surely, supt ≥0 |X t | ≤ C

• locally bounded when for almost all ω ∈ Ω and all t ≥ 0, sups∈[0,t ] |X s (ω)| < ∞

• of finite variation when almost surely t 7→ X t is of bounded variation on all finite intervals of R+ ,
equivalently is the difference of two positive increasing processes, see Theorem 1.7.2

• Feller continuous when x 7→ E( f (X t ) | X 0 = x) is continuous for all t ≥ 0 and bounded continuous f .

Theorem 2.1.1. Progressive σ-field and progressive processes.

1. The family P of all A ∈ F ⊗BR+ such that the process (ω, t ) 7→ 1(ω,t )∈A is progressive is a σ-field
on Ω × R+ called the progressive σ-field. Moreover the following properties hold:

• For all A ⊂ Ω × R+ , we have A ∈ P if and only if for all t ≥ 0, A ∩ (Ω × [0, t ]) ∈ Ft ⊗ B[0,t ] .


• A process X = (X t )t ≥0 is progressive if and only if it is measurable with respect to the
progressive σ-algebra P on Ω × R+ as a random variable X : (ω, t ) ∈ Ω × R+ 7→ X t (ω)


2. If X = (X t )t ≥0 is adapted right-continuous or left-continuous defined on a filtered probability


space (Ω, F , (Ft )t ≥0 , P) and taking its values in a metric space (E , d ) equipped with its Borel
σ-algebra, then X is progressive. In particular continuous adapted implies progressive.

Proof.

1. Exercise

2. We give the proof in the right-continuous case, the left-continuous case being entirely similar. For all
n ≥ 1, t > 0, s ∈ [0, t ], we define the random variable
X_s^n = X_{kt/n}   if s ∈ [(k − 1)t/n, kt/n), 1 ≤ k ≤ n,
X_s^n = X_t        if s = t.

Since (X t )t ≥0 is right-continuous, it follows that X s (ω) = limn→∞ X sn (ω) for all t > 0 and s ∈ [0, t ] and
all ω ∈ Ω. On the other hand, for every Borel subset A of E ,
{(ω, s) ∈ Ω × [0, t] : X_s^n(ω) ∈ A} = ({X_t ∈ A} × {t}) ∪ ⋃_{k=1}^n ({X_{kt/n} ∈ A} × [(k − 1)t/n, kt/n)).

Since (X t )t ≥0 is adapted, this set belongs to Ft ⊗ B[0,t ] . Therefore, for all n ≥ 1, the function (ω, s) ∈
Ω × [0, t ] 7→ X sn (ω) is measurable for Ft ⊗ B[0,t ] . Now a pointwise limit of measurable functions is
measurable, and therefore the function (ω, s) ∈ Ω × [0, t ] 7→ X s (ω) is also measurable for Ft ⊗ B[0,t ] ,
which means, since t > 0 is arbitrary, that (X t )t ≥0 is progressive.

A process X = (X t )t ≥0 taking its values in Rd can be seen as a random variable taking its values in the
“path space” P (R+ , Rd ) of functions from R+ to Rd . The measurability is for free if we equip P (R+ , Rd ) with
the σ-algebra AP (R+ ,Rd ) generated by the cylindrical events

{ f ∈ P (R+ , Rd ) : f (t 1 ) ∈ I 1 , . . . , f (t n ) ∈ I n }

where n ≥ 1, t_1, . . . , t_n ∈ R+, and where I_1, . . . , I_n are products of intervals in Rd of the form ∏_{i=1}^d (a_i, b_i]. Unfortunately P (R+ , Rd ) is so big that A_{P (R+ ,Rd )} turns out to be too small, and does not contain for instance events of interest such as { f ∈ P (R+ , Rd ) : sup_{t∈[0,1]} f (t) < 1}.
We focus in this course on continuous processes. This suggests to consider C (R+ , Rd ) and the σ-algebra A_{C (R+ ,Rd )} generated by the cylindrical events { f ∈ C (R+ , Rd ) : f (t_1) ∈ I_1 , . . . , f (t_n) ∈ I_n } where n ≥ 1, t_1, . . . , t_n ∈ R+, and I_1, . . . , I_n are products of intervals in Rd of the form ∏_{i=1}^d (a_i, b_i]. We have then the following:

Theorem 2.1.2. What a wonderful world.

On C (R+ , Rd ), the following σ-algebras coincide:

• σ-algebra AC (R+ ,Rd ) generated by the cylindrical events

• Borel σ-algebra BC (R+ ,Rd ) generated by the open sets of the topology of uniform convergence
on compact intervals of R+ .

Proof. Take d = 1 for simplicity. It can be shown that C (R+ , R) equipped with the distance

d(f, g) = Σ_{n=1}^∞ 2^{−n} (1 ∧ max_{t∈[0,n]} |f(t) − g(t)|)

is a Polish space, in other words a complete and separable metric space, and the associated topology is the one of uniform convergence on compact subsets of R+. First we have the inclusion A_{C (R+ ,R)} ⊂ B_{C (R+ ,R)} since the σ-algebra A_{C (R+ ,R)} is generated by the cylinders

{ f ∈ C (R+ , R) : f (t 1 ) < a 1 , . . . , f (t n ) < a n }, n ≥ 1, t 1 , . . . , t n ∈ R+ , a 1 , . . . , a n ∈ R,


which are open subsets. Conversely, for all g ∈ C (R+ , R), all n ≥ 1, and all r > 0,

{ f ∈ C (R+ , R) : max | f (t ) − g (t )| ≤ r } = ∩t ∈Q∩[0,n] { f ∈ C (R+ , R) : | f (t ) − g (t )| ≤ r }


t ∈[0,n]

belongs to AC (R+ ,R) , and since these sets generate BC (R+ ,Rd ) , we get AC (R+ ,Rd ) = BC (R+ ,R) . ■
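This metric is straightforward to approximate numerically; here is a hedged Python sketch of ours (the series is truncated and each max over [0, n] is taken on a finite grid, so it only approximates d).

```python
def path_distance(f, g, terms=30, grid=1_000):
    """Approximation of d(f, g) = sum_{n>=1} 2^{-n} (1 ∧ max_{[0,n]} |f - g|):
    truncate the series at `terms` and evaluate each max on a uniform grid."""
    total = 0.0
    for n in range(1, terms + 1):
        m = max(abs(f(n * k / grid) - g(n * k / grid)) for k in range(grid + 1))
        total += 2.0 ** (-n) * min(1.0, m)
    return total

# Two constant paths at sup-distance 1: every term equals 2^{-n}, so the
# truncated sum is 1 - 2^{-30}, essentially 1.
print(path_distance(lambda t: 0.0, lambda t: 1.0))  # close to 1.0
```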

Theorem 2.1.3. Continuous processes as random variables on path space.

Let X = (X t )t ≥0 be a continuous d -dimensional process defined on (Ω, F , P). Let Ω′ ∈ F such that
P(Ω′ ) = 1 and Ω′ ⊂ {X • ∈ C (R+ , Rd )}. Then the map X |Ω′ : ω ∈ Ω′ → X • (ω) ∈ C (R+ , Rd ) is measurable with respect to the σ-algebras F ′ = {A ∩ Ω′ : A ∈ F } and B_{C (R+ ,Rd )}.

Proof. Let us consider an arbitrary cylindrical event

F = { f ∈ C (R+ , Rd ) : f (t_1) ∈ I_1 , . . . , f (t_n) ∈ I_n },

where n ≥ 1, t_1, . . . , t_n ∈ R+, and I_1, . . . , I_n are products of intervals of the form ∏_{i=1}^d (a_i, b_i]. Then

Ω′ ∩ {X • ∈ F } = Ω′ ∩ {X_{t_1} ∈ I_1 , . . . , X_{t_n} ∈ I_n } ∈ F ′ .

Now BC (R+ ,Rd ) is generated by cylindrical events (Theorem 2.1.2). ■

Remark 2.1.4. Equality of processes, modification and indistinguishability.

Two processes X = (X t )t ≥0 and Y = (Y t )t ≥0 defined on the same probability space (Ω, F , P) are indis-
tinguishable when for almost all ω ∈ Ω the sample paths t 7→ X t (ω) and t 7→ Y t (ω) coincide, namely

P(∀t ≥ 0 : X t = Y t ) = 1.

There is a weaker notion in which the almost sure event depends on time, namely we say that Y is a
modification of X if for all t ≥ 0 the event Ωt = {ω ∈ Ω : X t (ω) ̸= Y t (ω)} is negligible, in other words

∀t ≥ 0 : P(X t = Y t ) = 1.

If X and Y are continuous then the two notions of indistinguishable and modification coincide.

If X = (X_t)_{t≥0} and Y = (Y_t)_{t≥0} are two processes taking values in Rd with the same finite dimensional marginal distributions, in the sense that for all n ≥ 1 and all t_1, . . . , t_n ∈ R+, the random vectors (X_{t_1}, . . . , X_{t_n}) and (Y_{t_1}, . . . , Y_{t_n}) have the same law in (Rd)^n, then X and Y have the same law as random variables on the path space (P (R+ , Rd ), A_{P (R+ ,Rd )}). The following theorem provides a sort of converse, stated when d = 1 for simplicity.

Theorem 2.1.5. Kolmogorov extension theorem.

For all n ≥ 1 and all t ∈ Rn with 0 ≤ t 1 ≤ · · · ≤ t n , let µt1 ,...,tn be a probability measure on Rn . Let us
assume the following consistency condition:

• for all n ≥ 1, t ∈ Rn with 0 ≤ t 1 ≤ · · · ≤ t n , and all A 1 , . . . , A n−1 ∈ BR , we have

µt1 ,...,tn (A 1 × · · · × A n−1 × R) = µt1 ,...,tn−1 (A 1 × · · · × A n−1 ).

Then there exists a unique probability measure µ on the path space (P (R+ , R), AP (R+ ,R) ) such that
for all n ≥ 1, all t ∈ Rn with 0 ≤ t 1 ≤ · · · ≤ t n , and all A 1 , . . . , A n ∈ BR , we have

µ(πt1 ∈ A 1 , . . . , πtn ∈ A n ) = µt1 ,...,tn (A 1 × · · · × A n ),

where πt (ω) = ωt , namely πt : ω ∈ P (R+ , R) 7→ ωt ∈ R for all t ≥ 0.


Proof. For a cylindrical event A_{t_1,...,t_n}(B) = { f ∈ P (R+ , R) : ( f (t_1), . . . , f (t_n)) ∈ B } where n ≥ 1, t ∈ Rn with 0 ≤ t_1 ≤ · · · ≤ t_n, and where B ∈ B_{Rn}, we define µ(A_{t_1,...,t_n}(B)) = µ_{t_1,...,t_n}(B). This makes sense thanks to the consistency condition. Note that we could drop the ordering on the coordinates of t by defining µ_{t_1,...,t_n} = µ_{t_{(1)},...,t_{(n)}} where t_{(1)} ≤ · · · ≤ t_{(n)} is the reordering. Moreover µ(P (R+ , R)) = 1. Since the set of cylinders satisfies the assumptions of the Carathéodory extension theorem (Theorem 1.8.5), and generates the σ-algebra A_{P (R+ ,R)}, it remains to show that µ is σ-additive, which is the difficult part of the proof. See for instance [4]. ■

2.2 Completeness

Contrary to discrete processes, continuous processes lead naturally to measurability issues.


In a probability space (Ω, F , P), we say that A ⊂ Ω is negligible when there exists A ′ ∈ F with A ⊂ A ′ and P(A ′ ) = 0. We say that (Ω, F , P) is complete when F contains the negligible subsets of Ω.
A filtration (Ft )t ≥0 on (Ω, F , P) is complete when F0 contains the negligible subsets of F .
Completeness emerges naturally via almost sure events which are complement of negligible subsets.

Theorem 2.2.1. Measurability of running supremum from completeness.

Let (X t )t ≥0 be a continuous process defined on a probability space (Ω, F , P) and taking values in a
topological space E equipped with its Borel σ-field E . Let f : E → R be a continuous function.

• If (Ω, F , P) is complete then sups∈[0,t ] f (X s ) is measurable for all t ≥ 0.

• If X is adapted with respect to a complete filtration (Ft )t ≥0 then (sups∈[0,t ] f (X s ))t ≥0 is adapted.

Proof. Let Ω′ ∈ F be an almost sure event on which X is continuous. Set S t = sups∈[0,t ] f (X s ).

• For all t ≥ 0 and all Borel subsets A ⊂ R, we have

   Ω′ ∩ {S_t ∈ A} = Ω′ ∩ { sup_{s∈[0,t]∩Q} f(X_s) ∈ A } ∈ F ,

   while (Ω \ Ω′ ) ∩ {S_t ∈ A} ⊂ Ω \ Ω′ is negligible and thus belongs to F by completeness of (Ω, F , P).

• Same argument as before with Ft instead of F .

The notion of completeness is relative to the probability measure P. There is also a notion of universal completeness, see [9], that does not depend on the probability measure, but we do not use it in these notes.

2.3 Stopping times

Definition 2.3.1. Stopping time.

A map T : Ω → [0, +∞] is a stopping time or optional time for a filtration (Ft )t ≥0 on (Ω, F , P) when
{T ≤ t } ∈ Ft for all t ≥ 0. All constant non-negative random variables are stopping times.

Contrary to discrete time filtrations, the notion of stopping times for continuous time filtration leads
naturally to the notions of complete filtration and right continuous filtration.
In a probability space (Ω, F , P), we say that A ⊂ Ω is negligible when there exists A ′ ∈ F with A ⊂ A ′
and P(A ′ ) = 0. A filtration (Ft )t ≥0 on (Ω, F , P) is complete when F0 contains the negligible subsets of F , in
particular all almost sure events. We say then that (Ω, F , (Ft )t ≥0 , P) is a complete filtered probability space.

Theorem 2.3.2. Hitting times as archetypal examples of stopping times.

Let X = (X t )t ≥0 be a continuous and adapted process on a probability space (Ω, F , P) with respect to
a complete filtration (Ft )t ≥0 , and taking its values in a metric space G equipped with its Borel σ-field.


Then, for all closed subset A ⊂ G, the hitting time T A : Ω → [0, +∞] of A, defined by

T A = inf{t ≥ 0 : X t ∈ A},

with convention inf ∅ = +∞, is a stopping time.

For instance Tn = T[n,∞) = inf{t ≥ 0 : |X t | ≥ n} when G = Rd .

Proof. Let Ω′ be the almost sure event on which X is continuous. On Ω′ , since X is continuous and A is
closed, we have {t ≥ 0 : X t ∈ A} = {t ≥ 0 : dist(X t , A) = 0}, the map t ≥ 0 7→ dist(X t , A) is continuous, and the
inf in the definition of T A is a min. Now, since X is adapted, we have, for all t ≥ 0,

Ω′ ∩ {T_A ≤ t} = Ω′ ∩ { inf_{s∈[0,t]∩Q} dist(X_s , A) = 0 } ∈ Ft ,

where we have also used the fact that Ω′ ∈ Ft for all t ≥ 0 since (Ft )t ≥0 is complete. On the other hand,
(Ω \ Ω′ ) ∩ {T A ≤ t } ⊂ Ω \ Ω′ is negligible, and belongs then to Ft for all t ≥ 0 since (Ft )t ≥0 is complete. ■

We say that a filtration (Ft )t ≥0 is right-continuous when Ft = Ft + for all t ≥ 0 where

Ft+ = ⋂_{ε>0} F_{t+ε} = ⋂_{s>t} Fs .

Theorem 2.3.3. Stopping times: alternative definition.

If T : Ω → [0, +∞] is a stopping time with respect to a filtration (Ft )t ≥0 then {T < t } ∈ Ft for all t ≥ 0.
Conversely this property implies that T is a stopping time when the filtration is right-continuous.

Proof. If T is a stopping time then for all t ≥ 0 we have



{T < t} = ⋃_{n=1}^∞ {T ≤ t − 1/n} ∈ Ft ,

(note also that {T = t } = {T ≤ t } ∩ {T < t }c ∈ Ft ). Conversely {T ≤ t } ∈ ∩s>t Fs = Ft + since for all s > t ,

{T ≤ t} = ⋂_{n=1}^∞ {T < (t + 1/n) ∧ s} ∈ Fs .

This can be skipped at first reading.

The following generalizes Theorem 2.3.2 to hitting times of arbitrary measurable subsets by progressive
processes, at the price of assuming right continuity of the filtration in addition to completeness.

Theorem 2.3.4: Hitting times are stopping times reloaded.

Let X = (X t )t ≥0 be a progressive process defined on a probability space (Ω, F , P) equipped


with a right continuous and complete filtration (Ft )t ≥0 , and taking its values in a measurable
space G. Then for all measurable subset A ⊂ G, the hitting time T A : Ω → [0, +∞] defined by

T A = inf{t ≥ 0 : X t ∈ A},

with convention inf ∅ = +∞, is a stopping time.

Examples of progressive processes include adapted right-continuous processes.


Proof. The debut D B of any B ∈ F ⊗ B(R+ ) is defined for all ω ∈ Ω by

D B (ω) = inf{t ≥ 0 : (ω, t ) ∈ B } ∈ [0, +∞].

If B is progressive, then D B is a stopping time (this is known as the debut theorem). Indeed, for all
t ≥ 0 the set {D B < t } is then the projection on Ω of B ∩ (Ω × [0, t )), which belongs to Ft ⊗ B([0, t ))
since B is progressive. Since the filtration is right-continuous and complete, this projectiona belongs
to Ft . Now {D B < t } ∈ Ft for all t ≥ 0 implies that D B is a stopping time since the filtration is right
continuous (Theorem 2.3.3). Finally it remains to note that T A = D B with B = {(ω, t ) : X t (ω) ∈ A}, which
is progressive as the pre-image of A by the map (ω, t ) 7→ X t (ω) (recall that X is progressive). ■
a See [9, Th. IV.50 page 116]. This is related to a famous mistake made by the French Henri Lebesgue (1875 – 1941) on the

measurability of projections of measurable sets in product spaces, that motivated the Russian Nikolai Luzin (1883 – 1950)
and his student Mikhail Yakovlevich Suslin (1894 – 1919) to forge the concept of analytic set and descriptive set theory.

Remark 2.3.5. Canonical filtration.

It is customary to assume that the underlying filtration is right-continuous and complete. For a given
filtration (Ft )t ≥0 , it is always possible to consider its completion (σt )t ≥0 = (σ(N ∪ Ft ))t ≥0 where
N is the collection of negligible subsets of F . It is also customary to consider the right-continuous
version (σt + )t ≥0 , called the canonical filtration. A process is always adapted with respect to the
canonical filtration constructed from its completed natural filtration.

From now on and unless otherwise stated we make the “canonical assumption”:
we assume that the underlying filtration is complete and right-continuous.

Remark 2.3.6. Subtleties about right-continuity of filtrations.

The natural filtration of a right-continuous process is not right-continuous in general, indeed a


counter example is given by X t = t Z for all t ≥ 0 where Z is a non-constant random variable, since
σ(X 0 ) = {∅, Ω} while σ(X 0+ε : ε > 0) = σ(Z ) ̸= σ(X 0 ). However it can be shown that the completion
of the natural filtration of a “Feller Markov process” – including all Lévy processes and in particular
Brownian motion – is always right-continuous.

Theorem 2.3.7. Stopping times properties.

Let S, T , and Tn , n ≥ 0 be stopping times for some underlying filtration (Ft )t ≥0 on an underlying
probability space (Ω, F , P). Then:
1. the following family is a σ-algebra called the stopping σ-algebra:
FT = {A ∈ F : ∀t ≥ 0, A ∩ {T ≤ t } ∈ Ft }.
Moreover the stopping time T is FT -measurable

2. if X = (X t )t ≥0 is adapted then the stopped process X T = (X t ∧T )t ≥0 is also adapted. Moreover


(X T )S = X S∧T = (X S )T

3. if (X t )t ≥0 is adapted and progressive and if T is a.s. finite then X T = (X t ∧T )t ≥0 is progressive

4. if X = (X t )t ≥0 is adapted and right-continuous then Z = X T 1T <∞ is FT -measurable

5. if S ≤ T then FS ⊂ FT

6. S ∧ T and S ∨ T are stopping times and in particular FS∧T ⊂ FS∨T

7. if (Ft )t ≥0 is right-continuous then lim sup_n Tn and lim inf_n Tn are stopping times and
⋂_n F_{Tn} = F_{inf_n Tn} .


Proof. The proofs of the first three items are left as exercises.

4. Let B ∈ BR and t ≥ 0. Then we have:

{Z ∈ B } ∩ {T ≤ t } = {X T ∧t ∈ B } ∩ {T ≤ t }.

Now we consider the composition of measurable maps:

ω ∈ (Ω, Ft ) 7→ (T (ω) ∧ t , ω) ∈ ([0, t ] × Ω, B_{[0,t]} ⊗ Ft ) 7→ X_{T (ω)∧t} (ω) ∈ (R, BR )

and we use the fact that X is progressive.

5. If A ∈ FS then, for all t ≥ 0, A ∩ {T ≤ t } = A ∩ {S ≤ t } ∩ {T ≤ t } ∈ Ft , hence A ∈ FT .

6. For all t ≥ 0 we have

{S ∧ T > t } = {S > t } ∩ {T > t } ∈ Ft and {S ∨ T ≤ t } = {S ≤ t } ∩ {T ≤ t } ∈ Ft .

7. It suffices to show that sup_n Tn and inf_n Tn are stopping times. But

{sup_n Tn ≤ t } = ⋂_n {Tn ≤ t } ∈ Ft and {inf_n Tn < t } = ⋃_n {Tn < t } ∈ Ft

and therefore

{inf_n Tn ≤ t } = ⋂_{ε>0} {inf_n Tn < t + ε} ∈ F_{t+} = Ft .

Let A ∈ ⋂_n F_{Tn} . Then

A ∩ {inf_n Tn < t } = ⋃_n (A ∩ {Tn < t }) ∈ Ft .

Therefore

A ∩ {inf_n Tn ≤ t } ∈ F_{t+} = Ft . ■

Remark 2.3.8. Truncation via cutoff stopping times for continuous processes.

Truncation is an important tool in probability theory, and allows for instance to prove the strong law
of large numbers for i.i.d. integrable random variables by reduction to the case of more integrable
random variables. This tool is also available for stochastic processes, and its version with cutoff
stopping times has the advantage of keeping the martingale structure (Doob stopping, Theorem 2.5.1).
Let X = (X t )t ≥0 be adapted. For all n we introduce the “truncation” or “cutoff” stopping time

Tn = inf{t ≥ 0 : |X t | ≥ n},

which takes its values in [0, +∞]. We have Tn ≤ Tn+1 for all n. If X is continuous then, almost surelya,

Tn ↗ +∞ as n → ∞.

Still if X is additionally continuous then almost surely and for all n ≥ 1 and all t ≥ 0,

|X t ∧Tn | ≤ n1|X 0 |≤n + |X 0 |1|X 0 |>n .

If X 0 = 0 then the process |X Tn | is bounded by n for all n ≥ 1. This is useful in this courseb .
a Indeed, almost surely, either the trajectory of X is bounded and then Tn = +∞ for n large enough beyond a (random)
threshold, or the trajectory of X is unbounded and then by definition of being continuous and unbounded we have Tn ↗
+∞ as n → ∞. Without continuity, X t could take arbitrarily large values near a finite time, forcing (Tn )n to be bounded.
b Localization is efficient for continuous processes issued from the origin. If X is discontinuous and in particular if it is

a discrete time process, then, due to a possible jump at time Tn , we could have |X Tn | > n even if X 0 = 0 and n is large.
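As a quick numerical illustration (not part of the notes; the helper cutoff_time and all parameters are ours), one can approximate a continuous trajectory by a fine Gaussian random walk and observe that the cutoff times Tn are non-decreasing in n:

```python
# Approximate a 1-d continuous path by a fine Gaussian random walk and
# compute the cutoff times Tn = inf{t >= 0 : |X_t| >= n} on a finite horizon.
import numpy as np

rng = np.random.default_rng(1)
dt, horizon = 1e-4, 10.0
steps = np.sqrt(dt) * rng.standard_normal(int(horizon / dt))
path = np.concatenate([[0.0], np.cumsum(steps)])  # X_0 = 0

def cutoff_time(path, n, dt):
    """First time |X| reaches level n, or np.inf if it never does."""
    hit = np.nonzero(np.abs(path) >= n)[0]
    return hit[0] * dt if hit.size > 0 else np.inf

times = [cutoff_time(path, n, dt) for n in range(1, 6)]
print(times)  # non-decreasing in n
```

On a bounded observation window, levels that are never reached simply give Tn = +∞, consistently with the convention inf ∅ = +∞.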


2.4 Martingales, sub-martingales, super-martingales

We restrict for simplicity to continuous martingales/sub-martingales/super-martingales. But many of


the results remain actually valid for right-continuous martingales/sub-martingales/super-martingales.
The notion of martingale implements the idea of updating with a conditionally independent ingredient.

Definition 2.4.1. Martingales, sub-martingales, super-martingales.

Let X = (X t )t ≥0 be a real adapted and integrable process in the sense that for all t ≥ 0, X t is
measurable with respect to Ft and X t ∈ L1 . Then, when

• E(X t | Fs ) ≥ X s for all t ≥ 0 and all s ∈ [0, t ], we say that X is a sub-martingale,

• E(X t | Fs ) = X s for all t ≥ 0 and all s ∈ [0, t ], we say that X is a martingale

• E(X t | Fs ) ≤ X s for all t ≥ 0 and all s ∈ [0, t ], we say that X is a super-martingale.

These three notions can be seen in a sense as a probabilistic counterpart of the notions of increasing
sequence, constant sequence, and decreasing sequence in basic classical analysis.

• For a sub-martingale, t 7→ E(X t ) grows and in particular E(X t ) ≥ E(X 0 ) for all t ≥ 0

• For a martingale, t 7→ E(X t ) is constant, namely E(X t ) = E(X 0 ) for all t ≥ 0. It is a conservation law

• For a super-martingale, t 7→ E(X t ) decreases and in particular E(X t ) ≤ E(X 0 ) for all t ≥ 0.

The set of martingales is the intersection of the set of sub-martingales and the set of super-martingales.
A super-martingale or sub-martingale is a martingale if and only if its expectation is constant along time.
Being a martingale for a given filtration is a property stable by linear combinations.
If M is a martingale and if (t n )n≥0 is a strictly increasing sequence of times then the sequence of random
variables (M tn )n≥0 is a discrete time martingale. We will try to avoid using discrete time martingales, but we
will sometimes discretize time, notably to handle stopping times, which is roughly the same. The theory of
discrete time martingales is similar to the theory of continuous time martingales that we develop here and
comes with very similar theorems. In this course, most stochastic processes are in continuous time, and
when we say “continuous process/martingale/etc”, we mean that the process has continuous sample paths.

Example 2.4.2. Martingales.

1. If Y ∈ L1 then the process (X t )t ≥0 defined by X t = E(Y | Ft ) for all t ≥ 0 is a martingale with
respect to (Ft )t ≥0 known as the Doob martingale or a closed martingale. It is uniformly
integrable. Corollary 4.4.5 provides a sort of converse (u.i. martingales are closed)

2. If (X t )t ≥0 is a martingale and if ϕ : R → R is convex and such that ϕ(X t ) ∈ L1 for all t ≥ 0, then by
the Jensen inequality for conditional expectation, (Y t )t ≥0 = (ϕ(X t ))t ≥0 is a sub-martingale for
the same filtration. In particular (|X t |)t ≥0 , (X t2 )t ≥0 , and (e X t )t ≥0 are sub-martingales

3. If (X t )t ≥0 is a sub-martingale and if ϕ : R → R is convex and non-decreasing such that ϕ(X t ) ∈ L1


for all t ≥ 0, then by the Jensen inequality for conditional expectation, (Y t )t ≥0 = (ϕ(X t ))t ≥0 is a
sub-martingale for the same filtration. In particular (e X t )t ≥0 is a sub-martingale

4. A martingale X = (X t )t ≥0 is also a martingale for its natural filtration (σ(X s : s ∈ [0, t ]))t ≥0

5. If (E n )n≥1 are independent and identically distributed exponential random variables of mean
1/λ, then, for all t ≥ 0, the number of arrival times E 1 + · · · + E n falling in the interval [0, t ] is
N t = card{n ≥ 1 : E 1 + · · · + E n ≤ t }. It is known that the counting process (N t )t ≥0 has independent
and stationary increments of Poisson law, namely for all n ≥ 1 and 0 = t 0 ≤ · · · ≤ t n , the random
variables N t1 − N t0 , . . . , N tn − N tn−1 are independent of law Poi(λ(t 1 − t 0 )), . . . , Poi(λ(t n − t n−1 )).
We say that (N t )t ≥0 is the simple Poisson process of intensity λ. Now for the (natural) filtration


(Ft )t ≥0 , Ft = σ(N s : 0 ≤ s ≤ t ), and for all c ∈ R, the process (N t − c t )t ≥0 is a sub-martingale if


c < λ, a martingale if c = λ, and a super-martingale if c > λ. Namely, for all 0 ≤ s ≤ t ,

E(N t − ct | Fs ) = E(N t − N s − c(t − s) + N s − c s | Fs )


= E(N t − N s ) − c(t − s) + N s − c s
= (λ − c)(t − s) + N s − c s.

This process is not continuous, but has right-continuous and left limited trajectories (càdlàga ).

6. If (N t )t ≥0 is the simple Poisson process of intensity λ as above, then, for all 0 ≤ s ≤ t ,

E(e^{N_t − ct} | Fs ) = e^{N_s − cs} E(e^{N_t − N_s}) e^{−c(t−s)} = e^{N_s − cs} e^{λ(t−s)(e−1) − c(t−s)} .

It follows that for the natural filtration of (N t )t ≥0 , the process (e^{N_t − ct})_{t≥0} is a sub-martingale if
c < λ(e−1), a martingale if c = λ(e−1), and a super-martingale if c > λ(e−1). We often say that
(e^{N_t − ct})_{t≥0} is an exponential (sub/super-)martingale.

7. The Brownian motion (B t )t ≥0 of Chapter 3 has independent and stationary Gaussian increments:
for all n ≥ 1 and 0 = t 0 ≤ · · · ≤ t n the random variables B t1 − B t0 , . . . , B tn − B tn−1 are independent
of law N (0, t 1 − t 0 ), . . . , N (0, t n − t n−1 ). Thus the process (B t )t ≥0 is a martingale for its
natural filtration, indeed, for all 0 ≤ s ≤ t ,

E(B t | Fs ) = E(B t − B s + B s | Fs ) = E(B t − B s ) + B s = B s .

This process has continuous trajectories. Moreover and similarly, for all c ∈ R, the process
(B t2 − ct )t ≥0 is a sub-martingale if c < 1, a martingale if c = 1, and a super-martingale if c > 1.
The key is to use the decomposition B t2 = (B t −B s )2 +2B s B t −B s2 . We can also study the process
eB t −c t and seek for a condition on c to get a martingale, and we speak about an exponential
martingale. For simplicity, most of the martingales encountered in this course are continuous.
a Càdlàg: continu à droite avec limites à gauche, French for right continuous with left limits.
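The Poisson computations above can be checked by simulation. Below is a minimal sketch (assuming numpy; the helper poisson_count is ours) that builds N_t from i.i.d. exponential inter-arrival times and checks the conservation law E(N_t − λt ) = 0 of the martingale case c = λ:

```python
# Simulate the simple Poisson process of intensity lam by cumulating i.i.d.
# exponential inter-arrival times of mean 1/lam, and check E(N_t) ~ lam * t,
# in line with (N_t - lam * t)_{t >= 0} being a centered martingale.
import numpy as np

rng = np.random.default_rng(2)
lam, t, n_samples = 3.0, 2.0, 10_000

def poisson_count(lam, t):
    """N_t = number of arrival times E_1 + ... + E_n falling in [0, t]."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t:
            return count
        count += 1

counts = np.array([poisson_count(lam, t) for _ in range(n_samples)])
print(counts.mean())  # close to lam * t = 6.0
```

The same simulation can be used to test the exponential normalization c = λ(e − 1) by averaging e^{N_t − ct} instead of N_t − λt.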

2.5 Doob stopping theorem and maximal inequalities

Stopped martingales are martingales, and the conservation law extends to stopping times:

Theorem 2.5.1. Dooba stopping theorem.

If M is a continuous martingale and T : Ω → [0, +∞] is a stopping time then M T = (M t ∧T )t ≥0 is a
(continuous) martingale, namely for all t ≥ 0 and s ∈ [0, t ], we have

M t ∧T ∈ L1 and E(M t ∧T | Fs ) = M s∧T .

Moreover, if T is bounded, or if T is almost surely finite and (M t ∧T )t ≥0 is u.i.b , then

M T ∈ L1 and E(M T ) = E(M 0 ).

a Named after Joseph L. Doob (1910 – 2004), American mathematician.
b For instance dominated by an integrable random variable, or even bounded by a constant.

In practice, the best is to retain that (M t ∧T )t ≥0 is a martingale. We have limt →∞ M T ∧t 1T <∞ = M T 1T <∞
a.s. When T < ∞ a.s. we could use what we know on M and T to deduce by monotone or dominated
convergence that this holds in L1 , giving E(M T ) = E(limt →∞ M t ∧T ) = limt →∞ E(M t ∧T ) = E(M 0 ). Theorem
2.5.1 states that this is automatically the case when T is bounded or when M T is u.i. Furthermore, if M T is
u.i. then it can be shown that M ∞ exists, giving a sense to M T even on {T = ∞}, and then E(M T ) = E(M 0 ).


Proof. Let us assume first that T takes a finite number of values t 1 < · · · < t n . Let us show that M T ∈ L1 and
E(M T ) = E(M 0 ). We have M_T = ∑_{k=1}^n M_{t_k} 1_{T = t_k} ∈ L1 , and moreover, using {T ≥ t_k} = (⋃_{i=1}^{k−1} {T = t_i})^c ∈ F_{t_{k−1}}
and the martingale property E(M_{t_k} − M_{t_{k−1}} | F_{t_{k−1}}) = 0, for all k, we get

E(M_T ) = E(M_0 ) + E( ∑_{k=1}^n E(M_{t_k} − M_{t_{k−1}} | F_{t_{k−1}}) 1_{T ≥ t_k} ) = E(M_0 ).

Suppose now that T takes an infinite number of values but is bounded by some constant C . For all n ≥ 0,
we approximate T by the piecewise constant random variable (discretization of [0,C ])1

Tn = C 1_{T = C} + ∑_{k=1}^n t_k 1_{t_{k−1} ≤ T < t_k} where t_k = t_{n,k} = kC /n.

This is a stopping time since it takes discrete values and, for all m ≥ 0,

{Tn = m} = ∅ ∈ F_0 if m ∉ {t_k : 1 ≤ k ≤ n},
{Tn = m} = {T = C } ∈ F_C if m = C ,
{Tn = m} = {T < t_{k−1}}^c ∩ {T < t_k} ∈ F_{t_k} if m = t_k , 1 ≤ k ≤ n,

where we used the fact that {T = t } = {T ≤ t } ∩ {T < t }^c = {T ≤ t } ∩ ⋂_{r=1}^∞ {T > t − 1/r } ∈ Ft for all t ≥ 0.
Since Tn takes a finite number of values, the previous step gives E(M Tn ) = E(M 0 ). On the other hand,
almost surely, Tn → T as n → ∞. Since M is continuous, it follows that almost surely M Tn → M T as n → ∞.
Let us show now that (M Tn )n≥1 is uniformly integrable. Since for all n ≥ 0, Tn takes its values in a finite set
t_1 < · · · < t_{m_n} ≤ C , the martingale property2 and the Jensen inequality give, for all R > 0,

E(|M_{Tn}| 1_{|M_{Tn}| ≥ R}) = ∑_k E(|M_{t_k}| 1_{|M_{t_k}| ≥ R, Tn = t_k})
= ∑_k E(|E(M_C | F_{t_k})| 1_{|M_{t_k}| ≥ R, Tn = t_k})
≤ ∑_k E(E(|M_C | | F_{t_k}) 1_{|M_{t_k}| ≥ R, Tn = t_k})
= ∑_k E(|M_C | 1_{|M_{t_k}| ≥ R, Tn = t_k})
= E(|M_C | 1_{|M_{Tn}| ≥ R}).

Now M is continuous and thus locally bounded, and MC ∈ L1 , thus, by dominated convergence,

sup_n E(|M_{Tn}| 1_{|M_{Tn}| > R}) ≤ E(|M_C | 1_{sup_{s∈[0,C ]} |M_s| ≥ R}) → 0 as R → ∞.

Therefore (M Tn )n≥0 is uniformly integrable. As a consequence

lim_{n→∞} M_{Tn} = M_T a.s. and in L1 , and E(M_T ) = lim_{n→∞} E(M_{Tn}) = E(M_0 ).

Let us suppose now that T is an arbitrary stopping time. For all 0 ≤ s ≤ t and A ∈ Fs , the random
variable S = s1 A + t 1 A c is a (finite) stopping time, and what precedes for the finite stopping time t ∧ T ∧ S
gives M t ∧T ∧S ∈ L1 and E(M t ∧T ∧S ) = E(M 0 ). Now, using the definition of S, we have

E(M 0 ) = E(M t ∧T ∧S ) = E(1 A M s∧T ) + E(1 A c M t ∧T ) = E(1 A (M s∧T − M t ∧T )) + E(M t ∧T ).

Since E(M t ∧T ) = E(M 0 ), we get E((M t ∧T − M s∧T )1 A ) = 0, namely the martingale property for (M t ∧T )t ≥0 .
Finally, suppose that T < ∞ a.s. and (M t ∧T )t ≥0 is u.i. The random variable M T is well defined and
limt →∞ M t ∧T = M T a.s. because M is continuous. Furthermore, since (M t ∧T )t ≥0 is u.i., it follows that M T ∈
L1 and limt →∞ M t ∧T = M T in L1 . In particular E(M 0 ) = E(M t ∧T ) = limt →∞ E(M t ∧T ) = E(M T ). ■
1 By using dyadics, we could define Tn in such a way that Tn ↘ T , giving M_{Tn} → M_T pointwise when M is right-continuous.
2 It also works for non-negative sub-martingales.


Example 2.5.2. Example of application of Doob stopping theorem.

Let (M t )t ≥0 be a continuous martingale, a < b, and T = inf{t ≥ 0 : M t ∈ {a, b}} the hitting time of the
boundary of [a, b]. Suppose that M 0 takes its values in [a, b] and that T is almost surely finitea . Then
on the one hand, we have the equation P(M T = a) + P(M T = b) = 1. On the other hand, by definition
of T the process (M t ∧T )t ≥0 is bounded and thus u.i. and the Doob stopping theorem (Theorem 2.5.1)
gives then x = E(M 0 ) = E(M T ) = a P(M T = a)+b P(M T = b). It follows by combining the equations that

P(M T = a) = (b − x)/(b − a) and P(M T = b) = (x − a)/(b − a)
(note that x ∈ [a, b]). This holds in particular for Brownian motion started from x ∈ [a, b], and by
using an exponential martingale, it is then even possible to compute the Laplace transform of T .
p
a Holds for BM B with B = 0 ∈ (a, b) since P(T = ∞) ≤ P(T > t ) ≤ P(B ∈ (a, b)) = P(
0 t t Z ∈ (a, b)) → 0 as t → ∞.

Coding in action 2.5.3. Gambler’s ruin.

Physically Brownian motion and the simple symmetric random walk are the same, it is just a matter
of scale. Fix a ≤ b in Z. Write a program to plot on the same graphic multiple trajectories of such a
random walk started from various values of x ∈ [a, b] and stopped when it reaches a or b. Could you
verify numerically the formulas of Example 2.5.2? And mathematically?
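A minimal numerical sketch for the verification part (assuming numpy; names are ours). Recording the successive positions instead of only the exit point yields the trajectories to plot; mathematically, the stopped walk is a bounded discrete time martingale, so the argument of Example 2.5.2 applies verbatim:

```python
# Simple symmetric random walk on Z started at x, stopped on hitting a or b,
# compared with the formula P(M_T = b) = (x - a) / (b - a) of Example 2.5.2.
import numpy as np

rng = np.random.default_rng(0)

def ruin(x, a, b):
    """Run one walk from x until it hits a or b; return the exit point."""
    while a < x < b:
        x += 1 if rng.random() < 0.5 else -1
    return x

a, b, x, n = 0, 10, 3, 20_000
hits_b = sum(ruin(x, a, b) == b for _ in range(n)) / n
print(hits_b, (x - a) / (b - a))  # empirical frequency vs martingale formula
```

With x = 3, a = 0, b = 10, the empirical frequency of exits at b should be close to (x − a)/(b − a) = 0.3.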

Remark 2.5.4. Counter example with an unbounded stopping time.

If M is a continuous martingale with M 0 = 0, then, for all a > 0, T a = inf{t > 0 : M t = a} is a stopping
time, but it cannot be bounded since this would give 0 = E(M 0 ) = E(M Ta ) = a > 0, a contradiction!

The following variant of the Doob stopping theorem is useful in many applications.

Theorem 2.5.5. Doob stopping theorem for sub-martingales.

If M is a continuous sub-martingale and S and T are bounded stopping times such that S ≤ T , M S ∈
L1 , and M T ∈ L1 , then E(M S ) ≤ E(M T ).

Proof. We proceed as in the proof of Theorem 2.5.1, by assuming first that S and T take their values in the
finite set {t 1 , . . . , t n } where t 1 < · · · < t n . In this case M T and M S are in L1 automatically. The inequality S ≤ T
gives 1S≥t ≤ 1T ≥t for all t . Using this fact and the sub-martingale property of M , we get
E(M_S ) = E(M_0 ) + E( ∑_{k=1}^n E(M_{t_k} − M_{t_{k−1}} | F_{t_{k−1}}) 1_{S ≥ t_k} ) ≤ E(M_0 ) + E( ∑_{k=1}^n E(M_{t_k} − M_{t_{k−1}} | F_{t_{k−1}}) 1_{T ≥ t_k} ) = E(M_T ),

where the inequality uses that each conditional expectation E(M_{t_k} − M_{t_{k−1}} | F_{t_{k−1}}) is ≥ 0 by the sub-martingale property.

More generally, when S and T are arbitrary bounded stopping times satisfying S ≤ T , and at least when M
is a non-negative sub-martingale, we can proceed by approximation as in the proof of Theorem 2.5.1. ■

This can be skipped at first reading.

Theorem 2.5.6: Doob stopping theorem for non-negative super-martingales

If M is a continuous non-negative super-martingale and S and T are stopping times such that
S ≤ T , then M S ∈ L1 and M T ∈ L1 and M S ≥ E(M T | FS ), in particular E(M S ) ≥ E(M T ).

When S and T are bounded, we recover Theorem 2.5.5 in the special case where M ≤ 0, by applying the present theorem to −M .

Proof. See for instance [31, Theorem 3.25 pages 64 – 65]. Note that since M is a non-negative super-
martingale, it is automatically bounded in L1 since 0 ≤ E(M t ) ≤ E(M 0 ) for all t ≥ 0. ■


The following theorem allows to control the tail of the supremum of a martingale over a time interval
by the moment at the terminal time. It is a continuous time martingale version of the simpler Kolmogorov
maximal inequality for sums of independent and identically distributed random variables.

Theorem 2.5.7. Doob maximal inequalities.

1. If M is a continuous martingale or a non-negative sub-martingale then for all p ≥ 1, t ≥ 0, λ > 0,

P( max_{s∈[0,t ]} |M_s| ≥ λ) ≤ E(|M_t |^p ) / λ^p .

2. If M is a continuous martingale then for all p > 1 and t ≥ 0,

E( max_{s∈[0,t ]} |M_s|^p ) ≤ (p/(p − 1))^p E(|M_t |^p ), in other words ‖ max_{s∈[0,t ]} |M_s| ‖_p ≤ (p/(p − 1)) ‖M_t ‖_p .

In particular if M t ∈ Lp then M t∗ = max_{s∈[0,t ]} |M_s| ∈ Lp .

Note that q = 1/(1 − 1/p) = p/(p − 1) is the Hölder conjugate of p in the sense that 1/p + 1/q = 1.
The Doob inequality is often used with p = 2, for which (p/(p − 1))p = 4.
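As a quick Monte Carlo illustration (not from the notes; names ours) of the case p = 2, one can test the inequality on the discrete martingale of partial sums of i.i.d. centered ±1 steps:

```python
# Monte Carlo check of the Doob L2 maximal inequality
# E(max_k |M_k|^2) <= 4 E(|M_n|^2) for the martingale of partial sums
# of i.i.d. centered +/-1 steps (a discrete analogue of the theorem).
import numpy as np

rng = np.random.default_rng(3)
n, n_samples = 200, 10_000
steps = rng.choice([-1.0, 1.0], size=(n_samples, n))
M = np.cumsum(steps, axis=1)                   # martingale paths M_1, ..., M_n
lhs = np.mean(np.max(np.abs(M), axis=1) ** 2)  # estimate of E(max_k |M_k|^2)
rhs = 4 * np.mean(M[:, -1] ** 2)               # estimate of 4 E(|M_n|^2) = 4n
print(lhs, rhs)
```

The empirical gap between the two sides is typically large here; this does not mean that the constant (p/(p − 1))^p can be improved in general.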

Proof. We can assume that the right hand side is finite (M t ∈ Lp ), otherwise the inequalities are trivial.

1. If M is a martingale, then by the Jensen inequality for the convex function u ∈ R 7→ |u|p , the process
|M |p is a sub-martingale. Similarly, if M is a non-negative sub-martingale then, since u ∈ [0, +∞) 7→
u p is convex and non-decreasing, it follows that M p = |M |p is a sub-martingale. Therefore in all cases
(|M s |p )s∈[0,t ] is a sub-martingale. Let us define the bounded stopping time

T = t ∧ inf{s ≥ 0 : |M s | ≥ λ}.

Since M is continuous we have |M T | ≤ max(|M 0 |, λ) and thus M T ∈ L1 . The Doob stopping Theorem
2.5.5 for the sub-martingale |M |p and the bounded stopping times T and t that satisfy T ≤ t gives

E(|M T |p ) ≤ E(|M t |p ).

On the other hand the definition of T gives

|M_T |^p ≥ λ^p 1_{max_{s∈[0,t ]} |M_s| ≥ λ} + |M_t |^p 1_{max_{s∈[0,t ]} |M_s| < λ} ≥ λ^p 1_{max_{s∈[0,t ]} |M_s| ≥ λ} .

It remains to combine these inequalities to get the desired result.

2. We first reduce to the case where M satisfies maxs∈[0,t ] |M s | ∈ Lp . To do so, we introduce for all n ≥ 1
the truncation or localization stopping time3 Tn = inf{s ≥ 0 : |M s | ≥ n} . By the Doob stopping theorem
(Theorem 2.5.1), the process (M s∧Tn )s∈[0,t ] is a martingale. Moreover, since M is continuous, we have
the domination |M s∧Tn | ≤ max(|M 0 |, n), and since M t ∈ Lp gives M 0 ∈ Lp , we obtain maxs∈[0,t ] |M s∧Tn | ∈ Lp .
The desired inequality for the dominated martingale (M s∧Tn )s∈[0,t ] would give
E( max_{s∈[0,t ]} |M_{s∧Tn}|^p ) ≤ (p/(p − 1))^p E(|M_{t∧Tn}|^p ),
and the desired result for (M s )s∈[0,t ] would then follow by the monotone convergence theorem as n →
∞ since then |M s∧Tn | ↗ |M s | for all s ∈ [0, t ]. Thus this shows that we can assume without loss of
generality that sups∈[0,t ] |M s | ∈ Lp . This is our first martingale localization argument!
By using the proof of the first item with p = 1 and decomposing M t as we did for M T , we get
P( max_{s∈[0,t ]} |M_s| ≥ λ) ≤ E(|M_t | 1_{max_{s∈[0,t ]} |M_s| ≥ λ}) / λ
3 Since we are only interested in the time interval [0, t ], we could take Tn ∧ t , which makes the stopping time bounded.


for all λ > 0, and thus


∫_0^∞ λ^{p−1} P( max_{s∈[0,t ]} |M_s| ≥ λ) dλ ≤ ∫_0^∞ λ^{p−2} E(|M_t | 1_{max_{s∈[0,t ]} |M_s| ≥ λ}) dλ.

Now the Fubini – Tonelli theorem gives


∫_0^∞ λ^{p−1} P( max_{s∈[0,t ]} |M_s| ≥ λ) dλ = E( ∫_0^{max_{s∈[0,t ]} |M_s|} λ^{p−1} dλ ) = (1/p) E( max_{s∈[0,t ]} |M_s|^p ).

and similarly (here we need p > 1)


∫_0^∞ λ^{p−2} E(|M_t | 1_{max_{s∈[0,t ]} |M_s| ≥ λ}) dλ = (1/(p − 1)) E(|M_t | max_{s∈[0,t ]} |M_s|^{p−1}).

Combining all this gives


E( max_{s∈[0,t ]} |M_s|^p ) ≤ (p/(p − 1)) E(|M_t | max_{s∈[0,t ]} |M_s|^{p−1}).

But since the Hölder inequality gives


E(|M_t | max_{s∈[0,t ]} |M_s|^{p−1}) ≤ E(|M_t |^p )^{1/p} E( max_{s∈[0,t ]} |M_s|^p )^{(p−1)/p} ,

we obtain
E( max_{s∈[0,t ]} |M_s|^p ) ≤ (p/(p − 1)) E(|M_t |^p )^{1/p} E( max_{s∈[0,t ]} |M_s|^p )^{(p−1)/p} .

Consequently, since E(maxs∈[0,t ] |M s |p ) < ∞, we obtain the desired inequality.

Example 2.5.8. A consequence of Doob maximal inequality.

Let (M t )t ≥0 be a continuous martingale bounded in Lp , p > 1, namely

C_p = sup_{t≥0} E(|M_t |^p ) < ∞.

It follows that M is u.i. But the Doob maximal inequality says more. Namely, by Theorem 2.5.7, for
all t ≥ 0, E(max_{s∈[0,t ]} |M_s|^p ) ≤ (p/(p − 1))^p C_p . The monotone convergence theorem gives then

E(sup_{t≥0} |M_t |^p ) ≤ (p/(p − 1))^p C_p < ∞.

Therefore, almost surely supt ≥0 |M t | < ∞. In other words M has almost surely bounded trajectories.
Beware however that the bound is random and may depend on the trajectory.

The following version of Doob maximal inequality is useful for some applications.

Theorem 2.5.9. Doob maximal inequality for super-martingales.

If M is a continuous super-martingale, then for all t ≥ 0 and λ > 0, denoting M − = max(0, −M ),

P( max_{s∈[0,t ]} |M_s| ≥ λ) ≤ (E(M_0 ) + 2E(M_t^−)) / λ.

In particular when M is non-negative then E(M_t^−) = 0 and the upper bound becomes E(M_0 )/λ.

This can be skipped at first reading.

Proof. Let us define the bounded stopping time


T = t ∧ inf{s ∈ [0, t ] : M s ≥ λ}.


We have M T ∈ L1 since |M T | ≤ max(|M 0 |, |M t |, λ). By the Doob stopping Theorem 2.5.5 with the sub-
martingale −M and the bounded stopping times 0 and T that satisfy M 0 ∈ L1 and M T ∈ L1 , we get

E(M_0 ) ≥ E(M_T ) ≥ λ P( max_{s∈[0,t ]} M_s ≥ λ) + E(M_t 1_{max_{s∈[0,t ]} M_s < λ})

hence, recalling that M − = max(−M , 0),

λ P( max_{s∈[0,t ]} M_s ≥ λ) ≤ E(M_0 ) + E(M_t^−).

This produces the desired inequality when M is non-negative. For the general case, we observe
that the Jensen inequality for the non-decreasing convex function u ∈ R 7→ max(u, 0) and the
sub-martingale −M shows that M − is a non-negative sub-martingale. Thus, by Theorem 2.5.7,

λ P( max_{s∈[0,t ]} M_s^− ≥ λ) ≤ E(M_t^−).

Finally, putting both inequalities together gives

λ P( max_{s∈[0,t ]} |M_s| ≥ λ) ≤ λ P( max_{s∈[0,t ]} M_s ≥ λ) + λ P( max_{s∈[0,t ]} M_s^− ≥ λ) ≤ E(M_0 ) + 2E(M_t^−). ■

Chapter 3

Brownian motion

Just like the central limit theorem, Brownian motion is a physical as well as a mathematical phenomenon,
see figures 3.1, 3.2, and 3.3. In this chapter, we study some properties of the mathematical Brownian motion.

[Diagram: Brownian motion sits at the intersection of the classes of Markov processes, Lévy processes, martingales, and Gaussian processes.]

For all t > 0, d ≥ 1, the density of the Gaussian distribution N (0, t I d ) on Rd is

x ∈ R^d 7→ p_t (x) = e^{−|x|²/(2t)} / (2πt)^{d/2} where |x|² = x_1² + · · · + x_d² .

We have, for all s, t > 0,


p_{t+s}(x) = (p_t ∗ p_s)(x) = ∫_{R^d} p_t (x − z) p_s (z) dz.
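As a sanity check, this convolution identity (the semigroup property of the Gaussian kernel) can be verified numerically in dimension d = 1 by a discretized convolution; a sketch assuming numpy, with names and grid parameters ours:

```python
# Numerical check of the convolution identity p_{t+s} = p_t * p_s in d = 1,
# by discretizing the convolution integral on a fine grid.
import numpy as np

def p(t, x):
    """Density of N(0, t) at x."""
    return np.exp(-x ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

s, t, h = 0.7, 1.3, 0.01
z = np.arange(-10.0, 10.0, h)     # integration grid for the z variable
x = np.array([-1.0, 0.0, 2.0])    # a few test points
conv = np.array([np.sum(p(t, xi - z) * p(s, z)) * h for xi in x])
err = np.max(np.abs(conv - p(t + s, x)))
print(err)  # tiny: the Riemann sum matches p_{t+s} up to discretization error
```

The identity reflects the fact that the sum of independent N(0, t) and N(0, s) random variables is N(0, t + s).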



Figure 3.1: First steps of four approximated sample paths of 2-dimensional Brownian motion issued from
the origin, numerically simulated with a Gaussian random walk via code plot(cumsum(randn(2,1000))).

Figure 3.2: From the famous book [39] of Jean Perrin (1870 – 1942), three tracings of the motion of
colloidal particles of radius 0.53 µm, as seen under the microscope, are displayed. Successive positions every
30 seconds are joined by straight line segments (mesh size is 3.2 µm). These precise and systematic experiments,
inspired by the historical ones by Robert Brown (1773 – 1858), made it possible to test the atomistic theory of
Ludwig Boltzmann (1844 – 1906), Albert Einstein (1879 – 1955), Marian Smoluchowski (1872 – 1917), and
others. “Thus the molecular theory of Brownian motion can be regarded as experimentally established, and,
at the same time, it becomes quite difficult to deny the objective reality of molecules.” Louis Bachelier
(1870 – 1946) identified independently a similar physical phenomenon in the behavior of stock markets.

Figure 3.3: Atomistic interpretation of physical Brownian motion: a big particle of dust in a liquid is subject
to a high number of collisions with the molecules of the liquid, which are much smaller and disordered by
heat. This leads to the kinetic interpretation behind the Langevin equation in Example 8.2.7. In reality, the
diameter ratio is high, for instance the colloidal particle observed by Perrin has diameter of 0.57 µm while
a molecule of water has a diameter of 0.27 nm, which gives a diameter ratio of about 2000. Moreover in
reality the molecules density is high, the distance between molecules being of 0.31 nm for water. With this
atomistic interpretation, physical Brownian motion is essentially a random walk, seen at a space-time scale
which makes it close to mathematical Brownian motion, its idealistic scaling limit.

Definition 3.0.1. Brownian motiona or Wienerb process.

A d -dimensional Brownian motion (BM) is a d -dimensional process B = (B t )t ≥0 which has:

1. Almost surely continuous trajectories, in the sense that B is a continuous process.

2. Stationary, Gaussian, independent increments:

• for all 0 ≤ s ≤ t , B t − B s ∼ N (0, (t − s)I d )

• for all t 0 = 0 < t 1 < · · · < t n , n ≥ 0, B t1 − B t0 , . . . , B tn − B tn−1 are independent.

Beware that there are no conditions on B 0 , and in particular B t = B 0 + (B t − B 0 ) may not be Gaussian.

a Named after Robert Brown (1773 – 1858), Scottish botanist.
b Named after Norbert Wiener (1894 – 1964), American mathematician.

# Python program generating the graphic used for the lecture notes cover
import numpy as np ; import matplotlib.pyplot as pp
for i in range(1,11):
    pp.plot(np.cumsum(np.random.randn(1,1000)[0]),'k-',linewidth=1)
pp.axis('off') ; pp.show()

# Julia program generating the graphic used for the lecture notes cover
using Pkg ; Pkg.add("Plots") ; using Plots
for i=1:10
    plot!(cumsum(randn(1000,1),dims = 1), lw = 1, legend = false, grid = false, axes = ([], false))
end
gui()


Coding in action 3.0.2. Stochastic simulation.

By using the structure of the increments, write your own program simulating and plotting approximated
trajectories of BM. Can we check numerically that the mathematical object of Brownian motion
exists? Let D be a closed domain of Rd such as a disc or a square, containing the origin 0. Let ∂D
be its boundary, let B be a BM with B 0 = 0, and let T = inf{t ≥ 0 : B t ∈ ∂D}. Write a program simulating
the law of T and the law of B T , and producing nice plots when d = 1, d = 2, d = 3.
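A minimal sketch for the first questions, in dimension d = 2 with the disc D = {x : |x| ≤ 1} (assuming numpy; names and parameters are ours). The increment structure of Definition 3.0.1 is simulated directly: each step is N (0, dt I_d). Keeping path[hit[0]] as well would give samples of the law of B T on ∂D:

```python
# Approximate BM trajectories via Gaussian increments and estimate the exit
# time T of the unit disc D for a 2-dimensional BM started at the origin.
import numpy as np

rng = np.random.default_rng(4)

def bm_path(d, n, dt):
    """Approximate trajectory on [0, n*dt]: cumulated N(0, dt I_d) increments."""
    steps = np.sqrt(dt) * rng.standard_normal((n, d))
    return np.vstack([np.zeros(d), np.cumsum(steps, axis=0)])

dt, n = 1e-3, 10_000
exit_times = []
for _ in range(200):
    path = bm_path(2, n, dt)
    hit = np.nonzero(np.linalg.norm(path, axis=1) >= 1.0)[0]
    if hit.size > 0:
        exit_times.append(hit[0] * dt)
print(np.mean(exit_times))  # the martingale |B_t|^2 - 2t suggests E(T) = 1/2
```

The comparison of the empirical mean of T with the value 1/2 predicted by the martingale (|B_t |² − 2t )t ≥0 anticipates tools developed in the following sections.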

Remark 3.0.3. Gaussiana and Lévyb processes.


a Named after Carl Friedrich Gauss (1777 – 1855), German mathematician.
b Named after Paul Lévy (1886 – 1971), French mathematician.

For all n ≥ 1 and 0 ≤ t_1 < · · · < t_n the random vector (B_{t_1}, . . . , B_{t_n}) is Gaussian, and we say that Brownian
motion is a Gaussian process. On the other hand, for all n ≥ 1 and 0 = t_0 < · · · < t_n the increments
B_{t_1} − B_{t_0}, . . . , B_{t_n} − B_{t_{n−1}} are independent and stationary in the sense that their law depends only on
the differences t_1 − t_0, . . . , t_n − t_{n−1} between successive times. Thus Brownian motion has independent
and stationary increments, and such processes are called Lévy processes. They form a special sub-
class of Markov processes. The Poisson process considered in Example 2.4.2 is also a Lévy process,
for which the increments are Poisson and the trajectories right continuous with left limits.

Remark 3.0.4. Finite dimensional laws.

A d-dimensional continuous process X = (X_t)_{t≥0} issued from x ∈ R^d is a Brownian motion iff for all
n ≥ 0, all 0 < t_1 < · · · < t_n, all A_i ∈ B_{R^d}, 1 ≤ i ≤ n, we have

P(X_{t_1} ∈ A_1, . . . , X_{t_n} ∈ A_n) = ∫_{A_1×···×A_n} p_{t_1}(x_1 − x) p_{t_2−t_1}(x_2 − x_1) · · · p_{t_n−t_{n−1}}(x_n − x_{n−1}) dx_1 · · · dx_n,

where p_t(y) = (2πt)^{−d/2} e^{−|y|²/(2t)} is the d-dimensional Gaussian (heat) kernel.

Remark 3.0.5. Reduction to centered case.

From the definition, we get that if B = (B_t)_{t≥0} is a Brownian motion issued from the origin, namely
B_0 = 0, and if H is a random variable, then (H + B_t)_{t≥0} is also a Brownian motion, issued from H.

Remark 3.0.6. Reduction to one-dimensional case.

From the definition, if X = (X_t)_{t≥0} is d-dimensional with coordinates X_t = (X_t^1, . . . , X_t^d) in R^d, then X
is a Brownian motion issued from the origin iff the following two properties hold true:

1. for all 1 ≤ i ≤ d, (X_t^i)_{t≥0} is a Brownian motion issued from the origin

2. the processes (X_t^1)_{t≥0}, . . . , (X_t^d)_{t≥0} are independent.

3.1 Characterizations and martingales

Theorem 3.1.1. Characterization of BM by Gaussianity and covariance.

If X = (X t )t ≥0 is real, continuous, issued from the origin, then X is a Brownian motion if and only if
X is a Gaussian process, centered, with covariance given by E(X t X s ) = s ∧ t for all s, t ≥ 0.

Proof.

1. Suppose that X = (X t )t ≥0 is a Brownian motion issued from the origin, then for all 0 < t 1 < · · · < t n the
random variables X t1 , X t2 − X t1 , . . . , X tn − X tn−1 are Gaussian, centered, and independent, and X 0 = 0,


and (X t1 , X t2 −X t1 , . . . , X tn −X tn−1 ) and (X t1 , . . . , X tn ) are (centered) Gaussian random vectors in the sense
that all linear combinations of their coordinates are Gaussian. Moreover, for all 0 ≤ s ≤ t , we have

E(X s X t ) = E(X s (X t − X s )) + E(X s2 ) = 0 + s = s = s ∧ t .

2. Conversely, if X = (X_t)_{t≥0} is a Gaussian process, centered, with E(X_s X_t) = s ∧ t for all s, t ≥ 0, then
for all 0 < t_1 < · · · < t_n, the random vector (X_{t_1}, X_{t_2} − X_{t_1}, . . . , X_{t_n} − X_{t_{n−1}}) is Gaussian, centered, with
diagonal covariance diag(t_1, t_2 − t_1, . . . , t_n − t_{n−1}), which implies that (X_t)_{t≥0} is a Brownian motion. ■

Corollary 3.1.2. Scale invariance by space-time scaling.


If B = (B_t)_{t≥0} is a BM on R, issued from the origin, then for all c ∈ (0, +∞), (c^{−1/2} B_{ct})_{t≥0} is a BM.

Proof. The process (c^{−1/2} B_{ct})_{t≥0} is continuous, Gaussian, centered, with same covariance as BM. ■

Theorem 3.1.3. Fourier and Laplace martingale characterizations of Brownian motion.

Let X = (X_t)_{t≥0} be a d-dimensional continuous process issued from the origin.
The following properties are equivalent:

1. X is a Brownian motion

2. For all λ ∈ R^d, (M_t^λ)_{t≥0} = (e^{iλ·X_t + |λ|²t/2})_{t≥0} is a martingale^a for the natural filtration of X

3. For all λ ∈ R^d, (N_t^λ)_{t≥0} = (e^{λ·X_t − |λ|²t/2})_{t≥0} is a martingale for the natural filtration of X.

a The notion of martingale remains valid for complex valued processes.

Proof. Let us define G_t = σ(X_s : s ∈ [0, t]) for all t ≥ 0. The process X is a BM iff for all 0 ≤ s < t, X_t − X_s is
independent of G_s and X_t − X_s ∼ N(0, (t − s)I_d), in other words if and only if for all 0 ≤ s < t and λ ∈ R^d,

E(e^{iλ·(X_t − X_s)} | G_s) = e^{−|λ|²(t−s)/2}.

This identity states precisely that E(M_t^λ | G_s) = M_s^λ. Conversely, by multiplying both sides by e^{iuZ} for an
arbitrary u ∈ R and an arbitrary bounded G_s measurable random variable Z and taking the expectation, we
get that X_t − X_s is independent of G_s and X_t − X_s ∼ N(0, (t − s)I_d). This shows the equivalence
of the first two properties. The third property is just the Laplace (instead of Fourier) transform version. ■

Definition 3.1.4. Brownian motion with respect to a filtration.

Let (Ω, F, (F_t)_{t≥0}, P) be a filtered probability space. We say that a continuous d-dimensional process
X = (X_t)_{t≥0} is an (F_t)_{t≥0} Brownian motion when it is (F_t)_{t≥0} adapted and for all t ≥ 0 and s ∈ [0, t],
the increment X_t − X_s is independent of F_s and follows the Gaussian law N(0, (t − s)I_d), which is
equivalent to saying that for all λ ∈ R^d, the process (exp(iλ·X_t + |λ|²t/2))_{t≥0} is an (F_t)_{t≥0}-martingale.

Remark 3.1.5. Definitions of Brownian motion (BM).

If X = (X_t)_{t≥0} is an (F_t)_{t≥0} BM, then X is a BM in the sense of Definition 3.0.1. Conversely, a BM
(X_t)_{t≥0} in the sense of Definition 3.0.1 is a (G_t)_{t≥0} BM where G_t = σ(X_s : s ≤ t) for all t ≥ 0 is the
natural filtration associated to X (see Theorem 3.1.3).


Theorem 3.1.6. Martingale properties.

Let B = (B_t)_{t≥0} be an (F_t)_{t≥0} d-dimensional Brownian motion and let B_t = (B_t^1, . . . , B_t^d) be the
coordinates of the random vector B_t. Then for all 0 ≤ s < t and 1 ≤ j, k ≤ d,

E(B_t^j − B_s^j | F_s) = 0 and E((B_t^j − B_s^j)(B_t^k − B_s^k) | F_s) = (t − s)1_{j=k}.

As a consequence, for all 1 ≤ j, k ≤ d,

• (B_t^j)_{t≥0} is a continuous (F_t)_{t≥0}-martingale, provided that B_0 ∈ L¹

• (B_t^j B_t^k − 1_{j=k} t)_{t≥0} is a continuous (F_t)_{t≥0}-martingale, provided that B_0 ∈ L².

Actually it turns out that these properties characterize Brownian motion (see Theorem 7.2.1).

Proof. The first property follows from the fact that (B_t^j)_{t≥0} is a BM. For the second property, since the
increment B_t − B_s is independent of F_s, we write

E((B_t^j − B_s^j)(B_t^k − B_s^k) | F_s) = E((B_t^j − B_s^j)(B_t^k − B_s^k))
= E(B_t^j − B_s^j) E(B_t^k − B_s^k) 1_{j≠k} + E((B_t^j − B_s^j)²) 1_{j=k}
= 0 + (t − s)1_{j=k}.

As a consequence, for all 0 ≤ s ≤ t and 1 ≤ j, k ≤ d,

E(B_t^j | F_s) = B_s^j = E(B_s^j | F_s)

and

E(B_t^j B_t^k − t 1_{j=k} | F_s) = B_s^j B_s^k − s 1_{j=k} = E(B_s^j B_s^k − s 1_{j=k} | F_s). ■

Up to now, we have studied BM but it is unclear whether BM exists or not! Actually an explicit construction of BM is
given in Section 3.6. Other constructions are available, see for instance [28].

3.2 Variation of trajectories and quadratic variation

See Definition 1.7.1 (finite variation functions) and Definition 4.1.1 (quadratic variation of processes).

Theorem 3.2.1. Variation and quadratic variation of Brownian motion.

Let B = (B_t)_{t≥0} be a BM issued from the origin, let [u, v] be a finite interval, 0 ≤ u < v, and let δ be a
partition or subdivision of [u, v], δ : u = t_0 < · · · < t_n = v, n ≥ 1. Let us consider the quantities

r_1(δ) = Σ_{i=0}^{n−1} |B_{t_{i+1}} − B_{t_i}| and r_2(δ) = Σ_{i=0}^{n−1} |B_{t_{i+1}} − B_{t_i}|².

Then the following properties hold true:

1. lim_{|δ|→0} r_2(δ) = v − u in L² and thus in P, where |δ| = sup_{0≤i≤n−1}(t_{i+1} − t_i). In other words, the
quadratic variation of B on a finite interval is equal to the length of the interval.

2. sup_{δ∈P} r_1(δ) = +∞ almost surely, where P is the set of subdivisions of [u, v]. In other words the
sample paths of B are almost surely of infinite variation on all intervals.

The second property implies that we cannot hope to define an integral ∫_a^b φ_t dB_t(ω) with φ continuous
as in Theorem 1.7.2 because t ↦ B_t(ω) is of infinite variation on all intervals for almost all ω. However, and
following Itô, the first property will be the key to give a sort of L² or in P meaning to such stochastic integrals.
The quadratic variation of square integrable continuous martingales is considered in Theorem 4.1.4.


Proof. We could use Lemma 4.1.2 to get that the sample paths of B have infinite variation on the time interval
[0, t]. Let us be more precise by using the special explicit nature of Brownian motion.

1. If Z ∼ N(0, 1) then E(Z⁴) = 3, hence

E((r_2(δ))²) = E((Σ_i |B_{t_{i+1}} − B_{t_i}|²)²)
= Σ_i E(|B_{t_{i+1}} − B_{t_i}|⁴) + 2 Σ_{i<j} E(|B_{t_{i+1}} − B_{t_i}|² |B_{t_{j+1}} − B_{t_j}|²)
= 3 Σ_i (t_{i+1} − t_i)² + 2 Σ_{i<j} (t_{i+1} − t_i)(t_{j+1} − t_j)
= 2 Σ_i (t_{i+1} − t_i)² + (Σ_i (t_{i+1} − t_i))²
= 2 Σ_i (t_{i+1} − t_i)² + (v − u)².

Moreover E(r_2(δ)) = Σ_i (t_{i+1} − t_i) = v − u. Thus

E((r_2(δ) − (v − u))²) = 2 Σ_i (t_{i+1} − t_i)² ≤ 2 max_i (t_{i+1} − t_i)(v − u) → 0 as |δ| → 0.

2. From the first part, since convergence in L² implies almost sure convergence along a subsequence, there
exists a sequence of subdivisions (δ^k)_k of [u, v] such that

lim_{k→∞} r_2(δ^k) = lim_{k→∞} Σ_i |B_{t_{i+1}^k} − B_{t_i^k}|² = v − u almost surely,

and thus, almost surely,

sup_δ r_1(δ) ≥ r_1(δ^k) = Σ_i |B_{t_{i+1}^k} − B_{t_i^k}| ≥ (Σ_i |B_{t_{i+1}^k} − B_{t_i^k}|²) / max_i |B_{t_{i+1}^k} − B_{t_i^k}| → +∞ as k → ∞,

where we used the fact that almost surely, max_i |B_{t_{i+1}^k} − B_{t_i^k}| → 0 as k → ∞ since B is continuous and
hence uniformly continuous on every compact interval such as [u, v] (Heine theorem). ■
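Both behaviors are easy to observe numerically. The sketch below (our own, with an independent draw of the increments for each mesh, which is enough to illustrate the laws involved) computes r_1(δ) and r_2(δ) along dyadic partitions of [0, 1]: r_2 concentrates around the interval length 1 while r_1 grows like 2^{n/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
r1_vals, r2_vals = [], []
for n in (6, 10, 14):
    steps = 2 ** n                # dyadic partition of [0, 1] with mesh 2^-n
    inc = rng.normal(0.0, np.sqrt(1.0 / steps), size=steps)  # BM increments
    r1_vals.append(np.abs(inc).sum())   # variation along the partition
    r2_vals.append((inc ** 2).sum())    # quadratic variation along it
    print(n, r1_vals[-1], r2_vals[-1])
```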

3.3 Blumenthal zero-one law and its consequences on the trajectories

This can be skipped at first reading.

Theorem 3.3.1: Properties of Brownian trajectories

If B = (B_t)_{t≥0} is a one-dimensional BM on R issued from the origin, and F_t = σ(B_s : s ∈ [0, t]), then:

1. Blumenthal^a 0-1 law. The σ-algebra F_{0+} = ∩_{t>0} F_t is trivial: for all A ∈ F_{0+}, P(A) ∈ {0, 1}

2. Almost surely, for all ε > 0, inf_{s∈[0,ε]} B_s < 0 and sup_{s∈[0,ε]} B_s > 0

3. For all a ∈ R, almost surely^b, T_a = inf{t ≥ 0 : B_t = a} < ∞

4. Almost surely^c, liminf_{t→∞} B_t = −∞ and limsup_{t→∞} B_t = +∞

5. Almost surely, the function t ∈ R_+ ↦ B_t is not monotone on any non singleton interval.

a Named after Robert McCallum Blumenthal (1931 – 2012), American mathematician.
b However T_a is not bounded, see Remark 2.5.4.
c This does not imply that a.s. lim_{t→∞} |B_t| = +∞.


Proof.

1. The idea is to show that F_{0+} is independent of itself. For all A ∈ F_{0+}, all k ≥ 1, all bounded
continuous f : R^k → R, and all 0 < t_1 < · · · < t_k, we have

E(1_A f(B_{t_1}, . . . , B_{t_k})) = lim_{ε→0+} E(1_A f(B_{t_1} − B_ε, . . . , B_{t_k} − B_ε)).

Now when 0 < ε < t_1, the random variables B_{t_1} − B_ε, . . . , B_{t_k} − B_ε are independent of F_ε (structure
of the increments, simple Markov property), and thus independent of F_{0+}. It follows that

E(1_A f(B_{t_1}, . . . , B_{t_k})) = lim_{ε→0+} P(A) E(f(B_{t_1} − B_ε, . . . , B_{t_k} − B_ε)) = P(A) E(f(B_{t_1}, . . . , B_{t_k})).

Hence F_{0+} is independent of σ(B_{t_1}, . . . , B_{t_k}) for all t_i's, and thus is independent of σ(B_t : t > 0).
But σ(B_t : t > 0) = σ(B_t : t ≥ 0) since B_0 = 0. It remains to note that F_{0+} ⊂ σ(B_t : t ≥ 0).

2. For the statement with the sup, it suffices to show that P(A) = 1 where

A = ∩_n {sup_{s∈[0,1/n]} B_s > 0}.

We can restrict the intersection to n ≥ N for an arbitrary large threshold N, therefore A ∈ F_{0+}.
Next, thanks to the Blumenthal zero-one law, it suffices to show that P(A) > 0. Now

P(sup_{s∈[0,1/n]} B_s > 0) ↘ P(A) as n → ∞,

while

P(sup_{s∈[0,1/n]} B_s > 0) ≥ P(B_{1/n} > 0) = 1/2,

giving P(A) ≥ 1/2 and thus P(A) = 1. The statement with inf follows by using −B instead of B.

3. Thanks to the previous property,

P(sup_{s∈[0,1]} B_s > ε) ↗ P(sup_{s∈[0,1]} B_s > 0) = 1 as ε → 0.

But by the scale invariance (Corollary 3.1.2),

P(sup_{s∈[0,1]} B_s > ε) = P(sup_{s∈[0,ε^{−2}]} ε^{−1} B_{ε²s} > 1) = P(sup_{s∈[0,ε^{−2}]} B_s > 1).

Now, since

P(sup_{s∈[0,ε^{−2}]} B_s > 1) ↗ P(sup_{s≥0} B_s > 1) as ε → 0,

we get

P(sup_{s≥0} B_s > 1) = 1.

Again by scaling, we obtain, for all R > 0, and also by replacing B by −B,

P(sup_{s≥0} B_s > R) = 1 and P(inf_{s≥0} B_s < −R) = 1.

This implies that for all a ∈ R, almost surely T_a < ∞.

4. This is implied directly by the end of the proof of the previous item.

5. From the item about the inf and sup, and the structure of increments, we have, almost surely,
for all t ∈ Q ∩ R_+ and all ε > 0, inf_{s∈[t,t+ε]} B_s < B_t and sup_{s∈[t,t+ε]} B_s > B_t, hence the result. ■


Corollary 3.3.2: Law of hitting time via Laplace transform

Let (B_t)_{t≥0} be a one-dimensional Brownian motion with B_0 = 0. For all a > 0, let us consider
the hitting time T_a = inf{t ≥ 0 : B_t = a}, which is almost surely finite thanks to Theorem 3.3.1.
Then its Laplace transform is given by λ ≥ 0 ↦ E(e^{−λT_a}) = e^{−a√(2λ)}, and it has density

t ∈ R_+ ↦ (a/√(2πt³)) e^{−a²/(2t)}.

Proof. For all c > 0 and n, the Doob stopping theorem (Theorem 2.5.1) with the martingale (e^{cB_t − c²t/2})_{t≥0} and
the bounded stopping time T_a ∧ n gives E(e^{cB_{T_a∧n} − (c²/2)(T_a∧n)}) = 1. Now, since e^{cB_{T_a∧n} − (c²/2)(T_a∧n)} ≤ e^{ca},
we get, by dominated convergence, E(e^{cB_{T_a} − (c²/2)T_a}) = 1. Next, since B has almost surely continuous
trajectories, we have B_{T_a} = a almost surely, and this gives the formula for the Laplace transform with
λ = c²/2, namely E(e^{−λT_a}) = e^{−ca} = e^{−a√(2λ)}. The formula for the density follows then by the
inversion formula for the Laplace transform. ■
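As a numerical consistency check (ours, not part of the notes), integrating e^{−λt} against the stated density with a plain trapezoidal rule should reproduce e^{−a√(2λ)}; the values a = 1 and λ = 0.7 are arbitrary.

```python
import numpy as np

a, lam = 1.0, 0.7
t = np.linspace(1e-6, 400.0, 1_000_000)   # the tail beyond 400 is negligible
dt = t[1] - t[0]
density = a / np.sqrt(2 * np.pi * t ** 3) * np.exp(-a ** 2 / (2 * t))
vals = np.exp(-lam * t) * density
numeric = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
exact = np.exp(-a * np.sqrt(2 * lam))
print(numeric, exact)
```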


3.4 Strong law of large numbers, invariance by time inversion, law of iterated logarithm

The nature of the increments of Brownian motion leads to formulate the following theorem.

Theorem 3.4.1. Strong law of large numbers.

If (B_t)_{t≥0} is BM on R with B_0 = 0 then lim_{t→∞} B_t/t = 0 almost surely and in L^p for all p ∈ [1, ∞).

The central limit theorem would be the trivial statement that √t B_t/t = B_t/√t converges in law to N(0, 1) as t → ∞ (in fact B_t/√t ∼ N(0, 1) for every t > 0).
The a.s. convergence remains valid for an arbitrary B_0, and the L^p convergence if B_0 ∈ L^p.
Proof. Since for all t > 0 and all p > 0, E(|B_t/t|^p) = E(|B_1|^p)/t^{p/2} and B_1 ∼ N(0, 1), we have immediately

B_t/t → 0 in L^p as t → ∞, and in particular B_t/t → 0 in probability.

To get the almost sure convergence, we need some tightness, a control of tails that can be done via moments.
Let us prove the a.s. convergence. Let a and b be real numbers such that 0 < a < b. We have

E(sup_{a≤t≤b} (B_t/t)²) ≤ (1/a²) E(sup_{a≤t≤b} B_t²).

The Doob maximal inequality of Theorem 2.5.7 applied to the martingale (B_{a+t})_{t≥0} on [0, b − a] yields

E(sup_{a≤t≤b} (B_t/t)²) ≤ (4/a²) E(B_b²) = 4b/a².

Applying this to a = 2^n and b = 2^{n+1} we obtain

E(sup_{2^n≤t≤2^{n+1}} (B_t/t)²) ≤ 8/2^n.

Thus, by the Markov inequality, for any ε > 0,

P(sup_{2^n≤t≤2^{n+1}} |B_t/t| > ε) ≤ (1/ε²) E(sup_{2^n≤t≤2^{n+1}} (B_t/t)²) ≤ 8/(2^n ε²),

which gives

Σ_{n=0}^∞ P(sup_{2^n≤t≤2^{n+1}} |B_t/t| > ε) < ∞.

Now, according to the Borel – Cantelli lemma, there exists an almost sure event A_ε such that for all ω ∈ A_ε,
there exists a threshold n_ω such that for all n ≥ n_ω, sup_{2^n≤t≤2^{n+1}} |B_t(ω)/t| ≤ ε. Thus, for all ε > 0, there exists an
a.s. event A_ε such that for all ω ∈ A_ε, there exists t_ω such that for all t ≥ t_ω,

|B_t(ω)/t| ≤ ε.

It remains to consider the almost sure event A = ∩_{r=1}^∞ A_{1/r}, on which lim_{t→∞} B_t/t = 0. ■
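The dyadic-block structure of the proof suggests a quick numerical illustration (ours; integer time grid, which suffices here): averaged over simulated trajectories, the maximum of |B_t/t| over a late block [2^n, 2^{n+1}) is much smaller than over an early one.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_paths = 2 ** 14, 50
early_max, late_max = [], []
for _ in range(n_paths):
    B = np.concatenate([[0.0], np.cumsum(rng.normal(size=T))])  # B at integer times
    ratio = np.abs(B[1:]) / np.arange(1, T + 1)                 # |B_t / t|
    early_max.append(ratio[7:15].max())        # times in the block [8, 16)
    late_max.append(ratio[8191:16383].max())   # times in the block [8192, 16384)
print(np.mean(early_max), np.mean(late_max))
```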

Corollary 3.4.2. Invariance by time inversion.

If B = (B_t)_{t≥0} is a BM on R with B_0 = 0 then X = (tB_{1/t})_{t≥0}, with the convention X_0 = 0, is also a BM.

Proof. The process X is Gaussian, centered, with E(X_s X_t) = s ∧ t for all s, t ≥ 0. It remains to prove that X is
continuous. By definition X is continuous on (0, ∞). It remains to prove the almost sure continuity at t = 0.
This follows from Theorem 3.4.1, namely, almost surely, lim_{t→0+} X_t = lim_{t→0+} tB_{1/t} = lim_{s→+∞} B_s/s = 0. ■


Theorem 3.4.3. Law of Iterated Logarithm.

If (B_t)_{t≥0} is a Brownian motion on R then

P(liminf_{t↘0} B_t/√(2t log log(1/t)) = −1, limsup_{t↘0} B_t/√(2t log log(1/t)) = 1) = 1

and

P(liminf_{t→∞} B_t/√(2t log log t) = −1, limsup_{t→∞} B_t/√(2t log log t) = 1) = 1.

This can be skipped at first reading.

Proof. The second property follows from the first one by using invariance by time inversion (Corol-
lary 3.4.2). Let us prove the first property. We can assume without loss of generality that B_0 = 0. Since
the intersection of two almost sure events is almost sure, and since the law of B is symmetric in the
sense that −B and B have same law, it follows that it suffices to show that

P(limsup_{t↘0} B_t/√(2t log log(1/t)) = 1) = 1.

Let us first prove that

P(limsup_{t↘0} B_t/√(2t log log(1/t)) ≤ 1) = 1. (⋆)

Let us define h(t) = √(2t log log(1/t)). For all α > 0 and β > 0, the Doob maximal inequality of Theo-
rem 2.5.7 used for the "exponential" martingale (e^{αB_t − α²t/2})_{t≥0} gives, for all t ≥ 0,

P(max_{s∈[0,t]} (B_s − (α/2)s) > β) = P(max_{s∈[0,t]} e^{αB_s − (α²/2)s} ≥ e^{αβ}) ≤ e^{−αβ}.

For all θ, δ ∈ (0, 1) and n ≥ 1, this inequality with t = θ^n, α = (1+δ)h(θ^n)/θ^n and β = h(θ^n)/2 gives

P(max_{s∈[0,θ^n]} (B_s − ((1+δ)h(θ^n)/(2θ^n))s) > h(θ^n)/2) = O_{n→∞}(n^{−(1+δ)}).

By the Borel – Cantelli lemma, we get that for almost all ω ∈ Ω, there exists n_ω such that for all n ≥ n_ω,

max_{s∈[0,θ^n]} (B_s − ((1+δ)h(θ^n)/(2θ^n))s) ≤ (1/2)h(θ^n).

This inequality implies that for all t ∈ [θ^{n+1}, θ^n],

B_t(ω) ≤ max_{s∈[0,θ^n]} B_s(ω) ≤ (1/2)(2+δ)h(θ^n) ≤ ((2+δ)/(2√θ)) h(t).

Therefore

P(limsup_{t↘0} B_t/√(2t log log(1/t)) ≤ (2+δ)/(2√θ)) = 1.

Now we let θ → 1 and δ → 0 to get (⋆). It remains to prove that

P(limsup_{t↘0} B_t/√(2t log log(1/t)) ≥ 1) = 1.

For that, for all n ≥ 1 and θ ∈ (0, 1), we define the event

A_n = {ω ∈ Ω : B_{θ^n}(ω) − B_{θ^{n+1}}(ω) ≥ (1 − √θ)h(θ^n)}.

We have, denoting a_n = (1 − √θ)h(θ^n)/(θ^{n/2}√(1 − θ)),

P(A_n) = (1/√(2π)) ∫_{a_n}^∞ e^{−u²/2} du ≥ (1/√(2π)) (a_n/(1 + a_n²)) e^{−a_n²/2},

which is of order n^{−(1+θ−2√θ)/(1−θ)} up to logarithmic factors as n → ∞. Since
(1+θ−2√θ)/(1−θ) = (1−√θ)/(1+√θ) < 1, we get Σ_{n≥1} P(A_n) = +∞. Now the independence of the
increments of B and the Borel – Cantelli lemma give that almost surely, for an infinite number of
values of n, we have

B_{θ^n} − B_{θ^{n+1}} ≥ (1 − √θ)h(θ^n).

But the first part of the proof gives, for almost all ω ∈ Ω, that there exists n_ω such that for all n ≥ n_ω,

B_{θ^{n+1}} > −2h(θ^{n+1}) ≥ −2√θ h(θ^n).

Therefore, almost surely, for an infinite number of values of n, we have

B_{θ^n} > h(θ^n)(1 − 3√θ).

This gives

P(limsup_{t↘0} B_t/√(2t log log(1/t)) ≥ 1 − 3√θ) = 1.

It remains to send θ to 0. Note that this proof uses both sides of the Borel – Cantelli lemma. ■

Corollary 3.4.4. Regularity of Brownian motion sample paths.

If (B_t)_{t≥0} is a Brownian motion on R then for all s ≥ 0, we have

P(liminf_{t↘0} (B_{t+s} − B_s)/√(2t log log(1/t)) = −1, limsup_{t↘0} (B_{t+s} − B_s)/√(2t log log(1/t)) = 1) = 1.

In particular almost surely the sample paths t ∈ R_+ ↦ B_t of B are not ½-Hölder^a continuous on finite
intervals and in particular are nowhere differentiable on R_+.

a f : I → R is γ-Hölder continuous when (∀ε > 0)(∃η > 0)(∀s, t ∈ I)(|s − t|^γ ≤ η ⇒ |f(s) − f(t)| ≤ ε).

Proof. Follows from Theorem 3.4.3 and the fact that (B t +s − B s )t ≥0 and (B t )t ≥0 have same law. ■

3.5 Strong Markov property, reflection principle, hitting time

If (B_t)_{t≥0} is BM then we easily check that for all fixed T > 0, the process (B_{t+T} − B_T)_{t≥0} is a BM, issued
from the origin, independent of F_T. This is the simple Markov property. It extends to stopping times T:

Theorem 3.5.1. Strong Markova property.


a Named after Andrey Markov (1856 – 1922), Russian mathematician.

If B = (B_t)_{t≥0} is a d-dimensional Brownian motion issued from the origin, then for every stopping time
T such that P(T < ∞) > 0, under the probability measure P(· | T < ∞), the following properties hold:

1. ((B_{t+T} − B_T)1_{T<∞})_{t≥0} is a Brownian motion issued from the origin, independent of F_T

2. For all measurable and bounded f : R^d → R, we have, for all t > 0,

E(f(B_{t+T})1_{T<∞} | F_T) = P_t(f)(B_T)1_{T<∞}

where

P_t(f)(x) = E(f(x + B_t)) = (2πt)^{−d/2} ∫_{R^d} e^{−|x−y|²/(2t)} f(y) dy = (p_t ∗ f)(x).


We say then that Brownian motion is a strong Markov process.

Proof. Suppose first that P(T < ∞) = 1. Let us define B* = (B_{T+t} − B_T)_{t≥0}. For all n ≥ 1, let us define

T_n = Σ_{k≥0} ((k+1)/2^n) 1_{T ∈ [k/2^n, (k+1)/2^n)}.

We have T ≤ T_n, and T_n takes its values in the set of dyadics D_n = {k/2^n : k ≥ 0}. We check easily that
T_n is a stopping time, and that T_n ↘ T as n → ∞. Let A ∈ F_T, m ≥ 0, and 0 = t_0 < · · · < t_m < ∞. By the
dominated convergence theorem, we have, for all continuous and bounded φ : (R^d)^m → R,

E(1_A φ(B*_{t_1}, . . . , B*_{t_m})) = E(1_A φ(B_{t_1+T} − B_T, . . . , B_{t_m+T} − B_T))
= lim_{n→∞} E(1_A φ(B_{t_1+T_n} − B_{T_n}, . . . , B_{t_m+T_n} − B_{T_n})).

Moreover, for all n ≥ 1, we have A ∈ F_T ⊂ F_{T_n} since T ≤ T_n and, using the fact that A ∈ F_{T_n},

E(1_A φ(B_{t_1+T_n} − B_{T_n}, . . . , B_{t_m+T_n} − B_{T_n})) = Σ_{r∈D_n} E(1_{A∩{T_n=r}} φ(B_{t_1+r} − B_r, . . . , B_{t_m+r} − B_r))
= Σ_{r∈D_n} E(1_{A∩{T_n=r}} E(φ(B_{t_1+r} − B_r, . . . , B_{t_m+r} − B_r) | F_r))
= Σ_{r∈D_n} P(A ∩ {T_n = r}) E(φ(B_{t_1+r} − B_r, . . . , B_{t_m+r} − B_r))
= P(A) E(φ(B_{t_1} − B_0, . . . , B_{t_m} − B_0)).

This implies the first property since (B_t − B_0)_{t≥0} is a Brownian motion issued from the origin. Note that
this proves at the same time the fact that B* has the law of B and is independent of F_T. To prove only the
identity in law, we can remove 1_A, in other words take A = Ω.

The second property follows immediately from the first one, namely since for all t ≥ 0, B*_t is independent
of F_T while B_T is measurable with respect to F_T, we get, using Remark 1.5.2,

E(f(B_{t+T}) | F_T) = E(f(B*_t + B_T) | F_T) = g_t(B_T)

where

g_t(x) = E(f(x + B*_t)) = E(f(x + B_t)) = (p_t ∗ f)(x).

Finally, for a T taking values in [0, +∞], the same argument works with A replaced by A ∩ {T < ∞}. ■

This can be skipped at first reading.

Corollary 3.5.2: Reflection principle

Let B be a one-dimensional Brownian motion issued from the origin. For all t ≥ 0, let us define
S t = sups∈[0,t ] B s . Then, for all t ≥ 0, the following properties hold:

• For all a ≥ 0 and all b ∈ (−∞, a], P(S t ≥ a, B t ≤ b) = P(B t ≥ 2a − b).

• The random variables S t and |B t | have same law.

The reflection principle simply says that on the event {T a ≤ t }, the probability of being, at time t ,
below level b = a − (a − b), is equal to the one of being above level a + (a − b), hence the name. This is
related to the fact that the process after time T a is again BM, which has a symmetric law.

Proof.

• We know from Theorem 3.3.1 that T_a = inf{t ≥ 0 : B_t = a} < ∞ almost surely. We have

P(S_t ≥ a, B_t ≤ b) = P(T_a ≤ t, B_t ≤ b) = P(T_a ≤ t, B′_{t−T_a} ≤ b − a)

with B′_t = B_{T_a+t} − B_{T_a}, where we have used in the last step B′_{t−T_a} = B_{T_a+(t−T_a)} − B_{T_a} = B_t − a,
which makes sense on {T_a ≤ t}. Now by the strong Markov property (Theorem 3.5.1), B′ is
independent of T_a and has the same law as B. Since B′ and −B′ have same law, it follows that
(T_a, B′) has the same law as (T_a, −B′). Also

P(T_a ≤ t, B′_{t−T_a} ≤ b − a) = P(T_a ≤ t, −B′_{t−T_a} ≤ b − a)
= P(T_a ≤ t, −(B_t − a) ≤ b − a)
= P(T_a ≤ t, B_t ≥ 2a − b)
= P(B_t ≥ 2a − b),

where we have used in the last step the fact that {T_a ≤ t} contains a.s. the event {B_t ≥ 2a − b}.

• Follows from the first identity with b = a, the inequality B_t ≤ S_t, and the fact that B_t and −B_t
have same law, which give, for all a ≥ 0 (the overlap {B_t = a} below being negligible),

P(S_t ≥ a) = P(S_t ≥ a, B_t ≤ a) + P(S_t ≥ a, B_t ≥ a)
= P(B_t ≥ a) + P(B_t ≥ a)
= P(B_t ≥ a) + P(B_t ≤ −a)
= P(|B_t| ≥ a). ■
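A Monte Carlo illustration (ours) of the identity in law between S_1 and |B_1|: the supremum over a discretization grid slightly underestimates the true supremum, so only rough agreement is expected.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, a = 5_000, 1_000, 1.0
inc = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
paths = np.cumsum(inc, axis=1)
S1 = np.maximum(paths.max(axis=1), 0.0)  # sup over the grid, B_0 = 0 included
B1 = paths[:, -1]
p_sup = np.mean(S1 >= a)
p_abs = np.mean(np.abs(B1) >= a)
print(p_sup, p_abs)  # both should be close to P(|B_1| >= 1) = 2(1 - Phi(1))
```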

Corollary 3.5.3: Densities

Let (B t )t ≥0 be a one-dimensional Brownian motion issued from the origin.

• For all t > 0, the law of the couple (sup_{s∈[0,t]} B_s, B_t) has density

(a, b) ∈ R² ↦ (2(2a − b)/√(2πt³)) e^{−(2a−b)²/(2t)} 1_{a≥0, b≤a}.

• For all a ∈ R, the law of T_a = inf{t ≥ 0 : B_t = a} is equal to the law of a²/B_1², with density

t ∈ R ↦ (|a|/√(2πt³)) e^{−a²/(2t)} 1_{t>0}.

The law of T_a is known as the Lévy or Bachelier distribution, a special case of the law of T_{a,0} of Corollary 3.8.4. It
is, up to scaling by a², an inverse χ² distribution.

Proof.

• Direct consequence of Corollary 3.5.2.

• Thanks to Corollary 3.5.2, we have, for all t ≥ 0, denoting S_t = sup_{s∈[0,t]} B_s,

P(T_a ≤ t) = P(S_t ≥ a) = P(|B_t| ≥ a) = P(B_t² ≥ a²) = P(tB_1² ≥ a²) = P(a²/B_1² ≤ t). ■

See Corollary 3.3.2 for the law of T_a via stopped martingales instead of the Markov property.
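The identity law(T_a) = law(a²/B_1²) and the displayed density can be cross-checked numerically (our sketch; the values a = 1.5 and t = 2 are arbitrary): the empirical CDF of a²/Z² for Z ∼ N(0, 1), the Riemann integral of the density up to t, and the closed form P(|B_t| ≥ a) = 1 − erf(a/√(2t)) should all agree.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
a, t = 1.5, 2.0

Z = rng.normal(size=200_000)
empirical = np.mean(a ** 2 / Z ** 2 <= t)       # empirical P(T_a <= t)

s = np.linspace(1e-4, t, 200_000)               # integral of the density
ds = s[1] - s[0]
dens = abs(a) / np.sqrt(2 * np.pi * s ** 3) * np.exp(-a ** 2 / (2 * s))
cdf_density = float(dens.sum() * ds)

exact = 1.0 - math.erf(a / math.sqrt(2.0 * t))  # P(|B_t| >= a)
print(empirical, cdf_density, exact)
```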

3.6 A construction of Brownian motion

A natural and intuitive idea to construct Brownian motion is to try to realize it as a scaling limit of a ran-
dom walk with Gaussian increments. More precisely, if (X n )n≥1 are independent and identically distributed


real random variables with law N(0, 1), then this would consist, for all n ≥ 1, in defining the Gaussian process
(X^n_t)_{t≥0} obtained by linear interpolation from

X^n_t = (X_1 + · · · + X_{⌊nt⌋})/√n ∼ N(0, ⌊nt⌋/n),

and considering the limit in law of (X^n_{t_1}, . . . , X^n_{t_k}) as n → ∞, for all k ≥ 1 and 0 ≤ t_1 ≤ · · · ≤ t_k. Actually X^n_t is a
good approximation for numerical simulation. The central limit phenomenon suggests that the Brownian
motion scaling limit is the same if we start from non Gaussian ingredients: we only need zero mean and
unit variance. Such a functional central limit phenomenon is known as the Donsker invariance principle.
From this point of view, Brownian motion is just a universal Gaussian limiting object.
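The rescaled walk above is straightforward to implement; following the Donsker heuristic we may even use ±1 coin flips instead of N(0, 1) increments (our choice). The sketch below checks the N(0, 1) statistics of X^n_1 for a large n.

```python
import numpy as np

def walk_at(n, t, rng):
    # X^n_t = (X_1 + ... + X_{floor(nt)}) / sqrt(n), here with +/-1 increments:
    # zero mean and unit variance are all that the scaling limit requires.
    m = int(np.floor(n * t))
    return rng.choice([-1.0, 1.0], size=m).sum() / np.sqrt(n)

rng = np.random.default_rng(5)
samples = np.array([walk_at(10_000, 1.0, rng) for _ in range(2_000)])
print(samples.mean(), samples.var())  # should be close to 0 and 1
```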
Beyond intuition, the mathematical existence of Brownian motion is not obvious. Historically, Norbert
Wiener seems to be the first scientist to give a rigorous construction, around 1923, and for this reason,
Brownian motion is sometimes called the Wiener process. For more information on history, see [48, 12].
The construction of Brownian motion provided below is based on another very natural idea: by seeing
Brownian motion as an infinite family of orthogonal Gaussian random variables, we could start from our
favorite infinite dimensional Hilbert space, such as L2 (R, dx), and construct a Gaussian random variable by
using a linear combination of the elements of a Hilbert basis with Gaussian i.i.d. weights. This will produce
a Gaussian process with the desired covariance. It will then remain to obtain the continuity, which can be
done by using a general tightness criterion on the increments due to Kolmogorov.

Theorem 3.6.1. Pre-Brownian motion or Gaussian measures.

Let us consider the Hilbert space G = L²(R, dx) with

⟨f, g⟩_G = ∫ f(x)g(x) dx, f, g ∈ G.

Then there exists a centered Gaussian family B̃ = (B̃_g)_{g∈G} defined on a probability space (Ω, A, P)
such that g ∈ G ↦ B̃_g ∈ L²(Ω, A, P) is a linear isometry, in other words for all f, g ∈ G and α, β ∈ R,

E(B̃_f B̃_g) = ⟨f, g⟩_G and B̃_{αf+βg} = αB̃_f + βB̃_g.

Proof. Let (X_n)_{n≥0} be i.i.d. real random variables with law N(0, 1), defined on a probability space (Ω, A, P),
and let (e_n)_{n≥0} be an orthonormal basis of the Hilbert space G = L²(R, dx). For all g ∈ G, the series

B̃_g = Σ_{n=0}^∞ X_n ⟨g, e_n⟩_G

is well defined in L²(Ω, A, P). Indeed the Cauchy criterion is satisfied:

E((Σ_{n=p}^{p+q} X_n ⟨g, e_n⟩_G)²) = Σ_{n=p}^{p+q} ⟨g, e_n⟩²_G → 0 as p, q → ∞.

We see from Lemma 3.6.2 that B̃_g is a centered Gaussian random variable and that

∥B̃_g∥² = E((B̃_g)²) = ⟨g, g⟩_G = ∥g∥²_G

by the Parseval identity, hence g ↦ B̃_g is an isometry. Its linearity is immediate. By polarization we get, for all f, g ∈ G,

4E(B̃_f B̃_g) = E((B̃_f + B̃_g)²) − E((B̃_f − B̃_g)²) = E(B̃²_{f+g}) − E(B̃²_{f−g}) = ∥f + g∥²_G − ∥f − g∥²_G = 4⟨f, g⟩_G.

The sub-space H = span{B̃_g : g ∈ G} of L²(Ω, A, P) is isomorphic via g ↦ B̃_g to G = L²(R, dx). ■


Lemma 3.6.2. Convergence of Gaussians.

If (X_n)_n are Gaussian real random variables with X_n → X in probability as n → ∞ for a random variable X, then the conver-
gence holds in L^p for all p ≥ 1, X ∼ N(m, σ²), lim_{n→∞} E(X_n) = m, and lim_{n→∞} E((X_n − E(X_n))²) = σ².

Proof of Lemma 3.6.2. Let us write X_n ∼ N(m_n, σ_n²) and show that X is Gaussian. Since X_n → X in law, we get, for all t ∈ R,
φ_{X_n}(t) = e^{itm_n − t²σ_n²/2} → φ_X(t). Thus, for all ε > 0, |φ_{X_n}(ε)| = exp(−(ε²/2)σ_n²) → |φ_X(ε)|. Since φ_X is a charac-
teristic function, it is non vanishing in a neighborhood of the origin, and thus σ_n → σ_* for some σ_* ≥ 0. It
follows in turn that for all t ∈ R, e^{itm_n} → e^{itm_*} for some m_*. Now by dominated convergence,

√(2π) e^{−m_n²/2} = ∫_R e^{itm_n} e^{−t²/2} dt → ∫_R e^{itm_*} e^{−t²/2} dt = √(2π) e^{−m_*²/2},

thus m_n → m_*. Hence X ∼ N(m_*, σ_*²), and (m_*, σ_*²) = (m, σ²). Finally, for all p ≥ 1, since E(|X_n|^p) is a
continuous function of m_n and σ_n, it is bounded in n, thus (X_n)_n is u.i. and therefore X_n → X in L^p. ■

With B̃ being as in Theorem 3.6.1, let us define, for all t ≥ 0, the random variable

B_t = B̃_{1_{[0,t]}}.

Now B = (B_t)_{t≥0} is a centered Gaussian process, with covariance given for all s, t ≥ 0 by

E(B_s B_t) = ⟨1_{[0,s]}, 1_{[0,t]}⟩_{L²(R,dx)} = s ∧ t.

However the "pre-BM" B has no reason to be continuous. Let us remark however that for all 0 ≤ s < t,

(B_t − B_s)/√(t − s) ∼ N(0, 1), thus¹ E((B_t − B_s)^{2n}) = c_n (t − s)^n.

The fourth moment case n = 2 allows, thanks to Theorem 3.6.3 below (p = 4, ε = 1, γ < ε/p = 1/4), to
construct a continuous modification B* of B, which is a Brownian motion on R issued from the origin.
Moreover using the higher moments for all values of n gives the optimal Hölder regularity: γ < (n−1)/(2n) → 1/2.

Theorem 3.6.3. Kolmogorov continuity criterion.

Let X = (X t )t ≥0 be a process defined on a probability space (Ω, A , P) taking its values in a Banach
space B with norm ∥·∥, and such that the following tightness of increments property holds: there
exist p ≥ 1, ε > 0, and c > 0 such that for all s, t ≥ 0,

E(∥X_t − X_s∥^p) ≤ c|t − s|^{1+ε}.

Then there exists a modification^a of X that is a continuous process whose trajectories are, on each
finite interval, γ-Hölder continuous for all γ ∈ [0, ε/p), in the sense that a.s. for all t > 0, there exists a
constant C = C(ω, t) > 0 such that for all u, v ∈ [0, t] and all η, if |u − v| ≤ η then ∥X_u − X_v∥ ≤ Cη^γ.

a There exists X* = (X*_t)_{t≥0} such that for all t ≥ 0, X*_t = X_t as random variables, in other words almost surely.

Proof. The proof is an instance of the chaining method or technique, invented by Kolmogorov. It suffices to
prove the result on a finite time interval [0, t]. Let us first show that X is Hölder continuous on the dyadics
D = ∪_{n≥0} D_n where D_n = {tk/2^n : k ∈ {0, . . . , 2^n}} ⊂ D_{n+1}. For notation simplicity we take t = 1. For all n ≥ 1
and all γ > 0, the union bound and the Markov inequality give

P(max_{1≤k≤2^n} ∥X_{k/2^n} − X_{(k−1)/2^n}∥ ≥ 2^{−γn}) ≤ Σ_{k=1}^{2^n} P(∥X_{k/2^n} − X_{(k−1)/2^n}∥ ≥ 2^{−γn})
≤ Σ_{k=1}^{2^n} 2^{γpn} E(∥X_{k/2^n} − X_{(k−1)/2^n}∥^p)
≤ c 2^n 2^{−n(1+ε)+γpn} = c 2^{−n(ε−γp)}.

Now, for γ < ε/p, so that ε − γp > 0, we get

Σ_{n=1}^∞ P(max_{1≤k≤2^n} ∥X_{k/2^n} − X_{(k−1)/2^n}∥ ≥ 2^{−γn}) < ∞.

Thus, the Borel – Cantelli lemma provides A ∈ A such that P(A) = 1 and for all ω ∈ A, there exists n_ω such
that for all n ≥ n_ω, we have max_{1≤k≤2^n} ∥X_{k/2^n} − X_{(k−1)/2^n}∥ ≤ 2^{−γn}. Hence there exists a random variable C such that

C < ∞ a.s. and max_{1≤k≤2^n} ∥X_{k/2^n} − X_{(k−1)/2^n}∥ ≤ C 2^{−γn} for all n.

Let us prove that on A, the paths of X are γ-Hölder continuous on D. Let s, t ∈ D with s ≠ t and n ≥ 0 such
that |s − t| ≤ 2^{−n}. Let (s_k)_{k≥1} be increasing, with s_k = s for k large enough (stationarity), and s_k ∈ D_k for all k.
Let (t_k)_{k≥1} be a similar sequence for t, such that s_n and t_n are neighbors in D_n. Then

X_t − X_s = Σ_{k=n}^∞ (X_{t_{k+1}} − X_{t_k}) + X_{t_n} − X_{s_n} − Σ_{k=n}^∞ (X_{s_{k+1}} − X_{s_k}),

where the sums are actually finite since the sequences are stationary. Now

∥X_t − X_s∥ ≤ Σ_{k=n}^∞ ∥X_{t_k} − X_{t_{k+1}}∥ + ∥X_{s_n} − X_{t_n}∥ + Σ_{k=n}^∞ ∥X_{s_{k+1}} − X_{s_k}∥,

and thus

∥X_t − X_s∥ ≤ C 2^{−γn} + 2 Σ_{k=n}^∞ C 2^{−γ(k+1)} ≤ Σ_{k=n}^∞ 2C 2^{−γk} ≤ (2C/(1 − 2^{−γ})) 2^{−γn},

meaning that |s − t| ≤ 2^{−n} implies ∥X_t − X_s∥ ≤ C′ 2^{−γn} for some random variable C′. Thus, on A, the sample
paths of X are γ-Hölder continuous on D. The set D is dense in R_+. Now for all ω ∈ A, let t ↦ X*_t(ω) be the
unique continuous function² agreeing with t ↦ X_t(ω) on D.

It remains to show that X* is a modification of X. By construction, X_t = X*_t for all t ∈ D. Let t ∈ R_+. Since
D is dense in R_+, there exists (t_n)_n in D with lim_{n→∞} t_n = t, thus lim_{n→∞} X_{t_n} = X_t in L^p((Ω, A, P), (B, ∥·∥))
thanks to the hypothesis. Hence there exists a subsequence (t_{n_k})_k such that lim_{k→∞} X_{t_{n_k}} = X_t almost surely
(here we use (B, ∥·∥)). Finally, the continuity of X* gives X_{t_{n_k}} = X*_{t_{n_k}} → X*_t almost surely as k → ∞, hence
X*_t = X_t almost surely. ■

1 We have c_n = E(Z^{2n}) = (2n)!/(2^n n!) where Z ∼ N(0, 1), but this explicit formula for c_n is useless for our purposes.
k

Corollary 3.6.4. Existence.

One-dimensional Brownian motion exists, and thus d -dimensional Brownian motion for all d ≥ 1.
Moreover, almost surely, the trajectories of real Brownian motion are, on each finite time interval,
Hölder continuous of order γ for all γ ∈ (0, 1/2), not more.

Proof. Theorem 3.6.3 with p = 2n and n → ∞ gives γ ∈ (0, 1/2), while Theorem 3.4.4 gives the optimality. ■
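In the spirit of the "Coding in action" boxes of these notes, here is a quick Monte Carlo sanity check (a sketch, not part of the proof) of the moment hypothesis behind this use of the Kolmogorov criterion: for one-dimensional BM, E|B_t − B_s|⁴ = 3|t − s|², i.e. the hypothesis holds with p = 4 and ε = 1, giving γ-Hölder paths for γ < 1/4. The increment size and sample count below are arbitrary choices.

```python
import math
import random

random.seed(0)

def bm_increment_moment(dt, p, n_samples=200_000):
    # Monte Carlo estimate of E|B_{t+dt} - B_t|^p for one-dimensional BM:
    # the increment has law N(0, dt), i.e. sqrt(dt) times a standard Gaussian.
    s = math.sqrt(dt)
    return sum(abs(s * random.gauss(0.0, 1.0)) ** p for _ in range(n_samples)) / n_samples

# Kolmogorov criterion with p = 4: E|B_t - B_s|^4 = 3 |t - s|^2,
# a bound c |t - s|^{1 + eps} with eps = 1, hence gamma < eps / p = 1/4.
dt = 0.01
m4 = bm_increment_moment(dt, 4)
rel_err = abs(m4 - 3 * dt ** 2) / (3 * dt ** 2)
```

Taking p = 2n and letting n → ∞, as in the proof above, pushes the guaranteed exponent up to any γ < 1/2.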

3.7 Wiener integral

We know that every finite and deterministic linear combination of the increments of Brownian motion
is a Gaussian random variable. More generally, this phenomenon should remain valid for infinite determin-
istic linear combinations provided square integrability. Indeed, the Wiener integral introduced in Theorem
3.7.1 gives a meaning to the Gaussian random variable
ω ∈ Ω ↦ ∫_0^∞ g(s) dB_s(ω)

where the integrator (B t )t ≥0 is a d -dimensional Brownian motion and the integrand g is in L2Rd (R+ , dx). The
integrand is deterministic and square integrable, while the integrator is random and Gaussian.
2. We can use here the following general property of metric spaces: if S and T are metric spaces with T complete, if D is a dense subset of S, and if f : D → T is uniformly continuous, then there exists a unique continuous f̃ : S → T that agrees with f on D.

3 Brownian motion

Theorem 3.7.1. Wiener integral.

Let B = (B_t)_{t≥0} = ((B_t^1, …, B_t^d))_{t≥0} be a d-dimensional Brownian motion issued from 0, defined on (Ω, A, P). Let G be the Gaussian sub-space of L²(Ω, P) generated by the real random variables {B_t^i : t ≥ 0, 1 ≤ i ≤ d}. Then there exists a unique map I : L²_{R^d}(R₊, dx) → G such that:

1. If g = a1(s,t ] with 0 ≤ s ≤ t and a ∈ Rd then I (g ) = a · (B t − B s )

2. I is an isometry (and thus continuous) in the sense that for all f and g in L²_{R^d}(R₊, dx), we have

E(I(f)I(g)) = ∫_0^∞ f(s) · g(s) ds,

in other words 〈I(f), I(g)〉_{L²(Ω,P)} = 〈f, g〉_{L²_{R^d}(R₊,dx)}.

3. I is linear and bijective

The Wiener integral of g is the random variable I(g) and we denote

I(g)(ω) = ∫_0^∞ g(s) dB_s(ω).

Proof. The following sub-space

S = { f ∈ L²_{R^d}(R₊, dx) : f = Σ_{i=0}^n a_i 1_{(t_i, t_{i+1}]}, t_0 = 0 < t_1 < ··· < t_{n+1}, n ≥ 0, a_i ∈ R^d }

of L²_{R^d}(R₊, dx) is dense. If f ∈ S, then f = Σ_finite a_i 1_{(t_i, t_{i+1}]}, and we define

I(f) = Σ_finite a_i · (B_{t_{i+1}} − B_{t_i}).

This definition does not depend on the decomposition chosen for f, and the map f ↦ I(f) is linear. Moreover, we remark that, thanks to the properties of Brownian motion, we have

E((I(f))²) = Σ_{i,j} E( (a_i · (B_{t_{i+1}} − B_{t_i})) (a_j · (B_{t_{j+1}} − B_{t_j})) )
    = Σ_i E( (a_i · (B_{t_{i+1}} − B_{t_i}))² )
    = Σ_i E( ( Σ_j a_{i,j} (B^j_{t_{i+1}} − B^j_{t_i}) )² )
    = Σ_i Σ_{j,k} a_{i,j} a_{i,k} E( (B^j_{t_{i+1}} − B^j_{t_i})(B^k_{t_{i+1}} − B^k_{t_i}) )
    = Σ_i Σ_{j,k} a_{i,j} a_{i,k} (t_{i+1} − t_i) 1_{j=k}
    = Σ_i |a_i|² (t_{i+1} − t_i)
    = ∫_0^∞ |f(x)|² dx.

Since S is dense, I can be extended by continuity to the whole space L²_{R^d}(R₊, dx). Namely, for all f ∈ L²_{R^d}(R₊, dx), there exists a sequence (f_n)_n in S such that ‖f_n − f‖ → 0. Therefore

‖I(f_n) − I(f_m)‖_{L²(Ω,A,P)} = ‖f_n − f_m‖ −→ 0 as m, n → ∞.

Set I(f) = lim_{n→∞} I(f_n). This limit does not depend on the sequence (f_n)_n used to approximate f. Moreover ‖f‖²₂ = E((I(f))²), and, by polarization, using the linearity of I, we have, for all f, g ∈ L²_{R^d}(R₊, dx),

∫_0^∞ f(s) · g(s) ds = ¼ ∫_0^∞ (|f + g|² − |f − g|²)(s) ds = ¼ E( (I(f + g))² − (I(f − g))² ) = E(I(f)I(g)).


The map I defined this way is unique. It is injective since it is an isometry. Finally, it is surjective since F = I(L²_{R^d}(R₊, dx)) is a closed sub-space of G while F is dense in G (taking g = e_i 1_{(0,t]} gives B_t^i ∈ F). ■

Note that L2 (Ω) \ G is huge, and most square integrable variables are not Gaussian random variables
obtained as square integrable linear combinations of increments of Brownian motion!
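The isometry of Theorem 3.7.1 can be checked numerically on a step function: below is a small Monte Carlo sketch (stdlib Python only; the step function g and the sample size are arbitrary choices) sampling I(g) = Σ a_i (B_{t_{i+1}} − B_{t_i}) from independent Gaussian increments and comparing its empirical variance with ∫ g(s)² ds.

```python
import math
import random

random.seed(1)

def wiener_integral_step(coeffs_times, n_samples=100_000):
    # coeffs_times: list of (a_i, s_i, t_i) representing g = sum_i a_i 1_{(s_i, t_i]}
    # with disjoint intervals. Then I(g) = sum_i a_i (B_{t_i} - B_{s_i}), where the
    # increments B_{t_i} - B_{s_i} ~ N(0, t_i - s_i) are independent.
    samples = []
    for _ in range(n_samples):
        val = 0.0
        for a, s, t in coeffs_times:
            val += a * random.gauss(0.0, math.sqrt(t - s))
        samples.append(val)
    return samples

g = [(1.0, 0.0, 1.0), (-2.0, 1.0, 2.0)]   # g = 1_(0,1] - 2 * 1_(1,2]
samples = wiener_integral_step(g)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# Isometry: Var(I(g)) = ∫ g(s)^2 ds = 1^2 * 1 + (-2)^2 * 1 = 5
```

The empirical mean should be close to 0 and the empirical variance close to 5, in agreement with I(g) ∼ N(0, ‖g‖²₂).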

Corollary 3.7.2. Properties of the Wiener integral.

1. For all f ∈ L²_{R^d}(R₊, dx), I(f) ∼ N(0, ‖f‖²₂)

2. For all f, g ∈ L²_{R^d}(R₊, dx), I(f) and I(g) are independent iff 〈f, g〉_{L²_{R^d}(R₊,dx)} = ∫_0^∞ f(s) · g(s) ds = 0

3. For all t ≥ 0 and 1 ≤ i ≤ d and f ∈ L²_{R^d}(R₊, dx), we have

E( B_t^i ∫_0^∞ f(s) dB_s ) = ∫_0^t f_i(s) ds

where f_i(s) is the i-th coordinate of f(s) = (f_1(s), …, f_d(s))

4. For all f ∈ L2Rd (R+ , dx), (I ( f 1(0,t ] ))t ≥0 is a martingale for the natural filtration of (B t )t ≥0

5. If (f_n)_{n≥0} is an orthonormal basis of L²_{R^d}(R₊, dx), then (I(f_n))_{n≥0} are i.i.d. standard Gaussian real random variables and for all t ≥ 0, we have the following expansion in L²(Ω, A, P):

B_t = Σ_{n≥0} I(f_n) ∫_0^t f_n(s) ds,

where I(f_n) = ∫_0^∞ f_n(s) dB_s is a Gaussian real random variable and ∫_0^t f_n(s) ds is a deterministic vector.

The Wiener integral produces plenty of martingales from Brownian motion! These martingales are continuous, but we prove this later on for the more general concept of Itô stochastic integral (Theorem 5.2.2).

Proof. The first two items are immediate.

3. Take g = e_i 1_{(0,t]}, then by definition of I we have I(g) = B_t^i and

E( B_t^i ∫_0^∞ f(s) dB_s ) = E(I(g)I(f)) = ∫_0^∞ g(s) · f(s) ds = ∫_0^t f_i(s) ds.

4. For all f ∈ L²_{R^d}(R₊, dx), all t ≥ 0, and all s ∈ [0, t], M_{s,t} = I(f 1_{(s,t]}) is the L² limit of linear combinations Σ_finite a_i · (B_{v_i} − B_{u_i}) with a_i ∈ R^d and u_i, v_i ∈ (s, t]. In particular it is integrable, centered, measurable for F_t, and independent of F_s. Now M_{0,t} = M_{0,s} + M_{s,t} and thus E(M_{0,t} | F_s) = M_{0,s} + E(M_{s,t}) = M_{0,s}.

5. If (f_n)_{n≥0} is an orthonormal basis of L²_{R^d}(R₊, dx) then (I(f_n))_{n≥0} is an orthonormal basis of the Gaussian space G and moreover, by the previous property, 〈B_t^i, I(f_n)〉_G = ∫_0^t f_n^i(s) ds. Note that (I(f_n))_{n≥0} is orthonormal in L²(Ω, P) but is not a basis: the closure of its span is G ⊊ L²(Ω, P). ■
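A deterministic counterpart of item 5 can be checked numerically: in dimension d = 1, the coefficients ∫_0^t f_n(s) ds are the scalar products 〈f_n, 1_{(0,t]}〉, so Var(B_t) = Σ_n 〈f_n, 1_{(0,t]}〉² should equal ‖1_{(0,t]}‖² = t by the Parseval identity. The sketch below checks this for the Haar basis of L²((0,1], dx), which is just one convenient choice of orthonormal basis (an assumption, not imposed by the corollary), truncated at a finite number of dyadic levels.

```python
def overlap(a, b, t):
    # length of (a, b] ∩ (0, t]
    return max(0.0, min(b, t) - max(a, 0.0))

def parseval_sum(t, levels):
    # Sum of the squared scalar products <f_n, 1_(0,t]> over the Haar basis
    # of L^2((0,1], dx), truncated after the given number of dyadic levels.
    total = overlap(0.0, 1.0, t) ** 2            # constant basis function f = 1
    for n in range(levels):
        for k in range(2 ** n):
            a = k / 2 ** n
            m = (k + 0.5) / 2 ** n
            b = (k + 1) / 2 ** n
            # Haar function: +2^{n/2} on (a, m], -2^{n/2} on (m, b]
            coeff = 2 ** (n / 2) * (overlap(a, m, t) - overlap(m, b, t))
            total += coeff ** 2
    return total

t = 0.3
ps = parseval_sum(t, 12)
# Parseval: sum_n <f_n, 1_(0,t]>^2 = ||1_(0,t]||^2 = t, matching Var(B_t) = t
```

The truncated sum approaches t from below, the defect being the squared L² distance of 1_{(0,t]} to its projection on the first dyadic levels.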

3.8 Wiener measure, canonical Brownian motion, Cameron – Martin formula

Let (B_t)_{t≥0} be an arbitrary d-dimensional Brownian motion issued from 0, and defined on a probability space (Ω, A, P). Since (B_t)_{t≥0} is a continuous process, we know, from Theorem 2.1.3, that we can consider (B_t)_{t≥0} as a random variable from (Ω, A, P) to (W, B_W), where W = C(R₊, R^d) is equipped with the topology of uniform convergence on every compact subset of R₊ and where B_W is the associated Borel σ-algebra.
As a random variable on trajectories, Brownian motion is not unique. We can construct an infinite number of versions of it. What is unique is its law µ. This law is known as the Wiener measure. There exists however a special realization of Brownian motion as a random variable, which is called the canonical Brownian

motion, defined on a canonical space (W, B_W, µ). Namely, on the probability space (W = C(R₊, R^d), B_W, µ), where µ is the Wiener measure, let us consider the coordinate process π = (π_t)_{t≥0} defined by

πt (w) = w t

for all t ≥ 0 and w ∈ C (R+ , Rd ). Under µ, the process π is a d -dimensional Brownian motion issued from the
origin. It is called the canonical Brownian motion.

Theorem 3.8.1. Wiener measure.

There exists a unique probability measure µ on the canonical space (W, B_W), called the Wiener measure, such that for all n ≥ 1, 0 < t_1 < ··· < t_n, A_1, …, A_n ∈ B_{R^d},

µ({w ∈ W : w_{t_1} ∈ A_1, …, w_{t_n} ∈ A_n}) = ∫_{A_1×···×A_n} p_{t_1−t_0}(x_1 − x_0) ··· p_{t_n−t_{n−1}}(x_n − x_{n−1}) dx_1 ··· dx_n,

where t_0 = 0, x_0 = 0, and p is the heat or Gaussian kernel defined for all t > 0 and x ∈ R^d by

p_t(x) = e^{−|x|²/(2t)} / (2πt)^{d/2}.

Moreover, for every d-dimensional Brownian motion B = (B_t)_{t≥0} issued from the origin, we have, for all measurable and bounded or positive Φ : W → R,

E(Φ(B)) = ∫_W Φ(w) µ(dw).

Proof. We know how to construct a d-dimensional Brownian motion B = (B_t)_{t≥0} issued from the origin. If µ is the law of B seen as a random variable taking values on the canonical space (W, B_W), then the first desired property is immediate since

P(B_{t_1} ∈ A_1, …, B_{t_n} ∈ A_n) = µ({w ∈ W : w_{t_1} ∈ A_1, …, w_{t_n} ∈ A_n}).

Finally, µ is unique because it is entirely determined on the family C of cylindrical subsets of W, which is stable by finite intersections and generates B_W (monotone class argument, Corollary 1.8.4). ■

Cameron – Martin formula

A natural motivation is the study of the solution X of the stochastic differential equation
Z t Z t
Xt = X0 + σ(s)dB s + b(s, X s )ds, t ≥ 0,
0 0

where σ, b, and the Brownian motion B are given. Provided that it is well defined, we would like to know if the solution X has a density with respect to B, on the space of trajectories. The Cameron – Martin formula provides an answer in the special case in which X_0 = 0, σ(s) = I_d, and b(s, x) = b(s), for which we have

X_t = B_t + ∫_0^t b(s) ds.

This corresponds to studying the density of the shifted Wiener measure with respect to the initial Wiener measure. But let us examine first the case of shifted finite dimensional Gaussian distributions. If Z ∼ N(0, I_n) is a standard Gaussian random variable on R^n with density x ↦ γ_n(x) = (2π)^{−n/2} e^{−|x|²/2} with respect to the Lebesgue measure, then, for all h ∈ R^n and all bounded and measurable Φ : R^n → R,

E(Φ(Z + h)) = ∫_{R^n} Φ(x + h) γ_n(x) dx = ∫_{R^n} Φ(x′) γ_n(x′ − h) dx′ = ∫_{R^n} Φ(x′) γ_n(x′) e^{x′·h − |h|²/2} dx′ = E( Φ(Z) e^{Z·h − |h|²/2} ).

This nice “translation formula” can be interpreted in terms of Laplace transform or Gaussian integration
by parts. Moreover it turns out that this formula has a counterpart for Brownian motion and the Wiener


measure, which is an infinite dimensional Gaussian distribution, provided that the translation h belongs to
a special space known as the Cameron – Martin space.
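The finite dimensional translation formula above is easy to test by Monte Carlo. The sketch below takes n = 1, the bounded test function Φ = cos, and the shift h = 0.5 (all arbitrary choices); the exact common value is E cos(Z + h) = e^{−1/2} cos(h), by taking the real part of E e^{i(Z+h)}.

```python
import math
import random

random.seed(2)

h = 0.5
n = 200_000

# Left-hand side: E Φ(Z + h) with Φ = cos.
lhs = sum(math.cos(random.gauss(0.0, 1.0) + h) for _ in range(n)) / n

# Right-hand side: E( Φ(Z) e^{Z h - h^2/2} ), equal by the translation formula.
rhs = 0.0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    rhs += math.cos(z) * math.exp(z * h - h * h / 2)
rhs /= n

exact = math.exp(-0.5) * math.cos(h)
```

Both estimates should agree with the exact value up to Monte Carlo error; the right-hand side has a larger variance because of the exponential weight, a typical feature of such changes of measure.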
The Cameron – Martin space is defined as the set of continuous functions h which are the integral of a square integrable function denoted ḣ, namely

H = { h ∈ W : ∀t ≥ 0, h(t) = ∫_0^t ḣ(s) ds, ḣ ∈ L²_{R^d}(R₊, dx) }.

This is a subspace of W = C(R₊, R^d). Note that every element h ∈ H is absolutely continuous, with almost-everywhere derivative ḣ, hence the notation. Moreover, the representation is unique in the sense that for all h₁, h₂ ∈ H, h₁ = h₂ implies ḣ₁ = ḣ₂. We equip H with the scalar product

〈h₁, h₂〉_H = ∫_0^∞ ḣ₁(s) · ḣ₂(s) ds.

This makes H a Hilbert space isomorphic to L²_{R^d}(R₊, dx). For every h ∈ H we denote

|h|²_H = ∫_0^∞ |ḣ(s)|² ds = ‖ḣ‖²_{L²_{R^d}(R₊,dx)}.

Let (h_n)_{n≥0} be an orthonormal basis of H, and for all n ≥ 0 and t ≥ 0,

h_n(t) = ∫_0^t ḣ_n(s) ds.

The sequence (ḣ_n)_{n≥0} is an orthonormal basis of L²_{R^d}(R₊, dx). Let π = (π_t)_{t≥0} be the canonical Brownian motion on R^d and let us define, for almost all w ∈ W and all h ∈ H, using the Wiener integral of Theorem 3.7.1,

(w, h) = ∫_0^∞ ḣ(s) dπ_s(w) = ∫_0^∞ ḣ(s) dw_s.

Now Corollary 3.7.2 gives that (w, h n ), n ≥ 0, are i.i.d. standard Gaussian real random variables, and for all
fixed t ≥ 0 we have the following expansion in L2 (W, BW , µ):

π_t(w) = w_t = Σ_{n≥0} (w, h_n) h_n(t).

Here t is fixed and h n (t ) is a deterministic vector, while (w, h n )n≥0 are random and orthogonal in L2 (W, BW , µ).
Recall that a notion of density of Wiener measure would require a notion of Lebesgue measure on
Wiener space, which is missing3 . However the shift of the Wiener measure in a direction picked in the
Cameron – Martin space is absolutely continuous with respect to the Wiener measure, with explicit density.
This is an infinite dimensional analogue of the formula above for finite dimensional Gaussian laws.

Theorem 3.8.2. Cameron^a – Martin^b formula and density of shifts.

Let W and µ be the Wiener space and measure, Φ : W → R be measurable and bounded, and h be in the Cameron – Martin space H. Then we have the Cameron – Martin formula:

∫ Φ(w + h) µ(dw) = ∫ Φ(w) e^{(w,h) − ½|h|²_H} µ(dw).

In other words, if B is canonical Brownian motion and F_h(w) = e^{(w,h) − ½|h|²_H}, then

E(Φ(B + h)) = E(Φ(B) F_h(B)).

a. Named after Robert Horton Cameron (1908 – 1989), American mathematician.
b. Named after William Ted Martin (1911 – 2004), American mathematician.

3 It can be shown that on an infinite-dimensional separable Banach space equipped with its Borel σ-algebra, the only locally

finite and translation-invariant Borel measure is the trivial measure identically equal to zero. Equivalently, every translation-
invariant measure that is not identically zero assigns infinite measure to all open subsets. See for instance https://en.
wikipedia.org/wiki/Infinite-dimensional_Lebesgue_measure and references therein.


In particular E(F_h(B)) = 1 and the law of (B_t + h_t)_{t≥0} has density F_h with respect to µ.

Proof. By a monotone class argument, we can assume without loss of generality that Φ(w) = f(w_{t_1}, …, w_{t_n}) where n ≥ 1 and 0 ≤ t_1 < ··· < t_n and f : R^n → R is continuous and bounded. Let h ∈ H, h ≠ 0. There exists an orthonormal basis (h_m)_{m≥0} of H such that h_0 = h/|h|_H. For µ-almost all w ∈ W and all m ≥ 1, let us define w^{(m)} ∈ W by

w^{(m)} = Σ_{ℓ=0}^m (w, h_ℓ) h_ℓ.

We have lim_{m→∞} w^{(m)} = w = Σ_{n≥0} (w, h_n) h_n in the sense that each coordinate w^{(m)}_t converges in L²(W, B_W, µ). Since Φ is continuous and bounded, it follows that lim_{m→∞} Φ(w^{(m)} + h) = Φ(w + h) in probability, and thus, by dominated convergence,

lim_{m→∞} E_µ(Φ(w^{(m)} + h)) = E_µ(Φ(w + h)).

Similarly, we have, using the fact that (w, h) ∼ N(0, |h|²_H) under µ,

lim_{m→∞} E_µ( Φ(w^{(m)}) exp( (w, h) − |h|²_H/2 ) ) = E_µ( Φ(w) exp( (w, h) − |h|²_H/2 ) ).

Also, to prove the desired formula, it suffices to show that for all m ≥ 1,

E_µ(Φ(w^{(m)} + h)) = E_µ( Φ(w^{(m)}) exp( (w, h) − |h|²_H/2 ) ).
This boils down to a simple computation in finite dimension. Namely, since

w^{(m)} + h = ((w, h_0) + |h|_H) h_0 + Σ_{ℓ=1}^m (w, h_ℓ) h_ℓ,

where (w, h_ℓ), ℓ ≥ 0, are independent and identically distributed with law N(0, 1), we have

E_µ(Φ(w^{(m)} + h)) = (2π)^{−(m+1)/2} ∫_{R^{m+1}} Φ( (x_0 + |h|_H) h_0 + Σ_{ℓ=1}^m x_ℓ h_ℓ ) e^{−½ Σ_{ℓ=0}^m x_ℓ²} dx_0 ··· dx_m
    = (2π)^{−(m+1)/2} ∫_{R^{m+1}} Φ( x_0′ h_0 + Σ_{ℓ=1}^m x_ℓ′ h_ℓ ) e^{x_0′|h|_H − ½|h|²_H − ½ Σ_{ℓ=0}^m x_ℓ′²} dx_0′ ··· dx_m′
    = E_µ( Φ(w^{(m)}) e^{(w,h_0)|h|_H − |h|²_H/2} )
    = E_µ( Φ(w^{(m)}) e^{(w,h) − |h|²_H/2} ).

Note that we do not need a regular Φ or f to perform a linear substitution (change of variable). ■
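The identity E(Φ(B + h)) = E(Φ(B)F_h(B)) also holds exactly for the discretized process, since on a finite grid it is just the finite dimensional translation formula applied to the increments. The sketch below checks it by Monte Carlo for h(t) = ct on [0, 1] (so ḣ = c, |h|²_H = c², F_h(B) = exp(cB_1 − c²/2)) and a genuinely path-dependent Φ given by a running maximum; the drift, barrier, grid, and sample sizes are arbitrary choices.

```python
import math
import random

random.seed(3)

def bm_path(n_steps, dt):
    # one trajectory of one-dimensional BM sampled on the grid k * dt
    b, path = 0.0, [0.0]
    for _ in range(n_steps):
        b += random.gauss(0.0, math.sqrt(dt))
        path.append(b)
    return path

c, a = 1.0, 1.0                        # shift h(t) = c t, barrier a for Φ
n_steps, dt, n_mc = 50, 0.02, 40_000   # grid of [0, 1]

lhs = rhs = 0.0
for _ in range(n_mc):
    # LHS: Φ(B + h) with Φ(w) = 1 if max_k w(t_k) > a, else 0
    w = bm_path(n_steps, dt)
    if max(w[k] + c * k * dt for k in range(n_steps + 1)) > a:
        lhs += 1.0
    # RHS: Φ(B) F_h(B) with F_h(B) = exp(c B_1 - c^2 / 2)
    b = bm_path(n_steps, dt)
    if max(b) > a:
        rhs += math.exp(c * b[-1] - c * c / 2)
lhs /= n_mc
rhs /= n_mc
```

The left estimate is a probability for the drifted path, the right a weighted probability for the driftless one; up to Monte Carlo error they agree.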

Corollary 3.8.3. Density of Cameron – Martin shifts.

Let B = (B t )t ≥0 be a d -dimensional Brownian motion issued from the origin defined on a probability
space (Ω, A , P), seen as a random variable of law µ taking values on the canonical space (W, BW ). For
all h ∈ H, let F h be as in Theorem 3.8.2. Then:

1. For all h ∈ H, we have E(F_h(B)) = 1 and

F_h(B) = exp( ∫_0^∞ ḣ_s dB_s − ½ ∫_0^∞ |ḣ_s|² ds ).

2. For all h ∈ H, with respect to (Ω, A , Q) given by dQ = F h (B )dP, the shifted process

(B t − h t )t ≥0

is a d -dimensional Brownian motion issued from the origin.


3. For all h ∈ H, the law of the shifted process

(B t + h t )t ≥0

is absolutely continuous with respect to the Wiener measure µ, with density F h .

By definition of the Wiener measure µ, we have E(F h (B )) = Eµ (F h ). The first E is on the probability space
(Ω, A , P) used to define B while the second is on the canonical Wiener space (W, BW , µ). It is customary to
omit the dependency E = EP over the underlying probability space for an abstract random variable like B .
An extension of Corollary 3.8.3 to random shifts is given by the Girsanov theorem (Theorem 7.5.1).

Proof.
1. Since ∫_0^∞ ḣ_s dB_s ∼ N(0, |h|²_H), the formula E(F_h(B)) = 1 comes from the Laplace transform formula

E( exp( ∫_0^∞ ḣ_s dB_s ) ) = exp( |h|²_H / 2 ).

2. Denoting Φ_h = Φ(· − h), Theorem 3.8.2 gives the third equality in

E(Φ(B)) = E(Φ(B + h − h)) = E(Φ_h(B + h)) = E(Φ_h(B) F_h(B)) = E(Φ(B − h) F_h(B)) = E_Q(Φ(B − h)).

3. Theorem 3.8.2 writes, for all measurable and bounded Φ : W → R,

E(Φ(B + h)) = E( Φ(B) exp( ∫_0^∞ ḣ_s dB_s − |h|²_H/2 ) ) = E(Φ(B) F_h(B)) = ∫ Φ(w) F_h(w) µ(dw),

hence the result. Note that taking Φ ≡ 1 gives also that E(F_h(B)) = 1.
Alternatively, let Q be as above and W = B − h. Then W + h = B, and by definition of Q,

E_Q(Φ(W + h)) = E_Q(Φ(B)) = E(Φ(B) F_h(B)),

and on the other hand, since h is deterministic and since W has the law of Brownian motion under Q (second item), we have

E_Q(Φ(W + h)) = E(Φ(B + h)). ■

This can be skipped at first reading.

Corollary 3.8.4: Hitting time of Brownian motion with drift

For all a > 0 and c ∈ R, the law of T_{a,c} = inf{t ≥ 0 : B_t + ct = a} is µ + pδ_∞ where µ has density

s ∈ R ↦ (a/√(2πs³)) e^{−(a−cs)²/(2s)} 1_{s>0},

and where

p = P(T_{a,c} = ∞) = 0 if c ≥ 0,  and  p = 1 − e^{2ac} if c ≤ 0.

The law µ is known as an inverse Gaussian or Wald⁴ distribution.


Several other facts and formulas about Brownian motion can be found in the book [6].

Proof. Let us define ḣ(s) = c 1_{s≤t}, which gives h(s) = c(s ∧ t). The Cameron – Martin formula of Corollary 3.8.3 with Φ(w) = 1_{max_{s∈[0,t]} w(s) ≥ a} gives

P(T_{a,c} ≤ t) = E(Φ(B + h))
    = E( Φ(B) exp( ∫_0^∞ ḣ(s) dB_s − ½ ∫_0^∞ ḣ(s)² ds ) )
    = E( 1_{T_{a,0} ≤ t} exp( cB_t − c²t/2 ) )
    ⋆= E( 1_{T_{a,0} ≤ t} exp( cB_{t∧T_{a,0}} − (c²/2)(t ∧ T_{a,0}) ) )
    = E( 1_{T_{a,0} ≤ t} exp( ca − (c²/2) T_{a,0} ) )
    ⋆⋆= ∫_0^t (a/√(2πs³)) e^{−a²/(2s)} e^{ca − c²s/2} ds
    = ∫_0^t (a/√(2πs³)) e^{−(a−cs)²/(2s)} ds.

We have used for ⋆ the Doob stopping theorem (Theorem 2.5.1) with the martingale (e^{cB_t − c²t/2})_{t≥0} to get

E( e^{cB_t − c²t/2} | F_{t∧T_{a,0}} ) = e^{cB_{t∧T_{a,0}} − (c²/2)(t∧T_{a,0})},

and for ⋆⋆ the density of T_{a,0} given by Corollary 3.5.3. The density of T_{a,c} gives in turn the formula for P(T_{a,c} < ∞), which follows also from Doob stopping with the martingale (e^{−2c(B_t + ct)})_{t≥0}. ■
4. Named after Abraham Wald (1902 – 1950), Hungarian mathematician.
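The total mass of the density above can be checked numerically: it should equal P(T_{a,c} < ∞), namely 1 for c ≥ 0 and e^{2ac} for c < 0. The sketch below uses a plain trapezoidal rule (the cutoff at s = 60 and the step size are arbitrary choices; the integrand vanishes to all orders at 0+, so no special care is needed there).

```python
import math

def wald_density(s, a, c):
    # density on (0, ∞) of the hitting time T_{a,c}, from Corollary 3.8.4
    return a / math.sqrt(2 * math.pi * s ** 3) * math.exp(-(a - c * s) ** 2 / (2 * s))

def total_mass(a, c, hi=60.0, n=60_000):
    # trapezoidal rule on (0, hi]; the value at 0+ is 0
    h = hi / n
    total = 0.5 * wald_density(hi, a, c) * h
    for i in range(1, n):
        total += wald_density(i * h, a, c) * h
    return total

mass_pos = total_mass(1.0, 1.0)    # c >= 0: the level is hit a.s., mass 1
mass_neg = total_mass(1.0, -1.0)   # c < 0: P(T < ∞) = e^{2ac} = e^{-2}
```

This matches the atom p = 1 − e^{2ac} at infinity for negative drift.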

Chapter 4

More on martingales

For simplicity, this chapter is about continuous processes only.

4.1 Quadratic variation, square integrable martingales, increasing process

Definition 4.1.1. Quadratic variation of square integrable processes.

Let X = (X t )t ≥0 be a square integrable real process such that X 0 = 0. The quadratic variation process
[X ] = ([X ]t )t ≥0 of X is defined for all t ≥ 0 by the limit (when it exists)

[X]_t = lim_{|δ|→0} Σ_k (X_{t_{k+1}} − X_{t_k})²

where the convergence takes place in probability, and where δ : 0 = t_0 < ··· < t_n = t, n = n_δ ≥ 1, runs over all the partitions or sub-divisions of [0, t], and where |δ| = max_k |t_{k+1} − t_k| is the mesh of δ. More generally, the quadratic covariation process of a couple of square integrable real processes X = (X_t)_{t≥0} and Y = (Y_t)_{t≥0} is defined for all t ≥ 0 by the following limit in probability, when it exists:

[X, Y]_t = lim_{|δ|→0} Σ_k (X_{t_{k+1}} − X_{t_k})(Y_{t_{k+1}} − Y_{t_k}).

We have [X] = [X, X]. The set of processes with quadratic variation is a vector space. The operator [·, ·] is bilinear on this space and we have by polarization [X, Y] = ¼([X + Y] − [X − Y]).
We use convergence in probability because we do not know if the process has high enough moments. Recall that for Brownian motion we have used the fourth moment for L² convergence of quadratic variation. Theorem 3.2.1 states that for a BM B, we have, for all t ≥ 0, [B]_t = t. Theorem 4.1.4 states that for any square integrable continuous martingale M issued from the origin, for all t ≥ 0, E([M]_t) = E(M_t²).

Lemma 4.1.2. Continuity and finite variation implies zero quadratic variation.

If a process X = (X t )t ≥0 is continuous and has finite variation then it has zero quadratic variation. In
other words, for a continuous process, non-zero quadratic variation implies infinite variation.

On the same topic, Lemma 4.1.6 states that a finite variation continuous martingale is constant.

Proof. Indeed, for all t > 0 and all partitions δ : 0 = t_0 < ··· < t_n = t of [0, t], n = n_δ ≥ 1,

Σ_k (X_{t_{k+1}} − X_{t_k})² ≤ max_k |X_{t_{k+1}} − X_{t_k}| Σ_k |X_{t_{k+1}} − X_{t_k}| −→ 0 as |δ| → 0.

The max part of the right hand side tends to 0 since X is continuous and thus uniformly continuous on [0, t] (Heine), while the Σ part is bounded by the 1-variation of X on [0, t], which is finite since X has finite variation. ■
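A quick numeric illustration of the lemma (not a proof): for a C¹ path such as sin, the discrete quadratic variation along uniform partitions of [0, 1] behaves like (1/n) ∫_0^1 f′(s)² ds and therefore tends to 0 as the mesh shrinks. The choices of function and partition sizes are arbitrary.

```python
import math

def quad_var(f, t, n):
    # Σ_k (f(t_{k+1}) - f(t_k))^2 along the uniform partition of [0, t] with n steps
    return sum((f((k + 1) * t / n) - f(k * t / n)) ** 2 for k in range(n))

qv_coarse = quad_var(math.sin, 1.0, 100)
qv_fine = quad_var(math.sin, 1.0, 10_000)
# For a C^1 path, Σ (Δf)^2 ≈ (1/n) ∫_0^1 f'(s)^2 ds → 0 as the mesh |δ| → 0
```

Refining the partition by a factor 100 divides the discrete quadratic variation by roughly 100, in sharp contrast with Brownian motion, whose quadratic variation on [0, t] stays equal to t.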


Coding in action 4.1.3. Quadratic variation of BM.

Could you write a code simulating approximate trajectories of one-dimensional Brownian motion
and their approximate quadratic variation, and plotting both on the same graphic?
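In answer to the box, here is a minimal stdlib-Python sketch (the time horizon and mesh are arbitrary choices) simulating one approximate BM trajectory together with its running discrete quadratic variation; the plotting itself, which needs matplotlib, is left as commented lines.

```python
import math
import random

random.seed(4)

def bm_and_qv(t=1.0, n=100_000):
    # One trajectory of one-dimensional BM on [0, t] with mesh t/n, together
    # with its running discrete quadratic variation sum_{k<=j} (ΔB_k)^2.
    dt = t / n
    b, qv = [0.0], [0.0]
    for _ in range(n):
        db = random.gauss(0.0, math.sqrt(dt))
        b.append(b[-1] + db)
        qv.append(qv[-1] + db * db)
    return b, qv

b, qv = bm_and_qv()
qv_at_1 = qv[-1]   # Theorem 3.2.1 predicts [B]_1 = 1

# To plot both curves on the same graphic (requires matplotlib):
#   import matplotlib.pyplot as plt
#   times = [k / 100_000 for k in range(100_001)]
#   plt.plot(times, b, label="t -> B_t")
#   plt.plot(times, qv, label="t -> [B]_t")
#   plt.legend(); plt.show()
```

The quadratic variation curve is close to the straight line t ↦ t, while the trajectory itself is rough.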

We denote by M² the set of square integrable continuous martingales.
We denote by M₀² the set of square integrable continuous martingales issued from the origin.
We often use the following properties for any M ∈ M²:

• Squared L² norm of increments: for all 0 ≤ s ≤ t,

E((M_t − M_s)²) = E(E(M_t² − 2M_s M_t + M_s² | F_s)) = E(M_t²) − E(M_s²),

and thus for any subdivision s = t_0 < ··· < t_n = t, by telescoping summation,

Σ_{i=1}^n E((M_{t_i} − M_{t_{i−1}})²) = E(M_t²) − E(M_s²).

• (Conditional) orthogonality of increments in L²: for all 0 ≤ s ≤ t ≤ u ≤ v we have

E((M_t − M_s)(M_v − M_u) | F_t) = (M_t − M_s) E(M_v − M_u | F_t) = 0,

since E(M_v − M_u | F_t) = M_t − M_t = 0.

The following theorem is a crucial result of martingale theory.

Theorem 4.1.4. Increasing process or angle bracket.

Let M ∈ M₀².

• There exists a unique continuous and non-decreasing process 〈M〉 = (〈M〉_t)_{t≥0} such that 〈M〉_0 = 0 and (M_t² − 〈M〉_t)_{t≥0} is a martingale. In particular 〈M〉 is adapted.

• For all t ≥ 0, the quadratic variation [M]_t exists and [M]_t = 〈M〉_t.

Uniqueness is up to indistinguishability.
The process 〈M〉 is called the increasing process or angle bracket of M, or even the compensator of M².
If M ∈ M² with M_0 ≠ 0 then we define [M] = [M − M_0] and 〈M〉 = 〈M − M_0〉.
If B is a Brownian motion, Theorem 3.1.6 gives that 〈B〉_t = t for all t ≥ 0 by showing that (B_t² − t)_{t≥0} is a martingale, while Theorem 3.2.1 gives that [B]_t = t for all t ≥ 0 by computing the quadratic variation. More generally, Lemma 4.2.6 states that for every continuous local martingale M issued from the origin, [M] = 〈M〉.
In Theorem 4.1.4, M² is a sub-martingale, and actually Theorem 4.1.4 states a special case of the more general Doob – Meyer¹ decomposition of sub-martingales, which is beyond the scope of this course.

Corollary 4.1.5. Boundedness in L2 .

If M ∈ M₀² then there exists a random variable 〈M〉_∞ taking values in [0, +∞] such that almost surely

〈M〉_t ↗ 〈M〉_∞ as t → ∞,

and moreover M is bounded in L² if and only if 〈M〉_∞ ∈ L¹; more precisely, in [0, +∞],

E(〈M〉_∞) = sup_{t≥0} E(M_t²).

1. Named after Paul-André Meyer (1934 – 2003), French mathematician.


Proof of Corollary 4.1.5. The first property follows from the monotonicity and positivity of 〈M〉. For the second property, since M² − 〈M〉 is a martingale we get E(M_t²) = E(〈M〉_t) for all t ≥ 0, and by monotone convergence,

E(M_t²) = E(〈M〉_t) ↗ E(〈M〉_∞) ∈ [0, +∞] as t → ∞. ■

This can be skipped at first reading.

Proof of Theorem 4.1.4.


• Existence of 〈M〉 and [M] and their equality when M is bounded. Let us fix t > 0 and let (δ_n)_n be a sequence of partitions of [0, t], δ_n : 0 = t_0^n < ··· < t_{r_n}^n = t, with |δ_n| = max_{1≤k≤r_n}(t_k^n − t_{k−1}^n) → 0 as n → ∞. It can be checked that the process X^n = (X_s^n)_{s∈[0,t]} defined by

X_s^n = Σ_{k=1}^{r_n} M_{t_{k−1}^n} (M_{t_k^n ∧ s} − M_{t_{k−1}^n ∧ s})

is a (bounded) martingale (the k-th summand is crucially zero when s ≤ t_{k−1}^n, see Lemma 6.1.3 for more), and that

M²_{t_k^n} − 2X^n_{t_k^n} = Σ_{i=1}^k (M_{t_i^n} − M_{t_{i−1}^n})².

Now it turns out that

lim_{m,n→∞} E((X_t^n − X_t^m)²) = 0.

It follows by the Doob maximal inequality (Theorem 2.5.7) that

lim_{m,n→∞} E( sup_{s∈[0,t]} (X_s^n − X_s^m)² ) = 0.

Next, for some subsequence (n_k)_k and some continuous process Y, we have that almost surely X^{n_k} → Y as k → ∞. Moreover, Y inherits the martingale property from the X^n. Now the quantity

M²_{t_k^n} − 2X^n_{t_k^n} = Σ_{i=1}^k (M_{t_i^n} − M_{t_{i−1}^n})²

is non-decreasing along t_k^n, 1 ≤ k ≤ r_n. Letting n → ∞ gives that M² − 2Y is almost surely non-decreasing. This shows that [M] exists, is equal to M² − 2Y, and that we can take 〈M〉 = [M].

• Existence of 〈M〉 and [M] and their equality when M is not bounded. For all N, we introduce the stopping time T_N = inf{t ≥ 0 : |M_t| ≥ N}. From the bounded case applied to the bounded martingale (M_{t∧T_N})_{t≥0}, there exists a unique increasing process (A_t^N)_{t≥0} such that (M²_{t∧T_N} − A_t^N)_{t≥0} is a martingale. The uniqueness gives, for N′ ≥ N, A^{N′}_{t∧T_N} = A^N_t, and then we can define a process (A_t)_{t≥0} by setting A_t = A^N_t on the event {T_N ≥ t}. Finally, by monotone and dominated convergence, (M_t² − A_t)_{t≥0} is a martingale.
For the quadratic variation, it suffices to write

P( |A_t − Σ_{k=1}^{r_n} (M_{t_k^n} − M_{t_{k−1}^n})²| ≥ ε ) ≤ P(T_N ≤ t) + P( |A^N_t − Σ_{k=1}^{r_n} (M_{t_k^n ∧ T_N} − M_{t_{k−1}^n ∧ T_N})²| ≥ ε ).

In contrast with the bounded case, here A_t = 〈M〉_t belongs to L¹ but not necessarily to L², and in particular, the convergence of S(δ_n) = Σ_k (M_{t_k^n} − M_{t_{k−1}^n})² holds in probability but not necessarily in L².

• Uniqueness of 〈M〉. If (A_t)_{t≥0} and (A′_t)_{t≥0} are continuous, increasing, issued from 0, and such that (M_t² − A_t)_{t≥0} and (M_t² − A′_t)_{t≥0} are continuous martingales, then (A_t − A′_t)_{t≥0} is a continuous finite variation martingale, and by Lemma 4.1.6, it is constant. Since A_0 = A′_0 = 0, we get A = A′.

Lemma 4.1.6

If (M s )s∈[0,t ] is a finite variation continuous martingale then it is constant.


Proof of Lemma 4.1.6. Let (M_s)_{s∈[0,t]} be a finite variation continuous martingale. We may assume without loss of generality that M_0 = 0. For all N ≥ 1, we introduce the stopping time

T_N = t ∧ inf{ s ∈ [0, t] : |M_s| ≥ N or Σ_k |M_{t_{k+1}} − M_{t_k}| ≥ N for some subdivision (t_k) of [0, s] }.

By Theorem 2.5.1, the stopped process (M_{s∧T_N})_{s∈[0,t]} is a bounded martingale and thus, for all s ≤ t,

E((M_{t∧T_N} − M_{s∧T_N})²) = E(E((M_{t∧T_N} − M_{s∧T_N})² | F_s)) = E(M²_{t∧T_N}) − E(M²_{s∧T_N}).

This gives, using a telescoping sum, for an arbitrary sub-division δ : 0 = t_0 < ··· < t_n = t,

E(M²_{T_N}) = E(M²_{t∧T_N}) − E(M²_{0∧T_N})
    = E( Σ_k (M_{t_{k+1}∧T_N} − M_{t_k∧T_N})² )
    ≤ E( max_k |M_{t_{k+1}∧T_N} − M_{t_k∧T_N}| Σ_k |M_{t_{k+1}∧T_N} − M_{t_k∧T_N}| )
    ≤ N E( max_k |M_{t_{k+1}∧T_N} − M_{t_k∧T_N}| ).

Since M is continuous, the max in the right hand side tends a.s. to 0 as |δ| = max_i(t_{i+1} − t_i) → 0. Since it is bounded, by dominated convergence, E(M²_{T_N}) = 0. Thus M_{T_N} = 0, which gives in turn M_t = 0 by sending N to ∞ and using the fact that M is continuous with finite variation. ■

Remark 4.1.7: Stochastic integral

In the proof of Theorem 4.1.4, we have approximated [M]_t as M_t² minus 2 times a sort of Riemann sum approximating in probability the stochastic integral ∫_0^t M_s dM_s, making this approximation and its limit a martingale. This corresponds to the following calculus formula

f(M_t) = f(M_0) + ∫_0^t f′(M_s) dM_s + ½ ∫_0^t f″(M_s) d〈M〉_s

in the special case f(x) = x². This is a remarkable special case of the Itô formula. The quadratic variation term, the last term in the right hand side, is due to the roughness of M.

Corollary 4.1.8. Angle bracket, square bracket, quadratic covariation.

Let M, N ∈ M₀².

• There exists a unique continuous finite variation process 〈M, N〉 = (〈M, N〉_t)_{t≥0} such that 〈M, N〉_0 = 0 and (M_t N_t − 〈M, N〉_t)_{t≥0} is a martingale. In particular 〈M, N〉 is adapted.

• The quadratic covariation of (M, N) exists and [M, N]_t = 〈M, N〉_t for all t ≥ 0.

It is important that M and N are martingales with respect to the same filtration, the underlying (Ft )t ≥0 .
By Theorem 3.1.6, if B is a d-dimensional Brownian motion then for all 1 ≤ j, k ≤ d and all t ≥ 0,

〈B^j, B^k〉_t = [B^j, B^k]_t = t 1_{j=k}.

Proof. We proceed by quadratic polarization. First, the processes (M_t + N_t)_{t≥0} and (M_t − N_t)_{t≥0} are square integrable continuous martingales with respect to (F_t)_{t≥0}. Next, for all t ≥ 0, if we define 〈M, N〉_t as

〈M, N〉_t = ¼(〈M + N〉_t − 〈M − N〉_t),

then M_t N_t − 〈M, N〉_t = ¼((M_t + N_t)² − 〈M + N〉_t − ((M_t − N_t)² − 〈M − N〉_t)), and thus (M_t N_t − 〈M, N〉_t)_{t≥0} is a martingale by Theorem 4.1.4. Moreover 〈M, N〉 is continuous with finite variation as the difference of continuous and increasing processes. The uniqueness follows as in the proof of Theorem 4.1.4. The link with the quadratic covariation follows by polarization and Theorem 4.1.4. ■

Corollary 4.1.9. Stopped angle brackets.

If M, N ∈ M₀² and S, T are stopping times then 〈M^T, N^S〉 = 〈M, N〉^{S∧T}.

Proof. Theorem 2.5.1 gives that (M² − 〈M〉)^T = (M^T)² − 〈M〉^T is a martingale. Now (〈M〉^T)_0 = 〈M〉_0 = 0 and 〈M〉^T is a continuous increasing process, and thus, by the uniqueness property of the increasing process provided by Theorem 4.1.4, we have 〈M^T〉 = 〈M〉^T. By polarization we get 〈M^T, N^T〉 = 〈M, N〉^T. Finally, 〈M^T, N〉 = 〈M^T, N^T〉 from the equality with the quadratic covariation (sums of products of increments). ■

This can be skipped at first reading.

Corollary 4.1.10: Kunita – Watanabe inequality

For all square integrable martingales M and N, the following Cauchy – Schwarz type inequality holds, for all measurable processes ϕ and ψ and all t ≥ 0:

∫_0^t |ϕ_s||ψ_s| |d〈M, N〉_s| ≤ ( ∫_0^t |ϕ_s|² d〈M〉_s )^{1/2} ( ∫_0^t |ψ_s|² d〈N〉_s )^{1/2},

where the integrals are in the sense of finite variation integrators (Theorem 1.7.3).

Proof. Set 〈M, N〉_s^t = 〈M, N〉_t − 〈M, N〉_s, s ≤ t. The Cauchy – Schwarz inequality on the sums approximating the quadratic covariations along partitions with rational points gives, via continuity, that a.s. for all s < t,

|〈M, N〉_s^t| ≤ ( 〈M, M〉_s^t )^{1/2} ( 〈N, N〉_s^t )^{1/2}.

Similarly, we can prove that almost surely

∫_s^t |d〈M, N〉_u| ≤ ( 〈M, M〉_s^t )^{1/2} ( 〈N, N〉_s^t )^{1/2}.

By a monotone class argument, it follows that almost surely, for all bounded Borel sets A,

∫_A |d〈M, N〉_u| ≤ ( ∫_A d〈M, M〉_u )^{1/2} ( ∫_A d〈N, N〉_u )^{1/2}.

Now, almost surely, if ϕ = Σ_i λ_i 1_{A_i} and ψ = Σ_i µ_i 1_{A_i} are step functions with λ_i ≥ 0, µ_i ≥ 0, and (A_i) disjoint, then, by the preceding inequality and the Cauchy – Schwarz inequality,

∫ ϕ(s)ψ(s) |d〈M, N〉_s| = Σ_i λ_i µ_i ∫_{A_i} |d〈M, N〉_s|
    ≤ ( Σ_i λ_i² ∫_{A_i} d〈M, M〉_s )^{1/2} ( Σ_i µ_i² ∫_{A_i} d〈N, N〉_s )^{1/2}
    = ( ∫ ϕ(s)² d〈M, M〉_s )^{1/2} ( ∫ ψ(s)² d〈N, N〉_s )^{1/2}.

The generalization to arbitrary non-negative measurable ϕ, ψ follows by monotone convergence. ■


4.2 Local martingales and localization by stopping times

If (M t )t ≥0 is a martingale, then the Doob stopping theorem states that for every stopping time T , the
stopped process (M t ∧T )t ≥0 is again a martingale. Stopping can be used in general to truncate the trajectories
of a process with a cutoff, in order to gain more integrability or tightness. Typically if (X t )t ≥0 is an adapted
process, we could consider the sequence of stopping times (Tn )n≥0 defined by Tn = inf{t ≥ 0 : |X t | ≥ n},
which satisfies almost surely Tn ↗ +∞ as n → ∞ and for which for all n the stopped process (X t ∧Tn )t ≥0
is bounded. We say that (Tn )n≥0 is a localizing sequence. Now a local martingale is simply an adapted process (X t )t ≥0 such that for all n ≥ 0 the stopped process (X t ∧Tn )t ≥0 is a (bounded) martingale. Every martingale is a local martingale. However the converse is false, and strict local martingales do exist. Local martingales pop up naturally when constructing the stochastic integral (see Chapter 7).

Definition 4.2.1. Local martingale.

• A continuous process (M t )t ≥0 issued from the origin is a local martingale if it is adapted and
for all n ≥ 0, introducing the stopping time Tn = inf{t ≥ 0 : |M t | ≥ n}, the stopped process
M Tn = (M t ∧Tn )t ≥0 is a martingale. It is bounded since supt ≥0 |M t ∧Tn | ≤ |M 0 | ∨ n = n < ∞.

• Since the process M is continuous, almost surely Tn ↗ +∞ as n → ∞, and thus, for all t ≥ 0,
limn→∞ M t ∧Tn = M t almost surely. We say that the sequence (Tn )n≥0 localizes or reduces M .

• If we do not have M 0 = 0, then we say that M is a local martingale when M − M 0 is a local martingale; however we still impose that M is adapted and in particular that M 0 is F0 measurable.

• We denote by M loc the set of continuous local martingales w.r.t. the default filtration (Ft )t ≥0 .
We denote by M0loc the subset issued from the origin.

Remark 4.2.2. Alternative or relaxed definitions.

Equivalently we could say that a continuous adapted process (M t )t ≥0 issued from the origin is a local
martingale when there exists a sequence (S n )n≥0 of stopping times such that

1. almost surely S n ↗ +∞ as n → ∞

2. for all n ≥ 1, the continuous process M S n = (M t ∧S n )t ≥0 is a martingale.

Moreover in this definition we could replace martingale by the stronger conditions square integrable martingale, or u.i. martingale, or bounded in L2 martingale, or bounded martingale. Indeed, it suffices to show that M is then localized by Tn = inf{t ≥ 0 : |M t | ≥ n}. First, since M is continuous, almost surely Tn ↗ +∞ as n → ∞. Next, if (S n )n≥0 localizes M , then for all n, k ≥ 0, by the Doob stopping theorem (Theorem 2.5.1) for the martingale M S k and the stopping time Tn , the process (M S k )Tn = (M t ∧S k ∧Tn )t ≥0 is a martingale, thus for all 0 ≤ s ≤ t , E(M t ∧S k ∧Tn | Fs ) = M s∧S k ∧Tn . Moreover, since M 0 = 0 and M is continuous, by definition of Tn , we have supt ≥0 |M t ∧S k ∧Tn | ≤ n, and by dominated convergence, as k → ∞, we get E(M t ∧Tn | Fs ) = M s∧Tn , hence (M t ∧Tn )t ≥0 is a martingale.

• Localization is a truncation for processes by cutoff that has the advantage of preserving the continuity
of the process and the martingale structure thanks to Doob stopping theorems.

• A martingale is always a local martingale: take Tn = inf{t ≥ 0 : |M t | ≥ n} and use Doob stopping (The-
orem 2.5.1). Note that thanks to the convention inf ∅ = ∞ we have Tn = ∞ on {supt ≥0 |M t | < n}.

• If M is a local martingale, then no integrability is guaranteed for M t for a fixed deterministic t ≥ 0, and
we may have M t ̸∈ L1 . Moreover for every stopping time T , the stopped process M T = (M t ∧T )t ≥0 is a
local martingale but the Doob stopping theorem does not hold in general even if T is bounded.


Remark 4.2.3. Domination as a martingale criterion.

If M is a continuous local martingale dominated by an integrable random variable, in the sense that
E supt ≥0 |M t | < ∞, then, for all t ≥ 0 and s ∈ [0, t ], by continuity and dominated convergence,

$$M_s = \lim_{n\to\infty} M_{s\wedge T_n} = \lim_{n\to\infty} E(M_{t\wedge T_n} \mid F_s) = E\big(\lim_{n\to\infty} M_{t\wedge T_n} \mid F_s\big) = E(M_t \mid F_s)$$

for any localization sequence (Tn )n for M , hence M is a u.i. martingale. However, there exist continuous local martingales which are bounded in L2 and thus u.i. and which are not martingales!

Remark 4.2.4. Strict local martingales.

Are there local martingales which are not martingales? Yesa .

• If M is a martingale, for instance Brownian motion, and if U is measurable with respect to F0 , then (U + M t )t ≥0 is a local martingale, and a martingale if and only if U ∈ L1 . Note that if M 0 is constant and F0 = σ(M 0 ) = {∅, Ω} then necessarily U is constant and we cannot have U ̸∈ L1 .

• Let M be a martingale such that M 0 = 1, such as the Doléans-Dade exponential (Theorem 7.3.1). Let U be a random variable independent of M . Then (U M t )t ≥0 is a local martingale with respect to the enlarged filtration (σ(σ(U ) ∪ Ft ))t ≥0 , localized by Tn = inf{t ≥ 0 : |U M t | ≥ n}. This is in fact an Itô stochastic integral, see Exercise 4 of the 2020-2021 exam.

• Let (B t )t ≥0 be a 3-dimensional BM with B 0 = x ̸= 0. The process (|B t |)t ≥0 is a Bessel process. It can be shown that the inverse Bessel process (|B t |−1 )t ≥0 is a local martingale, localized by Tn = inf{t ≥ 0 : |B t | ≤ 1/(|x| + n)}, but is not a martingale. Moreover it is bounded in L2 and thus u.i.! For a proof, see Exercise 3 of the 2020-2021 examb .
a Some other famous examples are listed on https://en.wikipedia.org/wiki/Local_martingale
b Or https://djalil.chafai.net/blog/2020/10/31/back-to-basics-local-martingales/
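The strict local martingale in the last bullet lends itself to a quick numerical experiment. Below is a Monte Carlo sketch, not part of the notes, with illustrative parameters (starting point x = (1, 0, 0) and horizon T = 4 are arbitrary choices): for a true martingale the sample mean of 1/|B_t| would stay at 1/|x| = 1, while here it visibly decays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many 3-dimensional Brownian paths started from x = (1, 0, 0)
# and track the sample mean of the inverse Bessel process 1/|B_t|.
n_paths, n_steps, T = 20000, 200, 4.0
dt = T / n_steps
B = np.tile([1.0, 0.0, 0.0], (n_paths, 1))
means = [np.mean(1.0 / np.linalg.norm(B, axis=1))]  # equals 1/|x| = 1 at t = 0
for _ in range(n_steps):
    B += rng.normal(scale=np.sqrt(dt), size=(n_paths, 3))
    means.append(np.mean(1.0 / np.linalg.norm(B, axis=1)))
print(means[0], means[-1])  # the mean has decayed well below 1 at t = T
```

The decay of t ↦ E(1/|B_t|) rules out the martingale property; the local martingale property itself is of course invisible at this level.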

Remark 4.2.5. Vector spaces.

The sets M loc and M0loc are real vector spaces. Indeed if M , M ′ ∈ M0loc are localized respectively by (Tn )n≥1 and (Tn′ )n≥1 , then by the Doob stopping theorem (Theorem 2.5.1), (S n )n≥0 = (Tn ∧ Tn′ )n≥0 localizes both M and M ′ . For all n ≥ 0, the process $(M + M')^{S_n} = M^{S_n} + M'^{S_n}$ is a square integrable martingale. Note that we have also the following (strict) inclusions:

$$\begin{array}{ccccc} \mathbf{M}^2_0 &\subset& \mathcal{M}^2_0 &\subset& \mathcal{M}^{\mathrm{loc}}_0 \\ \cap & & \cap & & \cap \\ \mathbf{M}^2 &\subset& \mathcal{M}^2 &\subset& \mathcal{M}^{\mathrm{loc}} \end{array}$$

Lemma 4.2.6. Increasing process, angle bracket, quadratic variation, square bracket.

Let M , N ∈ M0loc .
1. There exists a unique continuous finite variation process denoted (〈M , N 〉t )t ≥0 with 〈M , N 〉0 = 0 and (M t N t − 〈M , N 〉t )t ≥0 ∈ M0loc . Moreover $\langle M,N\rangle = \frac14\big(\langle M+N\rangle - \langle M-N\rangle\big)$ where 〈M 〉 = 〈M , M 〉.

2. 〈M 〉 is the unique non-decreasing process such that M 2 − 〈M 〉 is a continuous local martingale.

3. M is localized by Tn = inf{t ≥ 0 : |M t | ≥ n or 〈M 〉t ≥ n} and for all n ≥ 0,
$$\sup_{t\geq 0} |M_t^{T_n}| \leq n \quad\text{and}\quad \sup_{t\geq 0} \langle M^{T_n}\rangle_t \leq n.$$


4. For all $t \geq 0$, if $(\delta_n)_{n\geq 1}$ is a sequence of subdivisions of $[0,t]$, $\delta_n : 0 = t_0^n < \cdots < t_{m_n}^n = t$, then
$$S(\delta_n) = \sum_{k=1}^{m_n} (M_{t_k^n} - M_{t_{k-1}^n})(N_{t_k^n} - N_{t_{k-1}^n}) \xrightarrow[n\to\infty]{\;\mathbb{P}\;} [M,N]_t = \langle M,N\rangle_t$$
provided that $|\delta_n| = \max_{1\leq k\leq m_n}(t_k^n - t_{k-1}^n) \to 0$ as $n \to \infty$. Furthermore
$$[M,N] = \frac14\big([M+N] - [M-N]\big) \quad\text{where}\quad [M] = [M,M].$$

We say that 〈M 〉 is the increasing process of M .
We say that 〈M , N 〉 is the finite variation process or angle bracket of the couple (M , N ).
We say that [M ] is the quadratic variation of M .
We say that [M , N ] is the quadratic covariation or square bracket of the couple (M , N ).
As for martingales, if M ∈ M loc then we set 〈M 〉 = 〈M −M 0 〉 and [M ] = [M −M 0 ], in particular 〈M 〉 = [M ].
As for martingales, 〈M 〉t is not necessarily in L2 , and in particular S(δ) → 〈M 〉 may not hold in L2 .

Proof.

1. If (S n )n≥0 localizes M and (Tn )n≥0 localizes N then (Un )n≥0 = (Tn ∧ S n )n≥0 localizes both M and N .
By uniqueness of the increasing process of square integrable continuous martingales (Theorem 4.1.4)
used for the square integrable martingales M Un and N Un , we get that for all 0 ≤ n ≤ m and t ≥ 0,

〈M Um , N Um 〉t ∧Un = 〈M Un , N Un 〉t ,

hence (〈M Um , N Um 〉)t ≥0 and (〈M Un , N Un 〉)t ≥0 are equal up to time Un . We then define, for all t ≥ 0,

$$\langle M,N\rangle_t = \lim_{n\to\infty} \langle M^{U_n}, N^{U_n}\rangle_t.$$

This is the unique continuous process with finite variations and issued from the origin, denoted
〈M , N 〉 such that for all t ≥ 0 and all n ≥ 0, 〈M , N 〉t ∧Un = 〈M Un , N Un 〉t . We then set 〈M 〉 = 〈M , M 〉.

2. Take M = N in the previous item.

3. It suffices to proceed as in Remark 4.2.3. Note that 〈M Tn 〉 = 〈M 〉Tn gives |〈M Tn 〉| ≤ n.

4. We reduce to M = N by polarization. Next, let (Tn )n≥0 be a localization sequence for M . For all n ≥ 0,
Theorem 4.1.4 used for the square integrable martingale M Tn gives
$$S^{T_n}(\delta) = \sum_i \big(M^{T_n}_{t_{i+1}} - M^{T_n}_{t_i}\big)^2 \xrightarrow[|\delta|\to 0]{\;L^2\;} \langle M^{T_n}\rangle_t = \langle M\rangle_{T_n\wedge t}.$$

Now for all ε > 0 and all n ≥ 0,

$$P(|S(\delta) - \langle M\rangle_t| > \varepsilon) \leq P(T_n \leq t) + P(|S(\delta) - \langle M\rangle_t| > \varepsilon,\ t < T_n) \leq P(T_n \leq t) + P(|S^{T_n}(\delta) - \langle M\rangle_{T_n\wedge t}| > \varepsilon),$$
and therefore $\lim_{|\delta|\to 0} P(|S(\delta) - \langle M\rangle_t| > \varepsilon) = 0$. ■
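Item 4 can be illustrated numerically in the simplest case M = N = B, a standard Brownian motion, for which ⟨B⟩_t = t (Theorem 3.2.1). The sketch below is not part of the notes; for simplicity each subdivision uses a freshly simulated path, which is enough to see the concentration of S(δ_n) around t.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sums of squared Brownian increments over dyadic subdivisions of [0, t]:
# they concentrate around <B>_t = t as the mesh t / 2**n goes to 0.
t = 2.0
sums = []
for n in (2, 6, 10, 14):
    m = 2 ** n
    increments = rng.normal(scale=np.sqrt(t / m), size=m)  # fresh path for each n
    sums.append(np.sum(increments ** 2))
print(sums)  # the last values are close to t = 2
```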

Lemma 4.2.7. Martingale criterion.

Let M be a continuous local martingale with M 0 ∈ L2 .


1. The following properties are equivalent:

(a) M is a martingale which is square integrable


(b) E(〈M 〉t ) < ∞ for all t ≥ 0.


2. The following properties are equivalent:

(a) M is a martingale which is bounded in L2 and $\sup_{t\geq 0} E(M_t^2) = E(\langle M\rangle_\infty)$

(b) E(〈M 〉∞ ) < ∞.

Moreover, in this case M 2 − 〈M 〉 is a u.i. martingale and $E(M_\infty^2) = E(M_0^2) + E(\langle M\rangle_\infty)$.

The proof of the lemma is rather short but uses many typical martingale ingredients!

Proof. By replacing M with M − M 0 , we assume without loss of generality that M 0 = 0.

1. If M is a square integrable martingale then M 2 − 〈M 〉 is a martingale and in particular 〈M 〉t ∈ L1 for all


t ≥ 0. Conversely, if M is a continuous local martingale with 〈M 〉t ∈ L1 for all t ≥ 0 then since M 2 −〈M 〉
is a continuous local martingale, it follows that there exists a sequence (Tn )n≥0 of stopping times
such that almost surely Tn ↗ +∞ as n → ∞ and for all n ≥ 0 the process (M Tn )2 − 〈M 〉Tn is a square
integrable continuous martingale issued from 0. Hence, for all t ≥ 0, using monotone convergence,

$$E(M_{t\wedge T_n}^2) = E(\langle M\rangle_{t\wedge T_n}) \xrightarrow[n\to\infty]{} E(\langle M\rangle_t) < \infty.$$

This implies that (M t ∧Tn )n≥0 is bounded in L2 . On the other hand, it follows by the Fatou lemma that
$$E(M_t^2) = E\big(\lim_{n\to\infty} M_{t\wedge T_n}^2\big) \leq \liminf_{n\to\infty} E(M_{t\wedge T_n}^2) < \infty.$$

Finally, since for all t ≥ 0, (M t ∧Tn )n≥0 is bounded in L2 , it is u.i., and thus, for all 0 ≤ s ≤ t , since
limn→∞ M t ∧Tn = M t a.s., this convergence holds in L1 and we obtain the martingale property via
$$E(M_t \mid F_s) = E\big(\lim_{n\to\infty} M_{t\wedge T_n} \mid F_s\big) = \lim_{n\to\infty} E(M_{t\wedge T_n} \mid F_s) = \lim_{n\to\infty} M_{s\wedge T_n} = M_s.$$

2. If M is a martingale bounded in L2 then, by Corollary 4.1.5, 〈M 〉∞ ∈ L1 . Conversely, if M is a local martingale with 〈M 〉∞ ∈ L1 , then, by monotonicity and positivity of 〈M 〉, 〈M 〉t ∈ L1 for all t ≥ 0, next, by the first part, M is a square integrable martingale, and thus, by Corollary 4.1.5, M is bounded in L2 . Finally if M is a martingale bounded in L2 , then the Doob maximal inequality (Theorem 2.5.7) gives
$$E\Big(\sup_{s\in[0,t]} M_s^2\Big) \leq 4\,E(M_t^2)$$

for all t ≥ 0, and by sending t to ∞, we get, by monotone convergence,


$$E\Big(\sup_{t\geq 0} M_t^2\Big) \leq 4 \sup_{t\geq 0} E(M_t^2).$$

This gives the domination


$$\sup_{t\geq 0} |M_t^2 - \langle M\rangle_t| \leq \sup_{t\geq 0} M_t^2 + \langle M\rangle_\infty \in L^1,$$
which implies that M 2 − 〈M 〉 is uniformly integrable. ■

Remark 4.2.8. Vocabulary.

If X and A are continuous adapted processes with X 0 = A 0 = 0, with A of finite variation, and such that X − A is a local martingale, then A is unique and is called the compensator of X . For instance if X is a continuous local martingale with X 0 = 0 then the compensator of X 2 is 〈X 〉.


Remark 4.2.9. Link with Brownian motion.

The Lévy characterization of Brownian motion (Theorem 7.2.1) states that among all continuous
local martingales, Brownian motion is characterized by its angle bracket. On the other hand, the
Dubins – Schwarz theorem (Theorem 7.4.1) states that every continuous local martingale with angle bracket tending to infinity at infinity is a Brownian motion time changed by its angle bracket.

4.3 Convergence in L2 and the Hilbert space M20

Let M20 be the set of continuous martingales issued from the origin and bounded in L2 , for some fixed
underlying filtered probability space (Ω, F , (Ft )t ≥0 , P).
The elements of M20 are centered: for all M ∈ M20 and all t ≥ 0, E(M t ) = E(M 0 ) = 0.
For all M ∈ M20 , we have M 0 = 0 and supt ≥0 E(M t2 ) < ∞. By Theorem 2.1.3, we see the elements of M20 as
random variables taking values in (C (R+ , R), BC (R+ ,R) ). In particular for all M , N ∈ M20 , we have M = N iff M and N are indistinguishable, in other words P(∀t ≥ 0 : M t = N t ) = 1. Also M = 0 iff M t = 0 for all t ≥ 0.

Theorem 4.3.1. Hilbert structure on M20 .

The set M20 is a Hilbert space with scalar product $\langle M,N\rangle_{\mathcal{M}^2_0} = E(\langle M,N\rangle_\infty)$.
Moreover, for all M ∈ M20 , we have $\|M\|^2_{\mathcal{M}^2_0} = E(\langle M\rangle_\infty) = \sup_{t\geq 0} E(\langle M\rangle_t) = \sup_{t\geq 0} E(M_t^2)$.

More generally, it can be shown similarly that for all fixed T > 0, the set M20,T of square integrable continuous martingales (M t )t ∈[0,T ] such that M 0 = 0 is a Hilbert space for the scalar product $\langle M,N\rangle_{\mathcal{M}^2_{0,T}} = E(\langle M,N\rangle_T)$. In this case, for all M ∈ M20,T , we have $\|M\|^2_{\mathcal{M}^2_{0,T}} = \sup_{t\in[0,T]} E(\langle M\rangle_t) = \sup_{t\in[0,T]} E(M_t^2)$.

Proof. The facts that M20 is a vector space and that 〈·〉 is bilinear, symmetric, and non-negative on the diago-
nal are almost immediate. For the positivity, if M ∈ M20 with E(〈M 〉∞ ) = 0 then we have 〈M 〉t = 0 for all t ≥ 0,
hence E(M t2 ) = 0 for all t ≥ 0, thus M t = 0 for all t ≥ 0. To prove completeness, let (M (n) )n≥1 be a Cauchy
sequence in M20 . Then for all ε > 0, there exists r ≥ 1 such that for all m, n ≥ r , $\|M^{(n)} - M^{(m)}\|_{\mathcal{M}^2_0} \leq \varepsilon$. Thus
$$\sup_{t\geq 0} E\big(|M_t^{(n)} - M_t^{(m)}|^2\big) \leq \varepsilon^2.$$

This implies that for all t ≥ 0, $(M_t^{(n)})_{n\geq 1}$ is a Cauchy sequence in L2 , and thus converges to an element M t ∈ L2 . It follows that M = (M t )t ≥0 is a square integrable martingale, issued from the origin. It remains
to prove that M is continuous. To this end, the idea is to use uniform convergence on finite time intervals.
Namely, let us fix t > 0. From the L2 convergence, there exists a sub-sequence (n k )k≥1 such that for all k ≥ 1,

$$E\big(|M_t^{(n_k)} - M_t^{(n_{k+1})}|^2\big) \leq 2^{-k}.$$

Now the Doob maximal inequality (Theorem 2.5.7) for the martingale $(M_t^{(n_k)} - M_t^{(n_{k+1})})_{t\geq 0}$ gives
$$E\Big(\sup_{s\in[0,t]} |M_s^{(n_k)} - M_s^{(n_{k+1})}|^2\Big) \leq 4\,E\big(|M_t^{(n_k)} - M_t^{(n_{k+1})}|^2\big) \leq 2^{-k+2},$$

and thus, since by the Cauchy – Schwarz inequality $E\big(\sup_{s\in[0,t]} |M_s^{(n_k)} - M_s^{(n_{k+1})}|\big) \leq 2^{(2-k)/2}$, the monotone convergence theorem or the Fubini – Tonelli theorem gives
$$E\Big(\sum_{k\geq 1} \sup_{s\in[0,t]} |M_s^{(n_k)} - M_s^{(n_{k+1})}|\Big) = \sum_{k\geq 1} E\Big(\sup_{s\in[0,t]} |M_s^{(n_k)} - M_s^{(n_{k+1})}|\Big) < \infty.$$
Therefore for all t > 0, almost surely
$$\sum_{k\geq 1} \sup_{s\in[0,t]} |M_s^{(n_k)} - M_s^{(n_{k+1})}| < \infty.$$


Lemma 4.3.2. Criterion.

In a Banach space, if $\sum_{n=1}^\infty \|u_n - u_{n+1}\| < \infty$ then $(u_n)_{n\geq 1}$ converges.

The converse is false: for instance $u_n = \frac{(-1)^n}{n} \xrightarrow[n\to\infty]{} 0$, but $|u_n - u_{n+1}| \sim \frac{2}{n}$ and thus $\sum_{n=1}^\infty |u_n - u_{n+1}| = \infty$.

Proof of Lemma 4.3.2. The sequence $(u_n)_{n\geq 1}$ is Cauchy since for all $n \geq 1$ and $m \geq 1$ we have
$$\|u_{n+m} - u_n\| \leq \sum_{k=n}^{n+m-1} \|u_{k+1} - u_k\| \leq \sum_{k\geq n} \|u_{k+1} - u_k\| \xrightarrow[n\to\infty]{} 0. \qquad \blacksquare$$


By using Lemma 4.3.2 with the Banach space (C ([0, t ], R), ∥·∥ = sup[0,t ] |·|), this implies that for all t > 0,
(n )
almost surely, the sequence of continuous functions (s ∈ [0, t ] 7→ M s k )k≥1 converges uniformly towards a
limit denoted (M s′ )s≥[0,t ] which is continuous thanks the uniform convergence. This almost sure event can
(n k )
be chosen independent of t for instance by taking integer values for t . Now for all t ≥ 0, (M t )k≥1 converges
to M t in L2 and to M t′ almost surely, and therefore M t = M t′ . ■
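The counterexample to the converse of Lemma 4.3.2 can be checked numerically; the snippet below is not part of the notes.

```python
# u_n = (-1)**n / n converges to 0, yet the series of increments diverges,
# since |u_n - u_{n+1}| ~ 2/n and the total variation grows like 2 log n.
N = 100000
u = [(-1) ** n / n for n in range(1, N + 1)]
variation = sum(abs(u[k + 1] - u[k]) for k in range(N - 1))
print(u[-1], variation)  # u_N is tiny while the variation exceeds 20
```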

Theorem 4.3.3. Convergence of martingales bounded in L2 .

Let M be a square integrable martingale bounded in L2 . Then there exists M ∞ ∈ L2 such that

$$\lim_{t\to\infty} M_t = M_\infty \quad\text{almost surely and in } L^2.$$

Note that M is uniformly integrable because it is bounded in Lp with p = 2 > 1.

Proof. Let us show that M satisfies the L2 Cauchy criterion. Recall that for all 0 ≤ s ≤ t , we have
$$E((M_t - M_s)^2) = E(M_t^2 - 2 M_s E(M_t \mid F_s) + M_s^2) = E(M_t^2 - M_s^2).$$
But M 2 is a sub-martingale, so t ↦ E(M t2 ) is non-decreasing, and it is bounded above by supt ≥0 E(M t2 ) < ∞. Thus limt →∞ E(M t2 ) exists. Hence (M t )t ≥0 is Cauchy in L2 , and therefore it converges in L2 towards some M ∞ ∈ L2 . It remains to establish the almost sure convergence. Now, by the Markov inequality, for all s ≥ 0 and all ε > 0,
$$P\Big(\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big) \leq \frac{1}{\varepsilon^2}\, E\Big(\sup_{t\geq s} (M_t - M_\infty)^2\Big) \leq \frac{2}{\varepsilon^2}\Big(E\big((M_s - M_\infty)^2\big) + E\Big(\sup_{t\geq s} (M_t - M_s)^2\Big)\Big).$$

Now the monotone convergence theorem gives


$$E\Big(\sup_{t\geq s}(M_t - M_s)^2\Big) = \lim_{T\to\infty} E\Big(\sup_{t\in[s,T]}(M_t - M_s)^2\Big).$$

On the other hand, for all s ≥ 0, the process (|M t − M s |)t ≥s is a continuous non-negative sub-martingale, for which the Doob maximal inequality of Theorem 2.5.7 gives
$$E\Big(\sup_{t\geq s}(M_t - M_s)^2\Big) \leq \lim_{T\to\infty} 4\,E\big((M_T - M_s)^2\big) = 4\,E\big((M_\infty - M_s)^2\big).$$

Therefore we obtain
$$P\Big(\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big) \leq \frac{10}{\varepsilon^2}\, E\big((M_s - M_\infty)^2\big) \xrightarrow[s\to\infty]{} 0.$$

Since the right hand side decreases as s grows, we get, for all ε > 0,
$$P\Big(\bigcap_{s\in\mathbb{Q}_+} \Big\{\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big\}\Big) = \lim_{s\to\infty} P\Big(\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big) = 0.$$
Similarly, the right hand side decreases as ε grows, and then
$$P\Big(\bigcup_{\varepsilon\in\mathbb{Q}_+} \bigcap_{s\in\mathbb{Q}_+} \Big\{\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big\}\Big) = \lim_{\varepsilon\to 0}\, \lim_{s\to\infty} P\Big(\sup_{t\geq s} |M_t - M_\infty| \geq \varepsilon\Big) = 0,$$

which means that limt →∞ M t = M ∞ almost surely! ■
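A discrete-time analogue of the theorem can be visualized with a toy martingale; the sketch below is not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)

# M_n = sum_{k<=n} xi_k / 2**k with xi_k = +/-1 i.i.d. is a martingale
# bounded in L2 (sup_n E(M_n**2) = sum_k 4**(-k) = 1/3), and here every
# trajectory converges since the increments are absolutely summable.
signs = rng.choice([-1.0, 1.0], size=60)
M = np.cumsum(signs * 0.5 ** np.arange(1, 61))
print(M[-1])  # the trajectory has settled: |M_60 - M_40| <= 2**-40
```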


4.4 Convergence in L1 , closedness, uniform integrability

As for the sum of independent and identically distributed random variables, there is, for martingales, in
a way, an L2 theory and an L1 theory. The L2 theory is in a sense simpler due to the Hilbert structure.

Theorem 4.4.1. Doob convergence theorem for martingales bounded in L1 .

Let M be a continuous martingale bounded in L1 . Then there exists M ∞ ∈ L1 such that

$$\lim_{t\to\infty} M_t = M_\infty \quad\text{almost surely.}$$

Moreover the convergence holds in L1 if and only if M is uniformly integrable.

If M is a non-negative martingale, then it is always bounded in L1 .
If M is a martingale bounded in L1 but not u.i., then E(M t ) = E(M 0 ) for all t ≥ 0, but we may have E(M ∞ ) ̸= E(M 0 ).
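The last point can be made concrete with the exponential martingale M_t = exp(B_t − t/2), anticipating the Doléans-Dade exponential of Theorem 7.3.1: E(M_t) = 1 for all t, while B_t − t/2 → −∞ a.s., so M_∞ = 0 and E(M_∞) = 0 ≠ 1 = E(M_0). The Monte Carlo sketch below is not part of the notes (it samples B_T directly from its Gaussian law; the parameters are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)

# Positive martingale bounded in L1 but not u.i.: M_t = exp(B_t - t/2).
n_paths, T = 5000, 30.0
B_T = rng.normal(scale=np.sqrt(T), size=n_paths)  # exact law of B_T
M_T = np.exp(B_T - T / 2.0)
print(np.median(M_T))  # essentially 0, although E(M_T) = 1 exactly
```

The sample mean of M_T is a very noisy estimator of E(M_T) = 1 because the mass concentrates on rare huge values, which is precisely the failure of uniform integrability.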

This can be skipped at first reading.

Proof. We can assume that M 0 = 0, otherwise consider the martingale M − M 0 = (M t − M 0 )t ≥0 which is also bounded in L1 , making M t → M 0 + (M − M 0 )∞ a.s. We proceed by truncation and reduction to the square integrable case. By the Doob maximal inequality (Theorem 2.5.7) with p = 1 and all r > 0,
$$P\Big(\sup_{s\in[0,t]} |M_s| \geq r\Big) \leq \frac{E(|M_t|)}{r}.$$

By monotone convergence, with C = supt ≥0 E(|M t |) < ∞, for all r > 0,
$$P\Big(\sup_{t\geq 0} |M_t| \geq r\Big) \leq \frac{C}{r}.$$
It follows that
$$P\Big(\sup_{t\geq 0} |M_t| = \infty\Big) \leq \lim_{r\to\infty} P\Big(\sup_{t\geq 0} |M_t| \geq r\Big) = 0,$$

in other words almost surely (M t )t ≥0 is bounded: supt ≥0 |M t | < ∞. Thus, there exists an almost sure
event, say Ω′ , on which for all n ≥ supt ≥0 |M t | (beware that this threshold on n is random),

Tn = inf{t ≥ 0 : |M t | ≥ n} = ∞.

Next, by Doob stopping (Theorem 2.5.1), for all n ≥ 0, (M t ∧Tn )t ≥0 is a martingale, bounded since
supt ≥0 |M t ∧Tn | ≤ |M 0 | ∨ n = n (M is continuous and M 0 = 0). Now, since (M t ∧Tn )t ≥0 is bounded in L2 ,
by Theorem 4.3.3, there exists $M_\infty^{(n)} \in L^2$ such that $\lim_{t\to\infty} M_{t\wedge T_n} = M_\infty^{(n)}$ almost surely (and in L2 but this is useless). Let us denote by Ωn the almost sure event on which this holds. Then, on the almost sure event Ω′ ∩ (∩n Ωn ), for all t ≥ 0 and n ≥ sups≥0 |M s |, we have M t ∧Tn = M t , thus the sequence $(M_\infty^{(n)})_n$ is stationary in the sense that $M_\infty^{(n)}$ is constant when n ≥ sups≥0 |M s |, hence, if M ∞ is its limit,
$$\lim_{t\to\infty} M_t = M_\infty.$$

Contrary to $M_\infty^{(n)}$, the limit M ∞ has no reason to belong to L2 . However M ∞ ∈ L1 since from the almost sure convergence, the boundedness in L1 of (M t )t ≥0 , and by using the Fatou lemma, we have
$$E(|M_\infty|) = E\big(\lim_{t\to\infty} |M_t|\big) \leq \liminf_{t\to\infty} E(|M_t|) \leq C < \infty.$$

Finally an almost sure convergence to an L1 limit holds in L1 if and only if the sequence is u.i. ■

The result remains valid for super-martingales.


Theorem 4.4.2: Doob convergence theorem for super-martingales bounded in L1

Let M be a continuous super-martingale bounded in L1 . Then there exists M ∞ ∈ L1 such that

$$\lim_{t\to\infty} M_t = M_\infty \quad\text{almost surely.}$$

Note that a non-negative super-martingale is automatically bounded in L1 .

Proof. See for instance [31, Theorem 3.19 page 58 – 59] for a classical proof using oscillations. ■

Remark 4.4.3. Non-negative local martingales are super-martingales.

If (M t )t ≥0 is a non-negative continuous local martingale and M 0 ∈ L1 , then it is a non-negative super-martingale and by Theorem 4.4.2 it converges almost surely to an integrable random variable. Indeed, if (Tn )n is a localizing sequence then for all t ≥ 0 and s ∈ [0, t ], by the Fatou Lemma,
$$E(M_t \mid F_s) = E\big(\lim_{n\to\infty} M_{t\wedge T_n} \mid F_s\big) \leq \liminf_{n\to\infty} E(M_{t\wedge T_n} \mid F_s) = \lim_{n\to\infty} M_{s\wedge T_n} = M_s.$$

Note that the conditional expectations are well defined in [0, +∞] because M is non-negative.

Corollary 4.4.4. Convergence of martingales bounded in Lp , p > 1.

If M is a continuous martingale bounded in Lp with p > 1 then there exists M ∞ ∈ Lp such that

$$\lim_{t\to\infty} M_t = M_\infty \quad\text{almost surely and in } L^p.$$

In particular, for p = 2 this gives an alternative to Theorem 4.3.3.

This can be skipped at first reading.

Proof. Since M is in particular a martingale bounded in L1 , Theorem 4.4.1 gives M ∞ ∈ L1 such that limt →∞ M t = M ∞ almost surely. But since M is bounded in Lp with p > 1, it follows that M is uniformly integrable, and therefore limt →∞ M t = M ∞ in L1 . We have M ∞ ∈ Lp since by the Fatou lemma,
$$E(|M_\infty|^p) = E\big(\lim_{t\to\infty} |M_t|^p\big) \leq \liminf_{t\to\infty} E(|M_t|^p) < \infty.$$

On the other hand, by the Doob maximal inequality (Theorem 2.5.7), since M is bounded in Lp , for all t ≥ 0, sups∈[0,t ] |M s |p ∈ L1 and E(sups∈[0,t ] |M s |p ) ≤ c p E(|M t |p ). Therefore, by monotone convergence,
$$E\Big(\sup_{t\geq 0} |M_t|^p\Big) \leq c_p \sup_{t\geq 0} E(|M_t|^p) < \infty.$$
Hence supt ≥0 |M t |p ∈ L1 , and thus, by dominated convergence, limt →∞ M t = M ∞ in Lp . ■

Corollary 4.4.5. Doob theorem on closed martingales or Doob martingale convergence theorem.

Let M be a continuous martingale. The following properties are equivalent:

1. (convergence) M t converges in L1 as t → ∞

2. (closedness) there exists M ∞ ∈ L1 such that for all t ≥ 0, M t = E(M ∞ | Ft )

3. (integrability) the family {M t : t ≥ 0} is uniformly integrable.

In this case, for all t ≥ 0, M t = E(M ∞ | Ft ), and limt →∞ M t = M ∞ a.s. and in L1 , and E(M 0 ) = E(M ∞ ).


If M is a martingale then for all fixed a ≥ 0, the stopped martingale M a = (M t ∧a )t ≥0 is closed by M a since
E(M a | Ft ) = M a 1a≤t + M t 1a>t = M t ∧a . Hence M a is uniformly integrable. Note that limt →∞ M t ∧a = M a .
Note that in the proof below, Theorem 4.4.1 is used in every implication of the equivalence.

This can be skipped at first reading.

Proof. 1. ⇒ 2. If M converges in L1 , then it is bounded in L1 , and by Theorem 4.4.1, it converges a.s. to M ∞ ∈ L1 (the convergence holds also in L1 but we do not use this fact now). For all t ≥ 0 and s ∈ [0, t ] and all A ∈ Fs , the martingale property for M gives E(M t 1 A ) = E(M s 1 A ). By dominated convergence as t → ∞, we get E(M ∞ 1 A ) = E(M s 1 A ), therefore M s = E(M ∞ | Fs ) for all s ≥ 0.
2. ⇒ 3. Let us assume that for some M ∞ ∈ L1 we have M t = E(M ∞ | Ft ) for all t ≥ 0. Then
supt ≥0 E(|M t |) ≤ E(|M ∞ |) < ∞ and thus, by Theorem 4.4.1, M t converges a.s. as t → ∞. It follows
that almost surely M ∗ = supt ≥0 |M t | < ∞. Now limR→∞ 1M∗ ≥R = 0 almost surely, and for all R ≥ 0,

sup E(|M t |1|M t |≥R ) = sup E(|E(M ∞ | Ft )|1|M t |≥R ) ≤ sup E(|M ∞ |1|M t |≥R ) ≤ E(|M ∞ |1M∗ ≥R ) −→ 0
t ≥0 t ≥0 t ≥0 R→∞

where the convergence follows by dominated convergence. Therefore M is u.i.


3. ⇒ 1. If M is u.i. then it is bounded in L1 , and from Theorem 4.4.1, there exists M ∞ ∈ L1 such that
limt →∞ M t = M ∞ a.s. Since M is u.i., the convergence holds in L1 . ■

The following generalizes the Doob stopping theorem (Theorem 2.5.1).

Corollary 4.4.6. Doob stopping for uniformly integrable martingales.

Let M be a u.i. continuous martingale and let T be a stopping time (not necessarily bounded or
finite). We set M T = M ∞ on {T = ∞} where M ∞ = limt →∞ M t is as in Corollary 4.4.5. Then:

1. (M t ∧T )t ≥0 is a uniformly integrable martingale, M T ∈ L1 , and for all t ≥ 0, M t ∧T = E(M T | Ft ).


In particular, for all t ≥ 0, E(M 0 ) = E(M t ∧T ) = E(M T ).

2. Moreover if S is another stopping time with S ≤ T then M S = E(M T | FS ).


In particular, for all stopping time S, M S = E(M ∞ | FS ) and E(M S ) = E(M ∞ ) = E(M 0 ).

This can be skipped at first reading.

Proof. We will prove the first property by using the second property.
1. For all t ≥ 0, both t ∧ T and T are stopping times. By the second property of the present the-
orem, M t ∧T ∈ L1 and M T ∈ L1 . Moreover M t ∧T is measurable for Ft ∧T , and thus for Ft since
t ≤ t ∧ T . Now, in order to prove that E(M T | Ft ) = M t ∧T , it suffices to show that for all A ∈ Ft ,

E(1 A M T ) = E(1 A M t ∧T ).

But for all A ∈ Ft , we have immediately from T = t ∧ T on {T ≤ t } that

E(1 A∩{T ≤t } M T ) = E(1 A∩{T ≤t } M t ∧T ).

The second property of the present theorem for the stopping times S = t ∧ T and T gives

M t ∧T = E(M T | Ft ∧T ).

Now since A ∩ {T > t } ∈ Ft and A ∩ {T > t } ∈ FT , we get A ∩ {T > t } ∈ Ft ∩ FT = Ft ∧T , and

E(1 A∩{T >t } M T ) = E(1 A∩{T >t } M t ∧T ).

By adding this to a previous formula we get the desired result E(1 A M T ) = E(1 A M t ∧T ).
Finally, the fact that M T = (M t ∧T )t ≥0 is a martingale follows from what precedes used with the u.i. martingale M a = (M t ∧a )t ≥0 for all a ≥ 0, which gives $M^a_{s\wedge T} = E(M^a_T \mid F_s)$ for all s ≥ 0, in other words M s∧a∧T = E(M a∧T | Fs ). Taking a = t ≥ s gives the martingale property for M T .


2. Following for instance [31, Theorem 3.22 page 59], we discretize as in the proof of Theorem
2.5.1 or Theorem 3.5.1. Namely, for all n ≥ 0, let us define the stopping times

$$S_n = \sum_{k=0}^{\infty} \frac{k+1}{2^n}\, \mathbf{1}_{k 2^{-n} < S \leq (k+1) 2^{-n}} + \infty\, \mathbf{1}_{S=\infty} \quad\text{and}\quad T_n = \sum_{k=0}^{\infty} \frac{k+1}{2^n}\, \mathbf{1}_{k 2^{-n} < T \leq (k+1) 2^{-n}} + \infty\, \mathbf{1}_{T=\infty}.$$

We have S n ↘ S and Tn ↘ T as n → ∞, and S n ≤ Tn for all n ≥ 0. Next, for all n ≥ 0, $2^n S_n$ and $2^n T_n$ are integer valued stopping times for the discrete time filtration $(F^{(n)}_k)_{k\geq 0} = (F_{k 2^{-n}})_{k\geq 0}$, while $M^{(n)} = (M_{k 2^{-n}})_{k\geq 0}$ is a uniformly integrable discrete time martingale with respect to this filtration. By using the Doob stopping theorem for u.i. discrete time martingales, we get
$$M_{S_n} = M^{(n)}_{2^n S_n} = E\big(M^{(n)}_{2^n T_n} \mid F^{(n)}_{2^n S_n}\big) = E(M_{T_n} \mid F_{S_n}).$$

Now, for all A ∈ FS ⊂ FS n , we have E(1 A M S n ) = E(1 A M Tn ). Since M is (right) continuous, a.s.

$$M_S = \lim_{n\to\infty} M_{S_n} \quad\text{and}\quad M_T = \lim_{n\to\infty} M_{T_n}.$$

For the L1 convergence, the Doob stopping theorem for u.i. discrete time martingales gives
M S n = E(M ∞ | FS n ) for all n ≥ 0 and thus (M S n )n≥0 and (M Tn )n≥0 are u.i. This also gives that
M S ∈ L1 and M T ∈ L1 . This also allows to pass to the limit in E(1 A M S n ) = E(1 A M Tn ) to get
E(1 A M S ) = E(1 A M T ). This holds for all A ∈ FS , thus M S = E(M T | FS ). ■

Chapter 5

Itô stochastic integral with respect to Brownian motion

This chapter is purely pedagogical and can be completely bypassed if time is very limited.
In this chapter, we construct a stochastic integral which goes beyond the Wiener integral of Chapter 3,
with an integrator (B t )t ≥0 which is Brownian motion and with an integrand (ϕt )t ≥0 which can be random
and square integrable. The ambition is thus to define the process
$$\Big(\int_0^t \varphi_s\, \mathrm{d}B_s\Big)_{t\geq 0}.$$

Following Itô, we start with a finite sum when ϕ is a d -dimensional step process. We impose a predictable structure on the step process in such a way that the resulting stochastic integral is a martingale. This corresponds to choosing the value at the left-end time of the intervals in the sum. This Itô stochastic integral coincides with the Wiener integral when the integrand ϕ is deterministic.

5.1 Itô versus Stratonovich stochastic integrals in a nutshell

Let (B t )t ≥0 be a one-dimensional BM issued from the origin. Let us try to give a meaning to
$$\int_0^t B_s\, \mathrm{d}B_s.$$

This is not a Wiener integral since the integrand is random. Using an approximation with step processes,
we have, rearranging terms to produce telescoping summation or already known sums,

$$\begin{aligned}
\sum_i B_{t_i}(B_{t_{i+1}} - B_{t_i}) &= \sum_i \big(B_{t_{i+1}}^2 - B_{t_i}^2 - B_{t_{i+1}}^2 + B_{t_i} B_{t_{i+1}}\big)\\
&= \sum_i \big(B_{t_{i+1}}^2 - B_{t_i}^2 - B_{t_{i+1}}(B_{t_{i+1}} - B_{t_i})\big)\\
&= \sum_i \big(B_{t_{i+1}}^2 - B_{t_i}^2\big) - \sum_i B_{t_i}(B_{t_{i+1}} - B_{t_i}) - \sum_i (B_{t_{i+1}} - B_{t_i})^2
\end{aligned}$$

which gives, if we can pass to the limit in probability,


$$\int_0^t B_s\, \mathrm{d}B_s = B_t^2 - \int_0^t B_s\, \mathrm{d}B_s - [B]_t,$$

where [B ]t is the quadratic variation of B on [0, t ] (Theorem 3.2.1). Since [B ]t = t (Theorem 3.2.1 again),

$$\int_0^t B_s\, \mathrm{d}B_s = \frac{B_t^2 - t}{2}.$$

The process $(\frac12(B_t^2 - t))_{t\geq 0}$ is a centered martingale. The term $-\frac12 t$ is the martingale correction to the differential calculus term $\frac12 B_t^2$. Taking the value at the left-end times of the intervals in the Riemann sum produces a stochastic integral which is a centered martingale. This is the Itô1 stochastic integral. Indeed this is confirmed by the rigorous construction in this chapter.
1 Named after Kiyosi Itô (1915 – 2008), Japanese mathematician. He used the notation “Kiyosi Itô” for his name (Kunrei-shiki romanization), instead of the more standard “Kiyoshi Itō” (Hepburn romanization).


Let us examine the notion of Stratonovich2 stochastic integral, which corresponds to taking the mean value of the left-end and right-end times of the intervals in the Riemann sum. Namely, we have
$$\sum_i \frac{B_{t_{i+1}} + B_{t_i}}{2}\,(B_{t_{i+1}} - B_{t_i}) = \frac12 \sum_i \big(B_{t_{i+1}}^2 - B_{t_i}^2\big) = \frac12 B_t^2,$$
which gives, provided that the convergence holds in probability, and denoting with ◦ the Stratonovich stochastic integral to distinguish it from the Itô stochastic integral,
$$\int_0^t B_s \circ \mathrm{d}B_s = \frac12 B_t^2.$$

This time the rule of differential calculus is satisfied, but the result has no reason to be a martingale. By
symmetry we can also define some sort of anticipative integral from the formula
$$\sum_i B_{t_{i+1}}(B_{t_{i+1}} - B_{t_i}),$$
which will lead to $2\big(\mathrm{Stratonovich}_t - \tfrac12\,\mathrm{Itô}_t\big) = \tfrac12 B_t^2 + \tfrac12 t$.

Coding in action 5.1.1. Numerical stochastic integration.


Could you write a code approximating by simulation the sums related to the integral $\int_0^t B_s\, \mathrm{d}B_s$ given in Section 5.1, and plotting the resulting process?
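A possible answer, not part of the notes, is sketched below (it prints the comparison instead of plotting, to stay self-contained; adding matplotlib to plot the three partial-sum processes is straightforward).

```python
import numpy as np

rng = np.random.default_rng(4)

# Left-point (Ito), midpoint (Stratonovich) and right-point (anticipative)
# Riemann sums for "int_0^T B dB" along one simulated Brownian path.
T, n = 1.0, 100000
dt = T / n
B = np.concatenate([[0.0], np.cumsum(rng.normal(scale=np.sqrt(dt), size=n))])
dB = np.diff(B)

ito = np.sum(B[:-1] * dB)                     # ~ (B_T**2 - T) / 2
strato = np.sum(0.5 * (B[:-1] + B[1:]) * dB)  # = B_T**2 / 2 (telescoping)
antic = np.sum(B[1:] * dB)                    # ~ (B_T**2 + T) / 2

print(ito, (B[-1] ** 2 - T) / 2)
print(strato, B[-1] ** 2 / 2)
print(antic, (B[-1] ** 2 + T) / 2)
```

The left-point and right-point sums differ exactly by the sum of squared increments, which is the quadratic variation discussed in Section 5.1.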

5.2 Itô stochastic integral with respect to Brownian motion

Let B = (B t )t ≥0 be a d -dimensional (Ft )t ≥0 Brownian motion issued from the origin. In this section we
construct rigorously the Itô stochastic integral with integrator B , namely, with Section 5.1 in mind,

$$\int_0^t \varphi_s\, \mathrm{d}B_s$$

where ϕ is some sort of square integrable stochastic process, that can be at least taken equal to B , say.
A d -dimensional step process (ϕt )t ≥0 is a process for which there exist 0 ≤ t 0 ≤ · · · ≤ t n , n ≥ 1, and bounded random variables U0 , . . . ,Un−1 which are Ft0 , . . . , Ftn−1 measurable respectively, with, for all t ≥ 0,
$$\varphi_t = U_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=0}^{n-1} U_i\, \mathbf{1}_{(t_i, t_{i+1}]}(t).$$

Such a step process is progressive (measurable with respect to P ), left-continuous, and on each time inter-
val, the random value is measurable with respect to the σ-algebra which corresponds to the left end time of
the interval3 . The vector space of step processes is denoted S Rd . We have S Rd ⊂ L R2d where

$$\mathcal{L}^2_{\mathbb{R}^d} = L^2_{\mathbb{R}^d}(\Omega \times \mathbb{R}_+, \mathcal{P}, \mathbb{P} \otimes \mathrm{d}t)$$

is the Hilbert space of d -dimensional processes (ϕt )t ≥0 which are progressive with respect to (Ft )t ≥0 and

$$E\Big(\int_0^\infty |\varphi_s|^2\, \mathrm{d}s\Big) = \int_0^\infty E(|\varphi_s|^2)\, \mathrm{d}s = \|\varphi\|^2_{L^2_{\mathbb{R}^d}(\Omega\times\mathbb{R}_+)} < \infty.$$

2 Named after Ruslan Leont’evich Stratonovich (1930 – 1997), Russian physicist, engineer, and probabilist.
3 Hence the name “predictable” which is used sometimes.


Lemma 5.2.1. Approximation or density.

The set S Rd is dense in L R2d , namely for all ϕ ∈ L R2d and all ε > 0, there exists ψ ∈ S Rd such that
$$E\Big(\int_0^\infty |\varphi_s - \psi_s|^2\, \mathrm{d}s\Big) < \varepsilon.$$

Proof. We can assume that φ is bounded, since by dominated convergence,

    lim_{n→∞} E(∫₀^∞ |φ_s − φ_s 1_{φ_s∈[−n,n]}|² ds) = 0.

We can moreover assume that φ vanishes outside a finite time interval since

    lim_{n→∞} E(∫₀^∞ |φ_s − φ_s 1_{s∈[0,n]}|² ds) = 0.

We can assume furthermore that such a process is (left-)continuous since

    lim_{n→∞} E(∫₀^∞ |φ_s − n(∫_{s−1/n}^s φ_u du) 1_{s>1/n}|² ds) = 0.

Finally it suffices to approximate such a process with elements of S_{R^d}, namely

    lim_{n→∞} E(∫₀^∞ |φ_s − Σ_{i=0}^∞ φ_{i/n} 1_{s∈(i/n,(i+1)/n]}|² ds) = 0

(this makes sense since φ is bounded, left-continuous, and supported in a finite time interval). ■

We denote by M² the set of continuous square integrable martingales for (F_t)_{t≥0}. The elements of M² are seen as random variables on the Wiener space (C(R₊, R), B_{C(R₊,R)}). Two elements X and Y of M² are equal iff they are indistinguishable processes: P(∀t ≥ 0 : X_t = Y_t) = 1.

Theorem 5.2.2. Brownian Itô stochastic integral of square integrable progressive processes.

There exists a unique linear map I : L²_{R^d} → M², denoted, for all φ ∈ L²_{R^d} and all t ≥ 0, by

    I(φ)_t = ∫₀ᵗ φ_s dB_s,

and called the Itô stochastic integral with respect to Brownian motion, such that

1. (step processes) for all φ ∈ S_{R^d} with decomposition φ = U_0 1_{{0}} + Σ_{i=0}^{n−1} U_i 1_{(t_i,t_{i+1}]}, for all t ≥ 0,

    ∫₀ᵗ φ_s dB_s = Σ_{i=0}^{n−1} U_i · (B_{t_{i+1}∧t} − B_{t_i∧t});

2. (Itô isometry) for all φ ∈ L²_{R^d} and all t ≥ 0,

    E((∫₀ᵗ φ_s dB_s)²) = E(∫₀ᵗ |φ_s|² ds).

Moreover this linear isometry satisfies I(φ)_0 = 0 and, for all φ, ψ ∈ L²_{R^d} and all t ≥ 0,

    E(∫₀ᵗ φ_s dB_s) = 0   and   ⟨∫₀^• φ_s dB_s, ∫₀^• ψ_s dB_s⟩_t = ∫₀ᵗ φ_s · ψ_s ds.

Furthermore I(φ) coincides with the Wiener integral of Chapter 3 when φ is not random.
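A hedged numerical illustration of the two defining properties (step-process formula and Itô isometry), as a sketch: the choice of the bounded F_{t_i}-measurable integrands U_i = ±1 according to the sign of B_{t_i} is ours, made so that ∫₀¹ φ_s² ds = 1 exactly on every path.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20_000                                     # number of simulated paths
knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0]) # 0 = t_0 < t_1 < ... < t_4 = 1
dts = np.diff(knots)

# Brownian increments B_{t_{i+1}} - B_{t_i}, one row per path.
dB = rng.normal(0.0, np.sqrt(dts), size=(m, len(dts)))
B_left = np.concatenate([np.zeros((m, 1)), np.cumsum(dB, axis=1)[:, :-1]], axis=1)

# Bounded F_{t_i}-measurable integrands with |U_i| = 1, so \int_0^1 phi_s^2 ds = 1.
U = np.where(B_left >= 0.0, 1.0, -1.0)
I = np.sum(U * dB, axis=1)   # step formula: I(phi)_1 = sum_i U_i (B_{t_{i+1}} - B_{t_i})

print(I.mean(), (I ** 2).mean())   # ~ 0 (centering) and ~ 1 (Itô isometry)
```

The empirical mean and second moment of I(φ)_1 match the centering property and the Itô isometry up to Monte Carlo error.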

5 Itô stochastic integral with respect to Brownian motion

Note that ∫₀ᵗ φ_s dB_s is centered even if φ_t is not. This comes from the progressivity of φ and the centering of B. Theorem 5.2.2 provides plenty of continuous martingales from Brownian motion. Moreover, contrary to Wiener integrals, the Itô stochastic integral may produce martingales with non-deterministic bracket.

Proof. Uniqueness. Let I and I′ be two linear maps from L²_{R^d} to M² satisfying the two properties of the theorem. Let φ ∈ L²_{R^d}. From Lemma 5.2.1, for all n ≥ 1, there exists ψ⁽ⁿ⁾ ∈ S_{R^d} such that lim_{n→∞} ψ⁽ⁿ⁾ = φ in L²_{R^d}. The linearity and isometry properties of I and I′ and the equality of I and I′ on S_{R^d} give, with limits in L²,

    I(φ)_t = lim_{n→∞} I(ψ⁽ⁿ⁾)_t = lim_{n→∞} I′(ψ⁽ⁿ⁾)_t = I′(φ)_t.

Therefore I(φ) and I′(φ) are modifications of each other: for all t ≥ 0, P(I(φ)_t = I′(φ)_t) = 1, and since they are continuous, they are indistinguishable, in other words I = I′ in M².
Properties for step processes. Let φ ∈ S_{R^d}. The process (I(φ)_t)_{t≥0} satisfies I(φ)_0 = 0 and is continuous since B is continuous. Moreover, by construction, I(φ)_t is the (finite) sum of centered bounded random variables measurable with respect to F_t. Note that here it is crucial that U_i is F_{t_i} measurable for all i. Furthermore, for all 0 ≤ s ≤ t, if t ∈ (t_k, t_{k+1}] and if s ∈ (t_{k′}, t_{k′+1}], k′ ≤ k, the decomposition

    I(φ)_t = U_0 · (B_{t_1} − B_{t_0}) + ⋯ + U_{k′} · (B_s − B_{t_{k′}}) + U_{k′} · (B_{t_{k′+1}} − B_s) + ⋯ + U_k · (B_t − B_{t_k})

gives, using the independence of the increments of B and the measurability assumptions on U,

    E(I(φ)_t | F_s) = I(φ)_s + 0.

The zero comes from the fact that for all v ≥ u ≥ s, E(U · (B_v − B_u) | F_s) = E(U · E(B_v − B_u | F_u) | F_s) = 0 when U is bounded and F_u-measurable, since B_v − B_u is independent of F_u. Hence X = I(φ) ∈ M². Let us now establish the formula for the increasing process ⟨X⟩. Let us show that for all 0 ≤ s < t and A ∈ F_s,

    E(1_A (X_t² − X_s² − ∫ₛᵗ |φ_u|² du)) = 0.

Since X is a square integrable martingale, we have⁴

    E(1_A (X_t² − X_s²)) = E(1_A (X_t² − 2 X_t X_s + X_s²)) = E(1_A (X_t − X_s)²).

Now, with φ_t = U_0 1_{{0}}(t) + Σ_i U_i 1_{(t_i,t_{i+1}]}(t), Δ_{u,v} = B_v − B_u, and t_0 = 0, we have

    X_t = U_0 · Δ_{t_0,t_1} + U_1 · Δ_{t_1,t_2} + ⋯ + U_k · Δ_{t_k,t}   if t ∈ (t_k, t_{k+1}].

Since s < t, there exists ℓ ≤ k such that s ∈ (t_ℓ, t_{ℓ+1}], and thus

    1_A (X_t − X_s) = Σ_{i=ℓ}^{k} Ũ_i · Δ_{s_i, s_{i+1}}   with   s_i := s if i = ℓ,   s_i := t_i if i ∈ {ℓ+1, …, k},   s_i := t if i = k+1,

where Ũ_i := 1_A U_i is F_{s_i} measurable. Now for all i, j ∈ {ℓ, ℓ+1, …, k} with i ≤ j, since Ũ_i is F_{s_i} measurable, and since Δ_{s_j, s_{j+1}} is independent of F_{s_j}, with mean 0 and variance s_{j+1} − s_j, we get

    E((Ũ_i · Δ_{s_i,s_{i+1}})(Ũ_j · Δ_{s_j,s_{j+1}})) = E(E((Ũ_i · Δ_{s_i,s_{i+1}})² | F_{s_i})) = E(|Ũ_i|²)(s_{i+1} − s_i)   if i = j,
    E((Ũ_i · Δ_{s_i,s_{i+1}})(Ũ_j · Δ_{s_j,s_{j+1}})) = E((Ũ_i · Δ_{s_i,s_{i+1}})(Ũ_j · E(Δ_{s_j,s_{j+1}} | F_{s_j}))) = 0   if i < j.

It follows then that

    E(1_A (X_t − X_s)²) = Σ_{i,j=ℓ}^{k} E((Ũ_i · Δ_{s_i,s_{i+1}})(Ũ_j · Δ_{s_j,s_{j+1}})) = E(1_A Σ_{i=ℓ}^{k} |U_i|² (s_{i+1} − s_i)) = E(1_A ∫ₛᵗ |φ_u|² du).

⁴ In fact Pythagoras' theorem in L²(Ω, F, P): E(X_t² | F_s) = E((X_t − X_s)² | F_s) + E(X_s² | F_s).


Existence. Let φ ∈ L²_{R^d}. From Lemma 5.2.1, for all n ≥ 1, there exists ψ⁽ⁿ⁾ ∈ S_{R^d} with

    E(∫₀^∞ |φ_s − ψ_s⁽ⁿ⁾|² ds) ≤ 1/2ⁿ.

For all t ≥ 0, we set X_t⁽ⁿ⁾ = I(ψ⁽ⁿ⁾)_t. By using the linearity and isometry of I on S_{R^d}, we get, for all n and t,

    E(|X_t⁽ⁿ⁾ − X_t⁽ⁿ⁺¹⁾|²) = E(∫₀ᵗ |ψ_s⁽ⁿ⁾ − ψ_s⁽ⁿ⁺¹⁾|² ds) ≤ 4/2ⁿ.

Next, the Doob maximal inequality of Theorem 2.5.7 gives

    E(sup_{s∈[0,t]} |X_s⁽ⁿ⁾ − X_s⁽ⁿ⁺¹⁾|²) ≤ 4 E(|X_t⁽ⁿ⁾ − X_t⁽ⁿ⁺¹⁾|²) ≤ 16/2ⁿ.

Therefore,

    E(Σ_{n≥0} sup_{s∈[0,t]} |X_s⁽ⁿ⁾ − X_s⁽ⁿ⁺¹⁾|) = Σ_{n≥0} E(sup_{s∈[0,t]} |X_s⁽ⁿ⁾ − X_s⁽ⁿ⁺¹⁾|) ≤ Σ_{n≥0} ‖sup_{s∈[0,t]} |X_s⁽ⁿ⁾ − X_s⁽ⁿ⁺¹⁾|‖₂ < ∞,

and thus, almost surely,

    Σ_{n≥0} sup_{s∈[0,t]} |X_s⁽ⁿ⁾ − X_s⁽ⁿ⁺¹⁾| < ∞.

By using Lemma 4.3.2 with the Banach space (C([0,t], R), ‖·‖ = sup_{[0,t]} |·|), it follows that almost surely the sequence (X⁽ⁿ⁾)_n of continuous martingales converges uniformly on every finite interval of R₊, as n → ∞, towards a continuous process X = (X_t)_{t≥0}. This process is a martingale since for all 0 ≤ s < t and all A ∈ F_s,

    E(1_A (X_t − X_s)) = lim_{n→∞} E(1_A (X_t⁽ⁿ⁾ − X_s⁽ⁿ⁾)) = 0.

The process X depends only on φ and does not depend on the particular sequence (ψ⁽ⁿ⁾)_n chosen to construct it. Moreover, from the preceding estimates, it follows that for all t ≥ 0, lim_{n→∞} X_t⁽ⁿ⁾ = X_t in L², in particular E(X_t) = 0 since X_t⁽ⁿ⁾ is centered for all n, while

    E(X_t²) = lim_{n→∞} E((X_t⁽ⁿ⁾)²) = lim_{n→∞} E(∫₀ᵗ |ψ_s⁽ⁿ⁾|² ds) = E(∫₀ᵗ |φ_s|² ds).

We set I(φ) = X. The linearity of I follows also from the construction above.

Additional properties. Since almost surely the above convergence holds uniformly on [0, t], and since X_0⁽ⁿ⁾ = 0 for all n ≥ 1, it follows that X_0 = 0. It remains to establish the formula for ⟨X⟩, and for that it suffices to show that for all 0 ≤ s ≤ t and all A ∈ F_s, we have E((X_t² − X_s² − ∫ₛᵗ |φ_u|² du) 1_A) = 0. But we have, with a limit in L¹,

    (X_t² − X_s² − ∫ₛᵗ |φ_u|² du) 1_A = lim_{n→∞} ((X_t⁽ⁿ⁾)² − (X_s⁽ⁿ⁾)² − ∫ₛᵗ |ψ_u⁽ⁿ⁾|² du) 1_A,

and from what we did for step functions, the right hand side has zero mean for all n ≥ 0.

Finally, the last formula of the theorem comes from the fact that I is a linear isometry, via the polarization 4 I(φ)I(ψ) = I(φ+ψ)² − I(φ−ψ)² for all φ, ψ ∈ L²_{R^d}, giving, for all t ≥ 0, ⟨I(φ), I(ψ)⟩_t = ∫₀ᵗ (φ_s · ψ_s) ds. ■

Example 5.2.3. ∫₀ᵗ B_s dB_s.

In the proof of Theorem 5.2.2, the approximation by step functions remains valid for square integrable U_i's. Since (B_s 1_{s∈[0,t]})_{s≥0} ∈ L²_{R^d}, this gives a meaning to the formula in Section 5.1:

    ∫₀ᵗ B_s dB_s = ∫₀ᵗ B_s 1_{s∈[0,t]} dB_s = lim_{|δ|→0} Σ_i B_{t_{i−1}} (B_{t_i} − B_{t_{i−1}}) = (B_t² − t)/2,   t ≥ 0   (limit in probability).

We will learn how to compute many other integrals by using the Itô formula of Chapter 7. The quantity above is clearly a centered martingale. We can also easily check the Itô isometry:

    E((∫₀ᵗ B_s dB_s)²) = E(B_t⁴ − 2t B_t² + t²)/4 = (3t² − 2t² + t²)/4 = t²/2 = ∫₀ᵗ E(B_s²) ds.

Note that we have here a martingale with non-deterministic increasing process: for all t ≥ 0,

    ⟨∫₀^• B_s dB_s⟩_t = ∫₀ᵗ B_s² ds   (random and non-Gaussian).

The angle bracket is a quadratic variation and is thus here some sort of χ².
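Example 5.2.3 can be checked by Monte Carlo. The sketch below, with grid and sample sizes of our own choosing, verifies that the discretized sums match the closed form (B_t² − t)/2 path by path, that E((∫₀¹ B_s dB_s)²) = t²/2 = 1/2 within sampling error, and that the bracket ∫₀¹ B_s² ds is genuinely random.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, t = 5_000, 500, 1.0     # m paths, n time steps on [0, 1]
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=(m, n))
B = np.concatenate([np.zeros((m, 1)), np.cumsum(dB, axis=1)], axis=1)

I = np.sum(B[:, :-1] * dB, axis=1)             # discretized \int_0^1 B_s dB_s, per path
closed = (B[:, -1] ** 2 - t) / 2               # closed form (B_t^2 - t)/2
bracket = np.sum(B[:, :-1] ** 2, axis=1) * dt  # discretized bracket \int_0^1 B_s^2 ds

print(np.abs(I - closed).max())    # small discretization error
print(I.mean(), (I ** 2).mean())   # ~ 0 and ~ t^2/2 = 0.5 (centering and isometry)
print(bracket.std())               # > 0: the increasing process is random
```

The per-path agreement with the closed form holds up to (t − Σ(ΔB)²)/2, which vanishes as the mesh goes to 0.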

Example 5.2.4. Itô integration by parts formula.

Let B = (B_t)_{t≥0} and W = (W_t)_{t≥0} be two Brownian motions on R for the same filtration (F_t)_{t≥0}, with B_0 = W_0 = 0. As in Section 5.1 and Example 5.2.3, for all t > 0 and all subdivisions 0 = t_0 < ⋯ < t_n = t, the identity

    Σ_i W_{t_i} (B_{t_{i+1}} − B_{t_i}) = Σ_i (W_{t_{i+1}} B_{t_{i+1}} − W_{t_i} B_{t_i}) − Σ_i (W_{t_{i+1}} − W_{t_i})(B_{t_{i+1}} − B_{t_i}) − Σ_i B_{t_i} (W_{t_{i+1}} − W_{t_i})

gives the integration by parts formula (it will work for arbitrary continuous martingales)

    ∫₀ᵗ W_s dB_s = W_t B_t − [W, B]_t − ∫₀ᵗ B_s dW_s.

We recover the formula of Example 5.2.3 when B = W. On the contrary, if B and W are independent then [W, B]_t = 0. Namely, with Δ_i^B = B_{t_{i+1}} − B_{t_i} and Δ_i^W = W_{t_{i+1}} − W_{t_i}, we have E(Σ_i Δ_i^W Δ_i^B) = 0 while

    E((Σ_i Δ_i^W Δ_i^B)²) = Σ_i E((Δ_i^B)²) E((Δ_i^W)²) = Σ_i (t_{i+1} − t_i)² ≤ t max_i (t_{i+1} − t_i) → 0.

Alternatively we may use [B, W]_t = ⟨B, W⟩_t and note that ⟨B, W⟩_t = 0 since for all 0 ≤ s ≤ t,

    E(B_t W_t | F_s) = E((B_t − B_s)(W_t − W_s) + B_t W_s + B_s W_t − B_s W_s | F_s) = B_s W_s,

thus BW is a martingale and thus ⟨B, W⟩ = 0.
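A quick numerical check of this example, as a sketch with our own grid size and seed: the discrete cross term Σ_i Δ_i^W Δ_i^B vanishes for independent B and W, the quadratic term Σ_i (Δ_i^B)² recovers t, and the discrete integration by parts identity is exact (pure telescoping).

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 200_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)   # increments of B
dW = rng.normal(0.0, np.sqrt(dt), n)   # increments of an independent W
B = np.concatenate(([0.0], np.cumsum(dB)))
W = np.concatenate(([0.0], np.cumsum(dW)))

cross = np.sum(dW * dB)   # discretized [W, B]_1: vanishes as the mesh -> 0
quad = np.sum(dB * dB)    # discretized [B, B]_1 = <B>_1 = t

# Exact discrete integration by parts (telescoping, no limit needed):
lhs = np.sum(W[:-1] * dB) + np.sum(B[:-1] * dW) + cross
print(cross, quad, lhs - W[-1] * B[-1])
```
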

Remark 5.2.5. Reduction to univariate stochastic integrals.

For all d ≥ 1, φ = (φ¹, …, φ^d) ∈ L²_{R^d} if and only if φⁱ ∈ L²_R for all 1 ≤ i ≤ d, and we have, for all t ≥ 0,

    ∫₀ᵗ φ_s · dB_s = Σ_{i=1}^{d} ∫₀ᵗ φ_sⁱ dB_sⁱ.

The d-dimensional integral is a linear combination of one-dimensional integrals.
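This reduction is literal at the level of discretized sums, as the following short sketch shows (the integrand φ_s = cos(B_s), the dimension, and the grid are our own illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, dt = 3, 10_000, 1e-4
dB = rng.normal(0.0, np.sqrt(dt), size=(n, d))      # increments of a d-dimensional BM
B = np.vstack([np.zeros(d), np.cumsum(dB, axis=0)])

phi = np.cos(B[:-1])                                 # a progressive R^d-valued integrand
vector_sum = np.sum(phi * dB)                        # discretized \int phi_s . dB_s
coord_sums = sum(np.sum(phi[:, i] * dB[:, i]) for i in range(d))

print(vector_sum, coord_sums)                        # equal up to rounding
```
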

5.3 Brownian semi-martingales and Itô formula

Our main objective is to solve Stochastic Differential Equations with respect to X of the form

    X_t = X_0 + ∫₀ᵗ σ(s, X_s) dB_s + ∫₀ᵗ b(s, X_s) ds,


t ≥ 0, driven by a Brownian motion B, where X_0, σ, and b are given. The first integral of the right hand side requires the notion of stochastic integral with respect to Brownian motion, defined in Section 5.2, while the second integral requires the notion of Lebesgue – Stieltjes integral, ω by ω. Beyond stochastic differential equations, we say that a process (X_t)_{t≥0} is a Brownian semi-martingale or an Itô process when it takes the form

    X_t = X_0 + ∫₀ᵗ φ_s dB_s + ∫₀ᵗ ψ_s ds

where φ ∈ L²_R is a square integrable progressive process and where ψ is a locally bounded progressive process. The first integral in the right hand side is a square integrable martingale while the second integral is a finite variation process. Then the Itô formula writes, for all f ∈ C²(R, R),

    f(X_t) − f(X_0) = ∫₀ᵗ f′(X_s) dX_s + ½ ∫₀ᵗ f″(X_s) d⟨X⟩_s
                    = ∫₀ᵗ f′(X_s) φ_s dB_s + ∫₀ᵗ f′(X_s) ψ_s ds + ½ ∫₀ᵗ f″(X_s) φ_s² ds.

The problem with this formula is that the first integral in the right hand side is not well defined, because f′(X)φ is not necessarily in L²_R. Actually this stochastic integral can be defined via a limiting procedure involving truncation stopping times, producing what we call a local martingale, which is not a martingale in general. Provided that we extend the stochastic integral in this way, the right hand side is again an (extended) Itô process, showing stability of this structure under composition with C² functions.
In Chapter 7 the Itô formula follows from a Taylor formula, and the last term is produced, as in Section 5.1, by the quadratic variation of the Brownian integrator. In particular, if f ∈ C¹(R, R), with primitive F, then

    ∫₀ᵗ f(B_s) dB_s = F(B_t) − ½ ∫₀ᵗ f′(B_s) ds.

We recover as a special case the formula ∫₀ᵗ B_s dB_s = ½ B_t² − ½ t identified by hand in Section 5.1.
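This identity can be checked by simulation. In the sketch below we take f = cos, so F = sin and f′ = −sin (our own illustrative choice of f, grid, and seed); the two sides agree up to a discretization error of order √dt.

```python
import numpy as np

rng = np.random.default_rng(5)
t, n = 1.0, 200_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

# f = cos, with primitive F = sin and derivative f' = -sin:
lhs = np.sum(np.cos(B[:-1]) * dB)                        # \int_0^1 cos(B_s) dB_s
rhs = np.sin(B[-1]) + 0.5 * np.sum(np.sin(B[:-1])) * dt  # F(B_1) - (1/2) \int_0^1 f'(B_s) ds

print(lhs, rhs)
```
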
More generally, it is proved in particular in Chapter 7 that if (X_t)_{t≥0} is a d-dimensional process for which each coordinate process is an Itô process, then the Itô formula writes, for all f ∈ C²(R^d, R),

    f(X_t) = f(X_0) + Σ_{i=1}^{d} ∫₀ᵗ ∂_i f(X_s) dX_sⁱ + ½ Σ_{i,j=1}^{d} ∫₀ᵗ ∂²_{i,j} f(X_s) d⟨Xⁱ, X^j⟩_s.

If needed, these Itô formulas for Itô processes are essentially sufficient for a first reading of Chapter 8 on Stochastic Differential Equations driven by Brownian motion, as well as Chapter 9 on the Feynman – Kac formula and the Kakutani representation of the solution of Dirichlet problems.
The notion of stochastic integral with respect to a general (semi-)martingale integrator beyond Brownian motion is considered in Chapter 6, and the corresponding Itô formula is proved and studied in Chapter 7. These notions are deep achievements of stochastic calculus. Beyond pragmatism, and in order to avoid being the Monsieur Jourdain of semi-martingales, the best would be to try at some point to enter Chapters 6 and 7.

Chapter 6

Itô stochastic integral and semi-martingales

Our aim is to construct a stochastic integral which goes beyond the stochastic integral with respect to the Brownian motion integrator of Chapter 5, with an integrator (M_t)_{t≥0} which can be at least a general continuous square integrable martingale, and with an integrand (φ_t)_{t≥0} which can be at least random and possibly square integrable. The ambition is thus to define the process

    (∫₀ᵗ φ_s dM_s)_{t≥0}.

• In Section 6.1, for one-dimensional processes, we construct the Itô stochastic integral when M is a
continuous martingale bounded in L2 , and when ϕ is square integrable in a suitable sense. This is
done from step processes and by using the Hilbert structure available for both ϕ and M .

• In Section 6.2, by using truncation (localization via stopping times cutoff), we extend the previous Itô
stochastic integral to the case where M is a continuous local martingale. This Itô stochastic integral
coincides with the integral constructed previously with respect to Brownian motion.

• In Section 6.3, the most general integrators M that we reach for the Itô stochastic integral are sums of
local martingales and bounded variation processes, called semi-martingales.

6.1 Stochastic integral with respect to continuous martingales bounded in L2

In this section, the integrator is taken in M²₀, the set of continuous martingales bounded in L² and issued from the origin. This allows to benefit from the Hilbert nature of this set, see Theorem 4.3.1¹. We would like to define, for M ∈ M²₀,

    ∫₀ᵗ φ_s dM_s,   t ≥ 0,

for reasonable integrands φ. Since the integrator M is a one-dimensional process, it is natural to consider a one-dimensional integrand φ. Inspired by what we did before for the Brownian motion integrator in the one-dimensional case d = 1, we denote by L²(M) the set of progressive real processes (φ_t)_{t≥0} such that

    E(∫₀^∞ φ_s² d⟨M⟩_s) < ∞.

In this formula, the integral on [0, ∞) is understood, at each fixed ω, as the limit as t → ∞ of the integral on [0, t] with respect to the increasing and thus bounded variation function s ∈ [0, t] ↦ ⟨M⟩_s, see Theorem 1.7.3. Since the (random) process ⟨M⟩ may be constant on some intervals of time, we define the equivalence relation ∼ on L²(M) by setting φ ∼ ψ iff E(∫₀^∞ (φ_s − ψ_s)² d⟨M⟩_s) = 0, and we consider the quotient space L²_∼(M) = L²(M)/∼, still denoted L²(M) for convenience. In fact, with this definition and convention,

    L²(M) = L²(Ω × R₊, P, µ),

¹ Note that the one-dimensional Brownian motion is square integrable but not bounded in L²; however we could restrict our analysis to processes on the time interval [0, t], and Brownian motion is bounded in L² on every finite time interval.


where P is the progressive σ-field (Theorem 2.1.1), and µ the finite measure defined for all A ∈ P by

    µ(A) = ∫_Ω ∫₀^∞ 1_A(ω, s) d⟨M⟩_s(ω) dP(ω).

Its total mass is E(⟨M⟩_∞) = ‖M‖²_{M²₀}. Note that the increasing process ⟨M⟩ is random in general, and this makes a notable difference with the Brownian motion case studied before, for which ⟨B⟩ is deterministic. The scalar product and the norm of the Hilbert space L²(M) are given respectively by

    ⟨φ, ψ⟩_{L²(M)} = E(∫₀^∞ φ_s ψ_s d⟨M⟩_s)   and   ‖φ‖²_{L²(M)} = E(∫₀^∞ φ_s² d⟨M⟩_s).

We denote for short by S the set of real valued progressive bounded step processes. We have S ⊂ L²(M).

Lemma 6.1.1. Approximation or density.

Let M ∈ M²₀. The vector space S of bounded step processes is dense in L²(M). In other words, for all φ ∈ L²(M) and all ε > 0, there exists ψ ∈ S with

    E(∫₀^∞ (φ_s − ψ_s)² d⟨M⟩_s) < ε.

Proof. Since L²(M) is a Hilbert space, it suffices to show that for all φ ∈ L²(M), if ⟨φ, ψ⟩_{L²(M)} = 0 for all ψ ∈ S, then φ = 0. Let φ ∈ L²(M) be such an element, and set, for all t ≥ 0,

    Φ_t = ∫₀ᵗ φ_s d⟨M⟩_s.

The integral in the right hand side makes sense since, by the Cauchy – Schwarz inequality,

    E(∫₀ᵗ |φ_s| d⟨M⟩_s) ≤ (E(∫₀ᵗ φ_s² d⟨M⟩_s))^{1/2} (E(⟨M⟩_∞))^{1/2},

and the right hand side is finite since M ∈ M²₀ and φ ∈ L²(M). This shows that almost surely, for all t ≥ 0,

    ∫₀ᵗ |φ_s| d⟨M⟩_s < ∞.

The process (Φ_t)_{t≥0} is continuous with finite variations and Φ_t ∈ L¹ for all t ≥ 0. Now, let 0 ≤ s ≤ t and let Z be a bounded F_s-measurable random variable. Since (ψ_u)_{u≥0} = (Z 1_{u∈(s,t]})_{u≥0} ∈ S, we have

    ⟨φ, ψ⟩_{L²(M)} = E(Z ∫ₛᵗ φ_u d⟨M⟩_u) = 0.

Therefore E(Z(Φ_t − Φ_s)) = 0. Since Z is an arbitrary bounded F_s-measurable random variable and since Φ_t ∈ L¹ for all t ≥ 0, it follows that (Φ_t)_{t≥0} is a martingale for (F_t)_{t≥0}. On the other hand Φ is continuous with finite variations, and therefore, thanks to Lemma 4.1.6, we get Φ = 0. Having in mind the initial definition of Φ, this means that almost surely the signed measure with density φ_s with respect to d⟨M⟩_s is zero. But this is possible only if φ_s = 0, d⟨M⟩_s almost everywhere, in other words only if φ = 0 in L²(M), as expected. ■

Theorem 6.1.2. Itô stochastic integral with respect to elements of M20 .

Let M ∈ M²₀. There exists a unique linear map I_M : L²(M) → M²₀, denoted, for all φ ∈ L²(M) and t ≥ 0, by

    I_M(φ)_t = ∫₀ᵗ φ_s dM_s,

and called the Itô stochastic integral with respect to M, such that

1. for all φ ∈ S with decomposition φ_t = U_0 1_{{0}}(t) + Σ_{i=0}^{n−1} U_i 1_{(t_i,t_{i+1}]}(t), we have

    I_M(φ)_t = Σ_{i=0}^{n−1} U_i (M_{t_{i+1}∧t} − M_{t_i∧t});

2. the map I_M is an isometry, namely for all φ ∈ L²(M),

    E((∫₀^∞ φ_s dM_s)²) = E(∫₀^∞ φ_s² d⟨M⟩_s),   that is   ‖I_M(φ)‖²_{M²₀} = ‖φ‖²_{L²(M)}.

Moreover, for all φ ∈ L²(M), I_M(φ) is the unique element of M²₀ such that for all N ∈ M²₀ and t ≥ 0,

    ⟨I_M(φ), N⟩_t = ∫₀ᵗ φ_s d⟨M, N⟩_s.

Furthermore, for all φ ∈ L²(M), all stopping times T, and all t ≥ 0,

    I_M(φ)_t^T = I_M(φ 1_{•≤T})_t = I_{M^T}(φ)_t,   in other words   ∫₀^{t∧T} φ_s dM_s = ∫₀ᵗ φ_s 1_{s≤T} dM_s = ∫₀ᵗ φ_s dM_s^T.

Proof. The proof follows the lines of the proof for the Brownian motion integrator.
First of all the definition of I M (ϕ) for ϕ ∈ S does not depend on the decomposition chosen for ϕ. It is
immediate to check that the map I M is linear on S . Let us prove that it is an isometry on S .
The following lemma generalizes an observation already made in the proof of Theorem 4.1.4.

Lemma 6.1.3. Baby stochastic integral.

Let M be a continuous martingale, let 0 ≤ u ≤ v, and let U be an F_u measurable bounded random variable. Set M^{[u,v]} = U(M_{v∧•} − M_{u∧•}) = (U(M_{v∧t} − M_{u∧t}))_{t≥0}.

1. M^{[u,v]} is a martingale, which is almost surely constant outside the time interval (u, v), more precisely identically zero on [0, u] and constant, equal to U(M_v − M_u), on [v, +∞);

2. if M is square integrable, then so is M^{[u,v]}, and ⟨M^{[u,v]}⟩ = U²(⟨M⟩_{v∧•} − ⟨M⟩_{u∧•});

3. if M is square integrable and 0 ≤ u ≤ v ≤ u′ ≤ v′ then ⟨M^{[u,v]}, M^{[u′,v′]}⟩ = 0.

Proof of Lemma 6.1.3.

1. Almost immediate; the fact that U(M_{v∧t} − M_{u∧t}) = 0 on [0, u] is crucial.

2. [U(M_{v∧•} − M_{u∧•})] = U²([M_{v∧•}] + [M_{u∧•}] − 2[M_{u∧•}, M_{v∧•}]) = U²([M]_{v∧•} + [M]_{u∧•} − 2[M]_{u∧v∧•}) = U²([M]_{v∧•} − [M]_{u∧•}), since [M_{u∧•}, M_{v∧•}] = [M]_{u∧v∧•} and u ∧ v = u.

3. Follows from the definition of quadratic covariation and the first property (zero outside the intervals). ■

For all φ ∈ S and all i ∈ {0, …, n−1}, by Lemma 6.1.3,

    M⁽ⁱ⁾ = U_i (M_{t_{i+1}∧•} − M_{t_i∧•}) ∈ M²₀,   and thus   I_M(φ) = Σ_{i=0}^{n−1} M⁽ⁱ⁾ ∈ M²₀;

moreover M⁽ⁱ⁾ is constant outside the interval (t_i, t_{i+1}), and for all i, j ∈ {0, …, n−1},

    ⟨M⁽ⁱ⁾, M⁽ʲ⁾⟩ = U_i² (⟨M⟩_{t_{i+1}∧•} − ⟨M⟩_{t_i∧•}) 1_{i=j}


(note that this implies orthogonality in M²₀ of M⁽ⁱ⁾ and M⁽ʲ⁾ when i ≠ j). This gives

    ⟨I_M(φ)⟩ = Σ_{i=0}^{n−1} ⟨M⁽ⁱ⁾⟩ = Σ_{i=0}^{n−1} U_i² (⟨M⟩_{t_{i+1}∧•} − ⟨M⟩_{t_i∧•}) = ∫₀^• φ_s² d⟨M⟩_s.

The isometry property follows, more precisely

    ‖I_M(φ)‖²_{M²₀} = E(⟨I_M(φ)⟩_∞) = E(∫₀^∞ φ_s² d⟨M⟩_s) = ‖φ‖²_{L²(M)}.

By Lemma 6.1.1, S is dense in the space L²(M), and the linear isometry I_M extends uniquely to L²(M).
Let N ∈ M²₀. For all φ ∈ L²(M), the Kunita – Watanabe inequality (Corollary 4.1.10) gives

    E(∫₀^∞ |φ_s| |d⟨M, N⟩_s|) ≤ ‖φ‖_{L²(M)} ‖N‖_{M²₀}.

It follows that ∫₀^∞ φ_s d⟨M, N⟩_s is well defined and belongs to L¹. If φ ∈ S then

    ⟨I_M(φ), N⟩ = Σ_{i=0}^{n−1} ⟨M⁽ⁱ⁾, N⟩

and ⟨M⁽ⁱ⁾, N⟩ = U_i (⟨M, N⟩_{t_{i+1}∧•} − ⟨M, N⟩_{t_i∧•}), and thus

    ⟨I_M(φ), N⟩ = Σ_{i=0}^{n−1} ⟨M⁽ⁱ⁾, N⟩ = Σ_{i=0}^{n−1} U_i (⟨M, N⟩_{t_{i+1}∧•} − ⟨M, N⟩_{t_i∧•}) = ∫₀^• φ_s d⟨M, N⟩_s.

This gives the desired formula when φ ∈ S. But the map X ∈ M²₀ ↦ ⟨X, N⟩_∞ ∈ L¹ is continuous since, by the Kunita – Watanabe inequality (Corollary 4.1.10), we have

    E(|⟨X, N⟩_∞|) ≤ E(⟨X⟩_∞)^{1/2} E(⟨N⟩_∞)^{1/2} = ‖N‖_{M²₀} ‖X‖_{M²₀}.

If now φ ∈ L²(M) and φ = lim_{n→∞} φ⁽ⁿ⁾ in L²(M) for a sequence (φ⁽ⁿ⁾)_{n≥0} in S, then

    ⟨I_M(φ), N⟩_∞ = lim_{n→∞} ⟨I_M(φ⁽ⁿ⁾), N⟩_∞ = lim_{n→∞} ∫₀^∞ φ_s⁽ⁿ⁾ d⟨M, N⟩_s = ∫₀^∞ φ_s d⟨M, N⟩_s,

where we have used for the last equality the inequality

    E(|∫₀^∞ (φ_s⁽ⁿ⁾ − φ_s) d⟨M, N⟩_s|) ≤ ‖φ⁽ⁿ⁾ − φ‖_{L²(M)} ‖N‖_{M²₀},

which follows from the Kunita – Watanabe inequality (Corollary 4.1.10). We have obtained

    ⟨I_M(φ), N⟩_∞ = ∫₀^∞ φ_s d⟨M, N⟩_s.

Finally, to replace ∞ by an arbitrary t ≥ 0, we can replace N by the stopped process N^t = (N_{s∧t})_{s≥0}. Moreover this formula characterizes I_M(φ) in M²₀. Indeed, if X ∈ M²₀ satisfies the same formula, then, for all N ∈ M²₀, ⟨I_M(φ) − X, N⟩ = 0, and taking N = I_M(φ) − X gives ⟨I_M(φ) − X⟩ = 0, thus X = I_M(φ).

Furthermore, for all N ∈ M²₀, all φ ∈ L²(M), and all stopping times T, the properties of the angle bracket (Corollary 4.1.9) of two continuous square integrable martingales give, for all t ≥ 0,

    ⟨I_M(φ)^T, N⟩_t = ⟨I_M(φ), N⟩_{t∧T} = ∫₀^{t∧T} φ_s d⟨M, N⟩_s = ∫₀ᵗ φ_s 1_{s≤T} d⟨M, N⟩_s.

Similarly, we have, for all t ≥ 0,

    ⟨I_{M^T}(φ), N⟩_t = ∫₀ᵗ φ_s d⟨M^T, N⟩_s = ∫₀ᵗ φ_s d⟨M, N⟩_s^T = ∫₀ᵗ φ_s 1_{s≤T} d⟨M, N⟩_s.

These formulas, together with the preceding characterization of I_M, give the desired formulas with T. ■


Corollary 6.1.4. Angle bracket, moments, associativity.

1. For all M ∈ M²₀, all φ ∈ L²(M), and all t ≥ 0,

    E(∫₀ᵗ φ_s dM_s) = 0.

2. For all M, N ∈ M²₀, all φ ∈ L²(M), all ψ ∈ L²(N), and all t ≥ 0,

    ⟨∫₀^• φ_s dM_s, ∫₀^• ψ_s dN_s⟩_t = ∫₀ᵗ φ_s ψ_s d⟨M, N⟩_s,

which gives

    E((∫₀ᵗ φ_s dM_s)(∫₀ᵗ ψ_s dN_s)) = E(∫₀ᵗ φ_s ψ_s d⟨M, N⟩_s).

3. For all M ∈ M²₀, all φ ∈ L²(M), and all progressive processes ψ, we have

    ψ ∈ L²(I_M(φ)) iff φψ ∈ L²(M), and in this case I_{I_M(φ)}(ψ) = I_M(φψ).

Proof.

1. Theorem 6.1.2 gives I_M(φ) ∈ M²₀; in particular I_M(φ) is centered, as are all the elements of M²₀.

2. By polarization, we can assume without loss of generality that M = N. The property follows then from the isometry property of Theorem 6.1.2 applied with (φ_s 1_{s≤t})_{s≥0}, together with the stopping time property in Theorem 6.1.2 with the deterministic stopping time t.

3. By using Theorem 1.7.3 and then the second property of the present corollary with N = M and ψ = φ,

    E(∫₀^∞ φ_s² ψ_s² d⟨M, M⟩_s) = E(∫₀^∞ ψ_s² d(∫₀ˢ φ_u² d⟨M, M⟩_u)) = E(∫₀^∞ ψ_s² d⟨I_M(φ)⟩_s),

which gives that φψ ∈ L²(M) iff ψ ∈ L²(I_M(φ)). Next, by Theorem 6.1.2, for all N ∈ M²₀ and all t ≥ 0,

    ⟨I_M(φψ), N⟩_t = ∫₀ᵗ φ_s ψ_s d⟨M, N⟩_s = ∫₀ᵗ ψ_s d(∫₀ˢ φ_u d⟨M, N⟩_u) = ∫₀ᵗ ψ_s d⟨I_M(φ), N⟩_s,

which implies, by the characterization via N in Theorem 6.1.2, that I_M(φψ) = I_{I_M(φ)}(ψ).

6.2 Stochastic integral with respect to continuous local martingales

Up to now we have constructed the Itô integral for an integrator which is either Brownian motion or an element of M²₀. Our aim now is to consider an integrator M ∈ M₀^loc. The notion of increasing process of a local martingale is natural, see Lemma 4.2.6. By analogy with what we did before, we consider:

• the set L⁰(M) of progressive φ : Ω × R₊ → R such that almost surely

    ∫₀^∞ φ_s² d⟨M⟩_s < ∞;

• the set L²(M) ⊂ L⁰(M) of progressive φ : Ω × R₊ → R such that

    E(∫₀^∞ φ_s² d⟨M⟩_s) < ∞;

both quotiented by the equivalence relation related to equality d⟨M⟩ almost everywhere.

Theorem 6.2.1. Itô stochastic integral with respect to continuous local martingales.

Let M ∈ M₀^loc.

• For all φ ∈ L⁰(M), there exists a unique I_M(φ) ∈ M₀^loc such that for all N ∈ M₀^loc and t ≥ 0,

    ⟨I_M(φ), N⟩_t = ∫₀ᵗ φ_s d⟨M, N⟩_s.

• For all φ ∈ L⁰(M) and all stopping times T, we have, for all t ≥ 0,

    I_M(φ)_t^T = I_M(φ 1_{•≤T})_t = I_{M^T}(φ)_t,   in other words   ∫₀^{t∧T} φ_s dM_s = ∫₀ᵗ φ_s 1_{s≤T} dM_s = ∫₀ᵗ φ_s dM_s^T.

• If φ ∈ L⁰(M) and if ψ is a progressive process then

    ψ ∈ L⁰(I_M(φ)) iff φψ ∈ L⁰(M), and in this case I_{I_M(φ)}(ψ) = I_M(φψ).

• We have a defective Itô isometry in the sense that for all φ ∈ L⁰(M) and all t ≥ 0, in [0, +∞],

    E((∫₀ᵗ φ_s dM_s)²) ≤ E(∫₀ᵗ φ_s² d⟨M⟩_s):

either the right hand side is finite and equality holds, or it is infinite and the inequality is trivial.

• If M ∈ M²₀ then I_M coincides on L²(M) with the Itô integral of Theorem 6.1.2.

• If M ∈ M₀² and φ ∈ L²(M) then I_M(φ) ∈ M₀² and the Itô isometry holds: for all t ≥ 0 and all φ ∈ L²(M),

    E(I_M(φ)_t²) = E(∫₀ᵗ φ_s² d⟨M⟩_s).

In particular, if M is a Brownian motion on R then I_M coincides on L²_R with the integral of Theorem 5.2.2 with d = 1.

Note that if M ∈ M₀^loc, then for all t ≥ 0 the random variable M_t may fail to be integrable, and in particular to be square integrable. In particular, the Itô stochastic integral I_M(φ) with respect to a local martingale may not be centered, and the Itô isometry may not hold. However, the Itô stochastic integral with respect to M ∈ M₀², such as Brownian motion, does satisfy the centering and the Itô isometry for integrands in L²(M).

Proof. We proceed by localization with stopping times. For all n ≥ 0 we define the stopping time

    T_n = inf{t ≥ 0 : ∫₀ᵗ (1 + φ_s²) d⟨M⟩_s ≥ n}.

Almost surely T_n ↗ +∞ as n → ∞. The fact that T_n grows with n comes from the way we define it, via the integral of a non-negative quantity with respect to a positive measure. Now, for all n ≥ 0, the Doob stopping theorem (Theorem 2.5.1) implies that M^{T_n} ∈ M₀^loc. Moreover, thanks to the "1 + ⋯" in the definition of T_n, for all n ≥ 0 and all t ≥ 0, we have

    ⟨M^{T_n}⟩_t = ⟨M⟩_{t∧T_n} ≤ n.

Therefore, by Lemma 4.2.7, M^{T_n} ∈ M²₀. Moreover we have, from the properties of the angle bracket,

    ∫₀^∞ φ_s² d⟨M^{T_n}⟩_s = ∫₀^{T_n} φ_s² d⟨M⟩_s ≤ n,

therefore φ ∈ L²(M^{T_n}), and from Theorem 6.1.2 the stochastic process I_{M^{T_n}}(φ) makes sense and belongs to M²₀. The sequence of processes (I_{M^{T_n}}(φ))_{n≥0} is consistent, since for all m > n we have T_n ≤ T_m and thus

    I_{M^{T_n}}(φ) = I_{M^{T_m ∧ T_n}}(φ) = (I_{M^{T_m}}(φ))^{T_n}.

Therefore there exists a unique process I_M(φ) = lim_{n→∞} I_{M^{T_n}}(φ) such that I_M(φ)^{T_n} = I_{M^{T_n}}(φ) for all n ≥ 0. This process is continuous, adapted, and belongs to M₀^loc since (I_M(φ))^{T_n} = I_{M^{T_n}}(φ) ∈ M²₀ for all n ≥ 0.

Now, let N ∈ M₀^loc. For all n ≥ 0, let us define T′_n = inf{t ≥ 0 : |N_t| ≥ n} and S_n = T_n ∧ T′_n. Almost surely S_n ↗ +∞ as n → ∞. We have N^{T′_n} ∈ M²₀ and, thanks to Theorem 6.1.2, for all t ≥ 0,

    ⟨I_M(φ), N⟩_t^{S_n} = ⟨I_M(φ)^{T_n}, N^{T′_n}⟩_t = ∫₀ᵗ φ_s d⟨M^{T_n}, N^{T′_n}⟩_s = ∫₀ᵗ φ_s d⟨M, N⟩_s^{S_n} = ∫₀^{t∧S_n} φ_s d⟨M, N⟩_s,

which gives the desired formula as n → +∞. As in Theorem 6.1.2, this formula characterizes I_M(φ), mainly due to the fact that if M ∈ M₀^loc is such that ⟨M⟩ = 0 then M = 0.

It is not difficult to prove the remaining properties, including the relation to the previous integrals. ■

Remark 6.2.2. Brownian motion as a martingale.

Let B be a real Brownian motion issued from the origin. We have E(B_t) = 0 and E(B_t²) = t < ∞ for all t ≥ 0, and sup_{t≥0} E(B_t²) = +∞, therefore

    B ∈ M₀² ⊂ M₀^loc   while   B ∉ M²₀.

Moreover, since B ∈ M₀^loc, we get that B^{T_n} ∈ M²₀ for all n, where T_n = inf{t ≥ 0 : |B_t| ≥ n}. We could define stochastic integrals for processes on the time interval [0, t], and BM is bounded in L² on each finite interval; this would correspond to introducing the space M²₀,[0,t]. Another possibility would be to generalize the construction of the stochastic integral that we gave for BM to integrators in M₀².
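The localization used throughout this section can be seen on a simulated path, as a sketch with our own choice of horizon, seed, and level: the first hitting time T_1 = inf{t ≥ 0 : |B_t| ≥ 1} is a stopping time, and the stopped process B^{T_1} is bounded (up to one overshoot increment on the discrete grid) and constant after T_1.

```python
import numpy as np

rng = np.random.default_rng(6)
n, dt = 400_000, 1e-4              # horizon t = 40, ample to reach the level
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

level = 1.0
k = np.flatnonzero(np.abs(B) >= level)[0]   # grid index of T_1 = inf{t : |B_t| >= 1}
T1 = k * dt

B_stopped = B.copy()
B_stopped[k:] = B[k]               # the stopped process B^{T_1}

print(T1)                          # the discretized stopping time
print(np.abs(B_stopped).max())     # bounded by the level up to one overshoot increment
```
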

6.3 Notion of semi-martingale and stochastic integration

The notion of quadratic variation of a process is considered in Definition 4.1.1. The quadratic variation
of Brownian motion is considered in Theorem 3.2.1. In dimension one, for all t ≥ 0, [B ]t = 〈B 〉t = t .

Definition 6.3.1. Semi-martingales.

A continuous semi-martingale X = (X t )t ≥0 is an adapted process with decomposition of the form

X = X0 + M + V

where M and V are adapted continuous processes issued from the origin and such that

• M = (M_t)_{t≥0} is a continuous local martingale issued from the origin (∈ M₀^loc);

• V = (V_t)_{t≥0} is a continuous finite variation process issued from the origin.

Note in particular that X_0 is F_0 measurable, and that X is adapted.

Lemma 6.3.2. Canonical decomposition.

The decomposition of a continuous semi-martingale is unique.


In particular, a finite variation continuous local martingale is almost surely constant.

Proof. Suppose that X = X_0 + M + V = X_0 + M̃ + Ṽ. The process W = Ṽ − V = M − M̃ is continuous and has finite variation, and thus, by Lemma 4.1.2 (see also 4.1.6), it has zero quadratic variation. Therefore, by Lemma 4.2.6, ⟨M − M̃⟩ = [M − M̃] = 0, and since M_0 = M̃_0 = V_0 = Ṽ_0 = 0, this implies M = M̃ and thus V = Ṽ. ■


Let L^locb be the set of processes which are progressive and locally bounded. By Theorem 2.1.1, all continuous adapted processes belong to L^locb. Let M₀^semi be the set of continuous semi-martingales issued from the origin. Let φ = (φ_t)_{t≥0} ∈ L^locb and X = M + V ∈ M₀^semi. Then, for all t ≥ 0, almost surely,

    ∫₀ᵗ |φ_s| |dV_s| < ∞.

Additionally, we have φ ∈ L⁰(M), and therefore the stochastic integral ∫₀^• φ_s dM_s is well defined. It follows then that we can define the integral I_X(φ) of φ with respect to the semi-martingale X = M + V as

    I_X(φ)_t = ∫₀ᵗ φ_s dX_s = ∫₀ᵗ φ_s dM_s + ∫₀ᵗ φ_s dV_s,

and we can see that the result I_X(φ) is itself a continuous semi-martingale.

Theorem 6.3.3. Properties of the integral with respect to a continuous semi-martingale.

1. The map (φ, X) ↦ I_X(φ) is bilinear, from L^locb × M₀^semi to M₀^semi.

2. For all stopping times T, all φ ∈ L^locb, and all X ∈ M₀^semi, we have

    (I_X(φ))^T = I_X(φ 1_{[0,T]}) = I_{X^T}(φ),   in other words   ∫₀^{t∧T} φ_s dX_s = ∫₀ᵗ φ_s 1_{s≤T} dX_s = ∫₀ᵗ φ_s dX_s^T.

3. For all φ, ψ ∈ L^locb and X ∈ M₀^semi, we have I_{I_X(φ)}(ψ) = I_X(φψ), in other words

    ∫₀ᵗ ψ_s d(∫₀ˢ φ_u dX_u) = ∫₀ᵗ φ_s ψ_s dX_s.

4. For all X ∈ M₀^semi, if X is a local martingale (respectively a finite variation process) then for all φ ∈ L^locb, the process I_X(φ) is a local martingale (respectively a finite variation process).

5. If φ is a step process with decomposition φ_t = U_0 1_{{0}}(t) + Σ_{i=0}^{n−1} U_i 1_{(t_i,t_{i+1}]}(t), 0 = t_0 < t_1 < ⋯ < t_n, n ≥ 1, with U_i F_{t_i}-measurable for all i, then for all X ∈ M₀^semi and all t ≥ 0,

    I_X(φ)_t = ∫₀ᵗ φ_s dX_s = Σ_{i=0}^{n−1} U_i (X_{t_{i+1}∧t} − X_{t_i∧t}).

Proof. The first four properties follow immediately from the definition or from the properties of the stochastic integral with respect to continuous local martingales and with respect to finite variation processes. The fifth and last property does not follow immediately, due to the fact that the random variables U_i are not assumed to be bounded. It suffices to overcome this difficulty when X = M is a continuous local martingale. In this case, we define, for all k ≥ 1,

  T_k = inf{t ≥ 0 : |ϕ_t| ≥ k} = inf{t_i : |U_i| ≥ k} ∈ [0, +∞].

It is a stopping time and almost surely T_k ↗ +∞ as k → ∞ and, for all k ≥ 1 and s ≥ 0,

  ϕ_s 1_{[0,T_k]}(s) = Σ_{i=0}^{n−1} U_i^{(k)} 1_{(t_i, t_{i+1}]}(s)  with  U_i^{(k)} = U_i 1_{T_k > t_i}.

Now ϕ 1_{[0,T_k]} ∈ S, which allows us to write

  (I_M(ϕ))_{t∧T_k} = I_M(ϕ 1_{[0,T_k]})_t = Σ_{i=0}^{n−1} U_i^{(k)} (M_{t_{i+1}∧t} − M_{t_i∧t}),

and it remains to send k to +∞. ■
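To make property 5 concrete, here is a small numerical sketch (the grid, knots, and values U_i below are illustrative choices, not data from the notes): for a step process, the integral against any discretized path X reduces pathwise to the finite sum of the statement.

```python
import numpy as np

rng = np.random.default_rng(42)

# Discretized path of an arbitrary continuous integrator X on [0, 1]:
# 1000 Gaussian increments on the grid k/1000 (the identity checked below is
# pathwise, so the law of X plays no role).
n = 1000
X = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))))

# Step process phi_s = U_i on (t_i, t_{i+1}], with knots chosen on the grid.
knot_idx = [0, 200, 500, 1000]          # times 0, 0.2, 0.5, 1.0
U = [1.5, -2.0, 0.7]                    # U_i, assumed F_{t_i}-measurable
t_idx = 800                             # evaluate the integral at t = 0.8

# Finite-sum formula of property 5: sum_i U_i (X_{min(t_{i+1}, t)} - X_{min(t_i, t)})
I_formula = sum(
    U[i] * (X[min(knot_idx[i + 1], t_idx)] - X[min(knot_idx[i], t_idx)])
    for i in range(len(U))
)

# The same value computed as a term-by-term Riemann-Stieltjes sum of phi dX.
phi = np.zeros(n)                       # value of phi on (k/n, (k+1)/n]
for i in range(len(U)):
    phi[knot_idx[i]:knot_idx[i + 1]] = U[i]
I_sum = float(np.sum(phi[:t_idx] * np.diff(X)[:t_idx]))
```

The two quantities agree up to floating point rounding, since they are the same telescoping sum grouped differently.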

6.3 Notion of semi-martingale and stochastic integration

Theorem 6.3.4. Dominated convergence for stochastic integrals.

Let X = M + V ∈ M₀^semi. Let ϕ and ϕ^{(n)}, n ≥ 1, be progressive locally bounded processes (∈ L^locb). Let ψ be a non-negative progressive locally bounded process (≥ 0 and ∈ L^locb). Let t > 0. Suppose that almost surely the following properties hold:

• for all s ∈ [0, t], lim_{n→∞} ϕ_s^{(n)} = ϕ_s;

• for all s ∈ [0, t] and all n ≥ 0, |ϕ_s^{(n)}| ≤ ψ_s.

Then

  ∫_0^t ϕ_s^{(n)} dX_s →^P ∫_0^t ϕ_s dX_s  as n → ∞.

Proof. From Theorem 1.7.3, the usual dominated convergence theorem gives

  ∫_0^t ϕ_s^{(n)} dV_s →^{a.s.} ∫_0^t ϕ_s dV_s  as n → ∞.

It remains to address the local martingale part. We do it by localization with the stopping time

  T_k = inf{s ∈ [0, t] : ∫_0^s ψ_u² d〈M〉_u ≥ k}.

We have T_k → +∞ as k → ∞ almost surely. Now, by using the defective Itô isometry,

  E((∫_0^{t∧T_k} ϕ_s^{(n)} dM_s − ∫_0^{t∧T_k} ϕ_s dM_s)²) ≤ E(∫_0^{t∧T_k} (ϕ_s^{(n)} − ϕ_s)² d〈M〉_s).

But by definition of T_k, and thanks to the assumptions on ϕ and ψ, we can use dominated convergence to get that for all k, the right hand side tends to 0 as n → ∞. It follows in particular that for all k,

  ∫_0^{t∧T_k} ϕ_s^{(n)} dM_s − ∫_0^{t∧T_k} ϕ_s dM_s →^P 0  as n → ∞.

Therefore, for all ε > 0,

  P(|∫_0^t ϕ_s^{(n)} dM_s − ∫_0^t ϕ_s dM_s| ≥ ε) ≤ P(T_k ≤ t) + P(|∫_0^{t∧T_k} ϕ_s^{(n)} dM_s − ∫_0^{t∧T_k} ϕ_s dM_s| ≥ ε).

It remains to select k sufficiently large, and then n sufficiently large. ■
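As a toy illustration of this dominated convergence (on a discretized path, with an explicit quantized approximating sequence; all parameters are illustrative), take ϕ = B and ϕ^{(m)} = ⌊mB⌋/m, which is dominated by ψ = |B| + 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretized Brownian path on [0, 1].
n = 1000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

def ito_sum(phi_vals):
    # left-point (Ito) Riemann sum of phi dB on the grid
    return float(np.sum(phi_vals[:-1] * dB))

# phi_s = B_s, approximated by the quantized phi^(m)_s = floor(m B_s)/m, so
# |phi^(m) - phi| <= 1/m while |phi^(m)_s| <= |B_s| + 1 =: psi_s (domination).
I = ito_sum(B)
errors = []
for m in (10, 100, 1000):
    I_m = ito_sum(np.floor(m * B) / m)
    # deterministic bound: |I_m - I| <= (1/m) * sum_k |dB_k|
    errors.append((m, abs(I_m - I), float(np.sum(np.abs(dB))) / m))
```

On the discrete grid the convergence is even quantified by the deterministic bound recorded in the third column.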

Theorem 6.3.5. From sums to stochastic integrals.

Let X = M + V ∈ M₀^semi and let ϕ be a continuous adapted process (which is in particular progressive). Then for all t > 0 and for all sequences (δ_n)_{n≥0} of subdivisions of [0, t], δ_n : 0 = t_0^{(n)} < ⋯ < t_{m_n}^{(n)} = t, m_n ≥ 1, such that |δ_n| = max_{0≤k≤m_n−1} |t_{k+1}^{(n)} − t_k^{(n)}| → 0 as n → ∞, we have

  Σ_{k=0}^{m_n−1} ϕ_{t_k^{(n)}} (X_{t_{k+1}^{(n)}} − X_{t_k^{(n)}}) →^P ∫_0^t ϕ_s dX_s  as n → ∞.

Proof. For all n ≥ 0, the step process ϕ^{(n)} defined by ϕ^{(n)} = ϕ₀ 1_{{0}} + Σ_{k=0}^{m_n−1} ϕ_{t_k^{(n)}} 1_{(t_k^{(n)}, t_{k+1}^{(n)}]} is progressive and satisfies all the assumptions of the dominated convergence Theorem 6.3.4 with the process ψ defined by ψ_s = sup_{u∈[0,s]} |ϕ_u| for all s ∈ [0, t]. This process ψ is indeed continuous and adapted and thus progressive and locally bounded. The desired result follows then from Theorem 6.3.4 since

  ∫_0^t ϕ_s^{(n)} dX_s = Σ_{k=0}^{m_n−1} ϕ_{t_k^{(n)}} (X_{t_{k+1}^{(n)}} − X_{t_k^{(n)}}). ■
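A numerical sketch of this convergence (grid sizes are illustrative): for ϕ = X = B on [0, 1], the left-point sums approach the closed form ∫_0^1 B_s dB_s = (B_1² − 1)/2 as the mesh shrinks.

```python
import numpy as np

rng = np.random.default_rng(7)

# One Brownian path on a fine grid of [0, 1].
N = 2**16
dB = rng.normal(0.0, np.sqrt(1.0 / N), N)
B = np.concatenate(([0.0], np.cumsum(dB)))

target = 0.5 * (B[-1] ** 2 - 1.0)      # closed form of the integral of B dB

def riemann_sum(step):
    # left-point sum over the sub-grid of mesh step/N
    Bs = B[::step]
    return float(np.sum(Bs[:-1] * np.diff(Bs)))

err_coarse = abs(riemann_sum(256) - target)   # mesh 2^-8
err_fine = abs(riemann_sum(4) - target)       # mesh 2^-14
```

On any grid the exact algebraic identity Σ_k B_{t_k}ΔB_k = (B_1² − Σ_k (ΔB_k)²)/2 holds, so the whole error comes from Σ_k (ΔB_k)² fluctuating around 〈B〉_1 = 1.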


Remark 6.3.6. Multivariate case.

We encounter the multivariate case when X = B is a BM in R^d and ϕ = (ϕ¹, …, ϕ^d) is deterministic and in L²_{R^d}(R₊, dx), for which I_X(ϕ) = I_B(ϕ) is the Wiener integral

  ∫_0^• ϕ_s dB_s = Σ_{i=1}^d ∫_0^• ϕ_s^i dB_s^i.

At the opposite side of generality, let X = (X¹, …, X^d) be a d-dimensional process such that for all 1 ≤ i ≤ d, X^i = M^i + V^i ∈ M₀^semi. We decide to say that such a process is by definition a d-dimensional continuous semi-martingale issued from the origin. Then for all d-dimensional processes ϕ = (ϕ¹, …, ϕ^d) such that for all 1 ≤ i ≤ d, ϕ^i ∈ L^locb, we define

  ∫_0^• ϕ_s dX_s = Σ_{i=1}^d ∫_0^• ϕ_s^i dX_s^i.

In particular, when ϕ^i ∈ L²(M^i) for all 1 ≤ i ≤ d, then, for all t ≥ 0,

  E((∫_0^t ϕ_s dM_s)²) = Σ_{i,j=1}^d E ∫_0^t ϕ_s^i ϕ_s^j d〈M^i, M^j〉_s.

When M is a d-dimensional BM then 〈M^i, M^j〉_s = s 1_{i=j} and we recover the Itô isometry

  E((∫_0^t ϕ_s dB_s)²) = Σ_{i=1}^d E ∫_0^t (ϕ_s^i)² ds = E ∫_0^t |ϕ_s|² ds.
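The Itô isometry can be checked by a quick Monte Carlo sketch (d = 1, with an illustrative deterministic integrand ϕ(s) = s, so that the integral is a Wiener integral with second moment ∫_0^1 s² ds = 1/3):

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of the Ito isometry E[(int_0^1 phi_s dB_s)^2] = int_0^1 phi_s^2 ds
# for the deterministic integrand phi(s) = s; the grid size and the number of
# paths are illustrative choices.
n, paths = 200, 20000
dt = 1.0 / n
s = np.arange(n) * dt                      # left endpoints of the grid
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
I = (s[None, :] * dB).sum(axis=1)          # one value of the integral per path

lhs = float(np.mean(I ** 2))               # Monte Carlo second moment
rhs = float(np.sum(s ** 2) * dt)           # int_0^1 s^2 ds on the same grid (~1/3)
```

The agreement is within Monte Carlo accuracy, of order rhs·sqrt(2/paths).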

6.4 Summary of stochastic integrals and involved spaces

Integrator X   | Integrand ϕ          | Integral I_X(ϕ)
BM₀ in R^d     | L²_{R^d}(R₊, dx)     | Gaussian martingale issued from the origin (Wiener integral, Chapter 3)
BM₀ in R^d     | L²_{R^d}             | M₀² (Itô integral with Brownian motion integrator, Chapter 5)
M²₀            | L²(M), X = M         | M²₀
M₀^loc         | L⁰(M), X = M         | M₀^loc
M₀^semi        | L^locb               | M₀^semi

Space              | Definition
L²_{R^d}(R₊, dx)   | Deterministic square integrable ϕ : R₊ → R^d
S_{R^d}            | Progressive step processes ϕ : Ω × R₊ → R^d with bounded increments
L²_{R^d}           | Progressive ϕ : Ω × R₊ → R^d such that E ∫_0^∞ |ϕ_s|² ds < ∞
S                  | Progressive step processes ϕ : Ω × R₊ → R with bounded increments
L²(M)              | Progressive ϕ : Ω × R₊ → R such that E ∫_0^∞ ϕ_s² d〈M〉_s < ∞
L⁰(M)              | Progressive ϕ : Ω × R₊ → R such that a.s. ∫_0^∞ ϕ_s² d〈M〉_s < ∞
L^locb             | Progressive ϕ : Ω × R₊ → R, a.s. sup_{s∈[0,t]} |ϕ_s| < ∞ (locally bounded)
M₀²                | Continuous square integrable martingales issued from the origin
M²₀                | Continuous martingales bounded in L² issued from the origin
M₀^loc             | Continuous local martingales issued from the origin
M₀^semi            | Continuous semi-martingales issued from the origin

Chapter 7

Itô formula and applications

7.1 Itô formula

Classical analysis comes with its fundamental formula of integral calculus, expressing a regular function as the Riemann/Stieltjes/Lebesgue/Young integral of its derivative. For the Itô stochastic integral of stochastic analysis, the analogue is the Itô formula. It is also known as the “Itô lemma”, although it is a theorem here! It was discovered in 2000 – see [7] – that we might speak about the Döblin¹ – Itô theorem/lemma/formula. We have already seen in Section 5.1 that if B is a real Brownian motion issued from the origin then

  B² = 2 ∫_0^• B_s dB_s + 〈B〉 ∈ M^semi.

The Itô formula goes beyond and states more generally that the image of a semi-martingale by a C² function is again a semi-martingale. It can be seen as a fundamental rule of calculus for the Itô stochastic integral.

Theorem 7.1.1. Itô or Döblin formula for d -dimensional semi-martingales.

If X = (X_t)_{t≥0} is d-dimensional such that for all 1 ≤ i ≤ d its i-th coordinate (X_t^i)_{t≥0} is a continuous semi-martingale with decomposition X^i = X_0^i + M^i + V^i, then for all f ∈ C²(R^d, R) and all t ≥ 0,

  f(X_t) = f(X_0) + Σ_{i=1}^d ∫_0^t ∂f/∂x_i (X_s) dX_s^i + (1/2) Σ_{i,j=1}^d ∫_0^t ∂²f/∂x_i∂x_j (X_s) d〈M^i, M^j〉_s,

where the first integral is the usual calculus term and the second the Itô stochastic correction; equivalently,

  f(X_t) = f(X_0) + Σ_{i=1}^d ∫_0^t ∂f/∂x_i (X_s) dM_s^i + Σ_{i=1}^d ∫_0^t ∂f/∂x_i (X_s) dV_s^i + (1/2) Σ_{i,j=1}^d ∫_0^t ∂²f/∂x_i∂x_j (X_s) d〈M^i, M^j〉_s,

where the first sum belongs to M₀^loc and the remaining terms form a finite variation process.

In particular f(X) = (f(X_t))_{t≥0} ∈ M^semi. For d = 1 the formula simply writes, for all t ≥ 0,

  f(X_t) = f(X_0) + ∫_0^t f′(X_s) dX_s + (1/2) ∫_0^t f″(X_s) d〈M〉_s
         = f(X_0) + ∫_0^t f′(X_s) dM_s + ∫_0^t f′(X_s) dV_s + (1/2) ∫_0^t f″(X_s) d〈M〉_s,

with, again, a usual calculus term and an Itô stochastic correction, the dM integral being in M₀^loc and the remaining terms a finite variation process.

The example above corresponds to d = 1, X₀ = 0, M = B, V = 0, and f = (•)².


In the case d = 1 and f(x) = x for all x ∈ R we get the very natural formula ∫_0^t dX_s = X_t − X_0.
The Itô formula remains valid for processes defined on a finite and deterministic time interval [0, T].
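A pathwise numerical sketch of the d = 1 formula for the illustrative choice f(x) = x³ and X = B (grid size and tolerance are illustrative assumptions): B_t³ = 3∫_0^t B_s² dB_s + (1/2)∫_0^t 6B_s ds.

```python
import numpy as np

rng = np.random.default_rng(5)

# Pathwise numerical check of the d = 1 Ito formula for f(x) = x^3 and X = B:
#   B_1^3 = 3 * int_0^1 B_s^2 dB_s + (1/2) * int_0^1 6 B_s ds.
N = 2**16
dt = 1.0 / N
dB = rng.normal(0.0, np.sqrt(dt), N)
B = np.concatenate(([0.0], np.cumsum(dB)))

stochastic_term = float(np.sum(3.0 * B[:-1] ** 2 * dB))   # left-point Ito sum
correction_term = float(np.sum(3.0 * B[:-1] * dt))        # (1/2) f''(x) = 3x
lhs = float(B[-1] ** 3)
rhs = stochastic_term + correction_term
```

Dropping the correction term would break the identity: it is exactly the second order “Itô stochastic correction” of the theorem.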

1. The second order term, involving the local martingale part of X, is typical of the Itô stochastic integral. It does not appear in the similar formula for the Stratonovich stochastic integral.
1 Named after Wolfgang Döblin or Vincent Doblin, French – German mathematician (1915 – 1940).


2. Alternatively, in a more condensed form, the formula writes, denoting 〈X〉 = 〈M〉,

  f(X_t) = f(X_0) + ∫_0^t ∇f(X_s) · dX_s + (1/2) ∫_0^t Hess(f)(X_s) · d〈X〉_s.

The formula can also be written alternatively using abstract differential notation, namely

  df(X_t) = ∇f(X_t) · dX_t + (1/2) Hess(f)(X_t) · d〈X〉_t.

3. When d = 1 and V = 0 then this gives, for all M ∈ M^loc (beware that we incorporate X₀ into M),

  f(M_t) = f(M_0) + ∫_0^t f′(M_s) dM_s + (1/2) ∫_0^t f″(M_s) d〈M〉_s.

Note that M₀ plays no role in the integrators dM and d〈M〉, however it plays a role in the integrands f′(M) and f″(M). Now if M is additionally bounded then we can take expectations, and this gives

  E(f(M_t)) = E(f(M_0)) + (1/2) E ∫_0^t f″(M_s) d〈M〉_s.

We could always localize M − M₀ with a stopping time in order to get it bounded (in particular in L²).

4. When d = 1 and M = 0 we recover the fundamental formula of calculus for the Lebesgue – Stieltjes integral, namely, for all continuous adapted finite variation processes V,

  f(V_t) = f(V_0) + ∫_0^t f′(V_s) dV_s.

5. When d = 1 and f = (·)² we obtain the formula

  ∫_0^t X_s dX_s = (X_t² − X_0² − 〈M〉_t)/2.

This generalizes the formula that we have already obtained for Brownian motion in Section 5.1. In particular when X₀ = 0 and X = M, in other words when V = 0, then this states

  2 ∫_0^• M_s dM_s = M² − 〈M〉.

Actually we already knew that if M ∈ M₀^loc then M² = (M² − 〈M〉) + 〈M〉 ∈ M₀^semi, and more generally, in the same spirit, if M ∈ M^loc then M² = M_0² + [(M − M_0)² − 〈M − M_0〉 + 2M_0(M − M_0)] + 〈M − M_0〉 ∈ M^semi, where the bracketed term belongs to M₀^loc.

6. For all M, N ∈ M^loc, the Itô formula with f(x₁, x₂) = x₁x₂ and X = (M, N) gives, for all t ≥ 0,

  M_t N_t = M_0 N_0 + ∫_0^t M_s dN_s + ∫_0^t N_s dM_s + 〈M, N〉_t,

which is an integration by parts formula. In the same spirit, for all M ∈ M^loc and for all adapted continuous finite variation processes V, taking f(x₁, x₂) = x₁x₂ and X = (M, V) gives, for all t ≥ 0,

  M_t V_t = M_0 V_0 + ∫_0^t M_s dV_s + ∫_0^t V_s dM_s.

7. When V = 0 and X = B is a BM issued from x (so that M = B − x), then 〈M^i, M^j〉_t = t 1_{i=j} and the Itô formula becomes

  f(B_t) = f(x) + ∫_0^t ∇f(B_s) · dB_s + (1/2) ∫_0^t Δf(B_s) ds.

In particular, when f = |·|², we obtain

  |B_t|² = |x|² + 2 ∫_0^t B_s · dB_s + td.

When d = 1 and B₀ = 0 we recover the formula (1/2)(B_t² − t) = ∫_0^t B_s dB_s that we have already found in Chapter 6. We say that (|B_t|)_{t≥0} is a Bessel process and that (|B_t|²)_{t≥0} is a squared Bessel process.
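The squared Bessel identity can be checked pathwise on a discretized 2-dimensional Brownian path (the starting point, grid size, and tolerance below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(11)

# Pathwise check of |B_t|^2 = |x|^2 + 2 * int_0^t B_s . dB_s + t*d for a
# 2-dimensional Brownian motion issued from x (a squared Bessel process).
d, N, t = 2, 2**15, 1.0
dt = t / N
x = np.array([1.0, -0.5])
dB = rng.normal(0.0, np.sqrt(dt), size=(N, d))
B = x + np.concatenate((np.zeros((1, d)), np.cumsum(dB, axis=0)))

stoch = float(np.sum(B[:-1] * dB))         # left-point sum of B . dB (both coords)
lhs = float(np.sum(B[-1] ** 2))            # |B_t|^2
rhs = float(np.sum(x ** 2)) + 2.0 * stoch + t * d
```

The residual lhs − rhs equals the fluctuation of the discrete quadratic variation around td, which vanishes as the mesh shrinks.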


8. Let us consider the case X = (X¹, …, X^{d+1}) = (B, V), where B is a Brownian motion in R^d, V_t = t for all t ≥ 0, and f(x, x_{d+1}) = exp(λ · x − (1/2)|λ|² x_{d+1}), with λ ∈ R^d fixed. The process N^λ = f(X) satisfies N₀^λ = 1. As explained in the proof of Theorem 7.2.1, by the Itô formula, N^λ is a semi-martingale that solves the following stochastic differential equation, which catches intuitively its exponential nature:

  N_t^λ = 1 + ∫_0^t N_s^λ d(λ · B_s).

An exponential semi-martingale of the same type appears also in Theorem 7.3.1 with more general ingredients. Such “exponential semi-martingales” are known as Doléans-Dade exponential martingales. When d = λ = 1 this process is also known as geometric Brownian motion.
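The martingale property of this exponential (constant expectation 1) can be seen by a quick Monte Carlo sketch with d = 1 (sample size and tolerance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check that N_t = exp(lam * B_t - lam^2 * t / 2) has mean 1,
# consistent with the martingale property of this exponential semi-martingale
# (d = 1, geometric Brownian motion).
lam, t, paths = 1.0, 1.0, 200000
B_t = rng.normal(0.0, np.sqrt(t), paths)
N_t = np.exp(lam * B_t - 0.5 * lam**2 * t)
mean_N = float(np.mean(N_t))
```

The sample mean stays close to 1 despite N_t being far from constant, precisely because the −λ²t/2 drift compensates the exponential of the Gaussian.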

9. For a time dependent function f(t, x), the formula with X̃ = (t, X) gives

  f(t, X_t) = f(0, X_0) + ∫_0^t ∂_t f(s, X_s) ds + ∫_0^t ∇_x f(s, X_s) · dX_s + (1/2) ∫_0^t Hess_x(f)(s, X_s) · d〈M〉_s.

Note that t ∈ R₊ ↦ t is a continuous finite variation process: it does not contribute to the last term.

10. The Itô formula extends naturally to a complex valued f : R^d → C. Let us examine a nice and important special case. Let X = X₀ + M + V be a continuous semi-martingale and let A be a finite variation process. For all λ ∈ R, by the Itô formula for f(x, y) = e^{iλx + (λ²/2)y} and (X_t, A_t)_{t≥0},

  N_t^λ = e^{iλX_t + (λ²/2)A_t} = e^{iλX_0} + iλ ∫_0^t N_s^λ dX_s + (λ²/2) ∫_0^t N_s^λ dA_s − (λ²/2) ∫_0^t N_s^λ d〈M〉_s.

Now if additionally X₀ = 0 and A = 〈M〉 then we get the stochastic differential equation

  N_t^λ = 1 + iλ ∫_0^t N_s^λ dX_s.

This semi-martingale is used in the proof of the Lévy characterization of Brownian motion in Theorem 7.2.1. The case N^{−iλ} is a special case of the Doléans-Dade exponential of Theorem 7.3.1.

The Itô formula involves 〈M , N 〉 with M , N continuous local martingales. Note that we have defined
〈M , N 〉 only when both M and N are continuous local martingales. This coincides with the quadratic vari-
ation [M , N ]. Note also that we have defined (when it exists) the quadratic variation [M , N ] for arbitrary
processes. In particular [M , N ] = 0 if M is a continuous process and N is a finite variation process.

Proof. We suppose first that on some event Ω′, the random variables and processes X₀, M, and V are bounded, in the sense that for some deterministic and finite constant C, almost surely

  sup_{t≥0, 1≤i,j≤d} (|X₀| + |M_t^i| + |V_t^i| + |〈M^i, M^j〉_t|) ≤ C.

Under this assumption, we can assume without loss of generality that f is compactly supported. Next, note also that under this assumption, for all 1 ≤ i ≤ d, since M^i is a bounded continuous local martingale, it is, by localization and dominated convergence, a bounded continuous martingale.

A Taylor formula for f gives, for all x, y ∈ R^d,

  f(y) − f(x) = 〈∇f(x), y − x〉 + (1/2) 〈Hess(f)(x)(y − x), y − x〉 + r(x, y)|y − x|²
            = Σ_i ∂f/∂x_i (x)(y_i − x_i) + (1/2) Σ_{i,j} ∂²f/∂x_i∂x_j (x)(y_i − x_i)(y_j − x_j) + r(x, y)|y − x|².

Since f is C² with compact support, by the Heine theorem, x ↦ f″(x) = Hess(f)(x) = (∂²f/∂x_i∂x_j)_{1≤i,j≤d} is uniformly continuous, and therefore there exists a bounded continuous non-decreasing function g : R₊ → R such that lim_{u→0} g(u) = 0 and |r(x, y)| ≤ g(|x − y|) for all x, y ∈ R^d.


Now we fix t > 0 and we consider a sequence (δ_n)_{n≥0} of subdivisions of [0, t],

  δ_n : 0 = t_0^{(n)} < ⋯ < t_{m_n}^{(n)} = t,  m_n ≥ 1,

such that lim_{n→∞} max_{0≤k≤m_n−1} |t_{k+1}^{(n)} − t_k^{(n)}| = 0. To simplify, we drop from now on the superscript (n). We denote ΔY_k = Y_{t_{k+1}} − Y_{t_k} for all k and all processes Y. A telescopic summation and the Taylor formula give

  f(X_t) − f(X_0) = Σ_k (f(X_{t_{k+1}}) − f(X_{t_k}))
    = Σ_k 〈∇f(X_{t_k}), ΔX_k〉 + (1/2) Σ_k 〈Hess(f)(X_{t_k})ΔX_k, ΔX_k〉 + Σ_k r(X_{t_k}, X_{t_{k+1}})|ΔX_k|²
    =: S₁ + S₂ + S₃.

For the term S₁, we have, using Theorem 6.3.5,

  S₁ = Σ_k 〈∇f(X_{t_k}), ΔV_k〉 + Σ_k 〈∇f(X_{t_k}), ΔM_k〉 →^P ∫_0^t ∇f(X_s) · dV_s + ∫_0^t ∇f(X_s) · dM_s = Σ_{i=1}^d ∫_0^t ∂f/∂x_i (X_s) dX_s^i.

For the term S₂, we have

  S₂ = (1/2) Σ_k 〈Hess(f)(X_{t_k})ΔX_k, ΔX_k〉
     = (1/2) Σ_k Σ_{i,j} ∂²f/∂x_i∂x_j (X_{t_k}) ΔX_k^i ΔX_k^j
     = (1/2) Σ_k Σ_{i,j} ∂²f/∂x_i∂x_j (X_{t_k}) ΔM_k^i ΔM_k^j + Σ_k Σ_{i,j} ∂²f/∂x_i∂x_j (X_{t_k}) ΔM_k^i ΔV_k^j + (1/2) Σ_k Σ_{i,j} ∂²f/∂x_i∂x_j (X_{t_k}) ΔV_k^i ΔV_k^j
     =: S₂′ + S₂″ + S₂‴.

Now since V^j has finite variation and since M^i is continuous, we have

  2|S₂″| ≤ max_{k,i} |ΔM_k^i| × Σ_k Σ_{i,j} |∂²f/∂x_i∂x_j (X_{t_k})| |ΔV_k^j| →^{a.s.} 0 × Σ_{i,j} ∫_0^t |∂²f/∂x_i∂x_j (X_s)| d|V^j|_s = 0  as n → ∞.

Similarly, using this time the continuity of V^i and the finite variation of V^j, we get

  2|S₂‴| ≤ max_{k,i} |ΔV_k^i| × Σ_k Σ_{i,j} |∂²f/∂x_i∂x_j (X_{t_k})| |ΔV_k^j| →^{a.s.} 0 × Σ_{i,j} ∫_0^t |∂²f/∂x_i∂x_j (X_s)| d|V^j|_s = 0  as n → ∞.

For S₂′, using the notation M^{i,j} for “M^i, M^j” and using the formulas of Lemma 7.1.2 in (∗) and (∗∗),

  E((Σ_k ∂²f/∂x_i∂x_j (X_{t_k})(ΔM_k^i ΔM_k^j − Δ〈M^{i,j}〉_k))²)
    =^∗ E(Σ_k (∂²f/∂x_i∂x_j (X_{t_k}))² (ΔM_k^i ΔM_k^j − Δ〈M^{i,j}〉_k)²)
    ≤ ‖Hess(f)‖_∞² E(Σ_k (ΔM_k^i ΔM_k^j − Δ〈M^{i,j}〉_k)²)
    =^{∗∗} ‖Hess(f)‖_∞² E((Σ_k ΔM_k^i ΔM_k^j − 〈M^{i,j}〉_t)²).

We have, for all i, j, using the fact that M is a bounded continuous martingale,

  Σ_k ΔM_k^i ΔM_k^j − 〈M^{i,j}〉_t →^{L²} 0  as n → ∞.

Now, by Theorem 4.1.4 for 〈M^i, M^j〉 = [M^i, M^j] for all i, j, the fact that f is C², and Theorem 1.7.3,

  P-lim_{n→∞} S₂′ = P-lim_{n→∞} (1/2) Σ_{i,j=1}^d Σ_k ∂²f/∂x_i∂x_j (X_{t_k}) Δ〈M^{i,j}〉_k = (1/2) Σ_{i,j=1}^d ∫_0^t ∂²f/∂x_i∂x_j (X_s) d〈M^i, M^j〉_s.

Regarding S₃, we have, using the monotonicity of g and Theorem 4.1.4,

  Σ_k |r(X_{t_k}, X_{t_{k+1}})||ΔX_k|² ≤ 2 g(max_k |ΔX_k|) Σ_{i=1}^d (Σ_k (ΔM_k^i)² + Σ_k (ΔV_k^i)²),

where g(max_k |ΔX_k|) →^P 0, Σ_k (ΔM_k^i)² →^P 〈M^i〉_t, and Σ_k (ΔV_k^i)² →^P 0 (the quadratic variation of the finite variation process V^i vanishes), as n → ∞.

This achieves the proof under the assumption of boundedness of X₀, M, and V on an event Ω′.
To prove the general case, we consider the sequence (T_k)_{k≥0} of stopping times defined for all k ≥ 0 by

  T_k = inf{t ≥ 0 : |X₀| + Σ_{i,j=1}^d (|M_t^i| + |V_t^i| + |〈M^i, M^j〉_t|) ≥ k}.

Then T_k ↗ +∞ a.s. as k → ∞, and, by arguing as in Remark 4.2.3, the stopped processes M^{T_k}, V^{T_k}, and 〈M^i, M^j〉^{T_k} are bounded, with M^{T_k} a bounded continuous martingale. Using the first part on the event Ω′ = Ω′_k = {|X₀| ≤ k}, this gives, on Ω′_k, thanks to the properties of (stochastic) integrals with respect to stopping times,

  f(X_{t∧T_k}) = f(X_0) + Σ_{i=1}^d ∫_0^{t∧T_k} ∂f/∂x_i (X_s) dX_s^i + (1/2) Σ_{i,j=1}^d ∫_0^{t∧T_k} ∂²f/∂x_i∂x_j (X_s) d〈M^i, M^j〉_s.

Now both sides converge in probability as k → ∞ to the desired formula. Indeed we get the desired formula on the event Ω′_k ∩ {T_k ≥ t}, and P(Ω′_k ∩ {T_k ≥ t}) → 1 as k → ∞. ■

It could be possible to prove Theorem 7.1.1 without truncation by using dominated convergence for stochastic integrals (Theorem 6.3.4), but the second order term would still remain the difficult part, see for instance [31]. Another classical proof of Theorem 7.1.1 consists in using the linearity of the statement with respect to f and the functional monotone class theorem machinery, see for instance [4]. This approach requires a stability by product, which amounts to proving separately, as a preparatory result, the integration by parts formula. Such a proof may look short at first sight but remains a bit magical and artificial. The proof that we give, based on a Taylor formula, is maybe more intuitive and constructive.

Lemma 7.1.2. Quadratic formulas with increments of martingales.

Let M, N be bounded martingales issued from the origin and let Z be a bounded adapted process. Let t > 0 and let δ : 0 = t₀ < ⋯ < t_n = t be a subdivision or partition of the interval [0, t]. Then, denoting ΔX_k = X_{t_{k+1}} − X_{t_k} and Z_k = Z_{t_k}, we have

  E((Σ_k Z_k (ΔM_k ΔN_k − Δ〈M, N〉_k))²) = E(Σ_k Z_k² (ΔM_k ΔN_k − Δ〈M, N〉_k)²),

and in particular, reading this formula from right to left when Z = 1 gives

  E(Σ_k (ΔM_k ΔN_k − Δ〈M, N〉_k)²) = E((Σ_k ΔM_k ΔN_k − 〈M, N〉_t)²).

Proof. The intuitive idea is to rely on conditional orthogonality properties of increments of square integrable martingales. For all k, we denote for simplicity T_k = Z_k(ΔM_k ΔN_k − Δ〈M, N〉_k). By using the martingale property for the martingales M and N in (∗) and for the martingale MN − 〈M, N〉 in (∗∗), we get

  E(T_k | F_{t_k}) = Z_k E(M_{t_{k+1}} N_{t_{k+1}} + M_{t_k} N_{t_k} − M_{t_{k+1}} N_{t_k} − M_{t_k} N_{t_{k+1}} − Δ〈M, N〉_k | F_{t_k})
    =^∗ Z_k E(M_{t_{k+1}} N_{t_{k+1}} − M_{t_k} N_{t_k} − Δ〈M, N〉_k | F_{t_k})
    = Z_k E(Δ(MN)_k − Δ〈M, N〉_k | F_{t_k})
    =^{∗∗} Z_k E(Δ(MN − 〈M, N〉)_k | F_{t_k}) = 0.

For all k′ < k, T_{k′} is F_{t_k}-measurable and E(T_{k′} T_k) = E(T_{k′} E(T_k | F_{t_k})) = 0. Thus E((Σ_k T_k)²) = E(Σ_k T_k²). ■
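This orthogonality can be seen numerically on the simplest case M = N = B and Z = 1 (grid and sample sizes are illustrative): both sides of the Z = 1 identity then equal n·Var((ΔB)²) = 2dt.

```python
import numpy as np

rng = np.random.default_rng(9)

# Monte Carlo check of Lemma 7.1.2 with M = N = B, Z = 1, on [0, 1]:
#   E[(sum_k ((dB_k)^2 - dt))^2] = E[sum_k ((dB_k)^2 - dt)^2] = 2*dt,
# the cross terms vanishing by conditional orthogonality of the increments.
n, paths = 50, 100000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
T = dB**2 - dt                          # T_k = (dB_k)^2 - d<B>_k, per path
lhs = float(np.mean(T.sum(axis=1) ** 2))
rhs = float(np.mean((T**2).sum(axis=1)))
exact = 2.0 * dt                        # n * Var((dB)^2) = 2 * n * dt^2 = 2 * dt
```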


Remark 7.1.3. Extension to discontinuous semi-martingales.

Let X be a real semi-martingale, right continuous with left limits (càdlàg). A jump occurs at time t when X_t − X_{t−} ≠ 0. It can be proved that the quadratic variation of the càdlàg local martingale part M of X is well defined and admits a decomposition of the form

  [M, M]_t = Σ_{s≤t} (M_s − M_{s−})² + [M, M]_t^c,

where the sum runs over all jumps up to time t while the superscript “c” stands for “continuous”. The Doléans-Dade – Meyer generalized Itô formula states then that for all f ∈ C²(R, R) and all t ≥ 0,

  f(X_t) = f(X_0) + ∫_0^t f′(X_{s−}) dX_s + Σ_{0<s≤t} (f(X_s) − f(X_{s−}) − f′(X_{s−})(X_s − X_{s−})) + (1/2) ∫_0^t f″(X_{s−}) d[M, M]_s^c.

An accessible presentation of stochastic calculus for jump processes can be found in [41].

Remark 7.1.4. Stratonovich stochastic integral and Itô formula.

If M and N are continuous semi-martingales, we define the Stratonovich stochastic integral by

  ∫_0^t M_s ∘ dN_s = ∫_0^t M_s dN_s + (1/2) 〈M, N〉_t.

The Itô formula is then simpler and resembles the fundamental rule of calculus:

  f(X_t) = f(X_0) + Σ_{i=1}^d ∫_0^t ∂f/∂x_i (X_s) ∘ dX_s^i,

for f ∈ C²(R^d, R). For more information and applications, see for instance [4, Chapter 6].
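The difference between the two integrals is visible numerically (grid size is an illustrative choice): for ∫ B dB on [0, 1], the left-point sums target the Itô value (B₁² − 1)/2, while trapezoidal sums target the Stratonovich value B₁²/2, the gap being (1/2)〈B〉₁ = 1/2.

```python
import numpy as np

rng = np.random.default_rng(13)

# Left-point (Ito) versus trapezoidal (Stratonovich-type) sums for int B dB:
#   Ito:          sum_k B_{t_k} dB_k                     -> (B_1^2 - 1) / 2
#   Stratonovich: sum_k (B_{t_k} + B_{t_{k+1}})/2 * dB_k ->  B_1^2 / 2
N = 2**16
dB = rng.normal(0.0, np.sqrt(1.0 / N), N)
B = np.concatenate(([0.0], np.cumsum(dB)))

ito = float(np.sum(B[:-1] * dB))
strat = float(np.sum(0.5 * (B[:-1] + B[1:]) * dB))
```

The trapezoidal sum telescopes exactly to B₁²/2 on every grid, whereas the Itô sum carries the −〈B〉/2 correction.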

7.2 Lévy characterization of Brownian motion

Theorem 7.2.1. Lévy.

Let X = (X_t)_{t≥0} be a d-dimensional adapted process such that:

• for all 1 ≤ k ≤ d the k-th coordinate (X_t^k)_{t≥0} is a continuous local martingale;

• for all 1 ≤ j, k ≤ d we have (〈X^j, X^k〉_t)_{t≥0} = (t 1_{j=k})_{t≥0}.

Then X is a Brownian motion (with respect to the same filtration).

In practice, when d = 1, this is often used after showing that (X_t)_{t≥0} and (X_t² − t)_{t≥0} are local martingales.

Proof. By replacing X by X − X₀ we can assume that X₀ = 0. By Theorem 3.1.3, in order to show that X is a Brownian motion, it suffices to show that for all λ ∈ R^d, the process N^λ = (exp(iλ · X_t + (1/2)|λ|²t))_{t≥0} is a martingale. Now N^λ = f(X, V) where f(x, x_{d+1}) = exp(iλ · x + (1/2)|λ|² x_{d+1}) and V_t = t for all t ≥ 0. The Itô formula² (Theorem 7.1.1) gives that N^λ is a continuous semi-martingale, and the computation below shows that it is a local martingale; since it is bounded on every [0, t], it remains to use Lemma 7.2.2. ■

Lemma 7.2.2. Martingale criterion.

Let (M_t)_{t≥0} be a continuous local martingale such that M₀ ∈ L¹ and such that for all t ≥ 0, there exists a finite constant C_t such that sup_{s∈[0,t]} |M_s| ≤ C_t. Then M is a martingale.

Note that we can construct continuous local martingales which are bounded in L2 but not martingales!
² We can alternatively use the Itô formula for the time dependent f(t, x) = exp(iλ · x + (1/2)|λ|²t) and the semi-martingale X.


Proof. We can assume that M₀ = 0 by considering M − M₀. Next, we know that M is localized by (T_n)_{n≥0} where T_n = inf{t ≥ 0 : |M_t| ≥ n}. Now, for all t ≥ 0, we take n > C_t, hence T_n > t and (M_s)_{s∈[0,t]} = (M_{s∧T_n})_{s∈[0,t]}, and since M^{T_n} is a martingale, we get that (M_s)_{s∈[0,t]} is a martingale for all t ≥ 0, hence M is a martingale. ■

Let us write the Itô formula in the proof of Theorem 7.2.1. It is crucial that N_t^λ = exp(iλ · X_t + (1/2)|λ|²V_t) for a finite variation process V such that for all 1 ≤ j, k ≤ d, 〈X^j, X^k〉 = V 1_{j=k}. We have

  N_t^λ = 1 + Σ_{k=1}^d iλ_k ∫_0^t N_s^λ dX_s^k + (1/2)|λ|² ∫_0^t N_s^λ dV_s + (i²/2) Σ_{j,k=1}^d λ_j λ_k ∫_0^t N_s^λ 1_{j=k} dV_s
       = 1 + i ∫_0^t N_s^λ d(λ · X_s) + (1/2)|λ|² ∫_0^t N_s^λ dV_s − (1/2)|λ|² ∫_0^t N_s^λ dV_s
       = 1 + i ∫_0^t N_s^λ d(λ · X_s),

the last two dV terms cancelling. This shows that N^λ is a semi-martingale solving the stochastic differential equation

  N^λ = 1 + i ∫_0^• N_s^λ λ · dX_s.

The same idea, with the Laplace transform instead of the Fourier transform, is at the heart of the concept of Doléans-Dade exponential of Theorem 7.3.1. Note that N^{−iλ} is a multivariate Doléans-Dade exponential.

7.3 Doléans-Dade exponential

The following theorem generalizes what we already know for the Laplace transform of Brownian motion.

Theorem 7.3.1. Doléans-Dade³ exponential.

Let M = (M_t)_{t≥0} and V = (V_t)_{t≥0} be continuous adapted real processes issued from the origin, with V non-decreasing. For all λ ∈ R let us define the process

  X^λ = (X_t^λ)_{t≥0} = (e^{λM_t − (λ²/2)V_t})_{t≥0}.

Then the following properties are equivalent.

1. M is a local martingale and 〈M〉 = V;

2. X^λ is a local martingale for all λ ∈ R.

Moreover, in this case,

3. X^λ solves the stochastic differential equation X₀^λ = 1 and X_t^λ = 1 + λ ∫_0^t X_s^λ dM_s, t ≥ 0;

4. X^λ is a super-martingale, and a martingale if and only if E X_t^λ = 1 for all t ≥ 0;

5. if X^λ is a martingale and E e^{λM_t} < ∞ for all λ ∈ R and all t ≥ 0 then M is a martingale.

³ Named after Catherine Doléans-Dade (1942 – 2004), French American mathematician.

Proof. Suppose that M is a local martingale and that 〈M〉 = V. For all λ ∈ R, by the Itô formula,

  X^λ = 1 + λ ∫_0^• X_s^λ dM_s.

Thus X^λ is a non-negative local martingale. Conversely, suppose first that for all λ ∈ R, X^λ is a martingale and that M and V are bounded. Then for all 0 ≤ s < t and all A ∈ F_s,

  E(1_A e^{λM_t − (λ²/2)V_t}) = E(1_A X_t^λ) = E(1_A X_s^λ) = E(1_A e^{λM_s − (λ²/2)V_s}).

Taking the derivative with respect to λ, which is allowed here by dominated convergence since X^λ(M − λV) is dominated by a constant thanks to the fact that M and V are bounded, gives

  E(1_A e^{λM_t − (λ²/2)V_t}(M_t − λV_t)) = E(1_A e^{λM_s − (λ²/2)V_s}(M_s − λV_s)),

which shows by taking λ = 0 that M is a martingale. Moreover, taking again the derivative with respect to λ, which is allowed here by dominated convergence since X^λ((M − λV)² − V) is dominated by a constant,

  E(1_A X_t^λ((M_t − λV_t)² − V_t)) = E(1_A X_s^λ((M_s − λV_s)² − V_s)),

which shows by taking λ = 0 that M² − V is a martingale. In particular we get 〈M〉 = V. Back to the general case, if for all λ ∈ R, X^λ is a local martingale, then we proceed by localization via T_n = inf{t ≥ 0 : |M_t| + V_t ≥ n}, for which T_n ↗ +∞ almost surely as n → ∞ and, for all n ≥ 0, (X^λ)^{T_n} is a bounded martingale and M^{T_n} and V^{T_n} are bounded. We use then what we did for the martingale case to get that M^{T_n} is a martingale and 〈M^{T_n}〉 = V^{T_n}, for all n ≥ 0, which implies that M is a local martingale and 〈M〉 = V.

Let us prove the last properties stated when M is a continuous local martingale and V = 〈M〉.

3. We have already seen this at the beginning of the proof.

4. Since X^λ is a local martingale, there exist stopping times (T_n)_{n≥0} such that T_n ↗ +∞ almost surely and such that (X_{t∧T_n}^λ)_{t≥0} is a martingale for all n ≥ 0. Then for all 0 ≤ s ≤ t, by the Fatou lemma,

  X_s^λ = lim_{n→∞} X_{s∧T_n}^λ = lim_{n→∞} E(X_{t∧T_n}^λ | F_s) ≥ E(lim_{n→∞} X_{t∧T_n}^λ | F_s) = E(X_t^λ | F_s).

In particular E X_t^λ ≤ E X₀^λ = 1 and X^λ is a martingale iff⁴ E X_t^λ = 1 for all t ≥ 0.

5. We have already seen this at the beginning of the proof. ■

For a continuous local martingale M issued from the origin, there are criteria to ensure that the local martingale e^{M − (1/2)〈M〉} (Doléans-Dade exponential) is a martingale or even a u.i. martingale, namely:

• Domination: a local martingale dominated by an integrable random variable is a u.i. martingale.

• Bracket: a local martingale with integrable bracket is a martingale (see Lemma 4.2.7). Note that

  E〈X^λ〉_t = λ² E〈∫_0^• X_s^λ dM_s, ∫_0^• X_s^λ dM_s〉_t = λ² E ∫_0^t e^{2λM_s − λ²〈M〉_s} d〈M〉_s ≤ λ² E ∫_0^t e^{2λM_s} d〈M〉_s.

• Novikov and Kazamaki conditions or criteria, which are specific to Doléans-Dade exponentials.

Theorem 7.3.2. Novikov^a and Kazamaki^b conditions.

If M = (M_t)_{t≥0} is a continuous local martingale issued from the origin then 1. ⇒ 2. ⇒ 3. where

1. Novikov condition: E e^{(1/2)〈M〉_∞} < ∞;

2. Kazamaki condition: M is a u.i. martingale and E e^{(1/2)M_∞} < ∞;

3. e^{M − (1/2)〈M〉} is a u.i. martingale.

a Named after Alexander Novikov, Russian-Australian mathematician.
b Named after Norihiko Kazamaki, Japanese mathematician.

⁴ If A ≤ B then A = B iff E(A) = E(B). In particular, a super or sub martingale is a martingale iff its expectation is constant.

This can be skipped at first reading.

Proof.

• 1. ⇒ 2. We have E〈M〉_∞ ≤ E e^{(1/2)〈M〉_∞} < ∞, hence, by Lemma 4.2.7, M is a continuous martingale bounded in L² and thus u.i.; in particular M_∞ exists. By proceeding as in the proof of Theorem 7.3.1, e^{M − (1/2)〈M〉} is a super-martingale issued from 1, and by the Fatou lemma E e^{M_t − (1/2)〈M〉_t} ≤ 1 for all t ∈ [0, +∞]. Finally, by the Cauchy – Schwarz inequality,

  (E e^{(1/2)M_∞})² ≤ E e^{M_∞ − (1/2)〈M〉_∞} E e^{(1/2)〈M〉_∞} ≤ E e^{(1/2)〈M〉_∞} < ∞.

• 2. ⇒ 3. Since M is a u.i. martingale, by Corollary 4.4.6, for an arbitrary stopping time T we have E(M_∞ | F_T) = M_T, and by the Jensen inequality

  e^{(1/2)M_T} ≤ E(e^{(1/2)M_∞} | F_T).

But since E e^{(1/2)M_∞} < ∞, it follows that the family {E(e^{(1/2)M_∞} | F_T) : T stopping time} is u.i., and by the inequality above, the family {e^{(1/2)M_T} : T stopping time} is u.i.

For all a ∈ (0, 1) let us define M^{(a)} = aM/(a + 1). We have

  e^{aM − (1/2)〈aM〉} = (e^{M − (1/2)〈M〉})^{a²} (e^{M^{(a)}})^{1−a²},

and then, for all stopping times T, by the Hölder inequality,

  E(e^{aM_T − (1/2)〈aM〉_T}) ≤ (E e^{M_T − (1/2)〈M〉_T})^{a²} (E e^{M_T^{(a)}})^{1−a²} ≤^⋆ (E e^{M_T^{(a)}})^{1−a²} ≤^{⋆⋆} (E e^{(1/2)M_T})^{2a(1−a)}.

We have used for ⋆ the fact that E(e^{M_T − (1/2)〈M〉_T}) ≤ 1, which comes from Theorem 2.5.6 with S = 0 for the non-negative super-martingale e^{M − (1/2)〈M〉} with expectation ≤ 1. We have used for ⋆⋆ the Jensen inequality for the concave function u ∈ R₊ ↦ u^{2a/(1+a)}, thanks to 2a/(1 + a) ∈ (0, 1).

It follows in particular that the family {e^{aM_T − (1/2)〈aM〉_T} : T stopping time} is u.i. Now, if (T_n)_n is a localizing sequence for the local martingale e^{aM − (1/2)〈aM〉}, then for all 0 ≤ s ≤ t,

  E(e^{aM_{t∧T_n} − (1/2)〈aM〉_{t∧T_n}} | F_s) = e^{aM_{s∧T_n} − (1/2)〈aM〉_{s∧T_n}},

and we can pass to the limit by the u.i. property for the family indexed by stopping times, hence e^{aM − (1/2)〈aM〉} is a u.i. martingale. Finally, using again the Jensen inequality in the last step,

  1 = E(e^{aM_∞ − (1/2)〈aM〉_∞}) ≤ E(e^{M_∞ − (1/2)〈M〉_∞})^{a²} E(e^{M_∞^{(a)}})^{1−a²} ≤ E(e^{M_∞ − (1/2)〈M〉_∞})^{a²} (E e^{(1/2)M_∞})^{2a(1−a)}.

Now a → 1 gives E(e^{M_∞ − (1/2)〈M〉_∞}) ≥ 1, hence E(e^{M_∞ − (1/2)〈M〉_∞}) = 1, and thus e^{M − (1/2)〈M〉} is a u.i. martingale. ■

7.4 Dubins – Schwarz theorem

The following theorem says that a continuous local martingale is a time changed Brownian motion.


Theorem 7.4.1. Dubins^a – Schwarz^b or Dambis^c theorem.

Let M be a continuous local martingale with M₀ = 0 and 〈M〉_∞ = ∞ almost surely. For all t ≥ 0, let

  T_t = inf{s ≥ 0 : 〈M〉_s > t} = 〈M〉_t^{−1}

be the generalized inverse of the non-decreasing process 〈M〉 issued from the origin. Then

1. B = (M_{T_t})_{t≥0} = (M_{〈M〉_t^{−1}})_{t≥0} is a Brownian motion with respect to the filtration (F_{T_t})_{t≥0};

2. (B_{〈M〉_t})_{t≥0} = (M_t)_{t≥0}.

a Named after Lester Dubins (1920 – 2010), American mathematician.
b Named after Gideon E. Schwarz (1933 – 2007), Israeli mathematician and statistician.
c Named after K.È. Dambis, Russian mathematician who apparently published a single article, in Russian, in 1965.
These equalities are as random variables taking values in C (R+ , R), in other words “a.s. for all t ≥ 0”.
For instance, if M = αW where α > 0 is a constant and W is a Brownian motion issued from the origin, then 〈M〉_t = α²t and T_t = α^{−2}t, and B = (M_{T_t})_{t≥0} = (αW_{α^{−2}t})_{t≥0} is a BM with respect to (F_{α^{−2}t})_{t≥0}.
This theorem is as dangerous as the Skorokhod embedding theorem: it does not state that there exists a Brownian motion B, with respect to the filtration for which M is a local martingale, such that B_{〈M〉} = M.
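The example M = αW can be checked numerically (α, grid, and tolerances are illustrative choices): after the time change T_t = t/α², the rescaled process behaves like a standard Brownian motion, as seen on its quadratic variation and increment variance.

```python
import numpy as np

rng = np.random.default_rng(21)

# Sanity check of the example M = alpha * W with alpha = 2: <M>_t = 4t,
# T_t = t/4, and B_t = M_{T_t} = 2 W_{t/4} should be a standard BM on [0, 4].
N = 2**16
dW = rng.normal(0.0, np.sqrt(1.0 / N), N)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W on [0, 1], grid j/N
M = 2.0 * W

# B sampled on the grid t_j = 4j/N of [0, 4]: B_{t_j} = M_{t_j / 4} = M[j].
B = M
qv_B = float(np.sum(np.diff(B) ** 2))        # should approximate <B>_4 = 4

# Empirical variance of increments of B over time steps of length 1/64.
incs = np.diff(B[::256])
var_emp = float(np.var(incs))                # should approximate 1/64
```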

Proof. Beware that since 〈M〉 can be flat on an interval, the map t ↦ T_t can be discontinuous. Regarding the process B = (M_{T_t})_{t≥0}, Lemma 7.4.2 states that the processes M and 〈M〉 are constant on the same intervals.

Lemma 7.4.2. Simultaneous flatness for M and 〈M 〉.

Let M be a continuous local martingale. Then the processes M and 〈M〉 are constant on the same intervals, in the sense that almost surely, for all 0 ≤ a < b,

  ∀t ∈ [a, b], M_t = M_a  if and only if  〈M〉_b = 〈M〉_a.

Let us postpone the proof of Lemma 7.4.2.


For all t ≥ 0, the random variable T_t is a stopping time with respect to the filtration (F_u)_{u≥0}, and s ↦ T_s
is non-decreasing. It follows that for all 0 ≤ s ≤ t, we have F_{T_s} ⊂ F_{T_t}, and thus (F_{T_u})_{u≥0} is a filtration.
Moreover for all t ≥ 0, the random variable T_t is a stopping time for the filtration (F_{T_u})_{u≥0}. We have T_t < ∞
for all t ≥ 0 on the almost sure event {〈M〉_∞ = ∞}. By construction, the process (T_t)_{t≥0} is right continuous,
non-decreasing (and thus with left limits), and adapted with respect to the filtration (F_{T_t})_{t≥0}. Since M is
continuous, the process B = (M_{T_t})_{t≥0} is right continuous with left limits. Moreover, for all t ≥ 0,

B_{t^-} = lim_{s→t^-} B_s = M_{T_{t^-}}.

Lemma 7.4.2 implies that almost surely B_{t^-} = B_t for all t ≥ 0, in other words that B is continuous.
Let us show that B is a Brownian motion for (F_{T_t})_{t≥0}. For all n ≥ 0, the process M^{T_n} is a continuous local
martingale issued from the origin and 〈M^{T_n}〉_∞ = 〈M〉_{T_n} = n a.s. By Lemma 4.2.7, we get that for all n ≥ 0, the
processes M^{T_n} and (M^{T_n})² − 〈M^{T_n}〉 are u.i. martingales. Now, for all 0 ≤ s ≤ t ≤ n, by the Doob stopping
theorem for u.i. martingales (Corollary 4.4.6) and the martingale property, using T_s ≤ T_t ≤ T_n,

E(B_t | F_{T_s}) = E(M^{T_n}_{T_t} | F_{T_s}) = M^{T_n}_{T_s} = M_{T_n ∧ T_s} = B_s

and similarly, using additionally the property 〈M^{T_n}〉_{T_t} = 〈M〉_{T_n ∧ T_t} = 〈M〉_{T_t} = t,

E(B_t² − t | F_{T_s}) = E((M^{T_n}_{T_t})² − 〈M^{T_n}〉_{T_t} | F_{T_s}) = (M^{T_n}_{T_s})² − 〈M^{T_n}〉_{T_s} = B_s² − s.

Thus B and (B_t² − t)_{t≥0} are martingales with respect to the filtration (F_{T_t})_{t≥0}. It follows now from the Lévy
characterization (Theorem 7.2.1) that B is a Brownian motion with respect to the filtration (F_{T_t})_{t≥0}.
Let us show that M = B_{〈M〉}. By definition of B, a.s., for all t ≥ 0, B_{〈M〉_t} = M_{T_{〈M〉_t}}. Now t ≤ T_{〈M〉_t} and
〈M〉 takes the same value at t and at T_{〈M〉_t}, so Lemma 7.4.2 gives M_t = M_{T_{〈M〉_t}} for
all t ≥ 0 a.s. In other words, using the definition of B, this means that a.s., for all t ≥ 0, M_t = M_{T_{〈M〉_t}} = B_{〈M〉_t}.


Proof of Lemma 7.4.2. Since M and 〈M〉 are continuous, it suffices to show that for all 0 ≤ a ≤ b, a.s.

{∀t ∈ [a, b] : M_t = M_a} = {〈M〉_b = 〈M〉_a}.

The inclusion ⊂ comes from the approximation of the quadratic variation 〈M〉 = [M]. Let us prove the
converse. To this end, we consider the continuous local martingale (N_t)_{t≥0} = (M_t − M_{t∧a})_{t≥0}. We have

〈N〉 = 〈M〉 − 2〈M, M^a〉 + 〈M^a〉 = 〈M〉 − 2〈M〉^a + 〈M〉^a = 〈M〉 − 〈M〉^a.

For all ε > 0, we set the stopping time T_ε = inf{t ≥ 0 : 〈N〉_t > ε}. The continuous semi-martingale N^{T_ε} satisfies
N_0^{T_ε} = 0 and 〈N^{T_ε}〉_∞ = 〈N〉_{T_ε} ≤ ε. By Lemma 4.2.7, N^{T_ε} is a martingale bounded in L², and for all t ≥ 0,

E(N_{t∧T_ε}²) = E(〈N〉_{t∧T_ε}) ≤ ε.

Let us define the event A = {〈M〉_b = 〈M〉_a}. Then A ⊂ {T_ε ≥ b} and, for all t ∈ [a, b],

E(1_A N_t²) = E(1_A N_{t∧T_ε}²) ≤ E(N_{t∧T_ε}²) ≤ ε.

By sending ε to 0 we obtain E(1_A N_t²) = 0 and thus N_t = 0 almost surely on A. ■

7.5 Girsanov theorem for Itô integrals

Here is a generalization of the Cameron – Martin theorem (Theorem 3.8.2) to random shifts.

Theorem 7.5.1. Girsanov a.

a Named after Igor Vladimirovich Girsanov (1934 – 1967), Russian mathematician. Beware that the “G” in “Girsanov” is
not pronounced like in the English word “Girt” but rather like the “gh” in the Arabic word “Maghrib” as spelled in Arabic.

Let T > 0 be deterministic. Let B = (B_t)_{t∈[0,T]} be a d-dimensional BM with B_0 = 0. Let ϕ = (ϕ_t)_{t∈[0,T]}
be d-dimensional progressive and uniformly locally bounded in the sense that there exists a deter-
ministic C < ∞ such that sup_{s∈[0,T]} |ϕ_s| ≤ C almost surely. Set h = ∫_0^• ϕ_s ds. Then:

1. The Doléans-Dade exponential N = (N_t)_{t∈[0,T]} defined for all t ∈ [0, T] by

N_t = exp( ∫_0^t ϕ_s·dB_s − (1/2) ∫_0^t |ϕ_s|² ds )

is a non-negative martingale and E(N_t) = 1 for all t ∈ [0, T].

2. On the changed probability space (Ω, F, Q) where Q is defined by dQ = N_T dP, the process

B − h = ( B_t − ∫_0^t ϕ_s ds )_{t∈[0,T]}

is an (F_t)_{0≤t≤T} Brownian motion.

3. If a continuous semi-martingale (X_t)_{t∈[0,T]} solves the stochastic differential equation

X = x + B + ∫_0^• b(s, X_s) ds

with b : R_+ × R^d → R^d measurable and bounded, then the law of (X_t)_{t∈[0,T]}, as a random variable
on the canonical space W = C([0, T], R^d), has a density with respect to the law of x + B given by

w ∈ W ↦ exp( ∫_0^T b(s, x + w_s)·dw_s − (1/2) ∫_0^T |b(s, x + w_s)|² ds ).

Finally, if N is u.i. then all this holds with [0, T], N_T, F_T replaced by R_+, N_∞, F_∞.


We recover Theorem 3.8.2 of Cameron – Martin when ϕ is deterministic.


Note that for all t ∈ [0, T], we have N_t ≥ 0 and E(N_t) = 1 and thus N_t is a density. Moreover, for any
bounded F_t measurable random variable Z, typically a bounded and measurable function of the trajecto-
ries up to time t of a continuous adapted process, the martingale property for N gives

E(Z N_T) = E(Z E(N_T | F_t)) = E(Z N_t).

In other words the density N_T satisfies a rule of compatibility when T varies, which is a sort of projection
property, and which is directly related to the martingale property of N.

Proof.

1. The process N = e^{M − (1/2)〈M〉} is a Doléans-Dade exponential where M is the local martingale

M_t = ∫_0^t ϕ_s·dB_s,  which satisfies  〈M〉_t = ∫_0^t |ϕ_s|² ds.

From Theorem 7.3.1, it follows that for all λ ∈ R, N^λ = exp(λM − (λ²/2)〈M〉) is a non-negative local
martingale, hence a super-martingale. In particular with λ = 2 we get that for all s ≥ 0 we
have E(e^{2(M_s − 〈M〉_s)}) ≤ E(e^{2(M_0 − 〈M〉_0)}) = 1. For λ = 1 we recover N. Let us show that N is a martingale.
As in the proof of Theorem 7.3.1, it suffices to show that the expectation of the angle bracket is finite,
by using the fact that N solves the SDE N_t = 1 + ∫_0^t N_s dM_s. Indeed, for all 0 ≤ t < T we have, denoting
C = sup_{s∈[0,T]} |ϕ_s|² (recall that ϕ is uniformly locally bounded),

E〈N〉_t = E〈N, N〉_t
       = E〈 ∫_0^• N_s dM_s , ∫_0^• N_s dM_s 〉_t
       = E ∫_0^t N_s² d〈M〉_s
       = E ∫_0^t e^{2M_s − 〈M〉_s} |ϕ_s|² ds
       ≤ C ∫_0^t E( e^{2M_s − 〈M〉_s} ) ds
       = C ∫_0^t E( e^{2M_s − 2〈M〉_s} e^{〈M〉_s} ) ds
       ≤ C e^{Ct} ∫_0^t E( e^{2M_s − 2〈M〉_s} ) ds
       ≤ C e^{Ct} t < ∞.

Therefore, N is a martingale thanks to Lemma 4.2.7.

2. In order to check that B − h is a Brownian motion under Q, we use Theorem 3.1.3 which reduces the
problem to showing that for all λ ∈ R^d and all fixed T ≥ 0, the process

( e^{λ·(B−h)_t − (|λ|²/2)t} )_{0≤t≤T}

is a martingale under Q. Indeed, for all 0 ≤ s < t ≤ T and A ∈ F_s,

E_Q( 1_A e^{λ·(B−h)_t − (|λ|²/2)t} ) = E( 1_A e^{λ·B_t − λ·∫_0^t ϕ_u du − (|λ|²/2)t} N_T )
                                   = E( 1_A e^{λ·B_t − λ·∫_0^t ϕ_u du − (|λ|²/2)t} N_t )   (⋆)
                                   = E( 1_A e^{∫_0^t (λ+ϕ_u)·dB_u − (1/2)∫_0^t |λ+ϕ_u|² du} )
                                   = E( 1_A e^{∫_0^s (λ+ϕ_u)·dB_u − (1/2)∫_0^s |λ+ϕ_u|² du} )   (⋆⋆)
                                   = E( 1_A e^{λ·B_s − λ·∫_0^s ϕ_u du − (|λ|²/2)s} N_s )
                                   = E( 1_A e^{λ·B_s − λ·∫_0^s ϕ_u du − (|λ|²/2)s} N_T )   (⋆)
                                   = E_Q( 1_A e^{λ·(B−h)_s − (|λ|²/2)s} ),

where we have used in ⋆ the fact that the process N is a martingale (under P) and in ⋆⋆ the fact that
the process N (with ϕ replaced by λ + ϕ) is a martingale (under P).

3. We give the proof when b is continuous and bounded. We can assume without loss of generality
that x = 0, otherwise we use X_t − x = B_t + ∫_0^t b(s, x + (X_s − x)) ds. We have ϕ_t = −b(t, X_t). Since the
shift h = ∫_0^• ϕ_s ds is random, the simple argument used in the proof of Corollary 3.8.3 (Cameron –
Martin) in order to get the density of the law of B + h with respect to the law of B is no longer available.
Nevertheless, for all bounded measurable Φ : W → R,

E(Φ(B − h)) = E(Φ(B − h) N_T^{-1} N_T).

Let us show now that N_T^{-1} is a function of X = B − h. We have B = X + h = X + ∫_0^• ϕ_s ds and thus

∫_0^T ϕ_s·dB_s = ∫_0^T ϕ_s·dX_s + ∫_0^T |ϕ_s|² ds,

hence

N_T^{-1} = exp( − ∫_0^T ϕ_s·dB_s + (1/2) ∫_0^T |ϕ_s|² ds )
        = exp( − ∫_0^T ϕ_s·dX_s − (1/2) ∫_0^T |ϕ_s|² ds )
        = exp( ∫_0^T b(s, X_s)·dX_s − (1/2) ∫_0^T |b(s, X_s)|² ds )
        = Ψ(X) = Ψ(B − h).

The function Ψ is bounded. We admit that it is measurable. Therefore, using the result of the previous item in ⋆,

E(Φ(B − h)) = E(Φ(B − h) N_T^{-1} N_T) = E(Φ(B − h) Ψ(B − h) N_T) = E_Q(Φ(B − h) Ψ(B − h)) =⋆ E(Φ(B) Ψ(B)).

Hence the density of X on the canonical space W with respect to the Wiener measure is Ψ.

Finally, if N is u.i. then by Corollary 4.4.5 there exists N_∞ ∈ L¹ such that lim_{t→∞} N_t = N_∞ a.s. and in L¹, and
N_t = E(N_∞ | F_t) for all t ≥ 0. This implies that (e^{λ·(B−h)_t − (|λ|²/2)t})_{t≥0} is a martingale for all λ ∈ R^d under the
probability measure Q on (Ω, F) absolutely continuous with respect to P and with density N_∞. ■

Coding in action 7.5.2. Usage in statistics.

Suppose that a phenomenon is modeled with an unknown function b : R^d → R^d, and the observation
by a process (X_t)_{t≥0} solution of the stochastic differential equation X_t = x + B_t + ∫_0^t b(X_s) ds where x is
known and where B is an unobserved BM modeling noise. We would like to estimate b from the ob-
servation of X, or test the hypothesis that b = 0. The Girsanov theorem (Theorem 7.5.1) provides the
likelihood of the observations! Of course life is more complicated because in practice, we observe X
only at a finite number of discrete times. This type of stochastic calculus in action belongs to the field
of statistical analysis of diffusion processes, see [22, 33, 34, 29]. Could you write a computer program
simulating approximate trajectories for a given b and approximating numerically their likelihood?
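As a tentative answer to the boxed question, here is a minimal sketch in dimension d = 1: an Euler scheme to simulate approximate trajectories, and a discretization of the Girsanov log-density log Ψ(X) = ∫_0^T b(X_s) dX_s − (1/2) ∫_0^T |b(X_s)|² ds along the observed path. The drift b(x) = −x and all numerical parameters are arbitrary illustrative choices, not prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_X(b, x0, T, n, rng):
    """Euler scheme for dX = b(X) dt + dB on [0, T] with n steps."""
    dt = T / n
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        X[k + 1] = X[k] + b(X[k]) * dt + rng.normal(0.0, np.sqrt(dt))
    return X

def log_likelihood(b, X, T):
    """Discretized log of the Girsanov density
    exp( int_0^T b(X_s) dX_s - (1/2) int_0^T |b(X_s)|^2 ds )."""
    n = len(X) - 1
    dt = T / n
    dX = np.diff(X)
    bX = b(X[:-1])  # integrands evaluated at the left endpoints (Ito sums)
    return np.sum(bX * dX) - 0.5 * np.sum(bX**2) * dt

b = lambda x: -x                                       # hypothetical drift
X = simulate_X(b, x0=0.0, T=1.0, n=1000, rng=rng)
ll_drift = log_likelihood(b, X, T=1.0)                 # log-likelihood of b
ll_null = log_likelihood(lambda x: 0.0 * x, X, T=1.0)  # b = 0 gives exactly 0
```

Comparing `ll_drift` with `ll_null` over many observed paths is the basis of a likelihood ratio test of the hypothesis b = 0.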

7.6 Sub-Gaussian tail bound and exponential square integrability for local martingales


Theorem 7.6.1. Sub-Gaussian tail bound and exponential square integrability.

Let M = (M_t)_{t≥0} be a continuous local martingale issued from the origin. Then for all t, K, r ≥ 0,

P( sup_{s∈[0,t]} |M_s| ≥ r and 〈M〉_t ≤ K ) ≤ 2 e^{−r²/(2K)},

and in particular, if 〈M〉_t ≤ Kt then

P( sup_{s∈[0,t]} |M_s| ≥ r ) ≤ 2 e^{−r²/(2Kt)}  and  E( e^{α sup_{s∈[0,t]} |M_s|²} ) < ∞ for all α < 1/(2Kt).

The condition on 〈M〉_t is a comparison to Brownian motion B for which 〈B〉_t = t.

Proof. Let us prove the first inequality. For all λ, t ≥ 0, by Remark 7.3.1, the process

X^λ = ( e^{λM_t − (λ²/2)〈M〉_t} )_{t≥0}

is a positive super-martingale issued from 1 and E(X_t^λ) ≤ 1 for all t, λ ≥ 0. Therefore, for all t, λ, r, K ≥ 0,

P( 〈M〉_t ≤ K, sup_{0≤s≤t} M_s ≥ r ) ≤ P( 〈M〉_t ≤ K, sup_{0≤s≤t} X_s^λ ≥ e^{λr − (λ²/2)K} )
                                   ≤ P( sup_{0≤s≤t} X_s^λ ≥ e^{λr − (λ²/2)K} )
                                   ≤ E(X_0^λ) e^{−λr + (λ²/2)K} = e^{−λr + (λ²/2)K}

where the last step comes from the maximal inequality (Theorem 2.5.9) for X^λ. Taking λ = r/K gives

P( 〈M〉_t ≤ K, sup_{0≤s≤t} M_s ≥ r ) ≤ e^{−r²/(2K)}.

The same reasoning provides (note by the way that 〈−M〉 = 〈M〉 obviously)

P( 〈M〉_t ≤ K, sup_{0≤s≤t} (−M_s) ≥ r ) ≤ e^{−r²/(2K)}.

The desired result follows now by the union bound, hence the factor 2 in the right hand side.
Finally, the exponential square integrability comes from the usual link between tail bound and integra-
bility, namely if X = sup_{s∈[0,t]} |M_s| and U(x) = e^{αx²} with α < 1/(2Kt), then, by the Fubini – Tonelli theorem,

E(U(X)) = 1 + E( ∫_0^X U′(x) dx ) = 1 + E( ∫_0^∞ 1_{x≤X} U′(x) dx ) = 1 + ∫_0^∞ U′(x) P(X ≥ x) dx ≤ 1 + ∫_0^∞ 2αx e^{αx²} 2e^{−x²/(2Kt)} dx < ∞.
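For M = B a standard Brownian motion we have 〈B〉_t = t, so the bound applies with K = 1 in the second form. A quick Monte Carlo sanity check, with arbitrary parameter choices, that the empirical tail of sup_{s∈[0,t]} |B_s| indeed sits below 2e^{−r²/(2t)} (the finite time grid slightly underestimates the supremum, which only helps the inequality):

```python
import numpy as np

rng = np.random.default_rng(2)
t, r, n_paths, n_steps = 1.0, 2.0, 20000, 500

# Simulate Brownian paths on [0, t] and record the running supremum of |B|.
dB = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
sup_abs = np.abs(np.cumsum(dB, axis=1)).max(axis=1)

p_emp = (sup_abs >= r).mean()        # empirical P(sup |B_s| >= r)
bound = 2 * np.exp(-r**2 / (2 * t))  # sub-Gaussian bound with K = 1
```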

7.7 Burkholder – Davis – Gundy inequalities

These inequalities allow one to control the moments of the supremum of a local martingale via the moments of its
angle bracket. This is useful for stochastic integrals, and in particular for stochastic differential equations.


Theorem 7.7.1. Burkholder a – Davis b – Gundy c inequalities.

a Named after Donald Burkholder (1927 – 2013), American mathematician.
b Named after Burgess Davis, American mathematician.
c Named after Richard F. Gundy, American mathematician.

For all p > 0 there exist universal constants c_p, C_p < ∞ such that for every continuous local martingale
M = (M_t)_{t≥0} issued from the origin and all fixed T ≥ 0, the following inequalities hold in [0, +∞]:

c_p E( sup_{t∈[0,T]} |M_t|^{2p} ) ≤ E(〈M〉_T^p) ≤ C_p E( sup_{t∈[0,T]} |M_t|^{2p} ).

The essential ingredients of the proof are the Doob maximal inequality and the Itô formula.

This can be skipped at first reading.

Proof. Let us fix T > 0 and set ∥M∥_T = sup_{0≤t≤T} |M_t|. We have, almost surely as n → ∞,

T_n = inf{t ≥ 0 : |M_t| ≥ n or 〈M〉_t ≥ n} ↗ +∞.

By Lemma 4.2.6, for all n ≥ 0, M^{T_n} is a continuous martingale and

sup_{t≥0} |M_t^{T_n}| ≤ n  and  sup_{t≥0} 〈M^{T_n}〉_t ≤ n.

Moreover, almost surely,

〈M^{T_n}〉_T = 〈M〉_{T∧T_n} ↗_{n→∞} 〈M〉_T  and  ∥M^{T_n}∥_T = ∥M∥_{T∧T_n} ↗_{n→∞} ∥M∥_T.

Now if the BDG inequality is satisfied by M^{T_n} for all n ≥ 0 then, by monotone convergence, it is also
satisfied by M. Therefore, from now on, we can assume without loss of generality that both M and
〈M〉 are bounded. Note that the constants must be universal, depending on p but of course neither
on M nor on T.
Note at this stage that the Doob maximal inequality of Theorem 2.5.7 gives, for all r > 1,

E(∥M∥_T^r) ≤ ( r/(r−1) )^r E(|M_T|^r).

Case p = 1. In this case E(〈M〉_T) = E(M_T²) and the desired BDG inequality is verified with c_1 = 1/4
(maximal inequality with r = 2) and C_1 = 1 (monotonicity of expectation, M_T² ≤ ∥M∥_T²).
Case p > 1. We have, from the Itô formula (Theorem 7.1.1) for f(x) = |x|^{2p} and X = M, for all t ≥ 0,

|M_t|^{2p} = 2p ∫_0^t |M_s|^{2p−1} sign(M_s) dM_s + p(2p−1) ∫_0^t |M_s|^{2p−2} d〈M〉_s.

Despite appearances, f is C². Indeed, to clarify the situation at the origin, we have, since 2p − 1 > 0,

f′(0) = lim_{x→0} ( f(x) − f(0) )/( x − 0 ) = lim_{x→0} |x|^{2p}/x = lim_{x→0} sign(x)|x|^{2p−1} = 0,

and, since 2p − 2 > 0,

f′′(0) = lim_{x→0} ( f′(x) − f′(0) )/( x − 0 ) = lim_{x→0} 2p sign(x)|x|^{2p−1}/x = lim_{x→0} 2p|x|^{2p−2} = 0.

In the first (stochastic) integral in the right hand side in the display above, the integrand
|M|^{2p−1} sign(M) is continuous and bounded while the integrator M belongs to M_0² (recall that M_0 = 0
and M is bounded). As a consequence, this stochastic integral is a martingale issued from the origin
and is therefore centered. Hence, for all 0 ≤ t ≤ T, taking expectations and using the Hölder
inequality with p and q = 1/(1 − 1/p) = p/(p − 1),

E(|M_t|^{2p}) = p(2p−1) E ∫_0^t |M_s|^{2p−2} d〈M〉_s
            ≤ p(2p−1) E( ∥M∥_T^{2(p−1)} 〈M〉_T )
            ≤ p(2p−1) ( E(∥M∥_T^{2p}) )^{1−1/p} ( E(〈M〉_T^p) )^{1/p}.

Combined with the maximal inequality above used with r = 2p, we obtain the second BDG inequal-
ity. To prove the first BDG inequality, we use the Itô formula (Theorem 7.1.1) with f(x_1, x_2) = x_1 x_2,
X = (M, 〈M〉^{(p−1)/2}),

M_t 〈M〉_t^{(p−1)/2} = ∫_0^t 〈M〉_s^{(p−1)/2} dM_s + ∫_0^t M_s d( 〈M〉_s^{(p−1)/2} )

(note that there is no second order term here since either ∂²_{i,j} f = 0 or 〈X^i, X^j〉 = 0). Now, if we define

N_t = ∫_0^t 〈M〉_s^{(p−1)/2} dM_s,

we have, for all t ∈ [0, T],

|N_t| ≤ 2 ∥M∥_T 〈M〉_T^{(p−1)/2},

which gives, using the Hölder inequality with p and q = 1/(1 − 1/p) = p/(p − 1),

E(N_t²) ≤ 4 E( ∥M∥_T² 〈M〉_T^{p−1} ) ≤ 4 ( E(∥M∥_T^{2p}) )^{1/p} ( E(〈M〉_T^p) )^{1−1/p}.

Combined with

E(N_t²) = E ∫_0^t 〈M〉_s^{p−1} d〈M〉_s = (1/p) E(〈M〉_t^p)

we obtain

E(〈M〉_T^p) ≤ (4p)^p E(∥M∥_T^{2p}),
which is the first BDG inequality.

Case 0 < p < 1. Let us define N_t = ∫_0^t 〈M〉_s^{(p−1)/2} dM_s. We have

M_t = ∫_0^t dM_s = ∫_0^t 〈M〉_s^{(1−p)/2} 〈M〉_s^{(p−1)/2} dM_s = ∫_0^t 〈M〉_s^{(1−p)/2} dN_s

and

N_t 〈M〉_t^{(1−p)/2} = ∫_0^t 〈M〉_s^{(1−p)/2} dN_s + ∫_0^t N_s d( 〈M〉_s^{(1−p)/2} )
                   = M_t + ∫_0^t N_s d( 〈M〉_s^{(1−p)/2} ).

Therefore, for all t ∈ [0, T],

|M_t| ≤ 2 ∥N∥_T 〈M〉_T^{(1−p)/2}  and  ∥M∥_T ≤ 2 ∥N∥_T 〈M〉_T^{(1−p)/2},

thus, using the Hölder inequality with 1/p and its conjugate exponent 1/(1−p),

E(∥M∥_T^{2p}) ≤ 4^p E( ∥N∥_T^{2p} 〈M〉_T^{p(1−p)} )
            ≤ 4^p ( E(∥N∥_T²) )^p ( E(〈M〉_T^p) )^{1−p}
            ≤ 16^p ( E(N_T²) )^p ( E(〈M〉_T^p) )^{1−p}   (maximal inequality with r = 2)
            = 16^p ( p^{−1} E(〈M〉_T^p) )^p ( E(〈M〉_T^p) )^{1−p}
            = (16/p)^p E(〈M〉_T^p).


This proves the first BGD inequality. To prove the second BGD inequality, let α > 0 (the reason for α is
to avoid the singularity at 0 of x 7→ x p−1 due to p − 1 < 0). Now write, using the Itô formula (Theorem
7.1.1),
Z t Z t
M t (α + ∥M ∥t )p−1 = (α + ∥M ∥s )p−1 dM s + M s d(α + ∥M ∥s )p−1
0 0
Z t
= N t + (p − 1) M s (α + ∥M ∥s )p−2 d∥M ∥s
0
Z t
where N t = (α + ∥M ∥s )p−1 dM s . We have then (taking α → 0)
0
Z t 1
p−1 p
|N t | ≤ ∥M t ∥p + (1 − p) ∥M ∥s d∥M ∥s = ∥M ∥t
0 p

and thus Z t 1 2p
E (α + ∥M ∥s )2(p−1) d〈M 〉s = E(N t2 ) ≤ E(∥M ∥t ),
0 p2
which gives finally the inequality (recall that 2(1 − p) < 0)

1 2p
E((α + ∥M ∥t )2(p−1) 〈M 〉t ) ≤ E(∥M ∥t ).
p2

But the identity


p p
〈M 〉t = (〈M 〉t (α + ∥M ∥t )2p(p−1) )(α + ∥M ∥t )2p(1−p)
gives, using the Hölder inequality with 1/p and its conjugate exponent 1/(1 − p), that
p
E(〈M 〉t ) ≤ (E(〈M 〉t (α + ∥M ∥t )2(p−1) ))p (E((α + ∥M ∥t )2p ))1−p
µ ¶p
1 2p
≤ 2 (E(∥M ∥t ))p (E((α + ∥M ∥t )2p ))1−p .
p

Taking the limit as α → 0, we obtain

p 1 2p
E(〈M 〉t ) ≤ E(∥M ∥t )
p 2p

which is the second BGD inequality. ■
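For p = 1 and M = B a standard Brownian motion on [0, T], the proof above gives the explicit constants c_1 = 1/4 and C_1 = 1, that is (1/4)E(sup_{t∈[0,T]} |B_t|²) ≤ E(〈B〉_T) = T ≤ E(sup_{t∈[0,T]} |B_t|²). A quick Monte Carlo sanity check with arbitrary discretization parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_paths, n_steps = 1.0, 20000, 500

dB = rng.normal(0.0, np.sqrt(T / n_steps), size=(n_paths, n_steps))
# sup over the grid of |B|^2, path by path (a slight underestimate of the sup)
sup2 = (np.abs(np.cumsum(dB, axis=1)).max(axis=1)) ** 2
m_sup2 = sup2.mean()  # Monte Carlo estimate of E(sup_{[0,T]} |B_t|^2)
# BDG with p = 1: (1/4) * m_sup2 <= E<B>_T = T <= m_sup2, up to Monte Carlo error
```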

7.8 Representation of Brownian functionals and martingales as stochastic integrals

Let B = (B_t)_{t≥0} be a d-dimensional Brownian motion, and let ϕ = (ϕ_t)_{t≥0} be a d-dimensional progres-
sive process with respect to the completed filtration of B, such that E ∫_0^∞ |ϕ_s|² ds < ∞. Then the stochastic
integral ∫_0^∞ ϕ_s·dB_s is a measurable function of B seen as a random variable with values in the Wiener space.
Indeed it is the limit in probability of finite sums which are measurable functions of B. Conversely, the fol-
lowing theorem states that every square integrable measurable function of B is, up to its mean, the stochastic
integral of a progressive process.

Theorem 7.8.1. Representation of Brownian functionals and martingales as stochastic integrals.

Let B = (B_t)_{t≥0} be a d-dimensional Brownian motion issued from the origin. Let (F_t)_{t≥0} be its com-
pleted natural filtration, and let F_∞ = σ(∪_{t≥0} F_t).

1. For every square integrable random variable Z ∈ L²(Ω, F_∞, P), there exists a unique progressive
d-dimensional process ϕ = (ϕ_t)_{t≥0} such that E ∫_0^∞ |ϕ_t|² dt < ∞ and

Z = E(Z) + ∫_0^∞ ϕ_s·dB_s.

2. If M is an (F_t)_{t≥0} martingale (continuous or not) bounded in L² and issued from the origin
then there exists a unique progressive ϕ = (ϕ_t)_{t≥0} with E ∫_0^∞ |ϕ_s|² ds < ∞ and, for all t ≥ 0,

M_t = ∫_0^t ϕ_s·dB_s.

3. If M is an (F_t)_{t≥0} continuous local martingale issued from the origin then there exists a unique
progressive ϕ = (ϕ_t)_{t≥0} such that for all t ≥ 0, ∫_0^t |ϕ_s|² ds < ∞ a.s. and

M_t = ∫_0^t ϕ_s·dB_s.

Being measurable for F_∞ means being a measurable function of B, and we say Brownian functional.
In the second item, the continuous process ∫_0^• ϕ_s·dB_s can be seen as a continuous modification of M. If
M is continuous then M and ∫_0^• ϕ_s·dB_s are equal as random variables on the canonical space.
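A concrete illustration of the first item in dimension d = 1: the Itô formula gives B_T² = T + ∫_0^T 2B_s dB_s, so the Brownian functional Z = B_T² has E(Z) = T and ϕ_s = 2B_s. This representation can be checked on discretized paths (the grid and sample sizes below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_paths, n_steps = 1.0, 5000, 2000
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
B_prev = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])  # B at left endpoints

Z = B[:, -1] ** 2                          # the functional Z = B_T^2, E(Z) = T
stoch_int = (2 * B_prev * dB).sum(axis=1)  # Ito sums for int_0^T 2 B_s dB_s

err = np.abs(Z - (T + stoch_int)).mean()   # small, and -> 0 as the mesh -> 0
```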

This can be skipped at first reading.

Proof.

1. For the uniqueness: if Z admits the representation with two progressive processes ϕ and ϕ′ such
that E ∫_0^∞ |ϕ_t|² dt < ∞ and E ∫_0^∞ |ϕ′_t|² dt < ∞, then ϕ = ϕ′ since the Itô isometry gives

E ∫_0^∞ |ϕ_s − ϕ′_s|² ds = E( ( ∫_0^∞ ϕ_s·dB_s − ∫_0^∞ ϕ′_s·dB_s )² ) = 0.

Let us prove the existence. Let us consider the sub-vector space

F = { Z ∈ L²(Ω, F_∞, P) : Z = E(Z) + ∫_0^∞ ϕ_s·dB_s for a progressive ϕ = (ϕ_t)_{t≥0} with E ∫_0^∞ |ϕ_s|² ds < ∞ }.

For all Z and Z′ in F, if ϕ and ϕ′ are the associated progressive processes, by the Itô isometry,

E(|Z − Z′|²) = |E(Z) − E(Z′)|² + E ∫_0^∞ |ϕ_s − ϕ′_s|² ds.

Hence F is a sub-Hilbert space of L²(Ω, F_∞, P). In order to show that F = L²(Ω, F_∞, P), it suf-
fices to show that F^⊥ = {0}. Note that F contains all the random variables of the form

Z = exp( ∫_0^∞ ϕ_s·dB_s − (1/2) ∫_0^∞ |ϕ_s|² ds )

where ϕ is deterministic and such that ∫_0^∞ |ϕ_s|² ds < ∞. Indeed, if we define, for all t ≥ 0,

X_t = exp( ∫_0^t ϕ_s·dB_s − (1/2) ∫_0^t |ϕ_s|² ds )

then Z = X_∞. Since it is a Doléans-Dade exponential (see Theorem 7.3.1), for all t ∈ [0, ∞],

X_t = 1 + ∫_0^t X_s ϕ_s·dB_s  and in particular  Z = X_∞ = E(Z) + ∫_0^∞ X_s ϕ_s·dB_s with E(Z) = 1,

which means that Z ∈ F. Let Y ∈ F^⊥. We have then, for all deterministic ϕ such that ∫_0^∞ |ϕ_s|² ds < ∞,

E( Y exp( ∫_0^∞ ϕ_s·dB_s ) ) = E( Y exp( ∫_0^∞ ϕ_s·dB_s − (1/2) ∫_0^∞ |ϕ_s|² ds ) ) exp( (1/2) ∫_0^∞ |ϕ_s|² ds ) = 0.

Used with ϕ = Σ_{k=1}^n λ_k 1_{(s_k, s_{k+1}]}, for arbitrary n ≥ 1, λ_1, ..., λ_n ∈ R^d, and 0 ≤ s_1 < ··· < s_{n+1}, this gives

E( Y e^{λ_1·B_{s_1} + ··· + λ_n·B_{s_n}} ) = 0.

Thus, by analytic continuation,

E( Y e^{iλ_1·B_{s_1} + ··· + iλ_n·B_{s_n}} ) = 0.

Also

E( E(Y | (B_{s_1}, ..., B_{s_n})) e^{i Σ_{k=1}^n λ_k·B_{s_k}} ) = 0.

Since this is valid for all λ_1, ..., λ_n ∈ R^d, we get

E(Y | (B_{s_1}, ..., B_{s_n})) = 0.

We have then, for all n ≥ 1, all 0 ≤ s_1 < ··· < s_n, and all bounded measurable f : (R^d)^n → R,

E( Y f(B_{s_1}, ..., B_{s_n}) ) = 0.

Finally it follows by the monotone class theorem that Y = 0.

2. Since M is bounded in L², it is u.i. and M_∞ ∈ L²(Ω, F_∞, P), and from the first part

M_∞ = E(M_∞) + ∫_0^∞ ϕ_s·dB_s

where ϕ = (ϕ_t)_{t≥0} is progressive and such that E ∫_0^∞ |ϕ_s|² ds < ∞. Now, E(M_∞) = E(M_0) = 0,
while by the martingale property of the stochastic integral, for all t ≥ 0,

M_t = E(M_∞ | F_t) = ∫_0^t ϕ_s·dB_s.

The uniqueness of ϕ follows from its uniqueness in the decomposition of M_∞.

3. For all n ≥ 0, let T_n = inf{t ≥ 0 : |M_t| ≥ n}. The preceding item used for the martingale M^{T_n} =
(M_{t∧T_n})_{t≥0}, which is bounded in L², gives

M_{t∧T_n} = ∫_0^t ϕ_s^{(n)}·dB_s

for a progressive ϕ^{(n)} such that E ∫_0^∞ |ϕ_s^{(n)}|² ds < ∞. The uniqueness of the progressive process
gives, for all m < n, ϕ_s^{(m)} = 1_{[0,T_m]}(s) ϕ_s^{(n)} in L²(Ω × R_+, F_∞ ⊗ B_{R_+}, P ⊗ ds). This allows to construct
a (unique) process ϕ such that for all t ≥ 0, almost surely ∫_0^t |ϕ_s|² ds < ∞, and M_t = ∫_0^t ϕ_s·dB_s.

Corollary 7.8.2. Filtration of Brownian motion and martingale regularization.

Let B = (B_t)_{t≥0} be a d-dimensional BM with B_0 = 0 and let (F_t)_{t≥0} be its completed filtration. Then:

1. The filtration (F_t)_{t≥0} is right-continuous and left-continuous in the sense that for all t ≥ 0,

F_t = F_{t^+} = ∩_{s>t} F_s  and  F_t = F_{t^-} = σ(∪_{s∈[0,t)} F_s).

2. If M is a martingale with respect to (F_t)_{t≥0}, then it admits a continuous modification.

Proof.

1. Let us prove right-continuity. Let t ≥ 0 and let Z be a bounded and F_{t^+} measurable random variable.
By Theorem 7.8.1 used with d = 1, there exists a progressive process ϕ with respect to (F_t)_{t≥0} such
that E ∫_0^∞ ϕ_s² ds < ∞ and Z = E(Z) + ∫_0^∞ ϕ_s dB_s. For all ε > 0, the random variable Z is F_{t+ε} measurable
and by the martingale property of the stochastic integral,

Z = E(Z | F_{t+ε}) = E(Z) + ∫_0^{t+ε} ϕ_s dB_s →_{ε→0, in L²} E(Z) + ∫_0^t ϕ_s dB_s.

Thus Z is equal (as a random variable: a.s.) to an F_t-measurable random variable. Since the filtration
is complete, this means that Z is F_t-measurable. A similar argument works to prove left-continuity.

2. If M is bounded in L2 , this follows from the representation in Theorem 7.8.1. The proof of the general
case is not very difficult but takes a page, and we can find it for instance in [31, p. 130-131].

Chapter 8

Stochastic differential equations

Let B = (B_t)_{t≥0} be a d-dimensional (column vector) (F_t)_{t≥0} Brownian motion issued from the origin.
Let M_{q,d}(R) be the set of q × d matrices with entries in R.
The Hilbert – Schmidt norm of A ∈ M_{q,d}(R) is |A| = ( Σ_{i=1}^q Σ_{j=1}^d |A_{i,j}|² )^{1/2} = ( Σ_{i=1}^q |A_{i,•}|² )^{1/2}.
We have seen in Theorem 7.1.1 that for all λ ∈ R^d, the Doléans-Dade exponential X = (e^{λ·B_t − (|λ|²/2)t})_{t≥0} satisfies

X_0 = 1  and  X_t = 1 + ∫_0^t X_s d(λ·B_s)  for all t ≥ 0.

The present chapter is devoted to the study of far more general stochastic differential equations (SDE).

8.1 Stochastic differential equations with general coefficients

We seek a q-dimensional process X = (X_t)_{t≥s} solution of the stochastic differential equation

X_t = η + ∫_s^t σ(u, X_u) dB_u + ∫_s^t b(u, X_u) du  a.s., t ≥ s.  (SDE)

Here

• s ≥ 0 is the initial time

• η is a q-dimensional random vector playing the role of initial value or initial condition or initial data

• the function σ : R+ × Ω × Rq → Mq,d (R) plays the role of a diffusion matrix

• the function b : R+ × Ω × Rq → Rq plays the role of a drift.

For the intuition, the best is to think about X_t as the position of a particle in R^q at time t. This physical
picture is made more precise in our study of the Langevin stochastic process (Example 8.2.7).
The Doléans-Dade exponential corresponds to q = 1, η = 1, σ(u, x) = xλ^⊤ (with λ^⊤ a constant row vector), and b = 0.
Another basic example is given by the Ornstein – Uhlenbeck process (Example 8.2.2).
We say that (SDE) is driven by B . We can interpret (SDE) either as a deformation of Brownian motion,
or as an Ordinary Differential Equation (ODE) with noise¹. Note that (SDE) means that for all 1 ≤ j ≤ q,

X_t^j = η_j + ∫_s^t σ_{j,•}(u, X_u)·dB_u + ∫_s^t b_j(u, X_u) du  a.s., t ≥ s
     = η_j + Σ_{k=1}^d ∫_s^t σ_{j,k}(u, X_u) dB_u^k + ∫_s^t b_j(u, X_u) du  a.s., t ≥ s.

In other words, in differential notations,

X_s = η,  dX_t = σ(t, X_t) dB_t + b(t, X_t) dt  a.s., t ≥ s,

in other words, componentwise,

X_s^j = η_j,  dX_t^j = σ_{j,•}(t, X_t)·dB_t + b_j(t, X_t) dt = Σ_{k=1}^d σ_{j,k}(t, X_t) dB_t^k + b_j(t, X_t) dt  a.s., t ≥ s.

¹ This ODE with noise point of view leads sometimes to put the noise at the end, namely dX_t = b(t, X_t) dt + σ(t, X_t) dB_t.
k=1

Note that σ and b can be random and may for instance depend on B on the canonical space. We will
sometimes make explicit or not the dependency over ω, namely σ(t , ω, x) and b(t , ω, x) or σ(t , x) and b(t , x).
For a given law of initial condition η, we say that we have. . .

• existence in law (or weak existence) when (SDE) has a solution on some filtered probability space and
with some BM defined on it which are not necessarily the ones for which (SDE) is stated initially

• uniqueness in law (or weak uniqueness) when additionally all solutions of (SDE) (not necessarily on
the same probability space or with the same Brownian motion) have same law on C (R+ , Rq )

• pathwise uniqueness (or strong uniqueness) when two solutions of (SDE) defined on the same prob-
ability space and with the same Brownian motion are indistinguishable.

For solving (SDE), we assume that the following properties hold for σ and b. Such conditions are natural,
having in mind the (stochastic) nature of the solution, and how we solve basic deterministic ODEs.

• (Lip) Lipschitz regularity in x uniformly in ω and t . There exists a constant c > 0 such that for all
(t , ω) ∈ R+ × Ω and all x, y ∈ Rq ,

|σ(t , ω, x) − σ(t , ω, y)| ≤ c|x − y| and |b(t , ω, x) − b(t , ω, y)| ≤ c|x − y|

• (Mes) Progressive measurability in ω and t . For all t > 0 and all x ∈ Rq , the following maps are
measurable with respect to B[0,t ] ⊗ Ft :

(u, ω) ∈ [0, t ] × Ω 7→ σ(u, ω, x) and (u, ω) ∈ [0, t ] × Ω 7→ b(u, ω, x)

• (Int) Square integrability in ω and in t (locally). For all t > 0 and x ∈ R^q,

E ∫_0^t |σ(u, ·, x)|² du < ∞  and  E ∫_0^t |b(u, ·, x)|² du < ∞.

Note that (Lip) implies that if (Int) holds for some x ∈ Rq then it holds for all x ∈ Rq .
Note that (Lip) implies continuity in x and thus measurability in x.
Note that we do not assume continuity of b(t , ω, x) and σ(t , ω, x) with respect to t .

Lemma 8.1.1. Well posedness of the stochastic differential equation.

Let s ≥ 0, and let X = (X_t)_{t≥s} be a q-dimensional continuous adapted process such that for all t ≥ s,

E ∫_s^t |X_u|² du < ∞.

Then for all t ≥ s,

∫_s^t |b(u, X_u)|² du < ∞ almost surely  and  Σ_{i=1}^q E ∫_s^t |σ_{i,•}(u, X_u)|² du < ∞.

In particular, the integrals in the right hand side of (SDE) make sense. Moreover, for all 1 ≤ j ≤ q,

( ∫_s^t σ_{j,•}(u, X_u)·dB_u )_{t≥s} = ( Σ_{k=1}^d ∫_s^t σ_{j,k}(u, X_u) dB_u^k )_{t≥s}

is a square integrable martingale, and the Itô isometry gives, for all t ≥ s,

E( | ∫_s^t σ(u, X_u) dB_u |² ) = E( ∫_s^t |σ(u, X_u)|² du ).

Furthermore if X solves (SDE) then for all j, X^j is a continuous semi-martingale, with local martin-
gale part ∫_s^• σ_{j,•}(u, X_u)·dB_u, which is a martingale, and finite variation part ∫_s^• b_j(u, X_u) du.

Proof. Thanks to (Lip), for all (u, ω) ∈ R_+ × Ω, the maps x ∈ R^q ↦ σ(u, ω, x) and x ∈ R^q ↦ b(u, ω, x) are (uni-
formly) continuous. Since X is adapted and continuous, it is the pointwise limit of a sequence of adapted
step processes taking a finite number of values (discretization of time and space). Thanks to the continuity
of σ and b with respect to x, and to (Mes), it follows that the processes (σ(t, X_t))_{t≥s} and (b(t, X_t))_{t≥s} are the
pointwise limit of progressively measurable processes, and are thus progressively measurable.
Now for the integral involving b, we write, using (Lip),

|b(u, X_u)|² ≤ 2(|b(u, 0)|² + c²|X_u|²),

and by (Int) for b and the Cauchy – Schwarz inequality and the square integrability assumed for X, we get

E( ( ∫_s^t |b(u, 0)| du )² ) ≤ (t − s) E( ∫_s^t |b(u, 0)|² du ) < ∞

and thus almost surely

∫_s^t |b(u, X_u)| du < ∞.

Let us consider the integral involving σ. Similarly to what we did for b, by (Lip) for σ,

|σ(u, X_u)|² ≤ 2(|σ(u, 0)|² + c²|X_u|²),

and thus, using (Int) for σ and the square integrability assumed for X, we get

E( ∫_s^t |σ(u, X_u)|² du ) ≤ E( ∫_s^t 2( |σ(u, 0)|² + c²|X_u|² ) du ) < ∞.

Hence for all 1 ≤ i ≤ q, the stochastic integral ∫_s^t σ_{i,•}(u, X_u)·dB_u is well defined and is a martingale. ■

Theorem 8.1.2. Solving stochastic differential equations and pathwise uniqueness.

For all s ≥ 0 and all F_s-measurable square integrable random vector η of R^q, there exists an adapted
and continuous q-dimensional process X = (X_t)_{t≥s} such that the following properties hold true:

1. for all t ≥ s, E ∫_s^t |X_u|² du < ∞

2. X solves (SDE) with initial condition η

3. such a solution is unique up to indistinguishability, hence pathwise uniqueness!

When σ and b do not depend on the space variable x, then (SDE) has an immediate explicit solution

X_t = η + ∫_s^t σ(u) dB_u + ∫_s^t b(u) du

which is known as an Itô process. It is a martingale when b = 0, and a finite variation process when σ = 0.
The proof below is constructive in the sense that the solution is approximated by S n Y for n large enough
and an arbitrary initial process Y . The solution does not come from the usage of the axiom of choice via
a general theorem such as the Hahn – Banach theorem. However, a true algorithm on a computer would
require to discretize time (Euler scheme for instance) and space and to control the quality of such an ap-
proximation. This is the subject of a whole theory that we can call stochastic numerical analysis, see [27, 26].
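The constructive fixed point scheme can nevertheless be tried on a computer once time is discretized. Below, a sketch for the scalar equation dX_t = dB_t − X_t dt, X_0 = 0 (so q = d = 1, σ ≡ 1 and b(x) = −x, an arbitrary Lipschitz choice, with s = 0): with one fixed sampled driving path B on a grid, the map S reduces to (SY)_t = B_t − ∫_0^t Y_u du, and the iterates SⁿY converge at factorial speed.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n_steps = 1.0, 1000
dt = T / n_steps

# One fixed driving Brownian path on the grid.
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])

def S(Y):
    """Discretized Picard map (SY)_t = B_t - int_0^t Y_u du (left Riemann sums)."""
    drift = np.concatenate([[0.0], np.cumsum(Y[:-1] * dt)])
    return B - drift

X = np.zeros(n_steps + 1)  # start the iteration from Y = 0
gaps = []                  # sup distance between successive iterates
for _ in range(20):
    X_new = S(X)
    gaps.append(np.max(np.abs(X_new - X)))
    X = X_new
```

The rapidly vanishing `gaps` mirror the (C_t(t − s))ⁿ/n! bound of the fixed point argument; a production scheme would of course also control the time discretization error.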


Proof. The idea is to try to use a fixed point method just like in the Picard theorem or the Cauchy – Lipschitz
theorem for ordinary differential equations, adapted to our stochastic processes context.
Let D be the set of continuous adapted q-dimensional processes (Y_t)_{t≥s} with, for all t ≥ s,

∥Y∥_t² = E( sup_{s≤u≤t} |Y_u|² ) < ∞.

For all Y ∈ D, thanks to Lemma 8.1.1, we can define, for all t ≥ s,

(SY)_t = η + ∫_s^t σ(u, Y_u) dB_u + ∫_s^t b(u, Y_u) du.

It is unclear for now if SY belongs to D or not. For all Y¹ and Y² in D we have, for all t ≥ s,

(SY¹)_t − (SY²)_t = ∫_s^t (σ(u, Y_u¹) − σ(u, Y_u²)) dB_u + ∫_s^t (b(u, Y_u¹) − b(u, Y_u²)) du,

and then, by using the Cauchy – Schwarz inequality twice,

|(SY¹)_t − (SY²)_t|² ≤ 2 | ∫_s^t (σ(u, Y_u¹) − σ(u, Y_u²)) dB_u |² + 2(t − s) ∫_s^t |b(u, Y_u¹) − b(u, Y_u²)|² du.

Note that there is no hope to use the Itô isometry because the norm in D is the expectation of a supremum,
not the converse, but this reminds us of the Doob maximal inequality! Using (Lip) for b, we get, for all t ≥ s,

∥SY¹ − SY²∥_t² ≤ 2 E( sup_{s≤u≤t} | ∫_s^u (σ(v, Y_v¹) − σ(v, Y_v²)) dB_v |² ) + 2c²(t − s) ∫_s^t E(|Y_u¹ − Y_u²|²) du.

Next, by Lemma 8.1.1, the Doob maximal inequality (Theorem 2.5.7), the Itô isometry, and (Lip) for σ,

∥SY¹ − SY²∥_t² ≤ 8 E ∫_s^t |σ(u, Y_u¹) − σ(u, Y_u²)|² du + 2c²(t − s) ∫_s^t E(|Y_u¹ − Y_u²|²) du
              ≤ 2c²(4 + (t − s)) ∫_s^t E(|Y_u¹ − Y_u²|²) du
              = C_t ∫_s^t E(|Y_u¹ − Y_u²|²) du.

Taking Y^2 ≡ 0, this shows that SY ∈ D when Y ∈ D (beware that S0 ≠ 0). This gives also the inequality

    ∥SY^1 − SY^2∥_t^2 ≤ C_t ∥Y^1 − Y^2∥_t^2.

So S is Lipschitz, but S is not necessarily a contraction because C_t can be arbitrarily large. To circumvent the
problem, we bootstrap the estimate by plugging the same estimate into itself recursively. Namely, if we define

    ϕ(u) = E(|Y_u^1 − Y_u^2|^2),

and if we denote, for all n ≥ 1, by S^n = S ∘ ··· ∘ S the n-th iterate of S, we get

    ∥S^n Y^1 − S^n Y^2∥_t^2 ≤ (C_t)^2 ∫_s^t du ∫_s^u E(|(S^{n−2} Y^1)_v − (S^{n−2} Y^2)_v|^2) dv    (⋆)
                            ≤ ···
                            ≤ (C_t)^n ∫ 1_{t ≥ u_1 ≥ ··· ≥ u_n ≥ s} ϕ(u_n) du_1 ··· du_n
                            ≤ (C_t)^n ((t − s)^n / n!) ∥Y^1 − Y^2∥_t^2,

where we used the basic estimate (also used in the study of order statistics and of the simple Poisson process)

    1 = ∫_{[0,1]^n} du_1 ··· du_n = Σ_{σ∈Σ_n} ∫_{[0,1]^n} 1_{u_{σ(1)} ≥ ··· ≥ u_{σ(n)}} du_1 ··· du_n = n! ∫_{[0,1]^n} 1_{u_1 ≥ ··· ≥ u_n} du_1 ··· du_n.
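The basic estimate above lends itself to a quick numerical sanity check: for i.i.d. uniform random variables, the probability of observing one fixed ordering is 1/n!. A minimal Monte Carlo sketch (the function name and parameters are ours, Python standard library only):

```python
import math
import random

def ordered_fraction(n, trials, seed=0):
    """Monte Carlo estimate of the volume of {1 >= u_1 >= ... >= u_n >= 0},
    i.e. P(U_1 >= ... >= U_n) for i.i.d. uniforms; the exact value is 1/n!."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [rng.random() for _ in range(n)]
        if all(u[i] >= u[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / trials

est = ordered_fraction(3, 200_000)
exact = 1 / math.factorial(3)  # the ordered simplex has volume 1/3! = 1/6
```

By symmetry each of the n! orderings is equally likely, which is exactly the partition of [0,1]^n used in the display above.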


Let us show now that S admits a fixed point. We start from an arbitrary Y ∈ D, and we set X^0 = Y and
X^n = S^n Y for all n ≥ 1. Then we have

    E( sup_{s≤u≤t} |X_u^n − X_u^{n+1}|^2 ) ≤ ((C_t (t − s))^n / n!) ∥Y − SY∥_t^2.    (⋆⋆)

It follows that

    Σ_{n≥0} E( sup_{s≤u≤t} |X_u^n − X_u^{n+1}|^2 ) ≤ ∥Y − SY∥_t^2 Σ_{n≥0} (C_t (t − s))^n / n! = ∥Y − SY∥_t^2 e^{C_t (t−s)} < ∞.

Thus, for all t > s, almost surely

    Σ_{n≥0} sup_{s≤u≤t} |X_u^n − X_u^{n+1}|^2 < ∞.
By using Lemma 4.3.2 in the Banach space C([s, t], R^q) for an arbitrary, say integer, t ≥ s, it follows that almost
surely, the sequence of continuous functions (u ≥ s ↦ X_u^n)_{n≥0} converges uniformly on every compact subset
of [s, ∞) towards the trajectory of a continuous adapted process denoted X = (X_u)_{u≥s}, and from (⋆⋆) we get

    ( E( sup_{s≤u≤t} |X_u^n − X_u|^2 ) )^{1/2} ≤ Σ_{m≥n} ∥X^m − X^{m+1}∥_t → 0 as n → ∞.

It follows that X ∈ D, that X^n → X in D, and that (recall that X^{n+1} = S^{n+1} Y = S X^n)

    ∥X − SX∥_t ≤ ∥X − X^{n+1}∥_t + ∥S X^n − S X∥_t → 0 as n → ∞.
n→∞

It follows that X = SX, in other words X is a fixed point of S. Finally, if X and X̃ are two fixed points of S, then
for all n ≥ 0, X − X̃ = S^n X − S^n X̃, and from (⋆), for all t ≥ s,

    ∥X − X̃∥_t^2 ≤ ((C_t (t − s))^n / n!) ∥X − X̃∥_t^2 → 0 as n → ∞,

and therefore X = X̃, hence the uniqueness up to indistinguishability. ■

Theorem 8.1.3. Dependency over initial condition.

For all s ≥ 0, for all F_s-measurable square integrable random vectors η and η̃ of R^q, if X and X̃ are
solutions on the same space and for the same B, σ, b of

    X_t = η + ∫_s^t σ(u, X_u) dB_u + ∫_s^t b(u, X_u) du  a.s., t ≥ s,

and

    X̃_t = η̃ + ∫_s^t σ(u, X̃_u) dB_u + ∫_s^t b(u, X̃_u) du  a.s., t ≥ s,

respectively, then, for all t ≥ s, there exists a constant C_t > 0 such that

    E( sup_{s≤u≤t} |X_u − X̃_u|^2 ) ≤ C_t E(|η − η̃|^2).

Proof. This is a byproduct of the proof of Theorem 8.1.2. Let us give a direct proof via Lemma 8.1.4. We have

    X_t − X̃_t = η − η̃ + ∫_s^t (σ(u, X_u) − σ(u, X̃_u)) dB_u + ∫_s^t (b(u, X_u) − b(u, X̃_u)) du.

For all t ≥ s, setting

    f(t) = E( sup_{s≤u≤t} |X_u − X̃_u|^2 ),

we get, by Lemma 8.1.1, the Doob maximal inequality (Theorem 2.5.7), the Itô isometry, and (Lip),

    f(t) ≤ 3 E(|η − η̃|^2) + 12 E ∫_s^t |σ(u, X_u) − σ(u, X̃_u)|^2 du + 3(t − s) E ∫_s^t |b(u, X_u) − b(u, X̃_u)|^2 du
         ≤ 3 E(|η − η̃|^2) + c^2 (12 + 3(t − s)) ∫_s^t f(u) du.

It remains to use the Grönwall lemma (Lemma 8.1.4) with a = 3 E(|η − η̃|^2) and b = c^2 (12 + 3(t − s)). ■


Lemma 8.1.4. Grönwall^a lemma^b.

a Named after Thomas Hakon Grönwall (1877 – 1932), Swedish mathematician.
b Original version published in 1919 by Grönwall. There are plenty of versions: differential, integral, with variable coefficients, etc., including a non-linear version (Bihari – LaSalle inequality). Such lemmas are essential for ODEs and SDEs.

Let s < u and f : [s, u] → R be bounded and measurable. If, for constants a ∈ R and b ≥ 0 and all t ∈ [s, u],

    f(t) ≤ a + b ∫_s^t f(v) dv,  then for all t ∈ [s, u],  f(t) ≤ a e^{b(t−s)}.

Proof. By iterating the condition we obtain, by induction on n, for all n ≥ 0 and all t ∈ [s, u], with t_0 = t,

    f(t) ≤ a + a b(t − s) + ··· + a ((b(t − s))^n / n!) + b^{n+1} ∫_s^{t_0} ··· ∫_s^{t_n} f(t_{n+1}) 1_{t_0 ≥ ··· ≥ t_{n+1}} dt_1 ··· dt_{n+1}.

Now the integral term is bounded above by ∥f∥_∞ (b(t − s))^{n+1} / (n+1)!, which tends to 0 as n → ∞
(Stirling formula: n! ∼ √(2π) n^{n+1/2} e^{−n} as n → ∞). ■
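The equality case f(t) = a e^{b(t−s)} saturates the hypothesis of the lemma, i.e. it satisfies f(t) = a + b ∫_s^t f(v) dv with equality. A minimal numerical verification with a midpoint quadrature (the function name is ours, Python standard library only):

```python
import math

def gronwall_residual(a, b, s, t, steps=10_000):
    """Residual |f(t) - a - b * integral_s^t f(v) dv| for f(v) = a*exp(b*(v-s)),
    the equality case of the Gronwall lemma; the integral uses the midpoint rule."""
    h = (t - s) / steps
    integral = h * sum(a * math.exp(b * (k + 0.5) * h) for k in range(steps))
    return abs(a * math.exp(b * (t - s)) - (a + b * integral))

res = gronwall_residual(a=2.0, b=1.5, s=0.0, t=1.0)
```

The residual is of the order of the quadrature error only, confirming that the exponential bound is sharp.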

Theorem 8.1.5. Regular solution of the stochastic differential equation.

For all s ≥ 0, there exists a family (X ts (x) : x ∈ Rq , t ≥ s) of random variables such that:

1. for all t ≥ s, the map (x, ω) ∈ Rq × Ω 7→ X ts (x, ω) ∈ Rq is measurable with respect to BRq ⊗ Ft

2. for all square integrable random vectors η of R^q measurable with respect to F_s, the stochastic
process (Y_t)_{t≥s} defined by Y_t(ω) = X_t^s(η(ω), ω) solves the stochastic differential equation

    Y_t = η + ∫_s^t σ(u, Y_u) dB_u + ∫_s^t b(u, Y_u) du  a.s., t ≥ s.    (⋆)

Proof. In order to construct a solution measurable with respect to the initial condition, we discretize the
space using an at most countable mesh and we rely on the regularity of the solution with respect to the
initial condition. Namely, for all n ≥ 0, let (A_{n,k})_{k≥0} be an at most countable partition of R^q such that for all
k ≥ 0, diam(A_{n,k}) ≤ 2^{−n}. For each k ≥ 0, we select z_{n,k} ∈ A_{n,k}, and we define, for all x ∈ R^q,

    g_n(x) = z_{n,k}  where k is such that x ∈ A_{n,k}.

Let z ∈ R^q. We consider the solution X̃_t(z, ω) of

    X̃_t = z + ∫_s^t σ(u, X̃_u) dB_u + ∫_s^t b(u, X̃_u) du,

for all t ≥ s and all ω ∉ N_z, where N_z is a negligible set. Let us define

    N_n = ∪_k N_{z_{n,k}}  and  X_t^n(x, ω) = X̃_t(g_n(x), ω) 1_{Ω∖N_n}(ω).

The map (x, ω) ↦ X_t^n(x, ω) is measurable with respect to B_{R^q} ⊗ F_t, and by Theorem 8.1.3, for all x ∈ R^q,

    E( sup_{s≤u≤t} |X_u^n(x) − X̃_u(x)|^2 ) ≤ C_t |x − g_n(x)|^2 ≤ C_t (2^{−n})^2.

Thus, for all x ∈ R^q,

    Σ_{n≥0} E( sup_{s≤u≤t} |X_u^n(x) − X̃_u(x)| ) < ∞


and therefore, for all x ∈ R^q, for all t ≥ s, almost surely,

    sup_{s≤u≤t} |X_u^n(x) − X̃_u(x)| → 0 as n → ∞.

Now we define (the limit is taken component by component)

    X_t^s(x, ω) = lim_{n→∞} X_t^n(x, ω).

This limit is measurable as a pointwise limit of measurable functions. Now, let η be a square integrable
F_s-measurable random vector of R^q. One can check easily that Y_t^n(ω) = X_t^n(η(ω), ω) solves

    Y_t^n = g_n(η) + ∫_s^t σ(u, Y_u^n) dB_u + ∫_s^t b(u, Y_u^n) du,  t ≥ s,

indeed, for all k, almost surely,

    1_{η∈A_k} ∫_s^t σ(u, X_u^n(z_k)) dB_u = ∫_s^t 1_{η∈A_k} σ(u, X_u^n(z_k)) dB_u = 1_{η∈A_k} ∫_s^t σ(u, Y_u^n) dB_u.

Finally, one can check easily using Theorem 8.1.3 and Lemma 4.3.2 that for all t ≥ s, almost surely,

    sup_{s≤u≤t} |Y_u^n − Y_u| → 0 as n → ∞,

where Y = (Y_t)_{t≥s} is the solution of (⋆). It follows that for all t ≥ s, almost surely,

    X_t^s(η(ω), ω) = Y_t(ω).

Corollary 8.1.6. Composition.

For all 0 ≤ s ≤ t ≤ u and x ∈ R^q, with the notations of Theorem 8.1.5, a.s.

    X_u^s(x, ω) = X_u^t(X_t^s(x, ω), ω).

Proof. We have

    X_u^s(x) = x + ∫_s^u σ(v, X_v^s) dB_v + ∫_s^u b(v, X_v^s) dv
             = x + ∫_s^t σ(v, X_v^s) dB_v + ∫_s^t b(v, X_v^s) dv + ∫_t^u σ(v, X_v^s) dB_v + ∫_t^u b(v, X_v^s) dv
             = X_t^s(x) + ∫_t^u σ(v, X_v^s) dB_v + ∫_t^u b(v, X_v^s) dv,

where the last equality holds almost surely. Therefore (X_u^s)_{u≥t} solves the SDE started from X_t^s at time t. Now
by the representation of Theorem 8.1.5 and the pathwise uniqueness of Theorem 8.1.2, we get, a.s.,

    X_u^s(x, ω) = X_u^t(X_t^s(x, ω), ω). ■

8.2 Ornstein – Uhlenbeck, Bessel, and Langevin processes

Example 8.2.1. Shifted Brownian motion.

If q = d , η = 0, σ(t , ω, x) = I d (constant), and b(t , ω, x) = b(t , ω) (possibly random and time varying


but constant in x), then (SDE) gives, for all t ≥ s,

    X_t = B_t + ∫_s^t b(u) du.

If b = 0 then X = B . If b is deterministic, then Theorem 3.8.2 (Cameron – Martin) gives the density of
the law of X with respect to Wiener measure. See also Theorem 7.5.1 and Theorem 8.4.11.

Example 8.2.2. Ornsteina – Uhlenbeckb process.


a Named after Leonard Ornstein (1880 – 1941), Dutch physicist.
b Named after George Eugene Uhlenbeck (1900 – 1988), Dutch-American theoretical physicist.

For simplicity, let X 0 ∈ L2Rd independent of (B t )t ≥0 . The Ornstein – Uhlenbeck process starting from
X 0 ∈ Rd solves the stochastic differential equation (SDE)

    dX_t = σ dB_t − µ X_t dt,  t ≥ 0,

where σ ≥ 0 and µ ∈ R are constants (the standard O.-U. is with σ = √2 and µ = 1), in other words

    X_t = X_0 + ∫_0^t σ dB_s − µ ∫_0^t X_s ds = X_0 + σ B_t − µ ∫_0^t X_s ds.

This corresponds to (SDE) with q = d, σ(u, x) = σ I_d (constant), and b(u, x) = −µx. Let us identify the
solution of this SDE. We use an a priori estimate. More precisely, if X exists and is a semi-martingale,
then, by the Itô formula with f(x, y) = x y and the process (e^{µt}, X_t), and by using the SDE, we get

    d(e^{µt} X_t) = e^{µt} dX_t + e^{µt} µ X_t dt = e^{µt} σ dB_t  and thus  e^{µt} X_t − e^0 X_0 = σ ∫_0^t e^{µs} dB_s,

which gives^a

    X_t = e^{−µt} X_0 + σ ∫_0^t e^{µ(s−t)} dB_s.

This gives uniqueness, and we check from this formula that this process solves the SDE. The integral
in the right hand side is a Wiener integral. For all t ≥ 0, since ∫_0^t (σ e^{µ(s−t)})^2 ds = σ^2 (1 − e^{−2µt})/(2µ), we get

    Law(X_t | X_0 = x) = N( x e^{−µt}, (σ^2/2) ((1 − e^{−2µt})/µ) I_d )  with the convention  (1 − e^{−2µt})/µ = 2t if µ = 0.
If X_0 = x = (x_1, ..., x_d), then the d coordinates of X are independent one-dimensional O.-U. processes
started from x_1, ..., x_d. By the isometry property for Wiener – Itô integrals, for all s, t ≥ 0 and 1 ≤ i, j ≤ d,

    Cov(X_s^i, X_t^j | X_0 = x) = σ^2 E( ∫_0^s e^{µ(u−s)} dB_u^i ∫_0^t e^{µ(u−t)} dB_u^j )
                                = σ^2 1_{i=j} e^{−µ(t+s)} ∫_0^{s∧t} e^{2µu} du
                                = σ^2 1_{i=j} ( ((e^{−µ|t−s|} − e^{−µ(s+t)})/(2µ)) 1_{µ≠0} + (s ∧ t) 1_{µ=0} ).

When µ > 0, this quantity is small when both s + t and |t − s| are large. For all s, t ≥ 0,

    X_{t+s} = e^{−µs} X_t + e^{−µs} σ ∫_t^{t+s} e^{µ(u−t)} dB_u = e^{−µs} X_t + e^{−µs} σ ∫_0^s e^{µu} dB_{t+u} = F_s(X_t, (B_u)_{u∈(t,t+s]}).

The process X is a continuous auto-regressive Gaussian process, constructed by incorporating along
the time a new independent input; nevertheless it does not have independent increments. The SDE
allows simulation as well as a dynamical interpretation of the trajectories, see Figure 8.1.

a This is not the canonical decomposition of the semi-martingale X, since ∫_0^• e^{µ(s−t)} dB_s is not a local martingale even if ∫_0^• e^{µs} dB_s is a martingale. The canonical decomposition is given by the SDE. See also Exercise 1 of the 2020-2021 exam.
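The covariance formula above can be probed by Monte Carlo: sample the pair (X_s, X_t) in dimension one with the exact Gaussian transition kernel applied twice, then compare the empirical covariance with σ²(e^{−µ|t−s|} − e^{−µ(s+t)})/(2µ). A sketch, standard library only (helper name and parameters are ours):

```python
import math
import random

def ou_pair(x0, sigma, mu, s, t, n, seed=2):
    """Sample n pairs (X_s, X_t) of a 1-d Ornstein-Uhlenbeck process started at x0,
    using the exact Gaussian transition kernel twice (0 -> s, then s -> t)."""
    rng = random.Random(seed)
    sd1 = sigma * math.sqrt((1 - math.exp(-2 * mu * s)) / (2 * mu))
    sd2 = sigma * math.sqrt((1 - math.exp(-2 * mu * (t - s))) / (2 * mu))
    pairs = []
    for _ in range(n):
        xs = x0 * math.exp(-mu * s) + sd1 * rng.gauss(0, 1)
        xt = xs * math.exp(-mu * (t - s)) + sd2 * rng.gauss(0, 1)
        pairs.append((xs, xt))
    return pairs

x0, sigma, mu, s, t = 1.0, 1.0, 0.5, 1.0, 2.0
pairs = ou_pair(x0, sigma, mu, s, t, n=200_000)
ms = sum(p[0] for p in pairs) / len(pairs)
mt = sum(p[1] for p in pairs) / len(pairs)
emp_cov = sum((p[0] - ms) * (p[1] - mt) for p in pairs) / len(pairs)
th_cov = sigma**2 * (math.exp(-mu * abs(t - s)) - math.exp(-mu * (s + t))) / (2 * mu)
```

The two-step sampling uses the Markov property of the process, so the agreement also illustrates the Chapman – Kolmogorov structure discussed later.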


Figure 8.1: In black, four trajectories of the Ornstein – Uhlenbeck process with X_0 = 5, σ = 1, µ = 1. The mean
converges to 0 while the variance converges to σ²/(2µ) = 1/2. At the beginning, the drift term in the SDE is stronger
than the diffusion term, and this has the effect of driving the process exponentially fast to a neighborhood
of the origin. The process then fluctuates in this neighborhood forever, the drift term and the diffusion
term remaining of the same order. The drift term −µX_t dt has the effect of a spring force (force de rappel in
French) since µ > 0. In light gray, the trajectories of the underlying driving Brownian motion of the SDE: their
mean remains at 5 for all times while their variance grows linearly in time (they also correspond to µ = 0).

Remark 8.2.3. Quantitative exponential long time behavior of O. – U. via coupling.

Let X = (X t )t ≥0 be an Ornstein – Uhlenbeck process solving the SDE

dX t = σdB t − µX t dt ,

with µ > 0. Let X ′ = (X t′ )t ≥0 be another Ornstein – Uhlenbeck process in Rd solving the same SDE
(same Brownian motion) but with an initial condition X 0′ possibly distinct from X 0 . This is a way to
construct the couple (X , X ′ ). The law of (X , X ′ ) is a coupling of the laws of X and X ′ . Now

d(X t − X t′ ) = −µ(X t − X t′ )dt and thus X t − X t′ = (X 0 − X 0′ )e−µt .

Let P_2 be the set of probability measures on R^d integrating |·|^2. The Wasserstein^a – Kantorovich^b –
Fréchet^c – Monge^d coupling distance W_2 on P_2 is defined for all µ, ν ∈ P_2 by

    W_2(µ, ν)^2 = inf_π ∫∫_{R^d×R^d} |x − y|^2 π(dx, dy) = inf_{(U,V)} E(|U − V|^2)

where the first infimum runs over all the probability measures π on the product space with marginal
distributions µ and ν, and the second infimum over all couples of random variables with marginal laws


µ and ν. From now on, let us assume that the laws of X_0 and X_0′ are in P_2. It can be shown then that
the laws of X_t and X_t′ are also in P_2 for all t ≥ 0. We have, from our previous estimates,

    W_2(Law(X_t), Law(X_t′)) ≤ E(|X_0 − X_0′|^2)^{1/2} e^{−µt}.

By taking in turn the infimum over all couplings of X_0 and X_0′, we get

    W_2(Law(X_t), Law(X_t′)) ≤ W_2(Law(X_0), Law(X_0′)) e^{−µt}.

Finally, recall that we already know that γ = N(0, (σ²/(2µ)) I_d) is an invariant law of our Ornstein –
Uhlenbeck process, in the sense that X_0 ∼ γ implies X_t ∼ γ for all t ≥ 0. It follows that

    W_2(Law(X_t), γ) ≤ W_2(Law(X_0), γ) e^{−µt}.

This is a quantitative version of the long time exponential behavior of the process.
a Named after Leonid Vaseršteı̆n, Russian-American mathematician.
b Named after Leonid Vitaliyevich Kantorovich (1912 – 1986), Soviet mathematician and economist.
c Maurice René Fréchet (1878 – 1973), French mathematician.
d Gaspard Monge, Comte de Péluse (1746 – 1818), French mathematician.
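The synchronous coupling used above can be illustrated numerically: driving two Euler schemes with the same Brownian increments, the noise cancels in the difference and |X_t − X'_t| = |X_0 − X'_0| e^{−µt} holds up to discretization error. A sketch (function name and parameters are ours, standard library only):

```python
import math
import random

def coupled_ou_gap(x0, x0p, sigma, mu, t, steps, seed=3):
    """Euler scheme for two 1-d Ornstein-Uhlenbeck processes driven by the SAME
    Brownian increments; returns |X_t - X'_t| under this synchronous coupling."""
    rng = random.Random(seed)
    h = t / steps
    x, xp = x0, x0p
    for _ in range(steps):
        db = rng.gauss(0, math.sqrt(h))
        x = x + sigma * db - mu * x * h    # same increment db for both processes,
        xp = xp + sigma * db - mu * xp * h # so the noise cancels in x - xp
    return abs(x - xp)

gap = coupled_ou_gap(x0=5.0, x0p=-1.0, sigma=1.0, mu=1.0, t=2.0, steps=20_000)
predicted = abs(5.0 - (-1.0)) * math.exp(-1.0 * 2.0)  # |X_0 - X'_0| e^{-mu t}
```

Note that the output is deterministic: the difference of the two schemes satisfies the noiseless recursion (x − x')_{k+1} = (1 − µh)(x − x')_k, the discrete analogue of d(X_t − X'_t) = −µ(X_t − X'_t) dt.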

Coding in action 8.2.4. Simulation.

Write a code to simulate approximate trajectories of the Ornstein – Uhlenbeck process in dimension
d = 1 and plot them on the same graphics. Hint: use the structure of the increments. What is the
effect of changing µ and σ on the trajectories? Consider in particular the case σ = 0 versus σ > 0, and
the cases µ < 0, µ = 0, and µ > 0. Could you check numerically the exponential convergence in law to
the standard Gaussian as time tends to infinity when σ, µ > 0? See Figure 8.1 for an example of plots.
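A possible solution sketch for d = 1, using the Euler – Maruyama scheme X_{k+1} = X_k + σ ΔB − µ X_k h and comparing the terminal empirical mean and variance with the exact Gaussian law recalled in Example 8.2.2 (names and parameters are ours; plotting is left out):

```python
import math
import random

def ou_euler_paths(x0, sigma, mu, t, steps, n_paths, seed=4):
    """Euler-Maruyama scheme X_{k+1} = X_k + sigma*dB - mu*X_k*h for the 1-d
    Ornstein-Uhlenbeck SDE dX_t = sigma dB_t - mu X_t dt; returns n_paths paths."""
    rng = random.Random(seed)
    h = t / steps
    paths = []
    for _ in range(n_paths):
        x, path = x0, [x0]
        for _ in range(steps):
            x = x + sigma * rng.gauss(0, math.sqrt(h)) - mu * x * h
            path.append(x)
        paths.append(path)
    return paths

x0, sigma, mu, t = 5.0, 1.0, 1.0, 3.0
paths = ou_euler_paths(x0, sigma, mu, t, steps=300, n_paths=5_000)
finals = [p[-1] for p in paths]
emp_mean = sum(finals) / len(finals)
emp_var = sum((v - emp_mean) ** 2 for v in finals) / len(finals)
th_mean = x0 * math.exp(-mu * t)                            # tends to 0
th_var = sigma**2 * (1 - math.exp(-2 * mu * t)) / (2 * mu)  # tends to sigma^2/(2 mu)
```

With σ = 0 the scheme reduces to the deterministic decay x0 (1 − µh)^k; with µ < 0 the trajectories explode exponentially, and with µ = 0 one recovers Brownian motion.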

Example 8.2.5. Bessela processes.


a Named after Friedrich Bessel (1784 - 1846), German astronomer, mathematician, physicist and geodesist.

For all x ∈ R^d, we define the process X = (X_t)_{t≥0} by

    X_t = |x + B_t|^2.

Let r = |x|^2 = X_0. We say that X is a squared Bessel process issued from r. The Itô formula (Theorem
7.1.1) with f(x) = |x|^2, which is C^2(R^d, R), gives, via ∇f(x) = 2x and ∆f(x) = 2d,

    |x + B_t|^2 = |x|^2 + 2 ∫_0^t (x + B_s) · dB_s + t d,  t ≥ 0.

Thus on the canonical space for B, the process X solves the stochastic differential equation

    X_t = r + ∫_0^t σ(s, X_s) dB_s + ∫_0^t b(s, X_s) ds

with the random coefficients σ(s, x, ω) = 2(x + ω(s))^⊤, where ω stands for the Brownian trajectory, and
b(s, x, ω) = d. They are Lipschitz in x and we can use Theorem 8.1.2 to get the existence and pathwise
uniqueness of the solution. Alternatively, the Lévy characterization of Brownian motion (Theorem 7.2.1)
shows that the continuous martingale

    W = ( ∫_0^t ((x + B_s)/|x + B_s|) · dB_s )_{t≥0} = ( ∫_0^t ((x + B_s)/√(X_s)) · dB_s )_{t≥0}

(with the convention 0/|0| = 1) issued from the origin is a Brownian motion on R since, for all t ≥ 0,

    ⟨W⟩_t = ⟨ ∫_0^• ((x + B_s)/|x + B_s|) · dB_s ⟩_t = ∫_0^t (((x + B_s) · (x + B_s))/|x + B_s|^2) d⟨B⟩_s = ∫_0^t ds = t.


Now by writing (x + B_s) · dB_s = √(X_s) dW_s, we see that X solves the stochastic differential equation

    X_t = r + 2 ∫_0^t √(X_s) dW_s + t d.
Note that σ(x) = 2√x is not Lipschitz at x = 0 and Theorem 8.1.2 does not apply. However a theorem
due to Yamada – Watanabe states existence and pathwise uniqueness for the stochastic differential
equation dX_t = σ(X_t) dB_t + b(X_t) dt as soon as σ : R → R and b : R → R satisfy

    |σ(x) − σ(y)| ≤ C √|x − y|  and  |b(x) − b(y)| ≤ C |x − y|

for some C > 0 and all x, y ∈ R. See [31, Exercise 8.14 pages 231 – 232].
The local martingale ∫_0^• √(X_s) dW_s is a martingale since

    E( ∫_0^t (√(X_s))^2 ds ) = ∫_0^t E(|x + B_s|^2) ds < ∞.

Therefore (X_t − t d)_{t≥0} is a martingale. This is not a surprise since it is the sum of the martingales
((x_i + B_t^i)^2 − t)_{t≥0}. More generally, a squared Bessel process of dimension α > 0 solves

    X_t = r + 2 ∫_0^t √(X_s) dW_s + t α.

It is not obvious that such a process stays non-negative. The process Y = √X is known as a Bessel
process. It can be shown that when r > 0 and α > 1, Y solves the following SDE with singular drift:

    dY_t = dW_t + ((α − 1)/2) (dt/Y_t),

see [31, Ex. 8.13]. See also [31, Ex. 5.31 & Sec. 8.4.3] and [43, Ch. XI] for more on Bessel processes.
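Since X_t − t d is a martingale, E(X_t) = r + t d for the squared Bessel process X_t = |x + B_t|²; this can be checked by direct Monte Carlo sampling of B_t (helper name and parameters are ours, standard library only):

```python
import math
import random

def squared_bessel_mean(x, t, n, seed=5):
    """Monte Carlo mean of X_t = |x + B_t|^2 in dimension d = len(x);
    the martingale property of X_t - t*d gives E(X_t) = |x|^2 + t*d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += sum((xi + rng.gauss(0, math.sqrt(t))) ** 2 for xi in x)
    return total / n

x, t = (1.0, 2.0), 0.7                         # r = |x|^2 = 5, d = 2
emp = squared_bessel_mean(x, t, n=200_000)
exact = sum(xi**2 for xi in x) + t * len(x)    # r + t*d
```

Simulating via |x + B_t|² sidesteps the square-root singularity at the origin of the SDE formulation, which is only available for integer dimension.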

Example 8.2.6. Time change.

Let X = (X_t(x))_{t≥0} be the solution of the stochastic differential equation

    X_t = x + ∫_0^t σ(u, X_u) dB_u + ∫_0^t b(u, X_u) du.

Let α > 0. Then the time changed process Y = (X_{αt}(x))_{t≥0} solves the stochastic differential equation

    Y_t = x + ∫_0^{αt} σ(u, X_u) dB_u + ∫_0^{αt} b(u, X_u) du.

Now, denoting B̃ = ((1/√α) B_{αt})_{t≥0}, we get, with the substitution u = αv, that Y is a solution of

    Y_t = x + ∫_0^t √α σ(αv, Y_v) dB̃_v + ∫_0^t α b(αv, Y_v) dv.

Since B̃ and B have the same law, it follows that Y is a weak solution of

    Y_t = x + ∫_0^t √α σ(αu, Y_u) dB_u + ∫_0^t α b(αu, Y_u) du.

For example, if X is an Ornstein – Uhlenbeck process solution of dX_t = σ dB_t − µ X_t dt, with σ ≥ 0 and
µ ∈ R, then for all α > 0 the process Y = (X_{αt})_{t≥0} is a weak solution of the SDE dY_t = √α σ dB_t −
αµ Y_t dt. We speed up (respectively slow down) the process when α > 1 (respectively α < 1).
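For the Ornstein – Uhlenbeck case just mentioned, the claim can be checked on the variance: Var(X_{αt}) for the original process must coincide with the time-t variance of the weak solution with coefficients √α σ and αµ. A minimal analytic check (the function name and parameter values are ours):

```python
import math

def ou_var(sigma, mu, t):
    """Variance of the 1-d OU process dX = sigma dB - mu X dt at time t (X_0 fixed),
    namely sigma^2 (1 - e^{-2 mu t}) / (2 mu)."""
    return sigma**2 * (1 - math.exp(-2 * mu * t)) / (2 * mu)

sigma, mu, alpha, t = 1.3, 0.8, 2.5, 1.7
var_time_changed = ou_var(sigma, mu, alpha * t)                      # Var(X_{alpha t})
var_weak_solution = ou_var(math.sqrt(alpha) * sigma, alpha * mu, t)  # Var(Y_t)
```

Both expressions reduce to σ²(1 − e^{−2µαt})/(2µ), as the time change predicts.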


Example 8.2.7. Langevina process.


a Named after Paul Langevin (1872 – 1946), French physicist.

A unit mass particle in R^d, with position Y and velocity Z, feels a force that depends on its position
via a potential U : R^d → R, a friction force that depends on its velocity via a potential V : R^d → R, and
a random Brownian force of variance σ^2 > 0 due to the medium. The Langevin stochastic differential
equation follows from the Newton fundamental relation of dynamics^a:

    dY_t = Z_t dt
    dZ_t = √γ σ dB_t − γ ∇V(Z_t) dt − ∇U(Y_t) dt.

Here γ ≥ 0 is the “friction” parameter. This is our (SDE) with q = 2d, X = (Y, Z), and coefficients

    σ_{i,j} = √γ σ if 1 ≤ j ≤ d and i = d + j, and σ_{i,j} = 0 otherwise,

and

    b_i(y, z) = z_i if 1 ≤ i ≤ d, and b_i(y, z) = (−γ∇V(z) − ∇U(y))_{i−d} if d + 1 ≤ i ≤ 2d.

They are deterministic, constant in time, and Lipschitz iff ∇U and ∇V are Lipschitz. When U(y) =
(1/2)|y|^2 and V(z) = (1/2)|z|^2, the Langevin process X = (Y, Z) is known as a kinetic Ornstein – Uhlenbeck
process. When γ = 0 there is no randomness and we speak about a Hamiltonian equation.
The position Y_t and the velocity Z_t are coupled in the second equation above via the drift term
−∇U(Y_t) dt. It turns out that they decouple in the limit of a time – friction scaling. Namely, we can
dilate time with a factor α > 0, giving the equation (we keep the same notations for the processes)

    dY_t = α Z_t dt
    dZ_t = √(αγ) σ dB_t − αγ ∇V(Z_t) dt − α ∇U(Y_t) dt.

If α → 0 (slow down the process) and γ → ∞ (high friction) while keeping αγ = 1, we get dY_t = 0 and

    dZ_t = σ dB_t − ∇V(Z_t) dt.

This is known as an overdamped Langevin equation, as a generalized Ornstein – Uhlenbeck equation,
and also as a Kolmogorov equation in [46]. We recover Ornstein – Uhlenbeck when V(z) = |z|^2.
The initial Langevin equation is sometimes called the underdamped or kinetic Langevin equation.
We refer to [12] for a presentation of the physical aspects of Brownian motion and the Langevin equation,
from the historical roots to nowadays physics, see also [17]. Beyond its physical significance, the
underdamped Langevin process is a key ingredient in the Hamiltonian or Hybrid Monte Carlo (HMC)
computational algorithms for the simulation of probability measures, see for instance [32].

a The first equation expresses the fact that the derivative of the position with respect to time is the velocity, while the second expresses that mass × acceleration = dZ_t/dt = sum of forces = √γ σ dB_t/dt − γ∇V(Z_t) − ∇U(Y_t). The term dB_t/dt is a white noise.
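In the Hamiltonian case γ = 0 with d = 1 and U(y) = |y|²/2, the system reduces to the harmonic oscillator dY_t = Z_t dt, dZ_t = −Y_t dt, whose flow after one period 2π returns to the initial condition. A symplectic Euler sketch (function name, step count, and initial data are ours) recovers this behavior:

```python
import math

def hamiltonian_flow(y0, z0, t, steps):
    """Symplectic Euler for the gamma = 0 (Hamiltonian) case of the Langevin system
    with U(y) = |y|^2/2 in dimension 1: dY = Z dt, dZ = -Y dt (harmonic oscillator)."""
    h = t / steps
    y, z = y0, z0
    for _ in range(steps):
        z = z - y * h  # update velocity first (symplectic Euler)
        y = y + z * h
    return y, z

y0, z0, t = 1.0, 0.0, 2 * math.pi
y, z = hamiltonian_flow(y0, z0, t, steps=100_000)
# exact flow: y(t) = y0*cos(t) + z0*sin(t), z(t) = -y0*sin(t) + z0*cos(t)
```

The symplectic update preserves a modified energy exactly, which is why the numerical orbit closes instead of spiraling as a naive explicit Euler scheme would.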

Example 8.2.8. Geometric Brownian Motion and Blacka – Scholesb process.


a Fischer Black, American economist (1938 – 1995).
b Myron Scholes, Canadian-American financial economist (1941 – ).

The Black – Scholes process solves the SDE

dS t = S t (σt dB t + µt dt ), S 0 > 0.

By using the Itô formula for log(S), we obtain

    S_t = S_0 exp( ∫_0^t σ_s dB_s − (1/2) ∫_0^t σ_s^2 ds + ∫_0^t µ_s ds ).


In the original model, σ and µ are constants and we find the geometric Brownian motion

    S_t = S_0 e^{σ B_t − (σ^2/2) t + µ t}.

This was used historically in the study of European options pricing, see [30, 25]. The usage of stochas-
tic calculus in mathematical finance is widely developed in specialized courses of the Master MASEF.

Coding in action 8.2.9. Simulation.

Write a code to simulate approximate trajectories of the Bessel processes with various parameters,
and for the Black – Scholes process. How to deal with the singularity at the origin of the Bessel SDE
for non integer parameter? Do the same for the overdamped Langevin process with potential V = |·|4
by using an Euler scheme for the SDE.
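For the Black – Scholes part of the prompt, one way to validate an Euler scheme is against the exact geometric Brownian motion: with constant coefficients both should reproduce E(S_t) = S_0 e^{µt}. A sketch (function names and parameters are ours, standard library only):

```python
import math
import random

def gbm_exact(s0, sigma, mu, t, n, seed=6):
    """Exact samples of geometric BM: S_t = S_0 exp(sigma*B_t - sigma^2 t/2 + mu t)."""
    rng = random.Random(seed)
    return [s0 * math.exp(sigma * rng.gauss(0, math.sqrt(t))
                          - 0.5 * sigma**2 * t + mu * t) for _ in range(n)]

def gbm_euler(s0, sigma, mu, t, steps, n, seed=7):
    """Euler scheme S_{k+1} = S_k (1 + sigma*dB + mu*h) for dS = S(sigma dB + mu dt)."""
    rng = random.Random(seed)
    h = t / steps
    out = []
    for _ in range(n):
        s = s0
        for _ in range(steps):
            s *= 1 + sigma * rng.gauss(0, math.sqrt(h)) + mu * h
        out.append(s)
    return out

s0, sigma, mu, t = 1.0, 0.2, 0.05, 1.0
mean_exact = sum(gbm_exact(s0, sigma, mu, t, 100_000)) / 100_000
mean_euler = sum(gbm_euler(s0, sigma, mu, t, 100, 10_000)) / 10_000
target = s0 * math.exp(mu * t)  # E(S_t) = S_0 e^{mu t}
```

For the Bessel SDE with non-integer parameter, a common workaround for the square-root singularity is to replace √X by √max(X, 0) in the Euler step; this is a pragmatic fix, not part of the theory above.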

8.3 Markov property, Markov semi-group, weak uniqueness

In this section, we assume that σ and b are deterministic (do not depend on ω) and that:

• there exists a constant C > 0 such that for all x, y ∈ Rq and all u ∈ R+ ,

|σ(u, x) − σ(u, y)| + |b(u, x) − b(u, y)| ≤ C |x − y|

• σ and b are measurable maps from R+ × Rq to Mq,d (R) and Rq respectively

• for all t > 0 and all x ∈ R^q,

    ∫_0^t (|σ(u, x)|^2 + |b(u, x)|^2) du < ∞.

With these simplified assumptions, the initial assumptions (Lip), (Mes), and (Int) are satisfied. For all s ≥ 0,
we denote by (X_t^s(x, ω))_{x∈R^q, t≥s} the regular solution of (SDE) provided by Theorem 8.1.5:

    X_s^s(x) = x,  dX_t^s(x) = σ(t, X_t^s(x)) dB_t + b(t, X_t^s(x)) dt,  t ≥ s.

Theorem 8.3.1. Weak Markov property.

For all s ≥ 0 and x ∈ Rq , let (X ts (x))t ≥s be the solution of (SDE) with η = x. Then for all bounded and
measurable f : Rq → R, and for all u ≥ t ≥ s, almost surely

E( f (X us (x)) | Ft ) = E( f (X us (x)) | X ts (x)) = Πt ,u ( f )(X ts (x)),

where for all z ∈ Rq ,


Πt ,u ( f )(z) = E( f (X ut (z))).

A Markov process has no memory, in the sense that for any time t, interpreted as the present, the conditioning
of its future with respect to its present and past is equal to the conditioning with respect to the
present alone. This is equivalent to conditional independence of future and past given the present. On the other
hand, a process with long memory can always be seen as a Markov process by seeing the whole trajectory
up to the present as a state; simple examples are provided by ARMA time series and higher order Markov chains for instance.
Proof. A key observation is that the stochastic integral ∫_t^u σ(v, X_v^t) dB_v involves the increments of B after
time t and is thus independent of the portion of B before time t. For all z ∈ R^q, almost surely,

    X_u^t(z) = z + ∫_t^u σ(v, X_v^t(z)) dB_v + ∫_t^u b(v, X_v^t(z)) dv
             = z + ∫_0^{u−t} σ(t + v, X_{t+v}^t(z)) dB_v^t + ∫_0^{u−t} b(t + v, X_{t+v}^t(z)) dv,


where B^t = (B_v^t)_{v≥0} = (B_{t+v} − B_t)_{v≥0} is a translated Brownian motion, independent of F_t. Corollary 8.1.6
gives X_u^s = X_u^t(X_t^s(x)) = F_{t,u}(X_t^s(x), B^t) where F_{t,u} is measurable. Since X_t^s(x) depends only on (B_v)_{s≤v≤t}, it
is F_t-measurable and independent of B^t. Hence, for all bounded measurable f : R^q → R, by Remark 1.5.2,

    E( f(X_u^s) | F_t ) = E( f(F_{t,u}(X_t^s, B^t)) | F_t ) = E( f(F_{t,u}(X_t^s, B^t)) | X_t^s ) = Π_{t,u}(f)(X_t^s),

where for all z ∈ R^q,

    Π_{t,u}(f)(z) = E( f(F_{t,u}(z, B^t)) ) = E( f(X_u^t(z)) ).

An explicit construction of F_{t,u} can be done on the Wiener space. ■

Remark 8.3.2. Markov transition kernel and Markov semi-group.

For all 0 ≤ s ≤ t , let Πs,t (x, dy) be the Markov transition kernel on Rq given for x ∈ Rq and A ∈ BRq by

Πs,t (x, A) = P(X ts (x) ∈ A).

It acts on bounded measurable f : R^q → R as

    Π_{s,t}(f)(x) = ∫_{R^q} f(y) Π_{s,t}(x, dy),  x ∈ R^q.

Theorem 8.3.1 gives, for u ≥ t , Πs,u = Πs,t ◦ Πt ,u in the sense that for all f and x we have

Πs,u ( f )(x) = E( f (X us (x))) = E(E( f (X us (x)) | Ft )) = E(Πt ,u ( f )(X ts (x))) = Πs,t (Πt ,u ( f ))(x),

and Theorem 8.3.1 with f replaced by Πt ,u ( f ) gives that the process

(Πt ,u ( f )(X ts (x)))t ∈[s,u]

is an (F_t)_{t∈[s,u]} martingale. This gives the (non-homogeneous) Markov semi-group property:

    Π_{s,u}(x, dy) = ∫_{R^q} Π_{s,t}(x, dz) Π_{t,u}(z, dy),  u ≥ t ≥ s ≥ 0, x ∈ R^q.

Conversely, the Markov semi-group (Π_{s,t}(x, dy))_{0≤s≤t} fully determines the law of (X_t^s(x))_{t≥s}. Indeed,
for all n ≥ 1, 0 ≤ s ≤ t_1 ≤ t_2 ≤ ··· ≤ t_n, and bounded and measurable f_1, ..., f_n from R^q to R, we have

    E( f_1(X_{t_1}^s(x)) ··· f_n(X_{t_n}^s(x)) ) = E( f_1(X_{t_1}^s(x)) ··· f_{n−1}(X_{t_{n−1}}^s(x)) Π_{t_{n−1},t_n}(f_n)(X_{t_{n−1}}^s(x)) )
    = ∫_{R^q} Π_{s,t_1}(x, dy_1) Π_{t_1,t_2}(y_1, dy_2) ··· Π_{t_{n−1},t_n}(y_{n−1}, dy_n) f_1(y_1) ··· f_n(y_n).

Theorem 8.3.3. Uniqueness in law or weak uniqueness.

Let (Ω̃, F̃, (F̃_t)_{t≥0}, P̃) be another filtered probability space on which is defined a d-dimensional
(F̃_t)_{t≥0} Brownian motion B̃ = (B̃_t)_{t≥0} issued from the origin. Let x ∈ R^q, and let X = (X_t(x, ω))_{t≥0}
and X̃ = (X̃_t(x, ω̃))_{t≥0} be the solutions of the respective stochastic differential equations:

    X_t(x) = x + ∫_0^t σ(u, X_u(x)) dB_u + ∫_0^t b(u, X_u(x)) du  a.s., t ≥ 0,

and

    X̃_t(x) = x + ∫_0^t σ(u, X̃_u(x)) dB̃_u + ∫_0^t b(u, X̃_u(x)) du  a.s., t ≥ 0.

Then these processes X and X̃ have the same law on (C(R_+, R^q), B_{C(R_+,R^q)}).

Proof. Since σ and b do not depend on the randomness, regarding weak solutions, we can play with the
probability space. We consider the canonical Brownian motion π = (π_t(ω))_{t≥0} defined on the Wiener space

    (W = C(R_+, R^d), B_W, (F_t)_{t≥0}, µ),


where µ is the Wiener measure. Let (Y_t(x, w))_{t≥0, w∈W} be the regular solution provided by Theorem 8.1.5 of
the stochastic differential equation

    Y_t(x) = x + ∫_0^t σ(u, Y_u(x)) dπ_u + ∫_0^t b(u, Y_u(x)) du  µ almost surely.

We can check easily that the processes Z_t(x, ω) = Y_t(x, B(ω)), t ≥ 0, ω ∈ Ω, and Z̃_t(x, ω̃) = Y_t(x, B̃(ω̃)), t ≥ 0,
ω̃ ∈ Ω̃, are respectively solutions of the SDEs satisfied by X and X̃. The pathwise uniqueness of these solutions
provided by Theorem 8.1.2 gives that (Y_t(x, B(ω)))_{t≥0} = (X_t(x, ω))_{t≥0} P-a.s. and (Y_t(x, B̃(ω̃)))_{t≥0} = (X̃_t(x, ω̃))_{t≥0}
P̃-a.s. But the Brownian motions B and B̃ have the same law on W = C(R_+, R^d), which is the Wiener measure µ,
and therefore the processes X, X̃, and (Y_t)_{t≥0} have the same law on C(R_+, R^q). ■

8.4 Martingale, generator, Kolmogorov equations, strong Markov property, Girsanov theorem

In this section, we consider the deterministic case and we assume furthermore that σ(t , x) and b(t , x) do
not depend on the time variable t , in other words σ and b are two deterministic maps from Rq to Mq,d (R)
and Rq respectively. This case is also referred to as the deterministic and time homogeneous case.
We denote by (X_t(x))_{t≥0} = (X_t^0(x))_{t≥0} the regular solution of the SDE provided by Theorem 8.1.5:

    X_t(x) = x + ∫_0^t σ(X_u(x)) dB_u + ∫_0^t b(X_u(x)) du  a.s., t ≥ 0, x ∈ R^q.

Theorem 8.4.1. Simple Markov property.

For all u ≥ t ≥ 0 and all measurable and bounded f : Rq 7→ R,

E( f (X u (x)) | Ft ) = E( f (X u (x)) | X t (x)) = Πu−t ( f )(X t (x)) a.s.

where for all s ≥ 0 and x ∈ R^q,

    Π_s(f)(x) = E( f(X_s(x)) ).

In the case of the Ornstein – Uhlenbeck process of Example 8.2.2, we have the “Mehler formula”

    Π_t(f)(x) = E( f( x e^{−µt} + σ √((1 − e^{−2µt})/(2µ)) Z ) )  where  Z ∼ N(0, I_d).

Note that with σ = 1 and µ → 0 we recover the heat kernel formula for Brownian motion, namely

    E( f(x + B_t) ) = E( f(x + √t Z) ).

Proof. Thanks to Theorem 8.3.1 with s = 0, it suffices to show that for all u ≥ t ≥ 0,

    E( f(X_u^t(x)) ) = E( f(X_{u−t}^0(x)) ).

But

    X_u^t(x) = x + ∫_0^{u−t} σ(X_{t+s}^t(x)) dB_s^t + ∫_0^{u−t} b(X_{t+s}^t(x)) ds,

where B_s^t = B_{t+s} − B_t for all s ≥ 0; in other words, setting Y_s(x) = X_{t+s}^t(x),

    Y_s(x) = x + ∫_0^s σ(Y_u(x)) dB_u^t + ∫_0^s b(Y_u(x)) du  a.s., s ≥ 0.

Thus the process Y(x) solves a stochastic differential equation similar to the one solved by X(x), obtained
by replacing the Brownian motion B by the translated Brownian motion B^t. From the weak uniqueness
property (Theorem 8.3.3), it follows that the processes X(x) and Y(x) have the same law, and thus, for all s ≥ 0,

    E( f(X_{t+s}^t(x)) ) = E( f(X_s^0(x)) ). ■


For all t ≥ 0, let Π_t(·, dy) be the Markov transition kernel on R^q defined by

    Π_t(x, A) = P(X_t(x) ∈ A),  x ∈ R^q, A ∈ B_{R^q}.

It acts on a bounded or positive measurable function f : R^q → R as

    Π_t(f)(x) = ∫_{R^q} f(y) Π_t(x, dy) = E( f(X_t(x)) ),  x ∈ R^q.

On bounded measurable functions, it defines a homogeneous Markov semi-group (Π_t(x, dy))_{t≥0}:

    Π_0 = Id,  Π_s ∘ Π_t = Π_{t+s},  s, t ≥ 0.

In other words, for all s, t ≥ 0 and all x ∈ R^q,

    Π_{s+t}(x, dy) = ∫_{R^q} Π_t(x, dz) Π_s(z, dy) = (Π_t Π_s)(x, dy).

Theorem 8.4.2. Markov semi-group properties.

For all t ≥ 0 the operator Π_t preserves globally

1. the set M(R^q, R) of bounded and measurable functions R^q → R,

2. the set C_b(R^q, R) of bounded and continuous functions R^q → R,

3. the set C_0(R^q, R) of bounded and continuous functions R^q → R vanishing at infinity, provided however that the coefficients σ and b are bounded.

The stability of the bounded continuous functions is known as the Feller^3 continuity.
The boundedness condition for the last property is very simple but too restrictive: for instance the Ornstein –
Uhlenbeck process satisfies the property, as one can check using the Mehler formula and dominated
convergence, while its drift is not bounded.

Proof.
1. Immediate for a Markov transition kernel.

2. We need to establish the preservation of continuity. Let t ≥ 0, f ∈ C_b(R^q, R), and x = lim_{n→∞} x_n in R^q. We have

    |Π_t(f)(x_n) − Π_t(f)(x)| = |E( f(X_t(x_n)) ) − E( f(X_t(x)) )|.

Since by Theorem 8.1.3, E(|X_t(x_n) − X_t(x)|^2) ≤ C_t |x_n − x|^2, it follows that lim_{n→∞} X_t(x_n) = X_t(x) in L^2,
and thus in law, and therefore lim_{n→∞} Π_t(f)(x_n) = Π_t(f)(x), which implies Π_t(f) ∈ C_b(R^q, R).

3. It suffices to establish the preservation of nullity at infinity. Let f ∈ C_0(R^q, R) and ε > 0. There exists
A > 0 such that for all y ∈ R^q with |y| > A, we have |f(y)| < ε. Let (X_t(x))_{t≥0} be the solution of the
stochastic differential equation associated to the semi-group, namely

    X_t(x) = x + ∫_0^t σ(X_s(x)) dB_s + ∫_0^t b(X_s(x)) ds.

We have, for all x ∈ R^q such that |x| > B > A, using the Markov inequality and the Itô isometry,

    |E( f(X_t(x)) )| ≤ E( |f(X_t(x))| 1_{|X_t(x)|>A} ) + ∥f∥_∞ P(|X_t(x)| ≤ A)
                    ≤ ε + ∥f∥_∞ P( |∫_0^t σ(X_s(x)) dB_s + ∫_0^t b(X_s(x)) ds| ≥ B − A )
                    ≤ ε + (∥f∥_∞/(B − A)^2) E( |∫_0^t σ(X_s(x)) dB_s + ∫_0^t b(X_s(x)) ds|^2 )
                    ≤ ε + 2 (∥f∥_∞/(B − A)^2) ( ∥|σ|∥_∞^2 t + (∥|b|∥_∞ t)^2 )
                    ≤ 2ε  for B sufficiently large. ■


3 Named after William Feller (1906 – 1970), Croatian-American mathematician specializing in probability theory.


Remark 8.4.3. Brownian motion and the Laplacian.

If f : R^q → R is C^2 then the Itô formula gives, for all x ∈ R^q and t > 0,

    f(x + B_t) = f(x) + ∫_0^t ∇f(x + B_s) · dB_s + (1/2) ∫_0^t (∆f)(x + B_s) ds.

If f has bounded first and second order derivatives then in particular the stochastic integral term is
a centered martingale and the second integral is integrable, and by taking the expectation we get

    E( f(x + B_t) ) = f(x) + ∫_0^t E( (1/2)(∆f)(x + B_s) ) ds.

If we denote Π_t(f)(x) = E( f(x + B_t) ), this implies

    ∂_t Π_t(f)(x) |_{t=0} = (1/2) ∆f(x).
Alternatively, if we do not know the Itô formula and if f : R^q → R has third order derivatives with
bounded second order derivatives (Hessian), then a Taylor formula gives

    f(x + h) = f(x) + Σ_{i=1}^q (∂f/∂x_i)(x) h_i + (1/2) Σ_{i,j=1}^q (∂²f/∂x_i∂x_j)(x) h_i h_j + Σ_{k_1,...,k_q ≥ 0, k_1+···+k_q = 3} r_{x,k}(h) h_1^{k_1} ··· h_q^{k_q},

where r_{x,k} is continuous with lim_{h→0} r_{x,k}(h) = 0 and sup_{h∈R^q} |r_{x,k}(h)| < ∞. Taking h = B_t, t ≥ 0, and
the expectation, using the independence of the components B_t^1, ..., B_t^q of B_t, their mean 0, variance
t, third absolute moment of order t^{3/2} = o(t), and dominated convergence for the remainder term,

    E( f(x + B_t) ) = f(x) + (1/2) t ∆f(x) + o(t).

In other words, here again, the partial derivative of Πt(f)(x) = E(f(x + B_t)) at t = 0 is equal to ½ Δf(x).
We will see that more generally, we can associate to the solution of (SDE) a second order partial
differential operator, involved in formulas related to the time derivative of the Markov semigroup.
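The first computation can be sanity-checked numerically: for the quadratic function f(y) = |y|² we have ½Δf ≡ q and E(f(x + B_t)) = f(x) + qt exactly, so a Monte Carlo average over samples of x + B_t should reproduce f(x) + t·½Δf(x). A minimal sketch (dimension, time and sample size are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

q, t = 2, 0.25
x = np.array([1.0, -1.0])

def f(y):
    # f(y) = |y|^2, so (1/2) * Laplacian(f) = q everywhere
    return np.sum(y * y, axis=-1)

# N samples of x + B_t, with B_t ~ N(0, t I_q)
N = 200_000
samples = x + rng.normal(scale=np.sqrt(t), size=(N, q))

mc_estimate = f(samples).mean()   # Monte Carlo estimate of E(f(x + B_t))
exact = f(x) + q * t              # f(x) + t * (1/2) Delta f(x), exact here
```

For this particular f the expansion is exact (no o(t) term), which makes the comparison sharp up to Monte Carlo error.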

Let C²(Rq, R) be the space of functions Rq → R of class C², in other words twice differentiable with
continuous second derivative (Hessian). We define the second order linear partial differential operator without
constant term L : C²(Rq, R) → C(Rq, R) by, for all f ∈ C²(Rq, R) and all x ∈ Rq,

L(f)(x) = ½ Σ_{i,j=1}^q a_{i,j}(x) (∂²f/∂x_i∂x_j)(x) + Σ_{i=1}^q b_i(x) (∂f/∂x_i)(x),   (L)

where b(x) = (b₁(x), …, b_q(x)) and a(x) = σ(x)(σ(x))⊤, in other words

a_{i,j}(x) = Σ_{k=1}^d σ_{i,k}(x) σ_{j,k}(x).

For all x ∈ Rq , the matrix a(x) is symmetric, and positive4 since for all y ∈ Rq ,

〈a(x)y, y〉 = |σ(x)⊤ y|2 ≥ 0.

We say that L is elliptic when the inequality is > 0 for all x and all y ̸= 0, and sub-elliptic otherwise. There are
also other notions such as uniformly elliptic and hypo-elliptic, which are outside the scope of this course.
If L is elliptic in the sense that a(x) has full rank d for all x ∈ Rq, then there exist linearly independent
vector fields V₀, V₁, …, V_d on Rq such that L = V₁² + ··· + V_d² + V₀, see for instance [4, Proposition 6.32].
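The positivity of a(x) used above, ⟨a(x)y, y⟩ = |σ(x)⊤y|² ≥ 0, is immediate to check numerically, for instance with a random rectangular σ (illustrative sizes q = 3, d = 2, so this a is sub-elliptic):

```python
import numpy as np

rng = np.random.default_rng(1)
q, d = 3, 2
sigma = rng.normal(size=(q, d))   # a q x d diffusion matrix (here d < q, so a has rank at most d)
a = sigma @ sigma.T               # a = sigma sigma^T, a q x q symmetric matrix

y = rng.normal(size=q)
lhs = y @ a @ y                   # <a(x) y, y>
rhs = np.sum((sigma.T @ y) ** 2)  # |sigma(x)^T y|^2
```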
4 When the inequality is strict whenever y ̸= 0 it is customary in matrix analysis to say that a(x) is positive definite.

8 Stochastic differential equations

Example 8.4.4. Langevin, Ornstein – Uhlenbeck, Brownian motion.

From Example 8.2.2 & 8.2.7, we get

Process               SDE                          Operator
Brownian motion       dX_t = dB_t                  L(f)(x) = ½(Δf)(x)
Ornstein – Uhlenbeck  dX_t = σdB_t − μX_t dt       L(f)(x) = (σ²/2)(Δf)(x) − μx·∇f(x)
Overdamped Langevin   dZ_t = σdB_t − ∇V(Z_t)dt     L(f)(x) = (σ²/2)(Δf)(x) − ∇V(x)·∇f(x)

For the underdamped Langevin, we find L(f)(y, z) = γ(σ²/2)(Δ_z f) − γ∇V(z)·∇_z f − ∇U(y)·∇_z f + ∇V(z)·∇_y f.

Theorem 8.4.5. Martingale, generator, Duhamela formula, and Kolmogorov equation.


a Named after Jean-Marie Duhamel (1797 – 1872), French mathematician.

Let x ∈ Rq and let (X t (x))t ≥0 be the regular solution of the SDE as in Theorem 8.4.1.

• For all f ∈ C²(Rq, R), the following process is an (Ft)t≥0 local martingale issued from the origin:

M^f = (M_t^f)_{t≥0} = ( f(X_t(x)) − f(x) − ∫₀ᵗ L(f)(X_s(x)) ds )_{t≥0},

where L is the differential operator defined in (L) and where

⟨M^f⟩ = ∫₀^• Γ(f)(X_s(x)) ds   where   Γ(f)(x) = |σ(x)⊤∇f(x)|² = a(x)∇f(x)·∇f(x).

• For all f ∈ C_b²(Rq, R), in other words with bounded first and second order derivatives, M^f is
a martingale with respect to the filtration (Ft)t≥0 and we have the Duhamel formula:

Πt(f)(x) = f(x) + ∫₀ᵗ Πs(L(f))(x) ds,   t ≥ 0,

in particular we obtain the Kolmogorov equation

∂t Πt ( f ) = Πt (L f ).

If L f = 0, we say that f is harmonic with respect to L, and M f = ( f (X t ) − f (x))t ≥0 is a local martingale.


The functional quadratic form Γ is known as the carré du champ operator of the Markov process X. The
formula Γ(f) = L(f²) − 2fL(f) is at the heart of [2]. When σ is constant and equal to I_q then Γ(f) = |∇f|².
We say that X solves a Stroock – Varadhan martingale problem with respect to the differential operator
L, see [47, 21, 15, 24].
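With the normalization of (L) (note the ½ in front of a), a direct computation gives L(f²) = 2fL(f) + a∇f·∇f pointwise, so Γ(f) = a∇f·∇f can equally be evaluated as L(f²) − 2fL(f). The sketch below checks this identity by finite differences for an illustrative choice of constant σ, b and a smooth f (all numerical values are ad hoc):

```python
import numpy as np

# Illustrative constant coefficients in dimension q = 2 (d = 2)
sigma = np.array([[1.0, 0.5], [0.0, 2.0]])
a = sigma @ sigma.T
b = np.array([0.3, -0.7])

def f(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

def L(g, x, h=1e-3):
    """Finite-difference evaluation of L(g)(x) for the operator (L)."""
    q = len(x)
    e = np.eye(q)
    val = 0.0
    for i in range(q):
        # first derivative term b_i d_i g (central difference)
        di = (g(x + h * e[i]) - g(x - h * e[i])) / (2 * h)
        val += b[i] * di
        for j in range(q):
            if i == j:
                dij = (g(x + h * e[i]) - 2 * g(x) + g(x - h * e[i])) / h**2
            else:
                dij = (g(x + h * (e[i] + e[j])) - g(x + h * (e[i] - e[j]))
                       - g(x + h * (e[j] - e[i])) + g(x - h * (e[i] + e[j]))) / (4 * h**2)
            val += 0.5 * a[i, j] * dij
    return val

x = np.array([0.4, -1.1])
grad_f = np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])  # exact gradient of f

carre = L(lambda y: f(y) ** 2, x) - 2 * f(x) * L(f, x)  # L(f^2) - 2 f L(f)
gamma = grad_f @ a @ grad_f                              # a grad(f) . grad(f)
```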

Proof. Let X^i(x), 1 ≤ i ≤ q, be the coordinate processes of the process X(x). Let f ∈ C²(Rq, R). From the
SDE, X^i(x) is a continuous semi-martingale with martingale part

M^i = ∫₀^• σ_{i,•}(X_s(x)) dB_s = Σ_{k=1}^d ∫₀^• σ_{i,k}(X_s(x)) dB_s^k

and finite variation part ∫₀^• b_i(X_s(x)) ds. Now the idea is to use the Itô formula for f(X(x)) and to collect all
the non-martingale parts into an operator. The Itô formula of Theorem 7.1.1 applied to f(X_t(x)) gives

f(X_t(x)) = f(x) + Σ_{i=1}^q ∫₀ᵗ (∂f/∂x_i)(X_s(x)) b_i(X_s(x)) ds + Σ_{i=1}^q ∫₀ᵗ (∂f/∂x_i)(X_s(x)) dM_s^i + ½ Σ_{i,j=1}^q ∫₀ᵗ (∂²f/∂x_i∂x_j)(X_s(x)) d⟨M^i, M^j⟩_s.


But now, since we have

⟨M^i, M^j⟩_t = Σ_{k,ℓ=1}^d ∫₀ᵗ σ_{i,k}(X_s(x)) σ_{j,ℓ}(X_s(x)) d⟨B^k, B^ℓ⟩_s = Σ_{k=1}^d ∫₀ᵗ σ_{i,k}(X_s(x)) σ_{j,k}(X_s(x)) ds = ∫₀ᵗ a_{i,j}(X_s(x)) ds,

we get that

M_t^f = f(X_t(x)) − f(x) − ∫₀ᵗ L(f)(X_s(x)) ds = Σ_{i=1}^q ∫₀ᵗ (∂f/∂x_i)(X_s(x)) dM_s^i = Σ_{i=1}^q Σ_{k=1}^d ∫₀ᵗ (∂f/∂x_i)(X_s(x)) σ_{i,k}(X_s(x)) dB_s^k

is an (Ft)t≥0 local martingale. Note that in particular, for all 1 ≤ i ≤ q,

⟨M^i⟩ = ⟨M^i, M^i⟩ = ∫₀^• a_{i,i}(X_s(x)) ds = ∫₀^• Σ_{k=1}^d σ²_{i,k}(X_s(x)) ds.

Moreover, using the fact that ⟨B^k, B^{k′}⟩_s = s 1_{k=k′},

⟨M^f⟩_t = ∫₀ᵗ Σ_{k=1}^d Σ_{i,j=1}^q (∂f/∂x_i)(X_s(x)) (∂f/∂x_j)(X_s(x)) σ_{i,k}(X_s(x)) σ_{j,k}(X_s(x)) ds.

Furthermore, if now f ∈ C_b²(Rq, R), then for all 1 ≤ i ≤ q, by using the boundedness of ∂_i f, the Fubini –
Tonelli theorem, and the square integrability of σ_{i,k}(X_s(x)) which comes from (Lip) and Theorem 8.1.2,

E ∫₀ᵗ ( (∂f/∂x_i)(X_s(x)) )² d⟨M^i⟩_s = ∫₀ᵗ E( ( (∂f/∂x_i)(X_s(x)) )² Σ_{k=1}^d σ²_{i,k}(X_s(x)) ) ds < ∞,

and thus M^f is a martingale as a finite sum of martingales. Finally, when M^f is a martingale, the initial
condition M₀^f = 0 gives E(M_t^f) = 0 for all t ≥ 0, and the Duhamel formula follows from the expression of M^f
by taking expectations and using the Fubini – Tonelli theorem. ■

Corollary 8.4.6. Infinitesimal generator of Markov semi-group.

The following properties hold true:

1. Continuity. For all f ∈ C₀(Rq, R), lim_{t→0⁺} ∥Πt(f) − f∥∞ = 0.

2. Differentiability. For all f ∈ C_c²(Rq, R), in other words C²(Rq, R) with compact support,

lim_{t→0⁺} ∥ (Πt(f) − f)/t − L(f) ∥∞ = 0.

We say that L is the infinitesimal generator of the semigroup (Πt )t ≥0 , and formally Πt = et L .

Proof.
1. The Duhamel formula of Theorem 8.4.5 gives, for all g ∈ C_b²(Rq, R),

Πt(g)(x) − g(x) = ∫₀ᵗ Πs(L(g))(x) ds = E ∫₀ᵗ L(g)(X_s(x)) ds,

and thus ∥Πt(g) − g∥∞ ≤ t∥L(g)∥∞ → 0 as t → 0⁺. Now if f ∈ C₀(Rq, R), then, for all ε > 0, there exists
g ∈ C_b²(Rq, R) such that ∥f − g∥∞ ≤ ε, and it follows then that for t > 0 small enough,

∥Πt(f) − f∥∞ ≤ ∥Πt(f − g)∥∞ + ∥Πt(g) − g∥∞ + ∥g − f∥∞ ≤ 2ε + ∥Πt(g) − g∥∞ ≤ 3ε.

Let us detail an approximation argument to construct g. Since f ∈ C₀(Rq, R), by the Heine theorem,
f is uniformly continuous and thus there exists η > 0 such that for all x, y ∈ Rq, if |x − y| ≤ η then
|f(x) − f(y)| ≤ ε. Next, let ρ ∈ C^∞(Rq, R₊) be a compactly supported probability density function with
support included in the ball {z ∈ Rq : |z| ≤ η}. We have g = f ∗ ρ ∈ C_b^∞(Rq, R), and, for all x ∈ Rq,

|f(x) − g(x)| ≤ ∫_{Rq} |f(x) − f(y)| ρ(x − y) dy = ∫_{|x−y|≤η} |f(x) − f(y)| ρ(x − y) dy ≤ ε.


2. For all f ∈ C_c²(Rq, R) and all t > 0, we have, from the Duhamel formula, using the first part for the last step,

∥ (Πt(f) − f)/t − L(f) ∥∞ = ∥ (1/t) ∫₀ᵗ (Πs(L(f)) − L(f)) ds ∥∞ ≤ sup_{s∈[0,t]} ∥Πs(L(f)) − L(f)∥∞ → 0 as t → 0⁺

(we have used the fact that f ∈ C_c² gives L(f) ∈ C_c ⊂ C₀).
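For the one-dimensional Ornstein – Uhlenbeck semigroup, the generator property of Corollary 8.4.6 can be checked in closed form: for f(x) = x² one has Πt(f)(x) = x²e^{−2μt} + σ²(1 − e^{−2μt})/(2μ) and L(f)(x) = σ² − 2μx², so (Πt(f)(x) − f(x))/t should converge to L(f)(x) as t → 0⁺. A sketch (parameters illustrative):

```python
import numpy as np

sigma, mu, x = 1.5, 0.8, 0.6

def Pi(t, x):
    # Pi_t(f)(x) = E(f(X_t(x))) for f(y) = y^2 and the OU process dX = sigma dB - mu X dt:
    # X_t(x) is Gaussian with mean x e^{-mu t} and variance sigma^2 (1 - e^{-2 mu t}) / (2 mu)
    mean = x * np.exp(-mu * t)
    var = sigma**2 * (1 - np.exp(-2 * mu * t)) / (2 * mu)
    return mean**2 + var

Lf = sigma**2 - 2 * mu * x**2          # L(f)(x) = (sigma^2/2) f''(x) - mu x f'(x) for f(y) = y^2

t = 1e-5
ratio = (Pi(t, x) - Pi(0.0, x)) / t    # should be close to L(f)(x) for small t
```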


Theorem 8.4.7. Strong Markov property.

Let x ∈ Rq and (X t (x))t ≥0 be the regular solution of the SDE as in Theorem 8.4.1. Let T be an (Ft )t ≥0
stopping time and let FT be its stopping σ-algebra.

1. For all bounded measurable f : Rq → R, and all t ≥ 0,

E( f (X T +t (x))1T <∞ | FT ) = Πt ( f )(X T (x))1T <∞ .

2. For all bounded measurable Φ : C(R₊, Rq) → R,

E(Φ((X T +s (x))s≥0 )1T <∞ | FT ) = Ψ(X T (x))1T <∞

where the measurable function Ψ : Rq → R is defined for all y ∈ Rq by

Ψ(y) = EΦ((X s (y))s≥0 ).

When T = s (deterministic) then we recover the weak Markov property (Theorem 8.3.1).
If d = q, σ = I d , and b = 0, we recover the strong Markov property for Brownian motion (Theorem 3.5.1).

This can be skipped at first reading.

Proof.

1. Suppose first that T takes its values in an at most countable set T ⊂ [0, ∞]. We have to show
that for all A ∈ FT and for all t ≥ 0,

E( f (X T +t )1 A∩{T <∞} ) = E(Πt ( f )(X T (x))1 A∩{T <∞} ).

Indeed, using the simple Markov property of Theorem 8.4.1, the left hand side is equal to

Σ_{r∈T\{∞}} E( f(X_{r+t}(x)) 1_{A∩{T=r}} ) = Σ_{r∈T\{∞}} E( Πt(f)(X_r(x)) 1_{A∩{T=r}} ) = E( Πt(f)(X_T(x)) 1_{A∩{T<∞}} ).

Suppose now that T takes arbitrary values in [0, ∞]. It suffices to prove the desired property for
all bounded continuous f. Let us define, for all n ≥ 0, the discretized stopping time

T_n = Σ_{k≥0} ((k+1)/2ⁿ) 1_{[k/2ⁿ, (k+1)/2ⁿ)}(T) + ∞ 1_{T=∞}.

We have Tn ↘ T . For all n ≥ 0 and all A ∈ FTn , we get, from the first part of the proof,

E( f (X Tn +t (x))1 A∩{Tn <∞} ) = E(Πt ( f )(X Tn (x))1 A∩{Tn <∞} ).

By letting n → ∞ and using the right-continuity of X and dominated convergence, we obtain,

E( f (X T +t (x))1 A∩{T <∞} ) = E(Πt ( f )(X T (x))1 A∩{T <∞} ),

where we also used Theorem 8.4.2 about the continuity of Πt to get

Πt(f)(X_{T_n}(x)) 1_{T_n<∞} → Πt(f)(X_T(x)) 1_{T<∞} a.s. as n → ∞.


2. Suppose that Φ is cylindrical, in the sense that for some n ≥ 1, some s n ≥ · · · ≥ s 1 ≥ 0, and some
bounded measurable f 1 , . . . , f n : Rq 7→ R, we have, for all w ∈ W ,

Φ(w) = f 1 (w s1 ) · · · f n (w sn ).

We have in this case to show that

E( f 1 (X T +s1 (x)) · · · f n (X T +sn (x))1T <∞ | FT ) = Ψ(X T (x))1T <∞ a.s.

where Ψ : Rq → R is the function defined for all y ∈ Rq by

Ψ(y) = E( f 1 (X s1 (y)) · · · f n (X sn (y))).

Indeed, for n = 1, this is the first property of the Theorem that we have already proved. We then
proceed by induction on n, and suppose that it is already proved for some n ≥ 1. Let us prove
it for n + 1. We have, denoting for short Y_i = f_i(X_{T+s_i}(x)),

E(Y₁ ··· Y_n Y_{n+1} 1_{T<∞} | F_T) = E(Y₁ ··· Y_n E(Y_{n+1} 1_{T<∞} | F_{s_n+T}) | F_T)
                                   = E(Y₁ ··· Y_n Π_{s_{n+1}−s_n}(f_{n+1})(X_{T+s_n}(x)) 1_{T<∞} | F_T)
                                   = Ψ(X_T(x)) 1_{T<∞}

where
Ψ(y) = E( f 1 (X s1 (y)) · · · f n (X sn (y))Πsn+1 −sn ( f n+1 )(X sn (y))).
But using the induction hypothesis and the simple Markov property of Theorem 8.4.1,

Ψ(y) = E( f₁(X_{s₁}(y)) ··· f_n(X_{s_n}(y)) E( f_{n+1}(X_{s_{n+1}}(y)) | F_{s_n} ) )
     = E( f₁(X_{s₁}(y)) ··· f_n(X_{s_n}(y)) f_{n+1}(X_{s_{n+1}}(y)) ).

This gives the result for Φ cylindrical. Finally we can use monotone classes (Section 1.8).

Theorem 8.4.8. Heat-type equation and Kolmogorov equation.

Assume that σ and b are moreover C b2 , in other words C 2 with bounded first and second derivatives.
Let L be the differential operator defined in (L). Let f ∈ C b2 (Rq , R). Then:

• There exists a unique Ψ = (Ψ(t , x))t ≥0,x∈Rq solution of the following problem:

– (t , x) 7→ Ψ(t , x) is C 1 in t and C b2 in x
– for all (t , x) ∈ R+ × Rq ,
(∂Ψ/∂t)(t, x) = L(Ψ(t, ·))(x) = ½ Σ_{i,j=1}^q a_{i,j}(x) (∂²Ψ/∂x_i∂x_j)(t, x) + Σ_{i=1}^q b_i(x) (∂Ψ/∂x_i)(t, x).

– for all x ∈ Rq , Ψ(0, x) = f (x).

• For all x ∈ Rq and t ≥ 0, denoting (X t (x))t ≥0 the solution of the SDE as in Theorem 8.4.1,

Ψ(t , x) = E( f (X t (x))) = Πt ( f )(x).

In particular the infinitesimal generator L determines the Markov semi-group (Πt )t ≥0 which
characterizes the law of the Markov diffusion process (X t (x))t ≥0 .

• In other words, we have the Kolmogorov equation

∂t Πt ( f ) = LΠt ( f ).
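For q = 1, σ = 1, b = 0 and the Gaussian initial condition f(x) = e^{−x²}, both sides of the theorem are explicit: Ψ(t, x) = E(f(x + B_t)) = e^{−x²/(1+2t)}/√(1+2t) solves ∂tΨ = ½∂²ₓΨ. The sketch below solves this heat-type equation by an explicit finite-difference scheme and compares with the closed form (all discretization parameters are illustrative):

```python
import numpy as np

# Explicit finite-difference scheme for d/dt Psi = (1/2) d^2/dx^2 Psi
L_box, dx, t_final, dt = 6.0, 0.02, 0.1, 1e-4    # dt <= dx^2 keeps the scheme stable
xs = np.arange(-L_box, L_box + dx / 2, dx)
psi = np.exp(-xs**2)                              # Psi(0, x) = f(x) = exp(-x^2)

n_steps = int(round(t_final / dt))
for _ in range(n_steps):
    lap = np.zeros_like(psi)
    lap[1:-1] = (psi[2:] - 2 * psi[1:-1] + psi[:-2]) / dx**2
    psi = psi + 0.5 * dt * lap                    # f ~ 0 at the boundary, kept fixed there

# Closed form: E(f(x + B_t)) by Gaussian convolution
exact = np.exp(-xs**2 / (1 + 2 * t_final)) / np.sqrt(1 + 2 * t_final)
err = np.max(np.abs(psi - exact))
```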


Remark 8.4.9. Kolmogorov equations and Markov processes.

The combination of the Kolmogorov equations provided by Theorem 8.4.5 and Theorem 8.4.8 pro-
vides a commutation between semi-group and generator, namely

∂t Πt = LΠt = Πt L.

The special case ∂_{t=0} Πt(f) = Π₀(L(f)) = LΠ₀(f) = L(f) is already provided by Corollary 8.4.6. The Kol-
mogorov equation is an essential feature of Markov processes. It expresses the remarkable fact that a
Markov process is a deterministic evolution in time of distributions at time t . The determinism is at
the level of the distribution of the stochastic process instead of being at the level of the trajectories of
the stochastic process. In the case of diffusion processes solutions of (SDE), the deterministic evolu-
tion is described by a linear partial differential operator (L) of second order without constant term.
The identification of the deterministic behavior of distributions of stochastic phenomena was a great
scientific discovery, which can be traced back to Laplacea , Queteletb , Boltzmannc , and Maxwelld ,
among others, connected in a way to the mechanical view of nature developed by Darwine .
a Pierre-Simon Laplace (1749 – 1827), French scholar and polymath.
b Adolphe Quetelet (1796 – 1874), Belgian astronomer, mathematician, statistician and sociologist.
c Ludwig Boltzmann (1844 – 1906), Austrian physicist and philosopher.
d James Clerk Maxwell (1831 – 1879), Scottish scientist in the field of mathematical physics.
e Charles Darwin (1809 – 1882), English naturalist, geologist and biologist.

Remark 8.4.10. Kolmogorov equations and Fokkera – Planckb equation.


a Named after Adriaan Fokker (1887 – 1972), Dutch physicist and musician.
b Named after Max Planck (1858 – 1947), German theoretical physicist.

If we denote by μt the law of X_t, then the Kolmogorov equation provided by Theorem 8.4.5 writes
∂t ∫ f dμt = ∫ L(f) dμt. This is, in the sense of Schwartz distributions, the Fokker – Planck equation,

∂t μt = L*μt.

If μt has density p_t then, denoting L* the adjoint of L,

L*(f)(x) = ½ Σ_{i,j=1}^q ∂²_{i,j}( a_{i,j}(x) f(x) ) − Σ_{i=1}^q ∂_i( b_i(x) f(x) ),

and the Fokker – Planck equation becomes

∂t p_t = L* p_t.

This is also known as the forward Kolmogorov equation, while the one provided by Theorem 8.4.5
or Theorem 8.4.8 is known as the backward Kolmogorov equation. The terms forward and backward
can also be understood from the formula Πt −s = Πs,t , which allows to take the derivative with respect
to t (forward in time) or with respect to s (backward in time).
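For the one-dimensional Ornstein – Uhlenbeck drift b(x) = −μx with constant σ, the Gaussian density p(x) ∝ e^{−μx²/σ²} is stationary in the sense that (σ²/2)p″ + (μxp)′ = 0, i.e. it is annihilated by the adjoint operator, which a finite-difference evaluation confirms (a sketch with illustrative parameters):

```python
import numpy as np

sigma, mu, h = 1.3, 0.7, 1e-3

def p(x):
    # stationary density of dX = sigma dB - mu X dt, i.e. N(0, sigma^2 / (2 mu))
    v = sigma**2 / (2 * mu)
    return np.exp(-x**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def Lstar_p(x):
    # (sigma^2/2) p'' + d/dx (mu x p), evaluated by central differences
    p2 = (p(x + h) - 2 * p(x) + p(x - h)) / h**2
    drift = (mu * (x + h) * p(x + h) - mu * (x - h) * p(x - h)) / (2 * h)
    return 0.5 * sigma**2 * p2 + drift

residual = max(abs(Lstar_p(x)) for x in np.linspace(-3.0, 3.0, 61))
```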

This can be skipped at first reading.

Proof. We admit the following result, which relies on the assumptions made on σ and b, see [18]:
for all f ∈ C b2 (Rq , R), the quantity Πt ( f )(x) is C b2 in x.
The fact that Πt ( f )(x) is C 1 in t can be checked on the Duhamel formula of Theorem 8.4.5.
Let u ≥ t > 0. The Itô formula (Theorem 7.1.1) for the function Π_{u−t}(f) and the semi-martingale X(x) gives,
proceeding as in the proof of Theorem 8.4.5,

Π_{u−t}(f)(X_t(x)) = Π_u(f)(x) + N_t + ∫₀ᵗ (L − ∂_u)(Π_{u−s}(f))(X_s(x)) ds


where (N_t)_{t≥0} is a continuous local martingale. But as observed in Remark 8.3.2, the process
(Π_{u−t}(f)(X_t(x)))_{t∈[0,u]} is a continuous martingale. It follows then that the finite variation process

( ∫₀ᵗ (L − ∂_u)(Π_{u−s}(f))(X_s(x)) ds )_{t≥0}

is a continuous local martingale, issued from zero, and thus identically equal to zero. Thus

(L − ∂_u)(Π_u(f))(x) = lim_{t→0} (1/t) ∫₀ᵗ Πs( (L − ∂_u)(Π_{u−s}(f)) )(x) ds = 0.

Therefore the formula for Ψ in the statement of the theorem provides a solution to the problem (heat
equation) considered in the theorem, since Π₀(f)(x) = f(x). Conversely, if Ψ(t, x) is a solution to this
problem then for all u > 0 the Itô formula for (Ψ(u − t, X_t(x)))_{t∈[0,u]} gives

Ψ(0, X_u(x)) = f(X_u(x))
             = Ψ(u, x) + Ñ_u + ∫₀ᵘ [ −(∂_u Ψ)(u − t, X_t(x)) + L(Ψ(u − t, ·))(X_t(x)) ] dt
             = Ψ(u, x) + Ñ_u

where (Ñ_u)_{u≥0} is a stochastic integral with zero expectation, and therefore

Ψ(u, x) = E(f(X_u(x))) = Π_u(f)(x). ■

Theorem 8.4.11. Girsanov theorem for overdamped Langevin process.

Let T > 0 be a fixed number. Set s = 0, d = q, σ = I_d, b = −∇V for a “potential” V ∈ C_b²(R^d, R). Then

X_t(x) = x + B_t − ∫₀ᵗ ∇V(X_s(x)) ds,

and the law of (X_t(x))_{t∈[0,T]}, seen as a random variable on the canonical space C([0,T], R^d), is abso-
lutely continuous with respect to the Wiener measure, with density w ∈ C([0,T], R^d) ↦ e^{−H(w)} where

H(w) = V(x + w_T) − V(x) + ½ ∫₀ᵀ (|∇V|² − ΔV)(x + w_s) ds.

Proof. Thanks to the proof of the Girsanov theorem (Theorem 7.5.1), X = (X_t(x))_{t∈[0,T]} has density

exp( −∫₀ᵀ ∇V(X_s)·dX_s − ½ ∫₀ᵀ |∇V(X_s)|² ds )

with respect to Q. Now the Itô formula (Theorem 7.1.1) gives

V(X_T) = V(x) + ∫₀ᵀ ∇V(X_s)·dX_s + ½ ∫₀ᵀ (ΔV)(X_s) ds,

therefore

−∫₀ᵀ ∇V(X_s)·dX_s = V(x) − V(X_T) + ½ ∫₀ᵀ ΔV(X_s) ds,

hence the formula, since according to Theorem 7.5.1, X with respect to Q has the law of x + B. ■

There are other instances of the Girsanov theorem, for instance with a general L, or between the solu-
tions of two SDE driven by the same BM with the same diffusion coefficient σ but with distinct drifts b.
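The density e^{−H} turns expectations under the law of X into weighted expectations over plain Brownian paths, E(Φ((X_t(x))_{t≤T})) = E(Φ((x + B_t)_{t≤T}) e^{−H(B)}), which can be used for importance sampling. The sketch below checks this for the quadratic potential V(y) = κy²/2 in d = 1, for which X is an Ornstein – Uhlenbeck process with E(X_T) = xe^{−κT}; note that this V is not C_b², so the example is only illustrative, and all numerical parameters are ad hoc:

```python
import numpy as np

rng = np.random.default_rng(3)

kappa, x0, T = 1.0, 1.0, 0.5     # V(y) = kappa y^2 / 2, grad V = kappa y, Lap V = kappa
n, N = 50, 40_000
dt = T / n

# Plain Brownian increments and paths (N paths, n time steps)
dB = rng.normal(scale=np.sqrt(dt), size=(N, n))
B = np.cumsum(dB, axis=1)
Y = x0 + np.hstack([np.zeros((N, 1)), B])        # Y_s = x + B_s at the grid times

# H(B) = V(x + B_T) - V(x) + (1/2) int_0^T (|grad V|^2 - Lap V)(x + B_s) ds
V = lambda y: 0.5 * kappa * y**2
integrand = kappa**2 * Y[:, :-1] ** 2 - kappa    # left-point Riemann sum of the integral
H = V(Y[:, -1]) - V(x0) + 0.5 * np.sum(integrand, axis=1) * dt
w = np.exp(-H)

mean_weight = w.mean()                 # should be ~ 1 (e^{-H} is a probability density)
mean_XT = (Y[:, -1] * w).mean()        # importance-sampling estimate of E(X_T)
exact = x0 * np.exp(-kappa * T)        # Ornstein-Uhlenbeck mean
```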


This can be skipped at first reading.

8.5 Locally Lipschitz coefficients and explosion time

We assume in this section that σ(t , ω, x) and b(t , ω, x) depend neither on the randomness ω nor on
the time t . They are defined on Rq and take values in Mq,d (R) and Rq . We also assume that they are
locally Lipschitza : for all bounded K ⊂ Rq , there exists a constant C K > 0 such that for all x, y ∈ K ,

|σ(x) − σ(y)| + |b(x) − b(y)| ≤ C K |x − y|.

Beware that these assumptions are not a specialization of the general assumptions made at the be-
ginning of the chapter: deterministic is less general than random, but locally Lipschitz is more gen-
eral than Lipschitz. The main problem with these assumptions on σ and b is that the SDE
X_t(x) = x + ∫₀ᵗ σ(X_s(x)) dB_s + ∫₀ᵗ b(X_s(x)) ds

may not have a solution X t (x) for all time t ≥ 0, and an explosion may occur in finite (random) time.
A way to define a solution for all time is to use a localization procedure in order to define the process
before explosion, and then to stick the process to an extra point at infinity after explosion. We use the
Alexandroffb compactification Rq ∪{∞} of Rq obtained by adding to Rq a point at infinity denoted ∞.
The neighborhoods of ∞ in Rq ∪ {∞} are the complements of the closed proper subsets of Rq .
Theorem 8.5.1: Solving SDE with locally Lipschitz coefficients

For all x ∈ Rq , there exists a unique couple (X x , ξx ) where ξx is a stopping time taking values in
(0, ∞] called the explosion time and where X x = (X t (x))t ≥0 is an adapted process with values
in Rq ∪ {∞} such that the following properties hold:

1. a.s. the path t 7→ X t (x) is continuous from [0, ξx ) to Rq and X t (x) = ∞ for all t ≥ ξx

2. almost surely, on the event {ξ_x < ∞},

lim_{t↗ξ_x} |X_t(x)| = +∞

3. for every stopping time T such that T < ξ_x almost surely on {ξ_x < ∞},

X_{t∧T}(x) = x + ∫₀ᵗ 1_{s≤T} σ(X_s(x)) dB_s + ∫₀ᵗ 1_{s≤T} b(X_s(x)) ds a.s., t ≥ 0.

(this SDE has random coefficients, it involves only the values of X at times before T ).

Before giving the proof of Theorem 8.5.1, let us prepare some ingredients.
From the assumption on σ and b, for all n ≥ 1, there exist maps

σn : Rq → Mq,d (R) and b n : Rq → Rq

such that the following properties hold true:

• for all x ∈ Rq with |x| ≤ n, we have σn (x) = σ(x) and b n (x) = b(x)

• there exists a constant c_n ∈ R₊ such that for all x, y ∈ Rq,

|σ_n(x) − σ_n(y)| + |b_n(x) − b_n(y)| ≤ c_n |x − y|.

These extended maps can be simply obtained by using cutoff and regularization by convolution.
For all n ≥ 1 and x ∈ Rq, let (X_t^n(x))_{t≥0} be the solution of the SDE (provided by Theorem 8.1.5)

X_t^n(x) = x + ∫₀ᵗ σ_n(X_s^n(x)) dB_s + ∫₀ᵗ b_n(X_s^n(x)) ds a.s., t ≥ 0.


For all n, m ≥ 1, let us define the stopping time

T_n^m(x) = inf{t ≥ 0 : |X_t^n(x)| ≥ m}.

Lemma 8.5.2: Cutoff and stationarity

For all m ≥ n ≥ 1 and x ∈ Rq, we have

T_n^n(x) = T_m^n(x) ≤ T_m^m(x) a.s.,

and, for all t ∈ [0, T_n^n(x)], we have

X_t^n(x) = X_t^m(x) a.s.

Proof of Lemma 8.5.2. Let us define T = T_n^n(x) ∧ T_m^n(x). We have, for all t ≥ 0,

X_{t∧T}^n = x + ∫₀ᵗ 1_{s≤T} σ_n(X_{s∧T}^n) dB_s + ∫₀ᵗ 1_{s≤T} b_n(X_{s∧T}^n) ds

and

X_{t∧T}^m = x + ∫₀ᵗ 1_{s≤T} σ_m(X_{s∧T}^m) dB_s + ∫₀ᵗ 1_{s≤T} b_m(X_{s∧T}^m) ds.

By definition of T, the processes (X_{t∧T}^n)_{t≥0} and (X_{t∧T}^m)_{t≥0} solve the same SDE

Z_t = x + ∫₀ᵗ 1_{s≤T} σ(Z_s) dB_s + ∫₀ᵗ 1_{s≤T} b(Z_s) ds,

and thus, using the pathwise uniqueness (Theorem 8.1.2), we have X_{t∧T}^n(x) = X_{t∧T}^m(x) a.s. for all t ≥ 0.
Moreover, on the event {0 < T < ∞}, for all t ∈ [0, T),

|X_t^m(x)| = |X_t^n(x)| < n and |X_T^m(x)| = |X_T^n(x)| = n.

It follows that T = T_n^n(x) = T_m^n(x). Furthermore, if T = 0, then |x| ≥ n and T = T_n^n(x) = T_m^n(x) = 0,
while if T = ∞, then T_n^n(x) = T_m^n(x) = ∞. Finally the continuity of X^m(x) gives T_m^n(x) ≤ T_m^m(x). ■

Proof of Theorem 8.5.1. Existence. We set ξ_x = sup_{n≥1} T_n(x) where T_n(x) = T_n^n(x). If |x| < n then
T_n(x) > 0 and thus ξ_x > 0. Let t ∈ [0, ξ_x). By definition of ξ_x, there exists n such that T_n(x) > t and for
all m ≥ n, we have X_t^m(x) = X_t^n(x) a.s. from Lemma 8.5.2. We can then define

X_t(x) = lim_{n→∞} X_t^n(x) if t ∈ [0, ξ_x), and X_t(x) = ∞ if t ∈ [ξ_x, ∞).

This process (X_t(x))_{t≥0} verifies the first property stated by the Theorem. Moreover, on {ξ_x < ∞}, we
have T_n(x) < T_{n+1}(x) < ··· < ξ_x and |X_{T_n(x)}(x)| = n, and therefore, almost surely, on the event {ξ_x < ∞},

limsup_{t↗ξ_x} |X_t(x)| = +∞.

Now let us proceed by contradiction and suppose that

P( liminf_{t↗ξ_x} |X_t(x)| < +∞, ξ_x < ∞ ) > 0.

Then we can find real numbers r and R such that 0 < r < R < ∞ and

P( liminf_{t↗ξ_x} |X_t(x)| < r, limsup_{t↗ξ_x} |X_t(x)| > R, ξ_x < ∞ ) > 0. (⋆)


Let f ∈ C_c²(Rq, R) be such that f(x) = 0 if |x| = r and f(x) = 1 if |x| = R. Now, by Theorem 8.4.5,

( f(X_t^n(x)) − ∫₀ᵗ L^n(f)(X_s^n(x)) ds )_{t≥0}

is a martingale, where L^n is the infinitesimal generator of X^n. The differential operator L^n coincides
with the infinitesimal generator L of X(x) on {x ∈ Rq : |x| ≤ n}, and since f is compactly supported, it
follows that L^n(f) = L(f) for n sufficiently large, and we can replace L^n by L. Next, it follows by the Doob
stopping theorem (Theorem 2.5.1) that for all m ≥ n, the process

( f(X_{t∧T_m}^n(x)) − ∫₀^{t∧T_m} L(f)(X_s^n(x)) ds )_{t≥0}

is a continuous martingale. By letting n → ∞, by dominated convergence, we get that

( f(X_{t∧T_m}(x)) − ∫₀^{t∧T_m} L(f)(X_s(x)) ds )_{t≥0}

is a continuous martingale. Now by letting m → ∞, we see similarly that

( f(X_t(x)) 1_{t<ξ_x} − ∫₀^{t∧ξ_x} L(f)(X_s(x)) ds )_{t≥0}

is a continuous martingale. Note that if t < ξ_x then X_{t∧T_m(x)}(x) → X_t(x) as m → ∞ while if t ≥ ξ_x then
f(X_{t∧T_m(x)}(x)) = f(X_{T_m(x)}(x)) → 0 as m → ∞. It follows then that the process (f(X_t(x)) 1_{t<ξ_x})_{t≥0}
is a continuous martingale. Now the left continuity of this process at t = ξ_x contradicts (⋆) and the
definition of f. It follows that the second property stated in the theorem holds true: a.s., on {ξ_x < ∞},

lim_{t↗ξ_x} |X_t(x)| = +∞.

Furthermore, the third and last property stated in the theorem can be deduced as follows: if T is a
stopping time such that T < ξ_x almost surely on {ξ_x < ∞} then for all n ≥ 1 we have

X_{t∧T∧T_n}^n = x + ∫₀ᵗ 1_{s≤T∧T_n} σ(X_s^n) dB_s + ∫₀ᵗ 1_{s≤T∧T_n} b(X_s^n) ds,

thus,

X_{t∧T∧T_n} = x + ∫₀ᵗ 1_{s≤T∧T_n} σ(X_s) dB_s + ∫₀ᵗ 1_{s≤T∧T_n} b(X_s) ds,

and the desired result follows by letting n → ∞ and using dominated convergence.

Uniqueness. Let (X, ξ) and (X′, ξ′) be two solutions with the same initial condition x satisfying the
theorem properties. Then they are both solutions of the SDE with random coefficients given by the
third property, with T = ξ ∧ ξ′. Now, if we force them to zero after T, then, by proceeding as in the
proof of Theorem 8.1.3, we get X = X′ on [0, T]. Let us prove that ξ = ξ′. On {ξ < ξ′}, we have T = ξ,
and by definition of ξ, lim_{t↗ξ} |X_t| = +∞, while by the definition of ξ′, lim_{t↗ξ} |X_t′| = |X′_ξ| since X′ is
continuous on [0, ξ′) and ξ ∈ [0, ξ′). Contradiction. Thus {ξ < ξ′} = ∅, and by symmetry, ξ = ξ′. ■
a This is the case for instance when σ and b are C 1 on Rq .
b Named after Pavel Alexandrov (1896 – 1982), Russian mathematician.

Chapter 9

More links with partial differential equations

9.1 Feynman – Kac formula

Let σ : Rq → M_{q,d}(R) and b : Rq → Rq be Lipschitz. For all x ∈ Rq, let (X_t^x)_{t≥0} be the solution of the SDE

X_t^x = x + ∫₀ᵗ σ(X_s^x) dB_s + ∫₀ᵗ b(X_s^x) ds, t ≥ 0.

Its infinitesimal generator is the differential operator (linear, second order, without constant term)

L = ½ Σ_{i,j=1}^q (σ(x)σ⊤(x))_{i,j} ∂²_{i,j} + Σ_{i=1}^q b_i(x) ∂_i.

We have seen that L is the infinitesimal generator of a Markov semigroup (Πt )t ≥0 associated to X . But what
can be done for more general linear partial differential operators? The simplest generalization beyond L is
obtained by adding to L a zero order term, say U ∈ C 2 (Rq , R), which gives the linear operator

LU = L +U

in the sense that for all f ∈ C²(Rq, R) and x ∈ Rq,

LU(f)(x) = ½ Σ_{i,j=1}^q (σ(x)σ⊤(x))_{i,j} ∂²_{i,j} f(x) + Σ_{i=1}^q b_i(x) ∂_i f(x) + U(x) f(x).

The following theorem states that there exists a semi-group associated to LU .

Theorem 9.1.1. Feynmana – Kacb formula and semi-group.


a Named after Richard Phillips Feynman (1918 – 1988), American theoretical physicist.
b Named after Marc Kac (1914 – 1984), Polish American mathematician.

Let U ∈ C²(Rq, (−∞, c]) be upper bounded, for some finite constant c. For all t ≥ 0 we define the
operator Q_t acting on bounded and measurable functions f : Rq → R by the Feynman – Kac formula

x ∈ Rq ↦ Q_t(f)(x) = E( f(X_t(x)) e^{∫₀ᵗ U(X_s(x)) ds} ).

1. The family (Q t )t ≥0 is a semi-group (known as the Feynman – Kac semi-group).

2. For all f ∈ C_b²(Rq, R), all t ≥ 0, and all x ∈ Rq, there is a Duhamel formula

Q_t(f)(x) = f(x) + ∫₀ᵗ Q_s(LU(f))(x) ds
which involves the real Schrödinger operator (diffusion operator + multiplicative potential)

LU ( f )(x) = L( f )(x) +U (x) f (x).

In particular we have the Kolmogorov equation

∂t Q t ( f )(x) = Q t (LU ( f ))(x).


3. The semi-group (Q_t)_{t≥0} is continuous in the sense that for all f ∈ C₀(Rq, R),

lim_{t→0⁺} ∥Q_t(f) − f∥∞ = 0.

4. The operator LU is the infinitesimal generator of the semi-group (Q_t)_{t≥0}, namely, for all f ∈ C_c²,

lim_{t→0⁺} ∥ (Q_t(f) − f)/t − LU(f) ∥∞ = 0.

We could also prove a Kolmogorov equation similar to the one of Theorem 8.4.8, namely ∂t Q t = LU Q t ( f ),
in other words (∂t − LU )Q t ( f ) = 0, which includes the version at t = 0, ∂t =0+ Q t ( f ) = LU Q 0 ( f ) = LU f .
In the formula defining Q t ( f )(x), the exponential weight involves the accumulated value of U along the
trajectory s ∈ [0, t ] 7→ X sx . When U ≥ 0 this is an amplification while when U ≤ 0 this is an attenuation. The
quantity Q t (1 A ) can be interpreted as a quantity of matter or particles in the set A at time t , which is subject
to amplification or attenuation according to the Feynman – Kac evolution equation.
We have ∂t Q t = Q t (LU ). We say that LU is a linear second order differential operator with a multiplica-
tive potential U . This potential models a medium with amplification or attenuation depending on its sign.
The semi-group (Q_t)_{t≥0} is positive in the sense that for all t ≥ 0, f ≥ 0 implies Q_t(f) ≥ 0. However,
it is not a Markov semi-group, in the sense that if we denote by 1 the constant function equal to 1 then
Q_t(1) = E(e^{∫₀ᵗ U(X_s) ds}) is not equal to 1 in general when t > 0.
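For Brownian motion (σ = I, b = 0) the Feynman – Kac weight is easy to simulate: Q_t(f)(x) ≈ N⁻¹ Σᵢ f(x + B_t⁽ⁱ⁾) exp(Σₖ U(x + B_{s_k}⁽ⁱ⁾) Δs). With the constant potential U ≡ −λ one must recover Q_t(f) = e^{−λt}Πt(f), and in particular Q_t(1) = e^{−λt} < 1, illustrating that (Q_t)_{t≥0} is not a Markov semi-group. A sketch (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def feynman_kac(f, U, x, t, n=100, N=20_000, rng=rng):
    """Monte Carlo estimate of Q_t(f)(x) = E(f(x + B_t) exp(int_0^t U(x + B_s) ds))."""
    dt = t / n
    B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(N, n)), axis=1)
    path = x + np.hstack([np.zeros((N, 1)), B])
    weight = np.exp(np.sum(U(path[:, :-1]), axis=1) * dt)   # left-point Riemann sum
    return np.mean(f(path[:, -1]) * weight)

lam, x, t = 0.7, 0.3, 1.0
Q1 = feynman_kac(lambda y: np.ones_like(y), lambda y: -lam * np.ones_like(y), x, t)
Qid = feynman_kac(lambda y: y, lambda y: -lam * np.ones_like(y), x, t)
```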

Proof.

1. For all bounded and measurable f : Rq → R, all x ∈ Rq and s, t ≥ 0, the Markov property for X^x gives

Q_{t+s}(f)(x) = E( e^{∫₀ᵗ U(X_v^x) dv} E( f(X_{t+s}^x) e^{∫_t^{t+s} U(X_v^x) dv} | F_t ) )
             = E( e^{∫₀ᵗ U(X_v^x) dv} Q_s(f)(X_t^x) )
             = Q_t(Q_s(f))(x).

2. For all f ∈ C_b²(Rq, R), all x ∈ Rq, and all t ≥ 0, we have, from Theorem 8.4.5,

f(X_t^x) = f(x) + M_t^f + ∫₀ᵗ (L(f))(X_s^x) ds.

Note that M^f is a martingale and not only a local martingale because f is of class C_b² and not only C².
Now, from the Itô formula (Theorem 7.1.1) for F(y, z) = y eᶻ and the semi-martingale (Y_t, Z_t) = (f(X_t), ∫₀ᵗ U(X_s) ds),

F(Y_t, Z_t) = F(Y₀, Z₀) + ∫₀ᵗ e^{Z_s} dY_s + ∫₀ᵗ Y_s e^{Z_s} dZ_s
            = f(x) + ∫₀ᵗ e^{Z_s} ( dM_s^f + (L(f))(X_s^x) ds ) + ∫₀ᵗ Y_s e^{Z_s} U(X_s) ds.

Note that we do not have a second order part in the Itô formula since the local martingale contribu-
tions for (Y , Z ) comes from Y and on the other hand ∂2y F (y, z) = 0.
Q_t(f)(x) = f(x) + ∫₀ᵗ E( (L(f))(X_s^x) e^{Z_s} ) ds + ∫₀ᵗ E( Y_s U(X_s) e^{Z_s} ) ds
          = f(x) + ∫₀ᵗ ( Q_s(L(f))(x) + Q_s(fU)(x) ) ds
          = f(x) + ∫₀ᵗ Q_s(LU(f))(x) ds.

We have used the Fubini – Tonelli theorem to swap E and ∫₀ᵗ ds, which is allowed since LU(f)(X_s) e^{∫₀ˢ U(X_v) dv}
is integrable for P ⊗ 1_{[0,t]} ds since U is upper bounded and f ∈ C_b².


3. Proceeding as in Corollary 8.4.6, we first show that the result holds for f ∈ C b2 via the Duhamel formula
for (Q t )t ≥0 and the boundedness of L f , and then we generalize to f ∈ C 0 by density w.r.t. ∥·∥∞ .

4. Proceeding as in Corollary 8.4.6, it suffices to use again the Duhamel formula for (Q t )t ≥0 and the
previous item for LU f , which belongs to C 0 since f ∈ C c2 and U ∈ C 2 .

Remark 9.1.2. Killing.

We take Theorem 9.1.1 settings with U ∈ C²(Rq, (−∞, 0]) bounded. We think about X_t as the position
of a particle in a medium at time t, −U(x) as a quantity of poison at position x, and −∫₀ᵗ U(X_s) ds as
the total quantity of poison accumulated by X along its trajectory on the time interval [0, t].
Let E ∼ Exp(1) be an exponential random variable of unit mean, independent of X. Let us define

T = inf{ t ≥ 0 : ∫₀ᵗ −U(X_s) ds = E }.

We extend Rq by a cemetery state ∞ ∉ Rq, and we define the process X̃ with state space Rq ∪ {∞} by

X̃_t = X_t if t < T, and X̃_t = ∞ if t ≥ T.

In other words, starting from x ∈ Rq, the process X̃ evolves like X in Rq and after an exponential time
it jumps to ∞ and stays there forever, while if it starts from ∞ then it never moves. It can be checked
that X̃ is Markov, with infinitesimal generator L̃ given, for f : Rq ∪ {∞} → R which is C² on Rq, by

L̃(f)(x) = 0 if x = ∞, and L̃(f)(x) = L(f)(x) − U(x)(f(∞) − f(x)) if x ∈ Rq.

In particular, if f(∞) = 0 then we recover the Feynman – Kac operator

L̃(f) = LU(f),

and from this point of view, when U is negative, the Feynman – Kac operator and semi-group can be
interpreted as the description of the killed Markov diffusion process X̃. Let us check that indeed, L̃ is
the generator of X̃. If we define λ_X(t) = −∫₀ᵗ U(X_s) ds, then we have T = λ_X⁻¹(E), and

{t < T} = {t < λ_X⁻¹(E)} = {λ_X(t) < E}.

In particular, using the boundedness of U we get (beware that t → 0⁺ and not t → +∞)

P(t < T | F_t) = P(λ_X(t) < E | F_t) = e^{∫₀ᵗ U(X_s) ds} = 1 + o_{t→0⁺}(1)

and

P(t ≥ T | F_t) = 1 − e^{∫₀ᵗ U(X_s) ds} = −tU(x) + o_{t→0⁺}(t),

where the remainders are uniform in ω. Now we can write

E(f(X̃_t)) − f(x) = E(E(f(X̃_t) − f(x) | F_t))
                 = E( (f(X_t) − f(x)) P(t < T | F_t) ) + (f(∞) − f(x)) E(P(t ≥ T | F_t))
                 = E( (f(X_t) − f(x))(1 + o(1)) ) + (f(∞) − f(x))(−tU(x) + o(t))
                 = t L(f)(x) − tU(x)(f(∞) − f(x)) + o(t)
                 = t L̃(f)(x) + o(t).

This suggests interpreting L_U as the generator of a Markov process with killing. When U is constant and equal to −λ with λ > 0, then T ∼ Exp(λ) is independent of X and, as soon as f(∞) = 0,
\[ Q_t(f)(x) = E(f(X_t))\,e^{-\lambda t} = E(f(X_t))\,P(t < T) = E(f(X_t)\mathbf{1}_{t<T}) = E(f(\widetilde{X}_t)). \]
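As a sanity check, the constant-U identity above can be tested by a small Monte Carlo simulation. The sketch below (Python with NumPy, not part of the notes) takes X to be a one-dimensional Brownian motion; the test function f and all parameters are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, n = 0.5, 1.0, 200_000

# X_t = B_t: one-dimensional Brownian motion at time t,
# and an independent killing time T ~ Exp(lam).
B_t = rng.normal(0.0, np.sqrt(t), size=n)
T = rng.exponential(1.0 / lam, size=n)

f = np.cos  # arbitrary bounded test function, with the convention f(infinity) = 0

# Killed-process expectation E(f(X~_t)) = E(f(B_t) 1_{t<T}) ...
killed = np.mean(f(B_t) * (t < T))
# ... versus the Feynman-Kac factorization E(f(B_t)) e^{-lam t}.
factored = np.mean(f(B_t)) * np.exp(-lam * t)

print(killed, factored)  # the two estimates agree up to Monte Carlo error
```

The independence of T and X is what allows the probability P(t < T) to factor out of the expectation.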

9 More links with partial differential equations

9.2 Kakutani probabilistic formulation of Dirichlet problems

Let us first give an informal presentation of the Dirichlet problem in the case of the Laplacian, and of its probabilistic representation using Brownian motion. Let D be an open subset of Rd, with boundary ∂D = D̄ \ D. The Dirichlet problem consists, for some prescribed g : ∂D → R, in finding u : D̄ → R such that u is harmonic on D in the sense that ∆u = 0 in D, while u = g on ∂D. Following Hilbert, the Dirichlet problem can be solved by using functional spaces and a variational formulation (it can also be solved using purely probabilistic arguments). If we assume that the solution u exists, it is also possible, following Kakutani, to obtain a probabilistic representation of u. Namely, let B be a d-dimensional Brownian motion issued from the origin. Assume that for all x ∈ D, T_∂D = inf{t ≥ 0 : x + B_t ∈ ∂D} satisfies P(T_∂D < ∞) = 1. The Itô formula gives
\[ u(x + B_{t\wedge T_{\partial D}}) = u(x) + M^u_{t\wedge T_{\partial D}} + \frac{1}{2}\int_0^{t\wedge T_{\partial D}} \underbrace{\Delta u(x + B_s)}_{=0,\ s < T_{\partial D}}\,ds, \]

for all x ∈ D and all t ≥ 0. The integral on the right-hand side is zero since ∆u = 0 on D and x + B_s ∈ D if s < T_∂D. Moreover, if g is bounded then M^u_{•∧T_∂D} is a martingale issued from 0, and taking expectations gives

\[ E(u(x + B_{t\wedge T_{\partial D}})) = u(x). \]

Now, if u is continuous and bounded on D̄, we obtain, by dominated convergence and since u = g on ∂D,
\[ u(x) = E(g(x + B_{T_{\partial D}})). \]
This is the Kakutani probabilistic representation of the solution of the Dirichlet problem. It shows in passing the uniqueness of the solution, and it allows for Monte – Carlo methods. Note that when d = 1 and D = (a, b), then u(x) = αx + β for all x ∈ [a, b], with α = (g(b) − g(a))/(b − a) and β = (bg(a) − ag(b))/(b − a).
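The one-dimensional affine formula lends itself to a quick numerical check. The sketch below (Python with NumPy, not part of the notes) simulates the exit of x + B from (a, b) with a crude Euler walk and averages g at the exit side; the time step and the boundary data are arbitrary choices, and the scheme carries a small discretization bias of order √dt.

```python
import numpy as np

rng = np.random.default_rng(1)

def exit_value(x, a, b, g, dt=1e-3):
    # One Euler path of x + B until it leaves (a, b); returns g at the exit side.
    while a < x < b:
        x += np.sqrt(dt) * rng.standard_normal()
    return g(a) if x <= a else g(b)

a, b, x = 0.0, 1.0, 0.3
g = lambda y: -1.0 if y == a else 2.0  # arbitrary boundary data

est = np.mean([exit_value(x, a, b, g) for _ in range(2000)])
alpha = (g(b) - g(a)) / (b - a)
beta = (b * g(a) - a * g(b)) / (b - a)
print(est, alpha * x + beta)  # Monte Carlo estimate vs exact u(x)
```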
We can define more generally the Dirichlet problem for a linear partial differential operator of second
order, possibly with constant term, and with a prescribed arbitrary value inside the domain. To study this
generalization, we take the following ingredients.

• D ⊂ Rq is open and bounded. Its boundary is ∂D = D̄ \ D. We have D ∩ ∂D = ∅ and D̄ = D ∪ ∂D.

• x ∈ Rq ↦ a(x) = (a_{i,j}(x))_{1≤i,j≤q} = σ(x)σ⊤(x), where σ(x) is a q × d matrix, Lipschitz in x.

• x ∈ Rq ↦ b(x) = (b_i(x))_{1≤i≤q} is a vector field, Lipschitz in x.

• f : D → R is continuous and bounded.

• g : ∂D → R is continuous and bounded.

• c : D → (−∞, 0] is continuous and nonpositive.


• differential operator \( L = \frac{1}{2}\sum_{i,j=1}^{q} a_{i,j}(x)\,\partial^2_{x_i,x_j} + \sum_{i=1}^{q} b_i(x)\,\partial_{x_i} \).
i =1 i

The Dirichlet problem¹ on D consists in seeking u ∈ C²(D, R) ∩ C⁰(D̄, R) such that
\[ \begin{cases} \underbrace{Lu(x) + c(x)u(x)}_{L_c(u)(x)} = f(x) & \text{for all } x \in D,\\[1ex] \lim\limits_{\substack{x \to x_0\\ x \in D}} u(x) = g(x_0) & \text{for all } x_0 \in \partial D. \end{cases} \tag{DirP} \]

When c = 0 and f = 0, then Lu = 0 on D, and we say that u is harmonic for L on D. Let us consider the solution (X_t^x)_{t≥0}, provided by Theorem 8.1.5, of the stochastic differential equation
\[ X_t^x = x + \int_0^t \sigma(X_s^x)\,dB_s + \int_0^t b(X_s^x)\,ds, \quad t \ge 0,\ x \in \mathbb{R}^q, \tag{SDE} \]
1 Named after Peter Gustav Lejeune Dirichlet (1805 – 1859), German mathematician.


where B = (B s )s≥0 is a d -dimensional BM. For all x ∈ D, we define the stopping time

\[ T_{\partial D} = \inf\{t \ge 0 : X_t^x \in \partial D\} \in [0, +\infty]. \]

In the sequel, we abridge X^x into X and we write P_x and E_x to indicate the starting point of X. The Dirichlet problem by itself is a static problem (it does not involve time), but the Kakutani probabilistic representation of its solution is dynamic: it involves a stochastic process, an evolution equation, time.

Theorem 9.2.1. Kakutania probabilistic representation of the solution of the Dirichlet problem.
a Named after Shizuo Kakutani (1911 – 2004), Japanese mathematician.

Suppose that there exists a solution u to (DirP).

• If f = c = 0, then, for all x ∈ D, if Px (T∂D < ∞) = 1, then

u(x) = E_x(g(X_{T_{∂D}})).

• More generally, for all x ∈ D, if E_x(T_∂D) < ∞ then
\[ u(x) = E_x\Big( g(X_{T_{\partial D}})\, e^{\int_0^{T_{\partial D}} c(X_s)\,ds} - \int_0^{T_{\partial D}} f(X_s)\, e^{\int_0^s c(X_v)\,dv}\,ds \Big). \]
Moreover, if f = 0 then E_x(T_∂D) < ∞ can be replaced by the weaker condition P_x(T_∂D < ∞) = 1.

In particular, the solution is unique.

Remark 9.2.2. Existence of solution with probabilistic arguments.

Following [31, Section 7.2], it can be shown, say in the simple case c = f = 0 and L = ½∆, that we have E_x(T_∂D) < ∞, and that if g ∈ C⁰(∂D, R) then x ∈ D ↦ u(x) = E_x(g(X_{T_∂D})) is in C²(D, R) and harmonic on D in the sense that ∆u = 0 on D. This can be done by showing that u has the mean value property, which can be proved by using the strong Markov property of X. We then also have u = g on ∂D. However, extra regularity assumptions on D are needed to get u ∈ C⁰(D̄, R).

Proof. We have already produced a proof for q = d, σ = I_d, b = 0, f = 0, c = 0. We now follow the same scheme in the general case. We suppose first that u can be extended to Rq as a C_b² function. The Itô formula (Theorem 7.1.1) for (x, y) ↦ u(x)e^y and (X_t, Y_t = ∫₀ᵗ c(X_s)ds) gives, for all t ≥ 0,
\[ u(X_t)e^{Y_t} - u(x) = \int_0^t \langle \nabla u(X_s)e^{Y_s}, \sigma(X_s)\,dB_s\rangle + \int_0^t f(X_s)e^{Y_s}\,ds + \int_0^t (Lu + cu - f)(X_s)e^{Y_s}\,ds. \]

If we replace t by t ∧ T_∂D, the third and last integral on the right-hand side vanishes, due to the fact that X_s ∈ D when s < T_∂D and that u solves (DirP), namely Lu + cu − f = 0 on D. Next, since ∇u is bounded and e^{Y_s} ≤ 1 (because c ≤ 0), the first integral on the right-hand side, stopped at T_∂D, is a continuous martingale issued from the origin, and thus, by the Doob stopping theorem (Theorem 2.5.1), its expectation is zero. This finally gives
\[ u(x) = E_x\big( u(X_{t\wedge T_{\partial D}})\,e^{Y_{t\wedge T_{\partial D}}} \big) - E_x\Big( \int_0^{t\wedge T_{\partial D}} f(X_s)e^{Y_s}\,ds \Big). \]

Since P_x(T_∂D < ∞) = 1, the desired formula then follows by letting t → ∞ and using dominated convergence. The condition E_x(T_∂D) < ∞ is used to handle the integral involving f via the uniform bound
\[ \Big| \int_0^{T_{\partial D}} f(X_s)e^{Y_s}\,ds \Big| \le \|f\|_\infty\, T_{\partial D}. \]
When f = 0, we can replace E_x(T_∂D) < ∞ by P_x(T_∂D < ∞) = 1.
Actually, what we did is similar to the proof of the Duhamel formula for the Feynman – Kac operator L_c in the proof of Theorem 9.1.1, except that before taking expectations we stopped the process and used the definition of the stopping time in order to exploit the fact that u is a solution of the Dirichlet problem.
The general case on u can be addressed as follows. We have to take into account the fact that ∇u and ∆u may blow up when approaching ∂D, because we only know that u ∈ C²(D) ∩ C⁰(D̄). Thus we restrict the


domain D to gain regularity. For all n ≥ 1, let D_n = {x ∈ D : dist(x, ∂D) > 1/n}. For all x ∈ D, we have x ∈ D_n for n large enough. For all n ≥ 1, D̄_n ⊂ D_{n+1} ⊂ D, and ∪_n D_n = D. For all n ≥ 1, let u_n ∈ C_b²(Rq, R) be such that u_n = u on D_n. This can be constructed by proceeding as in Lemma 9.2.3 below. The function u_n solves the Dirichlet problem on D_n with boundary condition u_n|_{∂D_n}. Then, for all x ∈ D_n,

\[ u_n(x) = u(x) = E_x\big( u(X_{T_{\partial D_n}})\,e^{Y_{T_{\partial D_n}}} \big) - E_x\Big( \int_0^{T_{\partial D_n}} f(X_s)e^{Y_s}\,ds \Big). \]

We have T_{∂D_n} ≤ T_{∂D_{n+1}} ↗ T_{∂D} as n → ∞, and it remains to let n → ∞ and to use dominated convergence. ■

Lemma 9.2.3. Itô formula on a domain.

The Itô formula of Theorem 7.1.1 remains valid for all f ∈ C 2 (D, R) where D ⊂ Rd is open, provided
that X 0 = x ∈ D, and, almost surely, for all t ≥ 0, X t ∈ D.

Proof. Since f may blow up at the boundary of D, the idea is first to reduce the domain and then to extend the restricted function, which is bounded, to the whole space. Namely, for all n ≥ 1, let
\[ D_n = \{x \in D : \operatorname{dist}(x, \partial D) > 1/n\}. \]
We have ∪_n D_n = D, and for all n ≥ 1, D_n ⊂ D̄_n ⊂ D_{n+1}. Let ϕ_n ∈ C^∞(Rd, [0, 1]) be such that ϕ_n = 1 on D_n and ϕ_n = 0 on (D_{n+1})^c. Let us define f̃ by f̃ = f on D and f̃ = 0 on D^c, and then
\[ f_n = \widetilde{f}\varphi_n + (1 - \varphi_n)(\widetilde{f} * \varphi_n). \]
Since f is C² on D and continuous on D̄_{n+1}, we get f_n ∈ C_b²(Rd, R), and f_n = f on D_n. The Itô formula (Theorem 7.1.1) for f_n gives the canonical decomposition of the semi-martingale (f_n(X_{t∧T_{∂D_n}}))_{t≥0}, which depends only on f. Finally, since f_n → f as n → ∞ on D, it remains to use dominated convergence as n → ∞ (Theorem 6.3.4). ■

Remark 9.2.4. Brownian motion.

Let us consider the simple situation where q = d , σ = I d , b = 0, f = 0, c = 0, for which X = x + B .


The fact that P_x(T_∂D < ∞) = 1 for all x ∈ D follows from the fact that the coordinates of B are one-dimensional BM and that with probability one they exit any given finite interval, see Example 2.5.2.
Let us show that for all x ∈ D, E_x(T_∂D) < ∞. Since B is continuous, it suffices to show that for all R > 0,
\[ E(\tau) < \infty \quad\text{where}\quad \tau = \inf\{t \ge 0 : |B_t| = R\} = \inf\{t \ge 0 : |B_t|^2 = R^2\} \]
(recall that B_0 = 0). The process |B|² is a squared Bessel process issued from the origin and (|B_t|² − td)_{t≥0} is a martingale. The Doob stopping theorem (Theorem 2.5.1) gives, for all t ≥ 0,
\[ E(|B_{t\wedge\tau}|^2) = d\,E(t \wedge \tau). \]

By definition of τ, the left-hand side is bounded by R², and by monotone convergence for the right-hand side we get E(τ) ≤ R²/d, which implies P(τ < ∞) = 1; dominated convergence then gives
\[ E(\tau) = \frac{R^2}{d} < \infty. \]
We may also compute the exponential moments of τ with an exponential martingale.
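The identity E(τ) = R²/d can be illustrated numerically. The following vectorized Euler sketch (Python with NumPy, arbitrary parameters, not part of the notes) slightly overestimates τ because the exit is only monitored on the time grid, a bias of order √dt.

```python
import numpy as np

rng = np.random.default_rng(2)
d, R, dt, n = 2, 1.0, 5e-4, 4000

pos = np.zeros((n, d))        # n independent d-dimensional BM paths started at 0
tau = np.zeros(n)             # recorded exit times from the ball of radius R
alive = np.ones(n, dtype=bool)
t = 0.0
while alive.any():
    t += dt
    pos[alive] += np.sqrt(dt) * rng.standard_normal((alive.sum(), d))
    exited = alive & (np.sum(pos**2, axis=1) >= R**2)
    tau[exited] = t
    alive &= ~exited

print(tau.mean(), R**2 / d)   # Monte Carlo mean exit time vs R^2/d
```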

Remark 9.2.5. Assumptions.

For a complete probabilistic analysis of Dirichlet type problems, we refer to [40].

1. If c takes its values in (−∞, −a] for some a > 0, then the assumption E_x(T_∂D) < ∞ is not needed, and we get


from the proof by letting t → ∞ in the display with t ∧ T∂D that


\[ u(x) = E_x\Big( u(X_{T_{\partial D}})\,e^{\int_0^{T_{\partial D}} c(X_s)\,ds}\,\mathbf{1}_{T_{\partial D}<\infty} \Big) - E_x\Big( \int_0^{T_{\partial D}} f(X_s)e^{Y_s}\,ds \Big). \]

2. If for some a > 0 and all x ∈ D, E_x(e^{aT_∂D}) < ∞, then Theorem 9.2.1 remains valid for c taking values in (−∞, a]. This is the case for instance if, for all x ∈ D, T_∂D is bounded.

3. In the proof of Theorem 9.2.1, we can simply assume that (SDE) admits a (weak) solution for all x ∈ D, the coefficients σ and b being supposed measurable (not necessarily Lipschitz!). The continuity hypotheses on f, c, and g are superfluous as well, and one can assume that they are just measurable, f and g being bounded and non-negative. This remark is useful for certain problems in stochastic control theory.

4. The probabilistic representation provided by Theorem 9.2.1 shows that it suffices to prescribe g on a subset ∂_R D of ∂D such that P_x(X_{T_∂D} ∈ ∂_R D) = 1 for all x ∈ D. The elements of such a subset are called regular points of the boundary. This same representation also shows that the Dirichlet problem is ill-posed if g is arbitrary outside ∂_R D.

5. The Itô formula of Theorem 7.1.1 can be generalized to functions which are not C 2 but are
differentiable in a weak sense, and it follows that the probabilistic representation provided by
Theorem 9.2.1 remains valid in this general case.

Remark 9.2.6. Harmonic measure and Poisson kernel.

The exit law Π(x, dy) = P_x(X_{T_∂D} ∈ dy) is the harmonic measure. When the ingredients ∂D, L are regular enough, then sup_{x∈D} E_x(T_∂D) < ∞, (DirP) has a solution u (say with f = 0, c = 0), the assumptions of Theorem 9.2.1 are satisfied, and the harmonic measure is absolutely continuous. Its density, Π(x, dy) = Π(x, y)dy, x ∈ D, y ∈ ∂D, is the Poisson kernel:
\[ u(x) = E_x(g(X_{T_{\partial D}})) = \int_{\partial D} g(y)\,\Pi(x, dy). \]

The function (x, y) ↦ Π(x, y) is strictly positive, C² in x ∈ D, and
\[ \lim_{\substack{x \to y_0\\ x \in D}} \Pi(x, y) = 0 \quad\text{for all } y, y_0 \in \partial D,\ y \ne y_0, \qquad \lim_{\substack{x \to y_0\\ x \in D}} \Pi(x, y_0) = +\infty \quad\text{for all } y_0 \in \partial D. \]

We refer to [37] and [40] for more details on these aspects.
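When X = B starts at the center of the unit disk (d = 2), the harmonic measure is, by rotational invariance, the uniform measure on the circle. The following Monte Carlo sketch (Python with NumPy, arbitrary parameters, not part of the notes) illustrates this by checking that both circular moments of the exit angle vanish.

```python
import numpy as np

rng = np.random.default_rng(3)
n, dt = 4000, 1e-3

pos = np.zeros((n, 2))            # planar BM paths started at the center
alive = np.ones(n, dtype=bool)
while alive.any():
    pos[alive] += np.sqrt(dt) * rng.standard_normal((alive.sum(), 2))
    alive &= np.sum(pos**2, axis=1) < 1.0   # still inside the unit disk?

theta = np.arctan2(pos[:, 1], pos[:, 0])    # exit angle on the unit circle
# For the uniform law on the circle, both moments below are zero.
print(np.mean(np.cos(theta)), np.mean(np.sin(theta)))
```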

Remark 9.2.7. Discrete Dirichlet problem.

Let (X_n)_{n∈{0,1,2,...}} be a simple random walk on Zd defined by X_0 ∈ Zd and
\[ X_{n+1} = X_n + \varepsilon_{n+1} = X_0 + \varepsilon_1 + \cdots + \varepsilon_{n+1}, \]
where (ε_n)_{n≥1} are independent and identically distributed “increments” uniformly distributed on
\[ \{\pm e_1, \ldots, \pm e_d\}, \]

where e 1 , . . . , e d is the canonical basis of Rd . This is the discrete time and space analogue of Brownian
motion. Its physics is essentially the same, since it is a matter of space-time scale. Mathematically, in
discrete settings, we do not have the difficulties of regularity and infinite dimensional analysis, how-
ever we do not have the advantages of continuity and the chain rule for differentials. The simulation


of Brownian motion on a computer always reduces to some sort of discrete time and space random
walk, and the link between the two is the subject of stochastic numerical analysis in relation with the
central limit phenomenon. The Feynman – Kac formula and the Dirichlet problem have fully discrete
analogues. For instance, for the Dirichlet problem, if D ⊂ Zd , we could define

\[ \partial D = \{x \notin D : |x - y| = 1 \text{ for some } y \in D\} \quad\text{and}\quad \overline{D} = D \cup \partial D, \]
where |x − y| = 1 is equivalent to saying that x − y ∈ {±e_1, . . . , ±e_d}, and
\[ T = \inf\{n \ge 0 : X_n \in \partial D\}. \]

Now if D is non-empty and bounded, then, for all g : ∂D → R, there exists a unique
\[ u : \overline{D} \to \mathbb{R} \]
such that u = g on ∂D and ∆u = 0 on D, where for all x ∈ Zd,
\[ \Delta u(x) = \frac{1}{2d} \sum_{y : |x - y| = 1} u(y) - u(x). \]

This discrete Laplacian leads to the continuous Laplacian via a second order Taylor formula and the approximation εZd → Rd as ε → 0. The discrete Kakutani probabilistic representation simply reads
\[ u(x) = E_x(g(X_T)). \]

Remark 9.2.8. Dynkin formula.

For all x ∈ Rq, all stopping times T with E_x(T) < ∞, and all f ∈ C²(Rq, R) with compact support,
\[ E_x(f(X_T)) = f(x) + E_x\Big( \int_0^T L f(X_s)\,ds \Big). \]

If T is the exit time of a bounded set, then we can drop the restriction of compact support.

Coding in action 9.2.9. Simulation.

Write a program simulating the law of T_∂D and X_{T_∂D} for various choices of D and d, when X is a BM. Same question when X is an Ornstein – Uhlenbeck process. Hint: start with the discrete Dirichlet problem, and then think about the discretization of the continuous case.
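A possible starting point for this exercise, as a hedged sketch in Python with NumPy: take D = {1, ..., m−1}² ⊂ Z², simulate the simple random walk until it exits, and average g at the exit site X_T. The boundary data g(x, y) = x + y is chosen because it is discrete-harmonic (it equals the mean of its four neighbors), so the discrete Kakutani representation gives u = g and the simulation can be checked exactly; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

m = 6                            # D = {1, ..., m-1}^2; exit when a coordinate hits 0 or m
g = lambda x, y: float(x + y)    # discrete-harmonic boundary data: exact solution is u = g

steps = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])

def exit_sample(x, y):
    # Simple random walk from (x, y) in D until it reaches the boundary;
    # returns g evaluated at the exit site X_T.
    while 0 < x < m and 0 < y < m:
        dx, dy = steps[rng.integers(4)]
        x, y = x + dx, y + dy
    return g(x, y)

x0, y0 = 2, 3
est = np.mean([exit_sample(x0, y0) for _ in range(5000)])
print(est, g(x0, y0))  # discrete Kakutani representation: u(x0, y0) = E(g(X_T))
```

The same loop, with Gaussian steps of size √dt, gives a first discretization of the continuous problem.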

Bibliography

[1] G. W. Anderson, A. Guionnet & O. Zeitouni – An introduction to random matrices, Cambridge Studies in Advanced Mathematics, vol. 118, CUP, 2010.
[2] D. Bakry, I. Gentil & M. Ledoux – Analysis and geometry of Markov diffusion operators, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 348, Springer, 2014. 124
[3] P. Baldi – Stochastic calculus, Universitext, Springer, Cham, 2017, An introduction through theory and exercises. iii
[4] F. Baudoin – Diffusion processes and stochastic calculus, EMS Textbooks in Mathematics, EMS, 2014. iii, 14, 18, 91, 92, 123
[5] R. M. Blumenthal & R. K. Getoor – Markov processes and potential theory, Pure and Applied Mathematics, Vol. 29, Academic Press, New York-London, 1968.
[6] A. N. Borodin & P. Salminen – Handbook of Brownian motion—facts and formulae, second ed., Probability and its Applications, Birkhäuser Verlag, Basel, 2002. 51
[7] B. Bru & M. Yor – “Comments on the life and mathematical legacy of W. Doeblin”, Finance Stoch. 6 (2002), no. 1, p. 3–47. vi, 87
[8] F. Comets & T. Meyre – Calcul stochastique et modèles de diffusions : Cours et exercices corrigés, Dunod, 2006, Coll. Sciences sup. iii
[9] C. Dellacherie & P.-A. Meyer – Probabilities and potential, North-Holland Mathematics Studies, vol. 29, North-Holland, 1978. iii, 18, 20
[10] —, Probabilities and potential. B, North-Holland Mathematics Studies, vol. 72, North-Holland Publishing, 1982, Theory of martingales, Translated from the French by J. P. Wilson. iii
[11] —, Probabilities and potential. C, North-Holland Mathematics Studies, vol. 151, North-Holland, 1988, Potential theory for discrete and continuous semigroups, Translated from the French by J. Norris.
[12] B. Duplantier – “Brownian motion, “diverse and undulating””, in Einstein, 1905–2005, Prog. Math. Phys., vol. 47, Birkhäuser, 2006, Translated from the French by Emily Parks, p. 201–293. vi, 43, 118
[13] R. Durrett – Stochastic calculus, Probability and Stochastics Series, CRC Press, 1996, A practical introduction. iii
[14] R. Durrett – Probability—theory and examples, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 49, Cambridge University Press, Cambridge, 2019, Fifth edition of [MR1068527]. 3
[15] S. N. Ethier & T. G. Kurtz – Markov processes, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1986, Characterization and convergence. 124
[16] L. C. Evans – An introduction to stochastic differential equations, American Mathematical Society, Providence, RI, 2013. iii
[17] R. F. Fox – “Stochastic calculus in physics”, J. Statist. Phys. 46 (1987), no. 5-6, p. 1145–1157. 118
[18] I. I. Gikhman & A. V. Skorokhod – The theory of stochastic processes. III, Springer, 1979, Translated from Russian by Samuel Kotz, With corrections to Volumes I and II, Grundlehren der Mathematischen Wissenschaften, 232. iii, 128
[19] A. Gut – Probability: a graduate course, second ed., Springer Texts in Statistics, Springer, New York, 2013. 3
[20] N. Ikeda & S. Watanabe – Stochastic differential equations and diffusion processes, second ed., North-Holland Mathematical Library, vol. 24, North-Holland, 1989. iii
[21] J. Jacod – Calcul stochastique et problèmes de martingales, Lecture Notes in Mathematics, vol. 714, Springer, Berlin, 1979. iii, 124
[22] J. Jacod & A. N. Shiryaev – Limit theorems for stochastic processes, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 288, Springer-Verlag, Berlin, 1987. 99
[23] J.-P. Kahane – “Le mouvement brownien: un essai sur les origines de la théorie mathématique”, in Matériaux pour l’histoire des mathématiques au XXe siècle (Nice, 1996), Sémin. Congr., vol. 3, Soc. Math. France, Paris, 1998, p. 123–155. vi
[24] I. Karatzas & S. E. Shreve – Brownian motion and stochastic calculus, second ed., Graduate Texts in Mathematics, vol. 113, Springer, 1991. iii, 124
[25] —, Methods of mathematical finance, Applications of Mathematics (New York), vol. 39, Springer-Verlag, New York, 1998. 119
[26] P. E. Kloeden & E. Platen – Numerical solution of stochastic differential equations, Applications of Mathematics (New York), vol. 23, Springer-Verlag, Berlin, 1992. 109
[27] P. E. Kloeden, E. Platen & H. Schurz – Numerical solution of SDE through computer experiments, Universitext, Springer-Verlag, Berlin, 1994, With 1 IBM-PC floppy disk (3.5 inch; HD). 109
[28] H.-H. Kuo – Introduction to stochastic integration, Universitext, Springer, New York, 2006. iii, 34
[29] Y. A. Kutoyants – Statistical inference for ergodic diffusion processes, Springer Series in Statistics, Springer-Verlag London, Ltd., London, 2004. 99
[30] D. Lamberton & B. Lapeyre – Introduction to stochastic calculus applied to finance, second ed., Chapman & Hall/CRC Financial Mathematics Series, Chapman & Hall/CRC, 2008. 119
[31] J.-F. Le Gall – Brownian motion, martingales, and stochastic calculus, french ed., Graduate Texts in Mathematics, vol. 274, Springer, 2016. iii, 25, 65, 67, 91, 106, 117, 137
[32] T. Lelièvre, M. Rousset & G. Stoltz – Free energy computations, Imperial College Press, London, 2010, A mathematical perspective. 118
[33] R. S. Liptser & A. N. Shiryaev – Statistics of random processes. I, expanded ed., Applications of Mathematics (New York), vol. 5, Springer-Verlag, Berlin, 2001, General theory, Translated from the 1974 Russian original by A. B. Aries, Stochastic Modelling and Applied Probability. 99
[34] —, Statistics of random processes. II, expanded ed., Applications of Mathematics (New York), vol. 6, Springer-Verlag, Berlin, 2001, Applications, Translated from the 1974 Russian original by A. B. Aries, Stochastic Modelling and Applied Probability. 99
[35] H. P. McKean – Stochastic integrals, AMS Chelsea Publishing, Providence, RI, 2005, Reprint of the 1969 edition, with errata.
[36] M. Métivier – Semimartingales, de Gruyter Studies in Mathematics, vol. 2, Walter de Gruyter & Co., Berlin-New York, 1982, A course on stochastic processes. iii
[37] C. Miranda – Partial differential equations of elliptic type, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 2, Springer, 1970, Second revised edition. Translated from the Italian by Zane C. Motteler. 139
[38] E. Nelson – Dynamical theories of Brownian motion, Princeton University Press, 1967. vi
[39] J. Perrin – Les atomes (1913), CNRS Éditions, 2014. vi, 30
[40] S. C. Port & C. J. Stone – Brownian motion and classical potential theory, Academic Press, 1978, Probability and Mathematical Statistics. 138, 139
[41] N. Privault – Stochastic finance, Chapman & Hall/CRC Financial Mathematics Series, CRC Press, Boca Raton, FL, 2014, An introduction with market examples. 92
[42] P. E. Protter – Stochastic integration and differential equations, Stochastic Modelling and Applied Probability, vol. 21, Springer, 2005, Second edition. Version 2.1, Corrected third printing. iii
[43] D. Revuz & M. Yor – Continuous martingales and Brownian motion, third ed., Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 293, Springer, 1999. iii, 117
[44] L. C. G. Rogers & D. Williams – Diffusions, Markov processes, and martingales. Vol. 1, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 2000, Foundations, Reprint of the second (1994) edition. iii
[45] —, Diffusions, Markov processes, and martingales. Vol. 2, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 2000, Itô calculus, Reprint of the second (1994) edition. iii
[46] G. Royer – An initiation to logarithmic Sobolev inequalities, SMF/AMS Texts and Monographs, vol. 14, AMS-SMF, 2007, Translated from the 1999 French original by Donald Babbitt. 118
[47] D. W. Stroock & S. R. S. Varadhan – Multidimensional diffusion processes, Classics in Mathematics, Springer, 2006, Reprint of the 1997 edition. iii, 124
[48] M. S. Taqqu – “Bachelier and his times: a conversation with Bernard Bru”, Finance Stoch. 5 (2001), no. 1, p. 3–32. vi, 43
[49] B. Øksendal – Stochastic differential equations, sixth ed., Universitext, Springer, 2003, An introduction with applications. iii
