
Lectures on Markov chain Monte Carlo methods

General comments.
• These lectures assume that the audience is familiar with discrete-time Markov
chains evolving on finite state spaces. The lectures are essentially based on
the book [Häggström], but rely also on the books listed at the end of these
notes.
• The videos do not replace the books. I suggest reading the corresponding
sections before or after watching the videos.
• After the statement of a result, interrupt the video and try to prove the
assertion. It is the only way to understand the difficulty of the problem, to
differentiate simple steps from crucial ones, and to appreciate the ingenuity
of the solution. Sometimes you will find an alternative proof of the result.
• You can speed up or slow down the video: pressing Settings at the
bottom-right corner lets you modify the playback speed.
• Send me an e-mail if you find a mistake that is not already reported in these notes.

February 10, 2021



Lecture 1: MCMC: the Gibbs sampler

Summary. This lecture is based on [Häggström, Chapter 7].

Content and Comments.


0:00 Definition of graphs
2:00 The hard core model
6:00 Formulation of the problem: simulate the uniform measure on the space
of configurations of the hard-core model using a sequence of i.i.d. random
variables distributed according to U[0, 1].
10:50 The q-coloring model.
16:30 Strategy to simulate the uniform measure on the space of configurations of
the q-coloring model. Define an aperiodic, irreducible Markov chain whose
stationary state is the measure.
21:58 A Markov chain for the hard-core model.
42:20 The Markov chain constructed is aperiodic and irreducible, and its unique
stationary state is the uniform measure.
54:55 Two problems: (a) this procedure does not provide an exact sample of the
stationary state, only something close to it; (b) one needs to estimate the
distance from the state of the process to the stationary state. Additionally,
are there chains which converge faster?
59:26 Definition of the Gibbs sampler.
1:00:08 Computation of the Gibbs sampler transition probability.
1:12:12 The Gibbs sampler is aperiodic. Irreducibility is model dependent. The
stationary state is the Gibbs measure (actually reversible).
1:26:11 In the hard-core model, the Gibbs sampler corresponds to the Markov chain
presented at the beginning of the lecture.
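The chain for the hard-core model can be sketched in code. The following is a minimal illustration, not the lecture's exact construction (the function and graph names are hypothetical): pick a uniform vertex and, when all its neighbors are vacant, occupy it with probability 1/2; otherwise it must stay vacant. This is exactly the single-site update whose stationary state is the uniform measure on feasible configurations.

```python
import random

def hardcore_gibbs_step(config, neighbors, rng):
    """One step of the Gibbs sampler for the hard-core model.

    config: dict vertex -> 0/1; neighbors: dict vertex -> list of vertices.
    Picks a uniform vertex; if all its neighbors are vacant, the vertex
    becomes occupied with probability 1/2, otherwise it stays vacant.
    """
    v = rng.choice(list(config))
    if all(config[w] == 0 for w in neighbors[v]):
        config[v] = 1 if rng.random() < 0.5 else 0
    else:
        config[v] = 0
    return config

# Tiny example: a path graph on 3 vertices, started from the empty configuration.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
config = {v: 0 for v in neighbors}
rng = random.Random(42)
for _ in range(10_000):
    hardcore_gibbs_step(config, neighbors, rng)
# The chain never leaves the set of feasible (independent-set) configurations.
assert not any(config[v] == 1 and config[w] == 1
               for v in neighbors for w in neighbors[v])
```

Note that feasibility is preserved automatically: an occupied neighbor forces the updated vertex to be vacant, so the chain is confined to the hard-core configurations.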

Further readings.
A. The reader will find in [Norris] and [LPW] a full account of the theory of
discrete-time Markov chains.

Recommended exercises.
a. Show that a measure which satisfies the detailed-balance conditions is sta-
tionary. Give an example of an aperiodic and irreducible Markov chain
whose stationary state does not satisfy the detailed-balance conditions.
b. Show that the Gibbs sampler for the q-coloring model corresponds to the
algorithm presented in the lecture.
c. [Häggström, Chapter 7], problems 1, 3 and 4.
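A numerical sanity check for exercise (a), with illustrative names and made-up matrices: the sketch verifies that a chain satisfying the detailed-balance conditions π(x)P(x, y) = π(y)P(y, x) has π stationary, and that the converse fails for a biased walk on a 3-cycle, whose stationary state is uniform although detailed balance does not hold.

```python
def stationary_check(P, pi, tol=1e-12):
    """Check that pi P = pi for a row-stochastic matrix P (list of rows)."""
    n = len(P)
    piP = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
    return all(abs(piP[y] - pi[y]) < tol for y in range(n))

# A chain satisfying detailed balance: pi(x) P(x,y) = pi(y) P(y,x).
pi = [0.2, 0.3, 0.5]
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.08, 0.3, 0.62],
]
assert stationary_check(P, pi)          # detailed balance implies stationarity

# Converse fails: a biased walk on a 3-cycle is aperiodic and irreducible,
# its stationary state is uniform, but detailed balance does not hold.
Q = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
unif = [1 / 3, 1 / 3, 1 / 3]
assert stationary_check(Q, unif)
assert unif[0] * Q[0][1] != unif[1] * Q[1][0]   # detailed balance violated
```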

Lecture 2: The Gibbs sampler and the Metropolis dynamics.

Summary. This lecture is based on Chapters 3 and 7 of [Häggström] and on


Chapter 4 of [LPW].

Content and Comments.


0:00 The Gibbs sampler for the q-coloring model.
19:30 The Metropolis dynamics
27:00 The reference measure is reversible for the Metropolis dynamics.
30:48 The Metropolis dynamics is irreducible and aperiodic (if π(η)/d(η) is not
constant). If this ratio is constant, the lazy version is aperiodic and
reversible with respect to π.
42:25 The cyclic Gibbs sampler.
52:20 How to construct a Markov chain on a finite state space with i.i.d. random
variables uniformly distributed on [0, 1].
1:10:35 The total variation distance between probability measures. Definition and
lemma: ‖ν − µ‖TV = (1/2) Σ_{x∈X} |ν(x) − µ(x)|.
1:17:55 The total variation distance between two distributions decreases in time.
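As a rough illustration of the Metropolis dynamics on a graph, here is a sketch (the target distribution and graph are made up for illustration): from x, propose a uniform neighbor y and accept with probability min(1, π(y)d(x)/(π(x)d(y))), which makes π reversible for the chain.

```python
import random

def metropolis_step(x, pi, neighbors, rng):
    """One Metropolis step for a target pi on a finite graph.

    Propose a uniform neighbor y of x and accept with probability
    min(1, pi[y] d(x) / (pi[x] d(y))), so that pi satisfies detailed balance.
    """
    nbrs = neighbors[x]
    y = rng.choice(nbrs)
    accept = min(1.0, (pi[y] * len(nbrs)) / (pi[x] * len(neighbors[y])))
    return y if rng.random() < accept else x

# Illustrative target: pi proportional to (1, 2, 3, 4) on a 4-cycle.
weights = [1.0, 2.0, 3.0, 4.0]
Z = sum(weights)
pi = [w / Z for w in weights]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

rng = random.Random(1)
counts = [0] * 4
x = 0
for _ in range(200_000):
    x = metropolis_step(x, pi, neighbors, rng)
    counts[x] += 1
freqs = [c / 200_000 for c in counts]
# Empirical frequencies should be close to pi = (0.1, 0.2, 0.3, 0.4).
assert all(abs(f - p) < 0.02 for f, p in zip(freqs, pi))
```

Note that only the ratio π(y)/π(x) enters the acceptance probability, so the normalization constant Z is never needed; this is the reason the Metropolis dynamics is usable when Z is intractable.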

Further readings.
A. The reader will find in Chapter 3 of [LPW] a discussion on Gibbs samplers
and the Metropolis dynamics, and in Chapter 4 the definition of the total
variation distance and its relation to coupling and mixing times.
B. Section 2.4 in [Aldous-Fill] defines the total variation distance and pro-
vides some examples and applications. Bounds for mixing times of different
chains are the subject of more than one section.
C. [Aldous-Fill, Chapter 11] proposes alternative Metropolis dynamics.
D. Interested readers may want to learn about the cutoff phenomenon [LPW,
Chapter 18].

Recommended exercises.
a. Prove Proposition 4.5 and 4.7 in [LPW].

Lecture 3: The total variation distance, coupling and mixing times.

Summary. This lecture is based on Chapter 4 of [LPW] and Chapter 8 of [Häggström].

Content and Comments.


0:00 Definition of a coupling between two probability measures on a finite space.
Problem: Maximize the measure of the diagonal among all couplings of two
measures.
4:40 Proposition: ‖µ − ν‖TV = inf{ P(Dᶜ) : P a coupling of µ and ν }.
This is [LPW, Proposition 4.7].
34:54 Proposition: Let (Xn : n ≥ 1) be an aperiodic, irreducible, discrete-time
Markov chain. Denote by π its stationary state. There exist C0 < ∞ and
0 < α < 1 such that ‖P^n(x, ·) − π(·)‖TV ≤ C0 α^n for all x ∈ Ω, n ≥ 1.
This is [LPW, Theorem 4.9].
58:17 Definition of the mixing time.
1:00:55 Coupling Markov chains to estimate the mixing times. Coupling two copies
of the q-coloring model.
1:10:10 Estimating the mixing time of the q-coloring model with no restrictions on
colors using a coupling. This elementary example illustrates the advantage
of updating the vertices in a deterministic order: in this case we are sure
that at time M all vertices have been updated.
1:19:33 Coupling the cyclic Gibbs sampler for the q-coloring model assuming that
q > 2d², where d is the maximum degree of the graph. This is [Häggström,
Theorem 8.1].
1:29:22 Actually, |A₁| + 2|A₂| = 2d(v₁) ≤ 2 max_k d(v_k) = 2d.
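The two characterizations of the total variation distance used in this lecture — the halved sum of |ν(x) − µ(x)| and the coupling formula of [LPW, Proposition 4.7] — can be checked numerically. A small sketch (the example measures are arbitrary):

```python
def tv_distance(mu, nu):
    """Total variation distance: (1/2) * sum_x |mu(x) - nu(x)|."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
d = tv_distance(mu, nu)
assert abs(d - 0.3) < 1e-12

# Coupling characterization: every coupling P of (mu, nu) has P(X != Y) >= d,
# and an optimal coupling achieves equality by putting as much mass as
# possible on the diagonal, namely sum_x min(mu(x), nu(x)).
overlap = sum(min(mu[x], nu[x]) for x in mu)
assert abs((1.0 - overlap) - d) < 1e-12
```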

Further readings.
A. The reader will find in Section 4.4 of [LPW] properties of the total variation
distance in the context of Markov chains.
B. [LPW, Chapter 5] discusses coupling of Markov chains and provides general
bounds on the mixing time of some standard Markov chains. Among them,
the q-coloring model and the hardcore model.
C. [Aldous-Fill, Chapter 12] presents many examples in which coupling is used
to bound the total variation distance, providing bounds on the mixing
times.

Recommended exercises.
a. Fill in the details of Example 4.15 of [LPW].
b. [LPW, Chapter 4], exercises 1, 3, 4, 5.
c. [Häggström, Chapter 8], problems 2, 4.
d. [LPW, Chapter 5], exercises 1 – 4.

Lecture 4: The Propp–Wilson algorithm.

Summary. This lecture is based on Chapter 10 of [Häggström].

Content and Comments.


0:00 The Propp–Wilson algorithm: general considerations.
5:46 The Propp–Wilson algorithm: definition. This algorithm is sometimes
called an exact simulation, since it provides an output which is distributed
exactly according to the stationary state π.
17:04 An example: the Propp–Wilson algorithm applied to a 4-state Markov
chain.
30:37 Theorem. Assume that the chain is aperiodic and irreducible. Let Y be
the outcome of the Propp-Wilson algorithm. If the algorithm is successful,
then Y is distributed according to the unique stationary state. This is
[Häggström, Theorem 10.1].
47:38 Comments on the result: (a) one needs to run too many chains; (b) running
forward in time does not work (example).
55:19 (c) Renewing the random variables (Ut : t ≥ 1) at each new step does not
work (example).
1:10:03 Definition of a partial order on a set. The standard terminology is "partial
order", not "order" as adopted in the lecture. An example in {0, 1}^V.
Maximal and minimal configurations.
1:15:50 Monotone or attractive chains. These are Markov chains defined on a
partially ordered set for which there exists a coupling which preserves the
partial order.
1:17:18 Example of a monotone Markov chain.
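The algorithm can be sketched as follows for a chain given by an update function ϕ(x, u). The key point, stressed in the lecture, is that the random number attached to each time slot is generated once and reused as the starting time is pushed further into the past; renewing them (comment (c) above) would break the result. The 2-state chain at the end is a hypothetical example, not one from the lecture.

```python
import random

def propp_wilson(states, update, rng, max_doublings=20):
    """Propp-Wilson (coupling from the past) with a shared update function.

    update(x, u) maps a state x and a uniform u in [0, 1) to the next state.
    Runs all states from time -n to 0, reusing the same random numbers per
    time slot (crucial!), doubling n until all trajectories coalesce.
    When successful, the output is exactly stationary.
    """
    us = []          # us[t-1] drives the step ending at time -(t-1)
    n = 1
    for _ in range(max_doublings):
        while len(us) < n:
            us.append(rng.random())
        current = set(states)
        for t in range(n, 0, -1):            # earliest slot first
            current = {update(x, us[t - 1]) for x in current}
        if len(current) == 1:
            return current.pop()
        n *= 2
    raise RuntimeError("no coalescence within the allowed horizon")

# Hypothetical 2-state chain: from 0 jump to 1 with prob 2/3; from 1 jump
# to 0 with prob 1/3. Its stationary state is pi = (1/3, 2/3).
def update(x, u):
    if x == 0:
        return 1 if u < 2 / 3 else 0
    return 0 if u < 1 / 3 else 1

rng = random.Random(7)
samples = [propp_wilson([0, 1], update, rng) for _ in range(30_000)]
freq1 = sum(samples) / len(samples)
assert abs(freq1 - 2 / 3) < 0.02   # exact samples from pi = (1/3, 2/3)
```

Note that the update function, not just the transition matrix, determines whether and how fast coalescence occurs; this is the point of starred exercise (a) below.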

Further readings.
A. [Aldous-Fill, Section 9.3] provides other examples and references for other
exact simulations.

Recommended exercises.
*a. [Häggström, Chapter 10], problem 2. This example shows that the choice
of the update function is important in the Propp-Wilson algorithm.
b. [Häggström, Chapter 10], problem 1.
c. Let Ω be a partially ordered set. Denote by M the set of monotone func-
tions: f ∈ M if and only if f (η) ≤ f (ξ) for all η ≤ ξ. Say that a Markov
chain is monotone if P f ∈ M for all f ∈ M. Show that there exists a
coupling which preserves the order if and only if the chain is monotone.

Lecture 5: Simulating the Gibbs measure of the Ising model


with the Propp–Wilson algorithm.

Summary. This lecture is based on Chapter 11 of [Häggström].

Content and Comments.


0:00 Fix a partially ordered, finite set Ω and an aperiodic, irreducible, Ω-valued
Markov chain (Xt). Assume that there exists an update function ϕ : Ω ×
[0, 1] → Ω such that ϕ(η, ·) ≤ ϕ(ξ, ·) for all η ≤ ξ. Then, for all p ≥ 1, it is
possible to couple copies X_t^(p,η), η ∈ Ω, of the chain such that
(a) X_{−Np}^(p,η) = η for all η ∈ Ω, and
(b) X_t^(p,η) ≤ X_t^(p,ξ) for all η ≤ ξ, −Np ≤ t ≤ 0.
In particular, if Ω admits a minimal and a maximal configuration, rep-
resented by η_* and η^*, respectively, then X_t^(p,η_*) ≤ X_t^(p,η) ≤ X_t^(p,η^*)
for all η ∈ Ω, −Np ≤ t ≤ 0.
Therefore, under the previous assumptions, the Propp–Wilson algorithm
produces an output at step p if and only if X_0^(p,η_*) = X_0^(p,η^*).
17:10 The Ising model in a square without boundary conditions.
(a) The energy of a configuration is given by H(σ) = −Σ_{y∼x} σ_y σ_x.
(b) The Gibbs measure µ_β(σ) = (1/Z_β) e^{−βH(σ)}.
(c) The cases β = 0 and β → ∞.
39:25 Phase transition in the Ising model. The Ising model with an external field.
44:50 Of course, E_{µ_{β,h}}[σ_0] = 0 for h = 0. This is corrected later.
55:16 The Gibbs sampler for the Ising model.
1:07:59 The update function ϕ for the Gibbs sampler in the Ising model.
1:16:00 The function ϕ is monotone in the sense that ϕ(σ, ·) ≤ ϕ(σ′, ·) for all σ ≤ σ′.
Conclusion: to simulate the Gibbs measure of the Ising model, we can use
the Propp–Wilson algorithm starting from only two configurations: the one
with all minuses and the one with all pluses.
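A sketch of this conclusion for a tiny graph, assuming the standard heat-bath (Gibbs) update driven by a shared uniform variable. This version uses a random-scan order rather than the cyclic sweep of the lecture, and all names are illustrative. Since the +1 probability e^{βS}/(e^{βS} + e^{−βS}) is increasing in the neighboring spin sum S, using the same u for every configuration preserves the partial order, so it suffices to run the chains started from all minuses and from all pluses.

```python
import math
import random

def ising_gibbs_update(sigma, beta, neighbors, v, u):
    """Heat-bath update at vertex v using a single uniform u in [0, 1).

    sigma_v becomes +1 with probability e^{beta*S} / (e^{beta*S} + e^{-beta*S}),
    where S is the sum of the neighboring spins. Sharing u across
    configurations makes the update monotone in sigma.
    """
    s = sum(sigma[w] for w in neighbors[v])
    p_plus = math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))
    sigma = dict(sigma)
    sigma[v] = 1 if u < p_plus else -1
    return sigma

def sandwich_cftp(neighbors, beta, rng, max_doublings=16):
    """Propp-Wilson run only from the all-minus and all-plus configurations."""
    verts = list(neighbors)
    vs, us = [], []                 # vertex and uniform fixed per time slot
    n = 1
    for _ in range(max_doublings):
        while len(us) < n:
            vs.append(rng.choice(verts))
            us.append(rng.random())
        lo = {v: -1 for v in verts}
        hi = {v: +1 for v in verts}
        for t in range(n, 0, -1):
            lo = ising_gibbs_update(lo, beta, neighbors, vs[t - 1], us[t - 1])
            hi = ising_gibbs_update(hi, beta, neighbors, vs[t - 1], us[t - 1])
            assert all(lo[v] <= hi[v] for v in verts)  # order is preserved
        if lo == hi:
            return lo
        n *= 2
    raise RuntimeError("no coalescence within the allowed horizon")

# A 2x2 square (a 4-cycle) at a moderate temperature.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
sample = sandwich_cftp(neighbors, beta=0.5, rng=random.Random(3))
assert set(sample.values()) <= {-1, +1}
```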

Further readings.
A. [LPW, Chapter 15] provides bounds for the mixing time of the Gibbs sam-
pler applied to the Ising model in different graphs. See also Section 3.3.5
in this book.
B. [LPW, Chapter 22], written by J. G. Propp and D. B. Wilson, discusses
various applications of coupling from the past.
C. [Friedli-Velenik, Chapter 3] is a comprehensive monograph on the phase
transition of the Ising model.

Recommended exercises.
*a. [Häggström, Chapter 11], problems 1 – 3.

References
[Aldous-Fill] D. Aldous, J. A. Fill: Reversible Markov Chains and Random Walks on Graphs.
Unfinished monograph, 2002 (recompiled version, 2014).
Available at https://www.stat.berkeley.edu/users/aldous/RWG/book.html
[Häggström] O. Häggström: Finite Markov Chains and Algorithmic Applications. London Math-
ematical Society Student Texts, Cambridge University Press, 2002.
[LPW] D. A. Levin, Y. Peres, E. L. Wilmer: Markov Chains and Mixing Times. American
Mathematical Society, 2009.
[Norris] J. Norris: Markov Chains. Cambridge University Press, 1997.
[Friedli-Velenik] S. Friedli, Y. Velenik: Statistical Mechanics of Lattice Systems: A Concrete
Mathematical Introduction. Cambridge University Press, 2017. ISBN 978-1-107-18482-4.
DOI 10.1017/9781316882603.
