Density exploration methods
Pierre Jacob
Department of Statistics, University of Oxford
pierre.jacob at stats.ox.ac.uk
March 2014
Pierre Jacob Density exploration 1/ 49
Outline
1 MCMC and multimodal target distributions
2 Parallel MCMC, tempering and equi-energy moves
3 Wang–Landau algorithm
MCMC and multimodal target distributions
Algorithm 1 Metropolis–Hastings targeting π
1: Init X0 ∈ X.
2: for t = 1 to T do
3: Sample X⋆ from some proposal distribution q(Xt−1, ·).
4: Compute the acceptance ratio:
α(Xt−1, X⋆) = min( 1, [ π(X⋆) q(X⋆, Xt−1) ] / [ π(Xt−1) q(Xt−1, X⋆) ] ).
5: With probability α(Xt−1, X⋆), set Xt = X⋆;
otherwise Xt = Xt−1.
6: end for
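As a concrete reference point, here is a minimal Python sketch of Algorithm 1, assuming a symmetric Gaussian random-walk proposal (so the q-ratio cancels) and a target supplied as an unnormalized log-density:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iters, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings targeting pi, given log pi up to a constant.

    The Gaussian proposal is symmetric, q(x, y) = q(y, x), so the acceptance
    ratio reduces to min(1, pi(x_star) / pi(x_{t-1})).
    """
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iters + 1)
    chain[0] = x = float(x0)
    for t in range(1, n_iters + 1):
        x_star = x + step * rng.standard_normal()
        # Accept with probability min(1, pi(x_star)/pi(x)), computed in log scale.
        if np.log(rng.uniform()) < log_pi(x_star) - log_pi(x):
            x = x_star
        chain[t] = x
    return chain

# Example: target a standard normal via its unnormalized log-density.
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_iters=5000)
```

With an asymmetric proposal, the ratio q(X⋆, Xt−1)/q(Xt−1, X⋆) would have to be reinstated in the acceptance probability.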
MCMC and multimodal target distributions
MCMC methods make it possible to approximate
∫ φ(x) π(x) dx,
as long as π can be evaluated or estimated point-wise; they do so by generating a chain
X0, X1, . . . , XT.
The guarantees are largely asymptotic, as T → ∞.
For multimodal target distributions the non-asymptotic regime
might be very different.
MCMC and multimodal target distributions
Figure : Posterior distribution of (µ1, µ2) in a Gaussian mixture model.
See Stephens (1997), Bayesian methods for mixtures of normal distributions, PhD thesis. Figure obtained using PAWL.
MCMC and multimodal target distributions
Figure : Posterior distribution of (r, θ) in a theta-Ricker Hidden Markov
model. See Polansky et al. (2009), Likelihood ridges and multimodality
in population growth rate models. Figure obtained using SMC².
MCMC and multimodal target distributions
Figure : Toy example: a mixture of well-separated normal distributions.
MCMC and multimodal target distributions
Figure : Markov chain still stuck in one mode after 50,000 iterations.
MCMC and multimodal target distributions
Figure : Feast your eyes on the moustarget distribution!
MCMC and multimodal target distributions
Note that multimodal distributions are not difficult to sample
from if the modes are not well separated.
In fact we can [re]define a mode as a region from which
Metropolis–Hastings cannot escape.
Non-asymptotic Error Bounds for Sequential MCMC Methods in
Multimodal Settings, N. Schweizer, 2012.
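The sticking behaviour in the figures can be reproduced on an assumed toy target (an equal-weight mixture of N(0, 1) and N(10, 1), not the exact target of the slides): a random-walk chain started in the left mode essentially never reaches the right one.

```python
import numpy as np

def log_mixture(x):
    # Assumed toy target: equal-weight mixture of N(0, 1) and N(10, 1).
    a, b = -0.5 * x**2, -0.5 * (x - 10.0)**2
    m = np.maximum(a, b)
    return m + np.log(0.5 * np.exp(a - m) + 0.5 * np.exp(b - m))

rng = np.random.default_rng(0)
x = 0.0                                       # initialize in the left mode
samples = np.empty(20000)
for t in range(samples.size):
    x_star = x + 0.5 * rng.standard_normal()  # modest random-walk steps
    if np.log(rng.uniform()) < log_mixture(x_star) - log_mixture(x):
        x = x_star
    samples[t] = x

# Fraction of time spent near the mode at 10 (1/2 under the target).
frac_right = float(np.mean(samples > 5.0))
```

Crossing the valley at x = 5 requires climbing roughly 12 log-density units against the acceptance ratio, so the chain stays put for any practical run length.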
Parallel MCMC
A first idea is to run N chains independently, from various
starting points chosen to be “spread out”.
The chains can thus find multiple modes; this approach also brings
other benefits, such as parallelization and convergence diagnostics.
What if there are > N modes? What if all the chains are
initialized in the attraction zone of the same mode?
Parallel MCMC
Figure : Parallel MCMC on the moustarget distribution
Parallel MCMC
Figure : Parallel MCMC on the moustarget distribution (traces of Y for 10 chains over 10,000 iterations).
Parallel Tempering
The idea of parallel tempering is to run N chains targeting
different versions of π, of “increasing difficulty”.
Introduce “inverse temperatures”:
0 < γ1 < . . . < γN = 1.
Introduce “tempered” distributions πγn for n = 1, . . . , N.
For γ ≈ 0, πγ is considered easier to sample because the
variations of π are smaller.
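A back-of-the-envelope illustration with assumed numbers: since log πγ = γ log π up to a constant, tempering scales every log-density gap by γ, which is why πγ is easier to explore for small γ.

```python
# Tempering: pi^gamma has log-density gamma * log pi (up to a constant), so
# the log gap between a mode and a valley scales linearly with gamma.
log_pi_mode, log_pi_valley = -1.0, -13.5   # illustrative values
gaps = {g: g * (log_pi_mode - log_pi_valley) for g in (1.0, 0.5, 0.1)}
# gamma = 1.0 gives a gap of 12.5; gamma = 0.1 shrinks it to 1.25,
# which a random-walk chain crosses easily.
```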
Parallel Tempering
One MCMC chain per inverse temperature, for instance using
a Metropolis-Hastings kernel targeting πγn .
Note that the local modes of πγ are the same for every γ.
The chains interact through “swap moves”.
Parallel Tempering
When a “swap move” is to be performed, do the following.
Sample indices k1, k2 uniformly in {1, . . . , N}.
With acceptance probability
min( 1, [ π^γk1(xk2) π^γk2(xk1) ] / [ π^γk1(xk1) π^γk2(xk2) ] ),
exchange the values of xk1 and xk2.
This doesn’t change the joint target distribution
π^γ1 ⊗ π^γ2 ⊗ . . . ⊗ π^γN.
In particular the N-th chain still targets π^γN = π.
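The swap acceptance probability can be sketched as follows, using only log-densities for numerical stability (the states and temperatures below are illustrative):

```python
import numpy as np

def log_swap_accept(log_pi, gammas, xs, k1, k2):
    """Log of the swap acceptance probability between chains k1 and k2.

    With tempered targets pi^gamma, log pi_gamma(x) = gamma * log pi(x),
    so the ratio only involves the two chains being swapped.
    """
    num = gammas[k1] * log_pi(xs[k2]) + gammas[k2] * log_pi(xs[k1])
    den = gammas[k1] * log_pi(xs[k1]) + gammas[k2] * log_pi(xs[k2])
    return min(0.0, num - den)

lp = lambda x: -0.5 * x**2
# Swapping identical states is always accepted.
p_equal = np.exp(log_swap_accept(lp, [0.5, 1.0], [1.0, 1.0], 0, 1))
# Pushing a low-density state up to the colder chain is accepted with prob < 1.
p_uphill = np.exp(log_swap_accept(lp, [0.5, 1.0], [2.0, 0.0], 0, 1))
```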
Parallel Tempering
Figure : Parallel Tempering on the moustarget distribution, with γ
equally spaced in [0.5, 1].
Parallel Tempering
Figure : Parallel Tempering on the moustarget distribution, with γ
equally spaced in [0.5, 1] (traces of Y for 10 chains over 10,000 iterations).
Parallel Tempering
Figure : Parallel Tempering on the moustarget distribution, with γ
equally spaced in [0.1, 1].
Parallel Tempering
Figure : Parallel Tempering on the moustarget distribution, with γ
equally spaced in [0.1, 1] (traces of Y for 10 chains over 10,000 iterations).
Parallel Tempering
The choice of the inverse temperatures (γn), n = 1, . . . , N, is essential.
Taking γ1 very low increases the exploration for the chain
targeting πγ1 .
If the increments γn − γn−1 are too large, the swap moves
tend to be rejected, which decreases the exploration for the
“upper” chains.
SMC sampler
Sequence of distributions, for instance πγn for n = 1, . . . , N
such that 0 < γ1 < . . . < γN = 1. Say N = 100.
M particles (say 10,000), sequentially importance sampling
from µ to πγ1 and then from πγn−1 to πγn .
When the effective sample size is low, resample and then
MCMC move for each particle (say 5 steps for each particle).
The ability to recover modes is sensitive to the choice of the
initial distribution µ.
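A hedged sketch of such an SMC sampler on a one-dimensional stand-in problem (standard-normal target, Gaussian µ, geometric path π_n ∝ µ^{1−γn} π^{γn}; all settings are illustrative, not the moustarget):

```python
import numpy as np

rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x**2            # stand-in target: N(0, 1), unnormalized
log_mu = lambda x: -0.5 * (x / 5.0)**2    # initial distribution mu = N(0, 25)

M = 5000                                  # particles
x = rng.normal(0.0, 5.0, size=M)          # draw from mu
logw = np.zeros(M)
gammas = np.linspace(0.0, 1.0, 21)        # tempering ladder
for g_prev, g in zip(gammas[:-1], gammas[1:]):
    # Incremental importance weight along the path pi_n propto mu^(1-g) pi^g.
    logw += (g - g_prev) * (log_pi(x) - log_mu(x))
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w**2) < M / 2:        # effective sample size is low
        x = x[rng.choice(M, size=M, p=w)] # resample
        logw[:] = 0.0
        log_target = lambda y, g=g: (1 - g) * log_mu(y) + g * log_pi(y)
        for _ in range(5):                # 5 MH moves per particle
            x_star = x + rng.standard_normal(M)
            accept = np.log(rng.uniform(size=M)) < log_target(x_star) - log_target(x)
            x = np.where(accept, x_star, x)

w = np.exp(logw - logw.max()); w /= w.sum()
mean_est = float(np.sum(w * x))
var_est = float(np.sum(w * x**2) - mean_est**2)
```

The final weighted particles approximate π; as the slides note, whether distant modes survive to the end depends on µ placing particles near them initially.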
SMC sampler
Figure : SMC sampler on the moustarget distribution, with
µ = N( (400, −100)ᵀ, diag(32², 32²) ).
SMC sampler
Figure : SMC sampler on the moustarget distribution, with
µ = N( (400, −400)ᵀ, diag(100², 100²) ).
Equi-energy sampler
Same initial setting as parallel tempering: N chains (X^1_t), . . . , (X^N_t),
each targeting πγn using a MH kernel.
The first chain (X^1_t) simply targets πγ1 using MH.
For an upper chain (X^n_t), with probability ε perform an
equi-energy move; otherwise perform a MH step targeting πγn.
Adaptive Equi-Energy Sampler : Convergence and
Illustration, Schreck, Fort, Moulines, 2013.
Equi-energy sampler
An equi-energy move consists of proposing a point from the
history of the chain just below, (X^{n−1}_t), and then accepting it
with a MH-type acceptance probability.
[Whereas in Parallel Tempering we proposed the current state
of another chain.]
The proposal is restricted to points with roughly similar values
of π(x) (hence “equi-energy”).
Equi-energy sampler
Introduce a sequence ξ0 = 0 < ξ1 < . . . < ξS = +∞ cutting
the density axis R+ into S intervals.
Introduce H(x, y) such that H(x, y) = 1 if π(x) and π(y) are
in the same interval [ξk, ξk+1).
Introduce the proposal distribution given X^n_t and
θ = {X^{n−1}_k}1≤k≤t:
g_θ(X^n_t, dy) ∝ Σ_{k=1}^t H(X^{n−1}_k, X^n_t) δ_{X^{n−1}_k}(dy).
Then the proposed point is accepted with probability
min( 1, π^{γn−γn−1}(y) / π^{γn−γn−1}(X^n_t) )
(similar to the swap acceptance probability in Parallel Tempering).
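A sketch of the equi-energy proposal step; for numerical convenience the rings are cut on log π values rather than on π itself, which defines the same kind of partition (names and cut points are illustrative):

```python
import numpy as np

def equi_energy_proposal(log_pi, x_current, history, log_rings, rng):
    """Propose uniformly among the lower chain's stored states that fall in
    the same energy ring as the current state.

    `log_rings` are the ring cut points, placed here on log pi values; two
    states are in the same ring exactly when H(x, y) = 1.
    """
    history = np.asarray(history)
    ring_of = lambda v: np.searchsorted(log_rings, v)
    same_ring = np.flatnonzero(ring_of(log_pi(history)) == ring_of(log_pi(x_current)))
    if same_ring.size == 0:
        return None  # no stored state shares the current ring
    return float(history[rng.choice(same_ring)])

lp = lambda x: -0.5 * np.asarray(x)**2
rng = np.random.default_rng(0)
# Current state 0.0 is in the high-density ring, as are 0.1 and -0.2 (but not 3.0).
y = equi_energy_proposal(lp, 0.0, [0.1, 3.0, -0.2], np.array([-2.0]), rng)
```

The returned point y would then go through the acceptance step with the tempered ratio shown above.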
Wang–Landau algorithm
The main idea is to force the chain to avoid regions that have
already been visited.
The concept of region is formalized by a partition of the state
space.
The self-avoiding effect is achieved by an adaptation of the
transition kernel.
Determining the density of states for classical statistical
models: A random walk algorithm to produce a flat
histogram, F. Wang and D. Landau, Physical Review E 2001.
Wang–Landau algorithm
Partition the state space:
X = ∪_{i=1}^d Xi.
Desired frequencies of visit:
ϕ = (ϕ1, . . . , ϕd) such that Σ_{i=1}^d ϕi = 1.
Wang–Landau algorithm
Figure : Partition of the state space into 11 bins.
Wang–Landau algorithm
Penalized distribution, for any θ = (θ1, . . . , θd):
πθ(x) ∝ π(x) / θ(J(x)),
where J(x) is such that x ∈ X_{J(x)}.
There is a θ⋆ such that:
∀i ∈ {1, . . . , d}, ∫_{Xi} πθ⋆(x) dx = ϕi,
i.e. πθ⋆ gives the desired mass ϕi to each bin Xi.
These ideal penalties θ⋆ are not available.
Wang–Landau algorithm
Figure : Biased target distribution: each bin now carries the same mass.
Wang–Landau algorithm
Figure : Markov chain exploring the biased target over 50,000 iterations.
Wang–Landau algorithm
Algorithm 2 Wang-Landau with deterministic schedule (ηt)
1: Init θ0 > 0, X0 ∈ X.
2: for t = 1 to T do
3: Sample Xt from Kθt−1 (Xt−1, ·), MH kernel targeting πθt−1 .
4: Update the penalties:
log θt(i) ← log θt−1(i) + ηt (1IXi (Xt) − ϕi)
5: end for
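A minimal sketch of Algorithm 2 on an assumed toy target (standard normal, spatial bins cut at ±1 and ±2, schedule ηt = 1/t); the penalties push the chain into tail bins it would otherwise rarely visit:

```python
import numpy as np

rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x**2                  # assumed toy target, unnormalized
edges = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # cut points: d = 6 spatial bins
d = edges.size + 1
phi = np.full(d, 1.0 / d)                       # desired visit frequencies
bin_of = lambda x: int(np.searchsorted(edges, x))

log_theta = np.zeros(d)
visits = np.zeros(d)
x = 0.0
T = 50_000
for t in range(1, T + 1):
    # One MH step targeting pi_theta(x) propto pi(x) / theta(J(x)).
    x_star = x + rng.standard_normal()
    log_ratio = (log_pi(x_star) - log_theta[bin_of(x_star)]) \
              - (log_pi(x) - log_theta[bin_of(x)])
    if np.log(rng.uniform()) < log_ratio:
        x = x_star
    i = bin_of(x)
    visits[i] += 1
    # Penalty update with the deterministic schedule eta_t = 1/t.
    log_theta += (1.0 / t) * ((np.arange(d) == i) - phi)

freqs = visits / T  # tail bins end up visited far more often than under pi
```

Under plain MH on this target the two outer bins would each receive roughly 2% of the visits; here the adapted penalties pull the empirical frequencies toward ϕi = 1/6.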
Wang–Landau algorithm
If ηt → 0 “fast enough”, (θt)t≥0 converges.
If ϕi = 1/d for each bin i:
θt(i) → ∫_{Xi} π(x) dx =: ψi as t → ∞,
at least up to a multiplicative constant.
(Xt)t≥0 is asymptotically distributed according to πθ⋆ .
Convergence of the Wang-Landau algorithm,
G. Fort, B. Jourdain, E. Kuhn, T. Lelievre, G. Stoltz
2012, on arXiv.
Wang–Landau algorithm
Choice of (ηt) can have a huge impact on the results.
Define the counters:
νt(i) := Σ_{n=1}^t 1I_{Xi}(Xn).
Flat Histogram (FH) is reached when:
max_{i∈{1,...,d}} | νt(i)/t − ϕi | < c
for some c > 0.
Instead of decreasing (ηt) at each iteration, decrease only
when the Flat Histogram criterion is reached.
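The criterion itself is a one-liner; the constant c below is illustrative:

```python
import numpy as np

def flat_histogram(visits, phi, c=0.25):
    """Flat Histogram criterion: max_i | nu_t(i)/t - phi_i | < c."""
    t = visits.sum()
    return bool(t > 0 and np.max(np.abs(visits / t - np.asarray(phi))) < c)

balanced = flat_histogram(np.array([10.0, 10.0, 10.0]), [1/3, 1/3, 1/3])
lopsided = flat_histogram(np.array([30.0, 0.0, 0.0]), [1/3, 1/3, 1/3])
```

In the stochastic-schedule algorithm that follows, κt is incremented (so η is decreased one step) each time this check passes.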
Wang–Landau algorithm
Algorithm 3 Wang-Landau with stochastic schedule (ηκt )
1: Init θ0 = 1, X0 ∈ X.
2: Init κ0 ← 0.
3: for t = 1 to T do
4: Sample Xt from Kθt−1 (Xt−1, ·), MH kernel targeting πθt−1 .
5: If (FH) then κt ← κt−1 + 1, otherwise κt ← κt−1.
6: Update the penalties:
log θt(i) ← log θt−1(i) + ηκt (1IXi (Xt) − ϕi)
7: end for
Wang–Landau algorithm
To be sure that eventually, for any c > 0:
max_{i∈{1,...,d}} | νt(i)/t − ϕi | < c,
we have proved:
∀i ∈ {1, . . . , d}, νt(i)/t → ϕi in probability as t → ∞,
for any fixed η > 0,
which implies:
E[ inf{ t ≥ 0 : ∀i ∈ {1, . . . , d}, | νt(i)/t − ϕi | < c } ] < ∞.
The Wang-Landau algorithm reaches the Flat Histogram
criterion in finite time, PJ & R. Ryder, AAP 2013.
Wang–Landau algorithm
N chains (X^(1)_t, . . . , X^(N)_t) using the same kernel K_θt
targeting πθt at time t.
The interaction is made through the common penalties (θt).
The update was
log θt(i) ← log θt−1(i) + η (1I_{Xi}(Xt) − ϕi).
Wang–Landau algorithm
N chains (X^(1)_t, . . . , X^(N)_t) using the same kernel K_θt
targeting πθt at time t.
The interaction is made through the common penalties (θt).
The update is now
log θt(i) ← log θt−1(i) + η ( (1/N) Σ_{k=1}^N 1I_{Xi}(X^(k)_t) − ϕi ).
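The interacting update can be sketched directly from this formula (arguments illustrative):

```python
import numpy as np

def parallel_wl_update(log_theta, chain_bins, phi, eta):
    """Penalty update driven by N interacting chains: the single-chain
    indicator 1I_Xi(X_t) is replaced by the fraction of the N chains
    currently sitting in bin i."""
    d = log_theta.size
    occupancy = np.bincount(chain_bins, minlength=d) / len(chain_bins)
    return log_theta + eta * (occupancy - np.asarray(phi))

# Four chains, three bins: two chains in bin 0, one in bin 1, one in bin 2.
new_log_theta = parallel_wl_update(np.zeros(3), [0, 0, 1, 2], [1/3] * 3, 1.0)
```

Averaging over chains smooths the noisy single-chain indicator, so the shared penalties adapt more stably.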
Wang–Landau algorithm
Default choice of partition: along density values
(always 1-dimensional!).
Introduce a sequence ξ0 = 0 < ξ1 < . . . < ξd = +∞ cutting
the density axis R+ in d intervals.
Some sense of a good range can be grasped from pilot runs.
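A sketch of the default density-based bin assignment; the cut points are placed on log π here for numerical convenience (values illustrative):

```python
import numpy as np

def density_bin(log_pi_x, log_xi):
    """Bin index of a state under the density-based partition: the sorted
    cut points log_xi split the (log) density axis into d intervals."""
    return int(np.searchsorted(log_xi, log_pi_x))

log_xi = np.array([-5.0, -1.0])  # d = 3 bins on the (log) density axis
bins = [density_bin(v, log_xi) for v in (-10.0, -3.0, 0.0)]
```

Because the partition lives on the one-dimensional density axis, the same code applies whatever the dimension of the state x.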
Wang–Landau algorithm
Figure : Wang-Landau on the moustarget distribution. Colours represent
the partition.
Wang–Landau algorithm
Figure : First bin of the partition.
Wang–Landau algorithm
Figure : Last bin of the partition.
Wang–Landau algorithm
Figure : Evolution of log θt along the iterations.
Wang–Landau algorithm
In some situations we might have some intuition on the
direction along which the modes are spread.
For instance, if we knew that the modes of the moustarget
distribution were along the y-axis:
∀i ∈ {1, . . . , d} Xi = R × (yi, yi+1)
with −∞ = y1 < y2 < . . . < yd+1 = +∞.
Wang–Landau algorithm
Figure : Wang-Landau using the y-axis to partition the space.
Wang–Landau algorithm
Figure : Exploration of the bins.
Wang–Landau algorithm
Figure : Evolution of log θ along the iterations.
Bibliography
The Wang-Landau algorithm in general state spaces:
applications and convergence analysis, Y. Atchadé and J.
Liu, Statistica Sinica 2010.
Determining the density of states for classical statistical
models: A random walk algorithm to produce a flat
histogram, F. Wang and D. Landau, Physical Review E 2001.
An Adaptive Interacting Wang-Landau Algorithm for
Automatic Density Exploration, L. Bornn, PJ, P. Del Moral,
A. Doucet, JCGS 2013.
The Wang-Landau algorithm reaches the Flat Histogram
criterion in finite time, PJ & R. Ryder, AAP 2013.
Adaptive Equi-Energy Sampler : Convergence and
Illustration, Schreck, Fort, Moulines, 2013.
Efficiency of the Wang-Landau algorithm: a simple test
case, G. Fort, B. Jourdain, E. Kuhn, T. Lelièvre, G. Stoltz,
2014.
Pierre Jacob Density exploration 49/ 49

More Related Content

What's hot (19)

PDF
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Christian Robert
 
PDF
Bayesian inversion of deterministic dynamic causal models
khbrodersen
 
PDF
Chris Sherlock's slides
Christian Robert
 
PDF
Unbiased Bayes for Big Data
Christian Robert
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
Maximum likelihood estimation of regularisation parameters in inverse problem...
Valentin De Bortoli
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
Introduction to MCMC methods
Christian Robert
 
PDF
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Lake Como School of Advanced Studies
 
PDF
Nested sampling
Christian Robert
 
PDF
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
MCMC and likelihood-free methods
Christian Robert
 
PDF
Bayesian hybrid variable selection under generalized linear models
Caleb (Shiqiang) Jin
 
PDF
Approximate Bayesian Computation with Quasi-Likelihoods
Stefano Cabras
 
PDF
Poster for Bayesian Statistics in the Big Data Era conference
Christian Robert
 
PDF
Lesage
eric_gautier
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Christian Robert
 
Bayesian inversion of deterministic dynamic causal models
khbrodersen
 
Chris Sherlock's slides
Christian Robert
 
Unbiased Bayes for Big Data
Christian Robert
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Valentin De Bortoli
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
Introduction to MCMC methods
Christian Robert
 
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Lake Como School of Advanced Studies
 
Nested sampling
Christian Robert
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
MCMC and likelihood-free methods
Christian Robert
 
Bayesian hybrid variable selection under generalized linear models
Caleb (Shiqiang) Jin
 
Approximate Bayesian Computation with Quasi-Likelihoods
Stefano Cabras
 
Poster for Bayesian Statistics in the Big Data Era conference
Christian Robert
 
Lesage
eric_gautier
 

Similar to Density exploration methods (20)

PDF
How many components in a mixture?
Christian Robert
 
PDF
Metodo Monte Carlo -Wang Landau
angely alcendra
 
PDF
Testing for mixtures at BNP 13
Christian Robert
 
PDF
A bit about мcmc
Alexander Favorov
 
PDF
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
The Statistical and Applied Mathematical Sciences Institute
 
PDF
An investigation of inference of the generalized extreme value distribution b...
Alexander Decker
 
PPTX
Monte Carlo Berkeley.pptx
HaibinSu2
 
PDF
Discussion of Matti Vihola's talk
Christian Robert
 
PDF
CLIM: Transition Workshop - Accounting for Model Errors Due to Sub-Grid Scale...
The Statistical and Applied Mathematical Sciences Institute
 
PPTX
JOURNALnew
Mohomed Abraj.
 
PDF
Introduction to advanced Monte Carlo methods
Christian Robert
 
PDF
intro
ssuser9ed16a1
 
PDF
Talk at 2013 WSC, ISI Conference in Hong Kong, August 26, 2013
Christian Robert
 
PDF
Unbiased Markov chain Monte Carlo
JeremyHeng10
 
PDF
Gibbs flow transport for Bayesian inference
JeremyHeng10
 
PDF
Sampling and Markov Chain Monte Carlo Techniques
Tomasz Kusmierczyk
 
PPT
HUST-talk-1.pptof uncertainty quantification. Volume 6. Springer, 2017. SFK08...
bappadasgolkunda
 
PDF
Pre-computation for ABC in image analysis
Matt Moores
 
PDF
A new implementation of k-MLE for mixture modelling of Wishart distributions
Frank Nielsen
 
How many components in a mixture?
Christian Robert
 
Metodo Monte Carlo -Wang Landau
angely alcendra
 
Testing for mixtures at BNP 13
Christian Robert
 
A bit about мcmc
Alexander Favorov
 
2018 MUMS Fall Course - Bayesian inference for model calibration in UQ - Ralp...
The Statistical and Applied Mathematical Sciences Institute
 
An investigation of inference of the generalized extreme value distribution b...
Alexander Decker
 
Monte Carlo Berkeley.pptx
HaibinSu2
 
Discussion of Matti Vihola's talk
Christian Robert
 
CLIM: Transition Workshop - Accounting for Model Errors Due to Sub-Grid Scale...
The Statistical and Applied Mathematical Sciences Institute
 
JOURNALnew
Mohomed Abraj.
 
Introduction to advanced Monte Carlo methods
Christian Robert
 
Talk at 2013 WSC, ISI Conference in Hong Kong, August 26, 2013
Christian Robert
 
Unbiased Markov chain Monte Carlo
JeremyHeng10
 
Gibbs flow transport for Bayesian inference
JeremyHeng10
 
Sampling and Markov Chain Monte Carlo Techniques
Tomasz Kusmierczyk
 
HUST-talk-1.pptof uncertainty quantification. Volume 6. Springer, 2017. SFK08...
bappadasgolkunda
 
Pre-computation for ABC in image analysis
Matt Moores
 
A new implementation of k-MLE for mixture modelling of Wishart distributions
Frank Nielsen
 
Ad

More from Pierre Jacob (14)

PDF
Talk at CIRM on Poisson equation and debiasing techniques
Pierre Jacob
 
PDF
ISBA 2022 Susie Bayarri lecture
Pierre Jacob
 
PDF
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Pierre Jacob
 
PDF
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
PDF
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
PDF
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
PDF
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
PDF
Current limitations of sequential inference in general hidden Markov models
Pierre Jacob
 
PDF
On non-negative unbiased estimators
Pierre Jacob
 
PDF
Path storage in the particle filter
Pierre Jacob
 
PDF
SMC^2: an algorithm for sequential analysis of state-space models
Pierre Jacob
 
PDF
PAWL - GPU meeting @ Warwick
Pierre Jacob
 
PDF
Presentation of SMC^2 at BISP7
Pierre Jacob
 
PDF
Presentation MCB seminar 09032011
Pierre Jacob
 
Talk at CIRM on Poisson equation and debiasing techniques
Pierre Jacob
 
ISBA 2022 Susie Bayarri lecture
Pierre Jacob
 
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Current limitations of sequential inference in general hidden Markov models
Pierre Jacob
 
On non-negative unbiased estimators
Pierre Jacob
 
Path storage in the particle filter
Pierre Jacob
 
SMC^2: an algorithm for sequential analysis of state-space models
Pierre Jacob
 
PAWL - GPU meeting @ Warwick
Pierre Jacob
 
Presentation of SMC^2 at BISP7
Pierre Jacob
 
Presentation MCB seminar 09032011
Pierre Jacob
 
Ad

Recently uploaded (20)

PPTX
Introduction to Probability(basic) .pptx
purohitanuj034
 
PPTX
Rules and Regulations of Madhya Pradesh Library Part-I
SantoshKumarKori2
 
PPTX
LDP-2 UNIT 4 Presentation for practical.pptx
abhaypanchal2525
 
PPTX
Applied-Statistics-1.pptx hardiba zalaaa
hardizala899
 
PPTX
Introduction to pediatric nursing in 5th Sem..pptx
AneetaSharma15
 
PPTX
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
PPTX
Electrophysiology_of_Heart. Electrophysiology studies in Cardiovascular syste...
Rajshri Ghogare
 
PPTX
Constitutional Design Civics Class 9.pptx
bikesh692
 
PPTX
Command Palatte in Odoo 18.1 Spreadsheet - Odoo Slides
Celine George
 
PPTX
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 
PPTX
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
PPTX
FAMILY HEALTH NURSING CARE - UNIT 5 - CHN 1 - GNM 1ST YEAR.pptx
Priyanshu Anand
 
PPTX
YSPH VMOC Special Report - Measles Outbreak Southwest US 7-20-2025.pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
PDF
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
PPTX
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
PDF
My Thoughts On Q&A- A Novel By Vikas Swarup
Niharika
 
DOCX
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
PPT
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
PDF
EXCRETION-STRUCTURE OF NEPHRON,URINE FORMATION
raviralanaresh2
 
PPTX
INTESTINALPARASITES OR WORM INFESTATIONS.pptx
PRADEEP ABOTHU
 
Introduction to Probability(basic) .pptx
purohitanuj034
 
Rules and Regulations of Madhya Pradesh Library Part-I
SantoshKumarKori2
 
LDP-2 UNIT 4 Presentation for practical.pptx
abhaypanchal2525
 
Applied-Statistics-1.pptx hardiba zalaaa
hardizala899
 
Introduction to pediatric nursing in 5th Sem..pptx
AneetaSharma15
 
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
Electrophysiology_of_Heart. Electrophysiology studies in Cardiovascular syste...
Rajshri Ghogare
 
Constitutional Design Civics Class 9.pptx
bikesh692
 
Command Palatte in Odoo 18.1 Spreadsheet - Odoo Slides
Celine George
 
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
FAMILY HEALTH NURSING CARE - UNIT 5 - CHN 1 - GNM 1ST YEAR.pptx
Priyanshu Anand
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 7-20-2025.pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
My Thoughts On Q&A- A Novel By Vikas Swarup
Niharika
 
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
EXCRETION-STRUCTURE OF NEPHRON,URINE FORMATION
raviralanaresh2
 
INTESTINALPARASITES OR WORM INFESTATIONS.pptx
PRADEEP ABOTHU
 

Density exploration methods

  • 1. Density exploration methods Pierre Jacob Department of Statistics, University of Oxford pierre.jacob at stats.ox.ac.uk March 2014 Pierre Jacob Density exploration 1/ 49
  • 2. Outline 1 MCMC and multimodal target distributions 2 Parallel MCMC, tempering and equi-energy moves 3 Wang–Landau algorithm Pierre Jacob Density exploration 2/ 49
  • 3. Outline 1 MCMC and multimodal target distributions 2 Parallel MCMC, tempering and equi-energy moves 3 Wang–Landau algorithm Pierre Jacob Density exploration 3/ 49
  • 4. MCMC and multimodal target distributions Algorithm 1 Metropolis–Hastings targeting π 1: Init X0 ∈ X. 2: for t = 1 to T do 3: Sample X⋆ from some proposal distribution q(Xt−1, ·). 4: Compute the acceptance ratio: α(Xt−1, X⋆ ) = min ( 1, π(X⋆) π(Xt−1) q(X⋆, Xt−1) q(Xt−1, X⋆) ) . 5: With probability α(Xt−1, X⋆), set Xt = X⋆; otherwise Xt = Xt−1. 6: end for Pierre Jacob Density exploration 3/ 49
  • 5. MCMC and multimodal target distributions MCMC methods allow to approximate ∫ φ(x)π(x)dx, as long as π can be evaluated / estimated point-wise, and generate X0, X1, . . . , XT . The guarantees are largely asymptotic in T going to infinity. For multimodal target distributions the non-asymptotic regime might be very different. Pierre Jacob Density exploration 4/ 49
  • 6. MCMC and multimodal target distributions −4 −2 0 2 4 6 8 −4 −2 0 2 4 6 8 µ1 µ2 Figure : Posterior distribution of (µ1, µ2) in a Gaussian mixture model. See Stephens (1997), Bayesian methods for mixtures of normal distribu- tions, PhD thesis. Figure obtained using PAWL. Pierre Jacob Density exploration 5/ 49
  • 7. MCMC and multimodal target distributions r θ −2 −1 0 1 2 −2 −1 0 1 2 Figure : Posterior distribution of (r, θ) in a theta-Ricker Hidden Markov model. See Polansky et al. (2009), Likelihood ridges and multimodality in population growth rate models. Figure obtained using SMC2 . Pierre Jacob Density exploration 6/ 49
  • 8. MCMC and multimodal target distributions 0.00 0.05 0.10 0.15 0.20 0.25 −5 0 5 10 15 X density Figure : Toy example: a mixture of well-separated normal distributions. Pierre Jacob Density exploration 7/ 49
  • 9. MCMC and multimodal target distributions 0.0 0.1 0.2 0.3 0.4 0.5 −5 0 5 10 15 AMH density Figure : Markov chain still stuck in one mode after 50, 000 iterations. Pierre Jacob Density exploration 8/ 49
  • 10. MCMC and multimodal target distributions −750 −500 −250 0 0 200 400 600 800 X Y 0.25 0.50 0.75 density Figure : Feist your eyes on the moustarget distribution! Pierre Jacob Density exploration 9/ 49
  • 11. MCMC and multimodal target distributions Note that multimodal distributions are not difficult to sample from if the modes are not well separated. In fact we can [re]define a mode as a region from where Metropolis-Hastings cannot escape. Non-asymptotic Error Bounds for Sequential MCMC Methods in Multimodal Settings. N. Schweizer 2012 Pierre Jacob Density exploration 10/ 49
  • 12. Outline 1 MCMC and multimodal target distributions 2 Parallel MCMC, tempering and equi-energy moves 3 Wang–Landau algorithm Pierre Jacob Density exploration 11/ 49
  • 13. Parallel MCMC A first idea is to run N chains independently, from various starting points chosen to be “spread out”. The chains can thus find multiple modes, and other benefits such as parallelization and convergence diagnostics. What if there are > N modes? What if all the chains are initialized in the attraction zone of the same mode? Pierre Jacob Density exploration 11/ 49
  • 14. Parallel MCMC Figure : Parallel MCMC on the moustarget distribution Pierre Jacob Density exploration 12/ 49
  • 15. Parallel MCMC −400 −300 −200 −100 0 100 0 2500 5000 7500 10000 iterations Y indexchain 1 2 3 4 5 6 7 8 9 10 Figure : Parallel MCMC on the moustarget distribution Pierre Jacob Density exploration 13/ 49
  • 16. Parallel Tempering The idea of parallel tempering is to run N chains targeting different versions of π, of “increasing difficulty”. Introduce “inverse temperatures”: 0 < γ1 < . . . < γN = 1. Introduce “tempered” distributions πγn for n = 1, . . . , N. For γ ≈ 0, πγ is considered easier to sample because the variations of π are smaller. Pierre Jacob Density exploration 14/ 49
  • 17. Parallel Tempering One MCMC chain per inverse temperature, for instance using a Metropolis-Hastings kernel targeting πγn . Note that the local modes of πγ are the same for every γ. The chains interact through “swap moves”. Pierre Jacob Density exploration 15/ 49
  • 18. Parallel Tempering When a “swap move” is to be performed, do the following. Sample indices k1, k2 uniformly in {1, . . . , N}. With acceptance probability min ( 1, πγk1 (xk2 )πγk2 (xk1 ) πγk1 (xk1 )πγk2 (xk2 ) ) , exchange the value of xk1 and xk2 . This doesn’t change the joint target distribution πγ1 ⊗ πγ2 ⊗ . . . ⊗ πγN . In particular the N-th chain still targets πγN = π. Pierre Jacob Density exploration 16/ 49
  • 19. Parallel Tempering Figure : Parallel Tempering on the moustarget distribution, with γ equally spaced in [0.5, 1]. Pierre Jacob Density exploration 17/ 49
  • 20. Parallel Tempering −400 −300 −200 −100 0 100 0 2500 5000 7500 10000 iterations Y indexchain 1 2 3 4 5 6 7 8 9 10 Figure : Parallel Tempering on the moustarget distribution, with γ equally spaced in [0.5, 1]. Pierre Jacob Density exploration 18/ 49
  • 21. Parallel Tempering Figure : Parallel Tempering on the moustarget distribution, with γ equally spaced in [0.1, 1]. Pierre Jacob Density exploration 19/ 49
  • 22. Parallel Tempering −1000 −500 0 0 2500 5000 7500 10000 iterations Y indexchain 1 2 3 4 5 6 7 8 9 10 Figure : Parallel Tempering on the moustarget distribution, with γ equally spaced in [0.1, 1]. Pierre Jacob Density exploration 20/ 49
  • 23. Parallel Tempering The choice of (γn)N n=1 is essential. Taking γ1 very low increases the exploration for the chain targeting πγ1 . If the increments γn − γn−1 are too large, the swap moves tend to be rejected, which decreases the exploration for the “upper” chains. Pierre Jacob Density exploration 21/ 49
SMC sampler
Sequence of distributions, for instance πγn for n = 1, . . . , N such
that 0 < γ1 < . . . < γN = 1. Say N = 100.
M particles (say 10,000), sequentially importance sampling from µ to
πγ1 and then from πγn−1 to πγn.
When the effective sample size is low, resample and then apply an MCMC
move to each particle (say 5 steps per particle).
The ability to recover modes is sensitive to the choice of the initial
distribution µ.
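The tempered SMC recipe above can be sketched as follows, here on the real line. This is a minimal illustration under assumed interfaces (`logpi`, `logmu`, `sample_mu` are hypothetical names), with a Gaussian random-walk MH move and resampling when the ESS drops below M/2.

```python
import math
import random

def ess(logw):
    # effective sample size computed from unnormalised log-weights
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    return sum(w) ** 2 / sum(x * x for x in w)

def resample(xs, logw):
    # multinomial resampling proportionally to the normalised weights
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    return random.choices(xs, weights=w, k=len(xs))

def mh_move(x, logtarget, steps, sd):
    # a few Metropolis-Hastings steps with Gaussian random-walk proposals
    for _ in range(steps):
        y = x + random.gauss(0.0, sd)
        if math.log(random.random()) < logtarget(y) - logtarget(x):
            x = y
    return x

def smc_sampler(logpi, logmu, sample_mu, gammas, M=1000, mh_steps=5, sd=1.0):
    """Move M particles from mu through pi**gamma_1, ..., pi**gamma_N."""
    xs = [sample_mu() for _ in range(M)]
    # first reweighting: from mu to pi**gamma_1
    logw = [gammas[0] * logpi(x) - logmu(x) for x in xs]
    for g_prev, g in zip(gammas, gammas[1:]):
        if ess(logw) < M / 2:
            xs = resample(xs, logw)
            logw = [0.0] * M
            xs = [mh_move(x, lambda z: g_prev * logpi(z), mh_steps, sd)
                  for x in xs]
        # incremental weight: pi(x)**(g - g_prev)
        logw = [lw + (g - g_prev) * logpi(x) for lw, x in zip(logw, xs)]
    return xs, logw
```

The returned weighted particles approximate π = πγN; expectations under π are estimated by self-normalised weighted averages.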
SMC sampler
Figure: SMC sampler on the moustarget distribution, with
µ = N((400, −100)ᵀ, diag(32², 32²)).
SMC sampler
Figure: SMC sampler on the moustarget distribution, with
µ = N((400, −400)ᵀ, diag(100², 100²)).
Equi-energy sampler
Same initial setting as parallel tempering: N chains (X(n)t)n=1,...,N,
each targeting πγn using an MH kernel.
The first chain (X(1)t) simply targets πγ1 using MH.
For an upper chain (X(n)t), with probability ε perform an equi-energy
move; otherwise perform an MH step targeting πγn.
Adaptive Equi-Energy Sampler: Convergence and Illustration, Schreck,
Fort, Moulines, 2013.
Equi-energy sampler
An equi-energy move consists in proposing a point from the history of
the chain just below, (X(n−1)t), and accepting it with an MH-type
acceptance probability.
[Whereas in Parallel Tempering we proposed the current state of
another chain.]
The proposal is restricted to points with roughly similar values of
π(x) (hence “equi-energy”).
Equi-energy sampler
Introduce a sequence ξ0 = 0 < ξ1 < . . . < ξS = +∞ cutting the density
axis R+ into S intervals.
Introduce H(x, y) such that H(x, y) = 1 if π(x) and π(y) are in the
same interval [ξk, ξk+1), and 0 otherwise.
Introduce the proposal distribution given X(n)t and
θ = {X(n−1)k}1≤k≤t:
gθ(X(n)t, dy) ∝ Σk=1..t H(X(n−1)k, X(n)t) δX(n−1)k(dy).
The proposed point y is then accepted with probability
min ( 1, π(y)^(γn−γn−1) / π(X(n)t)^(γn−γn−1) )
(similar to the swap acceptance probability in Parallel Tempering).
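One equi-energy move can be sketched as below. For numerical convenience this sketch cuts the log-density axis rather than the density axis (the log is monotone, so the two partitions define the same rings); all names are illustrative.

```python
import math
import random

def equi_energy_move(x, logpi, history, rings, gamma, gamma_prev):
    """One equi-energy move for the chain at inverse temperature gamma.

    history    : past states of the chain just below (temperature gamma_prev)
    rings      : increasing thresholds cutting the log-density axis
    Returns the new state (the proposal if accepted, x otherwise).
    """
    def ring(z):
        # index of the energy ring containing log pi(z)
        lp = logpi(z)
        return sum(1 for xi in rings if lp >= xi)

    # propose uniformly among past states lying in the same ring as x
    candidates = [y for y in history if ring(y) == ring(x)]
    if not candidates:
        return x
    y = random.choice(candidates)
    # accept with probability min(1, [pi(y)/pi(x)]**(gamma - gamma_prev))
    log_alpha = (gamma - gamma_prev) * (logpi(y) - logpi(x))
    if math.log(random.random()) < min(0.0, log_alpha):
        return y
    return x
```

In a full sampler this move would be attempted with probability ε at each iteration, falling back on a standard MH step otherwise.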
Outline
1 MCMC and multimodal target distributions
2 Parallel MCMC, tempering and equi-energy moves
3 Wang–Landau algorithm
Wang–Landau algorithm
The main idea is to force the chain to avoid regions that have already
been visited.
The concept of region is formalized by a partition of the state space.
The self-avoiding effect is achieved by an adaptation of the
transition kernel.
Determining the density of states for classical statistical models: A
random walk algorithm to produce a flat histogram, F. Wang and D.
Landau, Physical Review E 2001.
Wang–Landau algorithm
Partition the state space: X = ∪i=1..d Xi.
Desired frequencies of visit: ϕ = (ϕ1, . . . , ϕd) such that
Σi=1..d ϕi = 1.
Wang–Landau algorithm
Figure: Partition of the state space into 11 bins.
Wang–Landau algorithm
Penalized distribution, for any θ = (θ1, . . . , θd):
πθ(x) ∝ π(x) / θ(J(x))
where J(x) is such that x ∈ XJ(x).
There is a θ⋆ such that:
∀i ∈ {1, . . . , d}, ∫Xi πθ⋆(x)dx = ϕi,
i.e. πθ⋆ gives the desired mass ϕi to each bin Xi.
These ideal penalties θ⋆ are not available.
Wang–Landau algorithm
Figure: Biased target distribution: each bin now carries the same mass.
Wang–Landau algorithm
Figure: Markov chain exploring the biased target over 50,000 iterations.
Wang–Landau algorithm
Algorithm 2 Wang–Landau with deterministic schedule (ηt)
1: Init θ0 > 0, X0 ∈ X.
2: for t = 1 to T do
3: Sample Xt from Kθt−1(Xt−1, ·), an MH kernel targeting πθt−1.
4: Update the penalties:
log θt(i) ← log θt−1(i) + ηt (1Xi(Xt) − ϕi)
5: end for
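A minimal sketch of Algorithm 2 on the real line follows, assuming a spatial partition given by cut points and the common deterministic schedule ηt = η0/t (the slides only require ηt → 0 fast enough); the function name and defaults are illustrative.

```python
import bisect
import math
import random

def wang_landau(logpi, cuts, phi, T=10000, eta0=1.0, sd=1.0):
    """Wang-Landau with deterministic schedule eta_t = eta0 / t.

    cuts : increasing cut points b_1 < ... < b_{d-1} partitioning the
           real line into d bins X_1, ..., X_d
    phi  : desired visit frequencies, summing to 1
    Returns the final log-penalties (log theta_T(1), ..., log theta_T(d)).
    """
    d = len(cuts) + 1
    J = lambda x: bisect.bisect(cuts, x)   # index of the bin containing x
    log_theta = [0.0] * d
    x = 0.0
    for t in range(1, T + 1):
        # MH step targeting pi_theta(x) = pi(x) / theta(J(x))
        y = x + random.gauss(0.0, sd)
        log_alpha = (logpi(y) - log_theta[J(y)]) - (logpi(x) - log_theta[J(x)])
        if math.log(random.random()) < log_alpha:
            x = y
        # penalty update: log theta_t(i) += eta_t * (1_{X_i}(X_t) - phi_i)
        eta = eta0 / t
        for i in range(d):
            log_theta[i] += eta * ((1.0 if i == J(x) else 0.0) - phi[i])
    return log_theta
```

A useful invariant: since the indicators sum to 1 and the ϕi sum to 1, each update leaves the sum of the log-penalties unchanged.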
Wang–Landau algorithm
If ηt → 0 “fast enough”, (θt)t≥0 converges.
If for each bin i, ϕi = 1/d, then, as t → ∞:
θt(i) → ∫Xi π(x)dx =: ψi,
at least up to a multiplicative constant.
(Xt)t≥0 is asymptotically distributed according to πθ⋆.
Convergence of the Wang-Landau algorithm, G. Fort, B. Jourdain, E.
Kuhn, T. Lelievre, G. Stoltz, 2012, on arXiv.
Wang–Landau algorithm
The choice of (ηt) can have a huge impact on the results.
Define the counters:
νt(i) := Σn=1..t 1Xi(Xn).
Flat Histogram (FH) is reached when:
max i∈{1,...,d} |νt(i)/t − ϕi| < c
for some c > 0.
Instead of decreasing (ηt) at each iteration, decrease it only when
the Flat Histogram criterion is reached.
Wang–Landau algorithm
Algorithm 3 Wang–Landau with stochastic schedule (ηκt)
1: Init θ0 = 1, X0 ∈ X.
2: Init κ0 ← 0.
3: for t = 1 to T do
4: Sample Xt from Kθt−1(Xt−1, ·), an MH kernel targeting πθt−1.
5: If (FH) then κt ← κt−1 + 1, otherwise κt ← κt−1.
6: Update the penalties:
log θt(i) ← log θt−1(i) + ηκt (1Xi(Xt) − ϕi)
7: end for
Wang–Landau algorithm
To be sure that eventually, for any c > 0:
max i∈{1,...,d} |νt(i)/t − ϕi| < c,
we have proved:
∀i ∈ {1, . . . , d}, νt(i)/t → ϕi in probability as t → ∞,
for any fixed η > 0, which implies:
E[ inf { t ≥ 0 : ∀i ∈ {1, . . . , d}, |νt(i)/t − ϕi| < c } ] < ∞.
The Wang-Landau algorithm reaches the Flat Histogram criterion in
finite time, PJ & R. Ryder, AAP 2013.
Wang–Landau algorithm
N chains (X(1)t, . . . , X(N)t) using the same kernel Kθt targeting
πθt at time t.
The interaction is made through the common penalties (θt).
The update was:
log θt(i) ← log θt−1(i) + η (1Xi(Xt) − ϕi).
Wang–Landau algorithm
N chains (X(1)t, . . . , X(N)t) using the same kernel Kθt targeting
πθt at time t.
The interaction is made through the common penalties (θt).
The update is now:
log θt(i) ← log θt−1(i) + η ( (1/N) Σk=1..N 1Xi(X(k)t) − ϕi ).
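The interacting-chains update replaces the single indicator by the empirical bin frequencies of the N chains at time t; a sketch, with illustrative names:

```python
def update_penalties(log_theta, states, J, phi, eta):
    """Shared penalty update for N interacting Wang-Landau chains.

    log_theta : current log-penalties (log theta(1), ..., log theta(d))
    states    : current states X(1)_t, ..., X(N)_t of the N chains
    J         : J(x) = index of the bin containing x
    phi       : desired visit frequencies
    eta       : current learning rate
    """
    d = len(log_theta)
    # empirical frequency of each bin among the N current states
    freq = [0.0] * d
    for x in states:
        freq[J(x)] += 1.0 / len(states)
    return [lt + eta * (f - p) for lt, f, p in zip(log_theta, freq, phi)]
```

With N = 1 this reduces to the single-chain update, and averaging over chains reduces the variance of each penalty increment.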
Wang–Landau algorithm
Default choice of partition: along density values (always
one-dimensional!).
Introduce a sequence ξ0 = 0 < ξ1 < . . . < ξd = +∞ cutting the density
axis R+ into d intervals.
Some sense of a good range can be grasped from pilot runs.
Wang–Landau algorithm
Figure: Wang–Landau on the moustarget distribution. Colours represent
the partition.
Wang–Landau algorithm
Figure: First bin of the partition.
Wang–Landau algorithm
Figure: Last bin of the partition.
Wang–Landau algorithm
Figure: Evolution of log θt along the iterations.
Wang–Landau algorithm
In some situations we might have some intuition about the direction
along which the modes are spread.
For instance, if we knew that the modes of the moustarget distribution
were spread along the y-axis:
∀i ∈ {1, . . . , d}, Xi = R × (yi, yi+1),
with −∞ = y1 < y2 < . . . < yd = +∞.
Wang–Landau algorithm
Figure: Wang–Landau using the y-axis to partition the space.
Wang–Landau algorithm
Figure: Exploration of the bins.
Wang–Landau algorithm
Figure: Evolution of log θ along the iterations.
Bibliography
The Wang-Landau algorithm in general state spaces: applications and
convergence analysis, Y. Atchadé and J. Liu, Statistica Sinica 2010.
Determining the density of states for classical statistical models: A
random walk algorithm to produce a flat histogram, F. Wang and D.
Landau, Physical Review E 2001.
An Adaptive Interacting Wang-Landau Algorithm for Automatic Density
Exploration, L. Bornn, PJ, P. Del Moral, A. Doucet, JCGS 2013.
The Wang-Landau algorithm reaches the Flat Histogram criterion in
finite time, PJ & R. Ryder, AAP 2013.
Adaptive Equi-Energy Sampler: Convergence and Illustration, Schreck,
Fort, Moulines, 2013.
Efficiency of the Wang-Landau algorithm: a simple test case, G. Fort,
B. Jourdain, E. Kuhn, T. Lelievre, G. Stoltz, 2014.