Approximate Bayesian computation
PhD course FMS020F–NAMS002 “Statistical inference for partially
observed stochastic processes”, Lund University
http://goo.gl/sX8vU9
Umberto Picchini
Centre for Mathematical Sciences,
Lund University
Well, we know the most obvious answer: it’s because this is what we
do when exact methods are impractical. No big news...
However, you might have simplified the model a wee bit too much for it to be realistic/useful/sound.
y* ∼ p(y|θ*)

f(θ*) = ∫_Y p(y*|θ*) π(θ*) I_y(y*) dy* = p(y|θ*) π(θ*) ∝ π(θ*|y)
Let’s try something really trivial: we show how ABC rejection can easily become inefficient.
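As a concrete illustration, here is a minimal Python sketch of a plain ABC rejection sampler of the kind just described. The gamma model with θ = (shape, scale), the flat prior, the distance on sorted data and all numerical values are assumptions made only for this toy example; they are not prescribed by the slides.

import numpy as np

rng = np.random.default_rng(1)

# "observed" data, generated here only for illustration (true shape = 3, scale = 2)
y_obs = rng.gamma(shape=3.0, scale=2.0, size=50)

def abc_rejection(y_obs, eps, n_accept=500):
    """Plain ABC rejection: keep theta* whenever the simulated data fall within eps of y_obs."""
    accepted, attempts = [], 0
    while len(accepted) < n_accept:
        attempts += 1
        theta_star = rng.uniform(0.1, 10.0, size=2)                         # theta* ~ prior (flat, assumed)
        y_star = rng.gamma(theta_star[0], theta_star[1], size=y_obs.size)   # y* ~ p(y | theta*)
        # crude distance between full datasets (sorted, since observations are exchangeable)
        if np.linalg.norm(np.sort(y_star) - np.sort(y_obs)) < eps:
            accepted.append(theta_star)
    return np.array(accepted), attempts

draws, attempts = abc_rejection(y_obs, eps=20.0)
print(f"acceptance rate: {len(draws) / attempts:.4f}")   # drops sharply as eps shrinks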
Try ε = 20
[Figure: density estimates for the shape and scale parameters; N = 45654, bandwidths 0.1699 and 0.3038.]
[Figure: density estimates for the shape and scale parameters; N = 19146, bandwidths 0.1779 and 0.2186.]
[Figure: density estimates for the shape and scale parameters; N = 586, bandwidths 0.321 and 0.2233.]
[Figure: density estimates for the shape and scale parameters; N = 474, bandwidths 0.1351 and 0.08315.]
² Pritchard et al. 1999, Molecular Biology and Evolution, 16:1791–1798.
Using summary statistics clearly introduces a further level of approximation, except when S(·) is sufficient for θ (i.e. it carries the same information about θ as the whole dataset y). In that case

π(θ|S(y)) = π(θ|y)
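Continuing the toy sketch from before, summaries enter by simply replacing the full-data distance with a distance between S(y*) and S(y). The particular choice S(y) = (mean, log variance) below is only an illustrative assumption, not a recommendation from the slides.

import numpy as np

rng = np.random.default_rng(1)
y_obs = rng.gamma(shape=3.0, scale=2.0, size=50)   # same illustrative data as before

def S(y):
    """A hand-picked (and possibly non-sufficient) summary: sample mean and log variance."""
    return np.array([y.mean(), np.log(y.var())])

def abc_rejection_summaries(y_obs, eps, n_accept=500):
    """ABC rejection where acceptance is based on ||S(y*) - S(y_obs)|| < eps."""
    s_obs = S(y_obs)
    accepted, attempts = [], 0
    while len(accepted) < n_accept:
        attempts += 1
        theta_star = rng.uniform(0.1, 10.0, size=2)                         # theta* ~ flat prior (assumed)
        y_star = rng.gamma(theta_star[0], theta_star[1], size=y_obs.size)   # y* ~ p(y | theta*)
        if np.linalg.norm(S(y_star) - s_obs) < eps:
            accepted.append(theta_star)
    return np.array(accepted), attempts

draws, attempts = abc_rejection_summaries(y_obs, eps=0.5)   # eps now lives on the summary scale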
[Figure: density estimates for the shape and scale parameters when using summary statistics; N = 453, bandwidths 0.07102 and 0.06776.]
The “1” in the denominator is there because, of course, we must start the algorithm at some admissible (accepted) y#, hence the denominator will always have I_y(y#) = 1.
³ Marjoram et al. 2003, PNAS 100(26).
By plugging this simplification into the previous acceptance probability we obtain the ABC-MCMC algorithm.
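Below is a hedged Python sketch of one common formulation of ABC-MCMC (Gaussian random-walk proposal, indicator kernel on the summaries). The function names, the toy gamma instantiation and all numerical settings are illustrative assumptions, not the exact algorithm shown on the slides.

import numpy as np

rng = np.random.default_rng(2)

def abc_mcmc(y_obs, S, log_prior, simulate, theta0, eps, n_iter=20_000, step=0.2):
    """ABC-MCMC with a symmetric Gaussian random-walk proposal (so q cancels in the ratio)."""
    s_obs = S(y_obs)
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((n_iter, theta.size))
    for it in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)    # theta' ~ q(. | theta)
        lp = log_prior(prop)
        if np.isfinite(lp):                                      # only simulate for admissible proposals
            y_star = simulate(prop)                              # y* ~ p(y | theta')
            accept = (np.linalg.norm(S(y_star) - s_obs) < eps    # ABC indicator on the summaries
                      and np.log(rng.uniform()) < lp - log_prior(theta))
            if accept:
                theta = prop
        chain[it] = theta                                        # on rejection the old theta is kept
    return chain

# toy instantiation (illustrative only): gamma data, theta = (shape, scale), flat prior on (0.1, 10)^2
y_obs = rng.gamma(3.0, 2.0, size=50)
S = lambda y: np.array([y.mean(), np.log(y.var())])
log_prior = lambda th: 0.0 if np.all((th > 0.1) & (th < 10.0)) else -np.inf
simulate = lambda th: rng.gamma(th[0], th[1], size=y_obs.size)
theta0 = np.array([3.0, 2.0])     # started near the truth for simplicity; the choice of eps is discussed next
chain = abc_mcmc(y_obs, S, log_prior, simulate, theta0, eps=1.0)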
⁴ Sisson and Fan (2010), chapter in Handbook of Markov Chain Monte Carlo.
Choice of the threshold ε
We would like to use a “small” ε > 0; however, it turns out that if you start at a bad value of θ, a small ε will cause many rejections. A common strategy (sketched in code after this list):

- start with a fairly large ε, allowing the chain to move in the parameter space;
- after some iterations reduce ε, so the chain will explore a (narrower) and more precise approximation to π(θ|y);
- keep reducing ε (slowly), and use the set of θ’s accepted with the smallest ε to report inference results.

It’s not obvious how to determine the sequence ε₁ > ε₂ > ... > ε_k > 0. If the sequence decreases too fast there will be many rejections (the chain suddenly gets trapped in some tail).
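Continuing the ABC-MCMC sketch from before, one hedged way to implement this advice is to run the sampler in stages with a decreasing tolerance, warm-starting each stage at the last accepted value. The schedule values below are arbitrary illustrations.

# reuses abc_mcmc, y_obs, S, log_prior and simulate from the earlier sketch
eps_schedule = [10.0, 5.0, 2.0, 1.0]       # eps_1 > eps_2 > ... > eps_k > 0 (illustrative values)
theta_start = np.array([1.0, 1.0])         # deliberately poor starting value
for eps in eps_schedule:
    chain = abc_mcmc(y_obs, S, log_prior, simulate, theta_start, eps, n_iter=5_000)
    theta_start = chain[-1]                # warm-start the next, stricter stage
posterior_draws = chain[1_000:]            # report inference with the smallest eps (after burn-in)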
[Figure: ABC-MCMC trace plots for θ and γ over 8 × 10⁴ iterations.]
[Figure: results for θ and γ (horizontal axes roughly 0–1 and 0–1.2).]
[Figure: plots for θ and γ comparing ε = 0, ε = 2 and ε = 6 (horizontal axis 0–100).]
Prangle⁵ notes that if S(y) = (S₁(y), ..., S_d(y)) and if we give all the S_j the same weight (hence W is the identity matrix), then the distance ‖·‖ is dominated by the most variable summary S_j.
The σ_j could be determined from a pilot study. Say we are using ABC-MCMC: after some appropriate burn-in, suppose we have stored R realizations of S(y*), corresponding to the R parameters θ*, into an R × d matrix.
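For instance (a hedged sketch, with all names invented here): take σ_j to be the MAD or the standard deviation of the j-th column of that R × d pilot matrix, and use W = diag(1/σ_j²) in the distance.

import numpy as np

def summary_scales(pilot_S, robust=True):
    """Per-summary scales sigma_j from an R x d pilot matrix of simulated summaries."""
    pilot_S = np.asarray(pilot_S, dtype=float)
    if robust:
        return np.median(np.abs(pilot_S - np.median(pilot_S, axis=0)), axis=0)   # MAD per column
    return pilot_S.std(axis=0)

def weighted_distance(s_star, s_obs, sigma):
    """||s* - s_obs|| with W = diag(1/sigma_j^2), so no single summary dominates."""
    return np.sqrt(np.sum(((np.asarray(s_star) - np.asarray(s_obs)) / sigma) ** 2))

# toy check with fake pilot output whose columns have very different scales
rng = np.random.default_rng(3)
pilot_S = rng.normal(size=(1000, 3)) * np.array([1.0, 50.0, 0.01])
sigma = summary_scales(pilot_S)
print(weighted_distance(pilot_S[0], pilot_S[1], sigma))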
However, we can bypass the need for S(·) if we use an ABC version
of sequential Monte Carlo.
For the sake of brevity, just consider a bootstrap filter approach with
N particles.
Step 1. Resample N particles {x_t^i, w̃_t^i} and set w_t^i = 1/N. Set t := t + 1 and, if t = T + 1, stop.
Step 2. For i = 1, ..., N sample x_t^i ∼ p(x_t | x_{t-1}^i) and y_t^{*i} ∼ p(y_t | x_t^i). Compute the (unnormalized) weights w̃_t^i.
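A hedged Python sketch of these two steps, using an indicator kernel |y_t* − y_t| < ε as one possible choice of ABC weight; transition and emit are placeholder names for user-supplied model simulators, not names from the slides.

import numpy as np

rng = np.random.default_rng(4)

def abc_bootstrap_filter(y, x0, transition, emit, eps):
    """transition(x): draws x_t | x_{t-1} elementwise for an array of N particles;
    emit(x): draws a pseudo-observation y_t* | x_t for each particle; eps: ABC tolerance."""
    x = np.asarray(x0, dtype=float)                      # N particles at t = 0
    N = x.size
    for t in range(len(y)):                              # t = 1, ..., T
        x = transition(x)                                # Step 2: x_t^i ~ p(x_t | x_{t-1}^i)
        y_star = emit(x)                                 #         y_t^{*i} ~ p(y_t | x_t^i)
        w = (np.abs(y_star - y[t]) < eps).astype(float)  # unnormalized ABC weights (indicator kernel)
        if w.sum() == 0:
            raise RuntimeError("all particles rejected: increase eps or N")
        idx = rng.choice(N, size=N, p=w / w.sum())       # Step 1: resample ...
        x = x[idx]                                       # ... which implicitly resets w_t^i = 1/N
    return x

For a toy Gaussian random-walk state-space model one could pass, e.g., transition = lambda x: x + rng.normal(size=x.size) and emit = lambda x: x + 0.5 * rng.normal(size=x.size).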
⁶ Martin et al. 2014, arXiv:1409.8363.
Semi-automatic summary statistics
To date the most important study on the construction of summaries in ABC is Fearnhead and Prangle (2012)⁷, a discussion paper in JRSS-B.
Recall a well-known result: within the class of quadratic losses, the expected loss is minimised by the posterior mean E(θ|y), which makes (an estimate of) E(θ|y) a natural candidate summary statistic.
and for each column of the left matrix do a multivariate linear regression (or
lasso, or...)
    [ θ_j^(1) ]   [ 1  y_1^(*1)  y_2^(*1)  ···  y_n^(*1) ]
    [ θ_j^(2) ] = [ 1  y_1^(*2)  y_2^(*2)  ···  y_n^(*2) ] × β_j ,    j = 1, ..., p,
    [    ⋮    ]   [ ⋮      ⋮         ⋮     ···      ⋮    ]

and obtain a statistic for θ_j: S_j(·) = β̂_0^(j) + β̂^(j) η(·).
Use the same coefficients when calculating summaries for simulated
data and actual data, i.e.
S_j(y) = β̂_0^(j) + β̂^(j) η(y)
S_j(y*) = β̂_0^(j) + β̂^(j) η(y*)
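A hedged sketch of this regression construction in Python. The choice η(y) = order statistics of the raw data, the gamma simulator and all function names are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(5)

def fit_semiautomatic_summaries(theta_sims, eta_sims):
    """theta_sims: R x p simulated parameters; eta_sims: R x n matrix with row r = eta(y^{*r}).
    Returns beta0 (length p) and beta (p x n) via least squares, one regression per theta_j."""
    R = eta_sims.shape[0]
    X = np.column_stack([np.ones(R), eta_sims])              # design matrix [1, eta(y*)]
    coef, *_ = np.linalg.lstsq(X, theta_sims, rcond=None)    # all p regressions at once
    return coef[0], coef[1:].T

def summaries(y, eta, beta0, beta):
    """S_j(y) = beta0_j + beta^(j) . eta(y); the SAME fitted coefficients serve y and y*."""
    return beta0 + beta @ eta(y)

# toy usage: theta = (shape, scale) of a gamma sample, eta(y) = order statistics (one possible eta)
eta = lambda y: np.sort(np.asarray(y))
theta_sims = np.column_stack([rng.uniform(0.1, 10.0, 500), rng.uniform(0.1, 10.0, 500)])
eta_sims = np.array([eta(rng.gamma(sh, sc, size=50)) for sh, sc in theta_sims])
beta0, beta = fit_semiautomatic_summaries(theta_sims, eta_sims)
S_obs = summaries(rng.gamma(3.0, 2.0, size=50), eta, beta0, beta)   # summaries of the "observed" data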