Sequential Monte Carlo Methods
Handout 13
Sampling from a sequence of distributions that change over time is a difficult task in MCMC methodology. It is, however, an important problem that arises in a range of applications. For instance, the observations may arrive sequentially in time and one may be interested in performing Bayesian inference in real time. To take full advantage of the data, one should update the posterior distribution as the data become available. Real-life applications include tracking aircraft using radar measurements, estimating the trend and volatility of financial time series, etc. An additional benefit of sequential methods is their computational simplicity, since the data are dealt with in a sequential manner.
More details can be found in the monograph Doucet et al. (2001). There is also a page devoted to Sequential Monte Carlo methods / particle filtering at Cambridge: http://www-sigproc.eng.cam.ac.uk/smc/
1.1 Definitions
For any sequence $\{a_k\}$, let $a_{i:j} = (a_i, a_{i+1}, \ldots, a_j)$. Consider a sequence of probability distributions $\{\pi_n,\ n = 1, 2, \ldots\}$, where the distribution $\pi_n$ is defined on the space $E_n = E^n$. We will assume that each distribution $\pi_n$ admits a probability density $\pi_n(\theta_{1:n})$. Each density $\pi_n$ is known up to a normalizing constant; i.e. $\pi_n(\theta_{1:n}) = \frac{f_n(\theta_{1:n})}{C_n}$, where $f_n$ is known pointwise and $C_n$ is unknown. The index $n$ is often referred to as the time index, although in applications it may have no connection with real time.
SMC methodology is a set of algorithms that generate at each time instance a collection of $N$ ($N \gg 1$) weighted random samples (particles) $\{W_n^{(i)}, \theta_{1:n}^{(i)};\ i = 1, \ldots, N\}$, where $W_n^{(i)} > 0$, $\sum_{i=1}^N W_n^{(i)} = 1$, and such that for any test function $\varphi_n : E_n \to \mathbb{R}$
$$\sum_{i=1}^N W_n^{(i)}\, \varphi_n(\theta_{1:n}^{(i)}) \to \int \varphi_n(\theta_{1:n})\, \pi_n(\theta_{1:n})\, d\theta_{1:n}$$
as $N \to \infty$. Because of on-line applications, it is desirable that the algorithms have linear (in the number of particles $N$) computational complexity and that the complexity is independent of $n$.
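As a one-step illustration of this weighted-particle approximation, the following sketch uses self-normalized importance sampling to estimate a moment of a target known only up to its normalizing constant. The target, proposal and sample size are illustrative choices, not part of the handout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target f_n: here proportional to a standard normal density.
f = lambda x: np.exp(-0.5 * x**2)

# Importance density q: a wider Gaussian N(0, 2^2), evaluated exactly.
N = 100_000
theta = rng.normal(0.0, 2.0, size=N)
q = np.exp(-0.5 * (theta / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

w = f(theta) / q              # unnormalized importance weights
W = w / w.sum()               # self-normalized weights W^(i), summing to 1

# Weighted-particle estimate of E[theta^2] under the target (true value: 1).
est = np.sum(W * theta**2)
print(est)
```

Note that the unknown constant $C_n$ cancels in the self-normalized weights, which is why SMC only requires $f_n$ pointwise.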
Two fundamental actions in SMC are sequential importance sampling and resampling.
We briefly describe the sequential importance sampling technique first. Assume that at time $n-1$ particles $\{\theta_{1:n-1}^{(i)}\}$ have been sampled from an importance density $q_{n-1}(\theta_{1:n-1})$. Since the particles are not samples from the target density, they are weighted. Their weights are given by
$$W_{n-1}^{(i)} \propto \frac{\pi_{n-1}(\theta_{1:n-1}^{(i)})}{q_{n-1}(\theta_{1:n-1}^{(i)})}.$$
At time $n$, one extends each path $\theta_{1:n-1}^{(i)}$ by sampling $\theta_n^{(i)}$ according to an importance density $q_n(\cdot \mid \theta_{1:n-1}^{(i)})$; the paths up to time $n-1$ are not modified in order to keep the algorithm sequential. It follows that the joint importance density of the paths $\{\theta_{1:n}^{(i)}\}$ is of the form
$$q_n(\theta_{1:n}) = q_n(\theta_n \mid \theta_{1:n-1})\, q_{n-1}(\theta_{1:n-1}) = q_1(\theta_1) \prod_{k=2}^n q_k(\theta_k \mid \theta_{1:k-1}).$$
To correct for the discrepancy between the new target density $\pi_n$ and the importance density $q_n$, one needs to update the weights according to
$$W_n^{(i)} \propto \frac{\pi_n(\theta_{1:n}^{(i)})}{q_n(\theta_{1:n}^{(i)})} = \frac{\pi_n(\theta_{1:n}^{(i)})}{q_{n-1}(\theta_{1:n-1}^{(i)})\, q_n(\theta_n^{(i)} \mid \theta_{1:n-1}^{(i)})} \propto \frac{\pi_n(\theta_{1:n}^{(i)})}{\pi_{n-1}(\theta_{1:n-1}^{(i)})\, q_n(\theta_n^{(i)} \mid \theta_{1:n-1}^{(i)})}\, W_{n-1}^{(i)}.$$
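The sequential update above can be sketched in a few lines. The example below runs pure sequential importance sampling (no resampling yet) on a toy product-form target whose coordinates are i.i.d. standard normal, with a deliberately mismatched proposal; the target and proposal are hypothetical choices made for illustration. The collapsing effective sample size it reports is the weight degeneracy that motivates resampling.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1000, 50

def log_norm(x, sd):
    """Log density of N(0, sd^2) at x."""
    return -0.5 * (x / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

logw = np.zeros(N)                           # uniform weights before the first step
for n in range(T):
    theta_n = rng.normal(0.0, 1.5, size=N)   # sample theta_n from q_n = N(0, 1.5^2)
    # Incremental weight pi_n / (pi_{n-1} q_n): each target coordinate is N(0, 1).
    logw += log_norm(theta_n, 1.0) - log_norm(theta_n, 1.5)

W = np.exp(logw - logw.max())
W /= W.sum()
ess = 1.0 / np.sum(W ** 2)                   # effective sample size
print(ess)                                   # far below N: the weights have degenerated
```

Working with log-weights, as above, avoids numerical underflow once the products of incremental weights become small.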
The resampling step replaces the weighted empirical measure
$$\sum_{i=1}^N W_n^{(i)}\, \delta_{\theta_{1:n}^{(i)}}(d\theta_{1:n})$$
by the unweighted measure
$$\frac{1}{N} \sum_{i=1}^N N_n^{(i)}\, \delta_{\theta_{1:n}^{(i)}}(d\theta_{1:n}),$$
where $N_n^{(i)} \in \mathbb{N}$ is the number of copies of the particle $\theta_{1:n}^{(i)}$, under the constraint $\sum_{i=1}^N N_n^{(i)} = N$ to keep the size of the population constant. In order to minimize the error introduced by the resampling scheme, one usually selects a stochastic mechanism to obtain $\{N_n^{(i)}\}$ such that $\mathbb{E}[N_n^{(i)}] = N W_n^{(i)}$ (unbiased approximation) and with small variances $\mathrm{var}[N_n^{(i)}]$. Several resampling schemes have been proposed in the literature, including multinomial, residual and stratified resampling [3].
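Two of these schemes can be sketched as follows. The weight vector is a toy example; the functions return ancestor indices rather than the counts $N_n^{(i)}$ directly, and the counts are recovered with a bincount.

```python
import numpy as np

def multinomial_resample(W, rng):
    """Draw ancestor indices i.i.d. from the weights (counts ~ Multinomial(N, W))."""
    N = len(W)
    return rng.choice(N, size=N, p=W)

def stratified_resample(W, rng):
    """One uniform draw per stratum [i/N, (i+1)/N): unbiased, lower variance."""
    N = len(W)
    u = (np.arange(N) + rng.random(N)) / N
    cum = np.cumsum(W)
    cum[-1] = 1.0                 # guard against floating-point round-off
    return np.searchsorted(cum, u)

rng = np.random.default_rng(2)
W = np.array([0.5, 0.3, 0.1, 0.1])           # toy normalized weights
idx = stratified_resample(W, rng)
counts = np.bincount(idx, minlength=len(W))  # the copy counts N_n^(i)
print(counts)                                # sums to N, with E[counts[i]] = N * W[i]
```

Both schemes are unbiased; stratified resampling simply constrains where the uniforms fall, which reduces $\mathrm{var}[N_n^{(i)}]$ relative to multinomial resampling.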
SMC methods provide an estimate of the joint distribution $\pi_n(\theta_{1:n})$ at index $n$. However, one can only expect to obtain good approximations of the most recent marginal distributions $\pi_n(\theta_{k:n})$ for small $n - k$, say below 10. Indeed, if particles are resampled many times between times $k$ and $n$, there are very few distinct paths $\{\theta_{1:k}^{(i)}\}$ remaining at index $n$. Fortunately, these recent marginals are all that is required in many applications.
We have presented here a simple generic SMC method. However, like MCMC methods, SMC methods are not a black box, and it is necessary to design the algorithm carefully so as to obtain good performance for a reasonable number of particles. Recently, many papers have proposed various SMC methods that improve on this basic scheme: construction of efficient importance sampling distributions, Rao-Blackwellised estimates, use of MCMC moves, etc. A comprehensive coverage of state-of-the-art techniques on the subject can be found in [3].
2 Some Applications

2.1 Kalman Filters
(Harvey, 1989; Anderson and Moore, 1979) In cases where the state space model is linear and Gaussian, the classic Kalman filter is optimal. In this case we have
$$f(x_{t+1} \mid x_t) = N(x_{t+1} \mid A x_t, C), \qquad g(y_t \mid x_t) = N(y_t \mid B x_t, D),$$
where $N(x \mid \mu, Q)$ is the Gaussian density function with mean vector $\mu$ and covariance matrix $Q$. We can write this equivalently as
$$x_{t+1} = A x_t + v_t, \qquad y_t = B x_t + w_t,$$
where $v_t$ and $w_t$ are zero-mean Gaussian vectors with covariance matrices $C$ and $D$, respectively. The errors $v_t$ and $w_t$ are independent over time and also independent of one another. We also require that the initial state be Gaussian distributed,
$$p(x_0) = N(x_0 \mid \mu_0, P_0).$$
We first require $p(x_{t+1} \mid y_{0:t})$, the prediction step of the filtering recursion,
$$p(x_{t+1} \mid y_{0:t}) = \int p(x_t \mid y_{0:t})\, f(x_{t+1} \mid x_t)\, dx_t.$$
Suppose that at time $t$ we already have
$$p(x_t \mid y_{0:t}) = N(x_t \mid \mu_t, P_t).$$
Since $x_{t+1} = A x_t + v_t$, the standard change of variables (linear Gaussian case) gives
$$p(x_{t+1} \mid y_{0:t}) = N(x_{t+1} \mid \mu_{t+1|t}, P_{t+1|t}),$$
where
$$\mu_{t+1|t} = A \mu_t, \qquad P_{t+1|t} = C + A P_t A'.$$
The correction step of the filtering recursion is
$$p(x_{t+1} \mid y_{0:t+1}) \propto \exp\left(-\tfrac{1}{2}(y_{t+1} - B x_{t+1})' D^{-1} (y_{t+1} - B x_{t+1})\right) \exp\left(-\tfrac{1}{2}(x_{t+1} - \mu_{t+1|t})' P_{t+1|t}^{-1} (x_{t+1} - \mu_{t+1|t})\right)$$
$$\propto \exp\left(-\tfrac{1}{2}(x_{t+1} - \mu_{t+1})' P_{t+1}^{-1} (x_{t+1} - \mu_{t+1})\right) \propto N(x_{t+1} \mid \mu_{t+1}, P_{t+1}),$$
where
$$\mu_{t+1} = P_{t+1}\left(B' D^{-1} y_{t+1} + P_{t+1|t}^{-1} \mu_{t+1|t}\right), \qquad P_{t+1} = \left(B' D^{-1} B + P_{t+1|t}^{-1}\right)^{-1}.$$
This gives, after re-expressing the inverse matrices,
$$\mu_{t+1} = \mu_{t+1|t} + K_t (y_{t+1} - B \mu_{t+1|t}), \qquad P_{t+1} = (I - K_t B)\, P_{t+1|t},$$
where
$$K_t = P_{t+1|t} B' (B P_{t+1|t} B' + D)^{-1}.$$
Finally, the complete Kalman filtering recursion can be summarized as
$$
\begin{aligned}
\mu_{t+1|t} &= A \mu_t \\
P_{t+1|t} &= C + A P_t A' \\
K_t &= P_{t+1|t} B' (B P_{t+1|t} B' + D)^{-1} \\
\mu_{t+1} &= \mu_{t+1|t} + K_t (y_{t+1} - B \mu_{t+1|t}) \\
P_{t+1} &= (I - K_t B)\, P_{t+1|t}.
\end{aligned}
$$
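This recursion translates directly into code. The sketch below implements one predict/correct step in the gain form above; the scalar random-walk model and all numeric values are illustrative assumptions, not taken from the handout.

```python
import numpy as np

def kalman_step(mu, P, y, A, B, C, D):
    """One Kalman recursion: predict with (A, C), correct with (B, D) and data y."""
    # Prediction: mu_{t+1|t} = A mu_t, P_{t+1|t} = C + A P_t A'
    mu_pred = A @ mu
    P_pred = C + A @ P @ A.T
    # Gain: K_t = P_{t+1|t} B' (B P_{t+1|t} B' + D)^{-1}
    S = B @ P_pred @ B.T + D
    K = P_pred @ B.T @ np.linalg.inv(S)
    # Correction: mu_{t+1} = mu_{t+1|t} + K_t (y - B mu_{t+1|t})
    mu_new = mu_pred + K @ (y - B @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ B) @ P_pred
    return mu_new, P_new

# Scalar random walk observed in noise (illustrative values).
A = np.array([[1.0]]); B = np.array([[1.0]])
C = np.array([[0.1]]); D = np.array([[1.0]])
mu, P = np.array([0.0]), np.array([[1.0]])
for y in [0.9, 1.1, 1.0]:
    mu, P = kalman_step(mu, P, y, A, B, C, D)
print(mu, P)
```

The gain form is preferred in practice because it avoids inverting the state-dimension matrices $P_{t+1|t}$ and $D$, inverting only the (often smaller) innovation covariance $S$.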
2.2 Optimal Filtering
The application of SMC to optimal filtering was first presented in [5]. The problem of interest is estimating the state of a Markov process $\{X_k\}_{k \ge 1}$ given some observations $\{Y_k\}_{k \ge 1}$. The unobserved (hidden) Markov process is defined by
$$X_1 \sim \mu, \qquad X_k \mid X_{k-1} \sim f(\cdot \mid X_{k-1}),$$
whereas the observations are assumed to be independent conditional upon $\{X_k\}_{k \ge 1}$, with marginal distribution
$$Y_k \mid X_k \sim g(\cdot \mid X_k).$$
Estimating the posterior distribution of $X_k$ given $Y_{1:k}$ is a very important problem known as optimal filtering. If the model is linear and Gaussian, the posterior distribution is Gaussian and its statistics can be computed using the Kalman filter. However, in many real-world applications these linearity and Gaussianity assumptions are not valid, and one needs to use numerical methods. SMC methods can be applied directly to this problem by setting $\pi_n$ as the posterior density of the collection of states $X_{1:n}$ given a realization of the observations $Y_{1:n} = y_{1:n}$. Indeed, this posterior distribution satisfies
$$\pi_n(x_{1:n}) \propto \mu(x_1) \prod_{k=2}^n f(x_k \mid x_{k-1}) \prod_{k=1}^n g(y_k \mid x_k).$$
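With the importance density taken to be the prior dynamics $f$, the incremental weight reduces to the likelihood $g(y_n \mid x_n)$, which is the bootstrap filter of [5]. A minimal sketch follows, run on a hypothetical linear-Gaussian model chosen only so that the filter's output can be sanity-checked; all model constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 30, 2000

# Hypothetical model: X_k = 0.9 X_{k-1} + V_k,  Y_k = X_k + W_k,  V_k, W_k ~ N(0, 1).
x_true = np.zeros(T)
y = np.zeros(T)
for k in range(T):
    prev = x_true[k - 1] if k > 0 else 0.0
    x_true[k] = 0.9 * prev + rng.normal()
    y[k] = x_true[k] + rng.normal()

particles = rng.normal(size=N)            # X_1 ~ mu, here N(0, 1)
means = []
for k in range(T):
    if k > 0:
        particles = 0.9 * particles + rng.normal(size=N)   # propagate via f
    logw = -0.5 * (y[k] - particles) ** 2                  # g(y_k | x_k), up to a constant
    W = np.exp(logw - logw.max())
    W /= W.sum()
    means.append(np.sum(W * particles))                    # filtering-mean estimate
    particles = particles[rng.choice(N, size=N, p=W)]      # multinomial resampling

rmse = np.sqrt(np.mean((np.array(means) - x_true) ** 2))
print(rmse)
```

On this linear-Gaussian toy model the exact answer is available from the Kalman filter of Section 2.1, which is a convenient way to validate a particle filter implementation before moving to nonlinear models.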
2.3
The filtering problem is characterized by the dynamic nature of the statistical model. However, it is important to realize that SMC methods can also be used to perform inference about a static parameter. More generally, one is often interested in using SMC methods to sample from a sequence of distributions $\{\pi_n\}_{n \in \mathbb{N}}$ defined on a common measurable space $E$, each $\pi_n$ being known up to a normalizing constant. For example, $\pi_n(x)$ could be the posterior distribution of a random parameter $X$ given the observations available at time $n$. In a global optimization context, one could also define $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, where $\{\gamma_n\}$ is an increasing sequence such that $\gamma_n \to \infty$, so as to maximize $\pi(x)$; a similar idea is the basis of simulated annealing.
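A minimal sketch of this annealing idea, assuming a toy one-dimensional $\pi$ and a standard reweight/resample/MCMC-move scheme; the inverse-temperature ladder, proposal scale, and target are all illustrative choices, not prescriptions from the handout.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5000
log_pi = lambda x: -0.5 * (x - 3.0) ** 2     # log pi up to a constant; maximizer at 3
gammas = np.linspace(0.1, 5.0, 20)           # increasing inverse temperatures gamma_n

# Initial population from a diffuse N(0, 5^2), weighted toward pi^{gamma_1}.
x = rng.normal(0.0, 5.0, size=N)
logq0 = -0.5 * (x / 5.0) ** 2 - np.log(5.0 * np.sqrt(2.0 * np.pi))
logw = gammas[0] * log_pi(x) - logq0

for g_prev, g in zip(gammas[:-1], gammas[1:]):
    logw += (g - g_prev) * log_pi(x)         # reweight by pi^g / pi^{g_prev}
    W = np.exp(logw - logw.max())
    W /= W.sum()
    x = x[rng.choice(N, size=N, p=W)]        # resample
    logw = np.zeros(N)                       # uniform weights after resampling
    # One random-walk Metropolis move leaving pi^g invariant.
    prop = x + rng.normal(0.0, 0.5, size=N)
    accept = np.log(rng.random(N)) < g * (log_pi(prop) - log_pi(x))
    x = np.where(accept, prop, x)

print(x.mean())                              # population concentrates near the maximizer 3
```

As $\gamma_n$ grows, the population concentrates around the maximizer of $\pi$; the MCMC move after each resampling step restores particle diversity, in the spirit of the MCMC-move improvements mentioned above.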
The SMC methods described previously do not apply directly in this context, as they address the case where $\pi_n$ is defined on $E_n = E^n$ instead of $E$. However, it is still possible to use SMC methods by constructing an artificial sequence of distributions $\{\tilde{\pi}_n\}_{n \in \mathbb{N}}$, where $\tilde{\pi}_n$ is defined on $E_n$ and satisfies
$$\tilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \prod_{k=2}^n L_k(x_{k-1} \mid x_k),$$
where $\{L_n\}_{n \in \mathbb{N}}$ is an arbitrary sequence of Markov transition kernels. The resulting SMC algorithm can be interpreted as an adaptive importance sampling resampling algorithm.
References
[1] Doucet, A. (2004) Sequential Monte Carlo Methods, Entry in Encyclopedia of Statistical Sciences,
Second Edition, Wiley. In Print.
[2] Doucet, A., Godsill, S.J. and Andrieu, C. (2000) On sequential Monte Carlo sampling methods for Bayesian filtering, Statist. Comput., 10, 197-208.
[3] Doucet, A., De Freitas, J.F.G. and Gordon, N.J. (editors) (2001) Sequential Monte Carlo Methods in Practice. Springer Series in Statistics for Engineering and Information Science. New York: Springer-Verlag.
[4] Djuric, P.M. and Chun, J.-H. (2002) An MCMC sampling approach to estimation of nonstationary hidden Markov models, IEEE Transactions on Signal Processing, 50, 1113-1124.
[5] Gordon, N.J., Salmond, D.J. and Smith, A.F.M. (1993) Novel approach to nonlinear/non-Gaussian
Bayesian state estimation, IEE-Proc. F, 140, 107-113.
[6] Iba, Y. (2000) Population Monte Carlo algorithms, Trans. Japan. Soc. Arti. Intel., 16, 279-286.
[7] Liu, J.S. and Chen, R. (1998) Sequential Monte Carlo methods for dynamic systems, J. Am. Statist.
Ass., 93, 1032-1044.
[8] Pitt, M.K. and Shephard, N. (1999) Filtering via simulation: auxiliary particle filters, J. Am. Statist.
Ass., 94, 590-599.