
ISyE8843A, Brani Vidakovic

Handout 13

Sequential Monte Carlo Methods.

Sampling from a sequence of distributions that change over time is a difficult task in MCMC methodology. It is, however, an important problem that arises in a range of applications. For instance, observations may arrive sequentially in time and one may be interested in performing Bayesian inference in real time. To take full advantage of the data, one should update the posterior distribution as the data become available. Real-life applications include tracking aircraft using radar measurements, estimating the trends and volatility of financial time series, etc. An additional benefit of sequential methods is their computational simplicity, since the data are dealt with sequentially.
More details can be found in the monograph Doucet et al. (2001). There is also a page devoted to Sequential Monte Carlo Methods/Particle Filtering at Cambridge, http://www-sigproc.eng.cam.ac.uk/smc/

1.1 Definitions

For any sequence $\{a_k\}$, let $a_{i:j} = (a_i, a_{i+1}, \ldots, a_j)$. Consider a sequence of probability distributions $\{\pi_n,\ n = 1, 2, \ldots\}$, where the distribution $\pi_n$ is defined on the space $E_n = E^n$. We will assume that each distribution $\pi_n$ admits a probability density $\pi_n(\theta_{1:n})$. Each density $\pi_n$ is known up to a normalizing constant, i.e. $\pi_n(\theta_{1:n}) = f_n(\theta_{1:n})/C_n$, where $f_n$ is known pointwise and $C_n$ is unknown. The index $n$ is often referred to as the time index, although in applications it may have no connection with real time.
SMC methodology is a set of algorithms that generate at each time instance a collection of $N$ ($N \geq 1$) weighted random samples (particles) $\{\omega_n^{(i)}, \theta_{1:n}^{(i)};\ i = 1, \ldots, N\}$, where $\omega_n^{(i)} > 0$ and $\sum_{i=1}^N \omega_n^{(i)} = 1$, and such that for any test function $\varphi_n : E_n \to \mathbb{R}$
$$\sum_{i=1}^N \omega_n^{(i)} \varphi_n(\theta_{1:n}^{(i)}) \longrightarrow \int \varphi_n(\theta_{1:n})\, \pi_n(\theta_{1:n})\, d\theta_{1:n}$$
as $N \to \infty$. Because of on-line applications, it is desirable that the algorithms have linear (in the number of particles $N$) computational complexity and that the complexity is independent of $n$.
Two fundamental actions in SMC are sequential importance sampling and resampling.
We briefly describe the sequential importance sampling technique first. Assume that at time $n-1$ particles $\{\theta_{1:n-1}^{(i)}\}$ have been sampled from an importance density $q_{n-1}(\theta_{1:n-1})$. Since the particles are not samples from the target density, they are weighted. Their weights are given by
$$\omega_{n-1}^{(i)} \propto \frac{\pi_{n-1}(\theta_{1:n-1}^{(i)})}{q_{n-1}(\theta_{1:n-1}^{(i)})}.$$
At time $n$, one extends each path $\theta_{1:n-1}^{(i)}$ by sampling $\theta_n^{(i)}$ according to an importance density $q_n(\cdot\,|\,\theta_{1:n-1}^{(i)})$; the paths up to time $n-1$ are not modified in order to keep the algorithm sequential. It follows that the joint importance density of the paths $\{\theta_{1:n}^{(i)}\}$ is of the form
$$q_n(\theta_{1:n}) = q_n(\theta_n|\theta_{1:n-1})\, q_{n-1}(\theta_{1:n-1}) = q_1(\theta_1) \prod_{k=2}^n q_k(\theta_k|\theta_{1:k-1}).$$
To correct for the discrepancy between the new target density $\pi_n$ and the importance density $q_n$, one needs to update the weights according to
$$\omega_n^{(i)} \propto \frac{\pi_n(\theta_{1:n}^{(i)})}{q_n(\theta_n^{(i)}|\theta_{1:n-1}^{(i)})\, q_{n-1}(\theta_{1:n-1}^{(i)})} = \frac{\pi_n(\theta_{1:n}^{(i)})}{\pi_{n-1}(\theta_{1:n-1}^{(i)})\, q_n(\theta_n^{(i)}|\theta_{1:n-1}^{(i)})} \cdot \frac{\pi_{n-1}(\theta_{1:n-1}^{(i)})}{q_{n-1}(\theta_{1:n-1}^{(i)})} \propto \frac{\pi_n(\theta_{1:n}^{(i)})}{\pi_{n-1}(\theta_{1:n-1}^{(i)})\, q_n(\theta_n^{(i)}|\theta_{1:n-1}^{(i)})}\, \omega_{n-1}^{(i)}.$$
In most applications, the computational complexity required to compute $\omega_n^{(i)}$ given $\omega_{n-1}^{(i)}$ is independent of $n$.
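As an illustration of the recursion above, here is a minimal Python sketch of sequential importance sampling for a toy problem. The target (a product of standard normals) and the random-walk proposal are assumptions made only for this example, not part of the handout's general setting.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2000, 5                        # number of particles, time horizon

# Assumed toy target: pi_n(theta_{1:n}) = prod_{k=1}^{n} N(theta_k; 0, 1).
# Assumed proposal: q_1 = N(0, 1) and q_n(theta_n | theta_{1:n-1}) = N(theta_{n-1}, 1).
paths = rng.normal(size=(N, 1))       # theta_1 ~ q_1; here q_1 = pi_1, so weights start equal
log_w = np.zeros(N)                   # unnormalized log weights

for n in range(1, T):
    prev = paths[:, -1]
    new = prev + rng.normal(size=N)   # extend each path: theta_n ~ q_n(. | theta_{1:n-1})
    # weight update omega_n propto omega_{n-1} * pi_n / (pi_{n-1} q_n), done in log space
    log_w += (-0.5 * new**2) - (-0.5 * (new - prev) ** 2)
    paths = np.column_stack([paths, new])

w = np.exp(log_w - log_w.max())
w /= w.sum()                          # normalized weights omega_T^{(i)}
est = float(np.sum(w * paths[:, -1])) # self-normalized estimate of E_pi[theta_T] = 0
ess = float(1.0 / np.sum(w**2))       # effective sample size
```

Re-running this sketch with a larger $T$ shows the effective sample size collapsing toward 1, which illustrates the weight degeneracy of plain sequential importance sampling.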
The efficiency of this method is highly dependent on the choice of the importance density. To minimize the conditional variance of the weights at time $n$, it is easy to see that the optimal importance distribution is given by
$$q_n(\theta_n|\theta_{1:n-1}) = \pi_n(\theta_n|\theta_{1:n-1}).$$
However, it might be impossible to sample easily from this density. Moreover, even if it is feasible, the incremental importance weight is given in this case by
$$\frac{\pi_n(\theta_{1:n})}{\pi_{n-1}(\theta_{1:n-1})\, q_n(\theta_n|\theta_{1:n-1})} = \frac{\pi_n(\theta_{1:n-1})}{\pi_{n-1}(\theta_{1:n-1})}$$
and might not admit an analytical expression, as it requires computing
$$\pi_n(\theta_{1:n-1}) = \int \pi_n(\theta_{1:n})\, d\theta_n.$$
Therefore, a good alternative strategy consists of coming up with an approximation of the optimal importance sampling distribution; several approximation techniques have been presented in the context of nonlinear non-Gaussian state-space models [2], [8].
Irrespective of the choice of the importance density, the main problem of sequential importance sampling is that it is just a special instance of importance sampling and degenerates as $n$ increases. After only a few time steps, one weight approaches 1 whereas all the other weights approach zero.
The key idea of SMC lies in the resampling step. In the ideal scenario where $q_n(\theta_{1:n}) = \pi_n(\theta_{1:n})$, the weights would all be equal to $N^{-1}$. In practice this is obviously not the case and, roughly speaking, the approximation of $\pi_n$ by $\{\omega_n^{(i)}, \theta_{1:n}^{(i)}\}$ is poor if the distribution of the weights has a high variance/small entropy. In this case, i.e. if the variance of the weights is too high or the entropy of the weights is below a value specified by the user, particles with small weights are killed and particles with large weights are copied multiple times. The underlying idea is to focus the computational effort on the promising zones of the space. Finally, one assigns equal weights $N^{-1}$ to each copy. The resampling step is what makes SMC work. Clearly it introduces additional Monte Carlo error locally in time, but it can be shown both practically and theoretically that this ensures the algorithm does not degenerate over time. More formally, it consists of performing the following approximation
$$\sum_{i=1}^N \omega_n^{(i)}\, \delta_{\theta_{1:n}^{(i)}}(d\theta_{1:n}) \approx \sum_{i=1}^N \frac{N_n^{(i)}}{N}\, \delta_{\theta_{1:n}^{(i)}}(d\theta_{1:n})$$
where $N_n^{(i)} \in \mathbb{N}$ is the number of copies of the particle $\theta_{1:n}^{(i)}$, under the constraint $\sum_{i=1}^N N_n^{(i)} = N$ to keep the size of the population constant. In order to minimize the error introduced by the resampling scheme, one usually selects a stochastic mechanism to obtain $\{N_n^{(i)}\}$ such that $E[N_n^{(i)}] = N\omega_n^{(i)}$ (unbiased approximation) and with small variances $\mathrm{var}[N_n^{(i)}]$. Several resampling schemes have been proposed in the literature, including multinomial, residual and stratified resampling [3].
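As a sketch of two such schemes, the following Python fragment implements multinomial resampling and systematic resampling (a commonly used stratified variant); the particular weight vector is an assumed example. Both mechanisms satisfy the unbiasedness condition $E[N_n^{(i)}] = N\omega_n^{(i)}$.

```python
import numpy as np

def multinomial_resample(weights, rng):
    """Draw N ancestor indices i.i.d. from the weight distribution."""
    N = len(weights)
    return rng.choice(N, size=N, p=weights)

def systematic_resample(weights, rng):
    """One uniform draw, N evenly spaced points through the weight CDF.
    Unbiased (E[N^(i)] = N w^(i)) with lower variance than multinomial."""
    N = len(weights)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(weights), positions)

rng = np.random.default_rng(1)
w = np.array([0.5, 0.3, 0.1, 0.1])            # example weights (an assumption)
idx = systematic_resample(w, rng)
counts = np.bincount(idx, minlength=len(w))   # N^(i): number of copies of particle i
new_w = np.full(len(idx), 1.0 / len(idx))     # each surviving copy gets weight 1/N
```

Systematic resampling guarantees $N_n^{(i)} \in \{\lfloor N\omega_n^{(i)}\rfloor, \lceil N\omega_n^{(i)}\rceil\}$; with the weights above, particle 0 therefore always receives exactly 2 copies.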
SMC methods provide an estimate of the joint distribution $\pi_n(\theta_{1:n})$ at index $n$. However, one can only expect to obtain good approximations of the most recent marginal distributions $\pi_n(\theta_{k:n})$ for $n-k$ small, say below 10. Indeed, if particles are resampled many times between times $k$ and $n$, there are very few distinct paths $\{\theta_{1:k}^{(i)}\}$ at index $n$. Fortunately, this is all that is required in many applications.
We have presented here a simple generic SMC method. However, like MCMC methods, SMC methods are not a black box, and it is necessary to design the algorithm carefully so as to obtain good performance for a reasonable number of particles. Recently, many papers have proposed various SMC methods that improve on this basic scheme: construction of efficient importance sampling distributions, Rao-Blackwellised estimates, use of MCMC moves, etc. A comprehensive coverage of state-of-the-art techniques on the subject can be found in [3].

2 Some Applications

2.1 Kalman Filters

(Harvey, 1989; Anderson and Moore, 1979) In cases where the state space model is linear and Gaussian, the classic Kalman filter is optimal. In this case we have
$$f(x_{t+1}|x_t) = N(x_{t+1}|Ax_t, C), \qquad g(y_t|x_t) = N(y_t|Bx_t, D),$$
where $N(x|\mu, Q)$ is the Gaussian density function with mean vector $\mu$ and covariance matrix $Q$. We can write this equivalently as
$$x_{t+1} = Ax_t + v_t, \qquad y_t = Bx_t + w_t,$$
where $v_t$ and $w_t$ are zero-mean Gaussian vectors with covariance matrices $C$ and $D$, respectively. The errors $v_t$ and $w_t$ are independent over time and also independent of one another. We also require that the initial state be Gaussian distributed,
$$p(x_0) = N(x_0|\mu_0, P_0).$$
We first require $p(x_{t+1}|y_{0:t})$, the prediction step of the filtering recursion,
$$p(x_{t+1}|y_{0:t}) = \int p(x_t|y_{0:t})\, f(x_{t+1}|x_t)\, dx_t.$$
Suppose that at time $t$ we already have
$$p(x_t|y_{0:t}) = N(x_t|\mu_t, P_t).$$
Since $x_{t+1} = Ax_t + v_t$, the standard change of variables (linear Gaussian case) gives
$$p(x_{t+1}|y_{0:t}) = N(x_{t+1}|\mu_{t+1|t}, P_{t+1|t})$$
where
$$\mu_{t+1|t} = A\mu_t, \qquad P_{t+1|t} = C + AP_tA'.$$
The correction step of the filtering recursion is
$$p(x_{t+1}|y_{0:t+1}) = \frac{g(y_{t+1}|x_{t+1})\, p(x_{t+1}|y_{0:t})}{p(y_{t+1}|y_{0:t})}.$$
Substituting the above Gaussian forms into the numerator gives
$$\begin{aligned}
p(x_{t+1}|y_{0:t+1}) &\propto N(y_{t+1}|Bx_{t+1}, D)\, N(x_{t+1}|\mu_{t+1|t}, P_{t+1|t}) \\
&\propto \exp\!\left(-\tfrac{1}{2}(y_{t+1} - Bx_{t+1})'D^{-1}(y_{t+1} - Bx_{t+1})\right) \exp\!\left(-\tfrac{1}{2}(x_{t+1} - \mu_{t+1|t})'P_{t+1|t}^{-1}(x_{t+1} - \mu_{t+1|t})\right) \\
&\propto \exp\!\left(-\tfrac{1}{2}(x_{t+1} - \mu_{t+1})'P_{t+1}^{-1}(x_{t+1} - \mu_{t+1})\right) \\
&\propto N(x_{t+1}|\mu_{t+1}, P_{t+1})
\end{aligned}$$
where
$$\mu_{t+1} = P_{t+1}\!\left(B'D^{-1}y_{t+1} + P_{t+1|t}^{-1}\mu_{t+1|t}\right), \qquad P_{t+1} = \left(B'D^{-1}B + P_{t+1|t}^{-1}\right)^{-1}.$$
After re-expressing the matrix inverse (via the matrix inversion lemma), this gives
$$\mu_{t+1} = \mu_{t+1|t} + K_t(y_{t+1} - B\mu_{t+1|t}), \qquad P_{t+1} = (I - K_tB)\, P_{t+1|t},$$
where
$$K_t = P_{t+1|t}B'(BP_{t+1|t}B' + D)^{-1}.$$
Finally, the complete Kalman filtering recursion can be summarized as
$$\begin{aligned}
\mu_{t+1|t} &= A\mu_t \\
P_{t+1|t} &= C + AP_tA' \\
K_t &= P_{t+1|t}B'(BP_{t+1|t}B' + D)^{-1} \\
\mu_{t+1} &= \mu_{t+1|t} + K_t(y_{t+1} - B\mu_{t+1|t}) \\
P_{t+1} &= (I - K_tB)\, P_{t+1|t}.
\end{aligned}$$
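The recursion above translates almost line for line into code. The sketch below assumes NumPy, and the specific values of $A$, $B$, $C$, $D$ and the scalar state dimension are made up for the example:

```python
import numpy as np

def kalman_step(mu, P, y, A, B, C, D):
    """One predict/correct cycle, following the recursion summarized above."""
    mu_pred = A @ mu                         # mu_{t+1|t} = A mu_t
    P_pred = C + A @ P @ A.T                 # P_{t+1|t} = C + A P_t A'
    K = P_pred @ B.T @ np.linalg.inv(B @ P_pred @ B.T + D)   # gain K_t
    mu_new = mu_pred + K @ (y - B @ mu_pred)                 # mu_{t+1}
    P_new = (np.eye(len(mu)) - K @ B) @ P_pred               # P_{t+1}
    return mu_new, P_new

# Illustrative scalar model (the A, B, C, D values are assumptions for the example)
rng = np.random.default_rng(2)
A, B = np.array([[0.9]]), np.array([[1.0]])
C, D = np.array([[1.0]]), np.array([[0.5]])
mu, P = np.zeros(1), np.array([[1.0]])       # p(x_0) = N(mu_0, P_0)

x = np.zeros(1)
for _ in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(1), C)   # simulate the state
    y = B @ x + rng.multivariate_normal(np.zeros(1), D)   # simulate the observation
    mu, P = kalman_step(mu, P, y, A, B, C, D)
```

For a time-invariant model like this one, the posterior covariance $P_t$ converges quickly to a steady-state value, which is why $K_t$ is often precomputed in practice.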

2.2 Optimal Filtering

The application of SMC to optimal filtering was first presented in [5]. The problem of interest is estimating the state of a Markov process $\{X_k\}_{k\geq 1}$ given some observations $\{Y_k\}_{k\geq 1}$. The unobserved (hidden) Markov process is defined by
$$X_1 \sim \mu, \qquad X_k|X_{k-1} \sim f(\cdot|X_{k-1}),$$
whereas the observations are assumed to be independent conditional upon $\{X_k\}_{k\geq 1}$, with marginal distribution
$$Y_k|X_k \sim g(\cdot|X_k).$$
Estimating the posterior distribution of $X_k$ given $Y_{1:k}$ is a very important problem known as optimal filtering. If the model is linear and Gaussian, the posterior distribution is Gaussian and its statistics can be computed using the Kalman filter. However, in many real-world applications these linearity and Gaussianity assumptions are not valid and one needs to use numerical methods. SMC methods can be applied directly to this problem by setting $\pi_n$ as the posterior density of the collection of states $X_{1:n}$ given a realization of the observations $Y_{1:n} = y_{1:n}$. Indeed, this posterior distribution satisfies
$$\pi_n(x_{1:n}) \propto \mu(x_1) \prod_{k=2}^n f(x_k|x_{k-1}) \prod_{k=1}^n g(y_k|x_k)$$
and is typically known up to a normalizing constant.
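For this model, the bootstrap filter of [5] proposes from the prior transition $f$, so the incremental importance weight reduces to the likelihood $g(y_n|x_n)$. Below is a minimal Python sketch under an assumed scalar linear-Gaussian model, chosen only so the example is self-contained (in this special case the Kalman filter of Section 2.1 is exact and SMC is not actually needed):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 500, 30
a, b, c, d = 0.9, 1.0, 1.0, 0.5       # assumed model: X' = aX + N(0, c), Y = bX + N(0, d)

# simulate one state path and its observations from the model
xs = [rng.normal()]                    # X_1 ~ mu = N(0, 1)
for _ in range(T - 1):
    xs.append(a * xs[-1] + rng.normal(0.0, np.sqrt(c)))
ys = [b * x + rng.normal(0.0, np.sqrt(d)) for x in xs]

particles = rng.normal(size=N)         # draw the initial particles from mu
for t, y in enumerate(ys):
    if t > 0:
        particles = a * particles + rng.normal(0.0, np.sqrt(c), size=N)  # propagate via f
    log_w = -0.5 * (y - b * particles) ** 2 / d   # incremental weight g(y | x), up to a constant
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]   # multinomial resampling each step

filter_mean = float(particles.mean())  # approximates E[X_T | y_{1:T}]
```

Resampling at every step, as done here, is the simplest scheme; in practice one often resamples only when the effective sample size drops below a threshold.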

2.3 Population Monte Carlo and Static Parameter Inference

The filtering problem is characterized by the dynamic nature of the statistical model. However, it is important to realize that SMC methods can also be used to perform inference about a static parameter. More generally, one is often interested in using SMC methods to sample from a sequence of distributions $\{\pi_n\}_{n\in\mathbb{N}}$ defined on a common measurable space $E$, each $\pi_n$ being known up to a normalizing constant. For example, $\pi_n(x)$ could be the posterior distribution of a random parameter $X$ given the observations available at time $n$. In a global optimization context, one could also define $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, where $\{\gamma_n\}$ is an increasing sequence such that $\gamma_n \to \infty$, so as to maximize $\pi(x)$; a similar idea is the basis of simulated annealing.
The SMC methods described previously do not apply directly in this context, as they address the case where $\pi_n$ is defined on $E_n = E^n$ instead of $E$. However, it is still possible to use SMC methods by constructing an artificial sequence of distributions $\{\widetilde{\pi}_n\}_{n\in\mathbb{N}}$, where $\widetilde{\pi}_n$ is defined on $E_n$ and satisfies
$$\int \widetilde{\pi}_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n).$$
An obvious choice for $\widetilde{\pi}_n$ is given by
$$\widetilde{\pi}_n(x_{1:n}) = \pi_n(x_n) \prod_{k=2}^n L_k(x_{k-1}|x_k)$$
where $\{L_n\}_{n\in\mathbb{N}}$ is an arbitrary sequence of Markov transition kernels. The resulting SMC algorithm can be interpreted as an adaptive importance sampling resampling algorithm.
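As a sketch of the tempering idea in this static setting, the following Python fragment targets $\pi(x) \propto \exp(-x^2/2)$ through an assumed finite schedule $\gamma_1 < \cdots < \gamma_4 = 1$, using random-walk Metropolis moves that leave each $\pi^{\gamma_n}$ invariant; the reweight/resample/move pattern below corresponds to one standard choice of the kernels $L_k$ (the time reversal of the MCMC move).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2000
gammas = [0.1, 0.3, 0.6, 1.0]      # assumed tempering schedule ending at the target

# pi(x) propto exp(-x^2 / 2), so pi_n propto pi^{gamma_n} = N(0, 1 / gamma_n)
x = rng.normal(0.0, 1.0 / np.sqrt(gammas[0]), size=N)   # exact draw from pi_1
for g_prev, g in zip(gammas[:-1], gammas[1:]):
    # reweight: incremental weight pi_n(x) / pi_{n-1}(x) at the current particles
    log_w = -(g - g_prev) * x**2 / 2.0
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # resample back to equal weights
    x = x[rng.choice(N, size=N, p=w)]
    # move with one random-walk Metropolis step leaving pi_n invariant
    prop = x + rng.normal(size=N)
    accept = np.log(rng.uniform(size=N)) < -g * (prop**2 - x**2) / 2.0
    x = np.where(accept, prop, x)

# the final particles approximate pi(x) = N(0, 1)
sample_mean, sample_var = float(x.mean()), float(x.var())
```

In a realistic application one would use several MCMC steps per temperature and choose the schedule adaptively, but the three-step structure is the same.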

References

[1] Doucet, A. (2004) Sequential Monte Carlo Methods, entry in Encyclopedia of Statistical Sciences, Second Edition, Wiley. In print.
[2] Doucet, A., Godsill, S.J. and Andrieu, C. (2000) On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing, 10, 197-208.
[3] Doucet, A., de Freitas, J.F.G. and Gordon, N.J. (editors) (2001) Sequential Monte Carlo Methods in Practice. Springer Series in Statistics for Engineering and Information Science. New York: Springer-Verlag.
[4] Djurić, P.M. and Chun, J.-H. (2002) An MCMC sampling approach to estimation of nonstationary hidden Markov models, IEEE Transactions on Signal Processing, 50, 1113-1124.
[5] Gordon, N.J., Salmond, D.J. and Smith, A.F.M. (1993) Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F, 140, 107-113.
[6] Iba, Y. (2000) Population Monte Carlo algorithms, Transactions of the Japanese Society for Artificial Intelligence, 16, 279-286.
[7] Liu, J.S. and Chen, R. (1998) Sequential Monte Carlo methods for dynamic systems, Journal of the American Statistical Association, 93, 1032-1044.
[8] Pitt, M.K. and Shephard, N. (1999) Filtering via simulation: auxiliary particle filters, Journal of the American Statistical Association, 94, 590-599.
