Low Variance Sampling Techniques For Particle Filter
Quan Nguyen
[email protected]
University of Hamburg
Department of Informatics
Abstract— Variance control has been considered one of the essential components of numerical approximation estimators. In probabilistic robotics, one such estimator, the Particle Filter, has been studied extensively and has proved to have numerous applications in problems of Robotics and Machine Learning such as Object Tracking, Robot Localization and Image Processing. This paper introduces the theoretical framework of the Particle Filter and compares three different sampling techniques for controlling the variance in the re-sampling step of the Particle Filter.

I. INTRODUCTION

A large number of problems in computing require estimating the value of some features that the systems are interested in. Generally, while designing such estimators we aim to find models that approximate the features accurately. One frequently used evaluation criterion for such estimators is the discrepancy between the estimated values and the theoretical values, measured as the mean squared error (MSE). Suppose that we are estimating some parameter θ and our model returns the estimated value θ̂; the MSE is computed as

MSE = E[(θ̂ − θ)²]

In order to measure the MSE of the estimator, we wish to derive the MSE from other quantities, so that we no longer depend on the actual expected value θ. Rewriting the above equation by adding and subtracting the term E[θ̂], we have:

MSE = E[(θ̂ − E[θ̂] + E[θ̂] − θ)²]
    = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)²    (1)
    = Var[θ̂] + (E[θ̂] − θ)²

Equation 1 reveals the relation between the MSE of an estimator and its variance. If the estimator is unbiased (i.e. E[θ̂] = θ) then the MSE is equivalent to the variance of the estimator. This insight suggests that the performance of an estimator can be evaluated directly through its variance, without information about the actual expected value. For that reason, variance control becomes an essential part of such estimators. This paper aims at introducing one of the most common unbiased estimators, Monte Carlo Approximation, and its application in robotics.

The details of the Sequential Monte Carlo Method with three sampling techniques are presented in section II. Section III briefly introduces the theoretical framework of the Particle Filter as one application of the Sequential Monte Carlo Method in probabilistic robotics. Section IV demonstrates one simulated experiment which makes use of all three sampling techniques and evaluates their performance in a simple contrived robot localization environment. Section V summarizes the discussion and analysis of the experiment.

II. SEQUENTIAL MONTE CARLO METHOD

A. Basic Monte Carlo Approximation Method

In many problems we are interested in the expectation E[f(x)] of some measurement f(x) of a random variable x. If the probability density p(x) is well-defined, we have:

E[f(x)] = ∫ f(x) p(x) dx    (2)

Normally x is not a single feature but a setting of multiple features represented as a d-dimensional vector: f: R^d → R. In addition, the functions f(x) and p(x) could be extremely complex, so a direct computation of this integral may not be possible. One common functional approximation technique is the Monte Carlo method, in which E[f(x)] is estimated by the empirical mean over a set of N samples {X_i}_{i=1}^N:

E[f(x)] ≈ I[f] = (1/N) Σ_{i=1}^N f(X_i)    (3)

where X_i is drawn from p(x): X_i ∼ p(x). It can be proved [4] that this estimator is unbiased and consistent. The variance of this estimator is

Var[I[f]] = Var[f(x)] / N    (4)

The basic Monte Carlo method needs two assumptions to be feasible: evaluating f(x) must be tractable for each sample, and so must be drawing samples from p(x). However, it could be the case that the latter assumption does not hold. In that circumstance we can pick another function q(x) that is closely related to p(x) but much simpler to draw from. Altering equation 2, we have:

E[f(x)] = ∫ f(x) (p(x)/q(x)) q(x) dx    (5)

E[f(x)] ≈ Ĩ[f] = (1/N) Σ_{i=1}^N f(X_i) w(X_i)    (6)

where X_i is a sample drawn from q(x) and w(x) = p(x)/q(x) is the importance weight function. This method is
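The two estimators above can be sketched in a few lines of Python. This is an illustrative toy, not from the paper: the choices of f, p and q are ours, estimating E[x²] = 1 under a standard normal target with a wider normal proposal.

```python
import math
import random

def monte_carlo(f, sample_p, n):
    """Basic Monte Carlo estimate of E[f(x)], equation (3)."""
    return sum(f(sample_p()) for _ in range(n)) / n

def importance_sampling(f, p, q, sample_q, n):
    """Importance Sampling estimate of E[f(x)], equation (6):
    samples are drawn from q and weighted by w(x) = p(x)/q(x)."""
    return sum(f(x) * p(x) / q(x)
               for x in (sample_q() for _ in range(n))) / n

random.seed(0)
normal = lambda s: (lambda x: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi)))
p_pdf, q_pdf = normal(1.0), normal(2.0)   # target N(0,1), wider proposal N(0,4)

est_mc = monte_carlo(lambda x: x * x, lambda: random.gauss(0, 1), 100_000)
est_is = importance_sampling(lambda x: x * x, p_pdf, q_pdf,
                             lambda: random.gauss(0, 2), 100_000)
```

Both estimates land close to the true value 1; per equation 4, their spread shrinks as 1/N.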
called Importance Sampling, because each sample carries a weight. This estimator is also unbiased and consistent. Its variance is

Var[Ĩ[f]] = Var[f(x) p(x)/q(x)] / N    (7)

Equation 7 hints that we can reduce the variance of the Importance Sampling estimator by a proper choice of q(x). In an even more extreme but realistic circumstance, the weight values cannot be evaluated directly because the two probability density functions p(x) and q(x) are normalized from two measurements p̃(x) and q̃(x) by constants Z_p and Z_q which we do not know beforehand: p(x) = p̃(x)/Z_p and q(x) = q̃(x)/Z_q. Technically we can compute Z_p and Z_q as

Z_p = ∫ p̃(x) dx
Z_q = ∫ q̃(x) dx    (8)

However, as discussed above, computing integrals of density functions can be tricky in many cases. Interestingly, it is still possible to estimate E[f(x)], but this time we need one additional estimation step. Rewriting equation 5 we have:

E[f(x)] = ∫ f(x) (p(x)/q(x)) q(x) dx
        = (Z_q/Z_p) ∫ f(x) (p̃(x)/q̃(x)) q(x) dx    (9)

E[f(x)] ≈ (Z_q/Z_p) (1/N) Σ_{i=1}^N f(X_i) w̃(X_i)

where w̃(x) = p̃(x)/q̃(x). At this point we still need to estimate the ratio Z_p/Z_q (the reciprocal of the factor in equation 9):

Z_p/Z_q = (1/Z_q) ∫ p̃(x) dx
        = ∫ (p̃(x)/q̃(x)) q(x) dx    (10)

It can be observed that equation 10 has the same form as equation 2, so we can use the Monte Carlo approximation again with the same set of samples drawn from q(x):

Z_p/Z_q ≈ (1/N) Σ_{i=1}^N w̃(X_i)    (11)

Overall, from equations 9 and 11 we have:

E[f(x)] ≈ Σ_{i=1}^N f(X_i) W(X_i)    (12)

where W(X_i) = w̃(X_i) / Σ_{j=1}^N w̃(X_j) is the normalized weight of sample X_i. The two densities p(x) and q(x) are called the target distribution and the proposal distribution, respectively.

B. Sequential Monte Carlo Method

In many applications the estimation is required to work not only at one time step but over a time series in which the distribution of the features varies. For example, in Gesture Recognition problems the position of hands and objects changes along with the posture of people. Such conditions require multiple Monte Carlo approximations to be conducted online in response to additional inputs. We now rewrite every factor in the weight equations above in a temporal context:

w̃_n(x_n) = p̃_n(x_n) / q_n(x_n)    (13)

The Sequential Monte Carlo method works under the assumption that q(x) can be computed recursively from the previous time step [2] [4]. Specifically,

q_n(x_n) = q_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})    (14)

Because the distribution q_{n-1}(x_{n-1}) is represented at time n−1 by our set of samples, equation 14 practically indicates that the sample at time step n is drawn from the distribution q_n(x_n | x_{n-1}). In other words, it tells us how to employ the transition functions, often made available in dynamic systems, to draw samples at the new time step. In reality the distribution of new samples will depend not only on the previous states but also on the newly observed data. The details of incorporating environment measurements into the transition functions are discussed in section III. Substituting equation 14 into 13 we have:

w̃_n(x_n) = [p̃_{n-1}(x_{n-1}) / q_{n-1}(x_{n-1})] · [p̃_n(x_n) / (p̃_{n-1}(x_{n-1}) q_n(x_n | x_{n-1}))]    (15)
w̃_n(x_n) = w̃_{n-1}(x_{n-1}) α_n(x_n)

The quantity α_n(x_n) is called the incremental importance weight [4]. Basically, equation 15 manifests a mechanism to update the weights of the samples between two successive time steps. Because the weights in our estimator constantly change after every time step, its variance also changes through time. Kong et al. [5] showed that the variance of the importance weights keeps increasing over time. This is an undesirable effect: as the variance increases, the mean squared error eventually becomes larger and larger. Intuitively, in the context of robotics this means that as the robot gets more inputs from the environment it should become more certain about its state, and the set of samples should converge to a smaller set of possible worlds. However, the current setting of Importance Sampling wastes many samples in unimportant positions and quickly increases its variance. There are cases where the variance even increases exponentially as n grows [2]. One natural idea to address this problem is to "reconstruct" the set of samples, in such a way that the less important samples are diminished and the more important samples are maintained. This idea leads to an additional step after drawing samples: re-sampling the newly constructed samples so that they represent the target distribution with higher accuracy.
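The recursive weight update of equation 15 can be sketched as follows; the transition sampler and the incremental weight function α_n are placeholders supplied by the caller, and the Gaussian toy at the bottom is our own illustration, not the paper's model.

```python
import math
import random

def sis_step(particles, weights, sample_transition, alpha):
    """One Sequential Importance Sampling step (equations 14 and 15):
    each particle is propagated through q_n(x_n | x_{n-1}) and its
    weight is multiplied by the incremental importance weight alpha."""
    new_particles = [sample_transition(x) for x in particles]
    new_weights = [w * alpha(x_new, x_old)
                   for w, x_new, x_old in zip(weights, new_particles, particles)]
    total = sum(new_weights)                  # normalize as in equation 12
    return new_particles, [w / total for w in new_weights]

# Toy run: random-walk transition, alpha = likelihood of a fixed observation z = 1.
random.seed(1)
particles = [0.0] * 500
weights = [1.0 / 500] * 500
particles, weights = sis_step(
    particles, weights,
    sample_transition=lambda x: x + random.gauss(0, 1),
    alpha=lambda x_new, x_old: math.exp(-(x_new - 1.0) ** 2 / 2))
```

Repeating this step without re-sampling exhibits exactly the degeneracy described above: a few weights come to dominate while most shrink toward zero.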
C. Re-sampling Techniques

Since each sample carries a weight indicating its importance, the re-sampling phase strives to select samples with probabilities proportional to their weights. This is similar to the Fitness Proportionate Selection class of algorithms widely applied in genetic algorithms, where the weights are considered the fitness of individuals in a population and are used for selecting the offspring of the next generation (time step). In general, all re-sampling methods share a common strategy:

Algorithm 1 Weighted Re-Sampling algorithm
1: procedure RESAMPLING(w: list of weights; N: number of samples to be drawn)
2:   X = empty list of samples
3:   for i = 1 to N do
4:     Obtain random number k ∼ U[0, 1)
5:     Find index j where k ∈ [ Σ_{t=1}^{j-1} w_t , Σ_{t=1}^{j} w_t )
6:     Add sample j to X
7:   end for
8:   return X
9: end procedure

In the context of the unnormalised weights in section II-A, if we denote SumW as the sum of all weights w̃(X_i), then it can be proved that algorithm 1 selects sample i with probability proportional to w̃(X_i)/SumW, by considering the interval bounds in each step as a cumulative density function. Since drawing the random number k is the only stochastic element in this re-sampling algorithm, the variance reduction in Sequential Monte Carlo Approximation depends on how this number k is obtained. In the following sections we introduce the three most common techniques to achieve a low variance sampling process.

Fig. 1. Uniform sampling using 03 different samplers: simple random sampling (x), stratified sampling (+) and systematic sampling (.) [4]

1) Simple Random Sampling: As the name suggests, Simple Random Sampling simply draws N random numbers k independently from the uniform distribution U[0, 1) and then transforms each number to the range [0, SumW). The simplest implementation of Simple Random Sampling requires O(N²) time, because the task of mapping each k to a sample index is implemented as an inner loop through all samples, which costs O(N). Most practical implementations achieve O(N log N) by sorting and binary search, or even O(N) via the inverse cumulative density function [3].

While being simple and easy to implement, Simple Random Sampling suffers from a natural deterioration of samples. Concretely, it could be the case that the system does not retrieve any new inputs over time, yet the re-sampling process still perturbs the system as a whole: because of its randomness, some samples keep being removed and others maintained without any explicit reason. Eventually some regions of the domain space are no longer represented in the sample set. The following two techniques, Stratified Sampling and Systematic Sampling, address this problem by decomposing the original domain space into smaller sub-regions and retaining a representative in each sub-region.

2) Stratified Sampling: In stratified sampling, the domain is represented as a set of N exhaustive partitions; recall that N is equal to the number of samples to be drawn. We then apply Simple Random Sampling in each sub-region iteratively, but this time Simple Random Sampling draws only one instance.

Algorithm 2 Stratified Re-Sampling algorithm
1: procedure RESAMPLING(w: list of weights; N: number of samples to be drawn)
2:   X = empty list of samples
3:   D = domain space
4:   A = set of N sub-regions of D, A_1 ∪ A_2 ∪ ... ∪ A_N = D
5:   for i = 1 to N do
6:     Draw one sample s from a random number k generated in region A_i
7:     Add sample s to X
8:   end for
9:   return X
10: end procedure

Stratified Sampling has been proven to achieve a lower estimator variance than Simple Random Sampling [4] [2]. To simplify the process of generating the random number in each sub-region, we can make all sub-regions cover areas of equal size. Stratified Sampling is considered an appropriate method which gains higher performance when the robot follows a belief with a multi-modal distribution [6].

3) Systematic Sampling: Systematic Sampling is very similar to Stratified Sampling in the sense that it draws samples from random criteria selected in each sub-region; it only differs in the method of generating them. While the random factors k in the sub-regions of Stratified Sampling are mutually independent, in Systematic Sampling they are identical in the domain U[0, 1) and only differ when being transformed to weight values.

The efficiency differences between Stratified Sampling and Systematic Sampling are generally system-specific. A recent study by Bhatta et al. [1] concludes that Systematic Sampling not only achieves lower complexity but also performs better when multiple parts of the environment have strong correlations. Figure 1 illustrates one example of the sampling patterns of the 03 different sampling techniques. While both Stratified Sampling and Systematic Sampling are able to maintain a higher diversity of samples with respect to the distribution space, Systematic Sampling clearly shows a correlation between samples from different regions and is thus expected to achieve a lower variance than Stratified Sampling.
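The three techniques differ only in how the N positions k are generated before being mapped through the cumulative weights; a compact Python sketch (our own, not from the paper) makes the shared structure explicit:

```python
import random
from bisect import bisect_right
from itertools import accumulate

def _select(weights, positions):
    """Map positions in [0, SumW) to sample indices through the
    cumulative sums of the weights (line 5 of algorithm 1)."""
    cum = list(accumulate(weights))
    last = len(weights) - 1
    return [min(bisect_right(cum, k), last) for k in positions]

def simple_random_resample(weights, n):
    """N independent draws from U[0, SumW)."""
    total = sum(weights)
    return _select(weights, [random.random() * total for _ in range(n)])

def stratified_resample(weights, n):
    """One independent draw inside each of n equal-width sub-regions."""
    total = sum(weights)
    return _select(weights, [(i + random.random()) / n * total for i in range(n)])

def systematic_resample(weights, n):
    """A single draw k, shifted deterministically through all sub-regions."""
    total, k = sum(weights), random.random()
    return _select(weights, [(i + k) / n * total for i in range(n)])
```

With weights [0.1, 0.7, 0.2] and n = 1000, all three return roughly 700 copies of index 1, but the count is tightly concentrated for the stratified and systematic schemes while it fluctuates binomially for simple random resampling, which is exactly the variance reduction discussed above.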
Algorithm 3 Systematic Re-Sampling algorithm
1: procedure RESAMPLING(w: list of weights; N: number of samples to be drawn)
2:   X = empty list of samples
3:   D = domain space
4:   A = set of N sub-regions of D, A_1 ∪ A_2 ∪ ... ∪ A_N = D
5:   Draw a random number k ∼ U[0, 1)
6:   for i = 1 to N do
7:     Transform k to the sample selection criterion in region A_i
8:     Find the sample s satisfying the criterion above
9:     Add sample s to X
10:  end for
11:  return X
12: end procedure

III. PARTICLE FILTER

In probabilistic robotics, states are features which cannot be measured directly by the robot but can be inferred from the sensor and control data. Since uncertainty stems from noise and the stochastic nature of the environment and the robot's devices, the state x at time step t is often represented as a random variable distributed over the state transition probability [6]:

P(x = x_t) = p(x_t | x_{0:t-1}, z_{1:t-1}, u_{1:t})    (16)

where z_t is the environment data measured by the robot's sensors and u_t is the control action performed by the robot. Although the internal dynamics of the robot are stochastic, in general the probability distribution of states as a function of measurement and control data is deterministic. In other words, the distribution of x_t is complete in the sense that the whole measurement and control history up to time t can be represented by p(x_t). This is called the Markov assumption [6] and it allows us to rewrite the transition probability as

P(x = x_t) = p(x_t | x_{t-1}, u_t)    (17)

Since the data that a robot can retrieve from the environment is strongly dependent on the actual position (state) of the robot, we can safely assume that the measurement data are also random variables whose distribution parameters are the states and control data of the robot. Under the Markov assumption we can write the measurement probability [6] as

p(z_t | x_{0:t}, z_{1:t-1}, u_{1:t}) = p(z_t | x_t)    (18)

It is desirable to formulate the distribution of states given all the measurement and control data acquired over time. Such a distribution is called the belief of the robot [6]:

bel(x_t) = p(x_t | z_{1:t}, u_{1:t})    (19)

The belief at each time step can be expressed in terms of the measurement probability, the transition probability and the belief distribution from previous time steps. It turns out that such information is sufficient to calculate bel(x_t) following a recursive procedure called the Bayes Filter [6]:

Algorithm 4 Bayes Filter algorithm
1: procedure BAYESFILTER(bel(x_{t-1}), u_t, z_t)
2:   X = state space
3:   for all x_t in X do
4:     bel̄(x_t) = ∫ p(x_t | u_t, x_{t-1}) bel(x_{t-1}) dx_{t-1}
5:     bel(x_t) = η p(z_t | x_t) bel̄(x_t)
6:   end for
7:   return bel(x_t)
8: end procedure

where η is a normalizing constant. As we discussed earlier in section II-A, the computation of the integral in line 4 of algorithm 4 is not always tractable. One solution for those cases is transforming the Bayes Filter from continuous state space to discrete state space. It could also be the case that the original system is actually discrete:

Algorithm 5 Discrete Bayes Filter algorithm [6]
1: procedure BAYESFILTER({p_{i,t-1}}, u_t, z_t)
2:   K = discrete state space
3:   for all k in K do
4:     p̄_{k,t} = Σ_i p(x_k | u_t, x_i) p_{i,t-1}
5:     p_{k,t} = η p(z_t | x_k) p̄_{k,t}
6:   end for
7:   return {p_{k,t}}
8: end procedure

For a point estimation of the belief as in the Discrete Bayes Filter algorithm, each sub-region is represented by a single data point: a particle. This is identical to the notion of samples in the basic Monte Carlo method. By transferring the idea of sequential Monte Carlo Approximation to discrete state estimation problems using particles, we obtain an algorithm called the Particle Filter [6]. Denoting X_t as the set of particles at time t and M as the cardinality of that set, we have

Algorithm 6 Particle Filter algorithm [6]
1: procedure PARTICLEFILTER(X_{t-1}, u_t, z_t)
2:   X̄_t = X_t = ∅
3:   for m = 1 to M do
4:     sample x_t^[m] ∼ p(x_t | u_t, x_{t-1}^[m])
5:     w_t^[m] = p(z_t | x_t^[m])
6:     Add (x_t^[m], w_t^[m]) to X̄_t
7:   end for
8:   for m = 1 to M do
9:     Draw particle i from X̄_t with probability ∝ w_t^[i]
10:    Add x_t^[i] to X_t
11:  end for
12:  return X_t
13: end procedure
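Algorithm 6 translates almost line by line into Python. The 1-D localization models below (Gaussian motion noise, Gaussian measurement likelihood) are our own toy assumptions, not the paper's experiment:

```python
import math
import random

def particle_filter(particles, u, z, sample_motion, likelihood):
    """One step of algorithm 6: propagate every particle through the
    motion model (line 4), weight it by the measurement likelihood
    (line 5), then resample proportionally to the weights (lines 8-10)."""
    moved = [sample_motion(x, u) for x in particles]
    weights = [likelihood(z, x) for x in moved]
    return random.choices(moved, weights=weights, k=len(particles))

# Toy 1-D localization: the robot moves +1.0 per step and senses its position.
random.seed(2)
sample_motion = lambda x, u: x + u + random.gauss(0, 0.1)
likelihood = lambda z, x: math.exp(-(z - x) ** 2 / (2 * 0.2 ** 2))
particles = [random.uniform(0, 10) for _ in range(2000)]
for t in range(1, 6):
    particles = particle_filter(particles, 1.0, float(t), sample_motion, likelihood)
estimate = sum(particles) / len(particles)   # equation 20
```

For brevity the sketch uses simple random resampling (`random.choices`); the stratified or systematic schemes of section II-C would slot into the same place.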
In algorithm 6, we choose p(x_t | u_t, x_{t-1}) bel(x_{t-1}) as the proposal distribution. A different choice will lead to different sampling and weight results, but intuitively the most efficient choice is a distribution that involves both the measurement data and the previous estimation, since we already have the transition probability available. Because after the re-sampling phase the set of particles no longer maintains any weight values, the newly constructed set has identical unnormalised weights of 1.0. The overall state of the robot is estimated similarly to the expectation in the basic Monte Carlo method:

E[bel(x_t)] ≈ (1/M) Σ_i X_i    (20)

IV. THE FOUR LANDMARKS EXPERIMENT

A. Environment Setup

B. Performance Evaluation

In section I we argued that for an unbiased estimator the mean squared error is identical to the variance. Hence, we can measure the variance of the estimator corresponding to each sampling technique through the estimation error of the final position of the robot after 30 moves. A more rigorous evaluation would take into account the estimation error after each move, but since the performance of each experiment is inherently dependent on the initial configuration of the particle set, we can ignore those intermediate steps and concentrate on the final result only.

Before using the Particle Filter to estimate the final position in each experiment, a simulated course of movements is conducted beforehand to obtain a set of ground truth data. Although this is not the best way to obtain the most accurate ground truth, because the noise in two different runs differs, the variance of the noise is small enough to ensure that the discrepancies between the ground truth data are negligible. Now that we have the ground truth data and the estimation data, the error of each experiment is computed as the Euclidean distance between the two values.
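The per-experiment error described above can be computed directly; the run data below are hypothetical placeholders, not the paper's measurements:

```python
import math

def estimation_error(ground_truth, estimate):
    """Euclidean distance between the true and estimated final positions."""
    return math.dist(ground_truth, estimate)

# Hypothetical (ground truth, estimate) pairs, one per experiment run
runs = [((3.0, 4.0), (3.3, 4.4)), ((1.0, 2.0), (1.1, 1.8))]
errors = [estimation_error(g, e) for g, e in runs]
mean_error = sum(errors) / len(errors)
```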
Fig. 4. Estimation Error of 03 sampling techniques

Fig. 5. Estimation Error of 03 sampling techniques

No matter how the configuration of the particle set is generated, the maximum error of Systematic Sampling (≈6) never exceeds half of the tolerance threshold (15) and remains the smallest. The average estimation error of this method is 4.903, slightly higher than Stratified Sampling's 4.674 and well under the result of Simple Random Sampling (7.095). However, the small average error of Stratified Sampling probably results from the outlier value where the error is nearly 0.17, as can easily be spotted in the box plot in Figure 3. The performance results also support the hypothesis that Stratified Sampling indeed brings a significant improvement compared to the Simple Random Sampling method. In fact, Simple Random Sampling sometimes results in unacceptable estimations with error values higher than the acceptance threshold. In the case when the sensors are highly accurate but the motors are not, Simple Random Sampling should be avoided, because in that case high weights will be assigned to very inaccurate samples drawn from p(x_t | u_t, x_{t-1}) and thus the variance will diverge, which should be completely prevented.

V. CONCLUSION

This paper has introduced the Monte Carlo approximation framework, in which intractable target distributions are replaced with simpler distributions for drawing samples from. This step of finding a proposal distribution not only provides a solution to the intractable computation problem of the original distribution but also opens suggestions for improvement of the overall estimator. However, as the weights are transformed over time, the variance of the estimator always increases and the mean squared error eventually diverges. This issue raises the necessity for methods that rework the set of samples being constructed at each step. Re-sampling is one of those techniques; it partially solves the problem by eliminating the unimportant samples and creating more offspring of the important samples. As a result, the whole system is able to make use of additional input over time to a greater extent. However, the randomness of the re-sampling process brings some drawbacks, including the problem of particle deterioration, which reduces the diversity of the set of particles and accidentally eliminates some particles that are actually worth keeping. Different re-sampling techniques have been studied to reduce the rate of diversity decline, two of them being Stratified Sampling and Systematic Sampling. While still being subject to the number of particles like any other sampling process, Stratified Sampling and Systematic Sampling have been proved both theoretically and empirically to be much better than Simple Random Sampling at covering a wider area of the domain space while keeping the variances of the estimators low.

In addition to employing different re-sampling techniques to boost the performance of the system, the diversity of particles can also be enhanced by restricting the frequency of the re-sampling phase. One obvious situation in which re-sampling should be avoided is when no information is provided to the system, so that no particles are removed for no reason. There are more intricate insights on this matter. For example, Thrun et al. [6] suggested that when the variance of the weights is low, which means many particles share the same importance, then re-sampling should be limited. However, it is believed that deciding whether to conduct the re-sampling phase in actual scenarios requires practical experience and empirical evaluation [6].
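Thrun's suggestion of limiting re-sampling when the weight variance is low is commonly operationalized through the effective sample size (ESS); this sketch uses the standard ESS formula from the SMC literature [2], which the text above does not spell out:

```python
def effective_sample_size(weights):
    """ESS = 1 / sum(W_i^2) over normalized weights W_i: it equals N when
    all weights are equal and approaches 1 when one particle dominates."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)

def should_resample(weights, fraction=0.5):
    """Re-sample only when the ESS falls below a fraction of the particle count."""
    return effective_sample_size(weights) < fraction * len(weights)
```

The threshold fraction is a tunable choice; as noted above, the right trigger in practice is ultimately a matter of empirical evaluation.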
REFERENCES

[1] Kuber Prasad Bhatta, Ram Prasad Chaudhary, and Ole Reidar Vetaas. A comparison of systematic versus stratified-random sampling design for gradient analyses: a case study in subalpine Himalaya, Nepal. Phytocoenologia, 42(3-4):191–202, December 2012.
[2] Arnaud Doucet and Adam M. Johansen. A Tutorial on Particle Filtering and Smoothing: Fifteen years later. December 2008.
[3] Arnaud Doucet, Simon Godsill, and Christophe Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3):197–208.
[4] Jeroen D. Hol. Institutionen för systemteknik, Department of Electrical Engineering. 2004.
[5] Augustine Kong, Jun S. Liu, and Wing H. Wong. Sequential Imputations and Bayesian Missing Data Problems. Journal of the American Statistical Association, 89(425):278–288, March 1994.
[6] Sebastian Thrun. Probabilistic Robotics. 2000.