Articulated Body Motion Capture by Stochastic Search
© 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.
Received August 19, 2003; Revised April 9, 2004; Accepted April 9, 2004
Abstract. We develop a modified particle filter which is shown to be effective at searching the high-dimensional configuration spaces (c. 30+ dimensions) encountered in visual tracking of articulated body motion. The algorithm uses a continuation principle, based on annealing, to introduce the influence of narrow peaks in the fitness function gradually. The new algorithm, termed annealed particle filtering, is shown to be capable of recovering full articulated body motion efficiently. A mechanism for achieving a soft partitioning of the search space is described and implemented, and shown to improve the algorithm’s performance. Likewise, the introduction of a crossover operator is shown to improve the effectiveness of the search for kinematic trees (such as a human body). Results are given for a variety of agile motions such as walking, running and jumping.
Keywords: human motion capture, visual tracking, particle filtering, genetic algorithms
in short form in Deutscher et al. (2000, 2001). Our approach is characterised by the following: 1. articulated body model, 2. weak dynamical modelling, 3. edge and background subtraction image measurements, and 4. a particle-based stochastic search algorithm. The key contributions comprise:

• The development of a novel, particle-based stochastic search algorithm, called annealed particle filtering. The method uses a continuation principle, based on annealing, to introduce the influence of narrow peaks in the fitness function gradually. This is introduced in Section 3.
• The annealed particle filter is applied to the problem of markerless human motion capture, and shown to be more effective and efficient than, for example, Condensation (Blake and Isard, 1998), at localising the pose. Section 4 discusses implementation issues and Section 5 shows results.
• We demonstrate how adaptive selection of the variances/covariances which control the diffusion during annealing can lead to what can be thought of as a “soft” hierarchical partitioning of the configuration space, and hence to further gains in efficiency.
• We introduce a crossover operator, analogous to that found in Genetic Algorithms, into the particle filtering framework. We demonstrate that this operator improves the ability of the algorithm to search the configuration spaces of objects whose articulations can be modelled as a kinematic tree. In particular we show results for reliable and efficient tracking for walking, running and jumping with no special training of the dynamics of any activity.

These latter two developments are discussed in Section 6.

We begin with a review of relevant literature in Section 2, including a detailed discussion of the two most closely associated technologies: particle filtering and simulated annealing.

2. Background

2.1. Visual Tracking and Particle Filters

Full-body motion capture is an example of model-based tracking; i.e. the process of sequentially estimating the parameters of a model of a target over time from visual data. Typically a priori knowledge about the target’s observable properties (such as its geometry) is compared with the visual data from an image stream, to estimate a best fit for each frame in the scene. Thus the principal components usually comprise a target model, an image search method, and a dynamical model.

The system of Plänkers and Fua (2003) represents one of the best examples of this paradigm in its simplest form. Their system does not have a (strong) model of the person’s dynamics (in contrast to Sidenbladh et al. (2000), e.g., see below) or have a sophisticated multi-modal search algorithm such as we describe. Rather, the key to the success of their system is in much more careful modelling of the shape and appearance than in most other work, and in the use of binocular disparity maps as well as silhouette data. Unlike most other work (including our own) their system estimates the size of the body as well as the pose parameters.

In considering the role of the other two system components, search and dynamics, it is useful to discuss the influential work of Harris (1992). He showed how rapid motions can be tracked by constraining the search area via a predicted motion of the object. Harris used rigid polyhedral models (and simple surfaces of revolution) and sought the 6 DOF pose of the object. Given a predicted location, the system searches from predicted edge locations along 1D profiles to find “actual” edges. These 1D measurements are then combined to obtain a pose update.

Drummond and Cipolla (2001) showed how many of the ideas in Harris’ system can be extended to articulated objects by effectively tracking body parts using Harris’ method but enforcing global consistency via kinematic constraints.

A second and arguably more important innovation in Harris (1992) was to place the tracking system within the framework of a Kalman Filter, a provably optimal recursive estimator for linear systems which can be thought of as an algorithm for sequential propagation of Gaussian probability densities.

A natural step would be to consider the use of a Kalman Filter (or its extension to non-linear measurements and/or dynamics, the Extended Kalman Filter or EKF) for articulated body tracking. Wachter and Nagel (1999) demonstrated this for single view tracking using image motion and edges (though the results show only motion parallel to the image plane). More recently Mikic et al. (2001) demonstrated the extraction and filtering of pose parameters from a volumetric model obtained by “carving” space using silhouettes from multiple cameras.
While Gaussian uncertainty is sufficient for modelling many motion and measurement noise sources, the Kalman Filter has been shown to fail catastrophically in cases where the true probability function has a very different shape. In particular, attempts to track objects moving against a very cluttered background, where measurement densities include the chance of detecting erroneous image features and are therefore multi-modal, lead to tracking failure for Harris’ algorithm and many of its ilk.

An alternative, more general approach is particle filtering, in which arbitrary densities are approximated. This was introduced in the context of visual tracking in the form of the Condensation algorithm (Isard and Blake, 1996). A posterior density $p(X \mid Z_t)$ representing current knowledge about the model state $X$ after incorporation of all measurements up to the current time-step $t$, $Z_t$, is represented by a finite set of normalised weighted particles, or samples,

$$\left(s_t^{(0)}, \pi_t^{(0)}\right) \cdots \left(s_t^{(N)}, \pi_t^{(N)}\right).$$

An estimate of the state $X_t$ at each time-step $t$ can easily be obtained from the sample mean of the posterior density, $p(X \mid Z_t)$,

$$X_t = E[X] = \sum_{n=1}^{N} \pi_t^{(n)} s_t^{(n)} \tag{1}$$

or the mode

$$X_t = M[X] = s_t^{(j)}, \qquad \pi_t^{(j)} = \max\left(\pi_t^{(n)}\right). \tag{2}$$

Essentially, a smooth probability density function is approximated by a finite collection of weighted sample points, and it can be shown that as the number of points tends to infinity the behaviour of the particle set is indistinguishable from that of the smooth function.

Tracking with a particle filter works by:

1. Resampling, in which a weighted particle set is transformed into a set of evenly weighted particles distributed with concentration dependent on probability density;
2. Stochastic movement and dispersion of the particle set in accordance with a motion model to represent the growth of uncertainty during movement of the tracked object;
3. Measurement, in which the likelihood function is evaluated at each particle site, producing a new weight for each particle proportional to how well it fits image data. The weighted particle set produced represents the new (posterior) probability density after movement and measurement.

Particle filtering works well for tracking in clutter because it can represent arbitrary functional shapes and propagate multiple hypotheses. Less likely model configurations will not be thrown away immediately but given a chance to prove themselves later on, resulting in more robust tracking. In its original implementation, Condensation demonstrated robust tracking in low-dimensional configuration spaces (up to about 10 DOF) in the presence of significant clutter. Even in the absence of a cluttered background, the complicated nature of the observation process during human motion capture causes the posterior density to be non-Gaussian and multi-modal, as shown experimentally by Deutscher et al. (1999). Condensation has indeed been implemented successfully for short human motion capture sequences (see Deutscher et al. (2000) and Sidenbladh et al. (2000)). However, in the high-dimensional configuration spaces occurring in human motion capture and other domains, there are serious problems with Condensation arising from the inability of a manageable size particle set (of, say, a few thousand particles) adequately to populate the space and represent an arbitrary density. In fact it has been shown by MacCormick and Isard (2000) that $N \geq D_{\min} / \alpha^{d}$, where $N$ is the number of particles required and $d$ is the number of dimensions. The survival diagnostic $D_{\min}$ and the particle survival rate $\alpha$ are both constants with $\alpha \ll 1$. Clearly when $d$ is large normal particle filtering becomes infeasible.

Cham and Rehg (1999) proposed the use of a multiple hypothesis tracker which represented the posterior distribution as a piecewise Gaussian. As only local modes are propagated between frames, the solution is computationally much cheaper than Condensation, yet it avoids the pitfalls of a single hypothesis tracker. Unlike our work, in which we explicitly model the joint angles and overall pose degrees of freedom, they use a so-called scaled prismatic model which explicitly models 2D in-plane translation and rotation, but models out of plane rotation via a per-link independent scaling.

Partitioned sampling was developed by MacCormick and Blake (1999) as a variation on Condensation to reduce the number of particles needed to track more than one object, and applied by MacCormick and Isard (2000) to the problem
188 Deutscher and Reid
of tracking articulated objects. Using partitioned sampling reduces the number of particles required to $N \geq D_{\min} / \alpha$, making the problem tractable. However, this assumes that the configuration space can be sliced so that one can construct an observation density $p(Z_t \mid x_i)$ for each dimension $x_i$ of the model configuration vector $X = \{x_0 \ldots x_d\}$. This assumption, that it is possible independently to localise separate parts of an articulated model, is similar to that made by Gavrila and Davis (1996) to enable a hierarchical search.

Another variation on the standard particle filter used to reduce the number of particles needed effectively to represent a posterior density has been developed by Sullivan et al. (1999). Called layered sampling, it is centred around the concept of importance resampling. The technique we present in this paper bears some similarity to layered sampling, but experimental evidence suggests our technique is more effective at reducing the number of particles required when the dimensionality of the search space approaches 30.

Two successful recent approaches which use particle filtering are due to Sminchisescu and Triggs (2001) and Sidenbladh et al. (2000). Both are concerned with monocular tracking (in some important ways more difficult than the multi-camera case) but in other respects the problem is essentially the same: how can a high dimensional space be adequately populated with a particle set of manageable size? Their approaches to this problem are quite different and in some ways complementary.

The former introduces the idea of covariance sampling, spreading particles in areas where there is least confidence in the localisation. This idea is very closely related to our soft partitioning approach developed in Section 6.1. More recently they have extended this work explicitly to take into account the particular ambiguities that arise from human kinematics, “scattering” particles into areas of potential ambiguity and therefore making better use of the particle set (Sminchisescu and Triggs, 2003).

The latter work (Sidenbladh et al., 2000, 2002), on the other hand, takes the approach that dynamical modelling can be used to obtain strong, predictive priors, reducing the search space to manageable proportions. Indeed in Sidenbladh et al. (2000) tracking was restricted, via the learnt dynamics, to the case of walking people. More recently, however, Sidenbladh et al. (2002) showed how a database of motions could be constructed and efficiently indexed in order to obtain predictions over a wide class of motions.

In addition to the problems of representing PDFs via particle sets in high dimensional spaces, a second difficulty is associated with constructing a valid observation model $p(Z_t \mid X)$ as a normalised probability density distribution. Even if such a likelihood model can be constructed the cost of evaluating it can be prohibitive.¹ Often an intuitive weighting function $w(Z_t, X)$ can be constructed that approximates the probabilistic likelihood $p(Z_t \mid X)$ but which requires much less computational effort to evaluate.

In this paper we reduce the problem from propagating the conditional density $p(X \mid Z_t)$ using $p(Z \mid X)$, simply to finding the configuration $X_t$ which returns the maximum value from a simple and efficient weighting function $w(Z_t, X)$ at each time $t$, given $X_{t-1}$. By doing this gains are made on two fronts: (i) it is possible to make do with fewer likelihood (or weighting function) evaluations because the function $p(X \mid Z_t)$ no longer has to be fully represented; and (ii) an evaluation of a simple weighting function $w(Z_t, X)$ requires less computational effort when compared to an evaluation of the observation model $p(Z_t \mid X)$. The main disadvantage is that we no longer work within a truly Bayesian framework.

We retain the use of a particle based stochastic framework because of its ability to handle multi-modal likelihoods, or in the case of a weighting function, one with many local maxima. In order most effectively to optimise the non-convex weighting function we use an approach similar to that of simulated annealing.

2.2. Simulated Annealing

The Markov chain based method of simulated annealing was proposed by Kirkpatrick et al. (1983) as a means to optimise a multi-modal objective function $U(x)$. It proceeds by defining a distribution over the function values

$$P(x) = \text{const}\; e^{-\lambda U(x)}$$

The aim is then to generate samples $x_i$ from this distribution, in the knowledge that as $\lambda \to \infty$, the probability mass concentrates on the minimum of $U$, and hence the samples $x_i$ will cluster around the minimum value state.

Samples from the distribution are generated in a straightforward fashion using the Metropolis-Hastings algorithm (Metropolis et al., 1953) which generates a Markov sequence of points whose distribution will
A series of weighting functions $w_0(Z, X)$ to $w_M(Z, X)$ is employed in which each $w_m$ differs only slightly from $w_{m-1}$ (see Fig. 2, where $M = 3$). The function

Each particle in the set $S^{\pi}_{t,m}$ is considered as an $(s^{(i)}_{t,m}, \pi^{(i)}_{t,m})$ pair in which $s^{(i)}_{t,m}$ is an instance of the multi-variate model configuration $X$, and $\pi^{(i)}_{t,m}$ is the
5. The set $S_{t,m-1}$ has now been produced which can be used to initialise layer $m-1$. The process is repeated until we arrive at the set $S^{\pi}_{t,0}$.
6. $S^{\pi}_{t,0}$ is used to estimate the optimal model configuration $X_t$ using

$$X_t = \sum_{i=1}^{N} s^{(i)}_{t,0}\, \pi^{(i)}_{t,0}. \tag{8}$$

7. The set $S_{t+1,M}$ is then produced from $S^{\pi}_{t,0}$ using

$$s^{(n)}_{t+1,M} = s^{(n)}_{t,0} + B_0. \tag{9}$$

This set is then used to initialise layer $M$ of the next annealing run at $t_{t+1}$.

Note that Step 7, where the particle set for the next time-step is generated, incorporates no dynamic model. There is nothing in the algorithm that precludes the use of dynamics: simply replace Eq. (9) with the more general

$$s^{(n)}_{t+1,M} = f\left(s^{(n)}_{t,0}\right) + B_0 \tag{10}$$

where the function $f$ represents the dynamical model. We have not done so since our focus is on tracking previously unseen/unmodelled agile motions. While a dynamical model is certainly beneficial during “steady state” tracking, it can be a hindrance if the model is poor (as it often is for agile motions). The price we pay for this is a less economical use of particles than would be ideal, and the potential for jittery trajectories. The latter could be addressed by smoothing the recovered pose/joint trajectories.

3.1. Setting the Tracking Parameters

It remains to consider how best to set the free parameters of the algorithm, and in particular, to consider how to influence the annealing schedule. In Eq. (3) it is the value of $\beta_m^k$ that determines the rate of annealing at each layer.

To see how and why this is so, first note that the equivalent of temperature in our particle-based framework is the particle survival rate: the ratio of effective particles to total number of particles. If the probability mass is all concentrated in a few particles then the number of effective particles is low, and conversely, an even distribution of probability mass amongst particles signals a large number of effective particles. A sensible measure of the effective number of particles that will be chosen for propagation to the next layer is the survival diagnostic $D$ (taken from MacCormick (2000)) where

$$D = \left(\sum_{n=1}^{N} \left(\pi^{(n)}\right)^{2}\right)^{-1} \tag{11}$$

and from this the particle survival rate $\alpha$ can be estimated (MacCormick, 2000)

$$\alpha = \frac{D}{N}. \tag{12}$$

In the case of traditional annealing, the temperature acts like a barrier, restricting the movement of samples: the cooler the temperature, the fewer the number of samples with a low function value $U(x)$ (energy) that will be generated. In the context of a particle set, a high survival rate corresponds to an evenly spread probability mass, while a low one suggests the mass is concentrated in a few particles. Hence decreasing the survival rate has the same effect as cooling the temperature in traditional annealing.

Now $D$ is clearly a monotonically decreasing function of $\beta_m^k$. At a given layer, we therefore adjust the value of $\beta_m^k$ to change the value of $D(\beta_m^k)$ so that $\alpha = D/N$ approaches a desired value. This is trivially done by searching over $\beta_m^k$ (using the value from the previous time step $\beta_m^{k-1}$ as the starting point) to find the value that solves the equation

$$\alpha_{\text{desired}} = \frac{D\left(\beta_m^k\right)}{N}$$

i.e. produces the desired rate of annealing.

Note that this does not mean the weights have to be completely re-evaluated each time $\beta_m^k$ is adjusted during the search. Since $w_m(Z, X) = w(Z, X)^{\beta_m}$ the values $w(Z, X = s^{(i)}_{t,m})$, $i = 1 \ldots N$ can be stored for each set $S_{k,m}$ and $\beta_m^k$ applied to each individual weight as appropriate to produce $S^{\pi}_{t,m}$.

How then are the appropriate values for $\alpha_0 \ldots \alpha_M$ determined? There are also a number of other tracking parameters that need to be set before tracking can begin, including the number of particles $N$, the number of annealing layers $M$ and the diffusion covariance matrices $P_M \ldots P_0$. A tentative framework has been developed to allocate values to these parameters although it is acknowledged that more work needs to be done in this area.
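The search for the annealing exponent described in Section 3.1 can be sketched in a few lines of Python. The sketch stores the raw weights once, applies a trial β to them, evaluates the survival diagnostic of Eq. (11) and the survival rate of Eq. (12), and adjusts β until the desired rate is reached. The function names `survival_rate` and `find_beta`, the use of bisection as the search method, and the tolerance are assumptions of this sketch; the text only states that a search over β is performed. Raw weights are assumed to be strictly positive.

```python
def survival_rate(raw_weights, beta):
    """Return alpha = D / N for the weights w**beta, as in Eqs. (11) and (12)."""
    peak = max(raw_weights)
    # Dividing by the largest weight guards against underflow for large beta;
    # the factor cancels when the weights are normalised.
    powered = [(w / peak) ** beta for w in raw_weights]
    total = sum(powered)
    normalised = [p / total for p in powered]
    d = 1.0 / sum(p * p for p in normalised)   # survival diagnostic D
    return d / len(raw_weights)

def find_beta(raw_weights, alpha_desired, beta_start=1.0, tol=1e-6):
    """Bisection search for beta such that survival_rate is close to alpha_desired."""
    lo, hi = 0.0, beta_start
    # Widen the bracket until the survival rate at hi drops to the target.
    while survival_rate(raw_weights, hi) > alpha_desired and hi < 1e6:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival_rate(raw_weights, mid) > alpha_desired:
            lo = mid    # still too many effective particles: sharpen more
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    weights = [0.9, 0.5, 0.2, 0.1, 0.05]
    beta = find_beta(weights, alpha_desired=0.5)
    print(beta, survival_rate(weights, beta))
```

Passing the value found at the previous time step as `beta_start` mirrors the starting point suggested in the text. Because only the stored raw weights are re-exponentiated, no image measurements are repeated during the search.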
$$P_m = \alpha_M \times \cdots \times \alpha_{m-1} \times P_0 \tag{13}$$

and produced decent results, but a better, adaptive method for setting the $P$ is described in Section 6.1.
3. The appropriate rates of annealing $\alpha_M \ldots \alpha_1$ are influenced by the number of annealing layers used. With a higher number of annealing layers a lower rate of annealing can be used to obtain the desired resolution. It was found that, when using 10 annealing layers, setting $\alpha_M = \ldots = \alpha_1 = 0.5$ provided sufficient resolution of $X_t$.

4. Implementation

4.1. The Model

The articulated model of the human body used in this paper is built around the framework of a kinematic tree, as seen in Fig. 4. Each limb is fleshed out using cones with elliptical cross-sections. Such a model has a number of advantages including computational simplicity, high-level interpretation of output and compact representation.

4.2. The Weighting Function

The basic annealed particle filter is a general optimisation tool and can be used for a variety of purposes (for another application see Deutscher et al. (2002)) with different weighting functions. In the present work we have constructed the weighting function on the basis of two image features, edges and foreground silhouette, chosen for their joint virtues of simplicity (i.e. easy and efficient to extract), and a degree of invariance to imaging conditions. While these features are not fully general (in particular the silhouette relies on a knowledge of the background which may not be available in more general environments) they suffice for our purposes. Even without a large degree of background clutter distracting edge measurements, there remains a challenging, multi-modal search problem because of self occlusions and foreground clutter (i.e. unmodelled markings on the target). Other features such as optic flow could equally be used.

4.2.1. Edges. The strongest continuous edges produced by a human subject in an image usually provide a good outline of visible arms and legs and are mostly invariant to colour, clothing texture, lighting and pose. In severely cluttered environments or when the subject is wearing very baggy clothes edges may lose some of their usefulness; however, in most situations they provide a good basis for a weighting function. A gradient based edge detection mask is used to detect edges. The result is thresholded to eliminate spurious edges, smoothed with a Gaussian mask and remapped between 0 and 1. This produces a pixel map (Fig. 5(b)) in which each pixel is assigned a value related to its proximity to an edge.
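The four processing stages named in Section 4.2.1 (gradient, threshold, Gaussian smoothing, remapping to the range 0 to 1) can be illustrated with a small pure-Python sketch operating on a greyscale image stored as a list of rows. The function name `edge_pixel_map`, the central-difference gradient, the 3×3 binomial kernel standing in for the Gaussian mask, and the clamped border handling are assumptions of this sketch; the paper does not specify the masks it uses.

```python
def edge_pixel_map(image, threshold):
    """Return a map in [0, 1] whose values fall off with distance from an edge."""
    h, w = len(image), len(image[0])

    def px(grid, y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return grid[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # 1. Gradient magnitude from central differences.
    grad = [[((px(image, y, x + 1) - px(image, y, x - 1)) ** 2 +
              (px(image, y + 1, x) - px(image, y - 1, x)) ** 2) ** 0.5
             for x in range(w)] for y in range(h)]

    # 2. Threshold to suppress weak (spurious) responses.
    edges = [[g if g >= threshold else 0.0 for g in row] for row in grad]

    # 3. Smooth with a 3x3 binomial kernel, a coarse stand-in for a Gaussian mask.
    kernel = [1, 2, 1]
    smooth = [[sum(kernel[dy + 1] * kernel[dx + 1] * px(edges, y + dy, x + dx)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 16.0
               for x in range(w)] for y in range(h)]

    # 4. Remap so that the strongest response equals 1.
    peak = max(max(row) for row in smooth)
    if peak == 0.0:
        return smooth
    return [[v / peak for v in row] for row in smooth]

if __name__ == "__main__":
    # A vertical step edge between columns 2 and 3.
    step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    for row in edge_pixel_map(step, threshold=0.5):
        print([round(v, 2) for v in row])
```

Smoothing after thresholding is what gives pixels near, but not on, an edge a non-zero value, so a model silhouette that is slightly misplaced still receives partial credit.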
where $X$ is the model’s configuration vector and $Z$ is the image from which the pixel map is derived. $p_i(X, Z)$ are the values of the edge pixel map at the $N$ sampling points taken along the model’s silhouette as seen in Fig. 6(a).

An equal weighting to each component was determined empirically.

When there is more than one camera the measurements are combined in a similar way, giving

$$w(X, Z) = \exp\left(-\sum_{i=1}^{C}\left(\Sigma^{e}_{i}(X, Z) + \Sigma^{r}_{i}(X, Z)\right)\right) \tag{17}$$

where $C$ is the number of cameras and $\Sigma^{*}_{i}(X)$ is from camera $i$. An example of the output of this weighting function can be seen in Fig. 7.
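The multi-camera combination of Eq. (17) can be sketched as follows. Each camera contributes an edge term and a region term, the terms are summed with equal weighting, and the weight is the exponential of the negated sum. The per-camera term used here, a mean squared difference between the sampled pixel-map values and the ideal value 1, is an assumption of this sketch, since the defining equations for the individual terms do not appear in this part of the text. The function names `ssd_term` and `multi_camera_weight` are likewise hypothetical.

```python
import math

def ssd_term(samples):
    """Mean squared difference between sampled pixel-map values and the ideal value 1.

    Assumed form of a per-camera term; samples are pixel-map values in [0, 1]
    read at the model's sampling points.
    """
    return sum((1.0 - p) ** 2 for p in samples) / len(samples)

def multi_camera_weight(edge_samples, region_samples):
    """Combine per-camera edge and region terms as in Eq. (17).

    edge_samples and region_samples hold one list of sampled values per camera.
    """
    total = 0.0
    for edge, region in zip(edge_samples, region_samples):
        total += ssd_term(edge) + ssd_term(region)   # equal weighting of both terms
    return math.exp(-total)

if __name__ == "__main__":
    good_fit = multi_camera_weight([[0.9, 1.0], [0.8, 0.9]], [[1.0, 1.0], [0.9, 1.0]])
    poor_fit = multi_camera_weight([[0.1, 0.2], [0.0, 0.3]], [[0.2, 0.1], [0.4, 0.0]])
    print(good_fit, poor_fit)
```

Because the camera terms are summed inside the exponential, the combined weight equals the product of the single-camera weights, so a poor fit in any one view pulls the overall weight down.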
5. Results
is uncertain. The effect of this can be interpreted as a hierarchical search strategy which automatically partitions the search space in a soft way, without any explicit representation of the partitions (Section 6.1). Second, we introduce a crossover operator (similar to that found in Genetic Algorithms) which improves the ability of the tracker to search different partitions in parallel (Section 6.2).

We present results for simple examples to demonstrate the new algorithm’s implementation and effectiveness, and show that these measures together increase the tracker efficiency by a factor of 4 and increase agility of the motion that can be tracked.

We apply the tracker to the complex problem of Human Motion Capture with 34 degrees of freedom. Extra degrees of freedom have been added to the model in Fig. 4 in the back (2) to allow arching that would not normally be encountered in everyday walking (and was not necessary in our earlier experiments), in the neck (1) to account for head nodding, and the clavicles are given independent motion (2 each).

Figure 11. A planar articulated arm with 4 DOF is shown (a). It consists of four links connected by swivelling joints and rooted at O. The configuration of the arm is described by $x = (x_1, x_2, x_3, x_4)$ as seen in (b).

on the spot. A configuration of the arm is described by an instance of the state variable $x = (x_1, x_2, x_3, x_4)$. The weighting function $w(z, x)$ required for the APF is computed by a Sum of Squared Differences (SSD) measure between a model template and a silhouette image (the regional correlation portion of the observation model in Eq. (15)).

The set $S_{t,m}$ is initialised with particles uniformly distributed over a range of $x$ that we know to contain the actual position of the arm. This results in a large and similar variance for each parameter of $x$ over all the particles in $S_{t,m}$ as can be seen in Fig. 12(a).

Figure 12. Parameter variance over one annealing layer: new APF vs. old APF. On the left, graphs a, b and c plot the variance of each parameter of $x = (x_1, x_2, x_3, x_4)$ through the first annealing run of the APF when tracking the articulated arm seen in Fig. 11. Graphs d, e and f show the same information for the improved APF as described in Section 6.1. Graphs a and d show the variances of the initial set $S_{t,m}$, displaying equal variances for each parameter. Graphs b and e show the variances of the set $S_{t,m-1}$ before the addition of diffusion noise. Note that in both b and e, $x_1$ has a very small variance indicating advanced localisation; however, the variance of $x_2$, $x_3$ and $x_4$ has been reduced only a little. Up to this point the algorithms are the same and any differences between b and e are random. After the addition of noise in the original APF the localisation of $x_1$ has been greatly degraded as seen in graph c; however, when noise is added in proportion to each parameter’s variance the localisation of $x_1$ is preserved as seen in graph f.
After calculating a weight $\pi^{(i)}_{t,m}$ for each particle using $w_m(z_t, s^{(i)}_{t,m})$ we then proceed to Step 4 of the APF and draw $N$ particles from $S^{\pi}_{t,m}$ with replacement and probability proportional to each particle’s weight.

Consider the set $S_{t,m}$ so produced before the addition of any noise. In a typical annealing run the individual parameters of each particle were found to have variance as detailed in Fig. 12(b). Note here that the variance of $x_1$ has been greatly reduced while the other parameters $x_2$, $x_3$ and $x_4$ have been hardly reduced at all. The variance of any parameter can be considered (with a number of acceptable caveats) to be directly related to the degree to which the optimal value for that parameter has been localised. Figure 12(b) shows that $x_1$ has been localised down to a very small area of its range simply because it dominates the topology of the search space whereas each particle’s values for $x_2$, $x_3$ and $x_4$ had very little influence on whether it was selected or not. In effect we see here an automatic partitioning of the state space into soft partitions according to each parameter’s topological dominance.

The weakness of the original APF (indeed any particle filter) arises with the addition of diffusion noise to each particle upon selection. According to Eqs. (7) and (13) an equal amount of noise should be added to each parameter. This results in a parameter variance profile like that seen in Fig. 12(c) with the localisation of $x_1$ seen in Fig. 12(b) all but wiped out by the excessive addition of noise.

If instead the amount of randomness added to the parameters of each selected particle is proportional to the variance of that parameter over the entire set of particles, these gains will be protected from disruption. Instead we will arrive at the situation seen in Fig. 12(f) where enough noise has been added to each parameter to allow the thorough diffusion of the particles into the spaces between repeatedly selected particles, but not enough to increase the variance of any given parameter which would erase any localisation gains made up to that point.

If this new method for determining the elements of $P_i$ (the covariance matrix of $B$ from Eq. (7)) is continued through all the annealing layers we can see that each parameter is localised in turn, with some degree of overlap as seen in Fig. 13. This can be compared to the pattern of variance reduction for the original APF algorithm seen in Fig. 14. This is exactly the kind of hierarchical soft partitioning that was desired and no explicit partition boundaries or functions were required.

Figure 13. Variance reduction with the improved APF. Here we see the orderly reduction of each of the four parameters’ variances from most dominant ($x_1$) to least dominant ($x_4$) over 6 layers of the annealing process while tracking the simple articulated arm. Using the improved APF results in a 2-fold increase in efficiency over the classical APF. Tracker efficiency was measured by the minimum number of particles needed to successfully track the articulated arm over 40 frames.

Sminchisescu and Triggs (2001) independently arrived at a very similar idea, although in that work they were concerned with most effective use of particles between frames in order to recover from “ambiguous” poses.

The changes to the APF are almost trivial, and can be formalised as follows. Step 4 of the APF algorithm described in Section 3 is amended so that at layer $m$, $P_m$ is set to be proportional to the covariance of the particles in $S_{t,m}$ as it exists before the addition of noise, i.e.

$$P_m \propto \frac{1}{N} \sum_{i=1}^{N} \left(s^{(i)}_{t,m} - s^{av}_{t,m}\right) \cdot \left(s^{(i)}_{t,m} - s^{av}_{t,m}\right)^{T} \tag{18}$$

where $s^{av}_{t,m}$ is the sample mean of the particle set. Using this modification enabled successful tracking with the APF with fewer than half the number of particles; i.e. a 2-fold increase in efficiency.
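The variance-proportional diffusion of Section 6.1 can be illustrated with a short Python sketch. It measures the spread of each parameter over the resampled particle set and then perturbs each parameter with zero-mean Gaussian noise whose variance is a fixed multiple of that spread. For brevity the sketch uses only the diagonal of the covariance in Eq. (18), which is a simplification, and the function names `parameter_variances` and `diffuse` and the `scale` argument are assumptions made for illustration.

```python
import random

def parameter_variances(particles):
    """Per-parameter variance over the particle set (diagonal of Eq. (18))."""
    n = len(particles)
    dims = len(particles[0])
    means = [sum(p[d] for p in particles) / n for d in range(dims)]
    return [sum((p[d] - means[d]) ** 2 for p in particles) / n
            for d in range(dims)]

def diffuse(particles, scale=1.0, rng=random):
    """Add noise to every parameter in proportion to that parameter's variance."""
    sigmas = [(scale * v) ** 0.5 for v in parameter_variances(particles)]
    return [[x + rng.gauss(0.0, s) for x, s in zip(p, sigmas)]
            for p in particles]

if __name__ == "__main__":
    rng = random.Random(1)
    # x1 is already well localised; x2 is still spread over its range.
    resampled = [[0.50 + rng.uniform(-0.01, 0.01), rng.uniform(-1.0, 1.0)]
                 for _ in range(500)]
    print("before:", parameter_variances(resampled))
    print("after: ", parameter_variances(diffuse(resampled, rng=rng)))
```

A parameter that has already been localised receives very little noise and so keeps its gain, while a parameter that is still spread out continues to be explored widely.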
Figure 15. A pair of planar articulated arms consisting of 3 segments each and each rooted to point O (as seen in b) are used to demonstrate
the effectiveness of the crossover operator. The configuration of the arms is described by x = (x1 , . . . , x6 ) as seen in (b).
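To make the idea of a crossover operator concrete for the two-arm example of Fig. 15, the sketch below applies a generic single-point crossover, as used in Genetic Algorithms, to two particle configuration vectors. This is not presented as the operator used in the tracker, whose details are given in Section 6.2; the function name `crossover`, the choice of a single crossover point and the example values are assumptions made purely for illustration.

```python
import random

def crossover(parent_a, parent_b, point=None, rng=random):
    """Swap the tails of two configuration vectors at a crossover point.

    If point is None a random interior position is chosen.
    """
    if point is None:
        point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

if __name__ == "__main__":
    # Hypothetical particles for x = (x1, ..., x6): the first fits arm 1
    # (x1..x3) well, the second fits arm 2 (x4..x6) well.
    fits_arm_1 = [0.10, 0.20, 0.30, 9.0, 9.0, 9.0]
    fits_arm_2 = [7.0, 7.0, 7.0, 0.40, 0.50, 0.60]
    print(crossover(fits_arm_1, fits_arm_2, point=3))
```

With the crossover point placed between the two arms, one child inherits the well-fitted parameters of both parents, which is the kind of parallel search over independent branches of a kinematic tree that the figure is meant to illustrate.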
We have also built a parallel implementation, in which particles are farmed out to independent processors which compute the weight/likelihood function. This achieves the sort of speed-ups that are to be expected, with processing time decreasing linearly in the number of processors (with a constant of proportionality around 0.8).
7. Discussion and Conclusion

We have developed a general algorithm for searching large configuration spaces which is more efficient than traditional particle filters, but which retains a number of their significant advantages. The algorithm has been applied to the problem of visually tracking a person in multiple cameras. In this context we have demonstrated
of the IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 455–462.
Neal, R.M. 2001. Annealed importance sampling. Statistics and Computing, 11:125–139.
Plänkers, R. and Fua, P. 2003. Articulated soft objects for multi-view shape and motion capture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1182–1187.
Sidenbladh, H., Black, M.J., and Fleet, D.J. 2000. Stochastic tracking of 3D human figures using 2D image motion. In Proc. 6th European Conf. on Computer Vision, Dublin, vol. 2, pp. 702–718.
Sidenbladh, H., Black, M.J., and Sigal, L. 2002. Implicit probabilistic models of human motion for synthesis and tracking. In Proc. 7th European Conf. on Computer Vision, Copenhagen, vol. 1, pp. 784–800.
Sminchisescu, C. and Triggs, B. 2001. Covariance scaled sampling for monocular 3d body tracking. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 447–454.
Sminchisescu, C. and Triggs, B. 2002. Hyperdynamics importance sampling. In Proc. 7th European Conf. on Computer Vision, Copenhagen, vol. 1, pp. 769–783.
Sminchisescu, C. and Triggs, B. 2003. Kinematic jump processes for monocular 3d human tracking. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 69–76.
Sullivan, J., Blake, A., Isard, M., and MacCormick, J. 1999. Object localization by bayesian correlation. In Proc. 7th Int. Conf. on Computer Vision, vol. 2, pp. 1068–1075.
Wachter, S. and Nagel, H. 1999. Tracking persons in monocular image sequences. Computer Vision and Image Understanding, 74(3):174–192.