Research Article
Vehicle Trajectory Estimation Using Spatio-Temporal MCMC
Copyright © 2010 Yann Goyat et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper presents an algorithm for modeling and tracking vehicles in video sequences within one integrated framework. Most existing solutions are based on sequential methods that make inference from the current information only. In contrast, we propose a deferred logical inference method that makes a decision according to a whole sequence of observations, thus performing a spatio-temporal search over the entire trajectory. One of the drawbacks of deferred logical inference methods is that the solution space of hypotheses grows exponentially with the depth of observation. Our approach takes into account both the kinematic model of the vehicle and a driver behavior model in order to reduce the space of solutions. The resulting state model explains the trajectory with only 11 parameters. The solution space is then sampled with a Markov Chain Monte Carlo (MCMC) method that uses a model-driven proposal distribution in order to control the random walk behavior. We demonstrate our method on real video sequences for which ground truth is provided by an RTK GPS (Real-Time Kinematic GPS). Experimental results show that the proposed algorithm outperforms a sequential inference solution (particle filter).
MCMC methods have already been used in visual tracking. In [9, 10], an MCMC-based particle filter is presented for multiobject tracking, and an extension is proposed to handle a varying number of objects (Reversible Jump Markov Chain Monte Carlo, RJMCMC). In [8], the RJMCMC algorithm is used in a deferred logical inference framework to track several vehicles offline from a video sequence.
In MCMC methods, the random walk behavior is driven by proposal distributions. We use priors on driver behavior and road geometry to define the proposal efficiently. Exploration is achieved with the Metropolis-Hastings rule according to a global likelihood function.
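To make the sampling step concrete, the following minimal sketch (Python/NumPy) shows a Metropolis-Hastings loop over a trajectory state vector. The names log_likelihood, log_prior, and proposal_cov are placeholders standing in for the quantities defined in Sections 2 and 3; this is an assumed illustration of the acceptance rule, not the authors' implementation, and in the paper the proposal is shaped by the driver-behavior and road-geometry priors rather than by a fixed covariance.

import numpy as np

def metropolis_hastings(x0, log_likelihood, log_prior, proposal_cov, n_samples, rng=None):
    """Sample the posterior p(X | Z) ~ p(Z | X) p(X) with a random-walk proposal."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)              # current trajectory state (e.g., 11 parameters)
    log_p = log_likelihood(x) + log_prior(x)     # unnormalized log-posterior at the current state
    samples = []
    for _ in range(n_samples):
        # Proposal step: a perturbation whose scale encodes prior knowledge of the state.
        x_star = rng.multivariate_normal(x, proposal_cov)
        log_p_star = log_likelihood(x_star) + log_prior(x_star)
        # Metropolis-Hastings acceptance rule for a symmetric proposal.
        if np.log(rng.uniform()) < log_p_star - log_p:
            x, log_p = x_star, log_p_star
        samples.append(x.copy())
    return np.array(samples)                     # empirical approximation of p(X | Z)

The returned samples approximate the posterior illustrated in Figure 1.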
We use a likelihood function based on a background subtraction algorithm. A discrete set of positions of the vehicle in the video sequence is generated from the trajectory state. A generic 3D model of a vehicle is then projected into each image and compared to a background/foreground map of the video sequence. We propose an efficient implementation of the likelihood function, using a line integral image to decrease the computation time.
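As an aside, a line (row-wise) integral image makes such image comparisons cheap: after one cumulative sum per row, the sum of the binary foreground/background labels over any horizontal segment covered by the projected model is obtained in constant time. The sketch below (Python/NumPy) is a generic illustration of this idea, not the authors' code.

import numpy as np

def row_integral_image(binary_map):
    """binary_map: H x W array of labels, +1 for foreground and -1 for background."""
    return np.cumsum(binary_map, axis=1)          # one cumulative sum per image row

def row_segment_sum(row_cumsum, y, x0, x1):
    """Sum of the labels on row y between columns x0 and x1 (inclusive), in O(1)."""
    left = row_cumsum[y, x0 - 1] if x0 > 0 else 0
    return row_cumsum[y, x1] - left

Summing such segments over all rows covered by the projected vehicle silhouette gives the match score for one image; repeating this over the whole sequence yields the global likelihood.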
Experiments have been done to compare, on real video sequences, the deferred logical inference approach with a classic sequential particle filter.

Figure 1: Given a video sequence and an initial state, the method samples the posterior distribution of the trajectory using a random walk method (MCMC).

The remainder of the paper is organized as follows. Section 2 presents the probabilistic framework proposed to solve the tracking problem. Section 3 provides a detailed description of the vision likelihood function. A set of experimental results along with both qualitative and quantitative analysis is presented in Section 4, before we conclude in Section 5.

The trajectory is then described in an 11-dimensional state space by driver temporal command parameters. This method drastically reduces the dimension of the state space, thus improving computational efficiency.
Figure 4: The bicycle model synthesizes the displacement of a four-wheel vehicle through the displacement of two wheels whose centers are connected by a rigid axis of length L. Ackermann's theory serves to estimate the steering angle of the front axle of a vehicle traveling at low speed.
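For reference only, the low-speed Ackermann/bicycle relation mentioned in the caption links the equivalent steering angle to the wheelbase L and the turn radius; the helper below is a generic textbook illustration, with an assumed 2.7 m wheelbase used purely as an example.

import math

def ackermann_steering_angle(wheelbase_m, turn_radius_m):
    """Equivalent front-wheel steering angle (rad) of a bicycle model at low speed."""
    return math.atan(wheelbase_m / turn_radius_m)

# Example: a 2.7 m wheelbase on a 130 m radius curve (the minimum radius of the test
# section) corresponds to a steering angle of about 1.2 degrees:
# math.degrees(ackermann_steering_angle(2.7, 130.0)) -> approximately 1.19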
Figure 3: Illustration of the likelihood function. A discrete set of positions of the vehicle into the video sequence is generated from the trajectory sample. A generic 3D model of a vehicle is then projected into each image and compared to a background/foreground map of the video sequence.

Figure 5: Example of a simple three-dimensional geometric model used for a vehicle. It is composed of two cubes. The coordinate system associated with the cube and the other system associated with the scene are related according to pure translation. The plane (Oxy) of the world coordinate system and component axes are merged with the GPS coordinate system.
3.1. Building a Discrete Set of Vehicle Positions. Let X define a discrete set of temporal positions and orientations of the vehicle, associated to a sample X of the posterior distribution:

X = {x_k}_{k=1}^{K},    (7)

where x_k = (x_k, y_k, α_k)^T is a vector which gives the position and orientation of the vehicle at time k in a world reference frame R_w associated to a planar ground. x_k can be computed in a recursive way using a simple kinematic model of the vehicle. Here, we used a bicycle model (cf. Figure 4):

x_k = x_{k-1} + T · v_{k-1} · cos(α_{k-1}),
y_k = y_{k-1} + T · v_{k-1} · sin(α_{k-1}),    (8)
α_k = α_{k-1} + T · (v_{k-1} / L) · tan(δ_{k-1}),

where T is the sample time used for the video acquisition and L denotes the wheelbase (distance between the front and rear wheels). δ_k and v_k are given by the steering angle and velocity parametric functions presented in Section 2.2.

The likelihood function p(Z | X) can be written, assuming independence of the random variables, as

p(Z | X) = ∏_{k=1}^{K} p(z_k | x_k).    (10)

3.2. Computing p(z_k | x_k). Since the video sequence comes from a static camera, vehicle extraction is achieved using a background/foreground extraction approach. We use a nonparametric method [11], based on a discrete modelization of the background probability density of the pixel color (RGB). The algorithm provides a set of binary images I = {I_k}_{k=1}^{K}, where I_k(u) = 1 if the pixel u = (u_x, u_y)^T is associated to foreground and I_k(u) = -1 if the pixel is associated to background.

A simplified three-dimensional geometric model of the vehicle is used, as depicted in Figure 5. This model is composed of two nested parallelepipeds. In the general case, the model may be more complex and contain P_M parallelepipeds. Let M^(R_0) = {M_i^(R_0)}_{i=1,...,N_M} represent the model's set of cube vertices (N_M = 8 × P_M), expressed within a coordinate system R_0 associated with the model. This coordinate system is selected such that its three axes all lie in the same direction as those of the world coordinate system R_w.

Each point of the vehicle model is projected onto the image via

m̃_{i,k} = C_c · ^(R_w)T_(R_0)(x_k) · M̃_i^(R_0),

with M̃ the homogeneous coordinates associated with point M, C_c the camera projection matrix, and ^(R_w)T_(R_0)(x_k) the homogeneous transformation matrix between the world coordinate system and the system associated with the 3D model (cf. Figure 5).
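To make the pipeline of Sections 3.1 and 3.2 concrete, the sketch below (Python/NumPy) rolls out the bicycle model of (8), projects the box-model vertices with a 3 x 4 camera matrix C_c, and scores the projection against the binary foreground maps. The interfaces (steering and velocity profiles given as arrays, a yaw rotation for the model pose, a bounding-box score instead of the exact silhouette) are simplifying assumptions for illustration, not the authors' implementation.

import numpy as np

def rollout_bicycle(x0, y0, alpha0, v, delta, T, L):
    """Integrate equation (8); returns arrays x, y, alpha of length K = len(v) + 1."""
    K = len(v) + 1
    x, y, a = np.empty(K), np.empty(K), np.empty(K)
    x[0], y[0], a[0] = x0, y0, alpha0
    for k in range(1, K):
        x[k] = x[k - 1] + T * v[k - 1] * np.cos(a[k - 1])
        y[k] = y[k - 1] + T * v[k - 1] * np.sin(a[k - 1])
        a[k] = a[k - 1] + T * v[k - 1] / L * np.tan(delta[k - 1])
    return x, y, a

def project_model(vertices_R0, pose, Cc):
    """Project the 3D model vertices (N x 3, in R0) into the image for one pose (x, y, alpha)."""
    xk, yk, alpha = pose
    c, s = np.cos(alpha), np.sin(alpha)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # yaw rotation (assumption)
    pts_w = vertices_R0 @ R.T + np.array([xk, yk, 0.0])          # model points in the world frame
    pts_h = np.hstack([pts_w, np.ones((len(pts_w), 1))])         # homogeneous coordinates
    uvw = pts_h @ Cc.T                                           # Cc: 3 x 4 projection matrix
    return uvw[:, :2] / uvw[:, 2:3]                              # pixel coordinates

def log_likelihood(poses, vertices_R0, Cc, binary_maps):
    """Score each projected bounding box against the +1/-1 foreground maps and sum over k."""
    score = 0.0
    for pose, I in zip(poses, binary_maps):
        uv = project_model(vertices_R0, pose, Cc)
        u0, v0 = np.floor(uv.min(axis=0)).astype(int)
        u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
        h, w = I.shape
        u0, v0 = max(u0, 0), max(v0, 0)
        u1, v1 = min(u1, w - 1), min(v1, h - 1)
        if u1 >= u0 and v1 >= v0:
            score += I[v0:v1 + 1, u0:u1 + 1].sum()
    return score

With poses = list(zip(*rollout_bicycle(...))), this gives one global score per candidate trajectory; the bounding-box sum is where the line integral image of Section 3 would replace the inner summation with O(1) per-row lookups.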
Table 1: Position error (the true position is given by an RTK GPS) for the proposed deferred logical inference method and a sequential particle filter.
Method Position error (m) Position std. (m) Orientation error (degrees) Orientation std. (degrees)
Sequential filter 0.27 0.26 3.67 3.36
Deferred logical inference 0.20 0.22 1.12 0.97
Figure 7: Percentage of correct position/orientation (accuracy, %) related to the tolerance, for the deferred and the sequential methods. (a) Position absolute tolerance (cm). (b) Orientation (yaw angle) absolute tolerance (degrees/10).
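Curves of this kind can be recomputed from per-frame errors by counting the fraction of frames whose absolute error stays under each tolerance value. The helper below (Python/NumPy) is a generic sketch of that computation; the error arrays are assumed inputs and do not come from the paper.

import numpy as np

def accuracy_vs_tolerance(errors, tolerances):
    """Percentage of frames whose absolute error is within each tolerance value."""
    errors = np.abs(np.asarray(errors, dtype=float))
    return np.array([100.0 * np.mean(errors <= t) for t in tolerances])

# Example: position errors in cm, tolerances from 0 to 50 cm as in Figure 7(a).
# accuracy = accuracy_vs_tolerance(position_errors_cm, np.linspace(0.0, 50.0, 26))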
not presented in this paper.) This vehicle traveled through the test section 20 times along various trajectories, at speeds ranging from 40 to 80 km/h. The error was quantified as the average distance between each estimated vehicle position and the straight line passing through the two closest GPS points. For each test, at least five vehicle runs were carried out, which enabled deriving a rough statistic on the recorded measurements. For the tests actually conducted, the vehicle has been tracked in a curve over a distance of approximately 100 m (minimum radius = 130 m).

All the experiments presented here have been done using 200 particles for the two methods.

Table 1 presents the average errors and the related standard deviations for the two tested methods. The proposed deferred logical inference provides a lower global error than the sequential particle filter.
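The error metric described above (distance from an estimated position to the straight line through the two closest RTK GPS points) can be written compactly. The sketch below (Python/NumPy) is a plain geometric illustration of that definition; the array layouts are assumptions.

import numpy as np

def position_error(estimate, gps_points):
    """Distance from one estimated 2D position to the line through its two closest GPS points."""
    gps = np.asarray(gps_points, dtype=float)     # N x 2 array of RTK GPS positions
    p = np.asarray(estimate, dtype=float)         # estimated (x, y) position
    a, b = gps[np.argsort(np.linalg.norm(gps - p, axis=1))[:2]]
    d = b - a
    # Perpendicular distance from p to the line passing through a and b.
    return abs(d[0] * (p[1] - a[1]) - d[1] * (p[0] - a[0])) / np.linalg.norm(d)

# Averaging this quantity over the frames of a run gives the position errors reported in Table 1.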
Figure 7 plots the estimation accuracy as a percentage of correct positions (vertical axis) versus an error tolerance (horizontal axis) for both methods. On the left graph the error tolerance is the absolute position error, ranging from 0 to 50 cm, while on the right graph the error tolerance is the absolute vehicle orientation (yaw angle) error, ranging from 0 to 5 degrees. The curve associated with the proposed method outperforms that of the sequential particle filter, both for the position and for the orientation estimation. Moreover, the right graph emphasizes the benefit of the deferred method, which integrates the vehicle and driver priors in every generated trajectory, thus bringing more time consistency than the sequential method.

Figure 8 illustrates the two methods on a real sequence. Curves in the left column show zooms on local trajectories. The middle column illustrates the image projection of the vehicle position for the sequential method. The right column illustrates the image projection of the vehicle position for the deferred method. It is worth noticing the noisy estimation provided by the sequential method, where the estimated trajectory does not seem to match the vehicle kinematic model. The reason for this weak consistency is that the maximum a posteriori estimate may be found on different particles at every time step. In contrast, the spatio-temporal deferred approach ensures faithfulness to the model, which explains the observed improvement.

5. Conclusion

We have presented a solution for estimating vehicle trajectories using a single static color camera. A spatio-temporal deferred logical inference solution which takes into account both vehicle kinematics and driver behavior has been proposed, using a stochastic approach to estimate the posterior distribution of the trajectory. By choosing an MCMC method, the random walk evolution is controlled by injecting priors on both driver and vehicle behavior and on geometric knowledge about the road.
Figure 8: Snapshots illustrating the two methods. Left column: zoom on local trajectories (real trajectory, sequential method, and deferred method). Middle column: the bounding box illustrates the position of the vehicle estimated with the sequential method. Right column: the bounding box illustrates the position of the vehicle estimated with the deferred method.
Moreover, a global likelihood function using background/foreground binary extraction has been proposed, together with an efficient implementation. Experiments have been carried out to demonstrate that the proposed method outperforms a classic sequential particle filter solution, using statistics computed on real video sequences. Two points explain this improvement. First, the spatio-temporal deferred approach processes the whole data set, thus ensuring time consistency. Second, unlike the sequential approach, for which the maximum a posteriori estimate may be found on different particles at every time step, the spatio-temporal deferred approach ensures total faithfulness to the model at any time step.

The method discussed in this paper is currently operating 24 hours a day under various weather conditions to provide statistics on curve trajectories. For the purpose of covering the entire curve, the system is composed of three color cameras with very little overlap. The system has successfully analyzed observations recorded under actual traffic conditions over several-day periods.

References

[1] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142–149, 2000.
[2] M. Isard and A. Blake, “Condensation—conditional density propagation for visual tracking,” International Journal of Computer Vision, vol. 29, no. 1, pp. 5–28, 1998.