Region Covariance Based Object Tracking Using Monte Carlo Method
Control and Automation
Xiamen, China, June 9-11, 2010
Abstract— Covariance features enable efficient fusion of different types of image features and have low dimensionality, and covariance-based object tracking has been shown to be robust and versatile at a modest computational cost. In this paper, a method that combines the Monte Carlo method with covariance features is proposed. The Monte Carlo method is used to determine the search scope for the target at the region level. Covariance features are used to model the object's appearance at the object level. Improved object matching and occlusion handling strategies are given, followed by an appearance model update method. Experiments show that our approach is robust and effective for tracking objects with irregular movement and partial occlusions.

This work is supported by the National High Technology Research and Development Program of China (No. 2007AA11Z227) and the Natural Science Foundation of Jiangsu Province of China (No. BK2009352).
Xiaofeng Ding, Fengchen Huang, Lizhong Xu and Xiao-fang Li are with the College of Computer and Information, Hohai University, Nanjing, 210098, China. [email protected], [email protected].
Chengrong Huang is with the School of Computer Engineering, Nanjing Institute of Technology, Nanjing, 211167, China.

I. INTRODUCTION
Object tracking is an important task in computer vision, with applications in visual surveillance, human-computer interaction, vehicle navigation and so on. Object tracking can be classified into point tracking, kernel tracking and silhouette tracking [1]. Different tracking tasks call for different tracking methods. Point tracking uses point features, usually detected by the Harris, KLT and SIFT detectors, to describe the object. It is implemented by finding the relationships among the points within a single frame and across consecutive frames. Kernel tracking is typically performed by computing the motion of the kernel from one frame to the next. Silhouette tracking is performed by finding the object region in each frame by means of an object model generated from the previous frames, where the object model takes the form of a color histogram, object edges or object contours. For details of these three tracking categories, see [1].
Generally, a suitable feature must be selected to describe the target in kernel tracking and silhouette tracking, so selecting a discriminative, robust and easily computed feature is critical to tracking performance. At first, the raw pixel values of several image statistics such as color, gradient and filter responses were used as image features. These features change easily in the presence of illumination changes and non-rigid motion. Then the histogram, a natural extension of raw pixel values, was used. However, the size of a histogram representation grows exponentially with the number of features. Recently, the covariance region descriptor has been proposed in [2]. The covariance matrix extracted from a region is used as the region descriptor. The covariance matrix fuses multiple features that might be correlated, such as intensity, color, gradients and filter responses. It is robust and discriminative, and computing it with integral images makes the computational cost independent of the size of the region.
Covariance-based object tracking, proposed in [2], [3], [4], has been shown to be robust and versatile at a modest computational cost. Tuzel et al. [2] proposed a global search algorithm to find the best match between consecutive video frames, in which covariance features were shown to be faster and more robust than color histograms. Porikli et al. [3] proposed a simple and elegant algorithm to track non-rigid objects using a covariance-based object description and an update mechanism based on means on Riemannian manifolds. Wu et al. [4] and Palaio and Batista [5] proposed covariance-based particle filters for single-object and multi-object tracking, respectively. However, when particle filters are applied to visual tracking, particle degeneracy and the choice of the target's motion model are two of the main problems.
In this paper, we propose combining a Monte Carlo method [6] with a covariance-based object description to track non-rigid objects with irregular movement and partial occlusions. To improve occlusion handling, we give a new object matching and model update strategy.
The paper is organized as follows: Section II describes the region covariance descriptor; Section III gives the Monte Carlo method; the object matching and model update strategy are introduced in Section IV; experimental results are provided in Section V, followed by conclusions in Section VI.

II. REGION COVARIANCE DESCRIPTOR
The region covariance descriptor proposed in [2] by Tuzel et al. fuses multiple features into a covariance matrix that is low-dimensional and discriminative.
Let I be a three-dimensional color image. Let F be the W × H × d dimensional feature image extracted from I,
$$F(x, y) = \phi(I, x, y), \tag{1}$$
where the function φ can be any mapping such as intensity, color, gradients, filter responses, etc. For a given rectangular region R in image I, let {f_k}_{k=1..n} be the d-dimensional feature points inside R. The region covariance can be expressed as
$$C_R = \frac{1}{n-1} \sum_{k=1}^{n} (f_k - \mu)(f_k - \mu)^T, \tag{2}$$
where µ is the mean of the feature points.
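As an illustration of (2), the following is a minimal sketch that computes the region covariance directly from a list of feature points, taking µ to be their sample mean. It is a plain O(n d²) computation, not the integral image method of [2]; the function name `region_covariance` and the sample feature values are hypothetical and chosen only for the example.

```python
def region_covariance(features):
    """Sample covariance of eq. (2) for a list of d-dimensional feature points."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature points")
    d = len(features[0])
    # mu: per-dimension mean of the feature points inside the region
    mu = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        diff = [f[j] - mu[j] for j in range(d)]
        # accumulate the outer product (f_k - mu)(f_k - mu)^T
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b]
    # normalise by n - 1 as in eq. (2)
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical 2-D feature points (e.g. intensity and a gradient magnitude)
# for four pixels of a region.
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
C = region_covariance(points)
```

The result is a symmetric d × d matrix whose size depends only on the number of features d, not on the number of pixels in the region.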
A. Covariance Computation and Dissimilarity Metric
Covariance-based object tracking needs a fast method for computing the covariance matrix of a given region. Once the regions' covariance matrices are available, good tracking performance depends on how their dissimilarity is measured. To address these two problems, an integral image based computation method [2] and a Riemannian manifold metric [3] are used.
The integral image based covariance matrix computation method proposed in [2] is used in this paper. With this method, the computational cost is independent of the size of the region. For details of the computation, see [2].
A covariance metric, which is an invariant metric on the tangent space of symmetric positive definite matrices, has already been proposed in [3]. For completeness, and for convenience in the object matching and model update of Section IV, we give a brief introduction here.
The covariance metric proposed in [3] is given by
$$\langle y, z \rangle_X = \mathrm{tr}\left( X^{-\frac{1}{2}} y X^{-1} z X^{-\frac{1}{2}} \right), \tag{4}$$
where ⟨y, z⟩_X is the metric defined by a collection of inner products on the tangent space; y and z are arbitrary tangent vectors at X; tr(·) denotes the matrix trace; capital letters denote points on the manifold and lowercase letters denote vectors in the tangent space.
The distance on the manifold, i.e. the dissimilarity between two regions' covariance matrices, is defined in terms of minimum length curves between points on the manifold. The curve with minimum length is called the geodesic, and its length is the intrinsic distance. Let y ∈ T_X M, where T_X M is the tangent space at the point X ∈ M. There is a unique geodesic starting at X with tangent vector y. The exponential map, exp_X : T_X M → M, maps the vector y to the point Y reached by this geodesic. We denote its inverse by log_X. The exponential map is given by
$$\exp_X(y) = X^{\frac{1}{2}} \exp\left( X^{-\frac{1}{2}} y X^{-\frac{1}{2}} \right) X^{\frac{1}{2}}. \tag{5}$$
The logarithm map is given by
$$\log_X(Y) = X^{\frac{1}{2}} \log\left( X^{-\frac{1}{2}} Y X^{-\frac{1}{2}} \right) X^{\frac{1}{2}}. \tag{6}$$
The distance between X and Y is given by d(X, Y) = ‖log_X(Y)‖_X. Combining (6) and (4), we have
$$d^2(X, Y) = \langle \log_X(Y), \log_X(Y) \rangle_X = \mathrm{tr}\left( \log^2\left( X^{-\frac{1}{2}} Y X^{-\frac{1}{2}} \right) \right). \tag{7}$$

III. MONTE CARLO METHOD
Tracking can be considered as estimating the state given all the measurements up to the current moment, or equivalently as constructing the probability density function of the object location [3]. Generally, a filtering-with-prediction approach is used, and the most common filter is the particle filter. However, particle degeneracy and the choice of the target's motion model are critical to tracking performance. Our approach is based on the Monte Carlo method [6], which is a special case of the particle filter and does not need to consider particle degeneracy or the object's motion model.
In the Monte Carlo method, once the object's location in the current frame is known, we sample the object's possible locations and scales in the next frame. From these samples, the best match, i.e. the object's location, is obtained by (11).
Assume that the object's motion between consecutive frames follows a two-dimensional Gaussian distribution. This distribution is used to produce n sampling rectangles. The centers of these rectangles follow the two-dimensional Gaussian distribution whose mean is the center of the rectangle in the previous frame and whose variance reflects the speed of the object's movement. The centers of the sampling rectangles are produced by the Box-Muller method [7]. Let the random variables x and y be uniformly distributed in [0, 1]; then u and v, expressed as
$$\begin{cases} u = \left( \sqrt{-2 \ln x} \cos 2\pi y \right) \times \sigma + \mu \\ v = \left( \sqrt{-2 \ln x} \sin 2\pi y \right) \times \sigma + \mu \end{cases}, \tag{8}$$
follow the two-dimensional Gaussian distribution. The object's scale is also assumed to be Gaussian distributed, with mean equal to the scale of the object in the previous frame and variance reflecting its rate of change. Using the Monte Carlo method, we then obtain the object's possible locations and scales in the next frame, see Fig. 1.
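The sampling step built on (8) can be sketched as follows. This is only an illustrative sketch under several assumptions that are not stated in the paper: separate means are used for the two center coordinates, the scale is modelled as a single multiplicative factor with mean 1 applied to both width and height, and a small lower bound on that factor is imposed. The names `box_muller` and `sample_rectangles`, and all numeric values, are hypothetical.

```python
import math
import random


def box_muller(mu_u, mu_v, sigma, rng):
    """Draw one 2-D Gaussian point with the Box-Muller transform of eq. (8)."""
    x = 1.0 - rng.random()  # lies in (0, 1], so log(x) is finite
    y = rng.random()
    r = math.sqrt(-2.0 * math.log(x))
    u = r * math.cos(2.0 * math.pi * y) * sigma + mu_u
    v = r * math.sin(2.0 * math.pi * y) * sigma + mu_v
    return u, v


def sample_rectangles(center, size, n, sigma_pos, sigma_scale, seed=0):
    """Sample n candidate rectangles (cx, cy, w, h) around the previous one."""
    rng = random.Random(seed)
    cx, cy = center
    w, h = size
    rects = []
    for _ in range(n):
        # Center: Gaussian around the previous center; sigma_pos reflects
        # how fast the object is assumed to move.
        u, v = box_muller(cx, cy, sigma_pos, rng)
        # Scale factor: Gaussian around 1.0 (the previous scale). The second
        # Box-Muller output is discarded for simplicity.
        s, _unused = box_muller(1.0, 1.0, sigma_scale, rng)
        s = max(s, 0.1)  # assumed floor that keeps rectangles non-degenerate
        rects.append((u, v, w * s, h * s))
    return rects


candidates = sample_rectangles(center=(100.0, 80.0), size=(40.0, 60.0),
                               n=200, sigma_pos=5.0, sigma_scale=0.05)
```

Each candidate rectangle would then be described by its covariance matrices and compared with the target model; a larger `sigma_pos` widens the search for fast or irregular motion at the cost of sparser coverage near the previous location.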
Let the predicted object location and scale be the (n + 1)-th sampling region.
Among all the sampling regions, for the k-th sampling region, we can obtain its five-covariance-matrix representation C_k^i. For a given i, min_{1≤k≤n} d²(C_k^i, C_T^i) yields five sampling regions k_1, k_2, k_3, k_4, k_5 based on the spatial relationships between the original object and the subregions

A. Experiment 1
An experiment was carried out to show that our object matching strategy is robust and accurate. In Table I, the values in the final column do not differ much although the occlusions are different; and in each of columns 2 to 6, the minimum value corresponds to the maximum visible area of the target object.
TABLE I: The dissimilarities of the object without and with different partial occlusions. These dissimilarities are calculated under the best matching selected manually. C_O^i is the i-th covariance matrix (Fig. 2) of the object in Fig. 3(a). C_j^i is the i-th covariance matrix of the object with occlusion-j (Fig. 3) with the i-th covariance representation.
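The dissimilarities discussed here are values of d² in (7), and the matching step selects the candidate that minimizes it. The sketch below shows one way such a value could be computed, using the fact that tr(log²(X^{-1/2} Y X^{-1/2})) equals the sum of the squared logarithms of the eigenvalues of X^{-1} Y. To stay self-contained it is restricted to 2 × 2 symmetric positive definite matrices, where the eigenvalues have a closed form; real descriptors are d × d and would need a general eigenvalue solver. The names `riemannian_dist2_2x2` and `best_match` and the sample matrices are hypothetical.

```python
import math


def riemannian_dist2_2x2(X, Y):
    """d^2(X, Y) of eq. (7) for 2x2 symmetric positive definite matrices."""
    (a, b), (c, d) = X
    det_x = a * d - b * c
    if det_x <= 0:
        raise ValueError("X must be positive definite")
    # Closed-form inverse of the 2x2 matrix X
    inv = [[d / det_x, -b / det_x], [-c / det_x, a / det_x]]
    # M = X^{-1} Y has the same eigenvalues as X^{-1/2} Y X^{-1/2}
    m = [[sum(inv[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    lam1 = (tr + disc) / 2.0
    lam2 = det / lam1  # product of the eigenvalues equals det(M)
    return math.log(lam1) ** 2 + math.log(lam2) ** 2


def best_match(target, candidates):
    """Index k minimising d^2(C_k, C_T) over the candidate covariances."""
    return min(range(len(candidates)),
               key=lambda k: riemannian_dist2_2x2(candidates[k], target))


# Hypothetical target model and three candidate covariance matrices.
target = [[1.0, 0.0], [0.0, 1.0]]
cands = [[[4.0, 0.0], [0.0, 4.0]],
         [[1.1, 0.0], [0.0, 0.9]],
         [[2.0, 1.0], [1.0, 2.0]]]
idx = best_match(target, cands)
```

In the tracker described in this paper the same minimization would be repeated for each of the five covariance matrices of a region, which is outside the scope of this sketch.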