A Detection-Based Multiple Object Tracking Method

Mei Han, Amit Sethi†, Yihong Gong

NEC Laboratories America, Cupertino, CA, USA

† University of Illinois at Urbana-Champaign, Champaign, IL, USA

Abstract

In this paper we describe a method for tracking multiple objects whose number is unknown and varies during tracking. Based on preliminary object detection results in each image, which may contain missing and/or false detections, the multiple object tracking method keeps a graph structure in which it maintains multiple hypotheses about the number and the trajectories of the objects in the video. The image information drives the process of extending and pruning the graph, and determines the best hypothesis to explain the video. While image-based object detection makes a local decision, the tracking process confirms and validates the detection through time; it can therefore be regarded as temporal detection, which makes a global decision across time. The multiple object tracking method gives feedback, in the form of predicted object locations, to the object detection module. The method thus integrates object detection and tracking tightly. The most likely hypothesis provides the multiple object tracking result. Experimental results are presented.

1 Introduction

Multiple object tracking has been a challenging research topic in computer vision. It has to deal with the difficulties of single object tracking, such as changing appearances, non-rigid motion, dynamic illumination and occlusion, as well as problems specific to multiple object tracking, including inter-object occlusion and multi-object confusion. There has been much work on multiple object visual tracking. MacCormick and Blake [1] use a sampling algorithm for tracking a fixed number of objects. Tao et al. [2] present an efficient hierarchical algorithm to track multiple people. Isard and MacCormick [3] propose a Bayesian multiple-blob tracker. Hue et al. [4] describe an extension of the classical particle filter in which the stochastic assignment vector is estimated by a Gibbs sampler. These methods keep only one hypothesis of the tracking result, namely the one with the largest posterior probability based on current and previous observations. They may fail with background clutter, occlusion and multi-object confusion. Multiple hypothesis methods are more robust because the tracking result corresponds to the state sequence which maximizes the joint state-observation probability.

A well-known early work in multiple hypothesis tracking (MHT) is the algorithm developed by Reid [5]. The joint probabilistic data association filter (JPDAF) [6] finds the state estimate by evaluating the measurement-to-track association probabilities. Some methods [7, 8] model the data association as random variables which are estimated jointly with the state by EM iterations. Most of this work comes from the small target tracking community, where object representation is simple.

We propose a multiple hypothesis method to track multiple objects based on object detection. The detection results identify the tracking targets in each image. Any object detection method can be used. In our implementation, we apply a neural network based object detection module to detect pedestrians. The tracking algorithm accumulates the detection results in a graph-like structure and maintains multiple hypotheses of object trajectories. At the same time, the multiple object tracking method gives feedback, in the form of predicted object locations, to the object detection module. The tracking method therefore tightly integrates object detection and tracking to obtain a robust and efficient tracking algorithm. Many people have worked on the integration of object detection and tracking. The SVM tracker [9] applies recognition algorithms to efficient visual tracking. Many systems for multiple people detection and tracking have been presented that use aspect ratio [10], silhouette [11] or a human shape model [12] to detect humans. None of these methods maintains multiple hypotheses.

Our multiple object tracking method deals reliably with occlusions, irregular object motions and changing appearances by postponing the decision on object trajectories until sufficient information is accumulated over time. It makes a global decision. The most likely hypothesis generates the multiple object tracking result. The trajectories provide information on object identities, motion histories, timing and object interactions. This information can be applied to detect abnormal behaviors in video surveillance and to collect traffic data in traffic control systems.

2 Object Detection

The multiple object tracking method works on fixed cameras. It starts with an adaptive background modelling module which deals with changing illumination and does not require objects to be constantly moving. A Gaussian-mixture
based background modelling method [13] is used to generate a binary foreground mask image, as shown in Figure 1(b). The white pixels represent the mask of the foreground objects. An object detection module takes the foreground pixels generated by background modelling as input and outputs the probabilities of object detection. It searches over the foreground pixels and gives, for each location, the probability that an object of a certain scale is found there. Any object detection approach can be fitted into this part. In our implementation, we apply a neural network based object detection module to detect pedestrians. Each foreground blob is potentially the image of a person. Each pixel location is fed to a neural network that has been trained for this task. The neural network generates a score, or probability, indicating how likely it is that the blob around the pixel does in fact represent a human of some scale. A particular part of the detected person, e.g., the approximate center of the top of the head, is used as the “location” of the object, which is shown as a light spot in Figure 1(c). The lighter the spot, the higher the detection score. The neural network searches over each pixel at a few scales. The detection score corresponds to the best score, i.e., the largest detection probability, among all scales.

Figure 1: Object detection: (a) original image, (b) foreground mask image, the white pixels represent the mask of the foreground objects, (c) human detection results, the lighter pixels show the higher detection probabilities.

3 Tracking Algorithm

The tracking algorithm accepts the probabilities of preliminary object detection and keeps multiple hypotheses of object trajectories in a graph structure, as shown in Figure 2. Each hypothesis consists of the number of objects and their trajectories. The first step in tracking is to extend the graph to include the most recent object detection results, that is, to generate multiple hypotheses about the trajectories. An image based likelihood is then computed to give a probability to each hypothesis. This computation is based on the object detection probability, appearance similarity, trajectory smoothness, and image foreground coverage and compactness. The probabilities are calculated over a sequence of images; they are therefore temporally global representations of the hypothesis likelihood. The hypotheses are ranked by their probabilities and the unlikely hypotheses are pruned from the graph in the hypotheses management step. In this way only a limited number of hypotheses are maintained in the graph structure, which improves the computational efficiency.

In the graph structure (Figure 2), the graph nodes represent the object detection results. Each node is composed of the object detection probability, object size or scale, location and appearance. Each link in the graph is computed based on position closeness, size similarity and appearance similarity between two nodes (detected objects). The graph is extended over time. In this section we describe the three steps of the tracking algorithm: hypotheses generation, likelihood computation and hypotheses management.

Figure 2: Graph structure in multiple object tracking

3.1 Hypotheses Generation

Given the object detection results in each image, the hypotheses generation step first calculates the connections between the maintained graph nodes and the new nodes from the current image. The maintained nodes include the ending nodes of all the trajectories in the maintained hypotheses. They are not necessarily from the previous image, since object detection may have missing detections. The connection probability is computed according to

$$p_{con} = w_a \times p_a + w_p \times p_p + w_s \times p_s \qquad (1)$$

where $w_a$, $w_p$ and $w_s$ are the weights in the connection probability computation, that is, the connection probability is a weighted combination of the appearance similarity probability $p_a$, the position closeness probability $p_p$ and the size similarity probability $p_s$. We prune the connections whose probabilities are very low for the sake of computational efficiency.

As shown in Figure 2, the generation process takes care of object occlusion by track splitting and merging. When a person appears from occlusion, the occluding track splits into two tracks; conversely, when a person gets occluded, the corresponding node is connected (merged) with the occluding node. The generation process deals with missing data naturally by skipping nodes in graph extensions, that is, a connection is not necessarily built on two nodes from consecutive image frames. The generation process handles false detections by keeping hypotheses that ignore some nodes. It initializes new trajectories for some nodes depending on their (weak) connections with existing nodes
and their locations (at appearing areas, such as doors and view boundaries). The multiple object tracking algorithm keeps all possible hypotheses in the graph structure. At each local step, it extends and prunes the graph in a balanced way to keep the hypotheses as diversified as possible, and delays the decision on the most likely hypothesis to a later step.

3.2 Likelihood Computation

The likelihood or probability of each hypothesis generated in the first step is computed according to the connection probability, the object detection probability, trajectory analysis and the image likelihood computation. The hypothesis likelihood is accumulated over image sequences,

$$\mathrm{likelihood}_i = \mathrm{likelihood}_{i-1} + \frac{\sum_{j=1}^{n} \left[ \log(p_{con_j}) + \log(p_{obj_j}) + \log(p_{trj_j}) \right]}{n} + L_{img} \qquad (2)$$

where $i$ is the current image frame number and $n$ represents the number of objects in the current hypothesis. $p_{con_j}$ denotes the connection probability of the $j$th trajectory computed in Equation (1). If the $j$th trajectory has a missing detection in the current frame, a small probability, i.e., the missing probability, is assigned to $p_{con_j}$. $p_{obj_j}$ is the object detection probability and $p_{trj_j}$ measures the smoothness of the $j$th trajectory. We use the average likelihood of multiple trajectories in the computation. The metric prefers the hypotheses with better human detections, stronger similarity measurements and smoother tracks. $L_{img}$ is the image likelihood of the hypothesis. It is composed of two terms,

$$L_{img} = l_{cov} + l_{comp} \qquad (3)$$

where

$$l_{cov} = \log \left( \frac{\left| A \cap \left( \bigcup_{j=1}^{n} B_j \right) \right| + c}{|A| + c} \right), \qquad l_{comp} = \log \left( \frac{\left| A \cap \left( \bigcup_{j=1}^{n} B_j \right) \right| + c}{\left| \sum_{j=1}^{n} B_j \right| + c} \right) \qquad (4)$$

$l_{cov}$ calculates the hypothesis coverage of the foreground pixels and $l_{comp}$ measures the hypothesis compactness. $A$ denotes the set of all foreground pixels and $B_j$ represents the pixels covered by the $j$th node (or track). $\cap$ denotes set intersection and $\cup$ set union. The numerators in both $l_{cov}$ and $l_{comp}$ represent the foreground pixels covered by the combination of multiple trajectories in the current hypothesis. Therefore $l_{cov}$ represents the foreground coverage of the hypothesis (the higher the value, the larger the coverage), and $l_{comp}$ measures how much the nodes overlap with each other (the larger the value, the less the overlap and the more compact the hypothesis). $c$ is a constant. These two values give a spatially global explanation of the image (foreground) information. This computation is similar to the image likelihood computation in [2].

The hypothesis likelihood is a value refined over time. It provides a global description of the object detection results. Generally speaking, the hypotheses with higher likelihood are composed of better object detections with a good image explanation. The likelihood tolerates missing and false detections since it has a global view of the image sequence.

3.3 Hypotheses Management

This step ranks the hypotheses according to their likelihood values. To avoid combinatorial explosion in graph extension, we keep only a limited number of hypotheses and prune the graph accordingly. The hypotheses management step deletes the out-of-date tracks, which correspond to objects that have been gone for a while, and keeps a short list of active nodes, which are the ending nodes of the trajectories of all the kept hypotheses. The number of active nodes determines the scale of graph extension, so a careful management step ensures efficient computation. The design of this multiple object tracking algorithm follows two principles:

1. We keep as many hypotheses as possible and make them as diversified as possible to cover all the possible explanations of the image sequence. The top hypothesis is chosen at a later time so that it is an informed and global decision.
2. We make local prunes of unlikely connections and keep only a limited number of hypotheses.

With reasonable assumptions for these thresholds, the method achieves real-time performance in a not-too-crowded environment. The graph structure is used to keep multiple hypotheses and make reasonable prunes for both reliable performance and efficient computation.

The tracking module provides feedback to the object detection module to improve the local detection performance. According to the trajectories in the top hypothesis, the multiple object tracking module predicts the most likely locations at which to detect objects. This interaction tightly integrates object detection and tracking, and makes both of them more reliable.

4 Experiment

The multiple object tracking method has been tested on two existing CCTV cameras. The first scenario includes two persons coming through the door at about the same time. Figure 3(a) shows 4 images from the sequence with overlaid bounding boxes showing the human detection results. The darker the bounding box, the higher the detection probability. Figure 3(b) shows the set of tracks with the largest probability generated by the multiple object tracking. The tracks are overlaid on the detection score map. Different intensities represent different tracks. The human detection based on each image is certainly not perfect. In the first and third images, the human detector misses the person in the back due to occlusion and the person in the front
due to distortion, respectively. There are false detections in the fourth image caused by background noise and interaction between people. However, the multiple object tracking method manages to maintain the right number of tracks and their configurations, as shown in Figure 3(b), because it searches for the best explanation sequence of the observations over time.

Figure 3: Tracking results with missing/false human detections: (a) original images with overlaid bounding boxes showing the human detection results, (b) multiple object tracking result overlaid on the human detection map.

Figure 4 demonstrates an example of multiple people tracking with crossing tracks. The example first shows the lady opening the door for the person in the gray shirt; then the person in the dark shirt follows and goes into the area. Figure 4(a) shows the images from the sequence and (b) demonstrates the tracking result. Interestingly, there is one short track close to the upper-left corner of the result image because one person is standing inside the door and the human detection consistently detects him through the glass window. Therefore, 4 tracks are shown in Figure 4(b): the short track for the standing person, the long track for the lady, the light track for the man in the gray shirt, and the dark track for the man in the dark shirt.

Figure 4: Tracking results of crossing tracks: (a) original images with overlaid bounding boxes showing the human detection results, (b) multiple object tracking result overlaid on the human detection map.

References

[1] J.P. MacCormick and A. Blake, “A probabilistic exclusion principle for tracking multiple objects,” in ICCV99, 1999, pp. 572–578.
[2] H. Tao, H.S. Sawhney, and R. Kumar, “A sampling algorithm for tracking multiple objects,” in Vision Algorithms 99, 1999.
[3] M. Isard and J.P. MacCormick, “Bramble: A bayesian multiple-blob tracker,” in ICCV01, 2001, pp. II:34–41.
[4] C. Hue, J.P. Le Cadre, and P. Perez, “Tracking multiple objects with particle filtering,” IEEE Trans. on Aerospace and Electronic Systems, vol. 38, no. 3, pp. 791–812, July 2002.
[5] D.B. Reid, “An algorithm for tracking multiple targets,” AC, vol. 24, no. 6, pp. 843–854, December 1979.
[6] T.E. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar tracking of multiple targets using joint probabilistic data association,” IEEE Journal Oceanic Eng., vol. OE-8, pp. 173–184, July 1983.
[7] R.L. Streit and T.E. Luginbuhl, “Maximum likelihood method for probabilistic multi-hypothesis tracking,” in Proceedings of SPIE International Symposium, Signal and Data Processing of Small Targets, 1994.
[8] H. Gauvrit and J.P. Le Cadre, “A formulation of multitarget tracking as an incomplete data problem,” IEEE Trans. on Aerospace and Electronic Systems, vol. 33, no. 4, pp. 1242–1257, Oct 1997.
[9] S. Avidan, “Support vector tracking,” in CVPR01, 2001, pp. I:184–191.
[10] I. Haritaoglu, D. Harwood, and L.S. Davis, “W4s: A real-time system for detecting and tracking people in 2 1/2-d,” in ECCV98, 1998.
[11] I. Haritaoglu, D. Harwood, and L.S. Davis, “Hydra: Multiple people detection and tracking using silhouettes,” in VS99, 1999.
[12] T. Zhao, R. Nevatia, and F. Lv, “Segmentation and tracking of multiple humans in complex situations,” in CVPR01, 2001, pp. II:194–201.
[13] C. Stauffer and W.E.L. Grimson, “Learning patterns of activity using real-time tracking,” PAMI, vol. 22, no. 8, pp. 747–757, August 2000.
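
As an illustration of the connection probability in Equation (1) of Section 3.1 and of the pruning of weak connections, the following Python sketch may help. It is not the authors' implementation: the weight values, the pruning threshold and the names `Weights`, `connection_probability` and `prune_connections` are assumptions made for this example, and the three similarity probabilities are taken as given inputs because the paper does not specify how they are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical weight values; the paper does not state the ones it uses.
    appearance: float = 0.4  # w_a
    position: float = 0.4    # w_p
    size: float = 0.2        # w_s


def connection_probability(p_a, p_p, p_s, w=Weights()):
    """Equation (1): weighted combination of the three similarity terms."""
    return w.appearance * p_a + w.position * p_p + w.size * p_s


def prune_connections(candidates, threshold=0.2, w=Weights()):
    """Keep only links whose connection probability reaches the threshold.

    candidates maps (maintained_node, new_node) -> (p_a, p_p, p_s).
    The threshold value is an assumption for illustration.
    """
    kept = {}
    for link, (p_a, p_p, p_s) in candidates.items():
        p_con = connection_probability(p_a, p_p, p_s, w)
        if p_con >= threshold:
            kept[link] = p_con
    return kept
```

With these assumed weights, a candidate link with similarities (0.9, 0.8, 0.5) is kept, while one with (0.1, 0.1, 0.1) falls below the threshold and is pruned.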

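
The likelihood update of Equations (2) to (4) in Section 3.2 can be sketched in the same way. This is a minimal example under stated assumptions: pixels are represented as Python sets of (row, column) tuples, the constant `c` and the missing probability are given arbitrary values, the denominator of the compactness term is read as the total pixel count of all nodes, and the function names are hypothetical.

```python
import math


def image_likelihood(foreground, boxes, c=1.0):
    """Equations (3) and (4): coverage term plus compactness term.

    foreground is the set A of foreground pixels; boxes is a list of
    pixel sets B_j, one per node (or track). c is an assumed constant.
    """
    covered = set().union(*boxes)          # union of all B_j
    overlap = len(foreground & covered)    # |A intersected with the union|
    l_cov = math.log((overlap + c) / (len(foreground) + c))
    l_comp = math.log((overlap + c) / (sum(len(b) for b in boxes) + c))
    return l_cov + l_comp


def update_likelihood(prev, tracks, foreground, boxes, c=1.0, missing_prob=0.05):
    """Equation (2): previous likelihood + averaged log terms + L_img.

    tracks is a list of (p_con, p_obj, p_trj) tuples, one per trajectory.
    A p_con of None marks a missing detection and is replaced by the
    (assumed) missing probability.
    """
    total = 0.0
    for p_con, p_obj, p_trj in tracks:
        if p_con is None:
            p_con = missing_prob
        total += math.log(p_con) + math.log(p_obj) + math.log(p_trj)
    average = total / len(tracks) if tracks else 0.0
    return prev + average + image_likelihood(foreground, boxes, c)
```

In this sketch, two nodes that tile the foreground exactly give an image likelihood of zero, whereas two nodes that both cover the whole foreground are penalized by the compactness term.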
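
The hypotheses management step of Section 3.3 (ranking, keeping a limited number of hypotheses, deleting out-of-date tracks and collecting the active nodes) might look roughly like the sketch below. The data layout (a hypothesis as a dictionary with a likelihood and a list of non-empty tracks, each track a list of (frame, node_id) pairs), the limits `max_hypotheses` and `max_age`, and the function name are all assumptions for illustration.

```python
import heapq


def manage_hypotheses(hypotheses, max_hypotheses=10, current_frame=0, max_age=30):
    """Rank hypotheses, keep the best ones, and list the active nodes.

    Each hypothesis is {"likelihood": float, "tracks": [track, ...]},
    where a track is a non-empty list of (frame, node_id) pairs.
    """
    # Keep only the top-ranked hypotheses to avoid combinatorial explosion.
    best = heapq.nlargest(max_hypotheses, hypotheses, key=lambda h: h["likelihood"])

    kept, active_nodes = [], set()
    for hyp in best:
        # Drop tracks whose last node is older than max_age frames.
        live = [t for t in hyp["tracks"] if current_frame - t[-1][0] <= max_age]
        kept.append({"likelihood": hyp["likelihood"], "tracks": live})
        # Active nodes are the ending nodes of the surviving trajectories.
        active_nodes.update(t[-1] for t in live)
    return kept, active_nodes
```

Only the returned active nodes would then be considered when the graph is extended with the detections of the next frame, which is what keeps the extension step small.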