A Detection-Based Multiple Object Tracking Method: Mei Han Amit Sethi Yihong Gong
3 Tracking Algorithm

The tracking algorithm accepts the probabilities of preliminary object detection and keeps multiple hypotheses of object trajectories in a graph structure, as shown in Figure 2. Each hypothesis consists of the number of objects and their trajectories. The first step in tracking is to extend the graph to include the most recent object detection results, that is, to generate multiple hypotheses about the trajectories. An image-based likelihood is then computed to give a probability to each hypothesis. This computation is based on the object detection probability, appearance similarity, trajectory smoothness, and image foreground coverage and compactness. The probabilities are calculated over a sequence of images; therefore, they are temporally global representations of hypothesis likelihood. The hypotheses are ranked by their probabilities, and the unlikely hypotheses are pruned from the graph in the hypotheses-management step. In this way a limited num- [...]

$$p_{con} = w_a \times p_a + w_p \times p_p + w_s \times p_s \qquad (1)$$

where $w_a$, $w_p$ and $w_s$ are the weights in the connection probability computation; that is, the connection probability is a weighted combination of the appearance similarity probability $p_a$, the position closeness probability $p_p$ and the size similarity probability $p_s$. We prune the connections whose probabilities are very low for the sake of computational efficiency.

As shown in Figure 2, the generation process takes care of object occlusion by track splitting and merging. When a person appears from occlusion, the occluding track splits into two tracks; on the other hand, when a person gets occluded, the corresponding node is connected (merged) with the occluding node. The generation process deals with missing data naturally by skipping nodes in graph extensions, that is, a connection is not necessarily built on two nodes from consecutive image frames. The generation process handles false detections by keeping hypotheses that ignore some nodes. It initializes new trajectories for some nodes depending on their (weak) connections with existing nodes
and their locations (at appearing areas, such as doors and view boundaries). The multiple object tracking algorithm keeps all possible hypotheses in the graph structure. At each local step, it extends and prunes the graph in a balanced way to keep the hypotheses as diversified as possible, and it delays the decision on the most likely hypothesis to a later step.

The likelihood or probability of each hypothesis generated in the first step is computed according to the connection probability, the object detection probability, trajectory analysis and the image likelihood computation. The hypothesis likelihood is accumulated over image sequences,

$$\mathit{likelihood}_i = \mathit{likelihood}_{i-1} + \frac{\sum_{j=1}^{n} \left( \log(p_{con_j}) + \log(p_{obj_j}) + \log(p_{trj_j}) \right)}{n} + L_{img} \qquad (2)$$

where $i$ is the current image frame number and $n$ is the number of objects in the current hypothesis. $p_{con_j}$ denotes the connection probability of the $j$th trajectory computed in Equation (1). If the $j$th trajectory has a missing detection in the current frame, a small probability, i.e., the missing probability, is assigned to $p_{con_j}$. $p_{obj_j}$ is the object detection probability and $p_{trj_j}$ measures the smoothness of the $j$th trajectory. We use the average likelihood of multiple trajectories in the computation. The metric prefers the hypotheses with better human detections, stronger similarity measurements and smoother tracks. $L_{img}$ is the image likelihood of the hypothesis. It is composed of two items,

$$L_{img} = l_{cov} + l_{comp} \qquad (3)$$

where

$$l_{cov} = \log\left(\frac{\left|A \cap \left(\bigcup_{j=1}^{n} B_j\right)\right| + c}{|A| + c}\right), \qquad l_{comp} = \log\left(\frac{\left|A \cap \left(\bigcup_{j=1}^{n} B_j\right)\right| + c}{\left|\sum_{j=1}^{n} B_j\right| + c}\right) \qquad (4)$$

$l_{cov}$ calculates the hypothesis coverage of the foreground pixels and $l_{comp}$ measures the hypothesis compactness. $A$ denotes the sum of foreground pixels and $B_j$ represents the pixels covered by the $j$th node (or track). $\cap$ denotes the set intersection and $\cup$ the set union. The numerators in both $l_{cov}$ and $l_{comp}$ represent the foreground pixels covered by the combination of multiple trajectories in the current hypothesis. Therefore, $l_{cov}$ represents the foreground coverage of the hypothesis: the higher the value, the larger the coverage. $l_{comp}$ measures how much the nodes overlap with each other: the larger the value, the less the overlap and the more compact the hypothesis. $c$ is a constant. These two values give a spatially global explanation of the image (foreground) information. This computation is similar to the image likelihood computation in [2].

The hypothesis likelihood is a value refined over time. It provides a global description of the object detection results. Generally speaking, the hypotheses with higher likelihood are composed of better object detections with good image explanation. The likelihood tolerates missing and false detections since it has a global view of the image sequences.

The hypotheses-management step ranks the hypotheses according to their likelihood values. To avoid combinatorial explosion in graph extension, we only keep a limited number of hypotheses and prune the graph accordingly. The hypotheses-management step deletes the out-of-date tracks, which correspond to objects that have been gone for a while, and keeps a short list of active nodes, which are the ending nodes of the trajectories of all the kept hypotheses. The number of active nodes is the key factor determining the scale of graph extension; therefore, a careful management step ensures efficient computation.

The design of this multiple object tracking algorithm follows two principles:

1. We keep as many hypotheses as possible and make them as diversified as possible to cover all the possible explanations of the image sequences. The top hypothesis is chosen at a later time to guarantee that it is an informed and global decision.
2. We make local prunes of unlikely connections and keep only a limited number of hypotheses.

With reasonable assumptions for these thresholds, the method achieves real-time performance in a not-too-crowded environment. The graph structure is applied to keep multiple hypotheses and make reasonable prunes for both reliable performance and efficient computation.

The tracking module provides feedback to the object detection module to improve the local detection performance. According to the trajectories in the top hypothesis, the multiple object tracking module predicts the most likely locations to detect objects. This interaction tightly integrates object detection and tracking, and makes both of them more reliable.

4 Experiment

The multiple object tracking method has been tested on two existing CCTV cameras. The first scenario includes two people coming through the door at about the same time. Figure 3(a) shows 4 images from the sequence with overlaid bounding boxes showing the human detection results. The darker the bounding box, the higher the detection probability. Figure 3(b) demonstrates the multi-tracks with the largest probability generated by the multiple object tracking. The tracks are overlaid on the detection score map. Different intensities represent different tracks. The human detection based on each image is certainly not perfect. In the first and third images, the human detector misses the person in the back due to occlusion and the person in the front
due to distortion, respectively. There are false detections in the fourth image caused by background noise and interaction between people. However, the multiple object tracking method manages to maintain the right number of tracks and their configurations, as shown in Figure 3(b), because it searches for the best explanation sequence of the observations over time.

Figure 3: Tracking results with missing/false human detections: (a) original images with overlaid bounding boxes showing the human detection results, (b) multiple object tracking result overlaid on the human detection map.

Figure 4 demonstrates an example of multiple people tracking with crossing tracks. The example first shows the lady opening the door for the person in the gray shirt; the person in the dark shirt then follows and goes into the area. Figure 4(a) shows the images from the sequence and (b) demonstrates the tracking result. Interestingly, there is one short track close to the upper-left corner of the result image because one person is standing inside the door and the human detection consistently detects him through the glass window. Therefore, 4 tracks are shown in Figure 4(b): the short track for the standing person, the long track for the lady, the light track for the man in the gray shirt, and the dark track for the man in the dark shirt.

Figure 4: Tracking results of crossing tracks: (a) original images with overlaid bounding boxes showing the human detection results, (b) multiple object tracking result overlaid on the human detection map.

References

[1] J.P. MacCormick and A. Blake, “A probabilistic exclusion principle for tracking multiple objects,” in ICCV99, 1999, pp. 572–578.
[2] H. Tao, H.S. Sawhney, and R. Kumar, “A sampling algorithm for tracking multiple objects,” in Vision Algorithms 99, 1999.
[3] M. Isard and J.P. MacCormick, “Bramble: A Bayesian multiple-blob tracker,” in ICCV01, 2001, pp. II:34–41.
[4] C. Hue, J.P. Le Cadre, and P. Perez, “Tracking multiple objects with particle filtering,” IEEE Trans. on Aerospace and Electronic Systems, vol. 38, no. 3, pp. 791–812, July 2002.
[5] D.B. Reid, “An algorithm for tracking multiple targets,” AC, vol. 24, no. 6, pp. 843–854, December 1979.
[6] T.E. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar tracking of multiple targets using joint probabilistic data association,” IEEE Journal Oceanic Eng., vol. OE-8, pp. 173–184, July 1983.
[7] R.L. Streit and T.E. Luginbuhl, “Maximum likelihood method for probabilistic multi-hypothesis tracking,” in Proceedings of SPIE International Symposium, Signal and Data Processing of Small Targets, 1994.
[8] H. Gauvrit and J.P. Le Cadre, “A formulation of multitarget tracking as an incomplete data problem,” IEEE Trans. on Aerospace and Electronic Systems, vol. 33, no. 4, pp. 1242–1257, Oct 1997.
[9] S. Avidan, “Support vector tracking,” in CVPR01, 2001, pp. I:184–191.
[10] I. Haritaoglu, D. Harwood, and L.S. Davis, “W4s: A real-time system for detecting and tracking people in 2 1/2-d,” in ECCV98, 1998.
[11] I. Haritaoglu, D. Harwood, and L.S. Davis, “Hydra: Multiple people detection and tracking using silhouettes,” in VS99, 1999.
[12] T. Zhao, R. Nevatia, and F. Lv, “Segmentation and tracking of multiple humans in complex situations,” in CVPR01, 2001, pp. II:194–201.
[13] C. Stauffer and W.E.L. Grimson, “Learning patterns of activity using real-time tracking,” PAMI, vol. 22, no. 8, pp. 747–757, August 2000.
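As an illustration of the connection probability in Equation (1) of Section 3 and the pruning of low-probability connections, the following is a minimal Python sketch. The paper does not report weight values or a pruning threshold, so the defaults in `ConnectionWeights` and `min_prob` are assumptions chosen only for illustration, and the names `ConnectionWeights`, `connection_probability` and `prune_connections` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionWeights:
    """Hypothetical weights for Equation (1); the values are assumed, not from the paper."""
    w_a: float = 0.5  # weight of appearance similarity probability p_a
    w_p: float = 0.3  # weight of position closeness probability p_p
    w_s: float = 0.2  # weight of size similarity probability p_s


def connection_probability(p_a, p_p, p_s, weights=ConnectionWeights()):
    """Weighted combination of the three cues, as in Equation (1)."""
    return weights.w_a * p_a + weights.w_p * p_p + weights.w_s * p_s


def prune_connections(candidates, min_prob=0.1, weights=ConnectionWeights()):
    """Keep only candidate connections whose probability reaches min_prob.

    candidates maps an edge (from_node, to_node) to its (p_a, p_p, p_s) triple.
    min_prob is an assumed threshold; the paper only says very low
    probabilities are pruned.
    """
    kept = {}
    for edge, (p_a, p_p, p_s) in candidates.items():
        p_con = connection_probability(p_a, p_p, p_s, weights)
        if p_con >= min_prob:
            kept[edge] = p_con
    return kept
```

For instance, calling `prune_connections({("a", "b"): (0.9, 0.8, 0.7), ("a", "c"): (0.05, 0.1, 0.0)})` with the assumed defaults keeps only the edge `("a", "b")`, since the second candidate falls below the threshold.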
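The image likelihood of Equations (3) and (4) in Section 3 can be sketched in Python as follows, under stated assumptions. Pixels are represented as sets of `(row, col)` tuples, the term $\left|\sum_j B_j\right|$ is read as the sum of the per-node pixel counts, and the value of the constant `c` is assumed because the paper does not give it. The function name `image_likelihood` is hypothetical.

```python
import math


def image_likelihood(foreground, node_pixels, c=1.0):
    """Return (l_img, l_cov, l_comp) following Equations (3) and (4).

    foreground:  set of (row, col) foreground pixels, A in the paper.
    node_pixels: list of pixel sets, one per node or track, B_j in the paper.
    c:           smoothing constant; its value here is an assumption.
    """
    # Union of all node regions, then the foreground pixels it covers.
    covered_union = set().union(*node_pixels)
    covered_fg = len(foreground & covered_union)

    # Assumed reading of |sum_j B_j|: total pixel count over all nodes,
    # so overlapping nodes are counted more than once.
    total_node_area = sum(len(b) for b in node_pixels)

    l_cov = math.log((covered_fg + c) / (len(foreground) + c))
    l_comp = math.log((covered_fg + c) / (total_node_area + c))
    return l_cov + l_comp, l_cov, l_comp
```

Under this reading, a single node that exactly covers the foreground gives `l_cov` and `l_comp` equal to zero, while duplicating that node leaves `l_cov` unchanged and lowers `l_comp`, which matches the description of compactness as a penalty on overlapping nodes.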
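The likelihood accumulation of Equation (2) and the ranking performed in the hypotheses-management step of Section 3 can be sketched as below. This is an illustrative sketch, not the authors' implementation: the missing probability value, the handling of a hypothesis with no tracks, and the names `update_likelihood` and `keep_top_hypotheses` are assumptions introduced here.

```python
import heapq
import math

MISSING_PROB = 0.05  # assumed value; the paper only calls it "a small probability"


def update_likelihood(prev_likelihood, tracks, l_img, missing_prob=MISSING_PROB):
    """One step of Equation (2) for a single hypothesis.

    tracks is a list of (p_con, p_obj, p_trj) triples, one per trajectory.
    A p_con of None marks a missing detection in the current frame and is
    replaced by missing_prob. The caller is assumed to supply p_obj and
    p_trj in every case.
    """
    if not tracks:
        # Assumption: with no trajectories, only the image term contributes.
        return prev_likelihood + l_img
    total = 0.0
    for p_con, p_obj, p_trj in tracks:
        if p_con is None:
            p_con = missing_prob
        total += math.log(p_con) + math.log(p_obj) + math.log(p_trj)
    # Average over the n trajectories, then add the image likelihood.
    return prev_likelihood + total / len(tracks) + l_img


def keep_top_hypotheses(likelihoods, max_hypotheses):
    """Return the ids of the max_hypotheses most likely hypotheses, best first.

    likelihoods maps a hypothesis id to its accumulated likelihood.
    """
    return heapq.nlargest(max_hypotheses, likelihoods, key=likelihoods.get)
```

Because the log terms are averaged over the trajectories, hypotheses with different numbers of objects remain comparable when they are ranked; `max_hypotheses` plays the role of the limit on kept hypotheses, whose value the paper leaves unspecified.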