People_tracking_in_RGB-D_data_with_on-line_boosted_target_models

This paper presents a novel approach for people tracking using RGB-D data by integrating a multi-cue person detector with an online learning mechanism. The system employs a multi-hypothesis tracker that adapts to individual target models through on-line boosting, enhancing tracking performance without relying on background learning or ground plane assumptions. Experimental results demonstrate reliable 3D tracking of individuals in populated indoor environments using a setup of multiple Kinect sensors.


2011 IEEE/RSJ International Conference on

Intelligent Robots and Systems

September 25-30, 2011. San Francisco, CA, USA

People Tracking in RGB-D Data With On-line Boosted Target Models

Matthias Luber, Luciano Spinello, Kai O. Arras

Abstract: People tracking is a key component for robots that are deployed in populated environments. Previous works have used cameras and 2D and 3D range finders for this task. In this paper, we present a 3D people detection and tracking approach using RGB-D data. We combine a novel multi-cue person detector for RGB-D data with an on-line detector that learns individual target models. The two detectors are integrated into a decisional framework with a multi-hypothesis tracker that controls on-line learning through a track interpretation feedback. For on-line learning, we take a boosting approach using three types of RGB-D features and a confidence maximization search in 3D space. The approach is general in that it neither relies on background learning nor a ground plane assumption. For the evaluation, we collect data in a populated indoor environment using a setup of three Microsoft Kinect sensors with a joint field of view. The results demonstrate reliable 3D tracking of people in RGB-D data and show how the framework is able to avoid drift of the on-line detector and increase the overall tracking performance.

Fig. 1. People tracking in RGB-D data. The top pictures show the three color and depth images, below the 3D point cloud. The data was collected in the lobby of a large university canteen at lunch time with a setup joining the views of three Kinect sensors. The colored disks and dots in the point cloud show the positions and trajectories of five tracked persons.

All authors are with the Social Robotics Lab, Department of Computer Science, University of Freiburg, Germany {luber,spinello,arras}@informatik.uni-freiburg.de.

978-1-61284-456-5/11/$26.00 ©2011 IEEE

Authorized licensed use limited to: The University of Toronto. Downloaded on October 28,2022 at 02:53:05 UTC from IEEE Xplore. Restrictions apply.

I. Introduction

People detection and tracking is an important and fundamental component for many robots, interactive systems and intelligent vehicles. Popular sensors for this task are cameras and range finders. While both sensing modalities have advantages and drawbacks, their distinction may become obsolete with the availability of affordable and increasingly reliable RGB-D sensors that provide both image and range data.

Many researchers in robotics have addressed the issue of detecting and tracking people in range data. Early works were based on 2D data in which people have been detected using ad-hoc classifiers that find moving local minima in the scan [1], [2]. A learning approach has been taken by Arras et al. [3], where a classifier for 2D point clouds has been trained by boosting a set of geometric and statistical features.

People detection and tracking in 3D range data is a rather new problem with little related work. Navarro et al. [4] collapse the 3D scan into a virtual 2D slice to find salient vertical objects above ground and classify a person by a set of SVM classified features. Bajracharya et al. [5] detect people in point clouds from stereo vision by processing vertical objects and considering a set of geometrical and statistical features of the cloud based on a fixed pedestrian model. Unlike these works that require a ground plane assumption, Spinello et al. [6] overcome this limitation via a voting approach of classified parts and a top-down verification procedure that learns an optimal feature set and volume tessellation.

In the computer vision literature, the problem of detecting, tracking and modeling humans has been extensively studied [7], [8], [9], [10]. A major difference to range-based systems is that the richness of image data makes it straightforward to learn target appearance models. For this reason, visual tracking systems can achieve good results with methods as simple as independent particle filters with nearest-neighbor data association [11]. Dense depth data from stereo are used by Beymer and Konolige [12] to support foreground segmentation in an otherwise vision-based people detection and tracking system. They use a set of binary person templates to detect people in images and demonstrate multi-person tracking with learned appearance-based target models. The works of [13], [14] detect people in intensity images and track them in 3D. In [15] a stereo system for combining intensity images, stereo disparity maps, and optical flow is used to detect people. Multi-modal detection and tracking of people is performed in [16] where a trainable 2D range data and camera system is presented.

This paper advances the state of the art in the following aspects. First, we address the novel problem of
detecting and tracking people in RGB-D data. We combine an a priori person detector with an on-line learned person detector and a multi-hypothesis tracker (MHT), able to estimate the motion state of multiple people in 3D. Learning individual target models is a new aspect to range data-based object tracking that usually deals with targets of identical appearance. To this end, we adapt the on-line learning method from Grabner et al. [17] to RGB-D data. We present a novel framework to integrate the two detectors and the tracker that involves a track interpretation feedback to control learning. This enables the system to bridge gaps of misdetections of the a priori detector and handle target occlusions while avoiding drift of the on-line detector. Finally, we give quantitative results using the CLEAR MOT performance metric. Unlike the above mentioned works that integrate multiple sensory modalities, we consider image and range data as equally important cues for detection, tracking, and target model adaptation. We further present a novel integration framework to effectively combine a tracker with on-line learned target classifiers.

The paper is structured as follows: the a priori people detector is briefly summarized in the next section followed by the description of our on-line AdaBoost learning approach for target appearances in RGB-D data in Section III. The integration of this learning procedure into the tracking system is described in Section IV. Section V describes the experiments and gives the results. Section VI concludes the paper.

II. Detection of People in 3D Range Data

In this section we briefly summarize the a priori people detector used in this paper. We rely on a novel RGB-D person detector called Combo-HOD (Combined Histograms of Oriented Depths and Gradients). The method takes inspiration from Histogram of Oriented Gradients (HOG) introduced by Dalal and Triggs [7] and combines the HOG detector in the color image with a novel approach in the depth image called Histograms of Oriented Depths (HOD).

Since RGB-D data contains both color and depth information, the Combo-HOD detector combines the two sensory cues. HOD descriptors are computed in the depth image and HOG descriptors are computed in the color image. They are fused on the level of detections via a weighted mean of the probabilities obtained by a sigmoid fitted to the SVM outputs. HOD includes a depth-informed scale-space search in which the used scales in an image are first collected and then tested for compatibility with the respective depth. This test is made particularly efficient by the use of integral tensors, an extension of integral images over several scales. This strategy dramatically reduces the number of descriptors computed in the image at improved detection rates. For more details, the reader is referred to [18].

The output of the detector in each step are the positions and size of all targets in 3D space and the center and size of the bounding boxes in the depth images. They are the observations $z_i(t)$ that constitute the set of $m_t$ observations $Z(t)$ at time index $t$.

III. On-line Boosting

The detector described in the previous section learns a generic person model from a priori labeled data. In this section, we describe the use of on-line boosting to learn target appearance models in RGB-D data, later used to guide data association in the tracking system.

Boosting is a widely used technique to improve the accuracy of learning algorithms. Given training samples $x$ with labels $y$, a strong classifier $H(x)$ is computed as a linear combination of a set of weighted hypotheses called weak classifiers $h(x)$. The discrete AdaBoost algorithm by Freund and Schapire [19] belongs to the most popular boosting algorithms. The method trains weak classifiers from labeled training samples $(x, y)$, initialized with uniform weights $w_i$ associated to each $x$. Learning is done in rounds where the weights are updated based on the mistakes of the previous weak learner. By increasing the weights of the wrongly classified samples the algorithm focuses on the difficult examples.

On-line boosting, initially proposed by Oza and Russell [20], processes each training instance “on arrival” without the need of storage and reprocessing, and maintains a current hypothesis that reflects all the training samples seen so far. The approach has been applied for object detection while tracking by Grabner et al. [17]. We build upon the latter to develop our on-line people detector in RGB-D data.

A. Updating the Weak Classifiers

Unlike the off-line approach to boosting, the on-line algorithm presents training samples only once and discards them after training. The weak classifiers have thus to be updated in an on-line fashion each time a new training sample is available. As the difficulty of the samples is not known in advance, the computation of the weight distribution of the samples is a critical issue. The basic idea of on-line boosting is that the weight of a sample (called importance $\lambda$ in this context) can be estimated by propagating it through a fixed chain of weak classifiers [20]. If the sample is misclassified, $\lambda$ is increased proportional to the error of the weak classifier. Therefore, the importance has the same effect as the adapted weight in the off-line approach. The error of the $i$-th weak classifier is estimated from the summed weights of the correctly ($\lambda_i^{\text{corr}}$) and wrongly ($\lambda_i^{\text{wrong}}$) classified samples,

$$ e_i = \frac{\lambda_i^{\text{wrong}}}{\lambda_i^{\text{wrong}} + \lambda_i^{\text{corr}}}. \tag{1} $$

B. On-line Boosting for Feature Selection

For the purpose of learning target models during tracking, Grabner et al. [17] propose feature selectors.
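As a minimal sketch of the bookkeeping behind Eq. (1) and of the selector mechanism of [17] that this section goes on to describe, the following Python fragment keeps the summed importances per weak classifier, lets each selector pick the weak classifier with the lowest estimated error, and passes the importance λ down the chain of selectors. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the class names, the fixed threshold stumps that stand in for the Haar-like and Lab features, the initial value 1 for the summed importances, and the update factors 1/(2(1-e)) and 1/(2e), which follow the usual Oza-Russell form in which the importance grows when the selected weak classifier is wrong. Replacing the worst weak classifier is left out for brevity.

```python
import math


class WeakClassifier:
    """Fixed decision stump on one feature (a stand-in for a real feature)."""

    def __init__(self, feature_index):
        self.feature_index = feature_index
        self.lam_corr = 1.0   # summed importance of correctly classified samples
        self.lam_wrong = 1.0  # summed importance of wrongly classified samples

    def predict(self, x):
        return 1 if x[self.feature_index] > 0.0 else -1

    def update(self, x, y, lam):
        if self.predict(x) == y:
            self.lam_corr += lam
        else:
            self.lam_wrong += lam

    def error(self):
        # Eq. (1): e = lam_wrong / (lam_wrong + lam_corr)
        return self.lam_wrong / (self.lam_wrong + self.lam_corr)


class Selector:
    """Holds a pool of weak classifiers and exposes the currently best one."""

    def __init__(self, num_features):
        self.pool = [WeakClassifier(m) for m in range(num_features)]
        self.best = self.pool[0]
        self.alpha = 0.0  # voting weight of this selector

    def update(self, x, y, lam):
        for weak in self.pool:
            weak.update(x, y, lam)
        self.best = min(self.pool, key=WeakClassifier.error)
        e = min(max(self.best.error(), 1e-6), 1.0 - 1e-6)  # keep the log finite
        self.alpha = 0.5 * math.log((1.0 - e) / e)
        # Assumed Oza-Russell style update: shrink lam if correct, grow it if wrong.
        if self.best.predict(x) == y:
            return lam / (2.0 * (1.0 - e))
        return lam / (2.0 * e)

    def predict(self, x):
        return self.best.predict(x)


class StrongClassifier:
    """Chain of N selectors; the confidence is their alpha-weighted vote."""

    def __init__(self, num_selectors, num_features):
        self.selectors = [Selector(num_features) for _ in range(num_selectors)]

    def update(self, x, y):
        lam = 1.0  # every new sample starts with importance 1
        for selector in self.selectors:
            lam = selector.update(x, y, lam)

    def confidence(self, x):
        return sum(s.alpha * s.predict(x) for s in self.selectors)

    def classify(self, x):
        return 1 if self.confidence(x) >= 0.0 else -1


# Toy data: the label equals the sign of feature 0, feature 1 is uninformative.
samples = [((1.0, -1.0), 1), ((-1.0, -1.0), -1),
           ((2.0, 1.0), 1), ((-0.5, 1.0), -1)] * 5
clf = StrongClassifier(num_selectors=3, num_features=2)
for x, y in samples:
    clf.update(x, y)  # each sample is seen once and then discarded
```

The classifier is usable after every single call to `update`, which is the any-time property of the on-line procedure. In this toy run every selector settles on feature 0; a real pool would hold many randomly placed features per selector.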

The main idea is to apply on-line boosting not directly to the weak classifiers but to the selectors. A selector $h^{\text{sel}}$ selects the best weak classifier from a pool of $M$ weak learners $F$ with ‘best’ being defined by the lowest error. With the number of selectors $N$ being a fixed parameter, the following procedure is repeated for all selectors when a new sample $(x, y)$ arrives: First, all weak classifiers are updated and the best one, denoted $m^+$, is selected

$$ h_n^{\text{sel}}(x) = h_{m^+}^{\text{weak}}(x) \tag{2} $$

with $m^+ = \arg\min_m (e_{n,m})$ and $e_{n,m}$ defined like Eq. 1 with subscript $n,m$ for $i$. Then, the voting weight $\alpha_n = \frac{1}{2} \cdot \ln\left(\frac{1-e_n}{e_n}\right)$ is computed where $e_n = e_{n,m^+}$ and the updated importance weight $\lambda$ is propagated to the next selector $h_{n+1}^{\text{sel}}$. Similar to AdaBoost, $\lambda$ is decreased if $h_n^{\text{sel}}$ predicts $x$ correctly and increased otherwise, in line with the rule for misclassified samples given in Section III-A. The strong classifier is finally obtained by computing the confidence as a linear combination of the $N$ selectors and applying the signum function,

$$ \kappa(x) = \sum_{n=1}^{N} \alpha_n \cdot h_n^{\text{sel}}(x), \qquad H(x) = \operatorname{sign}(\kappa(x)). \tag{3} $$

Unlike the off-line version, the on-line procedure creates an always-available strong classifier in an any-time fashion. In order to increase the diversity of the classifier pool $F$ and to adapt to appearance changes of the targets, at the end of each iteration, the worst weak classifier is replaced by one randomly chosen from $F$.

C. Features

We take advantage of the richness of RGB-D data by computing three types of features that correspond to the weak classifiers: Haar-like features [21] in the intensity image (converted from the RGB values), Haar-like features in the depth image, and illumination agnostic Lab color features in the RGB image. Lab features are computed by summing up the intensity values in a* (b*) space under the area. The advantage of the Lab color model is that features in a* or b* space can compactly and robustly subsume entire RGB histograms.

A total of $M$ features is computed where the initial number of features is $M/3$ for all types. Given the above mentioned adaptation mechanism, their relative numbers can change to best describe a target dynamically.

The features are computed in rectangular areas sampled with randomized positions and scales in the bounding box associated to each target. This is done once at initialization and then kept fixed over the lifetime of a target (up to the weak features that get replaced). The best ten features of two persons are shown in Fig. 2.

Fig. 2. Bounding boxes of two detected persons in the RGB and depth images. The ten best features of each on-line detector are marked with colored rectangles. Haar-like features calculated on the intensity image are shown in green and Haar-like features computed on the depth image are marked in red. The Lab color features calculated on the RGB image are depicted in blue.

D. On-line Boosting for Tracking

On-line boosting enables a tracker to continuously update a target model to optimally discriminate it from the current background. This is a formulation of tracking as a classification problem [22] which is implemented by a confidence maximization procedure around the current tracking region. The region is obtained as the bounding box of the previous detection. All features within the region are considered the positively labeled foreground samples. The negative samples are obtained by sweeping the bounding box over a local neighborhood. The classifier is then evaluated at each sweep position of this neighborhood yielding a confidence map whose maximum is taken as the new position of the tracking region. The classifier is updated in this region and the process is continued. The evolution of the confidence values over time can be seen in Fig. 5.

Unlike [17] where the new region is bootstrapped from the previous detection, we use the bounding box position of the a priori detector to recenter the on-line detector. This strategy avoids a key problem of on-line adaptation, namely drifting of the model to background, clutter, or other targets.

IV. Integration into the Tracking System

In this section we describe how the on-line detector is integrated into a Kalman filter based multi-hypothesis tracking framework (MHT). For reasons of limited space, we will only discuss the aspects that change in the MHT and refer to [23], [24] for more details.

In short, the MHT algorithm hypothesizes about the target states by considering all statistically feasible assignments between measurements and tracks and all possible interpretations of measurements as false alarms or new tracks and of tracks as matched, occluded or obsolete. Thereby, the MHT handles the entire life-cycle of tracks from creation and confirmation to occlusion and deletion.

Formally, let $\xi(t) = (x_t \; y_t \; z_t \; \dot{x}_t \; \dot{y}_t \; \dot{z}_t)^T$ be the filtered state of a track at time $t$ with position and velocity information in 3D and $\Sigma$ its associated $6 \times 6$ covariance. Let $Z(t) = \{z_i(t)\}_{i=1}^{m_t}$ be the set of $m_t$ observations which in our case is the set of detected people in RGB-D data. Observations consist of a 3D position from the a priori detector $z_i(t)$ and a training sample $x_i(t)$ from the on-line detector. The sample $x_i(t)$ is a vector of stacked feature values computed in the rectangular areas within the current tracking region.

Let $\Omega_l(t)$ be the $l$-th hypothesis at time $t$ and $\Omega_{p(l)}^{t-1}$ the parent hypothesis from which $\Omega_l(t)$ was derived.
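To make the track state ξ(t) and its 6 × 6 covariance Σ concrete, the sketch below runs one Kalman prediction step in plain Python. The paper only states that the tracker is Kalman filter based with position and velocity in the state; the constant-velocity motion model, the time step `dt`, the process noise value `q`, and all function names are assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def constant_velocity_transition(dt):
    """6x6 transition matrix for the state (x, y, z, vx, vy, vz)."""
    f = identity(6)
    for i in range(3):
        f[i][i + 3] = dt  # position += velocity * dt
    return f


def predict(xi, sigma, dt, q=0.01):
    """One prediction step: xi' = F xi, Sigma' = F Sigma F^T + Q."""
    f = constant_velocity_transition(dt)
    xi_pred = [sum(f[i][j] * xi[j] for j in range(6)) for i in range(6)]
    sigma_pred = mat_mul(mat_mul(f, sigma), transpose(f))
    for i in range(3, 6):
        sigma_pred[i][i] += q  # hypothetical process noise on the velocities only
    return xi_pred, sigma_pred


# A track at (1, 2, 3) m moving with (0.3, 0, -0.3) m/s, predicted 0.1 s ahead.
xi = [1.0, 2.0, 3.0, 0.3, 0.0, -0.3]
sigma = identity(6)
xi_pred, sigma_pred = predict(xi, sigma, dt=0.1)
```

The predicted position and its growing uncertainty are what a tracker of this kind compares against incoming observations; the position block of `sigma_pred` widens with `dt` because the velocity uncertainty leaks into the position.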

Further, let $\psi_j(t)$ denote a set of assignments which associates predicted tracks to measurements in $Z(t)$. In each cycle, the method tries to associate the tracks in the parent hypotheses of the previous step to the new set $Z(t)$, producing all possible assignment sets $\psi(t)$ that each give birth to a child hypothesis that branches off its parent. This results in an exponentially growing hypothesis tree. Most practical MHT implementations prune the tree by Murty’s algorithm, which is able to generate and evaluate the current $k$ best hypotheses in polynomial time.

A. Joint Likelihood Data Association

The measurement likelihood in the regular MHT $p(z_i(t) \mid \psi_j^t, \Omega_{p(l)}^{t-1})$ consists of two terms, one for observations interpreted as new tracks and false alarms (which we leave unchanged) and a second one for matched observations $z_i(t)$ that follows the Gaussian likelihood model centered on the measurement prediction $\hat{z}_j(t)$ with innovation covariance matrix $S_{ij}(t)$, $p(z_i(t) \mid \psi_j^t, \Omega_{p(l)}^{t-1}) = \mathcal{N}(z_i(t) ; \hat{z}_j(t), S_{ij}(t))$. This likelihood quantifies how well an observation matches a predicted measurement based on position and velocity.

Here, the on-line classifier $H$ adds an appearance likelihood that expresses how much the observed target’s appearance matches the learned model. We thus have a joint likelihood that accounts for both motion state and appearance. With $x_i(t)$ being the feature descriptor of $z_i(t)$, $\mathbf{z}_i(t) = (z_i(t), x_i(t))$, and assuming independence between the two terms,

$$ p(\mathbf{z}_i(t) \mid \psi_j^t, \Omega_{p(l)}^{t-1}, H^{t-1}) = p(z_i(t) \mid \psi_j^t, \Omega_{p(l)}^{t-1}) \cdot p(x_i(t) \mid H^{t-1}). \tag{4} $$

We also model the appearance likelihood to be a Gaussian pdf centered on the maximum confidence of the strong classifier (which is 1)

$$ p(x_i(t) \mid H^{t-1}) = \mathcal{N}(\kappa(x_i(t)) ; 1, \sigma_a^2), \tag{5} $$

where $\sigma_a^2$ is the variance of the Gaussian and a smoothing parameter to trade off the two likelihoods.

B. Feeding Data Association Back to On-line Boosting

In each cycle, the tracker produces assignments of measurements to tracks and interpretations of measurements as new tracks or false alarms and of tracks as occluded or deleted. This information directly serves the on-line boosting algorithm to create and update the strong classifiers:

• When an observation $z_{\text{new}}$ is declared as a new target, a new track $t_{\text{new}}$ is initialized and a new strong classifier $H_{\text{new}}$ is created at the bounding box position of the hypothesis of the a priori detector.

• When an existing target $t_i$ is associated to an observation $z_j(t)$, the strong classifier $H_i^t$ is updated using the features $x_j(t)$ calculated within the new bounding box of the a priori detector. The on-line detector is centered at this new bounding box position.

• When the MHT declares a track as occluded, there are two possible reasons: an occlusion or a misdetection. To cope with both cases, we proceed as follows: Given the on-line learned model, we search for targets without valid observations by centering a 3D confidence map around the motion prediction of the Kalman filter. The map size is proportional to the uncertainty of the prediction, and the confidence values are calculated using the projections of the 3D positions into image space. This is unlike [17] in which this search is carried out in image space and with a fixed-size search window. If a high-confidence match can be found, we interpret the event as a misdetection and make the confidence maximum an observation $z^*(t)$. Otherwise, we interpret the event as a target occlusion and stop on-line learning of the corresponding strong classifier until the target reappears. This strategy also avoids drifting of the model to background, clutter, or other targets.

Observations $z^*(t)$ from the on-line detector are treated like regular observations for the MHT.

Fig. 3. The decisional framework to integrate both detectors and the tracking system. (Block diagram; recoverable labels: Sensory data, A Priori Detector, On-line Detector, Multi-Hypothesis Tracker, Detection hypotheses z, Detection hypotheses z*, Bounding box positions, Measurement and track interpretations (matched, new, occluded), 3D target motion states; legend: Data, Controls.)

Fig. 4. The setup consisting of three vertically mounted Kinect sensors offering a joint field of view of 130° × 50° and supplying RGB-D data with a resolution of 1440 × 640 pixels at 30 Hz. They are mounted at 1.2 m height.

V. Experiments

To evaluate and compare the different detector approaches, we collected a large-scale indoor data set with unscripted behavior of people. The data set has been taken in the lobby of a large university canteen at lunch time. The a priori detector has been trained with an additional background data set collected in another, visually different university building. This is to avoid detector bias towards the visual appearance of the canteen lobby, especially since we acquired the data from a stationary

sensor. The data set has been manually annotated to include the bounding box in 2D depth image space, the visibility of subjects (fully visible/partially occluded), and the data association ground truth of the tracks. A total of 3021 instances of people in 1133 frames and 31 tracks have been labeled. The data set will be made available on the laboratory webpage at publication date of this paper.

The sensory setup for data collection is shown in Fig. 4. It consists of three vertically mounted Kinect sensors that jointly extend the field of view to 130° × 50°. Measures have been taken to calibrate the intrinsics and extrinsics of the setup and to guarantee synchronized acquisition of the three images at frame rate.

The parameters of the MHT have been learned from a training data set over 600 frames. The detection probability is set to $p_{\text{det}} = 0.99$ and the termination likelihood to $\lambda_{\text{del}} = 30$. The average rates of new tracks and false alarms are determined to be $\lambda_{\text{new}} = 0.001$ and $\lambda_{\text{fal}} = 0.005$, respectively. Further, the maximal number of hypotheses $N_{\text{Hyp}}$ is set to 100. The strong classifiers of the targets are based on 50 selectors which are trained with 50 weak hypotheses.

To assess the impact of the on-line boosting on the tracking performance we run the tracker with the a priori detector only to obtain a baseline. All following runs are then compared using the CLEAR MOT metrics [25]. The metric counts three numbers with respect to the ground truth that are incremented at each frame: misses (missing tracks that should exist at a ground truth position, FN), false positives (tracks that should not exist, FP), and mismatches (track identifier switches, ID). The latter value quantifies the ability to deal with occlusion events that typically occur when tracking people. From these numbers, two values are determined: MOTP (avg. metric distance between estimated targets and ground truth) and MOTA (avg. number of times of a correct tracking output with respect to the ground truth). We ignore MOTP as it is based on a metric ground truth of target positions which is unreliable in our data.

A. Results

First, we analyze the confidence values of the strong classifier $H$ and the integration framework in different situations (see Fig. 5). Person 1 traverses the sensor field of view without interference with other targets. After an initialization phase of nearly ten frames, the on-line detector has adapted to its appearance and achieves steady state at a value of around 0.8. Person 2 undergoes two occlusions. During the occlusions the confidence values drop immediately, indicating that the target is no longer visible. As the MHT correctly declares the target as occluded, adaptation of $H$ is paused and resumed with high confidences after the person reappears. We have further investigated the usage statistics of the three feature types of the on-line detector. They are generally used with similar frequency and importance.

Fig. 5. Evolution of the confidence of the on-line detector. The top image shows the confidences over the life cycle of a track. After initialization the values achieve steady state. Person 2 is occluded twice between frames 172 to 185 and frames 192 to 199. Thanks to the feedback from the MHT tracker, the on-line detector pauses its adaptation. This strategy avoids drifting of the model to background, clutter, or other targets. When the person reappears, adaptation is resumed immediately with high confidence.

Fig. 6. Visualization of the 3D point cloud produced by the three Kinect sensors including the positions and trajectories of eight of 31 tracks in the data set. The colored disks mark the current Kalman filter estimates of the target positions, the small dots show their past trajectories. The tracker maintains full 3D estimates as can be seen from the dark blue trajectory of the subject coming down the stairs.

We then compare the on-line boosting approach to the baseline using the CLEAR MOT metrics. The results show a clear improvement of all values except for the number of false positives (see Table I). We manually inspected the behavior of the tracker and discuss the insights gained.

The strongest impact of the presented approach is the reduction of the number of missed targets by 50%. This improvement is caused by the on-line observations $z^*$. When the a priori detector fails to detect an existing track in several consecutive frames, the best MHT hypothesis will eventually (and wrongly) declare the track as deleted. When this happens, the miss count (FN) is increased at each frame until the detector finds the target again and creates a new track. This is where the $z^*$ observations come into play by detecting the target from the on-line learned model. Given a $z^*$, the MHT can match the target and correctly continue the track. This benefit comes at the expense of a delayed deletion of tracks that are incorrectly created from false positives of the a priori detector. In this case, the on-line detector tries to continue the track with the same strategy, leading to an increase of the number of false positives (FP) by 19%. We observed that this happens for recurring false positive detections on static objects to which the on-line detector can adapt particularly well.

The improvement in the number of id switches (ID) is achieved by the joint likelihood model that guides data association in situations of interacting and thus occluding targets.

TABLE I
CLEAR MOT results.

                      FN     FP     ID    MOTA
Baseline            1502    168     42     62%
On-line boosting     751    201     32     78%
Improvement          50%   -19%    24%     16%

The fact that this number is not higher is due to the unscripted behavior of people in our data set. At the particular place of data collection, subjects mainly walked past rather than creating situations that stress the occlusion handling capability of the tracker.

VI. Conclusions

In this paper we presented a novel 3D people detection and tracking approach in RGB-D data. We combined on-line learning of target appearance models using three types of RGB-D features with multi-hypothesis tracking. We proposed a decisional framework to integrate the on-line person detector, an off-line learned a priori detector and a multi-hypothesis tracker. The framework enables the tracker to support the on-line classifier in training only on the correct samples and to guide data association via a joint motion and appearance likelihood. It also avoids the key problem of on-line adaptation, namely drifting of models to background, clutter, or other targets, by resetting the detection window at the location of the a priori detector and pausing adaptation in case of occlusions. The framework further allows filling gaps of false negatives from the a priori detector by observations of the on-line detectors found by confidence maximization search in 3D space.

The experiments show a clear overall improvement of the tracking performance, particularly in the number of missed tracks and also in the number of identifier switches. They demonstrate that the on-line classifier contributes to finding the correct observations in cases when the a priori detector fails. This reduces the number of missed tracks by 50%. Further, the joint data association likelihood helps to decrease the number of track identifier switches by 24%. The overall tracking accuracy (MOTA) is improved by 16 percentage points, from 62% to 78%.

Future work will focus on the collection and annotation of more RGB-D data sets containing a variety of challenging social situations that stress more aspects of this approach.

Acknowledgment

This work has been supported by the German Research Foundation (DFG) under contract number SFB/TR-8.

References

[1] A. Fod, A. Howard, and M. Matarić, “Laser-based people tracking,” in Int. Conf. on Robotics and Automation (ICRA), 2002.
[2] D. Schulz, W. Burgard, D. Fox, and A. Cremers, “People track-
[3] K. O. Arras, O. Martínez Mozos, and W. Burgard, “Using boosted features for the detection of people in 2d range data,” in Int. Conf. on Robotics and Automation (ICRA), 2007.
[4] L. Navarro-Serment, C. Mertz, and M. Hebert, “Pedestrian detection and tracking using three-dimensional LADAR data,” in International Conference on Field and Service Robotics, Cambridge, USA, 2009.
[5] M. Bajracharya, B. Moghaddam, A. Howard, S. Brennan, and L. Matthies, “Results from a real-time stereo-based pedestrian detection system on a moving vehicle,” in Workshop on People Detection and Tracking, IEEE ICRA, Kobe, Japan, 2009.
[6] L. Spinello, M. Luber, and K. O. Arras, “Tracking people in 3D using a bottom-up top-down people detector,” in Int. Conf. on Robotics and Automation (ICRA), Shanghai, China, 2011.
[7] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), San Diego, USA, 2005.
[8] B. Leibe, E. Seemann, and B. Schiele, “Pedestrian detection in crowded scenes,” in Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), San Diego, USA, 2005.
[9] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), Anchorage, USA, 2008.
[10] M. Enzweiler and D. Gavrila, “Monocular pedestrian detection: Survey and experiments,” IEEE Trans. on Pattern Analysis and Machine Intell. (PAMI), vol. 31, no. 12, 2009.
[11] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. V. Gool, “Online multi-person tracking-by-detection from a single, uncalibrated camera,” IEEE Trans. on Pattern Analysis and Machine Intell. (PAMI), vol. 33, no. 9, 2011.
[12] D. Beymer and K. Konolige, “Real-time tracking of multiple people using stereo,” in ICCV Workshop on Frame-rate Applications, Kerkyra, Greece, 1999.
[13] A. Ess, B. Leibe, K. Schindler, and L. V. Gool, “Robust multi-person tracking from a mobile platform,” IEEE Trans. on Pattern Analysis and Machine Intell. (PAMI), vol. 31, no. 10, pp. 1831–1846, 2009.
[14] B. Leibe, K. Schindler, N. Cornelis, and L. V. Gool, “Coupled object detection and tracking from static cameras and moving vehicles,” IEEE Trans. on Pattern Analysis and Machine Intell. (PAMI), pp. 1683–1698, 2008.
[15] M. Enzweiler, A. Eigenstetter, B. Schiele, and D. Gavrila, “Multi-cue pedestrian classification with partial occlusion handling,” in Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), 2010.
[16] L. Spinello, R. Triebel, and R. Siegwart, “Multiclass multimodal detection and tracking in urban environments,” Int. Journal of Robotics Research, vol. 29, no. 12, pp. 1498–1515.
[17] H. Grabner and H. Bischof, “On-line boosting and vision,” in Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), New York, USA, 2006.
[18] L. Spinello and K. O. Arras, “People detection in RGB-D data,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), San Francisco, USA, 2011.
[19] Y. Freund and R. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” in Computational Learning Theory, 1995.
[20] N. C. Oza and S. Russell, “Online bagging and boosting,” in Artificial Intelligence and Statistics, 2001, pp. 105–112.
[21] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proc. of the IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR), vol. 1, pp. 511–518, 2001.
[22] S. Avidan, “Support vector tracking,” IEEE Trans. on Pattern Analysis and Machine Intell. (PAMI), vol. 26, no. 8, 2004.
[23] D. B. Reid, “An algorithm for tracking multiple targets,” IEEE Transactions on Automatic Control, vol. 24, no. 6, 1979.
[24] I. J. Cox and S. L. Hingorani, “An efficient implementation of Reid’s multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 18, no. 2, pp. 138–150, 1996.
ing with a mobile robot using sample-based joint probabilistic [25] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object
data association filters,” International Journal of Robotics tracking performance: the CLEAR MOT metrics,” EURASIP
Research (IJRR), vol. 22, no. 2, pp. 99–116, 2003. Journal on Image and Video Processing, vol. 2008, 2008.


