Real-Time Human Detection in Uncontrolled Camera Motion Environments
Mohamed Hussein
Wael Abd-Almageed
Yang Ran
Larry Davis
Institute for Advanced Computer Studies
University of Maryland
{mhussein, wamageed, rany, lsd}@umiacs.umd.edu
Abstract
1 Introduction

The problem of object detection and tracking in video sequences becomes much harder when the camera is allowed to move uncontrollably. If the object of interest is deformable, like a human, the problem becomes even more challenging. Nevertheless, several interesting applications are waiting for solutions to this problem. A modern car equipped with a digital camera and computer vision software for human detection can automatically avoid running over pedestrians. A military unmanned vehicle equipped with similar technology can automatically detect and deal with an enemy before being attacked.

In this paper, a real-time computer vision system for human detection and tracking on uncontrolled moving camera platforms is presented. The contribution presented in this paper is not in the algorithmic aspect of the system. Rather, our focus is on the system design and implementation aspects. Namely, our design was made to achieve two main goals: robustness and efficiency. Robustness was achieved through the integration of algorithms for human detection, tracking, and motion analysis in one framework, so that the final decision is based on the agreement of more than one algorithm. Efficiency was achieved through a multithreaded design and the use of a high-performance library. A final merit of our system is its object-oriented design. Object orientation was adopted to abstract the algorithmic details away from the system design so that we can easily experiment with different algorithms. Therefore, our system can be regarded as a testbed for different algorithms for human detection, tracking, and motion analysis.

The rest of the paper is organized as follows: Section 2 explores related work. Section 3 explains our system design in detail. Section 4 briefly introduces the algorithms used in our implementation. Section 5 presents experimental results. Finally, Sec. 6 concludes the paper.

2 Related Work
The modules of the system, from left to right, are: the frame grabbing module, the human detection module, the object tracking module, the motion analysis module, and finally the output module. The input of the system is video frames, obtained either online from a video camera or off-line from a video file or individual video frame files. The output can be shown online or stored to disk for further analysis.

Each of the modules runs in a separate thread, and data is passed from one module to another via a shared data structure. Details of inter-thread communication are given in Sec. 3.4.
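The thread-per-module design described above can be sketched as follows. This is a minimal illustration using standard C++ threads; the actual system uses the OpenThreads library [1], and the names here (SharedQueue, push, pop) are our own, not the system's actual interface.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// A minimal thread-safe FIFO used to pass data between module
// threads. push() hands an item to the next module; pop() blocks the
// consuming module until an item is available.
template <typename T>
class SharedQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(item));
        }
        cv_.notify_one();  // wake a module waiting in pop()
    }

    T pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
};
```

Each module thread then loops, popping from its input queue and pushing to its output queue, so the pipeline stages overlap in time.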
3 System Design

In this section, the details of our system design and implementation are explained. To make the paper self-contained, the algorithms used in our current implementation are briefly introduced in Sec. 4.
3.1 Objectives

3.2 System Architecture

3.3 System Components
Figure 1. System architecture. Video input flows through the stabilization, shape-based detection, object tracking, motion analysis, and output modules, and tracking results reach the output module through a frames queue and a tracks list. A queue element carries a frame pointer, to which a list of detections and later a track counter are added as the frame moves down the pipeline; each tracks-list element holds a track element list.
3.4 Inter-Thread Communications
This interface is a queue of structure elements. Each element contains a pointer to the frame, along with a list of the detections found in it. If the list of detections is empty, then either the detection algorithm was not run on this frame, or the detection algorithm did not find any human in it.
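A sketch of this queue-element interface is given below. All type names (Frame, Detection, QueueElement) are our own labels for illustration; the paper does not give its actual structure definitions.

```cpp
#include <deque>
#include <memory>
#include <vector>

// A detection: a bounding box in frame coordinates (illustrative).
struct Detection { int x, y, width, height; };

struct Frame { int index; /* pixel data omitted in this sketch */ };

// One element of the shared frames queue: a pointer to the frame plus
// the list of detections found in it. An empty list means either that
// the detector was not run on this frame or that it found no human.
struct QueueElement {
    std::shared_ptr<Frame> frame;
    std::vector<Detection> detections;
};

// The inter-module interface itself: a FIFO of queue elements.
using FramesQueue = std::deque<QueueElement>;
```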
4 Algorithms

In this section, the algorithms used in our implementation are briefly explained. We do not claim that the selected algorithms are the best ones for their functions. In our system design, any of these algorithms can be safely replaced by others as long as the interfaces between the different modules are preserved.

4.1 Stabilization Algorithm

4.2 Shape-Based Detection

The human detection algorithm used in our implementation was introduced in [3]. This algorithm searches for humans in the image by matching its edge features to a database of templates of human silhouettes. Examples of these templates are shown in Fig. 2. The matching is done by computing the average Chamfer distance [2] between the template and the edge map of the target image area. The image area under consideration must be of the same size as the template. Let the template T be a binary image that is 0 everywhere except at the silhouette pixels, where the value is 1, and let the Chamfer distance transform of the edge map of the target image area be denoted by C. The distance between a template T and the target image area I can then be computed by

D(I, T) = \frac{1}{|T|} \sum_i C_i T_i ,

where the sum is over image pixels i and |T| is the number of silhouette pixels, so that D(I, T) is the average of the distance-transform values at the silhouette pixels of T.
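The average Chamfer distance can be sketched as follows. The brute-force distance transform is for illustration only (a real implementation would use a two-pass chamfer algorithm), and the function names are ours, not the paper's.

```cpp
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Brute-force Euclidean distance transform of a binary edge map of
// size w x h: each output pixel holds the distance to the nearest
// edge pixel. O(n^2) per pixel; adequate only for illustration.
std::vector<double> distanceTransform(const std::vector<int>& edges,
                                      int w, int h) {
    std::vector<double> dt(static_cast<std::size_t>(w) * h, DBL_MAX);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int v = 0; v < h; ++v)
                for (int u = 0; u < w; ++u)
                    if (edges[v * w + u]) {
                        double d = std::hypot(double(x - u), double(y - v));
                        if (d < dt[y * w + x]) dt[y * w + x] = d;
                    }
    return dt;
}

// Average Chamfer distance D(I, T) = (1/|T|) * sum_i C_i * T_i, where
// C is the distance transform of the image edge map and T is the
// binary silhouette template; |T| is the number of silhouette pixels.
double chamferDistance(const std::vector<double>& C,
                       const std::vector<int>& T) {
    double sum = 0.0;
    int count = 0;
    for (std::size_t i = 0; i < T.size(); ++i)
        if (T[i]) { sum += C[i]; ++count; }
    return count ? sum / count : DBL_MAX;
}
```

A low D(I, T) means the silhouette pixels of the template all fall close to edges of the target image area, i.e., a good match.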
The templates in the database are organized in a hierarchy that is built off-line. That limits the comparison to a few templates and accelerates the search in the database. Details of building the hierarchy and matching the edge features to templates are explained in [3].

4.3 Object Tracking

p(Z_t \mid \lambda_t) = \prod_{k=1}^{d} \sum_{i \in \{f, s, w\}} m_{i,t} \, \mathcal{N}\bigl(Z_t(k); \mu_{i,t}(k), \sigma_{i,t}(k)\bigr),

where Z_t(k) denotes the k-th component of the observation at time t and the m_{i,t} are the mixture weights of the three components.

4.4 Motion Analysis
5 Experimental Results

Our system was tested on a set of challenging video sequences. It succeeded in demonstrating robustness and close to real-time performance (around 15 frames per second). In this section, we present the results of two sequences. In the figures presented, rectangular bounding boxes are the output of the detection algorithm. Green boxes are the detections that were verified by the motion analysis algorithm, and red boxes are the ones that were rejected by it. The reader is referred to the electronic version for clarity of results.

In the sequence shown in Fig. 3, there was only one false detection, and it was caught by the motion analysis. On the other hand, the sequence shown in Fig. 4 clearly shows the advantage of the verification step. The shape-based detection algorithm produced many false alarms due to high edge density on the left part of the scene. After tracking these false detections for a period of time, the motion analysis algorithm decided that they did not exhibit the periodic motion of a human.
Figure 3. Sample frames from the first sequence: (a) frame 1, (b) frame 22, (c) frame 45, (d) frame 65.

Figure 4. Sample frames from the second sequence: (a) frame 1, (b) frame 23, (c) frame 45, (d) frame 66.
6 Conclusion

In this paper, a real-time computer vision system for human detection, tracking, and verification in uncontrolled camera motion environments has been presented. The key features of our system are robustness and efficiency. Robustness was achieved via the integration of more than one algorithm, each of which uses a different visual cue to identify humans. Efficiency was achieved via a multi-threaded design with efficient inter-thread communication, and the use of a highly optimized software library. The system has demonstrated satisfactory performance on highly challenging video sequences. Our short-term plan is to further optimize our system and experiment with other algorithms. Our long-term plan is to extend the system to analyze human activities and evaluate threats.
References

[1] OpenThreads. http://openthreads.sourceforge.net/.
[2] H. G. Barrow. Parametric correspondence and chamfer matching: two new techniques for image matching. In International Joint Conference on Artificial Intelligence, pages 659-663, 1977.
[3] D. Gavrila. Pedestrian detection from a moving vehicle. In ECCV '00: Proceedings of the 6th European Conference on Computer Vision-Part II, pages 37-49, London, UK, 2000. Springer-Verlag.
[4] A. Hampapur, L. Brown, J. Connell, A. Ekin, N. Hass, M. Lu, H. Merkl, S. Pankanti, A. Senior, C.-F. Shu, and Y. L. Tian. Smart video surveillance. IEEE Signal Processing Magazine, pages 38-51, March 2005.
[5] A. Hampapur, L. Brown, J. Connell, N. Hass, M. Lu, H. Merkl, S. Pankanti, A. Senior, C.-F. Shu, and Y. Tian. S3-R1: The IBM smart surveillance system-release 1. In ACM SIGMM Workshop on Effective Telepresence, 2004.
[6] I. Haritaoglu, D. Harwood, and L. Davis. W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809-830, 2000.
[7] W. Hu, T. Tan, L. Wang, and S. Maybank. A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 34:334-352, 2004.
[8] T. B. Moeslund and E. Granum. A survey of computer vision-based human motion capture. Computer Vision and Image Understanding, 81:231-268, 2001.
[9] V. Morellas, I. Pavlidis, and P. Tsiamyrtzis. DETER: Detection of events for threat evaluation and recognition. Machine Vision and Applications, 15:29-45, 2003.
[10] Y. Ran, I. Weiss, Q. Zheng, and L. Davis. Pedestrian detection via periodic motion analysis. To appear, International Journal of Computer Vision.
[11] M. Spengler and B. Schiele. Towards robust multi-cue integration for visual tracking. Machine Vision and Applications, 14:50-58, 2003.