
EXPERIMENTS WITH PATCH-BASED OBJECT CLASSIFICATION

R. G. J. Wijnhoven (1,2) and P. H. N. de With (2,3)
(1) Bosch Security Systems B.V., Eindhoven, The Netherlands
(2) Technische Universiteit Eindhoven, Eindhoven, The Netherlands
(3) LogicaCMG, Eindhoven, The Netherlands

Abstract

We present and experiment with a patch-based algorithm for object classification in video surveillance. A feature vector is calculated based on template matching of a large set of image patches within detected regions-of-interest (ROIs, also called blobs) of moving objects. Instead of matching direct image pixels, we use Gabor-filtered versions of the input image at several scales. We present results for a new, typical video surveillance dataset containing over 9,000 object images. Additionally, we show results for the PETS 2001 dataset and another dataset from the literature. Because our algorithm is not invariant to object orientation, the set was split into four subsets with different orientations, and we show the improvement that results from taking the object orientation into account. Using 50 or more training samples, our resulting detection rate is on average above 95%, which improves to 98% when the orientation is considered. Because of the inherent scalability of the algorithm, an embedded system implementation is well within reach.

1 Introduction

In video surveillance systems, a shift is currently taking place from content-agnostic video streams, generated by traditional cameras, towards smart video processing inside the cameras. This processing aims at generating a notion of activity in the monitored scene by means of Video Content Analysis (VCA). State-of-the-art VCA systems comprise object detection and tracking, thereby generating location data of key objects in the video imagery of each camera. For video surveillance, this technology can be used to effectively assist security personnel.

While the detection and tracking algorithms are becoming mature, the classification of the detected objects is still at an early stage. This classification is commonly performed using the size of the object blob, where simple camera calibration is applied to compensate for the perspective. However, effects such as shadows and occlusion negatively influence the segmentation process and thus the object classification. For improved scene understanding, more advanced object models are required, taking specific object features from the video into account. The aim of our object modeling is to classify various objects in a reliable way, thereby supporting the decision-making process for a security operator of a CCTV surveillance system.

The main disadvantage of using wire-frame models (as proposed in [1]) is that a new model has to be designed for each object. As an alternative, we attempt to find a more general approach that applies to more object classes. Therefore, in this paper we study a patch-based algorithm as proposed by Serre et al. [2]. In this technique, the computationally expensive stage of template and pattern matching is independent of the number of object classes, and the classification is performed afterwards, on a subset of the data, using feature vectors. The contribution of this paper is in applying this generic algorithm to a surveillance dataset and its corresponding objects. Classification results on the dataset show that a classification rate above 95% is possible for this algorithm. Separating the new dataset into subsets containing different object orientations improves this rate to 98%. Additionally, we show results for the PETS 2001 dataset and the dataset presented by Ma and Grimson [3].

The remainder of the paper is as follows. In Section 2, related work is presented. Section 3 discusses the model that we use for object classification. The datasets are introduced in Section 4, where the new dataset is also divided into subsets containing different object orientations. The results of the algorithm are presented for all three datasets in Section 5. The paper ends with conclusions and future work.

2 Related work

Model-based object classification/detection approaches are based on two different classes of models: rigid (non-deformable) and non-rigid (deformable) models. Rigid models are commonly used for the detection of objects like vehicles, whereas non-rigid models are typically used for person detection.

In the following, we consider three types of algorithms.

In various surveillance systems, classification methods are commonly based on the pixel-size of the object blob. More advanced algorithms for traffic surveillance match 3D wire-frame models onto the input image for the purpose of object tracking or classification. Within the domain of generic object recognition in large multimedia databases, various proposed algorithms are based on low-level local descriptors that model the object's appearance. Each of the three methods will now be addressed briefly.

Methods that only consider the blob area information are the simplest object models and are computationally inexpensive. Systems that segment the camera input images into a static background image and moving foreground blobs (e.g. [4]) already provide some information about the detected objects, e.g. pixel-size and -speed. This concept is used by Bose and Grimson [5]. Haritaoglu et al. [6] apply projection histograms in x- and y-direction for tracked objects to make a distinction between various object types, which also leads to perspective invariance.

Wire-frame models have been proposed for the purpose of model-based object detection and tracking [1]. For a more complete overview, we refer to previous work of the authors [7], where rigid object models have been considered for the purpose of vehicle classification. That algorithm is based on finding the best matching image position for all models in the database by projecting each 3D wire-frame model onto the 2D camera image. The projected 2D line-set is shifted over the image region and a matching error is calculated for each pixel position. The position giving the smallest error defines the best matching pixel position, and the model with the lowest matching error is chosen as the classified object model.

Low-level image features describing the object appearance are used by several object recognition systems. Haar wavelets are commonly used because of their low computational complexity [8][9]. Dalal and Triggs [10] compare the performance of Haar wavelets, PCA-SIFT [11] and Histogram of Oriented Gradients (HoG) methods, and show that the HoG method outperforms the others. Ma and Grimson [3] propose a method based on SIFT for the purpose of vehicle classification in traffic video using a constant viewpoint.

Serre et al. [2] model findings from biology and neuroscience using a hierarchical feed-forward architecture. The model is shown to have performance in line with human subjects, considering the first 150 milliseconds of the human visual system in a simple binary classification task [12]. As previously mentioned, the advantage of this approach is that the image analysis part is independent of the number of object classes. For this reason, the algorithm is suited for embedded implementation and was therefore adopted for further exploration.

3 Software model

Since humans are good at object classification, it is reasonable to look into biological and neurological findings. Based on findings from Hubel and Wiesel [13], Riesenhuber and Poggio have developed the "HMAX" model [14], which has been extended recently by Serre et al. [2] and optimized by Mutch and Lowe [15]. We have implemented the model proposed by Serre et al. [2]. For completeness, we shortly address the operation of the algorithm in the following. A simplified graphical representation of the model for the classification of objects detected by a video camera is shown in Figure 1, where the first step of object detection is described in [4].

Figure 1: Architecture for the classification of objects in the camera image: object detection, followed by the S1, C1, S2 and C2 layers (with a prototype database) for feature vector generation, and an SVM classifier.

The algorithm is based on the concept of a feedforward architecture, alternating between simple and complex layers, in line with the findings of Hubel and Wiesel [13]. The first layer (S1) is implemented as a series of line detectors, where each detector filters the graylevel input image with a Gabor filter of a specific size, so as to obtain scale invariance for the series. The filters are normalized to have zero mean and a unit sum of squares. The filter size increases from 7 x 7 elements (at scale zero) to 37 x 37 (at scale 15). Applying the Gabor filters to the input image results in a set of filtered images. As an example, the filter response for the input image of a car is shown in Figure 2. Note that only the filter response for one particular filter size (scale) is shown, but at all four orientations.

Figure 2: Gabor filter response (filter size 7 x 7 elements) on the input image of a car (scaled to 140 pixels in height).
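As an illustration of this S1 stage, the sketch below builds such a bank of zero-mean, unit-energy Gabor filters and applies it to a graylevel image. The filter sizes and the four orientations follow the description above; the aspect ratio gamma and the fits for sigma and lambda are assumptions in the spirit of [2] and common HMAX implementations, not values quoted from this paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_filter(size, theta, sigma, lam, gamma=0.3):
    """Gabor filter normalized to zero mean and a unit sum of squares."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x0 = x * np.cos(theta) + y * np.sin(theta)
    y0 = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * x0 / lam)
    g -= g.mean()                       # zero mean
    return g / np.sqrt((g ** 2).sum())  # unit sum of squares

SIZES = range(7, 39, 2)                            # 16 scales: 7x7 .. 37x37
THETAS = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # four orientations

def s1_layer(image):
    """S1 maps indexed by [scale][orientation] for a graylevel image (2D array)."""
    maps = []
    for size in SIZES:
        sigma = 0.0036 * size ** 2 + 0.35 * size + 0.18  # assumed fit to [2]
        lam = sigma / 0.8                                # assumed sigma/lambda ratio
        maps.append([np.abs(fftconvolve(image, gabor_filter(size, t, sigma, lam),
                                        mode='same')) for t in THETAS])
    return maps
```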

The next layer in the processing chain, as depicted in Figure 1, is the complex layer C1. This processing step obtains invariance in both the spatial dimensions and in the dimension of scale. Considering the dimension of scale, two S1 feature maps at consecutive scales (132 elements in height for scale zero) are element-wise maximized. This generates one feature map for every two scales; the combination of several scales results in a band. Next, in order to obtain spatial invariance, the maximum is taken over a local spatial neighborhood around each pixel and the resulting image is sub-sampled. Because of the down-sampling, the number of C1 features is much lower than the number of S1 features. The resulting C1 feature maps for the input image of the car in Figure 2 (33 elements in height at band zero and 12 at band 7) are shown in Figure 3.

Figure 3: C1 feature maps for the S1 responses from Figure 2 (at band 0). The C1 maps are re-scaled for visualization.
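A minimal sketch of this C1 stage follows. The paper does not list the pooling-grid sizes per band; the values below (8 to 22 elements, with a stride of half the grid) follow common implementations of [2] and are assumptions.

```python
import numpy as np

def local_max_pool(m, size, stride):
    """Maximum over a size x size neighborhood, sub-sampled with the given stride."""
    h, w = m.shape
    return np.array([[m[i:i + size, j:j + size].max()
                      for j in range(0, w - size + 1, stride)]
                     for i in range(0, h - size + 1, stride)])

def c1_layer(s1_maps, pool_sizes=(8, 10, 12, 14, 16, 18, 20, 22)):
    """Combine 16 S1 scales into 8 C1 bands (four orientations per band)."""
    bands = []
    for band, pool in enumerate(pool_sizes):
        per_orientation = []
        for o in range(4):
            # element-wise maximum over two consecutive scales ...
            a, b = s1_maps[2 * band][o], s1_maps[2 * band + 1][o]
            hh, ww = min(a.shape[0], b.shape[0]), min(a.shape[1], b.shape[1])
            scale_max = np.maximum(a[:hh, :ww], b[:hh, :ww])
            # ... followed by a local spatial maximum and sub-sampling
            per_orientation.append(local_max_pool(scale_max, pool, pool // 2))
        bands.append(per_orientation)
    return bands
```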
The next layer (S2) in the processing chain of the model applies template matching of image patches to the C1 feature maps. This can be compared to the simple layer S1, where the filter response is generated for several Gabor filters; here, the template matching is done for several image patches (prototypes). These patch prototypes are extracted from natural images at a random band and spatial location, at the C1 level. Each prototype contains all four orientations, and prototypes are extracted at four different sizes: 4 x 4, 8 x 8, 12 x 12 and 16 x 16 elements. Hence, a 4 x 4 patch contains 64 C1 elements.

The response of a prototype patch P over the C1 feature map C of the input image I is defined by a radial basis function that normalizes the response to the patch size considered, as proposed by Mutch and Lowe [15].

Examples of image patches (prototypes) and the corresponding S2 responses are shown in Figure 4 for the car image from Figures 2 and 3. Note that we only show two patch prototypes, each of size 4 x 4 C1 elements.

Figure 4: Patch response of 4 x 4 C1 elements for two example patches. The right eight images represent the S2 feature maps at each band. The top prototype results in higher responses in the medium bands, whereas the lower prototype gives a higher reaction in the lower bands.
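The exact radial basis function is not spelled out here; one common form, following Mutch and Lowe [15], is R(X, P) = exp(-||X - P||^2 / (2 sigma^2 alpha)), where X is a C1 window of the same size as prototype P and alpha grows with the patch size. The sketch below assumes sigma = 1 and alpha = (n/4)^2 for an n x n patch.

```python
import numpy as np

def s2_response(c1_band, prototype, sigma=1.0):
    """Slide a C1-level prototype over one C1 band; return the RBF response map.

    c1_band and prototype are arrays of shape (4, h, w): four orientations stacked.
    """
    n = prototype.shape[1]                 # patch size (4, 8, 12 or 16)
    alpha = (n / 4.0) ** 2                 # assumed patch-size normalization
    _, h, w = c1_band.shape
    out = np.empty((h - n + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = c1_band[:, i:i + n, j:j + n]
            out[i, j] = np.exp(-np.sum((window - prototype) ** 2)
                               / (2.0 * sigma ** 2 * alpha))
    return out
```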
The last step in the processing architecture is the extraction of the most relevant response. For each patch prototype considered, the maximum patch response over all bands and all spatial locations is stored as the final value in the feature vector. Therefore, the final feature vector has a dimensionality equal to the number of prototype patches used. In our implementation, we used 1,000 prototype patches. Note that by considering a higher or lower number of C1 patch prototypes, the required computation power can be scaled linearly.
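Putting the previous sketches together, the resulting C2 feature vector can be sketched as follows (s1_layer, c1_layer and s2_response refer to the earlier sketches):

```python
import numpy as np

def c2_features(image, prototypes):
    """One global-max (C2) value per prototype: the image's feature vector."""
    bands = c1_layer(s1_layer(image))      # from the earlier sketches
    feats = []
    for proto in prototypes:
        n = proto.shape[1]
        responses = [s2_response(np.array(band), proto).max()
                     for band in bands
                     if band[0].shape[0] >= n and band[0].shape[1] >= n]
        # the maximum response over all bands and all spatial locations
        feats.append(max(responses))
    return np.array(feats)                 # length = number of prototypes (1,000)
```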
In order to classify the resulting C2 feature vector, we use a one-vs-all Support Vector Machine (SVM) classifier with a linear kernel. The SVM with the highest output score defines the output class of the feature vector. The Torch3 library [16] was used for the implementation of the SVM.
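The authors used Torch3; purely as an illustration, an equivalent one-vs-all setup can be sketched with scikit-learn's LinearSVC, which trains one linear SVM per class and predicts the class with the highest output score:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_and_score(X_train, y_train, X_test, y_test):
    """One-vs-all linear SVM over C2 feature vectors; returns the detection rate."""
    clf = LinearSVC()            # one linear SVM per class (one-vs-rest)
    clf.fit(X_train, y_train)    # the class with the highest SVM score wins
    return np.mean(clf.predict(X_test) == y_test)
```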

4 Dataset

Most available datasets for classification focus on the domain of generic object detection, whereas surveillance-specific datasets have been created for the purpose of object tracking and therefore contain a strictly limited number of different objects. For the purpose of object classification, a high number of different objects is required. Ma and Grimson [3] presented a limited dataset for separating various car types. Since future smart cameras should be able to make a distinction between more object classes, we have created a new dataset.

A one-hour video capture was made at CIF resolution (352 x 288 pixels) from a single, static camera monitoring a traffic crossing. After applying the tracking algorithm proposed by the authors of [4], the resulting object images (of size 10-100 pixels) were manually adjusted where required, to obtain a clean blob extraction and avoid any possible negative interference with the new algorithm. For this reason, redundant images, images of occluded objects and images containing false detections have been removed. Because of the limited time-span of the recording, the scene conditions do not change significantly. The final dataset contains 9,233 images of objects.

The total object set has been split into the following 13 classes: trailers, cars, city buses, Phileas buses (the name of a specific type of bus), small buses, trucks, small trucks, persons, cleaning cars, bicycles, jeeps, combos and scooters. Some examples of each object class are shown in Figure 5.

Figure 5: Surveillance dataset Wijnhoven 2006.

Next to the new surveillance dataset, we used the dataset from Ma and Grimson [3] and the PETS 2001 dataset (Dataset 1, Cameras 1 & 2, testing sets only; the complete dataset is available from http://ftp.pets.rdg.ac.uk/). Results are presented in the next section.

4.1 Orientation separation

The detection performance of the human visual system depends on the 3D viewpoint of the objects learned. Logothetis et al. [17] demonstrate this view-dependence of the visual system and state that detection performance decreases with a deviation in viewpoint angle. When the object rotation increases above roughly 30 degrees, the detection performance decreases drastically.

Since some orientation measure is already produced by the tracking algorithm, we can use this knowledge a priori. To be independent of the tracking performance, the object orientations have been manually annotated for all images in the dataset. For each object class of the total set of C object classes, we create N new classes, where N equals the number of quantized orientations. However, note that since the object orientation is given a priori, we use N independent classification systems, individually trained for the corresponding object orientations.

The original dataset was split into four main orientation bins, as shown in Figure 6. To compensate for the perspective, the left and right bins comprise 60 degrees, and the top and bottom bins comprise 120 degrees.

Figure 6: Division into four orientation bins, taking the perspective of the recorded scene into account.
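A sketch of this quantization follows. The paper does not state where the bins are centered; centering the narrower left/right bins on the horizontal direction of motion is an assumption:

```python
def orientation_bin(angle_deg):
    """Quantize an object orientation (degrees, 0 = moving right) into 4 bins.

    Left/right bins span 60 degrees, top/bottom bins 120 degrees, to
    compensate for the perspective of the recorded scene (Figure 6).
    """
    a = angle_deg % 360
    if a < 30 or a >= 330:
        return "right"   # 60-degree bin centered on 0
    if a < 150:
        return "top"     # 120-degree bin
    if a < 210:
        return "left"    # 60-degree bin centered on 180
    return "bottom"      # 120-degree bin

# Each bin then gets its own independently trained classifier, and a test
# image is routed to the classifier matching its annotated orientation.
```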
5 Results

This section shows the results of the object classification on the surveillance datasets presented in Section 4. Each image is first converted to grayscale and scaled to 140 pixels in height while maintaining the aspect ratio; images containing a pedestrian are scaled to 280 pixels in height. The total set of images for each class is randomly divided into a training and a test set. For the training set, the number of samples is specified (e.g. 30 samples) and the remainder of the images is used for the test set.

Next, the feature vectors for all images are calculated using the methods discussed in Section 3. The SVM classifier is trained with the feature vectors of the images in the training set and tested with the test set. We present the detection rate, being the percentage of images correctly classified; the final detection rate is calculated by averaging the results over ten iterations. The correct detection rate, averaged over all classes, is 87.7% in the case of 30 training samples per class. The main misdetections are between bicycles and scooters (13%) and between combos and small buses (13%).

For some simple applications, a classification between four object classes is already significant. A camera that can make a distinction between cars, buses, persons and bikes with high accuracy adds functionality compared to a camera that only comprises object detection and tracking. Therefore, the total dataset of 9,233 object images has been redivided into a new dataset containing only the four mentioned object classes. Applying the same tests as before results in an increased detection rate. Furthermore, because there are fewer classes with a low number of object images, the number of learning samples can be increased. Table 1 shows that the detection rate of such a four-class system increases up to 97.6% when 100 samples are learned.

Table 1: Detection rates for the four-class classification problem, without and with orientation separation.

Train   Normal        Orientation separation   Gain
1       58.3 ± 7.4%   69.9 ± 9.9%              19.9%
5       83.9 ± 2.2%   88.2 ± 3.3%               3.9%
10      90.5 ± 1.8%   93.9 ± 2.6%               3.7%
50      95.9 ± 1.1%   98.3 ± 0.4%               2.5%
100     97.6 ± 0.4%   99.1 ± 0.3%               1.5%

Taking the object orientation into account results in a performance gain. The classification system is divided into four independent systems, each considering a different interval of orientations. Each classifier is trained independently, and each test object image is fed to the classifier with the corresponding orientation. The improvement in correct classification rate is listed in Table 1. As can be seen, the rate increases up to 99.1% for 100 training samples, adding 1.5% over the normal four-class system.
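The evaluation protocol can be summarized in a short sketch (train_and_score refers to the earlier SVM sketch; each class is assumed to have at least n_train images):

```python
import numpy as np

def detection_rate(X, y, n_train, iterations=10, seed=0):
    """Average detection rate over random train/test splits, n_train per class."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(iterations):
        train_idx = []
        for cls in np.unique(y):
            members = np.flatnonzero(y == cls)
            train_idx.extend(rng.choice(members, size=n_train, replace=False))
        train = np.zeros(len(y), dtype=bool)
        train[train_idx] = True
        rates.append(train_and_score(X[train], y[train], X[~train], y[~train]))
    return np.mean(rates)
```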

A confusion matrix for this system is shown in Table 2. Furthermore, we have compared our system with the system of Ma and Grimson [3]. As can be seen in Table 3, our system outperforms their SIFT-based system on the car-van problem, but not on the sedan-taxi problem. Whereas our algorithm has been designed to limit the influence of small changes within an object class, the SIFT-based algorithm focuses on describing more specific details of the test objects. This explains the differences in performance.

Table 2: Confusion matrix of the four-class classification problem for the normal and the orientation-separated datasets (10 training samples, values in %).

         Normal                     Orientation separation
         Car    Bus    People Bike  Car    Bus    People Bike
Car      87.3   5.1    0.3    4.1   92.7   3.4    0.2    1.5
Bus      10.6   91.9   0.2    0.8   6.2    95.4   0.2    0.4
People   0.2    1.8    93.5   5.7   0.2    1.1    91.7   4.3
Bike     1.9    1.1    6.0    89.4  0.9    0.1    8.0    93.8

Table 3: Detection rates for the traffic dataset from Ma and Grimson [3].

             Ma-Grimson [3]   Our method   Difference
Car-van      98.5%            99.25%       +0.75%
Sedan-taxi   95.76%           95.25%       -0.49%

For the comparison on the PETS 2001 dataset, we have extracted images from the object blobs and removed all images where the target object is occluded and all images where the object appearance did not change. Of the remaining images, every 5th frame was kept. The system was trained with all images from the dataset introduced in Section 4, separated into four classes. Note that for certain persons, the images consist of only a few pixels. The correct classification rates are listed in Table 4. For the persons, the main confusion is with the bike class, at 6.1% and 5.9% for camera one and two, respectively. The first car was confused with a city bus for only three images in Camera 1 and one image in Camera 2. For camera two, the second car was confused with a city bus in 20% of the images. The minivan was mainly confused with a city bus for both cameras. Altogether, the correct classification rates are still very high, and averaging over all images of the object's lifetime will always result in the correct classification. Note that the two cameras have different viewing angles, neither of which equals the camera configuration of the training set (see Section 4).

Table 4: Detection rates for the four-class classification problem, testing with the PETS 2001 dataset.

Object    Cam 1    Cam 2
Car 1     96.8%    99.4%
Car 2     100%     77.8%
Minivan   80.7%    86.1%
Persons   85.3%    85.8%

6 Conclusions and future work

We have presented a scalable patch-based algorithm, suited for parallel implementation in an embedded system. The algorithm has been tested on a new dataset extracted from a typical traffic crossing. When the total set of object images is divided into 13 classes and 30 samples per class are used for training, a correct classification rate of 87.7% has been obtained. This performance increases to 95% when the set is split into only four classes, and reaches 97.6% with 100 training samples. When we split the dataset into four independent subsets and provide the object orientation a priori, the performance on the four-class problem increases to 99.1%. Furthermore, we have shown a performance comparable to the SIFT-based algorithm of Ma and Grimson [3] on their dataset, and high correct classification rates on the PETS 2001 dataset.

The aforementioned performance can be further improved by exploiting application-specific information. Object-tracking algorithms provide useful information that can be taken into account in the classification step. Viola and Jones [18] show a performance gain by using the information from two consecutive frames. Also, extracting a subset of relevant features (C1 patch prototypes in our case) that are specific to the application can give a performance gain, as shown by Wu and Nevatia [19].

For future research, it is interesting to know how much sensor resolution is required to obtain a decent classification system. One of the first experiments would be to measure the influence of the input image resolution on the classification performance.

A generic object modeling architecture can consist of several detectors that include pixel-processing elements and classification systems. We propose a generic architecture as visualized in Figure 7, where detectors can exchange both features extracted at the pixel level and classification results. For the purpose of person detection, Mohan et al. [9] propose multiple independent component detectors; the classifier output of each component is used in a final classification stage. In contrast to this fully parallel implementation, Zuo [20] proposes a cascaded structure with three different detectors to limit the computational cost in a face-detection system.

Recently, the authors have considered a 3D wire-frame modeling approach [7] that is completely application-specific. This means that for each typical new application, 3D models have to be generated manually. Furthermore, the addition of a new object class requires a new model that differs from the other models and implies the design of a new detector.

In contrast, the patch-based approach is more general: it generates one feature vector for every object image, and the SVM classifier is trained to make a distinction between the application-specific object classes.

Figure 7: Generic object modeling architecture, containing multiple detectors.

In our view, when aiming at a generic object modeling architecture, we envision a convergence between application-specific techniques and application-independent algorithms, thereby leading to a mixture of both types of approaches. The architecture shown in Figure 7 should be interpreted in this way. For example, in one detector the pixel processing may be generic, whereas in the neighboring detector the pixel processing could be application-specific. The more generic detectors may be reused for different purposes in several applications.

References

[1] J. Lou, T. Tan, W. Hu, H. Yang, and S. Maybank, "3-D model-based vehicle tracking," IEEE Trans. Image Processing, vol. 14, pp. 1561-1569, October 2005.

[2] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Trans. Pattern Anal. Machine Intell. (PAMI), vol. 29, pp. 411-426, March 2007.

[3] X. Ma and W. E. L. Grimson, "Edge-based rich representation for vehicle classification," in Proc. IEEE Int. Conf. Computer Vision (ICCV), vol. 2, pp. 1185-1192, October 2005.

[4] S. Müller-Schneiders, T. Jäger, H. Loos, and W. Niem, "Performance evaluation of a real time video surveillance system," in Proc. 2nd Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), pp. 137-144, October 2005.

[5] B. Bose and W. E. L. Grimson, "Improving object classification in far-field video," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 181-188, June 2004.

[6] I. Haritaoglu, D. Harwood, and L. Davis, "W4: real-time surveillance of people and their activities," IEEE Trans. Pattern Anal. Machine Intell. (PAMI), vol. 22, pp. 809-830, August 2000.

[7] R. Wijnhoven and P. de With, "3D wire-frame object-modeling experiments for video surveillance," in Proc. 27th Int. Symp. Information Theory in the Benelux, pp. 101-108, June 2006.

[8] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 511-518, 2001.

[9] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Anal. Machine Intell. (PAMI), vol. 23, pp. 349-361, April 2001.

[10] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 886-893, June 2005.

[11] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 506-513, 2004.

[12] T. Serre, Learning a Dictionary of Shape-Components in Visual Cortex: Comparison with Neurons, Humans and Machines. PhD thesis, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, April 2006.

[13] D. H. Hubel and T. N. Wiesel, "Receptive fields of single neurons in the cat's visual system," Journal of Physiology, vol. 148, pp. 574-591, October 1959.

[14] M. Riesenhuber and T. Poggio, "Models of object recognition," Nature Neuroscience, vol. 3, pp. 1199-1204, 2000.

[15] J. Mutch and D. Lowe, "Multiclass object recognition with sparse, localized features," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 11-18, June 2006.

[16] R. Collobert, S. Bengio, and J. Mariéthoz, "Torch: a modular machine learning software library," tech. rep., Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), Valais, Switzerland, October 2002.

[17] N. K. Logothetis, J. Pauls, and T. Poggio, "Shape representation in the inferior temporal cortex of monkeys," Current Biology, vol. 5, pp. 552-563, March 1995.

[18] P. Viola, M. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance," in Proc. 9th IEEE Int. Conf. Computer Vision (ICCV), vol. 2, pp. 734-741, October 2003.

[19] B. Wu and R. Nevatia, "Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors," in Proc. 10th IEEE Int. Conf. Computer Vision (ICCV), vol. 1, pp. 90-97, 2005.

[20] F. Zuo, Embedded Face Recognition Using Cascaded Structures. PhD thesis, Technische Universiteit Eindhoven, October 2006.

