
Incremental learning of object detectors using a visual shape alphabet

Andreas Opelt, Axel Pinz
Inst. of Electrical Measurement and Sign. Proc.
Graz University of Technology, Austria

Andrew Zisserman
Dept. of Engineering Science
University of Oxford, UK

Abstract

We address the problem of multiclass object detection. Our aims are to enable models for new categories to benefit from the detectors built previously for other categories, and for the complexity of the multiclass system to grow sublinearly with the number of categories. To this end we introduce a visual alphabet representation which can be learnt incrementally, and which explicitly shares boundary fragments (contours) and spatial configurations (relation to centroid) across object categories.

We develop a learning algorithm with the following novel contributions: (i) AdaBoost is adapted to learn jointly, based on shape features; (ii) a new learning schedule enables incremental additions of new categories; and (iii) the algorithm learns to detect objects (instead of categorizing images). Furthermore, we show that category similarities can be predicted from the alphabet.

We obtain excellent experimental results on a variety of complex categories over several visual aspects. We show that the sharing of shape features not only reduces the number of features required per category, but also often improves recognition performance, compared to individual detectors trained on a per-class basis.

1 Introduction

Many recent papers on object category recognition have proposed models and learning methods where a new model is learnt individually and independently for each object category [1, 5, 10, 12]. In this paper we investigate how models for multiple object categories, or for multiple visual aspects of a single category, can be built incrementally so that new models benefit from those created earlier. Such models and methods are necessary if we are to achieve the long sought after system that can recognize tens of thousands of categories: we do not want to be in a position where, in order to add one more category (after number 10,000), we have to retrain everything from scratch. Of course, the constraint is that our recognition performance should at least equal that of methods which learn object category models individually.

In this paper we concentrate on object models consisting of an assembly of curve fragments. This choice is made because curve fragments represent the object shape more closely than the more commonly used appearance patches [3, 10, 12]. The representation can be complemented by adding appearance patches, though we do not investigate that here. Our object model is similar to those of [13, 14, 15] and is briefly reviewed in section 2.

We introduce a novel joint learning algorithm which is a variation on that of Torralba et al. [16], where weak classifiers are shared between classes. The principal differences are that our algorithm allows incremental as well as joint learning, and that we can control the degree of sharing. Less significant differences follow from the use of the boundary fragment model [13], in that we learn an object detector (rather than a classifier of image windows with detection by scanning over the whole image, as is done in [16]). The main benefits of the approach, over individual learning of category detectors, are: (i) we need less training data when sharing across categories; and (ii) we are able to add new categories incrementally, making use of already acquired knowledge.

Others have also used information from previously learnt classes. For example, Fei-Fei et al. [4] used prior information from previously learnt categories to train a generative probabilistic model for a novel class, and Bart and Ullman [2] introduced a cross-generalization method where useful patches for one category guide the search within the pool of possible patches for a new, but similar, category. Krempp et al. [9] have a similar objective of incremental learning of categories and a shared alphabet. However, their category model and learning algorithm differ substantially from those proposed here.

A brief outline of the paper is as follows: we start with an introduction of the BFM and show that we need to train only a few relevant aspects per category. Next, we present the incremental learning of the visual alphabet, which is shared over categories. Similarly, our detectors are learnt incrementally and can be shared. Finally, our experiments show that this sharing leads to a sublinear growth in the required alphabet entries / detectors, while maintaining excellent detection performance.

2 The boundary fragment model (BFM)

We present a very brief overview of our previous work which introduced a boundary fragment model (BFM) detector (see [13] for details). The BFM consists of a set of curve fragments representing the edges of the object, both internal and external (silhouette), with additional geometric information about the object centroid (in the manner of [10]). A BFM is learnt in two stages. First, random boundary fragments γ_i are extracted from the training images, and costs K(γ_i) are calculated for each fragment on a validation set. Low costs are achieved by boundary fragments that match well on the positive validation images, not so well on the negative ones, and have good centroid predictions on the positive validation images. Second, combinations of k = 2 boundary fragments are learnt as weak detectors (not just classifiers) within an AdaBoost [6] framework. Detecting instances of the object category in a new test image is done by applying the weak detectors and collecting their votes in a Hough voting space. An object is detected if a mode (obtained using Mean-Shift mode estimation) is above a detection threshold. Following the detection, boundary fragments that contributed to that mode are backprojected into the test image and provide an object segmentation. An overview of the detection method is shown in figure 1.

Figure 1: Overview of object detection with the boundary fragment model (BFM). (The figure shows the original image, all matched boundary fragments, centroid voting on a subset of the matched fragments, the backprojected maximum, and the resulting segmentation / detection.)

3 On multiple aspects

We want to enable an object to be detected over several visual aspects. The BFM implicitly couples fragments via the centroid, and so is not as flexible as, say, a "bag of features" model where feature position is not constrained. In this section we investigate qualitatively the tolerance of the model to viewpoint change. The evaluation is carried out on the ETH-80 dataset. This is a toy dataset (pun intended), but is useful here for illustration because it contains image sets of various instances of categories at controlled viewpoints.

We carry out the following experiment: a BFM is learnt from instances of the cow category in side views. The model is then used to detect cows in test images which vary in two ways: (i) they contain cows (seven different object instances) over varying viewpoints – object rotation about a vertical and a horizontal axis (see figure 2); (ii) they contain instances of other categories (horses, apples, cars ...), again over varying viewpoints.

Figure 2 shows the resulting Hough votes on the centroid, averaged over the seven cow instances for a number of rotations. It can be seen that the BFM is robust to significant viewpoint changes, with the mode still clearly defined (though elongated). The graph in figure 3 summarizes the change in the detection response, averaged over the different cows or other objects, under rotation about a vertical axis (as in the top row of figure 2). Note that the cow detection response is above that of the other, non-cow category objects. The side-trained BFM can still discriminate the object class based on detection responses with rotations up to 45 degrees in both directions. In summary: the BFM trained on one visual aspect can correctly detect the object class over a wide range of viewpoints, with little confusion with other object classes. Similar results are obtained for BFM detectors learnt for other object categories (e.g. horses), whilst for some categories with greater invariance to viewpoint (e.g. bottles) the response is even more stable. These results allow us to cut down the bi-infinite space of different viewpoints to a few category-relevant aspects. These aspects allow the object to be categorized and also allow its viewpoint to be predicted.

Figure 2: Robustness of the BFM to viewpoint changes under rotations about a vertical (V) or horizontal (H) axis. Top row: rotations about a vertical axis. Bottom row: rotations about both vertical and horizontal. The viewpoint angles (e.g. H:90,V:112) are given above each image.
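To make the detection procedure of section 2 concrete, the following minimal Python sketch (ours, not the authors' implementation) shows centroid voting with a crude Mean-Shift mode search. The weak-detector interface (det.match, det.centroid_offsets, det.weight), the bandwidth, and the threshold values are illustrative assumptions.

import numpy as np

def mean_shift_mode(points, weights, bandwidth=20.0, iters=30):
    # Start from the strongest single vote and shift a circular window
    # to a local density maximum of the centroid votes.
    mode = points[int(np.argmax(weights))].astype(float)
    for _ in range(iters):
        inside = np.linalg.norm(points - mode, axis=1) < bandwidth
        if not inside.any():
            break
        new_mode = np.average(points[inside], axis=0, weights=weights[inside])
        if np.linalg.norm(new_mode - mode) < 1e-3:
            mode = new_mode
            break
        mode = new_mode
    score = float(weights[np.linalg.norm(points - mode, axis=1) < bandwidth].sum())
    return mode, score

def bfm_detect(weak_detectors, edge_image, det_threshold=10.0):
    # Apply every weak detector; each match casts votes for the object
    # centroid via the learnt centroid vectors, then the strongest
    # Mean-Shift mode is compared against the detection threshold.
    points, weights = [], []
    for det in weak_detectors:                     # hypothetical interface
        for (mx, my) in det.match(edge_image):     # fragment-pair matches
            for (dx, dy) in det.centroid_offsets:  # learnt centroid vectors
                points.append((mx + dx, my + dy))
                weights.append(det.weight)
    if not points:
        return None
    mode, score = mean_shift_mode(np.asarray(points, float),
                                  np.asarray(weights, float))
    return (mode, score) if score >= det_threshold else None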

Figure 3: The detection response of a BFM trained on cows-side, and tested on cows rotated about a vertical axis and on other objects. (The plot shows detection confidence against degree of rotation, from −100 to 100 degrees, for apple, car, cup, dog, horse, pear and tomato test objects, together with the average over cows and the average over the other objects.)

4 Learning the shape based alphabet incrementally

In this section we describe how the basic alphabet is assembled for a set of classes. Each entry in the alphabet consists of three elements: (i) a curve fragment; (ii) associated vectors specifying the object's centroid; and (iii) the set of categories to which the vectors apply. The alphabet can be enlarged in two ways: (i) adding additional curve fragments, or (ii) adding additional vectors to existing curve fragments – so that a fragment can vote for additional objects' centroids. Pairs of curve fragments are used to construct the weak detectors of section 5.

We start from a set of boundary fragments for each category. This set is obtained from the fragment extraction stage (see section 2 or [13]) by choosing fragments whose costs on the validation set of the category are below a given threshold th_K. Typically this threshold is chosen so that there are about 100 fragments available per category. Our aim is to learn a common alphabet from these pooled individual sets that is suitable for all the categories one wants to learn.

4.1 Building the alphabet and sharing of boundary fragments

In a sequential way, each boundary fragment from each category is compared (using the Chamfer distance) to all existing alphabet entries. If the distance to a certain alphabet entry is below a similarity threshold th_sim, the geometric information (for the centroid vote) is updated. If the existing alphabet entry originates from a category other than that of the boundary fragment we are currently processing, we also update the information on which categories this entry is suitable for. This is the first case where boundary fragments are shared; this sharing is based purely on boundary fragment similarity.

But there is more information that can be used for sharing. The second possibility of sharing is achieved by evaluating each boundary fragment on the validation sets of all other categories. This results in average matching costs of the boundary fragment on all these other categories. These costs indicate how suitable the boundary fragment is for each of the other categories. The straightforward way of sharing is then that each alphabet entry whose boundary fragment has costs below th_K on a certain category is also shared for that category. However, costs are low only if the boundary fragment matches well on the validation images of that category and gives a reliable centroid prediction. The final possibility of sharing is where the boundary fragment matches well, but additional centroid vectors are associated with the fragment for the new category. Figure 4 shows an example of a boundary fragment extracted from one category also matching on images of another class (or aspect). The first column shows the original boundary fragment (in red/bold) on the training image from which it was learnt (the green/bold cross showing the true object centroid, and blue/bold the centroid vote of this boundary fragment). The other columns show sharing on another category (first row), and within aspects of the same category (second row). Note that we share the curve fragment and update the geometric information.

Figure 4: Sharing of boundary fragments over categories (first row) and aspects (second row).

4.2 Class similarities on the alphabet level

We now have alphabet entries for a number of classes. Using this information we can preview class similarities before training the final detector. A class similarity matrix is calculated where each element is a count of the number of alphabet entries in common between the classes. In turn, the classes can be agglomeratively clustered based on their similarity. For this clustering the normalized columns of the similarity matrix provide feature vectors, and the Euclidean distance is used as a distance measure. An example similarity matrix and dendrogram (representing the clustering) are shown in figures 8(a) and (b) respectively.
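As an illustration of the sequential alphabet construction of section 4.1 and the similarity counting of section 4.2, here is a minimal Python sketch under our own assumptions (fragments as point-sampled polylines; all names and threshold values are hypothetical, not the authors' code):

import numpy as np

class AlphabetEntry:
    # One alphabet entry: a curve fragment, its centroid vectors, and
    # the categories those vectors apply to (section 4).
    def __init__(self, fragment, centroid_vec, category):
        self.fragment = fragment                       # Nx2 polyline points
        self.centroid_vecs = {category: [centroid_vec]}

def chamfer_distance(frag_a, frag_b):
    # Symmetric Chamfer distance between two point-sampled fragments.
    def directed(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return d.min(axis=1).mean()
    return 0.5 * (directed(frag_a, frag_b) + directed(frag_b, frag_a))

def add_fragment(alphabet, fragment, centroid_vec, category, th_sim=0.2):
    # Sequentially merge a category's fragment into the shared alphabet:
    # if a similar entry exists, only the geometric information and the
    # category labels are updated; otherwise a new entry is created.
    for entry in alphabet:
        if chamfer_distance(entry.fragment, fragment) < th_sim:
            entry.centroid_vecs.setdefault(category, []).append(centroid_vec)
            return entry        # entry now (possibly) shared across categories
    entry = AlphabetEntry(fragment, centroid_vec, category)
    alphabet.append(entry)
    return entry

def class_similarity(alphabet, categories):
    # Similarity matrix of section 4.2: element (i, j) counts the alphabet
    # entries shared between categories i and j.
    S = np.zeros((len(categories), len(categories)), int)
    for entry in alphabet:
        shared = [c for c in categories if c in entry.centroid_vecs]
        for a in shared:
            for b in shared:
                S[categories.index(a), categories.index(b)] += 1
    return S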

5 Incremental Joint-Adaboost Learning

In this section we describe the new Adaboost based algorithm for learning the strong object detectors. It is designed to scale well for many categories and to enable incremental and/or joint learning. It has to do two jobs: (i) select pairs of fragments to form the weak detectors (see section 2); and (ii) select weak detectors to form the strong detector for each object category. Sharing occurs at two levels: first, at the alphabet level, where an alphabet entry may be applicable to several categories; second, at the weak detector level, where weak detectors are shared across strong detectors.

The algorithm can operate in two modes: either joint learning (as in [16]) or incremental learning. In both cases our aim is a reduction in the total number of weak detectors required, compared to learning each class independently. For C classes this gain can be measured by Σ_{i=1..C} T_{c_i} − T_s (as suggested in [16]), where T_{c_i} is the number of weak detectors required for each class trained separately (to achieve a certain error on the validation set) and T_s is the number of weak detectors required when sharing is used. In the separate training case this sum grows as O(C), whereas in the sharing case it should grow sublinearly with the number of classes. The algorithm optimizes an error rate E_n over all classes.

Joint learning: involves, at each iteration, searching for the weak detector for a subset S_n ⊆ C that has the lowest accumulated error E_n on all classes C. Subsets might be e.g. S_1 = {c_2} or S_3 = {c_1, c_2, c_4}. A weak detector only fits a category c_i if its error ε_{c_i} on this category is below 0.5 (and is rejected otherwise). E_n is the sum of the class-specific errors ε_{c_i} for c_i ∈ S_n, plus a penalty error ε_p (0.6 in our implementation) for each class outside S_n. Searching for a minimum of E_n over a set of subsets S_n guides the learning towards sharing weak detectors over several categories. We give a brief example of this behavior: imagine we learn three categories c_1, c_2 and c_3. There is one weak detector with ε_{c_1} = 0.1, but this weak detector does not fit any other category (ε_{c_2} > 0.5 and ε_{c_3} > 0.5). Another weak detector can be found with ε_{c_1} = 0.2, ε_{c_2} = 0.4 and ε_{c_3} = 0.4. In this case the algorithm would select the second weak detector, as its accumulated error of E_n = 0.2 + 0.4 + 0.4 = 1.0 is smaller than that of the first weak detector, E_n = 0.1 + 0.6 + 0.6 = 1.3 (note that for each category not shared, ε_p is added). This makes the measure E_n useful for finding detectors that are suitable both for distinguishing a class from the background and for distinguishing a class from other classes. Clearly, the amount of sharing is influenced by the parameter ε_p, which enables us to control the degree of sharing in this algorithm. Instead of exploring all 2^C − 1 possible subsets S_n of the jointly trained classes C, we employ the maximally greedy strategy from [16]. This starts with the class that alone achieves the lowest error on the validation set, and then incrementally adds the next class with the lowest training error. The combination which achieves the best overall detection performance over all classes is then selected. [16] showed that this approximation does not reduce the performance much.

Incremental learning: implements the following idea. Suppose our model was jointly trained on a set of categories C^L = {c_1, c_2, c_3}. Hence the "knowledge" learnt is contained in a set of three strong detectors H^L = {H_1, H_2, H_3}, which are composed from a set of weak detectors h^L. The number of these weak detectors depends on the degree of sharing and satisfies T_s ≤ Σ_{i=1..C} T_{c_i} (C = 3 here). Now we want to use this existing information to learn a detector for a new class c_new (or several classes) incrementally. To achieve this, one can search the already learnt weak detectors h^L to see whether they are also suitable (ε_{c_new} < 0.5) for the new class. If so, these existing weak detectors are also used to form a detector for the new category, and only a reduced number of new weak detectors has to be learnt using the joint learning procedure. Note that joint and incremental training reduce to standard Boosting if there is only one category.

Weak detectors: are formed from pairs of fragments. The possible combinations of k fragments define the feature pool (the size of this set is the binomial coefficient "number of alphabet entries choose k"). This means that for each sharing candidate in each iteration we must search over all these possibilities to find our best weak detector. We can reduce the size of this feature pool by using as candidates for weak detectors only combinations of boundary fragments which can be shared over the same categories. For example, it does not make much sense to test a weak detector combining a boundary fragment representing a horse's leg with one representing a bicycle wheel if the horse's-leg fragment never matches in the bike images.

Details of the algorithm: The algorithm is summarized in figure 5. We train on C different classes, where each class c_i has N_{c_i} validation images, plus a set of N_bg background validation images (which are shared across all classes and are labeled 0). The total number of validation images for all classes and background is denoted by N. The weights are initialized for each class separately. This results in a weight vector w_i^c of length N for each class c, normalized with respect to the varying number of positive validation images N_{c_i}. In each iteration a weak detector for a subset S_n is learnt. To encourage the algorithm to focus also on the categories which were not included in S_n, we vary the weights of these categories slightly for the next iteration (ε_c = ε_p for all c ∉ S_n, with ε_p = 0.47 in our implementation).
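To make the subset-selection criterion concrete before the summary in figure 5, the following Python sketch (ours, with hypothetical names) implements the accumulated error E_n and the greedy subset generation; the printed values reproduce the worked example above.

PENALTY = 0.6  # the penalty error epsilon_p used in the paper's example

def accumulated_error(class_errors, subset, all_classes, penalty=PENALTY):
    # E_n: class-specific error for classes in the shared subset S_n,
    # a fixed penalty for every class left out; the detector is rejected
    # if any class inside S_n has error >= 0.5.
    if any(class_errors[c] >= 0.5 for c in subset):
        return float("inf")
    return sum(class_errors[c] if c in subset else penalty
               for c in all_classes)

def greedy_subsets(class_errors, classes):
    # Maximally greedy strategy from [16]: order classes by individual
    # error and grow the sharing subset one class at a time, instead of
    # scoring all 2^C - 1 subsets.
    order = sorted(classes, key=lambda c: class_errors[c])
    return [set(order[:k]) for k in range(1, len(order) + 1)]

# Worked example from the text: three classes, two candidate detectors.
classes = ["c1", "c2", "c3"]
det_a = {"c1": 0.1, "c2": 0.6, "c3": 0.7}    # fits only c1
det_b = {"c1": 0.2, "c2": 0.4, "c3": 0.4}    # fits all three classes
print(accumulated_error(det_a, {"c1"}, classes))               # 1.3
print(accumulated_error(det_b, {"c1", "c2", "c3"}, classes))   # 1.0 -> chosen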

Input: validation images (I_1, ℓ_1), ..., (I_N, ℓ_N), with ℓ_i ∈ {C, −1} and N = N_bg + Σ_{i=1..C} N_{c_i}.

Initialization: set the weight matrices w_i^c:
    w_i^c = 1 / (2 N_c)                                   if ℓ_i = c,
    w_i^c = 1 / (2 (N_bg + Σ_{i=1..C, c_i ≠ c} N_{c_i}))  otherwise.

Learn incrementally (reuse of already learnt weak detectors):
    For c_i = 1 : C
        For each h^L(I, S_n) ∈ H^L(I, c):
            if ε_{c_i} < 0.5: set h^L = h^L(I, S_n ∪ c_i), update w_i^c, t = t + 1, T_{c_i} = T_{c_i} + 1.

For t = 1, ..., T_max:
    1. For n = 1, 2, ..., C(C+1)/2:
       (a) Find the best weak detector h_t(I, S_n) w.r.t. the weights w_i^{S_n}.
       (b) Evaluate the error E_n = Σ_{c=1..C} ε_c if ε_c < 1/2 for all c ∈ S_n (otherwise the detector is rejected), with
               ε_c = [Σ_{i=1..N} w_i^c · ½ (ℓ_i^c − h_t(I_i, S_n))²] / [Σ_{i=1..N} w_i^c]   if c ∈ S_n,
               ε_c = ε_p                                                                     otherwise.
    2. Get the best sharing by selecting n = argmin_n E_n and pick the corresponding h_t, S_n.
    3. Update the additive model and the weights:
           H(I, c) = H(I, c) + α_t h_t(I, S_n),
           w_i^c ← w_i^c · exp(−α_t ℓ_i^c h_t(I_i, c)),
       with α_t = ½ log((1 − ε_c)/ε_c), and ε_c = ε_p for c ∉ S_n.
    4. Update T_{c_i}, and STOP if T_{c_i} ≥ T for all c_i.

Figure 5: Incremental joint-Adaboost learning algorithm.
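A sketch of one boosting round, under our reading of the reconstructed figure 5: the candidate pool, the per-class response interface h.responses(c) (±1 responses over the validation images), and the exponential weight update are our assumptions where the extracted pseudocode is ambiguous.

import numpy as np

def boost_round(candidate_pool, subsets, weights, labels, penalty=0.6):
    # One iteration of the joint loop: score every candidate weak
    # detector on every greedy subset S_n and keep the (detector,
    # subset) pair with the lowest accumulated error E_n.
    best = None
    for S in subsets:
        for h in candidate_pool:
            errs = {}
            for c in weights:
                if c in S:
                    r = h.responses(c)
                    errs[c] = float(np.sum(weights[c] * 0.5 * (labels[c] - r) ** 2)
                                    / np.sum(weights[c]))
                else:
                    errs[c] = penalty
            if any(errs[c] >= 0.5 for c in S):
                continue                        # weak detector rejected
            E_n = sum(errs.values())
            if best is None or E_n < best[0]:
                best = (E_n, h, S, errs)
    return best

def update_weights(weights, labels, h, errs):
    # AdaBoost-style reweighting with alpha_t = 0.5 log((1-eps)/eps);
    # using eps_c = eps_p for unshared classes slightly upweights them,
    # as described in the text.
    for c in weights:
        alpha = 0.5 * np.log((1.0 - errs[c]) / errs[c])
        weights[c] = weights[c] * np.exp(-alpha * labels[c] * h.responses(c))
        weights[c] = weights[c] / weights[c].sum()
    return weights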
Figure 6 overviews the dataset, giving an example image for
6 Experiments

We measure detector performance in two ways: first, by applying the detector to a category-specific test set (positive images vs. background). The measure used is the Recall-Precision-Curve (RPC) equal-error rate; this rate is commonly used for detection and takes false positive detections into account (see [1] for more details). Second, we compute a confusion table on a multi-class test set. Note that a detection is counted as correct if area(box_pred ∩ box_gt) / area(box_pred ∪ box_gt) ≥ 0.5, with box_pred being the predicted bounding box and box_gt the bounding box denoting the ground truth.

The detectors are trained in three ways: (i) independently, using the category's validation set (images with the object, and background images); (ii) jointly over multiple categories; and (iii) incrementally. We compare performance, learning complexity, and efficiency of the final strong detectors over these three methods.

For all experiments, training is over a fixed number of weak detectors T = 100 per class (for C classes the maximum number of weak detectors is T_max = T · C). This means we are not searching for and comparing the learning effort needed to reach a certain error rate (as is done in [16]); instead we report the RPC equal-error rate for a certain learning effort (namely T weak detectors). Keeping track of the training error is more difficult in our model, as we detect in the Hough voting space manner of [13] instead of classifying subwindows like [16].

The experiments are organized as follows: first we briefly explain how the detection procedure works in the multi-class case. Then we specify the data used, and show results on the plain alphabet, followed by a comparison of incremental and joint learning. Finally we present results of learning many categories independently or jointly.

Detection algorithm: For a test image our task is to detect one or more objects from C classes. This is carried out by the standard detection procedure (see section 2, and for details [13]) extended to the multi-class case. All weak detectors trained for the C classes are applied to the test image. For each class we then maintain a separate Hough voting space, and add votes for all weak detectors that matched on that image and are shared by that category (i.e. included in the strong detector for that category). Finally, we search each of the voting spaces for maxima, and detect an object of class c_i if there is a maximum in the corresponding voting space above the threshold.

Dataset: We have combined different categories from several available datasets (at [8]) together with new images from Google Image Search, in order to assemble a dataset containing 17 categories of varying complexity and aspect. Figure 6 overviews the dataset, giving an example image for each of the 17 categories. Table 1 summarizes the data used for training, validation and testing.

We use the same test set as [5] for the first four categories so that our performance can be compared to others (although we use fewer training images). The same is done for category 11 (CowSide) so that performance can be compared with [10]. For the other categories we are not directly comparable, as subsets of the training and test data have been selected. As background images we used a subset of the background images used in [5] and [12] (the same number of background as positive training images). To determine to what extent the model confuses categories, we select a multiclass test dataset M which consists of the first 10 test images from each category (the whole dataset is available at [7]).
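The multi-class detection procedure and the overlap criterion above can be sketched as follows (reusing mean_shift_mode from the earlier sketch; the strong-detector dictionary and the weak-detector interface are our assumptions):

import numpy as np

def iou(box_a, box_b):
    # Overlap criterion of section 6, boxes as (x0, y0, x1, y1):
    # a detection is correct if iou(pred, gt) >= 0.5.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def multiclass_detect(strong_detectors, edge_image, det_threshold=10.0):
    # strong_detectors: assumed dict class -> list of weak detectors; a
    # shared weak detector simply appears in several lists. One Hough
    # voting space (here: one vote list) is kept per class, and each
    # space is searched for a mode independently.
    detections = {}
    for cls, weak_dets in strong_detectors.items():
        points, weights = [], []
        for det in weak_dets:
            for (mx, my) in det.match(edge_image):
                for (dx, dy) in det.centroid_offsets[cls]:  # class-specific vectors
                    points.append((mx + dx, my + dy))
                    weights.append(det.weight)
        if not points:
            continue
        mode, score = mean_shift_mode(np.asarray(points, float),
                                      np.asarray(weights, float))
        if score >= det_threshold:
            detections[cls] = (mode, score)
    return detections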

Figure 6: Example images of the 17 different categories (or aspects) used in the experiments.

C   Name         train  val  test  source
1   Plane         50    50   400   Caltech [5]
2   CarRear       50    50   400   Caltech [5]
3   Motorbike     50    50   400   Caltech [5]
4   Face          50    50   217   Caltech [5]
5   BikeSide      45    45    53   Graz02 [12]
6   BikeRear      15    15    16   Graz02 [12]
7   BikeFront     10    10    12   Graz02 [12]
8   Cars2-3Rear   17    17    18   Graz02 [12]
9   CarsFront     20    20    20   Graz02 [12]
10  Bottles       24    30    64   ImgGoogle [13]
11  CowSide       20    25    65   [11]
12  HorseSide     30    25    96   ImgGoogle
13  HorseFront    22    22    23   ImgGoogle
14  CowFront      17    17    17   ImgGoogle
15  Person        19    20    19   Graz02 [12]
16  Mug           15    15    15   ImgGoogle
17  Cup           16    15    16   ImgGoogle

Table 1: The number of training, validation and test images.

Figure 7: Example alphabet entries from learning only horses. Each column shows the shape of the boundary fragment (top), the associated centroid vector for this entry (middle), and the training image where the boundary fragment (shown in red/bold) was extracted.

The alphabet: Figure 7 shows entries of the alphabet trained on horses only. This nicely illustrates the different properties of each entry: the shape, and the geometric information for the centroid. When we train on all 17 categories, each alphabet entry is on average shared over approximately 5 categories. The alphabet can be used to take a first glance at class similarities. Figures 8(a) and (b) show the results of the procedure described in section 4.2. The correlations visible in the similarity matrix are due to alphabet entries that can be shared over categories. The dendrogram for the 17 categories shows some intuitive similarities (e.g. for the CarRear and CarFront classes).

Incremental learning: Here we investigate our incremental learning at the alphabet level, and in terms of the number of weak detectors used. We compare its sharing abilities to independent and joint learning. A new category can be learnt incrementally as soon as one or more categories have already been learnt. This saves the effort of a complete retraining procedure, but only the new category will be able to share weak detectors with previously learnt categories, not the other way round. However, with an increasing number of already learnt categories, the pool of learnt weak detectors grows and gives a good basis for selecting shareable weak detectors for the new, unfamiliar category. We can thus expect a sublinearly growing number of weak detectors when adding categories incrementally. The more similar the already known categories are, the more can be shared. This is confirmed by a simple experiment where the category HorseSide is incrementally learnt based on the previous knowledge of an already learnt category CowSide, showing that 18 weak detectors are shared. In comparison, joint learning shares a total of 32 detectors (CowSide also benefits from HorseSide features). For the 17 categories, incremental learning shows its advantage at the alphabet level. We observe (see figure 8(c)) that the alphabet requires only 779 entries (the worst case is approximately 1700 for our choice of the threshold th_K, which gives roughly a set of 100 boundary fragments per category).

Figure 8(c) also shows the increase in the number of shared weak detectors when new categories are added incrementally, one category at a time. Assuming we learn 100 weak detectors per category, the worst-case number (1700) can be reduced to 1116 by incremental learning. Learning all categories jointly reduces the number of weak detectors used even further, to 623. However, a major advantage of the incremental approach is the significantly reduced computational complexity. While joint learning with I validation images requires O(2^C · I) steps for each weak detector, incremental learning has a complexity of only O(h^L · I) for those weak classifiers (from the already learnt weak classifiers) that can be shared.
Figure 8: (a) Similarity matrix of alphabet entries for the different categories (brighter is more similar). (b) Dendrogram generated from this similarity matrix. (c) The increase in the number of alphabet entries and weak detectors when adding new classes incrementally or training a set of classes jointly, plotted against the number of categories (0 to 17); the curves show alphabet entries, weak detectors (incremental), and weak detectors (joint). The values are compared to the worst case (linear growth, dotted line): for weak detectors the worst case is independent training, given by Σ_{i=1..C} T_{c_i}; for the alphabet we approximate the worst case by assuming an addition of 100 boundary fragments per category. Classes are taken sequentially (Planes(1), CarRear(2), Motorbike(3), ...). Note the sublinear growth. (d) Performance (1 − RPC equal-error) averaged over 6 categories (Planes, CarRear, Motorbike, Face, BikeSide and HorseSide), either learnt independently or jointly, as a function of the number of training images per class (0 to 30).

One could use the information from the dendrogram of figure 8(b) to find the optimal order of the classes for incremental learning, but this is future work.

Joint learning: First we learn detectors for two different aspects of cows, namely the categories CowSide and CowFront, independently, and then compare this performance with joint learning. For CowSide the RPC equal-error is 0% in both cases. For CowFront the error is reduced from 18% (independent learning) to 12% (joint learning). At the same time, the number of learnt weak hypotheses is reduced from 200 to 171. We have carried out a similar comparison for horses, which again shows the same behavior. This is due to the reuse of some information gathered from the side-aspect images to detect instances from the front; the information shared here is e.g. the legs, or parts of the head. This is precisely what the algorithm should achieve – fewer weak detectors with the same or superior performance. The joint algorithm has the opportunity of selecting and sharing a weak detector that can separate both classes from the background, and this only has to be done once. Independent learning does not have this opportunity, and so has to find such a weak detector for each class.

In figure 8(d) we show that joint learning can achieve better performance with less training data as a result of sharing information over several categories (we use 6 categories in this specific experiment).

Finally we focus on many categories, and compare independent learning performance to that achieved by learning jointly. Table 2 shows the detection results on the categories' test sets (category images and background images), denoted by T, and on the multiclass test set M, for both cases. It also gives comparisons to some other methods that used this data in the single-category case, where we used the same test data. The joint learning procedure does not significantly reduce the detection error (although we gain more than we lose), but we gain in requiring just 623 weak detectors instead of the straightforward 1700 (i.e. 100 times the number of classes for independent learning). Errors are more often due to false positives than to false negatives. Our performance is superior or similar compared to state-of-the-art approaches (note that classification is easier than detection), as shown in table 2. Looking at the multiclass case (rows I,M and J,M, in error per image), we obtain comparable error rates for independent and joint learning. Figure 9 shows examples of weak detectors learnt in this experiment, and their sharing over various categories.

7 Discussion

It is worth comparing our algorithm and results to those of Torralba et al. [16]. We have used AdaBoost instead of GentleBoost (used in [16]), as in our experiments it gave superior performance and proved more suitable for our type of weak detectors. Compared to [16] we share significantly fewer entries: they achieve a 4-fold reduction, compared to our 2-fold reduction. This is mainly caused by their type of basic features, which are much less complex and thus more common across different categories than ours.

Initial experiments show that a combination of our model with appearance patches increases the detection performance, but this is the subject of future work.

Published reference results (row Ref.): Plane 6.3 ([5], C), CarRear 6.1 ([10], D), Motorbike 7.6 ([15], D), Face 6.0 ([15], D), CowSide 0.0 ([10], D).

Class  Plane  CarR  Mb    Face  B-S   B-R   B-F   Car23  CarF  Bottle  CowS  H-S   H-F   CowF  Pers.  Mug  Cup
I,T    7.4    2.3   4.4   3.6   28.0  25.0  41.7  12.5   10.0  9.0     0.0   8.2   13.8  18.0  47.4   6.7  18.8
J,T    7.4    3.2   3.9   3.7   22.4  20.8  31.3  12.5   7.6   10.7    0.0   7.8   11.5  12.0  42.0   6.7  12.5
I,M    1.1    7.0   6.2   1.4   10.3  7.7   8.5   5.2    7.6   7.1     1.6   10.0  8.2   9.5   29.1   5.1  8.0
J,M    1.5    4.3   4.5   1.6   8.9   5.9   7.7   3.8    8.5   6.1     1.3   11.0  4.7   6.8   27.7   5.8  8.3

Table 2: Detection results. In the first row (Ref.) we compare categories to previously published results, distinguishing between detection D (RPC equal-error) and classification C (ROC equal-error). We then compare our model, trained either by the independent method (I) or by the joint method (J), and tested on the class test set T or the multiclass test set M. On the multiclass set we count the best detection in an image (over all classes) as the object category. Abbreviations: B=Bike, H=Horse, Mb=Motorbike, F=Front, R=Rear, S=Side.

Figure 9: Examples of weak detectors that have been learnt for the whole dataset (resized to the same width for this illus-
tration). The black rectangles indicate which classes share a detector. Rather basic structures are shared over many classes
(e.g. column 2). Similar classes (e.g. rows 5, 6, 7) share more specific weak detectors (e.g. column 12, indicated by the arrow,
where parts of the bike’s wheel are shared).

Acknowledgements

This work was supported by the Austrian Science Fund, FWF S9103-N04, the Pascal Network of Excellence, and EC Project CLASS.

References

[1] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In Proc. ECCV, volume 4, pages 113–130, 2002.
[2] E. Bart and S. Ullman. Cross-generalization: learning novel classes from a single example by feature replacement. In Proc. CVPR, volume 1, pages 672–679, 2005.
[3] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In ECCV04 Workshop on Stat. Learning in Computer Vision, pages 59–74, 2004.
[4] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Proc. CVPR Workshop on Generative-Model Based Vision, 2004.
[5] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. CVPR, pages 264–271, 2003.
[6] Y. Freund and R. Schapire. A decision-theoretic generalisation of online learning. Computer and System Sciences, 55(1):119–139, 1997.
[7] http://www.emt.tugraz.at/~pinz/data/multiclass/.
[8] http://www.pascalnetwork.org/challenges/VOC/.
[9] S. Krempp, D. Geman, and Y. Amit. Sequential learning of reusable parts for object detection. Technical report, CS Johns Hopkins, 2002.
[10] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In ECCV04 Workshop on Stat. Learning in Computer Vision, pages 17–32, May 2004.
[11] D. Magee and R. Boyle. Detection of lameness using re-sampling condensation and multi-stream cyclic hidden Markov models. IVC, 20(8):581–594, 2002.
[12] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Generic object recognition with boosting. PAMI, 28(3), 2006.
[13] A. Opelt, A. Pinz, and A. Zisserman. A boundary-fragment-model for object detection. In Proc. ECCV, volume 2, pages 575–588, May 2006.
[14] E. Seemann, B. Leibe, K. Mikolajczyk, and B. Schiele. An evaluation of local shape-based features for pedestrian detection. In Proc. BMVC, 2005.
[15] J. Shotton, A. Blake, and R. Cipolla. Contour-based learning for object detection. In Proc. ICCV, volume 1, pages 503–510, 2005.
[16] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In Proc. CVPR, 2004.

