
Class-Specific Material Categorisation

Barbara Caputo, Eric Hayman and P. Mallikarjuna


Computational Vision and Active Perception Laboratory
School of Computer Science and Communication
Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden
{caputo,hayman}@nada.kth.se

Abstract

Although a considerable amount of work has been published on material classification, relatively little of it studies situations with considerable variation within each class. Many experiments use the exact same sample, or different patches from the same image, for training and test sets. Thus, such studies are vulnerable to effectively recognising one particular sample of a material as opposed to the material category. In contrast, this paper places firm emphasis on the capability to generalise to previously unseen instances of materials. We adopt an appearance-based strategy, and conduct experiments on a new database which contains several samples of each of eleven material categories, imaged under a variety of pose, illumination and scale conditions. Together, these sources of intra-class variation provide a stern challenge indeed for recognition. Somewhat surprisingly, the difference in performance between various state-of-the-art texture descriptors proves rather small in this task. On the other hand, we clearly demonstrate that very significant gains can be achieved via different SVM-based classification techniques. Selecting appropriate kernel parameters proves crucial. This motivates a novel recognition scheme based on a decision tree. Each node contains an SVM to split one class from all others with a kernel parameter optimal for that particular node. Hence, each decision is made using a different, optimal, class-specific metric. Experiments show the superiority of this approach over several state-of-the-art classifiers.

1. Introduction

Recognising materials in uncontrolled environments is useful in a number of applications. In object recognition, for instance, an object's material may be its most characteristic property, as opposed to shape or colour. Furthermore, knowledge of the material can be useful in robotic manipulation tasks since it assists in adopting an appropriate gripping strategy, depending on how much the object can be allowed to deform, and on how much friction could be anticipated. Such real-world applications require robust recognition algorithms, able to cope with unknown pose, illumination and scale, and able to generalise with respect to different samples of the same material. For instance, it is frequently more useful to be able to recognise any wooden table than only the particular piece of wood that was supplied during the training stage of the algorithm. Although recognising texture patterns drawn from fairly broad classes has been studied within, for instance, remote sensing and medical imaging, there is a surprising lack of literature on classification techniques for use on autonomous indoor robots, and on database retrieval using materials. (We refer to sec. 2 for an in-depth review of relevant work.)

The focus of this paper¹ is firmly on classifying unseen samples of materials; we do not deal with exemplar identification, but with recognition of categories. The notion of a category is key. Categories group similar items, but "similarity" need not be entirely visual — in our work similarity is fundamentally determined by objects' materials, but the same material may be treated in different ways. Consider, for instance, asking an autonomous home robot or internet search engine to find a woolly sweater. Valid members of this category are made with thread woven and knitted in a variety of fashions. This example also shows that for a vision system to perform a task specified by a human, categories may also have to be defined with a flavour of semantics. Moreover, often it is not even clear where boundaries between categories should be drawn.

We pursue an appearance-based approach to the categorisation problem. The notion is that if the system has seen something similar during training, it should be able to correctly classify a new image. We use visual texture from greyscale images as input. For now we do not handle the issue of segmentation, and instead assume the existence of 200 × 200 pixel patches, containing only foreground.

Implementing a recognition scheme of this type typically requires the designer to decide upon (i) which descriptors; (ii) which classifier; and (iii) which similarity measure to use within the classifier. The exact same stages apply to both categorisation and other recognition tasks. Yet with greater intra-class variability, a categorisation problem will require greater power in one or several of the three areas.

¹This work was supported by the EU projects FP6-IST-0042500 CoSy (BC) and IST-2000-29688 Insight2+ (EH) and the VISCOS project funded by the Swedish Foundation for Strategic Research (EH).
In our work we do not focus on deriving new descriptors; we use existing texture features from the literature [24, 25, 20, 16]. Instead we provide contributions with respect to pattern-classification techniques, thus focusing on points (ii) and (iii) above.

More specifically, the contributions of the paper are:

• To provide an in-depth investigation of material categorisation as opposed to identification of a previously seen sample. This is aimed at applications for assistive robots and internet search engines.

• We introduce a new database containing four samples of each of 11 materials. Each sample is imaged under varying scale, illumination and pose conditions. The database is publicly available via the Internet [17].

• We benchmark existing techniques from material identification on the new database. While the choice between four state-of-the-art descriptors [24, 25, 20, 16] has little effect, using Support Vector Machines (SVMs) over Nearest Neighbour classifiers proves beneficial, as it did for material identification in [12].

• We present a novel SVM-based algorithm which exploits the fact that different kernel parameters should be selected for each class. It consists of a decision tree in which different kernel parameters, and possibly different features, may be used at each node in the tree. The descriptor type and/or the parameters are automatically chosen to be optimal for each particular 2-class sub-problem, thus resulting in a class-specific algorithm. Experiments clearly show the superiority of this approach over other, existing techniques.

The latter contribution should also be of interest to researchers in other areas of visual pattern classification such as object and face recognition.

The remainder of the paper is organised as follows. Previous work on material recognition is reviewed in sec. 2. Sec. 3 introduces the new image database, while sec. 4 applies existing techniques for material recognition on this database. Sec. 5 discusses the significance of choosing the kernel parameters in SVMs. Novel methods based on decision trees are presented in sec. 6 before conclusions are drawn in sec. 7.

2. Previous work on material recognition

This section reviews existing literature on recognising materials and includes a brief discussion of the descriptors we will use in this paper. Relevant work on kernel methods for visual pattern recognition will be discussed as and when it is needed in secs. 5 and 6.

Using texture for recognising fairly broad classes is important within remote sensing, where the aim is to distinguish between water, forest, crops and urban areas, or subclasses thereof. For instance, Gabor filters were used in [23], and [22] performed a comparative study of several texture features. Texture analysis has also been used for medical diagnosis; for instance [11] used conventional images for classifying skin lesions.

There are, however, significantly fewer studies for applications in robotic vision or material-based database retrieval. Instead, many papers on texture classification use non-overlapping patches from the same image, taken for instance from the Brodatz collection [5], for training and test sets. Although this kind of experiment does require some generalisation capabilities, the difference between training and test images tends to be more limited than if completely different samples were used. Moreover, for material classification to function in an uncontrolled environment, it is important to conduct experiments also using 3D materials, since changing pose or lighting can induce shadows, mutual occlusions and highlights. In this situation the 61-class CUReT database [10] represents the most popular testbed. It is important to note, however, that the CUReT database mostly only contains one physical sample per class.

Certainly very impressive results, up to 98%, have been reported on the CUReT database [15, 8, 24, 25, 21, 12]. One successful strand of research described image patches by texton histograms, where each pixel is assigned to a characteristic texture element, or texton. Pioneered by Leung and Malik [15], and refined in [8, 24], textons were defined as cluster centres in a feature space derived from the output of a filter-bank. Cula and Dana [9] also used their method from [8] to distinguish between skin samples from three different parts of the hand. Results were very satisfactory, yet in our work we are interested in problems with more than three classes, and each class contains greater variation.
gorithm. Experiments clearly show the superiority of three classes, and each class contains greater variation.
this approach over other, existing techniques. More recently the role of filter-banks has been ques-
The latter contribution should also be of interest to re- tioned. With the joint descriptor Varma and Zisserman [25]
searchers in other areas of visual pattern classification such kept the definition of a texton as a cluster centre, but instead
as object and face recognition. formed the initial feature vector at each pixel by concatenat-
The remainder of the paper is organised as follows. Pre- ing all greyscale values from a fixed size patch. Two other
vious work on material recognition is reviewed in sec. 2. variants were proposed, including the the MRF descriptor
Sec. 3 introduces the new image database, while sec. 4 ap- which additionally modelled the distribution of the central
plies existing techniques for material recognition on this pixel’s greyvalue, conditioned on the assigned texton.
database. Sec. 5 discusses the significance of choosing the The basic idea of the Local Binary Pattern (LBP) de-
kernel parameters in SVMs. Novel methods based on de- scriptor, originally proposed by Ojala et al. in [19], is to
cision trees are presented in sec. 6 before conclusions are threshold each of N pixels in a circular region with the grey-
drawn in sec. 7. value of the central pixel. Each of the N pixels thus pro-
duces one bit, and these bits are concatenated into the LBP.
2. Previous work on material recognition Consequently, there are 2N possible patterns, each of which
This section reviews existing literature on recognising may be considered a texton. Variants of LBP have been in-
materials and includes a brief discussion of the descriptors troduced to provide rotational invariance, and also to reduce
we will use in this paper. Relevant work on kernel methods the effect of random noise (the uniform LBP) [20]. These
for visual pattern recognition will be discussed as and when have the advantage of considerably reducing the number of
it is needed in secs. 5 and 6. possible textons. Another variant forms a multi-resolution
Using texture for recognising fairly broad classes is im- descriptor by concatenating histograms for several neigh-
portant within remote sensing where the aim is to distin- bourhoods at different radii into a single feature vector [20].
guish between water, forest, crops and urban areas, or sub- Common to these systems is that a Nearest-Neighbour
classes thereof. For instance, Gabor filters were used in Classifier (NNC) is used. Both the query and labelled train-
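For illustration, a minimal NumPy sketch of the basic N = 8 pattern at radius 1 (on the square ring, without interpolation and without the rotation-invariant "riu2" mapping) could look like this; it is our simplification, not the reference implementation of [19, 20].

```python
import numpy as np

def lbp8_histogram(img):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre
    pixel and concatenate the resulting bits into an 8-bit pattern code."""
    c = img[1:-1, 1:-1]                                  # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]         # 8 neighbours
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit        # one bit per neighbour
    # 2**8 = 256 possible patterns, each treated as a texton
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```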
[Figure 1: one example image from each of four samples of each of the categories Cork, Wool, Lettuce leaf, Aluminium foil, Corduroy, Linen, Cotton, Brown bread, White bread, Wood and Cracker.]

Figure 1. The variations within each category of the new KTH-TIPS2 database. Each row shows one example image from each of four samples of a category. In addition, each sample was imaged under varying pose, illumination and scale conditions.

Common to these systems is that a Nearest-Neighbour Classifier (NNC) is used. Both the query and labelled training images are represented by texton histograms, and the nearest neighbour is sought using some distance measure between model and query histograms.

As an alternative to the dense features discussed above, sparse feature sets may be applied as in Lazebnik et al. [14], who perform segmentation and recognition within the same framework. It remains an open question whether dense or sparse features are best suited for material classification.

The experiments in most of the papers reviewed above perform training and test on different images of the same physical sample of the CUReT database, so it is highly questionable whether this can be called material classification. Motivated by a wish to use material classification in real-world scenarios, [12] conducted experiments on new images of different samples of a subset of CUReT materials. Having also flagged robustness to scale as crucial, these new samples were imaged not only at different pose and illumination, but also at varying scale. Hence the database was called KTH-TIPS (Textures under varying Illumination, Pose and Scale) [17]. Training on artificially rescaled CUReT images, and testing on KTH-TIPS images, gave disappointing recognition rates of just over 20% in a 61-class problem.

Moreover, [12] clearly demonstrated the benefits of SVMs [7] over NNCs on the CUReT database. For the kernel, [12] used the Gaussian Radial Basis Function (RBF) and χ² kernel [4]

$$K(x, y) = \exp\{-\gamma\,\chi^2(x, y)\}, \qquad \chi^2(x, y) = \sum_i \frac{|x_i - y_i|^2}{|x_i + y_i|}. \tag{1}$$

Histograms were treated as feature vectors and normalised to unit length, and the kernel parameter γ was found by cross-validation within the training set.
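In code, eqn (1) and the cross-validation of γ might be sketched as follows. This is our reconstruction, not the authors' implementation (the original experiments predate scikit-learn); the SVM regularisation constant C and the fold count are assumptions, and the grid matches the seven logarithmically spaced values used later in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def chi2_kernel_matrix(X, Y, gamma):
    """Exponential chi-squared kernel of eqn (1) between rows of X and Y,
    which are histograms normalised to unit length."""
    num = (X[:, None, :] - Y[None, :, :]) ** 2
    den = np.abs(X[:, None, :] + Y[None, :, :]) + 1e-10   # guard empty bins
    return np.exp(-gamma * (num / den).sum(axis=2))

def select_gamma(X_train, y_train, grid=10.0 ** np.arange(-3.0, 4.0)):
    """Choose gamma by cross-validation within the training set only."""
    scores = [cross_val_score(SVC(kernel='precomputed', C=100.0),
                              chi2_kernel_matrix(X_train, X_train, g),
                              y_train, cv=4).mean()
              for g in grid]
    return grid[int(np.argmax(scores))]
```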
3. The KTH-TIPS2 material database

We now introduce the new KTH-TIPS2 material database, which may be obtained from [17]. It builds on the KTH-TIPS database discussed in the previous section, but provides a considerable extension by imaging multiple different samples of different materials. This provides a much more satisfactory basis for experiments than [12]. More samples per category are available, and the need for the artificially rescaled CUReT images used in [12] is eliminated. The database contains 4 physical, planar samples of each of 11 materials. One image from each sample is shown in fig. 1. Many of these materials have 3D structure, implying that their appearance can change considerably as pose and lighting are changed.

The variation in appearance between the samples in each category is larger for some categories than others. Cork, for instance, contains relatively little intra-class variation, while cracker and wool exhibit significant variation. As discussed in the Introduction, the appearance of wool depends not only on the material, but also on how it has been treated, in this case how the thread was spun and subsequently knitted. Moreover, it is not always obvious how the samples should be split into categories. For instance, brown bread and white bread are subclasses of "bread", and it might also make sense to group linen and cotton together in a "woven fabric" class. As such we believe that this database provides a good platform for future studies of unsupervised or supervised grouping of classes into higher-level categories, whether visual or semantic, in a hierarchical structure. Furthermore, all 11 materials in KTH-TIPS2 are also present in the CUReT database [10], opening up the possibility for researchers to conduct experiments on a combination of databases.

The acquisition of KTH-TIPS2 images largely followed the procedure used for KTH-TIPS and is described in more detail in [17]. The database contains images at 9 scales equally spaced logarithmically over two octaves. KTH-TIPS2 contains images at the same 3 poses as KTH-TIPS (frontal, rotated 22.5° left and 22.5° right), but 4 rather than 3 illumination conditions. The 3 illuminations from KTH-TIPS are used (frontal, 45° from the top and 45° from the side, all taken with a desk-lamp with a Tungsten light bulb), and for the fourth illumination condition we switched on the fluorescent lights in the laboratory. Although some variation in pose and illumination is present, both KTH-TIPS and KTH-TIPS2 contain significantly fewer settings for lighting and viewing angle than CUReT.

[Figure 2: four panels, (a) 3-scale LBP, (b) 4-scale LBP, (c) MR8 and (d) Joint with 11 × 11 patch size, each plotting classification rate (%) against the number of training samples per material for SVM and Nearest-Neighbour classifiers.]

Figure 2. Categorisation on the KTH-TIPS2 database, comparing SVMs and Nearest-Neighbour classifiers on various texture descriptors. 1-standard deviation error-bars are shown.

4. From identification to categorisation

The aim of the experiments in this section is to evaluate whether existing techniques, which have previously been applied to identifying particular samples of materials, may be used in a categorisation task. The presentation of our further enhancements, in terms of improved classifiers, is delayed until later in the paper. We reiterate that recognition and categorisation may be performed within exactly the same pattern recognition framework, but that categorisation is likely to require either a more powerful descriptor or classifier (or both).

More specifically, the goals in this section are (i) to study which descriptor is best suited to the task; (ii) to study the performance of SVMs relative to Nearest-Neighbour Classifiers (NNCs); and (iii) to investigate the improvement in performance as more samples are introduced into the training set, thus providing a richer model of each material.

In this paper we compare four different descriptors: the rotationally invariant MR8 descriptor [24], the joint descriptor of [25] using an 11 × 11 patch, the rotationally invariant, uniform LBP descriptor at 3 scales, LBP^{riu2}_{8,1+16,3+24,5}, as in [21], and a 4-scale variant LBP^{riu2}_{8,1+8,2.4+16,4.2+16,6.2} advocated in [16]². LBP^{riu2}_{8,1+16,3+24,5} denotes a rotationally invariant uniform ("riu2") descriptor with neighbourhoods of 8, 16 and 24 pixels at radii 1, 3 and 5 respectively. In the remainder of the paper, these LBP descriptors are denoted 3-scale LBP and 4-scale LBP respectively.

²We tested scale-dependent Gaussian smoothing as suggested in [16], yet found this yielded slightly inferior results to the unsmoothed variant. We therefore do not report these results further.

For the MR8 and joint descriptors we use a texton vocabulary of 440 textons, whereas the 3-scale LBP and 4-scale LBP descriptors have fixed sizes of 54 and 56 textons respectively. Thus the LBP descriptors are more compact, and are also the fastest to compute. We have not attempted to tweak any parameters, and have used parameters suggested in the original papers where applicable.

With NNC we used the χ²-distance between histograms in eqn (1). We also tested the cumulative log distance used in [16, 21], but that gave slightly inferior results, and will not be reported any further. For SVM we followed the methodology of [12], and used the χ²-kernel in eqn (1), since that gave the best results in the identification experiments of [12].

We first perform experiments where only a single sample is available during training. All images of that sample are placed in the training set, and testing is subsequently performed on all images of all remaining samples. This experiment is repeated four times with different samples in training. We also perform similar experiments with two and then three samples in the training set. Testing is always conducted only on unseen samples, and so we truly are studying categorisation as opposed to exemplar identification.
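The protocol can be summarised in a few lines of Python (our sketch; the container layout and the cyclic choice of which samples enter training are assumptions about details the text leaves open, and train_and_test stands for any classifier returning an accuracy):

```python
import numpy as np

def run_protocol(images, labels, train_and_test, n_train=1, n_samples=4):
    """images[s], labels[s]: feature histograms and material labels for all
    images of physical sample s. Train on n_train samples per material and
    test only on images of the remaining, unseen samples; four runs."""
    rates = []
    for run in range(n_samples):
        train_ids = [(run + j) % n_samples for j in range(n_train)]
        test_ids = [s for s in range(n_samples) if s not in train_ids]
        X_tr = np.vstack([images[s] for s in train_ids])
        y_tr = np.concatenate([labels[s] for s in train_ids])
        X_te = np.vstack([images[s] for s in test_ids])
        y_te = np.concatenate([labels[s] for s in test_ids])
        rates.append(train_and_test(X_tr, y_tr, X_te, y_te))
    return np.mean(rates), np.std(rates)   # mean and 1-std over the runs
```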
Figs. 2a-d illustrate results for the four different descriptors, comparing NNC with SVM. Results are averaged over four runs, and 1-standard deviation error bars are shown. Unsurprisingly, it is of clear benefit to include more samples in the training set. Classification rates from SVM are typically 55–60% for one training sample, and a little over 70% for 3 training samples. With NNC the classification rates are consistently lower by 5–15%. The graphs for NNC and SVM are separated by 1–4 standard deviations, implying that there is a statistically meaningful difference in performance between the two classifiers. Thus, we conclude that SVMs are superior to NNC in material categorisation, similar to how [12] demonstrated the benefits of SVMs in material identification. We also tried k-Nearest Neighbour classifiers, but did not observe any improvement for any descriptor for k > 1 when averaging over the four runs.

Fig. 3 shows SVM results from all four descriptors on a single graph. There is no significant trend to show the superiority of any descriptor – the plots are all closely grouped. This is in stark contrast to the improvement which could be achieved using a different classifier. These experiments therefore clearly motivate a continued search for more powerful classification techniques, which is the aim of the remainder of this paper.
With NNC we also conducted some experiments with the full MRF descriptor of [25] using 90 bins. It proved slightly better (1–2%) than the joint descriptor, yet the combination of NNC with MRF was still clearly inferior to the use of SVM with any of the other descriptors. Due to the very large size of MRF feature vectors, we decided against including it in further experiments in this paper.

[Figure 3: classification rate (%) against the number of training samples per material for the 3-scale LBP, 4-scale LBP, MR8 and Joint descriptors.]

Figure 3. Categorisation on the KTH-TIPS2 database, comparing various texture descriptors. SVM is used for classification. None of the descriptors is a clear winner by any means.

[Figure 4: classification rate (%) for Cork and Lettuce leaf against γ, for (a) 4-scale LBP and (b) Joint, 11 × 11 patch size.]

Figure 4. Classification results for Cork and Lettuce leaf using SVMs in an 11-class problem, for the 4-scale LBP (a) and Joint (b) representations. Results are reported as the kernel parameter γ varies over [10⁻³, 10³] (logarithmic scale).

5. Kernels and metrics

Results reported in the previous section suggest the promise of SVMs for material categorisation. However, learning is typically time-consuming for SVMs and SVM-based algorithms, as it involves (a) finding the support vectors and the corresponding weighting coefficients for a fixed kernel type and fixed kernel parameters; and (b) searching for the optimal kernel parameters given a kernel type. Motivated by the results of [12], we decided in this paper to focus on just one kernel type, the exponential χ² kernel in eqn (1). This kernel has only one parameter γ to optimise.

The choice of γ heavily affects the performance of the algorithm. Let us for instance revisit the experiments of fig. 2 using SVMs in an 11-class problem. Fig. 4 illustrates the impact of γ on the classification of 2 of the 11 materials (Cork and Lettuce leaf) using two different descriptors, 4-scale LBP (fig. 4a) and Joint (fig. 4b), for γ ∈ [10⁻³, 10³]. We see that recognition rates vary quite dramatically as γ changes, but unfortunately it is almost impossible to predict, experimentally or theoretically, for which value of γ the algorithm will obtain the optimal performance. The standard solution in the literature (adopted throughout this paper) uses cross-validation to select a value of γ yielding the optimal compromise over all classes. This will not, in general, yield the best possible choice for each material.

This behaviour may be interpreted by studying the intrinsic geometry of the manifold onto which data are mapped by the kernel function. More specifically, given data points {x} ∈ ℝ^m and symmetric continuous functions satisfying Mercer's condition, there exists a Hilbert space H and a map Φ : ℝ^m → H such that K(x, y) = Φ(x) · Φ(y) for all x, y ∈ ℝ^m. If the original data is d-dimensional, the mapped data will lie on an at most d-dimensional surface S in H [7]. Thus, the distance effectively used by the classifier is that along the surface S, and its study requires the use of a Riemannian metric defined by an appropriate tensor [7, 6]

$$g_{ij}(y) = \frac{1}{2}\,\frac{\partial^2 K(x, x)}{\partial x_i\,\partial x_j} - \left.\frac{\partial^2 K(x, y)}{\partial x_i\,\partial x_j}\right|_{x=y}, \tag{2}$$

which for the χ² kernel is (we omit the proof for lack of space)

$$g^{\chi^2}_{ij} = \frac{\gamma\,\delta_{ij}}{|y_i|}, \qquad \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases} \tag{3}$$

With this analysis in mind, fig. 4 can better be interpreted. First, different features will generally have different dimensionality, and we saw above that the dimensionality of the mapped manifold depends on the dimensionality of the input data. This explains why, for the same material, the recognition rate changes differently as γ varies, for different features. Second, for a given dimensionality of the mapped manifold, γ decides the amount of deformation of the mapped manifold. Results reported in fig. 4 tell us that the two materials have different optimal metrics, corresponding to the optimal deformation according to eqn (3).

Finally, we remark that the kernel and its associated metric have such an impact on performance because they are used in an SVM framework. Indeed, the decision function of an SVM consists of a weighted linear combination of kernel functions computed on discriminative stored vectors (the support vectors) and the test vector. It is this linear combination of nonlinear metrics, as opposed to the metric itself, which provides the improvement over NNC in the experiments of sec. 4. This is because using the χ² kernel in an NNC is precisely equivalent to using the χ² similarity measure. Relative to the χ² distance measure, the addition of the exponent and γ factor in the χ² kernel merely gives a monotonic change in the distance function, and therefore does not change any decisions of an NNC whatsoever.
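For completeness, here is a sketch of the omitted computation (ours, obtained by direct differentiation; it assumes non-negative histogram entries so the moduli in eqn (1) can be dropped). Write the kernel as

$$K(x, y) = e^{f(x, y)}, \qquad f(x, y) = -\gamma \sum_k \frac{(x_k - y_k)^2}{x_k + y_k}.$$

Since $K(x, x) = e^0 = 1$ is constant, the first term of eqn (2) vanishes, and since $\partial f / \partial x_i = 0$ at $x = y$,

$$g_{ij}(y) = -\left.\frac{\partial^2 K}{\partial x_i \partial x_j}\right|_{x=y} = -\left.e^{f}\left(\frac{\partial f}{\partial x_i}\frac{\partial f}{\partial x_j} + \frac{\partial^2 f}{\partial x_i \partial x_j}\right)\right|_{x=y} = -\left.\frac{\partial^2 f}{\partial x_i \partial x_j}\right|_{x=y}.$$

The mixed second derivatives of $f$ vanish for $i \neq j$, while $\partial^2 f / \partial x_i^2 = -2\gamma/(x_i + y_i) + O(x_i - y_i)$ equals $-\gamma/y_i$ at $x = y$, which recovers eqn (3).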
6. Class-specific SVM decision trees

In the previous section we analysed why the recognition results obtained by SVMs in sec. 4 varied, for each material, as γ and the descriptors changed. We now use those findings to motivate new SVM-based methods, which we will show yield enhanced performance over traditional SVMs and other kernel-based classifiers. After describing the algorithm, and discussing its merits and differences in comparison to other recent state-of-the-art classifiers, we will present experimental results proving its effectiveness for material categorisation.

The lessons from fig. 4 and eqn (3) were that for a given type of descriptor, each material has an optimal γ with respect to the recognition performance. Typically, this value of γ differs from material to material, and also from descriptor to descriptor for a given material. Thus, the value of γ which is selected during training for SVM is the best compromise for all materials. Yet the performance would certainly improve if we could use a class-specific classifier where each material is classified according to its optimal γ.

This can be achieved using a decision tree where decisions are made at each node via SVMs. At each step, decision trees usually automatically split the classes into two subgroups such that the composition of the subgroups optimises the recognition rate. Here we force the algorithm to search for the optimal split of the type "one class against all remaining classes". The decision is taken by an SVM with a kernel parameter optimal for that particular node, selected via cross-validation during training. Thus, each material is classified using a different, optimal, class-specific metric. We call this algorithm Class Specific-SVM (CS-SVM).
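A sketch of CS-SVM as we understand it from the description above (ours, not the authors' code): each node is a one-against-the-rest SVM with its own cross-validated γ. The class visited at each node, the kernel function passed in (e.g. the χ² kernel helper from the earlier sketch) and the constant C are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

class ClassSpecificSVMTree:
    """CS-SVM sketch: a chain of 'one class against all remaining classes'
    SVM nodes, each trained with its own cross-validated gamma, i.e. its
    own class-specific metric in the sense of eqn (3)."""

    def __init__(self, kernel_fn, gamma_grid, node_order):
        self.kernel_fn = kernel_fn    # e.g. the exponential chi^2 kernel, eqn (1)
        self.gamma_grid = gamma_grid  # e.g. 10.0 ** np.arange(-3.0, 4.0)
        self.node_order = node_order  # class handled at each node (our assumption)
        self.nodes = []

    def _cv_gamma(self, X, t):
        # per-node cross-validation of gamma on this binary sub-problem
        scores = [cross_val_score(SVC(kernel='precomputed', C=100.0),
                                  self.kernel_fn(X, X, g), t, cv=4).mean()
                  for g in self.gamma_grid]
        return self.gamma_grid[int(np.argmax(scores))]

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        remaining = list(self.node_order)
        while len(remaining) > 1:
            cls = remaining.pop(0)
            mask = np.isin(y, [cls] + remaining)   # classes still in play
            Xr, t = X[mask], (y[mask] == cls).astype(int)
            gamma = self._cv_gamma(Xr, t)
            svm = SVC(kernel='precomputed', C=100.0)
            svm.fit(self.kernel_fn(Xr, Xr, gamma), t)
            self.nodes.append((cls, gamma, svm, Xr))
        self.default = remaining[0]                # last class needs no node
        return self

    def predict_one(self, x):
        for cls, gamma, svm, Xr in self.nodes:
            if svm.predict(self.kernel_fn(x[None, :], Xr, gamma))[0] == 1:
                return cls                         # node claims the test vector
        return self.default
```

Classification descends the chain: the first node whose one-against-the-rest SVM fires determines the material, so each decision is indeed taken under a different metric.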
Discussion of related work: We are not aware of previous work on decision trees combined with SVMs where the kernel parameters are optimised at each node. The idea of feature-based class-specific classifiers was explored by Baggenstoss et al. [1] in a probabilistic framework. They proposed using different features for different classes in a Bayes classifier. This was achieved by introducing a common reference hypothesis class, and computing probability ratios for each class relative to the reference class. Probability density functions were modelled by Gaussian mixtures. The method showed promising results on signal processing applications, but its extension to visual pattern recognition proved problematic. [3] presented an extension of the method to kernel Gibbs distributions with kernels as energy functions. Thus, the choice of the optimal features for each class became the choice of the optimal kernel parameters, and so their algorithm was called the Kernel-Class Specific Classifier (K-CSC). In comparison with our method, K-CSC suffers three main limitations. First, theoretical constraints limit the choice of kernel to generalised Gaussian kernels [3], while our CS-SVM can use any kernel type. For K-CSC, this implies concrete limitations on the type of descriptors that can be used. For instance, K-CSC cannot use local descriptors as input, which in [14] proved promising for texture classification, but CS-SVM can, via local kernels [26]. Second, generalised Gaussian kernels have 3 kernel parameters (γ, a, b) to be optimised for each class. Thus the learning stage is very expensive for K-CSC, while with CS-SVM one may limit computational complexity by choosing a kernel with fewer parameters. Finally, the performance of K-CSC depends heavily on the choice of the reference hypothesis, which is purely heuristic [3].

Recently, [18] proposed a method which combines an SVM-based algorithm and decision trees. It consisted of a cue integration scheme which extended the idea of probabilistic accumulation of cues to discriminative classifiers. SVMs were trained for each cue independently, and decisions were made based on the linear combination of all margins, where the margin is the distance from the test vector to the separating hyperplane. The method was proposed alone (Discriminative Accumulation Scheme, or DAS) or combined with a decision tree (DAS-DT). The two methods (particularly DAS-DT) performed very well for object recognition in realistic settings, but the authors did not provide any theoretical analysis of why this was so. There are some similarities between our algorithm and DAS-DT: both use decision trees, and both make decisions at each node with an SVM-based algorithm. Yet a closer look reveals key differences, and some advantages of our approach:

(i) In DAS-DT a single set of kernel parameters is used throughout the entire decision tree, implying a compromise between different classes. On the other hand, our algorithm optimises separate parameters for each class to improve overall performance. Thus, DAS-DT does not fully exploit the properties of the metric induced by the kernel.

(ii) While we limit the search to 'one-against-the-rest' splits, DAS-DT has no constraints on how to divide classes into two subgroups at each node. Although our main motivation was to derive a class-specific classifier, our method also gives a much faster learning algorithm. With N classes, DAS-DT conducts a search between N(N−1)/2 possible combinations, as opposed to only (N−1) in our algorithm.

(iii) DAS-DT uses all cues, at each node, for all classes, while CS-SVM uses one single cue and relies on the class-specific optimisation of the kernel parameters to obtain an increase in performance. CS-SVM is therefore faster to compute than DAS-DT, both for training and testing, and it has lower memory requirements.

Experimental results: A first set of experiments evaluates the class-specific decision tree for each of the four feature types described in sec. 4, and compares the new algorithm with SVM and NN classifiers. The experimental setup is analogous to that described in sec. 4. Figs. 5a-d, top row, show results for the four different descriptors. As in sec. 4, results are averaged over four runs, and 1-standard deviation error bars are shown. Again we see that the performance depends little on the feature type, but heavily on the classifier. Recognition rates for CS-SVM are typically 68–70% for one training sample, and 80–84% for 3 training samples.
[Figure 5: eight panels of classification rate (%) against the number of training samples per material; columns (a) 3-scale LBP, (b) 4-scale LBP, (c) MR8 and (d) Joint with 11 × 11 patch size; the top row compares CS-SVM, SVM and Nearest-Neighbour, the bottom row compares CS-SVM, K-CSC and DAS-DT.]

Figure 5. Categorisation with four different texture descriptors on the KTH-TIPS2 database, comparing decision trees with SVMs and Nearest Neighbour classifiers (top row) and with DAS-DT and K-CSC (bottom row). 1-standard deviation error-bars are shown.

As the graphs show, these results represent a statistically significant improvement over SVM and NNC.

A second set of experiments (figs. 5a-d, bottom row) benchmarked CS-SVM against K-CSC and DAS-DT for each of the four feature types, using the same experimental setup as previously. DAS-DT, originally conceived as a cue-combination scheme, is forced to use only one cue, but maintains its decision tree structure. When using a single cue, as here, both DAS-DT and SVM optimise a single value of γ on the overall performance, but the decision-making strategy differs between the two. Two major conclusions may be drawn. First, for all feature types, the class-specific algorithms (CS-SVM and K-CSC) perform better than the non class-specific ones (DAS-DT, SVM and NNC). These results confirm the analysis of sec. 5 on the effectiveness of using kernel parameters (and consequently metrics) especially tailored for each material. Second, the SVM-based class-specific algorithm systematically performs better than the probabilistic one. This is consistent with the results of [18], which compared a discriminative accumulation method with a probabilistic one based (as K-CSC is) on kernel Gibbs distributions. This suggests that the modelling assumptions for this family of probability distributions might not always be appropriate for vision applications. Finally, we remark that the choice of classifier affects performance much more than the feature type, as before.

In a final, proof-of-concept experiment, we used CS-SVM optimising not only γ, but also the feature type at each split node. The aim was to investigate whether the class-specific strategy can be applied to optimising metrics and descriptors simultaneously. Results in fig. 6 compare this approach with the other classifiers (NNC, SVM, single-cue DAS-DT, K-CSC and CS-SVM optimised only for the kernel parameter). Rather than showing graphs for all four descriptors, we select the highest value over all descriptors, thus essentially selecting by hand the descriptor which proved best for each particular experiment. The new version of CS-SVM more or less halves the error rate in comparison with the second best classifier (single-cue CS-SVM).

The easiest material to recognise was Cork (above 98%), followed by Wool (above 95%). The latter may seem surprising, yet a close look at fig. 1 reveals that although the Wool samples differ greatly globally, the small-scale local texture is actually quite similar, and this is evidently captured by the descriptors. Cracker was hardest to recognise (20–40% depending on the number of training samples) followed, less predictably, by Cotton (35–55%). This shows particularly well the importance of our database: taking different patches from a single physical sample would not recreate the variability that cotton presents in reality.

It is important to note that this latest algorithm has increased complexity during learning in comparison with single-cue CS-SVM. For an N-class problem using M cues, and using α_k intervals in the search for each k of a total of K kernel parameters (in our case K = 1 and α₁ = 7, since we optimise one kernel parameter over logarithmically spaced intervals, γ ∈ {10⁻³, 10⁻², ..., 10³}), the dimension of the search space for the algorithm parameters will be $M(N-1)\prod_k \alpha_k$. This is, of course, compounded by the fact that training an SVM is already quite expensive, although promising approaches have recently been suggested for speeding up the process (see for instance [2] and references therein). A comfort is that feature-selection CS-SVM is at least much cheaper than feature-combining DAS-DT, for which the corresponding complexity is $M(M-1)\prod_m \beta_m \cdot N(N-1)/2 \cdot \prod_k \alpha_k$, where β_m represents the dimension of the search space for each m of the M−1 cue weights [18].
[Figure 6: classification rate (%) against the number of training samples per material for CS-SVM (kernel + features), CS-SVM (kernel only, best feature), K-CSC (best feature), DAS-DT (best feature), SVM (best feature) and NN (best feature).]

Figure 6. Categorisation on the KTH-TIPS2 database, using decision trees optimised over (i) both kernels and features, and (ii) just the kernel, and selecting the best of the 4 features. Results for DAS-DT, K-CSC, traditional SVM and NNC are also shown, where again only the best of the 4 features is shown.

In practice, this makes DAS-DT unfeasible when the number of classes N is above 4 or 5 in a recognition problem, so we have not conducted experiments with DAS-DT on our 11-class database. The added complexity is certainly not possible to justify in our situation, where the four features all represent different measures of texture, and are unlikely to provide orthogonal information for each and every decision.

In summary, results clearly show the effectiveness of class-specific decision trees for material categorisation, and the importance of the classification algorithm in this task.

7. Conclusions

This paper tackled the problem of recognising previously unseen samples of materials by training on other samples drawn from the same category.

A first, perhaps surprising, conclusion of our work was that the difference in performance between several state-of-the-art texture descriptors proved minimal in this task. It remains a somewhat open question how suitable these descriptors are for this categorisation problem, or whether new representations are in fact required. On the other hand, experiments clearly showed that a good choice of classifier can yield significant rewards. In a first set of experiments, SVMs proved superior to Nearest-Neighbour classifiers. Yet a more important contribution of the paper was to demonstrate further gains using novel methods based on decision trees in which the kernel parameter was chosen independently for each node of the tree. Greater improvements still were available by selecting not only the kernel parameter, but also the best descriptor at each node. Compared to a Nearest-Neighbour classifier, the accumulated effect of all these improvements was to reduce the error rate by 65–80%, as illustrated in fig. 6. Our new algorithms also outperformed other state-of-the-art kernel-based classifiers [18, 3]. Although our main motivation in developing the algorithm was to obtain an effective class-specific strategy for material categorisation, the CS-SVM algorithm can also be seen as another strategy for the multi-class extension of SVM. In the future we plan to compare the performance of CS-SVM with other multi-class methods [13]. A further contribution of this paper was a new, publicly available database for categorisation of materials.

By demonstrating the importance of selecting class-specific kernel parameters, and providing an algorithm which exploits this fact, this paper should also be of considerable interest to researchers in other areas of visual pattern classification, for instance object or face recognition.

References

[1] P. M. Baggenstoss and H. Niemann. A theoretically optimal probabilistic classifier using class-specific features. ICPR 2002.
[2] G. H. Bakir, L. Bottou, and J. Weston. Breaking SVM complexity with cross-training. NIPS 2004.
[3] B. Caputo and H. Niemann. To each according to its need: kernel class specific classifiers. ICPR 2002.
[4] S. Belongie, C. Fowlkes, F. Chung, and J. Malik. Spectral partitioning with indefinite kernels using the Nyström extension. ECCV 2002.
[5] P. Brodatz. Textures. Dover, 1966.
[6] C. Burges. Geometry and invariance in kernel based methods. Advances in Kernel Methods – Support Vector Learning, 1998.
[7] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[8] O. Cula and K. Dana. Compact representation of bidirectional texture functions. CVPR 2001.
[9] O. Cula and K. Dana. Image-based skin analysis. Texture Workshop, pages 35–40, 2002.
[10] K. Dana, B. van Ginneken, S. Nayar, and J. Koenderink. Reflectance and texture of real-world surfaces. ACM Transactions on Graphics, 18(1):1–34, 1999.
[11] H. Ganster, P. Pinz, R. Rohrer, E. Wildling, M. Binder, and H. Kittler. Automated melanoma recognition. IEEE Trans. Medical Imaging, 20(3):233–239, 2001.
[12] E. Hayman, B. Caputo, M. Fritz, and J.-O. Eklundh. On the significance of real-world conditions for material classification. ECCV 2004.
[13] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Networks, 13(2):415–425, 2002.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Affine-invariant local descriptors and neighbourhood statistics for texture recognition. ICCV 2003.
[15] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV, 43(1):29–44, 2001.
[16] T. Mäenpää and M. Pietikäinen. Multi-scale binary patterns for texture analysis. SCIA 2003.
[17] P. Mallikarjuna, M. Fritz, A. Tavakoli Targhi, E. Hayman, B. Caputo, and J.-O. Eklundh. The KTH-TIPS and KTH-TIPS2 databases. www.nada.kth.se/cvap/databases/kth-tips.
[18] M. Nilsback and B. Caputo. Cue integration through discriminative accumulation. CVPR 2004.
[19] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29(1):51–59, Jan 1996.
[20] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. PAMI, 24(7):971–987, July 2002.
[21] M. Pietikäinen, T. Nurmela, T. Mäenpää, and M. Turtinen. View-based recognition of real-world textures. Pattern Recognition, 37(2):313–323, 2004.
[22] L. Ruiz, A. Fdez-Sarría, and J. Recio. Texture feature extraction for classification of remote sensing data using wavelet decomposition: a comparative study. 20th ISPRS Congress, 2004.
[23] M. Shi and G. Healey. Hyperspectral texture recognition using a multiscale opponent representation. IEEE Trans. Geoscience and Remote Sensing, 41(5):1090–1095, May 2003.
[24] M. Varma and A. Zisserman. Classifying images of materials: achieving viewpoint and illumination independence. ECCV 2002.
[25] M. Varma and A. Zisserman. Texture classification: are filter banks necessary? CVPR 2003.
[26] C. Wallraven, B. Caputo, and A. Graf. Recognition with local features: the kernel recipe. ICCV 2003.
