Class-Specific Material Categorisation: (Caputo, Hayman) @nada - Kth.se
Figure 1. The variations within each category of the new KTH-TIPS2 database. Each row shows one example image from each of four
samples of a category. In addition, each sample was imaged under varying pose, illumination and scale conditions.
ing images are represented by texton histograms, and the Nearest-Neighbour is sought using some distance measure between model and query histograms.

As an alternative to the dense features discussed above, sparse feature sets may be applied as in Lazebnik et al. [14], who perform segmentation and recognition within the same framework. It remains an open question whether dense or sparse features are best suited for material classification.

The experiments in most of the papers reviewed above perform training and test on different images of the same physical sample of the CUReT database, so it is highly questionable whether this can be called material classification. Motivated by a wish to use material classification in real-world scenarios, [12] conducted experiments on new images of different samples of a subset of CUReT materials. Having also flagged robustness to scale as crucial, these new samples were imaged not only at different pose and illumination, but also at varying scale. Hence the database was called KTH-TIPS (Textures under varying Illumination, Pose and Scale) [17]. Training on artificially rescaled CUReT images, and testing on KTH-TIPS images, gave disappointing recognition rates of just over 20 % in a 61-class problem.

Moreover, [12] clearly demonstrated the benefits of SVMs [7] over NNCs on the CUReT database. For the kernel, [12] used the Gaussian Radial Basis Function (RBF) kernel with the χ² distance [4]:

    K(x, y) = exp{−γ χ²(x, y)},    χ²(x, y) = Σ_i |x_i − y_i|² / |x_i + y_i|.    (1)

Histograms were treated as feature vectors and normalised to unit length, and the kernel parameter γ was found by cross-validation within the training set.
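To make eq. (1) concrete, the short NumPy sketch below computes such a χ² kernel matrix between sets of normalised histograms. It is only an illustration of the formula: the function name, the use of L1 normalisation for the "unit length" step, and the small ε guarding against empty bins are our own choices, not details taken from [12].

    import numpy as np

    def chi2_kernel(X, Y, gamma, eps=1e-10):
        """Kernel of eq. (1): K(x, y) = exp(-gamma * chi2(x, y))."""
        # chi2(x, y) = sum_i |x_i - y_i|^2 / |x_i + y_i|, computed for all pairs of rows.
        diff = X[:, None, :] - Y[None, :, :]
        denom = np.abs(X[:, None, :] + Y[None, :, :]) + eps   # eps avoids 0/0 on empty bins
        chi2 = np.sum(diff ** 2 / denom, axis=2)
        return np.exp(-gamma * chi2)

    # Toy usage with texton-style histograms normalised to unit (L1) length.
    rng = np.random.default_rng(0)
    train = rng.random((5, 16)); train /= train.sum(axis=1, keepdims=True)
    test = rng.random((2, 16));  test  /= test.sum(axis=1, keepdims=True)
    K = chi2_kernel(test, train, gamma=1.0)   # gamma would be chosen by cross-validation
    print(K.shape)                            # (2, 5)

With such a precomputed kernel matrix, any standard SVM implementation that accepts user-supplied kernels can be trained and evaluated.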
3. The KTH-TIPS2 material database

We now introduce the new KTH-TIPS2 material database, which may be obtained from [17]. It builds on the KTH-TIPS database discussed in the previous section, but provides a considerable extension by imaging multiple different samples of different materials. This provides a much more satisfactory basis for experiments than [12]. More samples per category are available, and the need for artificially rescaled CUReT images used in [12] is eliminated. The database contains 4 physical, planar samples of each of 11 materials. One image from each sample is shown in fig. 1. Many of these materials have 3D structure, implying that their appearance can change considerably as pose and lighting are changed.

The variation in appearance between the samples in each category is larger for some categories than others. Cork, for instance, contains relatively little intra-class variation, while cracker and wool exhibit significant variation. As discussed in the Introduction, the appearance of wool depends not only on the material, but also on how it has been treated, in this case how the thread was spun and subsequently knitted. Moreover, it is not always obvious how the samples should be split into categories. For instance, brown bread and white bread are subclasses of “bread”, and it might also make sense to group linen and cotton together in a “woven fabric” class. As such we believe that this database provides a good platform for future studies of unsupervised or supervised grouping of classes into higher-level categories, whether visual or semantic, in a hierarchical structure. Furthermore, all 11 materials in KTH-TIPS2 are also present in the CUReT database [10], opening up the possibility for researchers to conduct experiments on a combination of databases.

The acquisition of KTH-TIPS2 images largely followed the procedure used for KTH-TIPS and is described in more detail in [17]; each sample was imaged under 4 rather than 3 illumination conditions. The 3 illuminations from KTH-TIPS are used (frontal, 45° from the top and 45° from the side, all taken with a desk-lamp with a Tungsten light bulb), and for the fourth illumination condition we switched on the fluorescent lights in the laboratory. Although some varia-

Figure 3. Categorisation on the KTH-TIPS2 database, comparing various texture descriptors. SVM is used for classification. None of the descriptors is a clear winner by any means.

Figure 4. Classification results for Cork and Lettuce leaf using SVMs in an 11-class problem, for 4-scale LBP (a) and Joint (b) representation. Results are reported as the kernel parameter γ is varied over [10^−3, 10^3] (logarithmic scale).
Figure 5. Categorisation with four different texture descriptors ((a) 3-scale LBP, (b) 4-scale LBP, (c) MR8, (d) Joint with 11 × 11 patch size) on the KTH-TIPS2 database, comparing Decision Trees with SVMs and Nearest Neighbour classifiers (top row) and with DAS-DT and K-CSC (bottom row). 1-standard-deviation error bars are shown. In each panel the x-axis is the number of training samples per material.
ing samples. As the graphs show, these results represent a statistically significant improvement over SVM and NNC.

A second set of experiments (Figs. 5a–d, bottom row) benchmarked CS-SVM against K-CSC and DAS-DT for each of the four feature types, using the same experimental setup as previously. DAS-DT, originally born as a cue-combination scheme, is forced to use only one cue, but maintains its decision tree structure. When using a single cue, as here, both DAS-DT and SVM optimise a single value of γ on the overall performance, but the decision-making strategy is different between the two. Two major conclusions may be drawn. First, for all feature types, the class-specific algorithms (CS-SVM and K-CSC) perform better than the non class-specific ones (DAS-DT, SVM and NNC). These results confirm the analysis of sec. 5 on the effectiveness of using kernel parameters (and consequently metrics) especially tailored for each material. Second, the SVM-based class-specific algorithm systematically performs better than the probabilistic one. This is consistent with the results of [18], which compared a discriminative accumulation method with a probabilistic one based (as K-CSC is) on kernel Gibbs distributions. This suggests that the modelling assumptions for this family of probability distributions might not always be appropriate for vision applications. Finally, we remark that the choice of classifier affects performance much more than the feature type, as before.
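As an illustration of the class-specific strategy discussed above (and analysed in sec. 5), the simplified Python sketch below builds a decision tree in which each node separates one class from the remaining ones with its own, class-specific value of γ chosen by cross-validation. This is a sketch of the general idea only, not the CS-SVM algorithm itself: the greedy choice of which class to peel off at each node, the use of scikit-learn's built-in RBF kernel in place of the χ² kernel of eq. (1), and all function names are our own assumptions.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    GAMMAS = np.logspace(-3, 3, 7)   # 7 logarithmically spaced values, as in the text

    def build_class_specific_tree(X, y):
        """Each node peels off one class with its own gamma (greedy sketch)."""
        nodes, remaining = [], sorted(set(y))
        while len(remaining) > 1:
            best = None
            for c in remaining:                    # candidate class for this node
                labels = (y == c).astype(int)      # one class vs. the rest
                for g in GAMMAS:                   # class-specific kernel parameter
                    score = cross_val_score(SVC(kernel="rbf", gamma=g),
                                            X, labels, cv=3).mean()
                    if best is None or score > best[0]:
                        best = (score, c, g)
            _, c, g = best
            svm = SVC(kernel="rbf", gamma=g).fit(X, (y == c).astype(int))
            nodes.append((c, svm))
            keep = y != c                          # descend with the other classes only
            X, y = X[keep], y[keep]
            remaining.remove(c)
        return nodes, remaining[0]                 # ordered nodes plus the final leaf class

    def classify(nodes, leaf_class, x):
        for c, svm in nodes:                       # walk down the tree
            if svm.predict(x.reshape(1, -1))[0] == 1:
                return c
        return leaf_class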
In a final, proof-of-concept experiment, we used CS-SVM optimising not only γ, but also the feature type at each split node. The aim was to investigate whether the class-specific strategy can be applied to optimising metrics and descriptors simultaneously. Results in fig. 6 compare this approach with the other classifiers (NNC, SVM, single-cue DAS-DT, K-CSC and CS-SVM optimised only for the kernel parameter). Rather than showing graphs for all four descriptors, we select the highest value over all descriptors, thus essentially selecting by hand the descriptor which proved best for each particular experiment. The new version of CS-SVM more or less halves the error rate in comparison with the second best classifier (single-cue CS-SVM).

The easiest material to recognise was Cork (above 98 %), followed by Wool (above 95 %). The latter may seem surprising, yet a close look at fig. 1 reveals that although the Wool samples differ greatly globally, the small-scale local texture is actually quite similar, and this is evidently captured by the descriptors. Cracker was hardest to recognise (20 – 40 % depending on the number of training samples), followed, less predictably, by Cotton (35 – 55 %). This shows particularly well the importance of our database: taking different patches from a single physical sample would not recreate the variability that cotton presents in reality.
It is important to note that this latest algorithm has increased complexity during learning in comparison with single-cue CS-SVM. For an N-class problem using M cues, and using α_k intervals in the search for each k of a total of K kernel parameters (in our case K = 1 and α₁ = 7, since we optimise one kernel parameter over logarithmically spaced values γ ∈ [10^−3, 10^−2, ..., 10^3]), the dimension of the search space for the algorithm parameters will be M(N − 1) ∏_k α_k. This is, of course, compounded by the fact that training an SVM is already quite expensive, although promising approaches have recently been suggested for speeding up the process (see for instance [2] and references therein). A comfort is that feature-selection CS-SVM is at least much cheaper than feature-combining DAS-DT, for which the corresponding complexity is M(M − 1) ∏_m β_m · N(N − 1)/2 · ∏_k α_k, where β_m represents the dimension of the search space for each m of the M − 1 cue weights [18]. In practice, this makes DAS-DT unfeasible when the number of classes N is above 4 or 5 in a recognition problem, so we have not conducted experiments with DAS-DT on our 11-class database. The added complexity is certainly not possible to justify in our situation, where the four features all represent different measures of texture and are unlikely to provide orthogonal information for each and every decision.
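As a back-of-the-envelope illustration of these expressions with the numbers used in this paper (M = 4 descriptors, N = 11 classes, one kernel parameter searched over α₁ = 7 values), the snippet below treats them as counts of parameter combinations to be searched; the grid size assumed for the DAS-DT cue weights (β_m = 10) is purely hypothetical, chosen only to indicate the scale of the difference.

    M, N = 4, 11            # number of cues (descriptors) and number of classes
    alphas = [7]            # grid sizes for the K = 1 kernel parameter(s)
    betas = [10] * (M - 1)  # hypothetical grid sizes for the M - 1 DAS-DT cue weights

    def prod(values):
        p = 1
        for v in values:
            p *= v
        return p

    # Feature-selecting CS-SVM: M(N - 1) * prod_k(alpha_k)
    cs_svm_search = M * (N - 1) * prod(alphas)             # 4 * 10 * 7 = 280

    # Feature-combining DAS-DT: M(M - 1) * prod_m(beta_m) * N(N - 1)/2 * prod_k(alpha_k)
    das_dt_search = M * (M - 1) * prod(betas) * N * (N - 1) // 2 * prod(alphas)
    print(cs_svm_search, das_dt_search)                    # 280 vs 4620000

Even with this modest cue-weight grid, the DAS-DT search is roughly four orders of magnitude larger, consistent with the observation above that it becomes unfeasible beyond a handful of classes.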
In summary, results clearly show the effectiveness of class-specific decision trees for material categorisation, and the importance of the classification algorithm in this task.

Figure 6. Categorisation on the KTH-TIPS2 database, using Decision Trees optimised over (i) both kernels and features, and (ii) just the kernel, selecting the best of the 4 features. Results for DAS-DT, K-CSC, traditional SVM and NNC are also shown, where again only the best of the 4 features is shown. Classification rate (%) is plotted against the number of training samples per material.

7. Conclusions

This paper tackled the problem of recognising previously unseen samples of materials by training on other samples drawn from the same category.
A first, perhaps surprising, conclusion of our work was that the difference in performance between several state-of-the-art texture descriptors proved minimal in this task. It remains a somewhat open question how suitable these descriptors are for this categorisation problem, or whether new representations are in fact required. On the other hand, experiments clearly showed that a good choice of classifier can yield significant rewards. In a first set of experiments, SVMs proved superior to Nearest-Neighbour classifiers. Yet a more important contribution of the paper was to demonstrate further gains using novel methods based on decision trees in which the kernel parameter was chosen independently for each node of the tree. Greater improvements still were available by not only selecting the kernel parameter, but also the best descriptor at each node. Compared to a Nearest Neighbour classifier, the accumulated effect of all these improvements was to reduce the error rate by 65 – 80 %, as illustrated in fig. 6. Our new algorithms also outperformed other state-of-the-art kernel-based classifiers [18, 3]. Although our main motivation in developing the algorithm was to obtain an effective class-specific strategy for material categorisation, the CS-SVM algorithm can also be seen as another strategy for the multi-class extension of SVM. In the future we plan to compare the performance of CS-SVM with other multi-class methods [13]. A further contribution of this paper was a new, publicly available database for categorisation of materials.

By demonstrating the importance of selecting class-specific kernel parameters, and providing an algorithm which exploits this fact, this paper should also be of considerable interest to researchers in other areas of visual pattern classification, for instance object or face recognition.

References

[1] P. M. Baggenstoss and H. Niemann. A theoretically optimal probabilistic classifier using class-specific features. ICPR 2002.
[2] G. H. Bakir, L. Bottou, and J. Weston. Breaking SVM complexity with cross-training. NIPS 2004.
[3] B. Caputo and H. Niemann. To each according to its need: kernel class specific classifiers. ICPR 2002.
[4] S. Belongie, C. Fowlkes, F. Chung, and J. Malik. Spectral partitioning with indefinite kernels using the Nyström extension. ECCV 2002.
[5] P. Brodatz. Textures. Dover, 1966.
[6] C. Burges. Geometry and invariance in kernel based methods. Advances in Kernel Methods - Support Vector Learning, 1998.
[7] N. Cristianini and J. Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
[8] O. Cula and K. Dana. Compact representation of bidirectional texture functions. CVPR 2001.
[9] O. Cula and K. Dana. Image-based skin analysis. Texture Workshop, pages 35–40, 2002.
[10] K. Dana, B. van Ginneken, S. Nayar, and J. Koenderink. Reflectance and texture of real-world surfaces. ACM TOG, 18(1):1–34, 1999.
[11] H. Ganster, P. Pinz, R. Rohrer, E. Wildling, M. Binder, and H. Kittler. Automated melanoma recognition. IEEE MI, 20(3):233–239, 2001.
[12] E. Hayman, B. Caputo, M. Fritz, and J.-O. Eklundh. On the significance of real-world conditions for material classification. ECCV 2004.
[13] C.-W. Hsu and C.-J. Lin. A comparison of methods for multiclass support vector machines. IEEE TNN, 13(2):415–425, 2002.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Affine-invariant local descriptors and neighbourhood statistics for texture recognition. ICCV 2003.
[15] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. IJCV, 43(1):29–44, 2001.
[16] T. Mäenpää and M. Pietikäinen. Multi-scale binary patterns for texture analysis. SCIA 2003.
[17] P. Mallikarjuna, M. Fritz, A. Tavakoli Targhi, E. Hayman, B. Caputo, and J.-O. Eklundh. The KTH-TIPS and KTH-TIPS2 databases. www.nada.kth.se/cvap/databases/kth-tips.
[18] M. Nilsback and B. Caputo. Cue integration through discriminative accumulation. CVPR 2004.
[19] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29(1):51–59, Jan 1996.
[20] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on PAMI, 24(7):971–987, July 2002.
[21] M. Pietikäinen, T. Nurmela, T. Mäenpää, and M. Turtinen. View-based recognition of real-world textures. PR, 37(2):313–323, 2004.
[22] L. Ruiz, A. Fdez-Sarría, and J. Recio. Texture feature extraction for classification of remote sensing data using wavelet decomposition: a comparative study. 20th ISPRS Congress, 2004.
[23] M. Shi and G. Healey. Hyperspectral texture recognition using a multiscale opponent representation. IEEE Trans. Geoscience and Remote Sensing, 41(5):1090–1095, May 2003.
[24] M. Varma and A. Zisserman. Classifying images of materials: achieving viewpoint and illumination independence. ECCV 2002.
[25] M. Varma and A. Zisserman. Texture classification: are filter banks necessary? CVPR 2003.
[26] C. Wallraven, B. Caputo, and A. Graf. Recognition with local features: the kernel recipe. ICCV 2003.