Leaf Image Classification With Shape Context and SIFT Descriptors

Computer science. An image processing technique for classifying leaf images based on shape, color, or contour. This paper uses shape context and SIFT descriptors to classify leaves.



2011 International Conference on Digital Image Computing: Techniques and Applications

Leaf Image Classification with Shape Context and SIFT Descriptors

Zhiyong Wang, Bin Lu, Zheru Chi and Dagan Feng
School of Information Technologies, The University of Sydney, Australia
Dept. of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong
Email: [email protected], [email protected], [email protected], [email protected]

Abstract-Nowadays, leaf image classification is very useful
for both botanists and ordinary users, since advanced imaging
devices such as smart phones make it ever easier to capture
leaf images for tasks such as retrieval and classification.
Most existing approaches mainly utilize global shape
features. In this paper, we propose to improve leaf image
classification by taking both global features and local features
into account. As one of the most effective shape features, shape
context is utilized as the global feature, and SIFT (Scale Invariant
Feature Transform) descriptors, which have been successfully
utilized for object recognition and image classification, are
selected as local features. Finally, a weighted K-NN algorithm is
utilized for classification. Experimental results on the large ICL
dataset demonstrate that the proposed method outperforms the
state-of-the-art.

Keywords-Leaf image, image classification, shape context,
SIFT descriptors, K-NN

I. INTRODUCTION

Identification of plant species from plant leaves is an
important and challenging task for botanists due to the large
number of species on our planet. Content based
image retrieval and classification can greatly assist botanists
in this process [1]. In addition, with the advances in imaging
devices, more and more mobile devices such as smart phones
come with cameras, so that ordinary users are able to take a
photo of a leaf of a plant they are interested in
and obtain further information about the plant by querying
a retrieval and classification system with the sample photo,
which is of great significance for education and for promoting
awareness of environmental protection.
From a visual perspective, leaf images are generally
characterized through their shape and veins, since the large majority
of leaves are green. Most existing
shape features focus on leaf contours. In [2], Curvature
Scale Space (CSS) images were obtained by filtering leaf
contours with Gaussian functions at different scales. Im et
al. proposed to hierarchically approximate leaf contours with
polygons [3]. It was also proposed to combine different
shape features for better classification performance [4][5].
Recently, shape context, originally proposed for object recognition, has
also been successfully utilized for leaf image classification
[6][7].
Besides contour information of leaves, veins of leaves
also provide rich information for identifying plant species
[1][8][9]. In [10][11], structure features were extracted to
represent and categorize venation patterns. By treating leaf
veins as a texture, many texture based features can also be
investigated for characterizing venation patterns [12], though
most texture descriptors cannot handle rotation very well
and it is usually required to align leaf images to a specific
direction.
Recently, local descriptors such as SIFT (Scale Invariant
Feature Transform) descriptors have been widely used in
object recognition and image classification [13]. In [14],
Nilsback and Zisserman proposed to apply SIFT descriptors
to flower image classification, since SIFT descriptors are
invariant to affine transformation. Though there are also
some variations of SIFT that could improve certain aspects
of the original SIFT descriptors, in our method we utilize
the original one [13].
In this paper, we propose to take both a global shape feature
and such local descriptors into account in the classification
process. Due to its robustness to rotation and scaling and
its effectiveness in object recognition, shape context (SC)
[6] is employed as the global shape feature. Note that IDSC
(Inner-Distance Shape Context) [7] is not employed due to
its expensive computational cost, though it is an improved
version of SC [6]. A weighted K-NN algorithm is employed
to perform classification, where the similarity between two
leaf images is measured by matching their shape contexts
and SIFT descriptors. Finally, our proposed algorithm is
evaluated with the large ICL leaf benchmark dataset.
978-0-7695-4588-2/11 $26.00 2011 IEEE
DOI 10.1109/DICTA.2011.115

II. SHAPE CONTEXT

Shape Context (SC), introduced in [6], has already demonstrated
its effectiveness for many computer vision tasks such
as object recognition. Utilizing shape context for object
recognition in general consists of three steps: 1) calculating
the shape context for each sample point of an object, 2) performing
matching among the sample points of two objects,
and 3) obtaining the transformation cost of aligning the two objects
as their dissimilarity (i.e. distance).
The basic idea of shape context is to record relative
distances from a given point to all other points. SC begins
by taking N sample points from an object. Typically those
sample points are from shape edges. For point $p_i$ of the N
sample points of an object, there are $N - 1$ vectors from
$p_i$ to all other points $p_j$, where $i \neq j$. Such vectors strongly
indicate the relative positions of all other points $p_j$ to $p_i$.
The more points there are, the more accurate the representation
that shape context can give. However, the full set of these vectors
is too detailed for a shape descriptor, so SC only takes the
distribution of such relative position information. Therefore,
the SC of $p_i$ is defined by the following equation:

$$h_i(k) = \#\{q \neq p_i : (q - p_i) \in \mathrm{bin}(k)\}, \tag{1}$$

where the bins are normally taken to be uniform in log-polar
space and the value of each bin is the number of points that
fall into this bin. In general, the log-polar space is divided
into $12 \times 5$ bins, i.e. 12 angles and 5 distances. Fig. 1
shows an example of shape context for a specific point. As
a result, there will be N histograms for the N sample points of
an object, and such a set of histograms constitutes the shape context
descriptors of the object.

Figure 1. Illustration of shape contexts for the four points (i.e. SC top,
SC bottom, SC left, and SC right) of the leaf contour.

In order to perform matching between two sets of sample
points of two objects, two issues should be solved:
measuring the distance between two sample points characterized
by their shape contexts, and obtaining the optimal
correspondence between the two sets of points. Denote the
shape contexts of points $p_i$ and $q_j$ as $h_i(k)$ and $h_j(k)$,
respectively. As shape contexts are distributions represented
as histograms, it is natural to use Pearson's Chi-square ($\chi^2$)
test statistic as the shape context cost of matching two
points:

$$C_{ij} = C(p_i, q_j) = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}, \tag{2}$$

where $C_{ij}$ is the cost between points $p_i$ and $q_j$, $h_i(k)$
denotes the k-th bin of $p_i$'s shape context, and K is the number
of bins. Such a cost gives a quantified measure of the similarity between
two points.
Since the distance between any pair of sample points from
two objects can be calculated with Pearson's Chi-square test
statistic, we can build an $N \times N$ cost matrix C. The goal
is to find a one-to-one matching that minimizes the total
matching cost. In other words, each row of C is matched to
exactly one column, and each column to exactly one row. Given
the set of costs $C_{ij}$ between all pairs of points $p_i$ on the first
object and $q_j$ on the second object, we want to minimize the
total cost of matching,

$$H(\pi) = \sum_{i} C(p_i, q_{\pi(i)}), \tag{3}$$

subject to the constraint that the matching be one-to-one
(i.e. $\pi$ is a permutation). Such a problem is an instance of the
square assignment (or weighted bipartite matching) problem
and can be solved in $O(N^3)$ time using the Hungarian
method [15]. One matching sample is shown in Fig. 2.
After the one-to-one matching is obtained, the last step
is to calculate the cost of transforming one shape into the
other. Given a finite set of correspondences between two sets
of points from two objects, one can proceed to estimate a
plane transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ that may be used to map
arbitrary points from one shape to the other. The thin plate spline
(TPS) model is one of the most commonly used transformations.
Finally, the shape distance $Dist_{SC}$ between two shapes $I_i$ and
$I_q$ is a weighted sum of three terms:
shape context distance, appearance cost, and transformation
cost (refer to [6] for more details).
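The first two steps of shape context matching, Eqs. (1) to (3), can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The function names, the radial range `r_inner`/`r_outer`, and the normalization of distances by the mean pairwise distance are assumptions made for illustration. The exhaustive search over permutations stands in for the Hungarian method and is only usable for a handful of points; a library routine such as SciPy's `linear_sum_assignment` would be the practical choice. The TPS transformation and appearance terms are omitted.

```python
import math
from itertools import permutations

N_ANGLE, N_RADIAL = 12, 5  # 12 x 5 log-polar bins, as described in the text


def shape_contexts(points, r_inner=0.125, r_outer=2.0):
    """Eq. (1): one log-polar histogram per sample point.

    r_inner and r_outer (in units of the mean pairwise distance) are
    assumed values, not taken from the paper.
    """
    n = len(points)
    dists = [[math.dist(p, q) for q in points] for p in points]
    # Mean distance over ordered pairs i != j, used for scale normalization.
    mean_d = math.fsum(d for row in dists for d in row) / (n * (n - 1))
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    hists = []
    for i, p in enumerate(points):
        h = [0] * (N_ANGLE * N_RADIAL)
        for j, q in enumerate(points):
            if i == j:
                continue
            r = max(dists[i][j] / mean_d, 1e-12)  # guard against log(0)
            # Radial bin: uniform in log(r), clamped to the valid range.
            t = (math.log(r) - log_lo) / (log_hi - log_lo)
            rb = min(max(int(t * N_RADIAL), 0), N_RADIAL - 1)
            # Angular bin: uniform in [0, 2*pi).
            theta = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
            ab = min(int(theta / (2 * math.pi) * N_ANGLE), N_ANGLE - 1)
            h[rb * N_ANGLE + ab] += 1
        hists.append(h)
    return hists


def chi2_cost(h1, h2):
    """Eq. (2): chi-square cost between two histograms (empty bins skipped)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def best_matching(cost):
    """Eq. (3) by exhaustive search over permutations (tiny N only)."""
    n = len(cost)

    def total(pi):
        return sum(cost[i][pi[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)


# Toy usage: a translated, re-ordered copy of the same five points.
P = [(0, 0), (4, 0), (6, 3), (2, 7), (-1, 2)]
Q = [(P[i][0] + 10, P[i][1] - 7) for i in (2, 0, 4, 1, 3)]
hp, hq = shape_contexts(P), shape_contexts(Q)
C = [[chi2_cost(a, b) for b in hq] for a in hp]
pi, cost_total = best_matching(C)
```

Because each histogram only depends on the vectors from one point to the others, the translated copy yields the same histograms, so a zero-cost matching exists in this toy case.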

Figure 2. Illustration of shape context matching. (Top: matching with both
the contour and the vein. Bottom: matching with the contour only.)

III. SIFT DESCRIPTORS

Features extracted using the SIFT algorithm are invariant
to image scale and rotation, and partially robust to changing
viewpoints and changes in illumination. The invariance and
robustness of the features extracted using this algorithm
make it an extremely good candidate for object recognition,
achieving one of the best performance figures among
current feature extraction techniques [13].
In order to obtain SIFT descriptors, there are two steps:
key point detection and feature extraction. In the first step, an
image is filtered with Gaussian functions at different
scales, and Difference of Gaussian (DoG) is employed to
detect key points, which are the local extrema (maxima or
minima).
In the second step, a unique descriptor for each key point
is computed within a $16 \times 16$ neighbor window around the
key point. Such a neighbor window is further broken
into sixteen $4 \times 4$ sub-windows. Within each $4 \times 4$
sub-window, orientations are computed and assigned to an
8-bin histogram. As a result, a 128-dimension ($= 4 \times 4 \times 8$)
vector is obtained as the descriptor of the key point.
Therefore, the similarity between two images is decided
by how well the key points of the two images are matched.
The nearest neighbor is defined as the key point with the
minimal Euclidean distance between the 128-dimension
SIFT descriptors. However, the problem is that many key
points will not have a correct match in the other image when
the two images being compared are too distinct. A solution for
filtering the false matches is to also take the second nearest
neighbor into account. Lowe suggested rejecting all matches
for which the distance ratio between the nearest neighbor and
the second nearest neighbor is greater than 0.8 [16]. As a
result, 90% of the false matches are filtered out at the cost
of also discarding less than 5% of the correct matches.
A matching sample is shown in Fig. 3.

Figure 3. A sample matching using SIFT descriptors between two leaf
images.

In general, the more key points are matched between two
images, the more similar the images are. The SIFT-based distance
between two images is defined as

$$Dist_{SIFT} = 1 - \frac{SiftMatch(q, i)}{Keypoints(q)}, \tag{4}$$

where $SiftMatch(q, i)$ is the number of matched key points
between images $I_q$ and $I_i$, and $Keypoints(q)$ is the number
of key points available in image $I_q$, which normalizes the
distance value to the range [0, 1].

IV. K-NN BASED CLASSIFICATION

For K-NN based classification, the label of a query sample
is determined by the dominant label among its k nearest
neighbours in the ground-truth dataset. In order to have a
unified distance value, the matching costs from the two features,
shape context and SIFT descriptors, are linearly combined
by

$$Dist(i, q) = w \cdot Dist_{SIFT}(i, q) + (1 - w) \cdot Dist_{SC}(i, q), \tag{5}$$

where $Dist_{SIFT}$ and $Dist_{SC}$ are the normalized matching
costs based on SIFT descriptors and shape context, respectively,
and w is the weighting factor, empirically set to 0.5
in our experiments.
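The ratio test, the SIFT distance of Eq. (4), and the linear combination of Eq. (5) can be sketched as follows. This is a minimal illustration rather than the authors' code: descriptors are assumed to be already extracted (for example by an existing SIFT implementation) and are passed in as plain tuples, the function names are hypothetical, and the toy descriptors are 2-D instead of 128-D to keep the numbers readable.

```python
import math


def count_ratio_matches(desc_q, desc_i, ratio=0.8):
    """Count query key points whose nearest neighbour passes the ratio test.

    A match is kept when nearest / second-nearest distance <= ratio.
    """
    matches = 0
    for d in desc_q:
        dists = sorted(math.dist(d, e) for e in desc_i)
        if len(dists) >= 2 and dists[1] > 0 and dists[0] <= ratio * dists[1]:
            matches += 1
    return matches


def dist_sift(desc_q, desc_i, ratio=0.8):
    """Eq. (4): 1 - SiftMatch(q, i) / Keypoints(q)."""
    if not desc_q:
        return 1.0  # assumption: an image without key points is maximally distant
    return 1.0 - count_ratio_matches(desc_q, desc_i, ratio) / len(desc_q)


def combined_dist(d_sift, d_sc, w=0.5):
    """Eq. (5): weighted sum of the normalized SIFT and shape context costs."""
    return w * d_sift + (1.0 - w) * d_sc


# Toy usage with 2-D stand-ins for 128-D descriptors.
query = [(0, 0), (10, 10), (5, 5)]
other = [(0, 1), (10, 10.5), (20, 20)]
d_sift = dist_sift(query, other)
d_total = combined_dist(d_sift, d_sc=0.2)  # d_sc is a made-up shape context cost
```

In the toy data the first two query descriptors have one clearly closest candidate and pass the test, while the third lies roughly midway between two candidates and is rejected. Note that this sketch does not enforce one-to-one matches: several query key points may select the same key point in the other image.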
In order to favor the nearer neighbours, a weighted
K-NN classification scheme is employed. At first, the k nearest
neighbours $I_i$ are obtained for query $I_q$. For class c, when
there is at least one sample of that class among the k nearest
neighbours, a weighted distance $Dist_c(q)$ between $I_q$ and all
the samples of class c among the k nearest neighbours is defined as

$$Dist_c(q) = \frac{\sum_{i=1}^{k} w_i \, Dist(i, q) \, \delta_c(I_i)}{\sum_{i=1}^{k} w_i}, \tag{6}$$

where $w_i$ is defined as $1 / Dist(i, q)^2$ and the value of $\delta_c(I_i)$
is 1 if $I_i$ is of class c and 0 otherwise. The
query object $I_q$ is labeled with the class which has the
minimum $Dist_c(q)$.

Figure 4. Contour extraction (middle) and edge detection (right) for the
original leaf image (left).
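The weighted K-NN decision rule of Eq. (6) can be sketched as below. The function name, the `(distance, label)` input format, and the `eps` guard against zero distances are assumptions for illustration. By default the normalizer is the sum of the weights over all k neighbours, which is how the equation reads in this text. The optional `per_class_norm` flag is not from the source: it restricts the normalizer to the neighbours of class c, so that the score becomes the weighted mean distance of that class.

```python
from collections import defaultdict


def weighted_knn_label(neighbours, per_class_norm=False, eps=1e-12):
    """Pick the class with the minimum weighted distance.

    neighbours: list of (Dist(i, q), class label) for the k nearest images.
    """
    weights = [1.0 / max(d, eps) ** 2 for d, _ in neighbours]  # w_i = 1 / Dist^2
    total_w = sum(weights)
    num = defaultdict(float)  # sum of w_i * Dist(i, q) over neighbours of class c
    den = defaultdict(float)  # sum of w_i over neighbours of class c
    for (d, label), w in zip(neighbours, weights):
        num[label] += w * d
        den[label] += w
    scores = {
        c: num[c] / (den[c] if per_class_norm else total_w) for c in num
    }
    return min(scores, key=scores.get), scores


# Toy usage with k = 3 and two hypothetical classes "A" and "B".
knn = [(0.2, "A"), (0.3, "A"), (0.4, "B")]
label_all, scores_all = weighted_knn_label(knn)
label_cls, scores_cls = weighted_knn_label(knn, per_class_norm=True)
```

The two normalizers can rank classes differently. With the normalizer summed over all k neighbours, a class's score grows with the number of its members among the neighbours, so it is worth checking which variant is intended before relying on either.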
V. EXPERIMENTS AND DISCUSSIONS

Our experiments are conducted with the large ICL
leaf dataset of the Institute of Intelligent Machines, Chinese
Academy of Sciences [5]. The whole dataset consists of
16,851 leaf images of 220 species collected from Hefei
Botanical Garden, Hefei, Anhui Province, China. In our
experiments, 20 images per class, 4400 images in total, are
chosen as the training set, and 10 other images per class,
2200 images in total, as the testing set. In the literature, most
experiments were conducted on datasets with only around
2000 images [17].
As shown in Fig. 4, preprocessing steps such as contour
extraction and vein extraction are applied
to the dataset. Contour extraction is achieved through simple
thresholding thanks to the clean background, and vein extraction is
done through Canny edge detection with empirical settings,
though other advanced techniques can be employed [1].
In our experiments, K is set to 3 for the K-NN algorithm,
and a decision for a given query is correct if its ground
truth class appears in the top T results, which is similar to
the evaluation protocols in the literature [17].

A. Impact of different settings

In order to evaluate our proposed method under different
settings, two sampling values (i.e. 100 and 200) are set for
shape context, and three different schemes are employed to
utilize contour and venation information. Therefore, six sets
of experiments are conducted:
A Classification using shape context extracted from 100
contour sample points;
B Classification using shape context extracted from 100
contour and vein sample points;
C Classification using the combination of SIFT descriptors and
the SC features of B;
D Classification using shape context extracted from 200
contour sample points;
E Classification using shape context extracted from 200
contour and vein sample points; and
F Classification using the combination of SIFT descriptors and
the SC features of E.

As shown in Fig. 5, SIFT descriptors are useful for leaf
image classification, since the combined feature sets (i.e.
Experiments C and F) achieve the best results.
However, it is also noticed that vein patterns are not
always helpful for shape context based classification. For
example, curve B is lower than curve A, though curve E
is higher than curve D. Since in our experiments vein extraction
based on simple Canny edge detection could generate noisy
outputs, utilizing vein patterns in shape context may lead
to unstable performance. Such a situation could be remedied
with advanced vein extraction algorithms [1].

Figure 5. Classification results with different settings.

B. Comparison

We compared our proposed method to two recent ones,
Inner-Distance Shape Context (IDSC) [7] and HOG-MMC
(Histogram of Oriented Gradient-Maximum Margin Criterion)
[5]. Note that the figure for HOG-MMC is taken from [5].
As shown in Table I, our proposed algorithm outperforms
the other two methods, since both IDSC and HOG-MMC
utilize contour information of leaf images only, whereas our
method takes both contour information and vein patterns
into account.

Algorithm      Accuracy
IDSC           83.79%
HOG-MMC        89.40%
Our Method     91.30%

Table I
COMPARISON WITH OTHER ALGORITHMS.
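The top-T evaluation protocol used in this section, where a query counts as correct if its ground truth class appears among the first T returned classes, can be sketched as follows. The function name and the input format are assumptions: each query is assumed to come with a list of candidate classes already sorted from best to worst (for instance by increasing class distance), and the species names are placeholders.

```python
def top_t_accuracy(ranked_labels, true_labels, t):
    """Fraction of queries whose true class is among the first t ranked classes."""
    hits = sum(
        truth in ranked[:t] for ranked, truth in zip(ranked_labels, true_labels)
    )
    return hits / len(true_labels)


# Toy usage: four queries, three hypothetical classes.
ranked = [
    ["oak", "elm", "ash"],
    ["elm", "oak", "ash"],
    ["ash", "elm", "oak"],
    ["oak", "ash", "elm"],
]
truth = ["oak", "oak", "oak", "elm"]
curve = [top_t_accuracy(ranked, truth, t) for t in (1, 2, 3)]
```

Evaluating the function for increasing T gives one accuracy curve per experimental setting, which is the kind of curve compared in Fig. 5.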

VI. CONCLUSION

In this paper, we present a method for leaf image
classification that utilizes shape context and SIFT descriptors
so that both global and local attributes are taken into
account. These two features are invariant to rotation and
translation. A K-NN classification scheme is utilized for the final
classification. Experiments on the largest leaf dataset, the ICL
dataset, demonstrate that the proposed method can achieve
better performance than the state-of-the-art. Note that in our
approach, shape context based matching and SIFT based
matching are conducted independently. It is anticipated that
the computational cost will be reduced if one of the matchings
is utilized to guide the other. The classification
performance can be further improved with more advanced
algorithms for vein extraction, key point based matching, and
adaptive feature combination (i.e. w in Eq. 5).

VII. ACKNOWLEDGEMENTS

The work presented in this paper was partially supported
by ARC (Australian Research Council) grants and a Hong
Kong Polytechnic University research grant (project code:
4-ZZ7V).

REFERENCES

[1] Z. Chi, "Data management for live plant identification," in
Multimedia information retrieval and management: technological
fundamentals and applications, D. Feng, W. C. Siu,
and H.-J. Zhang, Eds. Springer, 2003, pp. 432-457.
[2] S. Abbasi, F. Mokhtarian, and J. Kittler, "Reliable classification
of chrysanthemum leaves through curvature scale space,"
in Proceedings of the First International Conference on Scale-Space
Theory in Computer Vision, Utrecht, The Netherlands,
Jul 1997, pp. 284-295.
[3] C. Im, H. Nishida, and T. Kunii, "Recognizing plant species
by leaf shapes-a case study of the acer family," in Proceedings
of the International Conference on Pattern Recognition
(ICPR), Brisbane, Australia, Aug 1998, pp. 1171-1173.
[4] Z. Wang, Z. Chi, and D. Feng, "Shape based leaf image
retrieval," IEE Proceedings of Vision, Image, and Signal
Processing, vol. 150, pp. 34-43, 2003.
[5] X.-Y. Xiao, R. Hu, S.-W. Zhang, and X.-F. Wang, "HOG-based
approach for leaf classification," in Proceedings of the
International Conference on Intelligent Computing (ICIC),
Changsha, China, Aug 2010, pp. 149-155.
[6] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and
object recognition using shape contexts," IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 24, no. 4,
pp. 509-522, Apr 2002.
[7] H. Ling and D. W. Jacobs, "Shape classification using the
inner-distance," IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 29, no. 2, pp. 286-299, Feb 2007.
[8] H. Fu and Z. Chi, "Combined thresholding and neural network
approach for vein pattern extraction from leaf images,"
IEE Proceedings - Vision, Image & Signal Processing, vol.
153, no. 6, pp. 881-892, December 2006.
[9] J. Clarke, S. Barman, P. Remagnino, K. Bailey, D. Kirkup,
S. Mayo, and P. Wilkin, "Venation pattern analysis of leaf
images," in Proceedings of International Symposium on Visual
Computing, Lake Tahoe, NV, USA, Nov 2006, pp. 427-436.
[10] J. Park, E. Hwang, and Y. Nam, "Utilizing venation features
for efficient leaf image retrieval," The Journal of Systems and
Software, vol. 81, pp. 71-82, 2008.
[11] Y. Nam, E. Hwang, and D. Kim, "A similarity-based leaf
image retrieval scheme: Joining shape," Computer Vision and
Image Understanding, vol. 110, pp. 245-259, 2008.
[12] B. S. Anami, S. S. Nandyal, and A. Govardhan, "A combined
color, texture and edge features based approach for identification
and classification of Indian medicinal plants," International
Journal of Computer Applications, vol. 6, no. 12, pp.
45-51, Sep 2010.
[13] K. Mikolajczyk and C. Schmid, "A performance evaluation
of local descriptors," IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630,
2005.
[14] M.-E. Nilsback and A. Zisserman, "Automated flower classification
over a large number of classes," in Indian Conference
on Computer Vision, Graphics and Image Processing, Dec
2008.
[15] C. Papadimitriou and K. Steiglitz, Combinatorial Optimization:
Algorithms and Complexity. Prentice Hall, 1982.
[16] D. G. Lowe, "Distinctive image features from scale-invariant
keypoints," International Journal of Computer Vision, vol. 60,
no. 2, pp. 91-110, 2004.
[17] P. N. Belhumeur, D. Chen, S. Feiner, D. W. Jacobs, W. J.
Kress, H. Ling, I. Lopez, R. Ramamoorthi, S. Sheorey,
S. White, and L. Zhang, "Searching the world's herbaria: A
system for visual identification of plant species," in Proceedings
of European Conference on Computer Vision (ECCV),
Marseille, France, Oct 2008, pp. 116-129.