
Multiclass Hierarchical SVM for Recognition of Printed Tamil Characters

Shivsubramani K, Loganathan R, Srinivasan CJ, Ajay V, Soman KP


Centre for Excellence in Computational Engineering
Amrita Vishwa Vidyapeetham
Tamilnadu, India
{ramanand, r_logu, cj_srinivasan, v_ajay}@ettimadai.amrita.edu

Abstract

This paper presents an efficient method for recognizing printed Tamil characters by exploiting the inter-class relationships between them. This is accomplished using the Multiclass Hierarchical Support Vector Machine [Crammer et al., 2001; Weston et al., 1998], a variant of the multiclass SVM that constructs a hyperplane separating each class of data from the other classes. 126 unique characters in the Tamil language have been identified, and many inter-class dependencies were found among them based on their shapes. This enabled the characters to be organized into hierarchies, thereby enhancing the process of recognizing the characters. The system was trained using features extracted from the binary character sub-images of sample documents using Hu's moment invariant feature extraction method [Hu, 1962; Jain et al., 1996]. The system fetched promising results in comparison with other classification algorithms like KNN, the Bayesian classifier and decision trees. An accuracy of 96.85% was obtained in experiments using the Multiclass Hierarchical SVM.

Introduction


The support vector machine [Burges, 1998; Cristianini et al., 2000] is a training algorithm for learning classification and regression rules from data. It is emerging as a very efficient learning methodology in artificial intelligence. Fundamentally, SVMs are binary classification algorithms with a strong theoretical foundation in statistical learning theory [Vapnik, 1998]. Their ease of use, theoretical appeal, and remarkable performance have made them the system of choice for many learning problems. It is noted by [Cristianini et al., 2002] that the limitations of most non-SVM learning algorithms proposed in the past 20 years were due, to a large extent, to their being based on heuristics or on loose analogies with natural learning systems. The new pattern-recognition SVM algorithms overcome such limitations with a strong underlying mathematical foundation.

The most crucial stage in the process of Optical Character Recognition (OCR) [Nagy, 1992] is that of recognizing the

realized on this sample by the following optimization problem:

$$
\begin{aligned}
\min \quad & \tfrac{1}{2}\,\mathrm{tr}(W^T W) + C\,\mathbf{e}^T \boldsymbol{\xi} \\
\text{w.r.t.} \quad & \{W \mid W : \mathcal{H}_{\phi(x)} \to \mathcal{H}_y,\ W \text{ a linear operator}\}, \\
& \{\mathbf{b} \mid \mathbf{b} \in \mathcal{H}_y\}, \text{ bias vector}, \\
& \{\boldsymbol{\xi} \mid \boldsymbol{\xi} \in \mathbb{R}^m\}, \text{ slack or error vector}, \\
\text{subject to} \quad & \langle \mathbf{y}_i,\, W\phi(\mathbf{x}_i) + \mathbf{b} \rangle \geq q_i - p_i \xi_i, \quad i = 1, \ldots, m, \\
& \boldsymbol{\xi} \geq \mathbf{0},
\end{aligned}
\tag{1}
$$

where $\mathbf{0}$ denotes the vector with all components 0. The real values $q_i$ and $p_i$ denote normalization constants that can be chosen from the set $\{1,\ \langle \mathbf{y}_i, \phi(\mathbf{x}_i)\rangle,\ \|\mathbf{y}_i\|\,\|\phi(\mathbf{x}_i)\|\}$ depending on the particular task. The bias term $\mathbf{b}$ can be set to zero, because it has been shown in [Kecman et al., 2005] that polynomial and RBF kernels do not require a bias term. To understand the geometry of the problem better, we first let $q_i$ and $p_i$ be 1.

Introducing dual variables $\{\alpha_i \mid i = 1, \ldots, m\}$ for the margin constraints, Karush-Kuhn-Tucker theory lets us express the linear operator $W$ using tensor products of the output and feature vectors, that is,

$$W = \sum_{i=1}^{m} \alpha_i\, \mathbf{y}_i\, \phi(\mathbf{x}_i)^T. \tag{3}$$

The dual gives

$$
\begin{aligned}
\min \quad & \tfrac{1}{2}\sum_{i,j=1}^{m} \alpha_i \alpha_j \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j)\rangle \langle \mathbf{y}_i, \mathbf{y}_j\rangle - \sum_{i=1}^{m} \alpha_i \\
\text{subject to} \quad & \sum_{i=1}^{m} (\mathbf{y}_i)_t\, \alpha_i = 0, \quad t = 1, \ldots, \dim(\mathcal{H}_y), \\
& C \geq \alpha_i \geq 0, \quad i = 1, \ldots, m,
\end{aligned}
\tag{2}
$$

where we write the inner products in the objective as kernel items: $\langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j)\rangle = K_{ij}$ and $\langle \mathbf{y}_i, \mathbf{y}_j\rangle = K^y_{ij}$ stand for the elements of the kernel matrices of the feature vectors and of the label vectors respectively. Hence, the vector labels are kernelized as well. The synthesized kernel is the element-wise product of the input and the output kernels, an operation that preserves positive semi-definiteness.

The main point to be noted in formulation (1) is the constraint equations. When we project $W\phi(\mathbf{x}_i)$ onto $\mathbf{y}_i$, the constraint anchors the resulting value at one from one side only, and there seems to be no compelling reason for such a restriction. So if we allow $\langle \mathbf{y}_i, W\phi(\mathbf{x}_i)\rangle$ to take values around 1 (on both sides), the nonnegativity restriction on $\boldsymbol{\xi}$ goes away, and the inequality constraint becomes the equality constraint $\langle \mathbf{y}_i, W\phi(\mathbf{x}_i)\rangle = 1 - \xi_i$, $i = 1, \ldots, m$; the magnitude of the error measured by the slack variable $\xi_i$ is then treated the same on either side of the margin. This change in turn necessitates the 1-norm minimization term $C\mathbf{e}^T\boldsymbol{\xi}$ in the objective function to take the two-norm form $\tfrac{1}{2}C\boldsymbol{\xi}^T\boldsymbol{\xi}$. The formulation given below, (4), can basically be thought of as an extension of the formulation given in [Mangasarian et al., 2001] for the two-class SVM:

$$
\begin{aligned}
\min \quad & \tfrac{1}{2}\,\mathrm{tr}(W^T W) + \tfrac{1}{2}C\,\boldsymbol{\xi}^T\boldsymbol{\xi} \\
\text{w.r.t.} \quad & \{W \mid W: \mathcal{H}_{\phi(x)} \to \mathcal{H}_y,\ W \text{ a linear operator}\}, \\
& \{\boldsymbol{\xi} \mid \boldsymbol{\xi} \in \mathbb{R}^m\}, \text{ slack or error vector}, \\
\text{subject to} \quad & \langle \mathbf{y}_i, W\phi(\mathbf{x}_i)\rangle = 1 - \xi_i, \quad i = 1, \ldots, m.
\end{aligned}
\tag{4}
$$

The Lagrangian is given by

$$L(W, \boldsymbol{\xi}, \boldsymbol{\alpha}) = \tfrac{1}{2}\,\mathrm{tr}(W^T W) + \tfrac{C}{2}\,\boldsymbol{\xi}^T\boldsymbol{\xi} - \sum_{i=1}^{m}\alpha_i \left( \langle \mathbf{y}_i, W\phi(\mathbf{x}_i)\rangle - 1 + \xi_i \right).$$

Solving for the primal variables,

$$\frac{\partial L}{\partial W} = W - \sum_{i=1}^{m}\alpha_i\, \mathbf{y}_i\, \phi(\mathbf{x}_i)^T = 0 \;\;\Rightarrow\;\; W = \sum_{i=1}^{m}\alpha_i\, \mathbf{y}_i\, \phi(\mathbf{x}_i)^T, \tag{5}$$

$$\frac{\partial L}{\partial \xi_i} = C\xi_i - \alpha_i = 0 \;\;\Rightarrow\;\; \xi_i = \frac{\alpha_i}{C}. \tag{6}$$

Substituting $W = \sum_{i=1}^{m}\alpha_i \mathbf{y}_i \phi(\mathbf{x}_i)^T$ and $\boldsymbol{\xi} = \boldsymbol{\alpha}/C$ into the constraint of (4), we obtain

$$\left( \tilde{K} + \tfrac{1}{C}I \right)\boldsymbol{\alpha} = \mathbf{e} \;\;\Rightarrow\;\; \boldsymbol{\alpha} = Q^{-1}\mathbf{e}, \tag{7}$$

where $\tilde{K} = K^y .\!* K$ is the element-wise product of the label and feature kernel matrices and $Q = \tilde{K} + \tfrac{1}{C}I$. This leads to a closed-form solution for SVM training. Here the $\alpha_i$ are unrestricted in sign and unbounded.
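To make the closed-form training step concrete, here is a minimal NumPy sketch of equation (7). The function name and the dense linear solve are our own illustrative assumptions; the paper does not describe its implementation.

```python
import numpy as np

def train_vector_svm(K_x, K_y, C):
    """Closed-form training of the vector-output SVM, equation (7).

    K_x : (m, m) kernel matrix of the feature vectors, K_ij
    K_y : (m, m) kernel matrix of the label vectors, K^y_ij
    C   : regularization parameter
    """
    m = K_x.shape[0]
    Q = K_y * K_x + np.eye(m) / C           # synthesized kernel plus ridge term
    return np.linalg.solve(Q, np.ones(m))   # alpha = Q^{-1} e
```

Because $Q$ is symmetric positive definite, a single linear solve replaces the quadratic-programming step of a conventional SVM, which is what makes this formulation cheap to train.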

Multiclass classification

Multiclass classification can be implemented within the framework of the vector-valued SVM. Let us assume the label vectors are chosen out of a finite set $\{\mathbf{y}^1, \ldots, \mathbf{y}^T\}$ in the learning task. The decision function predicting one of these labels can be expressed by using the predicted vector output [Shawe-Taylor et al., 2005]:

$$d(\mathbf{x}) = \arg\max_{t=1,\ldots,T} \sum_{i=1}^{m} K^y(\mathbf{y}^t, \mathbf{y}_i)\, K(\mathbf{x}_i, \mathbf{x})\, \alpha_i. \tag{8}$$

Now we are able to set up a multiclass classification. In [Shawe-Taylor et al., 2005] the following two rules for setting up labels are given:

$$(\mathbf{y}_i)_t = \begin{cases} 1 & \text{if } i \text{ belongs to category } t, \quad t = 1, \ldots, T, \\ 0 & \text{otherwise}, \end{cases} \tag{9}$$

$$(\mathbf{y}_i)_t = \begin{cases} \sqrt{\dfrac{T-1}{T}} & \text{if } i \text{ belongs to category } t, \quad t = 1, \ldots, T, \\[2mm] -\dfrac{1}{\sqrt{T(T-1)}} & \text{otherwise}. \end{cases} \tag{10}$$
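A small sketch of how the label vectors of rule (10) and the decision rule (8) might be realized; the helper names and the generic `kernel` callback are assumptions for illustration, not the authors' code.

```python
import numpy as np

def label_matrix(class_ids, T):
    """Build one label vector per sample, following rule (10)."""
    m = len(class_ids)
    Y = np.full((m, T), -1.0 / np.sqrt(T * (T - 1)))   # off-class entries
    Y[np.arange(m), class_ids] = np.sqrt((T - 1) / T)  # in-class entry
    return Y

def predict(x, X_train, Y, alpha, kernel):
    """Decision rule (8): d(x) = argmax_t sum_i <y^t, y_i> K(x_i, x) alpha_i."""
    prototypes = label_matrix(np.arange(Y.shape[1]), Y.shape[1])  # one y^t per class
    k = np.array([kernel(xi, x) for xi in X_train])    # K(x_i, x) for every i
    w = Y.T @ (k * alpha)                              # sum_i alpha_i K(x_i, x) y_i
    return int(np.argmax(prototypes @ w))              # class with best projection
```

Paired with `train_vector_svm` above, this gives an end-to-end sketch of the one-class-complexity multiclass pipeline.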

Hierarchy-based learning

Introducing the concept of hierarchically organizing the characters, the above-mentioned equation (10) can be extended to equation (11). The hierarchy is conceptualized manually, wherein every child node holds a relationship or similarity with its siblings alone. The hierarchy learning [Szedmak et al., 2005] is realized via an embedding of each path going from a node to the root of the tree. Let $V$ be the set of nodes in the tree. A path $p(\nu) \subseteq V$ is defined as the shortest path from the node $\nu$ to the root of the tree, and its length is equal to $|p(\nu)|$. The set $I = \{1, \ldots, |V|\}$ gives an indexing of the nodes. The embedding is realized by a vector-valued function $\psi: V \to \mathbb{R}^{|V|}$, and the components of $\psi(\nu)$ are given by

$$(\psi(\nu))_i = \begin{cases} r & \text{if } i \in p(\nu), \\ s\,q^{k} & \text{if } i \notin p(\nu) \text{ and } k = |p(\nu)| - |p(i)|, \end{cases} \tag{11}$$

where $r$, $q$ and $s$ are the parameters of the embedding. The parameter $q$ expresses the diminishing weight of the nodes closer to the root. If $q = 0$, assuming $0^0 = 1$, then the intermediate nodes and the root are discarded, and we are left with a plain multiclass classification problem. The value of $r$ can be 0, but some experiments show that a nonzero value may help to improve classification performance. This method was successfully applied to the WIPO-alpha patent dataset and the Reuters Corpus and was known to give good results on them.
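The following is a minimal sketch of the path embedding $\psi$ of equation (11), under the assumptions stated in the comments (node ids $0,\ldots,n-1$, labels assigned at the deepest level so the exponent $k$ is nonnegative, and $k$ taken as the depth difference, which is our reading of the garbled original).

```python
import numpy as np

def embed(node, parent, r=1.0, s=-0.1, q=0.5):
    """Sketch of the hierarchical label embedding psi, equation (11).

    `parent` maps each node id (0..n-1) to its parent id, None for the root;
    r, s, q are the embedding parameters. Assumes `node` is a leaf at the
    deepest level, so k >= 0 for every other node.
    """
    def path(v):                     # shortest path from v up to the root
        p = [v]
        while parent[p[-1]] is not None:
            p.append(parent[p[-1]])
        return p

    on_path = set(path(node))
    psi = np.empty(len(parent))
    for i in parent:
        if i in on_path:
            psi[i] = r                              # nodes on the path get r
        else:
            k = len(path(node)) - len(path(i))      # assumed depth difference
            psi[i] = s * q ** k                     # weight diminishes toward root
    return psi
```

With $q = 0$ (and $0^0 = 1$ in Python as in the text), only the deepest off-path components survive, recovering the flat multiclass labeling.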

Experiments

Tamil is a South Indian language mainly spoken in the southern parts of India, Sri Lanka, Malaysia and Singapore. The Tamil character set contains 12 vowels, 18 consonants and 247 alphabets in total. 126 unique, commonly occurring character shapes have been identified, and the hierarchy is built over these 126 characters as explained in section 5.4. Thus the classification was to be done into 126 classes. The flow chart in Figure 1 describes the steps involved in preparing the extracted features for training the system and thereafter for classification.


Figure 1: OCR process

5.1 Preprocessing
Documents were digitized and stored as grayscale bitmap images. Binarization was performed by applying a threshold value to the image. Since the scanned images were noise-free to a considerable extent, a noise reduction technique did not need to be applied.

5.2 Segmentation
Segmentation was performed in two phases: (i) line segmentation, wherein each line in the document was segmented using the horizontal projection profile; and (ii) character segmentation, wherein each character in a line was segmented using 8-connected component analysis [Haralick et al., 1992]. The two-phase approach was adopted based on a comparative study in which it yielded the better result.
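A sketch of the first phase, assuming the binarization convention above (1 = ink): rows with no ink separate consecutive text lines. Character segmentation would then run 8-connected component analysis within each returned line strip.

```python
import numpy as np

def segment_lines(binary):
    """Split a binarized page into line strips via the horizontal profile."""
    profile = binary.sum(axis=1)              # ink pixels per row
    rows = np.where(profile > 0)[0]           # rows that contain ink
    if rows.size == 0:
        return []
    breaks = np.where(np.diff(rows) > 1)[0]   # gaps between consecutive ink rows
    starts = np.r_[rows[0], rows[breaks + 1]]
    ends = np.r_[rows[breaks], rows[-1]]
    return [binary[s:e + 1] for s, e in zip(starts, ends)]
```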

5.3 Feature extraction

Feature extraction involves extracting the attributes that best describe the segmented character image. Moment-based invariants are the most commonly used feature extraction method in many applications. They explore information across an entire image and can capture some global properties, such as the overall image orientation. Hu's moment invariant method [Hu, 1962; Jain et al., 1996] was adopted since it is invariant to scaling, rotation and image translation. Seven invariant features were extracted based on the following nonlinear functions:

$$
\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
&\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
$$

where $\eta_{pq}$ is the $(p+q)$th order normalized central moment.
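For concreteness, a plain-NumPy sketch of these seven invariants computed from a binary character sub-image; the helper name is ours, and in practice an equivalent such as OpenCV's `cv2.HuMoments(cv2.moments(img))` could be used instead.

```python
import numpy as np

def hu_moments(img):
    """Compute Hu's seven moment invariants of a binary character image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    img = img.astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):  # normalized central moment of order (p + q)
        mu = (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    phi5 = ((n30 - 3 * n12) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    phi6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03))
    phi7 = ((3 * n21 - n03) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            - (n30 - 3 * n12) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])
```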

5.4 Hierarchical labeling

A lot of inter-class dependencies were found among Tamil characters based on their shapes: many characters exhibit strong similarity with other characters, and the feature values of such a pair of characters differ only minimally. Some examples of such character pairs are shown in Figure 2.

Figure 2: Similarity in shapes

Taking advantage of this property of Tamil characters, the characters exhibiting similarity were organized into hierarchies, which eased the classification process. Figure 3 shows a hierarchical tree structure of some selected characters with similar shapes.

Figure 3: Hierarchical labeling

In case a character's features exhibit the property of a particular subclass, the classes not belonging to that branch of the hierarchy need not be considered for classification at all. Thereby a 126-class classification problem can be broken down into a 10-class or 8-class problem. This, to a large extent, enhances accuracy and efficiency by increasing inter-class differentiability.
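The pruning this buys can be sketched as a walk down the tree, where each internal node runs a small multiclass SVM over its children only. The `node.children` and `node.label` attributes and the `classify_at` callback are illustrative assumptions, not the paper's interface.

```python
def classify_hierarchical(x, root, classify_at):
    """Descend the shape hierarchy; only siblings compete at each level."""
    node = root
    while node.children:          # internal node: pick the best child subtree
        node = classify_at(x, node)
    return node.label             # leaf: a concrete Tamil character class
```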

5.5 Training
The training data set was generated by labeling the features extracted from the sample character images with the corresponding class. The training set for a particular class contains, on average, 20 training samples.
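A sketch of how such a training set might be assembled, reusing the `hu_moments` helper from section 5.3; `samples_by_class`, a mapping from class id to its roughly 20 sample images, is a hypothetical layout.

```python
def build_training_set(samples_by_class):
    """Pair each sample image's Hu-moment features with its class id."""
    return [(hu_moments(img), cls)
            for cls, imgs in samples_by_class.items()
            for img in imgs]
```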

Results

The Multiclass Hierarchical SVM turned out to be a very efficient classification method. The accuracy of the algorithm depended on two parameter settings: the RBF kernel parameter $\sigma$ and the regularization parameter $C$. The form of the RBF kernel used is $e^{-\frac{1}{2\sigma^2}\|\mathbf{x} - \bar{\mathbf{x}}\|^2}$. The system had to be fine-tuned on these values in order to obtain better accuracy. Figure 4 shows the improving performance/accuracy of the system with changing values of the parameters.

[Figure 4: SVM Classifier Accuracy. Accuracy (%) plotted against varying values of sigma and C.]
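A minimal sketch of the kernel and of a grid search over $(\sigma, C)$ in the spirit of Figure 4; the candidate grids and the `evaluate` callback (assumed to return cross-validated accuracy) are illustrative, not values from the paper.

```python
import numpy as np
from itertools import product

def rbf(x, z, sigma):
    """RBF kernel e^{-||x - z||^2 / (2 sigma^2)}."""
    d = np.asarray(x) - np.asarray(z)
    return np.exp(-np.sum(d ** 2) / (2 * sigma ** 2))

def tune(evaluate, sigmas=(0.5, 0.9, 2.0), Cs=(1.0, 10.0, 100.0)):
    """Pick the (sigma, C) pair with the best cross-validated accuracy."""
    return max(product(sigmas, Cs), key=lambda sc: evaluate(*sc))
```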

The accuracy rate yielded by the SVM classifier was quite commendable. Based on the comparative study performed, the Multiclass Hierarchical SVM showed a better accuracy rate than many other classifiers, such as the multilayer perceptron, KNN, naive Bayes, decision trees and other rule-based classifiers.

The system was tested three times, using the 3, 7 and 20 most similar characters respectively. The accuracy rate was calculated using 10-fold cross-validation. Table 1 depicts the accuracies yielded by the various classifiers.

Classifier                  | 3 characters | 7 characters | 20 characters
----------------------------|--------------|--------------|--------------
Multiclass Hierarchical SVM |    96.85     |    96.23     |    96.86
Multilayer Perceptron       |    91.8      |    95.45     |    93.43
KNN                         |    89.40     |    90.05     |    89.90
Naive Bayes                 |    84.5      |    88.90     |    88.20
Decision Trees              |    91.0      |    92.84     |    93.23

Table 1: Comparison of classifier performances (accuracy, %)

Enhancements

- The system can be enhanced by including a language-heuristics module, wherein a character that cannot be confidently assigned to a particular class could be classified based on predictions made using the general grammar and rules of the language.
- Another possible enhancement is compiling a language dictionary, so that in a situation of ambiguity, classification could be performed based on language semantics.
- Apart from the 126 characters identified here, there exist some ancient Tamil characters that are no longer in common use. Such characters could also be taken into consideration for broader coverage.
- The system can also be extended to other languages. We are planning to work on Indian languages such as Sanskrit, Hindi and Malayalam, which exhibit similar shape-similarity between characters and can likewise be organized into hierarchies.

Conclusion

The paper presented an efficient algorithm for the classification of characters using the Multiclass Hierarchical SVM, a variant of the multiclass SVM. The system was applied to the recognition of printed Tamil characters. The experimental procedures were explained and the results listed, depicting the efficiency of the system. The algorithm proved more efficient than some of the commonly used classifiers. Some merits of our algorithm are:

- A strong mathematical foundation, rather than heuristics and analogies.
- Efficiency in terms of accuracy in comparison with many commonly used classifiers.

References

[Burges, 1998] C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998.

[Chen, 2003] Qing Chen. Evaluation of OCR Algorithms for Images with Different Spatial Resolutions and Noises. Master's thesis, School of Information Technology and Engineering, University of Ottawa, 2003.

[Crammer et al., 2001] Koby Crammer and Yoram Singer. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines. Journal of Machine Learning Research 2, pp. 265-292, 2001.

[Cristianini et al., 2000] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 1st edition, 2000.

[Cristianini et al., 2002] Nello Cristianini and Bernhard Schölkopf. Support vector machines and kernel methods: the new generation of learning machines. AI Magazine, Fall 2002.

[Haralick et al., 1992] Robert M. Haralick and Linda G. Shapiro. Computer and Robot Vision, Volume I. Addison-Wesley, pp. 28-48, 1992.

[Hu, 1962] M.K. Hu. Visual Pattern Recognition by Moment Invariants. IRE Trans. Information Theory, vol. 8, pp. 179-187, 1962.

[Jain et al., 1996] O.D. Trier, A.K. Jain and T. Taxt. Feature extraction methods for character recognition - A survey. Pattern Recognition 29, pp. 641-662, 1996.

[Joachims, 1998] T. Joachims. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, MIT Press, 1998.

[Kecman et al., 2005] V. Kecman, T.M. Huang and M. Vogt. Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance. In Support Vector Machines: Theory and Applications, Springer-Verlag, Studies in Fuzziness and Soft Computing, Vol. 177, Chap. 12, pp. 255-274, 2005.

[Mangasarian et al., 2001] G. Fung and O.L. Mangasarian. Proximal Support Vector Machine Classifiers. KDD 2001: Seventh ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Francisco, August 26-29, 2001.

[Nagy, 1992] G. Nagy. On the Frontiers of OCR. Proceedings of the IEEE, vol. 80, no. 7, pp. 1093-1100, July 1992.

[Osuna et al., 1997] E. Osuna, R. Freund and F. Girosi. An Improved Training Algorithm for Support Vector Machines. Proc. IEEE Neural Networks for Signal Processing VII Workshop, IEEE Press, Piscataway, N.J., pp. 276-285, 1997.

[Schölkopf et al., 2002] B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.

[Shawe-Taylor et al., 2005] Sandor Szedmak and John Shawe-Taylor. Multiclass Learning at One-class Complexity. Technical Report, ISIS Group, Electronics and Computer Science, 2005.

[Szedmak et al., 2005] Sandor Szedmak and John Shawe-Taylor. Learning Hierarchies at Two-class Complexity. Kernel Methods and Structured Domains, NIPS 2005.

[Vapnik, 1995] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

[Vapnik, 1998] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, Inc., New York, 1998.

[Weston et al., 1998] J. Weston and C. Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK, 1998.