Figure 2: An input image (or a feature map) is passed through a non-linear filterbank, followed by tanh activation, local contrast normalization and spatial pooling/sub-sampling. a) First convolutional layer with 16 filters of size 5 × 5. b) Max-pooling layer of size 2 × 2 with stride 2. c) Local response normalization layer with alpha = 0.1 and beta = 0.75. d) Second convolutional layer with 32 filters of size 5 × 5. e) Max-pooling of size 2 × 2 with stride 2. f) Third convolutional layer with 32 filters of size 5 × 5. g) Max-pooling of size 2 × 2 with stride 2. h) 35-way softmax classification layer. We train using stochastic gradient descent with an adaptive learning rate.
rithm where features like the number of horizontal and vertical arcs and the width and height of each character are extracted during pre-processing. These features are then passed to an SVM, a Self-Organizing Map, an RCS, a Fuzzy Neural Network, and a Radial Basis Network. They achieve an accuracy of 97% on test data, but their approach is not invariant to deformations or different writing styles, as their algorithms are highly dependent on the form of the character. Unfortunately, they provide little to no detail on their dataset. Ramakrishnan et al. [13] derive global features from the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the wavelet transform to capture overall information about the data and feed them into an SVM with a radial basis function (RBF) kernel. They obtain 95% accuracy on an online test set. Though there has been a lot of research in Tamil handwriting recognition, most of it has been with online datasets [1, 8] or with online and offline hybrid classifiers, and limited research with offline datasets. To the best of our knowledge, we have not seen previous attempts with ConvNets for our particular dataset. We employ the traditional ConvNet architecture augmented with different pooling methods and local contrast normalization. This work is implemented with the open source ConvNetJS library 1.

2. The Dataset

We train on the offline IWFHR-10 Tamil character dataset from the HP Labs India website 2. The dataset contains approximately 500 samples for each of the 156 Tamil characters written by native Tamil writers. The characters are made available for download as TIFF files. We resize the original unequally sized rectangular images into 32 × 32 square images and save them as JPG files. The resized JPG images are exported and saved as rows in a large CSV file, where the first column of each row holds the image class. This is done in MATLAB. A simple Python script is used to shuffle this large CSV file and split it into two smaller CSV files, one for the training set and another for the test set, containing approximately 60K and 10K images each. We read both CSV files into the ConvNetJS library by implementing a CSV parser using Papaparse 3.

1 http://cs.stanford.edu/people/karpathy/convnetjs/
2 http://lipitk.sourceforge.net/datasets/tamilchardata.htm
3 http://papaparse.com/
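The shuffle-and-split step can be sketched in a few lines of Python; the file names here are placeholders, the class label is assumed to sit in the first column as described above, and the 60K cut-off follows the approximate split sizes given:

import csv
import random

# Each CSV row is [class_label, pixel_1, ..., pixel_1024] for one 32 x 32 image,
# as exported from MATLAB. File names below are placeholders.
with open('tamil_all.csv') as f:
    rows = list(csv.reader(f))

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(rows)

split = 60000           # roughly 60K training / 10K test images
with open('tamil_train.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows[:split])
with open('tamil_test.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows[split:])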
radial basis function (RBF) kernel. They obtain 95% ac-
curacy on an online test set. Though there has been a lot 3. Architecture
of research in Tamil handwriting recognition, most of it has
been with online datasets [1, 8], or with online and offline The input to the convolutional neural network is a 32⇥32
hybrid classifiers, and limited research with offline datasets. image passed through a stack of different kinds of layers as
To the best of our knowledge, we have not seen previous follows: n⇥32⇥32 16C5⇥5 P 2⇥2 L3⇥3 32C5⇥
attempts with ConvNets for our particular dataset. We em- 5 P 2 ⇥ 2 32C5 ⇥ 5 P 2 ⇥ 2 35N . This represents
ploy the traditional ConvNet architecture augmented with a net with n input images of size 32 ⇥ 32, a convolutional
different pooling methods and local contrast normalization. layer with 16 maps and filters of size 5 ⇥ 5, a max-pooling
This work is implemented with the open source ConvNetJS layer over non-overlapping regions of size 2 ⇥ 2, a convolu-
library 1 . tional layer with 32 maps of size 5 ⇥ 5, a max-pooling layer
over non-overlapping regions of size 2 ⇥ 2 and a fully con-
2. The Dataset nected output layer with 35 neurons, one neuron per class
(see Figure 2). We use a non-linear hyperbolic tangent ac-
We train the offline IWFHR-10 Tamil character dataset tivation function, where the output f is a function of input
from the HP Labs India website 2 . The dataset contains x such that f (x) = tanh(x) for the convolutional layers,
1 http://cs.stanford.edu/people/karpathy/ a linear activation function for the max-pooling layers, and
convnetjs/ a softmax activation function for the output layer. We train
2 http://lipitk.sourceforge.net/datasets/
tamilchardata.htm 3 http://papaparse.com/
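For reference, the same layer stack can be sketched in PyTorch-style Python code. Our implementation uses ConvNetJS, not PyTorch; the 'same' padding, the single grayscale input channel, and the use of PyTorch's across-channel LocalResponseNorm as a stand-in for the spatial normalization layer are assumptions of this sketch:

import torch
import torch.nn as nn

# Sketch of n x 32 x 32 - 16C5x5 - P2x2 - L3x3 - 32C5x5 - P2x2 - 32C5x5 - P2x2 - 35N.
# Assumes grayscale input and 'same' padding, so three 2x2 poolings leave a 4x4 map.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.Tanh(),   # 16C5x5, tanh activation
    nn.MaxPool2d(2, stride=2),                               # P2x2
    nn.LocalResponseNorm(size=3, alpha=0.1, beta=0.75),      # stand-in for L3x3 (across channels)
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.Tanh(),  # 32C5x5
    nn.MaxPool2d(2, stride=2),                               # P2x2
    nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.Tanh(),  # 32C5x5
    nn.MaxPool2d(2, stride=2),                               # P2x2
    nn.Flatten(),
    nn.Linear(32 * 4 * 4, 35),                               # 35-way output; softmax applied by the loss
)

logits = model(torch.randn(8, 1, 32, 32))   # n = 8 dummy images
print(logits.shape)                          # torch.Size([8, 35])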
selected. More precisely, the probabilities $p$ for every region $R_j$ (out of $n_r$ regions in total) are calculated after normalizing the activations within the region (see Figure 3).
$$p_i = \frac{x_i}{\sum_{k \in R_j} x_k} \qquad (3)$$
Sampling the multinomial distribution based on $p$ to pick a location $t$ within the pooling region is simply

$$t \sim P(p_1, \ldots, p_{|R_j|}),$$

and the pooled activation is then $x_t$.
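A minimal NumPy sketch of this sampling step for a single pooling region follows; it assumes the activations in the region are non-negative, so that the normalized values in Eq. (3) form a valid probability distribution:

import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool(region):
    """Pool one region R_j by sampling the multinomial p_i = x_i / sum_k x_k (Eq. 3)
    and returning the activation at the sampled location t."""
    x = region.ravel()
    total = x.sum()
    if total == 0.0:                 # degenerate region: every activation is zero
        return 0.0
    p = x / total                    # Eq. (3)
    t = rng.choice(x.size, p=p)      # sample a location t within the region
    return x[t]

# Example: a single 2 x 2 pooling region
region = np.array([[0.1, 0.6],
                   [0.2, 0.1]])
print(stochastic_pool(region))       # returns 0.6 with probability 0.6, etc.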
$$f\!\left(u_f^{x,y}\right) = \frac{u_f^{x,y}}{\left(1 + \frac{\alpha}{N}\,\mathrm{region}_{x,y}\right)^{\beta}} \qquad (6)$$

where $u_f^{x,y}$ is the activity of a unit in map $f$ at position $(x, y)$ prior to normalization, $S$ is the image size, and $N$ is the size of the region to use for normalization. The output dimensionality of this layer is always equal to the input dimensionality.

$$f\!\left(u_f^{x,y}\right) = \frac{u_f^{x,y} - m_f^{x',y'}}{\left(1 + \frac{\alpha}{N}\,\mathrm{region}_{x,y}\right)^{\beta}} \qquad (8)$$

where $m_f^{x',y'}$ here is the mean of all $u_f^{x,y}$ in the 2D neighborhood defined by the summation bounds below.
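As a concrete reading of Eq. (6), the following NumPy sketch applies the divisive normalization to one feature map; as an assumption, $\mathrm{region}_{x,y}$ is taken here to be the sum of squared activities in the N × N spatial neighborhood around (x, y), clipped at the image borders:

import numpy as np

def local_response_norm(u, N=3, alpha=0.1, beta=0.75):
    """Divisively normalize a single S x S feature map u (Eq. 6).
    region[x, y] is assumed to be the sum of squared activities in the
    N x N neighborhood around (x, y), clipped at the image borders.
    The output has the same dimensionality as the input."""
    S = u.shape[0]
    out = np.empty_like(u, dtype=float)
    h = N // 2
    for x in range(S):
        for y in range(S):
            patch = u[max(0, x - h):min(S, x + h + 1),
                      max(0, y - h):min(S, y + h + 1)]
            region = np.sum(patch ** 2)
            out[x, y] = u[x, y] / (1.0 + (alpha / N) * region) ** beta
    return out

# Example: a random 32 x 32 map with the alpha and beta from Figure 2
print(local_response_norm(np.random.randn(32, 32)).shape)   # (32, 32)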
The IWFHR-10 classification dataset contains randomly sized images. After resizing all images to 32 × 32 pixels, we split the dataset into three subsets: a train set, a validation set, and a test set.

5 http://bigwww.epfl.ch/sage/soft/localnormalization/
have been tuned to work well together on the training set but
not on the test set [6]. Dropout is a regularization technique
where on each presentation of each training case, feature
detectors are deleted with probability p and the remaining
weights are trained by backpropagation [2].
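A toy NumPy sketch of a dropout forward pass follows; it assumes activations are dropped during training and the layer output is scaled by (1 - p) at test time, and the layer size and p = 0.2 are placeholders:

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.2, train=True):
    """Delete each activation with probability p during training;
    at test time keep all units and scale by (1 - p)."""
    if not train:
        return h * (1.0 - p)
    mask = rng.random(h.shape) >= p     # keep a unit with probability 1 - p
    return h * mask

# Example: activations of a hidden layer for one training case
h = np.random.randn(128)
h_train = dropout(h, p=0.2, train=True)    # some units zeroed out
h_test = dropout(h, p=0.2, train=False)    # all units, scaled by 0.8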
Pooling               Activation  Classifier  Train Acc.  Test Acc.  Others
Max                   Tanh        Softmax     91%         87.39%     NA
Max                   ReLU        Softmax     69%         48.2%      NA
Max                   Tanh        SVM         84%         80.4%      NA
Stochastic            Tanh        Softmax     86%         87.72%     NA
Stochastic            Tanh        Softmax     84%         57.92%     FC (dropout: 0.1)
Stochastic            Tanh        SVM         80%         65.99%     NA
Stochastic + Prob Wt  Tanh        Softmax     84%         86.5%      NA

Table 1: Experiments with 35 × 28 images. FC: fully connected layer. Dropout: drop activations with probability 0.1.

Table 2: Experiments with 32 × 32 images. FC: fully connected layer. Dropout: drop activations with probability 0.2.
References

[1] K. Aparna, V. Subramanian, M. Kasirajan, G. V. Prakash, V. Chakravarthy, and S. Madhvanath. Online handwriting recognition for Tamil. In Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on, pages 438–443. IEEE, 2004.
[2] P. Baldi and P. J. Sadowski. Understanding dropout. In Advances in Neural Information Processing Systems, pages 2814–2822, 2013.
2814–2822, 2013.
[3] D. Ciresan, U. Meier, J. Masci, and J. Schmidhuber. A
committee of neural networks for traffic sign classification.
In Neural Networks (IJCNN), The 2011 International Joint
Conference on, pages 1918–1921. IEEE, 2011.
[4] X. Cui, V. Goel, and B. Kingsbury. Data augmentation for
deep neural network acoustic modeling. In Acoustics, Speech
and Signal Processing (ICASSP), 2014 IEEE International
Conference on, pages 5582–5586. IEEE, 2014.
[5] S. Haykin. Self-organizing maps. Neural networks-A com-
prehensive foundation, 2nd edition, Prentice-Hall, 1999.
[6] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and
R. R. Salakhutdinov. Improving neural networks by pre-
venting co-adaptation of feature detectors. arXiv preprint
arXiv:1207.0580, 2012.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet
classification with deep convolutional neural networks. In
Advances in neural information processing systems, pages
1097–1105, 2012.
[8] R. Kunwar and A. Ramakrishnan. Online handwriting recog-
nition of Tamil script using fractal geometry. In Document
Analysis and Recognition (ICDAR), 2011 International Con-
ference on, pages 1389–1393. IEEE, 2011.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-
based learning applied to document recognition. Proceed-
ings of the IEEE, 86(11):2278–2324, 1998.
[10] Y. LeCun and C. Cortes. The MNIST database of handwritten
digits.
[11] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda. Subject
independent facial expression recognition with robust face
detection using a convolutional neural network. Neural Net-
works, 16(5):555–559, 2003.
[12] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y.
Ng. Reading digits in natural images with unsupervised fea-
ture learning. In NIPS workshop on deep learning and unsu-
pervised feature learning, volume 2011, page 4, 2011.
[13] A. G. Ramakrishnan and K. B. Urala. Global and local
features for recognition of online handwritten numerals and
Tamil characters. In Proceedings of the 4th International
Workshop on Multilingual OCR, MOCR ’13, pages 16:1–
16:5, New York, NY, USA, 2013. ACM.
[14] P. Sermanet and Y. LeCun. Traffic sign recognition with
multi-scale convolutional networks. In Neural Networks
(IJCNN), The 2011 International Joint Conference on, pages
2809–2813. IEEE, 2011.
[15] N. Shanthi and K. Duraiswamy. A novel SVM-based handwritten Tamil character recognition system. Pattern Analysis and Applications, 13(2):173–180, 2010.
[16] C. Sureshkumar and T. Ravichandran. Handwritten Tamil character recognition and conversion using neural network. Int J Comput Sci Eng, 2(7):2261–67, 2010.
[17] M. D. Zeiler. ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
[18] M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557, 2013.