A Hybrid Model for End-to-End Online Handwriting Recognition
Abstract—Automatic recognition of online handwritten words in a generic mode has significant application potential. However, this recognition task is challenging for unconstrained handwriting data. The challenge is more serious for Indic scripts like Devanagari or Bangla due to the inherent cursiveness of their characters, the large sizes of their alphabets, the existence of several groups of similarly shaped characters, etc. On the other hand, with the recent development of powerful machine learning tools, major research initiatives in this area of pattern recognition have been observed. Feature extraction and classification are the two major modules of such a recognizer. Deep architectures of convolutional neural network (CNN) models have been found to be efficient in extracting useful features from the raw signal. On the other hand, a recurrent neural network (RNN) along with connectionist temporal classification (CTC) has been shown to be able to label unsegmented sequence data. In the present article, we propose a hybrid layered architecture consisting of three networks, CNN, RNN and CTC, for recognition of online handwriting without the use of any specific lexicon. In this study, we have also observed that feeding hand-crafted features to the CNN at the first level of the proposed model provides better performance than feeding the raw signal to the CNN. We have simulated the proposed model on two large databases of Devanagari and Bangla online unconstrained handwritten words. The recognition accuracies provided by the proposed model are encouraging.

I. INTRODUCTION

From the perspective of automatic recognition, handwriting data are often categorized into offline and online formats. An offline handwriting sample is stored in the form of a two-dimensional image, while an online handwriting sample is stored as a temporal sequence of two-dimensional coordinate points determining the trajectory of pen tip movement, along with some additional information such as pen status ('up' or 'down'). Automatic recognition or interpretation of both these types of handwriting data has its respective challenges. Since the beginning, the study of handwriting data has attracted the attention of researchers in the area of pattern recognition. However, automatic recognition of unconstrained cursive handwriting has always met with serious challenges. In such handwriting, information about the boundaries of individual characters is not readily available, because when a writer writes in an unconstrained way, the lifting of the pen depends upon his/her idiosyncrasy instead of the end of a character. Thus, the task of recognizing cursive handwriting is far more challenging [1] than recognizing isolated handwritten characters. In this work, we have studied recognition of unconstrained online handwriting of Devanagari and Bangla, the two most popular Indian scripts. This type of handwriting data is captured by touch screen devices, pen tablets, etc. Such devices store the coordinates of points on the writing surface along the path of movement of a finger tip or stylus as a temporal sequence. The part of such a sequence between a pair of successive 'pen down' and 'pen up' events is often termed a stroke. A piece of online handwritten data is composed of one or more such strokes. An example of such online handwriting data is shown in Fig. 1.

Figure 1. A piece of online handwritten Hindi text written in Devanagari script. Circles show the positions of captured coordinates on the writing surface. Different colors mark different strokes.

A. Devanagari Script

Devanagari is one of the most widely used scripts in the southern part of Asia. It is a descendant of the old Brahmi script, and its early use was found around 1000 CE. The Devanagari script is used to write several languages such as Sanskrit, Hindi, Nepali, Marathi, Kashmiri, etc. Devanagari is an alpha-syllabary (also known as an Abugida) [2], in which a consonant and vowel composition is often written as a single unit. Also, two or more of its basic consonant characters can combine to form a compound character. Due to these characteristics of the script, the size of its alphabet is large, consisting of many compound characters. Fig. 2 shows modified forms of a basic consonant of Devanagari when it is attached to different basic vowel characters. On the other hand, the first two rows of Fig. 3 show the formation of Devanagari compound characters from combinations of two basic consonant characters.
Figure 4. (a) Handwritten samples of Bangla words, (b) Bangla basic characters and vowel modifiers used to form the words.

II. RELATED WORKS

Several studies of online handwriting recognition are found in the existing literature. A few recent studies include [4], [5]. The majority of existing online handwriting recognition studies of the Indic scripts considered isolated characters as the input [2], [6], [7], [8]. However, a

A. CNN

CNN has now been established as an efficient deep neural network architecture [14], [15]. It is a layered architecture with two distinct parts. The tasks of the layers in the first part of this architecture are convolution and sub-sampling operations. They have neurons arranged along width, height and depth dimensions. The neurons in such a layer have feed-forward connections only with the neurons in a small region of the immediately neighbouring layer. The second or last part of a CNN architecture is a fully connected multilayer perceptron consisting of one or more hidden layers. Its last layer is the final output layer of the CNN, which provides a single vector of class scores corresponding to the input image. The well-known backpropagation algorithm or one of its variations is used to obtain the connection weights of a CNN. A typical CNN architecture is shown in Fig. 5.
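The convolution and sub-sampling operations described above can be sketched in a few lines of NumPy. This is a toy 1-D illustration, not the paper's implementation; the input signal and kernel values are made up:

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (really cross-correlation, as used in CNNs)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def maxpool1d(x, size, stride):
    """Sub-sampling: keep only the maximum of each window."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, stride)])

x = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0])
feat = conv1d_valid(x, np.array([1.0, 0.0, -1.0]))  # responds to local slope
pooled = maxpool1d(feat, size=2, stride=2)          # pooling halves the length
```

A real CNN layer learns many such kernels in parallel and stacks several convolution/pooling pairs, as in Table I of this paper.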
Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on October 24,2024 at 06:26:31 UTC from IEEE Xplore. Restrictions apply.
input sequence, an RNN makes use of information about the previous elements of the sequence, stored in some form of memory in its hidden units. In the literature, RNNs have been successfully used in speech processing [17], natural language processing [18], online handwriting recognition [1], etc.
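The recurrence below shows, in a minimal NumPy sketch with made-up dimensions and random weights, how a plain RNN cell carries information about previous elements forward through its hidden state; the LSTM cells used later in this paper add gating on top of this idea:

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 4                        # input and hidden sizes (arbitrary here)
W_x = rng.normal(size=(m, k)) * 0.1
W_h = rng.normal(size=(m, m)) * 0.1
b = np.zeros(m)

def rnn_forward(seq):
    """Return hidden states h_1..h_T; h_t depends on x_t and on h_{t-1}."""
    h = np.zeros(m)
    states = []
    for x_t in seq:
        # memory of the past enters through the W_h @ h term
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(5, k))      # a length-5 input sequence
H = rnn_forward(seq)               # one bounded hidden vector per timestep
```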
known limitations [22]. A few hybrid approaches involving HMM and some neural network such as an RNN have also been studied in the literature [23] for this purpose. Such hybrid approaches can only partially solve the problem – they can neither recover from all the drawbacks of HMM nor entirely exploit the potential of the RNN for sequence labelling tasks [22]. Another alternative is the use of a connectionist temporal classification (CTC) output layer [22], which has been used earlier in speech and handwriting recognition tasks to get rid of the problem of pre-segmentation.

A CTC network has a softmax output layer with |L| + 1 units, where |L| is the size of the alphabet L. At any timestep, the probabilities of observing the various labels are computed at the first |L| units of the output layer, and the remaining unit stores the probability of 'no label' or 'blank'. For an input sequence a = (a_1, a_2, ..., a_N) of length N, a recurrent neural network R with k inputs, m outputs and weight vector W is a continuous map R_W : (R^k)^N → (R^m)^N. Let b = R_W(a) be the sequence of outputs of the network R, and let b^t_n be the activation of output unit n ∈ {1, 2, ..., |L| + 1} at time t. Thus, b^t_n is the probability of observing label n at timestep t. This defines a distribution over the set L'^N of sequences of length N over the alphabet L' = L ∪ {blank}:

    p(η | a) = ∏_{t=1}^{N} b^t_{η_t},   ∀ η ∈ L'^N                  (1)

Figure 7. (a) A handwritten Devanagari (Hindi) word, (b) Different segments of the word and the corresponding labels – segments (i), ..., (vi) have the labels 1, ..., 6 respectively.

D. The Proposed Model

A CNN architecture is used at level 1 of the proposed model. It computes the feature representation of the input. Level 2 of the model consists of an RNN. It takes care of labelling the input sequence using the features obtained at level 1. Finally, a CTC output layer is employed at level 3 for transcription of the input handwritten data without any explicit segmentation scheme. At level 1, the spatial information in the input sequence data is exploited to produce a set of efficient features through supervised learning, while the RNN at the next level makes use of the temporal information in the online handwriting data. Fig. 8 shows the architecture of this model, while Table I presents the detailed configuration of this architecture.
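The per-path probability of equation (1) is just a product of per-timestep softmax outputs; a small NumPy sketch, with a made-up two-label alphabet plus blank and invented output probabilities, makes the computation concrete:

```python
import numpy as np

# Softmax outputs b[t][n] for N = 3 timesteps over L' = {0: 'a', 1: 'b', 2: blank}.
# Each row sums to 1; the numbers are invented for illustration.
b = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8],
              [0.2, 0.6, 0.2]])

def path_probability(b, eta):
    """p(eta | a) = prod over t of b[t][eta_t]  --  equation (1)."""
    return float(np.prod([b[t, n] for t, n in enumerate(eta)]))

p = path_probability(b, [0, 2, 1])   # the path 'a', blank, 'b'
```

To score a final labelling rather than one path, CTC sums equation (1) over all paths that collapse to that labelling; in practice this is done efficiently with the forward-backward algorithm of [22], not by enumeration.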
width or the number of columns, and features is the number of channels of the input data. The dimension of the output data of the CNN module is 1 × timesteps × F, where timesteps is the reduced number of timesteps after the last max-pooling operation (layer 6) and F is the number of feature maps generated by the last convolution operation (layer 5). This output data is rendered in the form timesteps × F at the reshape layer of level 2. The entire data is then passed through the BLSTM layers before the output is fed to the CTC layer of the final level.
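Under the layer configuration of Table I, the reduction of timesteps through the three convolution/max-pooling pairs can be checked with a few lines of arithmetic. The paper does not state the padding scheme, so 'valid' (no padding) is assumed here, and the input length 200 is hypothetical:

```python
def conv_out(t, k=5, s=1):
    """Output length of a 'valid' convolution with kernel k and shift s."""
    return (t - k) // s + 1

def pool_out(t, k=5, s=2):
    """Output length of max-pooling with kernel k and shift s."""
    return (t - k) // s + 1

t = 200                            # hypothetical number of input timesteps
for _ in range(3):                 # layers 1-6: three conv + max-pool pairs
    t = pool_out(conv_out(t))
# t is now the reduced 'timesteps' fed (after the reshape layer) to the BLSTMs
```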
An implementation of this model is available on GitHub.
IV. WORKFLOW

Before feeding the input handwritten data to the proposed model described in Section III-D, it is subjected to a preprocessing stage followed by feature extraction. Preprocessing operations include size normalization (normalized height = 100), re-sampling and translation, as described in [13].
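Of these preprocessing steps, translation and height normalization are straightforward; the sketch below is our own illustration (the exact procedure is given in [13]) and scales a sequence of (x, y) points so that its bounding-box height becomes 100:

```python
import numpy as np

def normalize(points, target_height=100.0):
    """Translate to the origin and scale so the bounding-box height is target_height."""
    pts = np.asarray(points, dtype=float)
    pts -= pts.min(axis=0)                # translate: min x and min y become 0
    height = pts[:, 1].max()
    if height > 0:
        pts *= target_height / height     # uniform scaling preserves aspect ratio
    return pts

raw = [(10.0, 40.0), (30.0, 90.0), (20.0, 240.0)]   # invented sample points
norm = normalize(raw)
```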
We have computed two different sets of features, Feat1 and Feat2, for simulations of the proposed approach. Feat1 consists of only three quantities, r, sin θ and cos θ, where (r, θ) are the polar coordinates of the points of the preprocessed samples and each such point defines a 'timestep'. On the other hand, the feature set Feat2 has 16 components, the details of which are given below. In this latter case, each segment of 5 consecutive points P_{i−2}, P_{i−1}, P_i, P_{i+1}, P_{i+2} forms a 'timestep', and these segments are chosen at a step size of 3 along the pen trajectory. Feat2 representing a segment consists of the following measures:
• sin α and cos α, where α is the smaller of the two adjacent angles between the lines OP_{i−1} and OP_{i+1}, O being the origin of the coordinate system,
• sin β and cos β, where β is the smaller of the two adjacent angles between the lines OP_{i−2} and OP_{i+2},
• vicinity aspect [1],
• velocity before resampling [1],
• y-coordinate after normalization [1],
• average squared distance [1],
• Fourier transforms of dx_k and dy_k (k = 1, 2, 3, 4), where dx_k and dy_k are respectively the signed differences in x and y values of successive points on the segment.

Thus, Feat2 consists of 8 quantities computed in the spatial domain and another 8 quantities computed in the frequency domain.

Figure 8. Proposed Hybrid Architecture

Table I
MODEL CONFIGURATION

Layer no.  Layer type   Filters / Nodes  Specifications
1          Convolution  16               Kernel size=5, Shift=1
2          Max-pooling  NA               Kernel size=5, Shift=2
3          Convolution  32               Kernel size=5, Shift=1
4          Max-pooling  NA               Kernel size=5, Shift=2
5          Convolution  32               Kernel size=5, Shift=1
6          Max-pooling  NA               Kernel size=5, Shift=2
7          Reshape      NA               Squeeze dimension
8          BLSTM        64               NA
9          BLSTM        128              NA
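As an illustration of the angle components of Feat2 listed above, sin α and cos α can be computed as follows. This is a sketch under our reading of the definition (the angle at the origin O between the directions of OP_{i−1} and OP_{i+1}); the vicinity, velocity, and Fourier components follow [1] and are omitted here:

```python
import numpy as np

def angle_features(p_prev, p_next):
    """sin and cos of the smaller angle at the origin O between OP_prev and OP_next."""
    u = np.asarray(p_prev, dtype=float)
    v = np.asarray(p_next, dtype=float)
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos_a = float(np.clip(cos_a, -1.0, 1.0))
    sin_a = float(np.sqrt(1.0 - cos_a ** 2))   # smaller angle, so sin is non-negative
    return sin_a, cos_a

# hypothetical P_{i-1} and P_{i+1} of a segment, here in perpendicular directions
sin_a, cos_a = angle_features((1.0, 0.0), (0.0, 2.0))
```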
Table II
DATABASE DETAILS

Database     Training samples  Test samples  No. of Characters  Lexicon Size
Devanagari   41831             9584          79                 1959
Bangla       61728             15277         57                 681

Table III
RECOGNITION PERFORMANCE ON TEST SETS OF TWO DATABASES

Database     Accuracy (%) with Feat1   Accuracy (%) with Feat2
Devanagari   59.57                     82.39
Bangla       68.21                     84.47

of the total number of insertion, deletion and substitution errors with respect to the total number of Unicodes in the word. The results shown in Table III prove that the use of higher-level handcrafted features is more efficient than the use of the coordinate values of the signal as the features. In future, we plan to perform further extensive studies on the selection of features for the proposed recognition scheme.

ACKNOWLEDGMENT

C-DAC, Pune, India has provided the Hindi word database.

REFERENCES

[1] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. on Patt. Anal. and Mach. Intell., 31(5):855–868, 2009.

[2] H. Swethalakshmi, A. Jayaraman, V. S. Chakravarthy, and C. C. Sekhar. Online handwritten character recognition of Devanagari and Telugu characters using support vector machines. In 10th Int. Workshop on Frontiers in Handwriting Recog., 2006.

[3] S. Bhattacharya, D. S. Maitra, U. Bhattacharya, and S. K. Parui. An end-to-end system for Bangla online handwriting recognition. In 15th Int. Conf. on Frontiers in Handwriting Recognition, pages 373–378. IEEE, 2016.

[4] T. Van Phan and M. Nakagawa. Combination of global and local contexts for text/non-text classification in heterogeneous online handwritten documents. Pattern Recognition, 51(C):112–124, March 2016.

[5] A. Delaye and C. Liu. Contextual text/non-text stroke classification in online handwritten notes with conditional random fields. Pattern Recognition, 47(3):959–968, 2014.

[6] S. D. Connell, R. M. K. Sinha, and A. K. Jain. Recognition of unconstrained online Devanagari characters. In 15th Int. Conf. on Pattern Recognition, volume 2, pages 368–371, 2000.

[7] S. K. Parui, K. Guin, U. Bhattacharya, and B. B. Chaudhuri. Online handwritten Bangla character recognition using HMM. In 19th Int. Conf. on Pattern Recognition, pages 1–4. IEEE, 2008.

[8] U. Bhattacharya, B. K. Gupta, and S. K. Parui. Direction code based features for recognition of online handwritten characters of Bangla. In 9th Int. Conf. on Document Analysis and Recognition, volume 1, pages 58–62. IEEE, 2007.

[9] U. Bhattacharya, A. Nigam, Y. S. Rawat, and S. K. Parui. An analytic scheme for online handwritten Bangla cursive word recognition. In Proc. of the 11th ICFHR, pages 320–325, 2008.

[10] G. A. Fink, S. Vajda, U. Bhattacharya, S. K. Parui, and B. B. Chaudhuri. Online Bangla word recognition using sub-stroke level features and hidden Markov models. In Int. Conf. on Frontiers in Handwriting Recog., pages 393–398. IEEE, 2010.

[11] A. Bharath and S. Madhvanath. HMM-based lexicon-driven and lexicon-free word recognition for online handwritten Indic scripts. IEEE Trans. on Patt. Anal. and Mach. Intell., 34(4):670–682, 2012.

[12] O. Samanta, U. Bhattacharya, and S. K. Parui. Smoothing of HMM parameters for efficient recognition of online handwriting. Pattern Recognition, 47(11):3614–3629, November 2014.

[13] B. Chakraborty, P. S. Mukherjee, and U. Bhattacharya. Bangla online handwriting recognition using recurrent neural network architecture. In 10th Indian Conf. on Computer Vision, Graphics and Image Processing, page 63. ACM, 2016.

[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Info. Proc. Syst., pages 1097–1105, 2012.

[15] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 255–258. MIT Press, Cambridge, MA, USA, 1995.

[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[17] A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In 31st Int. Conf. on Machine Learning, volume 32, pages 1764–1772, 2014.

[18] W. Yin, K. Kann, M. Yu, and H. Schütze. Comparative study of CNN and RNN for natural language processing. CoRR, abs/1702.01923, 2017.

[19] S. Hochreiter, Y. Bengio, and P. Frasconi. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In J. Kolen and S. Kremer, editors, A Field Guide to Dynamical Recurrent Networks. IEEE Press, 2001.

[20] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[21] A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385 of Studies in Computational Intelligence. Springer, 2012.

[22] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In 23rd Int. Conf. on Machine Learning, pages 369–376. ACM, 2006.

[23] Y. Bengio, Y. LeCun, C. Nohl, and C. Burges. LeRec: A NN/HMM hybrid for on-line handwriting recognition. Neural Computation, 7(6), 1995.