Computer Science Review: Reya Sharma, Baijnath Kaushik
Survey
article info
Article history:
Received 29 April 2020
Received in revised form 19 June 2020
Accepted 26 August 2020
Available online xxxx
Keywords:
Indic scripts
Handwritten character recognition
Pattern recognition
Feature extraction
Classification techniques
abstract
Handwritten script recognition is one of the most interesting and challenging areas of pattern recognition due to numerous variations in writing styles. Extensive in-depth research work is reported on the recognition of handwritten text in scripts such as Latin, Chinese, Arabic and Japanese. However, the work reported on handwritten Indic scripts is still in its infancy, so significant research is required in this field. This paper aims to describe various advancements reported over the last few decades in the field of handwritten Indic script recognition by analysing several existing state-of-the-art studies. This comprehensive survey presents a transparent panorama of various feature extraction and classification techniques for the offline recognition of handwritten Indic scripts. The most important part of this survey is to systematically present the reported works on handwritten Indic scripts like Devanagari, Bengali, Gurumukhi, Kannada, Telugu, Gujarati, Oriya, Tamil and Malayalam. After exploring the reported works, an analysis is performed based on the findings. Several issues and challenges related to the recognition of Indic scripts are discussed, which indicate some future research prospects. Based on the extensive study conducted in this article, it is concluded that there is a need to develop hybrid feature extraction and classification approaches for achieving the most accurate results. So, a novel framework based on an improved particle swarm optimization (PSO) algorithm to automatically construct an optimal convolutional neural network (CNN) architecture has been proposed with an aim to outperform the existing techniques.
© 2020 Elsevier Inc. All rights reserved.
Contents
1. Introduction
1.1. Motivation
1.2. Application areas
1.3. Main contributions of the article
1.4. Outline of the article
2. Background
2.1. Evolution of Indic scripts
2.2. Writing system
2.3. Classification of Indic scripts
2.4. Peculiarities and challenges in Indic scripts
3. Related surveys
4. Survey protocol
4.1. Planning the survey
4.2. Research questionnaire
4.3. Information sources
4.4. Search criteria
4.5. Quality assessment
4.6. Data extraction
5. Survey of Indic scripts datasets
∗ Corresponding author.
E-mail address: [email protected] (R. Sharma).
https://doi.org/10.1016/j.cosrev.2020.100302
1574-0137/© 2020 Elsevier Inc. All rights reserved.
2 R. Sharma and B. Kaushik / Computer Science Review 38 (2020) 100302
and unique patterns that reduce the data for recognition and thereby enhance the recognition power [10]. The success rate of the classifier strongly depends upon the extracted features. Depending on the extracted features, the final classification and recognition phase is the prime decision-making stage. The patterns are finally recognized on the basis of input features fed to the classifier.
1.1. Motivation
The electronic conversion of printed or handwritten text into a computer-readable form is known as OCR. The problem of handwritten text recognition is more interesting and fascinating as compared to printed text recognition due to the presence of uneven variations in handwriting style with respect to the writers, content, and time. The handwriting of a person is always unique, just like our fingerprints, and this uniqueness creates motivation and interest among the researchers to work in this exigent and challenging field.
A wide variety of languages are used around the world. Many languages have disappeared as their usage is limited and due to their presence in rural or geographically inaccessible parts of the globe. So, at this point, it is highly recommended to use technologies like OCR and natural language processing to stop the extinction of languages in the world. There are almost 7000 languages in the world (https://www.ethnologue.com/guides/how-many-languages) and handwritten OCR systems are available only for some of them. OCR systems are mostly available for languages that are of huge importance and strong economic value like Chinese, Latin, Arabic and Japanese. Most of the languages derived from Indic script appear to be at risk of extinction due to the absence of efforts. So, there is an immense need for character recognition related research for Indic scripts. With the prevalent use of computers in homes and organizations, automatic processing of paper documents is gaining huge importance, which leads us to a paperless environment and the development of OCR systems for regional languages. So, in this paper, we have made a sincere attempt to perform a systematized state-of-the-art survey for problems that have been discussed above.
1.2. Application areas
Handwritten character recognition has numerous real-life commercial and practical applications, which encourage researchers to explore this field. Some of these application areas are listed below:
• Banks
• Libraries
• Historic data analysis
• Publishing houses
• Reservation counters
• Forensic document analysis
• Post offices
1.3. Main contributions of the article
• The most important aspect of this article is to present the work reported in previous years (2000–2019) on the recognition of handwritten Indic scripts by exploring more than 100 articles from reputed journals and renowned conferences.
• It comprehensively outlines the peculiarities and challenges of major Indic scripts and provides a comparative study of our survey with other related surveys and review articles.
• This article illustrates a survey protocol that depicts the planning of the survey, research questionnaire, information sources, search criteria, quality assessment process, and data extraction.
• This paper scrutinizes various eminent datasets available for research in handwritten Indic script recognition.
• This article exhaustively surveys numerous feature extraction as well as classification techniques for the recognition of handwritten Indic scripts.
• An analysis is performed on the basis of findings and related work, which highlights various research challenges and future directions that need to be considered for efficient handwritten script recognition.
• Finally, a proposed framework is given with an aim to overcome the drawbacks of existing classification algorithms. An improved PSO algorithm to automatically construct an optimal CNN architecture has been proposed as illustrated in Algorithm 1.
1.4. Outline of the article
The remainder of this paper is organized into the following sections. Section 2 gives the introduction and evolution of major Indic scripts, along with their peculiarities and challenges. Section 3 provides the comparison of the present survey with other traditional surveys and review articles. Section 4 discusses the survey protocol including the plan, research questionnaire, information sources, search criteria, etc., which provides guidance to researchers performing the survey. Section 5 illustrates the detailed study of standardized datasets available for several Indic scripts. Section 6 exhaustively explains and compares the work done on different feature extraction and classification techniques for the recognition of handwritten Indic script. The classification techniques are divided into three broad categories, namely neural network based techniques, SVM based techniques and miscellaneous techniques. Section 7 summarizes the analysis investigated on the basis of findings and related work. Section 8 highlights the challenges in Indic script character recognition and provides future research prospects. A novel framework for handwritten Indic script character recognition has been proposed in Section 9. The experimental analysis and comparison with state-of-the-art works are presented in Section 10. Finally, Section 11 concludes the paper.
2. Background
for writing these languages [12]. Scripts basically depict the writing system for the language that indicates sounds and represents phonetics of the language.
The Brahmi script is categorized into two major groups, one constituting the north Indian scripts and the other consisting of south Indian scripts [15]. A single script may be used for writing more than one language. For example, Devanagari script, which is the most widely used Indic script [16], is used for writing Hindi, Dogri, Sanskrit, Kashmiri, Marathi, Sindhi, Nepali, and Konkani. After Devanagari, the most widely used script is Bengali and it is used for writing Maithili, Bengali, Manipuri, and Assamese languages. Table 1 presents detailed information including languages, category and domain for several Indic scripts.
Table 1
Description and related information of Indic scripts.
Script | Languages | Script category | Domain
Devanagari | Hindi, Sanskrit, Rajasthani, etc. | Indo-Aryan/Northern-group | Many States
Gurumukhi | Punjabi | Indo-Aryan/Northern-group | Punjab
Bengali | Bengali, Assamese | Indo-Aryan/Northern-group | Tripura, West Bengal, Assam
Gujarati | Gujarati | Indo-Aryan/Northern-group | Gujarat
Oriya | Odia | Indo-Aryan/Northern-group | Orissa
Takri | Dogri, Chambeali | Indo-Aryan/Northern-group | J&K, Himachal Pradesh
Manipuri | Meithei | Indo-Aryan/Northern-group | Manipur
Kannada | Kannada | Dravidian/Southern-group | Karnataka
Telugu | Telugu | Dravidian/Southern-group | Andhra Pradesh
Tamil | Tamil | Dravidian/Southern-group | Tamil Nadu
Malayalam | Malayalam | Dravidian/Southern-group | Kerala
Table 2
Comparative analysis of our survey with other related surveys.
Topics | Our survey (2020) [16] [17] [18] [11] [19] [12]
Motivation for Research | ✓
Application areas | ✓ ✓ ✓ ✓ ✓ ✓
Sources of information | ✓
Research questions | ✓
Evolution of Indic scripts | ✓ ✓
Peculiarities and challenges in Indic scripts | ✓ ✓ ✓
Survey of feature extraction techniques | ✓ ✓ ✓ ✓ ✓ ✓ ✓
Survey of classification methods | ✓ ✓ ✓ ✓ ✓ ✓ ✓
Comparative analysis based on accuracy | ✓ ✓ ✓ ✓ ✓
Synthesis analysis based on findings | ✓
Challenges and future directions | ✓ ✓ ✓ ✓ ✓ ✓
Proposed framework | ✓
2.4. Peculiarities and challenges in Indic scripts
• The character set in most Indic scripts consists of basic characters and compound characters. The basic characters are simply the collection of consonants and vowels, whereas compound characters are formed by the combination of two or more basic characters [20,21]. The compound characters are usually more complex in shape than the basic characters. However, some Indic scripts like Gurumukhi and Tamil do not have compound characters.
• In addition to compound characters, there are also modified characters in most Indian scripts. A vowel remains in its original shape when it appears at the beginning of the word; however, when a vowel follows a consonant it modifies its shape and is thus known as a modified character or diacritic [18]. The modified character may be placed on the top, bottom, left or right of the consonant. The character set of major Indic scripts is shown in Figs. 3 and 4.
• Most Indian scripts are written from left to right [22]. However, some modified characters may not follow this left to right writing sequence. In Indic scripts there is no concept of upper-case and lower-case characters. Unlike other Indic scripts the alphabetic sequence in Urdu script is from right
Table 3
Research questionnaire with motivation.
S. No. | Research questions | Motivation
RQ1 | What is handwritten OCR? | To know about suitable approaches for handwritten text recognition.
RQ2 | Application areas where handwritten OCR systems are most suited? | To know about the advantages and usefulness in real-life practical and commercial applications to make handwritten OCR systems more demanding.
RQ3 | What are the different approaches used for building handwritten OCR model for Indic scripts? | To explore and study techniques as well as literature available for handwritten OCR systems for Indic script.
RQ4 | Identify the number of Indic scripts in which handwritten OCR systems are successful? | To study and identify for how many Indic scripts handwritten OCR systems have been developed successfully.
RQ5 | Availability of standardized datasets for the research? | It provides a detailed study of benchmark and standardized datasets available for Indic scripts.
RQ6 | What are the best feature extraction techniques and classification algorithms for handwritten Indic script OCR? | It urges to exhaustively explore and study various feature extraction techniques and classification algorithms for handwritten Indic script recognition.
RQ7 | Work done in major Indic scripts? | It motivates towards a comparative study for major Indic scripts (Devanagari, Gurumukhi, Bengali, Kannada, Telugu, Gujarati, Oriya, Tamil and Malayalam) and reports the observations made on reviewed work.
RQ8 | Does the accuracy of proposed system depend upon the size of dataset? | Experimental analysis has to be conducted in order to evaluate and compare the recognition accuracies of handwritten characters with specific algorithm.
RQ9 | Does the accuracy increase with the use of hybrid technique for feature extraction and classification? | To study and investigate the experimental studies using novel hybrid techniques and analyse the comparison on the basis of recognition accuracies.
The research on handwritten Indic script character recognition is still in its infancy, and this leads to the necessity of methodical
Research questionnaires form the basic structure of a research paper, project, survey, or study. They provide a clear focus and
• CMATER: The Center for Microprocessor Applications for Training Education and Research (CMATER) at Jadavpur University, Kolkata, India has created various database repositories for OCR of major Indic scripts. The CMATERdb 3.2.1 [28] is a Devanagari numeral database consisting of 2000 training samples and 1000 test samples. The CMATERdb 2.2.3 [29] dataset consists of 15,528 Devanagari word images. The CMATERdb 1.4 [30] dataset has 150 document images with handwritten Devanagari text lines.
• ISI-HDND: The ISI Handwritten Devanagari Numeral Database (ISI-HDND) [31] consists of 22,556 handwritten Devanagari numeral samples written by 1049 different writers. These samples are scanned optically from three types of handwritten documents: job application forms, postal mail and a special set of forms designed by the collectors.
• HPL: The HPL Offline Handwritten Devanagari Isolated Character database (HPL-iso-dev-char) [32] consists of 18,192 handwritten character images specified for training and 3894 handwritten character images specified for testing, written by more than 100 native speakers of Devanagari.
5.5. Kannada standardized datasets
• KHTD: The Kannada Handwritten Text Dataset (KHTD) [42] consists of 4298 text lines and 26,115 words, gathered from 204 different handwritten documents written by almost 51 native Kannada writers.
5.6. Oriya standardized datasets
• ISI-HOND: The ISI Handwritten Oriya Numeral Database (ISI-HOND) [39] consists of 5970 handwritten Oriya numerals collected from 356 writers and prepared by using a 300 dpi flatbed HP scanner.
• IITBBS: A dataset named ‘‘IITBBS’’ [43] developed at IIT Bhubaneswar consists of 35,000 handwritten Oriya character images written by 500 native Oriya writers. The images are collected by using an optical scanner with a resolution of 300 and 600 dpi.
Table 5
Detailed description of standardized handwritten datasets for various Indic scripts.
Dataset name (Reference) Script Dataset type Dataset size
ISI-HDND [31] Devanagari Numerals 22,556
CMATERdb 3.2.1 [28] Devanagari Numerals 3000
HPL-iso-dev-char [32] Devanagari Characters 29,970
DHCD [33] Devanagari Characters 92,000
CMATERdb 2.2.3 [29] Devanagari Words 15,528
CMATERdb 1.4 [30] Devanagari Text lines 150 document images
CMATERdb 3.1.1 [34] Bengali Numerals 6000
ISI-HBND [39] Bengali Numerals 23,392
CMATERdb 3.1.2 [35] Bengali Basic characters 15,000
CMATERdb 3.1.3 [36] Bengali Compound characters 42,697
CMATERdb 3.1.4 [37] Bengali Modified characters 2044
CMATERdb 2.1.3 [29] Bengali Words 18,931
CMATERdb 1.1 [38] Bengali Text lines 100 document images
CMATERdb 3.4.1 [28] Telugu Numerals 6000
HPL-iso-telugu-char [40] Telugu Characters 45,154
HPL-iso-tamil-char [41] Tamil Characters 82,000
KHTD [42] Kannada Text lines and words 4298 text-lines, 26,115 words
ISI-HOND [39] Oriya Numerals 5970
IITBBS [43] Oriya Characters 35,000
Table 6
Description of feature extraction techniques for Devanagari, Gurumukhi and Bengali scripts.
Features Category Author [Reference]
Shadow, CC, Intersection and Line fitting Structural features Arora et al. [51]
Longest run, Octant centroid, Modified shadow Statistical and structural features Basu et al. [52]
Longest run, Quad tree and Shadow Statistical and structural features Das et al. [53]
Curvelet coefficients Structural features Singh et al. [54]
Legendre and Zernike moment Statistical features Kale et al. [55]
Power curve fitting based features Statistical features Kumar et al. [56]
Topological Structural features Bag et al. [57]
Zone based centroid, Distance profile, BDD and CC Statistical and structural features Singh and Maring [58]
Pixel density features, Structural Statistical and structural features Shelke and Apte [59]
Scale invariant features transform Statistical features Surinta et al. [60]
Gradient and Curvature based composite feature Statistical and structural features Aggarwal and Singh [61]
Discrete cosine transform based features Statistical features Kumar et al. [62]
HOG, Projection profile Statistical features Yadav and Purwar [63]
LBP, Directional, Regional features Statistical and structural features Kumar and Gupta [64]
Automatic extraction using BornoNet Non-explicit features Rabby et al. [65]
Supervised Layerwise-DCNN Non-explicit features Jangid and Srivastava [66]
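Several of the feature families listed in Table 6 (for example zone based centroid and pixel density features) are computed zone by zone over a binarized character image. As a rough illustration only, the sketch below computes zone-wise foreground pixel densities in plain Python. The function name `zone_densities`, the grid sizes, and the convention that foreground pixels are stored as 1 are assumptions made for this example and are not taken from the cited works.

```python
def zone_densities(image, rows=4, cols=4):
    """Return the fraction of foreground pixels in each zone, in row-major order.

    `image` is a list of equal-length rows holding 0 (background) or
    1 (foreground). The image is split into a rows x cols grid of equal zones.
    """
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the zone grid")
    zone_h, zone_w = height // rows, width // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            # Count the foreground pixels that fall inside zone (r, c).
            foreground = sum(
                image[y][x]
                for y in range(r * zone_h, (r + 1) * zone_h)
                for x in range(c * zone_w, (c + 1) * zone_w)
            )
            features.append(foreground / (zone_h * zone_w))
    return features


# Toy 4 x 4 "character" split into a 2 x 2 grid of zones.
toy_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(zone_densities(toy_image, rows=2, cols=2))
```

With a 4 × 4 grid this yields 16 density values per image, which is the same order of feature count as the zoning schemes described in this section; a real system would first normalize and binarize the scanned character.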
compute the curvature features. In [54] the curvelet transform technique was used for feature extraction. Curvelet coefficients were computed for thick and thin character images. The fusion of these curvelet coefficients computed by inverse Fast Fourier Transform acts as features.
Singh and Lehri [68] presented a pixel based feature extraction technique in which the feature vector was composed of pixel values obtained from the normalized thinned character image. Kale et al. proposed an orthogonal feature extraction technique in [55], based on Legendre moment and Zernike moment. Shelke and Apte [59] recognized characters using both structural features and normalized pixel density features. Singh et al. [58] proposed a feature extraction technique based on four different kinds of features, namely, zone based centroid, chain code, distance profile and background directional distribution features.
The research work reported in [63] was based on two kinds of features: projection profile histogram and histogram of oriented gradient (HOG). The projection profile histogram was obtained by calculating the number of background pixels along the horizontal and vertical directions in the entire character image. The HOG features were extracted using shape description by computing the gradient in local segments of the images.
In [69], a Neocognitron neural network was proposed for extracting features from handwritten Gurumukhi characters. The neocognitron neural network performs feature extraction by going through successive stages. In each stage, the recognition system extracts relevant features from the preceding stage and thus a compressed representation was formed from those extracted features for further recognition.
Siddharth et al. [70] extracted zoning density based features (ZD) and background directional distribution features (BDD). In this work, they extracted 16 zonal density features for each image and 8 BDD features were computed for each of the 16 zones.
Sinha et al. [71] proposed a zone-based technique based on the combination of the image centroid zone and zone centroid zone. In the zone centroid zone method, the image was segmented into z equal zones and then each zone centroid and the average distance between zone centroid and each pixel in the zone was computed. In the image centroid zone method, the character image was segmented into z equal zones and then the centroid of the image and the average distance between image centroid and each zone was computed.
Gabor filter based feature extraction was reported in [72]; in this work two variants of Gabor features, GABM and GABN, were used. In the GABM based technique the Gabor filter outputs were processed to compute the energy magnitudes, which serve as a feature vector. However, in the GABN based technique the outputs were not processed any further and were used directly as extracted features. The results indicate that GABN based feature extraction provides more promising results than GABM for handwritten Gurumukhi character recognition.
Kumar et al. [56] proposed two curve fitting based feature extraction methods, i.e., power curve fitting and parabola curve fitting. In both power and parabola curve fitting methods, the thinned character image was segmented into z zones. Then the power curve and parabola curve were fitted to the foreground pixels in each zone using the least squares method. In this work, the power curve fitting method provides the highest accuracy for Gurumukhi characters.
Aggarwal et al. [61] used a gradient and curvature based feature extraction approach. These two sets of features were then combined, which results in the formation of a composite feature vector. The composite feature vector (CGCFV) was computed by the cross product of gradient directions (feature vector) and curvature levels (feature vector). The research work reported in [62] was based on the collection of different feature extraction techniques. These techniques include discrete cosine transformation (DCT), discrete wavelet transformation (DWT), fan beam transformation (FBT), and fast Fourier transformation (FFT). Among all these techniques the DCT provides the most promising results for feature extraction.
Kumar and Gupta [64] have extracted three kinds of features, namely directional features, local binary pattern (LBP) and regional features. For each character image, a total of 117 features was extracted, with 54 directional features, 59 LBP features, and 4 regional features.
Among earlier works on handwritten Bengali script, Basu et al. [52] relied strongly on topological features, namely, octant centroid features, longest run features and modified shadow features for handwritten Bengali script. Surinta et al. [60] have proposed two kinds of local gradient feature descriptors, viz., histogram of oriented gradient features (HOG) and scale invariant features transform (SIFT) for handwritten Bengali character recognition.
In [73], the extracted features were categorized into two types, i.e., global and local features. A total of 175 global features were considered in the research work; among them 155 were convex hull features and the remaining were centre of gravity based quad tree longest run features. The longest run features contributing to the local feature set were then obtained from each local or sub region depicted by a node in the quad tree, across four directions: vertical, horizontal and two diagonals.
Pal et al. [74] used gradient features for classifying handwritten Bengali compound characters. Firstly, size normalization was done by applying 2 × 2 mean filtering on the grey scale character image. The gradient image was then obtained by applying Roberts filter on the normalized image. Initially, the direction of
gradient was quantized along 32 directions, therefore histograms are computed for these 32 directions. Finally, a Gaussian filter was used to down sample these directional frequencies so as to obtain a 392-dimensional feature vector.
Das et al. [53] used a feature set composed of three kinds of features, viz., shadow features, longest run features and centre of gravity based quad tree features. A topological features based recognition scheme was proposed in [57]. Decomposition rules were formulated in this work that break compound characters into subsequent simple shape components. This decomposition helps in improving the efficiency of features used and thereby provides better recognition performance.
Pramanik and Bag [75] used a chain-code histogram based feature set for the recognition of handwritten Bengali characters. Firstly, the smallest rectangular frame was computed for each character image and then each image was divided into 7 × 7 blocks. A chain-code histogram was computed for each block, and each histogram has 8 bins, one for each chain-code direction, viz., 0, 1, 2, 3, 4, 5, 6, 7. Thus, a 7 × 7 × 8 = 392 dimensional feature vector was obtained.
The method used in [76] has extracted shadow features, transition features, diagonal features, directional features, zoning features, centroid features, intersection point features, power curve-fitting features, and parabola curve-fitting features. In this work, the authors have also proposed two significant feature extraction approaches, namely, modified division point and peak extent based feature extraction approaches.
Table 6 elucidates the feature extraction techniques for handwritten Devanagari, Gurumukhi and Bengali scripts.
6.1.2. Classification and recognition
Classification is the process of categorizing feature vectors into distinct groups or classes depending upon their similarities. Classification techniques work by dividing the entire dataset into two parts: training set and testing set. Classification is a broad research area in itself and also a significant phase of character recognition. There is an extensively large number of classifiers available in the field of pattern recognition [77,78], so we need to restrict our scope; thus the classifiers having more relevance to this field are discussed here.
The classification techniques are broadly categorized into three classes: NN based approaches, SVM based approaches and miscellaneous techniques. These techniques have been analysed in the following subsections.
A. Neural Network based techniques
Artificial Neural network (ANN) based techniques are potentially used for the classification and recognition of handwritten characters [79,80]. Due to its parallel architecture, ANN performs the computations at significantly higher frequency as compared to traditional techniques. These classifiers are built with an aim to simulate human brains. Neurons are the basic adaptive units in ANN with associated connections and corresponding synaptic
like, feed-forward neural network, feed-backward neural network, self-organizing neural network, recurrent neural network and many more.
Among the initial works reported on handwritten Devanagari script, Arora et al. [51] developed a recognition technique based on the ensemble of four MLP classifiers. Each MLP classifier was trained with the conventional backpropagation algorithm and sigmoid activation function, but with different features, including 16 shadow features, 200 chain-code histogram features, 32 intersection features and 48 line-fitting features. The momentum term and learning rate were set as 0.7 and 0.8 respectively. Finally, a majority voting scheme was used to aggregate the results obtained from all four MLP classifiers.
The recognition system outlined in [68] trains a two layer backpropagation neural network (BPNN) with gradient descent delta learning to identify handwritten Devanagari characters. The method reported in [82] recognizes handwritten Devanagari vowels using a feed-forward MLP (FFMLP) due to its well-known generalization and edification abilities. The FFMLP was trained using static back-propagation with sigmoid and tanh activation functions. Khanduja et al. [83] proposed a gradient descent rule-based NN to minimize the squared error between output values and network target values. This network architecture has two hidden layers with the first layer having 70 neurons, and the second layer having 40 neurons. The recognition system outlined in [84] was trained using a deep convolutional neural network (DCNN) with the RMSProp optimizer to identify handwritten Devanagari characters.
Jain et al. [69] proposed a neocognitron ANN to identify handwritten Gurumukhi characters. Neocognitron ANN is a well-known classifier in pattern recognition problems for its significant performance and fast processing time. In this work, neocognitron ANN firstly extracts features from handwritten images and then classifies them. The classification is performed in a series of functionally equivalent stages. At every stage, a compressed representation is formed by extracting relevant features from the output of the previous stage. This compressed representation stores the spatial location of derived features and acts as an input to the succeeding stage. Thus, the classification was done by subsequently extracting and compressing features until the input character image gets reduced into a vector representation.
Kumar et al. [64] presented a deep neural network (DNN) based classification technique for handwritten Gurumukhi characters. In contrast to traditional neural networks, deep learning neural networks have a significantly large number of hidden layers. The proposed DNN has 117 hidden layers, which also represent the number of extracted features. For training the classifier, two autoencoder layers were used followed by a single soft-max layer. The above system reported an accuracy of 99.30% for 2700 handwritten Gurumukhi characters. Basu et al. [52] used a back-propagation based MLP (BPMLP) for the recognition of handwritten Bengali basic characters with a momentum value of 0.7 and a learning rate of 0.8.
weights. These classifiers are mainly composed of three layers, Pramanik and Bag [75] presented a BPMLP for the classifi-
namely, input, hidden and output layer. cation of handwritten Bengali compound characters. The back-
Jangid et al. [66] proposed a layer-wise trained Deep Convo- propagation learning algorithm was based on an iterative gra-
lutional Neural Network (DCNN) architecture for the recognition dient approach which reduces the error between actual output
of handwritten Devanagari character. The proposed model was and desired output while training MLP. Using BPMLP, the authors
tested on ISIDCHAR database with 56,477 handwritten character claim a recognition accuracy of 88.74% for handwritten Bengali
samples and claim a recognition accuracy of 98%. compound characters. Keserwani et al. [85] designed an efficient
On the basis of number of hidden layers, the neural network unified-CNN model based on the Adadelta optimizer to recognize
(NN) may be classified into different categories [81], for exam- handwritten Bengali characters. In this work, the proposed model
ple, NN with a single hidden layer is called perceptron whereas reported 98.56% accuracy and has a significantly lesser number
NN with multiple hidden layers is called Multi-Layer Percep- of parameters as compared to the conventional deep learning
tron (MLP). The NNs are further classified into different kinds architectures.
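As a rough illustration of the gradient-descent training described above, the following NumPy sketch trains a small sigmoid MLP by back-propagation with momentum to reduce the squared error between outputs and targets. The hidden layer sizes (70 and 40) and the momentum and learning-rate values (0.7 and 0.8) are borrowed from the descriptions of [83] and [52]; the function names, the weight initialisation and the full-batch update are assumptions made only for this example and are not taken from the cited works.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(sizes, rng):
    """List of (weights, biases) pairs, one per layer (small random init, an assumption)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Activations of every layer, the input included."""
    acts = [x]
    for w, b in params:
        acts.append(sigmoid(acts[-1] @ w + b))
    return acts

def train_step(params, velocity, x, y, lr=0.8, momentum=0.7):
    """One full-batch back-propagation step with momentum; returns the loss before the update."""
    acts = forward(params, x)
    out = acts[-1]
    loss = 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))
    # Error signal at the output layer: d(loss)/d(pre-activation).
    delta = (out - y) * out * (1.0 - out) / len(x)
    for i in reversed(range(len(params))):
        w, b = params[i]
        grad_w = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Propagate the error using the weights as they were before this update.
            delta = (delta @ w.T) * acts[i] * (1.0 - acts[i])
        vw, vb = velocity[i]
        vw = momentum * vw - lr * grad_w
        vb = momentum * vb - lr * grad_b
        velocity[i] = (vw, vb)
        params[i] = (w + vw, b + vb)
    return loss

def train(x, y, hidden=(70, 40), epochs=100, seed=0):
    """Train on feature matrix x (samples x features) and one-hot targets y."""
    rng = np.random.default_rng(seed)
    params = init_mlp([x.shape[1], *hidden, y.shape[1]], rng)
    velocity = [(np.zeros_like(w), np.zeros_like(b)) for w, b in params]
    losses = [train_step(params, velocity, x, y) for _ in range(epochs)]
    return params, losses
```

Calling `train(features, one_hot_labels)` returns the learned parameters and the per-epoch loss, which can be inspected to check that the error is actually decreasing for a given learning rate and momentum.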
12 R. Sharma and B. Kaushik / Computer Science Review 38 (2020) 100302
Table 7
Recognition accuracies for handwritten Devanagari characters.
Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%)
Kale et al. [55] | 27,000 | Legendre and Zernike moment | Support Vector Machine | 98.51 (Basic), 98.30 (Compound)
Jangid and Srivastava [66] | 56,477 | Automatic | SL-DCNN | 98.00
Singh and Maring [58] | 20,000 | Zone based centroid, CC, Distance Profile and BDD | Support Vector Machine | 97.61
Shelke and Apte [59] | 40,000 | Pixel density features, Structural | Fuzzy system and FFNN | 96.95
Yadav and Purwar [63] | 4428 | Histogram of oriented gradient, Projection profile | Quadratic SVM | 96.60
Jangid et al. [84] | 36,172 | Automatic | DCNN with RMSProp | 96.00
Pal et al. [67] | 36,172 | Curvature, Gradient | Mirror Image Learning | 95.19 (Curvature), 94.94 (Gradient)
Sarkhel et al. [89] | 22,086 | Multiscale-multicolumn CNN | Support Vector Machine | 95.18
Singh et al. [54] | 31,860 | Curvelet based features | K-Nearest Neighbour | 93.80
Singh and Lehri [68] | 1000 | Pixel based features | Backpropagation NN | 93.00
Arora et al. [51] | 4900 | Shadow, CC histogram, Intersection features and Line fitting | Feed-forward NN | 92.80
Narang et al. [87] | 5484 | SIFT, Gabor filter, PCA | Poly-SVM | 91.39
Sharma et al. [50] | 11,270 | Direction CC histogram | Modified QDF | 80.36
Table 8
Recognition accuracies for handwritten Gurumukhi characters.
Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%)
Kumar and Gupta [64] | 2700 | Directional, LBP, Regional features | Deep Neural Network | 99.30
Aggarwal and Singh [61] | 7000 | Curvature and Gradient | Support Vector Machine | 98.56
Kumar et al. [56] | 3500 | Power curve fitting | K-Nearest Neighbour | 98.10
Kumar et al. [62] | 10,500 | Discrete cosine transform | SVM with linear kernel | 95.80
Sinha et al. [71] | 7000 | Zone based features | SVM, KNN | 95.11, 90.64
Siddharth et al. [70] | 7000 | Zoning density and BDD | Support Vector Machine | 95.04
Singh et al. [72] | 7000 | Gabor features | SVM with RBF Kernel | 94.29
Jain and Sharma [69] | 15,000 | Automatic features | Neocognitron NN | 92.78
Garg et al. [76] | 8960 | Peak extent, MDP, PCA | Linear SVM+polySVM+kNN | 92.30
Kaur and Rani [90] | 2450 | Zoning features | Convolutional NN | 92.08
Table 9
Recognition accuracies for handwritten Bengali basic and compound characters.
Methodology | Dataset size | Dataset type | Feature extraction | Classification technique | Recognition accuracy (%)
Sarkhel et al. [89] | 42,697 | Compound | Multiscale-multicolumn CNN | Support Vector Machine | 98.12
Keserwani et al. [85] | 41,536 | Compound | Automatic | Unified-CNN | 98.12
Roy et al. [91] | 42,959 | Compound | Automatic | SL-DCNN | 90.33
Pramanik and Bag [75] | 10,240 | Compound | Chain code histogram | Multi-layer perceptron | 88.74
Bag et al. [57] | 19,800 | Compound | Topological | Template matching | 86.74
Pal et al. [74] | 20,543 | Compound | Gradient | MQDF | 85.90
Das et al. [53] | 19,765 | Compound | Shadow, Quad tree, LR | Support Vector Machine | 80.51
Keserwani et al. [85] | 15,000 | Basic | Automatic | Unified-CNN | 98.56
Rabby et al. [65] | 15,000 | Basic | Automatic extraction | BornoNet | 98.00
Sarkhel et al. [73] | 15,000 | Basic | CG based Quad tree LR, Convex hull | Support Vector Machine | 87.28
Sarkhel et al. [92] | 15,000 | Basic | LR, Enhanced harmony search | SVM with RBF kernel | 86.53
Roy et al. [93] | 15,000 | Basic | ABCO, Gradient features | Support Vector Machine | 86.40
Gupta et al. [94] | 15,000 | Basic | HOG, Convex hull, LR, Harmony search | Support Vector Machine | 86.10
Surinta et al. [60] | 5527 | Basic | Scale invariant feature transform | Support Vector Machine | 85.00
Basu et al. [52] | 10,800 | Basic | Modified shadow, Octant centroid, LR | MLP based two stage classifier | 80.58
is achieved by the SVM classifier with the extraction of non-explicit MMCNN features from the standardized CMATER db 3.1.3 dataset [89]. Additionally, the unified-CNN model [85], trained using the Adadelta optimizer with a 0.95 decay rate and a 1.00 learning rate, also achieved 98.12% recognition accuracy on the CMATER db 3.1.3 dataset.
6.2. Reported work on Kannada and Telugu scripts
6.2.1. Feature extraction
Among earlier works on handwritten Kannada and Telugu scripts, Pal et al. [95] proposed a directional feature based technique. In this work, they segmented each character image into equal blocks and the directional features were evaluated for each block. A Gaussian filter was then used to downsample these blocks, and the directional features obtained from the downsampled blocks were fed to the classifier.
Sangame et al. [96] extracted moment invariant features from zoned images for handwritten Kannada character recognition. Initially, seven invariant moments were derived for each character image, but this process provided a very poor recognition rate. Thus, the character image was first segmented into four zones, viz., upper left, upper right, lower left and lower right. Then, invariant moments were evaluated for each zone, resulting in a total of 7 × 4 = 28 features. Dhandra et al. [97] used directional spatial features to recognize handwritten characters. They extracted directional spatial features like number of strokes, stroke length and stroke density as potential features for handwritten Kannada character recognition.
The method used in [98] extracted shape based features, viz., chain codes and invariant Fourier descriptors, to identify characters in handwritten Kannada script. The authors in [101] also relied on shape based features, namely, wavelet packets and normalized chain codes. In this work, wavelet decomposition coefficients and normalized chain codes were extracted as a 22 dimensional feature vector from normalized binary character images of size 40 × 40.
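The four-zone moment computation described above for [96] can be sketched as follows. The text does not say which seven invariant moments were used; this sketch assumes Hu's seven moment invariants, and the helper names `hu_moments` and `zoned_moment_features` are hypothetical. It is meant only to show how 7 invariants per zone over 4 zones give a 28 dimensional feature vector.

```python
import numpy as np

def hu_moments(img):
    """Hu's seven moment invariants of a 2-D array (all zeros if the array has no ink)."""
    img = np.asarray(img, dtype=float)
    m00 = img.sum()
    if m00 == 0:
        return np.zeros(7)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00
    dx, dy = xs - xc, ys - yc

    def eta(p, q):
        # Normalised central moment of order (p, q).
        return (dx ** p * dy ** q * img).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])

def zoned_moment_features(img):
    """Split the image into four quadrants and concatenate 7 invariants per zone (28 values)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape[0] // 2, img.shape[1] // 2
    zones = [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]  # UL, UR, LL, LR
    return np.concatenate([hu_moments(z) for z in zones])
```

Because each zone is described separately, the zoned vector keeps some information about where strokes lie in the character, which a single set of seven whole-image invariants discards.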
Table 10
Description of feature extraction techniques for handwritten Kannada and Telugu scripts.
Features | Category | Reference
Moment invariant features | Statistical features | Sangame et al. [96]
Fourier descriptors and chain codes | Statistical and structural features | Rajput and Horakeri [98]
Zone based and pixel density features | Statistical features | Mukarambi et al. [99]
Positional features | Statistical features | Vaidya and Bombade [100]
Wavelet decomposition and normalized chain code | Statistical and structural features | Dhandra and Mukarambi [101]
Zoning method | Statistical features | Sastry et al. [102]
MMCNN | Non-explicit features | Sarkhel et al. [89]
Block wise pixel count | Statistical features | Lakshmi [103]
Table 11
Recognition accuracies for handwritten Kannada script.
Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%)
Karthik et al. [104] | 18,800 | Distributed average of gradients | Deep belief networks | 97.04
Dhandra and Mukarambi [101] | 1400 | Normalized chain code and wavelet decomposition | K-Nearest Neighbour | 95.07
Rajput and Horakeri [98] | 6500 | Fourier descriptors and chain codes | SVM | 93.92
Rani et al. [105] | 5200 | Automatic | Alex net | 92.00
Pasha and Padma [106] | 4800 | Structural features and Wavelet transform | ANN | 91.00
Pal et al. [95] | 10,779 | Directional features | Quadratic classifier | 90.34
Dhandra et al. [97] | 1400 | Spatial features | K-Nearest Neighbour | 90.10
Angadi and Angadi [107] | 2490 | Structural features | SVM | 89.84
Vaidya and Bombade [100] | 7350 | Positional features | GRNN | 85.62
Sangame et al. [96] | 1625 | Moment invariant features | K-Nearest Neighbour | 85.53
Mukarambi et al. [99] | 2800 | Zone based and pixel density features | SVM | 73.33
Mukarambi et al. [99] proposed a zone based feature extraction approach. All the normalized 32 × 32 dimensional character images were segmented into 64 non-overlapping zones. Thus 64 zone based features were obtained by computing the pixel density for each zone. Vaidya et al. [100] developed a feature extraction technique based on the positional information of each pixel in the handwritten character image. First, the avg_matrix is computed by adding all the character image matrices into a single matrix and dividing the resultant matrix by the total number of image matrices. The avg_matrix was then subtracted from each character image matrix so as to obtain unique features based on the positional information of each pixel in the character image. Angadi et al. [107] used structural features for the identification of handwritten Kannada characters. These features include structural information of character images like eccentricity, orientation, perimeter, filled area and convex area.
The work reported in [106] was based on the extraction of wavelet features and structural features, namely, aspect ratio, quadrant density, corner detection, width and correlation, to recognize handwritten Kannada characters. A distributed average of gradient-based feature extraction technique (DAG) has been proposed in [104] to recognize handwritten Kannada characters. The DAG features are inspired by the conventional speeded up robust features (SURF) descriptors; however, the number of windows in DAG is kept fixed at 4.
Sastry et al. [108] proposed a 3D feature extraction technique for the classification of handwritten Telugu characters. These 3D features were based on the X, Y and Z coordinates of pixels for each character image, where the attribute Z indicates the depth of indentation at the pixel. This depth is a very important attribute for character recognition and is directly proportional to the pressure applied by the pen at the pixels.
Zone based features were extracted in [102]: the character image was first partitioned into a number of non-overlapping zones or segments. The statistical features were computed for each pixel contained in the segment of the character image. These pixel intensities were then summed to obtain the feature for each segment of the image. The features of all segments in the character image then form the feature vector.
Manisha et al. [109] proposed a hybrid feature extraction method to recognize handwritten Telugu characters. These hybrid features include dimensional features, small glyphs positional features and zone based features. Sarkhel et al. [89] presented a non-explicit feature extraction technique. In this work, a Multicolumn Multiscale Convolutional Neural Network (MMCNN) approach was developed. The multiscale convolutional sampling method helps in extracting more invariant and robust features from character images, and the multicolumn architecture improves the effectiveness of the system.
Lakshmi [103] used two feature extraction techniques, namely, cell pixel count feature extraction and histogram profile. Of these two techniques, cell pixel count feature extraction provides better performance in identifying characters.
Table 10 summarizes the feature extraction techniques for handwritten Kannada and Telugu scripts.
6.2.2. Classification and recognition
A. Neural Network based techniques
Vaidya [100] proposed a generalized regression neural network (GRNN) classifier for the recognition of handwritten Kannada characters. The architecture of the GRNN classifier was composed of a pattern layer and a summation layer. The GRNN recognition model was based on a radial basis network and trained using positional features. Experimental evaluation on 7350 character samples reported a recognition accuracy of 85.62%.
A Feed-Forward ANN classifier has been presented in [106] to recognize handwritten Kannada characters. The FFANN was trained using wavelet and structural features like aspect ratio, quadrant density, corner detection, width, and correlation. This classifier was evaluated on 4800 samples of handwritten characters and achieved an accuracy of 91.00%. Ragha et al. [110] used a back-propagation MLP classifier to identify handwritten Kannada characters. In this work, they trained the BPMLP using statistical and moment features extracted from cut images, directional images and original images.
Angadi et al. [111] presented a CNN based recognition model for handwritten Telugu characters. In this work, the CNN architecture comprises four convolutional layers, followed by a max-pooling layer and two fully-connected layers. Furthermore, generalization techniques, such as data augmentation and dropout, are also applied. The SGD optimizer and categorical cross-entropy loss are used for training and evaluating the model, respectively.
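The zone based pixel density features described above for [99] lend themselves to a short sketch. It assumes that the 64 zones of a 32 × 32 image form an 8 × 8 grid of 4 × 4 pixel blocks, which is the natural reading but is not stated explicitly in the text, and that the image is binary with foreground pixels equal to 1. The function name `zone_pixel_density` is hypothetical.

```python
import numpy as np

def zone_pixel_density(img, zones_per_side=8):
    """Fraction of foreground pixels in each non-overlapping zone, in row-major zone order."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    if h % zones_per_side or w % zones_per_side:
        raise ValueError("image sides must be divisible by zones_per_side")
    zh, zw = h // zones_per_side, w // zones_per_side
    # Axes after reshape: (zone row, row inside zone, zone column, column inside zone).
    blocks = img.reshape(zones_per_side, zh, zones_per_side, zw)
    return blocks.mean(axis=(1, 3)).ravel()
```

For a 32 × 32 input the result has 64 entries, one per zone, each between 0 (empty zone) and 1 (fully inked zone).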
B. SVM based techniques
Rajput et al. [98] presented a Gaussian kernel based SVM classifier with 5-fold cross-validation to recognize handwritten Kannada characters. In this work, normalized chain-code and Fourier descriptor features were fed to the multiclass SVM classifier using the one-versus-rest class model. While evaluating 6500 handwritten characters, they claimed a recognition accuracy of 93.92%. Mukarambi et al. [99] developed an SVM classifier based on zoning and pixel density features with a two-fold cross-validation scheme for the classification of handwritten Kannada characters.
In [107] the authors presented a one-versus-all approach based multiclass SVM, trained using topological and structural features, to identify handwritten Kannada characters. The SVM classifier is one of the prominent classifiers in the pattern recognition field.
The non-explicit multicolumn multiscale CNN based features were fed to an SVM classifier in [89] for the recognition of handwritten Telugu characters. This work reported an accuracy of 95.76% with a five-fold cross-validation scheme.
C. Miscellaneous techniques
(1) Quadratic Discriminant Function:
Pal et al. [95] used a quadratic classifier for the recognition of handwritten Kannada and Telugu characters. In this work, 400-dimensional directional features were fed to the quadratic classifier to recognize 10,779 Kannada characters and 10,872 Telugu characters. They claimed recognition accuracies of 90.34% and 90.90%, respectively.
(2) K Nearest Neighbour Classifier:
Sangame et al. [96] applied the kNN classifier with a Euclidean distance criterion to classify handwritten Kannada characters. Dhandra et al. [97] used the kNN classifier with 4-fold cross-validation to recognize handwritten Kannada characters. In the experimental evaluation, a recognition accuracy of 90.1% was achieved using spatial features and the kNN classifier with k = 1.
The authors in [101] presented a kNN classifier with normalized chain code and wavelet decomposition features to identify handwritten Kannada vowels. In this work, they performed the experimental evaluations with k = 1, 3 and 5, and optimal results were obtained when k = 3. The average recognition accuracy achieved for handwritten Kannada characters was 95.07%. Sastry et al. [102] used the kNN classifier with zone-based pixel intensity features for identifying handwritten Telugu characters.
(3) Decision Tree Classifier:
Sastry et al. [108] proposed a Decision Tree (DT) classifier trained using 3D features to recognize handwritten Telugu characters. The DT was constructed using the SEE5 algorithm, with a claimed recognition accuracy of 93.10% for handwritten samples.
6.2.3. Some observations on handwritten Kannada and Telugu scripts
Table 11 compiles the experimental results obtained from several classifiers for the recognition of handwritten Kannada script. The maximum recognition accuracy obtained so far for the handwritten Kannada script is 97.04% [104]. This recognition accuracy is obtained by feeding a Deep Belief Networks classifier with distributed average of gradients features. Table 12 summarizes the performance details of several classifiers for the recognition of handwritten Telugu script. As shown in Table 12, the best recognition accuracy of 95.76% [89] is achieved by the SVM classifier with the extraction of non-explicit MMCNN features from the standardized HPL-iso-telugu-char dataset.
6.3. Reported work on Gujarati, Oriya, Tamil and Malayalam scripts
6.3.1. Feature extraction
Prasad et al. [112] used structural features or pixel based features to identify handwritten Gujarati characters, where individual image pixels were treated as features. Patel et al. [113] extracted a combination of statistical and structural features. The structural features were used as primary features and the statistical features were used as secondary features. The primary feature set consists of the number of objects in the image, the number of objects in the image’s upper part, the number of objects in the image’s lower part and the number of holes in the character image. The secondary feature set consists of averaging features, moment features and centroid distance features.
A hybrid feature extraction technique was used in [114,115], based on the combination of statistical and structural features. The proposed technique extracts four features, namely, pattern descriptor, Gabor phase XNOR pattern features (GPXNP), autocorrelation and contour direction probability distribution function features (CDPDF). The pattern descriptor represents the structural features whereas GPXNP, autocorrelation and CDPDF represent statistical features. Among these features, only the pattern descriptor and GPXNP are considered to be relevant due to their higher power values, and their fuzzy hedge values are also maximum for each class. The remaining two features, viz., autocorrelation and CDPDF, are considered to be irrelevant on the basis of their fuzzy hedge values and hence rejected.
Structural features also play a significant role in the research reported in [116]. The method includes structural features like the number of closed loops, connected or disconnected components and the number of end points. In [117], the direction and strength of the gradient were obtained by applying the Roberts filter on the Oriya character image. Then the biquadratic interpolation technique was used for computing curvature features, quantized into three levels according to linear, concave and convex regions. Finally, the dimension of the obtained feature vector was reduced by applying principal component analysis (PCA), and the result was fed to the classifier.
Pal et al. [117] used the Fisher ratio (F-ratio) and gradient based technique for the recognition of handwritten Oriya characters. Initially, they obtained a 400-dimensional gradient feature vector by applying Roberts and Gaussian filters. Then the feature vector was modified by using a feature weighting scheme based on the F-ratio. This feature weighting technique distinguishes similar shaped characters more easily by reducing the features belonging to identical parts of similar shaped characters and enhancing the features belonging to distinguishable parts of similar shaped characters.
Padhi [118] extracted several features like average angle based on image centroid, average angle based on zone centroid, average distance based on zone, centroid distance based on zone and standard deviation based features. Dash et al. [119] proposed a binary external symmetry axis constellation (BESAC) feature extraction technique. This method works by generating two binary coded histograms of orientations which were then concatenated to obtain the proposed BESAC feature.
In [120], the character image was partitioned into 3 × 3 zones and the centroid was evaluated for each zone. Then the vertical and horizontal Euclidean distance was computed for the nearest pixel from each zone centroid. Finally, the mean Euclidean distance was calculated along with the mean angular values of each zone, and these values correspond to the feature set.
Bhattacharya et al. [121] proposed a two stage recognition system for handwritten Tamil script. In the first stage of this
Table 12
Recognition accuracies for handwritten Telugu script.
Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%)
Sarkhel et al. [89] | 45,217 | MMCNN | SVM | 95.76
Sastry et al. [108] | Not-specified | 3D features | Decision Tree | 93.10
Angadi et al. [111] | 45,133 | Automatic | CNN with SGD optimizer | 92.40
Pal et al. [95] | 10,872 | Directional | Quadratic classifier | 90.90
Lakshmi [103] | 18,000 | Block wise pixel count | KNN and SVM | 90.80
Manisha et al. [109] | 23,875 | Hybrid and zone based features | KNN | 88.15
Sastry et al. [102] | 19,250 | Zoning method | Nearest Neighbour classifier | 78.00
recognition scheme, the character image was partitioned into 7 × 7 blocks of equal size. Then each block was scanned along both vertical and horizontal directions. Finally, the total number of transitions from black to white and vice versa was counted in each scan to obtain the feature vector. In the second stage of the recognition scheme, the chain code histogram features were computed for the contour representation of the input character image.
Shanthi et al. [122] used pixel based density features. In this work, the character was segmented into non-overlapping blocks. Then the pixel densities were computed for each block of the character image and used as features. Subashini et al. [123] proposed the SIFT algorithm for constructing a local invariant feature vector for each character image. They generated the codebook from the obtained feature vector set for each character image using K-means clustering. The bag-of-keypoints were then calculated for the total number of images. These features were finally fed to an SVM for classification.
In [124] the localized features were obtained using a two-dimensional wavelet transformation for handwritten Tamil character recognition. Abirami et al. [125] extracted six statistical features from the character boundaries to identify Tamil characters. These six features are Freeman directional code, slope angle, aspect ratio, curvature, linearity and curliness. The research work reported in [126] was based on a two step procedure of feature selection and extraction. The features were first selected using zoning and the 8 directional chain-code technique. Then the selected features were extracted using the bound boxing algorithm and sub line direction, and fed to the classifier.
Raj et al. [127] used a zoning technique to partition the character image into different zones. Then the chain code algorithm was applied on each zone of the character image for selecting essential character portion features. Finally, features were extracted from each selected portion of the character image using pixel points count, pixel based on zoning division (horizontal and vertical), pixel location based on axis, and pixel location based on row, column and diagonal. The method in [128] was based on statistical analysis in which the pixel points present in the character image were analysed and represented using the quad tree algorithm. The research work reported in [129] was based on the extraction of twelve directional features for 36 blocks, thereby generating a feature vector of dimension 432.
The wavelet energy based features (WEF) were extracted from the character image using the wavelet transform in [130]. WEF represents the wavelet energy distribution of handwritten characters in various directions at different decomposition levels. The wavelet coefficient amplitude increases with the increase in scale of wavelet decomposition. In order to discriminate handwritten character images, the wavelet energy of different levels possesses different powers. Thus, the feature vector represents patterns of characters on the basis of wavelet energy for classification.
Jomy et al. [131] used a combination of gradient and curvature based feature extraction techniques along with principal component analysis for dimension reduction. First, gradient features were computed using directional information obtained from the arc tangent of the gradient. Second, curvature features were obtained using the biquadratic interpolation technique on the strength of gradient concentrated in 32 curvature directions. These two feature sets were then concatenated and PCA was applied on the concatenated feature set to obtain the final reduced feature vector.
Raju et al. [132] proposed a gradient and run length count based feature extraction technique (GF-RLC) along with three other simple features, namely, position of centroid, character code and aspect ratio, for the identification of Malayalam characters. Manjusha et al. [133] designed feature descriptors based on a scattering convolution network for the classification of handwritten Malayalam characters.
Table 13 elucidates the feature extraction techniques for handwritten Gujarati, Oriya, Tamil and Malayalam scripts.
6.3.2. Classification and recognition
A. Neural Network based techniques
Jose et al. [124] proposed a feed-forward BPNN using the sigmoid activation function to identify handwritten Tamil characters. The FFBPNN consists of three layers and is based on the gradient descent back-propagation approach. In this work, the FFBPNN is trained using wavelet decomposition features to classify handwritten samples.
Manuel et al. [136] used a back-propagation based MLP (BPMLP) to classify handwritten Malayalam characters. The BPMLP was trained using curvelet-based features. This work reported a recognition accuracy of 95.99% on 2120 handwritten character samples.
The Extreme Learning Machine (ELM) [137] is a fast learning neural network in which the input weights and hidden biases are chosen randomly. Along with the extremely fast learning property, this algorithm also provides good generalization ability. The work reported in [130] claimed an accuracy of 95.59% using a wavelet feature based ELM on 9000 sample images.
Raju et al. [132] proposed a FFBPNN classifier for the recognition of handwritten Malayalam characters. In this approach, the FFBPNN with sigmoid activation function was trained using a combination of run length count and gradient features along with other simple features like centroid, character code and aspect ratio. Using the FFBPNN, a recognition accuracy of 99.78% was achieved after evaluating 19,800 handwritten characters.
B. SVM based techniques
Shanthi et al. [122] used an SVM classifier based on statistical learning theory to recognize handwritten Tamil characters. The pixel density based features were computed for training the SVM classifier using the max-win voting approach. Experimental evaluations on 41,489 handwritten characters claim a recognition accuracy of 82.04% with 5-fold cross-validation. Subashini et al. [123] presented a linear SVM classifier trained using local invariant SIFT descriptor features to classify handwritten Tamil characters.
Shyni et al. [126] proposed a weighted SVM, in which the weight factor was computed using the Lagrangian theorem. The weighted SVM classifier was trained using chain-code, zoning, bound box and sub-line detection features to identify 6000 character samples with a recognition accuracy of 88%. Raj et al. [128] also used a multiclass SVM based on the Lagrangian theorem for the classification of handwritten Tamil characters. They implemented
Table 13
Description of feature extraction techniques for Gujarati, Oriya, Tamil and Malayalam scripts.
Features | Category | Reference
Gradient, Curvature and PCA | Statistical and structural features | Pal et al. [117]
Gradient and F-ratio | Statistical features | Wakabayashi et al. [134]
Pixel density | Statistical features | Shanthi and Duraiswamy [122]
SIFT feature descriptors | Statistical features | Subashini and Kodikara [123]
Centroid and moment-based features | Statistical features | Patel and Desai [113]
CL, Connected-disconnected components and End points | Structural features | Thaker and Kumbharana [116]
Gradient based features and Run length count | Statistical features | Prasad and Kulkarni [115]
Zone, Chain-code | Statistical and structural features | Raj and Abirami [127]
Quad tree feature | Statistical and structural features | Raj et al. [128]
Automatic features using LSTM | Non-explicit features | Jino et al. [135]
Zone based mean angular values and mean ED | Statistical features | Sethy et al. [120]
Reduced scattering convolutional network | Non-explicit features | Manjusha et al. [133]
the SVM classifier using the one-against-all approach to predict 4200 handwritten samples and reported an accuracy of 88.25%. The multiclass SVM classifier proposed in [138] was trained using discrete value features extracted from Z-ordering, strip-tree, and quad-tree approaches to identify handwritten Tamil characters. In this multiclass SVM classifier, the one-vs-all class model is used along with the divide and conquer technique.
Jomy et al. [131] used an SVM classifier with ten-fold cross-validation to identify handwritten Malayalam characters. The multiclass SVM classifier operates using the RBF kernel, with the kernel parameter (γ) equal to 0.02. While evaluating 13,200 handwritten Malayalam characters, a recognition accuracy of 97.96% was achieved. Manjusha et al. [133] presented an SVM classifier with a linear kernel to recognize handwritten Malayalam characters. The linear SVM was trained using scattering convolutional network features extracted from 29,302 character images and provides a recognition accuracy of 91.05%.
like Euclidean distance and triangular distance. The proposed classifier along with Gabor phase XNOR pattern features (GPXNP) provides a recognition accuracy of 86.33% for handwritten samples. The work reported in [119] was classified using a nearest neighbour classifier on the basis of BESAC features. The proposed approach claimed a recognition accuracy of 99.48% for 7800 handwritten Oriya characters.
(3) Template Matching:
A template matching technique based on structural features was presented in [112] for the recognition of handwritten Gujarati characters.
(4) Decision Tree Classifier:
The research work presented in [116] proposed a decision tree classifier to identify handwritten Gujarati characters. In the proposed model, several features like closed loops, connected-disconnected components and end points were fed to the decision tree classifier, with a claimed accuracy of 88.78%.
C. Miscellaneous techniques
(5) Adaptive Neuro Fuzzy Classifier
(1) Quadratic Discriminant Function Classifier: Prasad et al. [114] implemented an Adaptive Neuro Fuzzy
Pal et al. [117] proposed a quadratic classifier with five-fold (ANF) classification technique to recognize handwritten Gujarati
cross-validation to recognize handwritten Oriya characters. In characters. This ANF classification model was trained using
this work, gradient and curvature features were fed to quadratic GPXNP features.
classifier which achieves a recognition accuracy of 94.6%. Wak-
abayashi et al. [134] presented a Quadratic Discriminant Function (6) Hidden Markov Classifier
(QDF) classifier with five-fold cross-validation to identify hand- Abirami et al. [125] presented a symbol modelling Hidden
written Oriya characters. F-ratio based weighted features were Markov Model (HMM) for the classification of handwritten Tamil
fed to QDF classifier, which claims an accuracy of 95.14% for characters. Since the handwritten character recognition is consid-
18,190 handwritten character samples. ered to be very complex problem due to several dissimilarities
Pal et al. [95] used a quadratic classifier for the recognition in characters like different size, dimension, thickness, orientation
of handwritten Tamil characters. In this work, 400-dimensional and style. HMM are proved to be very efficient classifier in such
directional feature vector was fed to the quadratic classifier to scenarios.
recognize 10,216 Tamil characters and claims recognition accu- HMM mainly depends upon state probabilities and states are
racy of 96.73%. Moni et al. [129] proposed a modified QDF to assigned on the basis of features. In this work, Baum Welch
classify handwritten Malayalam characters. The MQDF classifier algorithm was used along with three probabilities to train HMM.
was trained using 12 directional code features and claims an ac- The HMM transition was represented using 5 states and prob-
curacy of 95.42% for 19,800 handwritten samples. In comparison ability value was assigned on the basis of reference value. The
to QDF, the MQDF enhances the recognition accuracy by more proposed HMM model was trained using freeman directional
than 10% and also reduces the computational cost. code, curvature, aspect ratio, slope angle and curliness features
Raju et al. [132] used a simplified QDF classifier for the recog- and claims a recognition accuracy of 85.00%.
nition of handwritten Malayalam characters. In this approach, the 6.3.3. Some observations on handwritten Gujarati, Oriya, Tamil and
SQDF was trained using combination of run length count and Malayalam scripts
gradient features along with other simple features like centroid, Table 14 summarizes the performance details of various clas-
character code and aspect ratio. Using SQDF, a recognition accu- sifiers for the recognition of handwritten Gujarati scripts. The
racy of 99.66% was reported after evaluating 19,800 handwritten best recognition accuracy reported so far for handwritten Gujarati
characters. script is 99.48% [139]. This recognition accuracy is obtained by
using Polynomial kernel SVM classifier with structural decom-
(2) K Nearest Neighbour Classifier: position. As seen from Table 15, a wide variety of classifiers are
Prasad et al. [115] used a weighted kNN classifier for the used for the identification of handwritten Oriya scripts. Among
recognition of handwritten Gujarati characters. In the proposed them, the quadratic discriminant classifier provides the optimal
approach, the conventional kNN algorithm was improved by ex- recognition accuracy of 97.40% [120]. Table 16 compiles the ex-
ploiting combination of new feature weights, distance measures perimental results obtained for handwritten Tamil script. The
Table 14
Recognition accuracies for handwritten Gujarati script.
| Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%) |
|---|---|---|---|---|
| Sharma et al. [139] | 20,500 | Structural decomposition | Polynomial kernel SVM | 99.48 |
| Pareek et al. [140] | 10,000 | Automatic | CNN with Adam optimizer | 97.21 |
| Thaker and Kumbharana [116] | 750 | Closed loops, CDC and End points | Decision tree classifier | 88.78 |
| Prasad and Kulkarni [115] | 16,560 | GPXNP | Weighted KNN | 86.33 |
| Prasad et al. [112] | Unspecified | Structural features | Template matching | 71.66 |
| Prasad and Kulkarni [114] | 16,560 | GPXNP | Adaptive NFC using feature selection | 68.67 |
| Patel and Desai [113] | Unspecified | Centroid and moment based features | Tree classifier and KNN | 63.10 |
Table 15
Recognition accuracies for handwritten Oriya script.
| Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%) |
|---|---|---|---|---|
| Sethy et al. [120] | 9400 | Zone based mean angular values and mean ED | Quadratic discriminant classifier | 97.40 |
| Wakabayashi et al. [134] | 18,190 | Gradient and F-ratio | Modified QDF | 95.14 |
| Dash et al. [119] | 7800 | Binary ESAC | K-Nearest Neighbour | 95.01 |
| Pal et al. [117] | 18,190 | Gradient, Curvature and PCA | Quadratic classifier | 94.60 |
| Padhi [118] | Not specified | Average angle, Average distance and SD | BPNN and Genetic Algorithm | 94.00 |
| Dash et al. [141] | 10,200 | Tetrolet + SCC | Nearest Neighbour | 93.24 |
Table 16
Recognition accuracies for handwritten Tamil script.
| Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%) |
|---|---|---|---|---|
| Sarkhel et al. [89] | 82,000 | MMCNN | SVM | 98.79 |
| Kavitha et al. [142] | 82,928 | Automatic | CNN | 97.70 |
| Pal et al. [95] | 10,216 | Directional | Quadratic classifier | 96.73 |
| Raj et al. [138] | 10,000 | Z-ordering and Tree representation | Hierarchical SVM | 90.30 |
| Bhattacharya et al. [121] | 77,609 | Directional, Chain code histogram | K-means algorithm and Multi-layer perceptron | 89.66 |
| Raj and Abirami [127] | 12,000 | Zone, Chain-code and Statistical features | SVM | 89.00 |
| Jose and Wahi [124] | 3100 | Wavelet decomposition | BPNN | 89.00 |
| Raj et al. [128] | 4200 | Quad tree feature | SVM | 88.25 |
| Shyni et al. [126] | 6000 | Chain code, zoning, bound box and sub-line detection | Weighted SVM | 88.00 |
| Abirami et al. [125] | 3360 | Freeman directional code, curvature, etc. | HMM | 85.00 |
| Shanthi and Duraiswamy [122] | 41,489 | Pixel density | SVM | 82.04 |
| Subashini and Kodikara [123] | 8000 | SIFT feature descriptors | SVM | 81.62 |
Table 17
Recognition accuracies for handwritten Malayalam script.
| Methodology | Dataset size | Feature extraction | Classification technique | Recognition accuracy (%) |
|---|---|---|---|---|
| Raju et al. [132] | 19,800 | Gradient based features, run length count | Simplified quadratic discriminant function and MLP | 99.78 (MLP), 99.66 (SQDF) |
| Jomy et al. [131] | 13,200 | Gradient, Curvature and PCA | SVM with RBF kernel | 97.96 |
| Jino et al. [135] | 18,000 | Automatic features | Stacked Long Short Term Memory | 97.00 |
| Manuel et al. [136] | 2120 | Curvelet transform | Multi-layer perceptron | 95.99 |
| Chacko et al. [130] | 9000 | Wavelet Energy features | Extreme Learning Machine | 95.59 |
| Moni and Raju [129] | 19,800 | Directional features | Modified QDF | 95.42 |
| Kishna et al. [143] | Unspecified | Texture features | Hybrid HMM with ANN | 93.40 |
| Manjusha et al. [133] | 29,302 | Reduced scattering convolutional network | Linear SVM | 91.05 |
best accuracy of 98.79% is achieved by an SVM classifier with the extraction of non-explicit MMCNN features from the standardized HPL-iso-tamil-char dataset [89]. As shown in Table 17, the maximum recognition accuracy for Malayalam script is 99.78%, achieved by training an MLP classifier on GBF-RLC features [132].

7. Analysis of work done on handwritten Indic scripts OCR

• Due to infinite variations in handwriting styles, the task of handwritten script recognition becomes quite complex and challenging.
• It is evident from the studies that for handwritten script recognition, high accuracy rates are achieved by using hybrid features i.e. the combination of statistical and structural features.
• One of the basic issues in Indic script OCRs is to find a significantly good benchmark dataset. Therefore, for Indic scripts, there is a necessity for the construction of exhaustive and standardized datasets. Most of the work done on Indic scripts OCR is based on relatively small datasets gathered in the laboratory environment.
• Parameters like accuracy, runtime and error rate are necessary to perform comparative as well as experimental analysis of feature extraction and classification models.
• From the survey, it is clearly evident that the accuracy of the model relies on the size and quality of the dataset.
• Deep learning models provide very high recognition accuracies, but only when the size of the training dataset is sufficiently large.
• In hybrid classification models, the majority voting scheme is mostly used by the researchers to aggregate the results obtained from different classifiers.
• Non-explicit feature extraction techniques such as ANN, HMM, CNN, etc. help in extracting more invariant and robust features from handwritten character images.
• Accuracy of a handwritten script recognition system depends largely upon the features with high discriminating power. Hence, there is an immense need to investigate feature selection techniques in order to achieve the most accurate results.
• No standardized dataset is available for handwritten Gurumukhi, Gujarati and Malayalam scripts.
• Studies reported on Indic scripts indicate that various researchers have created their in-house datasets to carry out the research.
• Most of the standardized and benchmark datasets available for Indic scripts belong to Devanagari and Bengali scripts. Other than these, some standardized datasets are also available for Kannada, Telugu, Oriya and Tamil scripts.
• The work reported on handwritten Indic scripts signifies that principal component analysis is one of the highly used and efficient dimension reduction techniques.
• The success rate of the classifier entirely depends upon the features extracted. So, in order to achieve good accuracy, the feature extraction technique and classification algorithm should be chosen carefully.
• Studies reveal that among all Indic scripts, maximum work has been reported on Devanagari and Gurumukhi scripts, and has yielded satisfactory results.
• From the above findings, it is analysed that SVM and ANN are very common classifiers that provide good recognition accuracies for handwritten Indic scripts. However, some of the latest findings indicate that recognition accuracies can be further improved by using CNN and other hybrid techniques.

8. Challenges and future perspectives

The research works discussed in this survey depict the efforts of several researchers towards the recognition of handwritten Indic scripts. The recognition accuracies claimed by applying a wide variety of feature extraction and classification techniques are very persuasive. But certain areas like constrained database, multi-script and degraded ancient documents, confusing characters and many more still need more exploration. Therefore, this section highlights a number of challenges and future directions, which may improve the performance and recognition rates of existing works when given proper scrutiny.

• Lack of benchmark and standard datasets: The creation of benchmark and standard datasets is one of the most indispensable aspects of any character recognition system. Due to the inadequacy of comprehensive standardized datasets for Indic scripts, researchers have created their own in-house datasets. But these in-house datasets do not provide any established standards or parameters for evaluating the performance of algorithms. Thus, successful research in this direction needs publicly available standard and accurate datasets. Any effort infused in these activities will go a long way, furthering the research towards the advancement of handwritten Indic scripts character recognition systems.
• Resolution of confusing and similar characters: There are several characters in Indic scripts that closely resemble each other, thus the recognition of these similar shaped characters becomes quite difficult and challenging. The distinction of these similar shaped characters becomes more complex if the distinguishing element is lost in the pre-processing phase or if the distinguishing element is too small to identify due to variability in the writing styles. Thus, these characters need special attention in order to get recognized correctly. A two-stage recognition scheme can be used for handling these confusing characters, in which the first stage groups the similar shaped characters into one class, resulting in a reduced number of recognition classes. Then the second stage classifies the similar shaped characters within the class by extracting character specific features, thereby improving the recognition results.
• Demand for multi-script OCR: India is a multi-script and multi-lingual country with most of the people communicating in their own regional languages, so there is a huge need for the development of a multi-script character recognition system for Indic scripts. To design a multi-script recognition system, it is necessary to identify the script of each document before delivering it to the recognition system of the individual script. The research work reported on multi-script Indic scripts is still in its infancy, so this area needs great attention to support the community of active researchers in this field.
• Development of OCR for historical and degraded documents: The majority of research work reported on Indic scripts recognition is based on good quality documents. The work done on poor quality degraded documents is almost negligible. Therefore, experiments should be made in this direction for furthering the research on degraded, noisy and historical documents. The recognition of these degraded historical and ancient archives will help several memory institutions to digitize their manuscript collections, thus contributing towards the preservation and advancement of ancient heritage, and will be very useful to philologists and historians.
• Combination of multiple classifiers: To significantly improve the classification performance, future research should aim at combining several classifiers so as to obtain an ideal combination (ensemble of classifiers). The fusion of classifiers can be performed in two ways i.e. sequential and parallel fusion. The sequential fusion is mostly used for handling large category sets, whereas parallel fusion deals with improving classifier accuracy.
• Post-recognition error detection and correction: The character recognition segment for handwritten Indic scripts is highly prone to give inaccurate results owing to the structural complexities of characters in Indic scripts. So, in order to improve the recognition accuracy, it is recommended to use grapheme features, linguistic information or script specific knowledge. The research work reported on the use of post-recognition error detection and correction is limited, hence more endeavour is required to explore this field.

9. Proposed framework

Based on the aforementioned future directions, it is contemplated that for the handwritten recognition of Indic scripts, there is a huge need to develop novel hybrid classification approaches. So, in this work we have directed our efforts to propose a hybrid classification technique with an aim to outperform existing state-of-the-art techniques. For better understanding, the proposed technique is explained in detail with the help of an algorithm and an example.
Deep neural networks (DNN) have proven to outperform traditional machine learning techniques for solving complex handwritten Indic scripts recognition problems. However, building a handcrafted successful DNN right from scratch requires a lot of problem domain knowledge. For training deep CNN architectures
millions of parameters are involved, thus a single training run may take several days and hence this approach consumes a significant amount of computational resources and time. Thus, a novel framework is proposed based on the improved particle swarm optimization (PSO) algorithm, to automatically search and create an optimal deep convolutional neural network (CNN) architecture for the classification and recognition of handwritten Indic scripts as shown in Fig. 9.
The PSO is a population based meta-heuristic global optimization technique commonly used in continuous and discrete optimization problems. PSO mimics the flying behaviour of a flock of birds searching for food. In the PSO algorithm, a single solution is known as a particle and the group of all solutions is known as the swarm. The PSO is based on the idea that every particle in the swarm has knowledge about its position, velocity, its personal best configuration (p-best) and the global best configuration of the swarm (g-best) [144]. In every iteration, the p-best of the particles and the g-best of the swarm are adjusted and updated in such a way that all particles lead towards the same optimal position. The velocity and position of the kth particle at the nth iteration are updated using Eqs. (1) and (2) respectively, where ω is the inertia weight, c_1 and c_2 are acceleration coefficients and r_1, r_2 are random numbers in [0, 1].
\[ V_k^{n+1} = \omega V_k^{n} + c_1 r_1 \left( p\text{-}best_k^{n} - X_k^{n} \right) + c_2 r_2 \left( g\text{-}best^{n} - X_k^{n} \right) \tag{1} \]
\[ X_k^{n+1} = X_k^{n} + V_k^{n+1} \tag{2} \]
In a CNN classifier [145], layers are aligned together in such a manner that the output of one layer is fed as input to the succeeding layer. Mathematically, the CNN architecture is depicted in Eq. (3), where X denotes the input data, Z_n indicates the output of the nth layer, the activation function corresponding to the nth layer is represented as f_n(), the kernel operation corresponding to the nth layer is g_n(), the kernel for the nth layer is K_n and the kernel operation output before the activation function for the nth layer is represented by Y_n.
\[ \begin{cases} Z_n = X, & n = 1 \\ Z_n = f_n(Y_n), & n > 1 \\ Y_n = g_n(Z_{n-1}, K_n) \end{cases} \tag{3} \]
The CNN classifier is mainly composed of three layers, namely convolution, pooling and fully-connected layers, represented in Eq. (4).
\[ \begin{cases} Y_n = K_n Y_{n-1}, & \text{if the } n\text{th layer is convolution} \\ Y_n = (\boxplus_{x,y}, Y_{n-1}), & \text{if the } n\text{th layer is pooling} \\ Z_n = f_n(Y_n), & \text{if the } n\text{th layer is fully-connected} \end{cases} \tag{4} \]
The convolution layer is responsible for constructing a feature map by sliding a kernel or filter over the input data on the basis of the stride size. Then, the pooling layer is responsible for down-sampling the convolved features in an x × y window ⊞_{x,y}, thereby further reducing the output parameters. Finally, the fully-connected layer operates in a way similar to a classical ANN. In this step, the flattened output is fed to the fully-connected layer, which gives the final output using an activation function like rectified linear units (ReLU).
The proposed framework consists of an improved PSO algorithm comprising four stages that help it to search for and construct the optimal CNN architecture. In the first step, particles are initialized with random CNN architectures. In the next step, fitness is evaluated in order to determine the loss. The third step deals with velocity computation and finally, the particles are updated, leading the search towards the best CNN architecture. The general layout of the proposed model is shown in Algorithm 1. The detailed description of the improved PSO–CNN is given in the following subsections.

9.1. Representing CNN architectures with swarm initialization

The initialization of the population or swarm is the first stage of the proposed algorithm. This step constructs N particles corresponding to random CNN architectures, with a different number of layers. For every particle, the first layer is always a convolution layer (Conv) and the last layer is always a fully-connected layer (FC) in order to build feasible CNN architectures. In addition to this, the fully-connected layer is always placed at the end i.e. it cannot be placed in between convolution and pooling layers. Once a fully-connected layer is placed in the architecture, every other layer that follows it is also a fully-connected layer. Typically, the fully-connected layer classifies the reduced feature vector obtained from the convolution and pooling layers.

Example. Three randomly initialized particles are illustrated in Fig. 10 with each particle representing a unique CNN architecture.

9.2. Evaluate the fitness

The fitness is evaluated by compiling the particles into full-fledged CNN architectures and then training these architectures
for the total number of training epochs (e_train). The losses computed for different particles are compared in order to evaluate the fitness. Thus, the aim of the algorithm is to identify a particle architecture with a minimum loss, irrespective of the number of hyper-parameters. On the basis of this evaluated fitness, the g-best and p-best_k are updated in order to direct the search towards an optimal solution. When the evaluation terminates, the obtained g-best represents the optimal CNN architecture.

Fig. 10. Representation of three particles with different CNN architectures.
Fig. 11. Example of computing the difference (g-best - X).
Fig. 12. Example of computing the difference (p-best - X).

9.3. Velocity computation

The velocity of every kth particle (X_k) depends upon the differences (g-best – X_k) and (p-best_k – X_k), thus we have to compute these differences in order to determine the velocity for each particle. While computing the differences we need to separate FC layers from convolution and pooling layers, in order to avoid the chance of ending up with FC layers between convolution and pooling layers. For each particle, the difference depends only on the type of layer.
For example, as shown in Fig. 11, the third layer of both g-best and X_k is a convolution layer, therefore the difference will be 0, which indicates that this layer position should remain the same while updating the given particle architecture.
The difference is always computed with respect to g-best and p-best, as shown in Figs. 11 and 12 respectively. For example, if g-best and X_k have different layer types then the result will keep the layer from g-best with its hyper-parameters. If g-best has fewer layers as compared to X_k then the corresponding extra layer is truncated. On the contrary, if g-best has more layers than X_k, then the extra layer is padded to the final difference. The same procedure is followed for computing the difference with respect to p-best.
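The difference rules of this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the dictionary layer encoding, the `KEEP`/`REMOVE` markers and all function names are hypothetical, and a single difference is applied directly as the velocity because the text does not spell out how the g-best and p-best differences are blended.

```python
from itertools import zip_longest

KEEP = 0            # same layer type at this position: leave the particle's layer as is
REMOVE = "remove"   # the particle has a surplus layer here: truncate it


def split_blocks(arch):
    """Separate convolution/pooling layers from fully-connected (FC) layers."""
    feature = [layer for layer in arch if layer["type"] != "fc"]
    dense = [layer for layer in arch if layer["type"] == "fc"]
    return feature, dense


def block_difference(best, current):
    """Position-wise difference (best - current) within one block of layers."""
    diff = []
    for b, c in zip_longest(best, current):
        if b is None:                      # best is shorter: extra layer is truncated
            diff.append(REMOVE)
        elif c is None or b["type"] != c["type"]:
            diff.append(b)                 # pad or overwrite with the layer from best
        else:
            diff.append(KEEP)              # identical layer type: difference is 0
    return diff


def difference(best, current):
    """Compute the difference separately for the Conv/Pool block and the FC block."""
    best_feat, best_fc = split_blocks(best)
    cur_feat, cur_fc = split_blocks(current)
    return block_difference(best_feat, cur_feat), block_difference(best_fc, cur_fc)


def apply_block(current, velocity):
    """Add, replace or remove layers of one block according to the velocity."""
    updated = []
    for c, v in zip_longest(current, velocity):
        if v is None:                      # no instruction for this position
            v = KEEP
        if v == REMOVE:
            continue
        updated.append(c if v == KEEP else v)
    return updated


def apply_velocity(current, velocity):
    """Update a particle; FC layers always stay after the Conv/Pool layers."""
    cur_feat, cur_fc = split_blocks(current)
    vel_feat, vel_fc = velocity
    return apply_block(cur_feat, vel_feat) + apply_block(cur_fc, vel_fc)


# Hypothetical architectures (hyper-parameter names are assumptions).
g_best = [{"type": "conv", "filters": 32}, {"type": "pool"},
          {"type": "conv", "filters": 64}, {"type": "fc", "units": 10}]
x_k = [{"type": "conv", "filters": 16}, {"type": "conv", "filters": 16},
       {"type": "conv", "filters": 8}, {"type": "pool"},
       {"type": "fc", "units": 64}, {"type": "fc", "units": 10}]

velocity = difference(g_best, x_k)
print([layer["type"] for layer in apply_velocity(x_k, velocity)])
```

Handling the Conv/Pool block and the FC block separately is what guarantees that an update can never place an FC layer between convolution and pooling layers.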
Table 18
Comparative analysis of the proposed framework with some popular contemporaries on CMATERdb 3.1.2 database.
| Database | Work reference | Recognition accuracy (%) |
|---|---|---|
| Bengali basic character, CMATERdb 3.1.2 | Sarkhel et al. [73] | 87.28 |
| | Basu et al. [52] | 80.58 |
| | Rabby et al. [65] | 98.00 |
| | Gupta et al. [94] | 86.10 |
| | Sarkhel et al. [92] | 86.53 |
| | Keserwani et al. [85] | 98.56 |
| | Roy et al. [93] | 86.40 |
| | Dash et al. [141] | 94.78 |
| | Present work | 98.93 |
Example. An example of final velocity computation is shown in Fig. 13.

9.4. Update the particle

The particles are updated on the basis of the computed velocity i.e. layers are added to the particle architecture or removed from the particle architecture with respect to its velocity.

Example. An example of updating the particle architecture is shown in Fig. 14.

10. Experimental analysis

In order to investigate the performance of the proposed model, the following sub-sections first outline the employed benchmark datasets. Then the experimental setup and overall performance are illustrated and comprehensively analysed. Finally, the chosen peer competitors are reviewed, and comparative analysis is performed with these state-of-the-art techniques.
In the designed PSO–CNN model, the parameter settings related to the PSO algorithm are based on the conventions used in PSO communities. Initially, the inertia weight value is predefined as 0.9, which decreases with each iteration according to Eq. (5). The population size is initialized to 30 particles with each particle representing a random CNN architecture, and the maximum number of PSO iterations is 20. Since fitness evaluation is an expensive process for CNN architectures due to the requirement of a large amount of training and testing, the maximum number of epochs for particle training is kept to 5. Finally, at the end of optimization, the obtained optimal CNN architecture is trained for 100 epochs.
Due to the stochastic nature of the proposed algorithm, the results are averaged over five independent experimental runs for maintaining the consistency in the results to be compared. The proposed model has reported significantly improved results on both the publicly available standard datasets. Furthermore, the computational results achieved on the CMATERdb 3.1.2 and CMATERdb 3.1.3 datasets are provided in Tables 18 and 19, respectively.
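To make the quoted PSO settings concrete, the sketch below runs the canonical update of Eqs. (1) and (2) with 30 particles, 20 iterations and an initial inertia weight of 0.9 on a toy continuous objective. It is an illustration only, not the architecture search itself: the linear inertia decay to 0.4, the coefficient values c1 = c2 = 1.49, the search range and the function name `pso_minimize` are all assumptions, since Eq. (5) and the acceleration coefficients are not given in this section.

```python
import random


def pso_minimize(loss, dim, n_particles=30, n_iters=20,
                 w_start=0.9, w_end=0.4, c1=1.49, c2=1.49, seed=0):
    """Canonical PSO on a continuous toy problem (illustrative sketch only)."""
    rng = random.Random(seed)
    # Random initial positions, zero initial velocities.
    X = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in X]
    p_loss = [loss(x) for x in X]
    g = min(range(n_particles), key=p_loss.__getitem__)
    g_best, g_loss = p_best[g][:], p_loss[g]

    for n in range(n_iters):
        # Assumed linear decay of the inertia weight (stand-in for Eq. (5)).
        w = w_start - (w_start - w_end) * n / max(n_iters - 1, 1)
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # drawn per dimension
                # Eq. (1): velocity update.
                V[k][d] = (w * V[k][d]
                           + c1 * r1 * (p_best[k][d] - X[k][d])
                           + c2 * r2 * (g_best[d] - X[k][d]))
                # Eq. (2): position update.
                X[k][d] += V[k][d]
            current = loss(X[k])
            if current < p_loss[k]:                   # update personal best
                p_best[k], p_loss[k] = X[k][:], current
                if current < g_loss:                  # update global best
                    g_best, g_loss = X[k][:], current
    return g_best, g_loss


def sphere(x):
    """Toy objective standing in for the training loss of a particle."""
    return sum(v * v for v in x)


best_position, best_loss = pso_minimize(sphere, dim=2)
print(best_position, best_loss)
```

In the proposed framework each loss evaluation corresponds to training a CNN for a few epochs, which is why the particle count, iteration count and per-particle epochs are kept small.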
10.3. Comparison with the state-of-the-art

In this section, the performance of the proposed model is compared with that of some other popular contemporary works. Peer competitors for the proposed PSO–CNN model, which have been extensively discussed and reviewed in Section 6, are employed to perform the comparisons on the chosen standard image classification datasets. Specifically, the state-of-the-art methods that have reported promising recognition accuracy on the chosen benchmarks are MMCNN+SVM [89], SL-DCNN [91], Two-pass GA+SVM [37], NSHA+NSGA-II+AFS+SVM [73], Shape decomposition+BP-MLP [75], SVM+RBF [92], BornoNet [65], ABCO+SVM [93], CDM+BP-MLP [52], unified-CNN [85], Multiobjective harmony search+SVM [94] and Tetrolet+SCC+Nearest Neighbour [141]. Finally, Tables 18 and 19 illustrate the comparative analysis of the proposed PSO–CNN model with the above benchmark state-of-the-art algorithms.

11. Conclusion

In this paper, we have presented an exhaustive survey of various feature extraction and classification techniques for the recognition of handwritten Indic scripts. It also provides a detailed description of related benchmark and standardized datasets. It is clearly evident from the survey that the recognition accuracy of the model relies on the size and quality of the dataset. The lack of significantly good benchmark datasets is one of the basic issues in several Indic scripts. So, there is a necessity for the construction of exhaustive and standardized datasets for Indic scripts, which provides a novel future aspect in this direction. A systematic analysis has been performed with an intent to determine the efficiency of recognition schemes by comparing several existing feature extraction and classification methods based on their recognition rates. Finally, we have discussed various challenges in Indic scripts, which provide a way forward for future research. Based on the extensive study conducted in this article, it has been contemplated that there is a need to develop hybrid feature extraction and classification approaches for achieving the best accuracy results. So, a novel framework based on the improved PSO algorithm to automatically construct an optimal CNN architecture has been proposed, which yields better performance than the existing traditional state-of-the-art techniques.
From the perspective of future work, the demand for man and machine interaction has elevated this field. Researchers have immensely contributed towards the advancement of character recognition systems that can be implemented practically, to aid society by enhancing the interface between humans and computers. Intensive research available so far in this field is mainly focused on scripts like Chinese, Latin, Arabic and Japanese, and the work done on Indic scripts is still in its infancy. So, this survey has been performed to compile significant contributions in the field that will encourage and provide in-depth information to novice researchers in this field of handwritten Indic scripts recognition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
24 R. Sharma and B. Kaushik / Computer Science Review 38 (2020) 100302
[11] U. Pal, R. Jayadevan, N. Sharma, Handwriting recognition in indian [39] U. Bhattacharya, B. Chaudhuri, Umapada, Databases for research on recog-
regional scripts: a survey of offline techniques, ACM Trans. Asian Lang. nition of handwritten characters of Indian scripts, in: Eighth International
Inf. Process. (TALIP) 11 (1) (2012) 1–35. Conference on Document Analysis and Recognition (ICDAR’05), IEEE,
[12] U. Pal, B. Chaudhuri, Indian script character recognition: a survey, Pattern 2005, pp. 789–793.
Recognit. 37 (9) (2004) 1887–1899. [40] HPLabs, Isolated handwritten telugu character dataset, 2006, URL http:
[13] A. Datta, A generalized formal approach for description and analysis of //lipitk.sourceforge.net/datasets/teluguchardata.htm.
major Indian scripts, IETE J. Res. 30 (6) (1984) 155–161. [41] M. Agrawal, A.S. Bhaskarabhatla, S. Madhvanath, Data collection for hand-
[14] R. Sharma, B.N. Kaushik, N.K. Gondhi, Devanagari and gurmukhi script writing corpus creation in Indic scripts, in: International Conference on
recognition in the context of machine learning classifiers, J. Artif. Intell. Speech and Language Technology and Oriental COCOSDA (ICSLT-COCOSDA
11 (2) (2018) 65–70. 2004), New Delhi, India (November 2004), Citeseer, 2004.
[15] P.K. Singh, R. Sarkar, M. Nasipuri, Offline script identification from [42] A. Alaei, P. Nagabhushan, U. Pal, A benchmark kannada handwritten
multilingual indic-script documents: a state-of-the-art, Comp. Sci. Rev. document dataset and its segmentation, in: 2011 International Conference
15 (2015) 1–28. on Document Analysis and Recognition, IEEE, 2011, pp. 141–145.
[16] M. Yadav, R.K. Purwar, M. Mittal, Handwritten Hindi character [43] K.S. Dash, N.B. Puhan, G. Panda, Odia character recognition: a directional
recognition: a review, IET Image Process. 12 (11) (2018) 1919–1933. review, Artif. Intell. Rev. 48 (4) (2017) 473–497.
[17] K. Ubul, G. Tursun, A. Aysa, D. Impedovo, G. Pirlo, T. Yibulayin, Script [44] A. Chahi, Y. Ruichek, R. Touahni, et al., An effective and conceptu-
identification of multi-script documents: a survey, IEEE Access 5 (2017) ally simple feature representation for off-line text-independent writer
6546–6559. identification, Expert Syst. Appl. 123 (2019) 357–376.
[18] S. Bag, G. Harit, A survey on optical character recognition for Bangla and [45] P. Inkeaw, J. Bootkrajang, S. Marukatat, T. Gonçalves, J. Chaijaruwanich,
Devanagari scripts, Sadhana 38 (1) (2013) 133–168. Recognition of similar characters using gradient features of discriminative
[19] R. Jayadevan, S.R. Kolhe, P.M. Patil, U. Pal, Offline recognition of de- regions, Expert Syst. Appl. (2019).
vanagari script: A survey, IEEE Trans. Syst. Man Cybern. C 41 (6) (2011) [46] M.A. Dhali, C.N. Jansen, J.W. de Wit, L. Schomaker, Feature-extraction
782–796. methods for historical manuscript dating based on writing style
[20] B. Chaudhuri, U. Pal, A complete printed Bangla OCR system, Pattern development, Pattern Recognit. Lett. 131 (2020) 413–420.
Recognit. 31 (5) (1998) 531–549. [47] Ø.D. Trier, A.K. Jain, T. Taxt, Feature extraction methods for character
[21] S.M. Obaidullah, K. Santosh, C. Halder, N. Das, K. Roy, Automatic Indic recognition-a survey, Pattern Recognit. 29 (4) (1996) 641–662.
script identification from handwritten documents: page, block, line and [48] B. Bataineh, S.N.H.S. Abdullah, K. Omar, A novel statistical feature ex-
word-level approach, Int. J. Mach. Learn. Cybern. 10 (1) (2019) 87–106. traction method for textual images: Optical font recognition, Expert Syst.
[22] A.K. Bhunia, P.P. Roy, A. Mohta, U. Pal, Cross-language framework for Appl. 39 (5) (2012) 5470–5477.
word recognition and spotting of Indic scripts, Pattern Recognit. 79 (2018) [49] C.-Z. Shi, S. Gao, M.-T. Liu, C.-Z. Qi, C.-H. Wang, B.-H. Xiao, Stroke detector
12–31. and structure based models for character recognition: a comparative
[23] N.H. Khan, A. Adnan, Urdu optical character recognition systems: Present study, IEEE Trans. Image Process. 24 (12) (2015) 4952–4964.
contributions and future directions, IEEE Access 6 (2018) 46019–46046. [50] N. Sharma, U. Pal, F. Kimura, S. Pal, Recognition of off-line handwritten
[24] N.R. Soora, P.S. Deshpande, A novel local skew correction and segmenta- devnagari characters using quadratic classifier, in: Computer Vision,
tion approach for printed multilingual Indian documents, Alexandria Eng. Graphics and Image Processing, Springer, 2006, pp. 805–816.
J. 57 (3) (2018) 1609–1618. [51] S. Arora, D. Bhattacharjee, M. Nasipuri, D.K. Basu, M. Kundu, Combining
[25] V. Bansal, R. Sinha, Segmentation of touching and fused devanagari multiple feature extraction techniques for handwritten devnagari char-
characters, Pattern Recognit. 35 (4) (2002) 875–893. acter recognition, in: 2008 IEEE Region 10 and the Third International
[26] R. Ghosh, C. Vamshi, P. Kumar, RNN based online handwritten word Conference on Industrial and Information Systems, IEEE, 2008, pp. 1–6.
recognition in Devanagari and Bengali scripts using horizontal zoning, [52] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, A hierar-
Pattern Recognit. 92 (2019) 203–218. chical approach to recognition of handwritten Bangla characters, Pattern
[27] P.P. Roy, A.K. Bhunia, A. Das, P. Dey, U. Pal, HMM-based Indic handwritten Recognit. 42 (7) (2009) 1467–1484.
word recognition using zone segmentation, Pattern Recognit. 60 (2016) [53] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, Handwritten
1057–1075. bangla basic and compound character recognition using MLP and SVM
[28] N. Das, J.M. Reddy, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A classifier, 2010, arXiv preprint arXiv:1002.4040.
statistical–topological feature combination for recognition of handwritten [54] B. Singh, A. Mittal, D. Ghosh, An evaluation of different feature extractors
numerals, Appl. Soft Comput. 12 (8) (2012) 2486–2495. and classifiers for offline handwritten devnagari character recognition, J.
[29] P.K. Singh, R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, Bench- Pattern Recognit. Res. 2 (2011) 269–277.
mark databases of handwritten Bangla-Roman and Devanagari-Roman [55] K. Kale, S. Chavan, M. Kazi, Y. Rode, Handwritten and printed devana-
mixed-script document images, Multimedia Tools Appl. 77 (7) (2018) gari compound using multiclass svm classifier with orthogonal moment
8441–8473. feature, Int. J. Comput. Appl. 71 (24) (2013).
[30] S. Basu, C. Chaudhuri, M. Kundu, M. Nasipuri, D.K. Basu, Text line [56] M. Kumar, R. Sharma, M. Jindal, Efficient feature extraction techniques
extraction from multi-skewed handwritten documents, Pattern Recognit. for offline handwritten Gurmukhi character recognition, Nat. Acad. Sci.
40 (6) (2007) 1825–1839. Lett. 37 (4) (2014) 381–391.
[31] U. Bhattacharya, B.B.U. Chaudhuri, Handwritten numeral databases of [57] S. Bag, G. Harit, P. Bhowmick, Recognition of Bangla compound char-
Indian scripts and multistage recognition of mixed numerals, IEEE Trans. acters using structural decomposition, Pattern Recognit. 47 (3) (2014)
Pattern Anal. Mach. Intell. 31 (3) (2008) 444–457. 1187–1201.
[32] V.P. Agnihotri, Offline handwritten devanagari script recognition, IJ Inf. [58] A. Singh, K.A. Maring, Handwritten devanagari character recognition using
Technol. Comput. Sci. 8 (1) (2012) 37–42. SVM and ANN, Int. J. Adv. Res. Comput. Commun. Eng. 4 (8) (2015)
[33] S. Acharya, A.K. Pant, P.K. Gyawali, Deep learning based large scale 123–128.
handwritten devanagari character recognition, in: 2015 9th Interna- [59] S. Shelke, S. Apte, A fuzzy based classification scheme for unconstrained
tional Conference on Software, Knowledge, Information Management and handwritten devanagari character recognition, in: 2015 International
Applications (SKIMA), IEEE, 2015, pp. 1–6. Conference on Communication, Information & Computing Technology
[34] N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A genetic (ICCICT), IEEE, 2015, pp. 1–6.
algorithm based region sampling for selection of local features in hand- [60] O. Surinta, M.F. Karaaba, L.R. Schomaker, M.A. Wiering, Recognition of
written digit recognition application, Appl. Soft Comput. 12 (5) (2012) handwritten characters using local gradient feature descriptors, Eng. Appl.
1592–1606. Artif. Intell. 45 (2015) 405–414.
[35] N. Das, S. Basu, R. Sarkar, M. Kundu, M. Nasipuri, et al., An improved [61] A. Aggarwal, K. Singh, Handwritten gurmukhi character recognition, in:
feature descriptor for recognition of handwritten bangla alphabet, 2015, 2015 International Conference on Computer, Communication and Control
arXiv preprint arXiv:1501.05497. (IC4), IEEE, 2015, pp. 1–5.
[36] N. Das, K. Acharya, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, A benchmark [62] M. Kumar, M. Jindal, R. Sharma, Offline handwritten gurmukhi character
image database of isolated bangla handwritten compound characters, Int. recognition: analytical study of different transformations, Proc. Nat. Acad.
J. Doc. Anal. Recognit. (IJDAR) 17 (4) (2014) 413–431. Sci. India Sect. A 87 (1) (2017) 137–143.
[37] N. Das, R. Sarkar, S. Basu, P.K. Saha, M. Kundu, M. Nasipuri, Handwritten [63] M. Yadav, R. Purwar, Hindi handwritten character recognition using
Bangla character recognition using a soft computing paradigm embedded multiple classifiers, in: 2017 7th International Conference on Cloud
in two pass approach, Pattern Recognit. 48 (6) (2015) 2054–2071. Computing, Data Science & Engineering-Confluence, IEEE, 2017, pp.
[38] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, CMATERdb1: a 149–154.
database of unconstrained handwritten Bangla and Bangla–English mixed [64] N. Kumar, S. Gupta, A novel handwritten Gurmukhi character recognition
script document image, Int. J. Doc. Anal. Recognit. (IJDAR) 15 (1) (2012) system based on deep neural networks, Int. J. Pure Appl. Math. 117 (21)
71–83. (2017) 663–678.
26 R. Sharma and B. Kaushik / Computer Science Review 38 (2020) 100302
[65] A.S.A. Rabby, S. Haque, S. Islam, S. Abujar, S.A. Hossain, BornoNet: Bangla handwritten characters recognition using convolutional neural network, Procedia Comput. Sci. 143 (2018) 528–535.
[66] M. Jangid, S. Srivastava, Handwritten devanagari character recognition using layer-wise training of deep convolutional neural networks and adaptive gradient methods, J. Imaging 4 (2) (2018) 41.
[67] U. Pal, T. Wakabayashi, F. Kimura, Comparative study of devnagari handwritten character recognition using different feature and classifiers, in: 2009 10th International Conference on Document Analysis and Recognition, IEEE, 2009, pp. 1111–1115.
[68] G. Singh, S. Lehri, Recognition of handwritten hindi characters using backpropagation neural network, Int. J. Comput. Sci. Inf. Technol. 3 (4) (2012) 4892–4895.
[69] U. Jain, D. Sharma, Recognition of isolated handwritten characters of gurumukhi script using neocognitron, Int. J. Comput. Appl. 10 (8) (2010).
[70] K.S. Siddharth, R. Dhir, R. Rani, Handwritten Gurumukhi character recognition using zoning density and background directional distribution features, Int. J. Comput. Sci. Inf. Technol. 2 (3) (2011) 1036–1041.
[71] G. Sinha, R. Rani, R. Dhir, Handwritten gurmukhi character recognition using K-NN and SVM classifier, Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2 (6) (2012) 288–293.
[72] S. Singh, A. Aggarwal, R. Dhir, Use of gabor filters for recognition of handwritten gurmukhi character, Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2 (5) (2012).
[73] R. Sarkhel, N. Das, A.K. Saha, M. Nasipuri, A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition, Pattern Recognit. 58 (2016) 172–189.
[74] U. Pal, T. Wakabayashi, F. Kimura, Handwritten Bangla compound character recognition using gradient feature, in: 10th International Conference on Information Technology (ICIT 2007), IEEE, 2007, pp. 208–213.
[75] R. Pramanik, S. Bag, Shape decomposition-based handwritten compound character recognition for Bangla OCR, J. Vis. Commun. Image Represent. 50 (2018) 123–134.
[76] A. Garg, M.K. Jindal, A. Singh, Offline handwritten gurmukhi character recognition: k-NN vs. SVM classifier, Int. J. Inf. Technol. (2019) 1–8.
[77] J.A. Sánchez, V. Romero, A.H. Toselli, M. Villegas, E. Vidal, A set of benchmarks for handwritten text recognition on historical documents, Pattern Recognit. 94 (2019) 122–134.
[78] S. Mandal, S.M. Prasanna, S. Sundaram, GMM posterior features for improving online handwriting recognition, Expert Syst. Appl. 97 (2018) 421–433.
[79] M. Labani, P. Moradi, F. Ahmadizar, M. Jalili, A novel multivariate filter method for feature selection in text classification problems, Eng. Appl. Artif. Intell. (ISSN: 0952-1976) 70 (2018) 25–37.
[80] J.M. Alonso-Weber, M. Sesmero, A. Sanchis, Combining additive input noise annealing and pattern transformations for improved handwritten character recognition, Expert Syst. Appl. 41 (18) (2014) 8180–8188.
[81] T. Akter, S. Desai, Developing a predictive model for nanoimprint lithography using artificial neural networks, Mater. Des. 160 (2018) 836–848.
[82] P. Ajmire, Handwritten devanagari vowel recognition using artificial neural network, Int. J. Adv. Res. Comput. Sci. 8 (7) (2017).
[83] D. Khanduja, N. Nain, S. Panwar, A hybrid feature extraction algorithm for devanagari script, ACM Trans. Asian Low-Resour. Lang. Inf. Process. 15 (1) (2016) 2.
[84] M. Jangid, S. Srivastava, Deep convnet with different stochastic optimizations for handwritten devanagari character, in: Advances in Computer Communication and Computational Sciences, Springer, 2019, pp. 51–60.
[85] P. Keserwani, T. Ali, P.P. Roy, Handwritten bangla character and numeral recognition using convolutional neural network for low-memory GPU, Int. J. Mach. Learn. Cybern. 10 (12) (2019) 3485–3497.
[86] V.N. Vapnik, The nature of statistical learning theory (1995).
[87] S.R. Narang, M. Jindal, S. Ahuja, M. Kumar, On the recognition of devanagari ancient handwritten characters using SIFT and gabor features, Soft Comput. (2020).
[88] A. Kataria, M. Singh, A review of data classification using k-nearest neighbour algorithm, Int. J. Emerg. Technol. Adv. Eng. 3 (6) (2013) 354–360.
[89] R. Sarkhel, N. Das, A. Das, M. Kundu, M. Nasipuri, A multi-scale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts, Pattern Recognit. 71 (2017) 78–93.
[90] H. Kaur, S. Rani, Handwritten gurumukhi character recognition using convolution neural network, Int. J. Comput. Intell. Res. 13 (5) (2017) 933–943.
[91] S. Roy, N. Das, M. Kundu, M. Nasipuri, Handwritten isolated Bangla compound character recognition: A new benchmark using a novel deep learning approach, Pattern Recognit. Lett. 90 (2017) 15–21.
[92] R. Sarkhel, A.K. Saha, N. Das, An enhanced harmony search method for bangla handwritten character recognition using region sampling, in: 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), IEEE, 2015, pp. 325–330.
[93] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, Region selection in handwritten character recognition using artificial bee colony optimization, in: 2012 Third International Conference on Emerging Applications of Information Technology, IEEE, 2012, pp. 183–186.
[94] A. Gupta, R. Sarkhel, N. Das, M. Kundu, Multiobjective optimization for recognition of isolated handwritten Indic scripts, Pattern Recognit. Lett. 128 (2019) 318–325.
[95] U. Pal, N. Sharma, T. Wakabayashi, F. Kimura, Handwritten character recognition of popular south Indian scripts, in: Summit on Arabic and Chinese Handwriting Recognition, Springer, 2006, pp. 251–264.
[96] S. Sangame, R. Ramteke, R. Benne, Recognition of isolated handwritten Kannada vowels, Adv. Comput. Res. 1 (2) (2009) 52–55.
[97] B. Dhandra, M. Hangarge, G. Mukarambi, Spatial features for handwritten kannada and english character recognition, Int. J. Comput. Appl. (2010) 146–150, Special Issue on RTIPPR-10.
[98] G. Rajput, R. Horakeri, Shape descriptors based handwritten character recognition engine with application to Kannada characters, in: 2011 2nd International Conference on Computer and Communication Technology (ICCCT-2011), IEEE, 2011, pp. 135–141.
[99] G. Mukarambi, B. Dhandra, M. Hangarge, A zone based character recognition engine for kannada and english scripts, Procedia Eng. 38 (2012) 3292–3299.
[100] S.A. Vaidya, B.R. Bombade, A novel approach of handwritten character recognition using positional feature extraction, Int. J. Comput. Sci. Mobile Comput. 2 (6) (2013) 179–186.
[101] B. Dhandra, G. Mukarambi, Kannada handwritten vowels recognition based on normalized chain code and wavelet filters, Int. J. Comput. Appl. 975 (2014) 8887.
[102] P.N. Sastry, T.V. Lakshmi, N.K. Rao, T. Rajinikanth, A. Wahab, Telugu handwritten character recognition using zoning features, in: 2014 International Conference on IT Convergence and Security (ICITCS), IEEE, 2014, pp. 1–4.
[103] T.V. Lakshmi, Multi-stage strategy to classify handwritten characters of telugu, Int. J. Curr. Res. Rev. 9 (20) (2017) 39.
[104] S. Karthik, K.S. Murthy, Deep belief network based approach to recognize handwritten Kannada characters using distributed average of gradients, Cluster Comput. 22 (2) (2019) 4673–4681.
[105] N.S. Rani, N. Chandan, A.S. Jain, H. Kiran, Deformed character recognition using convolutional neural networks, Int. J. Eng. Technol. 7 (3) (2018) 1599–1604.
[106] S. Pasha, M. Padma, Handwritten kannada character recognition using wavelet transform and structural features, in: 2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), IEEE, 2015, pp. 346–351.
[107] S. Angadi, S.H. Angadi, Structural features for recognition of hand written kannada character based on SVM, Int. J. Comput. Sci. Eng. Inf. Technol. 5 (2) (2015) 25–32.
[108] P.N. Sastry, R. Krishnan, B.V.S. Ram, Classification and identification of Telugu handwritten characters extracted from palm leaves using decision tree approach, J. Appl. Eng. Sci. 5 (3) (2010) 22–32.
[109] C.N. Manisha, E.S. Reddy, Y.S. Krishna, Glyph-based recognition of offline handwritten Telugu characters: GBRoOHTC, in: 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), IEEE, 2016, pp. 1–6.
[110] L.R. Ragha, M. Sasikumar, Feature analysis for handwritten kannada kagunita recognition, Int. J. Comput. Theory Eng. 3 (1) (2011) 94.
[111] A. Angadi, V. Kumari Vatsavayi, S. Keerthi Gorripati, A deep learning approach to recognize handwritten telugu character using convolution neural networks, Int. J. Inf. Syst. Manage. Sci. 1 (2) (2018).
[112] J.R. Prasad, U.V. Kulkarni, R.S. Prasad, Template matching algorithm for gujrati character recognition, in: 2009 Second International Conference on Emerging Trends in Engineering & Technology, IEEE, 2009, pp. 263–268.
[113] C. Patel, A. Desai, Gujarati handwritten character recognition using hybrid method based on binary tree-classifier and k-nearest neighbour, Int. J. Eng. Res. Technol. (IJERT) 2 (6) (2013) 2337–2345.
[114] J.R. Prasad, U. Kulkarni, Gujarati character recognition using adaptive neuro fuzzy classifier with fuzzy hedges, Int. J. Mach. Learn. Cybern. 6 (5) (2015) 763–775.
[115] J.R. Prasad, U. Kulkarni, Gujrati character recognition using weighted k-NN and mean χ² distance measure, Int. J. Mach. Learn. Cybern. 6 (1) (2015) 69–82.
[116] H.R. Thaker, C. Kumbharana, Structural feature extraction to recognize some of the offline isolated handwritten gujarati characters using decision tree classifier, Int. J. Comput. Appl. (2014).
[117] U. Pal, T. Wakabayashi, F. Kimura, A system for off-line Oriya handwritten character recognition using curvature feature, in: 10th International Conference on Information Technology (ICIT 2007), IEEE, 2007, pp. 227–229.
[118] D. Padhi, Novel hybrid approach for odia handwritten character recognition system, IJARCSSE 2 (5) (2012).
[119] K.S. Dash, N.B. Puhan, G. Panda, BESAC: Binary external symmetry axis constellation for unconstrained handwritten character recognition, Pattern Recognit. Lett. 83 (2016) 413–422.
[120] A. Sethy, P.K. Patra, D.R. Nayak, Off-line odia handwritten character recognition: A hybrid approach, in: Computational Signal Processing and Analysis, Springer, 2018, pp. 247–257.
[121] U. Bhattacharya, S. Ghosh, S. Parui, A two stage recognition scheme for handwritten tamil characters, in: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 1, IEEE, 2007, pp. 511–515.
[122] N. Shanthi, K. Duraiswamy, A novel SVM-based handwritten tamil character recognition system, Pattern Anal. Appl. 13 (2) (2010) 173–180.
[123] A. Subashini, N. Kodikara, Bag-of-keypoints approach for tamil handwritten character recognition using SVMs, in: 2011 International Conference on Advances in ICT for Emerging Regions (ICTer), IEEE, 2011, pp. 102–107.
[124] T.M. Jose, A. Wahi, Recognition of tamil handwritten characters using daubechies wavelet transforms and feed-forward backpropagation network, Int. J. Comput. Appl. 64 (8) (2013).
[125] S. Abirami, V. Essakiammal, R. Baskaran, Statistical features based character recognition for offline handwritten tamil document images using HMM, Int. J. Comput. Vis. Robot. 5 (4) (2015) 422–440.
[126] S. Shyni, M.A.R. Raj, S. Abirami, Offline Tamil handwritten character recognition using sub line direction and bounding box techniques, Indian J. Sci. Technol. 8 (S7) (2015) 110–116.
[127] M.A.R. Raj, S. Abirami, Offline Tamil handwritten character recognition using statistical features, Adv. Nat. Appl. Sci. 9 (6 SE) (2015) 367–375.
[128] R. Raj, M. Antony, S. Abirami, Offline tamil handwritten character recognition using statistical based quad tree, 2016.
[129] B.S. Moni, G. Raju, Modified quadratic classifier and directional features for handwritten malayalam character recognition, Int. J. Comput. Appl. (2011) 30–34.
[130] B.P. Chacko, V.V. Krishnan, G. Raju, P.B. Anto, Handwritten character recognition using wavelet energy and extreme learning machine, Int. J. Mach. Learn. Cybern. 3 (2) (2012) 149–161.
[131] J. Jomy, K. Balakrishnan, K. Pramod, A system for offline recognition of handwritten characters in Malayalam script, Int. J. Image Graph. Signal Process. 5 (4) (2013) 53.
[132] G. Raju, B.S. Moni, M.S. Nair, A novel handwritten character recognition system using gradient based features and run length count, Sadhana 39 (6) (2014) 1333–1355.
[133] K. Manjusha, M.A. Kumar, K. Soman, On developing handwritten character image database for malayalam language script, Eng. Sci. Technol. Int. J. 22 (2) (2019) 637–645.
[134] T. Wakabayashi, U. Pal, F. Kimura, Y. Miyake, F-ratio based weighted feature extraction for similar shape character recognition, in: 2009 10th International Conference on Document Analysis and Recognition, IEEE, 2009, pp. 196–200.
[135] P. Jino, J. John, K. Balakrishnan, Offline handwritten malayalam character recognition using stacked LSTM, in: 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), IEEE, 2017, pp. 1587–1590.
[136] M. Manuel, S. Saidas, Handwritten malayalam character recognition using curvelet transform and ANN, Int. J. Comput. Appl. 121 (6) (2015).
[137] S.M. Salaken, A. Khosravi, T. Nguyen, S. Nahavandi, Extreme learning machine based transfer learning algorithms: A survey, Neurocomputing 267 (2017) 516–524.
[138] M.A.R. Raj, S. Abirami, Structural representation-based off-line tamil handwritten character recognition, Soft Comput. 24 (2) (2020) 1447–1472.
[139] A.K. Sharma, P. Thakkar, D.M. Adhyaru, T.H. Zaveri, Handwritten gujarati character recognition using structural decomposition technique, Pattern Recognit. Image Anal. 29 (2) (2019) 325–338.
[140] J. Pareek, D. Singhania, R.R. Kumari, S. Purohit, Gujarati handwritten character recognition from text images, Procedia Comput. Sci. 171 (2020) 514–523.
[141] K.S. Dash, N. Puhan, G. Panda, Sparse concept coded tetrolet transform for unconstrained odia character recognition, 2020, arXiv preprint arXiv:2004.01551.
[142] B. Kavitha, C. Srimathi, Benchmarking on offline handwritten tamil character recognition using convolutional neural networks, J. King Saud Univ.-Comput. Inf. Sci. (2019).
[143] N.T. Kishna, S. Francis, Intelligent tool for malayalam cursive handwritten character recognition using artificial neural network and hidden Markov model, in: 2017 International Conference on Inventive Computing and Informatics (ICICI), IEEE, 2017, pp. 595–598.
[144] G. Xu, K. Luo, G. Jing, X. Yu, X. Ruan, J. Song, On convergence analysis of multi-objective particle swarm optimization algorithm, European J. Oper. Res. (2020).
[145] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al., Recent advances in convolutional neural networks, Pattern Recognit. 77 (2018) 354–377.