2019 International Conference on Information and Communications Technology (ICOIACT)

Segmentation and Recognition of Handwritten Lontara Characters Using Convolutional Neural Network
Asri Hidayat Ingrid Nurtanio Zulkifli Tahir
Department of Informatics Department of Informatics Department of Informatics
Hasanuddin University Hasanuddin University Hasanuddin University
Makassar, Indonesia Makassar, Indonesia Makassar, Indonesia
[email protected] [email protected] [email protected]

Abstract: This study presents a technique for recognizing handwritten Lontara characters. Lontara is a traditional Indonesian script that was used mostly in the southern part of Sulawesi during the kingdom era. The work consists of two stages. First, the characters in an image are segmented using a combination of contour features and a sliding window technique, which creates a boundary around each character and extracts the character segments. Second, a Convolutional Neural Network (CNN) is used to recognize, or classify, the segmented characters. The dataset contains 23 Lontara characters with five combinations of diacritics and one special character, which fall into 139 classes. Experiments on the dataset show that the CNN provides good results, reaching 96% accuracy. The combination of segmentation and recognition also shows promising results.

Keywords: Convolutional Neural Network; Handwriting Recognition; Lontara characters

I. INTRODUCTION

Optical Character Recognition (OCR) extracts and recognizes characters from an image and converts them to text. Such a system makes it easy for users to obtain the information in a document. There have been various OCR studies on several types of script, both Latin and traditional, using different techniques. The uniqueness of each script and the nature of each method give every such study something new to explore.

This paper describes the recognition of handwritten Lontara characters. Lontara is a traditional Indonesian script that has been used mostly in the southern part of Sulawesi since the kingdom era. Most of the literary works from the era of the Makassar Kingdom and the Bone Kingdom are written in Lontara characters. The script was written on lontara leaves, and these manuscripts have been preserved until now. One of the famous books written in Lontara is Lagaligo. This book is filled with the philosophies, stories, and poetry of the Buginese and Makassarese.

There are several challenges in recognizing handwritten Lontara documents. Some Lontara characters have similar shapes. No public handwriting dataset is available for the Lontara script, which has slowed the development of handwritten character recognizers. Earlier research on Lontara character recognition focused on printed documents. Reference [1] used the Fourier Descriptor (FD) and the Modified Direction Feature (MDF) as features, and a Support Vector Machine (SVM) to classify printed Lontara characters.

In a recognition system, feature extraction plays a significant role in successfully recognizing an object in an image. The characteristics of different classes must be as distinguishable as possible, while the characteristics shared within the same class must be retained. Extracting features with a traditional hand-designed method is complicated and takes a long time, and such a method cannot process raw images. Retrieving features directly from raw images with an automatic extraction method is a better solution.

Reference [1] focused on word-based recognition. Recognizing the characters in a document with a word-based method is tedious and exhausting work. It takes a lot of time and resources, since the input word image is matched against a vocabulary containing the representations of all the words. Another disadvantage is that a word that is not in the vocabulary will be recognized incorrectly. The representative work on word-based recognition is [2].

Character recognition on handwritten documents is more challenging than on printed documents for several reasons. First, the characters are not always identical each time a person writes, and different writers produce non-identical characters that vary further in aspects such as shape and size. Second, each writer has numerous variations in writing style, which makes the recognition task difficult. Third, different characters may have similar shapes, and the connections and overlaps between neighboring characters make the recognition problem more complicated. In summary, accurately recognizing handwritten characters is a challenge because of their complex features and the large variety of writing styles [3].

On the other hand, deep learning is becoming the state of the art for object recognition on image datasets. Compared with conventional classification methods, deep learning gives more satisfactory accuracy. The Convolutional Neural Network (CNN) is a widely used deep learning method for recognizing objects in image datasets. One of the advantages of a CNN is that the model learns the features of the input data through training.

The rest of this paper is structured as follows. Section II describes the methodology of the character segmentation and recognition techniques. The experiments and results of the case study, including the discussion, are explained in Section III. Section IV presents our conclusion.

II. METHODOLOGY

A. Character Segmentation

Character segmentation is a necessary step for recognizing characters in many OCR systems. It is also an essential step, because incorrect segmentation can affect recognition accuracy. There are 23 characters, including vowels and consonants, in the Lontara script, as shown in Fig. 1.

978-1-7281-1655-6/19/$31.00 ©2019 IEEE



Fig. 1. (a) Characters of the Lontara script, (b) diacritics, (c) characters with diacritics

There are also five diacritic signs, placed to the left of, to the right of, above, and below the character. Moreover, many Lontara characters have similar shapes. Some similar characters are distinguished only by a single dot or mark. Achieving efficient segmentation and classification in the presence of diacritics is a very challenging job for the Lontara script, in both machine-printed and handwritten form.

An OCR system recognizes a character in five steps. First, the dataset is prepared through pre-processing. Second, the image is segmented into characters or words, according to the approach used. Third, feature extraction is carried out, followed by classification. The last step is post-processing [4]. Segmentation is the process of isolating the objects to be fed to the recognition system. An incorrect segmentation method lowers the accuracy of the recognition system.

Text and background must be separated using binarization. Binarization is a preprocessing step in which the RGB input image is converted to grayscale and then thresholded to produce a binary output image. The output image is represented as a matrix whose entries are conventionally set to 0 for background pixels (white) and 1 for foreground pixels (black). The quality of the binary outcome depends on the thresholding process. Our experiment used the Otsu algorithm, introduced by Otsu in [5], to binarize the image. The OpenCV library was used to obtain the contour features.

Contours are a powerful tool for object detection and recognition and for shape analysis. A contour can be explained as a list of points representing a curve in an image that has the same color or intensity. There are many ways to represent a curve. In OpenCV, contours are represented by lists. Every entry in the sequence stores information about the location of the next point on the curve [6]. In OpenCV, cvFindContours() retrieves contours from binary images.

A block diagram of the segmentation process is shown in Fig. 2.

Fig. 2. Block diagram of the segmentation process

A Lontara character consists of one to four contours. The first step is to detect and find all shapes using the contour function. For each contour found, a bounding box is drawn with an additional height threshold h. This extra height helps each contour connect with another contour above or below it. Then, the contour function is applied again to extract the height and width of each character.

A segmented character whose ratio value is below the threshold t is fed into the CNN model for recognition. Otherwise, the sliding window is applied to extract more characters.

The ratio of each character is defined as:

(1)

The segmentation process involves a window that moves over the pixel values of the character image. The sliding window dynamically slides over each cell of the image and extracts the values of the image portion within the window boundaries. The window is "dynamic" in the sense that both its location and its size change. The window starts from the left side of the image and is moved from left to right until it has covered the entire image area.

B. Character Recognition using CNN

Deep learning has been used in image recognition and classification, text detection, object tracking, visual saliency detection, pose estimation, scene labeling, and action recognition. Convolutional Neural Networks, Restricted Boltzmann Machines, Autoencoders, sparse coding, and Deep Belief Networks are commonly used models in deep learning. Among these models, Convolutional Neural Networks have demonstrated high accuracy in image classification [7].

It has been shown in many applications that applying a CNN to analyze visual imagery is crucial for decision making and recognition. Through an additional convolutional layer, the network learns the filters that were designed manually in traditional algorithms. The goal of the added layers is to make the network robust to transformations in the image [8].

This work applies a CNN [9] to classify handwritten Lontara characters. A CNN is a deep supervised learning architecture consisting of a multi-layer neural network with two parts: a feature extractor that learns automatically and a trainable classifier [10]. The feature map layers in the feature extractor compute discriminative features from the raw images in two steps, convolutional filtering and downsampling. The feature map layers are then trained with the back-propagation algorithm.

The structure of the CNN is shown in Fig. 3. This work applies three convolution layers and four multi-layer perceptron layers. Overall, there are eight layers in the network architecture, including the input layer. The input layer holds the original grayscale image data, shaped to 28×28×1.
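To make the layer volumes concrete, the following is a minimal sketch that traces the feature-map size through the three convolution layers, using the filter counts given with Fig. 3 (32, 64, and 64 filters of size 3×3) and max pooling with a stride of 2. The unpadded ("valid") stride-1 convolutions, the 2×2 pooling window, and the placement of one pooling layer after every convolution layer are assumptions made for illustration; the paper does not state them, and the helper names are hypothetical.

```python
# Assumptions (not stated in the paper): unpadded stride-1 convolutions,
# a 2x2 pooling window, and one max-pooling layer after each convolution.

def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after square max pooling."""
    return (size - window) // stride + 1


def conv_params(in_channels, filters, kernel=3):
    """Trainable weights of a convolution layer plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def trace_feature_extractor(input_size=28, input_channels=1, filters=(32, 64, 64)):
    """Return (layer name, height/width, channels, parameters) for each stage."""
    rows = []
    size, channels = input_size, input_channels
    for index, n_filters in enumerate(filters, start=1):
        params = conv_params(channels, n_filters)
        size = conv_out(size)
        channels = n_filters
        rows.append((f"conv{index}", size, channels, params))
        size = pool_out(size)
        rows.append((f"pool{index}", size, channels, 0))
    return rows


if __name__ == "__main__":
    for name, size, channels, params in trace_feature_extractor():
        print(f"{name}: {size}x{size}x{channels}, parameters={params}")
```

Under these assumptions the 28×28 input shrinks to a 1×1×64 volume before the multi-layer perceptron. With zero-padded convolutions the sizes would differ, so the trace should be read as one possible configuration only.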


Fig. 3. The structure of the Convolutional Neural Network

The first layer comprises 32 filters of size 3×3. The second layer has 64 filters of size 3×3, and the third layer also has 64 filters of size 3×3. An activation function called the Rectified Linear Unit (ReLU) is used to filter out negative values. This function is applied in each convolution layer and in the multi-layer perceptron. The layer volume is then reduced by max pooling, with a stride of 2 for all max pooling layers.

ReLU is one of the most notable non-saturated activation functions. The ReLU activation function is defined as:

$$ a_{i,j,k} = \max(z_{i,j,k}, 0) \qquad (2) $$

where $z_{i,j,k}$ is the input to the activation function, $(i, j)$ indicates the location of the neuron, and $k$ is the channel. ReLU is a piecewise linear function that prunes the negative part to zero and keeps the positive part. Its simple max(·) operation makes it much faster to compute than the tanh or sigmoid activation functions, and it also induces sparsity in the hidden units, so the network easily obtains sparse representations. It has been shown that deep networks can be trained efficiently with ReLU even without pre-training. ReLU works better than the tanh and sigmoid activation functions in many works [11].

To perform the training and testing phases of the CNN model, the handwritten Lontara character dataset was divided into 80% for the training dataset and 20% for the testing dataset. Xavier weight initialization [12] was used in each training fold to initialize the internal weight matrices of the CNN model. In the testing phase, the performance of each fold of the CNN model was measured using a confusion matrix and the classification accuracy. The dropout technique [13] was used to avoid overfitting by randomly dropping units from the neural network during training. The classification accuracy is defined in Formula (3) as follows:

(3)

The TensorFlow [14] framework and Google Colaboratory [8] were used to train the network.

III. EXPERIMENTS AND RESULTS

A. Dataset

For the training process, the CNN model needed an image dataset of handwritten Lontara characters. Unfortunately, no public secondary dataset of handwritten Lontara characters is available. Therefore, in this study, a primary dataset of handwritten Lontara characters was collected manually from several people, as shown in Fig. 4.

Fig. 4. Example of a filled and scanned A4 form used for creating the Lontara dataset

The dataset consists of 30,024 images belonging to 139 classes of Lontara characters. Each class of Lontara character consists of 180 training images. Each image in the dataset is an 8-bit grayscale image with a dimension of 28 × 28 pixels.

The number of handwritten Lontara characters in our dataset is relatively limited. The samples came from two independent persons and went through a scanning and character-splitting process. Each person wrote each character 36 times, so there is a total of 10,008 Lontara characters in the dataset at this time.

The overall quality of a model is determined by the size and quality of the training set. A simple recognition task works well with a relatively small dataset (tens of thousands of images). The performance and stability of the model can be improved by increasing the size of the data. Data augmentation is performed to overcome the lack of training samples by generating a larger training dataset. The Lontara dataset was augmented two times using rotation. An example of augmentation is displayed in Fig. 5.

Fig. 5. Example of augmentation used for creating the Lontara dataset

B. Character Segmentation Result

The experimental results are shown in Fig. 6. Before the classification procedure, a segmentation procedure is performed on the captured image of handwritten Lontara characters. Contour features and the sliding window technique are applied to the captured image using the OpenCV library. This segmentation procedure produces segmented images of Lontara characters.
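The segmentation stage described in Section II-A (Otsu binarization, a ratio test, and a left-to-right sliding window) can be illustrated with a small, dependency-free sketch. It is not the authors' OpenCV implementation. The ratio is assumed here to be width divided by height, the threshold value 1.5 is hypothetical, and the window is a fixed square that steps by its own width, whereas the paper describes a window whose size changes dynamically. Contour extraction is omitted.

```python
def otsu_threshold(gray):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the darker class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Dark ink becomes 1 (foreground), light paper becomes 0 (background)."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]


def split_segment(binary, ratio_threshold=1.5):
    """Return (start, end) column ranges, one per candidate character."""
    height, width = len(binary), len(binary[0])
    ratio = width / height  # ASSUMED definition of the ratio
    if ratio < ratio_threshold:
        return [(0, width)]  # narrow enough: treat as a single character
    window = height  # ASSUMED fixed, roughly square window
    ink_per_column = [sum(column) for column in zip(*binary)]
    ranges = []
    for start in range(0, width, window):
        end = min(start + window, width)
        if any(ink_per_column[start:end]):  # skip windows without ink
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Synthetic 4x12 strip: light paper (250) with two dark strokes (10).
    gray = [[250] * 12 for _ in range(4)]
    for r in range(4):
        for c in (1, 2, 8, 9, 10):
            gray[r][c] = 10
    print(split_segment(binarize(gray)))
```

A segment that passes the ratio test is returned whole, matching the rule that such segments go straight to the CNN, while wider segments are cut into window-sized pieces.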

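As a hedged illustration of how classification accuracy relates to a confusion matrix, the sketch below assumes the usual definition: correctly classified samples divided by all samples. This is not necessarily the exact form of Formula (3), and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """ASSUMED definition: diagonal sum (correct) over the sum of all entries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    true_labels = [0, 0, 1, 1, 2, 2]
    predicted_labels = [0, 1, 1, 1, 2, 0]
    matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
    print(matrix, accuracy(matrix))
```

For the 139-class problem the matrix would be 139 × 139, and its off-diagonal entries show which character pairs are confused with each other.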

Fig. 6. Image segmentation results.

C. Character Recognition Result

The dataset is divided into two subsets, the training set and the testing set. There are 27,522 training images and 2,502 testing images, each containing a single character. The number of character classes is 139. The results show that the CNN achieves a classification accuracy of 96%. The CNN training process is shown in Fig. 7. The accuracy increases gradually and reaches its maximum after around 94 epochs. There is no sign of overfitting, as the training and testing accuracy stay close and follow each other.

Fig. 7. The CNN training process

An experiment combining segmentation and recognition on separate image data is shown in Fig. 8. The accuracy of this experiment is 75%.

Fig. 8. Segmentation and recognition results

D. Discussion

The experiment shows satisfactory results in recognizing handwritten Lontara characters using the CNN algorithm. The segmentation accuracy is significantly affected by overlaps between characters. Errors in character segmentation occurred because some characters were not well aligned, which in turn led the classifier to recognize the characters incorrectly. Mis-segmentation also occurred on characters with the diacritic O. The O diacritic is very similar to the character Ta, so a character with the diacritic O is often wrongly segmented into two characters.

The recognition accuracy is affected by the similar shapes of different characters and by the large variety of writing styles, for example, Co and Jo, or Lo and Wo. Some characters differ from another character only by a single dot or mark, for example, Da and Mi, or Na and Tu. To improve the recognition accuracy, the CNN architecture must be trained with more variation in writing styles.

IV. CONCLUSION

This paper described the segmentation and recognition of handwritten Lontara characters. The segmentation is carried out using contour features and a sliding window technique. A learning method, the Convolutional Neural Network, is then used to recognize the segmented characters. The dataset contains a total of 30,024 character images, 216 images per character, divided into 139 classes. The experiment shows that the CNN provides a good result of 96% accuracy on the dataset. Furthermore, the combination of character segmentation and recognition on separate datasets obtains 75% accuracy.

REFERENCES

[1] I. S. Areni, A. I. Asry, and Indrabayu, “A Hybrid Feature Extraction Method for Accuracy Improvement in Aksara Lontara Translation,” Journal of Computer Science, August 2017.
[2] Y. Lu and M. Shridhar, “Character segmentation in handwritten words - An overview,” Pattern Recognition, vol. 29, no. 1, pp. 77–96, January 1996.
[3] M. Z. Alom, P. Sidike, M. Hasan, T. M. Taha, and V. K. Asari, “Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks,” Computational Intelligence and Neuroscience, vol. 2018, pp. 1–13, August 2018.
[4] A. Zoizou, A. Zarghili, and I. Chaker, “A new hybrid method for Arabic multi-font text segmentation, and a reference corpus construction,” Journal of King Saud University - Computer and Information Sciences, July 2018.
[5] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, January 1979.
[6] A. Zelinsky, “Learning OpenCV - Computer Vision with the OpenCV Library (Bradski, G.R. et al.; 2008) [On the Shelf],” IEEE Robotics & Automation Magazine, vol. 16, no. 3, pp. 100–100, September 2009.
[7] T. Guo, J. Dong, H. Li, and Y. Gao, “Simple convolutional neural network on image classification,” in 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), pp. 721–724, 2017.


[8] T. Carneiro, R. V. M. D. Nóbrega, T. Nepomuceno, G. Bian, V. H. C. D. Albuquerque, and P. P. R. Filho, “Performance Analysis of Google Colaboratory as a Tool for Accelerating Deep Learning Applications,” IEEE Access, vol. 6, pp. 61677–61685, 2018.
[9] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, January 2015.
[10] X.-X. Niu and C. Y. Suen, “A novel hybrid CNN–SVM classifier for recognizing handwritten digits,” Pattern Recognition, vol. 45, no. 4, pp. 1318–1325, April 2012.
[11] J. Gu et al., “Recent advances in convolutional neural networks,” Pattern Recognition, vol. 77, pp. 354–377, May 2018.
[12] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in The 13th International Conference of Artificial Intelligence and Statistics, pp. 18–22, 2010.
[13] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[14] M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” arXiv:1603.04467 [cs], March 2016.
