Segmentation and Recognition of Handwritten Lontara Characters Using Convolutional Neural Network
Abstract—This study presents a technique to recognize handwritten Lontara characters. Lontara is a traditional Indonesian script that was used mostly in the southern area of Sulawesi during the kingdom era. The work consists of two stages. First, each character in an image is segmented using a combination of contour features and a sliding window technique to create a boundary and extract character segments. Second, a Convolutional Neural Network (CNN) is used to classify the segmented characters. The dataset contains 23 Lontara characters with five combinations of diacritics and one special character, which fall into 139 classes. Experiments on the dataset show that the CNN obtains 96% accuracy. The combination of segmentation and recognition also shows promising results.

Keywords— Convolutional Neural Network; Handwriting Recognition; Lontara characters

I. INTRODUCTION
Optical Character Recognition (OCR) extracts and recognizes characters from an image and converts them to text. Such a system makes it easy for users to get information from a document. There have been various OCR studies on several types of script, both Latin and traditional, using different techniques. The uniqueness of each script and the nature of each method make every combination a research topic worth exploring.

This paper describes handwritten recognition of Lontara characters. Lontara is a traditional Indonesian script that has been used mostly in the southern area of Sulawesi since the kingdom era. Most of the literary works from the era of the Makassar Kingdom and the Bone Kingdom are written in Lontara. The characters were written on lontara leaves, which have been preserved until now. One of the famous books written in Lontara is Lagaligo. This book is filled with philosophies, stories, and poetry of the Buginese and Makassarese.

There are some challenges in Lontara handwritten document recognition. Some characters in Lontara have similar shapes. There are no publicly available handwriting datasets in Lontara script, which has slowed the development of handwritten character recognizers. Earlier research on Lontara character recognition focused on the printed-document domain. Reference [1] used the Fourier Descriptor (FD) and Modified Direction Feature (MDF) as features, and a Support Vector Machine (SVM) to classify printed Lontara characters.

In a recognition system, feature extraction plays a significant role in successfully recognizing an object in an image. The characteristics of different classes must be as distinguishable as possible, while the characteristics within the same class must be retained. Extracting features using a traditional hand-designed method is complicated and takes a long time. Such a method cannot process raw images. Retrieving features directly from raw images using an automatic extraction method is a better solution.

Reference [1] focused on word-based recognition. Recognizing characters from a document using a word-based method is tedious. It takes a lot of time and resources, since the input word image is matched against a vocabulary containing the representations of all the words. Another disadvantage is that a word that is not in the vocabulary will be recognized incorrectly. The representative work of word-based recognition was [2].

Character recognition on handwritten documents is more challenging than on printed documents for several reasons. First, each time a person writes, the characters are not always identical. Different writers produce nonidentical characters and introduce more variation in aspects such as shape and size. Second, each writer has numerous variations in writing style, which makes the recognition task difficult. Third, different characters may have similar shapes, and the interconnection and overlap of neighboring characters make the recognition problem more complicated. In summary, it is challenging to accurately recognize handwritten characters due to their complex features and the large variety of writing styles [3].

On the other hand, deep learning is becoming the state of the art for object recognition on image datasets. Compared to conventional classification methods, deep learning gives more satisfactory accuracy. The Convolutional Neural Network (CNN) is a widely used deep learning method for recognizing objects in image datasets. One of the advantages of a CNN is that the model can learn features from the input data through training.

The rest of this paper is structured as follows. Section 2 describes the methodology of character segmentation and the recognition technique. The experiments and results of the case study, including discussion, are explained in Section 3. Section 4 presents our conclusion.

II. METHODOLOGY
A. Character Segmentation

Character segmentation is a necessary step for recognizing characters in many OCR systems. It is also an essential step because incorrect segmentation can affect recognition accuracy. There are 23 characters, including vowels and consonants, in the Lontara script, as shown in Fig. 1.
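
The 139 classes stated in the abstract can be related to the 23 base characters with a short calculation. The sketch below is only an illustration: it assumes that every base character appears once without a diacritic and once with each of the five diacritic combinations, and that the special character adds a single class. This breakdown is inferred from the stated totals and is not spelled out in the text; the function and constant names are hypothetical.

```python
# Counts taken from the abstract.
BASE_CHARACTERS = 23
DIACRITIC_COMBINATIONS = 5
SPECIAL_CHARACTERS = 1


def count_classes(base, diacritics, special):
    """Count classes under the assumed breakdown.

    Assumption (not stated in the paper): each base character contributes
    one bare form plus one form per diacritic combination, and each special
    character contributes exactly one class.
    """
    forms_per_character = 1 + diacritics  # bare form + diacritic forms
    return base * forms_per_character + special


total_classes = count_classes(
    BASE_CHARACTERS, DIACRITIC_COMBINATIONS, SPECIAL_CHARACTERS
)
print(total_classes)
```

Under this assumption the count is 23 x 6 + 1, which matches the 139 classes reported for the dataset.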
[8] T. Carneiro, R. V. M. D. Nóbrega, T. Nepomuceno, G. Bian, V. H. C. D. Albuquerque, and P. P. R. Filho, “Performance Analysis of Google Colaboratory as a Tool for Accelerating Deep Learning Applications,” IEEE Access, vol. 6, pp. 61677–61685, 2018.
[9] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, January 2015.
[10] X.-X. Niu and C. Y. Suen, “A novel hybrid CNN–SVM classifier for recognizing handwritten digits,” Pattern Recognition, vol. 45, no. 4, pp. 1318–1325, April 2012.
[11] J. Gu et al., “Recent advances in convolutional neural networks,” Pattern Recognition, vol. 77, pp. 354–377, May 2018.
[12] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in The 13th International Conference of Artificial Intelligence and Statistics, pp. 18–22, 2010.
[13] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[14] M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” arXiv:1603.04467 [cs], March 2016.