Bangla Sign Language Recognition
Abstract—For the deaf and dumb (D&D) people, sign language is one of the primary and most used methods of communication. All over the world, the D&D community faces daily difficulties while communicating with the general mass. Most of the time, they need an interpreter to communicate with others, and an interpreter may not always be available. Bangla Sign Language (BdSL) is a complete and independent natural sign language with its own linguistic characteristics. Our system relies solely on images of bare hands, which allows users to interact with the system in a natural way. We have collected in total 51800 different hand signs for the 47 BdSL alphabets and 10 digits, along with 30 Bengali words. We apply both deep learning and machine learning algorithms in our study and found that the deep learning models achieved comparatively higher accuracy (96.15%) than the machine learning models (94.09%).

Index Terms—Communication, BdSL, Image Augmentation, Convolutional Neural Network (CNN), KPI, MobileNet, VGG16

I. INTRODUCTION

Sign language, popularly known as silent conversation, serves as a visual, gesture-based primary communication medium for hearing-impaired individuals. At present, deafness is one of the major health problems in the world, and sign language helps to address it. In a world abundant with different spoken languages, Bengali is used in communication by millions of people, yet it often finds itself overshadowed. Within this vast linguistic tapestry exists a unique and often overlooked form of communication: Bangla Sign Language (BdSL). Unlike spoken languages, it depends on hand shape, palm orientation, body gesture, and facial expression to express meaning. Although Bengali is one of the most spoken languages, there is minimal research on Bangla Sign Language, particularly for word-level detection.

BdSL possesses complex two-handed gestures along with simultaneous body movements that make it distinct and, at the same time, challenging compared to other sign languages. The key challenges include the following. Firstly, to the best of our knowledge, large-scale diverse datasets for BdSL do not exist. Secondly, the currently existing approaches for recognizing BdSL often do not provide high scalability and accuracy. It is also evident that no efficient system has been built to date that can recognize BdSL gestures in real time, which is obviously important for a number of applications, such as translation devices and assistive technology tools.

Deaf people can use sign language to share their feelings and express their emotions. People with various forms of disabilities make up 15% of the global population. Over five percent of the population is deaf, which amounts to over 466 million people. These people face difficulties in interacting with others, especially in the workforce, education, healthcare, and transportation. According to the Department of Social Services, there are 153,776 vocally disabled people, 73,507 hearing-disabled people, and 9,625 hearing- and visually-disabled people in Bangladesh [1]. A digital Bangla Sign Language interpretation system can surpass this communication barrier between vocal-hearing disabled people and a common person. Approximately 71 million people worldwide use this spatial-movement-based language for their primary interactions. There are over 3 million deaf and hard-of-hearing people in Bangladesh [2], making deafness the second most prevalent type of disability in the country.

The WHO has stated that around 466 million people have a hearing disability, which is over 5% of the world's population [3]. According to the National Census 2011, 0.38% of the total population of Bangladesh have speech and hearing disabilities. Approximately 15% of the world's population have some degree of hearing loss, and many of them are children [4]. In 2013, the WHO (World Health Organization) reported that over 5% of the world's population have hearing loss, jeopardizing their daily life and livelihood.

One possible approach is to apply the Scale-Invariant Feature Transform (SIFT) for robust detection of keypoints and invariant feature descriptors of Bangla sign words and letters [3]. The main objective of our research is therefore to provide sufficient support to deaf-mute people in their daily life and to develop an improved and efficient machine learning model for recognizing Bangla sign words.

II. LITERATURE REVIEW

In one of our studied articles, we have found that there exists Bangla Ishara Bhasha Obhidhan (Bangla
Sign Language Dictionary, 1994, 1997) and Ishara Bhashay Jogajog (Communication in Sign Language, 2005, 2015), which try to bridge the gap in communication. Islam et al. (2018) created "Ishara-Lipi," the first dataset for isolated Bangla characters. Rahaman et al. (2014) presented a real-time computer-vision-based BSL recognition system with a vowel recognition accuracy of 98.17%. Most of the existing models have focused on letters or numerical digits, and most of the approaches are not scalable to dynamic gestures or larger vocabularies of BSL [5]. Previous works concentrated mainly on alphabet and digit recognition, leaving the detection of static-gesture words in BSL largely unexplored.

Research on Bangla Sign Language recognition is relatively scarce compared to other sign languages such as American Sign Language or Indian Sign Language. 2D and 3D tracking sensors have been used for depth information and segmentation. Machine learning models, including but not limited to HMM, CRF, and SVM, have been used for identification, feature extraction from gestures, and gesture recognition. Deep learning approaches, especially CNNs, have become popular owing to their high-level feature extraction capability and higher accuracy. Pre-trained models and custom CNN architectures such as VGG16 and VGG19 were applied to static and dynamic gestures and showed high accuracy on isolated datasets [15]. Very few works are reported on Bangla Sign Language, and these too have focused on static gestures for alphabets and digits. While some efforts have been directed at translating BSL into text and at identifying static hand gestures, research on sentence construction and dynamic gestures remains few and far between. Large and diverse datasets for word-level or sentence-level recognition are also limited [6].

There is clear evidence that different types of studies have been conducted on sign language detection around the world, most of them based on American Sign Language, Thai Sign Language, and Arabic Sign Language. Methods involving YOLOv3 for real-time conversion of ASL, and CNNs for detection and speech generation from Arabic Sign Language, have also given hopeful results. Coming to Bangla sign recognition, one method based on SIFT and PCA feature extraction was used to detect 38 Bangla signs; it converted the images from RGB to HSV color space. Existing BdSL datasets are also incomplete and lack diversity. One of the popular datasets for hand gesture classification is the "Isharalipi Dataset", but it is not suitable for real-time object detection as it is of low resolution [3].

We have studied another article which stated that, unlike widely studied sign languages such as American Sign Language (ASL), BSL possesses complex grammar and limited resources, making detection and translation difficult. There is also a real-time BSL alphabet recognizer developed using deep learning, which utilized a dataset of 3,000 images categorized into original, binary, and segmented formats (Das & Islam, 2021). Accuracies of 92.5% were achieved for Bengali words and alphabets using a dataset mentioned in (Miah et al., 2022). It was also found that most of the earlier works make use of small or single datasets; thus, such models do not generalize easily [2].

To our knowledge, there is no video dataset available for BdSL apart from a few image-based datasets. This creates a gap in research on video-based BdSL, which is central to this research work [8]. There now exists a dataset named BdSL47, a comprehensive dataset that can be a valuable resource for researchers working on computer-vision-based Bangla sign language recognition [14]. Researchers and developers can explore multimodal deep learning architectures to correctly identify Bangla hand signs, because the dataset contains both RGB images and depth keypoints of each sign. The dataset contains Bangla hand signs of digits and alphabets under different challenging conditions that reflect real-life scenarios and impose challenges for researchers and developers.

III. METHODOLOGY AND IMPLEMENTATION

Our proposed system is designed to make the life of people with hearing and speaking disabilities much easier. The system's design will enable communities with disabilities to communicate within themselves and with others. The system is efficient enough to allow its users to communicate via Bengali sign language. We have conducted our research work according to the procedure illustrated in Figure 1.

Fig. 1: Methodology of our proposed System

A. Dataset Collection

Acquiring the data has been a crucial part of this work, since there are not enough datasets available for use, and the task was not easy due to the large number of alphabets. At first, we collected two datasets consisting of hand images expressing sign words and letters. The first one (dataset-1) is BdSL47, which contains 47000 RGB input images of 47 signs (10 digits, 37 letters) of Bangla Sign Language [14]. The second (dataset-2) consists of 1200 images, categorized into 30 different classes. Each class represents a distinct sign in the Bangla Sign Language (BSL), with each class containing 40 images.
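Because the two datasets are eventually combined, their class labels need to share a single index space. The sketch below is our illustration of that bookkeeping only; the `merge_label_spaces` helper and the placeholder class names are hypothetical and not part of the published pipeline.

```python
# Hypothetical sketch: building one label space across two datasets
# (dataset-1: 47 sign classes, dataset-2: 30 word classes).
# The class names below are placeholders, not the actual BdSL labels.

def merge_label_spaces(dataset1_classes, dataset2_classes):
    """Assign a unique integer label to every class across both datasets.

    Classes from dataset-1 keep their order; dataset-2 classes not
    already present are appended after them.
    """
    merged = list(dataset1_classes)
    for name in dataset2_classes:
        if name not in merged:
            merged.append(name)
    return {name: idx for idx, name in enumerate(merged)}

d1 = [f"sign_{i}" for i in range(47)]   # 47 BdSL alphabet/digit signs
d2 = [f"word_{i}" for i in range(30)]   # 30 Bengali word signs
label_map = merge_label_spaces(d1, d2)
print(len(label_map))  # prints 77
```

With disjoint class sets, this yields the 77 classes used at evaluation time.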
The images in the dataset are in RGB color space. Both datasets are available as open-source resources.

অ/য়   আ   ই/ঈ   উ/ঊ   র/ঋ
এ   ঐ   ও   ঔ   ক
খ/ক্ষ   গ   ঘ   ঙ   চ
ছ   জ/য   ঝ   ঞ   ট
ঠ   ড   ঢ   ণ/ন   ত
থ   দ   ধ   প   ফ
ব/ভ   ম   ল   শ/ষ/স   হ
ং   ◌ং   ০   ১   ২
৩   ৪   ৫   ৬   ৭

Fig. 2: Overview of our merged dataset

After that, we merged both datasets to increase the robustness of the model. The merged set (dataset-1+2) is intended to have balanced gesture classes and to better represent real-world variations. This dataset has been used primarily to evaluate various deep learning models as well as different machine learning models.

B. Image Preprocessing

In the next step of our research, we preprocessed all the images obtained after merging the datasets. All input images were resized and normalized to the range 0 to 1 to ensure consistency and help the machine learning and deep learning models learn correctly and effectively. We also adjusted the brightness of the images to enhance the robustness of the models. In general, we applied fine-tuning only when the images in our dataset were not drastically different in context from the dataset on which the pre-trained model was originally trained. After that, as part of image augmentation, we converted the images into four distinct types:

i. Grayscale: We converted all images to grayscale to remove color dependencies and reduce computational complexity. The images were resized to 128×128 pixels.
ii. Gaussian Blur: We applied a slight blur using a 5×5 kernel to introduce slight variations in image texture.
iii. High Contrast: We increased the brightness (alpha=1.2, beta=30) of the corresponding images.
iv. Low Contrast: We decreased the brightness by setting alpha=0.8 and beta=-30 for the hand-gesture images.

Image augmentation is a process of creating new training examples from existing ones. The samples in Figure 3 illustrate the hand gestures used for Bangla Sign Language recognition: each row represents a different gesture, while each column shows a different augmentation technique applied to the original image.

The first picture of every row, (a) in Figure 3, is the original (colored) hand-gesture image of the sign word. After collecting such samples, we enhanced our dataset using the four key augmentation techniques: grayscale conversion, Gaussian blur, high contrast, and low contrast, placed sequentially as (b), (c), (d), and (e) of every row in Figure 3. Such preprocessing improves the generalization of the model by introducing variations in lighting, texture, and noise conditions.

This diverse dataset ensures robustness in recognizing gestures under varying real-world conditions, making the model more effective for practical applications. Besides, the preprocessing steps greatly helped our system learn the hand gestures quickly.
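The preprocessing and augmentation steps described above can be sketched in plain NumPy. A real pipeline would more likely use a library such as OpenCV (cv2.cvtColor, cv2.GaussianBlur, cv2.convertScaleAbs); the alpha/beta values below are the ones reported in the text, while the blur sigma is our assumption.

```python
import numpy as np

def to_grayscale(img):
    """RGB (H, W, 3) uint8 -> grayscale (H, W) via luminosity weights."""
    return np.round(img @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def adjust_brightness(img, alpha, beta):
    """out = clip(alpha * img + beta, 0, 255); alpha=1.2, beta=30 gives
    the high-contrast variant, alpha=0.8, beta=-30 the low-contrast one."""
    return np.clip(alpha * img.astype(np.float32) + beta, 0, 255).astype(np.uint8)

def gaussian_blur_5x5(img):
    """Naive 5x5 Gaussian blur (sigma=1, an assumption) on grayscale."""
    ax = np.arange(-2, 3)
    k1d = np.exp(-ax**2 / 2.0)
    kernel = np.outer(k1d, k1d)
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float32), 2, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    h, w = img.shape
    for dy in range(5):
        for dx in range(5):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.round(out).astype(np.uint8)

def normalize(img):
    """Scale pixel values into the range [0, 1] before feeding the models."""
    return img.astype(np.float32) / 255.0
```

Applying these to each original image yields the (b)-(e) variants described above, with normalize used as the final step for every variant.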
Fig. 3: Sample gesture rows 1(a)-1(e): (a) original image, (b) grayscale, (c) Gaussian blur, (d) high contrast, (e) low contrast
IV. RESULTS AND ANALYSIS

The system has been evaluated on 9640 test images across 77 classes that were not used for training. We calculated the metrics (Key Performance Indicators) Recall, F1-Score, and Accuracy to assess the performance of our hand gesture detection model.

TABLE I: Evaluation Metrics of dataset-1 using deep learning algorithms

Algorithm Name    Test Accuracy (%)    F1 Score (%)    Recall (%)
CNN               98.09                98.09           98.09
MobileNet v2      96.57                96.56           96.57
VGG16             94.64                94.64           94.64

TABLE IV: KPI values of the merged dataset using machine learning algorithms

Algorithm Name    Test Accuracy (%)    F1 Score (%)    Recall (%)
KNN               96.05                96.01           96.05
RandomForest      96.24                96.19           96.24

Similarly, Table IV depicts the values obtained in our experiment using the machine learning algorithms. In this case, RandomForest gave comparatively better results than KNN according to our observation.
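For reference, the KPIs reported in the tables can be computed as follows. This is a minimal sketch (in practice a library such as scikit-learn's classification metrics would be used), and the macro averaging shown here is our assumption about how per-class scores are combined.

```python
# Minimal sketch of the reported KPIs: accuracy, recall, and F1-score,
# macro-averaged over the gesture classes (averaging scheme assumed).

def accuracy(y_true, y_pred):
    """Fraction of test images whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall_f1(y_true, y_pred):
    """Per-class recall and F1, averaged uniformly over the classes."""
    classes = sorted(set(y_true))
    recalls, f1s = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)
```

With y_true holding the ground-truth class indices of the 9640 test images and y_pred the model predictions, these functions produce the accuracy, recall, and F1 columns of the tables.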