
Real Time Indian Sign Language Recognition and Speech Generation using Convolutional Neural Network

Prof. Durgeshnandini D. Meshram1, Rasika Ukarde2, Tejas Bangre3, Khemant Shahare4, Ashwini Ramteke5
1Assistant Professor, 2,3,4,5Student
Department of Computer Science and Engineering,
Priyadarshini J. L. College of Engineering, Nagpur, India

Abstract
Sign language is an essential mode of communication for the deaf community worldwide. Even today, there is an evident
communication gap between sign language users and non-users. To resolve this problem, researchers have proposed real-time
sign language recognition systems. While American Sign Language (ASL) has been studied extensively, Indian Sign Language
(ISL) has received less attention. This paper explores the challenges and opportunities involved in developing a real-time sign
language recognition system for ISL. The purpose of the project is to bridge the communication gap for the hearing-impaired
community and provide a tool for better accessibility. The paper identifies the challenges in sign recognition and speech
generation while building a system capable of recognizing different ISL signs, focusing on the words deaf people use daily
rather than individual alphabets.

Keywords: CNN (Convolutional Neural Network), ISL (Indian Sign Language), SLR (Sign Language Recognition), Speech
Generation.
I. INTRODUCTION
Sign language (SL) is a visual-gestural language used by deaf and hard-of-hearing people to communicate with others. Deaf
people express meaning through hand gestures and other parts of the body to convey their message. Sign languages have a
vocabulary and grammar entirely different from those of spoken languages. [1]
One of the most common disabilities is hearing loss. According to the World Health Organization, the number of people
affected worldwide is projected to reach 630 million by 2030. India, the world's second most populous country, has around
63 million deaf or hearing-impaired citizens, a figure that could be even higher in reality. For a hearing-impaired person, sign
language can be a very useful adaptation. Deaf people use it to communicate with others daily, making it like a mother tongue
for them. It is based on visual hand signs and uses a distinct vocabulary. [2]
The official language for the deaf in India is Indian Sign Language (ISL). It is made up of structured, coded motions, each
of which has a distinct meaning. People who are deaf or mute have traditionally relied on interpreters to interact with others,
but an appropriate interpreter can be difficult to find. For a hearing-impaired person, sign language is the only completely
dependable means of communicating and expressing their views to others. Accordingly, the field of sign language recognition
has seen a rise in interest over the last few years.
II. RELATED WORK
Honnaraju B, Meghana M, et al. [4] This research paper focuses on recognition utilities that work in real time to ensure their
usability in various situations. The authors achieved this by creating a custom dataset that addresses issues such as rotational
variability and background dependence. Their system was successfully trained on all 36 ISL letters and digits, achieving an
accuracy of 99%. The paper presents a method that uses a visual vocabulary model to identify Indian Sign Language letters
and numbers in a live video stream, combining skin-color segmentation and background subtraction with CNN and SVM to
perform sign and speech prediction.
Snehal H, Manpreet S, et al. [3] In this research paper, the authors survey work for the deaf and hard-of-hearing
community, with an emphasis on Indian Sign Language (ISL) recognition. They propose the development of a hybrid
CNN-based system for real-time recognition of ISL. While the system is still in its early stages of development, their
detailed analysis of current methods and their limits provides valuable insights for future implementation. By promoting
technology that responds to the specific needs of DHH communities, the survey highlights the importance of socially
responsible technology development.
Aman P, Avinash K., et al. [5] The main purpose of this research paper is to provide a feasible means of communication
between hearing and speech-impaired people using hand gestures. The proposed system can be accessed through a webcam
or any built-in camera, which detects the sign and processes it for recognition. The system can also detect phrases such as
Hello, Welcome, and Thank you. Human movement classification was found to be the best image-processing method. The
system can recognize selected sign language characters in a controlled, low-light environment with 70% to 80% accuracy.
III. CONVOLUTIONAL NEURAL NETWORK
The Convolutional Neural Network (CNN) architecture used in this study is structured to effectively learn and classify sign
language images. The model's input layer accepts grayscale images of 100 × 100 pixels. The network is successively
language images. The model's input layer accepts grayscale images with a size of 100 x 100 pixels. The network is successivel y
composed of multiple sets of Conv2D layers, each followed by MaxPooling2D layers. These convolutional layers play a crucial
role in extracting features from the input images, with ReLU activation functions introducing nonlinearity to facilitate learning
complex patterns. MaxPooling layers are designed to shrink spatial dimensions, which helps reduce computational complexity
and control overfitting. To prevent overfitting, dropout layers are strategically placed after each MaxPooling layer, which
randomly deactivates a portion of neurons during training.
Following the convolutional layers, a flattening layer reshapes the multidimensional output into a one-dimensional array
and prepares it for input into fully connected layers. The fully connected layers consist of a 512-unit dense layer with ReLU
activation, followed by a dropout layer with a dropout rate of 0.5 to further mitigate overfitting. Finally, the output layer uses
softmax activation, with 36 units representing the number of classes in the dataset. The model is compiled with the Adam
optimizer and the categorical cross-entropy loss function, with accuracy as the evaluation metric. Training uses a batch size
of 256 for 100 epochs.
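The paper does not specify kernel sizes or the number of Conv2D/MaxPooling2D blocks, so the following is a sketch under assumed settings (three blocks, 3 × 3 valid convolutions, 2 × 2 pooling) tracing how the 100 × 100 input shrinks before the flatten layer:

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    # Output size of a "valid" 2D convolution along one spatial axis.
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2):
    # Output size of a 2x2 max-pooling layer along one spatial axis.
    return size // pool

size = 100  # 100 x 100 grayscale input
for block in range(3):  # assumed: three Conv2D + MaxPooling2D blocks
    size = pool_out(conv_out(size))
    print(f"after block {block + 1}: {size} x {size}")
# after block 1: 49 x 49; block 2: 23 x 23; block 3: 10 x 10
```

Under these assumptions the flatten layer would receive a 10 × 10 feature map per channel; the exact figures depend on the real layer settings, which the paper does not report.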
IV. METHODOLOGY
Our proposed methodology for real-time Indian Sign Language recognition aims to accurately interpret and classify ISL
gestures. To achieve this, we employ a multi-step process that involves data acquisition, preprocessing, feature extraction,
and model training. [6] The pipeline is designed to accurately capture the intricate details and movements of ISL signs, from
dataset collection through CNN execution. Fig. 1 shows the general flow of the proposed architecture, representing the
sequential steps involved.

Fig.1. Block Diagram


A. Data Acquisition
To develop a real-time Indian Sign Language recognition model, acquiring a dataset of sufficiently many images is
challenging. In our proposed architecture, we created a custom dataset of words and numbers. The goal is to collect new
hand gesture data directly from the webcam so that the dataset can be expanded with new examples. A proper dataset of
hand gestures is created so that images captured during communication can be recognized. The dataset is divided into two
sets: a training set and a test set. About 80% of the samples are drawn by random sampling without replacement to form the
training set, and the remaining 20% are placed in the test set.
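The split described above can be sketched in a few lines (the filenames and fixed seed below are illustrative, not from the paper):

```python
import random

def split_dataset(samples, train_frac=0.8, seed=42):
    # Random sampling without replacement: ~80% train, the rest test.
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(len(samples) * train_frac)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test

# Hypothetical gesture image filenames standing in for the real dataset.
samples = [f"gesture_{i:03d}.png" for i in range(100)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))  # 80 20
```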
Fig. 2. Dataset
B. Preprocessing:
Preprocessing in sign language recognition consists of several steps that enhance the quality and relevance of the input data.
Firstly, flipping the image in the Y-axis allows for consistent orientation of objects across the dataset. Secondly, converting
the color image to grayscale reduces the dimensionality of the data while preserving important visual information. Edge
detection further refines the preprocessing pipeline by identifying boundaries between objects in the image. Finally, image
segmentation using masks allows for precise delineation of regions of interest within the image.
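The four steps can be sketched with NumPy alone; the paper does not name a specific edge operator or threshold, so the gradient-magnitude edges and the mean threshold below are illustrative choices:

```python
import numpy as np

def preprocess(rgb):
    # 1. Flip about the Y-axis (horizontal mirror) for consistent orientation.
    flipped = rgb[:, ::-1, :]
    # 2. Grayscale conversion with the standard luminance weights.
    gray = flipped @ np.array([0.299, 0.587, 0.114])
    # 3. Edge detection via finite-difference gradient magnitude
    #    (a stand-in for Sobel/Canny, which the paper does not specify).
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy)
    # 4. Segmentation: mask out pixels below an illustrative threshold.
    mask = edges > edges.mean()
    return gray * mask

frame = np.random.rand(100, 100, 3)  # stand-in for a captured webcam frame
result = preprocess(frame)
print(result.shape)  # (100, 100)
```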
C. Feature Extraction:
Feature extraction is a crucial step that derives meaningful representations from the pre-processed data. In the case of ISL
detection, features can be extracted at two levels: spatial and temporal. Spatial features capture the static aspects of a gesture
in a single frame. They can be extracted using techniques such as Histogram of Oriented Gradients (HOG), Scale-Invariant
Feature Transform (SIFT), or Local Binary Patterns (LBP). These methods analyze the shape, texture, and edges of the hand
or other relevant areas in the frame.
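As a toy illustration of the spatial-feature idea, the fragment below computes a single HOG-style histogram of gradient orientations over the whole image; real HOG descriptors work per cell with block normalisation, and the function name and bin count here are illustrative:

```python
import numpy as np

def orientation_histogram(gray, bins=9):
    # Gradient magnitude and unsigned orientation at every pixel.
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Magnitude-weighted histogram of orientations, as in HOG.
    hist, _ = np.histogram(angle, bins=bins, range=(0, 180), weights=magnitude)
    return hist / (hist.sum() + 1e-8)  # normalise into a descriptor

gray = np.random.rand(100, 100)
descriptor = orientation_histogram(gray)
print(descriptor.shape)  # (9,)
```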
D. Model Training
In the proposed architecture, a multilayer neural network was used to classify sign language gestures. The network takes the
extracted image vectors as input and predicts the corresponding sign. It consists of interconnected layers: an input layer,
hidden layers, and an output layer. Each layer contains a number of neurons that process input data and produce output.
The parameters of the network, such as weights and biases, are learned throughout the training process to improve
classification accuracy.
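A minimal NumPy forward pass through such a network is sketched below; the layer sizes and random parameters are illustrative, with only the 36-class softmax output matching the architecture described earlier:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
# Hypothetical learned parameters: 100-dim input -> 512 hidden -> 36 classes.
W1, b1 = rng.normal(size=(512, 100)), np.zeros(512)
W2, b2 = rng.normal(size=(36, 512)), np.zeros(36)

def predict(features):
    # Input layer -> hidden layer (ReLU) -> output layer (softmax).
    hidden = relu(W1 @ features + b1)
    return softmax(W2 @ hidden + b2)

probs = predict(rng.normal(size=100))
print(probs.shape, round(probs.sum(), 6))  # (36,) 1.0
```

In training, the weights and biases above would be updated by backpropagation of the categorical cross-entropy loss rather than drawn at random.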
V. EXPERIMENT AND RESULT
The dataset was split into two sets, with 80% of the data used for training and the remaining 20% for testing. The CNN
classifier showed impressive accuracy on these images. The system is specifically designed to recognize 25 characters
comprising 16 words and 10 digits. While the current results are promising, incorporating some improvements could lead to
even better results.
Using CNN, we observed an overall training accuracy of 78.4% in the last epoch, while the testing accuracy exceeded
73.6%. The confusion matrix heatmap (Fig. 4) shows the model's classification results for a subset of labels. Each cell in the
heatmap gives the number of cases with the corresponding true label (y-axis) and predicted label (x-axis). This visualization
helps in understanding the model's ability to distinguish between different ISL gestures within this subset of labels.
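The heatmap counts can be reproduced from predictions with a few lines; the labels below are made up for illustration and are not the paper's data:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] counts samples whose true label is i and predicted label is j;
    # correct predictions therefore sit on the diagonal.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 1, 2, 2, 1, 0]  # illustrative labels
y_pred = [0, 2, 2, 2, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # diagonal sum = correct predictions
print(round(accuracy, 3))  # 0.833
```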

Fig. 3. Result
Fig. 4. Confusion matrix

VI. CONCLUSION
In conclusion, our study achieves a commendable 73.4% accuracy in real-time Indian Sign Language recognition and
speech generation. By utilizing a Convolutional Neural Network (CNN) architecture, our model bridges the communication
gap for the hearing-impaired by effectively recognizing ISL gestures from image inputs. This innovation can change lives by
enabling smooth communication between ISL users and those who communicate through speech. Although our system
shows encouraging accuracy, continuing efforts focus on improving real-time processing capabilities and fine-tuning the
model architecture. With continued advancements, we envision a future in which communication barriers are removed,
promoting inclusivity and fair access to communication resources.

REFERENCES
[1] Deep, A. Litoriya, A. Ingole, V. Asare, S. M. Bhole, S. Pathak, "Realtime Sign Language Detection and Recognition," 2022 2nd Asian
Conference on Innovation in Technology (ASIANCON), DOI: 10.1109/ASIANCON55314.2022.9908995, August 2022.
[2] K. Goyal, Dr. Velmathi G., "Indian Sign Language Recognition Using MediaPipe Holistic," Vellore Institute of Technology Chennai,
https://arxiv.org/ftp/arxiv/papers/2304/2304.10256.pdf.
[3] S. Hon, M. Sidhu, S. Marathe, T. A. Rane, "Real Time Indian Sign Language Recognition using Convolutional Neural Network,"
IJNRD.ORG, ISSN: 2456-4184, Department of Information Technology, PICT, Pune, Volume 9, Issue 2, February 2024.
[4] Honnaraju B, Meghana M, Sanjana D S, Nisarga N S, Nikhil H R, "Sign Language Recognition Using Deep Learning (CNN) and
SVM," International Research Journal of Modernization in Engineering Technology and Science, e-ISSN: 2582-5208,
Volume 05, Issue 05, May 2023.
[5] A. Pathak, A. Kumar, Priyam, P. Gupta, G. Chugh, Dr. Akhilesh Das, "Real Time Sign Language Detection," International Journal for
Modern Trends in Science and Technology, 8(01), pp. 32-37, ISSN: 2455-3778, 2022.
[6] Dr. Honnaraju B, Meghana M, Sanjana D S, Nisarga N S, Nikhil H R, "Sign Language Recognition Using Deep Learning (CNN) and
SVM," International Research Journal of Modernization in Engineering Technology and Science, e-ISSN: 2582-5208,
Impact Factor 7.868, Volume 05, Issue 05, May 2023.
[7] R. Kadwade, A. Tangade, N. Pakhare, S. Kolhe, H. Waikar, S. J. Wagh, "Indian Sign Language Recognition System," International
Journal of Engineering Research & Technology (IJERT), Maharaja Institute of Technology, Mysore, Karnataka, India, Vol. 12,
Issue 05, ISSN: 2278-0181, May 2023.
[8] V. Jain, A. Jain, A. Chauhan, S. S. Kotla, A. Gautam, "American Sign Language Recognition using Support Vector Machine and
Convolutional Neural Network," International Journal of Information Technology 13, DOI: 10.1007/s41870-021-00617-x, February
2021.
[9] D. Kothadiya, C. Bhatt, K. Sapariya, K. Patel, A.-B. Gil-González, J. M. Corchado, "Deepsign: Sign Language Detection and
Recognition Using Deep Learning," Electronics 2022, 11, 1780, https://doi.org/10.3390/electronics11111780, published 3 June 2022.
[10] S. Malik, Y. Kholiwal, Dr. Jayashree J., "Sign Language Recognition and Detection: A Comprehensive Survey," Journal of Data
Acquisition and Processing, Vol. 38(3), DOI: 10.5281/zenodo.777632, ISSN: 1004-9037, 2023.
