International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 609
Sign Language Text to Speech Converter using Image Processing and
CNN
Mangesh B.1, Mayur K.2, Rujali P.3
1Student, Dept. of Information Technology, Vidyalankar Institute of Technology, Mumbai, Maharashtra, India
2Student, Dept. of Information Technology, Vidyalankar Institute of Technology, Mumbai, Maharashtra, India
3Student, Dept. of Information Technology, Vidyalankar Institute of Technology, Mumbai, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Sign language is considered the main form of communication for people who are deaf or hard of hearing. Various innovations in automatic sign recognition try to break down this communication barrier. Our contribution is a recognition system built on image processing and a Convolutional Neural Network (CNN). By combining image processing with a CNN, our application improves on the integrity and flexibility of existing applications. Such a system will enable hearing people to understand sign language and communicate with the hearing impaired.
Key Words: Image Processing, Gesture Recognition, Convolutional Neural Network, Training and Testing of Gestures, Thresholding
1. INTRODUCTION
Science and technology in the 21st century have reached a level where people expect more comfortable and useful things that can make their lives easier. Homes with built-in voice recognition and gesture sensing have already been conceived. There are video games on the market that can be played with real-time gestures, all made possible by the advent of new technology; even our mobile phones are loaded with similar technologies. Nevertheless, there are people who are less fortunate and are physically challenged, whether by deafness or by being aphonic.
Figure 1.1 - Hand Signs.
These people expect researchers, and computer scientists in particular, to provide a machine or model that can help them communicate and express their feelings with others. The deaf and the mute can only perceive visually, so communication with them must rely on visual means. Thus, sign language is a medium of communication for the deaf and the mute. Sign Language Recognition (SLR) is a tool that converts sign language into text and further into speech. Research in SLR began about two decades ago all over the world, especially on American Sign Language. According to global statistics, over 5% of the world's population, about 360 million people, have a hearing disability. The biggest drawback for the deaf is employment: communication has always played a vital role in getting work done, which compounds the issue.

In order to help deaf people communicate with ordinary people, we build a system that translates sign language into text and further into speech. We propose a system that can automatically detect the hand signs of the American Sign Language (ASL) alphabet, that is, all the English letters. To create spaces between letters we add two extra signs, SPACE and OK. The SPACE sign inserts a space between recognized words, and the OK sign stops capturing and triggers execution of the currently required function. The system is based on ASL, which is considered a complete language. The main focus of this project is to help the deaf and the mute by converting hand gestures to speech. Finger signing is a subset of sign language that uses finger signs to spell words of the spoken or written language. The finger sign recognition task involves segmenting finger sign hand gestures from image sequences.

ASL is the fourth most commonly used language in the USA. It is extensively used by deaf people and has been officially adopted by the deaf community of the United States. It is a unique language built on signs made by moving the hands. ASL is not a world language, but it has roots in the English-speaking parts of Canada, a few regions of Mexico, and all over the United States of America.

Human beings are gifted with a voice that allows them to communicate with each other, so spoken language is one of the main channels of human communication. Unfortunately, not everybody has this capability, because of the loss of one sense: hearing. In India there are approximately 5 to 15 million deaf people. Sign language is considered the basic alternative communication method for deaf people, and several dictionaries of words and single letters have been defined to make this communication strong and effective. Without an interpreter, such communication is difficult. Therefore, a system that converts sign language symbols into plain text and further into speech can help with real-time communication.
2. RELATED WORK
Yi Li (2012): This system consists of three components: hand detection, finger identification, and gesture recognition. It is built on the Candescent NUI project, which is freely available online, and uses the OpenNI framework to extract depth data from the 3D sensor [1].

Zhou Ren, Jingjing Meng, Junsong Yuan (2011): Depth sensors such as the Xtion Pro Live have given rise to new opportunities for human-computer interaction (HCI). Great progress has been made with such sensors in human body tracking and body gesture recognition, but robust hand gesture recognition remains a problem. Compared to the human body, the hand is a smaller object with more complex articulations, so it is more easily affected by segmentation errors than the entire body [2].

Nobuhiko Tanibata, Nobutaka Shimada, Yoshiaki Shirai (2002): Hand features are obtained from a sequence of images by segmenting and tracking the skin colour of the face and hands. The elbow is tracked by matching a template of elbow shape. Hand features such as the hand area and the direction of hand motion are then extracted and fed into a Hidden Markov Model (HMM) [3].

Spencer D. Kelly, Sarah M. Manning, Sabrina Rodak (2008): Hand postures used in various sign languages are recognized using a novel hand posture feature, the Eigen-space size function, together with a Support Vector Machine (SVM) based gesture recognition framework. A combination of Hu moments and the Eigen-space size function is used to classify different hand postures [4].
3. PROPOSED ARCHITECTURE AND METHODOLOGY
Figure 3.1 – System Architecture
 First, we create and store the hand gestures of signs with the help of image processing techniques, i.e., converting the RGB (colour) image to grayscale.
 Then we convert that grayscale image to a binary image using a threshold.
 Next we smooth the image using Gaussian and median blur, and we use contours to recognize the edges of the hand gestures.
 We then store these gestures in a database and run the CNN algorithm on the stored gesture images using TensorFlow and Keras.
 TensorFlow and Keras are used to train and test the system, which then recognizes the gestures stored in the database and gives appropriate results when the user runs the application.
Figure 3.2 – Application Flow [6]
3.1. Image processing:
Figure 3.1.1 – Image Processing of hand Gestures
In this application we have used the OpenCV library for image processing of hand gestures. First, we convert the captured hand gesture from RGB to HSV, i.e., Hue, Saturation, and Value. After converting the image to HSV, we apply Gaussian blur and median blur to smooth it. Then we threshold the image to split it into smaller segments, or chunks, using the grayscale value to define their boundaries; this also reduces the complexity of the data and simplifies recognition and classification. After thresholding, contours are used to determine the shape of the hand gesture, as contours are useful tools for shape analysis, object detection, and recognition.
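A minimal sketch of this preprocessing chain, assuming OpenCV 4; the kernel sizes and the use of Otsu thresholding are illustrative choices, since the paper does not report its exact settings:

import cv2

def preprocess(frame):
    # Convert the captured BGR frame to HSV colour space.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Smooth with Gaussian and median blur to suppress noise.
    blurred = cv2.GaussianBlur(hsv, (5, 5), 0)
    blurred = cv2.medianBlur(blurred, 5)
    # Threshold the value channel into a binary hand mask (Otsu picks the level).
    _, binary = cv2.threshold(blurred[:, :, 2], 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Contours outline the hand shape for later recognition.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return binary, contours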
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 611
3.2. Convolution Neural Network Algorithm:
Figure 3.2.1 – CNN block Diagram [5]
For hand gesture recognition and classification we use a CNN architecture. Convolutional Neural Networks have recently been applied successfully to human gesture recognition as well as to image recognition and classification. Prior work on sign language recognition with deep CNNs has used input that is sensitive to more than just the pixels of the images: a depth-sensing camera, because it senses depth and contour, makes it much easier to develop characteristic depth and motion profiles for each sign language gesture. The advantage of a CNN is its ability to learn features as well as the weights corresponding to each feature. CNNs are trained to optimize an objective function, specifically a loss function. We utilized the softmax-based loss function:
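Written out in the standard form assumed here (the formulation follows the description below and [7]):

L = \frac{1}{N}\sum_{i=1}^{N} -\log\left(\sigma(z^{(i)})_{y_i}\right) \tag{1}

\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1,\dots,K \tag{2}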
Equation (2) is the softmax function: it takes the feature vector z for a given training example and squashes its values into a vector of real numbers in [0, 1] that sum to 1 [7]. Equation (1) takes the mean loss over the training examples [7].
3.3. Dataset:
We use supervised learning, training the network on our own sign dataset. We classify letters and digits, i.e., A-Z and 0-9. We initially trained and tested on a self-generated dataset of images we captured ourselves; training and testing were done on Google Colab. The dataset consists of 1200 images for each letter and digit. Additionally, a pipeline was created so that users can generate new gesture images and keep adding them to the dataset, as sketched below.
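A sketch of such a capture pipeline; the folder layout, region of interest, and key binding are illustrative assumptions, not the authors' exact tool (preprocess is the chain sketched in Section 3.1):

import os
import cv2

def capture_gestures(label, out_dir="dataset", n_images=1200):
    # One sub-folder per gesture label, e.g. dataset/A, dataset/5.
    os.makedirs(os.path.join(out_dir, label), exist_ok=True)
    cam = cv2.VideoCapture(0)
    count = 0
    while count < n_images:
        ok, frame = cam.read()
        if not ok:
            break
        roi = frame[100:300, 100:300]        # fixed hand region (illustrative)
        binary, _ = preprocess(roi)          # reuse the Section 3.1 chain
        cv2.imshow("capture", binary)
        # Press 'c' to save the current thresholded frame as a sample.
        if cv2.waitKey(1) & 0xFF == ord("c"):
            cv2.imwrite(os.path.join(out_dir, label, f"{count}.png"), binary)
            count += 1
    cam.release()
    cv2.destroyAllWindows()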
4. IMPLEMENTATION DETAILS
In this section, the implementation of the sign language text to speech converter using image processing and OpenCV is described. Section A describes the actual methodology used, with respect to all the modules. Section B presents snapshots of the application with descriptions showing the implemented application in detail.
A) Methodology Used
Figure 4.1 – CNN architecture [9]
A CNN contains four types of layers: convolution layers, pooling/subsampling layers, nonlinear layers, and fully connected layers. It captures various image features along with complex non-linear features and interactions. The softmax layer is used to recognize the hand gestures/signs. We use a common CNN architecture consisting of multiple convolutional and dense layers: three groups of two convolutional layers, each group followed by a max pooling layer and a dropout layer, then two fully connected layers, each followed by a dropout layer, and one final output layer.
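A Keras sketch of this architecture; the filter counts, input size, and dropout rates are assumptions, since the paper does not report them:

from tensorflow.keras import layers, models

def build_model(input_shape=(64, 64, 1), n_classes=36):  # A-Z plus 0-9
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Three groups of two convolutional layers, each group followed by
    # a max pooling layer and a dropout layer.
    for filters in (32, 64, 128):
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    # Two fully connected layers, each followed by a dropout layer.
    for units in (512, 256):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(0.5))
    # Final output layer: softmax over the gesture classes.
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model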
Convolution Layer:
Figure 4.2 – Conv2D layer [5]
Briefly, some background: a convolution layer scans a source image with a filter of, for instance, 5×5 pixels to extract features that can be important for classification. This filter is also called the convolution kernel. The kernel contains weights, which are tuned during training of the model to achieve the most accurate predictions. With a 5×5 kernel, for every 5×5 pixel region the model computes the dot product between the image pixel values and the weights defined in the filter. A 2D convolution layer means the input of the convolution operation is three-dimensional, for example a colour image, which has a value for each pixel across three channels: red, green, and blue. It is nevertheless called a 2D convolution because the movement of the filter across the image happens in two dimensions; the filter is applied to the image three times, once for each of the three channels. After the convolution ends, the features are downsampled, and then the same convolutional structure repeats. At first the convolution identifies features in the original image; later stages identify sub-features within smaller parts of the image. Eventually, this process identifies the essential features that help classify the image.
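As a numeric illustration of the dot-product step (random values, purely for demonstration):

import numpy as np

rng = np.random.default_rng(0)
region = rng.random((5, 5, 3))         # one 5x5 patch of an RGB image
kernel = rng.random((5, 5, 3))         # the learned weights of one filter
activation = np.sum(region * kernel)   # one value of the output feature map
print(activation)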
Pooling/Subsampling:
Figure 4.3 – Pooling layer [6]
The pooling layer is another building block of a CNN. It reduces the spatial size of the representation to decrease the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.
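For example, 2×2 max pooling halves each spatial dimension while keeping the strongest response in every window:

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [1, 4, 3, 8]], dtype=float)
# Group into 2x2 blocks and take the maximum of each block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 4.] [7. 9.]]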
Fully Connected Layer:
Figure 4.4 – Fully connected layer [6]
The objective of the fully connected layer is to take the results of the pooling process and use them to classify the image into a label. The output of pooling is flattened into a single vector of values, where each value represents the evidence that a specific feature belongs to a label. For instance, if the image is of a dog, features representing things like whiskers or fur should have high probabilities for the label "dog". The fully connected part of the CNN goes through its own backpropagation process to work out the most accurate weights; each neuron receives weights that prioritize the most appropriate label. Finally, the neurons "vote" on each of the labels, and the winner of that vote is the classification decision.
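A toy version of that vote, with hypothetical scores for three labels:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([0.3, 2.1, -1.0])            # scores for labels A, B, C
probs = softmax(logits)
print(["A", "B", "C"][int(np.argmax(probs))])  # -> "B"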
5. RESULTS AND DISCUSSIONS
Figure 5.1 – Setting Hand Threshold
In gesture creation, we have to set the hand coordinates as shown in Fig. 5.1. While the coordinates of the hand gesture are being set, the image is converted to a binary image to minimize background disturbance.
Figure 5.2 – Final layout of Application
Fig. 5.2 shows the final layout of the application, which consists of a Threshold window and a Gesture recognition window.
Figure 5.3 – Gesture sign alphabet
In Fig. 5.3 we can see the binary image and the gesture for the letter "B", along with its result in Text Mode.
Figure 5.4 – Gesture sign number
In Fig. 5.4 we can see the threshold and the gesture for the number "5". In Text Mode we can see the result, 5, together with the previously recognized gestures.
Trained and tested on our own dataset, the application achieves an accuracy of approximately 85%. As this is a research-based project, we expect both its purpose and its accuracy to be fulfilled more satisfactorily in coming updates.
6. CONCLUSIONS AND FUTURE SCOPE
The system provides an interface through which deaf people can communicate easily via sign recognition. It is applicable not only in the family environment but also in public. For social use, this technique is extremely helpful for the deaf and the mute. We will build a simple gesture recognizer based on the OpenCV toolkit and integrate it into the Visionary framework; as a "yes" gesture, we will recognize up-and-down hand motions regardless of which hand is used.
The project focuses on distinguishing among the various letters of the English alphabet. Future work may include recognition of all the English letters and numbers. Furthermore, we may move on to the recognition of words, drawing on as large a dictionary as possible.
REFERENCES
[1] Yi Li (2012), "Hand Gesture Recognition Using Kinect".
[2] Zhou Ren, Jingjing Meng, Junsong Yuan (2011), "Robust Hand Gesture Recognition with Kinect Sensor".
[3] Nobuhiko Tanibata, Nobutaka Shimada, Yoshiaki Shirai (2002), "Extraction of Hand Features".
[4] Ferdousi, Z., "Design and Development of a Real Time Gesture Recognition System", U.M.I. Publishers, June 2008.
[5] https://missinglink.ai/guides/keras/keras-conv2d-working-cnn-2d-convolutions-keras
[6] Samer Hijazi, Rishi Kumar, and Chris Rowen, "Using Convolutional Neural Networks for Image Recognition", IP Group, Cadence.
[7] Brandon Garcia and Sigberto Alarcon Viesca, "Real-time American Sign Language Recognition with Convolutional Neural Networks", Stanford University, Stanford, CA.
[8] Subha Rajam, P. and Balakrishnan, G. (2011), "Real Time Sign Language Recognition System to aid Deaf-dumb People", IEEE, pp. 737-742, 2011.
[9] Sharmila Gaekwad, Akanksha Shetty, Akshaya Satam, Mihir Rathod, Pooja Shah (2019), "Recognition of American Sign Language using Image Processing and Machine Learning", IJCSMC, pp. 352-357, 2019.