Real Time Sign Language Detection
https://doi.org/10.22214/ijraset.2022.42961
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue V May 2022- Available at www.ijraset.com
Abstract: Communication is the method of sharing or exchanging information, ideas or feelings. For communication to take place between two people, both of them need knowledge and understanding of a common language. In the case of deaf and dumb people, however, the means they use for communicating is different from that of normal people: a deaf person is not able to hear and a dumb person is not able to speak.
They communicate using sign language among themselves and with normal people, but normal people often do not take the importance of sign languages seriously. Not everyone has knowledge and understanding of sign language, which makes communication between a normal person and a deaf and dumb person difficult. To overcome this barrier, a model can be built based on machine learning.
A model can be trained to recognize different gestures of sign language and translate them into the English language. This will help a lot of people communicate with deaf and dumb people with ease. In this paper, a real-time ML-based system is built for real-time sign language detection using TensorFlow object detection. The major purpose of this project is to build a system that lets differently abled people communicate with others easily and efficiently.
Keywords: Object detection, Sign language, Deep learning, SSD (Single Shot Detector), TensorFlow
I. INTRODUCTION
Communication can be defined as the process of transferring information from one place, person, or group to another place, person,
or group. It consists of three components: the speaker, the message that is to be communicated, and the listener. Communication can
be considered successful only when whatever message the speaker is trying to convey is received and understood by the listener.
In this paper, a real-time ML-based system is built for real-time sign language detection using TensorFlow object detection. The model uses the SSD ML algorithm and recognizes signs as whole words, unlike old traditional translators, which are very slow and take too much time because every alphabet has to be recognized individually to form the whole sentence.
The TensorFlow Object Detection API is a powerful library that enables anyone to build and deploy powerful image recognition systems. Object detection involves recognizing objects, classifying them, localizing them, and drawing the bounding boxes that surround them.
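As a concrete illustration, the snippet below is a minimal sketch of how an exported TensorFlow Object Detection model can be loaded and used to draw bounding boxes on a frame. The model path, class names and score threshold are assumptions for illustration, not values taken from this paper.

```python
import cv2
import tensorflow as tf

# Assumed paths and labels -- placeholders, not the exact ones used in this paper.
MODEL_DIR = "exported_model/saved_model"
LABELS = {1: "hello", 2: "thanks", 3: "iloveyou", 4: "how", 5: "no"}

# Load the exported detection model (TF2 Object Detection API SavedModel format).
detect_fn = tf.saved_model.load(MODEL_DIR)

def detect(frame):
    """Run the detector on one BGR frame and return boxes, classes and scores."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_tensor = tf.convert_to_tensor(rgb)[tf.newaxis, ...]   # add batch dim
    detections = detect_fn(input_tensor)
    boxes = detections["detection_boxes"][0].numpy()
    classes = detections["detection_classes"][0].numpy().astype(int)
    scores = detections["detection_scores"][0].numpy()
    return boxes, classes, scores

def draw(frame, boxes, classes, scores, threshold=0.5):
    """Draw a labelled bounding box for every detection above the threshold."""
    h, w, _ = frame.shape
    for box, cls, score in zip(boxes, classes, scores):
        if score < threshold:
            continue
        ymin, xmin, ymax, xmax = box                            # normalized coords
        p1, p2 = (int(xmin * w), int(ymin * h)), (int(xmax * w), int(ymax * h))
        cv2.rectangle(frame, p1, p2, (0, 255, 0), 2)
        cv2.putText(frame, f"{LABELS.get(cls, cls)}: {score:.2f}",
                    (p1[0], p1[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```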
TensorFlow has attracted broad attention and use in the field of machine learning globally because it is the second-generation machine learning system of Google. TensorFlow has the advantages of flexibility and high availability. Deep learning is the base of the object detection algorithm, which is also very convenient to implement through TensorFlow, and the hardware requirements are reasonable, which makes it very suitable for the research in this paper.
This research paper focuses on a real-time ML-based system built for real-time sign language detection with TensorFlow object detection. The model uses the SSD ML algorithm and recognizes signs as whole words instead of relying on old traditional translators, which are very slow and take too much time since every alphabet has to be recognized to form the whole sentence.
4) In 2012, Krizhevsky released AlexNet, which used deep learning in various aspects for object detection.
5) Girshick et al. showed the advantages of neural networks for detecting objects with a model named R-CNN. However, it needed huge datasets, and the datasets available at that time were very small. They pre-trained the system on the ImageNet 2012 dataset, which solved the dataset-scarcity problem. In 2015, Girshick proposed a faster object detection algorithm called Fast R-CNN, in which the image is first fed to a single CNN with many convolutional layers that generates a convolutional feature map. The advantage of this was that the entire image is trained with only one CNN, rather than training multiple CNNs for every region of the image.
6) The SSD model was first adopted for hand detection with a proposed model based on the IsoGD dataset, which gave an accuracy of 4.25%. Real-time object detection on test images came into view in 2016, when two algorithms named YOLO and SSD were proposed. YOLO uses a CNN to reduce the spatial dimension of the detection boxes and performs linear regression to make bounding-box predictions, while in SSD the sizes of the detection boxes are fixed and are used for detection at several scales simultaneously. SSD's advantage is the simultaneous detection of objects of different sizes.
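To make the "fixed boxes at several scales" idea concrete, the following sketch generates simplified SSD-style default (anchor) boxes tiled over a few feature maps. The feature-map sizes, scales and aspect ratios here are illustrative defaults, not the configuration used in this paper.

```python
import itertools
import numpy as np

def ssd_default_boxes(feature_map_sizes=(38, 19, 10), scales=(0.2, 0.4, 0.6),
                      aspect_ratios=(1.0, 2.0, 0.5)):
    """Tile fixed-size default boxes over several feature maps so that objects
    of different sizes can be detected simultaneously (simplified SSD scheme)."""
    boxes = []
    for fmap, scale in zip(feature_map_sizes, scales):
        for i, j in itertools.product(range(fmap), repeat=2):
            cx, cy = (j + 0.5) / fmap, (i + 0.5) / fmap    # box centre in [0, 1]
            for ar in aspect_ratios:
                w, h = scale * np.sqrt(ar), scale / np.sqrt(ar)
                boxes.append([cx, cy, w, h])
    return np.array(boxes)                                 # shape: (num_boxes, 4)

print(ssd_default_boxes().shape)                           # e.g. (5715, 4)
```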
System Architecture
For data acquisition, images were captured with the webcam of a laptop using Python and OpenCV. OpenCV has functions that are primarily aimed at real-time computer vision. It accelerates the use of machine perception in commercial products and provides a common infrastructure for computer vision based applications. The OpenCV library has more than 2400 efficient computer vision and machine learning algorithms. These algorithms can be used for face detection and recognition, object identification, classification of human actions, tracking of cameras and object movements, and many more. Once all of the images have been captured by the webcam, they are labelled one by one using the LabelImg software. LabelImg is a free, open-source tool for graphically labelling images. When a labelled image is saved, its XML file is created. These XML files contain all the details of the image as well as the details of the labelled portion. Once labelling of all the images is done, their XML files are available and are used for creating the TensorFlow records (TF records). All the images and their XML files are then divided into training data and validation data in the ratio of 80:20; a sketch of this capture-and-split step is given below.
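The snippet below is a minimal sketch of this pipeline under stated assumptions: the folder layout, class names and image counts are placeholders, and labelling with LabelImg happens between the capture and split steps.

```python
import os
import random
import shutil
import time
import uuid

import cv2

# Illustrative class names and paths; the actual labels and folders are assumptions.
LABELS = ["hello", "thanks", "iloveyou", "how", "no"]
IMAGES_PER_LABEL = 15
BASE_DIR = "workspace/images"

def collect_images():
    """Capture images for each gesture from the laptop webcam with OpenCV."""
    cap = cv2.VideoCapture(0)                      # 0 = default webcam
    for label in LABELS:
        out_dir = os.path.join(BASE_DIR, "collected", label)
        os.makedirs(out_dir, exist_ok=True)
        print(f"Collecting images for '{label}' ...")
        time.sleep(3)                              # time to get the gesture ready
        for _ in range(IMAGES_PER_LABEL):
            ok, frame = cap.read()
            if not ok:
                continue
            cv2.imwrite(os.path.join(out_dir, f"{label}.{uuid.uuid4()}.jpg"), frame)
            time.sleep(1)                          # short pause between shots
    cap.release()

def split_dataset(train_ratio=0.8):
    """Split images (and their LabelImg XML files) 80:20 into train/validation folders."""
    for label in LABELS:
        src = os.path.join(BASE_DIR, "collected", label)
        images = [f for f in os.listdir(src) if f.endswith(".jpg")]
        random.shuffle(images)
        cut = int(len(images) * train_ratio)
        for subset, files in (("train", images[:cut]), ("validation", images[cut:])):
            dst = os.path.join(BASE_DIR, subset)
            os.makedirs(dst, exist_ok=True)
            for img in files:
                shutil.copy(os.path.join(src, img), dst)
                xml = img.rsplit(".", 1)[0] + ".xml"   # matching LabelImg annotation
                if os.path.exists(os.path.join(src, xml)):
                    shutil.copy(os.path.join(src, xml), dst)

if __name__ == "__main__":
    collect_images()
    # Label the collected images with LabelImg, then:
    split_dataset()
```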
The data samples are collected for 4 words and one sentence, and were recorded by us using a digital camera. The five classes are listed below (a label-map sketch follows the list).
1) Hello
2) Thanks
3) I Love you
4) How
5) No.
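The TF record generation step of the TensorFlow Object Detection API also needs a label map that assigns an integer id to each of these classes. A minimal, assumed way to write one is sketched below; the file name label_map.pbtxt and the class spellings are illustrative.

```python
# Write a minimal label map for the five gesture classes above.
# File name and class spellings are assumptions for illustration.
labels = ["hello", "thanks", "iloveyou", "how", "no"]
with open("label_map.pbtxt", "w") as f:
    for idx, name in enumerate(labels, start=1):
        f.write(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
```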
V. FUTURE SCOPE
In the future, the dataset that has been used can be enlarged so that the system can recognize more gestures. The TensorFlow model that has been used can be replaced with another model as well. The same system can be implemented for different sign languages by substituting the dataset.
VI. ACKNOWLEDGEMENT
We express our deepest gratitude and heartfelt thanks to our mentor, Dr. Lokesh Jain (Information Technology Department), for his expert guidance, constant encouragement, constructive criticism, and inspiring advice throughout the completion of this report.
REFERENCES
[1] M. Van den Bergh and L. Van Gool, "Combining RGB and ToF cameras for real-time 3D hand gesture interaction," 2011 IEEE Workshop on Applications of Computer Vision (WACV), 2011, pp. 66-72.
[2] J. R. Balbin et al., "Sign language word translator using Neural Networks for the Aurally Impaired as a tool for communication," 2016 6th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2016, pp. 425-429.
[3] A. Raghunandan, P. Raghav, and H. V. Ravish Aradhya, "Object Detection Algorithms for video surveillance applications," 2018 International Conference on Communication and Signal Processing (ICCSP), IEEE, 2018, pp. 0563-0568.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
[6] W. Wu, D. Dasgupta, E. Ramirez, et al., "Classification accuracies of physical activities using smartphone motion sensors," Journal of Medical Internet Research, 2012, 14(5): e130.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," International Conference on Neural Information Processing Systems, Curran Associates Inc., 2012, pp. 1097-1105.
[8] D. Mart, "Sign Language Translator Using Microsoft Kinect XBOX 360 TM," 2012, pp. 1-76.