CHAPTER 1
INTRODUCTION
Sign language serves as a vital means of communication for individuals with hearing and speech
impairments, allowing them to express thoughts, emotions, and ideas through hand gestures,
facial expressions, and body movements. Unlike spoken languages, sign language relies on
visual cues, making it unique in its grammar, structure, and execution. Each gesture or
movement conveys specific meanings, ranging from simple words to complex sentences.
However, the lack of universal understanding of sign language has created a communication
divide, often leading to social isolation and limited access to essential services for its users.
Despite its cultural and linguistic richness, sign language remains underappreciated in many
societies. The absence of widespread adoption or comprehension, especially among non-signing
individuals, presents significant barriers to inclusion. This gap in understanding highlights an
urgent need for technological innovations that can bridge this divide and enable seamless
communication between sign language users and non-users.
Sign language detection using deep learning represents a groundbreaking approach to bridging
communication gaps for individuals with hearing and speech impairments. This innovative
system leverages advanced machine learning techniques to recognize and translate sign language
gestures into spoken or written formats, fostering accessibility and inclusivity. The technology
focuses on the real-time interpretation of gestures, enabling seamless communication between
sign language users and non-users.
Key Components
1. Gesture Recognition:
Deep learning models, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, are utilized for recognizing both static and dynamic gestures. CNNs excel at extracting spatial features from gesture images, while LSTMs handle temporal sequences, enabling the system to interpret continuous gestures effectively (a brief model sketch follows this list).
2. Custom Dataset Creation:
The system often relies on custom-built datasets, tailored to capture the unique nuances of specific sign languages such as Indian Sign Language (ISL). These datasets include diverse gestures performed in various contexts to ensure robust model training and accurate predictions.
3. User-Friendly Interface:
The system is designed with a simple and intuitive interface, allowing users to input
gestures through a webcam or upload images. The recognized gestures are displayed in
text format, accompanied by audio feedback for better accessibility.
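As a rough illustration of how these two model families can be combined, the minimal Keras sketch below wraps a small CNN frame encoder in a TimeDistributed layer and feeds the per-frame features to an LSTM; the input size, sequence length, and class count are assumed values, not figures from this report.

```python
# Minimal sketch: CNN for per-frame spatial features, LSTM for temporal dynamics.
# Input size, sequence length, and number of classes are illustrative assumptions.
from tensorflow.keras import layers, models

SEQ_LEN, IMG_SIZE, NUM_CLASSES = 30, 64, 10  # assumed values

# Small CNN applied to each frame of a gesture clip
frame_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
])

model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(SEQ_LEN, IMG_SIZE, IMG_SIZE, 3)),
    layers.LSTM(64),                                   # temporal modelling of frame features
    layers.Dense(NUM_CLASSES, activation="softmax"),   # one score per gesture class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```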
Applications
The application of deep learning in sign language detection has immense potential across various fields.
CHAPTER 2
LITERATURE SURVEY
The surveyed papers are summarized below (title, year, methods, advantages, limitations):

[1] Gesture Recognition in Indian Sign Language Using Deep Learning Approach (2024)
    Methods: MediaPipe for preprocessing; LSTM for sequential gesture recognition.
    Advantages: Real-time recognition; achieved 91% accuracy for eight gestures.
    Limitations: Small gesture vocabulary (8 gestures); reliance on MediaPipe, which struggles under occlusion or in challenging environments.

[2] Indian Sign Language Recognition for Hearing Impaired: A Deep Learning Based Approach (2024)
    Methods: CNN for ISL alphabets, trained on a self-created dataset with 1300 images per alphabet.
    Advantages: High accuracy for static gesture detection; computational efficiency.
    Limitations: Limited to alphabet recognition; restricted dataset diversity; CNN less effective for dynamic gestures.

[3] Deep Learning Based Real-Time Indian Sign Language Recognition (2023)
    Methods: MobileNet for static gestures; CNN-LSTM for dynamic gestures.
    Advantages: High accuracy for both static (90.3%) and dynamic (87%) gestures; real-time recognition capability.
    Limitations: Limited gesture vocabulary; inability to recognize overlapping hand gestures.

[4] An Efficient Real-Time Indian Sign Language (ISL) Detection Using Deep Learning (2023)
    Methods: CNN, MediaPipe, and OpenCV for gesture detection.
    Advantages: High accuracy for 35 ISL gestures; integration with accessibility features like text-to-speech.
    Limitations: Limited vocabulary (26 alphabets, 9 numerals); challenges with varying lighting conditions and overlapping gestures.
A key drawback of existing systems is their limited vocabulary. Most systems focus on predefined datasets with alphabets or numerals, neglecting more complex or conversational gestures. Additionally, hardware-based solutions using wearable sensors or gloves offer high precision but are impractical for widespread use due to cost and inconvenience. While vision-based approaches are more accessible, they often rely on controlled environments for optimal performance. Accessibility features such as text-to-speech or audio feedback are integrated into only a few systems, further restricting their usability for non-signers. These limitations highlight the need for a more robust system capable of recognizing a broader range of gestures, handling real-world conditions, and providing intuitive interfaces for communication.
Disadvantages:
1. Environmental Sensitivity:
Existing systems often misclassify gestures due to poor lighting, cluttered backgrounds, or overlapping gestures, leading to false positives and negatives.
2. Scalability Issues:
Most systems are designed for specific datasets with limited vocabularies, such as alphabets or numerals. They lack the scalability to recognize a diverse range of gestures or expand to complex vocabularies.
3. Dataset Limitations:
The datasets used for training and testing are often small, biased, or lacking diversity in terms of gesture types, signers, and environmental variations.
4. Gesture Ambiguity:
Gestures with similar hand movements or overlapping hands are hard to differentiate, leading to misclassification. Additionally, fast-moving gestures are not captured accurately.
5. Lack of Robustness:
Most systems are not robust enough to handle occlusions, such as when a part of the hand
is blocked, or to function well in crowded or noisy backgrounds.
6. Accessibility Features:
Existing systems rarely include user-friendly accessibility features like real-time text-to-
speech or audio feedback, limiting their utility for broader audiences.
The proposed system aims to address the gaps in existing ISL recognition solutions by leveraging deep learning technologies like CNNs and advanced computer vision tools such as OpenCV and MediaPipe. Unlike many existing systems, this project prioritizes static gesture recognition, enabling users to communicate using natural, real-time sign language gestures.
A unique feature of this system is its reliance on a custom dataset, curated specifically to
address the nuances of ISL. This dataset will include alphabets, numerals, and commonly used
words or phrases to ensure broader applicability. Unlike other approaches that depend on generic
datasets, this customization improves recognition accuracy and relevance. The incorporation of
preprocessing techniques, such as background subtraction, normalization, and augmentation,
further enhances the robustness of the model.
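As a minimal sketch of the kind of preprocessing and augmentation described above, the snippet below combines OpenCV background subtraction with Keras' ImageDataGenerator; all parameter values are illustrative assumptions rather than settings taken from this report.

```python
# Sketch of preprocessing + augmentation; all values are illustrative assumptions.
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess(frame, bg_subtractor, size=(64, 64)):
    """Background subtraction + resize + normalization for one frame."""
    fg_mask = bg_subtractor.apply(frame)              # foreground (hand) mask
    hand = cv2.bitwise_and(frame, frame, mask=fg_mask)
    hand = cv2.resize(hand, size)
    return hand.astype("float32") / 255.0             # normalize pixels to [0, 1]

bg_subtractor = cv2.createBackgroundSubtractorMOG2()

# Augmentation during training: small rotations, shifts, and zooms
augmenter = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
)
```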
The system architecture integrates MediaPipe for real-time hand tracking and gesture
segmentation. MediaPipe's ability to detect hand landmarks ensures precise gesture
identification, while OpenCV aids in feature extraction through contour and motion analysis. The
CNN serves as the backbone for classifying gestures with high accuracy, and LSTM layers may
be added to handle temporal dynamics in videos.
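A minimal sketch of the MediaPipe/OpenCV portion of this architecture is shown below, assuming a webcam at index 0; the 21 normalized landmark coordinates it extracts are the kind of features that would be passed on to the classifier.

```python
# Sketch: real-time hand landmark detection with MediaPipe + OpenCV (webcam index assumed).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0]
            # 21 normalized (x, y, z) landmarks per detected hand
            coords = [(p.x, p.y, p.z) for p in lm.landmark]
            print(coords[0])  # e.g. the wrist landmark
        cv2.imshow("hand", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```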
The system also includes accessibility features such as text-to-speech and visual feedback,
making it highly user-friendly. By integrating these features, the proposed system not only
facilitates communication for the deaf and hard-of-hearing but also creates a bridge for
interaction with non-signers in various settings.
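The report does not name a particular text-to-speech library; as one hypothetical option, the snippet below uses pyttsx3, an offline TTS engine, to speak a recognized label aloud.

```python
# Hypothetical text-to-speech feedback using pyttsx3 (library choice is an assumption).
import pyttsx3

def speak(label: str) -> None:
    """Speak the recognized gesture label aloud."""
    engine = pyttsx3.init()
    engine.say(label)
    engine.runAndWait()

speak("Hello")  # e.g. after the classifier predicts the "Hello" gesture
```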
1. User-Friendly Interface:
o The system employs Streamlit to provide an intuitive and accessible graphical
interface for interacting with the modules.
o Users can navigate between two main functionalities: face database management
and static sign language recognition.
4. Gesture Classification:
o A pre-trained deep learning model (sign_language_model.h5) is used for predicting static hand gestures.
o Supports predefined classes such as "Hello," "Thank You," "Yes," "No," and more (a combined code sketch follows this list).
7. Technology Stack:
o Combines OpenCV for image processing, TensorFlow/Keras for deep learning-
based gesture classification, and Streamlit for the front-end interface.
o Utilizes the CVZone library for enhanced hand detection and manipulation.
8. Efficient Workflow:
o Processes images end-to-end, from user input to hand detection, preprocessing,
and final gesture classification.
o Ensures seamless interaction between modules to deliver accurate results in real
time.
9. Practical Application:
o Aimed at providing an accessible communication tool for individuals with hearing
or speech impairments.
o Can serve as a foundation for more advanced assistive technologies.
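A condensed sketch of how these pieces could fit together is shown below: cvzone's HandDetector crops the hand, the pre-trained sign_language_model.h5 classifies it, and Streamlit displays the result. The model's input size (64×64) and the exact class ordering are assumptions for illustration.

```python
# Sketch combining cvzone hand detection, the Keras model, and Streamlit display.
# Model input size and class ordering are assumptions for illustration.
import cv2
import numpy as np
import streamlit as st
import tensorflow as tf
from cvzone.HandTrackingModule import HandDetector

CLASSES = ["Hello", "Thank You", "Yes", "No"]  # assumed ordering
model = tf.keras.models.load_model("sign_language_model.h5")
detector = HandDetector(maxHands=1)

uploaded = st.file_uploader("Upload a gesture image", type=["jpg", "png"])
if uploaded is not None:
    img = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
    hands, img = detector.findHands(img)            # detect and draw the hand
    if hands:
        x, y, w, h = hands[0]["bbox"]
        crop = img[max(0, y):y + h, max(0, x):x + w]
        crop = cv2.resize(crop, (64, 64)).astype("float32") / 255.0
        pred = model.predict(crop[np.newaxis, ...])
        st.write(f"Predicted gesture: {CLASSES[int(np.argmax(pred))]}")
    st.image(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
```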
Sign language is a manual form of communication commonly used by deaf and mute people. It is not a universal language: deaf/mute people from different regions use different sign languages. This project therefore aims to improve communication between deaf/mute people from different areas and those who cannot understand sign language. We use deep learning methods that can improve the classification accuracy of sign language gestures.
2.4 Objectives
3. Accessibility Features:
o Integrate features such as text-to-speech, audio feedback, and visual outputs to
enhance usability for both signers and non-signers.
o Design an intuitive user interface that facilitates interaction through webcam
inputs and displays results in real-time.
CHAPTER 3
SYSTEM REQUIREMENTS
1. Hardware Requirements
2. Software Requirements
3. Functional Requirements
4. Non-Functional Requirements
Performance:
o Process images within 3 seconds; real-time gesture detection within 1 second.
o Support at least 5 gestures per second for live inputs.
Reliability:
o 99% uptime and fault-tolerant to minor hardware/software issues.
Usability:
o Intuitive GUI, accessible for disabled users, and supports multiple languages.
Security:
o Encrypt data transmission, secure image storage, and user authentication.
Maintainability:
o Modular code, version-controlled dependencies, and comprehensive
documentation.
Portability:
CHAPTER 4
SYSTEM DESIGN
Glove- or sensor-based methods offer high precision but are expensive and not user friendly. In vision-based methods, the computer webcam is the input device for observing information from the hands and/or fingers. Vision-based methods require only a camera, enabling natural interaction between humans and computers without any extra devices and thereby reducing costs. The main challenge of vision-based hand detection ranges from coping with the large variability of the human hand's appearance, due to the huge number of possible hand movements and different skin colours, to variations in the viewpoint, scale, and speed of the camera capturing the scene.
Figure 4.2: Stop gesture
In this approach to hand detection, we first detect the hand in the image acquired by the webcam, using the MediaPipe library for image processing. After finding the hand in the image, we obtain the region of interest (ROI), crop it, and convert the cropped image to grayscale using the OpenCV library. We then apply a Gaussian blur; this filter can be applied easily with OpenCV. Finally, we convert the grayscale image to a binary image using threshold and adaptive threshold methods.
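A short sketch of this thresholding pipeline, assuming the hand ROI has already been cropped (the kernel size and threshold parameters are typical illustrative values, not ones specified in this report):

```python
# Sketch: grayscale -> Gaussian blur -> binary / adaptive threshold on a cropped hand ROI.
import cv2

def binarize_roi(roi_bgr):
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # smooth out noise
    # Fixed threshold (Otsu picks the level automatically)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Adaptive threshold copes better with uneven lighting
    adaptive = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)
    return binary, adaptive
```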
This method has several loopholes: the hand must be in front of a clean, plain background and under proper lighting conditions for it to give accurate results, but in the real world we rarely get a good background or good lighting everywhere.
To overcome this, we tried different approaches and arrived at an interesting solution: we first detect the hand in the frame using MediaPipe, obtain the landmarks of the hand present in that image, and then draw and connect those landmark points on a plain white image using the OpenCV library. By doing this we tackle the background and lighting problems, because MediaPipe returns landmark points on almost any background and under most lighting conditions.
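This landmark-on-white-canvas idea can be sketched as follows: MediaPipe supplies the 21 hand landmarks, and the MediaPipe drawing utilities render them onto a plain white image so that the classifier never sees the original background. The canvas size is an assumed value.

```python
# Sketch: draw MediaPipe hand landmarks on a plain white canvas (canvas size assumed).
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def landmarks_on_white(frame_bgr, canvas_size=400):
    canvas = np.full((canvas_size, canvas_size, 3), 255, dtype=np.uint8)  # white image
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Draw the landmark points and their connections on the white canvas
        mp_draw.draw_landmarks(canvas, results.multi_hand_landmarks[0],
                               mp_hands.HAND_CONNECTIONS)
    return canvas
```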
The neurons in a layer are connected only to a small region (the window size) of the layer before it, instead of to all of the neurons in a fully connected manner. Moreover, the final output layer has dimensions equal to the number of classes, because by the end of the CNN architecture we reduce the full image into a single vector of class scores.
1. Convolutional Layer:
In the convolution layer we take a small window (typically 5×5) that extends through the depth of the input matrix. The layer consists of learnable filters of this window size. At every step we slide the window by the stride size (typically 1) and compute the dot product of the filter entries and the input values at that position. As we continue this process, we create a 2-dimensional activation map that gives the response of that filter at every spatial position. That is, the network learns filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch of some colour.
2. Pooling Layer:
We use the pooling layer to decrease the size of the activation matrix and ultimately reduce the number of learnable parameters.
a. Max Pooling:
In max pooling we take a window (for example, of size 2×2) and keep only the maximum of the 4 values. We slide this window across the activation matrix and continue this process, finally obtaining an activation matrix half its original size.
b. Average Pooling:
In average pooling we take the average of all values in the window.
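In Keras these two layer types correspond to Conv2D and the MaxPooling2D / AveragePooling2D layers. The minimal sketch below uses the 5×5 window, stride of 1, and 2×2 pooling described above; the input size and number of classes are assumed for illustration.

```python
# Sketch: convolution (5x5 window, stride 1) followed by 2x2 max and average pooling.
# Input size and number of classes are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (5, 5), strides=1, activation="relu",
                  input_shape=(64, 64, 1)),        # learnable 5x5 filters
    layers.MaxPooling2D(pool_size=(2, 2)),         # keep the max of each 2x2 window
    layers.Conv2D(64, (5, 5), strides=1, activation="relu"),
    layers.AveragePooling2D(pool_size=(2, 2)),     # average of each 2x2 window
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),        # one score per class
])
model.summary()
```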
CHAPTER 5
IMPLEMENTATION
Step 1: Project Overview
Detecting Signs:
Step 6: Testing
1. Face Database:
o Test by adding images with various names to ensure proper storage and retrieval.
2. Sign Language Detection:
o Test with diverse images of hand gestures to validate accuracy.
o Ensure the hand detector identifies and crops hands correctly.
Step 7: Deployment
CHAPTER 6
TESTING
1. Unit Testing
Objective:
Tools:
Steps:
4. Expected Outcomes:
o Each function behaves correctly with valid inputs.
o Graceful error handling for invalid inputs
2. Integration Testing
Objective:
Tools:
Steps:
4. Expected Outcomes:
o Modules interact seamlessly and produce correct outputs.
o Errors are communicated effectively to the user.
3. User Acceptance Testing
Tools:
Steps:
3. Collect Feedback:
o Record user responses on the application’s accuracy, ease of navigation, and any
challenges faced.
o Note areas for improvement, such as unclear messages or slow performance.
5. Expected Outcomes:
o Users find the application intuitive and efficient.
o Feedback highlights specific improvements for future iterations.
4. Automated Testing
Objective:
Automate repetitive test scenarios to ensure consistent validation and reduce manual
effort.
Quickly identify regressions or issues after updates.
Tools:
Steps:
4. Expected Outcomes:
o Automated tests reliably detect issues in the system.
o Manual effort is reduced significantly, and consistent results are ensured.
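The report does not name specific test tools; as one possibility, a pytest-style automated check of the preprocessing step might look like the hypothetical sketch below (the binarize_roi helper and the preprocessing module path are assumptions carried over from the Chapter 4 sketch).

```python
# Hypothetical pytest-style check for the preprocessing step; the module path
# and function name (binarize_roi) are assumptions, not part of this report.
import numpy as np
from preprocessing import binarize_roi  # assumed module layout

def test_binarize_roi_returns_binary_images():
    roi = np.random.randint(0, 256, (120, 120, 3), dtype=np.uint8)  # synthetic hand crop
    binary, adaptive = binarize_roi(roi)
    assert binary.shape == (120, 120)
    assert adaptive.shape == (120, 120)
    assert set(np.unique(binary)).issubset({0, 255})  # output is strictly black/white
```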
Results
OUTPUTS
CONCLUSION
The project's innovative use of Streamlit for interface design ensures a seamless user experience.
It lays the foundation for future developments in gesture-based applications, fostering inclusivity
and enhancing communication in a diverse world.
FUTURE WORK
REFERENCES
[1] Vashisth, H.K., Tarafder, T., Aziz, R. and Arora, M., 2024. Hand Gesture Recognition in Indian Sign Language Using Deep Learning. Engineering Proceedings, 59(1), p.96.
[2] Kolikipogu, R., Mammai, S., Nisha, K., Krishna, T.S., Kuchipudi, R. and Sureddi, R.K.,
2024, March. Indian Sign Language Recognition for Hearing Impaired: A Deep Learning based
approach. In 2024 3rd International Conference for Innovation in Technology (INOCON) (pp. 1-
7). IEEE.
[3] Likhar, P., Bhagat, N.K. and Rathna, G.N., 2020, November. Deep learning methods for Indian sign language recognition. In 2020 IEEE 10th International Conference on Consumer Electronics (ICCE-Berlin) (pp. 1-6). IEEE.
[4] Surya, B., Krishna, N.S., SankarReddy, A.S., Prudhvi, B.V., Neeraj, P. and Deepthi, V.H.,
2023, May. An Efficient Real-Time Indian Sign Language (ISL) Detection using Deep Learning.
In 2023 7th International Conference on Intelligent Computing and Control Systems
(ICICCS) (pp. 430-435). IEEE.
[5] Wadhawan, A. and Kumar, P., 2020. Deep learning-based sign language recognition system
for static signs. Neural computing and applications, 32(12), pp.7957-7968.
[6] Tolentino, L.K.S., Juan, R.S., Thio-ac, A.C., Pamahoy, M.A.B., Forteza, J.R.R. and Garcia,
X.J.O., 2019. Static sign language recognition using deep learning. International Journal of
Machine Learning and Computing, 9(6), pp.821-827.
[7] Al-Qurishi, M., Khalid, T. and Souissi, R., 2021. Deep learning for sign language
recognition: Current techniques, benchmarks, and open issues. IEEE Access, 9, pp.126917-
126951.
[8] Bauer, B. and Hienz, H., 2000, March. Relevant features for video-based continuous sign
language recognition. In Proceedings Fourth IEEE International Conference on Automatic Face
and Gesture Recognition (Cat. No. PR00580) (pp. 440-445). IEEE.
[9] Cui, R., Liu, H. and Zhang, C., 2019. A deep neural framework for continuous sign language
recognition by iterative training. IEEE Transactions on Multimedia, 21(7), pp.1880-1891.
[10] Richards, T., 2021. Getting Started with Streamlit for Data Science: Create and deploy
Streamlit web applications from scratch in Python. Packt Publishing Ltd.