
NATIONAL COLLEGE OF

ENGINEERING
(Affiliated to Tribhuvan University)
Talchhikhel, Lalitpur

A MINOR PROJECT FINAL REPORT ON

“English To Sign Language Translation Using Porter Stemming Algorithm”

Submitted by:

Ranjit Adhikari NCE078BCT031

Sarishma Neupane NCE078BCT038


Sujit Adhikari NCE078BCT044
Prayash Niraula NCE078BCT029

Submitted to
Department of Computer and Electronics Engineering,

Fagun, 2081
Letter of Approval
The undersigned certify that they have read, and recommended to the Institute of Engineering for
acceptance a project entitled “English to Sign Language Translation using Porter Stemming
Algorithm” submitted by Prayash Niraula, Ranjit Adhikari, Sarishma Neupane and Sujit Adhikari in
partial fulfillment of the requirements for the Bachelor’s Degree in Computer and Electronics
Engineering.

Er. Subash Panday
Project Supervisor/DHOD
Department of Computer and Electronics Engineering
National College of Engineering

Er. Suroj Burlakoti
HOD/Senior lecturer
Department of Computer and Electronics Engineering
National College of Engineering

External Examiner:
Er.
Senior lecturer
Department of Computer and Electronics Engineering
National College of Engineering

DATE OF APPROVAL: Chaitra, 2081


COPYRIGHT

The authors have agreed that the Library, Department of Computer and Electronics Engineering,
National College of Engineering may make this report freely available for inspection. Moreover, the
authors have agreed that permission for extensive copying of this project report for scholarly
purposes may be granted by the supervisors who supervised the project work recorded herein or, in
their absence, by the Head of the Department wherein the project report was done. It is understood that
recognition will be given to the authors of the project and to the Department of Computer and Electronics
Engineering, National College of Engineering, Institute of Engineering in any use of the material of
this report. Copying, publication, or any other use of this report for financial gain without the
approval of the Department of Computer and Electronics Engineering, Institute of Engineering,
National College of Engineering and the authors' written permission is strictly prohibited.

Request for permission to copy or to make any other use of the material in this report in whole or in
part should be addressed to:

Head
Department of Computer and Electronics Engineering,
Institute of Engineering, National College of Engineering,
Lalitpur, Nepal
ACKNOWLEDGEMENT
This project is prepared in partial fulfillment of the requirement for the Bachelor’s degree in
Computer Engineering. Foremost, we would like to express our heartfelt thanks to Er. Subash
Panday, who was not only our supervisor but also our mentor and guide throughout this journey.
His enduring support, enlightening lectures, and motivating words have been pillars of this project.
His supervision has been indispensable, and for that, we are immensely grateful.

Our appreciation also extends to the Department of Computer and Electronics Engineering at the
National College of Engineering. We are thankful for the opportunity to engage in this collaborative
effort, which allowed us to apply our accumulated knowledge, work on an intensive project in our
final year, and gain invaluable teamwork experience.

We must acknowledge our friends, whose support was crucial, both directly and indirectly, to the
success of this project. Their involvement and insights were key to overcoming many challenges.

Finally, we are deeply grateful to our families, whose unwavering support and inspiration have been
our constant motivation. Their belief in our abilities has been a driving force behind our endeavors.

We are open to and would greatly appreciate any suggestions or criticisms that could help us
improve further.

Author
ABSTRACT
American Sign Language (ASL) plays a vital role in communication for people with hearing disabilities
across the world. Since English is spoken all around the world, it is beneficial for people with hearing
disabilities to be able to communicate through American Sign Language. This visual language relies on
hand gestures and body postures. Recognizing that people who are deaf or hard of hearing often face
challenges in understanding spoken language, this system takes English text as input and shows the
corresponding sign language as animation created with Blender. The project focuses on creating a
comprehensive dataset of American Sign Language gestures and on studying NLP techniques that can be
implemented in machine learning models. It is based on an ASL dataset in which special attention is
given to the alphabet and which also contains 91 words and 10 numbers whose corresponding sign language
gestures are displayed; any other word is displayed letter by letter. The project employs NLP techniques
and the Porter Stemming Algorithm (PSA), which helps remove unnecessary words such as “is”, “am”, and
“are”. The project achieves an accuracy of 94.51%. It can be deployed in settings such as groceries and
factories, where people can communicate easily with people with hearing impairments. The project can be
enhanced with augmented and virtual reality for a better user experience.
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 BACKGROUND
1.2 PROBLEM STATEMENT
1.3 AIM AND OBJECTIVE
1.4 SCOPE
2. LITERATURE REVIEW
3. METHODOLOGY
3.1. SYSTEM BLOCK DIAGRAM
DATASET WORDS
4. ALGORITHM TO BE USED
6. TOOLS AND TECHNOLOGY TO BE USED
7. EPILOGUE
EXPECTED OUTPUT
7.1. VALIDATION METHODS
8. GANTT CHART
REFERENCES
LIST OF FIGURES

Figure 1: System block diagram
Figure 2: Porter Stemming Algorithm
Figure 3: Gantt chart
LIST OF ABBREVIATIONS

AI Artificial Intelligence

CNN Convolutional Neural Network

GHz Gigahertz

GPU Graphics Processing Unit

NLTK Natural Language Tool Kit

NLP Natural Language Processing

PC Personal Computer

POS Part-of-Speech

PSA Porter Stemming Algorithm


API Application Programming Interface
ML Machine Learning
IDE Integrated Development Environment
ORM Object-Relational Mapping
CSV Comma-Separated Values
HTML HyperText Markup Language
FPS Frames Per Second
CSS Cascading Style Sheets
LSTM Long Short-Term Memory
1 INTRODUCTION

1.1 BACKGROUND

Speech-to-text recognition is a pivotal technology in the field of human-computer interaction, enabling the transformation of spoken language into written text. This
technology has broad applications, including accessibility services, real-time
communication, and user interface design. Concurrently, the conversion of text into
corresponding gestures represents an advancement in assistive technologies, particularly
benefiting those with hearing impairments by translating written or spoken inputs into
sign language. These technologies leverage advanced natural language processing
algorithms and machine learning models to accurately interpret and convert language,
enhancing communication options for diverse user groups. Recent developments in deep
learning have significantly improved the accuracy and speed of these applications,
making real-time performance increasingly feasible.

Individuals who are hard of hearing face significant communication barriers, as spoken
languages are not accessible to them. For people with hearing impairments, sign language
provides a way of engaging with society. However, existing “Sign Language Translators” have
shortcomings that limit their effectiveness. To address these challenges, this project aims to
advance the field of AI-based sign language translation and transform voice or English text
inputs into sign language animation.

A key component of this project is the development of a comprehensive dataset of sign
language gestures. This dataset will be essential for training and evaluating AI models. It
will feature annotated sign language, covering a broad spectrum of gestures and
expressions. This thorough curation will allow the models to learn and accurately
recognize sign language patterns with a high degree of precision.[1]
The proposed project will employ Natural Language Processing (NLP) to interpret text, while voice
input will be handled through a speech recognition API. For quicker inference, the project can make use
of a Graphics Processing Unit (GPU). The project will have a dataset containing words, letters and
numbers along with their sign language gesture videos, listed in CSV format. The dataset will also
contain synonyms of words that map to the same sign language gesture video. This dataset will be used
to train a Random Forest classifier. A Random Forest Classifier is a robust ensemble learning technique
primarily used for classification tasks. It operates by constructing multiple decision trees during the
training phase, using random samples of both data and features. Each tree in the forest makes an
independent prediction, and the final output is determined by a majority vote over all trees. This
method enhances overall accuracy and reduces the risk of overfitting, making it more reliable than a
single decision tree. Random Forests are versatile, able to process both numerical and categorical
data, and are commonly applied to classification, regression, and feature selection problems.

With the required effort properly directed, the proposed system aims to bring a noticeable change
to society. The main motive of this project is to reduce the communication gap
between industry and individuals with hearing impairments.

By facilitating communication between deaf individuals and employers, it will promote better
understanding, effective collaboration, and equal opportunities in the workplace. Additionally,
the project’s impact extends to the healthcare sector. The AI-based system’s ability to interpret
given text into sign language gestures can enhance healthcare access and quality for deaf
individuals. This project has the potential to make a significant impact on the lives of people with
hearing impairments, as they can communicate more easily with their colleagues in the workplace.

The goal is to empower individuals to express themselves effectively and engage fully in society,
promoting inclusivity and equal opportunities. This project proposal presents an innovative and
technologically advanced solution to overcome communication barriers faced by people with
hearing impairments. It aims to significantly advance sign language detection systems and
improve communication accessibility.
1.2 PROBLEM STATEMENT

People who are deaf or hard of hearing often face challenges in understanding spoken
language, especially in situations where lip-reading isn’t possible or when no interpreter is
available. While sign language is a vital way for the deaf community to communicate, not
everyone is familiar with it, making conversations difficult and sometimes leaving them
feeling isolated. This communication gap can create barriers in daily life, from simple
conversations to important interactions. To make communication more inclusive and
accessible, there is a need for a system that can translate spoken language into sign language,
helping bridge the gap and ensuring that everyone can connect and communicate effortlessly.
1.3 AIM AND OBJECTIVE
Aim: -

The aim of this project is to develop an AI-based English to sign language translation system to enhance communication accessibility for individuals with hearing impairments.

Objectives: -

• To create a comprehensive dataset of American Sign Language gestures.

• To learn about various NLP techniques that can be implemented in machine learning models.
1.4 SCOPE
The project will impact various sectors, including education, healthcare and employment
by enhancing learning tools for deaf students, improving communication between
healthcare providers and deaf patients and providing training programs in the workplace.
It will also advance accessibility and inclusivity in digital content, offering subtitles and
real-time translations in sign language. Public services, customer service, retail and e-
commerce will benefit from accessible websites and communication tools.

Additionally, it will promote social integration, bridge communication gaps and raise
awareness about the needs and capabilities of the deaf and hard-of-hearing community,
fostering a more inclusive society. It will serve a pivotal role in the commercial sector as
it will help immensely in the active communication between the service providers and the
hard-of-hearing community. The project is ultimately intended to help deaf people by fostering a
sense of inclusivity in society through better mutual understanding.

The project will also help in the medical sector by improving understanding between medical
professionals and deaf people. The lack of effective communication has created problems in the
nursing sector and in identifying urgent patient needs, and the project will help bridge this gap.
2 . LITERATURE REVIEW
2.1 Related Works

Sneha Prabhu, Sriraksha Shetty, Sushmitha P Suvarna, Vindya Sanil and Dr. Jagadisha N.
(2022) [1] explore how sign language can be recognized and translated into text. The
study looks at how photo frames of sign language are pre-processed and classified
using a CNN. Their work shows how the model is constructed from these pre-processed
and classified photos and trained to produce the final output. The proposed website recognizes
sign language and translates it into text with 93.27% accuracy and low computational
time.

Ezhumalai P, Raj Kumar M, Rahul A S, Vimalanathan V and Yuvaraj A (2021) [2] focus
on taking speech as input and translating it into sign language. This
system was developed for Indian Sign Language translation. The system was designed so
that if a word given by the user as audio input is not found in the local
system, the system searches for the word in a sign language repository named “Indian
Sign Language Portal”. The system used web scraping to play the corresponding sign
language video sequence from the portal. The execution time of the system
was 28.94 seconds to convert speech to sign language.

Dr. Pallavi Chaudhari, Pranay Pathrabe, Umang Ghatbandhe, Sangita Mondal, and Sejal
Parmar [3] introduce a real-time approach for recognizing sign language gestures using
convolutional neural networks (CNNs). This system is designed to facilitate
communication between deaf and mute individuals and the public by enabling the
understanding and interpretation of sign language. The authors detail the architecture of
their CNN model, which involves processing the hand image through a filter and then
applying a classifier to predict the gesture class. Impressively, the model achieves a 98%
accuracy rate for recognizing the alphabet letters A-Z in sign language. Beyond gesture
detection, the
article explores the creation of a communication system for deaf individuals. This system
translates audio messages into corresponding sign language using predefined American
Sign Language images and videos. By incorporating this feature, the authors propose a
user-friendly human-computer interface that allows for seamless interaction between deaf
and hearing individuals.

Tanmay Petkar, Tanay Patil, Ashwini Wadhankar, Vaishnavi Chandore, Vaishnavi Umate and
Dhanshri Hingnekar (2022) [4] proposed a system which works both ways:
sign-language-to-text conversion and text-to-sign-language conversion. The authors
created their own dataset by recording and saving gestures through a laptop
camera or webcam with the help of OpenCV. The system used TensorFlow, which helped
it achieve an accuracy of 90% and predict the text accurately. Besides this, the system
created an avatar using the Blender 3D tool and animated the equivalent gestures for the
alphabet and words. In this work, they also propose a posture-guided pooling strategy to
extract features from 3D convolutional neural networks in the context of word-level sign
language recognition. The system uses NLTK to translate the text input given by the user
into its equivalent gestures. The JavaScript Web Speech API was used to
generate output text from the input audio signal.[4]

Becky Sue Parton [5] examines the use of artificial intelligence (AI) in sign language
recognition and translation. The author introduces a cross-disciplinary approach that
merges AI techniques with insights from sign language linguistics. The study underscores
the difficulties encountered by the deaf and hard-of-hearing community in
communicating with non-sign language users and stresses the necessity of creating
precise and effective sign language recognition and translation systems. It reviews the
shortcomings of conventional methods and contends that AI can significantly address
these issues.[5]
Tewari, Soni Singh, Turlapati, and Bhuva's study Real-Time Sign Language Recognition
Framework (2021) [6], describes a two-way communication system intended to close the
gap between spoken and sign languages. The authors suggest a brand-new model that can
identify three characters and 26 alphabets with a remarkable 90.78% accuracy rate. The
design of the system guarantees effective real-time recognition, offering a reliable sign
language processing solution. The study admits its limitations in handling only a subset of
sign gestures, despite its success, and recommends that future studies look at deep
learning technique developments for increased recognition accuracy. By making a
substantial contribution to the fields of speech-to-sign translation and natural language
processing, this work opens the door to more inclusive and accessible communication
systems.

An inventive approach to translating English to Indian Sign Language (ISL) is examined in the work [7] by Khawlar, Akhtar, Ansari, and Patil (2021). The study automates the
speech-to-sign translation process by using Google APIs with Natural Language
Processing (NLP) approaches. The authors concentrate on 3D avatar animation, which
improves sign representation's efficacy and clarity. With a 77% accuracy rate, the model
showed promise in enabling smooth communication between users of spoken and sign
language. The study does, however, draw attention to the system's present shortcoming of
only supporting a limited number of terms and makes recommendations for additional
improvements to increase its vocabulary and real-time application. This study emphasizes
how crucial it is to use 3D technology for precise and interesting sign language
translation.

Deep learning is used to transform Arabic text into Arabic Sign Language in [8] by Jamil (2020).
The authors created a model that improves sign identification accuracy by combining a number of
text-processing and computer vision approaches. The system's 87% accuracy rate in translating
Arabic text-to-sign sequences allows the deaf and hard-of-hearing Arabic-speaking people to
communicate more effectively. A strong framework is presented in the paper, but it also identifies
important issues with Arabic sign representation and stresses the necessity for more research to fully
handle the intricacy of sign variants among various Arabic dialects. This study makes a substantial
contribution to multilingual sign language translation, ensuring greater accessibility and inclusivity
for Arabic-speaking individuals with hearing impairments.

A highly dynamic method for converting spoken and textual input into fluid 3D avatar-based sign
movements is presented in [9] by Debasis Das Chakladar, Pradeep Kumar, Shubham Mandal, Partha
Pratim Roy, Masakazu Iwamura, and Byung-Gyu Kim (2021). By creating fluid and continuous sign
animations, this research aims to enhance the expressiveness and naturalness of sign communication,
in contrast to conventional systems that depend on static gestures. To accomplish this, the
model makes use of 3D animation techniques that dynamically create signs in response to input text
and speech. To ensure precise and captivating visual representation, the authors describe a three-step
procedure that includes gesture extraction, 3D character modeling, and final animation synthesis.
This study highlights how 3D avatars can improve communication for individuals with hearing
impairments, offering a more natural and immersive translation system.

2.2 Related Theory

2.2.1 Machine learning

During the past few decades, machine learning (ML), an area of AI (artificial intelligence),
has made an impact on many industries and on everyday life. In the simplest terms, it
allows a computer to reason without being told exactly how to. ML algorithms look at
various pieces of information and identify patterns to improve their performance on predefined goals
like classification, prediction, and clustering.

ML can perform a variety of tasks, some of which include:

1. Classification: This is the task of distinguishing inputs, such as recognizing pictures of cats and dogs, and assigning them the relevant labels.

2. Prediction: This is the task of predicting a future value, such as stock prices or weather conditions, from past data.

3. Clustering: This refers to grouping similar data points, such as segmenting clients by their purchasing behavior.

ML works behind the scenes to power many real-life applications and products,
including self-driving vehicles, medical diagnosis, spam filters, recommendation
engines, and fraud detection tools.

2.2.2 Word2Vec
Word2Vec is one of the techniques that has had a high impact on NLP for learning vector
representations of words. It was developed by Tomas Mikolov and his team at Google in 2013 [10]. The
basic concept is to map words onto dense, continuous vector spaces that capture the relations and
meanings between words, which are crucial for language understanding.

In Word2Vec, every word is represented as a vector in an N-dimensional space where words with
similar meanings are positioned close to each other. The mapping allows the model to capture
significant relationships between words based on their contextual usage in large text corpora. For
example, the synonym "elegant" is placed near the word "beautiful" to reflect this relation.

The model employs two main architectures for producing the vectors: the Continuous Bag of Words
(CBOW) model and the Skip-gram model. The CBOW model uses context words to predict the target word.
For example, with the context words "the structure was", CBOW might choose the word "elegant". The
Skip-gram model does the opposite: it takes a target word and predicts context words. Given "elegant"
as the target, Skip-gram predicts surrounding words such as "the structure was".

Word2Vec, during its training, depends on a neural network to refine the word vectors, aiming to
maximize the accuracy of predicting the correct context or target words. This process ensures that
the vectors represent substantive relationships and similarities between words, thereby enhancing
their utility across various NLP applications, including text classification, sentiment analysis, and
machine translation. By situating words within a continuous vector space, Word2Vec provides a
robust framework for processing and interpreting natural language with precision.
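
As a concrete illustration, the following is a minimal sketch using the gensim library; the report does not state which implementation was used, and the toy corpus and parameters here are assumptions for demonstration only.

```python
# Minimal Word2Vec sketch with gensim (toy corpus; parameters are illustrative).
from gensim.models import Word2Vec

sentences = [
    ["the", "structure", "was", "elegant"],
    ["the", "building", "was", "beautiful"],
    ["she", "gave", "an", "elegant", "speech"],
]

# sg=0 selects CBOW (context -> target); sg=1 would select Skip-gram (target -> context).
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)

vector = model.wv["elegant"]                # 100-dimensional vector for "elegant"
similar = model.wv.most_similar("elegant")  # words whose vectors lie nearby
print(vector.shape, similar[:3])
```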

2.2.4 Porter stemming algorithm


The Porter Stemming algorithm provides a basic approach to conflation that works well in
practice. Natural Language Processing (NLP) helps the computer understand natural human
language, and Porter stemming is one of its standard techniques. It is a well-known stemming
algorithm proposed by Martin Porter in 1980, valued for its speed and simplicity. It is mainly
used in data mining and information retrieval, and it generally produces good results with a
relatively low error rate compared to other stemmers. The algorithm strips the morphological
and inflectional endings from English words. The system uses the Porter Stemming Algorithm to
remove commonly used suffixes and find the root form of each word. For example, the algorithm
reduces inflected forms such as "agrees", "agreed", and "agreeing" to a single common stem, so
one dataset entry can cover all of them. Because of this stemming, we can reduce the time taken
to search for the sign language gesture of a given word.
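
As an illustration of this behaviour, NLTK's implementation of the Porter stemmer can be applied as follows (a minimal sketch; the report does not show its stemming code):

```python
# Minimal sketch: suffix stripping with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["cats", "running", "played", "happiness", "hopeful"]:
    print(word, "->", stemmer.stem(word))
# Expected stems: cat, run, play, happi, hope
```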
3. METHODOLOGY

3.1 SYSTEM BLOCK DIAGRAM


Figure 1: System block diagram


The sign language translation system starts by taking text or audio as input from the keyboard or
microphone. Voice input is passed to the WebKit speech recognition API, which translates it into
text. The text then undergoes segmentation to classify whether each unit is a letter or a word,
followed by preprocessing to ensure that the obtained text is clean and clear. Next, the features
relevant to sign language are extracted and fed to the Porter Stemming Algorithm, which processes
the text and finds the root words. The resulting data are fed to the recognition stage, where the
NLP pipeline converts them into tokens using the Natural Language Toolkit. The tokens are then
converted into Part-of-Speech tags, which are matched to the corresponding sign language gestures
present in the model, and the corresponding animation video is displayed on screen.
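
A compact sketch of this flow is given below. It is illustrative only: the gesture dictionary and file names are hypothetical placeholders, and voice input is assumed to have already been converted to text by the browser's speech recognition API.

```python
# Illustrative sketch of the text-to-gesture pipeline (not the project's exact code).
import nltk
from nltk import word_tokenize
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)   # tokenizer model required by word_tokenize

stemmer = PorterStemmer()

# Hypothetical mapping from stemmed words to pre-rendered Blender animation clips.
gesture_videos = {"hello": "hello.mp4", "go": "go.mp4", "school": "school.mp4"}

def translate(text):
    tokens = word_tokenize(text.lower())          # segmentation into words / letters
    stems = [stemmer.stem(t) for t in tokens]     # Porter stemming to root forms
    # Each stem is matched to its gesture clip; unknown words are handled
    # letter by letter in the output stage (see Section 3.2.5).
    return [gesture_videos.get(s) for s in stems if s.isalpha()]

print(translate("Hello, I am going to school"))
# ['hello.mp4', None, None, 'go.mp4', None, 'school.mp4']
```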
3.2 Data Collection

The data collection process for the project involved several important steps. First, we watched
several videos of American Sign Language gestures and noted details such as hand shapes and body
positioning. We then studied the various signs for the alphabet and numbers from UpSkills Tutor
and ASL Love. We gathered a total of 127 dataset entries containing 91 words, 26 alphabet letters
and 10 numbers together with their corresponding sign language gestures. In ASL, some synonyms map
to the same sign language gesture, so to make our dataset diverse and reliable we listed synonyms
for a total of 16 words. Throughout the process, we followed ethical guidelines, making sure to
respect the rights and dignity of everyone involved.

Data was collected mainly in the form of videos. Sign language gestures for numbers, letters and
words were made using the Blender tool, where an animated character was created and scripted to
perform the various gestures. Each sign language video was one to three seconds long. These
videos were then saved.
Fig 2: 26 Alphabets in ASL [11].

3.2 Method

3.2.1 Text/Audio:

The text or audio is fed as input to the proposed system. This is the first step in
executing the process of the system. The user either types the text or provides speech
through a microphone or another connected input device, expecting the corresponding
sign language videos.

Segmentation:
Once the user gives the text or when the text is recognized from the voice input,
the system performs segmentation to classify the words and the letters from the
obtained text. This step helps to isolate letters and extract the corresponding sign
language to be displayed through the video.

3.2.2 Preprocessing:

The segmented text undergoes preprocessing to ensure clean and clear data. In the
preprocessing part, the stop words such as “the”, “are”, etc. that are unlikely to
contribute to the meaning of the text for the sign language translation are omitted.

Also, preprocessing involves breaking the text into individual elements or tokens,
commonly words. Tokenization helps in analyzing text at the word level, which is
crucial for tasks like stemming.
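
As a small illustration of this step (assuming NLTK's tokenizer and stop-word list; the sample sentence is made up):

```python
# Illustrative preprocessing: tokenization and stop-word removal with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)       # tokenizer model
nltk.download("stopwords", quiet=True)   # English stop-word list

text = "The results are good"
tokens = word_tokenize(text.lower())                    # ['the', 'results', 'are', 'good']
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t not in stop_words]   # ['results', 'good']
print(filtered)
```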
3.2.3 Feature Extraction:

Next, features relevant to text recognition are extracted from the preprocessed text.
These features capture important information about the text's position and spatial
relationships with corresponding sign gestures.
It basically involves identifying the key attributes of the text that correspond to
sign language gestures. The feature extraction process is carried out by the Porter
Stemming Algorithm.

Figure: Porter Stemming Algorithm


Recode Plurals

• It converts plural words to singular forms. For example: "cats" → "cat"

Recode 'ed' and 'ing'

• It removes past tense (-ed) and continuous tense (-ing) suffixes. For example:
"running" → "run", "played" → "play"

Recode 'y' to 'i'

• If a word ends in 'y' preceded by a consonant, the 'y' is converted to 'i'. For example:
"happy" → "happi", "cry" → "cri"

Recode Double Suffix to Simple Suffix

• It reduces double suffixes (complex endings) to simpler suffixes. For example:
"hopefulness" → "hopeful"

Recode Remaining Simple Suffix

• It removes the remaining common suffixes to reach the root form. For example: "hopeful" → "hope",
"connection" → "connect"

Stemmed Word (Output)

• After all transformations, the stemmed word is produced. For example: "playing" → "play"

3.2.4 Recognition:

The extracted features are then fed into the NLP algorithm and converted into tokens
using NLTK (Natural Language Tool Kit). POS tags are produced from the tokens and the tags
are used to match the exact sign gesture from the model. The obtained sign gesture is read, and
corresponding animation is played until the whole given sentence is interpreted by the
animation. The user can replay or pause the animation as per their need which helps in better
and more efficient communication with the person with hearing impairment.
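
A small illustration of the tokenization and POS-tagging step with NLTK follows; the tag-to-gesture matching itself is specific to the project's model and is not shown.

```python
# Illustrative recognition step: tokens -> Part-of-Speech tags with NLTK.
import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)   # POS tagger model

tokens = word_tokenize("I am going to school")
print(pos_tag(tokens))
# e.g. [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('school', 'NN')]
# Each (token, tag) pair is then matched against the gesture entries in the model.
```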
3.2.5 Output as Video:

The recognized text or letters are displayed along with the sign language gestures
simultaneously one after the other. For example, if the user input is “Hello”, then
our system will recognize it and provide the corresponding sign language gesture of
the word “Hello”. Our system can only process the words present in our dataset. If
a user inputs a word that is not in our database, our system will generate the sign
language gestures for the word by spelling it out letter by letter. For example, if the
user types "Metabolism," our model will translate and display the sign language for
each letter of the word consecutively.
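
The lookup-with-fallback behaviour described above can be sketched as follows; the dictionary contents and file names are placeholder assumptions.

```python
# Illustrative lookup: known words map to one gesture clip, unknown words
# fall back to letter-by-letter fingerspelling clips.
gesture_videos = {"hello": "hello.mp4"}
letter_videos = {ch: f"{ch}.mp4" for ch in "abcdefghijklmnopqrstuvwxyz"}

def clips_for(word):
    word = word.lower()
    if word in gesture_videos:
        return [gesture_videos[word]]                       # "Hello" -> ["hello.mp4"]
    return [letter_videos[ch] for ch in word if ch in letter_videos]

print(clips_for("Hello"))        # ['hello.mp4']
print(clips_for("Metabolism"))   # ['m.mp4', 'e.mp4', 't.mp4', ...]
```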
3.3 Vector Generation Using Word Embeddings:

To convert the words and letters in the dataset into numerical features that could be used by the
Random Forest Classifier, word embedding was applied. A Word2Vec embedding was used to represent
each word or letter as a vector. These embeddings were chosen because they capture the semantic
meaning of words. This process converted each word into a numerical vector, which was then used as
input to the machine learning model.

3.4 Model Training : Random Forest Classifier:

The next step was to train a machine learning model on the vector data. A Random Forest
Classifier was used as the classification model, which helps predict the category of each word.

The Random Forest Classifier was trained using the following steps:

1. Data Preprocessing:

• The dataset contains words and corresponding videos.

• The words were tokenized and converted into numerical representations using Word2Vec.

• A Word2Vec model was trained with a vector size of 100 dimensions, creating word embeddings.

2. Feature Extraction:

• Each word was mapped to its corresponding word vector (numerical representation).

• These word vectors served as features (X), while the words (or their labels) became the target (y).

3. Train-Test Split:
The dataset was split into training (X_train, y_train) and testing (X_test,
y_test) sets.

4. Model Training:
a. A Random Forest Classifier was initialized with:
i. 1000 decision trees (n_estimators=1000).
ii. A fixed random state (random_state=42) for reproducibility.
b. The model was trained on word vector features using fit(X_train, y_train).
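
Putting these steps together, a minimal sketch of the training procedure might look like the following. The word list and labels are placeholders; only the stated settings (100-dimensional Word2Vec vectors, 1000 trees, random_state=42) come from the report.

```python
# Illustrative training sketch following the steps above (placeholder data only).
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Placeholder dataset: each word (or synonym) belongs to a gesture class.
words = ["hello", "thanks", "again", "after", "school", "go", "eat", "play"] * 10
labels = list(words)   # here each word is its own gesture class

# 1-2. Train Word2Vec (vector_size=100) and map each word to its embedding.
w2v = Word2Vec([words], vector_size=100, min_count=1)
X = np.array([w2v.wv[w] for w in words])   # features: word vectors
y = np.array(labels)                       # targets: gesture classes

# 3. Train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Random Forest with 1000 trees and a fixed random state, as in the report.
clf = RandomForestClassifier(n_estimators=1000, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```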

Performance Metrics
Performance metrics in machine learning provide quantitative methods to assess
various aspects of model performance such as effectiveness and efficiency. These
metrics can involve multiple dimensions, serving different purposes, and often
include categories related to precision, recall, F1 score, and others.

Confusion Matrix

The confusion matrix is a table used in machine learning to evaluate the performance of a
classification model. It compares the predicted classes of the model with the actual classes
in the dataset. The matrix has rows and columns representing the actual and predicted classes,
respectively, and contains four main components:
Fig : Confusion Matrix []

TP (True Positive) is the number of instances correctly predicted as positive.
FP (False Positive) is the number of instances incorrectly predicted as positive.
TN (True Negative) is the number of instances correctly predicted as negative.
FN (False Negative) is the number of instances incorrectly predicted as negative.

Accuracy

Accuracy is a performance metric used in classification tasks to measure the overall
correctness of the model's predictions. It represents the proportion of correctly classified
instances out of the total number of instances in the dataset.

Mathematically, accuracy is calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FP (False Positive) is the number of instances incorrectly predicted as positive.
TN (True Negative) is the number of instances correctly predicted as negative.
FN (False Negative) is the number of instances incorrectly predicted as negative.

Recall

Recall, also known as sensitivity or true positive rate, is a performance metric used in binary
classification tasks. It measures the proportion of actual positive instances that are correctly
identified by the model.
Mathematically, in terms of the confusion matrix, recall is calculated as:

Recall = TP / (TP + FN)

Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FN (False Negative) is the number of instances incorrectly predicted as negative.

F1-Score

The F1 score is a performance metric commonly used in binary classification tasks which
considers both precision and recall to provide a balanced measure of a model's
performance. It is the harmonic mean of precision and recall, emphasizing the balance
between the two metrics.

F1-score = 2TP / (2TP + FP + FN)

Precision

Precision is a performance metric used in binary classification tasks that measures the
proportion of correctly predicted positive instances out of all instances predicted as
positive by the model.

Precision = TP / (TP + FP)
Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FP (False Positive) is the number of instances incorrectly predicted as positive.
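
These metrics can be computed with scikit-learn; the snippet below is an illustrative sketch on made-up labels, not the project's evaluation code.

```python
# Illustrative computation of the metrics above with scikit-learn (toy labels).
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_true = ["hello", "hello", "again", "after", "after", "again"]
y_pred = ["hello", "hello", "after", "after", "again", "again"]

print(confusion_matrix(y_true, y_pred, labels=["hello", "again", "after"]))
print("accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall, F1-score and support, as reported in Section 6.2.
print(classification_report(y_true, y_pred))
```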
DATASET
4 TOOLS USED

Python

Python is a high-level, interpreted programming language known for its clear syntax and
readability. We used Python because it supports multiple programming paradigms, including procedural,
object-oriented and functional programming. Python's extensive standard library and vast ecosystem of
third-party packages facilitate rapid development across many domains, such as web development, data
analysis and scientific computing. It uses dynamic typing, a combination of reference counting and a
cycle-detecting garbage collector for memory management, and dynamic name resolution, which binds
method and variable names during program execution. The comprehensive standard library that Python
provides helped us build our system cost-effectively. Python is open-source software, and interpreters
are available for many operating systems.

HTML
HTML provides the building blocks of the web; its tags are used to give structure to our system. We
have used HTML to define the content of our web application. It guarantees that text and graphics are
properly formatted for the browser. A browser would not be able to display text as elements
or load videos and other components if HTML were not present. HTML also supplies the page's
fundamental structure, which is overlaid with Cascading Style Sheets to customize the system's
appearance.

CSS
CSS (cascading style sheets) is utilized to enhance the visual presentation of our web page by
defining the style of HTML structures. It allows us to modify attributes such as fonts, colors,
spacing, margins, and more, ensuring our web documents are both appealing and functional. For our
system, CSS has been implemented in both inline and external forms, enabling precise and global
styling across our web pages.

Visual Studio Code


Visual Studio Code is commonly described as a user-friendly code editor. It is an open-source tool
that supports developers by enabling code writing, offering debugging capabilities, and facilitating
code corrections. The editor makes coding more accessible and is often regarded as a blend of an
Integrated Development Environment (IDE) and a simple text editor, though opinions vary among
users. Every software or application we interact with operates on code that functions behind the
scenes. Historically, coding was performed using basic text editors like Notepad, which provided
minimal functionality to programmers.
Django
Django is a high-level Python web framework that serves as the backbone for managing data flow
and processing in our project. Django is primarily responsible for handling user requests, such as
receiving text or voice data, and orchestrating the server-side logic. This includes integrating the
speech recognition library and text-processing modules such as the Porter Stemming algorithm to
process and standardize the text input. Django also includes an Object-Relational Mapping (ORM)
layer that abstracts database operations: models are defined to represent the data structure, and
Django handles the database interactions. It supports various database backends, such as SQLite.
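
For instance, a dataset entry mapping a word to its gesture video could be declared as a Django model along these lines (a hypothetical sketch; the report does not show its actual models):

```python
# Hypothetical Django model sketch: the ORM maps this class to a database table.
from django.db import models

class Gesture(models.Model):
    word = models.CharField(max_length=100, unique=True)   # stemmed word, letter or number
    video = models.FileField(upload_to="gestures/")        # corresponding animation clip

    def __str__(self):
        return self.word
```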

Blender
Blender is a free and open-source 3D creation software widely used for modeling, animation,
simulation, rendering, compositing, motion tracking, and even game development. We utilized it to
create and animate avatars that represent the sign language translations. The avatars mimic the
gestures and movements associated with sign language, providing visual representation to aid in
communication.

Jupyter Notebook
For the creation of the model files, Jupyter Notebook was used. Jupyter Notebook is an open-source web
application that allows users to create and share documents that contain live code, equations,
visualizations, and narrative text.

TensorFlow

TensorFlow is an open-source machine learning framework developed by Google Brain. It provides
a comprehensive ecosystem for building, training, and deploying deep learning models, neural
networks, and AI applications. It is widely used in research and production for applications like
image recognition, natural language processing (NLP), speech recognition, and recommendation
systems. We used TensorFlow for training and evaluating the NLP model.
6. Result and Discussion
6.1 Data collection

At first, we learned sign language gestures from various sites such as UpSkill Tutor and ASL Love.
Then, an animated character was created using the Blender tool, in which each node of the fingers and
elbow was scripted, and the sign language gestures for the various words, letters and numbers were
rendered in ".mp4" format. A total of 127 sign language gestures were created: the numbers 0-9, 26
alphabet letters, and 91 words. Each sign language gesture video was recorded at 30 frames per second
(fps), meaning that every second of video contains 30 individual frames. In ASL, some words and their
synonyms share the same sign language gesture; within the 91 words there are 16 such words whose
synonyms share their gesture. Among these 16 words, one has 5 synonyms, 3 have 4 synonyms each, 2 have
3 synonyms each, 3 have 2 synonyms each and 7 have one synonym each, bringing the total number of
words in the dataset to 136. Overall, we collected a dataset of 174 entries: the numbers 0-9
representing 10 different classes, the letters a-z representing 26 different classes, and 91 words
along with their synonyms representing 91 different classes. This comprehensive dataset allowed us to
train and test our model effectively, ensuring accurate sign language video translation.

Fig : sample data for a-z letter


6.2 Model Performance Evaluation:

The model performance has been measured based on the following metrics:

Accuracy: The value 0.9451 indicates that our model correctly predicts the outcome approximately 94.51% of the time.

Precision: It measures how many of the predicted positives are actually correct. For the majority of the
categories, precision is 1, indicating that every prediction for those categories was correct. However, for
the category "again", the precision score is 0.18, which means only 18% of the instances classified as
"again" were correct; the remaining 82% were misclassified.

Recall: It measures how many actual positives were correctly predicted. High recall means the model is
identifying most of the actual positive cases. But in case of “after”, its recall is 0.67, which indicates 33% of
the actual cases were missed.

F1-score: It indicates the harmonic mean of precision and recall, balancing their trade-offs. "After" has an
F1-score of 0.80, showing moderate performance due to lower recall, whereas "Again" has an F1-score of 0.31,
showing low performance due to lower precision.

Support: It indicates number of true instances of each class in the dataset. In the dataset “after” has a support
of 6 which means there are 5 other synonyms of “after” in the test set.
Fig: Performance metrics of the model
Figure: Output of the system.

The final output of the system is the translation of a given text into a sign language animation video.
Upon receiving the input, the system processes it and translates it into sign language. The system
generates the stemmed version of each word if it is in the dataset; otherwise it generates each letter of
the word and displays those letters one by one. This seamless process enables users to translate text into
sign language gestures effectively, facilitating communication and understanding.
EPILOGUE

5.1 Conclusion

The minor project focused on developing a sign language translation system using the Porter Stemming
Algorithm and other NLP techniques. The project successfully implemented sign language translation where
the user provides text or voice input. Overall, the system demonstrated good performance in terms of
accuracy. This is one of the crucial steps in a sign language translation system, and it helps hearing and
hearing-impaired people communicate more efficiently. Data collection was one of the crucial phases in the
development of this machine learning project. Future research could focus on improving the scalability of
the system by adding more words and their respective sign language videos.

5.2 Future Enhancement

This project is an initial step in reaching the effective solution for the daily concern.
This project can be extended in multiple ways in the future such as:

• Mobile app: Developing a mobile app version of the sign language translation system, which
could make it more accessible to users who prefer using their smartphones or tablets.

• Increased Scalability: The project can be integrated with virtual and augmented reality systems,
which can provide more immersive and interactive sign language experiences for deaf and hard-of-
hearing individuals.

Overall, these enhancements would help to further improve the system, providing additional words
and their respective gestures in the system, which improves the effectiveness of the system.
REFERENCES

1. S. Prabhu, S. Shetty, S. P. Suvarna, V. Sanil, and J. N., "Sign Language Recognition Using Machine Learning," 2022.

2. E. P., R. K. M., R. A. S., V. V., and Y. A., "Speech To Sign Language Translator For Hearing Impaired," 2021.

3. P. Chaudhari, P. Pathrabe, U. Ghatbandhe, S. Mondal, and S. Parmar, "Sign Language Detection System," 2022.

4. T. Petkar, T. Patil, A. Wadhankar, V. Chandore, V. Umate, and D. Hingnekar, "Real Time Sign Language Recognition System for Hearing and Speech Impaired People," 2022.

5. B. S. Parton, "Sign Language Recognition and Translation: A Multidisciplinary Approach from the Field of Artificial Intelligence," 2022.

6. "Deep Sign: Sign Language Detection and Recognition Using Deep Learning," 2022.

7. Khawlar, Akhtar, Ansari, and Patil, "An Advancement in Speech-to-Sign Language Translation using 3D Avatar Animation," 2021.

8. Jamil, "Design and Implementation of an Intelligent System to Translate Arabic Text to Arabic Sign Language," 2020.

9. D. Das Chakladar, P. Kumar, S. Mandal, P. P. Roy, M. Iwamura, and B.-G. Kim, "3D Avatar Approach for Continuous Sign Movement Using Speech/Text," 2021.

10. T. Mikolov, "Efficient Estimation of Word Representations in Vector Space," 2013.

11. "UpSkill Tutor."
