NATIONAL COLLEGE OF ENGINEERING
(Affiliated to Tribhuvan University)
Talchhikhel, Lalitpur

Submitted by:

Submitted to:
Department of Computer and Electronics Engineering
Fagun, 2081
Letter of Approval
The undersigned certify that they have read and recommended to the Institute of Engineering for
acceptance a project entitled “English to Sign Language Translation using Porter Stemming
Algorithm” submitted by Prayash Niraula, Ranjit Adhikari, Sarishma Neupane and Sujit Adhikari in
partial fulfillment of the requirements for the Bachelor’s Degree in Computer and Electronics
Engineering.
Er. Subash Panday
Project Supervisor/DHOD
Department of Computer and Electronics Engineering
National College of Engineering
Er. Suroj Burlakoti
HOD/Senior Lecturer
Department of Computer and Electronics Engineering
National College of Engineering
External Examiner:
Er.
Senior Lecturer
Department of Computer and Electronics Engineering
National College of Engineering
The authors have agreed that the Library, Department of Computer and Electronics Engineering,
National College of Engineering may make this report freely available for inspection. Moreover, the
authors have agreed that permission for extensive copying of this project report for scholarly
purposes may be granted by the supervisors who supervised the project work recorded herein or, in
their absence, by the Head of the Department wherein the project report was done. It is understood that
recognition will be given to the authors of the project and to the Department of Computer and Electronics
Engineering, National College of Engineering, Institute of Engineering in any use of the material of
this report. Copying, publication or any other use of this report for financial gain without the
approval of the Department of Computer and Electronics Engineering, Institute of Engineering,
National College of Engineering and the authors' written permission is strictly prohibited.
Request for permission to copy or to make any other use of the material in this report in whole or in
part should be addressed to:
Head
Department of Computer and Electronics Engineering,
Institute of Engineering, National College of Engineering,
Lalitpur, Nepal
ACKNOWLEDGEMENT
This project is prepared in partial fulfillment of the requirement for the Bachelor’s degree in
Computer Engineering. Foremost, we would like to express our heartfelt thanks to Er. Subash
Panday, who was not only our supervisor but also our mentor and guide throughout this journey.
His enduring support, enlightening lectures, and motivating words have been pillars of this project.
His supervision has been indispensable, and for that, we are immensely grateful.
Our appreciation also extends to the Department of Computer and Electronics Engineering at the
National College of Engineering. We are thankful for the opportunity to engage in this collaborative
effort, which allowed us to apply our accumulated knowledge, work on an intensive project in our
final year, and gain invaluable teamwork experience.
We must acknowledge our friends, whose support was crucial, both directly and indirectly, to the
success of this project. Their involvement and insights were key to overcoming many challenges.
Finally, we are deeply grateful to our families, whose unwavering support and inspiration have been
our constant motivation. Their belief in our abilities has been a driving force behind our endeavors.
We are open to and would greatly appreciate any suggestions or criticisms that could help us
improve further.
Authors
ABSTRACT
American Sign Language (ASL) plays a vital role in communication for people with hearing disabilities
across the world. Since English is spoken all around the world, it is beneficial for people with hearing
disabilities to receive English content in American Sign Language. This visual language relies on hand
gestures and body postures. Recognizing that people who are deaf or hard of hearing often face challenges
in understanding spoken language, this system takes English text as input and shows its corresponding sign
language as animations created using Blender. The project focuses on creating a comprehensive dataset of
American Sign Language gestures and on learning about NLP techniques that can be implemented in machine
learning models. It is based on an ASL dataset in which special attention is given to the alphabet; the
dataset also contains 91 words and 10 numbers whose corresponding sign language gestures are displayed,
while any word outside this set is displayed letter by letter. The project employs NLP techniques and the
Porter Stemming Algorithm (PSA), together with the removal of stop words such as "is", "am" and "are".
The system achieves an accuracy of 94.51%. It can be deployed in industries such as groceries and
factories, where people can easily communicate with colleagues with hearing impairments. The project can
be enhanced with augmented and virtual reality for a better user experience.
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 BACKGROUND
1.2 PROBLEM STATEMENT
1.3 AIM AND OBJECTIVE
1.4 SCOPE
2. LITERATURE REVIEW
3. METHODOLOGY
3.1 SYSTEM BLOCK DIAGRAM
DATASET WORDS
4. ALGORITHM TO BE USED
7. EPILOGUE
EXPECTED OUTPUT
7.1 VALIDATION METHODS
8. GANTT CHART
REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATIONS
AI    Artificial Intelligence
GHz   Gigahertz
PC    Personal Computer
POS   Part-of-Speech
1. INTRODUCTION
1.1 BACKGROUND
By directing the required effort and organizing the work properly, the proposed system aims to bring a
noticeable change in society. The main motive of this project is to reduce the communication gap
between industry and individuals with hearing impairments.
By facilitating communication between deaf individuals and employers, it will promote better
understanding, effective collaboration, and equal opportunities in the workplace. Additionally,
the project's impact extends to the healthcare sector. The AI-based system's ability to interpret
given text into sign language gestures can enhance healthcare access and quality for deaf
individuals. This project has the potential to make a huge impact on the lifestyle of people with
hearing impairments, as they can easily communicate with their colleagues at the workplace.
The goal is to empower individuals to express themselves effectively and engage fully in society,
promoting inclusivity and equal opportunities. This project proposal presents an innovative and
technologically advanced solution to overcome communication barriers faced by people with
hearing impairments. It aims to significantly advance sign language detection systems and
improve communication accessibility.
1.2 PROBLEM STATEMENT
People who are deaf or hard of hearing often face challenges in understanding spoken
language, especially in situations where lip-reading isn’t possible or when no interpreter is
available. While sign language is a vital way for the deaf community to communicate, not
everyone is familiar with it, making conversations difficult and sometimes leaving them
feeling isolated. This communication gap can create barriers in daily life, from simple
conversations to important interactions. To make communication more inclusive and
accessible, there is a need for a system that can translate spoken language into sign language,
helping bridge the gap and ensuring that everyone can connect and communicate effortlessly.
1.3 AIM AND OBJECTIVE
Aim:
• To develop a system that translates English text or speech into corresponding American Sign Language gestures presented as animations.
Objectives:
• To create a comprehensive dataset of American Sign Language gestures.
• To learn about various NLP techniques that can be implemented in machine learning models.
1.4 SCOPE
The project will impact various sectors, including education, healthcare and employment
by enhancing learning tools for deaf students, improving communication between
healthcare providers and deaf patients and providing training programs in the workplace.
It will also advance accessibility and inclusivity in digital content, offering subtitles and
real-time translations in sign language. Public services, customer service, retail and e-
commerce will benefit from accessible websites and communication tools.
Additionally, it will promote social integration, bridge communication gaps and raise
awareness about the needs and capabilities of the deaf and hard-of-hearing community,
fostering a more inclusive society. It will play a pivotal role in the commercial sector, as it will
help immensely in active communication between service providers and the hard-of-hearing community.
Ultimately, the project is expected to give deaf people a sense of inclusivity in society as better
understanding is maintained.
The project will also help in the medical sector by improving understanding between medical
professionals and deaf people. The lack of effective communication has created problems in the
nursing sector, where understanding patients' needs is urgently required, and the project will help
bridge this gap.
2. LITERATURE REVIEW
2.1 Related Works
Dr. Pallavi Chaudhari, Pranay Pathrabe, Umang Ghatbandhe, Sangita Mondal, and Sejal
Parmar [3] introduce a real-time approach for recognizing sign language gestures using
convolutional neural networks (CNNs). This system is designed to facilitate
communication between deaf and mute individuals and the public by enabling the
understanding and interpretation of sign language. The authors detail the architecture of
their CNN model, which involves processing the hand image through a filter and then
applying a classifier to predict the gesture class. Impressively, the model achieves a 98%
accuracy rate for recognizing the alphabet letters A-Z in sign language. Beyond gesture
detection, the
article explores the creation of a communication system for deaf individuals. This system
translates audio messages into corresponding sign language using predefined American
Sign Language images and videos. By incorporating this feature, the authors propose a
user-friendly human-computer interface that allows for seamless interaction between deaf
and hearing individuals.
Becky Sue Parton [5] examines the use of artificial intelligence (AI) in sign language
recognition and translation. The author introduces a cross-disciplinary approach that
merges AI techniques with insights from sign language linguistics. The study underscores
the difficulties encountered by the deaf and hard-of-hearing community in
communicating with non-sign language users and stresses the necessity of creating
precise and effective sign language recognition and translation systems. It reviews the
shortcomings of conventional methods and contends that AI can significantly address
these issues.[5]
Tewari, Soni Singh, Turlapati, and Bhuva's study Real-Time Sign Language Recognition
Framework (2021) [6], describes a two-way communication system intended to close the
gap between spoken and sign languages. The authors suggest a brand-new model that can
identify three characters and 26 alphabets with a remarkable 90.78% accuracy rate. The
design of the system guarantees effective real-time recognition, offering a reliable sign
language processing solution. The study admits its limitations in handling only a subset of
sign gestures, despite its success, and recommends that future studies look at deep
learning technique developments for increased recognition accuracy. By making a
substantial contribution to the fields of speech-to-sign translation and natural language
processing, this work opens the door to more inclusive and accessible communication
systems.
Deep learning is used to transform Arabic text into Arabic Sign Language in [8] by Jamil (2020).
The authors created a model that improves sign identification accuracy by combining a number of
text-processing and computer vision approaches. The system's 87% accuracy rate in translating
Arabic text-to-sign sequences allows the deaf and hard-of-hearing Arabic-speaking people to
communicate more effectively. A strong framework is presented in the paper, but it also identifies
important issues with Arabic sign representation and stresses the necessity for more research to fully
handle the intricacy of sign variants among various Arabic dialects. This study makes a substantial
contribution to multilingual sign language translation, ensuring greater accessibility and inclusivity
for Arabic-speaking individuals with hearing impairments.
A highly dynamic method for converting spoken and textual input into fluid 3D avatar-based sign
movements is presented in [9] by Debasis Das Chakladar, Pradeep Kumar, Shubham Mandal, Partha
Pratim Roy, Masakar, Luwanda, and Byung-Gyu Kim (2021). By creating fluid and continuous sign
animations, this research aims to enhance the expressiveness and naturalness of sign communication
in contrast to conventional systems that depend on static gestures. In order to accomplish this, the
model makes use of 3D animation techniques that dynamically create signs in response to input text
and speech. To ensure precise and captivating visual representation, the authors describe a three-step
procedure that includes gesture extraction, 3D character modeling, and final animation synthesis.
This study highlights how 3D avatars can improve communication for individuals with hearing
impairments, offering a more natural and immersive translation system.
2.2.1 Machine Learning
During the past few decades, machine learning (ML), which is an area of AI (artificial intelligence),
has made an impact on many industries and the life of the everyday person. In the simplest terms, it
allows a computer to reason without being told exactly how to. For instance, ML algorithms look at
various pieces of information and identify patterns to improve their performance on predefined goals
like classification, prediction, and clustering:
1. Classification: This is the task of assigning data points to predefined categories, such as
labeling an email as spam or not spam.
2. Prediction: This is the task of predicting a future value based on past data such as
stock values or weather conditions from the previous data.
3. Clustering: This refers to the segregation of similar data points, like segmenting clients
by their purchasing behavior.
ML works behind the scenes to power many real-life applications and products,
including self-driving vehicles, medical diagnosis, spam filters, recommendation
engines, and fraud detection tools.
2.2.2 Word2Vec
Word2Vec represents one of the techniques that had a high impact on NLP for learning vector
representations of words. It was developed by Tomas Mikolov and his team at Google in 2013 [10]. The
basic concept is to map words onto dense, continuous vector spaces that capture the relations and
meanings between words, which are crucial in language understanding.
In Word2Vec, every word is represented as a vector in N-dimensional space where words with
similar meanings are positioned close to each other. The mapping allows the model to capture
significant relationships between words based on their context of usage in large text corpora. For
example, in the case of synonyms, "elegant" is placed near the word "beautiful" to
demonstrate such a relation.
The model employs two main architectures that are used in producing the vectors: the Continuous
Bag of Words (CBOW) and the Skip-gram model. The CBOW model uses context words to predict
the target word. For example, given the context words "the structure was", CBOW might predict the
word "elegant". On the other hand, the Skip-gram model does the opposite: it takes a target
word and predicts its context words. Using "elegant" as the target, Skip-gram will predict surrounding
words such as "the structure was".
Word2Vec, during its training, depends on a neural network to refine the word vectors, aiming to
maximize the accuracy of predicting the correct context or target words. This process ensures that
the vectors represent substantive relationships and similarities between words, thereby enhancing
their utility across various NLP applications, including text classification, sentiment analysis, and
machine translation. By situating words within a continuous vector space, Word2Vec provides a
robust framework for processing and interpreting natural language with precision.
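To make the CBOW and Skip-gram settings concrete, here is a minimal sketch using the gensim library (an illustrative assumption; the toy corpus and parameters below are ours, not necessarily what was used in this project):

```python
# Minimal Word2Vec sketch using gensim (illustrative; the toy corpus is assumed).
from gensim.models import Word2Vec

# Tokenized toy corpus; in practice this would be a large text corpus.
sentences = [
    ["the", "structure", "was", "elegant"],
    ["the", "structure", "was", "beautiful"],
    ["the", "design", "looked", "elegant"],
]

# sg=0 selects the CBOW architecture, sg=1 selects Skip-gram.
cbow_model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

# Each word is now a 100-dimensional vector; similar words end up close together.
vector = cbow_model.wv["elegant"]
print(vector.shape)                                    # (100,)
print(cbow_model.wv.most_similar("elegant", topn=3))   # nearest neighbours in vector space
```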
3. METHODOLOGY
3.1 SYSTEM BLOCK DIAGRAM
Fig 1: System block diagram.
The data collection process for the project involved several important steps. First, we watched
several videos of American Sign Language gestures and noted down details such as hand gestures and
body positioning. We then studied the various signs for the alphabet and numbers from UpSkills Tutor
and ASL Love. Then, we gathered a total of 127 dataset entries containing 91 words, 26 letters and
10 numbers along with their corresponding sign language gestures. In ASL, there are synonyms of words
which map to the same sign language gesture. So, to make sure our dataset was diverse and reliable,
we listed various synonyms for a total of 16 words. Throughout the process, we followed ethical
guidelines, making sure to respect the rights and dignity of everyone involved.
Data was collected mainly in the form of videos. Sign language gestures for numbers, letters and
words were made using the Blender tool, where an animated character was created and scripted to
perform various sign language gestures. Each sign language video was one to three seconds long. These
videos were then saved.
Fig 2: 26 Alphabets in ASL [11].
3.2 Method
3.2.1 Text/Audio:
The text or audio is fed as input to the proposed system. This is the first step for
executing the process of the system. The user either inputs the text or provides the
speech through microphone or any other external device connected expecting the
sign language videos.
Segmentation:
Once the user gives the text or when the text is recognized from the voice input,
the system performs segmentation to classify the words and the letters from the
obtained text. This step helps to isolate letters and extract the corresponding sign
language to be displayed through the video.
3.2.2 Preprocessing:
The segmented text undergoes preprocessing to ensure clean and clear data. In the
preprocessing part, the stop words such as “the”, “are”, etc. that are unlikely to
contribute to the meaning of the text for the sign language translation are omitted.
Also, preprocessing involves breaking the text into individual elements or tokens,
commonly words. Tokenization helps in analyzing text at the word level, which is
crucial for tasks like stemming.
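As an illustration of this step, a minimal sketch using NLTK is shown below (the helper function and example sentence are our own; the project's actual preprocessing code may differ):

```python
# Minimal preprocessing sketch with NLTK: tokenization + stop-word removal (illustrative).
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("stopwords", quiet=True)  # English stop-word list

def preprocess(text):
    tokens = word_tokenize(text.lower())            # break the text into word tokens
    stop_words = set(stopwords.words("english"))    # "the", "is", "are", ...
    # keep only alphabetic tokens that are not stop words
    return [t for t in tokens if t.isalpha() and t not in stop_words]

print(preprocess("The weather is beautiful today"))  # ['weather', 'beautiful', 'today']
```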
3.2.3 Feature Extraction:
Next, features relevant to text recognition are extracted from the preprocessed text.
These features capture important information about the text's position and spatial
relationships with corresponding sign gestures.
It basically involves identifying the key attributes of the text that correspond to
sign language gestures. The feature extraction process is carried out by the Porter
Stemming Algorithm, which reduces words to their root form through a sequence of
suffix-stripping rules:
• It removes past tense (-ed) and continuous tense (-ing) suffixes. For example:
"running" → "run", "played" → "play".
• If a word ends in 'y' preceded by a consonant, the 'y' is converted to 'i'. For example:
"happy" → "happi", "cry" → "cri".
• It removes common suffixes to get the root word. For example: "hopeful" → "hope",
"connection" → "connect".
• After all transformations, the stemmed root word is finally produced.
3.2.4 Recognition:
The extracted features are then fed into the NLP algorithm and converted into tokens
using NLTK (Natural Language Toolkit). POS tags are produced from the tokens, and the tags
are used to match the exact sign gesture from the model. The obtained sign gesture is read, and
corresponding animation is played until the whole given sentence is interpreted by the
animation. The user can replay or pause the animation as per their need which helps in better
and more efficient communication with the person with hearing impairment.
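For illustration, tokenization and POS tagging with NLTK might look like the following sketch (the example sentence is assumed; the mapping from POS tags to stored gestures is not shown):

```python
# Tokenization and part-of-speech tagging with NLTK (illustrative).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I eat apples every morning")
tags = nltk.pos_tag(tokens)   # list of (token, POS tag) pairs
print(tags)
# [('I', 'PRP'), ('eat', 'VBP'), ('apples', 'NNS'), ('every', 'DT'), ('morning', 'NN')]
```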
3.2.5 Output as Video:
The recognized words or letters are displayed along with their sign language gestures
one after the other. For example, if the user input is "Hello", then
our system will recognize it and provide the corresponding sign language gesture of
the word “Hello”. Our system can only process the words present in our dataset. If
a user inputs a word that is not in our database, our system will generate the sign
language gestures for the word by spelling it out letter by letter. For example, if the
user types "Metabolism," our model will translate and display the sign language for
each letter of the word consecutively.
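A minimal sketch of this lookup-or-fingerspell logic is given below (the video directory, file naming scheme, and helper function are hypothetical, not the project's actual code):

```python
# Hypothetical sketch: play the word's clip if it exists, else fingerspell it letter by letter.
from pathlib import Path
from nltk.stem import PorterStemmer

VIDEO_DIR = Path("videos")        # assumed folder of pre-rendered Blender clips
stemmer = PorterStemmer()

def clips_for(word):
    stem = stemmer.stem(word.lower())
    word_clip = VIDEO_DIR / f"{stem}.mp4"
    if word_clip.exists():                       # the word (or its stem) is in the dataset
        return [word_clip]
    # fallback: fingerspell the original word letter by letter
    return [VIDEO_DIR / f"{letter}.mp4" for letter in word.lower() if letter.isalpha()]

print(clips_for("Hello"))        # [videos/hello.mp4] if the word is in the dataset
print(clips_for("Metabolism"))   # otherwise, one clip per letter
```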
3.3 Vector Generation Using Word Embeddings:
To convert the words or letters present in the dataset into numerical features that could
be used by the Random Forest Classifier, word embedding was applied. Word2Vec embeddings
were used to represent each word or letter as a vector. These embeddings were chosen
because they capture the semantic meaning of words. This process converted each word into
a numerical vector, which was then used as input to the machine learning model.
The next step was to train a machine learning model using the vector data. A Random
Forest Classifier was used as the classification model, which helps to predict the words
in their respective categories (a code sketch of this pipeline is shown after the steps below).
The Random Forest Classifier was trained using the following steps:
1. Data Preprocessing:
The words were tokenized and converted into numerical representations using
Word2Vec.
A Word2Vec model was trained with a vector size of 100 dimensions, creating word
embeddings.
2. Feature Extraction:
Each word was mapped to its corresponding word vector (numerical representation).
These word vectors served as features (X), while the words (or their labels) became the
target (y).
3. Train-Test Split:
The dataset was split into training (X_train, y_train) and testing (X_test,
y_test) sets.
4. Model Training:
a. A Random Forest Classifier was initialized with:
i. 1000 decision trees (n_estimators=1000).
ii. A fixed random state (random_state=42) for reproducibility.
b. The model was trained on word vector features using fit(X_train, y_train).
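The following is a minimal sketch of such a pipeline using gensim and scikit-learn (the toy word-label pairs are illustrative assumptions; only vector_size=100, n_estimators=1000, and random_state=42 come from the steps above):

```python
# Illustrative Word2Vec + Random Forest pipeline (toy data; hyperparameters from the report).
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy dataset: (word, label) pairs, where the label identifies the gesture class.
samples = [("hello", "hello"), ("hi", "hello"), ("again", "again"),
           ("repeat", "again"), ("after", "after"), ("later", "after")]

# 1. Data preprocessing: train Word2Vec to get 100-dimensional word vectors.
w2v = Word2Vec([[w for w, _ in samples]], vector_size=100, min_count=1)

# 2. Feature extraction: map every word to its vector (X); the labels are the targets (y).
X = np.array([w2v.wv[w] for w, _ in samples])
y = np.array([label for _, label in samples])

# 3. Train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 4. Model training with 1000 trees and a fixed random state for reproducibility.
clf = RandomForestClassifier(n_estimators=1000, random_state=42)
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```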
Performance Metrics
Performance metrics in machine learning provide quantitative methods to assess
various aspects of model performance such as effectiveness and efficiency. These
metrics can involve multiple dimensions, serving different purposes, and often
include categories related to precision, recall, F1 score, and others.
Confusion Matrix
A confusion matrix summarizes a classifier's predictions by counting true positives (TP),
false positives (FP), true negatives (TN), and false negatives (FN).
Accuracy
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FP (False Positive) is the number of instances incorrectly predicted as positive.
TN (True Negative) is the number of instances correctly predicted as negative.
FN (False Negative) is the number of instances incorrectly predicted as negative.
Recall
Recall, also known as sensitivity or true positive rate, is a performance metric used in binary
classification tasks. It measures the proportion of actual positive instances that are correctly
identified by the model.
Mathematically, in terms of the confusion matrix, recall is calculated as:
Recall = TP / (TP + FN)
Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FN (False Negative) is the number of instances incorrectly predicted as negative.
F1-Score
F1-score = 2TP / (2TP + FP + FN)
Precision
Precision is a performance metric used in binary classification tasks that measures the
proportion of correctly predicted positive instances out of all instances predicted as
positive by the model.
Precision = TP / (TP + FP)
Where:
TP (True Positive) is the number of instances correctly predicted as positive.
FP (False Positive) is the number of instances incorrectly predicted as positive.
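For reference, these metrics can be computed directly with scikit-learn, as in the small sketch below (the toy labels are assumed, not taken from the project's results):

```python
# Computing the metrics above with scikit-learn (toy labels for illustration).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ["hello", "again", "after", "hello", "again", "after"]
y_pred = ["hello", "again", "hello", "hello", "after", "after"]

print("Accuracy :", accuracy_score(y_true, y_pred))
# For multi-class problems, precision/recall/F1 are averaged across classes.
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```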
DATASET
4. TOOLS USED
Python
Python is a high-level, interpreted programming language known for its clear syntax and
readability. We used Python because it supports multiple programming paradigms, including procedural,
object-oriented and functional programming. Python's extensive standard library and vast ecosystem of
third-party packages facilitate rapid development across many domains like web development, data
analysis, scientific computing and many more. It also uses dynamic typing and a combination of
reference counting and a cycle-detecting garbage collector for memory management, and it features
dynamic name resolution, which binds method and variable names during program execution. The
comprehensive standard library that Python provides helped us build our system cost-effectively.
Python is open-source software, and its interpreters are available for many operating systems.
HTML
HTML provides the building blocks of the web; its tags are used to give structure to our system. We
have used HTML to define the content inside our web application. It guarantees that text and graphics
are properly formatted for the internet browser. A browser would not be able to display text as
elements or load videos and other components if HTML were not present. HTML also supplies the page's
fundamental structure, which is overlaid with Cascading Style Sheets to customize the system's
appearance.
CSS
CSS (Cascading Style Sheets) is utilized to enhance the visual presentation of our web page by
defining the style of HTML structures. It allows us to modify attributes such as fonts, colors,
spacing, margins, and more, ensuring our web documents are both appealing and functional. For our
system, CSS has been implemented in both inline and external forms, enabling precise and global
styling across our web pages.
Blender
Blender is a free and open-source 3D creation software widely used for modeling, animation,
simulation, rendering, compositing, motion tracking, and even game development. We utilized it to
create and animate avatars that represent the sign language translations. The avatars mimic the
gestures and movements associated with sign language, providing visual representation to aid in
communication.
Jupyter Notebook
For the creation of model files, Jupyter Notebook was used. Jupyter Notebook is an open-source web
application that allows users to create and share documents that contain live code, equations,
visualizations, and narrative text.
TensorFlow
At first, we learned sign language gestures from various sites such as Upskill Tutor, ASL Love etc.
Then, an animated character was created using the Blender tool, in which each node of the fingers and
elbow was scripted and the sign language gesture for the various words, letters and numbers were
created in ".mp4" format. A total of 127 sign language gestures were created, numbers from 0-9, 26
alphabets, and 91 words. Each sign language gesture video was recorded at a rate of 30 frames per
second (fps), meaning that for every second of video, we had 30 individual frames. In ASL, some
words and their synonyms share the same sign language gesture. So, within the 91 words there are 16
words whose synonyms share the same sign language gesture as they do. Within these 16 words,
one word has 5 synonyms, 3 words have 4 synonyms each, 2 words have 3 synonyms each, 3 words have
2 synonyms each and 7 words have one synonym each, bringing the total number of words in the dataset
to 136. Overall, we have collected a dataset of 174 entries containing the numbers 0-9
representing 10 different classes, a-z letters representing 26 different classes and 91 words along
with their synonyms representing 91 different classes. This comprehensive dataset allowed us to train
and test our model effectively, ensuring accurate sign language video translation.
The model performance has been measured based on the following metrics:
Accuracy: The value 0.9451 indicates that our model correctly predicts the outcome approximately 94.51% of
the time.
Precision: It measures how many of the predicted positives are actually correct. For the majority of the
categories, whose precision is 1, every prediction of these categories was correct. However, for the
category "again", the precision score is 0.18, which means only 18% of the synonyms classified under it
were correct; the remaining 82% were misclassified.
Recall: It measures how many actual positives were correctly predicted. High recall means the model is
identifying most of the actual positive cases. But in the case of "after", its recall is 0.67, which indicates 33% of
the actual cases were missed.
F1-score: It indicates the harmonic mean of precision and recall, balancing their trade-offs. ”After” has an F1-
score of 0.80, showing moderate performance due to lower recall whereas “Again” has an F1-score of 0.31,
showing low performance due to lower precision.
Support: It indicates the number of true instances of each class in the dataset. In the test set, "after" has a
support of 6, which means "after" and five of its synonyms appear there.
Fig: Performance metrics of the model.
Figure: Output of the system.
The final output of the system is the translation of a given text into a sign language animation video. Upon
receiving the input, the system processes it to translate it into sign language. Subsequently, the system
generates the stemmed version of the word if it is in the dataset; otherwise, it generates each letter of the
word and displays those letters one by one. This seamless process enables users to translate text into sign
language gestures effectively, facilitating communication and understanding.
EPILOGUE
5.1 Conclusion
This minor project focused on developing a sign language translation system using the Porter Stemming
Algorithm and other NLP techniques. The project successfully implemented sign language translation
where the user provides text or voice input. Overall, the system demonstrated good performance in terms
of accuracy. This is one of the crucial steps in a sign language translation system, as it helps hearing
people and people with hearing impairments communicate in efficient ways. Data collection is one of the
crucial phases in the development of machine learning projects. However, future research could focus on
improving the scalability of the system by adding more words and their respective sign language videos.
This project is an initial step toward an effective solution to this daily concern.
This project can be extended in multiple ways in the future such as:
Mobile app: Developing a mobile app for the translation system, which could make it more accessible
to users who prefer using their smartphones or tablets.
AR/VR integration: This project can be integrated with virtual and augmented reality systems, which can
provide more immersive and interactive sign language experiences for deaf and hard-of-hearing
individuals.
Overall, these enhancements, together with additional words and their respective gestures, would further
improve the effectiveness of the system.
REFERENCES