Project_Proposal_Form-1
DATE – –
03106860994
3. 70132841 Bilal Sher [email protected]
PROBLEM STATEMENT
Individuals with speech impairments (people who can’t speak) often face challenges in daily
communication, requiring a translator to convey their thoughts and needs. This reliance can limit
their independence and social interaction. Our solution aims to address this issue by developing
software that translates gestures into text and speech, and vice versa, enabling seamless
communication without external assistance.
EXECUTIVE SUMMARY
This project addresses the communication challenges faced by individuals with speech
impairments. Current solutions often rely on human translators, limiting independence and social
interaction. This project aims to develop innovative software that facilitates seamless
communication by translating gestures into text or speech, and vice versa. This technology will
empower individuals with speech impairments to express themselves independently, enhance their
social participation, and improve their overall quality of life. The software will leverage advanced
computer vision and machine learning algorithms to accurately recognize and interpret a wide range of gestures.
Some competing apps exist in this category, but each offers only one or two functionalities, such as
gesture-to-text or text-to-speech. Our app combines six core functionalities on a single platform,
which is what makes it unique.
INTRODUCTION
Background Information
Individuals with speech impairments face diverse challenges, including difficulty in producing
clear speech sounds, controlling vocal pitch and volume, and comprehending or processing
language. This can manifest in various ways, from slurred speech to complete inability to vocalize.
Existing assistive technologies, such as text-to-speech devices and augmentative and alternative
communication (AAC) systems, often require significant manual input and may not be intuitive or
adaptable to individual needs.
"A Wearable Gesture-Based Communication Device for People with Motor Disabilities"
(IEEE Transactions on Haptics, 2024):
This research presents a wearable device designed to enable individuals with motor disabilities to
communicate through simple hand gestures. The device incorporates inertial measurement units
(IMUs) and a machine learning algorithm to recognize gestures and generate corresponding
messages. This research aligns with the problem statement by focusing on developing assistive
technologies that enhance communication for individuals with motor impairments, which often co-
occur with speech impairments.
COMPETITORS/COMPETITIVE ANALYSIS
Functionalities        Hand Talk Translator   Spread Signs   The ASL App   Our App
Gesture to text        No                     No             Yes           Yes
Text to gesture        Yes                    Yes            No            Yes
Text to speech         No                     No             No            Yes
Speech to text         No                     No             No            Yes
Gesture to speech      No                     No             No            Yes
Speech to gesture      Yes                    No             No            Yes
OBJECTIVES
The primary objectives of the App are:
1. Text-to-speech conversion
2. Speech-to-text conversion
3. Gesture-to-text conversion
4. Text-to-gesture conversion
5. Speech-to-gesture conversion
6. Gesture-to-speech conversion
Empower Independence:
Foster greater independence and confidence among users, enabling them to engage in social,
professional, and personal interactions seamlessly.
Global Accessibility:
Make the software available on mobile devices, ensuring it reaches a diverse audience worldwide.
MOTIVATION
The motivation is to create a more equitable and inclusive society where individuals with speech
impairments have the same opportunities for communication and social participation as everyone
else.
REQUIREMENTS
Functional Requirements:
1. Gesture-to-Text Conversion
The app can recognize and interpret gestures (such as hand movements, facial expressions, or
body posture) and convert them into text in real time, allowing users to express themselves without
speaking (a minimal recognition-pipeline sketch is given after these functional requirements).
2. Gesture-to-Speech Conversion
Recognizes gestures and translates them into spoken words, enabling users to "speak" via their
gestures and communicate with others who may not understand sign language or gestures.
3. Text-to-Gesture Conversion
Converts written text into visual gestures or sign language, enabling non-verbal individuals to
communicate with others who understand gestures but not text.
4. Text-to-Speech Conversion
Converts written text into spoken language, allowing individuals with speech impairments
to communicate through text that is spoken aloud by the app.
5. Speech-to-Text Conversion
Converts spoken language into text, allowing individuals with hearing impairments or speech
difficulties to read what others are saying in real time.
6. Speech-to-Gesture Conversion
Converts spoken language into gestures or sign language, allowing the app to "translate" speech
into an accessible visual form.
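
To make the gesture-based requirements above more concrete, the following is a minimal sketch of a possible gesture-to-text pipeline. It assumes the OpenCV (opencv-python) and MediaPipe Python packages for camera capture and hand-landmark extraction; classify_landmarks is a hypothetical placeholder for a classifier that would be trained on labelled gesture data, and the final implementation may use different libraries or models.

    # Sketch: camera frames -> hand landmarks -> predicted gesture labels -> text.
    # Assumptions: opencv-python and mediapipe are installed; classify_landmarks is
    # a placeholder for a trained gesture classifier.
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    def classify_landmarks(landmarks):
        # Placeholder: map 21 (x, y, z) hand landmarks to a gesture label.
        # A real implementation would run a trained model here.
        return "HELLO"  # hypothetical output for illustration only

    def gesture_to_text(max_frames=100):
        cap = cv2.VideoCapture(0)  # default device camera
        words = []
        with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
            for _ in range(max_frames):
                ok, frame = cap.read()
                if not ok:
                    break
                # MediaPipe expects RGB images; OpenCV delivers BGR.
                results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if results.multi_hand_landmarks:
                    words.append(classify_landmarks(results.multi_hand_landmarks[0].landmark))
        cap.release()
        return " ".join(words)

    if __name__ == "__main__":
        print(gesture_to_text())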
Non-Functional Requirements:
1. Reliability and Stability:
Robust and stable system with minimal crashes or errors.
High availability and uptime.
2. Usability:
Easy to use, even for users with limited technical experience.
3. Performance:
Real-time performance with minimal latency.
Efficient resource utilization (CPU, memory, battery).
4. Portability:
The app should run on standard smartphones (Android and iOS).
FEATURES OF PROJECT
1. Gesture-to-Text Conversion
The app can recognize and interpret gestures (such as hand movements, facial expressions, or
body posture) and convert them into text in real time, allowing users to express themselves without
speaking.
2. Gesture-to-Speech Conversion
Recognizes gestures and translates them into spoken words, enabling users to "speak" via their
gestures and communicate with others who may not understand sign language or gestures.
3. Text-to-Gesture Conversion
Converts written text into visual gestures or sign language, enabling non-verbal individuals to
communicate with others who understand gestures but not text.
4. Text-to-Speech Conversion
Converts written text into spoken language, allowing individuals with speech impairments to
communicate through text that is spoken aloud by the app.
5. Speech-to-Text Conversion
Converts spoken language into text, allowing individuals with hearing impairments or speech
difficulties to read what others are saying in real time (see the sketch after this list of features).
6. Speech-to-Gesture Conversion
Converts spoken language into gestures or sign language, allowing the app to "translate" speech
into an accessible visual form.
7. Real-Time Communication
Ensures that all conversions (gesture-to-text, text-to-speech, etc.) happen in real time, enabling
smooth, ongoing conversations between the user and others.
8. Easy-to-Use Interface
Designed with an intuitive and accessible interface, making the app simple to navigate for people
of all ages and tech proficiency levels.
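
To illustrate the real-time speech-to-text behaviour described in Features 5 and 7, the sketch below assumes the third-party SpeechRecognition and PyAudio Python packages and uses the free Google Web Speech recogniser as one possible backend; the production app may use a different engine such as Azure Speech Services.

    # Sketch: continuously listen on the microphone and print recognised phrases.
    # Assumptions: SpeechRecognition and PyAudio are installed; the Google Web
    # Speech recogniser is used only as an example backend.
    import speech_recognition as sr

    def live_speech_to_text():
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source, duration=1)
            print("Listening... press Ctrl+C to stop")
            while True:
                audio = recognizer.listen(source, phrase_time_limit=5)
                try:
                    print("Heard:", recognizer.recognize_google(audio))
                except sr.UnknownValueError:
                    print("(could not understand audio)")
                except sr.RequestError as err:
                    print("Recognition service error:", err)

    if __name__ == "__main__":
        live_speech_to_text()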
ARCHITECTURAL DESIGN
Hardware Components:
1. Mobile Phone:
The core user interface for interacting with the app will be a mobile phone. These devices will have
basic computing capabilities, the necessary sensors (camera, microphone, speakers), and a display
for rendering text, speech, and gestures.
2. Camera:
The device's camera (or an external webcam) will capture gestures, facial expressions, and body
movements, essential for the gesture recognition system. This will be used for the gesture-to-text
and gesture-to-speech conversion features.
3. Microphone:
The microphone on the user's device will capture speech for the speech-to-text conversion feature. It will
also be used for detecting speech input in the speech-to-gesture conversion system.
4. Speakers:
For the text-to-speech functionality, speakers on the device will produce audible speech when text is converted into spoken output.
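
As a simple illustration of how the app could drive the device speakers, the sketch below uses the offline pyttsx3 Python package as a stand-in; the actual implementation may instead use the platform's native speech synthesis or one of the cloud services discussed under Software Components.

    # Sketch: speak a text message through the device speakers.
    # Assumption: the offline pyttsx3 package is installed.
    import pyttsx3

    def speak(text, rate=150):
        engine = pyttsx3.init()
        engine.setProperty("rate", rate)  # speaking rate in words per minute
        engine.say(text)
        engine.runAndWait()               # block until playback has finished

    if __name__ == "__main__":
        speak("Hello, I would like a cup of coffee, please.")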
Software Components:
1. Mobile Application:
A cross-platform application will be built to run on mobile phones (Android, iOS). The app will serve as
the front-end interface for user interaction with various functionalities such as text, speech, and gesture
conversions.
The app will integrate speech recognition tools (e.g., Google Speech-to-Text API or Microsoft's Azure
Speech Services) to convert spoken language into text. It will also include Text-to-Speech (TTS) systems
such as Google Text-to-Speech or Amazon Polly to read text aloud.
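
As one example of how such a cloud service could be called from the app's back end, the sketch below assumes the google-cloud-texttospeech Python client library and configured Google Cloud credentials; the choice of provider (Google, Microsoft, or Amazon) is still an open design decision.

    # Sketch: synthesise speech from text with Google Cloud Text-to-Speech.
    # Assumptions: google-cloud-texttospeech is installed and credentials are set up.
    from google.cloud import texttospeech

    def synthesize_to_mp3(text, out_path="speech.mp3"):
        client = texttospeech.TextToSpeechClient()
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=texttospeech.VoiceSelectionParams(
                language_code="en-US",
                ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
            ),
            audio_config=texttospeech.AudioConfig(
                audio_encoding=texttospeech.AudioEncoding.MP3,
            ),
        )
        with open(out_path, "wb") as out:
            out.write(response.audio_content)  # MP3 bytes returned by the API
        return out_path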
3. APIs:
External APIs (e.g., Google Cloud Speech-to-Text, Text-to-Speech APIs, computer vision models for
gesture recognition) will be integrated into the app for processing speech input and generating speech or
gesture output.
4. Database:
A local database or cloud database will store user preferences, customization settings, historical
interactions, and any other relevant data, enabling a personalized experience.
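
The sketch below shows one way user preferences could be stored locally. It uses Python's built-in sqlite3 module as an illustrative stand-in for the mobile platform's local database (for example Room on Android), and the tts_voice and sign_language keys are hypothetical examples.

    # Sketch: a tiny key-value preferences store on top of SQLite.
    # Assumption: Python's built-in sqlite3 module stands in for the app's real
    # local or cloud database.
    import sqlite3

    def open_db(path="preferences.db"):
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS preferences ("
            " key TEXT PRIMARY KEY,"
            " value TEXT NOT NULL)"
        )
        return conn

    def set_preference(conn, key, value):
        conn.execute(
            "INSERT INTO preferences (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        conn.commit()

    def get_preference(conn, key, default=None):
        row = conn.execute("SELECT value FROM preferences WHERE key = ?", (key,)).fetchone()
        return row[0] if row else default

    if __name__ == "__main__":
        db = open_db()
        set_preference(db, "tts_voice", "en-US")
        set_preference(db, "sign_language", "ASL")
        print(get_preference(db, "tts_voice"))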
Network Components:
1. Internet Connectivity:
The app will rely on an internet connection for cloud synchronization, fetching updates, and accessing
APIs for speech recognition, text-to-speech conversion, and machine learning model processing.
PROJECT PLAN
VERSION CONTROL
REFERENCES
Name:
Signature:
Day Month Year
DATE – –