
PROJECT REPORT

DEEP LEARNING

22AD3105R/A

PROJECT TITLE - VOICE TO TEXT CONVERSION USING PYTHON AND DEEP LEARNING

SUBMITTED BY-
2210080076 - SWAPNIKA SHETTY
2210080050 - PADMA SREEJA
2210080059 - P. VISVAGNA
2210080034 - M. JYOTHI
2210080053 - SEELAMSETTI LEELA KUMARI

CONTENT
1. Abstract
2. Introduction
3. History of speech to text
4. Problems faced in speech to text
5. Packages required
6. Steps
7. Applications of speech recognition
8. Conclusion

ABSTRACT
The advancement of deep learning technologies has significantly enhanced the capabilities of
speech recognition systems. This project focuses on developing a robust speech-to-text
conversion model utilizing deep learning techniques in Python. The primary goal is to
accurately transcribe spoken language into written text by training a neural network on a
large dataset of audio files and corresponding transcripts.
The project leverages a combination of convolutional neural networks (CNNs) and recurrent
neural networks (RNNs), specifically Long Short-Term Memory (LSTM) units, to capture the
temporal dependencies and acoustic features of speech. Additionally, the model employs a
Connectionist Temporal Classification (CTC) loss function to handle the varying lengths of
input speech and output text sequences.
Using popular Python libraries such as TensorFlow and Keras, the model is trained on diverse
speech datasets to enhance its generalization across different speakers, accents, and
environments. The implementation also integrates various preprocessing steps, including
noise reduction, feature extraction using Mel-Frequency Cepstral Coefficients (MFCCs), and
data augmentation, to improve the model's robustness.
The resulting speech-to-text system is evaluated on multiple benchmarks, demonstrating its
potential for real-world applications such as voice-activated assistants, automated
transcription services, and accessibility tools for the hearing impaired. The project concludes
with insights into the challenges faced during development, such as handling noise and
managing computational complexity, and suggests potential future improvements for
achieving even greater accuracy and efficiency.

INTRODUCTION
As our interactions with machines have evolved, the complexity of these interactions has
increased dramatically. We’ve transitioned from pressing large mechanical buttons to
seamlessly navigating touchscreens. But the advancements aren’t limited to hardware alone.
Historically, text has been the predominant method of input for computers. However, with
breakthroughs in Natural Language Processing (NLP), Machine Learning (ML), and Data
Science, we are now on the cusp of using speech as a primary medium for interacting with
our devices.
Virtual assistants exemplify this shift and are now embedded in our daily lives. Digital
assistants like Google Assistant, Siri, Alexa, and others have set a new standard for how we
can communicate with the digital world on a more personal and intuitive level.
One of the most exciting developments in modern technology is the widespread availability
of tools that allow anyone to convert spoken words into text. This capability, once limited to
specialized applications, is now accessible to a broad audience, inviting experimentation and
innovation.
When it comes to developing speech-to-text applications, Python—one of the most popular
programming languages—offers a wealth of options. From built-in libraries to powerful
frameworks, Python provides the tools necessary to create efficient and effective speech-to-
text systems, making this technology more accessible than ever before.

History of Speech to Text


Before diving into Python's speech-to-text capabilities, it is worth looking at how far
we've come in this area. Below is a condensed timeline of key events:
Audrey, 1952: The first speech recognition system, Audrey, was built by three Bell Labs
engineers in 1952. It could recognize only spoken digits.
IBM Shoebox, 1962: IBM's first voice recognition system, the Shoebox, could distinguish
16 words in addition to digits. It could also perform basic arithmetic and print the
results.

Defense Advanced Research Projects Agency (DARPA), 1970s: DARPA funded the Speech
Understanding Research program, which led to the Harpy system and its ability to
recognize 1,011 words.
Hidden Markov Model (HMM), 1980s: The HMM is a statistical model for problems involving
sequential information. It was used in the development of new voice recognition
techniques.
Voice Search by Google, 2001: In 2001, Google launched its Voice Search tool, which
allowed users to search by speaking. It was the first widely used voice-enabled app.

Siri, 2011: Siri, launched in 2011, provided a real-time and convenient way to interact
with Apple's devices.


Alexa, 2014 & Google Home, 2016: Voice-activated virtual assistants like Alexa and
Google Home, which have sold over 150 million units combined, entered the mainstream in
2014 and 2016, respectively.
Problems faced in Speech to Text
Speech-to-text conversion remains a challenging area of technology, with several unresolved
issues. Numerous technical limitations currently make it an imperfect tool. Below are some
of the most common challenges faced in voice recognition technology:
1. Inaccurate Interpretation
Speech recognition systems often struggle to accurately capture spoken words. Unlike
humans, Voice User Interfaces (VUIs) lack the ability to fully understand context, which can
significantly alter the meaning of words and phrases. As a result, these systems may have
difficulty grasping the true semantics of a statement.
2. Processing Time
Speech recognition systems can sometimes take a considerable amount of time to process
spoken input. This delay is often due to the vast range of vocal patterns among different
users. While speaking more slowly or clearly can improve recognition accuracy, it detracts
from the tool’s overall convenience and efficiency.
3. Accents and Dialects
VUIs may struggle to understand non-standard accents and dialects. Even within the same
language, people can pronounce the same words in vastly different ways, leading to potential
misinterpretations by the system.
4. Background Noise and Volume
Ideally, background noise and varying speech volumes would not be an issue for VUIs, but in
reality, these factors can significantly impact performance. In noisy environments such as
public spaces or large offices, speech recognition systems may struggle to function
effectively.
How does Speech recognition work?

A complete description of the method is beyond the scope of this report. This section
demonstrates how to convert speech to text using Python, via the SpeechRecognition
library together with PyAudio.
PACKAGES REQUIRED
 SpeechRecognition (pip install SpeechRecognition): This is the core package that
handles the most important part of the conversion process. Other engines and services,
such as apiai, assemblyai, google-cloud-speech, pocketsphinx, watson-developer-cloud,
wit, and so on, offer their own advantages and disadvantages.
pip install SpeechRecognition
 PyAudio (pip install PyAudio): used to capture audio from a microphone.
 PortAudio: the cross-platform audio I/O library that PyAudio depends on (installed
through the system package manager rather than pip).

Convert an audio file into text


Steps
1. Import the library for speech recognition.
2. Initialize the recognizer class in order to do voice recognition. We are utilizing
Google's speech recognition technology.
3. The following audio formats are supported by speech recognition: WAV, AIFF, AIFF-C,
and FLAC. This example uses a WAV file.
4. The audio clip used is from the film 'Taken' and states: "I have no idea who you are
or what you want, but if you're looking for ransom, I can tell you I don't have any
money."
5. The Google recognizer reads English by default. It supports a variety of languages;
for further information, please refer to the documentation.
Speech is, at its most basic level, nothing more than a sound wave. In acoustic terms,
amplitude, crest, trough, wavelength, cycle, and frequency are some of the
characteristics of these sound waves or audio signals.
Because these audio signals are continuous, they contain an infinite number of data
points. To convert such an audio signal into a digital signal that a computer can
process, we must take a discrete set of samples that closely approximates the
continuous signal.
Once we’ve established a suitable sample frequency (8000 Hz is a reasonable starting
point, since most of the information in speech lies below 4 kHz, which an 8000 Hz
sampling rate can capture), we can analyze the audio signals using Python packages such
as LibROSA and SciPy. On the basis of these inputs, we can then partition the data set
into two parts: one for training the model and another for validating the model's
conclusions.
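The resampling and split can be sketched with SciPy. The 440 Hz test tone and the 80/20 ratio below are illustrative assumptions standing in for real recordings:

```python
import numpy as np
from scipy.signal import resample

# Illustrative stand-in for a recorded clip: 2 s of a 440 Hz tone
# sampled at 44.1 kHz (a real project would load WAV files instead).
orig_rate = 44100
t = np.linspace(0, 2, 2 * orig_rate, endpoint=False)
signal = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Resample down to the 8000 Hz working rate discussed above.
target_rate = 8000
samples = resample(signal, int(len(signal) * target_rate / orig_rate))

# Simple 80/20 split into training and validation portions.
split = int(0.8 * len(samples))
train, val = samples[:split], samples[split:]
print(len(train), len(val))  # 12800 3200
```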
At this stage, one may use the Conv1d model architecture, a convolutional neural
network that operates along a single dimension. We can then construct a model, define
its loss function, and train the network to arrive at the best model for converting
voice to text. Combining deep learning with NLP (Natural Language Processing) enables
wider applicability and acceptance of speech-to-text systems.
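One minimal way to realize the Conv1d idea in Keras is sketched below. The layer sizes and the 30-class output are illustrative assumptions, and the simple classification head stands in for the more involved CTC setup described in the abstract:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative Conv1d acoustic-model sketch, not the exact architecture.
num_classes = 30  # e.g. the letters a-z plus a few special tokens
model = models.Sequential([
    layers.Input(shape=(8000, 1)),              # 1 s of 8 kHz audio
    layers.Conv1D(16, 13, strides=4, activation="relu"),
    layers.MaxPooling1D(3),
    layers.Conv1D(32, 11, strides=2, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A forward pass on dummy audio shows the expected output shape.
dummy = np.zeros((2, 8000, 1), dtype="float32")
print(tuple(model(dummy).shape))  # (2, 30)
```

In a full system the dense softmax head would be replaced by per-timestep outputs trained with a CTC loss, so that input and output sequence lengths can differ.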

Applications of Speech Recognition


More tools are accessible for operating this technological breakthrough because it is
largely a software creation that does not belong to any one company. Because of this,
even developers with limited financial resources have been able to use this technology
to create innovative apps.
The following are some of the sectors in which voice recognition is gaining traction
 Evolution in search engines: Speech recognition will aid in improving search
accuracy by bridging the gap between verbal and textual communication.
 Impact on the healthcare industry: Voice recognition is becoming a more prevalent
element in the medical sector, as it speeds up the production of medical reports. As
VUIs improve their ability to comprehend medical language, clinicians will gain time
back from administrative tasks by using this technology.
 Service industry: As automation advances, it is possible that a customer will be unable
to reach a human to respond to a query; in this case, speech recognition systems can
fill the void. We will witness a quick expansion of this function at airports, public
transportation, and other locations.
 Service providers: Telecommunications companies may rely even more on speech-to-
text technology that may help determine callers’ requirements and lead them to the
proper support.

Conclusion
Speech-to-text conversion is a useful tool that is on its way to becoming commonplace.
With Python, one of the most popular programming languages in the world, it’s easy to create
applications with this tool. As we make progress in this area, we’re laying the groundwork for
a future in which digital information may be accessed not just with a fingertip but also with a
spoken command.
