
SPEECH

The document discusses speech recognition, including its meaning, working process, advantages, disadvantages and future. Speech recognition is the process of converting spoken words to text. It works by combining acoustic and language modeling with hidden Markov models. The future of speech recognition includes developing systems that can instantly translate between languages with high accuracy and understand the meaning behind words.

Uploaded by

Ramesh k
Copyright
© All Rights Reserved

SPEECH RECOGNITION

CONTENTS
Introduction
Meaning of Speech Recognition
Working of Speech Recognition
Speech Recognition Flowchart
Recognition Process Flow Summary
Advantages
Disadvantages
The Future of Speech Recognition
Conclusion
Introduction
Speech recognition is the process of converting an acoustic
signal, captured by a microphone or a telephone, into a set of words.
The recognized words can be an end in themselves, as in
applications such as command and control, data entry, and
document preparation.
They can also serve as the input to further linguistic processing in
order to achieve speech understanding.
It is also known as Automatic Speech Recognition (ASR),
Computer Speech Recognition, or Speech to Text (STT).
WHAT IS SPEECH RECOGNITION?
➢ Speech recognition basically means talking to a computer and having
it recognize whatever we're saying.
➢ By definition, speech recognition is the interdisciplinary
subfield of computational linguistics that develops
methodologies and technologies that enable the recognition and
translation of spoken language into text by computers. It is also
known as Automatic Speech Recognition (ASR), Computer Speech
Recognition or Speech to Text (STT).
HOW DOES IT WORK?
➢ This process fundamentally functions as a pipeline that converts PCM (pulse-code
modulation) digital audio from a sound card into recognized speech.
➢ It combines acoustic modeling and language modeling. Acoustic modeling represents the
relationship between linguistic units of speech and audio signals; language
modeling matches sounds with word sequences to help distinguish between
words that sound similar.
➢ Hidden Markov models (HMMs) are also used to model temporal patterns in speech and
improve accuracy.
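HMMs are commonly decoded with the Viterbi algorithm, which finds the single most likely sequence of hidden states for a sequence of observed audio frames. The toy sketch below assumes just two hidden states ("silence" and "speech") and frames already reduced to "quiet" or "loud"; every state name and probability is invented for illustration, whereas a real recognizer uses phoneme-level states and acoustic feature vectors.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the previous state that gives the highest score into s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model: every state name and probability here is invented for illustration
STATES = ("silence", "speech")
START = {"silence": 0.8, "speech": 0.2}
TRANS = {"silence": {"silence": 0.7, "speech": 0.3},
         "speech": {"silence": 0.2, "speech": 0.8}}
EMIT = {"silence": {"quiet": 0.9, "loud": 0.1},
        "speech": {"quiet": 0.3, "loud": 0.7}}

frames = ["quiet", "quiet", "loud", "loud", "loud"]
path, log_p = viterbi(frames, STATES, START, TRANS, EMIT)
print(path)  # ['silence', 'silence', 'speech', 'speech', 'speech']
```

Working in log-probabilities avoids numeric underflow when many small probabilities are multiplied over long utterances.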
TYPES OF SPEECH RECOGNITION

1) Speaker-Dependent
2) Speaker-Independent
1) Speaker-Dependent:-
➢ Speaker-dependent software is commonly used for dictation,
while speaker-independent software is more commonly found in telephone
applications.
➢ Speaker-dependent software works by learning the unique characteristics
of a single person's voice, in a way similar to voice recognition. New users
must first "train" the software by speaking to it, so the computer can
analyze how the person talks. This often means users have to read a few
pages of text to the computer before they can use the speech recognition
software.
2) Speaker-Independent:-
➢ Speaker-independent software is designed to recognize anyone's voice, so no
training is involved. This makes it the only real option for applications such
as interactive voice response systems, where businesses can't ask callers to
read pages of text before using the system. The downside is that speaker-independent
software is generally less accurate than speaker-dependent software.
➢ Speaker-independent speech recognition engines generally deal with
this by limiting the grammars they use. With a smaller list of
recognized words, the speech engine is more likely to correctly recognize
what a speaker said.
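The effect of a limited grammar can be illustrated with a deliberately simplified sketch: a noisy raw hypothesis is snapped to the closest entry of a small grammar using Levenshtein edit distance. This is an assumption made for illustration only; real engines constrain the decoder's search space to the grammar rather than post-correcting text, and the grammar `YES_NO_GRAMMAR` and helper `match_in_grammar` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def match_in_grammar(hypothesis, grammar):
    """Hypothetical helper: return the grammar entry closest to a raw hypothesis."""
    return min(grammar, key=lambda phrase: edit_distance(hypothesis, phrase))

# Hypothetical small grammar for a telephone menu
YES_NO_GRAMMAR = ["yes", "no", "operator"]
print(match_in_grammar("yeah", YES_NO_GRAMMAR))  # yes
```

With only three allowed phrases, "yeah" is unambiguously closest to "yes". In a larger vocabulary that also contained, say, "yeast" (also at distance 2 from "yeah"), the same input would become ambiguous, which is why limiting the grammar tends to raise accuracy.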
Recognition Process Flow Summary
➢ Step 1: User Input
The system captures the user's voice in the form of an analog
acoustic signal.
➢ Step 2: Digitization
The analog acoustic signal is digitized.
➢ Step 3: Phonetic Breakdown
The signal is broken into phonemes.
➢ Step 4: Statistical Modeling
Phonemes are mapped to their phonetic representation using a statistical
model.
➢ Step 5: Matching
Using the grammar, the phonetic representation and the dictionary, the
system returns an n-best list (i.e., candidate words, each with a confidence score).
• Grammar: the set of words or phrases that constrains the range of input
or output in the voice application.
• Dictionary: the mapping table between phonetic representations and words
(e.g., thu, thee -> the).
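Step 5 can be sketched as follows, under loudly stated assumptions: the phonetic spellings, their scores (standing in for the output of the Step 4 statistical model), the dictionary and the grammar are all hypothetical. The sketch maps each phonetic hypothesis to a word through the dictionary, discards words outside the grammar, and returns the top candidates with normalized confidence scores.

```python
from collections import defaultdict

# Hypothetical dictionary: several phonetic spellings can map to one word
DICTIONARY = {"thu": "the", "thee": "the", "kat": "cat", "kaat": "cat", "kot": "cot"}
# Hypothetical grammar: the only words this voice application accepts
GRAMMAR = {"the", "cat"}

def n_best(phonetic_scores, dictionary, grammar, n=3):
    """Turn scored phonetic hypotheses into an n-best list of (word, confidence)."""
    word_scores = defaultdict(float)
    for phones, score in phonetic_scores.items():
        word = dictionary.get(phones)
        # Drop hypotheses with no dictionary entry or outside the grammar
        if word is not None and word in grammar:
            word_scores[word] += score
    total = sum(word_scores.values())
    if total == 0:
        return []
    ranked = sorted(word_scores.items(), key=lambda item: item[1], reverse=True)
    # Confidences are normalized over all words that survived the grammar
    return [(word, score / total) for word, score in ranked[:n]]

# Invented scores standing in for the statistical model's output
scores = {"kat": 0.5, "kaat": 0.2, "kot": 0.2, "thu": 0.1}
result = n_best(scores, DICTIONARY, GRAMMAR)
print(result)  # "cat" ranks first, then "the"; "cot" is filtered out by the grammar
```

Summing the scores of different spellings ("kat", "kaat") that share a dictionary entry is one simple design choice; an engine might instead keep only the best-scoring pronunciation per word.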
Program Training
➢ The process is more complicated for phrases and sentences: the system has to
figure out where each word starts and stops.
➢ Statistical systems need lots of example training data to reach their optimal
performance, sometimes on the order of thousands of hours of human-transcribed
speech and hundreds of megabytes of text.
➢ The training data are used to create acoustic models of words, word lists and
multi-word probability networks.
➢ The details can make the difference between a well-performing system and a
poorly performing system, even when using the same basic algorithm.
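A bigram language model is one simple example of a multi-word probability model built from training text. The sketch below assumes a tiny hypothetical corpus and add-one (Laplace) smoothing; production systems train on far larger corpora and use better smoothing. It shows how such a model prefers "recognize speech" over the similar-sounding "wreck a nice beach".

```python
from collections import Counter

def train_bigram(sentences):
    """Count history words and word pairs over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])                  # words used as history
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s.lower().split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def sentence_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram probability; unseen pairs get a small nonzero floor."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

# Hypothetical training text
corpus = ["recognize speech with a computer",
          "computers recognize speech",
          "we recognize speech"]
uni, bi, v = train_bigram(corpus)
p_good = sentence_prob("recognize speech", uni, bi, v)
p_bad = sentence_prob("wreck a nice beach", uni, bi, v)
print(p_good > p_bad)  # True
```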
ADVANTAGES
➢ Helps people with disabilities.
➢ Organizations: increases productivity and reduces costs and errors.
➢ Lower operational costs.
➢ Advances in technology will allow consumers and businesses to
implement speech recognition systems at a relatively low cost.
• Cell-phone users can dial pre-programmed numbers by voice
command.
• Users can trade stocks through a voice-activated trading system.
• Speech recognition technology can also replace touch-tone
dialing, making it possible to target customers who speak
different languages.
DISADVANTAGES
➢ It is difficult to build a perfect system.
➢ Conversations
• Involve more than just words (non-verbal communication, stutters,
etc.).
• Every speaker differs in voice, mouth and speaking style.
➢ Filtering background noise is a task that can be difficult even for
humans.
The Future of Speech Recognition
➢ The Defense Advanced Research Projects Agency (DARPA) has
three teams of researchers working on Global Autonomous Language
Exploitation (GALE), a program that will take in streams of
information from foreign news broadcasts and newspapers and
translate them.
➢ It hopes to create software that can instantly translate two languages
with at least 90 percent accuracy.
➢ DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian
populations in non-English-speaking countries.
Conclusion:
➢ At some point in the future, speech recognition may become speech
understanding.
➢ The statistical models that allow computers to decide what a person just
said may someday allow them to grasp the meaning behind the words.
➢ Although it is a huge leap in terms of computational power and software
sophistication, some researchers argue that speech recognition
development offers the most direct line from the computers of today to
true artificial intelligence.
