SPEECH
CONTENTS
Introduction
Meaning of Speech Recognition
Working of Speech Recognition
Speech Recognition Flowchart
Recognition process Flow Summary
Advantages
Disadvantages
The Future of Speech Recognition
Conclusion
Introduction
Speech recognition is the process of converting an acoustic
signal, captured by a microphone or a telephone, to a set of words.
The recognized words can be an end in themselves, as for
applications such as commands & control, data entry, and
document preparation.
They can also serve as the input to further linguistic processing in
order to achieve speech understanding.
It is also known as Automatic Speech Recognition (ASR),
Computer Speech Recognition, or Speech To Text (STT).
WHAT IS SPEECH RECOGNITION?
SPEECH RECOGNITION BASICALLY MEANS TALKING TO A COMPUTER AND HAVING
IT RECOGNIZE WHATEVER WE'RE SAYING.
BY DEFINITION, SPEECH RECOGNITION IS THE INTERDISCIPLINARY
SUBFIELD OF COMPUTATIONAL LINGUISTICS THAT DEVELOPS
METHODOLOGIES AND TECHNOLOGIES THAT ENABLE THE RECOGNITION AND
TRANSLATION OF SPOKEN LANGUAGE INTO TEXT BY COMPUTERS. IT IS ALSO
KNOWN AS AUTOMATIC SPEECH RECOGNITION (ASR), COMPUTER SPEECH
RECOGNITION OR SPEECH TO TEXT (STT).
HOW DOES IT WORK?
This process functions as a pipeline that converts PCM (pulse
code modulation) digital audio from a sound card into recognized speech.
It uses algorithms based on acoustic and language modeling. Acoustic modeling
represents the relationship between linguistic units of speech and audio signals;
language modeling matches sounds with word sequences to help differentiate between
words that sound similar.
Hidden Markov models are also used to identify temporal patterns in speech and
improve accuracy.
TYPES OF SPEECH RECOGNITION
1) Speaker-Dependent
2) Speaker-Independent
1) Speaker-Dependent:-
Speaker-dependent software is commonly used for dictation,
while speaker-independent software is more commonly found in telephone
applications.
Speaker-dependent software works by learning the unique characteristics
of a single person's voice, in a way similar to voice recognition. New users
must first "train" the software by speaking to it, so the computer can
analyze how the person talks. This often means users have to read a few
pages of text to the computer before they can use the speech recognition
software.
2) Speaker-Independent:-
Speaker-independent software is designed to recognize anyone's voice, so no
training is involved. This means it is the only real option for applications such
as interactive voice response systems - where businesses can't ask callers to
read pages of text before using the system. The downside is that speaker-
independent software is generally less accurate than speaker-dependent
software.
Speaker-independent speech recognition engines generally compensate for
this lower accuracy by limiting the grammars they use. By using a smaller list of
recognized words, the speech engine is more likely to correctly recognize
what a speaker said.
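The effect of a smaller word list can be sketched with a toy example. Here edit distance between spellings stands in for acoustic similarity, which is a simplifying assumption, and both vocabularies are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidates(heard, vocabulary, max_distance=1):
    """Words in the vocabulary within max_distance edits of what was heard."""
    return sorted(w for w in vocabulary if edit_distance(heard, w) <= max_distance)


# Hypothetical vocabularies: a small menu grammar and a larger open word list.
small_grammar = {"yes", "no", "help", "operator"}
large_vocabulary = small_grammar | {"yet", "yen", "less", "now", "know", "nor", "held"}

heard = "yeh"  # a noisy, imperfect rendering of "yes"
print(candidates(heard, small_grammar))     # ['yes']
print(candidates(heard, large_vocabulary))  # ['yen', 'yes', 'yet']
```

With the small grammar only one word is close to the noisy input, so the choice is unambiguous. With the larger vocabulary three words are equally close, and the engine would need more evidence to choose between them.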
Recognition Process Flow
Summary
Step 1: User Input
The system captures the user's voice as an analog
acoustic signal.
Step 2: Digitization
Digitize the analog acoustic signal.
Step 3: Phonetic Breakdown
Break the digitized signal into phonemes.
Step 4: Statistical Modeling
Map phonemes to their phonetic representation using a statistical
model.
Step 5: Matching
Using the grammar, the phonetic representation, and the dictionary, the
system returns an n-best list (i.e., candidate words, each with a confidence score).
Grammar: the set of words or phrases that constrains the range of input
or output in the voice application.
Dictionary: the mapping table between phonetic representations and words
(e.g., thu, thee -> the).
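The matching in Step 5 can be sketched as a lookup followed by a filter and a sort. Apart from the "thu, thee -> the" entry, the dictionary entries, the grammar, and the confidence scores below are hypothetical values used only for illustration.

```python
# Dictionary: phonetic representation -> word. Entries other than
# "thu"/"thee" are hypothetical additions for this sketch.
DICTIONARY = {
    "thu": "the",
    "thee": "the",
    "lyt": "light",
    "ryt": "right",
    "on": "on",
    "awf": "off",
}

# Grammar: the words the application accepts at this point (hypothetical).
GRAMMAR = {"light", "on", "off", "the"}


def n_best(phonetic_scores, n=3):
    """Return up to n (word, confidence) pairs allowed by the grammar.

    phonetic_scores maps a phonetic representation to the confidence the
    statistical model assigned to it (made-up values in this sketch).
    """
    best = {}
    for phonetic, score in phonetic_scores.items():
        word = DICTIONARY.get(phonetic)
        if word is None or word not in GRAMMAR:
            continue  # unknown pronunciation, or word outside the grammar
        # Several pronunciations may map to one word; keep the best score.
        best[word] = max(score, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


scores = {"lyt": 0.62, "ryt": 0.71, "thu": 0.05, "thee": 0.08}
print(n_best(scores))  # [('light', 0.62), ('the', 0.08)]
```

Note that "right" has the highest raw score but is dropped because the grammar does not allow it, which is how the grammar constrains the output.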
Program Training
The process is more complicated for phrases and sentences -- the system has to
figure out where each word stops and starts.
Statistical systems need large amounts of representative training data to reach
optimal performance, sometimes on the order of thousands of hours of
human-transcribed speech and hundreds of megabytes of text.
The training data are used to create acoustic models of words, word lists, and
multi-word probability networks.
The details can make the difference between a well-performing system and a poorly-
performing system -- even when using the same basic algorithm.
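One simple form of a multi-word probability network is a bigram model, which estimates how likely each word is to follow the previous one. The sketch below builds one from a tiny made-up corpus; real systems train on vastly more text and apply smoothing, which is omitted here.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count word pairs and convert the counts to conditional probabilities."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for prev, word in zip(words, words[1:]):
            pair_counts[prev][word] += 1
    return {
        prev: {w: c / sum(counts.values()) for w, c in counts.items()}
        for prev, counts in pair_counts.items()
    }


# Tiny made-up training corpus.
corpus = [
    "turn the light on",
    "turn the light off",
    "turn right at the light",
]
model = train_bigrams(corpus)
print(model["the"])    # {'light': 1.0}
print(model["light"])  # {'on': 0.5, 'off': 0.5}
```

In this corpus "the" is always followed by "light" and never by "right", so a recognizer using the model would favor "the light" when the audio is ambiguous between the two similar-sounding words.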
ADVANTAGES
People with disabilities - Benefit from hands-free input.
Organizations - Increases productivity, reduces costs and errors.
Lower operational costs.
Advances in technology will allow consumers and businesses to
implement speech recognition systems at a relatively low cost.
• Cell-phone users can dial pre-programmed numbers by voice
command.
• Users can trade stocks through a voice-activated trading system.
• Speech recognition technology can also replace touch-tone
dialing, making it possible to target customers who speak
different languages.
DISADVANTAGES
Difficult to build a perfect system.
Conversations
• Involve more than just words (non-verbal communication,
stutters, etc.).
• Every person differs in voice, mouth, and speaking
style.
Filtering background noise is a task that can even be difficult for
humans to accomplish.
The Future of Speech Recognition
The Defense Advanced Research Projects Agency (DARPA) has
three teams of researchers working on Global Autonomous Language
Exploitation (GALE), a program that will take in streams of
information from foreign news broadcasts and newspapers and
translate them.
It hopes to create software that can instantly translate between two languages
with at least 90 percent accuracy.
DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian
populations in non-English-speaking countries.
Conclusion:
At some point in the future, speech recognition may become speech
understanding.
The statistical models that allow computers to decide what a person just
said may someday allow them to grasp the meaning behind the words.
Although it is a huge leap in terms of computational power and software
sophistication, some researchers argue that speech recognition
development offers the most direct line from the computers of today to
true artificial intelligence.