A speech recognition system performs three primary tasks:
Preprocessing - Converts the spoken input into a form that the recognizer can process.
Recognition - The recognizer incorporates an acoustic model, a lexical model, and a language
model, which analyze the data at the acoustic, articulatory-phonetic, and linguistic levels and
identify what is being said.
Communication - Sends the recognized input to the software/hardware systems that need
it.
In order to understand what these three tasks entail, the Technology Focus begins with a
description of the data that speech recognition systems must handle. It describes how speech
is produced (called articulation), examines the stream of speech itself (called acoustics), and
then characterizes the ability of the human ear to handle spoken input (called auditory
perception).
In figure (1), communication is displayed as a bi-directional arrow. This represents the two-
way communication that exists in applications where the speech interface is closely bound to
the rest of the application. In those applications, software components that are external to the
speech recognition system may guide recognition by specifying the words and structures that
the recognition system can use at any point in the application. Other uses of speech involve
one-way communication from the recognition system to the other components of the application.
Figure (1): Components of speech recognition system.
The information needed to perform speech recognition is contained in the stream of speech.
For humans, that flow of sounds and silences can be partitioned into discourses, sentences,
words, and sounds. Speech recognition systems focus on words and the sounds that
distinguish one word from another in a language. Those sounds are called phonemes. The
ability to differentiate words with distinct phonemes is as critical for speech recognition as it
is for human beings.
There are a number of ways speech can be described and analyzed. The most commonly used
approaches are articulation, acoustics, and auditory perception.
These three approaches offer insights into the nature of speech and provide tools to make
recognition more accurate and efficient.
PREPROCESSING SPEECH
Like all sounds, speech is an analog waveform. In order for a recognition system to utilize
speech data, all formants, noise patterns, silences, and co-articulation effects must be
captured and converted to a digital format. This conversion process is accomplished through
digital signal processing techniques. Some speech recognition products include hardware to
perform this conversion. Other systems rely on the signal processing capabilities of other
products, such as digital audio sound cards.
In order for speech recognition to function at an acceptable speed, the amount of data must be
reduced. Fortunately, some data in the speech signal are redundant, some are irrelevant to the
recognition process, and some need to be removed from the signal because they interfere with
accurate recognition. The challenge is to eliminate these detrimental components from the
signal without losing or distorting critical information contained in the data.
One method of reducing the quantity of data is to use filters to screen out frequencies above
3100 Hz and below 100 Hz. Such bandwidth narrowing is similar to using a zoom lens on a
video recorder. Another data reduction technique, sampling, reduces speech input to slices
(called samples) of the speech signal. Most speech recognition systems take 8,000 to 10,000
samples per second.
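As a rough illustration of this kind of band-limiting and sampling, the sketch below uses NumPy and SciPy (assumed to be available); the file name, filter order, and the choice of 8,000 samples per second are illustrative assumptions, not values prescribed by any particular product.

```python
# Minimal sketch: band-limit a speech signal to roughly 100-3100 Hz and
# reduce its sampling rate. File name and parameters are illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, resample

rate, signal = wavfile.read("utterance.wav")   # original sampling rate in Hz
signal = signal.astype(np.float64)

# 4th-order Butterworth band-pass filter: keep ~100 Hz to ~3100 Hz.
nyquist = rate / 2.0
b, a = butter(4, [100 / nyquist, 3100 / nyquist], btype="band")
filtered = filtfilt(b, a, signal)

# Reduce the data rate: keep 8,000 samples per second of speech.
target_rate = 8000
num_samples = int(len(filtered) * target_rate / rate)
downsampled = resample(filtered, num_samples)
```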
The samples extracted from the analog signal must be converted into a digital form. The
process of converting the analog waveform representation into a digital code is called analog
to digital conversion or coding. To achieve high recognition accuracy at an acceptable speed
the conversion process must
The preprocessor extracts acoustic patterns contained in each frame and captures the changes
that occur as the signal shifts from one frame to the next. This approach is called spectral
analysis because it focuses on individual elements of the frequency spectrum. The two most
commonly used spectral analysis approaches are the fast Fourier transform (FFT) and linear
predictive coding (LPC).
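The sketch below shows one way the framing and FFT-based spectral analysis described above might look in code. It assumes NumPy; the 25 ms frame length and 10 ms frame shift are common choices assumed here rather than figures taken from the text.

```python
# Minimal sketch of frame-based spectral analysis: split the digitized
# signal into short overlapping frames and take an FFT of each frame.
import numpy as np

def spectral_frames(samples, rate, frame_ms=25, shift_ms=10):
    frame_len = int(rate * frame_ms / 1000)
    shift = int(rate * shift_ms / 1000)
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(samples) - frame_len, shift):
        frame = samples[start:start + frame_len] * window
        # Magnitude spectrum of this frame (one column of the spectrogram).
        spectra.append(np.abs(np.fft.rfft(frame)))
    return np.array(spectra)
```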
RECOGNITION
Once the preprocessing of a user’s input is complete, the recognizer is ready to perform its
primary function: to identify what the user has said. The competing recognition technologies
found in commercial speech recognition systems are:
Template Matching
Acoustic – Phonetic Recognition
Stochastic Processing
Neural networks
Template Matching
Template matching compares the digitized input with stored templates of each word in the
application vocabulary. The comparison is not expected to produce an identical match. Individual utterances of the
same word, even by the same person, often differ in length. This variation can be due to a
number of factors, including difference in the rate at which the person is speaking, emphasis,
or emotion. Whatever the cause, there must be a way to minimize temporal differences
between patterns so that fast and slow utterances of the same word will not be identified as
different words. The process of minimizing temporal/word length differences is called
temporal alignment. The approach most commonly used to perform temporal alignment in
template matching is a pattern – matching technique called dynamic time warping (DTW).
DTW establishes the optimum alignment of one set of vectors (template) with another.
Dynamic time warping is an algorithm for measuring similarity between two sequences
which may vary in time or speed.
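Below is a minimal sketch of DTW, assuming the template and the input have already been converted to sequences of feature vectors (NumPy arrays). It illustrates the basic dynamic-programming recurrence, not the exact formulation used by any commercial system.

```python
# Minimal sketch of dynamic time warping between two feature-vector
# sequences (e.g., an input utterance and a stored template).
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Return the cost of the optimum alignment of seq_a with seq_b."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # Each cell extends the cheapest of the three allowed moves,
            # which stretches or compresses time to align the sequences.
            cost[i, j] = dist + min(cost[i - 1, j],      # stretch seq_a
                                    cost[i, j - 1],      # stretch seq_b
                                    cost[i - 1, j - 1])  # advance both
    return cost[n, m]
```

In a template-matching recognizer, a distance like this would be computed against every stored template, and the best-scoring template would be accepted only if its score clears the threshold of acceptability discussed below.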
Most template matching systems have a predetermined threshold of acceptability. Its function
is to prevent noise and words not in the application vocabulary from being incorrectly
identified as acceptable speech input. If no template match exceeds the threshold of
acceptability no recognition is recorded. Applications and systems differ on how such non-
recognition events are handled. Many systems ask the user to repeat the word or utterance.
Template matching performs very well with small vocabularies of phonetically distinct items
but has difficulty making the fine distinctions required for larger vocabulary recognition and
recognition of vocabularies containing similar-sounding words (called confusable words).
Since it operates at the word level there must be at least one stored template for each word in
the application vocabulary. If, for example, there are five thousand words in an application,
there would need to be at least five thousand templates.
Acoustic-Phonetic Recognition
Acoustic-phonetic recognition supplanted template matching in the early 1970s. Unlike
template matching, acoustic-phonetic recognition functions at the phoneme level.
Theoretically, it is an attractive approach to speech recognition because it limits the number
of representations that must be stored to the number of phonemes needed for a language.
Acoustic-phonetic recognition involves three stages:
Feature extraction
Segmentation and labelling
Word-level recognition
During feature extraction the system examines the input for spectral patterns, such as formant
frequencies, needed to distinguish phonemes from each other. The collection of extracted
features is interpreted using acoustic-phonetic rules. These rules identify phonemes
(labelling) and determine where one phoneme ends and the next begins (segmentation).
The high degree of acoustic similarity among phonemes, combined with phoneme variability
resulting from co-articulation effects and other sources, creates uncertainty with regard to
potential phone labels. As a result, the output of the segmentation and labelling stage is a set
of phoneme hypotheses. The hypotheses can be organized into a phoneme lattice (figure 3),
decision tree, or similar structure. Figure (3) displays more than one phoneme hypothesis for
a single point in the input.
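To make the idea of competing hypotheses concrete, here is a toy phoneme lattice for an utterance of "five", written as plain Python data; the segment boundaries, phoneme labels, and scores are invented for illustration.

```python
# Minimal sketch of a phoneme lattice: for each time segment the labelling
# stage may propose several competing phoneme hypotheses with scores.
# Boundaries and scores below are made up for illustration.
phoneme_lattice = [
    # (start_ms, end_ms, [(phoneme, score), ...])
    (0,   80,  [("f", 0.7), ("th", 0.3)]),
    (80,  220, [("ay", 0.6), ("ah", 0.4)]),
    (220, 300, [("v", 0.8), ("b", 0.2)]),
]

# Word-level recognition then searches the lattice for the phoneme
# sequence that best matches a word in the lexicon, e.g. /f ay v/ "five".
```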
Stochastic Processing
Like template matching, stochastic processing requires the creation and storage of models of
each of the terms that will be recognized. At that point the two approaches diverge.
Stochastic processing involves no direct matching between stored models and input. Instead,
it is based upon complex statistical and probabilistic analyses which are best understood by
examining the network-like structure in which those statistics are stored: Hidden Markov
Model (HMM).
Researchers began investigating the use of HMMs for speech recognition in the early 1970s.
One of the earliest proponents of the technology was James Baker, who used it to develop
CMU’s DRAGON system. Another early proponent of HMMs was Frederick Jelinek, whose
research group at IBM was instrumental in advancing HMM technology. In 1982, James and
Janet Baker founded Dragon Systems, and soon developed the DragonScribe system, one of
the first commercial products using HMM technology. HMM technology did not gain
widespread acceptance for commercial systems until the late 1980s, but by 1990 HMMs had
become the dominant approach to recognition.
An HMM, such as the one displayed in figure (4), consists of a sequence of states connected
by transitions. The states represent the alternatives of the stochastic process and the
transitions contain probabilistic and other data used to determine which state should be
selected next. The states of the HMM in figure (4) are displayed as circles and its transitions
are represented by arrows. Transitions from the first state of the HMM go to the first state
(called a recursive transition), to the next state, or to the third state of the HMM. If the HMM
in figure (4) is a stored model of the word “five”, it would be called a reference model for
“five” and would contain statistics about all the spoken samples of the word used to create the
reference model. Each state of the HMM holds statistics for a segment of the word. Those
statistics describe the parameter values and parameter variation that were found in samples
of the word. A recognition system may have numerous HMMs like the one in figure (4) or
may consolidate them into a network of states and transitions.
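As a concrete illustration, the toy model below sketches how a reference model for "five" like the one described above might be represented in Python. The states, transition probabilities, and emission statistics are invented for illustration; real systems store distributions over acoustic feature vectors rather than discrete symbols.

```python
# Minimal sketch of a left-to-right HMM with three states.
states = ["s1", "s2", "s3"]

# Transition probabilities: each state may loop back on itself (a recursive
# transition), move to the next state, or skip ahead one state.
transitions = {
    "s1": {"s1": 0.5, "s2": 0.4, "s3": 0.1},
    "s2": {"s2": 0.6, "s3": 0.4},
    "s3": {"s3": 1.0},
}

# Each state holds statistics (here, a probability for each observed
# acoustic symbol) describing the segment of the word it models.
emissions = {
    "s1": {"f": 0.8, "v": 0.2},
    "s2": {"ay": 0.7, "ah": 0.3},
    "s3": {"v": 0.9, "f": 0.1},
}
```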
The recognition system proceeds through the input comparing it with stored models. These
comparisons produce a probability score indicating the likelihood that a particular stored
HMM reference model is the best for the input. This approach is called the Baum-Welch
Maximum-likelihood algorithm. Another common method used for stochastic recognition is
the Viterbi algorithm. The Viterbi algorithm looks through a network of nodes for a sequence
of HMM states that correspond most closely to the input. This is called the best path.
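The sketch below shows a bare-bones Viterbi search over the toy model from the previous sketch (it reuses the transitions and emissions dictionaries defined there). It illustrates the idea of tracking the best path of states for the input, not the optimized decoders used in real recognizers.

```python
# Minimal sketch of Viterbi decoding: find the single best path of HMM
# states for an observation sequence, using the toy HMM defined above.
def viterbi(observations, transitions, emissions, start_state="s1"):
    # best[s] = (probability of best path ending in state s, that path)
    best = {start_state: (emissions[start_state].get(observations[0], 1e-9),
                          [start_state])}
    for obs in observations[1:]:
        new_best = {}
        for prev, (prob, path) in best.items():
            for nxt, trans_p in transitions[prev].items():
                p = prob * trans_p * emissions[nxt].get(obs, 1e-9)
                # Keep only the most probable path into each state.
                if nxt not in new_best or p > new_best[nxt][0]:
                    new_best[nxt] = (p, path + [nxt])
        best = new_best
    return max(best.values())  # (probability, best path) for the input

print(viterbi(["f", "ay", "v"], transitions, emissions))
```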
Stochastic processing using HMM’s accurate, flexible, and capable of being fully automated.
It can be applied to units smaller than phonemes or as large as sequences of words. Stochastic
processing is most often used to represent and recognize speech at the word level (sometimes
called whole word recognition) and for a variant of the phoneme level called sub-words.
Neural Networks
Neural networks are computer programs used in speech recognizers that can learn
important speech knowledge automatically and represent this knowledge in a parallel,
distributed fashion for rapid evaluation. Such a system would mimic the function of the human
brain, which consists of several billion simple, inaccurate, and slow processors that perform
reliable speech recognition (Alex Waibel & John Hampshire II, 1989). They are sometimes
called artificial neural networks to distinguish neural network programs from biological
neural structures.
Neural networks are excellent classification systems. They specialize in classifying noisy,
patterned, variable data streams containing multiple, overlapping, interacting, and incomplete
cues. Speech recognition is a classification task that has all of these characteristics, making
neural networks an attractive alternative to the approaches described above.
Unlike most other technologies, neural networks do not require that a complete specification
of a problem be created prior to developing a network – based solution. Instead, networks
learn patterns solely through exposure to large numbers of examples, making it possible to
construct neural networks for auditory models and other poorly understood areas. The fact
that networks accomplish all of these feats using parallel processing is of special interest
because increases in complexity do not entail significant reductions in speed.
The concept of artificial neural networks has its roots in the structure and behavior of the
human brain. The brain is composed of a network of specialized cells called neurons that
operate in parallel to learn and process a wide range of complex information. Like the human
brain, neural networks are constructed from interconnected neurons (also called nodes or
processing elements) and learn new patterns by experiencing examples of those patterns.
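As a loose illustration of learning patterns purely from examples, the sketch below trains a tiny two-layer network on synthetic feature vectors using NumPy. The data, network size, and learning rate are arbitrary assumptions, and this is not a speech recognizer.

```python
# Minimal sketch: a tiny feed-forward network trained on toy feature
# vectors by gradient descent; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                # 100 toy "acoustic" feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy two-class labels

W1 = rng.normal(scale=0.1, size=(8, 16))     # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(16, 1))     # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    hidden = np.tanh(X @ W1)                 # forward pass
    out = sigmoid(hidden @ W2)[:, 0]
    error = out - y
    # Backpropagate the error to both weight matrices.
    grad_out = (error * out * (1 - out))[:, None]
    grad_hidden = (grad_out @ W2.T) * (1 - hidden ** 2)
    W2 -= 0.1 * hidden.T @ grad_out / len(X)
    W1 -= 0.1 * X.T @ grad_hidden / len(X)
```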
Speech recognition systems can be separated into several different classes by describing what
types of utterances they have the ability to recognize. These classes are based on the fact that
one of the difficulties of ASR is determining when a speaker starts and finishes an
utterance. Most systems can fit into more than one class, depending on which mode they are using.
Isolated words
Isolated word recognizers usually require each utterance to have silence (a lack of audio
signal) on both sides of the sample window.
This means that the system accepts a single utterance at a time.
Isolated utterances might be a better name for this class.
Here each vocabulary word must be spoken in isolation, with distinct pauses between
words.
Connected words
Connected word systems (or, more correctly, connected utterance systems) are similar to
isolated word systems but allow separate words to be ‘run together’ with a minimal pause between
them.
Continuous speech
Recognizers with continuous speech capabilities are some of the most difficult to
create because they must utilize special methods to determine utterance boundaries.
This allows users to speak almost naturally, while the computer determines the
content. Basically, it's computer dictation.
These systems do not require pauses between words, typically have medium-sized
vocabularies, and permit input of complete sentences.
Spontaneous speech
At a basic level, it can be thought of as speech that is natural sounding and not
rehearsed.
An ASR system with spontaneous speech ability should be able to handle a variety of
natural speech features such as words being run together, “ums” and “ahs”, and even
slight stutters.
Speech recognition is used in a wide range of applications, including:
o Health care
o Telephony and other domains (Mobile telephony, mobile email)
o Disabled people
o Automatic translation
o Automotive speech recognition
o Court reporting (Real-time Voice Writing)
o Speech Biometric Recognition
o Hands-free computing: voice command recognition computer user interface
o Home automation
o Interactive voice response
o Multimodal interaction
o Pronunciation evaluation in computer-aided language learning applications
o Robotics
o Transcription (digital speech-to-text).
o Voice to Text: Visual Voicemail services for mobile phones.
o Dictation
o Military
ASR systems that are designed to perform functions and actions on the
system are defined as command and control systems,
e.g. utterances like “open netscape” and “start a new xterm”.
o Embedded applications