Automatic Speech Recognition: A Review: Anchal Katyal, Amanpreet Kaur, Jasmeen Gill
Published By:
71 Blue Eyes Intelligence Engineering
& Sciences Publication Pvt. Ltd.
Among the various models, the Hidden Markov Model (HMM) is so far the most widely used technique due to its efficient algorithm for training and recognition.

2.3 Phases of ASR
An automatic speech recognition system involves two phases:

A. Training phase
A rigorous training procedure is followed to map basic speech units, such as phones or syllables, to the acoustic observations. In the training phase, known speech is recorded, pre-processed, and then enters the first stage, i.e., feature extraction. The next three stages are HMM creation, HMM training and HMM storage [2].

B. Recognition phase
The recognition phase starts with the acoustic analysis of the unknown speech signal. The captured signal is converted to a series of acoustic feature vectors. Using a suitable algorithm, the input observations are processed. The speech is compared against the HMM networks, and the word that was pronounced is displayed. An ASR system can only recognize what it has learned during the training process. However, the system can recognize words that are not present in the training corpus, provided that the sub-word units of the new word are known to the system and the new word exists in the system dictionary [4].

III. CLASSIFICATIONS OF ASR
The goal of an ASR system is to accurately and efficiently convert a speech signal into a text message transcription of the spoken words, independent of the speaker, the environment, or the device used to record the speech [7].

[Figure: block diagram with the components Speech, Feature Extraction, Acoustic Modeling, Language Modeling, Decoding, and Recognized Text, and the outputs Text, Speech Retrieval, and Machine Translator]
Fig 1: Main Components of an Automatic Speech Recognition System

A. Feature Extraction
Signal processing techniques are applied to the speech signal in order to extract the features that distinguish different phonemes from each other.

B. Acoustic modeling
It provides probabilities for different phonemes at different time instants. It is the statistical mapping from the units of speech to all the features of speech. These mappings are used to go from speech sounds to phonemes and from phonemes to words.

C. Language modeling
It defines what kinds of phoneme and word sequences are possible in the target language or application at hand, and what their probabilities are.

D. Decoding
The acoustic models and language models are used to search for the recognition hypothesis that best fits the models. The recognition output can then be used in various applications.

IV. SPEECH RECOGNITION PROCESS
In essence, the basic task involved in speech recognition is that of going from speech recordings to word labels. The pattern recognition approach to speech recognition is the most widely used approach. There are two main variants of the basic speech recognition task, namely isolated word recognition and connected word recognition [2] [3].

4.1 Variants of the Speech Recognition Task

A. Isolated word recognition
Isolated word recognition refers to the task of recognizing a single spoken word where the choice of words is not constrained by task syntax or semantics. HMMs can be used to build an isolated word recognizer. The HMM approach is a well-known and widely used statistical method of characterizing the spectral properties of the frames of a pattern. HMMs are particularly suitable for speech recognition because the speech signal can be well characterized as a parametric random process, and the parameters of the stochastic process can be determined in a precise, well-defined manner.

B. Fluent speech Recognition
Fluent speech recognition is a more complicated task than isolated word recognition. In this case the task is to recognize a continuous string of words from the vocabulary.

C. Feature Extraction and Pattern Recognition
The input into an automatic speech recognition system is the speech signal. The two major tasks involved in speech recognition are feature extraction and pattern recognition.

Feature Extraction
In all speech recognition systems the first step in the process is signal processing. Initially, a spectral/temporal analysis of the speech signal is performed to give observation vectors which can be used to train the HMMs. One way to obtain observation vectors from speech samples is to perform spectral analysis. A type of spectral analysis that is often used is linear predictive coding.

Pattern Recognition
Pattern recognition refers to the matching of features. The pattern recognition process consists of training and testing. During training, a model of each vocabulary word must be created. Each model consists of a set of features extracted from the speech signal. The exact form of the model depends on the type of pattern-recognition algorithm used. During testing, a similar model is created for the unknown word [3]. The pattern-recognition algorithm compares the model of the unknown word with the models of known words and selects the word whose model score is highest. There are many different pattern matching techniques.
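
The training and testing steps described for pattern recognition can be illustrated with a minimal sketch. It is not the HMM-based method discussed in this paper: for brevity it swaps in a simple template matcher in which each word model is a mean feature vector and the score is the negative Euclidean distance. The function names (extract_features, train, score, recognize), the two toy features (frame energy and zero-crossing count), the frame length, and the sample data are all assumptions made for illustration.

```python
import math


def extract_features(samples, frame_len=4):
    """Toy feature extraction (assumed, not LPC): mean frame energy and
    mean zero-crossing count over non-overlapping frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    crossings = [sum(1 for a, b in zip(f, f[1:]) if a * b < 0) for f in frames]
    return [sum(energies) / len(frames), sum(crossings) / len(frames)]


def train(labelled_recordings):
    """Training: build one model per vocabulary word.
    Here a model is simply the mean feature vector of that word's recordings."""
    models = {}
    for word, recordings in labelled_recordings.items():
        feats = [extract_features(r) for r in recordings]
        models[word] = [sum(col) / len(col) for col in zip(*feats)]
    return models


def score(model, unknown):
    """Higher is better: negative Euclidean distance between feature vectors."""
    return -math.dist(model, unknown)


def recognize(models, samples):
    """Testing: build the same kind of model for the unknown word and
    select the known word whose model score is highest."""
    unknown = extract_features(samples)
    return max(models, key=lambda word: score(models[word], unknown))


# Hypothetical training data: two recordings per vocabulary word.
training_data = {
    "low": [[1, 1, 1, 1, -1, -1, -1, -1],
            [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]],
    "high": [[1, -1, 1, -1, 1, -1, 1, -1],
             [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]],
}
models = train(training_data)
print(recognize(models, [0.9, -0.8, 1.1, -1.0, 0.7, -0.9, 1.0, -1.2]))
```

In an HMM-based recognizer the score would instead be the likelihood of the observation sequence under each word's HMM, but the overall flow of training one model per word and picking the best-scoring model at test time is the same.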
International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-3, Issue-3, February 2014