Body
Being the fastest, most common, and most natural method of communication for human
beings, speech has been an active research topic for years. Speech recognition, a method of
interaction between human and machine, is one of the most promising discoveries in this field.
Since the 1950s, computer scientists have been trying to devise ways to record, interpret, and
understand human speech. However, the earliest such system, the "Audrey" system of Bell
Laboratories, could only recognize spoken numbers, not words. Later decades brought further
advancements. Carnegie Mellon's "Harpy" speech system in the 1970s could understand over
1,000 words, and in the 1980s the Hidden Markov Model helped recognize speech even in noisy
environments. Eventually, speech recognition technology achieved an estimated 80 percent
accuracy by the year 2001 [7].
In the 21st century, Google Voice Search became one of the prominent advancements,
putting speech recognition into the hands of millions of people alongside Apple's Siri and
Amazon's Alexa. According to Zwass [8], speech recognition is "the ability of devices to respond
to spoken commands." Using algorithms, it converts speech signals into a sequence of words. It
offers hands-free control of devices and equipment and is commonly used for querying databases,
dictation, and giving commands to computer-based systems. This type of technology has greatly
affected the way people live and has become one of the key features people enjoy in their
devices (e.g., Siri, Google Now, and Cortana) [3].
There are three significant approaches to speech recognition systems: the acoustic-phonetic
approach, the pattern recognition approach, and the artificial intelligence approach [4].
Phonetics studies speech sounds alongside their acoustic qualities and physiological
production, and can be subdivided into articulatory phonetics, acoustic phonetics, and
linguistic phonetics [5]. Acoustic phonetics deals with the wave-like properties of speech
sounds, analyzing sound wave signals through variations in amplitude, duration, and
frequency [6]. The phonetic units represented by these acoustic properties vary with time,
since they depend on both the neighboring sounds and the speaker; this is the coarticulation
effect. Despite this variance, the approach is assumed to be easy for a machine to learn and
adapt because of its straightforwardness. The acoustic-phonetic approach proceeds in three
main steps. First, spectral analysis of the speech signal is performed together with a feature
detection technique, converting the measurements into values that describe the relevant
acoustic properties. Second, segmentation and labeling: the acquired speech signal is
subdivided into stable acoustic regions, which are labeled to form a lattice characterization
of the sample. Lastly, attempts are made to determine a valid word from the acquired phonetic
label sequences. However, this type of approach has not been used commercially to a large
extent [4].
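The first two steps above can be sketched in toy form. Everything here (the synthetic two-tone signal, the Hann-windowed FFT features, the 0.5 distance threshold) is an illustrative assumption, not part of the cited systems:

```python
import numpy as np

def stft_features(signal, frame_len=256, hop=128):
    """Step 1: spectral analysis -- one magnitude spectrum per frame."""
    window = np.hanning(frame_len)
    frames = [np.abs(np.fft.rfft(signal[s:s + frame_len] * window))
              for s in range(0, len(signal) - frame_len + 1, hop)]
    return np.array(frames)

def segment_stable_regions(features, threshold=0.5):
    """Step 2: segmentation -- cut where adjacent spectral shapes differ sharply."""
    # Normalize each frame so the distance reflects spectral shape, not loudness.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    shapes = features / np.maximum(norms, 1e-12)
    dists = np.linalg.norm(np.diff(shapes, axis=0), axis=1)
    cuts = [0] + [i + 1 for i, d in enumerate(dists) if d > threshold] + [len(features)]
    return list(zip(cuts[:-1], cuts[1:]))

# Synthetic "speech": a 200 Hz tone followed by a 1500 Hz tone.
sr = 8000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.sin(2 * np.pi * 200 * t),
                         np.sin(2 * np.pi * 1500 * t)])
feats = stft_features(signal)
segments = segment_stable_regions(feats)
# Step 3 (not shown) would map each labelled region to phonetic units and look up words.
print(segments)
```

The boundary between segments falls near the frame where the tone changes; on real speech the regions would then be labelled with phonetic symbols to build the lattice described above.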
The pattern recognition approach, on the other hand, uses larger units, such as whole
patterns, to avoid the difficulties of working with phonemes, the tiny units of speech sound
used in the acoustic-phonetic approach. According to Bridle [2], the pattern recognition
approach offers three promising features: consistent speech recognition, a well-formulated
mathematical framework, and reliable pattern comparison. It follows four main steps, namely
feature extraction, pattern training, pattern matching, and decision logic, to extract,
classify, and separate patterns. In the feature extraction step, a reference pattern is
obtained from one or more test patterns of the input signal. This pattern can take the form of
a speech template or a speech model, such as an HMM, and can represent a phrase, word, or
sound. In the pattern training stage, each possible pattern is learned and adapted by the
machine. In pattern matching, the speech to be recognized is directly compared to the patterns
acquired during training. Lastly, in the decision logic stage, the degree of match between
patterns determines the identity of the sample. For the last six decades, this has been the
prime method for speech recognition [4].
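The four stages can be sketched as a minimal template matcher. The two-word "vocabulary", the band-averaged FFT features, and the Euclidean decision rule are all illustrative assumptions, far simpler than any production recognizer:

```python
import numpy as np

def extract_features(signal, n_coeffs=8):
    """Feature extraction: a fixed-length spectral summary of the utterance."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_coeffs)
    feats = np.array([band.mean() for band in bands])
    return feats / np.linalg.norm(feats)

def train_patterns(labelled_signals):
    """Pattern training: store one reference template per word."""
    return {word: extract_features(sig) for word, sig in labelled_signals}

def recognize(signal, templates):
    """Pattern matching + decision logic: pick the closest reference pattern."""
    query = extract_features(signal)
    scores = {w: np.linalg.norm(query - ref) for w, ref in templates.items()}
    return min(scores, key=scores.get)

# Toy "vocabulary": two words distinguished only by pitch.
sr = 8000
t = np.arange(sr) / sr
templates = train_patterns([("yes", np.sin(2 * np.pi * 300 * t)),
                            ("no", np.sin(2 * np.pi * 2000 * t))])
print(recognize(np.sin(2 * np.pi * 310 * t), templates))  # "yes"
```

A slightly detuned 310 Hz input still lands closest to the 300 Hz "yes" template, which is exactly the decision-logic step: the best match percentage determines the identity of the sample.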
Pattern recognition can be further classified into four categories: the template-based
approach, the stochastic approach, dynamic time warping, and vector quantization. In the
template-based approach, a dictionary of candidate words is built from a collection of
prototypical speech patterns, which are stored as reference patterns; an unknown sample is
then compared against the dictionary to find the best-matching pattern. This approach must
cope with uncertain and incomplete information, whose most common sources include contextual
effects, speaker variability, homophones, and confusable sounds. The stochastic approach, on
the other hand, uses probabilistic models designed to handle exactly this kind of information;
among researchers, the most popular stochastic model is the Hidden Markov Model (HMM), which
rests on a stronger mathematical foundation than the template-based approach. Dynamic time
warping compares two sequences that vary in time or speed, providing a time-aligned
representation of the acquired data, whether it is graphics, audio, or video. Lastly, vector
quantization uses reference models in the form of codebooks and replaces costly evaluation
methods with codebook searches; it offers great data reduction efficiency and is widely used
in speech coders [4].
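Dynamic time warping is compact enough to show in full. This is the standard textbook recurrence over a cumulative cost matrix; the scalar sequences below stand in for real feature vectors:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping: align two sequences that vary in speed."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three allowed moves.
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # step both
    return cost[n, m]

slow = [0, 0, 1, 1, 2, 2, 3, 3]   # the same contour, spoken slowly
fast = [0, 1, 2, 3]
other = [3, 2, 1, 0]
print(dtw_distance(slow, fast))    # 0.0 -- warping absorbs the speed difference
print(dtw_distance(slow, other))   # large -- a genuinely different contour
```

The warped distance between the slow and fast renditions is zero, which is why DTW was historically paired with the template-based approach: it lets one stored template match utterances spoken at different rates.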
The last main type of speech recognition approach is the artificial intelligence
approach. According to John McCarthy, artificial intelligence can be described as "human
intelligence exhibited by machines" [1]. It can be considered a hybrid of the acoustic-phonetic
and pattern recognition approaches, as it incorporates concepts from both. It can be further
divided into three categories: the knowledge-based approach, the connectionist approach, and
the support vector machine. The knowledge-based approach uses phonetic, spectrogram, and
linguistic information, which plays an important role in the design of recognition algorithms,
the definition of speech units, and the selection of suitable input. However, the need to
incorporate expert human knowledge such as pragmatics, semantics, and syntax remains a
disadvantage. The connectionist approach, or artificial neural network, applies intelligence
to visualizing, analyzing, and characterizing speech based on a set of features, in a
mechanism similar to that of biological neurons. This approach is commendable for its
uniformity and simplicity. Lastly, support vector machines use a discriminative model that
provides decision boundaries, and can therefore only classify fixed-length data. For the
classification of data points, they use both linear and nonlinear hyperplanes [4].
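The connectionist idea can be illustrated with the smallest possible "network": a single artificial neuron trained with the classic perceptron rule to separate two classes of fixed-length feature vectors. The two synthetic clusters below are an assumption standing in for real acoustic features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed-length "acoustic feature" vectors for two sound classes,
# drawn around different centres (synthetic data, not real speech).
class_a = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
class_b = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([1] * 50 + [-1] * 50)

# One artificial neuron: weighted sum plus sign threshold. A misclassified
# point nudges the decision boundary toward the correct side.
w, b = np.zeros(2), 0.0
for _ in range(20):                      # training epochs
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:
            w += yi * xi
            b += yi

predictions = np.sign(X @ w + b)
accuracy = (predictions == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The learned weights `w` and bias `b` define a linear decision boundary, the same notion a support vector machine refines by choosing the hyperplane with maximum margin (and, via kernels, nonlinear boundaries) over fixed-length inputs.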