Speech and Audio Processing: Lecture-2
By using a system identification technique called linear prediction, it is possible to estimate the parameters of the time-varying filter from the observed signal. The model assumes that the energy distribution of the speech signal in the frequency domain is due entirely to the time-varying filter, with the lungs producing a flat-spectrum white-noise excitation. This model is quite efficient, and many analytical tools have been developed around the concept. The underlying idea is the well-known autoregressive model.
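As an illustrative sketch (not the text's own code) of how such an autoregressive model is fitted in practice, the following estimates the prediction coefficients of one frame with the autocorrelation method and the Levinson-Durbin recursion; the function name `lpc_autocorrelation` and the Hamming window choice are assumptions made here.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate linear prediction (AR) coefficients for one frame
    using the autocorrelation method with the Levinson-Durbin recursion.
    Returns (a, err) with a[0] = 1, so the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    w = frame * np.hamming(len(frame))
    # Autocorrelation of the windowed frame at lags 0..order
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Order update of the coefficient vector
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a, err
```

For a frame drawn from a known autoregressive process, the recursion recovers coefficients close to the true ones, and `err` gives the prediction-error energy used later to scale the excitation.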
Figure 1.8 Example of a speech waveform uttered by a male subject pronouncing the word "problems." Expanded views of a voiced frame and an unvoiced frame are shown. Each frame is 256 samples in length.
White noise samples are created using a unit-variance Gaussian random number generator; passing these samples (with appropriate scaling) through the filter yields the output signal. Figure 1.10 compares the original speech frame with two realizations of filtered white noise. As we can see, there is no time-domain correspondence among the three cases. However, when these three signal frames are played back to a human listener (converted to sound waves), the perception is almost the same!
How could this be? After all, they look so different in the time domain. The secret lies in the fact that they all have a similar magnitude spectrum, as plotted in Figure 1.11. As we can see, the frequency contents are similar, and since the human auditory system is not very sensitive toward phase differences, all three frames sound almost identical (more on this in the next section). The original frequency spectrum is captured by the filter, with all its coefficients. Thus, the flat-spectrum white noise is shaped by the filter so as to produce signals having a spectrum similar to the original speech. Hence, linear prediction analysis is also known as a spectrum estimation technique.
This simple speech coding procedure is summarized below.

Encoding
1. Derive the filter coefficients from the speech frame.
2. Derive the scale factor from the speech frame.
3. Transmit the filter coefficients and scale factor to the decoder.

Decoding
1. Generate a white noise sequence.
2. Multiply the white noise samples by the scale factor.
3. Construct the filter using the coefficients received from the encoder and filter the scaled white noise sequence. The output of the filter is the output speech.
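The encoding and decoding steps above can be sketched as follows; this is a minimal illustration, not a production coder. The coefficients are derived here by solving the autocorrelation normal equations, and the scale factor is taken as the rms value of the prediction residual (both reasonable but assumed choices).

```python
import numpy as np

def encode(frame, order=10):
    """Encoder: derive filter coefficients and a scale factor from one frame."""
    # Step 1: filter coefficients from the autocorrelation normal equations
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    # Step 2: scale factor = rms value of the prediction residual
    err = r[0] + np.dot(a, r[1:order + 1])
    g = np.sqrt(max(err, 0.0) / len(frame))
    # Step 3: (a, g) is what would be transmitted to the decoder
    return a, g

def decode(a, g, n, seed=0):
    """Decoder: scaled white noise through the all-pole filter 1/A(z)."""
    rng = np.random.default_rng(seed)
    e = g * rng.standard_normal(n)            # steps 1 and 2
    x = np.zeros(n)
    for i in range(n):                        # step 3: synthesis filtering
        x[i] = e[i] - sum(a[k] * x[i - k - 1] for k in range(min(len(a), i)))
    return x
```

The decoded waveform differs from the original sample by sample, but its spectral envelope and power are similar, exactly the effect described above.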
Due to this arrangement, the human auditory system behaves very much like a frequency analyzer.
Absolute Threshold
The absolute threshold of a sound is the minimum detectable level of that sound in the absence of any other external sounds.
The horizontal axis is frequency measured in hertz (Hz), while the vertical axis is the absolute threshold in decibels (dB), relative to a reference intensity of 10^-12 watts per square meter, a standard quantity for sound intensity measurement.
As we can see, human beings tend to be most sensitive toward frequencies in the range of 1 to 4 kHz, while thresholds increase rapidly at very high and very low frequencies. It is commonly accepted that, below 20 Hz and above 20 kHz, the auditory system is essentially nonfunctional. These characteristics are due to the structures of the human auditory system.
We can take advantage of the absolute threshold curve in speech coder design. Some approaches are the following:
1. Any signal with an intensity below the absolute threshold need not be considered, since it has no impact on the final quality of the coder.
2. More resources should be allocated to the representation of signals within the most sensitive frequency range, roughly 1 to 4 kHz, since distortions in this range are more noticeable.
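The absolute threshold curve has a widely used closed-form approximation, due to Terhardt, which perceptual coders evaluate per frequency bin to decide which components fall below audibility; a sketch, with the function name chosen here:

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing,
    in dB SPL (i.e., relative to 10^-12 W/m^2), at frequency f_hz in hertz."""
    f = f_hz / 1000.0  # convert to kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

Evaluating the formula confirms the behavior described above: the threshold dips to its lowest values in the 1 to 4 kHz region and rises steeply toward very low and very high frequencies.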
Masking
Masking refers to the phenomenon where one sound is rendered inaudible because of the presence of other sounds.
The presence of a single tone, for instance, can mask neighboring signals, with the masking capability inversely proportional to the absolute difference in frequency. Masking capability increases with the intensity of the reference signal, or the single tone in this case.
Phase Perception
There is abundant evidence of phase deafness; for instance, a single tone and its time-shifted version produce essentially the same sensation, and noise perception is chiefly determined by the magnitude spectrum. Even though phase plays only a minor role in perception, some degree of phase preservation in the coding process is still desirable, since it normally increases naturalness. The code-excited linear prediction (CELP) algorithm, for instance, has a mechanism to retain the phase information of the signal.
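The phase-deafness observation can be made concrete with a short experiment: replace the phase spectrum of a frame with random values while keeping the magnitude spectrum. The waveform changes completely, yet the magnitude spectrum, and hence, to a first approximation, the percept, is untouched. The 256-sample random frame below merely stands in for a real speech frame.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)        # stand-in for a 256-sample speech frame

X = np.fft.rfft(frame)
mag = np.abs(X)

# Replace the phase with random values but keep the magnitude
phase = rng.uniform(-np.pi, np.pi, size=X.shape)
phase[0] = 0.0       # keep the DC bin real
phase[-1] = 0.0      # keep the Nyquist bin real (even-length frame)
y = np.fft.irfft(mag * np.exp(1j * phase), n=len(frame))

waveform_change = np.max(np.abs(frame - y))                     # large
spectrum_change = np.max(np.abs(np.abs(np.fft.rfft(y)) - mag))  # essentially zero
```

This mirrors Figures 1.10 and 1.11: very different time-domain waveforms can share the same magnitude spectrum.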