Hands-On Lab On Speech Processing - Time-domain Processing - 2021
In this lab, you will get acquainted with speech signals and their short-time processing. You will
explore the time-domain structure of the most basic speech elements, such as vowels and consonants.
You will learn about time-domain properties of speech signals, and you will apply basic measures to
speech, such as energy and zero-crossings.
The final goal of this lab is to implement a simple, fully automated voiced/unvoiced/silence (VUS)
discriminator. Such an algorithm is very practical in real-life systems such as mobile communications.
A typical Voice Activity Detector (VAD), which is a simpler variant of a VUS discriminator, is used in
the Global System for Mobile Communications (GSM), the European standard for cellular communications.
The algorithm you will implement is based on energy and zero-crossing measures of the speech
signal; such computations are fast enough for real-time implementations. However, in this lab, we will
consider an off-line approach. This means that the whole signal is available to us from the start.
1 Theoretical Background
The algorithm is based on energy and zero-crossing measures of the speech waveform. Let us start
with a short introduction to these measures.

The short-time energy of a signal x[n] at analysis time n is defined as
$$E_n = \sum_{m=-\infty}^{\infty} \big( x[m]\, w[n-m] \big)^2 \qquad (1)$$
which can be rewritten as
$$E_n = \sum_{m=-\infty}^{\infty} x^2[m]\, h[n-m] \qquad (2)$$
where h[n] = w^2[n] is the squared analysis window applied to a speech segment. We can safely as-
sume that the analysis window is supported in [-N, N]. The choice of the impulse response, h[n], or
equivalently the analysis window, determines the nature of the short-time energy representation. To
see how the choice of window affects the short-time energy, observe that if h[n] in equation (2)
were very long and of constant amplitude, En would change very little with time. Such a
window would be the equivalent of a very narrowband lowpass filter. Clearly what is desired is some
lowpass filtering but not so much that the output is constant; i.e., we want the short-time energy to
reflect the amplitude variations of the speech signal. Thus, we encounter for the first time a conflict
that will repeatedly arise in the study of short-time representations of speech signals. That is, we
wish to have a short duration window (impulse response) to be responsive to rapid amplitude changes,
but a window that is too short will not provide sufficient averaging to produce a smooth energy function.
The effect of the window on the time-dependent energy representation can be illustrated by discussing
the properties of two representative windows: the rectangular window
$$h[n] = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
and the Hamming window
$$h[n] = \begin{cases} 0.54 - 0.46\,\cos\big( 2\pi n/(N-1) \big), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
where N is the window length in samples. The rectangular window corresponds to applying equal
weight to all the samples in the interval (n − N + 1) to n, whereas the Hamming window gives more
weight to the center of the window, which is preferable in many applications. If the window size N is
too small, i.e., on the order of a pitch period or less, En will fluctuate very rapidly, depending on the
exact details of the waveform. If N is too large, i.e., on the order of several (3-4) pitch periods, En will
change very slowly and thus will not adequately reflect the time-varying properties of the speech signal.
Unfortunately, this implies that no single value of N is entirely satisfactory. With these shortcomings
in mind, a suitable practical choice for N is on the order of 300 − 500 samples for a 16 kHz sampling
rate (i.e., 20 − 30 ms duration).
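To make this concrete, here is a minimal MATLAB sketch of equation (2), assuming a speech vector s and its sampling rate fs are already loaded, and that the hamming window function (Signal Processing Toolbox) is available:
% Sketch: short-time energy as lowpass filtering of the squared signal (Eq. (2))
N = round(0.025*fs);            % 25 ms window, within the suggested 20-30 ms range
w = hamming(N);                 % analysis window
h = w.^2;                       % h[n] = w^2[n]
En = conv(s(:).^2, h, 'same');  % En = sum_m x^2[m] h[n-m]
figure; plot((0:length(En)-1)/fs, En);
xlabel('Time (s)'); ylabel('Short-Time Energy');
Try a longer and a shorter N to see the smoothing trade-off discussed above.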
Speech signals are broadband signals, and the interpretation of the average zero-crossing rate is
therefore much less precise. However, rough estimates of spectral properties can be obtained using a
representation based on the short-time average zero-crossing rate. Before discussing the interpretation
of the zero-crossing rate for speech, let us first define it and discuss the theory behind it. An appropriate
definition is
$$Z_n = \sum_{m=-\infty}^{\infty} \big|\, \mathrm{sgn}(x[m]) - \mathrm{sgn}(x[m-1]) \,\big|\, w[n-m] \qquad (6)$$
where
$$\mathrm{sgn}(x[n]) = \begin{cases} 1, & x[n] \ge 0 \\ -1, & \text{otherwise} \end{cases} \qquad (7)$$
and
$$w[n] = \begin{cases} \dfrac{1}{2N}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
This representation shows that the short-time average zero-crossing rate has the same general proper-
ties as the short-time energy. However, the defining equation (6) makes the computation of Zn appear
more complex than it really is. All that is required is to check consecutive samples in pairs to determine
where the zero-crossings occur; the count is then averaged over N consecutive samples (the scaling by
1/(2N) can even be omitted in practice).
Now let us see how the short-time average zero-crossing rate applies to speech signals. The model
for speech production suggests that the energy of voiced speech is concentrated below 4 kHz, whereas
for unvoiced speech, most of the energy is found at higher frequencies. Since high frequencies imply
high zero-crossing rates, and low frequencies imply low zero-crossing rates, there is a strong correlation
between the zero-crossing rate and the distribution of energy over frequency. A reasonable generalization is that
if the zero-crossing rate is high, the speech signal is unvoiced, while if the zero-crossing rate is low, the
speech signal is voiced. This, however, is a very imprecise statement because we have not said what is
high and what is low, and, of course, it really is not possible to be precise. Despite this imprecision,
zero-crossing rate is definitely a simple and convenient measure for speech discrimination.
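For intuition: a pure sinusoid of frequency f0 crosses zero 2*f0 times per second, so sampled at fs it
produces on average 2*f0/fs crossings per sample. For example, at fs = 16 kHz, a 100 Hz voiced
fundamental gives about 0.0125 crossings per sample, while energy concentrated around 4 kHz gives
about 0.5 — a difference of more than an order of magnitude.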
Let us now move from theory to practice. If you cut a frame from a voiced region of a speech
waveform and plot it, you can see that the signal has a periodicity of some form. It is not strictly
periodic; it is quasi-periodic. It has been observed that voiced speech can be represented using periodic
or quasi-periodic waveforms. So, speech signals like this one, which have a strong periodicity of some
kind, can be thought of as voiced speech signals, like /a/, /e/, /o/, etc. You can also see that it is a
low-frequency signal: it does not change very quickly over time. Let's see what happens with the energy
of a signal like this one.
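Here frame1 denotes the voiced frame under discussion; a sketch of the extraction step, with purely illustrative sample indices (pick any clearly voiced region of your own waveform):
% Sketch: cut a voiced frame out of the waveform s (indices are illustrative)
frame1 = s(2000:2800);
N1 = length(frame1);
figure; plot(0:1/fs:N1/fs-1/fs, frame1); grid;
title('A voiced frame'); xlabel('time');
% Listen if you want
%soundsc(frame1, fs);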
E1 = (1/N1)*sum(frame1.^2); % average energy of the voiced frame
We'll keep this result; we'll need it later. Let's now cut another piece of speech from our waveform.
Let’s try this one.
[Figure: panel (a) 'A voiced frame' and panel (b) 'An unvoiced frame', amplitude vs. time; panel (c) 'Energy Value' of the two frames vs. index.]
Figure 1: Voiced and Unvoiced speech sounds, along with their energies.
frame2 = s(4800:5500);  % cut an unvoiced frame from the waveform
N2 = length(frame2);
figure; plot(0:1/fs:N2/fs-1/fs, frame2); grid;
title('An unvoiced frame'); xlabel('time');
% Listen if you want
%soundsc(frame2, fs);
You can see in Figure 1b that this one is very peaky! It has no periodicity at all, and it changes
very quickly in time. The latter means that it is a high-frequency signal. Speech signals like this
can be thought of as unvoiced speech signals, like /s/, /f/, /sh/, etc. Let's take a look at the energy of
this kind of speech signal and compare it with the first one.
E2 = (1/N2)*sum(frame2.^2); % average energy of the unvoiced frame
You can see in Figure 1c that the energy of the voiced speech signal is considerably higher than that
of the unvoiced one. If you repeat this with several other voiced/unvoiced examples, you will see that
this observation holds in general. If we do the same for a full speech waveform, we get the result
depicted in Figure 2.
[Figure 2: a full speech waveform, amplitude vs. time (s), on top; its short-time energy vs. time (s) below.]
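A frame-based computation along the following lines produces such an energy contour; a sketch, where the 30 ms window and 10 ms step are our own choices:
% Sketch: frame-based short-time energy contour
Lw = round(0.030*fs);                 % 30 ms analysis window
hop = round(0.010*fs);                % 10 ms step
nFrames = floor((length(s) - Lw)/hop) + 1;
E = zeros(1, nFrames);
for k = 1:nFrames
    frm = s((k-1)*hop + (1:Lw));      % k-th analysis frame
    E(k) = (1/Lw)*sum(frm.^2);        % average energy, as for E1 and E2 above
end
tE = ((0:nFrames-1)*hop + Lw/2)/fs;   % energies localized at the window centers
figure; plot(tE, E); xlabel('Time (s)'); ylabel('Short-Time Energy');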
You can clearly see that voiced parts of speech have higher energy than unvoiced or silent ones. So,
the energy of a speech frame is a good indicator of whether a frame is a voiced or unvoiced one.
How can we exploit the number of zero-crossings in a speech frame to discriminate between voiced
and unvoiced speech? Let's take another look at our two extracted speech frames:
Take a look at Figure 3a. Can you guess what is going on with the zero-crossings? :-) You can
see that the number of zero-crossings is really high in the unvoiced speech frame, much higher than in
the voiced speech frame! So, this can be our second indicator of whether a speech frame is voiced or
unvoiced. There is a straightforward way to find out how many zero-crossings there are in a speech
frame. Let's see it:
ZCr1 = 0.5*sum(abs(sign(frame1(2:end))-sign(frame1(1:end-1)))); % zero-crossings of the voiced frame
ZCr2 = 0.5*sum(abs(sign(frame2(2:end))-sign(frame2(1:end-1)))); % zero-crossings of the unvoiced frame
[Figure: panels (a) 'Voiced frame' and 'Unvoiced frame' vs. sample index, with their zero-crossings marked; panel (b) 'Zero-Crossings comparison' of the voiced and unvoiced frames.]
Figure 3: Voiced and Unvoiced speech sounds, along with their zero-crossings.
The above expression counts the number of zero-crossings in a speech frame. The first term returns
the sign of each sample value, starting from sample 2, and the second term returns the sign of each
sample value, starting from sample 1 and ending one sample before the last. Their difference is non-zero
(namely, ±2) exactly when the sign of the waveform changes between two consecutive samples. So,
summing the absolute values of these differences gives us twice the number of zero-crossings, which is
why we multiply the sum by 0.5. Try it on paper to see why this is valid. A for loop and an if statement
would do the same thing, but we prefer to avoid loops in MATLAB, especially long ones, since vectorized
code is usually much faster.
You can see the result in Figure 3b. We can generalize to a full waveform, and what we get is
shown in Figure 4. You can see that the number of zero-crossings is quite high in unvoiced parts of
speech, whereas it is low in voiced parts. So, the zero-crossing count is a quite convenient way to
discriminate voiced from unvoiced speech.
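The contour shown in Figure 4 can be computed with the same framing as in the energy sketch above (Lw, hop, nFrames, and tE as defined there):
% Sketch: frame-based short-time zero-crossings contour
Z = zeros(1, nFrames);
for k = 1:nFrames
    frm = s((k-1)*hop + (1:Lw));
    Z(k) = 0.5*sum(abs(sign(frm(2:end)) - sign(frm(1:end-1))));
end
figure; plot(tE, Z); xlabel('Time (s)'); ylabel('Short-Time Zero Crossings Rate');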
2.3 Limitations
A combination of the methods we have described seems reasonable and powerful enough for our
purpose. And indeed, it is, up to a certain level. :-) However, there are some problems in our simple
approach. Some frames are transient frames (something between voiced and unvoiced), which cannot
be easily detected and categorized. Others are neither purely voiced nor purely unvoiced, such as
fricatives, plosives, or nasals (/p/, /g/, /d/, /b/, /m/, /n/, etc.). Also, and more importantly, there
might be some voiced frames with low energy and some unvoiced with high energy. This depends on
the speaker and the speaking style. Even the zero-crossing rate may differ among different speakers or
speaking styles.
[Figure 4: the waveform of the utterance 'The fish twisted and turned on the bent hook' (top) and its short-time zero crossings rate (bottom), both vs. time (s).]
So, there are limitations to our approach, as you may have guessed. However, it is not
far from the one used in real systems, like the GSM standard for European cellular communications,
and the purpose of this lab is not to build a robust VUS discriminator, but
to use simple tools and familiarize yourselves with time-domain properties of speech.
• What about silence? We haven't mentioned anything about it in our analysis. Well, ideally, a
silence frame would be a frame with all samples equal to zero, right? This means energy equal
to zero and no zero-crossings at all. In practice, however, because of microphone noise or noise
from the environment (breath, room reflections, etc.), silence frames show some small sample
variation. This variation is so small that the energy should be lower than in both voiced and
unvoiced frames, and the zero-crossings should also be fewer than in the other two cases. Usually,
this is what happens in the real world. :-)
• As we mentioned at the beginning, our algorithm has an adaptivity of some form, which means
that its results depend on some statistics of the input waveform. This adaptivity is reflected
in the thresholds we introduce in order to make our discrimination into voiced/unvoiced/silence
frames. You will see that there are two thresholds, one for the energy and one for the zero-crossings.
These thresholds depend on the input waveform, so they are not fixed numbers.
• Your algorithm should work on a frame-by-frame basis. That means you have to estimate your
energy and zero-crossings at a set of analysis time instants of your choice, with a step size of
your choice. Note, however, that your analysis window should be long enough for your statistics
to be meaningful, but also short enough to give good time localization. :-) It is suggested that
your analysis window and your frame rate be 20-30 ms and 5-10 ms, respectively. Your energy
and zero-crossings estimates are considered to be localized at the center of the analysis window.
• So, for each frame, you will have to calculate the above measures (energy and zero-crossings) and
find out if the frame is voiced/unvoiced/silence.
• Because of its simplicity, this VUS discriminator should perform adequately but not perfectly
(actually, a very accurate VUS discriminator is still a subject of research in the speech community,
although several robust and highly accurate ones have been proposed over the years). However,
it should at least correctly detect the voiced parts of speech.
• Also, since your VUS gives estimates every 5-10 ms, you will have to interpolate your results
over the whole speech waveform in order to have a continuous estimate (that is, one for every
time sample). To do this, you can use the interp1 function that is readily available in MATLAB;
see the sketch right after this list. Try different interpolation methods, like spline, cubic, and
linear interpolation schemes. You can visually inspect the results and check which method
performs better. Justify your results and make a short comment.
• Moreover, you can try different analysis window sizes, different analysis window types, or different
frame rates to see how the results change.
• Finally, in the speech examples that we provide, there are two speech files ending in -sin.wav
and -swn.wav. These are two utterances with the same content (the speaker says the same
thing) but in a different speaking style. In the -sin.wav file, the speaking style is more "stressed",
more "intense" in a way. This is called Lombard speech: the speaker changes his/her speaking
style in order to produce speech that is more intelligible in a noisy environment. It is like
speaking in a very quiet place (-swn.wav file) versus in a cafeteria, an airport, or any other
crowded place (-sin.wav). Can you see any significant energy or zero-crossing rate changes
between these two waveforms? If yes, comment on your results.
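As referenced in the interpolation item above, here is a minimal sketch of that step, assuming VUS holds one decision per frame and tc holds the corresponding analysis-window center times in seconds (tc is our own name):
% Sketch: interpolate frame-level VUS decisions to every time sample
t = 0:1/fs:length(s)/fs-1/fs;              % one time instant per sample
VUSi_lin = interp1(tc, VUS, t, 'linear');  % piecewise-linear interpolation
VUSi_spl = interp1(tc, VUS, t, 'spline');  % cubic spline interpolation
VUSi_pch = interp1(tc, VUS, t, 'pchip');   % shape-preserving cubic
% Query points outside [tc(1), tc(end)] return NaN unless you request extrapolation,
% e.g. interp1(tc, VUS, t, 'linear', 'extrap').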
% Signal length
D = length(s);
% Frame length (30 ms, how many samples? )
L = %INSERT CODE HERE
% Number of frames
Nfr = %INSERT CODE HERE
% Classification
for i = 1:1:Nfr
if % INSERT CONDITION HERE
% VOICED
VUS(i) = 1.0;
elseif % INSERT CONDITION HERE
% SILENCE
VUS(i) = 0.0;
elseif % INSERT CONDITION HERE
% UNVOICED
VUS(i) = 0.5;
end
end
% Visualize
figure;
t = 0:1/fs:length(s)/fs-1/fs;
plot(t, VUSi); % VUSi: frame-level VUS decisions interpolated to every sample (via interp1)
hold on; plot(t, s/max(abs(s)), 'r'); hold off;
xlabel('Time (s)');
title('Energy & Zero-Crossings Rate-based VUS discrimination');
grid;
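One possible way to fill in the blanks above — a sketch only: the framing, threshold rules, and threshold values below are our own illustrative choices, not the prescribed solution:
% Sketch: per-frame measures and one possible adaptive threshold rule (illustrative)
step = round(0.010*fs);                     % 10 ms frame step
L = round(0.030*fs);                        % 30 ms frame length
Nfr = floor((D - L)/step) + 1;
E = zeros(1, Nfr); Z = zeros(1, Nfr);
for i = 1:Nfr
    frm = s((i-1)*step + (1:L));
    E(i) = (1/L)*sum(frm.^2);                                   % average energy
    Z(i) = 0.5*sum(abs(sign(frm(2:end)) - sign(frm(1:end-1)))); % zero-crossings
end
% Adaptive thresholds derived from the waveform's own statistics (illustrative values)
eThr = 0.1*mean(E);   % low-energy cutoff
zThr = mean(Z);       % high zero-crossings cutoff
VUS = zeros(1, Nfr);
for i = 1:Nfr
    if E(i) > eThr && Z(i) < zThr
        VUS(i) = 1.0;   % VOICED: high energy, few zero-crossings
    elseif E(i) <= eThr && Z(i) < zThr
        VUS(i) = 0.0;   % SILENCE: low energy, few zero-crossings
    else
        VUS(i) = 0.5;   % UNVOICED: many zero-crossings
    end
end
You will almost certainly need to tune such rules per waveform; the 0.1 factor and the use of the mean are only starting points.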
If everything went well, you should see something like Figure 5 below. You can see that it is not
perfect but it works. :-)
You can try your own .wav file or you can use the ones we provide.
[Figure 5: the VUS decision, with levels labeled VOICED, UNVOICED, and SILENCE, overlaid on the normalized speech waveform.]
• Comment on the results of your VUSD, in all provided speech waveforms. Does it work well? If
not, where does it have problems? Comment and include figures for all waveforms in your report.
• What happens if you increase the frame rate to 20 or 30 ms? What happens if you decrease the
frame rate down to 2.5 or 5 ms? What happens if you change the window size (make it shorter
or larger)? Comment (figures are NOT necessary).
• Record your voice with a microphone and save it into a .wav file. Use a sampling frequency of
Fs = 16 kHz for your recording and a 16-bit precision. Load it into MATLAB (use audioread.m)
and apply your algorithm on the waveform. Include a figure of your voice and the corresponding
result of the algorithm.
If you have ANY questions on this lab, please send an e-mail to: [email protected]