EC39201_Expt4_Lab Report_Grp-24
12 October 2022
Group-24
KR Rahul(20EC30021)
Rahul Singh(20EC30037)
AIM
In this experiment, we aim to understand the relative importance of the low-frequency
temporal structure of speech versus its frequency content in speech perception.
THEORY
Nearly perfect speech recognition has been observed under conditions of greatly reduced
spectral information. The temporal envelope of speech was extracted from each wide frequency
band and used to modulate noise of the same bandwidth. This operation preserved
the temporal envelope cues of each band but gave the listener only severely
degraded information about the spectral energy distribution. Discrimination of
consonants, vowels, and words in simple sentences improved significantly as the number
of bands increased, and high speech recognition performance was achieved with only
three bands of modulated noise. Therefore, representing dynamic temporal patterns in
just a few broad spectral ranges is sufficient to recognize speech.
Random noise was modulated by the envelope signal and spectrally limited by the
same bandpass filter used for the original analysis band. Time and amplitude cues
were thus retained in each spectral band, while spectral detail within each band was
removed. All bands were then summed and presented to the listener.
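As a companion to the processing described above, here is a minimal Python sketch of one vocoder channel (band-pass filtering, Hilbert-envelope extraction, and envelope-modulated band-limited noise), assuming NumPy/SciPy; the sine wave `x` is a hypothetical stand-in for the actual speech recording:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def vocode_band(x, fs, f_lo, f_hi, noise, env_cut=240.0):
    """Noise-vocode one analysis band: band-pass, extract envelope, modulate noise."""
    nyq = fs / 2
    b, a = butter(2, [f_lo / nyq, f_hi / nyq], btype="bandpass")
    band = filtfilt(b, a, x)               # speech limited to this band
    env = np.abs(hilbert(band))            # Hilbert (analytic-signal) envelope
    bl, al = butter(2, env_cut / nyq, btype="low")
    env = filtfilt(bl, al, env)            # smooth the envelope below 240 Hz
    carrier = filtfilt(b, a, noise)        # noise limited to the same band
    return env * carrier                   # envelope-modulated noise band

# Two-band vocoder over 90 Hz - 5760 Hz with logarithmically spaced edges
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)            # stand-in for the speech signal
noise = np.random.randn(len(x))
N = 2
edges = 90 * 64 ** (np.arange(N + 1) / N)  # band edges: 90, 720, 5760 Hz
z = sum(vocode_band(x, fs, edges[i], edges[i + 1], noise) for i in range(N))
```

Summing the per-band outputs into `z` mirrors the final summation step: each band keeps its temporal envelope while the fine spectral detail is replaced by noise.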
CODE:
clc;
clear;
info=audioinfo('fivewo.wav');
[x,Fs]=audioread('fivewo.wav');
t=(0:length(x)-1)/Fs; %time axis in seconds
subplot(3,1,1);
plot(t,x);
xlabel('Time');
ylabel('Audio Data');
title('Plot of the Given Audio');
disp(info);
n=100; %noise scaling factor (undone before playback)
noise=(1/n)*wgn(length(x),1,1); %white Gaussian noise, same length as the signal
subplot(3,1,2);
plot(t,noise);
xlabel('Time');
ylabel('Noise');
title('Plot of the White Gaussian Noise');
z=0;
N=2; %number of bands
for i = 1:N
f1=90*64^((i-1)/N); %logarithmic band edges from 90 Hz to 5760 Hz
f2=90*64^(i/N);
[B,A]=butter(2,[f1,f2]/(Fs/2),"bandpass"); %band-pass analysis filter (cutoffs normalised by the Nyquist frequency)
y=filter(B,A,x); %speech limited to this band
y_hilb=hilbert(y); %analytic signal
[B_,A_]=butter(2,240/(Fs/2),"low"); %envelope-smoothing low-pass at 240 Hz
y_env=filter(B_,A_,abs(y_hilb)); %temporal envelope of the band
noise_band=filter(B,A,noise); %noise spectrally limited to the same band
z=z+y_env.*noise_band; %sum of envelope-modulated noise bands
end
subplot(3,1,3);
plot(t,z);
xlabel('Time');
ylabel('Final Audio');
title('Plot of Final Audio');
sound(n*z,Fs);
audiowrite('final.wav',z,Fs);
● For N=2 Bandpass Filters:
DISCUSSION
1. For N=1 or N=2, where N is the number of bands, the voice in the provided audio
file could not be recognized, although the rhythmic pattern could still be made
out. The voice became recognizable from 3 bands onwards.
2. The voice is clearly recognizable for 8 bands and 16 bands; the clarity of the
voice at 16 bands is similar to that at 8 bands.
3. As we add bands, the time-domain representation becomes increasingly close to
the original, and the frequency-domain representation begins to resemble the
actual spectrum of the audio signal.
4. The spacing between bands is crucial for the decoding of the audio signal.
5. Most of the energy of the human voice lies in the low-frequency range. As we
apply narrower-bandwidth filters with more closely spaced bands, we recover
more of the information carried by the speech. Thus, increasing the number of
bands improves the clarity of the output voice.
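The effect of band spacing noted in point 4 can be made concrete with a small Python snippet (assuming NumPy) that evaluates the logarithmic band-edge formula f_k = 90·64^(k/N) from the code section for several band counts:

```python
import numpy as np

# Band edges f_k = 90 * 64**(k/N) always span 90 Hz to 5760 Hz
for N in (2, 4, 8, 16):
    edges = 90 * 64 ** (np.arange(N + 1) / N)
    widths = np.diff(edges)
    print(f"N={N:2d}: narrowest band {widths[0]:7.1f} Hz, widest {widths[-1]:7.1f} Hz")
```

As N grows, the same overall range is split into narrower bands, which is why more bands recover more of the spectral detail of the speech.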