D2 Report 2022JTM2399

ELL720: Advanced Digital Signal Processing

Design Problem 2
Pitch Detection

Pitch, or fundamental frequency, is the lowest frequency component of a periodic signal. The pitch period, the inverse of the fundamental frequency, is the smallest repeating unit of the signal; one such period describes the periodic signal (the voiced part of speech) completely. For example, a pitch of 200 Hz corresponds to a pitch period of 1/200 s = 5 ms.

Extracting Pitch in the Time Domain

Most time-domain pitch period estimation techniques use the auto-correlation function (ACF).

Auto-correlation function

The basic idea of correlation-based pitch tracking is that the correlation signal will have a peak of large magnitude at a lag corresponding to the pitch period.

The ACF of a windowed signal s[m], with N samples in the window, is computed as

R[k] = Σ_{m=0}^{N-1-k} s[m] · s[m+k]

where
N is the total number of samples in the window, and
k is the lag index.
N should be as small as possible so that the estimate can follow the time variation of the signal, yet large enough to cover at least two pitch periods so that the periodicity is captured by R[k].

Properties of R[k]:
• R[k] has the same periodicity as s[m].
• Its maximum value is at k = 0, and R[0] equals the energy of the (deterministic) signal.
• If s[m] is periodic with period P samples, R[k] has maxima at k = 0, ±P, ±2P, …
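
As an illustration of the peak-picking idea above, a minimal MATLAB sketch is given below. It assumes a single voiced frame x (a column vector spanning at least two pitch periods) at sampling rate fs, and a pitch-search range of roughly 60-700 Hz; these names and bounds are assumptions made for illustration, not the exact code behind the results reported later.

% Estimate pitch from one voiced frame x by locating the largest ACF peak
% at a non-zero lag inside an assumed 60-700 Hz pitch range.
x = x - mean(x);                         % remove DC so the ACF peak reflects periodicity
R = xcorr(x);                            % auto-correlation, length 2N-1
R = R(length(x):end);                    % keep lags k = 0, 1, ..., N-1 (R(1) is k = 0)

minLag = floor(fs/700);                  % smallest lag searched (highest pitch)
maxLag = ceil(fs/60);                    % largest lag searched (lowest pitch)
[~, idx] = max(R(minLag+1 : maxLag+1));  % peak among lags minLag..maxLag
pitchLag = minLag + idx - 1;             % lag (in samples) of the chosen peak
pitch_Hz = fs / pitchLag;                % pitch period in samples -> pitch in Hz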
Auto-correlation method

The original signal s4.wav is taken here and plotted in the time domain against sample number.

• As seen from the above plot, the speech sample contains spoken words with silence in between, over different ranges.
• So the spoken ranges 17000 to 20000, 7000 to 10000, 3000 to 6000, and 12000 to 15000 (sample numbers) are taken for analysis.
• A low-pass filter is also applied before processing the speech signal; one possible pre-processing chain is sketched below.
• The filtered signal is shown above.
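
The report does not specify the filter design, so the following is only a hedged illustration: the cut-off frequency (900 Hz), the filter order, and the use of zero-phase filtering are assumptions. The filtered segment x can then be fed to the ACF peak-picking sketch shown earlier.

% Illustrative pre-processing: load s4.wav, low-pass filter it, and cut out
% one of the voiced ranges listed above (the filter design is an assumption).
[s, fs] = audioread('s4.wav');           % speech samples and sampling rate
fc = 900;                                % assumed low-pass cut-off in Hz
[b, a] = butter(4, fc/(fs/2));           % 4th-order Butterworth low-pass filter
s_filt = filtfilt(b, a, s);              % zero-phase filtering, no group delay
x = s_filt(17000:20000);                 % voiced segment (samples 17000 to 20000)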

Speech segment from sample 17000 to 20000

pitch_Hz_17k_20k = 666.6667 Hz (without filtering)

pitch_Hz_17k_20k = 571.4286 Hz (with filtering)


Speech segment from sample 7000 to 10000

pitch_Hz_7k_10k = 533.3333 Hz (without filtering)


pitch_Hz_7k_10k = 533.3333 Hz (with filtering)
Speech segment from sample 3000 to 6000

pitch_Hz_3k_6k = 470.882 Hz (without filtering)

pitch_Hz_3k_6k = 470.882 Hz (with filtering)

Speech segment from sample 12000 to 15000

pitch_Hz_12k_15k = 533.3333 Hz (without filtering)

pitch_Hz_12k_15k = 533.3333 Hz (with filtering)


Now the same procedure is applied to the s3.wav file:
pitch_Hz_3.5k_5.5k = 615.3846 Hz
pitch_Hz_8.2k_10.2k = 444.4444 Hz

pitch_Hz_13k_15k = 615.3846 Hz
pitch_Hz_22k_24k = 400 Hz

For the s1.wav file:

The method above estimates the pitch only for three or four manually chosen ranges of the spoken signal. Another approach is to estimate the pitch for every 10 ms slot across the whole 3-second speech signal and then average the results, as sketched below.
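
A hedged MATLAB sketch of that frame-by-frame procedure follows. The search range (limited here to about 150-700 Hz, since a 10 ms frame cannot hold longer pitch periods), the voicing threshold of 0.3, and the variable names are assumptions made for illustration, not the exact code behind the averages reported below.

% Framewise ACF pitch over the whole file, averaged over frames judged voiced.
[s, fs]  = audioread('s1.wav');
frameLen = round(0.010 * fs);                    % 10 ms frames
nFrames  = floor(length(s) / frameLen);
minLag   = floor(fs/700);                        % highest pitch considered
maxLag   = min(ceil(fs/150), frameLen - 1);      % lowest pitch a 10 ms frame allows
pitches  = [];

for i = 1:nFrames
    frame = s((i-1)*frameLen + 1 : i*frameLen);
    frame = frame - mean(frame);
    R = xcorr(frame);
    R = R(frameLen:end);                         % non-negative lags, R(1) is lag 0
    [peakVal, idx] = max(R(minLag+1 : maxLag+1));
    % crude voicing check: the ACF peak must be a sizeable fraction of R(0)
    if R(1) > 0 && peakVal > 0.3 * R(1)
        pitches(end+1) = fs / (minLag + idx - 1);   % lag -> pitch in Hz
    end
end

average_pitch = mean(pitches)                    % average pitch over voiced frames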

Average pitch of s1.wav signal = 349.3121 Hz


For the s2.wav file:

Average pitch of s2.wav signal = 305.9340 Hz


For the s3.wav file:

Average pitch of s3.wav signal = 356.8377 Hz


For the s4.wav file:

Average pitch of s4.wav signal = 287.8709 Hz
Cepstral Method

Pitch Detection using the Cepstral Method

Pitch detection is often done in the cepstral domain because the cepstrum exposes periodic structure in the logarithmic magnitude spectrum of a signal. The cepstrum is formed by taking the FFT (or IFFT) of the log magnitude spectrum of the signal. The FFT and IFFT can be used interchangeably here because one simply gives a reversed version of the other, so either is equally valid for the processing we wish to do.
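
Written compactly (with c[q] and the quefrency index q introduced here only as notation), the cepstrum described above is

c[q] = IFFT{ log | FFT{ s[n] } | }

where q, the quefrency, is measured in samples of lag.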
Once in the cepstral domain, the pitch can be estimated by picking the peak of the resulting signal within a certain range. The cepstrum is given in terms of "quefrency" which, besides being a terrible name, represents pitch lag. Therefore, the lag at which there is the most energy corresponds to the dominant periodicity of the log magnitude spectrum, thereby giving the pitch.
There are, of course, some caveats to this approach. First, pitch and fundamental frequency are not actually the same thing, so depending on which peak the algorithm picks, you may get F0 (the fundamental) or F1 (one of the formants). Second, the cepstrum is time-shift variant, so this method cannot be applied blindly: the time-domain windows need to be lined up so that they start and stop exactly over a voiced speech segment. This is not a trivial task, as most VADs (voice activity detectors) make errors, and the cepstrum will then suffer from phase ambiguity.

Steps used:

1. Load the .wav file.
2. Define the frame length, number of samples, and sampling frequency.
3. Evaluate the FFT of every frame.
4. Evaluate the magnitude spectrum of every frame.
5. Find the log magnitude of the above spectrum.
6. Find the IFFT of the above log magnitude: this is the cepstrum we want.
7. As the cepstrum is symmetric, use only one half of this array.
8. Apply "high-time liftering" to get the pitch frequency.
9. Compute the high-time-liftered cepstrum:
10. multiply the half_cepstrum element-wise with the liftering window.


11. Apply a simple voice activity detection (VAD) check on each frame:

if mean(power_spectrum) >= 1   % an experimental value, most likely to fail on other inputs
    voiced_pitch_freq(length(voiced_pitch_freq)+1) = pitch_frequency;   % record frames identified as voiced
end
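
Putting steps 1-11 together, a hedged MATLAB sketch of the whole procedure is given below. The 30 ms frame length, the Hamming window, and the 60-500 Hz quefrency search range are assumptions made for illustration; the energy threshold of 1 is the experimental value quoted in the snippet above, and the averaging of voiced-frame pitches mirrors the final_pitch_freq values reported next.

% Cepstral pitch detection per frame with high-time liftering and the simple
% energy-based VAD from step 11 (illustrative sketch, not the exact code used).
[s, fs]  = audioread('s1.wav');
frameLen = round(0.030 * fs);                     % assumed 30 ms frames
nFrames  = floor(length(s) / frameLen);
minQ = floor(fs/500); maxQ = ceil(fs/60);         % assumed 60-500 Hz pitch range
voiced_pitch_freq = [];

for i = 1:nFrames
    frame = s((i-1)*frameLen + 1 : i*frameLen);
    spec     = fft(frame .* hamming(frameLen));   % step 3: FFT of the frame
    mag_spec = abs(spec);                         % step 4: magnitude spectrum
    power_spectrum = mag_spec.^2;                 % power spectrum, used by the VAD below
    log_mag  = log(mag_spec + eps);               % step 5: log magnitude spectrum
    cep      = real(ifft(log_mag));               % step 6: cepstrum
    half_cep = cep(1:floor(frameLen/2));          % step 7: keep one symmetric half

    % Steps 8-10: the high-time lifter keeps only quefrencies in the pitch range,
    % then the largest remaining cepstral peak gives the pitch lag.
    lifter = zeros(size(half_cep));
    lifter(minQ+1 : min(maxQ+1, numel(half_cep))) = 1;
    liftered = half_cep .* lifter;
    [~, q] = max(liftered);
    pitch_frequency = fs / (q - 1);               % quefrency (samples) -> pitch in Hz

    % Step 11: simple VAD with the experimental threshold from the report.
    if mean(power_spectrum) >= 1
        voiced_pitch_freq(length(voiced_pitch_freq)+1) = pitch_frequency;
    end
end

final_pitch_freq = mean(voiced_pitch_freq)        % average over voiced frames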

s1.wav file:

Pitch frequency = 382.0938 Hz (without VAD condition)
final_pitch_freq = 239.0619 Hz (with VAD condition)

s2.wav file:

Pitch frequency = 1.5472e+03 Hz (without VAD condition)
final_pitch_freq = 346.1053 Hz (with VAD condition)

s3.wav file:

Pitch frequency = 471.3674 Hz (without VAD condition)
final_pitch_freq = 250.8600 Hz (with VAD condition)

s4.wav file:

Pitch frequency = 1.2335e+03 Hz (without VAD condition)
final_pitch_freq = 309.1556 Hz (with VAD condition)


Comparison of the above two methods:

Pitch detection is a common task in digital signal processing that involves identifying the fundamental frequency of a sound signal. There are several methods for pitch detection, including the auto-correlation method and the cepstral method. Here is a brief comparison of these two methods:

Auto-correlation method:

In the auto-correlation method, the pitch of a sound signal is estimated by calculating the auto-correlation function of the signal. The auto-correlation function measures the similarity between a signal and a time-delayed version of itself. The pitch of the signal is then estimated by identifying the delay that maximizes the auto-correlation function.
Pros:
The auto-correlation method is simple to implement and computationally efficient.
It is a widely used method for pitch detection and can work well for many types of
sounds.
Cons:
The autocorrelation method can be sensitive to noise and harmonics that are not
related to the fundamental frequency.
It may not work well for complex signals with multiple sources and non-harmonic
components.

Cepstral method:

The cepstral method involves transforming the sound signal into the cepstral domain,
which is a logarithmic representation of the power spectrum of the signal. The
fundamental frequency can be estimated by analyzing the peaks in the cepstral
spectrum.
Pros:
The cepstral method is less sensitive to noise and harmonic interference compared
to the auto-correlation method.
It can work well for complex signals with multiple sources and non-harmonic
components.
Cons:
The cepstral method is computationally more expensive than the auto-correlation
method.
It may not work well for signals with low-frequency content, as the cepstral method
is based on logarithmic representation.
In summary, both the auto-correlation and cepstral methods have their strengths
and weaknesses, and the choice of method depends on the characteristics of the
sound signal and the specific application requirements. The auto-correlation method
is a good choice for simple signals with few sources and harmonic components,
while the cepstral method is more suitable for complex signals with multiple sources
and non-harmonic components.
