D2 Report 2022JTM2399
Design Problem 2
Pitch Detection
Auto-correlation function :-
The basic idea of correlation-based pitch tracking is that the correlation signal will
have a peak of large magnitude at a lag corresponding to the pitch period.
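The equation that the definitions below refer to is not reproduced in this text. A common form of the short-time autocorrelation, assumed here from the symbols R[k], s[m], N and k used in this section, is:

```latex
R[k] = \sum_{m=0}^{N-1-k} s[m]\, s[m+k], \qquad k = 0, 1, \dots, N-1
```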
In the short-time autocorrelation R[k] of the windowed signal s[m]:
N - total number of samples in a window
k - lag index
N should be as small as possible so that time variation remains visible, but
large enough to cover at least 2 pitch periods so that the periodicity can be captured by R[k].
Properties of R[k] :-
R[k] has the same periodicity as s[m].
R[k] attains its maximum value at k = 0, and R[0] equals the energy of a deterministic signal.
If s[m] is periodic with a period of P samples, R[k] has maxima at k = 0, ±P, ±2P, ...
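The properties listed above can be checked numerically. The following is a minimal sketch, not the code used for this report: the helper name `autocorr` and the test sinusoid (P = 50, N = 400) are assumptions chosen for illustration. Because the window is finite, the peaks at multiples of P shrink slowly as the lag grows, so the periodicity is approximate rather than exact.

```python
import numpy as np

def autocorr(s):
    """Short-time autocorrelation R[k] = sum_m s[m] * s[m+k] for k = 0..N-1."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # 'full' mode returns lags -(N-1)..(N-1); index N-1 is lag 0, so keep k >= 0.
    return np.correlate(s, s, mode="full")[n - 1:]

# Hypothetical test signal: a sinusoid with a period of P = 50 samples.
P = 50
N = 400
m = np.arange(N)
s = np.sin(2 * np.pi * m / P)

R = autocorr(s)
energy = np.sum(s ** 2)

# Look for the first peak away from lag 0, between half and one-and-a-half periods.
k_peak = 25 + int(np.argmax(R[25:75]))

print("R[0] equals the energy:", np.isclose(R[0], energy))
print("global maximum at lag:", int(np.argmax(R)))
print("first non-zero-lag peak at lag:", k_peak)
```

The lag of the first peak away from k = 0 is the period in samples, which is the quantity the pitch estimate is built on.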
Auto-correlation method
The original signal s4.wav is plotted in the time domain against sample number.
As seen from the plot above, the speech sample contains spoken words separated by
stretches of silence.
I therefore take the spoken ranges 17000 to 20000, 7000 to 10000, 3000 to 6000
and 12000 to 15000.
A low-pass filter is also applied to the speech signal before processing.
The filtered signal is shown above.
pitch_Hz_13k_15k = 615.3846 Hz
pitch_Hz_22k_24k = 400 Hz
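The per-range procedure (low-pass filter, then autocorrelation peak picking) might look like the sketch below. Everything specific in it is an assumption rather than the report's actual setup: the 8 kHz sampling rate (consistent with a 3-second signal of roughly 24000 samples), the 900 Hz cutoff, the 101-tap windowed-sinc filter, the 60 to 500 Hz search range, and the helper names `lowpass_fir` and `pitch_autocorr`. A synthetic tone stands in for a spoken range of s4.wav.

```python
import numpy as np

def lowpass_fir(x, fs, cutoff_hz=900.0, num_taps=101):
    """Windowed-sinc FIR low-pass filter (cutoff and tap count are assumptions)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(x, h, mode="same")

def pitch_autocorr(segment, fs, f_min=60.0, f_max=500.0):
    """Pitch (Hz) from the largest autocorrelation peak with lag in [fs/f_max, fs/f_min]."""
    seg = np.asarray(segment, dtype=float)
    seg = seg - seg.mean()
    n = len(seg)
    r = np.correlate(seg, seg, mode="full")[n - 1:]
    k_lo = int(fs / f_max)               # shortest lag searched
    k_hi = min(int(fs / f_min), n - 1)   # longest lag searched
    k = k_lo + int(np.argmax(r[k_lo:k_hi + 1]))
    return fs / k

# Hypothetical stand-in for one spoken range: a 200 Hz tone with a second
# harmonic plus an unwanted 3 kHz component, sampled at an assumed 8 kHz.
fs = 8000
t = np.arange(3000) / fs
segment = (np.sin(2 * np.pi * 200 * t)
           + 0.5 * np.sin(2 * np.pi * 400 * t)
           + 0.3 * np.sin(2 * np.pi * 3000 * t))

filtered = lowpass_fir(segment, fs)
pitch_hz = pitch_autocorr(filtered, fs)
print(f"estimated pitch: {pitch_hz:.1f} Hz")
```

Since the estimate is fs divided by an integer lag, only a discrete set of pitch values is possible; at 8 kHz, lags of 13 and 20 samples would give 615.3846 Hz and 400 Hz respectively.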
However, the method above detects the pitch in only 3-4 ranges of the spoken
speech signal.
An alternative is to estimate the pitch in every 10 ms slot across the whole
3-second speech signal and average the results.
average_pitch = 287.8709 Hz
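A frame-by-frame version of the same idea is sketched below. It is not the report's code. It advances in 10 ms steps as described, but analyses a 30 ms window at each step so that at least two pitch periods fit inside it; that window length, the 60 to 500 Hz search range, the `min_peak` voicing threshold and the function name `frame_pitches` are all assumptions.

```python
import numpy as np

def frame_pitches(x, fs, hop_ms=10.0, win_ms=30.0, f_min=60.0, f_max=500.0,
                  min_peak=0.3):
    """Return one pitch estimate (Hz) per hop; NaN where no clear peak exists."""
    x = np.asarray(x, dtype=float)
    hop = int(fs * hop_ms / 1000)
    win = int(fs * win_ms / 1000)
    k_lo, k_hi = int(fs / f_max), int(fs / f_min)
    pitches = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win]
        frame = frame - frame.mean()
        r = np.correlate(frame, frame, mode="full")[win - 1:]
        if r[0] <= 0:                       # silent (all-zero) frame
            pitches.append(np.nan)
            continue
        k = k_lo + int(np.argmax(r[k_lo:k_hi + 1]))
        # The normalised peak height acts as a crude voiced/unvoiced decision.
        pitches.append(fs / k if r[k] / r[0] >= min_peak else np.nan)
    return np.array(pitches)

# Hypothetical input: 1 s of a 250 Hz tone followed by 0.5 s of silence at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.concatenate([np.sin(2 * np.pi * 250 * t), np.zeros(fs // 2)])

pitches = frame_pitches(x, fs)
average_pitch = np.nanmean(pitches)   # average over frames with a clear peak
print(f"{len(pitches)} frames, average pitch = {average_pitch:.1f} Hz")
```

Frames without a clear peak are excluded from the average here; if silent or unvoiced frames are included, their spurious estimates can pull the average away from the true pitch.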
Cepstral Method
Pitch detection is often done in the cepstral domain because the cepstrum
exposes periodic structure in the logarithmic magnitude spectrum of a signal: the
harmonics of a voiced sound are evenly spaced in frequency, and that regular
spacing appears as a peak in the cepstrum.
The cepstrum is formed by taking the FFT (or IFFT) of the log magnitude spectrum of a
signal. The FFT and IFFT can be used interchangeably because one just gives a
scaled, index-reversed version of the other, so each is equally valid for the
processing we wish to do.
Once in the cepstral domain, the pitch can be estimated by picking the peak
of the resulting signal within a certain range. The cepstrum is given in terms of
“quefrency” which, besides being a terrible name, represents pitch lag. Therefore,
the quefrency at which the cepstrum peaks corresponds to the dominant periodicity in the
log magnitude spectrum, thereby giving the pitch period; the pitch frequency is the
sampling rate divided by that lag.
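The peak-picking step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the report's implementation: the Hamming window, the 60 to 500 Hz search range, the function name `pitch_cepstrum` and the synthetic test frame (an impulse train through a decaying filter, a crude stand-in for voiced speech) are all choices made for this example.

```python
import numpy as np

def pitch_cepstrum(frame, fs, f_min=60.0, f_max=500.0):
    """Pitch (Hz) from the largest real-cepstrum peak with quefrency in [fs/f_max, fs/f_min]."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.fft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real         # real cepstrum
    q_lo = int(fs / f_max)                       # shortest quefrency, in samples
    q_hi = int(fs / f_min)                       # longest quefrency, in samples
    q = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / q

# Hypothetical voiced frame: pulses every 40 samples (200 Hz at an assumed 8 kHz)
# shaped by a short decaying impulse response.
fs = 8000
period = 40
n = 512
train = np.zeros(n)
train[::period] = 1.0
decay = 0.9 ** np.arange(30)
frame = np.convolve(train, decay)[:n]

f0_est = pitch_cepstrum(frame, fs)
print(f"cepstral pitch estimate: {f0_est:.1f} Hz")
```

Restricting the search to a plausible quefrency range matters: the low-quefrency part of the cepstrum is dominated by the spectral envelope rather than by the pitch.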
There are, of course, some caveats to this approach. First, pitch and
fundamental frequency are not actually the same thing, so depending on which peak
your algorithm picks, you may be getting F0 (the fundamental) or Fi (one of the
formants). Second, the cepstrum is time-shift variant, so you cannot just
apply this method blindly. Instead, you need to line up your time-domain
windows precisely so that they start and stop exactly over a voiced speech segment. This is
not a trivial task, because most VADs (voice activity detectors) make errors, and
your cepstrum will then suffer from phase ambiguity.
Steps used :
S1 file :-
S2 file :-
Pitch frequency = 1.5472e+03 Hz (without VAD condition)
final_pitch_freq = 346.1053 Hz (with VAD condition)
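The report does not state how the VAD condition was implemented. One simple possibility, shown here purely as an assumption, is a short-time energy threshold: frames whose energy falls below a fraction of the maximum frame energy are treated as silence and left out of the pitch average. The function name `energy_vad`, the frame sizes and the 10% threshold are hypothetical.

```python
import numpy as np

def energy_vad(x, frame_len, hop, rel_threshold=0.1):
    """Boolean mask per frame: True where frame energy exceeds a fraction of the maximum."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - frame_len + 1, hop)
    energy = np.array([np.sum(x[s:s + frame_len] ** 2) for s in starts])
    return energy > rel_threshold * energy.max()

# Hypothetical input: 1 s of a 200 Hz tone followed by 1 s of weak noise at 8 kHz.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
x = np.concatenate([np.sin(2 * np.pi * 200 * t), 0.01 * rng.standard_normal(fs)])

mask = energy_vad(x, frame_len=240, hop=80)
print(f"{int(mask.sum())} of {len(mask)} frames kept as speech")
```

Per-frame pitch estimates would then be averaged only over the frames where `mask` is True, which keeps spurious estimates from silent frames out of the final figure.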
S3 file :-
Pitch detection is a common task in digital signal processing that involves identifying
the fundamental frequency of a sound signal. There are several methods for pitch
detection, including the auto-correlation method and the cepstral method. Here is a
brief comparison of these two methods:
Auto-correlation method:
Cepstral method:
The cepstral method transforms the sound signal into the cepstral domain by taking
the inverse Fourier transform of the logarithm of the signal's magnitude spectrum. The
fundamental frequency is then estimated from the location of the peaks in the
cepstrum.
Pros:
The cepstral method is less sensitive to noise and harmonic interference compared
to the auto-correlation method.
It can work well for complex signals with multiple sources and non-harmonic
components.
Cons:
The cepstral method is computationally more expensive than the auto-correlation
method.
It may not work well for signals with low-frequency content, as the cepstral method
is based on logarithmic representation.
In summary, both the auto-correlation and cepstral methods have their strengths
and weaknesses, and the choice of method depends on the characteristics of the
sound signal and the specific application requirements. The auto-correlation method
is a good choice for simple signals with few sources and harmonic components,
while the cepstral method is more suitable for complex signals with multiple sources
and non-harmonic components.