
THE MEL-FREQUENCY CEPSTRAL COEFFICIENTS IN THE CONTEXT OF SINGER IDENTIFICATION

Annamaria Mesaros (1,2)
(1) Technical University of Cluj Napoca, Communications Department, Cluj Napoca, ROMANIA
[email protected]

Jaakko Astola (2)
(2) Institute of Signal Processing, Tampere University of Technology, Tampere, FINLAND
[email protected]

ABSTRACT
The singing voice is the oldest and most complex musical instrument. A familiar singer's voice is easily recognizable for humans, even when hearing a song for the first time. For automatic systems, on the other hand, this is a difficult task among sound source identification applications. Signal processing techniques aim to extract features that are related to identity characteristics. The research presented in this paper considers 32 Mel-Frequency Cepstral Coefficients in two subsets: the low-order MFCCs, characterizing the vocal tract resonances, and the high-order MFCCs, related to the glottal wave shape. We explore possibilities to identify and discriminate singers using the two sets. Based on the results, we can affirm that both subsets contribute to defining the identity of the voice, but the high-order subset is more robust to changes in singing style.

Keywords: sound source identification, singing voice, MFCC

1 INTRODUCTION

Considering the wide range of signals in everyday life, problems concerning the characterization of the singing voice arise naturally after the interest in speaker and instrument recognition. Because of its particularities in production and control, the singing voice falls between speech and musical instrument sounds, sharing characteristics with each while also being very different from both. Singing consists mostly of sustained vowels with an almost perfectly harmonic spectrum, resembling the sustained sounds of musical instruments. At the same time, the shape of the vocal tract that determines the sounds is a characteristic of the human articulatory system, intensively studied in speech recognition tasks. Because singers have to sustain vowels for as long as
possible, they learn to develop control over the pronunciation of the vowels (Barnes et al., 2004), which makes it difficult to apply techniques and models from speech processing directly (Youngmoo, 2003).

The cepstral coefficients are a set of features reported to be robust in several pattern recognition tasks concerning the human voice. They are widely used in speech recognition and also in speaker identification. Lately, research on musical instrument identification techniques has proved the cepstral coefficients to be a useful feature set for that task as well. The human voice is very well adapted to the sensitivity of the ear, with most of the energy developed in speech comprised in the lower part of the spectrum, below 4 kHz. In speech recognition tasks, usually the first 12 coefficients are retained, considering that they represent the slow variations of the spectrum of the signal (Rabiner and Juang, 1993), characterizing the vocal tract shape and the spectrum of the uttered words. Attempts to use the same features in speaker recognition proved that identity features are also coded into the cepstral-coefficient representation of a sound. Experiments conducted on different numbers of speakers, using neural networks for modeling the categories and in the identification stage, showed satisfactory results with 12-14 coefficients (Seddik et al., 2004; Fredrickson and Tarassenko, 1995; Mafra and Simoes, 2004). Cepstral coefficients were also successfully used in instrument recognition: 18 cepstral coefficients derived from a constant-Q transform give a good discrimination rate between oboe and clarinet (Brown, 1999), and their combination with temporal features can yield good instrument classification results (Eronen and Klapuri, 2000).

In this paper, a study of Mel-Frequency Cepstral Coefficients is proposed, concerning the identification of singing voices. In speaker identification systems, the low-order coefficients were used, comprising vocal tract frequency information. The singing voice has a much larger variability than speech and much higher frequency components, starting with the pitch, which can reach up to 1200 Hz in soprano voices. The aim of this study is to determine whether it is appropriate to characterize singing voices using higher-order cepstral coefficients, which are related to pitch and fine spectral structure rather than to the formant structure. We try to determine whether the lower or the upper subset of MFCCs encodes more individuality-related information.


The paper is organized as follows: first, section 2 presents a short review of the methods and processing used to obtain the cepstral coefficients; subsection 2.2 presents some basic concepts about neural networks, together with the steps used in implementing, training and testing them on different subsets of data. Section 3 describes the study material, the grouping of the data and the training of the networks, and finally section 4 presents the results obtained for different network complexities in each case, allowing the posed problem to be generalized.

2 SIGNAL PROCESSING METHODS AND TOOLS

2.1 The Mel-Frequency Cepstral Coefficients

The cepstrum of a time-domain signal s(n) is the inverse Fourier transform of the log-magnitude spectrum of the signal. The log-magnitude spectrum of a real signal is a real and even function, so the cepstrum is normally computed via the Discrete Cosine Transform (DCT), which is equivalent to the Fourier transform for even functions.

An important preprocessing step in the analysis of speech signals is the pre-emphasis of high frequencies. This is done because the amount of energy carried in the high-frequency components is small compared to the low frequencies. For the singing voice, the high-frequency components are all the more important for the perceived quality. Pre-emphasis is usually done by filtering the signal with a FIR filter whose time-domain input-output relation is:

y(n) = x(n) - a x(n - 1)   (1)

where a is close to 1, with typical values around 0.95. The processing continues with a Fourier analysis of the windowed signal; a Hamming window of 20 ms was considered. The Mel-frequency scaling is done by a bank of triangular band-pass filters, nonuniformly distributed along the frequency axis. The Mel-scale equivalent of a frequency f expressed in Hz is:

mel(f) = 2595 log10(1 + f/700)   (2)

The MFCCs are computed by redistributing the linearly-spaced bins of the log-magnitude FFT into Mel-spaced bins according to eq. 2, and applying the DCT to the redistributed spectrum. A relatively small number of coefficients (typically 13) provides a smoothed version of the spectral envelope, so the vocal tract response can be isolated simply by retaining the desired number of coefficients. An additional advantage of MFCCs is that they have a decorrelating effect on the spectral data, maximizing the variance of the coefficients, similar to the effect of Principal Component Analysis (PCA). This allows the elimination of one of the preprocessing steps in neural network training, namely applying PCA to remove data redundancy.
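To make the processing chain concrete, below is a minimal NumPy sketch of the steps described above: pre-emphasis as in eq. 1, Hamming windowing, 1024-point FFT, mel warping as in eq. 2, and a final DCT. The filter count and the non-overlapping framing are illustrative assumptions, not values prescribed by this paper.

```python
import numpy as np
from scipy.fftpack import dct

def mel(f):
    # Eq. 2: mel-scale equivalent of frequency f in Hz
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(x, fs=44100, a=0.95, frame_len=882, n_fft=1024,
         n_filters=40, n_coeffs=32):
    # Pre-emphasis, eq. 1: y(n) = x(n) - a*x(n-1)
    y = np.append(x[0], x[1:] - a * x[:-1])

    # Non-overlapping 20 ms frames (882 samples at 44.1 kHz), Hamming windowed
    n_frames = len(y) // frame_len
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hamming(frame_len)

    # Magnitude spectrum via 1024-point FFT (frames are zero-padded)
    mag = np.abs(np.fft.rfft(frames, n_fft))

    # Triangular band-pass filters, uniformly spaced on the mel scale
    edges = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ce, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i - 1, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)

    # Log of the mel-warped spectrum, then DCT: n_coeffs MFCCs per frame
    logmel = np.log(mag @ fbank.T + 1e-12)
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_coeffs]
```

Here the product mag @ fbank.T performs the redistribution of the linearly-spaced FFT bins into mel-spaced bins described in the text.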

2.2 Feed-Forward Neural Networks

The simplest neural network architecture is the feed-forward network, consisting of one or more hidden layers through which the signal travels one way only, from input to output. This architecture is extensively used in pattern recognition because its basic task is to associate inputs with outputs. Properly trained backpropagation networks are able to generalize and to handle reasonably inputs they have never seen.

To improve the generalization of the neural networks during training and avoid overfitting the data, the early-stopping method was used. The data set is divided into three subsets: a training set used in training, a validation set and a test set. The error on the validation set is monitored during the training process; when the network begins to overfit the data, the error on the validation set tends to rise, and the training is stopped. This also leads to much faster training of the network, at the cost of accepting a final error larger than the imposed goal.

To improve the training of a network, certain preprocessing techniques can be performed. The one used in this study is normalization of the mean and standard deviation of the training set, so that the training and target sets have zero mean and unit standard deviation. Since the MFCCs are decorrelated, there is no need to remove data redundancy with PCA. Post-training analysis is used to check the performance of the trained networks.
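The following sketch outlines the early-stopping rule and the normalization step just described. The helpers train_one_epoch, validation_error and the weight accessors are hypothetical placeholders for whatever training backend is used, and the patience value is an assumption: the paper only states that training stops when the validation error tends to rise.

```python
import numpy as np

def normalize(train_data, data):
    # Zero mean and unit standard deviation, with statistics
    # taken from the training set
    mu, sigma = train_data.mean(axis=0), train_data.std(axis=0)
    return (data - mu) / sigma

def train_with_early_stopping(net, train_set, val_set,
                              max_epochs=1000, patience=5):
    # Monitor the validation error; keep the weights from its minimum
    best_err, best_weights, bad_epochs = np.inf, None, 0
    for _ in range(max_epochs):
        train_one_epoch(net, train_set)        # hypothetical helper
        err = validation_error(net, val_set)   # hypothetical helper
        if err < best_err:
            best_err, best_weights, bad_epochs = err, net.get_weights(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # validation error is rising: stop
    net.set_weights(best_weights)              # roll back to minimum-error weights
    return net, best_err
```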

3 SETTING UP THE EXPERIMENTS

3.1 The Database

The studied material consists of 20 untrained voices. For each voice, there are two common musical phrases of medium length (about 3 seconds) and a third, different one of medium length (about 4 seconds), all sampled at 44100 Hz. The two common phrases were used as training data and the third one for testing. It should be noted that while the models are constructed from the same utterances, the identification uses a different phrase for each subject. Four groups of five voices were set up for the initial experiments concerning identity characterization, and one group of 10 voices was used to test the capability of the neural networks to model the data when the database is extended.

3.2 The Feature Set

The voice signals were pre-emphasized using a FIR filter as presented in eq. 1, with a = 0.95. MFCCs were calculated using the described method, as the DCT of the log-magnitude spectrum with a 1024-point FFT. 32 MFCCs were calculated for each frame of the signal. The coefficients were partitioned for two different situations: coefficients 1-15, characterizing the smoothed spectrum, and coefficients 15-32, for the fine structure of the spectrum. The two subsets represent the input for training the neural networks. Neural networks were also trained with the entire set of cepstral coefficients, to check whether any improvement is obtained by using all the available information.
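A sketch of this partitioning, reusing the mfcc function sketched in section 2.1. Note that the paper's ranges 1-15 and 15-32 share coefficient 15; the 1-based coefficient indices are mapped to 0-based array slices here, and voice_signal stands for one phrase loaded as a NumPy array.

```python
# feats: (n_frames, 32), one 32-coefficient MFCC vector per frame,
# computed with a = 0.95 and a 1024-point FFT as described above
feats = mfcc(voice_signal, fs=44100, a=0.95, n_fft=1024, n_coeffs=32)

low  = feats[:, 0:15]    # coefficients 1-15: smoothed spectrum (vocal tract)
high = feats[:, 14:32]   # coefficients 15-32: fine spectral structure (source)
full = feats             # coefficients 1-32: the complete set
```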


3.3 The Neural Networks

For the groups of five voices, the neural network was chosen to have one hidden layer of 20 neurons. Each of the five neurons in the output layer was assigned to one voice, by giving a positive unity answer for it. The number of hidden neurons was increased to 40 for modeling the group of 10 voices. We chose a training function that uses a variable learning rate, set to 0.09, together with the early-stopping method; the validation data for this task was 1/4 of the whole training set. Initial experiments showed that neurons with a tan-sigmoid transfer function perform much better in this recognition task than neurons with a log-sigmoid transfer function. The input data was normalized so that all the coefficients have zero mean and unit variance. For each subset of MFCCs, several networks were trained, to ensure that we obtain the best results.
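As one possible modern reconstruction of this setup, the sketch below uses scikit-learn's MLPClassifier. The paper does not name its software, so the mapping of its settings onto these parameters is an approximation; in particular, MLPClassifier uses a softmax output rather than one independent unity-target neuron per voice, so the correspondence to the paper's output coding is loose.

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 20 tan-sigmoid neurons for the 5-voice groups
# (40 for the 10-voice group), one output class per voice, SGD with a
# variable learning rate initialized at 0.09, and early stopping with
# 1/4 of the training data held out for validation.
net = MLPClassifier(hidden_layer_sizes=(20,),
                    activation='tanh',
                    solver='sgd',
                    learning_rate='adaptive',
                    learning_rate_init=0.09,
                    early_stopping=True,
                    validation_fraction=0.25,
                    max_iter=1000)

# X: normalized MFCC frames (zero mean, unit variance); y: voice labels.
# net.fit(X, y); per-frame predictions then follow from net.predict(X_test).
```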

[Figure 1: Probability density estimates of the neuron responses for the test phrase; modeling with MFCCs 1-15]
4 TRAINING RESULTS AND SIMULATIONS


For each group of five voices, a neural network was trained with the two common phrases. The early-stopping method implies monitoring the error on the validation set during training. At first, the error decreases as the data is fitted, but when the network starts to overfit the data, the error rises and the training returns the weights from the minimum attained error. Usually the training stopped at an error of around 0.15 to 0.20, depending on the difficulty of modeling the data. The degree of data fit is measured as the correlation coefficient between the target and the network output; the closer the value is to 1, the better the data fit for the corresponding voice. Based on the fit values, we would expect the best identification results with the whole set of MFCCs.

Given these results, we test the network with unknown data. The test phrase was processed through the same steps in order to obtain the sets of MFCCs, and the coefficients were presented frame by frame as input to the trained network. We emphasize that the test data is different for each voice, so in some cases it may resemble the training data, while in others it can be very different. Table 1 summarizes the percentage of correctly labeled frames and the degree of data fit for one group of five voices. Although the correlation test shows better modeling of the classes with the entire set of coefficients, it is not always necessary to use them all. Some voices can be distinguished using the first 15 cepstral coefficients, while for others, the information in the upper coefficients makes the difference. At the same time, using all of the coefficients in the same classification does not always provide a more reliable result.

To increase the number of voices used in the study, we trained a neural network with one hidden layer of 40 neurons, using a set of ten voices, under the same conditions. Probability density estimates (PDEs) can be constructed from the response of each neuron in each frame. The neuron giving the positive response for one voice should have a PDE with mean close to 1, while the remaining neurons should have PDEs concentrated close to 0. Figures 1-3 illustrate the PDEs of the responses of the 10 neurons for the test phrase, estimated at 100 equidistant points, in one case that cannot be solved using the low-order MFCCs. The positive-response neuron is represented by the '+' line.
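The two evaluation steps described above can be sketched as follows; gaussian_kde is an assumed choice of density estimator, as the paper does not specify how the PDEs were computed.

```python
import numpy as np
from scipy.stats import gaussian_kde

def frame_identification_rate(responses, true_voice):
    # responses: (n_frames, n_voices) network outputs for the test phrase;
    # each frame is labeled by its maximally responding output neuron
    labels = responses.argmax(axis=1)
    return np.mean(labels == true_voice)

def response_pdes(responses, n_points=100):
    # Density estimate of each neuron's response, evaluated at
    # 100 equidistant points, as in Figures 1-3
    grid = np.linspace(responses.min(), responses.max(), n_points)
    return grid, [gaussian_kde(responses[:, k])(grid)
                  for k in range(responses.shape[1])]
```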

[Figure 2: Probability density estimates of the neuron responses for the test phrase; modeling with MFCCs 15-32]
[Figure 3: Probability density estimates of the neuron responses for the test phrase; modeling with MFCCs 1-32]

Generalizing the results, we can state that the upper-order cepstral coefficients contain at least the same quantity of identity information as the lower-order ones. The cepstrum decomposes the problem into resonance-related information (the low-order coefficients) and source-related information (the high-order coefficients). As expected, both contribute to defining the identity of a voice in singing; thus the source-related coefficients can be used to characterize the identity of the voice, and they seem to behave well under changes of singing style.


Table 1: Correlation coefficient between target and network output for the training set (fit) and identification percentage on the test phrase (identif), for one 5-category experiment

                  v01           v02           v03           v04           v05
              fit   identif  fit   identif  fit   identif  fit   identif  fit   identif
coeff 1-15    0.69  0.46     0.67  0.79     0.63  0.60     0.60  0.81     0.62  0.47
coeff 15-32   0.68  0.53     0.59  0.63     0.65  0.64     0.61  0.64     0.56  0.47
coeff 1-32    0.78  0.50     0.78  0.80     0.78  0.74     0.79  0.84     0.79  0.54

Compared with the results obtained in speaker identification, it can be argued that in speech the filter part of the system does not have as great a variability as in singing, which is why the use of the upper coefficients was generally not considered there.


5 CONCLUSIONS

This paper presented a study of Mel-frequency cepstral coefficients in the context of singing voice identification. The human articulatory system in voicing is modeled in signal processing as a source-filter system, with a specific signal, the glottal wave, as input to a linear time-invariant filter, the vocal tract. The low-order cepstral coefficients represent information about the vocal tract shape, and the high-order coefficients characterize the source signal. Both parts contain important information about voice identity. In the case of the singing voice, the input of the system is more invariant than the filter part. Cases that are difficult to handle with the low-order MFCCs can eventually be solved correctly by using the high-order MFCCs.

In this study, no special care was taken to obtain the best-trained neural networks; the purpose was rough and fast training for testing the selected features. For reliable results with neural networks when working with a large number of classes, parallel networks are used, in order to achieve low complexity, fast training and small error rates in training each network.

6 FUTURE WORK

The results of this study lead to the search for a different way of characterizing the source in the articulatory system, independently of the vocal tract parameters. A widely used method for estimating the glottal flow is the Liljencrants-Fant model; the processing involves determining the closed-glottis period, for correct inverse filtering. In singing and in high-pitched voices this is a real problem, because the closed-glottis period may be too short for correct estimation of the inverse filter parameters. Moreover, authors of such studies used the voice signal together with a simultaneous electroglottograph signal in order to locate specific instants in the voice signal. This method is impractical outside the laboratory, which is why we aim for an equivalent method of describing the glottal wave characteristics using information extracted from the signal alone.

References

J. Barnes, P. Davis, J. Oates, and J. Chapman. The relationship between professional operatic soprano voice and high range spectral energy. The Journal of the Acoustical Society of America, 116(1):530-538, July 2004.

J. C. Brown. Computer identification of musical instruments using pattern recognition with cepstral coefficients as features. The Journal of the Acoustical Society of America, 105:1933-1941, 1999.

A. Eronen and A. Klapuri. Musical instrument recognition using cepstral coefficients and temporal features. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2:753-756, June 2000.

S. E. Fredrickson and L. Tarassenko. Text-independent speaker recognition using neural network techniques. Fourth International Conference on Artificial Neural Networks, pages 13-18, June 1995.

S. Hayakawa and F. Itakura. Text-dependent speaker recognition using the information in the higher frequency band. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1:137-140, April 1994.

A. T. Mafra and M. G. Simoes. Text independent automatic speaker recognition using self-organizing maps. 39th IAS Annual Meeting, Conference Record of the Industry Applications Conference, 3:1503-1510, October 2004.

L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. PTR Prentice Hall, Englewood Cliffs, New Jersey, 1993.

H. Seddik, A. Rahmouni, and M. Sayadi. Text independent speaker recognition using the mel frequency cepstral coefficients and a neural network classifier. First International Symposium on Control, Communications and Signal Processing, pages 631-634, 2004.

F. Sun, B. Li, and H. Chi. Some key factors in speaker recognition using neural networks approach. IEEE International Joint Conference on Neural Networks, 3:2752-2756, November 1991.

J. Sundberg. Research on the singing voice in retrospect. TMH-QPSR Speech, Music and Hearing, 45:11-22, 2003.

E. K. Youngmoo. Singing voice analysis/synthesis. PhD thesis, Massachusetts Institute of Technology, 2003.

