Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach
Summary: Objectives. Computerized detection of voice disorders has attracted considerable academic and clini-
cal interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation.
This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and
utility compared with other automatic classification algorithms.
Methods. This study retrospectively collected 60 normal voice samples and 402 pathological voice samples of 8 common
clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel frequency cepstral coeffi-
cients from 3-second samples of a sustained vowel. The performances of three machine learning algorithms, namely,
deep neural network (DNN), support vector machine, and Gaussian mixture model, were evaluated based on a five-
fold cross-validation. Collective cases from the voice disorder database of MEEI (Massachusetts Eye and Ear Infirmary)
were used to verify the performance of the classification mechanisms.
Results. The experimental results demonstrated that DNN outperforms Gaussian mixture model and support vector
machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, based
on three representative Mel frequency cepstral coefficient features. When applied to the MEEI database for validation,
the DNN also achieved a higher accuracy (99.32%) than the other two classification algorithms.
Conclusions. By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully
utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Based on this
pilot study, future research may proceed to explore more applications of DNN from laboratory and clinical perspectives.
Key Words: Nodule–Polyp–Neoplasm–Spasmodic dysphonia–Sulcus.
DNN-based system for detecting features extracted from voice samples, (2) to examine the performance of DNN in differentiating between normal and pathological voice samples, and (3) to validate the accuracy of the DNN using the widely applied voice disorder database from MEEI.

MATERIALS AND METHODS
Study subjects
Voice samples were obtained from a voice clinic in a tertiary teaching hospital (Far Eastern Memorial Hospital, FEMH), which included 60 normal voice samples and 402 samples of common voice disorders, including vocal nodules, polyps, and cysts; glottic neoplasm; vocal atrophy; laryngeal dystonia (ie, spasmodic dysphonia and tremor); unilateral vocal paralysis; and sulcus vocalis (Tables 1 and 2). Voice samples of a 3-second sustained vowel sound /a:/ were recorded at a comfortable level of loudness, with a microphone-to-mouth distance of approximately 15–20 cm, using a high-quality microphone (Model: SM58, SHURE, IL),23 with a digital amplifier (Model: X2u, SHURE) under a background noise level between 40 and 45 dBA. The sampling rate was 44,100 Hz with a 16-bit resolution, and data were saved in an uncompressed .wav format.

Feature extraction from MFCCs
Derived through pre-emphasis, windowing, fast Fourier transform, Mel filtering, nonlinear transformation, and discrete cosine transform, MFCCs have been widely used in acoustic research.24 For example, MFCC and MFCC + delta features were selected for voice disorder detection,7,10,11 and the normalized version was selected for performance comparison.25,26 To capture these MFCC features, first, the raw waveform was divided into N frames (or segments), represented by the vectors x1, . . ., xN (Figure 1). A total of N frames were then transformed into N MFCC vectors, representing the acoustic features. Next, for the second feature, we calculated the trajectories of the MFCCs over time (delta MFCCs) and appended them to the original MFCCs. Finally, we normalized the MFCCs such that all of the coefficients had zero mean and unit variance, and appended the delta MFCCs to the normalized MFCCs to form the third feature vectors. The details of MFCCs and their variations were outlined in a previous publication.27

The experimental setups for the acoustic signal processing and feature extraction procedures are described later. The first feature, made up of 13-dimension MFCCs, was extracted from a 16-millisecond windowed signal using an 8-millisecond frameshift. A window length of 16 milliseconds is used to capture the fast dynamic acoustic waves, whereas the 8-millisecond frameshift enables smoothness between frames. Similar settings were applied in many previous studies.28,29 The next feature, MFCC + delta, was created by appending 13 velocity features to the original 13-dimension MFCCs and thus had 26 dimensions. The third feature, denoted by MFCC(N) + delta for convenience, has the same dimensions as that of MFCC + delta. The only difference is that the former normalized all MFCC coefficients with zero mean and unit variance.

DNN
A DNN model comprises multiple hidden layers to form a complex mapping function between the inputs and outputs. Previous studies have verified that a DNN model can provide a satisfactory performance in speech enhancement.30 In DNN, the relationship between the input, x, and the output of the first hidden layer, h1, is described as

    h1 = f(W1 x + b1),    (1)

where W1 and b1 are the weight matrix and bias vector, respectively, and f(·) is the activation function. In this study, we use the sigmoid function for the activation function, namely, f(z) = [1 + exp(−z)]^(−1), based on the better performance among different activation functions (Appendix 1 of the Supplementary material). The relationship between the current hidden layer and the next hidden layer can be expressed as

    h_{i+1} = f(W_{i+1} h_i + b_{i+1}),  i = 1, 2, …, L − 1,    (2)
TABLE 1.
Demographics of the 462 Normal and Pathological Voice Samples

              Number        Mean Age (y)    Age Range (y)     Standard Deviation
              M      F      M      F        M       F         M       F
Normal        16     44     30.7   30.1     23–37   22–47     3.93    5.79
Pathological  189    213    56.1   44.2     20–87   20–87     15.9    14.9

Abbreviations: M, male; F, female.
TABLE 2.
Disease Categories of the 402 Pathological Voice Samples
Nodules Polyp Cyst Neoplasm Atrophy Dystonia Vocal Palsy Sulcus
M 1 18 17 43 39 2 41 28
F 51 33 34 5 16 17 26 31
Abbreviations: M, male; F, female.
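The three MFCC-based feature sets described in "Feature extraction from MFCCs" (MFCC, MFCC + delta, and MFCC(N) + delta) can be sketched in numpy, starting from a precomputed (n_frames × 13) MFCC matrix. This is an illustrative sketch, not the authors' code: the regression width used for the delta computation, and taking deltas of the normalized coefficients for MFCC(N) + delta, are assumptions.

```python
import numpy as np

def delta(feats, width=2):
    """Delta (velocity) trajectories: slope of a local regression over
    +/- `width` neighboring frames; edge frames are padded by repetition."""
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    n = len(feats)
    denom = 2 * sum(w * w for w in range(1, width + 1))
    return sum(
        w * (padded[width + w:width + w + n] - padded[width - w:width - w + n])
        for w in range(1, width + 1)
    ) / denom

def build_feature_sets(mfcc):
    """Return the three feature sets: 13-dim MFCC, 26-dim MFCC + delta,
    and 26-dim MFCC(N) + delta (coefficients normalized to zero mean and
    unit variance before the deltas are appended -- an assumption)."""
    mfcc_delta = np.hstack([mfcc, delta(mfcc)])
    normed = (mfcc - mfcc.mean(axis=0)) / (mfcc.std(axis=0) + 1e-8)
    mfcc_n_delta = np.hstack([normed, delta(normed)])
    return mfcc, mfcc_delta, mfcc_n_delta
```

In the paper's setup, the frames come from 16-millisecond windows with an 8-millisecond frameshift; any MFCC front end producing a (frames × 13) matrix can feed `build_feature_sets`.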
ARTICLE IN PRESS
Shih-Hau Fang, et al Detection of Pathological Voice Using Deep Neural Network 3
where L is the DNN hidden layer number. Finally, another function, g(·), is adopted on the output layer to form the output vector ŷ. Thus, we have

    ŷ = g(h_L).    (3)

For classification tasks, the softmax function is usually adopted for g(·). To compute the parameters in the DNN model with the training samples X = [x1, . . ., xi, . . ., xN] and the corresponding labels Y = [y1, . . ., yi, . . ., yN], where N is the total number of training samples, we formulate an objective function:

    O(Y, Ŷ; X, θ) = −(1/(NJ)) Σ_{i=1}^{N} Σ_{j=1}^{J} [y_{i,j} log ŷ_{i,j}],    (4)

where θ = {W_l, b_l, l = 1, 2, …, L} is the DNN parameter set, and Ŷ = [ŷ1, . . ., ŷi, . . ., ŷN] is the DNN output (ŷi is the ith DNN output given input xi); y_{i,j} and ŷ_{i,j} denote the jth element of yi and ŷi, respectively. The parameter is then estimated by

    θ* = arg min_θ O(Y, Ŷ; X, θ),    (5)

where the standard back-propagation algorithm is applied to compute θ* in Eq. (5).

Experimental setup
We examined 1–16 Gaussian mixtures for the GMM and tested the performance of different kernel functions for the SVM, including linear, quadratic, and Gaussian functions. The DNN was structured into multiple hidden layers with varying neuron numbers in each layer. The best combination of hidden layers and number of neurons was determined based on the experimental results. The threshold of the ratio of the pathological feature vectors was investigated from 0.1 to 0.9, with a 0.1 increment. The performance was evaluated through a fivefold cross-validation. We utilized general accuracy, which is widely used for detection tasks, as the main performance metric.
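Eqs. (1)–(5) can be written out as a minimal numpy sketch of the forward pass and objective. Function names here are illustrative, and the training step (back-propagation to obtain θ* in Eq. (5)) is omitted; this is a sketch of the equations, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # f(z) = [1 + exp(-z)]^(-1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)    # output function g(.)

def forward(x, weights, biases):
    """Eqs. (1)-(3): sigmoid hidden layers followed by a softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)                 # h_{i+1} = f(W_{i+1} h_i + b_{i+1})
    return softmax(h @ weights[-1] + biases[-1])   # y_hat = g(h_L)

def objective(Y, Y_hat):
    """Eq. (4): O = -(1/NJ) * sum_i sum_j y_{i,j} * log(y_hat_{i,j})."""
    N, J = Y.shape
    return -np.sum(Y * np.log(Y_hat + 1e-12)) / (N * J)
```

With the paper's best-performing structure, `weights` would hold three 300-neuron hidden layers plus an output layer; minimizing `objective` over `weights` and `biases` by gradient descent implements Eq. (5).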
4 Journal of Voice, Vol. ■■, No. ■■, 2018
FIGURE 3. Accuracy of different numbers of mixtures in GMM with (A) MFCC, (B) MFCC + delta, and (C) MFCC (N) + delta features.
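The fivefold cross-validation used as the evaluation protocol can be sketched generically as follows. `five_fold_accuracy` and its train/predict callables are hypothetical names for illustration, not part of the study's code.

```python
import numpy as np

def five_fold_accuracy(X, y, train_fn, predict_fn, k=5, seed=0):
    """Shuffle the samples, split them into k folds, train on k-1 folds,
    test on the held-out fold, and report mean accuracy and its spread."""
    idx = np.random.RandomState(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])
        accs.append(np.mean(predict_fn(model, X[test]) == y[test]))
    return float(np.mean(accs)), float(np.std(accs))
```

The accuracy ± standard deviation entries reported in Tables 3–5 are of this form.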
RESULTS AND DISCUSSION
Spectrogram and acoustic waveforms
Figure 2 shows the waveform and spectrogram plots of normal and pathological voice samples. From the waveform plots, the pathological voice sample (B) showed irregular and wider variations of amplitude compared with the normal voice sample (A). Meanwhile, from the spectrogram plots, the normal voice sample (C) presented clearer harmonic structures, especially in the low-frequency areas. In contrast, the pathological voice sample (D) showed blurred harmonic structures and contained noise-like components in the high-frequency region.

Optimal setting for SVM, GMM, and DNN
We performed multiple experiments to determine the optimal kernel functions for SVM. The Gaussian radial basis kernel function was chosen because of its highest accuracy for female voice samples with the lowest variation of accuracy among male samples (Appendix 2 of the Supplementary material). We also compared the accuracies of GMM using different numbers of Gaussian mixtures with three MFCC features. The results show that the accuracy increases as the number of mixtures increases from 1 to 6, whereas the performance becomes saturated when the number of Gaussian mixtures ranges from 8 to 16 (Figure 3). Accordingly, we used eight Gaussian mixtures as a representative GMM model in the following study. We also examined the performance among different DNN structures (ie, hidden layers and number of neurons); the results showed that the best performance was achieved when using 3 hidden layers with 300 neurons in each layer (Appendix 3 of the Supplementary material). Finally, we investigated the ratio of the pathological feature vectors to determine the adequate cut-off value.
FIGURE 2. Waveform of a normal voice sample (A) and a pathological voice sample (B). Wide-band spectrogram of the corresponding normal voice sample (C) and pathological voice sample (D).
FIGURE 4. (A) True-positive rate (TPR) and false-negative rate (FNR); (B) positive predictive value (PPV) and false detection rate (FDR) among different ratios of pathological feature vectors on DNN.
Experimental results indicated that 0.5 could be a good balance point between the true-positive rate, false-negative rate, positive predictive value, and false detection rate (Figure 4).

Accuracy of DNN in comparison with SVM and GMM
Table 3 compares the performance of the three algorithms and three features among the voice samples of the male subjects from FEMH. The table indicates that the DNN coupled with the MFCC(N) + delta feature provides higher accuracy (94.26%), with a lower standard deviation (2.25%), than those of the other two classification algorithms (GMM and SVM) and the other two features (MFCC and MFCC + delta). Data on the performance of the DNN among female subjects are provided in Table 4, which also demonstrates that the DNN and MFCC(N) + delta feature achieve the highest accuracy for detecting pathological voice samples. However, the overall accuracy among the female voice samples is lower than that among the male samples. Similar to our results, a previous study by Fraile et al also reported a higher accuracy for pathological voice detection in men than in women.10 The authors proposed that such discrepancy might be explained by the wider distribution of the values of MFCC features in women compared with the narrower distribution for men.10

Compared with a previous study using an artificial neural network containing one layer of hidden nodes (accuracy: men: 90.95%; women: 86.50%),10 the DNN model with multiple layers of hidden nodes further increases the accuracy rates by around 4% (men: 94.26%; women: 90.52%, Tables 3 and 4). Accordingly, our results indicate that DNN achieves the highest performance for both female and male subjects, confirming the capability of DNN for detecting pathological voice samples. Moreover, the velocity (delta) features and normalization are useful for improving the performance, particularly for the GMM and DNN methods, in the FEMH database of voice disorders.

Validation of DNN performance using MEEI voice disorder database
To validate the aforementioned results, which indicate that DNN outperforms SVM and GMM in detecting pathological voice samples, we applied a common voice disorder database from MEEI under the same experimental setting. We retrieved 53 normal and 173 pathological samples from the MEEI database,5 identical to a previously published study.7 Results in Table 5 showed that DNN provides greater accuracy and a lower standard deviation than SVM and GMM, indicating the same tendency as the previous results obtained using the FEMH data (Tables 3 and 4). Similarly, compared with previous studies using neural networks with a single hidden layer to detect pathological voice samples from the MEEI database,15,31 this study utilized a DNN with three hidden layers and exhibited a better performance, further confirming the advantages of the proposed DNN-based approach. Although the detailed settings for extracting the MFCC features and numbers of GMM mixtures were not identical to the previous study by Godino-Llorente et al,7 this study also showed that dynamic delta features of MFCCs do not enhance the capability of the MEEI model in the detection of voice disorders (Table 5). Such a concordance may be due to the fact that the MFCC produces sufficient discriminative information when voice samples are recorded in a well-controlled environment. Accordingly, the appended delta trajectory may be redundant and result in learning confusion. In contrast, under circumstances in which the voice samples are recorded in suboptimal settings, adding temporal derivatives (delta feature) might be helpful to increase the robustness of performance (Tables 3 and 4).
TABLE 3.
Classification Accuracies of Three Classification Algorithms and Three MFCC Features Among Male Subjects

                  SVM              GMM              DNN
                  Accuracy ± SD    Accuracy ± SD    Accuracy ± SD
MFCC              92.24 ± 2.66%    89.00 ± 1.79%    93.86 ± 2.05%
MFCC + delta      92.24 ± 2.66%    91.02 ± 3.38%    93.86 ± 2.05%
MFCC(N) + delta   93.04 ± 2.74%    90.24 ± 4.18%    94.26 ± 2.25%

Abbreviation: SD, standard deviation.
TABLE 4.
Classification Accuracies of Three Classification Algorithms and Three MFCC Features Among Female Subjects

                  SVM              GMM              DNN
                  Accuracy ± SD    Accuracy ± SD    Accuracy ± SD
MFCC              85.18 ± 0.72%    83.56 ± 2.12%    86.14 ± 1.43%
MFCC + delta      85.18 ± 0.72%    86.12 ± 4.35%    87.74 ± 1.43%
MFCC(N) + delta   87.40 ± 1.92%    90.20 ± 3.83%    90.52 ± 2.00%

Abbreviation: SD, standard deviation.
TABLE 5.
Detection of Pathological Voice Samples in the MEEI Voice Disorder Database

                  SVM              GMM              DNN
                  Accuracy ± SD    Accuracy ± SD    Accuracy ± SD
MFCC              98.28 ± 2.36%    98.26 ± 1.80%    99.14 ± 1.92%
MFCC + delta      93.04 ± 2.74%    90.24 ± 4.18%    94.26 ± 2.25%
MFCC(N) + delta   87.40 ± 1.92%    90.20 ± 3.83%    90.52 ± 2.00%

Abbreviation: SD, standard deviation.
FIGURE 5. Online and offline models of the proposed pathological voice detection system.
Tsui, BSc, for their help in the analysis of acoustic data and neural network modeling.

SUPPLEMENTARY DATA
Supplementary data related to this article can be found online at doi:10.1016/j.jvoice.2018.02.003.

REFERENCES
1. Titze IR. Workshop on acoustic voice analysis: summary statement. National Center for Voice and Speech; 1995.
2. Stemple JC, Roy N, Klaben BK. Clinical Voice Pathology: Theory and Management. San Diego: Plural Publishing; 2014.
3. Schwartz SR, Cohen SM, Dailey SH, et al. Clinical practice guideline: hoarseness (dysphonia). Otolaryngol Head Neck Surg. 2009;141:S1–S31.
4. Vaziri G, Almasganj F, Behroozmand R. Pathological assessment of patients' speech signals using nonlinear dynamical analysis. Comput Biol Med. 2010;40:54–63.
5. Kay Elemetrics. Disordered voice database, version 1.03; 1994.
6. Umapathy K, Krishnan S, Parsa V, et al. Discrimination of pathological voices using a time-frequency approach. IEEE Trans Biomed Eng. 2005;52:421–430.
7. Godino-Llorente JI, Gomez-Vilda P, Blanco-Velasco M. Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters. IEEE Trans Biomed Eng. 2006;53:1943–1953.
8. Costa SC, Neto BGA, Fechine JM. Pathological voice discrimination using cepstral analysis, vector quantization and hidden Markov models. In: IEEE International Conference on Bioinformatics and Bioengineering. Athens, Greece; 2008:1–5.
9. Salhi L, Mourad T, Cherif A. Voice disorders identification using multilayer neural network. Int Arab J Inf Technol. 2010;7:177–185.
10. Fraile R, Saenz-Lechon N, Godino-Llorente JI, et al. Automatic detection of laryngeal pathologies in records of sustained vowels by means of Mel-frequency cepstral coefficient parameters and differentiation of patients by sex. Folia Phoniatr Logop. 2009;61:146–152.
11. Arias-Londono JD, Godino-Llorente JI, Saenz-Lechon N, et al. Automatic detection of pathological voices using complexity measures, noise parameters, and Mel-cepstral coefficients. IEEE Trans Biomed Eng. 2011;58:370–379.
12. Arias-Londono JD, Godino-Llorente JI, Markaki M, et al. On combining information from modulation spectra and Mel-frequency cepstral coefficients for automatic detection of pathological voices. Logoped Phoniatr Vocol. 2011;36:60–69.
13. Markaki M, Stylianou Y. Voice pathology detection and discrimination based on modulation spectral features. IEEE Trans Audio Speech Lang Proc. 2011;19:1938–1948.
14. Muhammad G, Mesallam TA, Malki KH, et al. Multidirectional regression (MDR)-based features for automatic voice disorder detection. J Voice. 2012;26:e819–e827.
15. Arjmandi MK, Pooyan M. An optimum algorithm in pathological voice quality assessment using wavelet-packet-based features, linear discriminant analysis and support vector machine. Biomed Signal Process Control. 2012;7:3–19.
16. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Proc Mag. 2012;29:82–97.
17. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature. 2016;529:484–489.
18. Fang SH, Fei YX, Xu ZZ, et al. Learning transportation modes from smartphone sensors based on deep neural network. IEEE Sens J. 2017;17:6111–6118.
19. Li B, Tsao Y, Sim KC. An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition. INTERSPEECH; 2013:3002–3006.
20. Tawalbeh LA, Mehmood R, Benkhlifa E, et al. Mobile cloud computing model and big data analysis for healthcare applications. IEEE Access. 2016;4:6171–6180.
21. Sahoo PK, Mohapatra SK, Wu SL. Analyzing healthcare big data with prediction for future health condition. IEEE Access. 2017;99:1.
22. Ma Y, Wang Y, Yang J, et al. Big health application system based on health internet of things and big data. IEEE Access. 2016;PP:1.
23. Fu S, Theodoros DG, Ward EC. Delivery of intensive voice therapy for vocal fold nodules via telepractice: a pilot feasibility and efficacy study. J Voice. 2015;29:696–706.
24. Davis S, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust. 1980;28:357–366.
25. Hamawaki S, Funasawa S, Katto J, et al. Feature analysis and normalization approach for robust content-based music retrieval to encoded audio with different bit rates. In: Huet B, Smeaton A, Mayer-Patel K, et al., eds. Advances in Multimedia Modeling: 15th International Multimedia Modeling Conference, MMM 2009, Sophia-Antipolis, France, January 7–9, 2009. Proceedings. Berlin, Heidelberg: Springer; 2009:298–309.
26. Boril H, Hansen JHL. Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments. IEEE Trans Audio Speech Lang Proc. 2010;18:1379–1393.
27. Zhang D, Gatica-Perez D, Bengio S, et al. Semi-supervised adapted HMMs for unusual event detection. IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2005;1:611–618.
28. Chan CP, Wong YW, Tan L, et al. Two-dimensional multi-resolution analysis of speech signals and its application to speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99). Phoenix, AZ, USA; 1999:405–408.
29. Dahmani M, Guerti M. Vocal folds pathologies classification using Naïve Bayes networks. In: International Conference on Systems and Control (ICSC); 2017:426–432.
30. Lu X, Tsao Y, Matsuda S, et al. Speech enhancement based on deep denoising autoencoder; 2013:436–440.
31. Godino-Llorente JI, Gomez-Vilda P. Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors. IEEE Trans Biomed Eng. 2004;51:380–384.
32. Li J, Deng L, Gong Y, et al. An overview of noise-robust automatic speech recognition. IEEE/ACM Trans Audio Speech Lang Proc. 2014;22:745–777.