Machine Learning Final
All content following this page was uploaded by National Center For Voice and Speech on 19 May 2023.
Machine Learning:
Machine learning is a subfield of artificial intelligence that enables computers to learn
patterns or models from data and improve with experience without being explicitly programmed1. In
traditional programming, to perform a task, the programmer provides both the input data and the
model (logic or algorithm) to the computer. The computer (program) then applies the input data
to the model and obtains the output (Fig. 1a). In contrast, the goal of machine learning is
to develop the model (algorithm) that performs a task, given both the data and the expected
output as inputs (Fig. 1b)2.
Figure 1. (a) Traditional program. (b) Machine learning model training and inference.
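The contrast between the two paradigms can be sketched with a toy example. Here the task is deciding whether a voice sample is "high-pitched"; the 200 Hz rule, the fundamental-frequency values, and the labels are all hypothetical, chosen only for illustration.

```python
# Traditional programming: the programmer supplies the rule (model) directly.
def is_high_pitched_rule(f0_hz):
    return f0_hz > 200.0  # hard-coded threshold chosen by the programmer

# Machine learning: the program is given data and expected outputs,
# and derives the rule (here, the threshold) itself.
def learn_threshold(samples, labels):
    # Try each observed value as a candidate threshold and keep the one
    # that best reproduces the expected labels.
    best_t, best_correct = None, -1
    for t in sorted(samples):
        correct = sum((f0 > t) == y for f0, y in zip(samples, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Hypothetical fundamental-frequency (F0) data with labels (True = high-pitched).
f0_samples = [110.0, 140.0, 180.0, 220.0, 260.0, 300.0]
is_high    = [False, False, False, True,  True,  True]

learned_t = learn_threshold(f0_samples, is_high)   # the learned "model"
```

In the traditional version the threshold is an input to the program; in the learned version it is the program's output, recovered from data and expected answers alone.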
Unsupervised Learning:
The supervised learning methods require data with accurate labels (verbal descriptors)
or outputs for good performance. However, generating accurate labels is often difficult and
requires experts, and the labels can be highly subjective and prone to error. Unsupervised
learning, on the other hand, does not use labeled data. Instead, it automatically learns patterns
in the input data and groups them into multiple categories. Unsupervised learning is currently
being used for disorder detection12, emotion recognition13, and voice quality detection using
voice and speech samples.
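As a minimal sketch of this idea, the following groups unlabeled one-dimensional "voice feature" values into two clusters with a hand-rolled k-means; the jitter percentages are invented for illustration, and no labels are ever supplied.

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means: returns (centroids, cluster assignments)."""
    # Deterministic initialization: spread centroids over the data range.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assign

# Hypothetical jitter values (%): the algorithm separates the low and high
# groups on its own, without being told which samples are which.
jitter = [0.3, 0.4, 0.5, 2.1, 2.4, 2.8]
centroids, groups = kmeans_1d(jitter, k=2)
```

The algorithm discovers that the first three and last three samples form distinct groups purely from the structure of the data, which is the essence of clustering-based unsupervised learning.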
Reinforcement Learning:
In reinforcement learning, training data is not needed ahead of time14. The model
interacts with the physical plant (the vocal system) in a trial-and-error manner and learns to
control it. This method can be used to learn the neural processes that control the vocal
system. These neural control systems try to mimic how the brain controls the vocal system.
When used with voice simulators, such neural control systems can provide insights into
neuromuscular disorders such as vocal tremor, Parkinson's disease, and spasmodic dysphonia.
The use of reinforcement learning is still in its infancy in voice and speech science research.
Even though the DIVA model15 and other neural controllers of the vocal system16-18 do not use
reinforcement learning in its strict sense, they fall under its broader category. The controllers
(reinforcement learning models) take our vocal intentions (how high in pitch the voice should be,
how loud the voice should be, how rough or periodic the voice should be, and what syllable to
produce) as inputs and generate the corresponding muscle activations as outputs. These muscle
activations are provided as input to the vocal system, which then produces phonation at the
desired vocal intentions. The auditory and somatosensory feedback from the vocal system is
continuously used to train the reinforcement learning model after every interaction. This
continuous interaction through feedback allows the model to improve over time (Fig. 2).
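The feedback loop described above can be caricatured in a few lines of code. Here a controller learns, by trial and error, a single muscle-activation value that makes a toy vocal model produce a target pitch; the linear pitch model, the reward rule, and all numbers are invented stand-ins for a real voice simulator and its auditory feedback.

```python
import random

def vocal_model(activation):
    """Toy 'physical plant': maps a muscle activation level to produced
    pitch (Hz). A stand-in for a real voice simulator."""
    return 100.0 + 150.0 * activation

def train_controller(target_hz, episodes=500, seed=0):
    """Trial-and-error learning: perturb the action, keep changes that
    bring the produced pitch closer to the intended one."""
    rng = random.Random(seed)
    activation = 0.0  # the controller's current policy (one parameter)
    for _ in range(episodes):
        # Trial: explore a nearby action and observe the plant's output.
        trial = activation + rng.uniform(-0.1, 0.1)
        # "Auditory feedback": accept the trial only if it reduced the
        # pitch error, i.e., if it earned a higher reward.
        if abs(vocal_model(trial) - target_hz) < abs(vocal_model(activation) - target_hz):
            activation = trial
    return activation

# Vocal intention: produce a 220 Hz tone.
act = train_controller(target_hz=220.0)
```

After training, `vocal_model(act)` lands very near the intended 220 Hz, illustrating how repeated interaction with feedback, rather than a labeled dataset, drives the learning.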
Conclusion:
Machine learning approaches are being used in a wide range of applications, including
voice and speech science. These approaches will become more capable and more powerful in the
future with larger databases, more accurate modeling, and wider distribution of software,
allowing researchers and practitioners to take better advantage of them. The profound question
is: will human intelligence and human learning be advanced by artificial intelligence, or will
trust in machine learning diminish a deeper understanding of the communication sciences and
disorders?
References
1. Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT
Press, pp. 1-7, 2018.
2. Turner, R. Machine Learning: The ultimate beginner's guide to learn machine learning,
artificial intelligence and neural networks step by step. Publishing Factory LLC, 2020.
3. Ayodele, T.O. Types of machine learning algorithms. New Advances in Machine Learning,
vol. 3, pp. 19-48, 2010.
4. Al-Dhief, F.T., Latiff, N.M.A., Malik, N.N.N.A., Salim, S.N., Baki, M.M., Albadr, M.A.A., et
al. A survey of voice pathology surveillance systems based on Internet of Things and
machine learning algorithms. IEEE Access, vol. 8, pp. 64514-64533, 2020.
5. Verde, L., De Pietro, G. and Sannino, G. Voice disorder identification by using machine
learning techniques. IEEE Access, vol. 6, pp. 16246-16255, 2018.
6. Hegde, S., Shetty, S., Rai, S. and Dodderi, T. A survey of machine learning approaches for
automatic detection of voice disorders. Journal of Voice, vol. 33(6), pp. 947.e11-947.e33,
2019.
7. Zhang, Y., Zheng, X. and Xue, Q. A deep neural network based glottal flow model for
predicting fluid-structure interactions during voice production. Applied Sciences (Basel),
vol. 10(2): 705, 2020.
8. Zhang, Z. Voice feature selection to improve performance of machine learning models
for voice production inversion. Journal of Voice, 2021.
9. Zhang, Z. Estimation of vocal fold physiology from voice acoustics using machine
learning. The Journal of the Acoustical Society of America, vol. 147(3), pp. EL264-EL270,
2020.
10. Kojima, T., Fujimura, S., Hasebe, K., Okanoue, Y., Shuya, O., Yuki, R. et al. Objective
assessment of pathological voice using artificial intelligence based on the GRBAS scale.
Journal of Voice, 2021.
11. Titze, I.R. and Lucero, J.C. Voice simulation: The next generation. Applied Sciences, vol.
12(22): 11720, 2022.
12. Rueda, A. and Krishnan, S. Clustering Parkinson's and age-related voice impairment
signal features for unsupervised learning. Advances in Data Science and Adaptive
Analysis, vol. 10(2): 1840007, 2018.
13. Zhang, Z., Weninger, F., Wollmer, M. and Schuller, B. Unsupervised learning in cross-
corpus acoustic emotion recognition. IEEE Workshop on Automatic Speech Recognition
and Understanding, pp. 523-528, 2011.
14. Sutton, R.S. and Barto, A.G. Reinforcement Learning: An Introduction. MIT Press, 2018.
15. Guenther, F.H. Neural Control of Speech. Cambridge, MA: MIT Press, 2016.
16. Kroger, B.J., Kannampuzha, J. and Rube, C.N. Towards a neurocomputational model of
speech production and perception. Speech Communication, vol. 51, pp. 793-808, 2009.
17. Hickok, G. Computational neuroanatomy of speech production. Nature Reviews
Neuroscience, vol. 13(2), pp. 135-145, 2012.
18. Palaparthi, A. Computational motor learning and control of the vocal source for voice
production. Ph.D. dissertation, University of Utah, Salt Lake City, UT, 2021.
19. Ribeiro, M.T., Singh, S. and Guestrin, C. Why should I trust you? Explaining the
predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 1135-1144, 2016.