A method for classifying voice quality

Conference Paper · December 2018



Adele Gregory

Department of Speech Pathology, James Cook University, Townsville, Australia



ISSN 2207-1296 © 2018 ASSTA, 4–7 December 2018, Sydney, Australia

Abstract
The classification of voice quality is utilized across the fields of linguistics and speech pathology. This paper proposes a methodology to classify voice quality using both auditory-perceptual and acoustic references that provides a high level of inter-rater reliability. Using a case study from the field of infant language acquisition, we show that this classification scheme provides a systematic way of combining spectrographic and waveform inspection with an auditory impression to yield a replicable methodology.

Index Terms: voice quality, classification, perceptual, acoustic

1. Introduction

Traditionally, the quality or timbre of a sound has been defined as "that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar" [1: 45]. Given this broad definition, it is no surprise that previous reviews have reported as many as 67 terms for vocal quality in the literature [2]. Multiple systems, such as those of [3, 4, 5] and [6], have endeavored to capture the variety of phonation types. These systems provide a wide range of options for classifying non-modal voice quality. They are routinely used in a number of fields, and auditory-perceptual evaluation of voice quality is the most commonly used clinical voice assessment method [7]. Perceptual methods are often used to classify consensus features of speakers during the production of sustained vowels, sentences, and running speech. Perceptual protocols are often referenced in the literature, and many studies have developed them, primarily for the evaluation of pathological voices. However, perceptual evaluation has been heavily criticized because it is subjective [8]. [8: 14] notes that “building detail and complexity into a coding system does not guarantee that the listener’s auditory-perceptual decision space is equal to the demands of the analysis task.” As such, a broader transcription that is more dependable has benefits over a narrow transcription with more detailed information but poorer reliability. In addition, perceptual methods do not necessarily provide information about actual vocal tract function; the complicated relationship between human auditory perception, acoustic measures of voice quality and vocal tract configuration limits the extent to which perception of voice quality can reveal the underlying vocal tract physiology [7]. Despite this, perceptual analysis offers considerable advantages in terms of convenience, economy, and robustness [8]. Therefore, [7] recommends multiple methods of voice quality evaluation: both subjective notation and appropriately implemented instrumental measures.

These instrumental measures may be aerodynamic or acoustic. Aerodynamic measures (e.g., subglottal pressure, phonation threshold pressure, maximum flow declination rate) are used extensively in the clinical environment to diagnose and to examine differences pre- and post-treatment [10, 11]. Acoustic measures (e.g., f0, waveform and spectral inspection, jitter and the long-term average spectrum) are also used in both diagnosis and treatment evaluation. Multiple measures are required, as voice quality does not have a single acoustic or aerodynamic correlate. Evidence suggests that voice quality may not even be independent of frequency and amplitude [9]. Unlike pitch and loudness, which are easily quantifiable because each has a single acoustic correlate, voice quality is influenced by numerous factors, including the spectral envelope and its changes over time, periodic fluctuations of amplitude or fundamental frequency, and any noise component in the signal. Measures are often selected based on their suitability for a particular population. This paper presents a case study of how the consideration of these measures affects methodology.

Infants form an interesting population for voice quality study, as a large proportion of the vocalizations they produce are deemed to be non-modal [12]. This is a result of infants still developing the control and coordination necessary for the acquisition of speech. [12] created a vibratory regime classification system that used both perceptual and instrumental aspects of voice quality evaluation. However, they also introduced infant-specific phonation categories. In contrast, [13, 14] created a system that applied perceptual aspects and adult voice quality categories to infants. These two ideas can be combined to develop a classification scheme that uses common adult voice quality categories while also taking into account the perceptual and instrumental aspects of voice quality evaluation.

2. Methodology

A subset of eight voice quality categories was chosen after consideration of a broad range of literature: harsh voice, creaky voice, whispery voice, modal voice, breathy voice, loft, whisper and voiceless. [13, 14, 15] have previously used these terms in an auditory analysis of voice quality parameters in which the degree of laryngeal constriction (defined primarily in terms of the degree of sphinctering of the aryepiglottic folds in the larynx) distinguishes phonation types. These categories are based on laryngoscopic observations of the adult pharynx and larynx [16]. Thus, there is some defined relationship between vocal tract configuration and the perceptual voice quality categories in this proposed classification scheme. The use of these terms within this framework covers a broad spectrum of vocal behavior and makes it possible to speak, to some extent, to the underlying vocal tract configuration. As such, the methodology is similar to [12: 553] because “it [is] internally
comprehensive in the sense that all phonation are classified but it is not exhaustive in the sense that all vocal qualities are thereby represented.” It thus takes into account the requirements of [8] for limiting complexity and detail. Acoustic correlates for each category were developed based on previous literature (e.g., [17], [18]). These included f0, waveform, and broadband spectrogram visual inspection. A decision-making tree was created based on these categories (see Figure 1).

Is there voicing present?
  yes: Is there a ‘normal’ clear periodic harmonic structure?
    yes: Are there regular harmonics with little to no energy above 3000 Hz? Does it sound ‘normal’?
      yes: Modal Voice
      no: Is there frication on the waveform, low-amplitude noise and a ‘breathy’ sound?
        yes: Breathy Voice
    no: Are the harmonics widely spaced and the f0 high?
      yes: Loft
      no: Are there identifiable vertical striations and individual glottal pulses and a ‘creaky’ quality?
        yes: Creaky Voice
        no: Is there a lot of spectral ‘noise’ with high energy bands over 300 Hz and a ‘harsh’ or ‘growl’-like quality?
          yes: Harsh Voice
          no: Is there spectral noise with a ‘whispery’ quality but also a voicing band?
            yes: Whispery Voice
  no: Does it have a whispery quality and energy above 3000 Hz?
    yes: Whisper
    no: Voiceless

Figure 1: Proposed classification schema for labelling voice quality categories

To test the methodology, a corpus of infant data was classified for voice quality. Two female infants participated in the study. Each infant was video-recorded using a Sony DCR-TRV16E digital video camera recorder with an integrated electret condenser microphone. This integrated microphone was shown to have a flat wide-band response across the range of 100 Hz to 10 kHz. Infants were recorded in 16-bit stereo at a sampling rate of 48 kHz. Each recording session took place in the infant’s home in the presence and under the control of one of their caregivers. The camera was positioned on a stationary tripod and directed at the infant from a distance of approximately one to two meters. The infants were recorded for approximately one hour each week during the study (from 3 weeks up to 26 weeks of age). A full range of vocal behavior was recorded. Vocalizations were randomly selected from each week of the study. For the purposes of the case study, a vocalization was defined as any discrete sound produced with varying degrees of constriction occurring within one respiratory cycle. Each vocalization was labeled by the author using the classification tool. A total of 761 separate vocalizations were labeled.

As auditory-perceptual judgments are susceptible to a variety of sources of error and bias, inter-rater reliability measurements were used to quantify the consistency with which these judgments are made. An independent analyst was trained to use the classification scheme on practice tokens for approximately an hour. Any reflexive sounds (hiccup, sneeze, and cough) were labeled as such; all other vocalizations were labeled for voice quality. If the independent analyst was unsure about the category, they made the decision based on their perception of the sound, irrespective of the acoustic references. The tokens were presented to them in a randomized order, and they were blinded to the participant.

3. Results

A total of 761 vocalizations were labeled with a voice quality category by two analysts, and the labels were then compared. Although the scheme can provide a technique for the comprehensive classification of a child’s recorded productions, in this case study only 96% of all vocalizations were labeled by both raters, as 4% were left unlabeled by the independent analyst. No attempts were made to establish a consensus or to retrain the independent analyst for increased discrimination. Table 1 provides the results from the study. Overall, 82% of utterances were labeled identically by the two analysts working independently (Cohen’s kappa = 0.77), representing a high level of agreement. Modal, creaky, harsh, breathy, loft and voiceless voice qualities were all recognized with accuracies approaching or exceeding 80%. When the errors for each voice quality were examined in turn (percentages are relative to the number of tokens Analyst 1 assigned to that category), the following results emerged:
• Creaky voice: most often confused with harsh voice (n = 8, 7%)
• Harsh voice: most often confused with creaky voice (n = 4, 4%)
• Whispery voice: most often confused with breathy voice (n = 3, 13%)
• Modal voice: most often confused with breathy voice (n = 10, 6%)
• Breathy voice: most often confused with modal voice (n = 4, 8%)
• Loft: most often confused with modal voice (n = 2, 6%)
• Whisper: confused with breathy voice and with voiceless (n = 1, 13% each)

• Voiceless: either labeled as a vegetative vocalization (n = 10, 4%) or not labeled (n = 20, 8%)

Category          Analyst 1   Analyst 2 (independent)   Labeled identically by both
Creaky voice          116            108                        99
Harsh voice           105            106                        84
Whispery voice         23             28                        18
Modal voice           177            159                       140
Breathy voice          51             61                        42
Loft                   34             32                        28
Whisper                 8              9                         6
Voiceless             238            207                       197
Hiccup                  5              8                         4
Cough                   0             11                         0
Sneeze                  4              5                         3
Unlabeled               0             27                         0
Total                 761            761                       621

Table 1: Results from the analysts’ determination of the voice quality of each vocalization (number of tokens each analyst assigned to each category, and number labeled identically by both)

Whilst the kappa statistics indicated good overall performance, some voice qualities were evidently more challenging to classify accurately or with confidence. This was particularly true for voiceless vocalizations, where the largest number (n = 20) of unlabeled vocalizations was located. As the classification of voice quality categories required definitive decisions, the first type of error arose in determining phonation. This occurs at the first node (‘Is there voicing present?’) (see Figure 2). An analyst could record a voice quality where there was none (Category 1a), or record none when there was eligible phonation (Category 1b). The largest numbers of Category 1a errors occurred where voiceless vocalizations were labeled with the combined vegetative class (cough, sneeze, hiccup; n = 10) and where the analyst was unsure and therefore did not make a decision (unlabeled; n = 20). However, when these are removed, there was a high level of agreement about whether a vocalization was voiceless or had some sort of phonation present (94%). The largest number of Category 1b errors occurred when the independent analyst misjudged a modal voice as voiceless (n = 7).

Other voice qualities were also confused with each other. Category 2 errors account for the misclassification between modal and breathy voice; Category 3 errors account for the misclassification between creaky and harsh voice. These errors occur because the scheme requires a forced decision. It is also suggested that voice quality categories may be incorrectly categorized because of the perceptual similarities between them (e.g., whispery voice and breathy voice). Improved familiarity with borderline cases would help analysts differentiate between the more easily confused voice qualities, and future training of analysts should focus on this. Qualitative review questions about the ease of use of the scheme and underlying issues of classification would also add to the understanding of why some voice qualities cannot be categorized confidently.

4. Discussion

The results from this case study have shown that infant vocalizations can be comprehensively classified by the scheme shown in Figure 1. Combining acoustic and perceptual cues in the decision tree provides a systematic way to delineate voice quality in infant vocalizations. The results of the inter-rater reliability study compare very favorably with those of [12]: when categorizing infant voice quality using their schema, [12: 560] obtained a Cohen’s kappa of 0.76. The kappa statistics indicate good overall performance. To enhance the scheme, amendments have been made as shown in Figure 2. These minor changes remove two unnecessary decision nodes. Since the scheme has proved useful in an area of linguistic study known for its challenges, it may be expected to fare as well or better in other areas of study. Application of this methodology is not limited to the field of language acquisition. As defining voice quality categories across disciplines continues to be an issue (including inter-rater and inter-lab reliability), this scheme may fulfill a need for a replicable method of voice quality classification. Because the voice quality categories are based on adults rather than infants, it would be possible to extend the scheme to work with adult populations as well. This scheme particularly commends itself for application in any field where the classification of voice quality provides discriminatory insight, including but not limited to sociolinguistics, forensic linguistics and speech pathology. Preliminary studies are currently underway investigating its suitability for use with
disordered voices.

Within the field of infant language acquisition, the technique makes it possible to show the change in dominant voice qualities over time, as well as to add other acoustic dimensions (e.g., f0 or duration) to the task of describing speech development in these early stages [19]. Extended longitudinal research with larger sets of infants, including infants from a variety of linguistic backgrounds, would provide additional data to systematically delineate the development of vocal quality and the emergence of control over modal voice.

Is there voicing present? (node 1: error categories 1a/1b)
  yes: Is there a ‘normal’ clear harmonic structure?
    yes: Are there regular harmonics with little to no energy above 3000 Hz? Does it sound ‘normal’? (node 2: error categories 2a/2b)
      yes: Modal Voice
      no: Breathy Voice
    no: Are the harmonics widely spaced and the f0 high?
      yes: Loft
      no: Are there identifiable vertical striations and individual glottal pulses and a ‘creaky’ quality? (node 3: error categories 3a/3b)
        yes: Creaky Voice
        no: Is there a lot of spectral ‘noise’ with high energy bands over 300 Hz and a ‘harsh’ or ‘growl’-like quality?
          yes: Harsh Voice
          no: Whispery Voice
  no: Does it have a whispery quality and energy above 3000 Hz?
    yes: Whisper
    no: Voiceless

Figure 2: Amended classification schema for labelling voice quality categories

5. Conclusions

Voice quality is variously utilized across the fields of linguistics and speech pathology. This paper proposes a methodology to classify voice quality using both auditory-perceptual and acoustic references that provides a high level of inter-rater reliability. Using a case study from the field of infant language acquisition, it is shown that this classification scheme provides a systematic way of combining spectrographic and waveform inspection with an auditory impression to yield a replicable methodology.

6. References

[1] ANSI. USA standard: Acoustical terminology (S1.1). American National Standards Institute, New York, 1960.
[2] Pannbacker, M. Classification Systems of Voice Disorders: A review of the literature. Language, Speech & Hearing Services in Schools, 15:169–179, 1984.
[3] Laver, J. The phonetic description of voice quality. Cambridge University Press, Cambridge, 1980.
[4] Laver, J. Principles of Phonetics. Cambridge University Press, Cambridge, 1994.
[5] Laver, J. Phonetic Evaluation of Voice Quality. In Kent, R. D. and Ball, M. J., editors, Voice quality measurement, pages 37–48. Singular Publishing Group, San Diego, 2000.
[6] Ball, M. J., Esling, J. H., and Dickson, C. The Transcription of Voice Quality. In Kent, R. D. and Ball, M. J., editors, Voice quality measurement, pages 49–58. Singular Publishing Group, San Diego, 2000.
[7] Oates, J. Auditory-perceptual evaluation of disordered voice quality. Folia Phoniatrica et Logopaedica, 61(1):49–56, 2009.
[8] Kent, R. D. Hearing and believing: Some limits to the auditory-perceptual assessment of speech and voice disorders. American Journal of Speech-Language Pathology, 5(3):7–23, 1996.
[9] Krumhansl, C. L. and Iverson, P. Perceptual interactions between musical pitch and timbre. Journal of Experimental Psychology: Human Perception and Performance, 18(3):739–751, 1992.
[10] Hartl, D. M., Hans, S., Vaissière, J., and Brasnu, D. F. Objective acoustic and aerodynamic measures of breathiness in paralytic dysphonia. European Archives of Oto-rhino-laryngology, 260(4):175–182, 2003.
[11] Giovanni, A., Revis, J., and Triglia, J.-M. Objective Aerodynamic and Acoustic Measurement of Voice Improvement After Phonosurgery. The Laryngoscope, 109(4):656–660, 1999.
[12] Buder, E. H., Chorna, L. B., Oller, D. K., and Robinson, R. B. Vibratory Regime Classification of Infant Phonation. Journal of Voice, 22(5):553–564, 2008.
[13] Esling, J. H. and Harris, J. G. States of the Glottis: An Articulatory Phonetic Model Based on Laryngoscopic Observations. In Hardcastle, W. J. and Beck, J. M., editors, A Figure of Speech, pages 347–353. Lawrence Erlbaum Associates, Mahwah, 2005.
[14] Benner, A., Grenon, I., and Esling, J. H. Infants’ phonetic acquisition of voice quality parameters in the first year of life. In 16th International Congress of Phonetic Sciences, pages 2073–2076, Saarbrücken, 2007.
[15] Esling, J. H. There are no back vowels: The laryngeal articulator model. The Canadian Journal of Linguistics/La revue canadienne de linguistique, 50(1):13–44, 2005.
[16] Edmondson, J. A. and Esling, J. H. The valves of the throat and their functioning in tone, vocal register and stress: laryngoscopic case studies. Phonology, 23(2):157–191, 2006.
[17] Ladefoged, P. Phonetic data analysis: An introduction to fieldwork and instrumental techniques. Blackwell Publishing, Melbourne, 2003.
[18] Keating, P. A. and Esposito, C. M. Linguistic voice quality. Working Papers in Phonetics, 1–8, 2007.
[19] Gregory, A., Tabain, M., and Robb, M. Duration and voice quality of early infant vocalizations. Journal of Speech, Language and Hearing Research, [advance online publication], 1–12, 2018.
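The amended scheme in Figure 2 is a fixed sequence of forced yes/no decisions, so it can be written down as a small function. The sketch below is illustrative only and is not the author's tool: it assumes each question has already been answered by the analyst (from waveform and broadband spectrogram inspection plus auditory impression) and is passed in as a boolean. The `NodeAnswers` record, its field names and the `classify` function are hypothetical labels introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAnswers:
    """An analyst's yes/no answers to the questions of the amended scheme.

    Field names are hypothetical labels for the questions in Figure 2.
    """
    voicing_present: bool           # Is there voicing present? (node 1)
    clear_harmonics: bool           # Is there a 'normal' clear harmonic structure?
    regular_little_hf_energy: bool  # Regular harmonics, little/no energy above 3000 Hz, sounds 'normal'? (node 2)
    wide_harmonics_high_f0: bool    # Are the harmonics widely spaced and the f0 high?
    creaky_pulses: bool             # Vertical striations, individual glottal pulses, 'creaky' quality? (node 3)
    harsh_noise: bool               # Spectral 'noise', high energy bands over 300 Hz, 'harsh'/'growl' quality?
    whispery_hf_energy: bool        # Unvoiced branch: whispery quality and energy above 3000 Hz?


def classify(a: NodeAnswers) -> str:
    """Walk the amended tree from the root and return one voice quality label.

    Every path ends in a label, so each token receives a forced decision;
    answers to questions that are not on the path taken are ignored.
    """
    if not a.voicing_present:
        return "Whisper" if a.whispery_hf_energy else "Voiceless"
    if a.clear_harmonics:
        return "Modal Voice" if a.regular_little_hf_energy else "Breathy Voice"
    if a.wide_harmonics_high_f0:
        return "Loft"
    if a.creaky_pulses:
        return "Creaky Voice"
    return "Harsh Voice" if a.harsh_noise else "Whispery Voice"


if __name__ == "__main__":
    token = NodeAnswers(
        voicing_present=True,
        clear_harmonics=False,
        regular_little_hf_energy=False,
        wide_harmonics_high_f0=False,
        creaky_pulses=True,
        harsh_noise=False,
        whispery_hf_energy=False,
    )
    print(classify(token))  # Creaky Voice
```

Writing the tree as nested conditionals makes the forced-decision property visible: each of the eight voice quality labels is reachable by exactly one path, and the modal/breathy and creaky/harsh choices each hinge on a single node, which is where the Category 2 and Category 3 confusions reported in the Results arise.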

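Several questions in Figures 1 and 2 turn on whether there is energy above 3000 Hz, which the scheme assesses by visual inspection of a broadband spectrogram. Analysts who want a numeric cross-check might compute the fraction of spectral energy above that cutoff in a short frame. The sketch below is an assumption, not part of the published method: it uses NumPy, a Hann window and a single FFT frame, and the function name `energy_fraction_above` is hypothetical. No decision threshold is proposed here.

```python
import numpy as np


def energy_fraction_above(signal: np.ndarray, sample_rate: float,
                          cutoff_hz: float = 3000.0) -> float:
    """Return the fraction of spectral energy above `cutoff_hz` in one frame.

    A Hann window reduces spectral leakage before the FFT; energy is the
    squared magnitude of the one-sided (real-input) spectrum.
    """
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent frame: no energy anywhere
    return float(power[freqs > cutoff_hz].sum() / total)


if __name__ == "__main__":
    sr = 48_000                               # the case study recorded at 48 kHz
    t = np.arange(int(0.05 * sr)) / sr        # one 50 ms frame
    tone = np.sin(2 * np.pi * 200 * t)        # narrowband, low-frequency tone
    noise = np.random.default_rng(0).standard_normal(t.size)  # broadband noise
    print(energy_fraction_above(tone, sr))
    print(energy_fraction_above(noise, sr))
```

A low-frequency tone yields a fraction near zero, while white noise yields a fraction near the share of the band above 3 kHz (about 21/24 at a 48 kHz sampling rate), which mirrors the visual contrast the scheme asks analysts to judge.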

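The agreement figures reported in the Results (82% agreement, Cohen's kappa of 0.77) can be recomputed from Table 1. Cohen's kappa is (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of identical labels and p_e is the agreement expected by chance from each analyst's category totals. The sketch below transcribes, for each category, Analyst 1's total, Analyst 2's total and the number of tokens both analysts labeled identically (the diagonal of the Table 1 matrix); the `TABLE_1` layout and function name are illustrative assumptions.

```python
# Counts transcribed from Table 1: (Analyst 1 total, Analyst 2 total, labeled identically).
TABLE_1 = {
    "Creaky voice": (116, 108, 99),
    "Harsh voice": (105, 106, 84),
    "Whispery voice": (23, 28, 18),
    "Modal voice": (177, 159, 140),
    "Breathy voice": (51, 61, 42),
    "Loft": (34, 32, 28),
    "Whisper": (8, 9, 6),
    "Voiceless": (238, 207, 197),
    "Hiccup": (5, 8, 4),
    "Cough": (0, 11, 0),
    "Sneeze": (4, 5, 3),
    "Unlabeled": (0, 27, 0),
}


def agreement_and_kappa(table):
    """Return (observed agreement p_o, Cohen's kappa) from per-category counts."""
    n = sum(a1 for a1, _, _ in table.values())
    if n != sum(a2 for _, a2, _ in table.values()):
        raise ValueError("both analysts must label the same set of tokens")
    p_o = sum(agreed for _, _, agreed in table.values()) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_e = sum((a1 / n) * (a2 / n) for a1, a2, _ in table.values())
    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    p_o, kappa = agreement_and_kappa(TABLE_1)
    print(f"agreement = {p_o:.0%}, kappa = {kappa:.2f}")  # agreement = 82%, kappa = 0.77
```

Only the diagonal and the marginal totals are needed, because kappa's chance term depends on the marginals alone; the off-diagonal cells matter only for the per-category error analysis.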