A method of classifying VQ
Adele Gregory
University of Melbourne
All content following this page was uploaded by Adele Gregory on 28 January 2019.
ISSN 2207-1296 © 2018 ASSTA, 4–7 December 2018, Sydney, Australia
Table of Contents
for this manuscript
comprehensive in the sense that all phonation are classified but it is not exhaustive in the sense that all vocal qualities are thereby represented.” It thus takes into account [8]’s requirements for limiting complexity and detail. Acoustic correlates for each category were developed from previous literature (e.g. [17], [18]). These correlates relied on visual inspection of f0, the waveform, and the broadband spectrogram. A decision-making tree was then built from these categories (see Figure 1).

To test the methodology, a corpus of infant data was classified for voice quality. The study comprised two female infants. Each infant was video-recorded with a Sony DCR-TRV16E digital video camera recorder that has an integrated electret condenser microphone. This microphone was shown to have a flat wideband response from 100 Hz to 10 kHz. Infants were recorded in 16-bit stereo at a sampling rate of 48 kHz. Each recording session took place in the infant’s home, in the presence and under the control of one of the infant's caregivers. The camera was mounted on a stationary tripod and pointed at the infant from a distance of approximately one to two meters. Each infant was recorded for approximately one hour per week over the study period, from 3 to 26 weeks of age. A full range of vocal behavior was recorded, and vocalizations were randomly selected from each week of the study. For the purposes of the case study, a vocalization was defined as any discrete sound occurring within one respiration cycle, produced with any degree of constriction. The author labeled each vocalization using the classification tool, for a total of 761 separate labeled vocalizations.

Auditory perceptual judgments are susceptible to a variety of sources of error and bias. Inter-rater reliability measurements were therefore used to quantify how consistently these judgments are made. An independent analyst was trained on the classification scheme with practice tokens for approximately one hour. Reflexive sounds (hiccup, sneeze, and cough) were labeled as such, and all other vocalizations were labeled for voice quality. When the independent analyst was unsure of a category, they decided on the basis of their perception of the sound, irrespective of the acoustic references. The tokens were presented in randomized order, and the analyst was blinded to the participant.

3. Results

Two analysts each labeled all 761 vocalizations with a voice quality category, and their labels were then compared. The scheme is designed to allow comprehensive classification of a child’s recorded productions. In this case study, however, only 96% of vocalizations were labeled by both raters, because the independent analyst left 4% unlabeled. No attempt was made to reach a consensus or to retrain the independent analyst for better discrimination. Table 1 presents the results of the study. Overall, the two analysts working independently assigned the same label to 82% of utterances (Cohen's Kappa = 0.77), which represents a high level of agreement. Modal, creaky, harsh, breathy, loft and voiceless voice qualities were each recognized with an accuracy approaching or exceeding 80%. Examining the errors for each voice quality in turn gives the following results:
• Creaky voice: most confused with harsh voice (n = 8, 7%)
• Harsh voice: most confused with creaky voice (n = 4, 4%)
• Whispery voice: most confused with breathy voice (n = 3, 13%)
• Modal voice: most confused with breathy voice (n = 10, 6%)
• Breathy voice: most confused with modal voice (n = 4, 8%)
• Loft: most confused with modal voice (n = 2, 6%)
• Whisper: confused with breathy voice and voiceless (n = 1, 13%)
Table 1: Results of the analysts’ determination of the voice quality of each vocalization (Analyst 1 vs. Analyst 2 (Independent); categories: Modal, Creaky, Harsh, Breathy, Whispery Voice, Whisper, Loft, Voiceless, Cough, Sneeze, Hiccup, Unlabeled, with Totals)
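Table 1 cross-tabulates the two analysts' labels. Both the 82% overall agreement and the Cohen's Kappa of 0.77 come from a matrix of this kind. The Python sketch below shows the arithmetic. It assumes that rows hold one analyst's labels and columns hold the other's, in the same category order. The function names and the three-category toy matrix are hypothetical and are not the study's data. The `most_confused` helper reports each category's most frequent alternative label as a share of that category's row total. This is one plausible reading of the percentages in the error list, not a confirmed description of how they were computed.

```python
"""Hypothetical sketch: agreement statistics from a two-analyst confusion matrix.

Assumption: m[i][j] counts items that the row analyst labeled category i and
the column analyst labeled category j, with the same category order on both
axes. This layout is assumed for illustration, not taken from Table 1.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def _total(m: Matrix) -> int:
    """Total number of items labeled by both analysts."""
    return sum(sum(row) for row in m)


def observed_agreement(m: Matrix) -> float:
    """p_o: proportion of items on the diagonal (both analysts chose the same label)."""
    return sum(m[i][i] for i in range(len(m))) / _total(m)


def cohens_kappa(m: Matrix) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = _total(m)
    k = len(m)
    row_totals = [sum(m[i]) for i in range(k)]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    # p_e: agreement expected by chance, from each analyst's marginal proportions.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed_agreement(m) - p_e) / (1 - p_e)


def most_confused(m: Matrix, labels: Sequence[str]) -> List[Tuple[str, str, int, float]]:
    """For each row category: its most frequent alternative label, that count,
    and the count as a share of the row total."""
    result = []
    for i, row in enumerate(m):
        j = max((c for c in range(len(row)) if c != i), key=lambda c: row[c])
        result.append((labels[i], labels[j], row[j], row[j] / sum(row)))
    return result


if __name__ == "__main__":
    # Toy three-category matrix: illustrative numbers only, not the study's data.
    labels = ["modal", "breathy", "creaky"]
    m = [[40, 5, 1],
         [4, 20, 0],
         [2, 1, 27]]
    print(f"p_o = {observed_agreement(m):.2f}")  # → p_o = 0.87
    print(f"kappa = {cohens_kappa(m):.2f}")      # → kappa = 0.80
    for cat, other, count, share in most_confused(m, labels):
        print(f"{cat}: most confused with {other} (n = {count}, {share:.0%})")
```

Kappa corrects raw agreement for the agreement expected by chance. That is why it is lower than the raw percentage. Rearranging the formula with the rounded values reported here (p_o = 0.82, κ = 0.77) gives an implied chance agreement of p_e = (p_o − κ)/(1 − κ) = 0.05/0.23 ≈ 0.22.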
• Voiceless: either labeled as a vegetative vocalisation (n = 10, 4%) or not labeled (n = 20, 8%)

Although the Kappa statistics indicated good overall performance, some voice qualities were clearly harder to classify, either accurately or with confidence. This was particularly true of voiceless vocalizations, which accounted for the largest number of unlabeled vocalizations (n = 20). Because the scheme requires a definitive decision at each step, the first type of error arose in determining phonation. This occurs at the first node ('Is there voicing present?') (see Figure 2). Here an analyst could record a voice quality where there was none (Category 1a), or record none where there was eligible phonation (Category 1b). Most Category 1a errors occurred in two situations. In one, voiceless vocalizations were labeled with the combined vegetative class (cough, sneeze, hiccup; n = 10). In the other, the analyst was unsure and made no decision (unlabeled; n = 20). When these cases are removed, agreement on whether a vocalization was voiceless or had some phonation present was high (94%). The most frequent Category 1b error was the independent analyst judging a modal voice as voiceless (n = 7).

Other voice qualities were also confused with each other. Category 2 errors are misclassifications between modal and breathy voice, and Category 3 errors are misclassifications between creaky and harsh voice. These errors arise because the scheme forces a decision. Perceptual similarity between categories may also lead to miscategorization (e.g. whispery voice and breathy voice). Greater familiarity with borderline cases would help analysts distinguish the more easily confused voice qualities, and future analyst training should focus on this. Qualitative review questions about the ease of use of the scheme and about underlying classification issues would also help explain why some voice qualities cannot be categorized with confidence.

4. Discussion

The results of this case study show that infant vocalizations can be exhaustively classified with the scheme shown in Figure 1. The decision tree combines acoustic and perceptual cues for deciding on voice quality, which gives a systematic way to delineate voice quality in infant vocalizations. The results of the inter-rater reliability study compare favorably with those of [12]. Categorizing infant voice quality with their own schema, [12: 560] obtained a Cohen’s Kappa of 0.76. The Kappa statistics indicate good overall performance. To improve the scheme, the amendments shown in Figure 2 have been made. These minor changes remove two unnecessary decision nodes. The scheme has proved useful in an area of linguistic study known to be challenging, so it can reasonably be expected to perform as well or better in other areas. Its application is also not limited to language acquisition. Defining voice quality categories across disciplines remains an issue, including inter-rater and inter-lab reliability, and this scheme may meet the need for a replicable method of voice quality classification. Because the voice quality categories are based on adults rather than infants, the scheme could also be extended to adult populations. It is particularly suited to any field where voice quality classification provides discriminatory insight, including but not limited to sociolinguistics, forensic linguistics and speech pathology. Preliminary studies are currently underway investigating its suitability for use with
Figure (decision-tree excerpt): node 2a, “Are there regular harmonics with little to no energy above 3000Hz?”; node 2b, “Are the harmonics widely spaced and the f0 high?”; node “Does it sound ‘normal’?”; outcomes shown include Whisper, Voiceless, Harsh Voice and Whispery Voice.