
The Distress Analysis Interview Corpus of human and computer interviews

Jonathan Gratch, Ron Artstein, Gale Lucas, Giota Stratou, Stefan Scherer,
Angela Nazarian, Rachel Wood, Jill Boberg, David DeVault,
Stacy Marsella, David Traum, Skip Rizzo, Louis-Philippe Morency
USC Institute for Creative Technologies, 12015 Waterfront Drive, Playa Vista CA 90094-2536, USA
{gratch, artstein, lucas, stratou, scherer, nazarian, rwood, boberg, devault, marsella, traum, rizzo, morency}@ict.usc.edu
Abstract
The Distress Analysis Interview Corpus (DAIC) contains clinical interviews designed to support the diagnosis of psychological distress
conditions such as anxiety, depression, and post-traumatic stress disorder. The interviews are conducted by humans, human-controlled agents, and autonomous agents, and the participants include both distressed and non-distressed individuals. Data collected include audio
and video recordings and extensive questionnaire responses; parts of the corpus have been transcribed and annotated for a variety of
verbal and non-verbal features. The corpus has been used to support the creation of an automated interviewer agent, and for research on
the automatic identification of psychological distress.

Keywords: multimodal corpora, virtual humans, dialogue systems, nonverbal behavior


1. Overview
Untreated mental illness creates enormous social and eco-
nomic costs, yet many cases go undiagnosed. Up to half
of patients with psychiatric disorders are not recognized
as having mental illness by their primary care physicians
(Higgins, 1994). Within health-care settings, a first step in
identifying mental illness is a semi-structured clinical inter-
view, where health-care providers ask a series of questions
aimed at identifying clinical symptoms in an open-ended
fashion. Recently, there has been considerable research interest in developing tools to analyze the verbal and nonverbal content of these interviews as a means for building decision-support tools (Gratch et al., 2013) and computer-assisted self-administered screenings (Bickmore et al., 2005), and for answering fundamental questions about language, nonverbal behavior, and mental illness (Scherer et al., 2013b; Yang et al., 2013; Alvarez-Conrad et al., 2001).

Figure 1: Face-to-face interview setup.

The Distress Analysis Interview Corpus (DAIC) is a multimodal collection of semi-structured clinical interviews. Designed to simulate standard protocols for identifying people at risk for post-traumatic stress disorder (PTSD) and major depression, these interviews were collected as part of a larger effort to create a computer agent that interviews people and identifies verbal and nonverbal indicators of mental illness (DeVault et al., 2014). The corpus contains four types of interviews:

Face-to-face interviews between participants and a human interviewer (Figure 1);

Teleconference interviews, conducted by a human interviewer over a teleconferencing system;

Wizard-of-Oz interviews, conducted by an animated virtual interviewer called Ellie (Figure 2), controlled by a human interviewer in another room;

Automated interviews, where participants are interviewed by Ellie operating as an agent in a fully automated mode.

Figure 2: Ellie, the virtual interviewer.

Sample interview excerpts are shown in Figure 3.

Participants are drawn from two distinct populations living in the Greater Los Angeles metropolitan area – veterans of the U.S. armed forces and members of the general public – and are coded for depression, PTSD, and anxiety based on accepted psychiatric questionnaires.

Besides informing the development of computer-assisted interviews that improve rates of diagnosis, the corpus has been used to examine several fundamental questions about language, nonverbal behavior, psychophysiology, and human-computer interaction. This article describes the development and content of the corpus and summarizes the key scientific findings it has supported to date.

Face-to-face
Interviewer: Let's see, is there someone in your life that's been a really positive influence for you?
Participant: Uh yeah, my husband, yeah.
Interviewer: Yeah.
Interviewer: What kind of values did you take away from him?
Participant: Uh he's always uh thinking ahead and looks at the big picture and doesn't uh mull over trivial things so that's something that helped me.
Interviewer: Mhm yeah, those are good traits to have.
Participant: Yeah, yes.
Interviewer: Um how did you guys meet?

Wizard-of-Oz
Ellie: Who's someone that's been a positive influence in your life?
Participant: Uh my father.
Ellie: Can you tell me about that?
Participant: Yeah, he is a uh
Participant: He's a very he's a man of few words
Participant: And uh he's very calm
Participant: Slow to anger
Participant: And um very warm very loving man
Participant: Responsible
Participant: And uh he's a gentleman has a great sense of style and he's a great cook.
Ellie: Uh huh
Ellie: What are you most proud of in your life?

Autonomous
Ellie: Who's someone that's been a positive influence in your life?
Participant: My mom
Participant: Has been a positive influence
Ellie: Tell me more about that
Participant: Well she's just always really nice and
Participant: Considerate and upbeat and
Participant: Sh just positive person
Ellie: How would your best friend describe you?
Participant: Outgoing funny
Participant: A little bit
Participant: Foolish <laughter>

Figure 3: Sample excerpts from three types of interviews

2. Collection

Collection began with the interpersonal interviews (both face-to-face and via teleconferencing), followed by the Wizard-of-Oz interviews and the automated interviews. Participants were recruited through two channels: on-line ads posted on Craigslist.org, and on-site at a US Vets facility in Southern California (US Vets is a non-profit organization that helps veterans re-integrate into civilian life after deployment). Face-to-face interviews for the Craigslist population took place at the USC Institute for Creative Technologies (ICT) in Los Angeles, California, while the US Vets population were interviewed on-site. Teleconference, Wizard-of-Oz, and automated agent interviews were all conducted at ICT: teleconference and Wizard-of-Oz participants were only recruited through Craigslist, while for the automated agent, US Vets participants were transported to ICT. All participants were fluent English speakers and all interviews were conducted in English. Face-to-face and teleconference interviews typically took 30–60 minutes, while Wizard-of-Oz and automated interviews were shorter (Wizard-of-Oz 5–20 minutes, automated 15–25 minutes). A summary of collected interviews can be seen in Table 1.

All collection efforts used the same experimental protocol except where changes were required by the nature of the interaction (human vs. computer interviewer) or instrumentation (see below). Participants first completed a consent form (which included optional consent that allowed their data to be shared for research purposes). They then completed a set of questionnaires alone on a computer, then went through the interview, followed by additional questionnaires after the interview. Participants were recorded only during the interview. Interviews were semi-structured, starting with neutral questions designed to build rapport and make the participant comfortable; progressing to more specific questions about symptoms and events related to depression and PTSD; and ending with a "cool-down" phase, to ensure that participants would not leave the interview in a distressed state of mind.

Each face-to-face and teleconference interview was conducted by one of two female interviewers. In the face-to-face condition, only the participant and interviewer were in the room during the interview (Figure 1 above). In the teleconference, Wizard-of-Oz, and automated interviews, participants were alone in a room in front of a large computer screen, showing the human interviewer in the teleconference interviews, and the animated character Ellie in the Wizard-of-Oz and automated interviews (Figure 2 above). Ellie's behavior in the Wizard-of-Oz collection was controlled by two wizards, responsible for non-verbal behaviors (e.g., nods and facial expressions) and verbal utterances, respectively (the wizards were the same two interviewers from the face-to-face and teleconference interviews). Two wizards were necessary because controlling both verbal utterances and non-verbal behaviors proved too difficult for a single person to handle in real time. Ellie had a fixed set of utterances (these consisted of pre-recorded audio of the wizard that controlled Ellie's verbal behavior, and pre-animated gestures and facial expressions based on those typically employed during the face-to-face interviews). Small changes were made to the interview protocol throughout the data collection effort: the wizards followed a written policy which gradually became stricter and more structured.

The autonomous agent's behavior was guided solely by its implemented policies, without any manual intervention. The policies were refined over time as the agent development progressed.

Starting partway through the Wizard-of-Oz collection and continuing through the automated agent collection, participants were randomly assigned to one of two framing conditions, presenting the character as either an autonomous computer system or a system controlled by a person.

Condition         Total   Framing: Human   Framing: Computer   Distress: Yes   Distress: No   Transcribed   Biopac
Face-to-face      120     —                —                   49              71             74            20
Teleconference    45      —                —                   16              29             0             20
Wizard-of-Oz      193     53               140                 60              133            193           66
Automated agent   263     47               216                 99              164            95            71

Table 1: Interviews collected
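The composition in Table 1 can also be expressed as structured data. The sketch below (Python; the record layout and variable names are illustrative, not part of the corpus distribution) copies the counts from Table 1 and derives the overall totals:

```python
# Table 1 ("Interviews collected") as structured data. Counts are taken
# verbatim from the table; the dictionary layout is illustrative only.
# "framing" holds (framed-as-human, framed-as-computer); None = framing
# manipulation not applicable to that interview type.
corpus = {
    "Face-to-face":    {"total": 120, "framing": (None, None), "distress": (49, 71),  "transcribed": 74,  "biopac": 20},
    "Teleconference":  {"total": 45,  "framing": (None, None), "distress": (16, 29),  "transcribed": 0,   "biopac": 20},
    "Wizard-of-Oz":    {"total": 193, "framing": (53, 140),    "distress": (60, 133), "transcribed": 193, "biopac": 66},
    "Automated agent": {"total": 263, "framing": (47, 216),    "distress": (99, 164), "transcribed": 95,  "biopac": 71},
}

total_interviews = sum(r["total"] for r in corpus.values())        # 621
total_distressed = sum(r["distress"][0] for r in corpus.values())  # 224
# The distress labels partition each condition: yes + no == total.
assert all(r["distress"][0] + r["distress"][1] == r["total"] for r in corpus.values())
```

Summing the rows this way gives 621 interviews in total, 224 of them from participants coded as distressed.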

3. Corpus Composition

3.1. Verbal, Nonverbal and Physiological Instrumentation

The corpus contains audio, video, and depth sensor (Microsoft Kinect) recordings of all the interactions. For the face-to-face and teleconference interactions, the interviewer and participant were recorded by separate cameras, lapel microphones, and Kinects; additionally, face-to-face interviews used an overhead camera to capture the general orientation of the interviewer and participant in the environment. In the Wizard-of-Oz and automated agent interviews, the participant was recorded by a camera, high-quality close-talking microphone, and Kinect, while the agent was recorded through screen-capture software.

A subset of the collections also includes physiological data, recorded with the Biopac system (http://www.biopac.com). We record galvanic skin response (GSR), electrocardiogram (ECG), and respiration of participants. Sensors were attached following the pre-questionnaires but before the interview, and were connected to the participants' trunk and lower extremities to avoid interference with natural gestures. Additionally, participants were required to sit at rest for three minutes, then presented a series of standardized emotional pictures (Lang et al., 2008) to help calibrate the instrument and provide baseline measures of physiological responsiveness.

3.2. System logs

The Wizard-of-Oz and automated agent interviews include generated logs of the character's speech and nonverbal behavior events. Additionally, the automated agent logs contain real-time segmentation, recognition, and understanding of the participants' speech and language, which drive the agent's actions.

3.3. Questionnaire data

Participants completed a series of questionnaires prior to the interview, including basic demographic questions, established measures of psychological distress, and a measure of current mood. The Positive and Negative Affect Scale (PANAS) was used to assess mood (Watson and Clark, 1994). Measures of psychological distress included the PTSD Checklist – Civilian Version (Blanchard et al., 1996), the Patient Health Questionnaire depression module (Kroenke and Spitzer, 2002), and the State-Trait Anxiety Inventory (Spielberger et al., 1983), all of which were highly correlated, reflecting the typical comorbidity found between these clinical conditions (Gaudiano and Zimmerman, 2010; Scherer et al., 2013b). After the interaction, participants completed the PANAS again, Kang and Gratch's (2012) Rapport scale, and a measure of social desirability (Li and Bagger, 2007). They also rated their interaction partner on 32 adjectives using a 7-point Likert scale with response options ranging from a positive adjective (e.g., polite, kind, warm) to a negative adjective (rude, cruel, cold, respectively).

Before the interview, some participants completed a measure of the five factors of personality (John et al., 1991), and others completed a measure of emotion regulation (Gross and John, 2003). After the interview, participants in the Wizard-of-Oz and automated agent conditions rated their fear of being evaluated negatively during the interview (Leary, 1983) as well as the system's usability (Brooke, 1996); they also rated success at specific design goals (such as "Ellie was sensitive to my body language" and "Ellie was a good listener") using a 5-point scale from "strongly disagree" to "strongly agree".

3.4. Transcription

A portion of the interviews was segmented and transcribed using the ELAN tool from the Max Planck Institute for Psycholinguistics (Brugman and Russel, 2004; http://tla.mpi.nl/tools/tla-tools/elan). Each transcription was reviewed for accuracy by a senior transcriber. Utterances were segmented at boundaries with at least 300 milliseconds of silence. The face-to-face and early Wizard-of-Oz interviews were transcribed from a composite video combining both participant and interviewer; later Wizard-of-Oz and automated interviews were transcribed from the audio stream of the participant only, while the interviewer utterances were recovered from the system logs.

3.5. Annotation

De-identification: All the transcribed interviews were annotated to remove identifying information. Utterances were tagged for mentions of personal names, specific dates, addresses, schools, places of employment, and locations that can be used to narrow down an event. Utterances were not considered to be personally identifying if they only included large locations (e.g. "I live in Santa Monica"), very large institutions ("I served in the Marines"), or non-specific dates such as age in years. De-identification was
performed independently by two annotators, and differences were reconciled by a senior annotator. Utterances marked as personally identifying will not be shared, in accordance with our institution's ethical guidelines.

Explicit psychological conditions: The corpus includes annotations of questions and statements that give an explicit indication of a past or existing condition of psychological distress (e.g. "She diagnosed me with a type of depression"). These are useful when developing systems to detect the more subtle distress signals.

Dialogue annotation: Parts of the transcribed corpus have been annotated with dialogue-level information to support the development and training of natural language understanding for the agent. Annotations include: (1) identification of clarification questions by the participant (e.g. "What was the question?"); (2) places that are appropriate for the agent to provide a positive or negative empathy response (e.g. "That's great" or "I'm sorry"), used to tune the thresholds on the valence classifier for the agent (DeVault et al., 2014) – inter-rater agreement (Krippendorff, 2011) was 0.73 for positive empathy and 0.81 for negative empathy; and (3) domain-specific dialogue acts in participants' responses to specific questions, to support follow-up by the agent.

Non-verbal behavior annotation: Several non-verbal behaviors were annotated (Waxer, 1974; Hall et al., 1995): gaze directionality (up, down, left, right, towards interviewer), listening smiles (smiles while not speaking), self-adaptors (self-touches on the hand, body, and head), fidgeting behaviors, and foot-tapping or shaking behaviors. Each behavior was annotated in a separate tier in ELAN. Four student annotators participated in the annotation; each tier was assigned to a pair of annotators, who first went through a training phase until the inter-rater agreement (Krippendorff's alpha) exceeded 0.7. Following training, each video was annotated by a single annotator; to monitor reliability, every 10–15 videos each pair was assigned the same video and inter-rater agreement was re-checked. Annotators were informed that their reliability was measured but did not know which videos were used for cross-checking (Wildman et al., 1975; Harris and Lahey, 1982).

In addition, automatic annotation of non-verbal features was carried out using a multimodal sensor fusion framework called MultiSense, with a multithreading architecture that enables different face- and body-tracking technologies to run in parallel and in real time. Output from MultiSense was used to estimate head orientation, eye-gaze direction, smile level, and smile duration. Further, we automatically analyzed voice characteristics, including speakers' prosody (e.g. fundamental frequency or voice intensity) and voice quality characteristics on a breathy-to-tense dimension (Scherer et al., 2013a).

4. Usage

The corpus has been used to support the automated agent's interactive capabilities by developing custom acoustic and language models for speech recognition, training classifiers for natural language understanding, and informing the creation of dialogue policies; for details, see DeVault et al. (2014). The corpus has also been used to support the agent's capabilities for distress detection, using multiple types of information including visual signals, voice quality, and dialogue-level features.

Visual signals from the face-to-face data show that several features can serve as indicators of depression, anxiety, and PTSD (Scherer et al., 2013b; Scherer et al., 2014). Specifically, these forms of psychological distress are predicted by a more downward angle of the gaze, less intense smiles and shorter average smile durations, and longer average self-touches and fidgeting with both hands (e.g. rubbing, stroking) and legs (e.g. tapping, shaking). Moreover, the predictive ability of these indicators is moderated by gender (Stratou et al., 2013). A crossover interaction was observed between gender and distress level on emotional displays such as frowning, contempt, and disgust. For example, men who scored positively for depression tend to display more frowning than men who did not, whereas women who scored positively for depression tend to display less frowning than those who did not. Other features, such as variability of facial expressions, show a main effect of gender – women tend to be more expressive than men – while still other observations, such as head-rotation variation, were entirely gender independent.

Voice quality from the Wizard-of-Oz data, particularly differences on the breathy-to-tense dimension, is also a predictor of psychological distress (Scherer et al., 2013a; Scherer et al., 2014). Depression and PTSD are both predicted by more tense voice features: those with depression or PTSD exhibit more tense voice characteristics than those without. Specifically, tense voice features were able to distinguish interviewees with depression from those without with an accuracy of 75%, and those with PTSD from those without with an accuracy of 72%.

Dialogue-level features in the Wizard-of-Oz data also serve as indicators of distress: distressed individuals were slower to begin speaking and used fewer filled pauses than non-distressed participants (DeVault et al., 2013). Moreover, the type of distress may moderate which dialogue-level features are most predictive: the standard deviation in onset time of the first segment in each user turn was the best unique predictor of depression, yet for PTSD, the mean number of filled pauses in user segments was among the most informative. For overall distress – across depression and PTSD – the mean maximum valence in user segments was the most valuable. Just as moderating by gender improved the ability of visual signals to predict distress, moderating by type of question improves the ability of dialogue-level features to detect depression (Yu et al., 2013). For example, time to onset of speech in response to intimate questions predicts distress, whereas length of speech is more predictive of distress in responses to rapport-building questions.
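The non-verbal behavior annotation effort (Section 3.5) gated annotator pairs on Krippendorff's alpha exceeding 0.7. For two coders assigning nominal labels with no missing data, alpha reduces to a short computation over a coincidence matrix. The following stdlib-only sketch is illustrative (the function name and the example labels are hypothetical, not the authors' tooling or data):

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal labels, no missing data.

    Assumes at least two distinct labels occur; otherwise expected
    disagreement is zero and alpha is undefined.
    """
    # Coincidence matrix: each annotated unit contributes its label pair
    # in both orders.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n = sum(coincidences.values())          # total pairable values (2 * units)
    marginals = Counter()                   # per-label frequency
    for (a, _b), count in coincidences.items():
        marginals[a] += count
    observed = sum(c for (a, b), c in coincidences.items() if a != b) / n
    expected = sum(
        marginals[a] * marginals[b]
        for a in marginals for b in marginals if a != b
    ) / (n * (n - 1))
    return 1.0 - observed / expected

# Two hypothetical coders labelling ten video segments of one behavior
# tier, agreeing on eight of the ten units:
a = ["smile", "smile", "none", "smile", "none", "none", "smile", "none", "smile", "none"]
b = ["smile", "smile", "none", "none",  "none", "none", "smile", "smile", "smile", "none"]
alpha = krippendorff_alpha_nominal(a, b)    # 0.62 -- below the 0.7 threshold
```

Raw percent agreement here is 80%, but alpha corrects for chance agreement given the label distribution, which is why a training threshold stated in terms of alpha is stricter than one stated in terms of simple agreement.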
Overall, the corpus helps with the identification of subtle indicators of psychological distress along multiple behavioral dimensions; these indicators will be implemented in the agent, to allow it to identify people who should be referred for further evaluation. In addition, the corpus has been used for research that does not directly support development of the agent; for example, some research has considered how interviews might differ depending on the method of data collection (face-to-face, Wizard-of-Oz, or automated agent interviews). A comparison of face-to-face and wizard dialogues found that participants use twice as many filled pauses when talking to the animated character as when talking to a live interviewer – precisely the opposite of previous results on people talking to task-oriented dialogue systems, where they were less disfluent when talking to the computer (Faust and Artstein, 2013). An investigation of the effects of framing the character as human-controlled or autonomous showed that participants felt lower fear of negative evaluation and engaged in less impression management when the character was framed as autonomous than when it was framed as human-controlled (Gratch et al., 2014b; Gratch et al., 2014a). In fact, the actual method of data collection (Wizard-of-Oz versus automated agent interviews) had no impact on fear of negative evaluation or impression management, but who participants believed they were interacting with (human versus computer) affected both fear of negative evaluation and impression management. Moreover, participants also displayed sad emotional expressions more intensely when they believed they were interacting with a computer compared to a human. This robust dataset has the potential to help researchers address questions across areas of mental health, human-agent interaction, and verbal and non-verbal behavior.

5. Distribution

Currently, the corpus is being shared on a case-by-case basis, by request and for research purposes. Longer term, we intend to make significant portions of the data more broadly available to the research community.

Acknowledgments

This work is supported by DARPA under contract W911NF-04-D-0005 and by the U.S. Army RDECOM. Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

6. References

Alvarez-Conrad, J., Zoellner, L. A., and Foa, E. B. (2001). Linguistic predictors of trauma pathology and physical health. Applied Cognitive Psychology, 15(7):S159–S170.

Bickmore, T., Gruber, A., and Picard, R. (2005). Establishing the computer–patient working alliance in automated health behavior change interventions. Patient Education and Counseling, 59(1):21–30, October.

Blanchard, E. B., Jones-Alexander, J., Buckley, T. C., and Forneris, C. A. (1996). Psychometric properties of the PTSD checklist (PCL). Behaviour Research and Therapy, 34(8):669–673, August.

Brooke, J. (1996). SUS: A 'quick and dirty' usability scale. In Jordan, P. W., Thomas, B., Weerdmeester, B. A., and McClelland, I. L., editors, Usability Evaluation in Industry, chapter 21, pages 189–194. Taylor and Francis, London.

Brugman, H. and Russel, A. (2004). Annotating multi-media / multi-modal resources with ELAN. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), pages 2065–2068, Lisbon, Portugal, May.

DeVault, D., Georgila, K., Artstein, R., Morbini, F., Traum, D., Scherer, S., Rizzo, A. S., and Morency, L.-P. (2013). Verbal indicators of psychological distress in interactive dialogue with a virtual human. In Proceedings of the SIGDIAL 2013 Conference, pages 193–202, Metz, France, August. Association for Computational Linguistics.

DeVault, D., Artstein, R., Benn, G., Dey, T., Fast, E., Gainer, A., Georgila, K., Gratch, J., Hartholt, A., Lhommet, M., Lucas, G., Marsella, S., Morbini, F., Nazarian, A., Scherer, S., Stratou, G., Suri, A., Traum, D., Wood, R., Xu, Y., Rizzo, A., and Morency, L.-P. (2014). SimSensei kiosk: A virtual human interviewer for healthcare decision support. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '14), Paris.

Faust, L. and Artstein, R. (2013). People hesitate more, talk less to virtual interviewers than to human interviewers. In Fernández, R. and Isard, A., editors, Semdial 2013 DialDam: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pages 35–43, Amsterdam, December.

Gaudiano, B. A. and Zimmerman, M. (2010). Does comorbid posttraumatic stress disorder affect the severity and course of psychotic major depressive disorder? Journal of Clinical Psychiatry, 71(4):442–450.

Gratch, J., Morency, L.-P., Scherer, S., Stratou, G., Boberg, J., Koenig, S., Adamson, T., and Rizzo, A. (2013). User-state sensing for virtual health agents and telehealth applications. In Westwood, J. D., Westwood, S. W., Felländer-Tsai, L., Haluck, R. S., Robb, R. A., Senger, S., and Vosburgh, K. G., editors, Medicine Meets Virtual Reality 20, volume 184 of Studies in Health Technology and Informatics, pages 151–157. IOS Press.

Gratch, J., Lucas, G. M., and King, A. (2014a). It's only a computer: Virtual humans increase willingness to disclose. Manuscript under review.

Gratch, J., Lucas, G. M., King, A., and Morency, L.-P. (2014b). It's only a computer: The impact of human-agent interaction in clinical interviews. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '14), Paris.

Gross, J. J. and John, O. P. (2003). Individual differences in two emotion regulation processes: Implications for affect, relationships, and well-being. Journal of Personality and Social Psychology, 85:348–362.

Hall, J. A., Harrigan, J. A., and Rosenthal, R. (1995). Nonverbal behavior in clinician-patient interaction. Applied and Preventive Psychology, 4(1):21–37.

Harris, F. and Lahey, B. (1982). Recording system bias in direct observational methodology: A review and critical analysis of factors causing inaccurate coding behavior. Clinical Psychology Review, 2(4):539–556.

Higgins, E. S. (1994). A review of unrecognized mental illness in primary care: Prevalence, natural history, and efforts to change the course. Archives of Family Medicine, 3(10):908–917, October.

John, O. P., Donahue, E. M., and Kentle, R. L. (1991). The Big Five Inventory – Versions 4a and 54. University of California, Berkeley, Institute of Personality and Social Research, Berkeley, CA.

Kang, S.-H. and Gratch, J. (2012). Socially anxious people reveal more personal information with virtual counselors that talk about themselves using intimate human back stories. In Wiederhold, B. and Riva, G., editors, Annual Review of Cybertherapy and Telemedicine 2012, volume 181, pages 202–207. IOS Press, Amsterdam.

Krippendorff, K. (2011). Agreement and information in the reliability of coding. Communication Methods and Measures, 5(2):93–112.

Kroenke, K. and Spitzer, R. L. (2002). The PHQ-9: A new depression diagnostic and severity measure. Psychiatric Annals, 32(9):509–515, September.

Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (2008). International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-8, University of Florida, Gainesville, FL.

Leary, M. R. (1983). A brief version of the fear of negative evaluation scale. Personality and Social Psychology Bulletin, 9:371–376.

Li, A. and Bagger, J. (2007). The balanced inventory of desirable responding: A reliability and generalization study. Educational and Psychological Measurement, 40:131–141.

Scherer, S., Stratou, G., Gratch, J., and Morency, L.-P. (2013a). Investigating voice quality as a speaker-independent indicator of depression and PTSD. In Proceedings of Interspeech 2013, Lyon, France, August.

Scherer, S., Stratou, G., Mahmoud, M., Boberg, J., Gratch, J., Rizzo, A. S., and Morency, L.-P. (2013b). Automatic behavior descriptors for psychological disorder analysis. In 10th IEEE International Conference on Automatic Face and Gesture Recognition, Shanghai, China, April.

Scherer, S., Stratou, G., Lucas, G. M., Mahmoud, M., Boberg, J., Gratch, J., Rizzo, A., and Morency, L.-P. (2014). Automatic audiovisual behavior descriptors for psychological disorder analysis. Manuscript under review.

Spielberger, C., Gorsuch, R., Lushene, P., Vagg, P., and Jacobs, G. (1983). Manual for the State-Trait Anxiety Inventory. Consulting Psychologists Press, Inc.

Stratou, G., Scherer, S., Gratch, J., and Morency, L.-P. (2013). Automatic nonverbal behavior indicators of depression and PTSD: Exploring gender differences. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII), Geneva, Switzerland, September.

Watson, D. and Clark, L. A. (1994). The PANAS-X: Manual for the positive and negative affect schedule – Expanded Form. University of Iowa, Iowa City.

Waxer, P. (1974). Nonverbal cues for depression. Journal of Abnormal Psychology, 83(3):319–322.

Wildman, B., Erickson, M., and Kent, R. (1975). The effect of two training procedures on observer agreement and variability of behavior ratings. Child Development, pages 520–524.

Yang, Y., Fairbairn, C., and Cohn, J. (2013). Detecting depression severity from vocal prosody. IEEE Transactions on Affective Computing, 4(2):142–150.

Yu, Z., Scherer, S., DeVault, D., Gratch, J., Stratou, G., Morency, L.-P., and Cassell, J. (2013). Multimodal prediction of psychological disorders: Learning verbal and nonverbal commonalities in adjacency pairs. In Fernández, R. and Isard, A., editors, Semdial 2013 DialDam: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pages 160–169, Amsterdam, December.