Enabling Robots To Understand Indirect Speech Acts in Task-Based Interactions
Gordon Briggs
NRC Postdoctoral Fellow, U.S. Naval Research Laboratory
Tom Williams
Human-Robot Interaction Laboratory, Tufts University
and
Matthias Scheutz
Human-Robot Interaction Laboratory, Tufts University
An important open problem for enabling truly taskable robots is the lack of task-general natural
language mechanisms within cognitive robot architectures that enable robots to understand typi-
cal forms of human directives and generate appropriate responses. In this paper, we first provide
experimental evidence that humans tend to phrase their directives to robots indirectly, especially in
socially conventionalized contexts. We then introduce pragmatic and dialogue-based mechanisms to
infer intended meanings from such indirect speech acts and demonstrate that these mechanisms can
handle all indirect speech acts found in our experiment as well as other common forms of requests.
Keywords: human-robot dialogue, human perceptions of robot communication, robot architectures,
speech act theory, intention understanding
1. Introduction
Two key challenges at the intersection of artificial intelligence (AI), robotics, and human-robot in-
teraction (HRI) need to be addressed in order to enable truly taskable robots: (1) the capability
challenge of developing robotic agents that are able to both algorithmically and physically perform
the desired tasks, and (2) the interaction challenge of developing agents that can be instructed by
humans through natural language (NL) in natural and intuitive ways (e.g. Scheutz, Schermerhorn,
Kramer, & Anderson, 2007) to perform the desired tasks and appropriately respond to these instruc-
tions. We will focus exclusively on the second challenge.
Specifically, we will focus on how robots can understand directives: utterances issued with the
intention that the addressee will perform some task for the speaker. For example, for an utterance
intended as a question, the speaker intends the addressee to provide an informative response; for
an utterance intended as a command, the speaker intends the addressee to perform some general
action. Several capabilities are required to understand even the simplest of directives, from speech
recognition, to syntactic and semantic analysis, to pragmatic understanding. What is more, robotic
architectures that provide these capabilities must take into account the many social norms that NL-
enabled agents are expected to obey. For example, in order to adhere to social norms such as polite-
ness, people frequently use so-called indirect speech acts (ISAs), in which the speech act’s literal
meaning does not match its intended meaning. For example, since it is often rude to use a direct
command such as “Bring me coffee,” one might instead use the indirect request “Could you get me a
coffee?”, which is literally a request for information yet indirectly communicates the speaker’s true
intention: a request to be brought coffee. In such cases, human listeners automatically and without
contemplation understand the indirect interpretation to be the intended one. Hence, if humans use the same kinds of indirect speech acts with robots—as we will show they do—robots interacting with
humans will need to take social norms into account if they are to properly understand the intentions
implied by human utterances.
We believe that one of the primary obstacles for enabling truly interactive taskable robots is the
lack of general, integrated, and architectural mechanisms for understanding the intentions behind
directives, regardless of how these directives are expressed. In this paper, we present mechanisms
integrated into a cognitive robotic architecture that make strides toward addressing this obstacle and
demonstrate that the proposed mechanisms can handle all common indirect speech acts found in
an empirical study specifically designed to probe the extent to which ISAs will actually be used
in human-robot dialogue. To ensure terminological clarity, we will use the following definitions
throughout the paper.
Indirect speech act: an utterance whose literal meaning does not match its intended meaning.
Direct speech act: an utterance whose literal and intended meanings match.
Illocutionary point: the category of an utterance, such as statement, question, suggestion, or
command. An utterance has both a literal illocutionary point (which is
directly reflected in the utterance’s form) and an intended illocutionary
point. For direct speech acts, these match. For indirect speech acts, they
may or may not.
Directive: an utterance intended to cause the addressee to perform some action.
Direct request: a direct directive whose literal illocutionary point is that of a question.
Direct command: a direct directive whose literal illocutionary point is that of a command.
Indirect request: any indirect directive. Thus, an indirect request is an indirect speech act
with the literal illocutionary point of a statement, question, or suggestion,
and the intended illocutionary point of a question or command.
The rest of the paper will proceed as follows. In Section 2, we begin by discussing the com-
putational challenge of indirect speech act understanding. In Section 3, we then present the results
of an HRI study intended to probe the extent to which humans will use indirect speech acts in their
NL interactions with robots. The results of this experiment provide the first evidence for the need to
develop mechanisms in cognitive robot architectures for handling indirect requests. After discussing
these results, we use them to present design recommendations for robot architecture designers. In
Section 4, we introduce an architectural framework that makes significant progress toward address-
ing the interaction challenge for taskable robots. Specifically, we discuss data representations and
inference algorithms that use contextual knowledge to allow a robot to understand commands and
indirect requests. In Section 5, we demonstrate the proposed mechanisms integrated in a cognitive
robotic architecture and show how they handle the variety of utterances observed in our human-
subject experiment. Finally, in Sections 6 and 7, we discuss the significance of our results and
propose directions for future work.
2. Computational Motivation
The capabilities necessary for understanding indirect speech acts (or any other form of speech act)
can be thought of as a subset of those necessary for achieving common ground (i.e., mutual understanding between interactants). Theoretical work in conversation and dialogue has construed the establishment of common ground as a multi-stage process (Clark, 1996). In the attentional stage, interactants must attend to each other in a conversational context. In the perceptual stage, the addressee must successfully perceive a communicative act directed to him/her by the speaker. In the semantic understanding stage, the addressee must abduce some literal meaning from this perceived act. Finally, in the intentional understanding stage, which Clark (1996) terms uptake, the addressee must abduce some intention from this literal meaning given the joint context.
While Clark’s multi-stage model of establishing common ground is valuable in conceptualizing the challenges involved, it can be further refined. Schlöder (2014) proposes that uptake be
divided into both weak and strong forms. Weak uptake can be associated with Clark’s intentional
understanding process, whereas strong uptake denotes the stage where the addressee may either
accept or reject the proposal implicit in the speaker’s action. A proposal is not strongly “taken up”
unless it has been accepted as well as understood (Schlöder, 2014). This distinction is important
as the addressee can certainly understand the intentions of an indirect request such as “Could you
deliver the package?”, but this does not necessarily mean that the addressee will actually agree to
the request and carry it out. In order for the proposal to be accepted, a set of felicity conditions must
hold. For a directive, the addressee must have the knowledge, capacity, obligation, and permission
necessary to carry out the intended action before he or she (or it) will accept that directive.
Uptake in human-robot interaction requires mechanisms that either implicitly or explicitly deal
with the previous stages in Clark’s model of mutual understanding (attentional, perceptual, and
semantic), and there have indeed been several prior attempts at implicitly enabling human-robot
interaction at these levels. For instance, work at the attentional level includes the development of
mechanisms to detect when an interactant is engaged with the robot (e.g. Rich, Ponsler, Holroyd,
& Sidner, 2010). Work at the perceptual level includes projects seeking to enable robust speech
and gesture recognition (e.g. Gomez, Kawahara, Nakamura, & Nakadai, 2012). Finally, work at
the semantic understanding level includes developing mechanisms to tackle challenges, such as
reference resolution (e.g. Tellex et al., 2013). A much smaller body of work, however, has focused
on explicit intentional understanding in human-agent interactions (weak uptake), which we discuss
next.
In order to handle common linguistic forms of directives, robot architectures require a number
of additional mechanisms, as directives can come in both literal and non-literal forms. Often these
non-literal forms (i.e., indirect requests) are considered to be tightly associated with their intended
meanings, as is the case in conventionalized (or idiomatic) ISAs (Clark & Schunk, 1980; Searle,
1975). Conventionalized ISAs include questions used as pre-requests to remove potential obstacles
toward a desired action or outcome (e.g., “Can I get a coffee?”) (Gibbs Jr, 1986) and assertions of
needs or desires (e.g., “I would like a coffee”).
Some robot architectures have enabled ISA understanding by using rule-based systems that rea-
son over conventionalized forms (Wilske & Kruijff, 2006). This approach stands in contrast to
the plan-reasoning, or inferential approach, where each utterance is viewed as an action within the
speaker’s larger dialogue plan (Perrault & Allen, 1980).
There are advantages and disadvantages to both strategies. The idiomatic approach, while less
computationally expensive, can only detect and handle indirect speech acts that have (known) con-
ventionalized forms. The inferential approach, while more general, requires computationally ex-
pensive goal and plan abduction mechanisms. As such, some researchers have proposed hybrid
architectures, which combine the idiomatic and inferential forms of pragmatic reasoning (Briggs &
Scheutz, 2013; Hinkelman & Allen, 1989). These approaches first attempt to identify whether an
utterance fits a conventionalized form given the current context. If the utterance does not fit any
known conventionalized form, a more expensive plan-reasoning process is utilized.
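To make the control flow of such a hybrid strategy concrete, the following is a minimal sketch in Python; the names interpret_utterance, matches, constraints_hold, intended_meaning, and abduce_intention are our own placeholders rather than the interfaces of any of the cited systems.

# Minimal sketch of a hybrid ISA-interpretation strategy: try cheap idiomatic
# (rule-based) matching first, and fall back to the more expensive plan-based
# inference only if no conventionalized form applies in the current context.
def interpret_utterance(utterance, context, idiomatic_rules, plan_reasoner):
    """Return the intended meaning(s) of `utterance` in `context`."""
    # Idiomatic pass: look for a known conventionalized form whose
    # contextual constraints are satisfied.
    for rule in idiomatic_rules:
        if rule.matches(utterance) and rule.constraints_hold(context):
            return rule.intended_meaning(utterance, context)
    # Inferential pass: no conventionalized form matched, so resort to
    # goal/plan abduction over the speaker's larger dialogue plan.
    return plan_reasoner.abduce_intention(utterance, context)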
The choice to develop ISA-understanding mechanisms presupposes that ISAs will be used in
human-robot dialogues because they are used in human-human dialogues. However, it is not imme-
diately obvious whether this assumption is warranted. It would be reasonable for one to suspect, for
example, that humans might talk to robots the way they typically talk to dogs: by issuing simple,
direct commands. This should be especially true for simple tasks that lack their own conventional-
ized social norms. And although we have some evidence from previous HRI studies suggesting that
the opposite might be true—that people make frequent use of indirect speech acts when instruct-
ing robots, even in the simplest of tasks—there is currently no formal HRI study that verifies this
transfer of human social norms to robot interactants.
In the next section, we thus present the results of a human-subjects experiment intended to
investigate the extent to which humans will actually use ISAs when interacting with robots through
natural language in task-based settings. The context of the experiment is a simple, novel task, for
which conventionalized social norms do not exist beyond those of everyday life, and which is simple
enough that it can be achieved solely through simple, direct commands. If, indeed, ISAs come so naturally to people that they will use them even in a novel, non-conventionalized task such as the one we will explore in the experiment, then this will provide firm evidence for the need to handle ISAs in robotic architectures capable of task-based natural language interactions with humans.
3.1 Design
For this experiment, we chose one of the simplest tasks we could find in the HRI literature: a human
instructor must command a robot to knock over colored towers built out of cans (Briggs & Scheutz,
2014). This is an interaction so simple that one would only expect direct language to be used by
participants; the task can be easily accomplished using only instructions of the form “knock over the
<color> tower.”
Participants were told that the experimenters were developing natural language interaction capa-
bilities for robots and that their task would be to interact with a “tower-toppling robot.” Participants
were given a list of three towers (i.e., “Red tower,” “Yellow tower,” and “Blue tower”) and were told
that, after being introduced to the tower-toppling robot, they were to command the robot to knock
over those three towers, one at a time, in whatever order they wished.
After this briefing, the experimenter left the room through one door, and the tower-toppling robot
(an iRobot Create, as seen in Fig. 1) entered through a different door and introduced itself. This robot
was teleoperated by a trained confederate through a Cognitive Wizard-of-Oz (WoZ) interface (Baxter,
Kennedy, Senft, Lemaignan, & Belpaeme, 2016) created with the ADE implementation (Scheutz
et al., 2013) of the DIARC architecture (Schermerhorn, Kramer, Middendorff, & Scheutz, 2006),
with video data being streamed to the confederate through a GoPro camera affixed to the robot.
Utterances made by the robot were prerecorded using the open source MaryTTS text-to-speech
package. A simple diagram of this setup can be seen in Fig. 2.
Table 1: Responses given for different categories of ISAs. Direct (Direct Illo-
cutionary) Point: Q = Question, S = Statement, S[Su] = Suggestive Statements;
Cond(ition): P = Preparatory, S = Sincerity; Dir(ection): A = Agent, P = Patient.
3.3 Population
Participants were recruited online and through flyers posted near a university campus. All partic-
ipants (11 male, 13 female) were between the ages of 19 and 69 (M = 32.04, SD = 15.96).
Participants were paid $10 each for their participation and provided informed written consent be-
fore beginning the experiment. While most participants were beyond college age, we asked them
for their current or previous college major, if any. Three reported studying mathematics, computer
science, or engineering; six reported studying a natural science or medicine; five reported studying
a social science; four reported studying a branch of the arts or humanities; two reported studying
some other field; and four reported no previous or current major.
3.4 Results
The demographic ANOVAs revealed no age (F(1, 46) = 2.24, p > .1) or gender (F(1, 47) = 0.79, p > .3) effects. Given that there were no gender effects, we can treat the subjects as a uniform
group that has sufficient size for the purposes of this experiment.
We hypothesized that even in this simple scenario, some participants would use ISAs. In fact,
over half of all participants (n=14) used ISAs and 27.97% of all task-relevant utterances were ISAs.
We found this striking, particularly because of the task’s lack of conventionalized social norms that
would have necessitated the use of ISAs. What is more, participants used roughly this proportion of ISAs even when the robot repeatedly demonstrated an inability to understand ISAs: ISAs comprised 29.41% of task-relevant utterances in the understanding condition and 26.87% of task-relevant utterances in the misunderstanding condition. Mean ISA use across the two conditions was found to be statistically equivalent using a Welch two-sample TOST analysis (Schuirmann, 1981; Westlake, 1981) (p < .001).
...
Subject: Can you knock over the yellow tower please?
Robot: Ok. ... I have knocked down the yellow tower.
Subject: Thank you. Can you please knock down the blue tower now, please?
Robot: Ok. ... I have knocked down the blue tower.
Subject: And can you knock down the red tower now?
Robot: Ok. ... I have knocked down the red tower.
Overall, one might expect participants to rely on direct commands for a variety of reasons: (1)
the task used in this experiment is a novel and unfamiliar one for which there exist no convention-
alized social or politeness norms; (2) the task is straightforward, and there is nothing of interest
for participants to talk to the robot about beyond the task at hand, and thus, casual conversation or
small-talk is unlikely; and (3) the task is simple enough that it could be completed solely by issuing
low-level metric commands (e.g., “drive forward one meter”).
However, our results show that not only did most participants use ISAs in this scenario, but
that some relied almost exclusively on them, as seen in Fig. 3. In order to more fully examine this
phenomenon, we decided to examine ISA use across a set of experiments we have previously run. In
addition to the experiment presented in this paper, we examined two additional interaction corpora
collected from previously conducted but unpublished pilot experiments: (1) an experiment similar
to the understanding condition of the presented experiment, differing only in that an Aldebaran
Nao was used, and (2) an experiment similar to the misunderstanding condition of the presented
experiment, but in which the iRobot Create delivered meals instead of demolishing towers.
While the data from those experiments cannot be directly compared to the data from the pre-
sented experiment due to different experimental designs, we can broadly compare ISA use across
the interaction data from all three experiments, to get a sense as to whether or not the ISA use seen in
the presented experiment should be regarded as typical. As shown in Table 2, ISA use was common
across all three experiments. Fig. 4 shows a dialogue drawn from the first of these two interaction
corpora (Row 2 in Table 2); Fig. 5 shows a dialogue drawn from the second of these two interaction
corpora (Row 3 in Table 2).
Subject: Oh let’s see here. I have to do something. I would like you to go to the blue tower of cans to
your left.
Robot: Ok.
Subject: And I would like you to knock down the top can on the blue tower.
Robot: Ok.
Subject: You can knock down all of the cans if you like.
Robot: Ok.
Subject: Very good, thank you so much. Nice job!
Robot: Hello my name is Waiterbot. I am taking care of you today. Please tell me what drink I should
bring you.
Subject: Uh, I’d like a water.
Robot: That is nice. Please tell me what drink I should bring you.
Subject: Uh could I have a water?
Robot: You could order that. Please tell me what drink I should bring you.
Subject: Um I would like to have a glass of water please.
Robot: That is interesting. Please tell me what drink I should bring you.
Subject: Uh, I need a water.
Robot: Okay. Please tell me what drink I should bring you.
Subject: Um could I have a water?
Robot: You could order that. Please tell me what drink I should bring you.
Subject: Water.
Broadly, this suggests that participants are likely to bring their social norms into interactions
with robots. What is more, participants will bring these politeness norms into contexts even when
the robot’s sole purpose is to fulfill participants’ requests (i.e., it does not purport to have its own
goals, desires, and intentions), and when the robot is distinctly non-humanoid. We suspect that the
frequency of ISA usage might have been even higher had the robot expressed its own goals, which
the participants would have been overriding with their requests, or if a more humanoid robot had
been used (given that previous research has suggested that humans treat humanoid robots more
politely than they treat mechanical robots (Hinds, Roberts, & Jones, 2004)). Overall, the results of this experiment suggest some high-level design and application principles, which, if followed by robot architecture designers, should improve task-based dialogue interactions of natural-language-enabled robots with humans.
contexts for which conventionalized social norms do not exist, and even when the robot repeatedly
demonstrates an inability to understand indirect speech acts.
If a robot is expected to interact with naïve users, this error rate is clearly unacceptable: In such cases, we believe that it would be inappropriate to use a language-enabled robot incapable of understanding, at the very least, common conventionalized ISAs such as those concerned with capabilities, permissions, and desires. In cases where interaction with naïve users is not expected to be common, this error rate may be less problematic, as users may be explicitly or implicitly trained to avoid using indirect language. But this avoidance of natural, polite communication is likely to come at a cost with respect to humans’ perceptions of the robot: If it is important to robot designers that human teammates be able to engage in natural, human-like dialogue with a robot, then this constrained communication style and its associated interaction costs may prove to be unacceptable. We thus suggest that language-enabled robots engaging in dialogue-based human-robot interactions must be able to understand ISAs if the robots are expected to commonly engage with naïve users, or if natural, human-like dialogue is of paramount importance.
sense to draw attention to your own mental states by asking what they are, as your interlocutor
cannot assess them, or to draw attention to your own desires by suggesting what they can be, as your
interlocutor cannot change them.
Similarly, it does not always make sense to make statements about the abilities of others, espe-
cially when they are the presumed domain experts. Consider the third column of Table 3. Here, the
examples seen in the second row call attention to the preparatory condition of requests concerning
whether or not the action is going to happen anyway. Another subcategory of such patient-directed
preparatory statements would be to call attention to the preparatory condition of capability (e.g.,
“The red tower can be knocked down”). While such an utterance makes sense, it runs the risk of
coming off as rude if the hearer is the presumed domain expert, as it appears to assert that the speaker
knows something that the hearer does not. Calling attention to either capability or inevitability for
agent-directed preparatory statements runs a similar risk; the speaker calling attention to capability
seems to presume a lack of knowledge on the hearer’s part, whereas calling attention to inevitability
runs the risk of asserting dominance.
The discussion in this section suggests that robot designers should consider (at least) the follow-
ing criteria when deciding what types of ISA forms their system must be prepared to handle: (1) The
likely illocutionary points users will need to convey (e.g., requests, suggestions, and statements); (2)
the relationship between agent and patient in actions users might desire to be performed; and (3) the
relationships between the robot and user which might make some utterance forms presumptive or
rude.
4.1 Utterances
For defining pragmatic rules and inference mechanisms, we adopt the representations from Briggs and Scheutz (2011) and consider utterances of the following form:
U = UtteranceType(α, β, X, M)
where UtteranceType denotes the speech act classification, α denotes the speaker, β denotes the
addressee, X denotes an initial semantic analysis, while M denotes a set of sentential modifiers
(e.g., “now,” “still,” “really,” “please”).
Below we specify the different utterance types that have been currently implemented in the
dialogue component:
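As a rough illustration, the utterance types that appear later in this paper (Stmt, AskYN, AskWH, Instruct, Ack, ReplyY, and ReplyN) could be encoded along the following lines; this is a hypothetical Python rendering of the representation above, not the actual DIARC data structures.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple

class UtteranceType(Enum):
    # Utterance types referenced elsewhere in this paper; the dialogue
    # component may implement additional or differently named types.
    STMT = auto()      # statement
    ASK_YN = auto()    # yes/no question
    ASK_WH = auto()    # wh-question
    INSTRUCT = auto()  # command
    ACK = auto()       # acknowledgment
    REPLY_Y = auto()   # affirmative reply
    REPLY_N = auto()   # negative reply

@dataclass(frozen=True)
class Utterance:
    """U = UtteranceType(alpha, beta, X, M)."""
    utype: UtteranceType             # speech act classification
    speaker: str                     # alpha
    addressee: str                   # beta
    semantics: str                   # X: initial semantic analysis, e.g., a predicate string
    modifiers: Tuple[str, ...] = ()  # M: sentential modifiers, e.g., ("please", "now")

# Example: "Could you knock down the red tower, please?"
u = Utterance(UtteranceType.ASK_YN, "commX", "self",
              "could(self,knockedDown(self,red(tower)))", ("please",))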
Note that each of these utterance types also contains a set of sentential modifiers M.
These modifiers do not generally alter the core semantics of the utterance (primarily indicated by the
utterance type and X), but rather, they usually alter other facets of how the utterance is understood
or chosen during generation, including politeness information and additional semantics regarding
the belief state of the interlocutor.
4.2 Pragmatic Rules
For the pragmatic rule representation, we build on Briggs and Scheutz (2013):
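As these rules are used in the remainder of the paper, each rule pairs an utterance form with a set of contextual constraints C and licenses a non-literal (intended) interpretation when C holds, falling back to a literal interpretation otherwise. A minimal sketch of this idea, building on the Utterance sketch in Section 4.1 (the field names and the simple set-membership check are our own simplifications; the actual system unifies variables and queries the belief component):

from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Set

@dataclass(frozen=True)
class PragmaticRule:
    pattern: Callable[["Utterance"], bool]   # does this rule match the utterance form?
    constraints: FrozenSet[str]              # C, e.g., frozenset({"potentiallyObl(self,X)"})
    intended: Callable[["Utterance"], str]   # interpretation adopted when C holds
    literal: Callable[["Utterance"], str]    # fallback interpretation when C does not hold

def apply_rule(rule: PragmaticRule, utt: "Utterance", beliefs: Set[str]) -> Optional[str]:
    """Return the interpretation licensed by `rule` for `utt`, or None if the rule does not match."""
    if not rule.pattern(utt):
        return None                          # rule does not apply to this utterance form
    if rule.constraints <= beliefs:          # all contextual constraints in C are believed to hold
        return rule.intended(utt)            # non-literal (intended) reading
    return rule.literal(utt)                 # literal reading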
It is sometimes the case that interlocutors expect both the literal and non-literal aspects of utterances to be addressed and acknowledged. In addition to appropriately reacting to the intended
illocutionary point of an utterance, there are also expectations regarding the linguistic form of the
response. Utterances with the literal illocutionary point of a question, for instance, generally demand
answers at some point, while utterances with the literal illocutionary point of a command require
either acceptance or rejection. Consider an ISA such as, “May I have a coffee?”, which has the
literal illocutionary point of a question (i.e., as to the permissibility of the speaker obtaining and
having coffee) but has the intended illocutionary point of a command to the addressee to serve coffee
to the speaker. It is clear that in this context the addressee should serve coffee to the speaker, but
in addition, the addressee is prompted by discourse obligations to provide a yes or no answer to the
query. Indeed, previous studies on politeness have demonstrated that people prefer and consider it
more polite when both the literal and intended illocutionary points of an indirect request are attended
to in responses (Clark & Schunk, 1980; Gibbs Jr. & Mueller, 1988).
As such, there should be at least two pathways of interpretation in the dialogue component. The agent’s beliefs about the world and about the intentions of the speaker should be updated only on the basis of an utterance’s intended meaning, although the literal meaning must also be tracked and handled (e.g., to satisfy discourse obligations such as answering the literal question).
Likewise, direct statements and questions can also be simply represented. A direct statement
from agent α to agent β that φ is true (e.g., “The coffee is in the breakroom”) can be represented by
the following context-independent rule:
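Schematically, and consistent with how this rule (referred to below as Rule 1) is applied in Section 5, the rule maps the statement onto the attribution of a desire that the addressee believe φ:

Stmt(α, β, φ, M) ⟹ want(α, bel(β, φ))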
A direct question from agent α to agent β asking whether or not φ is true (e.g., “Is the coffee
in the breakroom?”) can be represented by the following context-independent rule, where itk(α, φ)
denotes the intention of agent α to know the true value of φ:
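Schematically, and consistent with how this rule (referred to below as Rule 2) is applied in Section 5, the rule maps the question onto an intention-to-know reading:

AskYN(α, β, φ, M) ⟹ itk(α, φ)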
α wants to know whether or not β possesses a name. The non-literal (i.e., intended) interpretation is that α wants to know β’s name.
When a speaker α asks an addressee β whether or not β “can turn φ,” this has the literal
interpretation that α wants to know if β has the ability to turn in direction φ, whereas the
non-literal interpretation is that α wants β to turn in direction φ:
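In the same notation, this context-dependent rule can be rendered schematically as follows, where c denotes the rule’s contextual constraints (not spelled out here):

AskYN(α, β, can(β, turn(β, φ)), M) ⟹ want(α, turn(β, φ))  [intended reading, when c holds]
AskYN(α, β, can(β, turn(β, φ)), M) ⟹ itk(α, capableOf(β, turn(β, φ)))  [literal reading, otherwise]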
Generating Yes-No Responses: The intention of the speaker to know whether or not a proposition φ
holds is represented as an intention-to-know predicate itk(α, φ), which the dialogue component can
use to query the belief component to see if φ holds. If φ holds, a ReplyY utterance is constructed and communicated; otherwise, a ReplyN utterance is constructed and communicated.
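A minimal sketch of this yes/no response logic, reusing the Utterance sketch from Section 4.1 (belief_component.holds is a placeholder for the belief component's query interface, not an actual DIARC call):

def respond_to_itk(phi, speaker, belief_component):
    """Generate a yes/no reply for an intention-to-know predicate itk(speaker, phi)."""
    # Query the belief component to see whether the proposition phi holds.
    if belief_component.holds(phi):
        return Utterance(UtteranceType.REPLY_Y, "self", speaker, phi)
    return Utterance(UtteranceType.REPLY_N, "self", speaker, phi)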
Generating Answers: In this case, the speaker’s intention is to know some information denoted
by a referring expression (e.g., location of the robot). This can be represented by the intention of
the speaker to want the robot to inform him or her as to the information denoted by the referring
expression (e.g., want(α,informref (β, α, ρ)), where β denotes the robot, α denotes the human in-
terlocutor, and ρ denotes the reference), or an intention-to-know-reference predicate itkRef (α, ρ).
The dialogue component contains a function that searches belief for an appropriate fact that satisfies
the referring expression specified by ρ. If such a fact can be found, it is communicated in a
statement. In the case when no such answer can be found in belief, a general statement that states
the ignorance of the robot is generated (e.g., “Sorry, I do not know that”).
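A corresponding sketch for reference-based answers (belief_component.find_fact is again a placeholder for the search over the belief store described above):

def respond_to_itk_ref(rho, speaker, belief_component):
    """Generate an answer for an intention-to-know-reference predicate itkRef(speaker, rho)."""
    # Search belief for a fact satisfying the referring expression rho.
    fact = belief_component.find_fact(rho)
    if fact is not None:
        return Utterance(UtteranceType.STMT, "self", speaker, fact)
    # No satisfying fact found: state the robot's ignorance.
    return Utterance(UtteranceType.STMT, "self", speaker, "Sorry, I do not know that")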
Figure 6. Process diagram for handling and responding to utterances in the dialogue component that handles the extended pragmatic representation, addressing both the literal and non-literal aspects of incoming utterances. (The diagram routes an input utterance u = UtteranceType(S, H, X, M) into both an intended meaning Bint and a literal meaning Blit; statements trigger a generated acknowledgment, instructions trigger a generated acceptance or rejection, and the result is asserted to the belief component, which continues to monitor the agent's belief space Bself.)
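A rough procedural rendering of the flow in Figure 6, as a sketch only: interpret, generator, and belief_component are placeholder interfaces standing in for the pragmatic reasoning, NL generation, and belief components, not DIARC APIs.

def handle_utterance(u, interpret, belief_component, generator):
    """Handle an incoming utterance following the flow of Figure 6 (sketch only)."""
    # Two pathways of interpretation: the intended meaning (B_int) and the
    # literal meaning (B_lit), both derived via the pragmatic rules.
    intended, literal = interpret(u)

    # Discourse obligations triggered by the literal surface form:
    if u.utype == UtteranceType.STMT:
        generator.acknowledge(u)                 # statements expect an acknowledgment
    if u.utype in (UtteranceType.ASK_YN, UtteranceType.ASK_WH):
        generator.answer(literal)                # questions expect an answer (cf. the sketches above)
    if u.utype == UtteranceType.INSTRUCT:
        generator.accept_or_reject(intended)     # directives expect acceptance or rejection

    # Only the intended meaning is asserted into the belief component, which
    # continues to monitor the agent's belief space.
    belief_component.assert_meaning(intended)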
Generating Responses to Directives: When people give directives, they expect feedback as to
whether these directives are either accepted or rejected. However, to answer appropriately, a robot
must be able to reason about the appropriateness of adopting the goal/directive. This depends on
reasoning about the various felicity conditions that need to be satisfied in order to accept a proposed
course of action (the strong uptake process). This reasoning process is beyond the scope of this
paper but is discussed in (Briggs & Scheutz, 2015).
5. Evaluation
In Section 3, we presented evidence of robot-directed ISAs from a human-subject experiment. We
then analyzed the indirect requests seen in this experiment as well as those found in two additional
interaction corpora, producing the taxonomy seen in Table 3. In this section, we verify that the
computational mechanisms introduced in Section 4 can handle the utterance forms associated with
each category in that taxonomy. We first show in Section 5.1 how the mechanisms can be inte-
grated into the natural language processing system of a robot architecture and then demonstrate how
these mechanisms can handle the ISAs in our experimental data. Specifically, in Section 5.2, we
verify coverage of utterance forms seen in Experiment 1, and in Section 5.3, we verify coverage of
utterance forms seen in two additional interaction corpora. Finally, in Section 5.4, we verify cover-
age of utterance forms captured by the proposed taxonomy but not observed in the experimental or
additional corpus data.
The NL information channel in DIARC consists of a sequence of components that allow for both
the understanding (Fig. 7, Component 1) and generation of spoken language (Component 11). The
key components in the NL information channel that allow for NL understanding are as follows: the
automatic speech recognition (ASR) component, the natural language processing (NLP) component,
the dialogue component, and the belief component. The ASR sends detected natural language text
to the NLP component (see 2), which performs parsing, reference resolution, and initial semantic
analysis. The results are sent to the dialogue component (see 3), which makes pragmatic inferences
to ascertain speaker intent and passes the final semantic analysis to the robot’s belief component
(see 4). These results specify how the robot should update its own beliefs about the world, as well
as beliefs about other agents and their beliefs.
It is worth noting how the component divisions in DIARC appear to nicely correspond to Clark’s
theoretical stages of joint understanding. The ASR component is responsible for the process of
perceptual understanding, while the NLP component is responsible for the process of semantic
understanding. Weak uptake (intentional understanding) is carried out primarily by the pragmatics
reasoning process in the dialogue component (which also factors in contextual knowledge stored in
the belief component), whereas strong uptake is a process that is started in the belief component but
involves interactions between a number of components.
In summary, in the context of DIARC, the process of NL understanding can then be thought
of as a multi-stage process that transduces an acoustic speech signal into a set of first-order-like
logical predicates, representative of the interlocutor’s intention, which are asserted into the belief
component. Once this predicate is asserted into the belief component, it is up to a variety of other
mechanisms to determine the appropriate reaction. In the following section, we describe a set of
pragmatic rules that allow the described architectural components to achieve full coverage over the
set of task-relevant utterance forms observed in Section 3.
5.2 Coverage of Utterance Forms Observed in the Presented Experiment
As described in Section 3, the presented experiment used a simple tower-toppling scenario in which
a robot was obligated to find and knock down towers for a human participant. To verify coverage
of utterance forms observed in this experiment, we thus provided the robot with (1) the contextual
knowledge that, in the current tower toppling scenario, the robot is in the role of the tower-toppler
and the human interlocutor is in the role of the instructor (represented in the robot’s initial set of be-
liefs by predicates role[self, towerT oppler] and role[commX, towerInstructor], respectively),
(2) rules defining the role-based obligations found in the tower toppling scenario:
where Atower denotes a set of task-relevant actions/effects for the tower-toppling task (e.g., knockedDown(α, τ), i.e., that a tower τ is knocked down by agent α), and (3) rules defining when an
agent is potentially obligated to perform an action or achieve some goal state, such as:
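One plausible form of such a rule, offered purely as an illustration and not necessarily the rule used in the implementation, ties potential obligation to the agent's task role and the set of task-relevant actions:

role(α, towerToppler) ∧ φ ∈ Atower ⟹ potentiallyObl(α, φ)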
DIARC’s NLP component performs syntactic analysis, semantic analysis, and literal illocutionary point recognition to transduce the given sentence (here, “Can you knock down the red tower?”) into the following utterance form:
Parse Result:
AskYN(commX,self,can(self,knockedDown(self,red(tower))), {})
If ¬bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
itk(commX, capableOf(self,knockedDown(self,red(tower))))
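To make this conditional pattern concrete, the following rough sketch builds on the sketches from Section 4; the string-based belief check and the immediate want-to-goal step are simplifications of the reasoning described in Sections 2 and 4.

def interpret_can_you(utt, X, beliefs):
    """Sketch of interpreting "Can you X?", where utt.semantics is can(self, X)."""
    if f"potentiallyObl(self,{X})" in beliefs:
        # Intended (directive) reading: the speaker wants X, and the robot adopts it as a
        # goal (in the full system, goal adoption also depends on felicity conditions).
        return [f"want({utt.speaker},{X})", f"goal(self,{X})"]
    # Literal reading: an intention to know whether the robot is capable of X.
    return [f"itk({utt.speaker},capableOf(self,{X}))"]

X = "knockedDown(self,red(tower))"
beliefs = {f"potentiallyObl(self,{X})"}
u = Utterance(UtteranceType.ASK_YN, "commX", "self", f"can(self,{X})")
print(interpret_can_you(u, X, beliefs))
# ['want(commX,knockedDown(self,red(tower)))', 'goal(self,knockedDown(self,red(tower)))']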
Questions of the form “Could you X” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
Parse Result:
AskYN(commX,self,could(self,knockedDown(self,red(tower))), {})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
itk(commX, per(self,knockedDown(self,red(tower))))
Using this rule, the utterance “I need you to knock down the red tower” is handled in the follow-
ing manner:
Parse Result:
Stmt(commX,self,need(commX,knockedDown(self,red(tower))),{})
where γ denotes the item being requested, and (3) rules defining when an agent is potentially obli-
gated to perform an action or achieve some goal state, such as:
Parse Result:
Stmt(commX,self,can(self,knockedDown(self,red(tower))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX,bel(self,can(self,knockedDown(self,red(tower)))))
Statements such as “We’re going to X” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
Parse Result:
Stmt(commX,self,goingTo(we,knockedDown(self,red(tower))),{})
want(commX,knockedDown(self,red(tower)))
⇒ goal(self,knockedDown(self,red(tower)))
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX,bel(self,goingTo(we,knockedDown(self,red(tower)))))
The general literal interpretation comes from a general pragmatic rule of the form:
Using these rules, the utterance “You’re going to have to knock down the red tower” is processed
in the following manner:
Parse Result:
Stmt(commX,self,goingTo(self,obl(self,knockedDown(self,red(tower)))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX, bel(self, goingTo(self,knockedDown(self,red(tower)))))
Suggestions such as “Why don’t you X” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
The general, literal interpretation comes from a general pragmatic rule of the form:
Using these rules, the utterance “Why don’t you knock down the red tower” is processed in the
following manner:
Parse Result:
AskWH(commX,self,why(not(knockedDown(self,red(tower)))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX,informref(self,commX,why(not(knockedDown(self,red(tower))))))
Suggestions such as “Maybe X?” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
The general, non-ISA interpretation is generated by Rule 2 in Section 4. Using these rules, the
utterance “Maybe knock down the red tower?” is processed in the following manner:
Parse Result:
AskYN(commX,self,maybe(knockedDown(self,red(tower))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
itk(commX,maybe(knockedDown(self,red(tower))))
Suggestions such as “Let’s try X?” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
The general, non-ISA interpretation is handled by Rule 1 in Section 4. Using these rules, the
utterance “Let’s try knocking down the red tower?” is processed in the following manner:
Parse Result:
Stmt(commX,self,let(us,try(knockedDown(self,red(tower)))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX,bel(self,let(us,try(knockedDown(self,red(tower))))))
The general, non-ISA interpretation is again handled by Rule 1 in Section 4. Using these rules,
the utterance “I will have a coffee” is processed in the following manner:
Parse Result:
Stmt(commX,self,will(commX,have(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
want(commX,bel(self,will(commX,have(commX,coffee))))
Statements of the form “I’ll get X” were handled through the following rule, where c =
{potentiallyObl(H, X)}.
The general, non-ISA interpretation is again handled by Rule 1 in Section 4. Using these rules,
the utterance “I will get a coffee” is processed in the following manner:
Parse Result:
Stmt(commX,self,will(commX,get(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
want(commX,bel(self,will(commX,get(commX,coffee))))
Statements of the form “I’ll take X” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
The general, non-ISA interpretation is again handled by Rule 1 in Section 4. Using these rules,
the utterance “I will take a coffee” is processed in the following manner:
Parse Result:
Stmt(commX,self,will(commX,take(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
want(commX,bel(self,will(commX,take(commX,coffee))))
Parse Result:
AskYN(commX,self,will(self,served(self,commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,served(self,commX,coffee))):
itk(commX, will(self, served(self,commX,coffee)))
Parse Result:
Stmt(commX,self,obl(self,served(self,commX,coffee)),{})
want(commX,served(self,commX,coffee))
⇒goal(self,served(self,commX,coffee))
If ¬ bel(self,potentiallyObl(self,served(self,commX,coffee))):
want(commX,bel(self,obl(self,served(self,commX,coffee))))
The general, non-ISA interpretation is handled by Rule 2 in Section 4. Using these rules, the utter-
ance “Can I have a coffee?” is processed in the following manner:
Parse Result:
AskYN(commX,self,can(commX,have(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
itk(commX, capableOf(commX, have(commX, coffee)))
Questions of the form “Could I X” were handled through the following rule, where c =
{potentiallyObl(H, X)}:
The general, non-ISA interpretation is handled by Rule 2 in Section 4. Using these rules, the
utterance “Could I have a coffee” is processed in the following manner:
Parse Result:
AskYN(commX,self,could(commX,have(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
itk(commX, could(commX, have(commX, coffee)))
Parse Result:
AskYN(commX,self,may(commX,have(commX,coffee)),{})
If ¬ bel(self,potentiallyObl(self,have(commX,coffee))):
itk(commX, may(commX, have(commX, coffee)))
Parse Result:
Stmt(commX,self,need(commX,have(commX,coffee)),{})
Statements of the form “I want X” were handled through the literal statement case (Rule 1 in
Section 4).
Using this rule, the utterance “I want a coffee” is processed in the following manner:
Parse Result:
Stmt(commX,self,want(commX,have(commX,coffee)),{})
Statements of the form “I would like X” and “I would love X” were handled through the
following rules, where c = {potentiallyObl(H, X)}:
The general, non-ISA interpretation is handled by Rule 1 in Section 4. Using these rules, the
utterance “I would like a coffee” is processed in a similar manner to the above cases.
Parse Result:
Stmt(commX,self,shouldBe(red(tower),knockedDown(red(tower))),{})
If ¬ bel(self,potentiallyObl(self,knockedDown(self,red(tower)))):
want(commX,bel(self,shouldBe(red(tower),knockedDown(red(tower)))))
6. General Discussion
Indirect speech acts are an integral part of human-human communication. The ability to commu-
nicate our intentions indirectly allows us, as just one example, to better achieve our goals through
the help of others, without straining our social relationships. As robots’ capabilities increase, so too will their perceived status as agents. With this increase in perceived agency, it will
become increasingly difficult for us to avoid carrying over behaviors such as ISAs from our social
interactions with humans into our interactions with robots. And it will be just as hard not to make
pejorative inferences about robots when they speak or act in ways that unknowingly violate those
social conventions.
While it has been suspected for some time that we are rapidly approaching this point, the exper-
imental work we have presented provides the first empirical evidence that we have reached it. For
many years, the lack of good speech recognition has been the primary obstacle on the path to natural
human-robot dialogue. This year, for the first time, word error rates on the Switchboard Corpus are
dipping below double digits (Saon, Sercu, Rennie, & Kuo, 2016). While the state-of-the-art word
error rate of 6.9% is still too high for natural human-robot dialogue, it is low enough that speech
recognition can no longer be considered the main source of error in natural language understanding.
In our second experiment, a robot participating in a restaurant scenario (a domain not far removed
from the domains of interest for many HRI researchers) suffered a 28% utterance error rate due to its
inability to understand indirect speech acts. We take this as evidence that natural-language capable
robots employed in realistic task-based environments will increasingly find not speech recognition
errors, but semantic and pragmatic errors, to be the dominant source of error in their interactions.
and generation systems that facilitate task-based human-robot interactions (Lemaignan, Ros, Sis-
bot, Alami, & Beetz, 2012; She et al., 2014). Some even focus on issues of mental modeling and
perspective-taking (Warnier, Guitton, Lemaignan, & Alami, 2012). However, these architectures do
not address issues such as ISA understanding.
The architectures that do attempt to enable robots to understand ISAs (Williams, Briggs, Oost-
erveld, & Scheutz, 2015; Wilske & Kruijff, 2006), however, do not recognize dialogue obligations
generated by the literal/surface forms of the received utterances in addition to those generated by
non-literal semantics. Instead, they seek only to detect non-literal directives and then respond accordingly (responding either to the literal non-directive reading or to the non-literal directive reading, but not to both). For example, Wilske and Kruijff (2006) utilize this behavior pattern to avoid having the robot say “I’m unavailable/busy” when it receives a non-literal directive while it is in “non-servant” mode.
Instead, the robot simply responds to the literal form of the utterance in order to at least satisfy
discourse obligations (Wilske & Kruijff, 2006).
6.2.2 TRIPS
In contrast with many NL approaches implemented in robotic architectures that either ignore or leave implicit many facets of NL interaction, one of the most prevalent NL architectures that engages in explicit reasoning about dialogue is the TRIPS architecture. However, making a direct comparison
with TRIPS is somewhat tricky, as there are often no particular rules or algorithmic commitments
specified in the relevant literature. For instance, Allen et al. (2001) describe the handling of an ISA in which both the literal surface form and the non-literal interpretation are processed. However, the only detail given regarding how the literal semantics are processed is that the system recognizes and processes an obligation to “RESPOND-TO” the surface utterance. Nonetheless, we believe we can identify at least two points on which our approach is comparatively stronger.
The two components within the TRIPS architecture responsible for handling ISAs and determin-
ing the type of response entailed by the received utterance are the Interpretation Manager (IM) and
Discourse Context Component (DCC), respectively (Allen et al., 2001). According to the TRIPS
website,2 the rules found in the TRIPS-IM are described in more detail in Hinkelman and Allen (1989). The rules described in Hinkelman and Allen (1989) for identifying potential conventionalized ISAs are primarily rules pertaining to the surface features of the utterance (e.g., Is “please” used? Does the utterance align with one of Searle’s ISA forms?). However, other pieces of evidence can indicate ISAs: for instance, repeated use of the “Can you X?” construction even after the speaker has been shown that the robot is indeed capable of performing an action such as X. As such, it would make sense for the pragmatic interpreter to have a rule that formalizes the notion “If A asks B whether or not B is capable of doing X, and A already believes B is capable of doing X, then interpret this as a directive.” Yet, it is not clear that TRIPS currently has the mechanisms
to do this at the initial, rule-based level. In our architecture, a rule dependent on a belief about the
speaker’s belief is no different than any other rule, as the set of contextual constraints for the rule C
can include terms that indicate the beliefs of other agents (and the dialogue component will query
the belief modeling component to ensure these constraints are satisfied). This direct connection with
the architecture’s belief component would also enable perceptual modulation of ISA understanding. Additionally, the TRIPS architecture does not appear to model the sociolinguistic aspects of
utterances, modeled in our architecture by the θ values. These values are primarily utilized during
NL generation (influencing whether or not to generate literal or non-literal directives) and are not the
focus of this paper (see Briggs and Scheutz (2016) for more information about utterance selection
and politeness values).
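For instance, in the notation used above, the contextual constraint set of such a rule could simply include a belief-about-belief term, e.g., c = {bel(A, capableOf(B, X))}, so that “Can you X?” from A would be interpreted as a directive precisely when A is already believed to believe that B is capable of X.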
2 https://www.cs.rochester.edu/research/cisd/projects/trips/architecture/interpretation manager.html
6.2.3 Summary
In this section, we have compared our rule-based pragmatics framework to other approaches that
seek to enable robotic agents to understand ISAs. While there are various specific and subtle differ-
ences between our architecture and implementation relative to other approaches, we view our key
contribution to be a framework and set of representations that bind utterances (speech act repre-
sentations) with a meaning representation that is much richer than what is found in other systems
and allows for reversing the direction of information flow in a straightforward manner. Not only
are we interested in simply solving the problem of associating the right intended meaning with the
observed utterance in the present context, but we are interested in associating the utterance with
additional semantic information that allows for distinguishing and ranking different utterance forms
during NL generation. The rules and reasoning processes used in other approaches to identify and
interpret ISAs do not inform the generation process (e.g., TRIPS contains a rule to interpret utter-
ances with “please” as directives, but this does not inform when the system should or should not use
this politeness softener).
7. Conclusion
We have presented experimental evidence that humans make frequent use of ISAs when commu-
nicating with robots, even in the simplest of task-based interaction contexts. This use of ISAs,
however, poses a significant challenge for future cognitive robotic architectures as they will have to
be able to make sense of such utterances. To address this challenge, we have presented mechanisms
for automatically understanding ISAs in different HRI contexts. Our findings provide both justification for previous and ongoing research in AI and HRI on interpreting ISAs and motivation for future research. First, it will be important to continue development of ISA-understanding algorithms
that increase both the breadth of handled ISA forms as well as the efficiency and accuracy of inten-
tion understanding. Second, it will also be important to develop mechanisms that allow robots to
automatically learn ISA understanding rules, based on an analysis of previous dialogue, reinforce-
ment signals, and direct explanations. Third, future work should determine how to best generate
clarification requests when a robot is unsure if it correctly interpreted an utterance, and how such
a request might be phrased so as to maximize the information that can be gleaned from an inter-
locutor’s response. Finally, HRI researchers should investigate whether the frequency or variety of
ISAs will differ when using robots with different morphologies, varying capabilities, or in different
experimental contexts.
Acknowledgments
This work was supported in part by ONR grants #N00014-14-1-0149 and #N00014-11-1-0493.
References
Allen, J. F., Byron, D. K., Dzikovska, M., Ferguson, G., Galescu, L., & Stent, A. (2001). Toward Conversational
Human-Computer Interaction. AI Magazine, 22, 27–37.
Baxter, P., Kennedy, J., Senft, E., Lemaignan, S., & Belpaeme, T. (2016). From characterising three years of
HRI to methodology and reporting recommendations. In Proceedings of the Eleventh ACM/IEEE International Conference on Human-Robot Interaction (pp. 391–398). Christchurch, NZ. doi:10.1109/HRI.2016.7451777.
Bratman, M. (1987). Intention, plans, and practical reason. Stanford, CA: Center for the Study of Language
and Information.
Briggs, G., & Scheutz, M. (2011). Facilitating Mental Modeling in Collaborative Human-Robot Interaction
through Adverbial Cues. In Proceedings of the SIGDIAL 2011 Conference (pp. 239–247). Portland, Oregon:
Association for Computational Linguistics.
Briggs, G., & Scheutz, M. (2013). A hybrid architectural approach to understanding and appropriately gener-
ating indirect speech acts. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelli-
gence (pp. 1213–1219). Bellevue, WA.
Briggs, G., & Scheutz, M. (2014). How robots can affect human behavior: Investigating the effects
of robotic displays of protest and distress. International Journal of Social Robotics, 6, 343–355.
doi:10.1007/s12369-014-0235-1.
Briggs, G., & Scheutz, M. (2015). “Sorry, I can’t do that”: Developing mechanisms to appropriately re-
ject directives in human-robot interactions. In AAAI Fall Symposium Series: Artificial Intelligence for
Human-Robot Interaction (pp. 32–36). Arlington, VA.
Briggs, G., & Scheutz, M. (2016). The pragmatic social robot: Toward socially-sensitive utterance generation
in human-robot interactions. In AAAI Fall Symposium Series: Artificial Intelligence for Human-Robot
Interaction (pp. 12–15). Arlington, VA.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge, UK: Cambridge University Press.
Clark, H. H. (1996). Using language. New York, NY: Cambridge University Press.
doi:10.1017/cbo9780511620539.
Clark, H. H., & Schunk, D. H. (1980). Polite responses to polite requests. Cognition, 8, 111–143.
doi:10.1016/0010-0277(80)90009-8.
Gibbs Jr., R. W. (1986). What makes some indirect speech acts conventional? Journal of Memory and
Language, 25, 181–196. doi:10.1016/0749-596x(86)90028-8.
Gibbs Jr., R. W., & Mueller, R. A. (1988). Conversational sequences and preference for indirect speech acts.
Discourse Processes, 11, 101–116. doi:10.1080/01638538809544693.
Gomez, R., Kawahara, T., Nakamura, K., & Nakadai, K. (2012). Multi-party human-robot interaction with
distant-talking speech recognition. In Proceedings of the Seventh Annual ACM/IEEE International Con-
ference on Human-Robot Interaction (pp. 439–446). Boston, MA. doi:10.1145/2157689.2157835.
Gopalan, N., & Tellex, S. (2015). Modeling and solving human-robot collaborative tasks using POMDPs.
In Robotics: Science and Systems: Workshop on Model Learning for Human-Robot Communication.
Rome, Italy.
Hinds, P. J., Roberts, T. L., & Jones, H. (2004). Whose job is it anyway? A study of
human-robot interaction in a collaborative task. Human-Computer Interaction, 19, 151–181. doi:10.1207/s15327051hci1901&2_7.
Hinkelman, E. A., & Allen, J. F. (1989). Two constraints on speech act ambiguity. In Proceedings of the
27th Annual Meeting on Association for Computational Linguistics (pp. 212–219). Vancouver, Canada.
doi:10.3115/981623.981649.
Lemaignan, S., Ros, R., Sisbot, E. A., Alami, R., & Beetz, M. (2012). Grounding the interaction: Anchoring
situated discourse in everyday human-robot interaction. International Journal of Social Robotics, 4,
181–199. doi:10.1007/s12369-011-0123-x.
Perrault, C. R., & Allen, J. F. (1980). A plan-based analysis of indirect speech acts. Computational Linguistics,
6, 167–182.
Rich, C., Ponsler, B., Holroyd, A., & Sidner, C. L. (2010). Recognizing engagement in human-robot interaction.
In Proceedings of the Fifth ACM/IEEE International Conference on Human-Robot Interaction (pp. 375–
382). Osaka, Japan. doi:10.1109/hri.2010.5453163.
Saon, G., Sercu, T., Rennie, S. J., & Kuo, H. J. (2016). The IBM 2016 English Conversational Telephone
Speech Recognition System. CoRR, abs/1604.08242, 1–5.
Schermerhorn, P., Kramer, J. F., Middendorff, C., & Scheutz, M. (2006). DIARC: A testbed for natural human-
robot interaction. In Proceedings of the AAAI Conference on Artificial Intelligence (pp. 1972–1973).
Scheutz, M., Briggs, G., Cantrell, R., Krause, E., Williams, T., & Veale, R. (2013). Novel mechanisms for
natural human-robot interactions in the DIARC architecture. In Proceedings of the AAAI Workshop on
Intelligent Robotic Systems.
Scheutz, M., Schermerhorn, P., Kramer, J., & Anderson, D. (2007). First steps toward natural human-like HRI.
Autonomous Robots, 22, 411–423.
Schlöder, J. J. (2014). Uptake, clarification and argumentation. Unpublished master’s thesis, Universiteit van
Amsterdam.
Schuirmann, D. (1981). On hypothesis-testing to determine if the mean of a normal-distribution is contained
in a known interval. Biometrics, 37(3), 617–617.
Searle, J. R. (1975). Indirect speech acts. Syntax and Semantics, 3, 59–82.
Searle, J. R. (1976). A classification of illocutionary acts. Language in Society, 5, 1–23.
doi:10.1017/s0047404500006837.
Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.
She, L., Yang, S., Cheng, Y., Jia, Y., Chai, J. Y., & Xi, N. (2014). Back to the blocks world: Learning new
actions through situated human-robot dialogue. In Proceedings of the Fifteenth Annual Meeting of the
Special Interest Group on Discourse and Dialogue (pp. 89–97). Philadelphia, PA.
Tellex, S., Thaker, P., Deits, R., Simeonov, D., Kollar, T., & Roy, N. (2013). Toward information theoretic
human-robot dialog. In Robotics: Science and Systems VIII (pp. 409–416). Cambridge, MA: The MIT
Press. doi:10.15607/rss.2012.viii.052.
Warnier, M., Guitton, J., Lemaignan, S., & Alami, R. (2012). When the robot puts itself in your shoes. Manag-
ing and exploiting human and robot beliefs. In Proceedings of the 21st IEEE International Symposium
on Robot and Human Interaction Communication (pp. 948–954). Paris, France.
Westlake, W. (1981). Bioequivalence testing–a need to rethink. Biometrics, 37, 589–594.
Williams, T., Briggs, G., Oosterveld, B., & Scheutz, M. (2015). Going beyond command-based instructions:
Extending robotic natural language interaction capabilities. In Proceedings of AAAI Conference on
Artificial Intelligence (pp. 1388–1393). Austin, TX.
Wilske, S., & Kruijff, G.-J. M. (2006). Service robots dealing with indirect speech acts. In Proceedings of
the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 4698–4703). Beijing,
China.
Young, S., Gašić, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., & Yu, K. (2010). The hid-
den information state model: A practical framework for POMDP-based spoken dialogue management.
Computer Speech & Language, 24, 150–174.
Gordon Briggs, Naval Research Laboratory, Washington, DC, USA. Email: [email protected]; Tom Williams, Tufts University, Medford, MA, USA. Email:
[email protected]; Matthias Scheutz, Tufts University, Medford, MA, USA. Email:
[email protected]