
FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions

Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson,
Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan,
Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, Oliver Lemon†
Interaction Lab, Heriot-Watt University; †Alana AI
{nc2025, fv2002, ss2022, nn2023, ak2120, rk2065, mf2034, bn25,
mm470, t.dinkar, v.t.rieser, o.lemon}@hw.ac.uk

arXiv:2308.15214v2 [cs.CL] 30 Aug 2023

Abstract

We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conversation. We deployed the system onto a Furhat robot, which is highly expressive and capable of using both verbal and nonverbal cues during interaction. The system was designed specifically for the National Robotarium to interact with visitors through natural conversations, providing them with information about the facilities, research, news, upcoming events, etc. The system utilises the state-of-the-art GPT-3.5 model to generate such information along with domain-general conversations and facial expressions based on prompt engineering.

[Figure 1: A user interacting with the FurChat System.]

1 Introduction

The progress in robotics and artificial intelligence in recent decades has led to robots being utilized beyond their conventional industrial applications. Robot receptionists are designed to interact with and assist visitors in places such as offices and hotels by providing information about the location, services, and facilities. The appropriate use of verbal and non-verbal cues is very important for the robot's interaction with humans (Mavridis, 2015). Most research in the field has focused on developing domain-specific conversation systems, with little exploration of open-domain dialogue for social robots. Conventional agents are often rule-based, relying on pre-programmed commands and keywords. This limits their interaction with humans to little or no freedom of choice in answers (Tudor Car et al., 2020). The advancement of large language models (LLMs) in the past year has brought an exciting revolution to the field of natural language processing. With the development of models like GPT-3.5 (https://platform.openai.com/docs/models/gpt-3-5), we have seen unprecedented progress in tasks such as question answering and text summarization (Brown et al., 2020). However, a question remains about how to successfully leverage the capabilities of LLMs to create systems that can go from closed-domain to open-domain dialogue, while also considering the embodiment of the system.

In this work, we present FurChat (a demonstration video of the system is available here), an embodied conversational agent that utilises the latest advances in LLMs to create a more natural conversational experience. The system seamlessly combines open and closed-domain dialogues with emotive facial expressions, resulting in an engaging and personalised interaction for users. The system was initially designed and developed to serve as a receptionist for the National Robotarium, in continuation of the multi-party interactive model developed by Moujahid et al. (2022b), and its deployment shows promise in other areas due to the LLM's versatile capabilities. As a result, the system is not limited to the designated receptionist role, but can also engage in open-domain conversations, enhancing its potential as a multifunctional conversational agent. We demonstrate the proposed conversational system on a Furhat robot (Al Moubayed et al., 2013), developed by the Swedish firm Furhat Robotics (https://furhatrobotics.com/). With FurChat, we demonstrate the potential of LLMs for creating more natural and intuitive conversations with robots.

2 Furhat Robot

Furhat is a social robot created by Furhat Robotics. To interact with humans naturally and intuitively, the robot employs advanced conversational AI and expressive facial expressions. A three-dimensional mask that mimics a human face is projected with an animated face using a microprojector (Al Moubayed et al., 2013). A motorised platform supports the robot's neck and head, allowing the head to spin and nod. To identify and react to human speech, it has a microphone array and speakers. Due to its human-like appearance, Furhat is prone to the uncanny valley effect (Ågren and Silvervarg, 2022).

[Figure 2: System Architecture of the current FurChat system.]

3 System Architecture

As shown in Figure 2, the system architecture represents a conversational system that enables users to interact with a robot through spoken language. The system involves multiple components: automatic speech recognition (ASR) for converting user speech to text, natural language understanding (NLU) for processing and interpreting the text, a dialogue manager (DM) for managing the interaction flow, and natural language generation (NLG) powered by GPT-3.5 for generating natural-sounding responses (Ross et al., 2023). The generated text is then converted back to speech using text-to-speech (TTS) technology and played through the robot's speaker to complete the interaction loop. The system relies on a database to retrieve relevant data based on the user's intent.

3.1 Speech Recognition

The current system uses the Google Cloud Speech-to-Text (https://cloud.google.com/speech-to-text) module for ASR. This module, which transcribes spoken words into text using machine learning algorithms, is integrated into the system by default through the Furhat SDK.

3.2 Dialogue Management

Dialogue Management consists of three sub-modules: NLU, DM, and database storage. The NLU component analyses the incoming text from
the ASR module and, through machine learning techniques, breaks it down into a structured set of definitions (Otter et al., 2021). The FurhatOS provides an NLU model to classify the text into intents based on a confidence score. We provide multiple custom intents for identifying closed-domain intents using Furhat's NLU capabilities.

The built-in dialogue manager in the Furhat SDK is responsible for maintaining the flow of conversation and managing the dialogue state based on the intents identified by the NLU component. This module is responsible for sending the appropriate prompt to the LLM, receiving a candidate response from the model, and subsequently processing the response to add the desired facial gestures (see §3.4).

An open challenge faced by present-day LLMs is the hallucination of non-factual content, which potentially undermines user trust and raises safety concerns. While we cannot fully mitigate hallucinated content in the generated responses, to tone down this effect we create a custom database following suggestions from Kumar (2023). We do so by manually web-scraping the website of the National Robotarium (https://thenationalrobotarium.com/). The database consists of a dictionary of items with the intents as keys and the scraped data as values. When an appropriate intent is triggered, the dialogue manager accesses the database to retrieve the scraped data, which is then sent with the prompt (further details in §3.3) to elicit a response from the LLM.

3.3 Prompt Engineering for NLG

The NLG module is responsible for generating a response based on the request from the dialogue manager. Prompt engineering is used to elicit an appropriate-sounding response from the LLM, which generates natural dialogue that results in engaging conversations with humans. The current system uses text-davinci-003, one of the most powerful models in the GPT-3.5 series, priced at $0.0200 per 1000 tokens.

Producing relevant responses was achieved by combining few-shot learning with prompt engineering, which enabled us to try different variations of these techniques and produce a variety of outputs from the LLM.

During prompt engineering, the personality of the robot and the context of the application are described, along with the last few turns of dialogue history and the scraped data from the database, in a particular response format. Moreover, the prompt engineering methodology involves using the LLM to generate an appropriate emoticon based on the conversation. In the context of emotional expression during an interaction, selecting an appropriate emoticon depends on understanding the underlying emotions being conveyed by the visitors and adhering to the display rules of the specific social situation. If the dialogue reflects joy or humor, a happy facial gesture might be fitting; if the conversation conveys empathy or sadness, a sad face could be more suitable. These emoticons are then integrated with the robot's facial gestures to generate facial expressions (see §3.4), thereby enabling a text-based LLM to be integrated into the embodied Furhat robot. The explicit specification of the personality and context in the prompt aids in creating a natural conversation between the robot and the human that is coherent and relevant to the topic. The sample format of the prompt used is as follows:

This is a conversation with a robot receptionist, <Robot Personality>, <Data from the Database>, <Dialogue history>, <Response Format along with sample emoticons>.

3.4 Gesture Parsing

The Furhat SDK offers a range of built-in facial gestures that can be extended with custom facial gestures to meet specific needs. The latest GPT models have the ability to recognise emotions and sentiments from text, which is used in the system (Leung et al., 2023). Rather than simply recognising sentiments in the text, the model is tasked with generating appropriate emotions for the conversation from the text. After receiving the response from the model, the matched conditional clause in the dialogue manager will trigger an expression from the pre-developed set of gestures, which is played along with the generated speech.

3.5 Text-To-Speech Generation

For converting text to speech, the Amazon Polly (https://docs.aws.amazon.com/polly/) service is used. This service is available within FurhatOS by default.

4 Example Dialogue

An example dialogue between a human and the robot is illustrated in Table 1.
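To make the flow of §3.2–3.4 concrete, the dialogue manager's retrieval-and-prompting step can be sketched as follows. This is a minimal illustration only: the intent names, database entries, emoticon markers, and gesture names are hypothetical, and the GPT-3.5 call itself is omitted.

```python
import re

# Hypothetical closed-domain database (Section 3.2): intents as keys,
# web-scraped text as values. These entries are placeholders, not the real data.
DATABASE = {
    "about_facility": "The National Robotarium is a research facility in Edinburgh.",
    "healthcare_research": "The SPRING project develops socially assistive robots.",
}

# Hypothetical mapping from emoticons in the LLM output to Furhat gesture names.
EMOTICON_TO_GESTURE = {":)": "BigSmile", ":(": "ExpressSad", ";)": "Wink"}


def build_prompt(personality, intent, history):
    """Assemble a prompt in the spirit of the sample format in Section 3.3:
    personality, database data, dialogue history, and a response format
    asking the model to append an emoticon."""
    facts = DATABASE.get(intent, "")  # empty string for open-domain turns
    return (
        "This is a conversation with a robot receptionist. "
        f"{personality}\n"
        f"Facts: {facts}\n"
        + "\n".join(history)
        + "\nReply briefly and end with one emoticon, "
          "e.g. :) for joy or :( for sadness.\nRobot:"
    )


def parse_response(llm_output):
    """Split the generated speech from the trailing emoticon (Section 3.4);
    the emoticon selects a pre-developed gesture to play with the speech."""
    match = re.search(r"(:\)|:\(|;\))\s*$", llm_output)
    if match is None:
        return llm_output.strip(), None
    text = llm_output[: match.start()].strip()
    return text, EMOTICON_TO_GESTURE.get(match.group(1))
```

For example, `parse_response("Sure, happy to help! :)")` returns the speech "Sure, happy to help!" together with the gesture name "BigSmile", which a dialogue manager could then trigger alongside TTS playback.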
At the outset, the robot remains in an idle state. Once the user enters the vicinity of the robot, the conversation begins.
R: [robot] Hello, I am the Receptionist here at the National Robotarium. Would you like to know about this facility? <Robot smiles>
U: [user] Yes, tell me about this facility.
R: [robot] Sure. <Robot smiles>
R: [robot] The National Robotarium is a state-of-the-art research facility located on the Heriot-Watt campus in Edinburgh. . . .
U: [user] That's great. Is there any ongoing healthcare researches here
R: [robot] Yes. <Robot smiles> The National Robotarium is part of a multimillion-pound collaborative project called SPRING and is focused on developing socially assistive robots for healthcare settings. SPRING will focus on supporting elderly patients and improving psychological well-being and the relationship between patients and hospital professionals. <Robot nod>
U: [user] That's nice. Can you tell a joke?
R: [robot] <Robot smiles> Sure, why did the robot go to the bar? <Robot winks> To get a screwdriver!
U: [user] Can you suggest me a movie about robots?
R: [robot] Sure thing! <Robot smiles> How about "I, Robot"? It's a classic that deals with the exploration of robotics and AI. <Robot nod>

Table 1: Sample conversation between the user and the robot. For a full system description, please refer to §3.

5 Conclusions and Future Work

We demonstrate FurChat, an embodied conversational agent with open and closed-domain dialogue generation and facial expressions generated through LLMs, on a social robot in a receptionist environment. The system is developed by integrating the state-of-the-art GPT-3.5 model on top of the Furhat SDK. The proposed system uses a one-to-one method of communication with visitors. We plan to extend the system to handle multi-party interaction (Moujahid et al., 2022a; Addlesee et al., 2023; Lemon, 2022; Gunson et al., 2022), which is an active research topic in developing receptionist robots. It is also crucial to address the issue of hallucination from the large language model; this problem can be mitigated by fine-tuning the language model and directly generating conversations from it without relying on any NLU components, which we plan to implement in the future.

We plan to showcase the system on the Furhat robot during the SIGDIAL conference to all the attendees and show them the capabilities of using LLMs for dialogue and facial expression generation as described in this paper.

Acknowledgements

This research has been funded by the EU H2020 program under grant agreement no. 871245 (http://spring-h2020.eu/) and the EPSRC project 'Gender Bias in Conversational AI' (EP/T023767/1).

References

Angus Addlesee, Weronika Sieinska, Nancie Gunson, Daniel Hernandez Garcia, Christian Dondrup, and Oliver Lemon. 2023. Data collection for multi-party task-based dialogue in social robotics. In IWSDS 2023: International Workshop on Spoken Dialogue Systems Technology.

Isabella Ågren and Annika Silvervarg. 2022. Exploring humanlikeness and the uncanny valley with Furhat. In Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents, pages 1–3.

Samer Al Moubayed, Jonas Beskow, and Gabriel Skantze. 2013. The Furhat social companion talking head. In INTERSPEECH, pages 747–749.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.

Nancie Gunson, Daniel Hernández García, Weronika Sieińska, Christian Dondrup, and Oliver Lemon. 2022. Developing a social conversational robot for the hospital waiting room. In 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pages 1352–1357. IEEE.

Krishna Kumar. 2023. Geotechnical Parrot Tales (GPT): Overcoming GPT hallucinations with prompt engineering for geotechnical applications. arXiv preprint arXiv:2304.02138.

Oliver Lemon. 2022. Conversational AI for multi-agent communication in Natural Language. AI Communications, 35(4):295–308.

John Kalung Leung, Igor Griva, William G Kennedy, Jason M Kinser, Sohyun Park, and Seo Young Lee. 2023. The application of affective measures in text-based emotion aware recommender systems. arXiv preprint arXiv:2305.04796.

Nikolaos Mavridis. 2015. A review of verbal and non-verbal human–robot interactive communication. Robotics and Autonomous Systems, 63:22–35.

Meriam Moujahid, Helen Hastie, and Oliver Lemon. 2022a. Multi-party interaction with a robot receptionist. In Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, HRI '22, pages 927–931. IEEE Press.
Meriam Moujahid, Bruce Wilson, Helen Hastie, and Oliver Lemon. 2022b. Demonstration of a robot receptionist with multi-party situated interaction. In Proceedings of the 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 1202–1203. IEEE.

Daniel W. Otter, Julian R. Medina, and Jugal K. Kalita. 2021. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2):604–624.

Steven I. Ross, Fernando Martinez, Stephanie Houde, Michael Muller, and Justin D. Weisz. 2023. The programmer's assistant: Conversational interaction with a large language model for software development. In IUI '23, pages 491–514, New York, NY, USA. Association for Computing Machinery.

Lorainne Tudor Car, Dhakshenya Ardhithy Dhinagaran, Bhone Myint Kyaw, Tobias Kowatsch, Shafiq Joty, Yin-Leng Theng, and Rifat Atun. 2020. Conversational agents in health care: Scoping review and conceptual analysis. J Med Internet Res, 22(8):e17158.
