Voice_Assistant_vs_Chatbot_-_Examining_the_Fit_Bet (1)
https://doi.org/10.1007/s10796-021-10226-5
Abstract
Owing to technological advancements in artificial intelligence, voice assistants (VAs) offer speech as a new interaction
modality. Compared to text-based interaction, speech is natural and intuitive, which is why companies use VAs in customer
service. However, we do not yet know for which kinds of tasks speech is beneficial. Drawing on task-technology fit theory,
we present a research model to examine the applicability of VAs to different tasks. To test this model, we conducted a
laboratory experiment with 116 participants who had to complete an information search task with a VA or a chatbot. The
results show that speech exhibits higher perceived efficiency, lower cognitive effort, higher enjoyment, and higher service
satisfaction than text-based interaction. We also find that these effects depend on the task’s goal-directedness. These findings
extend task-technology fit theory to customers’ choice of interaction modalities and inform practitioners about the use
of VAs for information search tasks.
Keywords voice assistant · conversational agent · speech interaction · cognitive fit · customer service
* Christine Rzepka
Benedikt Berger
Thomas Hess

1 Institute for Digital Management and New Media, Ludwig-Maximilians-Universität München (LMU Munich), Ludwigstr. 28, 80539 Munich, Germany
2 Department of Information Systems, University of Münster, Leonardo-Campus 11, 48149 Münster, Germany
3 Institute for Digital Management and New Media, Ludwig-Maximilians-Universität München (LMU Munich), Ludwigstr. 28, 80539 Munich, Germany

1 Introduction

With rapid advances in artificial intelligence, specifically natural language processing, speech holds promise as an interaction modality between humans and computers (Enholm et al., 2021; Hirschberg & Manning, 2015). Voice assistants (VAs) enable speech interaction with existing devices (e.g., smartphones) and serve as the main interface of new devices (e.g., smart speakers). Similar to chatbots, which interact in written form, VAs engage the user in conversations that are set up to assist in various tasks, such as the answering of questions and smart home control (Hoy, 2018). Both VAs and chatbots can be referred to as conversational agents (CAs) (Diederich et al., 2022; Gnewuch et al., 2017). Companies increasingly adopt CAs in banking, healthcare, and e-commerce, with customer service being a key use case (Behera et al., 2021; Woodford, 2020). A customer service encounter describes “any customer-company interaction that results from a service system that is comprised of interrelated technologies, human actors (employees and customers), physical/digital environments and company/customer processes” (Larivière et al., 2017, p. 2). CAs enable cost savings while still providing human-like interaction, thereby bridging the existing service provision gap between frontline employees and self-service technologies (Adam et al., 2021; Larivière et al., 2017). However, the availability of speech- and text-based conversational interfaces to communicate with companies raises questions about the impact of the interaction modalities on customer service encounters (De Keyser et al., 2019).

Extant literature has examined customers’ satisfaction with service encounters to assess whether offering a CA is beneficial (Diederich et al., 2019; Diederich et al., 2020).
In the service literature, customers’ satisfaction with the service is used as a measure for the company’s success in addressing their expectations and needs (Dai & Salam, 2020; McKinney et al., 2002). To satisfy customers, service encounters must provide them with both utilitarian and hedonic value (Childers et al., 2001; Chiu et al., 2014; Jones et al., 2006). Because satisfied customers tend to continue using a service (Bhattacherjee, 2001) or to buy repeatedly (Bartl et al., 2013), satisfaction is an appropriate construct to evaluate the effect of CAs’ distinct interaction modalities on service encounters. Indeed, existing research comparing speech- and text-based interactions shows that speaking generates positive attitudes toward the CA (Cho et al., 2019; Novielli et al., 2010; Schroeder & Schroeder, 2018). These attitudes are driven by both utilitarian and hedonic values: Studies on users’ perceived efficiency find that speaking is faster than typing, which increases users’ productivity (Le Bigot et al., 2007; Ruan et al., 2017). Speech is also more intuitive and natural (Kock, 2004), increasing its ease of use and decreasing cognitive effort. Users also report simply enjoying the interaction, which stresses VAs’ hedonic benefits (Pal et al., 2020; Yang & Lee, 2019). However, research explicitly comparing speech- and text-based interactions reports inconclusive results. While speaking is faster and easier (Ruan et al., 2017), listening to the CA’s answers reduces the interaction speed, thereby increasing the relative efficiency of text-based conversations (Le Bigot et al., 2004).

Whether speaking or texting is more beneficial during a service encounter depends on the type of task the customer wants to solve. Extant service literature distinguishes between (goal-directed) searching and (experiential) browsing tasks (Hoffman & Novak, 1996; Hong et al., 2004). Depending on the customers’ goal when interacting with, e.g., a chatbot, the CA’s conversational style should be more or less friendly (Chen et al., 2021). Initial research on speech- and text-based CAs found that users’ preference for one interaction modality over another can change depending on the task and context of the interaction. Krämer et al. (2009) show that users’ preference for text interaction decreased after service failures, while speech interaction preferences increased. Cho et al. (2019) find that attributing a human likeness mediates users’ attitudes in utilitarian tasks but not in hedonic tasks. Additionally, VA users frequently adopt VAs for simple routine tasks, but remain reluctant to use them for more complex tasks (Ma & Liu, 2020). Consequently, no interaction modality is superior to the other. While previous research has examined the fit between specified tasks and technologies to support these tasks (Goodhue & Thompson, 1995), the literature has not yet considered differences between speech- and text-based interaction for information search tasks. Against this background, we aim to answer the following research question:

How do CAs’ interaction modalities fit different information search tasks in the service encounter?

To address this research question, we investigate the applicability of speech- and text-based conversational interaction to different information search task types. Because speech- and text-based interactions differ in the way humans produce, transmit, and process information, we use cognitive load theory and cognitive fit theory (Vessey & Galletta, 1991) as underlying theories to derive our hypotheses. Based on this theoretical foundation, task-technology fit theory (Goodhue & Thompson, 1995) guides the inclusion of the task type in the research model. We conduct a 2 (interaction modality: speech vs. text) × 2 (search task type: goal-directed vs. experiential) between-subject laboratory experiment to test our research model. The results show that the search task’s perceived goal-directedness moderates the interaction modality’s effect on users’ perceptions of the interaction. Thereby, our study extends cognitive fit theory to conversational information requests and presentations and task-technology fit theory to users’ choice of interaction modalities. Our findings further inform practitioners about the benefits of speech-based interactions over text-based interactions, and of their effects on customers’ satisfaction. Considering that visual and auditory modalities suit distinct types of information and tasks in different ways, it is essential that service providers understand how different tasks and interaction modalities jointly affect customers’ satisfaction in the service encounter (De Keyser et al., 2019; Lee et al., 2001). In this way, depending on the task’s goal-directedness, system designers can learn when and how the offering of speech as an interaction modality is beneficial.

This paper is structured as follows: In section 2, as the conceptual background of our study, we provide an overview of extant research on CAs and the relevant theories. Section 3 introduces the research model and derives the underlying hypotheses. We describe the experiment including its procedure, measurements, and data collection process in section 4 and report our analyses’ results in section 5. Section 6 presents the main findings and indicates their implications in the light of extant literature. The paper briefly concludes in section 7.

2 Conceptual Background

2.1 Voice Assistants, Conversational Agents, and the Service Encounter

Although first efforts to build speech-based systems date back to 1950, the diffusion of VAs was only recently enabled by advancements in natural language processing and the availability of large volumes of linguistic data (Hinton et al., 2012; Hirschberg & Manning, 2015). VAs minimally
consist of an automatic speech recognizer that interprets human speech, a dialogue manager that determines and performs the requested action, and text-to-speech synthesis that responds in a spoken form (Hirschberg & Manning, 2015). While speech- and text-based CAs face the same challenges in interpreting language, advances in artificial intelligence enabled VAs to successfully map sound signals to sequences of words, thereby overcoming what is arguably the biggest difficulty in attaching meaning to speech (Hinton et al., 2012). Various VAs followed Apple’s launch of Siri in 2011, each offering a new interaction modality using speech as input and output to represent information (Hoy, 2018).

Existing information systems (IS) research on VA use distinguishes between utilitarian and hedonic drivers and inhibitors. As speech is a natural and intuitive way of interacting (Kock, 2004), various studies stress its utilitarian benefits of convenience, ease of use, and time saving through hands-free possibilities (Balakrishnan et al., 2021; McLean & Osei-Frimpong, 2019). In addition, users enjoy talking to their VAs, thus deriving hedonic value from their interactions (Pal et al., 2020; Rzepka, 2019; Yang & Lee, 2019). Although users’ enjoyment could be attributed to the newness of the technology (McLean & Osei-Frimpong, 2019), users may also enjoy the social aspect of engaging with a VA. Extant research shows that VAs trigger users’ perceptions of human likeness (Cho et al., 2019) and social presence (McLean & Osei-Frimpong, 2019), which also encourages more personal dialogues (Novielli et al., 2010). Users’ perceptions of human likeness are driven by VAs’ capabilities to communicate in natural and interactive ways, e.g., through synthesized speech, and to answer in a responsive manner as they are able to process human speech (Li, 2015). Extant research shows that these social cues elicit social responses from users who perceive the IT system as a social actor (Moon, 2000; Nass & Moon, 2000). Social cues not only promote use intentions (McLean & Osei-Frimpong, 2019), but also attenuate privacy concerns that present a major adoption barrier (Benlian et al., 2020). These cues are subject to various IS studies on text-based CAs (i.e., chatbots) (e.g., Adam et al., 2021; Diederich et al., 2019; Gnewuch et al., 2017). However, direct comparisons suggest that speech differs from text in its impact on users’ attitudes (Cho et al., 2019; Novielli et al., 2010) and task performance (Le Bigot et al., 2004; Le Bigot et al., 2007). This raises the need for further research on the hedonic and utilitarian differences between chatbot and VA use.

The question regarding differences between speech and text specifically becomes relevant in domains where humans can choose between interaction modalities, e.g., service encounters (De Keyser et al., 2019; Larivière et al., 2017). The application of CAs in this context is promising as they offer the customer efficient, yet still personalized service, thus addressing the key challenges of customer service: service efficiency and service quality (Adam et al., 2021; De Keyser et al., 2019). As indicated, CAs provide both utilitarian and hedonic benefits, which drive customers’ satisfaction with the service and related outcomes (Childers et al., 2001; Chiu et al., 2014; Jones et al., 2006). While Mero (2018) shows that a text-based CA’s interactivity has positive effects on customers’ satisfaction, prior research on haptic interfaces emphasizes that the interaction modality can be as important as the content itself (Brasel & Gips, 2014). For instance, extant studies comparing speech and manual expression modalities show that speech reduces customers’ self-control, leading to different product choices (Klesse et al., 2015) and consumption behaviors (Son & Oh, 2018).

CAs are implemented in multiple steps and tasks along the service process, starting with the information search and continuing through purchasing products to filing a complaint. In this context, extant research identified two major search task types that customers conduct online: goal-directed (searching) tasks and experiential (browsing) tasks (Hong et al., 2004). Searching is characterized by a clearly defined goal and is extrinsically motivated. In contrast, browsing is exploratory and non-directed, being intrinsically motivated (Hoffman & Novak, 1996). These two task types describe two instantiations of the specificity of customers’ search objectives, which translate into a continuum of the customers’ perceived goal-directedness. Although customers may have already successfully adopted CAs for tasks they perceive as highly goal-directed, they show a reluctance when perceiving tasks as less goal-directed, especially when using VAs (Ma & Liu, 2020). However, until now, service literature has mainly focused on visual search behaviors and distinguished different display characteristics and information formats such as lists and matrices (e.g., Hong et al., 2004; Nadkarni & Gupta, 2007). More recently, Chen et al. (2021) distinguished between these tasks to examine the fit of a chatbot’s distinct conversational styles to the customers’ goal. Overall, little is known on the differences between spoken and written interactions and the interplay between customers’ information requests and the system’s information presentation in a two-way conversation. In order to effectively design voice-based experiential search, it is necessary to understand the differences between speech- and text-based searches for different information search tasks (Ma & Liu, 2020) and to extend initial research efforts in this field (Chen et al., 2021).

2.2 Cognitive Fit Theory and Task-Technology Fit Theory

Research in human-computer interaction and linguistics confirms that speech and text differ in their modes of production, transmission, and reception. Speaking and listening develop naturally and intuitively, while writing and reading
develop only through formal learning (Akinnaso, 1982). Being more natural than writing, speaking requires less cognitive effort (Kock, 2004; Le Bigot et al., 2007). Moreover, physical effort is reduced through hands-free use (Akinnaso, 1982). In addition, speech is of a temporal nature, requiring sequential information presentation, whereas written text can be processed in parallel as the reader can skip parts of the text (Rubin et al., 2000). Hence, depending on the information and the task, these distinct properties of speech and text can affect customers’ preferences for one or the other modality. To understand how the interplay between the interaction modality and the task type affects customers’ cognitive resources, we draw on cognitive fit theory and task-technology fit (TTF) theory.

Cognitive fit theory offers an explanation for the match or mismatch between interaction modalities and task types (Vessey & Galletta, 1991). Depending on the match between the form of information representation and the task, users’ performance within a given system could be enhanced by reducing their cognitive load when carrying out the task. However, if there is a mismatch and the users’ cognitive ability for information processing and their attention span is exceeded, they experience overload and lose information. The term “cognitive load” describes the human’s working memory capacity used to accomplish the task, depending on the task’s characteristics and the users’ cognitive resources (Paas et al., 2004). Using this concept of “fit,” Goodhue and Thompson (1995) established a generic TTF model to measure the interplay between different characteristics of the task, technology, and human through the users’ perceived TTF. The higher the discrepancy between task and technology, the lower the perceived TTF and the lower the value users get from using a technology. High TTF, by contrast, positively affects users’ utilization of the technology and performance in achieving “improved efficiency, improved effectiveness, and/or higher quality” (Goodhue & Thompson, 1995, p. 218). The TTF theory allows us to assume that either text or speech has a better fit with certain tasks, which eventually affects users’ performance. Initial findings support this assumption, showing that speech interaction is better evaluated than text-based interactions for utilitarian tasks, but not for hedonic tasks (Cho et al., 2019).

Hence, we adapt TTF and cognitive fit theory to investigate task-modality fit (Lee et al., 2001) between speech- and text-based CAs for two information search tasks. Because the interaction modality describes a distinct technical property of CAs, which is shown to affect users’ perceptions and behavior, task-modality fit describes a TTF problem. Cognitive fit theory and TTF theory are frequently used to study the adoption of diverse IT systems such as social networking sites (Lu & Yang, 2014) or green IS (Yang et al., 2018). In the service context, TTF theory has already been adapted to the whole customer journey (Wells et al., 2003; You et al., 2020) and to specific tasks such as information search (Dang et al., 2020; Hong et al., 2004). Additionally, Chen et al. (2021) apply cognitive fit theory to investigate the matching of a chatbot’s interaction style with goal-directed and experiential tasks. Based on this theoretical framework, the following section derives our research model on the applicability of CAs’ interaction modalities for distinct information search tasks.

3 Research Model and Hypotheses

The use of CAs promises benefits in terms of efficiency, specifically in the service encounter (De Keyser et al., 2019; Kraus et al., 2019). However, we do not know to what extent these benefits depend on the interaction modality. Le Bigot et al. (2004), for instance, find that text interaction is faster than speech. With current technical advancements, however, VAs offer more natural conversational interaction possibilities and faster processing speeds (Dubiel et al., 2018). Consequently, comparisons in prior literature may only hold to a certain extent (Schmitt et al., 2021). Recent qualitative investigations on VA use suggest that users particularly value the efficiency of speech interaction (Moussawi, 2018; Rzepka, 2019). From a user’s perspective, speech production is faster than writing because it does not involve the physical process of writing (Akinnaso, 1982). Hence, we assume that customers achieve and perceive more efficiency in speech interaction compared to text. At the same time, extant research shows that reading a system’s prompts requires less time than listening to them because humans can selectively retrieve relevant information from text (Rubin et al., 2000). In contrast, listening to the VA’s information in a sequential manner takes time and reduces users’ efficiency (Dubiel et al., 2020; Schmitt et al., 2021). The amount of information that users must listen to depends on the tasks they conduct. Because customers listen to less information during tasks they perceive to be more goal-directed (Hong et al., 2004), the advantages of speech interaction should be more pronounced if the level of goal-directedness is high. However, if customers want to compare and evaluate different alternatives in experiential tasks, they need to listen to more information sequentially in speech-based interactions, which reduces their efficiency compared to text-based interactions (Dubiel et al., 2020). Therefore, we hypothesize:

H1a: Customers achieve higher levels of efficiency with speech interaction than with text interaction.
H1b: Goal-directedness positively moderates the effect of interaction modality on efficiency.

Besides efficiency, cognitive effort is a suitable variable to account for differences between interaction modalities and tasks (Hong et al., 2004). While Le Bigot et al. (2004,
2007) consistently show that cognitive effort is higher for speech than for textual interaction, Ruan et al. (2017) report the opposite. Recent technological advancements and media naturalness theory offer a potential explanation for this phenomenon (Kock, 2004; Schmitt et al., 2021). Humans perceive speech interaction as more natural and intuitive, which reduces cognitive effort (Kock, 2004). However, any decrease in naturalness through speech recognition errors or synthetic speech output weakens this effect. As VAs have recently advanced technologically, we refer to recent empirical results (Dubiel et al., 2018; Foley et al., 2020; Ruan et al., 2017) and suggest that speech interaction requires less cognitive effort than text interaction. This is in line with the basic assumption of media naturalness theory and is based on the fact that speech does not require the formal written structuring of information (Akinnaso, 1982). Producing text involves a process of consciously applying intentionally learnt rules, while humans intuitively build and process speech. In particular, we expect this effect to be strong if customers have a high goal-directed perception of a task. For low goal-directedness, customers are likely to use more interaction turns to explore the CA’s information, analogous to the increasing number of clicks on a website (Hong et al., 2004). These interactions increase cognitive effort because customers need to remember the agent’s information, which becomes more difficult considering the increasing information load of multiple turns (Paas et al., 2004). Consequently, when evaluating different alternatives in experiential search, the sequential information representation in speech-based interactions applies cognitive pressure on customers as they need to retain different options in their memory (Dubiel et al., 2020). Customers can, however, selectively retrieve relevant information from the system’s written answers, which reduces their cognitive effort compared to listening to and memorizing the information given in spoken interactions (Rubin et al., 2000). Therefore, we assume speech requires more cognitive effort in experiential tasks:

H2a: Customers perceive less cognitive effort when they interact via speech compared to interacting via text.
H2b: Goal-directedness positively moderates the effect of interaction modality on cognitive effort.

Moreover, CAs promise to provide personal customer encounters, similar to relationships with human personnel (Larivière et al., 2017). With their anthropomorphic characteristics and social cues, CAs can promote feelings of humanness and social presence (Diederich et al., 2019). Speech-based interactions trigger such perceptions more than text-based interactions do, which leads to more positive (Cho et al., 2019) and personal attitudes (Novielli et al., 2010). This indicates that users prefer VAs because they enjoy the interaction itself (Rzepka, 2019). Indeed, extant research shows that perceived enjoyment has a significant positive effect on humans’ intention to use VAs (Pal et al., 2020; Yang & Lee, 2019). Hence, we propose that speech interaction will lead to greater enjoyment than textual interaction. We furthermore assume that this effect will be weaker if customers perceive tasks to be highly goal-directed because their focus on the particular goal leaves less space for enjoying the actual interaction (Hoffman & Novak, 1996). In contrast, for tasks perceived as less goal-directed, the interaction serves to explore different options. In this way, the benefits of speaking in a two-sided conversation should become more pronounced as the duration of the interactions increases and because of the hedonic nature of experiential tasks (Li et al., 2012; Novak et al., 2003). So, our next hypotheses are:

H3a: Customers perceive more enjoyment when they interact via speech, compared to interacting via text.
H3b: Goal-directedness negatively moderates the effect of interaction modality on enjoyment.

Finally, we assume that cognitive effort, efficiency, and enjoyment affect customers’ satisfaction with the service outcome, i.e., the search result. Satisfaction is mostly conceptualized as a state in which an interaction meets or exceeds a customer’s expectation (McKinney et al., 2002), and, therefore, is also dependent on the customers’ expectations of the service encounter. Technology is used in the service encounter to provide efficient service which, in turn, should enhance satisfaction with the encounter (Larivière et al., 2017). Because customers using CAs in the service encounter have similar expectations and request fast service (Kraus et al., 2019), meeting these expectations should positively affect their satisfaction. Furthermore, initial results on customers’ satisfaction determinants of voice commerce show high convenience expectations (Kraus et al., 2019), which indicates that keeping customers’ cognitive effort low promotes their satisfaction regarding service encounters via CAs. This relationship has already been examined for online shopping, showing that reducing cognitive effort through electronic decision aids has a positive impact on customers’ satisfaction (Bechwati & Xia, 2003). Hence, we assume that high cognitive effort negatively impacts customers’ satisfaction with the search outcome. Furthermore, extant research has shown that customers’ satisfaction is not only driven by extrinsic motivations, but also by intrinsic expectations (Lowry et al., 2015). Since CAs offer an enjoyable experience through their interactive conversational capabilities (Cho et al., 2019; Novielli et al., 2010), we expect that enjoyment also drives customers’ satisfaction with the service encounter:

H4a: Perceived efficiency has a positive impact on customers’ satisfaction with the service encounter.
H4b: Cognitive effort has a negative impact on customers’ satisfaction with the service encounter.
H4c: Enjoyment has a positive impact on customers’ satisfaction with the service encounter.

Figure 1 illustrates the hypotheses on the fit between interaction modality and task goal-directedness as regards their impact on customers’ satisfaction. Following our research question and the extant literature, we hypothesize that speech differs from textual interaction in its perceived efficiency, cognitive effort, and enjoyment, thus ultimately affecting customers’ satisfaction with the service encounter. However, this relationship can change depending on the search task’s perceived goal-directedness.

[Figure 1: hypothesized research model. Surviving labels: Interaction modality, Efficiency; paths H1a (+), H1b (+), H2b (+), H3b (-), H4a (+).]

4 Experimental Design

4.1 Experimental Procedure

To test our research model, we designed a 2 (interaction modality: speech vs. text) × 2 (search task type: goal-directed vs. experiential) between-subject laboratory experiment. Originally, we designed and conducted the experiment as a mixed design experiment with interaction modality serving as a between-subject factor and task type as a within-subject factor. In this experiment, we randomly assigned participants to a group who would interact either via speech or via text with a specifically created CA to solve – for experimental purposes – a single goal-directed and a single experiential search task in a randomized order. We used the randomization algorithm integrated in the survey software Qualtrics, which we also used for the instructions and questionnaires of the experiment. Unfortunately, we found confounding effects caused by the experience of the first task and, thus, had to omit the second measurement from further analyses. Because we randomized the order of the tasks, we were able to report the remaining between-subject laboratory experiment based on the data of the first task only.

All interactions took place in single-person laboratory sessions to ensure controlled and private conditions for each participant. Participants selected an appointment via an online registration form. Upon arrival, a researcher provided participants with information about the general procedure of the study and the technical devices (Laptop and CA) used in the experiment. Participants could briefly familiarize themselves with the setup and ask open questions. Thereafter, the researcher left the room but remained accessible at all times should problems occur. At this stage, participants started the experiment on the laptop and were guided through the experimental process depicted in Figure 2.
[Figure 2: experimental process. Steps: 1. General instruction; 2. CA intro; interaction modality (speech or text); 3. Training task; 4. Search task type (goal-directed or experiential); 5. Post-task questionnaire; 8. Control & demographic questions. Legend labels: Randomization, Condition.]
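The reduction from the original mixed design to a between-subject design can be illustrated with a minimal sketch. It assumes simple (unbalanced) randomization with Python's standard library; the actual study relied on the Qualtrics randomizer, whose algorithm is not described here, and the function names `assign`, `first_task_only`, and `cell_counts` are hypothetical. The participant count of 116 is taken from the abstract.

```python
import random
from collections import Counter

MODALITIES = ("speech", "text")            # between-subject factor
TASKS = ("goal-directed", "experiential")  # originally a within-subject factor


def assign(participant_ids, seed=0):
    """Randomly assign a modality and a task order to each participant (hypothetical helper)."""
    rng = random.Random(seed)
    rows = []
    for pid in participant_ids:
        order = list(TASKS)
        rng.shuffle(order)  # randomized task order, as in the mixed design
        rows.append({"id": pid, "modality": rng.choice(MODALITIES), "task_order": tuple(order)})
    return rows


def first_task_only(rows):
    """Keep only the first task per participant, which yields a 2 x 2 between-subject layout."""
    return [{"id": r["id"], "modality": r["modality"], "task": r["task_order"][0]} for r in rows]


def cell_counts(rows):
    """Count participants per (modality, task) cell."""
    return Counter((r["modality"], r["task"]) for r in rows)


if __name__ == "__main__":
    first = first_task_only(assign(range(116)))
    for cell, n in sorted(cell_counts(first).items()):
        print(cell, n)
```

Because the task order is randomized independently of the modality, the first task a participant sees is itself a random draw, so dropping the second measurement leaves each participant in exactly one of the four cells. Simple randomization as sketched here does not guarantee equal cell sizes.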
After giving general instructions on the experiment, we briefly defined CAs and asked participants to answer questions on their prior experience with CAs. Next, we randomly assigned participants to either the text or the speech condition and provided them with specific instructions on how to interact with the CA. The participants conducted an exemplary task to familiarize themselves with the system. When participants felt confident using the system, they went ahead with the actual task which, by chance, was either goal-directed or experiential. After the completion of the task, we administered a post-task questionnaire with manipulation and attention checks, and posed questions on subjective measures (perceived efficiency, cognitive effort, enjoyment, and satisfaction with the service encounter). The experiment concluded with demographic questions.

4.2 Experimental Treatments

… a recommendation for a restaurant or a café based on the participants’ preferences of cuisine, location, and price. Depending on the condition, either a smart speaker or a smartphone was placed next to the laptop. In the speech condition, participants interacted with the CA via the Google Nest Mini smart speaker by saying “ok google, start restaurant finder.” We implemented the DialogFlow agent using a demo integration for Google’s assistant application on the smart speaker. In the text condition, participants started the interaction through a messenger chat window on a Google Pixel smartphone by typing “ok google, start restaurant finder.” In this condition, we implemented the DialogFlow agent as a demo in a browser window on the smartphone. Both implementations were based on the same DialogFlow agent and, therefore, engaged the user in the exact same conversational flow. Table 2 presents an exemplary dialogue for each task.
Search task type
Goal-directed: Search for the cheapest Indian restaurant near the University
Experiential: Search for any restaurant without predefined criteria
Goal-directed task
CA: Welcome to Restaurant Finder! I am happy to give you a restaurant recommendation. What would you like to eat?
User: I am looking for an Indian restaurant near the University.
CA: Ok, I am searching for Indian food near the University. How much money would you like to spend?
User: It should be the cheapest restaurant.
CA: Alright, I have found a restaurant matching your criteria. The cheapest Indian restaurant is Sankar’s Food Corner. You can now ask for further information or close the application.

Experiential task
CA: Welcome to Restaurant Finder! I am happy to give you a restaurant recommendation. What would you like to eat?
User: I don’t know yet. What would you recommend?
CA: I am happy to give you a recommendation! Please choose a cuisine, for example, Asian, Italian, German, Greek, or Mexican food.
User: Hm, what about African options?
CA: Ok, I am searching for African food. Where would you like to eat?
User: In the city center.
CA: Ok, I’ll search in the city center. How much would you like to spend?
User: Less than 20 euros.
CA: Alright, I have found three restaurants matching your criteria. Makamba offers various African food specialties. You can now ask for further information or close the application.
User: What is the other recommendation?
CA: The second recommendation is Savanna, which offers African tapas. You can ask for further information or close the application.
User: What are the reviews? …
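The conversational flow in the exemplary dialogues can be read as a simple slot-filling loop: the CA asks for cuisine, location, and price, and skips questions the user has already answered. The following is a minimal sketch under that reading. It is not the DialogFlow agent used in the study; the slot names and the function `remaining_prompts` are hypothetical, and the question wording is taken from the experiential dialogue.

```python
from typing import Dict, List

# Slots in the order the CA asks for them in the exemplary dialogues.
# The slot names are hypothetical labels for this sketch, not DialogFlow identifiers.
PROMPTS = [
    ("cuisine", "What would you like to eat?"),
    ("location", "Where would you like to eat?"),
    ("price", "How much would you like to spend?"),
]


def remaining_prompts(slots: Dict[str, str]) -> List[str]:
    """Return the questions for all slots that are still unfilled, in order."""
    return [question for name, question in PROMPTS if not slots.get(name)]


# Goal-directed example: cuisine and location arrive in one utterance,
# so only the price question remains.
goal_directed = {"cuisine": "Indian", "location": "near the University"}
print(remaining_prompts(goal_directed))

# Experiential example: the user starts without any preference,
# so all three questions are asked.
print(remaining_prompts({}))
```

Because the same function serves both modalities, the sketch mirrors the point made above: speech and text conditions differ only in the input and output channel, not in the dialogue logic.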
a participant failed these attention checks, we excluded the data set from further analysis. As a control question, participants were asked to indicate whether they had been able to solve the task successfully. We used this question to assess the CA’s performance. The question on the scenario’s realism assessed how well participants were able to acquaint themselves with the described situation and task (Paschall et al., 2005). Finally, we asked for free form feedback on the interaction with the CA. The participants’ comments did not reveal any serious problems or misunderstandings.

4.4 Data Collection and Sample

We conducted two pre-tests to train and evaluate our CA with objective and subjective measures (Shawar & Atwell, 2007). During the first pre-test, 16 participants interacted with the CA and noted everything that came to their mind. Based on this feedback, we revised the CA and adjusted the conversational structure for a second objective evaluation. To reduce system errors in the main laboratory experiment, the aim of the second pre-test was to train the CA for variations of user requests. We therefore distributed the DialogFlow demo link to researchers and students at our university to collect real conversation data from them. Based on this conversation data, we manually assigned unmatched user queries from 64 conversations. As our CA aims to provide a specific service, we further assessed whether participants were able to solve the task. This yielded satisfactory results as 96.6% of the participants were able to find a restaurant (Shawar & Atwell, 2007).

The main data collection took place in Q4 2020. To recruit students, we advertised the laboratory experiment via student mailing lists and on site at a large public university, as is frequently done in related studies (e.g., Polites & Karahanna, 2012). To ensure completion, participation was incentivized with a five-euro reward. In succession, a total of 119 participants completed the experiment. We discarded three participants from further analysis as their attention checks and interaction logs showed that they did not follow the instructions. Of the remaining 116 participants, 60.3% were female and the average age was 24 (M=23.97; SD=4.10). Most participants were students in the field of business (63.8%), followed by medicine (6.9%) and social sciences (6.9%). 80.9% of the participants who disclosed their income earned less than €1,500 per month (5.2% chose not to disclose). Nearly all participants had previously made use of speech- or text-based CAs. More than 80% of the participants used text-based CAs at least weekly; only 4.3% indicated that they had never interacted with a chatbot. Speech-based CAs were used weekly by 74% of our participants while 7.8% had never used them. Detailed information about the sample distribution is presented in Table 7 of the appendix.

5 Results

We tested our research model using partial least squares (PLS) structural equation modelling carried out with SmartPLS3 (Ringle et al., 2015). Therefore, we determined the
Table 3 Factor loadings, internal consistency criteria, AVE, and correlation matrix
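[Figure 3: Structural model with path coefficients, e.g., speech on enjoyment (.265***) and enjoyment on satisfaction (.254***).]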
all values were below the critical threshold of 5, confirming that collinearity is not an issue in our structural model. By assessing path coefficients and their significance, we found support for six of our nine hypotheses, as presented in Figure 3.

First, we examined the hypothesized relationships between the interaction modality, the task’s perceived goal-directedness, and efficiency. We find that the interaction modality speech has a significant positive effect on efficiency (β=.167, p<.1). However, goal-directedness does not moderate the relationship between the interaction modality and efficiency (β=.063, p>.1). Thus, we find support for the corresponding hypothesis H1a, but not for H1b. Second, our results show that the interaction modality speech has a significant negative effect on participants’ cognitive effort (β=-.199, p<.05), hence reducing participants’ cognitive effort compared to text-based interactions. Additionally, we find a significant negative moderation effect between interaction modality and goal-directedness on cognitive effort (β=-.183, p<.1). Thus, we find support for the corresponding hypothesis H2a, but reject H2b since we hypothesized a moderation effect in the opposite direction. Third, the interaction modality speech has a significant positive effect on enjoyment (β=.265, p<.01). This effect is strengthened by the task’s goal-directedness (β=.263, p<.01), which contradicts the proposed direction in our hypothesis. Therefore, we confirm hypothesis H3a, but reject H3b. Lastly, we find empirical evidence that efficiency (β=.271, p<.01) and enjoyment (β=.254, p<.01) have a significant positive effect on participants’ satisfaction with the service encounter, while cognitive effort has a significant negative effect on the latter (β=-.383, p<.01). Hence, we confirm the corresponding hypotheses H4a, H4b, and H4c. Additionally, we find that the total effect of speech on participants’ service satisfaction is significant (β=.189, p<.01). Overall, perceived efficiency, perceived cognitive effort, and perceived enjoyment explain 56.7% of the variance in participants’ satisfaction with the service encounter. The means and standard deviations of all investigated variables are shown in Table 6 of the appendix. Table 4 summarizes the hypotheses and results.
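As a plausibility check, the reported total effect of speech on satisfaction can be approximated from the path coefficients above. The sketch below assumes that speech affects satisfaction only through the three mediators (no direct path) and ignores the moderation by goal-directedness; both are simplifying assumptions of this illustration, not statements about the SmartPLS model specification. The dictionary and function names are hypothetical.

```python
# Path coefficients as reported in the results.
SPEECH_TO_MEDIATOR = {
    "efficiency": 0.167,
    "cognitive_effort": -0.199,
    "enjoyment": 0.265,
}
MEDIATOR_TO_SATISFACTION = {
    "efficiency": 0.271,
    "cognitive_effort": -0.383,
    "enjoyment": 0.254,
}


def indirect_effects(first_stage: dict, second_stage: dict) -> dict:
    """Multiply the two path coefficients along each mediated path."""
    return {m: first_stage[m] * second_stage[m] for m in first_stage}


def total_effect(first_stage: dict, second_stage: dict) -> float:
    """Sum the indirect effects (assumes there is no direct path)."""
    return sum(indirect_effects(first_stage, second_stage).values())


effects = indirect_effects(SPEECH_TO_MEDIATOR, MEDIATOR_TO_SATISFACTION)
for mediator, value in effects.items():
    print(f"{mediator}: {value:.3f}")
print(f"total: {total_effect(SPEECH_TO_MEDIATOR, MEDIATOR_TO_SATISFACTION):.3f}")
```

Under these assumptions the three products sum to roughly .189, which is in line with the reported total effect. Note that the negative path through cognitive effort contributes positively, because both of its coefficients are negative.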
research on CAs and human-computer interaction and, more specifically, to VA adoption by investigating, on a comparative basis, how and why speech and text interactions differ. Specifically, we show that customers perceive speech-based interactions to be more beneficial in terms of efficiency, cognitive effort, and enjoyment, and, subsequently, in terms of service satisfaction. In this way, for both speech- and text-based interactions, we add to the existing research stream on CAs (Diederich et al., 2022; Zierau et al., 2020). We extend this stream by highlighting the distinct characteristics of both interaction modalities, which need further in-depth investigations in line with Schmitt et al. (2021) and Zierau et al. (2020). Furthermore, we provide an initial understanding of differences in speech- and text-based interactions in information searches, addressing research calls by Ma and Liu (2020) and Vakulenko et al. (2020). Our results show that speech, when compared to text-based interactions, has significant benefits that are utilitarian and hedonic in nature. So far, VAs’ hedonic and utilitarian benefits have only been investigated for VA adoption in general (McLean & Osei-Frimpong, 2019; Pal et al., 2020; Zimmermann et al., 2021). We show that these benefits positively affect customers’ service satisfaction. Previously, this relationship had only been investigated for text-based interactions (Diederich et al., 2019; Diederich et al., 2020), thus receiving less attention in extant VA research.

Second, the interaction effects between interaction modality and task type extend cognitive fit theory and TTF theory to the applicability of different interaction modalities. Specifically, we show that speech and text are not similarly suited to various kinds of tasks and that the tasks’ perceived goal-directedness influences their effect. In this way, we extend cognitive fit theory to the study of conversational human-computer interactions, specifically speech- and text-based interactions, and their interplay in two-way communications. These communications not only include information representation by the system, but also information requests from the user. So far, extant research using cognitive fit theory has been concerned with the examination of visual information representation formats, e.g., in lists and matrices. Until now, only Chen et al. (2021) had investigated cognitive fit for text-based CAs, showing that a friendlier conversational style better fits a goal-directed task. We extend this research stream and the TTF theory by distinguishing between different interaction modalities based on their inherent, distinct technologies (VAs, compared to text-based CAs, additionally include a speech recognizer and text-to-speech synthesis). While our results do not show that text-based interaction matches any information search task better than does speech-based interaction, we believe that our results open new avenues for future research on the moderating effects of task characteristics and on the applicability of each interaction modality.

Third, our results also inform research on the use of CAs in the service encounter. So far, prior literature has mainly focused on the application of text-based CAs (Adam et al., 2021; Schuetzler et al., 2021). By addressing De Keyser et al.’s (2019) research calls, our study sheds light on the impact that different interaction modalities have on customer perceptions and, therefore, it has implications for the development of suitable speech- or text-based CAs in the service journey. Specifically, we show that extant research on the use of text-based CAs in the customer encounter may not be equally generalizable to speech-based interactions. Rather, speech-based interactions have considerable benefits whose use in customer interactions needs further exploration. In this context, our study also enters the existing research stream investigating customers’ goal-directed and experiential search behavior (Hong et al., 2004; Nadkarni & Gupta, 2007). More specifically, we not only show that the task’s perceived goal-directedness affects how different website designs influence customers’ behavior, but that it also influences the applicability of speech- and text-based interaction modalities.

Furthermore, we provide insights for practitioners assessing different types of CAs offered to customers for various kinds of tasks. This is most relevant because companies increasingly apply CAs in service encounters to reduce costs while also providing personal interactions at all times. We show that speech-based interaction modalities can be beneficial as they increase customers’ satisfaction with service encounters. Hence, providers should not only offer text-based touchpoints to their customers, but also consider the provision of speech-based services. As a consequence, questions relevant to the deployment of chatbots (Schuetzler et al., 2021) need to be re-evaluated for VAs. For example, when providing product information and recommendations, providers should consider offering VA applications for their customers’ smart speakers. They should also implement speech-based interaction touchpoints on their websites to increase customers’ efficiency and enjoyment. We further suggest that practitioners should not refrain from using VAs for more complex tasks as our results show that the benefits of speech are present for both search tasks. Finally, our results inform service providers and offer developers intending to design VA dialogues for more complex tasks an initial understanding of the differences in speech- and text-based interactions in information searches. Specifically, system providers should not only focus on designing efficient speech-based interactions but also ensure that customers enjoy them.
6.3 Limitations and Future Research

This study is not without limitations, which also open future research avenues. First, our analyses build on a comparatively small sample size because participants had to take part in the experiment one by one. While the laboratory setting offered benefits in terms of a high internal validity, future research could re-examine our research model using larger sample sizes in an online experiment. An online experiment also offers the possibility of recruiting participants globally and of considering diverse cultural backgrounds in the analyses. Apart from increasing the sample size and including user characteristics in the research model, the comparison of speech- and text-based interactions offers a wide range of new research opportunities. Most research on the use of CAs has focused on text-based interactions, which, based on our results, differ from speech-based interactions. However, we know little about the right design and the value of speech-based interactions. While social cues are well researched for text-based interactions, more research is needed, for instance, on types of cues that are beneficial for speech-based interactions, e.g., loudness, pitch, or pauses (Schmitt et al., 2021).

Second, it would be worthwhile to investigate the structure of the conversations in more detail, depending on the nature of the task (e.g., depending on whether the user is a sender or a receiver of information). In this vein, it could be interesting to examine the exact way in which the number of interaction turns affects users’ interaction outcomes. Would users prefer very brief and efficient interactions that convey a large amount of information in each turn (information that they would need to convey or memorize), or would they rather interact in multiple turns, each comprising a single piece of information? Similarly, it would be interesting to investigate this relationship with different numbers of required user-confirmation steps, especially when this results in actual product purchases as outcomes of the information search behavior. Would the user prefer longer interactions if they included more user confirmations?

Third, we only compared purely speech-based interactions with purely text-based interactions. Although we deliberately opted for this comparison, as these interaction modalities are similar but constitute different CA configurations, further research projects should investigate combinations or extensions of these conditions. For example, future studies could examine how different combinations of speech input and text output, or vice versa, affect users’ satisfaction with the interactions. Future research could also compare speech interactions to website information searches or product purchasing, extending initial research by Kraus et al. (2019). Another promising research area is the comparison of speech-based, text-based, and human interactions, in particular the impact of disclosing the CA’s non-human identity. While this topic has already received initial attention for text-based CA interactions (Cheng et al., 2021; Mozafari et al., 2021a, 2021b), it becomes increasingly important given the growing naturalness of synthetic speech output.

Fourth, our study focuses on a single task characteristic in a particular context. Further research should also examine the impact of the task’s goal-directedness for other contexts, e.g., in organizational settings. Moreover, other task characteristics such as associated risks or the task’s importance pose interesting research questions. Is there a certain degree of task importance or risk that makes text-based interactions preferable to speech-based interactions? And would customers prefer human interactions for an even higher degree of risk or do they already prefer humans to text-based interactions? In this regard, it is essential to investigate trust in particular, being an important determinant of humans’ usage of artificial intelligence-enabled systems in general and of VAs in particular (Mari & Algesheimer, 2021).

7 Conclusion

Overall, our study sheds first light on the impact of task characteristics on the benefits of speech-based interaction modalities for information search tasks. We specifically show that, in terms of perceived efficiency, cognitive effort, enjoyment, and satisfaction, speech-based interaction modalities are perceived as being superior to text-based interactions. We further show that the task’s goal-directedness influences these effects. In this way, we provide initial evidence that opens new avenues for future research that can build on these results and that can inform practitioners about the applicability of both speech- and text-based CAs. With continuous technical advancements, companies increasingly need to decide whether a human, a VA, or a chatbot should answer customers’ service queries. We hope to inspire and motivate future research in this area.
Appendix
Efficiency (EFF), 7-point Likert-type scale (Cho 2004). Interacting with the [CA]…
EFF1  … takes less time for making decisions.  .844
EFF2  … takes less time for investigating different options.  .921
EFF3  … takes less time for browsing through alternatives.  .882

Cognitive effort (COG), 7-point Likert-type scale (Hong et al. 2004; Pereira 2000)
COG1  To complete the task, using this [CA] was very frustrating.  .896
COG2  To complete the task, using this [CA] took too much time.  .845
COG3  To complete the task, using this [CA] required too much effort.  .903
COG4  To complete the task, using this [CA] was too complex.  .810

Enjoyment (ENJ), 7-point Likert-type scale (Davis et al. 1992; Venkatesh 2000)
ENJ1  I find using the [CA] to be enjoyable.  .949
ENJ2  The actual process of using the [CA] is pleasant.  .893
ENJ3  I have fun using the [CA].  .923
ENJ4  Using the [CA] is exciting.  .870

Goal-directedness (GD), 7-point Likert-type scale (Stawski et al. 2007)
GD1  I knew clearly what I wanted to do.  .906
GD2  I had a strong sense of what I wanted to do.  .939
GD3  I know what I wanted to achieve.  .902
GD4  My goals were clearly defined.  .805

Satisfaction (SAT), 7-point semantic differential scale (Bhattacherjee 2001). How do you feel about your overall experience of [CA] use?
SAT1  Very dissatisfied/Very pleased.  .912
SAT2  Very displeased/Very pleased.  .876
SAT3  Very frustrated/Very contented.  .911
SAT4  Absolutely terrible/Absolutely delighted.  .862
Age 24.57 (5.57) 24.33 (3.26) 24.27 (3.99) 22.84 (3.19) 23.97 (4.10)
Personal innovativeness 4.71 (1.39) 4.64 (1.37) 4.98 (1.54) 4.96 (1.53) 4.83 (1.45)
Web skills 5.68 (1.08) 5.49 (1.15) 5.59 (1.08) 5.75 (1.22) 5.63 (1.13)
Extraversion 4.64 (1.23) 5.06 (1.16) 5.01 (1.28) 4.70 (1.25) 4.85 (1.24)
Efficiency 5.07 (1.27) 4.77 (1.64) 4.26 (1.85) 4.46 (1.82) 4.63 (1.68)
Cognitive effort 1.82 (1.01) 1.80 (1.05) 2.29 (1.43) 2.42 (1.68) 2.10 (1.35)
Enjoyment 5.12 (1.34) 5.08 (1.19) 4.03 (1.53) 4.52 (1.68) 4.67 (1.51)
Satisfaction 5.54 (.87) 5.08 (1.03) 5.04 (1.32) 4.82 (1.24) 5.11 (1.15)
Goal-directedness 6.42 (.90) 5.84 (1.21) 6.48 (.65) 5.44 (1.31) 6.04 (1.13)
N 28 27 30 31 116
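The table of means can be checked for internal consistency. The sketch below assumes that the first four columns are the four experimental groups, with group sizes given in row N, and that the last column reports the full sample; the table header is not available here, so this column reading is an assumption. Under it, the size-weighted average of the group means should reproduce the last column up to rounding. The names `GROUP_SIZES`, `TABLE`, and `pooled_mean` are hypothetical.

```python
# Group sizes from row N (assumed to be the four experimental groups).
GROUP_SIZES = [28, 27, 30, 31]

# variable -> (four group means, value in the last column)
TABLE = {
    "Age": ([24.57, 24.33, 24.27, 22.84], 23.97),
    "Personal innovativeness": ([4.71, 4.64, 4.98, 4.96], 4.83),
    "Web skills": ([5.68, 5.49, 5.59, 5.75], 5.63),
    "Extraversion": ([4.64, 5.06, 5.01, 4.70], 4.85),
    "Efficiency": ([5.07, 4.77, 4.26, 4.46], 4.63),
    "Cognitive effort": ([1.82, 1.80, 2.29, 2.42], 2.10),
    "Enjoyment": ([5.12, 5.08, 4.03, 4.52], 4.67),
    "Satisfaction": ([5.54, 5.08, 5.04, 4.82], 5.11),
    "Goal-directedness": ([6.42, 5.84, 6.48, 5.44], 6.04),
}


def pooled_mean(means, sizes):
    """Size-weighted average of group means."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


for variable, (means, reported) in TABLE.items():
    pooled = pooled_mean(means, GROUP_SIZES)
    print(f"{variable}: pooled {pooled:.2f}, last column {reported:.2f}")
```

The standard deviations in parentheses cannot be pooled this way, since the overall spread also depends on the differences between group means.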
Cho, E., Molina, M. D., & Wang, J. (2019). The Effects of Modality, Device, and Task Differences on Perceived Human Likeness of Voice-Activated Virtual Assistants. Cyberpsychology, Behavior, and Social Networking, 22(8), 515–520. https://doi.org/10.1089/cyber.2018.0571
Cho, J. (2004). Likelihood to Abort an Online Transaction: Influences from Cognitive Evaluations, Attitudes, and Behavioral Variables. Information & Management, 41(7), 827–838. https://doi.org/10.1016/j.im.2003.08.013
Dai, H., & Salam, A. F. (2020). An Empirical Assessment of Service Quality, Service Consumption Experience and Relational Exchange in Electronic Mediated Environment (EME). Information Systems Frontiers, 22(4), 843–862. https://doi.org/10.1007/s10796-019-09894-1
Dang, Y., Zhang, Y., Brown, S. A., & Chen, H. (2020). Examining the Impacts of Mental Workload and Task-Technology Fit on User Acceptance of the Social Media Search System. Information Systems Frontiers, 22(3), 697–718. https://doi.org/10.1007/s10796-018-9879-y
Davis, F. D., Bagozzi, R. P., & Warshaw, P. R. (1992). Extrinsic and Intrinsic Motivation to Use Computers in the Workplace. Journal of Applied Social Psychology, 22(14), 1111–1132. https://doi.org/10.1111/j.1559-1816.1992.tb00945.x
De Keyser, A., Köcher, S., Alkire, L., Verbeeck, C., & Kandampully, J. (2019). Frontline Service Technology Infusion: Conceptual Archetypes and Future Research Directions. Journal of Service Management, 30(1), 156–183. https://doi.org/10.1108/JOSM-03-2018-0082
Diederich, S., Brendel, A. B., Morana, S., & Kolbe, L. (2022). On the Design of and Interaction with Conversational Agents: An Organizing and Assessing Review of Human-Computer Interaction Research. Journal of the Association for Information Systems, forthcoming.
Diederich, S., Janssen-Müller, M., Brendel, A. B., & Morana, S. (2019). Emulating Empathetic Behavior in Online Service Encounters with Sentiment-Adaptive Responses. Proceedings of the 40th International Conference on Information Systems, Munich, Germany.
Diederich, S., Lembcke, T.-B., Brendel, A. B., & Kolbe, L. M. (2020). Not Human after All: Exploring the Impact of Response Failure on User Perception of Anthropomorphic Conversational Service Agents. Proceedings of the 28th European Conference on Information Systems, A Virtual AIS Conference.
Dubiel, M., Halvey, M., Azzopardi, L., Anderson, D., & Daronnat, S. (2020). Conversational Strategies: Impact on Search Performance in a Goal-Oriented Task. Proceedings of the 3rd International Workshop on Conversational Approaches to Information Retrieval, Vancouver, Canada.
Dubiel, M., Halvey, M., Azzopardi, L., & Daronnat, S. (2018). Investigating How Conversational Search Agents Affect User's Behaviour, Performance and Search Experience. Proceedings of the 2nd International Workshop on Conversational Approaches to Information Retrieval, Ann Arbor, USA.
Enholm, I. M., Papagiannidis, E., Mikalef, P., & Krogstie, J. (2021). Artificial Intelligence and Business Value: A Literature Review. Information Systems Frontiers. https://doi.org/10.1007/s10796-021-10186-w
Foley, M., Casiez, G., & Vogel, D. (2020). Comparing Smartphone Speech Recognition and Touchscreen Typing for Composition and Transcription. Proceedings of the CHI Conference on Human Factors in Computing Systems, New York, USA.
Fornell, C., & Larcker, D. F. (1981). Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research, 18(1), 39–50. https://doi.org/10.1177/002224378101800104
Fryer, L. K., Ainley, M., Thompson, A., Gibson, A., & Sherlock, Z. (2017). Stimulating and Sustaining Interest in a Language Course: An Experimental Comparison of Chatbot and Human Task Partners. Computers in Human Behavior, 75, 461–468. https://doi.org/10.1016/j.chb.2017.05.045
Gnewuch, U., Morana, S., & Maedche, A. (2017). Towards Designing Cooperative and Social Conversational Agents for Customer Service. Proceedings of the 38th International Conference on Information Systems, Seoul, South Korea.
Goodhue, D. L., & Thompson, R. L. (1995). Task-Technology Fit and Individual Performance. MIS Quarterly, 19(2), 213–236. https://doi.org/10.2307/249689
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a Silver Bullet. Journal of Marketing Theory and Practice, 19(2), 139–152. https://doi.org/10.2753/MTP1069-6679190202
Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A New Criterion for Assessing Discriminant Validity in Variance-Based Structural Equation Modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. https://doi.org/10.1007/s11747-014-0403-8
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., & Sainath, T. N. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
Hirschberg, J., & Manning, C. D. (2015). Advances in Natural Language Processing. Science, 349(6245), 261–266. https://doi.org/10.1126/science.aaa8685
Hoffman, D. L., & Novak, T. P. (1996). Marketing in Hypermedia Computer-Mediated Environments: Conceptual Foundations. Journal of Marketing, 60(3), 50–68. https://doi.org/10.1177/002224299606000304
Hong, W., Thong, J. Y., & Tam, K. Y. (2004). The Effects of Information Format and Shopping Task on Consumers' Online Shopping Behavior: A Cognitive Fit Perspective. Journal of Management Information Systems, 21(3), 149–184. https://doi.org/10.1080/07421222.2004.11045812
Hoy, M. B. (2018). Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants. Medical Reference Services Quarterly, 37(1), 81–88. https://doi.org/10.1080/02763869.2018.1404391
John, O. P., & Srivastava, S. (1999). The Big-Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives (Vol. 2). University of California, Berkeley, USA.
Jones, M. A., Reynolds, K. E., & Arnold, M. J. (2006). Hedonic and Utilitarian Shopping Value: Investigating Differential Effects on Retail Outcomes. Journal of Business Research, 59(9), 974–981. https://doi.org/10.1016/j.jbusres.2006.03.006
Kanda, T., Hirano, T., Eaton, D., & Ishiguro, H. (2004). Interactive Robots as Social Partners and Peer Tutors for Children: A Field Trial. Human–Computer Interaction, 19(1-2), 61–84. https://doi.org/10.1207/s15327051hci1901&2_4
Klesse, A.-K., Levav, J., & Goukens, C. (2015). The Effect of Preference Expression Modality on Self-Control. Journal of Consumer Research, 42(4), 535–550. https://doi.org/10.1093/jcr/ucv043
Kock, N. (2004). The Psychobiological Model: Towards a New Theory of Computer-Mediated Communication Based on Darwinian Evolution. Organization Science, 15(3), 327–348. https://doi.org/10.1287/orsc.1040.0071
Krämer, N. C., Bente, G., Eschenburg, F., & Troitzsch, H. (2009). Embodied Conversational Agents: Research Prospects for Social Psychology. Social Psychology, 40(1), 26–36. https://doi.org/10.1027/1864-9335.40.1.26
Kraus, D., Reibenspiess, V., & Eckhardt, A. (2019). How Voice Can Change Customer Satisfaction: A Comparative Analysis between E-Commerce and Voice Commerce. Proceedings of the 14th International Conference on Wirtschaftsinformatik, Siegen, Germany.
Larivière, B., Bowen, D., Andreassen, T. W., Kunz, W., Sirianni, N. J., Voss, C., Wünderlich, N. V., & De Keyser, A. (2017). “Service Encounter 2.0”: An Investigation into the Roles of Technology, Employees and Customers. Journal of Business Research, 79, 238–246. https://doi.org/10.1016/j.jbusres.2017.03.008
Le Bigot, L., Jamet, E., & Rouet, J.-F. (2004). Searching Information with a Natural Language Dialogue System: A Comparison of Spoken Vs. Written Modalities. Applied Ergonomics, 35(6), 557–564. https://doi.org/10.1016/j.apergo.2004.06.001
Le Bigot, L., Rouet, J.-F., & Jamet, E. (2007). Effects of Speech- and Text-Based Interaction Modes in Natural Language Human-Computer Dialogue. Human Factors, 49(6), 1045–1053. https://doi.org/10.1518/001872007X249901
Lee, H.-K., Suh, K.-S., & Benbasat, I. (2001). Effects of Task-Modality Fit on User Performance. Decision Support Systems, 32(1), 27–40. https://doi.org/10.1016/S0167-9236(01)00098-7
Li, J. (2015). The Benefit of Being Physically Present: A Survey of Experimental Works Comparing Copresent Robots, Telepresent Robots and Virtual Agents. International Journal of Human-Computer Studies, 77, 23–37. https://doi.org/10.1016/j.ijhcs.2015.01.001
Li, M., Dong, Z. Y., & Chen, X. (2012). Factors Influencing Consumption Experience of Mobile Commerce. Internet Research, 22(2), 120–141. https://doi.org/10.1108/10662241211214539
Liu, F., Xiao, B., Lim, E. T., & Tan, C.-W. (2016). Is My Effort Worth It? Investigating the Dual Effects of Search Cost on Search Utility. Proceedings of the 20th Pacific Asia Conference on Information Systems, Chiayi, Taiwan.
Lowry, P. B., Gaskin, J., & Moody, G. D. (2015). Proposing the Multi-Motive Information Systems Continuance Model (MISC) to Better Explain End-User System Evaluations and Continuance Intentions. Journal of the Association for Information Systems, 16(7), 515–579.
Lowry, P. B., Gaskin, J., Twyman, N., Hammer, B., & Roberts, T. (2013). Taking ‘Fun and Games’ Seriously: Proposing the Hedonic-Motivation System Adoption Model (HMSAM). Journal of the Association for Information Systems, 14(11), 617–671.
Lu, H.-P., & Yang, Y.-W. (2014). Toward an Understanding of the Behavioral Intention to Use a Social Networking Site: An Extension of Task-Technology Fit to Social-Technology Fit. Computers in Human Behavior, 34, 323–332. https://doi.org/10.1016/j.chb.2013.10.020
Luger, E., & Sellen, A. (2016). “Like Having a Really Bad PA” the Gulf between User Expectation and Experience of Conversational Agents. Proceedings of the CHI Conference on Human Factors in Computing Systems, San Jose, USA.
Ma, X., & Liu, A. (2020). Challenges in Supporting Exploratory Search through Voice Assistants. Proceedings of the CHI Conference on Human Factors in Computing Systems, Hawaii, USA.
Mari, A., & Algesheimer, R. (2021). The Role of Trusting Beliefs in Voice Assistants During Voice Shopping. Proceedings of the 54th Hawaii International Conference on System Sciences, Hawaii, USA.
Moon, Y. (2000). Intimate Exchanges: Using Computers to Elicit Self-Disclosure from Consumers. Journal of Consumer Research, 26(4), 323–339. https://doi.org/10.1086/209566
Moussawi, S. (2018). User Experiences with Personal Intelligent Agents: A Sensory, Physical, Functional and Cognitive Affordances View. Proceedings of the ACM SIGMIS Conference on Computers and People Research, New York, USA.
Mozafari, N., Weiger, W., & Hammerschmidt, M. (2021a). Resolving the Chatbot Disclosure Dilemma: Leveraging Selective Self-Presentation to Mitigate the Negative Effect of Chatbot Disclosure. Proceedings of the 54th Hawaii International Conference on System Sciences, Hawaii, USA.
Mozafari, N., Weiger, W., & Hammerschmidt, M. (2021b). Trust Me, I'm a Bot - Repercussions of Chatbot Disclosure in Different Service Frontline Settings. Journal of Service Management, 33, 1–33. https://doi.org/10.1108/JOSM-10-2020-0380
Nadkarni, S., & Gupta, R. (2007). A Task-Based Model of Perceived Website Complexity. MIS Quarterly, 31(3), 501–524. https://doi.org/10.2307/25148805
Nass, C., & Moon, Y. (2000). Machines and Mindlessness: Social Responses to Computers. Journal of Social Issues, 56(1), 81–103. https://doi.org/10.1111/0022-4537.00153
Novak, T. P., Hoffman, D. L., & Duhachek, A. (2003). The Influence of Goal-Directed and Experiential Activities on Online Flow Experiences. Journal of Consumer Psychology, 13(1-2), 3–16. https://doi.org/10.1207/S15327663JCP13-1&2_13
Novak, T. P., Hoffman, D. L., & Yung, Y.-F. (2000). Measuring the Customer Experience in Online Environments: A Structural Modeling Approach. Marketing Science, 19(1), 22–42. https://doi.org/10.1287/mksc.19.1.22.15184
Novielli, N., de Rosis, F., & Mazzotta, I. (2010). User Attitude Towards an Embodied Conversational Agent: Effects of the Interaction Mode. Journal of Pragmatics, 42(9), 2385–2397. https://doi.org/10.1016/j.pragma.2009.12.016
Paas, F., Renkl, A., & Sweller, J. (2004). Cognitive Load Theory: Instructional Implications of the Interaction between Information Structures and Cognitive Architecture. Instructional Science, 32(1/2), 1–8. https://doi.org/10.1023/B:TRUC.0000021806.17516.d0
Pal, D., Arpnikanondt, C., Funilkul, S., & Chutimaskul, W. (2020). The Adoption Analysis of Voice-Based Smart IoT Products. IEEE Internet of Things Journal, 7(11), 10852–10867. https://doi.org/10.1109/JIOT.2020.2991791
Paschall, M. J., Fishbein, D. H., Hubal, R. C., & Eldreth, D. (2005). Psychometric Properties of Virtual Reality Vignette Performance Measures: A Novel Approach for Assessing Adolescents' Social Competency Skills. Health Education Research, 20(1), 61–70. https://doi.org/10.1093/her/cyg103
Pereira, R. E. (2000). Optimizing Human-Computer Interaction for the Electronic Commerce Environment. Journal of Electronic Commerce Research, 1(1), 23–44.
Polites, G. L., & Karahanna, E. (2012). Shackled to the Status Quo: The Inhibiting Effects of Incumbent System Habit, Switching Costs, and Inertia on New System Acceptance. MIS Quarterly, 36(1), 21–42. https://doi.org/10.2307/41410404
McKinney, V., Yoon, K., & Zahedi, F. M. (2002). The Measurement of Web-Customer Satisfaction: An Expectation and Disconfirmation
Approach. Information Systems Research, 13(3), 296–315. https:// Ringle, C. M., Wende, S., & Becker, J.-M. (2015). SmartPLS 3. Boen-
doi.org/10.1287/isre.13.3.296.76 ningstedt. SmartPLS GmbH.
McLean, G., & Osei-Frimpong, K. (2019). Hey Alexa… Examine the Ruan, S., Wobbrock, J. O., Liou, K., Ng, A., & Landay, J. A. (2017).
Variables Influencing the Use of Artificial Intelligent in-Home Comparing Speech and Keyboard Text Entry for Short Messages
Voice Assistants. Computers in Human Behavior, 99, 28–37. in Two Languages on Touchscreen Phones. Proceedings of the
https://doi.org/10.1016/j.chb.2019.05.009 ACM on Interactive, Mobile, Wearable and Ubiquitous Technolo-
Mero, J. (2018). The Effects of Two-Way Communication and Chat gies, 1(4), 1–23. https://doi.org/10.1145/3161187
Service Usage on Consumer Attitudes in the E-Commerce Retail- Rubin, D. L., Hafer, T., & Arata, K. (2000). Reading and Listening
ing Sector. Electronic Markets, 28(2), 205–217. https://d oi.o rg/1 0. to Oral-Based Versus Literate-Based Discourse. Communication
1007/s12525-017-0281-2 Education, 49(2), 121–133. https://doi.org/10.1080/0363452000
9379200
13
Rzepka, C. (2019). Examining the Use of Voice Assistants: A Value-Focused Thinking Approach. Proceedings of the 25th Americas Conference on Information Systems, Cancun, Mexico.
Schmitt, A., Zierau, N., Janson, A., & Leimeister, J. M. (2021). Voice as a Contemporary Frontier of Interaction Design. Proceedings of the 29th European Conference on Information Systems, A Virtual AIS Conference.
Schroeder, J., & Schroeder, M. (2018). Trusting in Machines: How Mode of Interaction Affects Willingness to Share Personal Information with Machines. Proceedings of the 51st Hawaii International Conference on System Sciences, Hawaii, USA.
Schuetzler, R. M., Grimes, G. M., Giboney, J. S., & Rosser, H. K. (2021). Deciding Whether and How to Deploy Chatbots. MIS Quarterly. Executive, 20(1), 1–15. https://doi.org/10.17705/2msqe.00039
Shawar, B. A., & Atwell, E. (2007). Different Measurement Metrics to Evaluate a Chatbot System. Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies, Rochester, USA.
Son, Y., & Oh, W. (2018). Alexa, Buy Me a Movie!: How AI Speakers Reshape Digital Content Consumption and Preference. Proceedings of the 39th International Conference on Information Systems, San Francisco, USA.
Stawski, R. S., Hershey, D. A., & Jacobs-Lawson, J. M. (2007). Goal Clarity and Financial Planning Activities as Determinants of Retirement Savings Contributions. The International Journal of Aging and Human Development, 64(1), 13–32. https://doi.org/10.2190/13GK-5H72-H324-16P2
Vakulenko, S., Savenkov, V., & de Rijke, M. (2020). Conversational Browsing. arXiv preprint arXiv:2012.03704.
Venkatesh, V. (2000). Determinants of Perceived Ease of Use: Integrating Control, Intrinsic Motivation and Emotion into the Technology Acceptance Model. Information Systems Research, 11(4), 342–365. https://doi.org/10.1287/isre.11.4.342.11872
Vessey, I., & Galletta, D. (1991). Cognitive Fit: An Empirical Study of Information Acquisition. Information Systems Research, 2(1), 63–84. https://doi.org/10.1287/isre.2.1.63
Wells, J. D., Sarker, S., Urbaczewski, A., & Sarker, S. (2003). Studying Customer Evaluations of Electronic Commerce Applications: A Review and Adaptation of the Task-Technology Fit Perspective. Proceedings of the 36th Hawaii International Conference on System Sciences, Hawaii, USA.
Woodford, S. (2020). Why Chatbots Are Essential to Retail. https://www.juniperresearch.com/document-library/white-papers/why-chatbots-are-essential-to-retail
Yang, H., & Lee, H. (2019). Understanding User Behavior of Virtual Personal Assistant Devices. Information Systems and e-Business Management, 17(1), 65–87. https://doi.org/10.1007/s10257-018-0375-1
Yang, Z., Sun, J., Zhang, Y., & Wang, Y. (2018). Peas and Carrots Just Because They Are Green? Operational Fit between Green Supply Chain Management and Green Information System. Information Systems Frontiers, 20(3), 627–645. https://doi.org/10.1007/s10796-016-9698-y
You, J.-J., Jong, D., & Wiangin, U. (2020). Consumers’ Purchase Intention of Organic Food Via Social Media: The Perspectives of Task-Technology Fit and Post-Acceptance Model. Frontiers in Psychology, 11(579274). https://doi.org/10.3389/fpsyg.2020.579274
Zierau, N., Elshan, E., Visini, C., & Janson, A. (2020). A Review of the Empirical Literature on Conversational Agents and Future Research Directions. Proceedings of the 41st International Conference on Information Systems, A Virtual AIS Conference.
Zimmermann, S. K., Wagner, H.-T., Rössler, P., Gewald, H., & Krcmar, H. (2021). The Role of Utilitarian Vs. Hedonic Factors for the Adoption of AI-Based Smart Speakers. Proceedings of the 27th Americas Conference on Information Systems, Montreal, Canada.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Christine Rzepka is a Ph.D. candidate and research assistant at the Institute for Digital Management and New Media at Ludwig-Maximilians-Universität (LMU), Munich, Germany. Her doctoral thesis addresses benefits and costs of voice interaction and its value for human-computer interaction. She received an M.Sc. in business administration from LMU. Her research has been published in the proceedings of the International Conference on Information Systems and other international conference proceedings in the field of information systems.

Benedikt Berger is an assistant professor of Digital Transformation and Society at the Department of Information Systems at the University of Münster, Germany. He holds a Ph.D. in management from Ludwig-Maximilians-Universität (LMU), Munich, Germany and an M.Sc. in management from the University of Mannheim, Germany. He was a visiting scholar at the Department of Information Technologies at HEC Montréal, Canada. His current research focuses on digital products and services as well as on AI-based information systems. His work has appeared in Journal of Management Information Systems, Electronic Markets, Business and Information Systems Engineering, and in various international conference proceedings.

Thomas Hess is a professor of Information Systems and Management at Ludwig-Maximilians-Universität (LMU), Munich, Germany, where he also serves as director of the Institute for Digital Management and New Media. He furthermore is director of the Bavarian Research Institute for Digital Transformation and member of the board of the Internet Business Cluster Munich. Thomas holds a Ph.D. in management from the University of St. Gallen, Switzerland. His research focuses on the digital transformation of companies and the management of media, internet, and IT companies. His work has appeared in journals such as Journal of Management Information Systems, European Journal of Information Systems, Electronic Markets, Business and Information Systems Engineering, Communications of the ACM, Decision Support Systems, Information Systems Journal, International, and others.