Towards a Trust Reliance Paradox? Exploring the Gap Between Perceived Trust in and Reliance on Algorithmic Advice
Introduction
Due to the nature of their statistical architecture, Artificial Intelligence (AI)-based Information Systems (IS) suffer from opaqueness and limited robustness and reliability (Jordan and Mitchell 2015; Rahwan et al.
2019). State-of-the-art IS, such as conversational interfaces (CI) or decision support systems (DSS), produce
outcomes that (1) are often neither fully predictable nor explainable and, (2) most critically, carry a
certain probability of inaccuracy. For example, a skin cancer detection model with a classification accuracy
of 98% might lead to 2% of all users receiving erroneous diagnoses. Against the backdrop of these
challenges, trust plays an important role in understanding the adoption and use of AI-based IS. Their
prevalence across a variety of high-impact use cases such as credit scoring (O’Neil 2016), predictive policing
(Waardenburg et al. 2018), large-scale adaptive education (Wambsganss, Niklaus, et al. 2021) or cancer
detection (Jussupow et al. 2021) illustrates the capabilities of AI-based IS to augment human decision-
making and to even overcome human biases (Faltings et al. 2014; Rahwan et al. 2019). At the same time,
lack of trust towards such systems’ recommendations can impede successful adoption and deployment
(Söllner et al. 2016).
Existing research has named algorithmic error as a key factor for explaining decreased trust in algorithmic
advice (Dzindolet et al. 2002; Hoff and Bashir 2015). The desire for perfect forecasts has been cited as a
prominent reason for the reluctance to rely on imperfect algorithms, even when they are superior to human
advice (Dawes 1979). Yet, Liel and Zalmanson (2020) found that people adhered to erroneous AI,
suggesting that such recommendations have strong persuasive power regardless of their correctness.
However, little is known about how erroneous recommendations influence human behavior or perception
(i.e., the 2% wrongly predicted outcomes).
Most empirical work has black-boxed the nature of the AI-based system and neglected the correctness of its
output when investigating related user outcomes (Haibe-Kains et al. 2020). Scholars have mostly discussed
whether algorithmic advice is over- or underutilized in comparison to human advice (Logg et al. 2019). In
fact, the current literature falls short in three respects: First, previous studies have embedded the
manipulation of the source of advice (human versus machine) in survey-based experiments following a Wizard-
of-Oz approach, thereby not depicting natural, contemporary human-computer interaction (Liel and
Zalmanson 2020). The question arises of whether merely stating that the advice stems from an algorithm
suffices as an instantiation of algorithmic advice. Second, studies comparing algorithmic advice with human
advice often neglect under which circumstances algorithmic advice is to be trusted and relied upon beyond
preference over human advice. Third, current research barely distinguishes between notions of trust and
related, behavioral outcomes. At the same time, empirical studies in human-computer interaction and IS
often define and use typologies of trust interchangeably (Yin et al. 2019), ultimately leaving unaddressed
how perceptions of a system’s functioning and behavioral response relate to each other.
Overall, researchers have called for “[...] a broader consideration of the behavior [… and the...] unintended
consequences of AI” (Rahwan et al. 2019, p. 477). More specifically, in the domain of IS, incorrect system
advice has been assessed as highly problematic, yet insufficiently addressed until now (Jussupow et al.
2021). In this study, we aim to arrive at a more nuanced consideration of both the antecedents (the
underlying technological system and the correctness of advice) and the related consequences of AI-based
advice (perceptual and behavioral outcomes). If error in algorithmic advice leads to a mis- or disuse of AI-
based systems, the question arises of how to design for interaction with failure-prone IS. So far, research
examining how specific design features alleviate potential over- or underreliance on algorithmic advice is
rather scarce (Berger et al. 2021; Yeomans et al. 2019). We aim to address this gap and contribute to
research around reliance on AI-based advice by answering the following research question (RQ):
RQ: What is the effect of incorrect algorithmic advice on user trust and reliance and how does
transparency around an AI-based system’s accuracy levels alter the effects of such incorrect advice?
To answer our research question, we conducted a 2 x 2 between-subject experiment to test whether
correctness of recommendation (correct vs. incorrect) and transparency (statement on accuracy of AI
present vs. absent) affect trust in the information and reliance on AI-based advice. The
manipulations were embedded in four instantiations of a text-based conversational agent (CA) to
provide the user with a familiar and recognizable AI-based IS. In the context of a vignette study, users
participated in an assessment center as part of which they had to complete a reading comprehension task
in interaction with the CA. We found that participants exposed to incorrect advice trusted the provided
information less, yet relied more strongly on the recommended answers. Our study suggests that adding
transparency around a system’s accuracy levels does not necessarily increase trust in the system’s advice;
rather, its effect depends on the correctness of the advice.
Our research contributes to the understanding of whether users trust AI-based advice and whether such
acceptance is influenced by the system’s accuracy in the specific context of a CA. We provide a more
nuanced understanding of perceptual and behavioral outcomes by differentiating between trust and
reliance towards the AI-based advice. By shedding light on these phenomena and empirically testing
proposed strategies for AI trustworthiness, we hope to contribute to the existing research body around
algorithmic aversion and appreciation. We first provide an overview of the conceptual background of this
paper and the developed hypotheses. The subsequent section describes the chosen methodology, including the
experimental design manipulations and measurements, as well as the analysis of our constructs.
Afterwards, we present and discuss the results, followed by the conclusion of this paper. The last section
provides the contribution and limitations of this study, as well as an outlook on future research.
Conceptual Background and Hypotheses
Reliance on algorithmic advice has been studied in medical, economic, and legal contexts, as well as for
subjective decisions (e.g., recommending a joke). These contexts are predominantly defined by high
uncertainty regarding the decision outcome, user-specific influences, including domain expertise, or
potential confounding factors such as delegation of decision responsibility. At the same time, little is known
about how algorithmic advice performs in other contexts where costs of system error are high for the user,
yet task uncertainty is limited. Liel and Zalmanson’s (2020) paper was one of the few which investigated
the effect of erroneous AI on advice reliance in the context of a simple judgement task where algorithmic
mistakes were quite apparent to the user. Whereas previous research has investigated the preference for
and reliance on algorithmic versus human advice, little is known about when algorithmic advice is over- or
underutilized.
As outlined above, AI-based IS per default introduce outcomes that are neither fully predictable nor explainable and, most critically, bring
along a certain probability of inaccuracy. Accuracy rates and model performance can be improved in practice
by building high-quality datasets; however, improving accuracy is a trade-off
between resources and performance, and users will continue to be exposed to erroneous recommendations in
the near future (Rahwan et al. 2021). If experiencing an AI-based IS providing potentially erroneous
information negatively affects perceptions of trust and reliance on advice, the question arises of how trust
can be restored in the case of an erring AI-based system.
Different mechanisms have been proposed to overcome algorithmic aversion and thus increase trust in AI-
based systems. Yeomans et al. (2019) gather evidence on how explanations of the underlying workings of
an algorithm decrease algorithmic aversion. In a similar vein, Berger et al. (2021) find that demonstrating
an AI-based system’s ability to learn also acts as a countermeasure to algorithmic aversion. They argue
that an increased understanding of how systems work leads users to assign increased capabilities to the AI-
based IS, and thus to trust it more heavily. Moreover, Berger et al. (2021) suggest exploring transparency as a
further moderator to consider when studying the impact of erroneous algorithmic advice. Beyond these
papers, however, measures to enhance trust in algorithmic advice have barely been explored.
One possible solution to increase trustworthiness is the revelation of the system’s accuracy to the user. As
part of the Ethics Guidelines for Trustworthy AI proposed by the European Commission, transparency has
been mentioned as one of the key dimensions to establish trustworthiness around AI-based systems. More
specifically, the Ethics Guidelines propose that “[...] the AI system’s capabilities and limitations should be
communicated to AI practitioners or end-users in a manner appropriate to the use case at hand. This could
encompass communication of the AI system's level of accuracy.” (European Commission 2019). Returning
to Lee and See’s (2004) conceptual model on trust and reliance, information on the IS’ performance is
considered a relevant factor influencing trust and related behavior. Based on the reviewed studies and
the notion that performance information can potentially reduce algorithmic aversion, we thus propose
that:
H2a: Transparency on a CA’s stated accuracy levels will alleviate the stated effect of incorrect
algorithmic advice on trust in the information provided by the CA.
In a similar vein, if algorithmic output contradicts human experiences and intuitions, increased
transparency and interpretable explanations gain in importance to avert resistance and aversion towards
algorithmic advice. Accordingly, we propose that:
H2b: Transparency on a CA’s stated accuracy levels will induce users receiving incorrect algorithmic
advice to conform more heavily to the advice.
Overall, we suggest that the correctness of algorithmic advice has an effect on user trust in and reliance on
the provided information (H1). In addition, making the stated accuracy levels of the AI-based system
transparent to the user moderates the effect of advice correctness on user perceptions and behavior, thereby
resolving algorithmic aversion toward erroneous advice (H2). Our research model is illustrated in Figure 1.
Figure 1. Research Model: Correctness of Advice of CA affects Trust in Information provided by CA (H1a) and Reliance on Algorithmic Advice (H1b); Transparency of CA’s Accuracy Level moderates both effects (H2a, H2b).
Research Methodology
To test our conceptual model and related hypotheses, we conducted an online experiment to investigate (1)
the effect of incorrect algorithmic advice on related user perceptions and behavior and (2) whether transparency
on an AI-based system’s stated accuracy levels alters the effects of such incorrect advice. We relied on a
2 (correct vs incorrect advice) x 2 (no transparency cue vs transparency cue) between-subject design
resulting in one control group (CG) and three treatment groups (TG). The participants were randomly
assigned to one of the four groups. For the AI-based system, we implemented and manipulated an intent-
based CA built with the Dialogflow framework. The web experiment was facilitated by the behavioral lab of
our university.
The reading comprehension task was based on an established construct to measure argumentative competencies. Participants were asked to read a discussion between two
teachers concerning the topic "Does TV make students aggressive?".
As part of the task, students had to answer certain questions about the argumentation strategy of the
involved parties in the discussion. After two control questions to ensure that participants had understood the task,
we asked them to interact with our CA. Participants were asked to answer four multiple choice
questions with the help of a CA which would guide them and provide suggestions to the individual
questions. The CA announced the sequence of questions and asked users for their initial answer for each
multiple-choice question. Subsequently, the agent provided its recommended answer to the respective
multiple-choice question and allowed participants to revise their initial answer. After users communicated
their final answer, the CA moved on to the next question. This was repeated for all four multiple choice
questions. To ensure a conscious interaction with the CA and increase critical judgement, we set a
mandatory minimum time of three minutes to spend on the four questions (Liel and Zalmanson 2020). A
countdown indicated the remaining time. Participants were able to continue with the experiment after the
countdown was finished. The control group used a CA providing correct advice without a transparency cue.
TG1 used a CA providing incorrect advice without a transparency cue, whereas TG2 (correct advice) and
TG3 (incorrect advice) used a CA that incorporated a statement on the accuracy level of the CA. We did not
provide any further introduction to any of the tools.
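To make the described interaction protocol concrete, the sketch below outlines the question-by-question flow in plain Python. It is a minimal, hypothetical illustration of the procedure, not the Dialogflow implementation used in the study; the Question type, the run_assessment function, and the console prompts are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    options: list   # e.g. ["A", "B", "C", "D"]
    correct: str    # label of the correct option

def run_assessment(questions, correct_advice: bool, transparency: bool):
    """Hypothetical sketch of the CA's question-by-question flow (console-based)."""
    if transparency:
        # Accuracy statement shown only in the transparency conditions.
        print("Note: my recommendations are about 90% accurate.")
    log = []
    for i, q in enumerate(questions, start=1):
        print(f"Question {i}: {q.text}  Options: {', '.join(q.options)}")
        initial = input("Your initial answer: ").strip()
        # Correct-advice conditions recommend the right option;
        # incorrect-advice conditions recommend a wrong one.
        advice = q.correct if correct_advice else next(o for o in q.options if o != q.correct)
        print(f"CA: I recommend answer {advice}. You may revise your answer.")
        final = input("Your final answer: ").strip()
        log.append({"initial": initial, "advice": advice, "final": final, "correct": q.correct})
    return log
```

The correct_advice and transparency flags correspond to the two experimental factors; assigning participants to one of the four flag combinations yields the control and treatment conditions described above.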
3) Post-Test: In the last step, participants completed a post-experiment questionnaire. In total, we asked
30 items to assess participants’ perception regarding the advice and the conversational agent itself, and to
control for manipulation. Moreover, we asked three qualitative questions: "What did you particularly like
about the use of the chatbot?", "What else could be improved?" and "Do you have any other ideas?". Finally,
we captured the demographics. A final debriefing informed participants about the actual purpose of the
study and provided them with the contact details of the research team.
Measures
The measurement of our behavioral dependent variable was part of the multiple-choice reading
comprehension in our experiment. Previous studies on algorithmic advice have relied upon several different
instruments to measure the influence of advice given on the user’s decision making. Since our experiment
task encompassed a binary outcome possibility, we followed Liel and Zalmanson (2020, p. 4), who also
investigated the effect of erroneous recommendations, and their measure of conformity, which they defined
as “the act of choosing the incorrect answer recommended by the algorithm.” This is congruent with
alternative operationalizations of reliance used in related empirical research including Zhang et al.'s (2020)
switch percentage measure. Hence, reliance was measured as the proportion of participants who adjusted
their final answers according to the CA’s suggestions at least once. Reliance on advice therefore captured if
participants “followed” the AI-based advice and was quantified as 1 if a participant’s initial prediction
disagreed with the advice and the participant’s final prediction agreed with the AI-based advice. We also
measured the frequency of adjusted answers according to the advice for each treatment group and whether
participants switched their answers regardless of the CA’s recommendation. Another key measurement is
self-reported trust in information provided by the CA. We measured trust in information with three items
(scale adapted from McKnight et al., 2020, sample item: “To me, the chatbot is generally accurate in
providing recommendations”; 7-point scale, from 1: “Strongly Disagree” to 7: “Strongly Agree”,
αTrustInformation= .91).
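To illustrate this operationalization, the following sketch derives the per-participant reliance indicator (adjusted their answer toward the advice at least once) and the number of such switches from logged initial answers, CA recommendations, and final answers. The column names and the toy data are hypothetical; this is not the authors' analysis script.

```python
import pandas as pd

# Hypothetical long-format log: one row per participant and question.
log = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "initial":     ["A", "B", "C", "A", "B", "B", "D", "C"],
    "advice":      ["B", "B", "C", "D", "A", "C", "D", "C"],
    "final":       ["B", "B", "C", "A", "B", "C", "D", "C"],
})

# A switch towards the advice: the initial answer disagreed with the advice,
# the final answer agreed with it.
log["switched_to_advice"] = (log["initial"] != log["advice"]) & (log["final"] == log["advice"])

per_participant = log.groupby("participant")["switched_to_advice"].agg(["any", "sum"])
per_participant.columns = ["reliance", "n_switches"]  # reliance = adjusted at least once

print(per_participant)
# The group-level reliance rate is the share of participants with reliance == True.
print("Reliance rate:", per_participant["reliance"].mean())
```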
Besides our two key dependent variables, we measured multiple control and demographic variables in both
the pre- and post-experimental questionnaire, including participants’ trusting disposition (Gefen and
Straub 2004), personal innovativeness (Agarwal and Prasad 1998), cognitive rigidity (Lewicki and Bunker
2012), Big Five personality constructs (Rammstedt and John 2007), felt risk of technology (scale adapted
from McKnight et al. 2020, sample item: “How would you characterize the decision of whether to use the
chatbot as a decision aid”; 7-point Likert scale, from 1: “Strongly Disagree” to 7: “Strongly Agree”, αFeltRisk = .85) and algorithmic familiarity (scale adapted from Johnson and Russo 1984).
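The reported scale reliabilities (e.g., αTrustInformation = .91, αFeltRisk = .85) correspond to Cronbach's alpha computed over the item responses; a minimal sketch, assuming a participants-by-items matrix of 7-point ratings (the responses below are hypothetical):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a participants-by-items matrix of scale responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 7-point responses to the three trust-in-information items.
responses = np.array([
    [6, 7, 6],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
    [7, 6, 7],
])
print(round(cronbach_alpha(responses), 2))
```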
A total of 198 participants were recruited from a university course at a higher education institution and
randomly assigned to the cells of the two-by-two between-subject experiment. Attentive participation was
incentivized by making the content of the experiment part of the students’ final course assessment. We
removed those subjects who failed the attention check or did not complete the interaction with the AI-based
agent by providing complete answers to all four multiple choice questions. We further removed participants
who exhibited outlier characteristics regarding, e.g., completion time, leaving us with a final sample of
156 subjects. A potential boundary condition is that the survey was distributed within a course of master’s students,
who most probably exhibit higher literacy in and familiarity with AI-based systems compared to the general
public. Additional analyses on the control and demographic variables comparing the experiment groups
confirm participants’ random assignment to the different experimental conditions. Specifically, there are
no significant differences in trusting disposition, personal innovativeness, or cognitive rigidity among the
four treatments (all p > .1). In addition, no differences were found regarding the demographic variables age
and gender (all p > .1).
Results
We used a manipulation check for the transparency treatment, asking to what extent participants agree
with the following two statements: 1) “As part of the tasks, the chatbot revealed information about itself,
namely why its recommendations might be flawed.”, and, 2), “As part of the tasks, the chatbot revealed
information about itself, namely an explanation of the legal guidelines that the chatbot must adhere to.” (7-
point Likert scale, from 1: “Does not apply at all” to 7: “Applies completely”). The results of the manipulation
checks indicated that the transparency manipulation worked as intended: First, an ANOVA on the first
statement revealed a significant manipulation effect (F = 66.62, p < .001) with participants from the two
treatment groups receiving transparency statements (Accurate, Transparency; Erroneous, Transparency)
exhibiting significantly higher agreement with the first statement (MTransparency = 4.32) than participants who
were not exposed to the transparency-enhancing statements (MNoTransparency = 2.02). Another ANOVA on the
second statement strengthened this finding as no significant effect between the transparency present
(MTransparency = 1.96) versus transparency absent (MNoTransparency = 1.86) treatments could be found. We tested
our hypotheses by conducting a series of analyses in R Studio.
| Group | N | Trust in Information | Reliance on Advice | Correct Final Answer Q1 | Correct Final Answer Q2 | Correct Final Answer Q3 | Correct Final Answer Q4 | Age | Gender |
|---|---|---|---|---|---|---|---|---|---|
| TG1: Correct Advice, Transparency Statements | 40 | 4.67 | 7.5% | 97.5% | 92.5% | 97.5% | 97.5% | 24.35 | 70% |
| TG2: Incorrect Advice, No Transparency Statements | 41 | 2.33 | 22% | 80.5% | 82.9% | 82.9% | 95.1% | 25.32 | 68.3% |
| TG3: Incorrect Advice, Transparency Statements | 35 | 3.06 | 31.4% | 68.6% | 80% | 85.7% | 97.1% | 24.34 | 62.9% |
| Correctness Manipulation | | *** (p < .001) | ** (p < .01) | *** (p < .001) | *** (p < .001) | *** (p < .001) | | ns | NA |

Table 1. Means for Perceptual and Behavioral Outcome Variables across Four Groups
For Hypothesis 1a – the effect of advice correctness on trust in the provided information – we conducted
an ANOVA comparing participants in the two conditions receiving correct advice with participants receiving
incorrect advice. The test revealed a significant main effect (F(1, 156) = 143, p < .001). In fact, participants
who received incorrect recommendations perceived the information the CA provided as less trustworthy
(MIncorrect = 2.78) than those who received correct recommendations to the multiple-choice
questions (MCorrect = 4.95).
Hypothesis 1b predicted a negative effect of incorrect CA advice on reliance on the advice. A chi-square test
of independence revealed a statistically significant association between the correctness of the AI-based
advice and reliance on advice, χ2(1, N = 156) = 4.89, p < .01. However, the effect ran counter to H1b:
participants who received incorrect advice from the CA relied significantly more heavily on those
recommendations than participants who received correct algorithmic advice, so H1b is not supported.
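For illustration, a chi-square test of independence of this kind operates on the 2 x 2 cross-tabulation of advice correctness and reliance; the sketch below uses scipy with hypothetical cell counts (summing to N = 156), not the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: correct advice, incorrect advice; columns: relied at least once, never relied.
# Counts are hypothetical placeholders for an N = 156 sample.
table = np.array([
    [10, 70],   # correct advice
    [25, 51],   # incorrect advice
])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}, N = {table.sum()}) = {chi2:.2f}, p = {p:.3f}")
print("Expected counts under independence:\n", expected)
```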
The results of our moderation analysis (see Fig. 5) show that a significant interaction effect between advice
correctness and transparency exists for trust in information (F(3, 156) = 8.76, p < .001). Pairwise t-tests
reveal significant mean differences between all treatment groups. The pairwise comparison between the
correct, no-transparency (MCorrect, No Transparency = 5.23) and the erroneous, no-transparency (MIncorrect, No Transparency = 2.54)
treatments (p < .001), as well as that between the correct, no-transparency (MCorrect, No Transparency = 5.23)
and the erroneous, transparency (MIncorrect, Transparency = 3.06) treatments (p < .001), are significant.
In a similar vein, the pairwise comparison between the correct, transparency (MCorrect, Transparency = 4.67)
and the erroneous, no-transparency (MIncorrect, No Transparency = 2.54) treatments (p < .001), as well as that
between the correct, transparency (MCorrect, Transparency = 4.67) and the incorrect, transparency
(MIncorrect, Transparency = 3.06) treatments (p < .001), are significant. Last, we found that participants receiving correct
advice without any transparency statements reported significantly higher levels of trust in information than
participants who also received correct advice yet including transparency statements (p < .05).
At the same time, participants receiving incorrect advice without any transparency statements reported
significantly lower levels of trust in information than participants who also received incorrect advice
yet including transparency statements (p < .05). These results provide support for H2a.
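A moderation analysis of this form corresponds to a two-way ANOVA with a correctness x transparency interaction on trust in information. The sketch below illustrates the structure in Python on simulated data centred roughly on the reported cell means; the original analyses were run in R, and Tukey HSD stands in here for the reported pairwise t-tests.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
n_per_cell = 40
cells = [("correct", "absent", 5.2), ("correct", "present", 4.7),
         ("incorrect", "absent", 2.5), ("incorrect", "present", 3.1)]

# Hypothetical 7-point trust scores centred roughly on the reported cell means.
data = pd.DataFrame([
    {"correctness": c, "transparency": t,
     "trust": float(np.clip(rng.normal(m, 1.0), 1, 7))}
    for c, t, m in cells for _ in range(n_per_cell)
])

# Two-way ANOVA with the correctness x transparency interaction.
model = ols("trust ~ C(correctness) * C(transparency)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))

# Pairwise comparisons across the four cells (Tukey HSD as one common follow-up).
data["cell"] = data["correctness"] + "/" + data["transparency"]
print(pairwise_tukeyhsd(data["trust"], data["cell"]))
```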
Turning to the moderating effect of transparency on reliance on algorithmic advice, a chi-square test of
independence shows that transparency moderates the effect of recommendation correctness on reliance on
the recommended answers, χ2(3, N = 156) = 7.71, p < .05. Participants receiving correct advice and
transparency statements adjusted their answers according to the system’s advice significantly less than
expected. At the same time, participants receiving incorrect advice and transparency statements adjusted
their final answers in line with the CA’s advice significantly more than expected. While hypothesis 2b can
thus be confirmed, the accurate-advice treatment groups exhibit unexpected results.
Discussion
AI-supported decision making has been heavily discussed at the forefront of public attention and has led
researchers from different fields to arrive at contrasting conclusions regarding the trust decision-makers
place in algorithmic advice. As part of the current study, we aimed to further clarify under which
conditions algorithmic advice is trusted and relied upon. Past research is incongruent on the influence of
algorithmic judgement, as predominant literature streams on algorithmic aversion and algorithmic
appreciation show. Furthermore, we sought to explore whether making an algorithm’s accuracy level
transparent to the user may influence perceptions and reliance on advice, given that transparency is
oftentimes mentioned as a design mechanism to increase trust in IS yet can also increase cognitive load of
the user. We studied these questions by simulating a reading comprehension task within a job assessment
setting. Our experiment setting implied a limited set of possible answer options and algorithmic advice was
embodied in a CA’s recommendations. While this experiment design attempted to resemble a realistic
scenario, these boundary conditions should be considered when comparing the current study’s results with
similar research. Earlier studies predominantly focused on forecasting and estimation tasks and did
not embody algorithmic advice as part of a technological artefact. Ultimately, we strove to arrive at a more
nuanced understanding of how perceptions and behavior related to algorithmic advice differ depending on
the correctness of the advice, as well as how transparency on the system’s accuracy levels moderates such
perceptions and behavior.
According to our results (see Table 2), users generally perceive incorrect algorithmic advice as less
trustworthy than correct advice. Our findings thereby support the claim that performance of AI-based
systems matters (Liel and Zalmanson 2020). At the same time, our analyses indicate that users receiving
incorrect algorithmic advice more heavily rely on this advice. Our findings thus diverge from existing
notions that humans resist statistical models because they trust their own intuition more (Highhouse 2008). Our
behavioral results are in line with Liel and Zalmanson's (2020) research, which finds that algorithmic
recommendations have strong persuasive power in leading users to adapt their behavior and judgement
according to incorrect algorithmic advice. While our findings converge with the literature stream around
algorithmic appreciation, stating that algorithmic advice is often trusted beyond the algorithm’s actual
capabilities (Logg et al. 2019), they point towards a novel, previously neglected pattern, namely the relation
between perceived trust in the algorithmic advice and behavioral reliance on the recommended answers.
Notions such as the personalization privacy paradox (Kehr et al. 2015; Li et al. 2017) have already discussed
the gap between attitude and behavior in related domains and in IS. However, no such gap or paradox has
been studied in the context of algorithmic acceptance or algorithmic aversion before.
In a similar vein, our findings regarding the transparency manipulation provide an indication of the
complexity trust-enhancing mechanisms can introduce. Indeed, our results show that users appreciate
transparency, yet only when algorithmic advice has been erroneous. Accordingly, making transparent a
system’s performance level to the user elicits even higher levels of reliance on advice and trust in advice and
can thus compensate for a lack of trust introduced through erroneous recommendations. At the same time,
transparency decreases trust in and reliance on correct advice. While these findings support the idea that
demonstrating transparency around an AI’s performance levels is a promising countermeasure specifically
for systems that suffer from inaccuracy, transparency does not act as a default design mechanism to foster
trust. Making transparent an AI-based system’s performance level, thus, can create costly behavioral biases
in decision making (i.e., trusting and conforming to erroneous AI) and prevent effective system use (i.e.,
conforming to correct AI). We suggest that while participants in the incorrect advice conditions felt
reassured by a stated confidence level of 90%, participants in the correct advice conditions might have
started to consider the possibility for the AI-based system to err, which they potentially would not have
done without the statement. In addition, extant research on trust in AI-based systems has largely explored
how to enhance trust, e.g., through a system’s social cues (Feine et al. 2019). As previous examples illustrate,
however, when viewing effective system use as a function of system performance, an increase in trust is not
always desirable.
Last, comparing final correct answers across the four treatments and the four multiple choice questions
(see Table 1), the significant differences between the correct versus incorrect advice groups strengthen our
previous results. However, for users who received incorrect algorithmic advice, the percentage of subjects
submitting a correct final answer increases over the course of the four questions. The increase in
correct final answers between the first and last multiple-choice question is particularly large for
treatment group 3, in which the CA gave incorrect advice and provided transparency statements. This
preliminary finding is in line with Berger et al.'s (2021) study, which shows that the influence of incorrect
algorithmic advice decreases with the increased familiarity of the decision maker. As part of future research,
the effect of algorithmic advice over time should be further explored. Little is known about the sequence and
threshold of an erring AI-based system, i.e., the number of times a system can err before a user deviates
from his or her reliance on the system.
Implications
Our research holds important and novel implications for understanding decision makers’ trust in and
reliance on AI-based systems in the context of erroneous advice, thereby contributing to researchers’ call
for a consideration of the unintended consequences of AI-based IS (Rahwan et al. 2019). While model
interpretability has been investigated to improve performance of human decision-making and perceived
trustworthiness of the underlying ML model (Alufaisan et al. 2020; Doshi-Velez and Kim 2017; Shin 2021),
researchers in the field of IS have urged further empirical study of the disclosure of ML’s inner
workings to users and of how such interpretations can meet end users’ demands (Bauer et al. 2021). The
findings of the current study point towards a more nuanced picture on the paradox between user perception
and related behavior, as well as on the contrasting effects transparency has on perception and
behavior. We contribute to existing research in the fields of IS, Human-Computer Interaction and
psychology on the current understanding of how humans perceive the use of (erroneous) machines as
decision aids. From a design science perspective, we empirically evaluate a specific design feature, namely
an instantiation of transparency for trustworthy AI.
We explored the notions of trust and reliance in the specific context of a reading comprehension task. Our
results suggest that a general trust in and reliance on algorithmic advice do not exist but can be viewed as a
function of system performance. Additionally, we demonstrate that transparency on the algorithm’s
accuracy levels strengthens the effect that erroneous algorithmic advice has on trusting in and relying on
that advice. Moreover, transparency influences the impact of algorithmic advice in different ways, very much
depending on the correctness of the advice. We revisit established notions of trust (Söllner et al. 2012) and
Lee and See’s (2004) conceptual model of the dynamic process on trust and its effect on reliance specifically.
While we cannot demonstrate a mediating effect of trust on subsequent behavior, i.e., reliance, our findings
point towards the importance of conceptually differentiating between trust as a perceptual and reliance as
a behavioral outcome. In addition, our results contribute to Lee and See’s (2004) notion of information on
system performance as a crucial factor influencing trust in IS.
Instantiations of AI-based IS as used as part of our study are increasingly deployed in learning and support
tasks such as providing suggestions in e-mail applications (Seabrook 2019). From a practitioner’s point of
view, a key concern is the aversion towards algorithmic advice, especially if such advice outperforms human
judgement (Yeomans et al. 2019). Beyond attempting to improve the accuracy and performance rates of
deployed systems, practitioners should consider how advice correctness affects actual behavior and, thus,
decision making. While previous research suggests increasing transparency in the algorithmic forecasting
process to foster the adoption and use of algorithmic advice (Yeomans et al. 2019), our results indicate that
transparency-enhancing design features should be implemented with caution. The current research shows
that transparency can induce counterproductive actions on the decision-maker’s side. In that sense, users
might rely more heavily and easily on erroneous advice, while (unnecessarily) questioning correct advice.
Acknowledgements
We thank the Swiss National Science Foundation for funding parts of this research (100013_192718) and
acknowledge funding from the Basic Research Fund (GFF) of the University of St. Gallen.
References
Agarwal, R., and Prasad, J. 1998. “A Conceptual and Operational Definition of Personal Innovativeness in
the Domain of Information Technology,” Information Systems Research (9:2), pp. 204–215.
(https://doi.org/10.1287/isre.9.2.204).
Alufaisan, Y., Marusich, L. R., Bakdash, J. Z., Zhou, Y., and Kantarcioglu, M. 2020. “Does Explainable
Artificial Intelligence Improve Human Decision-Making?,” ArXiv.
(https://doi.org/10.31234/osf.io/d4r9t).
Baird, A., and Maruping, L. M. 2021. “The Next Generation of Research on IS Use: A Theoretical Framework
of Delegation to and from Agentic IS Artifacts,” MIS Quarterly: Management Information Systems
(45:1), pp. 315–341. (https://doi.org/10.25300/MISQ/2021/15882).
Bauer, K., Hinz, O., van der Aalst, W., and Weinhardt, C. 2021. “Expl(AI)n It to Me – Explainable AI and
Information Systems Research,” Business and Information Systems Engineering (63:2), Springer
Fachmedien Wiesbaden, pp. 79–82. (https://doi.org/10.1007/s12599-021-00683-2).
Berger, B., Adam, M., Rühr, A., and Benlian, A. 2021. “Watch Me Improve—Algorithm Aversion and
Demonstrating the Ability to Learn,” Business and Information Systems Engineering (63:1), pp. 55–
68. (https://doi.org/10.1007/s12599-020-00678-5).
Castelo, N., Bos, M. W., and Lehmann, D. R. 2019. “Task-Dependent Algorithm Aversion,” Journal of
Marketing Research (56:5), pp. 809–825. (https://doi.org/10.1177/0022243719851788).
O’Neil, C. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens
Democracy, Crown Publishing Group, USA.
European Commission. 2019. “Ethics Guidelines for Trustworthy AI.” (https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai).
D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein,
J., Hoffman, M. D., Hormozdiari, F., Houlsby, N., Hou, S., Jerfel, G., Karthikesalingam, A., Lucic, M.,
Ma, Y., McLean, C., Mincu, D., Mitani, A., Montanari, A., Nado, Z., Natarajan, V., Nielson, C., Osborne,
T. F., Raman, R., Ramasamy, K., Sayres, R., Schrouff, J., Seneviratne, M., Sequeira, S., Suresh, H.,
Veitch, V., Vladymyrov, M., Wang, X., Webster, K., Yadlowsky, S., Yun, T., Zhai, X., and Sculley, D.
2020. “Underspecification Presents Challenges for Credibility in Modern Machine Learning,” ArXiv.
Dawes, R. M. 1979. “The Robust Beauty of Improper Linear Models in Decision Making,” American
Psychologist (34:7), pp. 571–582. (https://doi.org/10.1037/0003-066X.34.7.571).
Dietvorst, B. J., Simmons, J., and Massey, C. 2015. “Understanding Algorithm Aversion: Forecasters
Erroneously Avoid Algorithms After Seeing Them Err,” Academy of Management Proceedings
(2014:1), pp. 12227–12227. (https://doi.org/10.5465/ambpp.2014.12227abstract).
Dijkstra, J. J., Liebrand, W. B. G., and Timminga, E. 2017. “Persuasiveness of Expert Systems,”
(3001:January), pp. 155–163.
Doshi-Velez, F., and Kim, B. 2017. Towards a Rigorous Science of Interpretable Machine Learning,
pp. 1–13. (http://arxiv.org/abs/1702.08608).
Dzindolet, M. T., Pierce, L. G., Beck, H. P., and Dawe, L. A. 2002. “The Perceived Utility of Human and
Automated Aids in a Visual Detection Task,” Human Factors (44:1), pp. 79–94.
(https://doi.org/10.1518/0018720024494856).
Elkins, A. C., and Derrick, D. C. 2013. “The Sound of Trust: Voice as a Measurement of Trust During
Interactions with Embodied Conversational Agents,” Group Decision and Negotiation (22:5), pp.
897–913. (https://doi.org/10.1007/s10726-012-9339-x).
Endsley, M. R. 2017. “From Here to Autonomy: Lessons Learned from Human-Automation Research,”
Human Factors (59:1), pp. 5–27. (https://doi.org/10.1177/0018720816681350).
Faltings, B., Pu, P., Tran, B. D., and Jurca, R. 2014. “Incentives to Counter Bias in Human Computation,”
Proceedings of the 2nd AAAI Conference on Human Computation and Crowdsourcing (Hcomp), pp.
59–66. (https://doi.org/10.1145/2441776.2441828).
Feine, J., Gnewuch, U., Morana, S., and Maedche, A. 2019. “A Taxonomy of Social Cues for Conversational
Agents,” International Journal of Human Computer Studies (132:June), Elsevier Ltd, pp. 138–161.
(https://doi.org/10.1016/j.ijhcs.2019.07.009).
Flender, J., Christmann, U., Groeben, N., and Mlynski, G. 1999. “Development and First Validation of a
Scale of Passive Argumentational-Rhetorical Competence,” Zeitschrift Für Differentielle Und
Diagnostische Psychologie (20:4), pp. 309–325.
Fox, A. B., Rosen, J., and Crawford, M. 2009. “Distractions, Distractions: Does Instant Messaging Affect
College Students’ Performance on a Concurrent Reading Comprehension Task?,” Cyberpsychology
and Behavior (12:1), pp. 51–53. (https://doi.org/10.1089/cpb.2008.0107).
Gefen, D., and Straub, D. W. 2004. “Consumer Trust in B2C E-Commerce and the Importance of Social
Presence: Experiments in e-Products and e-Services,” Omega (32:6), pp. 407–424.
(https://doi.org/10.1016/j.omega.2004.01.006).
Goddard, K., Roudsari, A., and Wyatt, J. C. 2012. “Automation Bias: A Systematic Review of Frequency,
Effect Mediators, and Mitigators,” Journal of the American Medical Informatics Association (19:1),
pp. 121–127. (https://doi.org/10.1136/amiajnl-2011-000089).
Grove, W. M., and Meehl, P. E. 1996. “Comparative Efficiency of Informal (Subjective, Impressionistic) and