
Empirical Software Engineering (2025) 30:18

https://doi.org/10.1007/s10664-024-10565-2

How do ML practitioners perceive explainability? An interview study of practices and challenges

Umm-e-Habiba1,3 · Mohammad Kasra Habib4 · Justus Bogner2 · Jonas Fritzsch1 · Stefan Wagner1,4

Accepted: 1 October 2024 / Published online: 1 November 2024


© The Author(s) 2024

Communicated by: Markus Borg

Abstract
Explainable artificial intelligence (XAI) is a field of study that focuses on the development
process of AI-based systems while making their decision-making processes understandable
and transparent for users. Research already identified explainability as an emerging require-
ment for AI-based systems that use machine learning (ML) techniques. However, there is
a notable absence of studies investigating how ML practitioners perceive the concept of
explainability, the challenges they encounter, and the potential trade-offs with other quality
attributes. In this study, we want to discover how practitioners define explainability for AI-
based systems and what challenges they encounter in making them explainable. Furthermore,
we explore how explainability interacts with other quality attributes. To this end, we con-
ducted semi-structured interviews with 14 ML practitioners from 11 companies. Our study
reveals diverse viewpoints on explainability and applied practices. Results suggest that the
importance of explainability lies in enhancing transparency, refining models, and mitigating
bias. Methods like SHapley Additive exPlanations (SHAP) and Local Interpretable Model-
Agnostic Explanation (LIME) are frequently used by ML practitioners to understand how
models work, while tailored approaches are typically adopted to meet the specific require-
ments of stakeholders. Moreover, we have discerned emerging challenges in eight categories.
Issues such as effective communication with non-technical stakeholders and the absence of
standardized approaches are frequently stated as recurring hurdles. We contextualize these
findings in terms of requirements engineering and conclude that industry currently lacks a
standardized framework to address arising explainability needs.

Keywords Explainable AI · Interviews · Requirements engineering

1 Introduction
Artificial intelligence (AI) is increasingly applied in various domains, including medicine (Caru-
ana et al. 2015; Jin et al. 2020), law (Jin et al. 2023), autonomous driving (Atakishiyev et al.
2021), and loan application approval (Sachan et al. 2020). AI-based systems often play a
key role in the associated critical decision-making processes. Responsible individuals, such
as physicians, judges, drivers, and bankers, require explanations to different extents for the
output generated by such AI-based systems. With AI becoming more prevalent as a decision
helper in high-stakes sectors, it is crucial to ensure that AI is understandable to its users (Longo
et al. 2020), guaranteeing safe, responsible, and legally compliant usage (Lagioia et al. 2020).
Machine learning (ML) pipelines are capable of producing precise predictions; however,
they frequently fail to incorporate two crucial phases, namely understanding and explaining.
Understanding involves analyzing the problem domain, data, and model behavior, including
training and quality assurance. Explaining is crucial throughout the ML lifecycle, espe-
cially when models are deployed in real-world applications. Both phases contribute to model
interpretability, trust, and effective implementation (Dwivedi et al. 2023). The increasing
autonomy and complexity of AI-based systems pose challenges for software engineers and
domain experts to comprehend them fully (Lipton 2018). This necessity has led to the devel-
opment of eXplainable AI (XAI) systems, where explanations play a pivotal role. Such
systems help users understand decisions made by the AI, thereby increasing confidence and
trustworthiness (Markus et al. 2021). Moreover, explanations serve to verify and validate
actions taken, identify the causes of errors, and reduce the likelihood of human mistakes.
Explainability is an emerging non-functional requirement that has garnered attention as
a critical quality aspect for AI-based systems (Chazette and Schneider 2020; Köhl et al.
2019). An existing study highlights that explainability significantly impacts the overall quality
of software systems (Chazette et al. 2021), contributing to other essential quality features
such as transparency and understandability. Notably, the Ethics Guidelines for Trustworthy
AI (Lipton 2018), recommended by the High-Level Expert Group on Artificial Intelligence
(AI HLEG), prioritize transparency as an essential requirement. These guidelines underscore
the significance of traceability, explainability, and communication in AI-based systems and
stress the importance of providing a clear account of the decision-making process from the
perspectives of relevant stakeholders.
To facilitate the development of explainable AI-based systems, methodologies are essen-
tial for requirements engineers to analyze, delineate, and assess the requirements related to
explainability. However, given that the notion of explainability has recently emerged as a
critical quality factor among non-functional requirements (Köhl et al. 2019), there is cur-
rently no adequate guidance available to assist practitioners in this regard. To understand
how practitioners deal with this lack of guidance, we conducted an interview study to gather
insights from practitioners regarding their interpretations of explainability, the practices they
actively employ, and the difficulties they encounter. These interviews were conducted within
the context of various stakeholders and industry requirements engineering (RE) practices.

2 Background and Related Work


Several researchers have explored the impact of explainability on AI-based systems. Accord-
ing to Brunotte et al. (2021), explanations can enhance user privacy awareness, while Kästner
et al. (2021) argue that explainable systems increase trustworthiness. However, requests for
explainability and the means to achieve it often lack clarity (Köhl et al. 2019). Sadeghi et al.
(2021) proposed a taxonomy outlining different reasons for explanations, such as improving
user interaction. Sheh (2021) categorized explainability requirements based on the explana-
tion source, depth, and scope in the context of requirements engineering. Ethical guidelines
for trustworthy AI, like fairness and explainability, have been emphasized by Ishikawa and
Matsuno (2020) and Kuwajima and Ishikawa (2019), while Vogelsang and
Borg (2019) stressed the importance of eliciting explainability requirements from the users’
perspectives.


Despite the progress, challenges persist in achieving a comprehensive explainability norm.


The lack of a consistent definition and varying stakeholder understandings represent major
obstacles (Köhl et al. 2019; Suresh et al. 2021). Stakeholder-centric approaches are essential
due to the disparity between AI-generated explanations and the comprehension of stake-
holders (Jansen Ferreira and Monteiro 2021). Additionally, existing model interpretability
methods (Carvalho et al. 2019) may not cater explicitly to end-users (Suresh et al. 2021),
requiring more user-friendly explanations. Further challenges include going beyond exist-
ing AI explainability techniques (Dhanorkar et al. 2021), accommodating different levels of
explanations arising from dynamic interactions between stakeholders, and tailoring methods
to different explainees with unique interests (Henin and Le Métayer 2021).
Many researchers have strived to explore different perspectives on explainability. In a qualita-
tive study, Brennen (2020) conducted 40 interviews and 2 focus groups over nine months.
Their goal was to gain a clearer understanding of how various stakeholders in both govern-
ment and industry describe the issue of Explainable AI. The paper highlights two significant
findings: (1) current discourse on Explainable AI is hindered by a lack of consistent ter-
minology, and (2) there are multiple distinct use cases for Explainable AI. They placed
their findings in the context of needs that existing tools cannot meet. Furthermore, Hoff-
man et al. (2023) focuses on the explanation needs of stakeholders in AI/XAI systems. The
authors introduced the Stakeholder Playbook, designed to help system developers consider
various ways stakeholders need to “look inside” these systems, such as understanding their
strengths and limitations. The authors conducted an empirical investigation involving cogni-
tive interviews with senior and mid-career professionals experienced in developing or using
AI systems. Krishna et al. (2022) analyzes the disagreement problem in explainable ML,
focusing on conflicts among post-hoc explanation methods. The authors conducted semi-
structured interviews with 25 data scientists to define practical explanation disagreements,
formalizing the concept and creating a framework to quantitatively assess these disagree-
ments. Further, the authors applied the framework to measure disagreement levels across
various datasets, data modalities, explanation methods, and predictive models. Lakkaraju
et al. (2022) conducted a user evaluation study. The authors investigated the needs and desires
for explainable AI among doctors, healthcare professionals, and policymakers, revealing a
strong preference for interactive explanations in the form of natural language dialogues. The
study highlighted that domain experts want to treat ML models as accountable colleagues,
capable of explaining their decisions in an expressive and accessible manner. Based on these
findings, the authors propose five guiding principles for researchers to follow when designing
interactive explanations, providing a foundation for future work in this area.
While the presented studies provide valuable insights into the challenges of explainability,
a comprehensive method for meeting end-users’ demands in terms of explainability and
ensuring that the specified requirements align with those demands has not yet been fully
established. Therefore, our goal is to gather insights from ML practitioners regarding their
daily practices, challenges, and trade-offs. Our study offers the following observations:

• We explored how practitioners define explainability, categorizing these definitions based
on their intended use. We also synthesized how their roles influence their definitions and
compared these practical definitions with those found in the literature.
• We highlighted the main reasons practitioners need explainability and how the intended
audience influences their choice of practices to address explainability.
• We identified challenges faced by practitioners and, based on these challenges, suggested
future directions for research and practice.
• We examined the trade-offs practitioners must consider when implementing explainability.
This suggests that there is a need for a more robust approach to ensure that the end users’ needs
for explainability are met and that the requirements can be validated accordingly. Addressing
diverse stakeholder-centric perspectives, specifically non-technical, and providing under-
standable explanations are vital goals. As the field of AI continues to evolve, embracing a RE
approach becomes paramount in addressing these challenges, facilitating effective commu-
nication, and ultimately ensuring the transparency and trustworthiness of AI-based systems.

3 Study Design

To structure our research efforts, we followed the five-step process suggested by Runeson
and Höst (2009). Initially, we formulated our research objective and the associated research
questions.
Our objective is to identify the practices, challenges, and trade-offs professionals face in
the XAI field. Furthermore, we want to chart potential research avenues for the academic
community to develop solutions addressing XAI challenges and providing effective support
for practitioners. To establish a clear direction and scope for our study, we subsequently
formulate four key research questions:
RQ1: How do ML practitioners describe explainability from their perspective?
RQ2: What practices do ML practitioners employ to evaluate the necessity of explain-
ability, and which practices do they apply to address it?
RQ3: Which explainability challenges do ML practitioners experience?
RQ4: Which trade-offs between explainability and other quality attributes do ML
practitioners consider?
These research questions address the diverse aspects of explainability in AI-based systems.
RQ1 seeks to capture how ML practitioners conceptualize explainability, providing insight
into the diverse perspectives of primary stakeholders and establishing a foundational
understanding of the concept of explainability. RQ2 aims to uncover the current practices
used to evaluate the necessity of explainability and the strategies employed to achieve it,
thereby identifying practical approaches and gaps in the implementation of these practices.
RQ3 focuses on identifying the specific challenges faced
by practitioners, which is crucial for addressing explainability effectively. Finally, RQ4 seeks
to identify the trade-offs between explainability and other quality attributes, recognizing that
balancing these aspects is essential for the practical deployment of AI-based systems. These
questions as a whole provide a comprehensive exploration of explainability in the context of
AI-based systems, guiding future research and development to enhance their transparency,
trustworthiness, and usability.
To answer these research questions, we followed an exploratory and qualitative research
approach by conducting semi-structured interviews (Runeson and Höst 2009). Without a
preconceived hypothesis, we aim to investigate the topic in a preliminary and open-ended
manner. This approach gave us a foundational structure while at the same time granting us
the flexibility to dynamically adjust our inquiries in response to participants’ feedback. To
guide our selection of interviewees, we established the following set of criteria:
• The person has at least two years of experience working on AI/ML projects.
• The person is currently working on an AI/ML project or has worked on one in the past
two years.


We enlisted participants through personal industry connections within our research team
and by contacting individuals in our LinkedIn network. We followed referral chain sampling
(Baltes and Ralph 2022), in which participants are initially selected through convenience
sampling, and then asked to refer or recommend other potential participants. Eight partici-
pants were recruited through convenience sampling and six via referral chain sampling.

3.1 Interview Process

An interview preamble (Runeson and Höst 2009) was designed to explain the interview
process and theme to the participants before conducting the interviews. This document was
distributed to the participants in advance to acquaint them with the study. The preamble also
delineated ethical considerations, such as confidentiality assurances, requests for consent
for audio recordings, and a guarantee that recordings and transcripts would remain confi-
dential and unpublished. Further, we created an interview guide (Seaman 2008) containing
the questions organized into thematic categories. This guide helped to structure and orga-
nize the semi-structured interviews but was not provided to interviewees before the sessions.
Additionally, we prepared a slide presentation with supplementary materials to provide con-
textual information related to our research. The slides provided information about the RE
process and explored the intersection between XAI and RE; they were presented to the
participants immediately before the interviews.
In total, we conducted a series of 14 individual interviews. All of them were conducted
remotely via Webex in English and lasted 35 to 55 minutes. We loosely adhered to the
structure provided in the interview guide but adapted our approach based on the participant’s
responses. To establish rapport and initiate the discussions, we began by asking participants
to introduce themselves, outlining their roles and the specific systems they were involved
with. Subsequently, we transitioned into the topic of explainability and explored the practices
followed by industry practitioners in this domain. The next segment focused on the challenges
encountered by participants when endeavoring to make their systems more explainable.
Lastly, we delved into inquiries regarding the trade-offs between explainability and other
quality attributes. Following the interviews, we transcribed each audio recording to create a
textual document for further analysis.

3.2 Data Analysis

Our data analysis started with coding each transcript, adhering to the constant comparison
method rooted in grounded theory principles (Seaman 2008). Following a preliminarily
established set of codes, we assigned labels to relevant paragraphs within the transcript.
Throughout this procedure, we generated new labels, revisited previously completed tran-
scripts, and occasionally renamed, split, or merged labels as our understanding of the cases
grew more comprehensive. After that, we conducted a detailed analysis of the intricacies and
relationships among the codes within each transcript. This analysis resulted in the creation
of a textual narrative for each case.
The supplementary materials created in this process, including the study design and the
interview preamble, can be found online.¹

¹ https://zenodo.org/doi/10.5281/zenodo.10034533


4 Results

In this section, we present the interview results, grouped by our four initially stated research
questions. The data resulted from interviewing a cohort of 16 participants. We decided to
exclude the data obtained from two participants. The rationale for this exclusion stemmed
from our observation that these two individuals were Ph.D. students and lacked sufficient
industry experience according to our criteria. This left us with 14 participants, representing 9
German companies, one Nigerian company, and one Swiss company. Of the 14 participants,
9 were from companies that provide AI solutions, 4 were from the automotive domain, and
one participant was from a policymaking agency, as shown in Table 1. All participants were
experienced with AI-based systems, and some also had prior experience in different software
engineering roles.

4.1 How Do ML Practitioners Describe ‘Explainability’ from their Perspective? (RQ1)

We first asked practitioners to define explainability, thereby gathering diverse perspectives
on this concept, as shown in Table 2. They also mentioned the current target audience for
their explanations. Based on their responses, we identified the audience, i.e., the target users
for whom an explanation is warranted. We further grouped the definitions into four categories.
In the following, we summarize the interviewees’ definitions based on these four categories.
Explainability for Transparency and Trust P1, P5, P8, P10, and P14 described explain-
ability in terms of transparency and trust. During the interview, P1 defined explainability by
stating that “explainability can be defined as a requirement for general acceptability of a
system where lack of transparency makes it challenging.” Similarly, P14 stated that “explain-
ability contributes to establishing trust in new applications.” In an ML context, P5 described
that “explainability is making practitioners understand a pre-trained model to understand
the constraints, limitations, and the opportunities for improvement and maybe even poten-
tial risks in the business context”. Further, from a feature-centric perspective, P8 mentioned
that “trusting and being able to explain the features is crucial because if they cannot be
trusted or explained, it may lead to issues with the ML model”. P10 specified explainability
close to the core of ML development by stating that “explainability should be considered
from various perspectives throughout the development process. It impacts system properties
such as requirements, design, testing, implementation, and safety. It is crucial to address
explainability in each stage to ensure accountability and responsibility.”
Explainability for Understanding Decision-making and Model Improvement Another
view of explainability, highlighted by six ML practitioners, was explainability for under-
standing the decision-making process of an AI-based system. P2, P4, P6, and P9 emphasized
that explainability is essential for the end-user to understand how the system has arrived at
a particular result. In this context, P6 stated that “we are still far away from removing the
human-in-the-loop. So, at the end of the day, the business leader has to approve the deci-
sion.” P2, P12, and P13 emphasized that explaining and understanding the decision-making
process of an AI-based system can serve as a debugging tool for developers. Lastly, P13 said
that “explanations aid in informed decision-making and taking appropriate action.”
Explainability for Model Insights Participant P7 defined explainability as making complex
concepts understandable to non-technical individuals by elucidating the types of data and
attributes utilized in the process. Similarly, P11 defined explainability as “why the model is
giving certain results or predictions by explaining how the model works and the impact of
features or data.” P3 stated that he “would define [it] in a way that somebody could describe
what happens inside the black box”.

Table 1 Participant demographics

Participant ID | Company ID | Role | Experience | Expertise (a) | Country | Domain (b)
P1 | C1 | Data scientist | 8 years | DL/ML | Nigeria | Policy making
P2 | C2 | Data scientist | 5 years | ML | Germany | AI-SP for audio data
P3 | C3 | Solution & chief architect in automotive | 4 years | ML | Germany | AI-SP for autonomous driving
P4 | C4 | Data scientist | 6 years | NLP/ML | Germany | Automotive
P5 | C3 | AI solution architect | 3 years | NLP | Germany | AI-SP for face recognition
P6 | C5 | Data scientist, Architect | 6 years | ML | Switzerland | AI-SP for E-commerce
P7 | C6 | Data scientist | 3 years | ML | Germany | AI-SP for real estate
P8 | C7 | Data scientist | 3 years | ML | Germany | Automotive
P9 | C7 | AI ethics responsible | 3 years | AI/NLP | Germany | Automotive
P10 | C8 | AI safety expert | 5 years | DL/ML | Germany | Automotive
P11 | C9 | Data scientist | 4 years | ML/DL | Germany | AI-SP for demand forecasting
P12 | C10 | Data scientist | 10 years | ML | Germany | AI-SP
P13 | C3 | Data engineer | 3 years | ML | Germany | AI-SP for explainable systems in insurance domain
P14 | C11 | Data manager | 7 years | AI and Data | Germany | AI-SP for creating intelligent technologies
(a) ML = Machine Learning; DL = Deep Learning; NLP = Natural-Language Processing
(b) AI-SP = AI Solution Provider
Table 2 Explainability insights by participants

Category | Definitions | Targeted user | Practices

Explainability for transparency and trust
P1: Understanding and interpreting how ML models make decisions, addressing transparency and reliability concerns. | End-user | Demonstrating feature relationships through scientific methods and presenting work in a language understandable to stakeholders.
P5: Turning a black box into a white box, enabling understanding for practitioners | Engineers & other technical personnel | Incorporating existing open source tools for dynamic presentation
P8: Trusting and explaining features to avoid issues with ML models | Engineers & other technical personnel | No standard practices
P10: The core of development, including trust, safety, and design. | End-user | No standard practices
P14: Establishing trust by providing understanding and transparency. | End-user | No standard practices

Explainability for user understanding and decision-making
P2: Justifying decisions from end-users and gaining knowledge for developers | Engineers & other technical personnel | A hybrid approach is adopted, combining unexplainable AI-based systems with post-decision statistics to enhance the explanation of the decision-making process
P4: Knowing how the system filtered and achieved results to explain to the end-user. | End-user | No standard practices
P12: Understanding how the system filtered and achieved results | Engineers & other technical personnel | SHapley values
P6: Explaining decisions to business teams, regulators, etc. | End-user | Custom implementation
P9: Providing additional output to guide user understanding of important input parts. | End-user | No standard practices
P13: Decoding ML model predictions in a user-friendly manner. | Engineers & other technical personnel | LIME and SHapley values

Explainability for model insight and improvement
P2: Using explainability as a debug tool to improve system performance. | Engineers & other technical personnel | A hybrid approach is adopted, combining unexplainable AI-based systems with post-decision statistics to enhance the explanation of the decision-making process.
P7: Telling non-technical personnel about data attributes and changes | End-user | No standard practices
P11: Explaining how the model works and the impact of features or data | Engineers & other technical personnel | No standard practices

Explainability for safety and bias mitigation
P3: Describing what happens inside a black box and avoiding bias against humans. | Engineers & other technical personnel | Currently, no standard practices.
P14: Emphasizing the importance of explaining the data used for building the model. | Engineers & other technical personnel | No standard practices

Explainability for Safety and Bias Mitigation Only one interviewee (P3) defined explain-
ability in terms of safety and bias mitigation: “it is about safety requirements not to [be]
biased against humans.”
These diverse perspectives on explainability highlight the need for a comprehensive approach
that caters to both technical and non-technical stakeholders. We furthermore analyzed how the
role of participants influences their perspective of explainability, as can be seen in Fig. 1. The
biggest group, the data scientists, generally prioritizes transparency and acceptability, with P1
adding a focus on clear communication of model operations and ensuring stakeholder under-
standing. Similarly, P2 emphasized the need for justifying decisions and using explainability
for debugging. Participants P4 and P12 emphasized the importance of understanding system
filtering and results, while P6 and P7 highlighted concerns about bridging the gap between
technical and non-technical stakeholders. Building trust through feature explanation would
be crucial, as stated by P8 and P11.
Solution Architect P3 in the automotive sector focuses on avoiding bias and understanding
black box models. AI Solution Architect P5 aims to make complex systems more inter-
pretable, while AI Ethics Responsible P9 aims at balancing ethical considerations with
usability. AI Safety Expert P10 integrates trust, safety, and design into development. Data
engineer P13, finally, emphasizes user-friendly decoding of ML predictions, while Data
Manager P14 stresses the importance of trust and transparency in data explanation.

Fig. 1 Influence of participants’ roles on aspects of explainability definition
We compared the definitions provided by our practitioners with the ones we found in liter-
ature. For that, we consulted five sources providing such a definition, summarized in Table 3.
These definitions are generally rather generic and do not address a specific domain, stake-
holder group, or specific requirements. Our practitioners, in contrast, define explainability
more specifically based on their job roles, the systems they work on, or the targeted stake-
holders. Based on our analysis, we identified two critical factors that should be considered
when defining the term “explainability”: a) stakeholders and b) system domains.

4.2 What Practices Do ML Practitioners Employ to Evaluate the Necessity of Explainability, and Which Practices Do They Apply to Address It? (RQ2)
Practitioners face diverse requests for explainability across various channels and employ a
range of practices to meet these demands. In Table 4, we categorized these requests and the
corresponding practices into distinct factors. In the following, we summarize the interviewees’
statements for each request factor.
User-Centric Factors When providing or implementing explainability, P1, P2, and P4 pri-
marily consider the perspective of the client or end-user. If the client expresses a desire for
explainability, it is essential to ensure that their specific requirements for explainability are
met. This involves clarifying their expectations and understanding what they consider to be
explainable. Additionally, feedback or requests for clarification may originate from upper-
level management or stakeholders. These inputs should be considered when determining the
origin of the requirement. P1 describes several practices to enhance explainability: “Firstly,
I ensure that each stage of my work is well-justified, clearly linking features in model devel-
opment. I use feature engineering, like Principal Component Analysis (PCA), to evaluate
feature contributions, identifying and clarifying significant ones in relation to the model’s
processes.” P2 stated that “the approach to explainability practices can vary depending on
the project and the type of data involved.” Further, he added that there are no universally
defined best practices, as it often requires a case-by-case decision based on feasibility and
usability. Overall, we have seen that user-centric practices for explainability depend on the
specific project requirements and can involve combining unexplainable AI-based systems
with supplementary statistical explanations after the decision has been made.
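To make P1’s description more concrete, the sketch below shows one way such a PCA-based feature check might look in practice; it is a minimal illustration with synthetic data and hypothetical feature names, not code reported by any participant.

```python
# Minimal sketch (illustrative only): using PCA to see which features dominate
# the main components of a tabular dataset, as one way to justify feature choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                          # stand-in for real tabular data
feature_names = ["income", "age", "tenure", "usage"]   # hypothetical feature names

X_scaled = StandardScaler().fit_transform(X)           # PCA is scale-sensitive
pca = PCA(n_components=2).fit(X_scaled)

print("Explained variance ratio:", pca.explained_variance_ratio_)
for i, component in enumerate(pca.components_):
    # Rank features by the absolute size of their loading on this component
    ranked = sorted(zip(feature_names, component), key=lambda t: -abs(t[1]))
    print(f"PC{i + 1} dominant features:", ranked[:2])
```

Reporting which features dominate the leading components gives a simple, non-technical account of why certain features were kept in the model.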
Ethical and Safety Considerations P3, P5, and P6 emphasized that ethics and safety drive
the need for explainability requirements. P5 added that a “need for explainability is assessed
based on factors such as high-risk systems impacting human beings and ethical consider-
ations.” Further, P6 stated that “It is ethical, to make sure that your system is unbiased,
especially when it comes to end users.”. This involves ensuring unbiased systems, partic-
ularly in user interactions, where organizations building or providing AI services bear the
responsibility for model explainability, eliminating biases, and addressing privacy concerns.
P3 did not mention a specific practice for addressing explainability requirements in the con-
text of ethical and safety considerations, while P5 and P6 described it as more of a translation
process.

Table 3 Explainability in AI literature (Habiba et al. 2022)

Papers | Explainability
Doshi-Velez and Kim (2017) | Explanations are “the currency in which we exchange beliefs”
Lipton (2018) | Explainability addresses “what else can the model tell me?”
Gilpin et al. (2018) | Mostly concerned with the internal logic, i.e., “models that are able to summarize the reasons for neural network behavior, gain the trust of users, or produce insights about the causes of their decisions”
Montavon et al. (2018) | “An explanation is the collection of features of the interpretable domain that have contributed to a given example to produce a decision”
Miller (2019) | “Explicitly explaining decisions to people”

Table 4 Requirements and practices for explainability mentioned by participants

Factors | Explainability drivers | Practices to address explainability

User-centric factors
P1 & P4: General acceptability | P1 focuses on simplicity and performance: provide tests and performance metrics to explain simply and non-technically, making it understandable to laymen. P4 emphasized ensuring reproducibility and bias-free datasets for client transparency.
P2: User interaction and satisfaction | P2 emphasizes the use of context information to support understanding of decisions.

Ethical and safety considerations
P3: Bias and fairness | P3 did not provide specific details on how they address explainability needs, but they mention concerns related to bias and safety-critical systems.
P5: Trustworthiness | P5’s approach, which includes translating technical concepts and providing visualizations, addresses the need for explainability in high-risk systems and legal mandates.
P6: Ethical responsibility | P6’s approach of explaining in a language understandable to non-data scientists aligns with the need for explainability driven by ethical responsibility and regulatory requirements.

Legal and regulatory requirements
P8: Legal requirement | P8’s approach involves using various tools, including figures, mathematics, statistics, and more, to provide understandable explanations, e.g., SHapley values.
P13: Legal requirement | P13’s approach of incorporating sources for decision-making and using plots and rule-based explanations aligns with the need for explainability when multiple sources provide similar explanations.
P10: Law and regulatory enforcement | P10’s approach, combining a pilot test with explainability, addresses the need for explainability in high-risk systems.
P11: Law and regulatory enforcement | P11’s approach of demonstrating value through metrics and using explainability techniques aligns with the need for explainability to convince stakeholders of the AI model’s effectiveness.

Client and business perspective
P7 & P9: Client or end-user perspective | P7’s approach of using meetings and various tools aligns with the need for explainability from the client’s perspective. P9 works closely with clients to clarify expectations and uses different tools to meet their specific requirements for explainability.
P12: Business decision | P12 uses plots and rule-based explanations, including “what if” explanations, to address the need for explainability.
P14: Customer requirements | Did not mention any specific practice.

Risk management
P5: High-risk systems | P5’s approach, which includes translating technical concepts and providing visualizations, addresses the need for explainability in high-risk systems and legal mandates.
P10: Risk considerations | P10’s approach, combining a pilot test with explainability, addresses the need for explainability in high-risk systems.
P11: Internal standards | P11 uses SHAP values or feature importance to explain which data inputs are relevant for the model’s outputs.

Data scientist and technical considerations
P13: Data scientist perspective | P13’s approach of incorporating sources for decision-making and using plots and rule-based explanations aligns with the need for explainability.

Legal and Regulatory Requirements P8, P10, P11, and P13 stated that in their current
scenario explainability is a legal and regulatory requirement. To address such explainability
requirements, P8 and P11 mentioned the use of various tools, including figures, mathematics,
and statistics, to provide understandable explanations, e.g., SHAP values (Lundberg and Lee
2017). In addition to these tools, P13 mentioned using LIME (Ribeiro et al. 2016), plots,
and rule-based explanations to address the need for explainability. However, all of them
stated that stakeholders who are unfamiliar with these tools may still struggle to grasp the
explanations fully. P10 stated “It depends on the scope of your explainability of this product.
Do you want to share everything with the end-user or there are knowledge gaps?”
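As an illustration of the kind of tooling referenced here, the following minimal sketch computes SHAP values for a tree-based model with the open-source shap package; the data, model, and setup are placeholders rather than anything the participants described.

```python
# Minimal sketch (illustrative only): SHAP feature attributions for a tree-based
# regressor on synthetic tabular data using the shap package.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))                                   # placeholder data
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)   # synthetic target

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)       # explainer specialized for tree models
shap_values = explainer.shap_values(X[:5])  # one attribution per feature and instance

# Each row of attributions, added to the base value, recovers the model's prediction.
print(np.round(shap_values, 3))
print("Base value:", explainer.expected_value)
```

Plots such as shap summary or force plots are then typically used to turn these raw attributions into the figures and statistics the participants mention showing to stakeholders.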
Client and Business Perspective P4, P12, and P14 explained that they often receive requests
for explainability from their customers, primarily to assist in making informed business
decisions. To address such needs, P12 mentioned plots and rule-based explanations, including
what if explanations, whereas P4 and P14 did not mention any specific practices.
Risk Management P5 and P10 underscored that the risk involved in a system leads to the
need for explainability. P10 emphasized that providing visualizations and using pilot tests
with explainability can help users to comprehend the system.


Data Scientist and Technical Considerations P6 and P13 listed internal standards and
data scientists’ needs as a requirement for explainability. To fulfill such needs, they mostly
rely on SHAP values (Lundberg and Lee 2017). Further, they emphasized incorporating
business-oriented explanations for non-technical stakeholders, making it easier to convey
model predictions. For instance, P6 stated that custom dashboards are created to display model
insights in a more understandable manner for different audiences.
Furthermore, we notice a link between roles (Table 1) and the practices that are used for
explainability (Table 4). A recurring theme is that AI solution architects and safety engineers
are predominantly associated with factors such as risk management and ethical and safety
considerations. Such practitioners use visualization and natural language in their explanations.
Data scientists, in contrast, tailor their explanations to the audience: for legal and technical
stakeholders, they may rely on techniques like LIME and Shapley values, whereas for end
users there are no such standard methods for explanations. This leaves a gap in address-
ing the explainability needs at the end-user level. We also analyzed the practices explicitly
mentioned by practitioners to address explainability. Our findings indicate that practitioners
P8 and P11 referred to the use of Shapley values and feature importance. Practitioners P12
and P13 highlighted rule-based explanations, plots, and decision trees, while P5 mentioned
visualization techniques as their approach to explainability. Additionally, our analysis of
existing literature on XAI practices (Dwivedi et al. 2023) reveals that only a limited number
of techniques are being applied in practice. This highlights a significant gap between the
techniques discussed in the literature and their actual adoption by practitioners.
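For completeness, the sketch below shows a typical local LIME explanation for a single tabular prediction, one of the techniques named above; the classifier, feature names, and data are hypothetical stand-ins rather than the participants’ actual setups.

```python
# Minimal sketch (illustrative only): a local LIME explanation for one prediction
# of a tabular classifier trained on synthetic data.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 4))                 # placeholder tabular data
y = (X[:, 0] - X[:, 2] > 0).astype(int)       # synthetic binary labels
feature_names = ["f1", "f2", "f3", "f4"]      # hypothetical feature names

model = GradientBoostingClassifier().fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["no", "yes"], mode="classification"
)
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(explanation.as_list())                  # (feature condition, weight) pairs
```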
In summary, our practitioners employ a variety of practices to address explainability, tailoring
their approaches to demands stemming from user-centric factors, ethical and safety
considerations, legal and regulatory mandates, client and business perspectives, risk
management, and data scientist and technical requirements. The adoption of these practices
is influenced by the intended audience, i.e., “to whom it should be explainable”. These practices
were regarded as essential for ensuring transparency and facilitating informed decision-making
in AI-based systems across various domains.

4.3 Which Explainability Challenges Do ML Practitioners Experience? (RQ3)

In our exploration of the third research question, we sought to identify the challenges encoun-
tered by ML practitioners when explaining their systems. The insights shared by participants
highlight a spectrum of challenges, each shedding light on the intricacies of achieving explain-
ability in AI-based systems, as illustrated in Fig. 2. In the following, we summarize the
interviewees’ statements for each challenge category.
Communication with Non-Technical Stakeholders A recurring challenge faced by ten par-
ticipants centers on effective communication with non-technical stakeholders. P1, P4, and P6
emphasized that effectively conveying complex technical concepts and model explanations
to non-technical stakeholders, such as business teams, regulators, or customers, is challeng-
ing. P7 stated “you need to understand users, their domain knowledge, their background,
and how much they are willing to take from you as an explanation. The knowledge gap is
the biggest gripe in the industry.” P12 confirmed that “there is obviously a knowledge gap
that is hard to overcome.” P10 explained that challenges faced in explaining AI components
involve the difficulty in conveying the intricacies of the algorithms to end-users. Further,
he stated that while it would be possible to define performance metrics such as false positives
and negatives, explaining the inner workings of the AI-based system would be complex and
could be overwhelming for users. Providing graphs and visualizations could offer only some
level of explanation. Instead, he suggested defining robustness and performance metrics that
users can grasp. Specific matrices, such as the AI model weight matrix, may be utilized to
provide additional insights, and following research in this area can assist in addressing these
challenges during the requirements engineering process. Furthermore, P11, P13, and P14
highlighted the challenge of translating technical concepts related to model operations,
algorithms, and data into understandable explanations for non-technical users.

Fig. 2 Explainability challenges as mentioned by interview participants
Lack of Standardized Approaches A prevalent challenge, articulated by seven participants,
pertains to the absence of standardized approaches for explainability. They stated that the
absence of universally defined best practices for explainability makes it necessary to decide
on a case-by-case basis, leading to varying approaches across projects. For example, P6
stated that “the most significant challenge [...] is the absence of readily available tools or
solutions. It often requires custom implementations, as there isn’t a pre-existing solution.”
Similarly, P11 stated that “different approaches, like using SHAP values, have limitations
and may not provide full or comprehensive explanations. Some approaches may not cover
all corner cases, depending on the specific model being used.” Furthermore, P13 mentioned
that “clear requirements for extending the explanation features and determining the target
audience for explanations is challenging.” Overall, the seven participants highlighted that
due to a lack of standardized approaches, determining and fulfilling the explainability need
for end-users is challenging, often leading to customized practices.
Understanding Black-Box Models The third challenge highlighted by six participants cen-
ters on the difficulty of comprehending and explaining the internal workings of complex
AI models. These participants described it as a challenge to answer questions about model
predictions beyond training data. Furthermore, attempting to comprehend the inner workings
of the trained model to enhance explainability poses a significant obstacle, given the inherent
opaqueness of such models. P14 emphasized that the black box nature of the model hinders
meaningful explanation, stating “in cases where the model is considered a ’black box’, mean-
ing the internal workings are not easily interpretable, generating meaningful explanations
becomes more challenging.”
Balancing Explainability & Other Quality Attributes Striking the right balance between
model accuracy and providing simple, understandable explanations was also described as
challenging. Five participants mentioned that highly accurate models may be less inter-
pretable, while simpler models may sacrifice accuracy. P12 and P14 highlighted that
improving algorithm accuracy while maintaining explainability is challenging. P7 described
it as follows: “You have to find a good compromise on how much information you are going
to provide to the stakeholders and how much not.” P8 and P10 highlighted the trade-off
between model performance and explainability because the more complex models are less
explainable, which leads to scale-back due to a lack of explainability. Moreover, they highlighted
that when models become more complex, they tend to be less explainable, and this lack
of explainability can result in a reduction in their usage or adoption.
Bridging the Trust Gap Building trust with stakeholders and end-users was highlighted by
five participants as a pivotal challenge. Explaining how decisions are made is crucial for
building trust. P1 added that “the challenge lies in effectively communicating the explain-
ability of the model to different audiences, addressing their specific concerns, and building
trust in the reliability and validity of the AI-based system.” Further, P4 and P5 mentioned
that convincing end-users how these results are generated is a challenging part.
Resource Constraints P8, P12, and P14 highlighted that incorporating explainability within
resource constraints can be a challenge. For example, providing explainability in large-scale
systems with high data volumes and real-time processing requirements can strain computational
resources and performance, especially when there are extensive requests for explanations or
limits on the implementation effort.
Data Quality & Adaptation for Sustainable Explainability The importance of data quality
in achieving explainability was emphasized by P13 and P14. Further, these participants
stressed that adapting explanations to cope with changing data distributions and ensuring
relevance over time is challenging. P5 also mentioned that it is hard to answer any explanation
request beyond the data on which the model has been trained.
Safety and Compliance Ensuring that explainability meets safety and compliance require-
ments can be challenging, especially in safety-critical applications. P10 and P12 emphasized
that ensuring compliance with regulations while providing explainability, addressing concerns
about the sensitivity and confidentiality of the algorithm, and dealing with the potential risks
and liabilities associated with providing explanations are all challenging.
In conclusion, we observed that developing standardized approaches for explainability and
finding effective ways to communicate technical concepts to non-technical users are crucial
to overcoming these challenges. Additionally, transparency and reliability concerns can be
addressed through an improved understanding of the internal workings of complex models.
Building trust with stakeholders and end-users by effectively communicating the explainabil-
ity of the model is essential. Furthermore, the challenges encountered by ML practitioners
in achieving explainability underscore the importance of adopting an RE perspective. Clear
and comprehensive communication with non-technical stakeholders is essential, necessitat-
ing the identification of user-friendly ways to convey technical concepts. The absence of
standardized approaches for explainability calls for the development of clear requirements
that guide custom implementations based on specific project needs.

4.4 Which Trade-Offs Between Explainability and Other Quality Attributes Do Practitioners Consider? (RQ4)

When investigating the trade-offs that ML practitioners consider between explainability and
other quality attributes in AI-based systems, it becomes evident that these trade-offs are
multifaceted and depend on various factors. In the fourth part of our interviews, we shed light
on the non-functional requirements and constraints associated with achieving explainability
in AI-based systems, as shown in Table 5.


Table 5 Quality trade-offs regarding explainability as mentioned by interview participants

Trade-offs | Total (out of 14 participants)
Security and privacy trade-offs | 9
Accuracy trade-offs | 7
Performance and resource trade-offs | 6
Data quality trade-offs | 2
Other application-specific trade-offs | 2

Security and privacy considerations emerged as a critical trade-off in the context of
explainability. Nine participants highlighted the delicate balance between providing detailed
explanations and safeguarding system security and user privacy. For instance, P2 elaborated
on a car insurance fraud detection case. Providing extensive explanations for why a case is
deemed fraudulent could potentially expose the system’s inner workings, enabling fraud-
sters to exploit it. This trade-off highlights the difficulty of balancing the need to provide
explanations with the danger of giving more power to harmful individuals. However, he
acknowledged that some level of explanation might still be necessary to address user con-
cerns and potential legal implications. P2 further added that “one potential threat may be
users exploiting the provided explanations to manipulate the system’s decision-making pro-
cess.” Furthermore, P5 mentioned that “the security of the overall system is a concern when
too much transparency is provided. In AI-based systems, transparency could mean access to
training code, algorithms, and data” which would facilitate adversarial attacks on the trained
model.
Legal requirements, privacy concerns, and the need for transparency between companies
and end-users were also mentioned as factors influencing trade-offs related to explainability.
Nine participants emphasized that it is crucial to communicate product limitations and worst-
case scenarios to users to establish transparency and mitigate potential legal liabilities.
Seven participants emphasized that the pursuit of enhanced explainability often neces-
sitates a compromise in accuracy. P7 said in this regard that “we have to scale down to
make it explainable.” Similarly, P8 stated that “better explainability leads to poor results.”
Moreover, he added that complex AI models tend to offer higher accuracy but may be less
interpretable. Furthermore, improving model explainability can sometimes compromise per-
formance, as simplification may reduce complexity needed for optimal results. This trade-off
implies that in their efforts to make AI-based systems more interpretable, practitioners may
have to employ simpler models or interpretable algorithms, which may result in a reduction in
overall model accuracy. It was emphasized by both P7 and P8 that complex AI models, while
delivering high accuracy and state-of-the-art performance in diverse tasks, tend to obscure
the decision-making process, leading to a lack of transparency.
Moreover, another six participants pointed out that incorporating explainability into AI-
based systems often requires sacrificing performance. This trade-off manifests in the need to
limit or simplify models, potentially impacting prediction accuracy. Additionally, increased
computational time and resource usage can lead to slower system performance and higher
execution costs. The development process may also demand more effort and resources, further
contributing to increased costs.
Lastly, two participants highlighted that the trade-offs between explainability and other
quality attributes are context-dependent, varying based on the specific application area and
its unique requirements. For sensitive applications involving access to private data, prioritiz-
ing explainability, even at the expense of reduced accuracy, may be essential to maintaining
transparency and user trust. Conversely, for tasks such as internet search and retrieval, accu-
racy takes precedence, and the explainability aspect is less important because the focus is on
achieving accurate and efficient results.
In conclusion, ML practitioners must navigate a complex landscape of trade-offs when
considering explainability in AI-based systems. These trade-offs encompass accuracy, secu-
rity, privacy, transparency, performance, and context-specific considerations. Balancing these
factors requires a nuanced approach that aligns with the goals and requirements of the par-
ticular application.

5 Discussion

Our interview results revealed critical insights into the multifaceted realm of XAI practice,
prompting us to contemplate the relevance of the RE perspective in tackling these challenges.
In our exploration of RQ1, we uncovered diverse perspectives among practitioners regard-
ing the concept of explainability. This diversity highlights the absence of a unified definition,
as already identified in our prior work (Habiba et al. 2022). The distinct categories we
identified for explainability underscore the complexity of the term. Moreover, achieving a
common understanding of explainability while recognizing and accommodating the diversity
of requirements and contexts is considered essential for effectively addressing the challenges
in AI explainability.
Furthermore, our investigation delved deeper into the practices practitioners employ to
capture the requirements for explainability and how they put them into practice, as addressed
in RQ2. Our findings reveal that explainability often arises from legal requirements or system
performance failures. This demand for explainability emanates from various stakeholders in
diverse contexts, suggesting that requirements engineering practices could adapt to accommo-
date these varied needs. A comprehensive process for capturing explainability requirements,
however, is currently lacking. Practitioners typically rely on existing tools to clarify system
behavior to fellow technical personnel, but bridging the knowledge gap to convey results to
end-users poses challenges. Additionally, addressing emerging explainability requirements
post-deployment presents difficulties, indicating the need for investigation to establish meth-
ods for specifying these requirements before system deployment.
Subsequently, in RQ3, we identified several challenges faced by our participants when implementing explainability in AI-based systems. To address these challenges from an RE
perspective, researchers can explore strategies and tools for improving the communication of
complex technical concepts and model explanations to non-technical stakeholders. This may
involve the development of user-friendly visualization techniques and interfaces to enhance
understanding for business teams, regulators, or customers. Trust and reliability issues can
be mitigated by establishing requirements and standards for incorporating trust-building
mechanisms into system designs, including transparency, accountability, and trustworthiness
indicators. Additionally, the development of reference models and frameworks can ensure
consistency across projects. Further research should aim at improving the transparency of black-box models, developing hybrid models that balance accuracy and interpretability, and implementing scalable explainability solutions. Adaptive systems and context-aware explana-
tions can enhance data quality and sustainability, while compliance frameworks and risk
management protocols are essential for ensuring safety and regulatory adherence.
Finally, in RQ4, we aimed to investigate the interaction between explainability and other
quality attributes. While adding explainability, ML practitioners are required to consider
trade-offs with other quality attributes. This indicates several important considerations for the
RE process.
Requirements engineers should be aware that enhancing explainability might come at the cost of accuracy and performance and may introduce security risks. Further, it is important to
understand the specific application context and user needs. Different applications may pri-
oritize either explainability or accuracy based on the sensitivity of the tasks and the users’
requirements. Moreover, requirements engineers need to engage with various stakeholders,
including ML practitioners, domain experts, end-users, and legal or compliance teams to
identify the optimal level of explainability while considering the trade-offs with other quality
attributes.

6 Threats to Validity

Throughout our study, we employed a systematic approach to strengthen the credibility and
integrity of our research. In the following, we point out the main threats to validity and our
corresponding mitigation strategies.
Internal and Construct Validity The phrasing chosen for our explanations and questions
may introduce bias and misinterpretations, especially if participants understand concepts
differently. To mitigate this, we initiated our research by conducting a series of pilot interviews
within the academic community. These pilot interviews helped refine our questions to accurately capture the fundamental concepts under investigation.
Furthermore, participants may not always have revealed their true opinions. We consider this a low risk for our study, as the discussed concepts were neither very sensitive nor required revealing business-critical information. This risk was further reduced by guaranteeing confidentiality and anonymity. Additionally, it is important to note that a subset of our par-
ticipants lacked a software engineering background, and although we provided them with a
presentation, there is still a potential limitation in their ability to grasp the complete picture
of the study. Specifically, we aimed to establish a shared understanding of the concept of
explainability among practitioners without influencing their responses. To achieve this, we
carefully formulated our questions to encourage participants to define explainability from
their own perspective, in the context of the specific systems they are working on, and within
their unique working environments. This approach was intended to ensure that their answers
were based on their personal experiences and interpretations, rather than being influenced by
our presentation.
Conclusion Validity To limit observer bias and interpretation bias, we implemented a metic-
ulous coding process for the analysis of interview transcripts. This process was overseen by
the first author, chosen for her specialized technical expertise and profound understanding of
the research subject. Furthermore, all authors participated in rigorous reviews and validation
of the coding outcomes. We are confident that we have identified the essential underlying causal relationships and derived meaningful conclusions.
External Validity As we interviewed 14 professionals in total, the representativeness of the collected data may be an issue. To mitigate it, we conducted a rigorous screening
of our participant pool before and after the interviews. Despite efforts to recruit a diverse
international participant pool, we primarily attracted respondents from German companies.
To maintain sample diversity, we ensured representation across various companies, projects,
and domains within Germany, while including a few participants located outside Germany.

We identified individuals with a minimum of two years of experience in AI/ML projects who
were currently engaged in or had recent involvement in such projects. Moreover, we decided
to exclude two participants after the interviews were done. These deliberate steps were taken
to cultivate a participant sample that is more representative and relevant. Furthermore, it is crucial to acknowledge that participants bring their domain knowledge and experiences into the study, which could potentially impact their responses to the questions. Given that AI, particularly XAI, is governed by regulatory frameworks, some of our findings are specific to Germany’s cultural and legal context. The EU AI Act standardizes AI regulations across all EU member states, influencing how AI practices are implemented and monitored. Future
research should include guidelines on legal and cultural aspects to define the study’s scope and
enhance the generalizability of findings across different regulatory environments, aligning
with the evolving AI governance landscape in the EU and beyond.

7 Conclusion

Our study highlighted practitioner perspectives on explainability in AI-based systems and the
challenges they face in implementing it effectively. We identified four categories of explain-
ability that were seen as necessary for making AI-based systems interpretable and transparent.
Our findings also revealed that there are no standard practices to address end-user needs for explainability, which poses a significant concern. The reasons for pursuing explainability vary among practitioners, with legal requirements being a prominent driver. Emerging regulatory and legal developments emphasize that AI-based systems must now incorporate certain core functionalities. For instance, the General Data Protection Regulation
(GDPR) (European Parliament 2016) mandates transparency, accountability, and the “right
to explanation” for decisions supported by AI. Additionally, the need for explainability arises
when AI-based systems fail to provide desired results to stakeholders.
The challenges ML practitioners face in implementing explainability are many and varied.
While participants often rely on tools such as SHAP (Lundberg and Lee 2017) and
LIME (Ribeiro et al. 2016) to explain their systems to technical personnel, communicating
complex technical concepts and model explanations to non-technical stakeholders is a pri-
mary hurdle. Building trust and convincing end-users about the reliability of AI-supported
decisions is another critical challenge, while demands for regulatory compliance in safety-
critical applications further complicate the matter. A lack of standardized approaches for
explainability and finding the right balance between other non-functional requirements and
explainability were found to be prevalent issues too.
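As a concrete illustration of how such tool support typically looks, the following minimal sketch generates a post-hoc SHAP explanation for a single prediction; the model, dataset, and parameters are assumptions chosen purely for illustration and do not reflect any participant's actual setup.

# Illustrative sketch of a post-hoc SHAP explanation; model, data, and settings are assumptions.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles; for this
# binary classifier it yields one value per feature, expressed in log-odds units.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:1])  # shape: (1, n_features)

# Rank the features by the size of their contribution to this single prediction.
contributions = sorted(zip(X.columns, shap_values[0]), key=lambda kv: abs(kv[1]), reverse=True)
for feature, value in contributions[:5]:
    print(f"{feature}: {value:+.3f}")

Even with such output, turning a statement like "this feature added +0.4 log-odds" into an explanation that a business stakeholder or end-user can act on remains exactly the communication gap described above.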
Our study highlights the need for further research and development to effectively address
these challenges and enhance the explainability of AI-based systems across various domains.
Doing so will ensure that AI-based systems meet the explainability requirements of users.
Moreover, requirements engineers must be well-informed about the trade-offs between
explainability and other quality attributes. They must consider the application context and
domain, as well as the specific user needs to balance explainability and accuracy. Engaging
with various stakeholders is crucial to identifying the optimal level of explainability while
considering trade-offs with other quality attributes. In conclusion, improving the explainabil-
ity of AI-based systems requires collaboration, further research, and careful consideration of
trade-offs to ensure that these systems are transparent, trustworthy, and effectively meet the
needs of all stakeholders involved.
Acknowledgements This work was partially supported by the German Federal Ministry of Education and
Research, Grant Number: 21IV005E

Funding Open Access funding enabled and organized by Projekt DEAL.

Data Availability Statement Supplementary materials are available at https://zenodo.org/doi/10.5281/zenodo.10034533. Please note that we have chosen not to share the interview recordings and transcriptions to protect
the confidentiality and privacy of our participants, as these documents could potentially reveal their identities.

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Atakishiyev S, Salameh M, Yao H, Goebel R (2021) Explainable artificial intelligence for autonomous driving:
a comprehensive overview and field guide for future research directions. arXiv:2112.11561
Baltes S, Ralph P (2022) Sampling in software engineering research: a critical review and guidelines. Empir
Softw Eng 27(4):94
Brennen A (2020) What do people really want when they say they want “explainable ai?” we asked 60
stakeholders. In: Extended abstracts of the 2020 CHI conference on human factors in computing systems,
pp 1–7
Brunotte W, Chazette L, Korte K (2021) Can explanations support privacy awareness? a research roadmap. In:
2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp 176–180.
IEEE
Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N (2015) Intelligible models for healthcare: predicting
pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD international
conference on knowledge discovery and data mining, pp 1721–1730
Carvalho DV, Pereira EM, Cardoso JS (2019) Machine learning interpretability: a survey on methods and
metrics. Electron 8(8):832
Chazette L, Schneider K (2020) Explainability as a non-functional requirement: challenges and recommenda-
tions. Requir Eng 25(4):493–514
Chazette L, Brunotte W, Speith T (2021) Exploring explainability: a definition, a model, and a knowledge
catalogue. In: 2021 IEEE 29th International Requirements Engineering Conference (RE), pp 197–208.
IEEE
Dhanorkar S, Wolf CT, Qian K, Xu A, Popa L, Li Y (2021) Who needs to know what, when?: broadening the
explainable ai (xai) design space by looking at explanations across the ai lifecycle. Designing Interactive
Systems Conference 2021:1591–1602
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. arXiv:1702.08608
Dwivedi R, Dave D, Naik H, Singhal S, Omer R, Patel P, Qian B, Wen Z, Shah T, Morgan G et al (2023)
Explainable ai (xai): core ideas, techniques, and solutions. ACM Comput Surv 55(9):1–33
European Parliament, Council of the European Union: Regulation (EU) 2016/679 of the European Parliament
and of the Council. https://data.europa.eu/eli/reg/2016/679/oj. Accessed 2024-07-20
Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L (2018) Explaining explanations: an overview of
interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and
Advanced Analytics (DSAA), pp 80–89. IEEE
Habiba UE, Bogner J, Wagner S (2022) Can requirements engineering support explainable artificial intelli-
gence? towards a user-centric approach for explainability requirements. In: 2022 IEEE 30th International
Requirements Engineering Conference Workshops (REW), pp 162–165

Henin C, Le Métayer D (2021) A multi-layered approach for tailored black-box explanations. In: International
conference on pattern recognition, pp 5–19. Springer
Hoffman RR, Mueller ST, Klein G, Jalaeian M, Tate C (2023) Explainable ai: roles and stakeholders, desire-
ments and challenges. Front Comput Sci 5:1117848
Ishikawa F, Matsuno Y (2020) Evidence-driven requirements engineering for uncertainty of machine learning-
based systems. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp 346–
351. IEEE
Jansen Ferreira J, Monteiro M (2021) Designer-user communication for xai: an epistemological approach to
discuss xai design. arXiv e-prints, 2105
Jin W, Fatehi M, Abhishek K, Mallya M, Toyota B, Hamarneh G (2020) Artificial intelligence in glioma
imaging: challenges and advances. J Neural Eng 17(2):021002
Jin W, Fan J, Gromala D, Pasquier P, Hamarneh G (2023) Invisible users: uncovering end-users’ requirements
for explainable ai via explanation forms and goals. arXiv:2302.06609
Kästner L, Langer M, Lazar V, Schomäcker A, Speith T, Sterz S (2021) On the relation of trust and explainabil-
ity: Why to engineer for trustworthiness. In: 2021 IEEE 29th International Requirements Engineering
Conference Workshops (REW), pp 169–175. IEEE
Köhl MA, Baum K, Langer M, Oster D, Speith T, Bohlender D (2019) Explainability as a non-functional
requirement. In: 2019 IEEE 27th International Requirements Engineering Conference (RE), pp 363–
368. IEEE
Krishna S, Han T, Gu A, Pombra J, Jabbari S, Wu S, Lakkaraju H (2022) The disagreement problem in
explainable machine learning: a practitioner’s perspective. arXiv:2202.01602
Kuwajima H, Ishikawa F (2019) Adapting square for quality assessment of artificial intelligence systems. In:
2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp
13–18. IEEE
Lagioia F et al (2020) The impact of the general data protection regulation (gdpr) on artificial intelligence
Lakkaraju H, Slack D, Chen Y, Tan C, Singh S (2022) Rethinking explainability as a dialogue: a practitioner’s
perspective. arXiv:2202.01875
Lipton ZC (2018) The mythos of model interpretability: in machine learning, the concept of interpretability is
both important and slippery. Queue 16(3):31–57
Longo L, Goebel R, Lecue F, Kieseberg P, Holzinger A (2020) Explainable artificial intelligence: Concepts,
applications, research challenges and visions. In: Machine learning and knowledge extraction: 4th IFIP
TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020,
Dublin, Ireland, August 25–28, 2020, Proceedings, pp 1–16. Springer
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process
Syst 30
Markus AF, Kors JA, Rijnbeek PR (2021) The role of explainability in creating trustworthy artificial intelligence
for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J
Biomed Inform 113:103655
Miller T (2019) Explanation in artificial intelligence: insights from the social sciences. Artif Intell 267:1–38
Montavon G, Samek W, Müller KR (2018) Methods for interpreting and understanding deep neural networks.
Digit Signal Process 73
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?” Explaining the predictions of any classifier.
In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data
mining, pp 1135–1144
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering.
Empir Softw Eng 14:131–164
Sachan S, Yang JB, Xu DL, Benavides DE, Li Y (2020) An explainable ai decision-support-system to automate
loan underwriting. Expert Syst Appl 144:113100
Sadeghi M, Klös V, Vogelsang A (2021) Cases for explainable software systems: characteristics and examples.
In: 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp 181–187.
IEEE
Seaman CB (2008) Qualitative methods. Guid Adv Empir Softw Eng 35–62
Sheh R (2021) Explainable artificial intelligence requirements for safe, intelligent robots. In: 2021 IEEE
International Conference on Intelligence and Safety for Robotics (ISR), pp 382–387. IEEE

Suresh H, Gomez SR, Nam KK, Satyanarayan A (2021) Beyond expertise and roles: a framework to characterize
the stakeholders of interpretable machine learning and their needs. In: Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems, pp 1–16
Vogelsang A, Borg M (2019) Requirements engineering for machine learning: perspectives from data scientists.
In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp 245–251.
IEEE

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Umm-e- Habiba earned her Master’s degree in software engineering from the National University of Science and Technology (NUST),
Pakistan in 2014. Following that, she held a lecturer position at the
University of Kotli Azad Jammu and Kashmir from 2015 to 2021.
Presently, she is actively pursuing her doctoral research, with a pri-
mary emphasis on requirements engineering and artificial intelligence.
Since 2021, she has been a research and teaching assistant at the Insti-
tute of Software Engineering at the University of Stuttgart, Germany.

Mohammad Kasra Habib is a doctoral researcher at the Chair of Software Engineering at the Technical University of Munich, primarily
focusing on deep learning and software engineering. He also serves
as a research and teaching assistant. He transitioned to the Techni-
cal University of Munich on May 1st, 2024, continuing in both roles
after beginning his PhD and assistantship in 2021 at the University of
Stuttgart, Germany.
He obtained his B.Sc. degree in Software Engineering from Balkh
University, Balkh, Afghanistan, in 2014, and subsequently earned his
M.Sc. degree in Computer Science from the Technical University of
Berlin, Berlin, Germany, in 2018.

Justus Bogner is currently an Assistant Professor in Software Engineering at the Vrije Universiteit Amsterdam, The Netherlands. His
main research interests are empirical software engineering, software
architecture, and software quality (maintainability, evolvability, sus-
tainability), which he applies to microservices or AI-based systems.
He worked as a software engineer at Hewlett Packard and later at DXC
Technology for more than 9 years. After his PhD in 2020, he was a
postdoctoral research scientist at the University of Stuttgart, Germany,
for 3 years.

Jonas Fritzsch is a researcher in the field of software engineering and architecture at the University of Stuttgart, Germany. His main research
focus is Microservices in the context of architectural refactoring from
monolithic systems. He benefits from over fifteen years of experience
in enterprise software development at his previous employer Hewlett
Packard Enterprise. As a university lecturer, he teaches programming
and algorithms in computer science courses.

Stefan Wagner is a full professor of software engineering at the Technical University of Munich (TUM), where he earned his PhD in com-
puter science. With a background in computer science and psychol-
ogy, he has published over 130 peer-reviewed scientific articles and
authored the book “Software Product Quality Control.” His research
interests span several areas of software engineering, including soft-
ware quality, human factors, AI-supported software engineering, auto-
motive software, AI-based systems, and empirical studies. Dr. Wagner
serves as a Section Editor of PeerJ Computer Science and on the edito-
rial boards of IEEE Software and Empirical Software Engineering. He
has received several best paper awards and was recognized as a 2022
Class of IEEE Computer Society Distinguished Contributor. Dr. Wag-
ner is a member of the German GI and a senior member of ACM and
IEEE.

Authors and Affiliations

Umm-e- Habiba1,3 · Mohammad Kasra Habib4 · Justus Bogner2 · Jonas Fritzsch1 · Stefan Wagner1,4

B Umm-e- Habiba
[email protected]
Mohammad Kasra Habib
[email protected]

Justus Bogner
[email protected]
Jonas Fritzsch
[email protected]
Stefan Wagner
[email protected]
1 Institute of Software Engineering, University of Stuttgart, Stuttgart, Baden-Württemberg,
Germany
2 Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
3 Department of Software Engineering, University of Kotli Azad Jammu Kashmir, Kotli Azad Kashmir, AJK, Pakistan
4 TUM School of Communication, Information and Technology, Technical University of Munich,
Heilbronn, Germany
