2024 - How do ML practitioners perceive explainability? An interview study of practices and challenges
https://doi.org/10.1007/s10664-024-10565-2
Abstract
Explainable artificial intelligence (XAI) is a field of study that focuses on developing AI-based systems whose decision-making processes are understandable and transparent to users. Research has already identified explainability as an emerging requirement for AI-based systems that use machine learning (ML) techniques. However, there is
a notable absence of studies investigating how ML practitioners perceive the concept of
explainability, the challenges they encounter, and the potential trade-offs with other quality
attributes. In this study, we want to discover how practitioners define explainability for AI-
based systems and what challenges they encounter in making them explainable. Furthermore,
we explore how explainability interacts with other quality attributes. To this end, we con-
ducted semi-structured interviews with 14 ML practitioners from 11 companies. Our study
reveals diverse viewpoints on explainability and applied practices. Results suggest that the
importance of explainability lies in enhancing transparency, refining models, and mitigating
bias. Methods like SHapley Additive exPlanations (SHAP) and Local Interpretable Model-
Agnostic Explanation (LIME) are frequently used by ML practitioners to understand how
models work, while tailored approaches are typically adopted to meet the specific require-
ments of stakeholders. Moreover, we have discerned emerging challenges in eight categories.
Issues such as effective communication with non-technical stakeholders and the absence of
standardized approaches are frequently stated as recurring hurdles. We contextualize these
findings in terms of requirements engineering and conclude that industry currently lacks a
standardized framework to address arising explainability needs.
1 Introduction
Artificial intelligence (AI) is increasingly applied in various domains, including medicine (Caru-
ana et al. 2015; Jin et al. 2020), law (Jin et al. 2023), autonomous driving (Atakishiyev et al.
2021), and loan application approval (Sachan et al. 2020). AI-based systems often play a
key role in the associated critical decision-making processes. Responsible individuals, such
as physicians, judges, drivers, and bankers, require explanations to different extents for the
output generated by such AI-based systems. With AI becoming more prevalent as a decision
helper in high-stakes sectors, it is crucial to ensure that AI is understandable to its users (Longo
et al. 2020), guaranteeing safe, responsible, and legally compliant usage (Lagioia et al. 2020).
Machine learning (ML) pipelines are capable of producing precise predictions; however, they frequently fail to incorporate two crucial phases: understanding and explaining.
Understanding involves analyzing the problem domain, data, and model behavior, including
training and quality assurance. Explaining is crucial throughout the ML lifecycle, espe-
cially when models are deployed in real-world applications. Both phases contribute to model
interpretability, trust, and effective implementation (Dwivedi et al. 2023). The increasing
autonomy and complexity of AI-based systems pose challenges for software engineers and
domain experts to comprehend them fully (Lipton 2018). This necessity has led to the devel-
opment of eXplainable AI (XAI) systems, where explanations play a pivotal role. Such
systems help users understand decisions made by the AI, thereby increasing confidence and
trustworthiness (Markus et al. 2021). Moreover, explanations serve to verify and validate
actions taken, identify the causes of errors, and reduce the likelihood of human mistakes.
Explainability is an emerging non-functional requirement that has garnered attention as
a critical quality aspect for AI-based systems (Chazette and Schneider 2020; Köhl et al.
2019). An existing study highlights that explainability significantly impacts the overall quality
of software systems (Chazette et al. 2021), contributing to other essential quality features
such as transparency and understandability. Notably, the Ethics Guidelines for Trustworthy
AI (Lipton 2018), recommended by the High-Level Expert Group on Artificial Intelligence
(AI HLEG), prioritize transparency as an essential requirement. These guidelines underscore
the significance of traceability, explainability, and communication in AI-based systems and
stress the importance of providing a clear account of the decision-making process from the
perspectives of relevant stakeholders.
To facilitate the development of explainable AI-based systems, methodologies are essen-
tial for requirements engineers to analyze, delineate, and assess the requirements related to
explainability. However, given that the notion of explainability has recently emerged as a
critical quality factor among non-functional requirements (Köhl et al. 2019), there is cur-
rently no adequate guidance available to assist practitioners in this regard. To understand
how practitioners deal with this lack of guidance, we conducted an interview study to gather
insights from practitioners regarding their interpretations of explainability, the practices they
actively employ, and the difficulties they encounter. These interviews are conducted within
the context of various stakeholders and industry requirements engineering (RE) practices.
3 Study Design
To structure our research efforts, we followed the five-step process suggested by Runeson
and Höst (2009). Initially, we formulated our research objective and the associated research
questions.
Our objective is to identify the practices, challenges, and trade-offs professionals face in
the XAI field. Furthermore, we want to chart potential research avenues for the academic
community to develop solutions addressing XAI challenges and providing effective support
for practitioners. To establish a clear direction and scope for our study, we subsequently
formulate four key research questions:
RQ1: How do ML practitioners describe explainability from their perspective?
RQ2: What practices do ML practitioners employ to evaluate the necessity of explain-
ability, and which practices do they apply to address it?
RQ3: Which explainability challenges do ML practitioners experience?
RQ4: Which trade-offs between explainability and other quality attributes do ML
practitioners consider?
These research questions address the need to understand the diverse aspects of explainability in AI-based systems. RQ1 seeks to capture how ML practitioners conceptualize explainability, providing insight into the diverse perspectives of primary stakeholders and establishing a foundational understanding of the concept of explainability. RQ2 aims to uncover the current practices used to evaluate the necessity of explainability and the strategies employed to achieve it, thereby identifying practical approaches and gaps in the implementation of these practices. RQ3, in turn, focuses on identifying the specific challenges faced by practitioners, which is crucial for addressing explainability effectively. Finally, RQ4 seeks
to identify the trade-offs between explainability and other quality attributes, recognizing that
balancing these aspects is essential for the practical deployment of AI-based systems. These
questions as a whole provide a comprehensive exploration of explainability in the context of
AI-based systems, guiding future research and development to enhance their transparency,
trustworthiness, and usability.
To answer these research questions, we followed an exploratory and qualitative research
approach by conducting semi-structured interviews (Runeson and Höst 2009). Without a
preconceived hypothesis, we aim to investigate the topic in a preliminary and open-ended
manner. This approach gave us a foundational structure while at the same time granting us
the flexibility to dynamically adjust our inquiries in response to participants’ feedback. To
guide our selection of interviewees, we established the following set of criteria:
• The person has at least two years of experience working on AI/ML projects.
• The person is currently working on an AI/ML project or has worked on one in the past
two years.
We enlisted participants through personal industry connections within our research team
and by contacting individuals in our LinkedIn network. We followed the referral chain sam-
pling (Baltes and Ralph 2022) in which participants are initially selected through convenience
sampling, and then asked to refer or recommend other potential participants. Eight partici-
pants were recruited through convenience sampling and six via referral chain sampling.
An interview preamble (Runeson and Höst 2009) was designed to explain the interview
process and theme to the participants before conducting the interviews. This document was
distributed to the participants in advance to acquaint them with the study. The preamble also
delineated ethical considerations, such as confidentiality assurances, requests for consent
for audio recordings, and a guarantee that recordings and transcripts would remain confi-
dential and unpublished. Further, we created an interview guide (Seaman 2008) containing
the questions organized into thematic categories. This guide helped to structure and orga-
nize the semi-structured interviews but was not provided to interviewees before the sessions.
Additionally, we prepared a slide presentation with supplementary materials to provide con-
textual information related to our research. The slides provided information about the RE process and explored the intersection between XAI and RE; they were presented to the participants immediately before the interviews.
In total, we conducted a series of 14 individual interviews. All of them were conducted
remotely via Webex in English and lasted 35 to 55 minutes. We loosely adhered to the
structure provided in the interview guide but adapted our approach based on the participant’s
responses. To establish rapport and initiate the discussions, we began by asking participants
to introduce themselves, outlining their roles and the specific systems they were involved
with. Subsequently, we transitioned into the topic of explainability and explored the practices
followed by industry practitioners in this domain. The next segment focused on the challenges
encountered by participants when endeavoring to make their systems more explainable.
Lastly, we delved into inquiries regarding the trade-offs between explainability and other
quality attributes. Following the interviews, we transcribed each audio recording to create a
textual document for further analysis.
Our data analysis started with coding each transcript, adhering to the constant comparison
method rooted in grounded theory principles (Seaman 2008). Following the preliminarily established set of codes, we assigned labels to relevant paragraphs within the transcript.
Throughout this procedure, we generated new labels, revisited previously completed tran-
scripts, and occasionally renamed, split, or merged labels as our understanding of the cases
grew more comprehensive. After that, we conducted a detailed analysis of the intricacies and
relationships among the codes within each transcript. This analysis resulted in the creation
of a textual narrative for each case.
The supplementary materials created in this process, including the study design and the interview preamble, can be found online.¹
¹ https://zenodo.org/doi/10.5281/zenodo.10034533
4 Results
In this section, we present the interview results, grouped by our four initially stated research
questions. The data resulted from interviewing a cohort of 16 participants. We decided to
exclude the data obtained from two participants. The rationale for this exclusion stemmed
from our observation that these two individuals were Ph.D. students and lacked sufficient
industry experience according to our criteria. This left us with 14 participants, representing 9
German companies, one Nigerian company, and one Swiss company. Of the 14 participants,
9 were from companies that provide AI solutions, 4 were from the automotive domain and
one participant was from a policymaking agency, as shown in Table 1. All participants were
experienced with AI-based systems, and some also had prior experience in different software engineering roles.
Table 1 Participant demographics
Table 2 Explainability insights by participants
giving certain results or predictions by explaining how the model works and the impact of
features or data.” P3 stated that he “would define [it] in a way that somebody could describe
what happens inside the black box”.
Explainability for Safety and Bias Mitigation Only one interviewee (P3) defined explain-
ability in terms of safety and bias mitigation: “it is about safety requirements not to [be]
biased against humans.”
These diverse perspectives on explainability highlight the need for a comprehensive approach
that caters to both technical and non-technical stakeholders. We furthermore analyzed how the
role of participants influences their perspective on explainability, as can be seen in Fig. 1. The largest group, the data scientists, generally prioritizes transparency and acceptability, with P1 adding a focus on clear communication of model operations and ensuring stakeholder understanding. Similarly, P2 emphasized the need for justifying decisions and using explainability
for debugging. Participants P4 and P12 emphasized the importance of understanding system
filtering and results, while P6 and P7 highlighted concerns about bridging the gap between
technical and non-technical stakeholders. Building trust through feature explanation would
be crucial, as stated by P8 and P11.
Solution Architect P3 in the automotive sector focuses on avoiding bias and understanding
black box models. AI Solution Architect P5 aims to make complex systems more inter-
pretable, while AI Ethics Responsible P9 aims at balancing ethical considerations with
usability. AI Safety Expert P10 integrates trust, safety, and design into development. Data engineer P13, finally, emphasizes user-friendly decoding of ML predictions, while Data Manager P14 stresses the importance of trust and transparency in data explanation.
We compared the definitions provided by our practitioners with the ones we found in the literature. For that, we consulted five sources providing such a definition, summarized in Table 3.
These definitions are generally rather generic and do not address a specific domain, stake-
holder group, or specific requirements. Our practitioners, in contrast, define explainability
more specifically based on their job roles, the systems they work on, or the targeted stake-
holders. Based on our analysis, we identified two critical factors that should be considered
when defining the term “explainability”: a) stakeholders and b) system domains.
processes.” P2 stated that “the approach to explainability practices can vary depending on the project and the type of data involved.” Further, he added that there are no universally
defined best practices, as it often requires a case-by-case decision based on feasibility and
usability. Overall, we have seen that user-centric practices for explainability depend on the
specific project requirements and can involve combining unexplainable AI-based systems
with supplementary statistical explanations after the decision has been made.
Ethical and Safety Considerations P3, P5, and P6 emphasized that ethics and safety drive
the need for explainability requirements. P5 added that a “need for explainability is assessed
based on factors such as high-risk systems impacting human beings and ethical consider-
ations.” Further, P6 stated that “It is ethical, to make sure that your system is unbiased, especially when it comes to end users.” This involves ensuring unbiased systems, partic-
ularly in user interactions, where organizations building or providing AI services bear the
responsibility for model explainability, eliminating biases, and addressing privacy concerns.
P3 did not mention a specific practice for addressing explainability requirements in the con-
text of ethical and safety considerations, while P5 and P6 described it as more of a translation
process.
Legal and Regulatory Requirements P8, P10, P11, and P13 stated that, in their current scenario, explainability is a legal and regulatory requirement.
Table 3 Definitions of explainability in the literature
Doshi-Velez and Kim (2017): Explanations are "the currency in which we exchange beliefs"
Lipton (2018): Explainability addresses "what else can the model tell me?"
Gilpin et al. (2018): Mostly concerned with the internal logic, i.e., "models that are able to summarize the reasons for neural network behavior, gain the trust of users, or produce insights about the causes of their decisions"
Montavon et al. (2018): "An explanation is the collection of features of the interpretable domain that have contributed to a given example to produce a decision"
Miller (2019): "Explicitly explaining decisions to people"
Table 4 continued
Factors | Explainability drivers | Practices to address explainability
Client and business perspective | P7 & P9: Client or end-user perspective | P7's approach of using meetings and various tools aligns with the need for explainability from the client's perspective. P9 works closely with clients to clarify expectations and uses different tools to meet their specific requirements for explainability.
Client and business perspective | P12: Business decision | P12 uses plots and rule-based explanations, including "what if" explanations, to address the need for explainability.
Client and business perspective | P14: Customer requirements | Did not mention any specific practice
Risk management | P5: High-risk systems | P5's approach, which includes translating technical concepts and providing visualizations, addresses the need for explainability in high-risk systems and legal mandates.
Risk management | P10: Risk considerations | P10's approach, combining a pilot test with explainability, addresses the need for explainability in high-risk systems.
Risk management | P11: Internal standards | P11 uses SHAP values or feature importance to explain which data inputs are relevant for the model's outputs.
Data scientist and technical consideration | P13: Data scientist perspective | P13's approach of incorporating sources for decision-making and using plots and rule-based explanations aligns with the need for explainability.
To address such explainability requirements, P8 and P11 mentioned the use of various tools, including figures, mathematics, and statistics, to provide understandable explanations, e.g., SHAP values (Lundberg and Lee 2017). In addition to these tools, P13 mentioned using LIME (Ribeiro et al. 2016), plots, and rule-based explanations to address the need for explainability. However, all of them
stated that stakeholders who are unfamiliar with these tools may still struggle to grasp the
explanations fully. P10 stated “It depends on the scope of your explainability of this product.
Do you want to share everything with the end-user or there are knowledge gaps?”
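To make the tool-based practices mentioned above more tangible, the following is a minimal sketch of how SHAP and LIME explanations might be generated for a tabular classifier. It assumes a scikit-learn random forest, the shap and lime packages, and a public demonstration dataset; none of these choices reflect the participants' actual systems or data.

```python
# Hedged sketch: per-prediction explanations with SHAP and LIME for a
# tabular classifier. Model, dataset, and parameters are illustrative only.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# SHAP: additive per-feature contributions for one test instance
# (the exact output layout depends on the installed shap version).
tree_explainer = shap.TreeExplainer(model)
shap_values = tree_explainer.shap_values(X_test[:1])

# LIME: a local surrogate model explaining the same instance.
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=list(data.feature_names), mode="classification")
lime_explanation = lime_explainer.explain_instance(
    X_test[0], model.predict_proba, num_features=5)
print(lime_explanation.as_list())  # top features with their local weights
```

In practice, our participants reported wrapping such raw attributions in plots or dashboards before presenting them to non-technical stakeholders.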
Client and Business Perspective P4, P12, and P14 explained that they often receive requests
for explainability from their customers, primarily to assist in making informed business
decisions. To address such needs, P12 mentioned plots and rule-based explanations, including “what if” explanations, whereas P4 and P14 did not mention any specific practices.
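As a rough illustration of the “what if” style of explanation P12 refers to, the sketch below varies one input feature of a single instance and reports how the predicted probability changes; the model, dataset, and feature choice are hypothetical stand-ins.

```python
# Hedged sketch of a "what if" explanation: perturb one feature and observe
# how the prediction shifts. Dataset and model are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer()
model = GradientBoostingClassifier(random_state=0).fit(data.data, data.target)

instance = data.data[0].copy()
feature_idx = 0  # e.g., "mean radius" in this demonstration dataset
baseline = model.predict_proba([instance])[0, 1]

feature_range = np.linspace(data.data[:, feature_idx].min(),
                            data.data[:, feature_idx].max(), num=5)
for value in feature_range:
    variant = instance.copy()
    variant[feature_idx] = value
    prob = model.predict_proba([variant])[0, 1]
    print(f"what if {data.feature_names[feature_idx]} = {value:.1f}: "
          f"p(class 1) = {prob:.2f} (baseline {baseline:.2f})")
```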
Risk Management P5 and P10 underscored that the risk involved in a system leads to the
need for explainability. P10 emphasized that providing visualizations and using pilot tests
with explainability can help users to comprehend the system.
Data Scientist and Technical Considerations P6 and P13 listed internal standards and
data scientists’ needs as a requirement for explainability. To fulfill such needs, they mostly
rely on SHAP values (Lundberg and Lee 2017). Further, they emphasized incorporating business-oriented explanations for non-technical stakeholders, making it easier to convey model predictions. For instance, P6 stated that custom dashboards are created to display model insights in a more understandable manner for different audiences.
Furthermore, we notice a link between roles (Table 1) and the practices that are used for
explainability (Table 4). A recurring theme is that AI solution architects and safety engineers are predominantly associated with factors such as risk management and ethical and safety considerations; such practitioners use visualization and natural language in their explanations. Data scientists, in contrast, tailor their explanations to the audience: for legal and technical stakeholders, they may rely on techniques like LIME and Shapley values, whereas for end users there are no such standard methods for explanations. This leaves a gap in addressing the explainability needs at the end-user level. We also analyzed the practices explicitly
mentioned by practitioners to address explainability. Our findings indicate that practitioners
P8 and P11 referred to the use of Shapley values and feature importance. Practitioners P12
and P13 highlighted rule-based explanations, plots, and decision trees, while P5 mentioned
visualization techniques as their approach to explainability. Additionally, our analysis of
existing literature on XAI practices (Dwivedi et al. 2023) reveals that only a limited number
of techniques are being applied in practice. This highlights a significant gap between the
techniques discussed in the literature and their actual adoption by practitioners.
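The following minimal sketch illustrates two of the practices listed here, global feature importance and a rule-based explanation read off a shallow decision tree; the dataset, tree depth, and model choices are assumptions for demonstration, not the practitioners' setups.

```python
# Hedged sketch: global feature importance plus a readable rule set from a
# shallow surrogate decision tree. All modeling choices are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# Global feature importance from an ensemble model.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, weight in ranked[:5]:
    print(f"{name}: {weight:.3f}")

# Rule-based explanation: a small decision tree rendered as if-then rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))
```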
In summary, our practitioners employ a variety of practices to address explainability, tailoring their approaches to demands stemming from user-centric factors, ethical and safety considerations, legal and regulatory mandates, client and business perspectives, risk management, and data scientist and technical requirements. The adoption of these practices is influenced by the intended audience, i.e., “to whom it should be explainable”. These practices were regarded as essential for ensuring transparency and facilitating informed decision-making in AI-based systems across various domains.
In our exploration of the third research question, we sought to identify the challenges encoun-
tered by ML practitioners when explaining their systems. The insights shared by participants
highlight a spectrum of challenges, each shedding light on the intricacies of achieving explain-
ability in AI-based systems, as illustrated in Fig. 2. In the following, we summarize the interviewees' statements for each challenge category.
Communication with Non-Technical Stakeholders A recurring challenge faced by ten par-
ticipants centers on effective communication with non-technical stakeholders. P1, P4, and P6
emphasized that effectively conveying complex technical concepts and model explanations
to non-technical stakeholders, such as business teams, regulators, or customers, is challeng-
ing. P7 stated “you need to understand users, their domain knowledge, their background,
and how much they are willing to take from you as an explanation. The knowledge gap is
the biggest gripe in the industry.” P12 confirmed that “there is obviously a knowledge gap
that is hard to overcome.” P10 explained that the challenges faced in explaining AI components involve the difficulty of conveying the intricacies of the algorithms to end-users. Further, he stated that while it would be possible to define performance metrics such as false positives and negatives, explaining the inner workings of the AI-based system would be complex and
Fig. 2 Number of participants mentioning each challenge category: Communication with Non-Technical Stakeholders; Lack of Standardized Approaches; Understanding Black-Box Model; Balancing Explainability & Other Quality Attributes; Bridging the Trust Gap; Data Quality & Adaptation for Sustainable Explainability; Resource Constraints; Safety and Compliance
could be overwhelming for users. Providing graphs and visualizations could offer only some
level of explanation. Instead, he suggested defining robustness and performance metrics that
users can grasp. Specific matrices, such as the AI model weight matrix, may be utilized
to provide additional insights, and following research in this area can assist in addressing these challenges during the requirements engineering process. Furthermore, P11, P13, and
P14 highlighted the challenge of translating technical concepts related to model operations,
algorithms, and data into understandable explanations for non-technical users.
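One hedged way to act on P10's suggestion, reporting user-facing performance metrics instead of model internals, is sketched below: it derives false-positive and false-negative counts from a confusion matrix and phrases them in plain language. The model, data, and wording are assumptions for illustration.

```python
# Hedged sketch: summarizing a classifier's false positives and negatives in
# plain language for non-technical stakeholders. Model and data are stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
total = tn + fp + fn + tp
print(f"On {total} held-out cases, the system was correct {tn + tp} times "
      f"({100 * (tn + tp) / total:.1f}%).")
print(f"It raised {fp} false alarms and missed {fn} actual cases, figures "
      f"that a non-technical stakeholder can weigh directly.")
```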
Lack of Standardized Approaches A prevalent challenge, articulated by seven participants,
pertains to the absence of standardized approaches for explainability. They stated that the
absence of universally defined best practices for explainability makes it necessary to decide
on a case-by-case basis, leading to varying approaches across projects. For example, P6
stated that “the most significant challenge [...] is the absence of readily available tools or
solutions. It often requires custom implementations, as there isn’t a pre-existing solution.”
Similarly, P11 stated that “different approaches, like using SHAP values, have limitations
and may not provide full or comprehensive explanations. Some approaches may not cover
all corner cases, depending on the specific model being used.” Furthermore, P13 mentioned
that “clear requirements for extending the explanation features and determining the target
audience for explanations is challenging.” Overall, the seven participants highlighted that
due to a lack of standardized approaches, determining and fulfilling the explainability need
for end-users is challenging, often leading to customized practices.
Understanding Black-Box Models The third challenge highlighted by six participants cen-
ters on the difficulty of comprehending and explaining the internal workings of complex
AI models. These participants described it as a challenge to answer questions about model
predictions beyond training data. Furthermore, attempting to comprehend the inner workings
of the trained model to enhance explainability poses a significant obstacle, given the inherent
opaqueness of such models. P14 emphasized that the black box nature of the model hinders
meaningful explanation, stating “in cases where the model is considered a ’black box’, mean-
ing the internal workings are not easily interpretable, generating meaningful explanations
becomes more challenging.”
Balancing Explainability & Other Quality Attributes Striking the right balance between
model accuracy and providing simple, understandable explanations was also described as
challenging. Five participants mentioned that highly accurate models may be less inter-
pretable, while simpler models may sacrifice accuracy. P12 and P14 highlighted that
improving algorithm accuracy while maintaining explainability is challenging. P7 described
it as follows: “You have to find a good compromise on how much information you are going
to provide to the stakeholders and how much not.” P8 and P10 highlighted the trade-off between model performance and explainability: more complex models tend to be less explainable, and this lack of explainability can lead to their usage or adoption being scaled back.
Bridging the Trust Gap Building trust with stakeholders and end-users was highlighted by
five participants as a pivotal challenge. Explaining how decisions are made is crucial for
building trust. P1 added that “the challenge lies in effectively communicating the explain-
ability of the model to different audiences, addressing their specific concerns, and building
trust in the reliability and validity of the AI-based system.” Further, P4 and P5 mentioned
that convincing end-users of how these results are generated is a challenging part.
Resource Constraints P8, P12, and P14 highlighted that incorporating explainability within resource constraints can be a challenge. For example, providing explainability in large-scale systems with high data volumes and real-time processing requirements can strain computational resources and performance, especially when there are extensive requests for explanations or limits on the implementation effort.
Data Quality & Adaptation for Sustainable Explainability The importance of data quality
in achieving explainability was emphasized by P13 and P14. Further, these participants
stressed that adapting explanations to cope with changing data distributions and ensuring
relevance over time is challenging. P5 also mentioned that it is hard to answer any explanation
request beyond the data on which the model has been trained.
Safety and Compliance Ensuring that explainability meets safety and compliance require-
ments can be challenging, especially in safety-critical applications. P10 and P12 emphasized that ensuring compliance with regulations while providing explainability, addressing concerns about the sensitivity and confidentiality of the algorithm, and dealing with the potential risks and liabilities associated with providing explanations are all demanding tasks.
In conclusion, we observed that developing standardized approaches for explainability and
finding effective ways to communicate technical concepts to non-technical users is crucial
to overcoming these challenges. Additionally, transparency and reliability concerns can be
addressed through an improved understanding of the internal workings of complex models.
Building trust with stakeholders and end-users by effectively communicating the explainabil-
ity of the model is essential. Furthermore, the challenges encountered by ML practitioners
in achieving explainability underscore the importance of adopting an RE perspective. Clear
and comprehensive communication with non-technical stakeholders is essential, necessitat-
ing the identification of user-friendly ways to convey technical concepts. The absence of
standardized approaches for explainability calls for the development of clear requirements
that guide custom implementations based on specific project needs.
When investigating the trade-offs that ML practitioners consider between explainability and
other quality attributes in AI-based systems, it becomes evident that these trade-offs are
multifaceted and depend on various factors. The fourth part of our interviews sheds light
on the non-functional requirements and constraints associated with achieving explainability
in AI-based systems, as shown in Table 5.
transparency and user trust. Conversely, for tasks such as internet search and retrieval, accu-
racy takes precedence, and the explainability aspect is less important because the focus is on
achieving accurate and efficient results.
In conclusion, ML practitioners must navigate a complex landscape of trade-offs when
considering explainability in AI-based systems. These trade-offs encompass accuracy, secu-
rity, privacy, transparency, performance, and context-specific considerations. Balancing these
factors requires a nuanced approach that aligns with the goals and requirements of the par-
ticular application.
5 Discussion
Our interview results revealed critical insights into the multifaceted realm of XAI practice,
prompting us to contemplate the relevance of the RE perspective in tackling these challenges.
In our exploration of RQ1, we uncovered diverse perspectives among practitioners regard-
ing the concept of explainability. This diversity highlights the absence of a unified definition,
as already identified in our prior work (Habiba et al. 2022). The distinct categories we
identified for explainability underscore the complexity of the term. Moreover, achieving a
common understanding of explainability while recognizing and accommodating the diversity
of requirements and contexts is considered essential for effectively addressing the challenges
in AI explainability.
Furthermore, our investigation delved deeper into the practices practitioners employ to
capture the requirements for explainability and how they put them into practice, as addressed
in RQ2. Our findings reveal that explainability often arises from legal requirements or system
performance failures. This demand for explainability emanates from various stakeholders in
diverse contexts, suggesting that requirements engineering practices could adapt to accommo-
date these varied needs. A comprehensive process for capturing explainability requirements,
however, is currently lacking. Practitioners typically rely on existing tools to clarify system
behavior to fellow technical personnel, but bridging the knowledge gap to convey results to
end-users poses challenges. Additionally, addressing emerging explainability requirements
post-deployment presents difficulties, indicating the need for investigation to establish meth-
ods for specifying these requirements before system deployment.
Subsequently, in RQ3, we identified several challenges faced by our participants, when
implementing explainability in AI-based systems. To address these challenges from an RE
perspective, researchers can explore strategies and tools for improving the communication of
complex technical concepts and model explanations to non-technical stakeholders. This may
involve the development of user-friendly visualization techniques and interfaces to enhance
understanding for business teams, regulators, or customers. Trust and reliability issues can
be mitigated by establishing requirements and standards for incorporating trust-building
mechanisms into system designs, including transparency, accountability, and trustworthiness
indicators. Additionally, the development of reference models and frameworks can ensure
consistency across projects. Further research should aim at improving the transparency of black-box models, developing hybrid models that balance accuracy and interpretability, and implementing scalable explainability solutions. Adaptive systems and context-aware explana-
tions can enhance data quality and sustainability, while compliance frameworks and risk
management protocols are essential for ensuring safety and regulatory adherence.
Finally, in RQ4, we aimed to investigate the interaction between explainability and other
quality attributes. While adding explainability, ML practitioners are required to consider
trade-offs to other quality attributes. This indicates several important considerations for the
RE process.
Requirements engineers should be aware that enhancing explainability might come at
the cost of accuracy, performance, and potential security risks. Further, it is important to
understand the specific application context and user needs. Different applications may pri-
oritize either explainability or accuracy based on the sensitivity of the tasks and the users’
requirements. Moreover, requirements engineers need to engage with various stakeholders,
including ML practitioners, domain experts, end-users, and legal or compliance teams to
identify the optimal level of explainability while considering the trade-offs with other quality
attributes.
6 Threats to Validity
Throughout our study, we employed a systematic approach to strengthen the credibility and
integrity of our research. In the following, we point out the main threats to validity and our
corresponding mitigation strategies.
Internal and Construct Validity The phrasing chosen for our explanations and questions
may introduce bias and misinterpretations, especially if participants understand concepts
differently. To mitigate this, we initiated our research by conducting a series of pilot interviews
within the academic community. They helped in refining our interview questions to accurately
study the fundamental concepts under investigation.
Furthermore, participants may not always have revealed their true opinions. We
consider this a low risk for our study, as the concepts were neither very sensitive nor required
to reveal business-critical information. This was furthermore strengthened by guaranteeing
confidentiality and anonymity. Additionally, it’s important to note that a subset of our par-
ticipants lacked a software engineering background, and although we provided them with a
presentation, there is still a potential limitation in their ability to grasp the complete picture
of the study. Specifically, we aimed to establish a shared understanding of the concept of
explainability among practitioners without influencing their responses. To achieve this, we
carefully formulated our questions to encourage participants to define explainability from
their own perspective, in the context of the specific systems they are working on, and within
their unique working environments. This approach was intended to ensure that their answers
were based on their personal experiences and interpretations, rather than being influenced by
our presentation.
Conclusion Validity To limit observer bias and interpretation bias, we implemented a metic-
ulous coding process for the analysis of interview transcripts. This process was overseen by
the first author, chosen for her specialized technical expertise and profound understanding of
the research subject. Furthermore, all authors participated in rigorous reviews and validation
of the coding outcomes. We are confident that we have identified the essential underlying causal relationships and derived meaningful conclusions.
External Validity As we interviewed 14 professionals in total, representativeness of the col-
lected data may be a potential issue. To mitigate the issue, we conducted a rigorous screening
of our participant pool before and after the interviews. Despite efforts to recruit a diverse
international participant pool, we primarily attracted respondents from German companies.
To maintain sample diversity, we ensured representation across various companies, projects,
and domains within Germany, while including a few participants located outside Germany.
We identified individuals with a minimum of two years of experience in AI/ML projects who
were currently engaged in or had recent involvement in such projects. Moreover, we decided
to exclude two participants after the interviews were done. These deliberate steps were taken
to cultivate a participant sample that is more representative and relevant. Furthermore, it’s
crucial to acknowledge that participants bring their domain knowledge and experiences into
the study, which could potentially impact their responses to the questions. Given that AI,
particularly XAI, is governed by regulatory frameworks, a few of our findings are specific to
Germany’s cultural and legal context. The EU AI Act standardizes AI regulations across
all European states, influencing how AI practices are implemented and monitored. Future
research should include guidelines on legal and cultural aspects to define the study’s scope and
enhance the generalizability of findings across different regulatory environments, aligning
with the evolving AI governance landscape in the EU and beyond.
7 Conclusion
Our study highlighted practitioner perspectives on explainability in AI-based systems and the
challenges they face in implementing it effectively. We identified four categories of explain-
ability that were seen as necessary for making AI-based systems interpretable and transparent.
Furthermore, our findings also revealed that there are no standard practices to address end-
user needs for explainability, which poses a significant concern. The reasons for pursuing
explainability vary among practitioners, with legal requirements being a prominent driver.
Emerging regulatory and legal developments emphasize that AI-based systems must now
incorporate certain core functionalities. For instance, the General Data Protection Regulation
(GDPR) (European Parliament 2016) mandates transparency, accountability, and the “right
to explanation” for decisions supported by AI. Additionally, the need for explainability arises
when AI-based systems fail to provide desired results to stakeholders.
The challenges ML practitioners face in implementing explainability are many and varied.
While participants often rely on tools such as Shapley values (Lundberg and Lee 2017) and
LIME (Ribeiro et al. 2016) to explain their systems to technical personnel, communicating
complex technical concepts and model explanations to non-technical stakeholders is a pri-
mary hurdle. Building trust and convincing end-users about the reliability of AI-supported
decisions is another critical challenge, while demands for regulatory compliance in safety-
critical applications further complicate the matter. A lack of standardized approaches for
explainability and finding the right balance between other non-functional requirements and
explainability were found to be prevalent issues too.
Our study highlights the need for further research and development to effectively address
these challenges and enhance the explainability of AI-based systems across various domains.
Doing so will ensure that AI-based systems meet the explainability requirements of users.
Moreover, requirements engineers must be well-informed about the trade-offs between
explainability and other quality attributes. They must consider the application context and
domain, as well as the specific user needs to balance explainability and accuracy. Engaging
with various stakeholders is crucial to identifying the optimal level of explainability while
considering trade-offs with other quality attributes. In conclusion, improving the explainabil-
ity of AI-based systems requires collaboration, further research, and careful consideration of
trade-offs to ensure that these systems are transparent, trustworthy, and effectively meet the
needs of all stakeholders involved.
Acknowledgements This work was partially supported by the German Federal Ministry of Education and
Research, Grant Number: 21IV005E
Declarations
Conflict of Interest The authors declared that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Atakishiyev S, Salameh M, Yao H, Goebel R (2021) Explainable artificial intelligence for autonomous driving:
a comprehensive overview and field guide for future research directions. arXiv:2112.11561
Baltes S, Ralph P (2022) Sampling in software engineering research: a critical review and guidelines. Empir
Softw Eng 27(4):94
Brennen A (2020) What do people really want when they say they want "explainable ai?" We asked 60 stakeholders. In: Extended abstracts of the 2020 CHI conference on human factors in computing systems, pp 1–7
Brunotte W, Chazette L, Korte K (2021) Can explanations support privacy awareness? a research roadmap. In:
2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp 176–180.
IEEE
Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N (2015) Intelligible models for healthcare: predicting
pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD international
conference on knowledge discovery and data mining, pp 1721–1730
Carvalho DV, Pereira EM, Cardoso JS (2019) Machine learning interpretability: a survey on methods and
metrics. Electron 8(8):832
Chazette L, Schneider K (2020) Explainability as a non-functional requirement: challenges and recommenda-
tions. Requir Eng 25(4):493–514
Chazette L, Brunotte W, Speith T (2021) Exploring explainability: a definition, a model, and a knowledge
catalogue. In: 2021 IEEE 29th International Requirements Engineering Conference (RE), pp 197–208.
IEEE
Dhanorkar S, Wolf CT, Qian K, Xu A, Popa L, Li Y (2021) Who needs to know what, when?: broadening the
explainable ai (xai) design space by looking at explanations across the ai lifecycle. Designing Interactive
Systems Conference 2021:1591–1602
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. arXiv:1702.08608
Dwivedi R, Dave D, Naik H, Singhal S, Omer R, Patel P, Qian B, Wen Z, Shah T, Morgan G et al (2023)
Explainable ai (xai): core ideas, techniques, and solutions. ACM Comput Surv 55(9):1–33
European Parliament, Council of the European Union (2016) Regulation (EU) 2016/679 of the European Parliament and of the Council. https://data.europa.eu/eli/reg/2016/679/oj. Accessed 2024-07-20
Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L (2018) Explaining explanations: an overview of
interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and
Advanced Analytics (DSAA), pp 80–89. IEEE
Habiba UE, Bogner J, Wagner S (2022) Can requirements engineering support explainable artificial intelli-
gence? towards a user-centric approach for explainability requirements. In: 2022 IEEE 30th International
Requirements Engineering Conference Workshops (REW), pp 162–165
Henin C, Le Métayer D (2021) A multi-layered approach for tailored black-box explanations. In: International
conference on pattern recognition, pp 5–19. Springer
Hoffman RR, Mueller ST, Klein G, Jalaeian M, Tate C (2023) Explainable ai: roles and stakeholders, desire-
ments and challenges. Front Comput Sci 5:1117848
Ishikawa F, Matsuno Y (2020) Evidence-driven requirements engineering for uncertainty of machine learning-
based systems. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp 346–
351. IEEE
Jansen Ferreira J, Monteiro M (2021) Designer-user communication for xai: an epistemological approach to
discuss xai design. arXiv e-prints, 2105
Jin W, Fatehi M, Abhishek K, Mallya M, Toyota B, Hamarneh G (2020) Artificial intelligence in glioma
imaging: challenges and advances. J Neural Eng 17(2):021002
Jin W, Fan J, Gromala D, Pasquier P, Hamarneh G (2023) Invisible users: uncovering end-users’ requirements
for explainable ai via explanation forms and goals. arXiv:2302.06609
Kästner L, Langer M, Lazar V, Schomäcker A, Speith T, Sterz S (2021) On the relation of trust and explainabil-
ity: Why to engineer for trustworthiness. In: 2021 IEEE 29th International Requirements Engineering
Conference Workshops (REW), pp 169–175. IEEE
Köhl MA, Baum K, Langer M, Oster D, Speith T, Bohlender D (2019) Explainability as a non-functional
requirement. In: 2019 IEEE 27th International Requirements Engineering Conference (RE), pp 363–
368. IEEE
Krishna S, Han T, Gu A, Pombra J, Jabbari S, Wu S, Lakkaraju H (2022) The disagreement problem in
explainable machine learning: a practitioner’s perspective. arXiv:2202.01602
Kuwajima H, Ishikawa F (2019) Adapting square for quality assessment of artificial intelligence systems. In:
2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp
13–18. IEEE
Lagioia F et al (2020) The impact of the general data protection regulation (gdpr) on artificial intelligence
Lakkaraju H, Slack D, Chen Y, Tan C, Singh S (2022) Rethinking explainability as a dialogue: a practitioner’s
perspective. arXiv:2202.01875
Lipton ZC (2018) The mythos of model interpretability: in machine learning, the concept of interpretability is
both important and slippery. Queue 16(3):31–57
Longo L, Goebel R, Lecue F, Kieseberg P, Holzinger A (2020) Explainable artificial intelligence: Concepts,
applications, research challenges and visions. In: Machine learning and knowledge extraction: 4th IFIP
TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020,
Dublin, Ireland, August 25–28, 2020, Proceedings, pp 1–16. Springer
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process
Syst 30
Markus AF, Kors JA, Rijnbeek PR (2021) The role of explainability in creating trustworthy artificial intelligence
for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J
Biomed Inform 113:103655
Miller T (2019) Explanation in artificial intelligence: insights from the social sciences. Artif Intell 267:1–38
Montavon G, Samek W, Müller KR (2018) Methods for interpreting and understanding deep neural networks.
Digit Signal Process 73
Ribeiro MT, Singh S, Guestrin C (2016) ’why should i trust you?’ explaining the predictions of any classifier.
In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data
mining, pp 1135–1144
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering.
Empir Softw Eng 14:131–164
Sachan S, Yang JB, Xu DL, Benavides DE, Li Y (2020) An explainable ai decision-support-system to automate
loan underwriting. Expert Syst Appl 144:113100
Sadeghi M, Klös V, Vogelsang A (2021) Cases for explainable software systems: characteristics and examples.
In: 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp 181–187.
IEEE
Seaman CB (2008) Qualitative methods. Guid Adv Empir Softw Eng 35–62
Sheh R (2021) Explainable artificial intelligence requirements for safe, intelligent robots. In: 2021 IEEE
International Conference on Intelligence and Safety for Robotics (ISR), pp 382–387. IEEE
Suresh H, Gomez SR, Nam KK, Satyanarayan A (2021) Beyond expertise and roles: a framework to characterize
the stakeholders of interpretable machine learning and their needs. In: Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems, pp 1–16
Vogelsang A, Borg M (2019) Requirements engineering for machine learning: perspectives from data scientists.
In: 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pp 245–251.
IEEE
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Umm-e-Habiba
[email protected]
Mohammad Kasra Habib
[email protected]
Justus Bogner
[email protected]
Jonas Fritzsch
[email protected]
Stefan Wagner
[email protected]
1 Institute of Software Engineering, University of Stuttgart, Stuttgart, Baden-Württemberg,
Germany
2 Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
3 Department of Software Engineering, University of Kotli Azad Jammu Kashmir, Kotli Azad
Kashmir, AJK, Pakistan
4 TUM School of Communication, Information and Technology, Technical University of Munich,
Heilbronn, Germany