Acknowledgements
This work was supported by the Intelligence Advanced Research Projects Activity
(IARPA) as part of the Bayesian Argumentation via Delphi (BARD) project, and by
the Humboldt Foundation.
Explanation in AI systems
In this chapter, we consider recent work aimed at guiding the design of algorithmically
generated explanations. The chapter proceeds in four parts. Firstly, we introduce the
general problem of machine-generated explanation and illustrate different notions of
explanation with the help of Bayesian belief networks. Secondly, we introduce key
theoretical perspectives on what constitutes an explanation, and more specifically
a ‘good’ explanation, from the philosophy literature. We compare these theoretical
perspectives and the criteria they propose with a case study on explaining reasoning
in Bayesian belief networks and present implications for AI. Thirdly, we consider the
pragmatic nature of explanation, with a focus on its communicative aspects as manifested in considerations of trust. Finally, we present conclusions.
user levels of engagement; at least here, the step of translating low-level representations into suitable higher-level representations accessible to us is, in a large number of cases, already taken care of.
Bayesian Belief Networks (BNs) are an AI technique that has been viewed as significantly more interpretable and transparent than deep neural networks (Gunning and Aha, 2019), while still possessing notable predictive power and being applied in contexts ranging from defence and military (Falzon, 2006; Laskey and Mahoney, 1997; Lippmann et al., 2006) and cyber security (Chockalingam et al., 2017; Xie et al., 2010), through medicine (Agrahari et al., 2018; Wiegerinck et al., 2013) and law and forensics (Lagnado et al., 2013; Fenton et al., 2013), to agriculture (Drury et al., 2017) as well as psychology, philosophy, and economics (see below). As such, BNs serve well one of the goals of this chapter, which is to bring together and overlay insights
on explanations from different areas of research: they are a promising meeting point
connecting the research on machine-generated explanation in AI and the research on
human understanding of explanation in psychology and philosophy. We thus use BNs
as the focal point of our analysis in this chapter. Given the increasing popularity of
BNs within AI (Friedman et al., 1997; Pernkopf and Bilmes, 2005; Roos et al., 2005;
Ng and Jordan, 2002), including their relation to deep neural networks (Choi et al.,
2019; Rohekar et al., 2018; Wang and Yeung, 2016) and efforts to explain deep neural
networks via BNs (Harradon et al., 2018), this should be intrinsically interesting. Fur-
thermore, we take the kinds of issues we identify here to be indicative of the kinds of
problems and distinctions that will likely emerge in any attempt at machine-generated
explanation.
In particular, BNs are helpful in spelling out the implications of less intuitive inter-
actions between variables. This is readily illustrated with the example of “explaining
away”, a phenomenon that has received widespread psychological investigation (Davis
and Rehder, 2017; Fernbach and Rehder, 2013; Liefgreen et al., 2018; Morris and Lar-
rick, 1995; Pilditch et al., 2019; Rehder, 2014; Rehder and Waldmann, 2017; Rottman
and Hastie, 2014; Rottman and Hastie, 2016; Sussman and Oppenheimer, 2011; Tešić
et al., 2020). Figure 1.1 illustrates a simple example of explaining away. There are two potential causes, physical abuse and haemophilia (a genetic bleeding disorder), of a single effect, bruises on a child’s body. Before finding out anything about whether
there are bruises on a body, the two causes are independent: learning that a child
is suffering from haemophilia will not change our beliefs about whether the child is
physically abused. However, if we learn that the child has bruises on its body, then the
two causes become dependent: additionally learning that the child is suffering from
haemophilia will change (decrease) the probability that it has been physically abused
since haemophilia alone is sufficient to explain the bruises.
The example illustrates not just BNs’ ability to model explaining-away situations and to provide us with both qualitative and quantitative normative answers, but also their advantage over classical logic and rule-based expert systems. A rule-based expert
system consisting of a set of IF-THEN rules and a set of facts (see Grosan and Abraham
2011) may carry out an incorrect chaining in situations representing explaining away.
For instance, a rule-based system may combine the plausible-looking rules “If the child is
suffering from haemophilia, then it is likely the child has bruises” with “If the child
has bruises, then it is likely the child is physically abused” to get “If the child is
suffering from haemophilia, then it is likely the child is physically abused”. However,
we know that actually the opposite is true: learning about haemophilia makes physical
abuse less likely (Pearl, 1988). The application of rule-based expert systems to legal
and medical contexts (Grosan and Abraham, 2011) where explaining away and other
causal-probabilistic relationships can be found highlights the importance of accurately
capturing these relationships in computational terms.
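The explaining-away pattern just described can be verified with a few lines of code. The following Python sketch enumerates the joint distribution over the two causes and the effect; the prior and conditional probabilities are purely illustrative and are not taken from any published model of this scenario.

```python
# Explaining away in the bruises example: two causes (Abuse, Haemophilia),
# one effect (Bruises). All probabilities below are illustrative only.

P_ABUSE = 0.10   # prior probability of physical abuse (made up)
P_HAEMO = 0.05   # prior probability of haemophilia (made up)

def p_bruises(abuse, haemo):
    """P(Bruises = true | Abuse, Haemophilia): a simple noisy-OR-style table."""
    if abuse and haemo:
        return 0.99
    if abuse or haemo:
        return 0.90
    return 0.05

def posterior_abuse(haemo_evidence=None):
    """P(Abuse = true | Bruises = true [, Haemophilia = haemo_evidence]),
    computed by brute-force enumeration of the joint distribution."""
    numerator = denominator = 0.0
    for abuse in (True, False):
        for haemo in (True, False):
            if haemo_evidence is not None and haemo != haemo_evidence:
                continue
            joint = ((P_ABUSE if abuse else 1 - P_ABUSE)
                     * (P_HAEMO if haemo else 1 - P_HAEMO)
                     * p_bruises(abuse, haemo))       # Bruises = true
            denominator += joint
            if abuse:
                numerator += joint
    return numerator / denominator

print(posterior_abuse())                      # P(Abuse | Bruises)        ~0.52
print(posterior_abuse(haemo_evidence=True))   # P(Abuse | Bruises, Haemo) ~0.11
```

As in the text, additionally learning that the child has haemophilia lowers the probability of abuse, even though the two causes were independent before the bruises were observed.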
overview). We next describe one more recent attempt in the context of the Bayesian
Argumentation via Delphi (BARD) project.
Fig. 1.2 A BN of a fictional scenario used in BARD testing phase. Four pieces of evidence are
available: Emerson Report=Yes, Quinns Report=Yes, AitF Sawyer Report=Yes, and Comms
Analyst Winter Report=No.
Fig. 1.3 A summary report generated by the BARD algorithm applied to the BN from Figure 1.2. In addition to the natural language explanation, it provides sets of nodes labelled HighImpSet, NoImpSet, and OppImpSet. For the purposes of this chapter we can ignore MinHIS and CombMinSet.
imagine that in the BN in Figure 1.2 we additionally learn that the logs are most likely true, but we are not absolutely convinced. To reflect this, we would set p(Are logs true? (Are Emerson & Quinn spies) = True) to 0.95, for instance. The probability of Are logs true? (Are Emerson & Quinn spies) = True would thus change from 0.46243 to 0.95, but it would not go all the way to 1. The current version of the algorithm is not able to calculate the impact of such a change. Second, the explanations generated by the algorithm are not aimed specifically at what a human user might find hard to understand. To make matters worse, it is arguably the interactions between variables, and their often counterintuitive effects, that users will struggle with most (for psychological evidence to this effect see for example Dewitt et al. 2018; Liefgreen et al. 2018; Phillips et al. 2018; Pilditch et al. 2018; Pilditch et al. 2019; Tešić et al. 2020).
In other words, the system generates an (accurate) explanation, but not necessarily a
good explanation. For further guidance on what might count as a good explanation
we consult research on this topic within the philosophy of science and epistemology.
that flagpole presumably involves an appeal to the maker of the flagpole in some form
or other. Examples such as these serve to illustrate not just the limits of Hempel’s account but also the limits of deductive approaches in the context of explanation more
generally.
The asymmetric relations involved in explanation prompted alternative accounts
of scientific explanation within the subsequent literature. Chief among these are causal
accounts which assert that to explain something is to give a specification of its causes.
The standard explication of cause in this context is that of factors without which
something could not be the case (i.e. conditio sine qua non). This deals readily even
with low probability events, and causes can be identified through a process of “screen-
ing off”. If one finds that p(M | N, L) = p(M | N), then N screens off L from M, suggesting that L is causally irrelevant to M. For example, a reading of a barometer (B) and whether there is a storm (S) are correlated. However, knowing the atmospheric pressure (A) will make the two independent: p(B | A, S) = p(B | A), suggesting no direct causal relationship between B and S. Yet the notion of cause in itself is notori-
ously fraught as is evidenced by J. L. Mackie’s convoluted (Mackie, 1965) definition
whereby a cause is defined as an “insufficient but necessary part of an unnecessary but
sufficient condition”. This rather tortured definition reflects the difficulties with the notion of causation when multiple causes are present, giving rise to overdetermination (for example, decapitation and arsenic in the bloodstream can both be
the causes of death), the difficulties created by causal chains (for example, tipping
over the bottle which hits the floor which releases the toxic liquid) and the impact
of background conditions (for example, putting yeast in the dough causes it to rise,
but only if it is actually put in the oven, the oven works, the electrical bills have been
paid, and so on). It is a matter of ongoing research to what extent causal Bayes nets, that is, BNs supplemented with the do-calculus (Pearl, 2000), provide a fully satisfactory account of causality and of these difficulties (see also Halpern and Pearl 2005a).
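The screening-off test mentioned above is easy to check numerically. The sketch below encodes a toy joint distribution for the barometer example, with atmospheric pressure as the common cause; the numbers are invented for illustration.

```python
# Screening off: pressure (A) is a common cause of the barometer reading (B)
# and the storm (S). B and S are correlated, but conditioning on A makes
# them independent: p(B | A, S) = p(B | A). All numbers are illustrative.

from itertools import product

P_A = {True: 0.3, False: 0.7}            # low atmospheric pressure?
P_B_GIVEN_A = {True: 0.9, False: 0.1}    # barometer falls, given A
P_S_GIVEN_A = {True: 0.8, False: 0.05}   # storm occurs, given A

def joint(a, b, s):
    return (P_A[a]
            * (P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a])
            * (P_S_GIVEN_A[a] if s else 1 - P_S_GIVEN_A[a]))

def prob(a=None, b=None, s=None):
    """Probability of the event in which any fixed arguments hold."""
    return sum(joint(av, bv, sv)
               for av, bv, sv in product((True, False), repeat=3)
               if (a is None or av == a)
               and (b is None or bv == b)
               and (s is None or sv == s))

# Unconditionally, B and S are correlated ...
print(prob(b=True), prob(b=True, s=True) / prob(s=True))        # 0.34 vs ~0.80

# ... but A screens S off from B: p(B | A, S) equals p(B | A)
print(prob(a=True, b=True) / prob(a=True),
      prob(a=True, b=True, s=True) / prob(a=True, s=True))      # 0.9 vs 0.9
```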
At the same time, the difficulty of picking out a single one out of multiple potential
causes points to the second main alternative to Hempel’s covering law model, namely
so-called pragmatic accounts of explanation.
According to van Fraassen (1977) an explanation always has a pragmatic com-
ponent: specifically what counts as an explanation in any given context depends on
the possible contrasts the questioner has in mind. For example, consider the question
“why did the dog bury the bone?”. Different answers are required for different prosodic
contours: “why did the dog (i.e., not some other animal) bury the bone?”; why did
the dog bury the bone? (say, rather than eat it); why did the dog bury the bone? (say,
rather than the ball). In short, pragmatic accounts bring into the picture the recipient
of an explanation while rejecting a fundamental connection between explanation and
inference assumed by Hempel’s model.
what distinguishes better ones from poorer ones. In search of explanatory virtues that
characterise good explanation, a number of factors have been identified: explanatory
power, unification, coherence, and simplicity are chief among these. Explanatory power
often relates to the ability of an explanation to decrease the degree to which we find the explanandum surprising; the less surprising the explanandum in light of an explanation, the more powerful the explanation. For instance, a geologist may find a prehistoric earthquake explanatory of deformations in layers of bedrock to the extent that these
deformations would be less surprising given the occurrence of such an earthquake
(Schupbach and Sprenger, 2011). Unification refers to explanations’ ability to provide
a unified account of a wide range of phenomena. For example, Maxwell’s theory (expla-
nation) managed to unify electricity and magnetism (phenomena). Coherence deems explanations that better fit our already established beliefs preferable to those that do not (Thagard, 1989). Explanations can also have internal coherence, namely the degree to which the parts of an explanation fit together. An often mentioned explanatory virtue is simplicity. According to Thagard (1978), simplicity is related to the size and nature
of auxiliary assumptions needed by an explanation to explain evidence. For instance,
the phlogiston theory of combustion needed a number of auxiliary assumptions to ex-
plain facts that are easily explained by Lavoisier’s theory: it assumed the existence of a fire-like element, ‘phlogiston’, that is given off in combustion and that had ‘negative weight’, since bodies undergoing combustion increase in weight. Others operationalise simplicity as the number of causes invoked in an explanation: the more causes, the less simple the explanation (Lombrozo, 2007).
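One simple way to make the ‘reduction in surprise’ idea concrete is to compare the surprisal (negative log probability) of the explanandum with and without the explanation. The sketch below does this for the earthquake example with made-up probabilities; it is a toy illustration, not the formal measure proposed by Schupbach and Sprenger (2011).

```python
import math

# Explanatory power as reduction in surprise: how much less surprising are the
# rock deformations (e) once a prehistoric earthquake (h) is assumed?
# Probabilities are illustrative only.

p_e = 0.01           # deformations are quite surprising a priori
p_e_given_h = 0.60   # far less surprising given the earthquake

surprise_without = -math.log2(p_e)          # ~6.64 bits
surprise_with = -math.log2(p_e_given_h)     # ~0.74 bits

print(f"Reduction in surprise: {surprise_without - surprise_with:.2f} bits")
```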
While all of these factors seem intuitive, debate persists about their normative
basis. In particular, there is ongoing debate within the philosophy of science about
whether these factors admit of adequate probabilistic reconstruction (Glymour, 2014).
At the same time, there is now a sizeable program within psychology that seeks to ex-
amine the application of these virtues to everyday lay explanation. This body of work
probes the extent to which lay reasoners endorse these criteria when distinguishing
better from worse explanations (Bechlivanidis et al., 2017; Bonawitz and Lombrozo,
2012; Johnson et al., 2014a; Johnson et al., 2014b; Lombrozo, 2007; Lombrozo, 2016;
Pennington and Hastie, 1992; Sloman, 1994; Williams and Lombrozo, 2010; Zemla
et al., 2017). To date, researchers have found some degree of support for these factors, but also apparent deviations in practice.
Finally, there is a renewed interest in both philosophy and psychology in the no-
tion of inference to the best explanation (Harman, 1965; Lipton, 2003). Debate here
centres around the question of whether the fact that an explanation seems in some
purely non-evidential way better than its rivals should provide grounds for thinking
that explanation is more probable. In other words, the issue is whether an explanation
exhibiting certain explanatory considerations that other explanations do not should
be considered more likely to be true (Douven, 2013; Harman, 1967; Henderson, 2013;
Lipton, 2003; Thagard, 1978). Likewise this has prompted psychological research into
whether such probability boosts can actually be observed in reasoning contexts (Dou-
ven and Schupbach, 2015). The research on explanatory virtues in both philosophy
and psychology is still very much active.
1.2.3 Implications
To explore these ideas further, we conducted a case study on explanation in BNs, which
we describe next.
Three independent raters, all of whom were experts in probabilistic reasoning, were
then given access to implementations of the respective model in order to probe the
BNs in more detail, and asked to provide answers to questions that prompted
them to consider how learning evidence changed the probabilities of the target nodes.
Below are sample questions:
• Given evidence: {Neighbours grass = wet}
Question: How does the probability of ‘Our Sprinkler = was on’ change compared
to when there was no evidence and why?
• Given Evidence: {Our grass = wet, Wall = wet}
Question: How does the probability of ‘Rain = rained’ change compared to when
the only available evidence was ‘Our Grass = wet’ and why?
Subsequently, the three independent sets of answers were subjected to an analysis
by a fourth person in order to identify both commonalities and differences across the
answers. This then formed the basis of the subsequent evaluation of those answers.
We describe the full set of results elsewhere (Tešić and Hahn, in preparation), restricting
ourselves here to an initial summary of the results. First, we observed high levels of
agreement across answers. Differences were typically more presentational than sub-
stantive. For example, the following three statements all seek to describe the same state of affairs (the arithmetic behind them is spelled out after the list):
• ‘As A is true C is more likely to be true if B is true and less likely to be true if B
is false. As we do not know B these alternatives essentially cancel themselves out
and leave the probability of C unchanged.’
• ‘It does not change. P (C | A) is equal to P (C) if P (C | A, B) = P (C | ∼A, ∼B)
and P (C | A, ∼B) = P (C | ∼A, B) (assuming P (B) = 0.5), which here is the case.’
• ‘According to model parameters: If A and B both true or both false, then C has
probability .75. If A true but B false, or vice-versa, then C has probability .25.
When we know A is true, and prior for B is 50%, there is a 50% chance that probability of C is 75% and a 50% chance that probability of C is 25%, therefore overall probability of C is 50%.’
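To spell out the arithmetic behind the third statement: with P(B) = 0.5 and the conditional probabilities the rater cites, marginalising over B gives P(C | A) = P(C | A, B)·P(B) + P(C | A, ∼B)·P(∼B) = 0.75 × 0.5 + 0.25 × 0.5 = 0.5, which, given the symmetry noted in the second statement, is just the prior P(C); learning A therefore leaves the probability of C unchanged.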
Second, all raters appeal to hypothetical reasoning as a way of unpacking interactions between evidence variables:
• ‘Wall = wet is a lot more likely if the sprinkler was on than if it rained (as a matter of fact, if it rained, the wall is more likely to be dry than wet). Since Our sprinkler = was on went down, Wall = wet went down.’
Third, causal explanations are prevalent, and, where present, typically appeal to
the underlying target system being modelled (i.e., sprinkler, wall, rain) as opposed to
the model itself:
• ‘The probability of rain decreases because, although the sprinkler and rain can
both cause our grass to be wet, the wet wall is more likely to happen when the
sprinkler is on rather than rain.’
Notably, in appealing to causes, it is the most probable cause that seems to be
highlighted as an explanation:
• ‘There is a decrease [in probability] because the most likely cause of our grass
being wet is the sprinkler and since the wall is dry the sprinkler is unlikely to be
on.’
Finally, these data seem to suggest that the structure of the BNs is exploited
in order to zero in on ‘the explanation’ as a subset of all variables described in the
problem. Specifically, explanations seemed to make use of the Markov blanket: the set of nodes consisting of a node’s parents, its children, and its children’s other parents, which renders that node conditionally independent of the rest of the network (Korb and Nicholson, 2010); a small sketch of how the blanket can be read off the graph structure is given after the quoted example below. In addition to the Markov blanket, raters’ descriptions mostly followed the direction of evidence propagation, i.e. followed the directed paths in a BN:
• ‘The probability of Battery voltage = dead increases because failure of the car to
start could be explained by the car not cranking and the likely cause of this is
a faulty starter system. A dead battery is one possible explanation for a faulty
starter system.’
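The notion of a Markov blanket itself is straightforward to compute from a BN’s graph. The sketch below reads the blanket off a hypothetical car-start network loosely echoing the quote above; the node names and structure are invented for illustration.

```python
# Markov blanket of a node: its parents, its children, and its children's
# other parents. The graph is given as a child -> parents mapping; the
# car-start network below is hypothetical.

PARENTS = {
    "CarStarts":      ["CarCranks", "FuelSupply"],
    "CarCranks":      ["StarterSystem"],
    "StarterSystem":  ["BatteryVoltage"],
    "FuelSupply":     [],
    "BatteryVoltage": [],
}

def markov_blanket(node):
    children = [c for c, ps in PARENTS.items() if node in ps]
    blanket = set(PARENTS[node]) | set(children)
    for child in children:               # add the children's other parents
        blanket |= set(PARENTS[child])
    blanket.discard(node)
    return blanket

print(sorted(markov_blanket("StarterSystem")))  # ['BatteryVoltage', 'CarCranks']
print(sorted(markov_blanket("CarCranks")))      # ['CarStarts', 'FuelSupply', 'StarterSystem']
```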
the reliability of the explanation’s source. Here, it is not only characteristics of the explanation itself, such as its perceived cogency, how articulately it is framed, or how easy it is to process, that are likely to influence perceived source reliability; there are also likely to be effects of the specific content. In particular, the extent to which the content of
the message fits with our present (uncertain) beliefs about the world has been shown
to affect beliefs about the issue at hand as well as beliefs about the perceived reliability
of the speaker (Collins et al., 2018; Collins and Hahn, 2019).
1.4 Conclusions
We have seen that multiple, distinct notions of what counts as an explanation can be constructed. One of these treats explanation as the identification of the variables that mattered in generating a certain outcome. In the context of computational models of explanation in BNs, this corresponds to the usual focus on explaining observed evidence via unobserved nodes within the network (Pacer et al., 2013). In other words, the explanation identifies a justification or hypothesis. This is the notion of explanation that has figured prominently in work on computer-generated explanations as well as in the psychological and philosophical literature on explanation. The second notion of explanation we considered includes explanation of the inference that links evidence and hypothesis. In the context of BNs this means explaining the inferences that lead to a change (or no change) in the probabilities of the query nodes. In other words, the explanation involves a target hypothesis plus information about the incremental reasoning process
Fernbach, Philip M and Rehder, Bob (2013). Cognitive shortcuts in causal inference.
Argument & Computation, 4(1), 64–88.
Friedman, Nir, Geiger, Dan, and Goldszmidt, Moises (1997). Bayesian network clas-
sifiers. Machine learning, 29(2-3), 131–163.
Glymour, Clark (2014). Probability and the explanatory virtues. British Journal for
the Philosophy of Science, 66(3), 591–604.
Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron (2016). Deep Learning. MIT
Press.
Goodman, Bryce and Flaxman, Seth (2016). EU regulations on algorithmic decision-
making and a “right to explanation”. In ICML workshop on human interpretability
in machine learning (WHI 2016), New York, NY. http://arxiv.org/abs/1606.08813v1.
Graves, Alex and Schmidhuber, Jürgen (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural networks, 18(5-6), 602–610.
Grosan, Crina and Abraham, Ajith (2011). Rule-based expert systems. In Intelligent
Systems, pp. 149–185. Springer.
Gunning, David and Aha, David W (2019). DARPA's explainable artificial intelligence program. AI Magazine, 40(2), 44–58.
Hahn, Ulrike and Hornikx, Jos (2016). A normative framework for argument quality:
argumentation schemes with a Bayesian foundation. Synthese, 193(6), 1833–1873.
Hahn, Ulrike and Oaksford, Mike (2006). A Bayesian approach to informal argument
fallacies. Synthese, 152(2), 207–236.
Hahn, Ulrike and Oaksford, Mike (2007). The rationality of informal argumentation:
a Bayesian approach to reasoning fallacies. Psychological review , 114(3), 704.
Halpern, Joseph Y and Pearl, Judea (2005a). Causes and explanations: A structural-
model approach. Part I: Causes. The British journal for the philosophy of sci-
ence, 56(4), 843–887.
Halpern, Joseph Y and Pearl, Judea (2005b). Causes and explanations: A structural-
model approach. Part II: Explanations. The British journal for the philosophy of
science, 56(4), 889–911.
Harman, Gilbert (1965). The inference to the best explanation. The philosophical
review , 74(1), 88–95.
Harman, Gilbert (1967). Detachment, probability, and maximum likelihood. Nous,
401–411.
Harradon, Michael, Druce, Jeff, and Ruttenberg, Brian (2018). Causal learning and
explanation of deep neural networks via autoencoded activations. arXiv preprint
arXiv:1802.00541 .
Harris, Adam JL, Hahn, Ulrike, Madsen, Jens K, and Hsu, Anne S (2016). The
appeal to expert opinion: quantitative support for a Bayesian network approach.
Cognitive Science, 40(6), 1496–1533.
Hayes, Bradley and Shah, Julie A (2017). Improving robot controller transparency
through autonomous policy explanation. In 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 303–312. IEEE.
Hempel, Carl G. and Oppenheim, Paul (1948). Studies in the logic of explanation. Philosophy of Science, 15(2), 135–175.
Wiegerinck, Wim, Burgers, Willem, and Kappen, Bert (2013). Bayesian networks,
introduction and practical applications. In Handbook on Neural Information Pro-
cessing, pp. 401–431. Springer.
Williams, Joseph J and Lombrozo, Tania (2010). The role of explanation in discovery
and generalization: Evidence from category learning. Cognitive Science, 34(5), 776–
806.
Woodward, James (2017). Scientific explanation. In The Stanford Encyclopedia of
Philosophy (Fall 2017 edn) (ed. E. N. Zalta). Metaphysics Research Lab, Stanford
University.
Xie, Peng, Li, Jason H, Ou, Xinming, Liu, Peng, and Levy, Renato (2010). Using
Bayesian networks for cyber security analysis. In 2010 IEEE/IFIP International
Conference on Dependable Systems & Networks (DSN), pp. 211–220. IEEE.
Yap, Ghim-Eng, Tan, Ah-Hwee, and Pang, Hwee-Hwa (2008). Explaining inferences
in Bayesian networks. Applied Intelligence, 29(3), 263–278.
Yuan, Changhe, Lim, Heejin, and Lu, Tsai-Ching (2011). Most relevant explanation
in Bayesian networks. Journal of Artificial Intelligence Research, 42, 309–352.
Zemla, Jeffrey C, Sloman, Steven, Bechlivanidis, Christos, and Lagnado, David A
(2017). Evaluating everyday explanations. Psychonomic bulletin & review , 24(5),
1488–1500.
Zhang, Yongfeng and Chen, Xu (2020). Explainable recommendation: A survey and
new perspectives. Foundations and Trends in Information Retrieval , 14(1), 1–101.
Zukerman, Ingrid, McConachy, Richard, and Korb, Kevin B. (1998). Bayesian
reasoning in an abductive mechanism for argument generation and analysis. In
AAAI/IAAI, pp. 833–838.
Zukerman, Ingrid, McConachy, Richard, Korb, Kevin B., and Pickett, Deborah
(1999). Exploratory interaction with a Bayesian argumentation system. In IJCAI,
pp. 1294–1299.