The Mythos of Model Interpretability
In machine learning, the concept of interpretability is both important and slippery.
ZACHARY C. LIPTON
Supervised machine-learning models boast
remarkable predictive capabilities. But can you
trust your model? Will it work in deployment?
What else can it tell you about the world?
Models should be not only good, but also
interpretable, yet the task of interpretation appears
underspecified. The academic literature has provided
diverse and sometimes non-overlapping motivations for
interpretability and has offered myriad techniques for
rendering models interpretable. Despite this ambiguity,
many authors proclaim their models to be interpretable
axiomatically, absent further argument. Problematically,
it is not clear what common properties unite these
techniques.
This article seeks to refine the discourse on
interpretability. First, it examines the objectives of previous work in this area; then it considers the model properties and techniques thought to confer interpretability.
INTRODUCTION
Until recently, humans had a monopoly on agency in
society. If you applied for a job, loan, or bail, a human
decided your fate. If you went to the hospital, a human
would attempt to categorize your malady and recommend
treatment. For consequential decisions such as these, you
might demand an explanation from the decision-making
agent.
If your loan application is denied, for example, you
might want to understand the agent’s reasoning in a bid to
strengthen your next application. If the decision was based
on a flawed premise, you might contest this premise in the
hope of overturning the decision. In the hospital, a doctor’s
explanation might educate you about your condition.
In societal contexts, the reasons for a decision often
matter. For example, intentionally causing death (murder)
and unintentionally causing it (manslaughter) are distinct crimes.
Similarly, a hiring decision being based (directly or
indirectly) on a protected characteristic such as race has a
bearing on its legality. However, today’s predictive models
are not capable of reasoning at all.
Given a medical scan, for example, an ML algorithm can assign a probability that the scan depicts a cancerous tumor. The ML algorithm takes in a large corpus of (input, output) pairs, and outputs a model that can predict the output corresponding to a previously unseen input. Formally, researchers call this problem setting supervised learning. Then, to automate decisions fully, one feeds the model's output into some decision rule. For example, spam filters programmatically discard emails predicted to be spam with a level of confidence exceeding some threshold.
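As a minimal sketch of this pipeline (assuming scikit-learn; the synthetic features, labels, and the 0.9 threshold below are illustrative placeholders, not details from the article):

```python
# Minimal sketch: supervised learning followed by a thresholded decision rule.
# The data are random placeholders standing in for email features and spam labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((1000, 20))                    # (input, output) pairs: features ...
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)     # ... and labels (1 = spam)

model = LogisticRegression().fit(X, y)        # learn the input -> output mapping

# Decision rule: discard an email only when the predicted spam probability
# exceeds a confidence threshold.
THRESHOLD = 0.9
spam_probability = model.predict_proba(X[:5])[:, 1]
discard = spam_probability > THRESHOLD
print(list(zip(spam_probability.round(2), discard)))
```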
Thus, ML-based systems do not know why a given input
should receive some label, only that certain inputs are
correlated with that label. For example, shown a dataset
in which the only orange objects are basketballs, an image
classifier might learn to classify all orange objects as
basketballs.
This model would achieve high accuracy even on held-out
images, despite failing to grasp the difference that
actually makes a difference.
As ML penetrates critical areas such as medicine, the
criminal justice system, and financial markets, the inability
of humans to understand these models seems problematic.
Some suggest model interpretability as a remedy, but in the academic literature, few authors articulate precisely what interpretability means or precisely how their proposed solution is useful.
Trust
Some authors suggest interpretability is a prerequisite
for trust.9,23 Again, what is trust? Is it simply confidence
that a model will perform well? If so, a sufficiently
accurate model should be demonstrably trustworthy, and
interpretability would serve no purpose. Trust might also
be defined subjectively. For example, a person might feel
more at ease with a well-understood model, even if this
understanding serves no obvious purpose. Alternatively,
when the training and deployment objectives diverge, trust might denote confidence that the model will perform well with respect to the real deployment objectives, not merely the objective optimized during training.
Causality
Although supervised learning models are only optimized
directly to make associations, researchers often use them
in the hope of inferring properties of the natural world. For
example, a simple regression model might reveal a strong
association between thalidomide use and birth defects, or
between smoking and lung cancer.29
The associations learned by supervised learning models are not guaranteed to reflect causal relationships, but one might hope that interpreting them yields clues about the causal relationships between, for example, physiologic signals and affective states. The task of inferring causal relationships from observational data has been extensively studied.22 Causal inference methods, however, tend to rely on strong assumptions and are not widely used by practitioners, especially on large, complex data sets.

Transferability
Typically, training and test data are chosen by randomly
partitioning examples from the same distribution. A
model’s generalization error is then judged by the gap
between its performance on training and test data.
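A small sketch of this protocol (scikit-learn assumed; the random data and model choice are placeholders, not from the article):

```python
# Minimal sketch: estimate generalization by the train/test performance gap
# after randomly partitioning examples drawn from the same distribution.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((2000, 10))
y = (X.sum(axis=1) > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
gap = model.score(X_train, y_train) - model.score(X_test, y_test)
print(f"train/test accuracy gap: {gap:.3f}")
```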
Humans exhibit a far richer capacity to generalize,
however, transferring learned skills to unfamiliar
situations. ML algorithms are already used in these
situations, such as when the environment is nonstationary.
Models are also deployed in settings where their use might
alter the environment, invalidating their future predictions.
Along these lines, Caruana et al.3 describe a model trained
to predict probability of death from pneumonia that
assigned less risk to patients if they also had asthma.
Presumably, asthma was predictive of a lower risk of death
because of the more aggressive treatment these patients received.
Informativeness
Sometimes, decision theory is applied to the outputs of
supervised models to take actions in the real world. In
another common use paradigm, however, the supervised
model is used instead to provide information to human
decision-makers, a setting considered by Kim et al.11 and
Huysmans et al.8 While the machine-learning objective
might be to reduce error, the real-world purpose is to
provide useful information. The most obvious way that a
model conveys information is via its outputs, but it may
be possible via some procedure to convey additional
information to the human decision-maker.
An interpretation may prove informative even without
shedding light on a model’s inner workings. For example,
a diagnosis model might provide intuition to a human
decision maker by pointing to similar cases in support
of a diagnostic decision. In some cases, a supervised
learning model is trained when the real task more closely
resembles unsupervised learning. The real goal might be
to explore the underlying structure of the data, and the
labeling objective serves only as weak supervision.
Simulatability
In the strictest sense, a model might be called transparent if a person can contemplate the entire model at once.
Decomposability
A second notion of transparency might be that each part
of the model—input, parameter, and calculation—admits
an intuitive explanation. This accords with the property of
intelligibility as described by Lou et al.15 For example, each
node in a decision tree might correspond to a plain text
description (e.g., all patients with diastolic blood pressure
over 150). Similarly, the parameters of a linear model could
be described as representing strengths of association
between each feature and the label.
Note that this notion of interpretability requires
that inputs themselves be individually interpretable,
disqualifying some models with highly engineered or
anonymous features. While this notion is popular, it
shouldn't be accepted blindly. The weights of a linear model
might seem intuitive, but they can be fragile with respect
to feature selection and preprocessing. For example, the
coefficient corresponding to the association between
flu risk and vaccination might be positive or negative,
depending on whether the feature set includes indicators
of old age, infancy, or immunodeficiency.
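The following synthetic sketch illustrates that fragility; the data-generating process (an old-age confounder that drives both vaccination and flu risk) is invented purely for illustration:

```python
# Minimal sketch: a linear coefficient can flip sign depending on which other
# features are included. Synthetic data; old age confounds both vaccination
# and flu risk.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
old_age = rng.binomial(1, 0.3, n)
vaccinated = rng.binomial(1, 0.2 + 0.6 * old_age)      # older people vaccinate more
flu_risk = 0.5 * old_age - 0.2 * vaccinated + rng.normal(0, 0.1, n)

# Vaccination alone: the coefficient comes out positive (it proxies for age).
coef_alone = LinearRegression().fit(vaccinated.reshape(-1, 1), flu_risk).coef_[0]

# Vaccination plus an age indicator: the coefficient turns negative.
X = np.column_stack([vaccinated, old_age])
coef_adjusted = LinearRegression().fit(X, flu_risk).coef_[0]
print(f"{coef_alone:+.3f} vs. {coef_adjusted:+.3f}")
```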
Algorithmic transparency
A final notion of transparency might apply at the level of
the learning algorithm itself. In the case of linear models,
you may understand the shape of the error surface. You
can prove that training will converge to a unique solution,
even for previously unseen data sets. This might provide
some confidence that the model will behave in an online
setting requiring programmatic retraining on previously unseen data.
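As a small illustration of this kind of transparency (ordinary least squares is used here as a stand-in example; the article does not single out a specific model):

```python
# Minimal sketch: for linear least squares the error surface is convex, and
# with full-rank features the trained weights are the unique minimizer.
# Data are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = rng.standard_normal(200)

# Unique closed-form solution of min_w ||Xw - y||^2 via the normal equations.
w = np.linalg.solve(X.T @ X, X.T @ y)

# The Hessian 2 X^T X is positive semidefinite, so the error surface is convex.
print(w, np.linalg.eigvalsh(2 * X.T @ X).min() >= 0)
```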
Text explanations
Humans often justify decisions verbally. Similarly, one
model might be trained to generate predictions, and
a separate model, such as a recurrent neural network language model, to generate an explanation.
Local explanations
While it may be difficult to describe succinctly the full
mapping learned by a neural network, some of the
literature focuses instead on explaining what a neural
network depends on locally. One popular approach for
deep neural nets is to compute a saliency map. Typically,
this means taking the gradient of the output corresponding to
the correct class with respect to a given input vector. For
images, this gradient can be applied as a mask, highlighting
regions of the input that, if changed, would most influence
the output.25,30
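A minimal sketch of such a saliency map, assuming PyTorch; `model` stands for any differentiable image classifier and `image` for a single input tensor, both placeholders rather than anything specified in the article:

```python
# Minimal sketch: gradient-based saliency for one image and one target class.
# `model` is any torch.nn.Module classifier; `image` has shape (1, C, H, W).
import torch

def saliency_map(model, image, target_class):
    model.eval()
    image = image.detach().clone().requires_grad_(True)   # track input gradients
    scores = model(image)                                  # forward pass: class scores
    scores[0, target_class].backward()                     # d(score) / d(input)
    # Per-pixel gradient magnitude (max over channels): the regions that,
    # if changed, would most influence the output locally.
    return image.grad.abs().max(dim=1).values.squeeze(0)
```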
Note that these explanations of what a model is
focusing on may be misleading. The saliency map is a local
explanation only. Once you move a single pixel, you may get
a very different saliency map. This contrasts with linear
models, which model global relationships between inputs
and outputs.
Another attempt at local explanations is made by
Ribeiro et al.23 In this work, the authors explain the
decisions of any model in a local region near a particular
point by learning a separate sparse linear model to explain
the decisions of the first. Strangely, although the method’s
appeal over saliency maps owes to its ability to provide
explanations for non-differentiable models, it is more
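The gist can be sketched as follows (an illustrative reimplementation of the idea, not the authors' code; `black_box`, the sampling scale, and the Lasso penalty are all assumed placeholders):

```python
# Rough sketch: explain one prediction of an arbitrary model by fitting a
# proximity-weighted sparse linear surrogate in a local region around x.
import numpy as np
from sklearn.linear_model import Lasso

def local_explanation(black_box, x, n_samples=2000, scale=0.1, alpha=0.01):
    rng = np.random.default_rng(0)
    perturbed = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    predictions = black_box(perturbed)               # query the opaque model
    # Weight perturbed samples by proximity to x, then fit a sparse linear model.
    weights = np.exp(-np.sum((perturbed - x) ** 2, axis=1) / (2 * scale ** 2))
    surrogate = Lasso(alpha=alpha).fit(perturbed, predictions, sample_weight=weights)
    return surrogate.coef_                           # local feature attributions

# Toy usage: a nonlinear "model" explained at one particular point.
black_box = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
print(local_explanation(black_box, np.array([0.5, 1.0])))
```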
Explanation by example
One post hoc mechanism for explaining the decisions of
a model might be to report (in addition to predictions)
which other examples are most similar with respect to
the model, a method suggested by Caruana et al.2 Training
a deep neural network or latent variable model for a
discriminative task provides access to not only predictions
but also the learned representations. Then, for any
example, in addition to generating a prediction, you can
use the activations of the hidden layers to identify the
k-nearest neighbors based on the proximity in the space
learned by the model. This sort of explanation by example
has precedent in how humans sometimes justify actions by
analogy. For example, doctors often refer to case studies
to support a planned treatment protocol.
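A minimal sketch of this procedure, assuming PyTorch and scikit-learn; the two-layer encoder, the random data, and the choice of k are placeholders, not details from the article:

```python
# Minimal sketch: explanation by example via nearest neighbors in the space
# of a model's hidden activations. Model and data are random placeholders.
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())   # stand-in hidden layers
classifier = nn.Linear(32, 10)                          # stand-in output layer

train_x = torch.randn(500, 64)                          # training inputs
query_x = torch.randn(1, 64)                            # the example to explain

with torch.no_grad():
    train_h = encoder(train_x)                          # learned representations
    query_h = encoder(query_x)
    prediction = classifier(query_h).argmax(dim=1).item()

# Alongside the prediction, report the k training examples the model "sees"
# as most similar in its learned representation space.
neighbors = NearestNeighbors(n_neighbors=5).fit(train_h.numpy())
_, neighbor_ids = neighbors.kneighbors(query_h.numpy())
print(prediction, neighbor_ids[0])
```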
In the neural network literature, Mikolov et al.19 use
such an approach to examine the learned representations
of words after training the word2vec model. While their
model is trained for discriminative skip-gram prediction,
to examine which relationships the model has learned
they enumerate nearest neighbors of words based on distances in the learned embedding space.
DISCUSSION
The concept of interpretability appears simultaneously
important and slippery. Earlier, this article analyzed both
the motivations for interpretability and some attempts by
the research community to confer it. Now let's consider the
implications of this analysis and offer several takeaways.
Linear models are not strictly more interpretable
than deep neural networks. Despite this claim's enduring
popularity, its truth value depends on which notion of
interpretability is employed. With respect to algorithmic
transparency, this claim seems uncontroversial, but
given high-dimensional or heavily engineered features,
linear models lose simulatability or decomposability,
respectively.
When choosing between linear and deep models,
you must often make a tradeoff between algorithmic
transparency and decomposability. This is because
deep neural networks tend to operate on raw or lightly
processed features. So, if nothing else, the features are
intuitively meaningful, and post hoc reasoning is sensible.
To get comparable performance, however, linear models
often must operate on heavily hand-engineered features.
Lipton et al.13 demonstrate such a case where linear
models can approach the performance of recurrent neural
networks (RNNs) only at the cost of decomposability.
For some kinds of post hoc interpretation, deep neural networks may even hold an advantage, since the rich representations they learn can be probed by the techniques described above.
References
1. Athey, S., Imbens, G. W. 2015. Machine learning methods for estimating heterogeneous causal effects; https://arxiv.org/abs/1504.01132v1 (see also ref. 7).
2. Caruana, R., Kangarloo, H., Dionisio, J. D., Sinha, U.,
Johnson, D. 1999. Case-based explanation of non-case-
based learning methods. In Proceedings of the American
Medical Informatics Association (AMIA) Symposium:
212-215.
3. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M.,
Elhadad, N. 2015. Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day
readmission. In Proceedings of the 21st Annual SIGKDD
International Conference on Knowledge Discovery and
Data Mining, 1721-1730.
4. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., Blei,
D. M. 2009. Reading tea leaves: how humans interpret
topic models. In Proceedings of the 22nd International
Conference on Neural Information Processing Systems
(NIPS), 288-296.