
REVIEWS

Applications of machine learning in drug discovery and development
Jessica Vamathevan1*, Dominic Clark1, Paul Czodrowski2, Ian Dunham3,
Edgardo Ferran1, George Lee4, Bin Li5, Anant Madabhushi6,7, Parantu Shah8,
Michaela Spitzer3 and Shanrong Zhao9
Abstract | Drug discovery and development pipelines are long, complex and depend on
numerous factors. Machine learning (ML) approaches provide a set of tools that can improve
discovery and decision making for well-specified questions with abundant, high-quality data.
Opportunities to apply ML occur in all stages of drug discovery. Examples include target
validation, identification of prognostic biomarkers and analysis of digital pathology data in
clinical trials. Applications have ranged in context and methodology, with some approaches
yielding accurate predictions and insights. The challenges of applying ML lie primarily with
the lack of interpretability and repeatability of ML-generated results, which may limit their
application. In all areas, systematic and comprehensive high-dimensional data still need to be
generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the
factors needed to validate ML approaches, the application of ML can promote data-driven
decision making and has the potential to speed up the process and reduce failure rates in drug
discovery and development.

1 European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
2 Technical University of Dortmund, Dortmund, Germany.
3 Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
4 Bristol-Myers Squibb, Princeton, NJ, USA.
5 Takeda Pharmaceuticals International Co., Cambridge, MA, USA.
6 Case Western Reserve University, Cleveland, OH, USA.
7 Louis Stokes Cleveland Veterans Affair Medical Center, Cleveland, OH, USA.
8 EMD Serono R&D Institute, Billerica, MA, USA.
9 Pfizer Worldwide Research and Development, Cambridge, MA, USA.
*e-mail: [email protected]
https://doi.org/10.1038/s41573-019-0024-5

Biological systems are complex sources of information during development and disease. This information is now being systematically measured and mined at unprecedented levels using a plethora of 'omics' and smart technologies. The advent of these high-throughput approaches to biology and disease presents both challenges and opportunities to the pharmaceutical industry, for which the aim is to identify plausible therapeutic hypotheses from which to develop drugs. However, recent advances in a number of factors have led to increased interest in the use of machine learning (ML) approaches within the pharmaceutical industry. Coupled with infinitely scalable storage, the large increase in the types and sizes of data sets that may provide the basis for ML has enabled pharmaceutical companies to access and organize many more data. Data types can include images, textual information, biometrics and other information from wearables, assay information and high-dimensional omics data [1].

Over the past few years, the field of artificial intelligence (AI) has moved from largely theoretical studies to real-world applications. Much of that explosive growth has to do with the wide availability of new computer hardware such as graphical processing units (GPUs) that make parallel processing faster, especially in numerically intensive computations. More recently, advances in new ML algorithms, such as deep learning (DL) [2], that build powerful models from data and the demonstrable success of these techniques in numerous public contests [3,4] have helped to enormously increase the applications of ML within pharmaceutical companies in the past 2 years.

Although many consumer service industries have been early adopters of newer methods from the field of ML, uptake from the pharmaceutical industry has lagged until recently. It is well known that the success rate for drug development (as defined from phase I clinical trials to drug approvals) is very low across all therapeutic areas and across the global pharmaceutical industry. A recent study on 21,143 compounds found that the overall success rate was as low as 6.2% [5]. Hence, much of the rationale for the use of ML technologies within the pharmaceutical industry is driven by business needs to lower overall attrition and costs.

All stages of drug discovery and development, including clinical trials, have embarked on developing and utilizing ML algorithms and software (Fig. 1) to identify novel targets [6], provide stronger evidence for target–disease associations [7], improve small-molecule compound design and optimization [8], increase understanding of disease mechanisms, increase understanding of disease and non-disease phenotypes [9], develop new biomarkers for prognosis, progression and drug efficacy [1], improve analysis of biometric and other data from

Nature Reviews | Drug Discovery



Target identification and validation
Successful applications in drug discovery
• Target identification and prioritization based on gene–disease associations
• Target druggability predictions
• Identification of alternative targets (splice variants)
Required data characteristics
• Current data are highly heterogeneous: need standardized high-dimensional target–disease–drug association data sets
• Comprehensive omics data from disease and normal states
• High-confidence associations from the literature
• Metadata from successful and failed clinical trials

Compound screening and lead discovery
Successful applications in drug discovery
• Compound design with desirable properties
• Compound synthesis reaction plans
• Ligand-based compound screening
Required data characteristics
• Large amounts of training data needed
• Models for compound reaction space and rules
• Gold standard ADME data
• Numerous protein structures

Preclinical development
Successful applications in drug discovery
• Tissue-specific biomarker identification
• Classification of cancer drug–response signatures
• Prediction of biomarkers of clinical end points
Required data characteristics
• Biomarkers: reproducibility of models based on gene expression data
• Dimension reduction of single-cell data for cell type and biomarker identification
• Proteomic and transcriptomic data of high quality and quantity

Clinical development
Successful applications in drug discovery
• Determination of drug response by cellular phenotyping in oncology
• Precise measurements of the tumour microenvironment in immuno-oncology
Required data characteristics
• Pathology: well-curated expert annotations for broad-use cases (cancer versus normal cells)
• Gold standard data sets to improve interpretability and transparency of models
• Sample size: high number of images per clinical trial

Fig. 1 | Machine learning applications in the drug discovery pipeline and their required data characteristics.
Several successful applications of machine learning in various stages of the drug development pipeline in pharmaceutical
companies have been published. However, within each data domain, there are still challenges related to the standard
of data quality and data quantity needed to capitalize on the full potential of these methods for discovery. ADME,
absorption, distribution, metabolism and excretion.

patient monitoring and wearable devices, enhance digital pathology imaging [10] and extract high-content information from images at all levels of resolution. Consequently, many pharmaceutical companies have begun to invest in resources, technologies and services to generate and curate data sets to support research in this area. Furthermore, technology giants such as IBM and Google, biotechnology start-ups and academic centres are not only providing cloud-based computation services but also working in the pharmaceutical and health-care space with industry partners. This Review provides an overview of current tools and techniques (the toolbox) used in ML, including deep neural nets, and an overview of progress so far in key pharmaceutical application areas.

Graphical processing units (GPUs). Processors designed to accelerate the rendering of graphics and that can handle tens of thousands of operations per cycle.

The machine learning toolbox
Fundamentally, ML is the practice of using algorithms to parse data, learn from it and then make a determination or a prediction about the future state of any new data sets. So rather than hand-coding software routines with a specific set of instructions (pre-determined by the programmer) to accomplish a particular task, the machine is trained using large amounts of data and algorithms that give it the ability to learn how to perform the task. The programmer codes the algorithm used to train the network instead of coding expert rules.

The algorithms adaptively improve their performance as the quantity and quality of data available for learning increase. Hence, ML is best applied to solve problems for which a large amount of data and several variables are at hand but a model or formula relating these is not known.

There are two main types of technique that are used to apply ML: supervised and unsupervised learning. Supervised learning methods are used to develop training models to predict future values of data categories or continuous variables, whereas unsupervised methods are used for exploratory purposes to develop models that enable clustering of the data in a way that is not specified by the user. Supervised learning trains a model on known input and output data relationships so that it can predict future outputs for new inputs. Future outputs are typically models or results for data classification or an understanding of the most influential variables (regression). The unsupervised learning technique identifies hidden patterns or intrinsic structures in the input data and uses these to cluster data in meaningful ways.

Model selection concepts. The aim of a good ML model is to generalize well from the training data to the test data at hand. Generalization refers to how well the concepts learned by the model apply to data not seen by the model during training. Within each technique, several methods exist (Fig. 2), which vary in their prediction accuracy, training speed and the number of variables they can handle. Algorithms must be chosen carefully to ensure that they are suitable for the problem at hand and the amount and type of data available. The amount of parameter tuning needed and how well the method separates signal from noise are also important considerations.

Model overfitting happens when the model learns not only the signal but also some of the unusual features of the training data and incorporates these into the model, with a resulting negative impact on the performance of the model on new data. Underfitting refers to a model that can neither model the training data nor generalize to new data. Typical ways to limit overfitting are to apply resampling methods or to hold back part of the training data to use as a validation data set. Regularization regression methods (such as Ridge, LASSO or elastic nets) add penalties to parameters as model complexity increases so that the model is forced to generalize the data and not

www.nature.com/nrd

Supervised learning techniques
• Regression analysis methods: linear regression; elastic net regression (e.g. LASSO and Ridge regularization); general linear model; sparse linear regression; partial least squares regression; principal component regression; SVR; Gaussian process regression; ensemble methods (such as random forests)
• Classifier methods: discriminant analysis; SVMs; NLP kernel methods; nearest neighbour; ensemble methods (gradient boosting); Bayesian classifier
• Decision trees; neural networks (DNNs, CNNs and RNNs)

Unsupervised learning techniques
• Clustering methods: K-means; hierarchical clustering; Gaussian mixture; neural networks (Kohonen maps, autoencoders and DAENs); NLP; hidden Markov model; GANs

Drug discovery applications
• Compound bioactivity and assay readouts from virtual drug–target screens [14]
• Target druggability based on PK properties and protein structure or sequence [31–34]
• Novel therapeutic targets from target–gene associations [7]
• De novo molecule design [45,46]
• Target–disease–drug associations from literature [19,20]
• Feature reduction in single-cell data to identify cell types [75]
• Cell types and biomarkers from single-cell RNA data [76]
• Cancer-related genes from RNAi screen [9]
• Disease and target druggability from multi-dimensional data [17]
• Novel targets and therapeutic resistance from disease-specific splice variants [21,22,24]
• Targets for Huntington disease [18]
• Drug sensitivity prediction [56,65]
• Tissue-specific biomarkers from gene expression signatures [1]
• Deep feature selection for biomarkers [79–81]
• Low-dose CT image analysis [104]
• Chemical–genetic associations [29]
• Quantitative structure–activity relationships [41]
• ADME properties in targets and planning chemical synthesis [40]
• Gene expression signatures that predict clinical trial success [38]
• Biomarkers of clinical end points from continuous variable data [61,62]
• Polygenic risk scores for complex traits [73]
• Molecular features that predict cancer drug response [31]
• Ligand-based virtual screening [53]
• Phenotyping of cellular images [9]
• Accelerated MRI data acquisition [103]
• Image-based diagnosis [95–98]

Fig. 2 | Machine learning tools and their drug discovery applications. This figure gives an overview of the machine
learning techniques that have been used to answer the drug discovery questions covered in this Review. A range of
supervised learning techniques (regression and classifier methods) are used to answer questions that require prediction
of data categories or continuous variables, whereas unsupervised techniques are used to develop models that enable
clustering of the data. ADME, absorption, distribution, metabolism and excretion; CNN, convolutional neural
network; CT, computed tomography; DAEN, deep autoencoder neural network; DNN, deep neural network; GAN,
generative adversarial network; MRI, magnetic resonance imaging; NLP, natural language processing; PK,
pharmacokinetic; RNAi, RNA interference; RNN, recurrent neural network; SVM, support vector machine; SVR,
support vector regression.
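The distinction drawn in the Fig. 2 caption between supervised prediction and unsupervised clustering can be illustrated with a minimal sketch. It is not taken from any of the studies cited in this Review: the two-feature data are synthetic, the class names "active" and "inactive" are hypothetical, and a nearest-centroid classifier and a two-cluster K-means with a deterministic farthest-point start stand in for the much richer methods listed in the figure. Only the Python standard library is assumed.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def fit_centroids(points, labels):
    """Supervised: use the known labels to compute one centroid per class."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: (sum(x for x, _ in members) / len(members),
                sum(y for _, y in members) / len(members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Predict the class of a new point as the class of the nearest centroid."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], point))


def k_means_two_groups(points, iterations=10):
    """Unsupervised: split points into two clusters; no labels are used."""
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(first, p))
    centres = [first, farthest]  # simple deterministic farthest-point start
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        assignment = [
            min(range(2), key=lambda i: squared_distance(centres[i], p)) for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for i in range(2):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centres[i] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assignment


# Synthetic, well-separated measurements (hypothetical data, for illustration only).
rng = random.Random(42)
group_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(30)]
group_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(30)]
points = group_a + group_b
labels = ["inactive"] * 30 + ["active"] * 30

centroids = fit_centroids(points, labels)         # learns from labelled examples
new_point_class = predict(centroids, (4.8, 5.1))  # prediction for an unseen input
cluster_ids = k_means_two_groups(points)          # structure found without labels
```

The supervised half needs the `labels` list to learn anything, whereas the clustering half recovers the same two groups from the coordinates alone, but returns anonymous cluster indices whose meaning still has to be interpreted by the user.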
overfit. One of the most effective ways to avoid overfitting is the dropout method [11], which randomly removes units in the hidden layer. Different ML techniques have different performance metrics. Basic evaluation metrics [12] such as classification accuracy, kappa [13], area under the curve (AUC), logarithmic loss, the F1 score and the confusion matrix can be used to compare performance across methods. The availability of gold standard data sets as well as independently generated data sets can be invaluable in generating well-performing models.

Several software libraries are now available for high-performance mathematical computation across a variety of hardware platforms (central processing units (CPUs), GPUs and tensor processing units (TPUs)), and from desktops to clusters of servers. Commonly used ML programmatic frameworks are the open-source framework TensorFlow, originally developed by researchers and engineers from the Google Brain team within Google's AI organization (see Related links), as well as PyTorch, Keras and Scikit-learn.

Central processing units (CPUs). Processors designed to solve every computational problem in a general fashion and that can handle tens of operations per cycle. The cache and memory are designed to be optimal for any general programming problem.

Tensor processing units (TPUs). Co-processors manufactured by Google that are designed to accelerate deep learning tasks developed using TensorFlow (a programming framework) and can handle up to 128,000 operations per cycle.

Deep neural network architectures. DL is a modern reincarnation of artificial neural networks from the 1980s and 1990s and uses sophisticated, multi-level deep neural networks (DNNs) to create systems that can perform feature detection from massive amounts of
unlabelled or labelled training data [2]. The major difference between DL and traditional artificial neural networks is the scale and complexity of the networks used. In neural networks, input features are fed to an input layer, and after a number of nonlinear transformations using hidden layers, the predictions are generated by an output layer. The network is typically trained by backpropagating errors to progressively reduce the difference between the obtained and the expected values of the output. Each output node corresponds to a task (or class) to be predicted. If there is only one node in the output layer, then the corresponding network is referred to as a single-task neural network. DL can have a large number of hidden layers because it uses more powerful CPU and GPU hardware, whereas traditional neural networks normally use one or two hidden layers because of hardware limitations. There are also many algorithmic improvements in DL.

The applications of DNNs in drug discovery have been numerous and include bioactivity prediction [14], de novo molecular design, synthesis prediction and biological image analysis [3]. One advantage of DNNs is that they have several different flexible architectures, described below, and are thus used to answer a variety of questions. In the first architecture, deep convolutional neural networks (CNNs), some of the hidden layers are only locally (rather than globally) connected to the next hidden layer. CNNs achieve the best predictive performance in areas such as speech and image recognition by hierarchically composing simple local features into complex models. Graph convolutional networks are a special type of CNN that can be applied to structured data in the form of graphs or networks. The second architecture is the recurrent neural network (RNN), which takes the form of a chain of repeating modules of neural networks in which connections between nodes form a directed graph along a sequence. This allows for the analysis of dynamic changes over time where persistent information is needed. Long short-term memory neural networks are a special kind of RNN that are capable of learning long-term dependencies. The third architecture, the fully connected feedforward network, is one in which every input neuron is connected to every neuron in the next layer. This is the opposite of an RNN in that, with fully connected feedforward networks, the gradient is clearly defined and computable through backpropagation. These models have been used in challenging predictive model building cases, such as with gene expression data, in which the number of samples is small relative to the number of features. The fourth network architecture is the deep autoencoder neural network (DAEN). This type of neural network is an unsupervised learning algorithm that applies backpropagation to project its input to its output with the purpose of dimension reduction [15], thus trying to preserve the important random variables of the data while removing the non-essential parts. The fifth and final network architecture, the generative adversarial network (GAN), consists of any two networks (although often a combination of feedforward neural networks and CNNs), where one is tasked to generate content and the other to classify that content.

Data characteristics. The practice of ML is said to consist of at least 80% data processing and cleaning and 20% algorithm application. The predictive power of any ML approach is therefore dependent on the availability of high volumes of data of high quality. Data used for training need to be accurate, curated and as complete as possible in order to maximize predictability. Experimental design often involves discussions on the ideal sample size and the appropriate power calculations for correctly estimating this parameter. Whether the correct type of data is even available and what data should be experimentally generated are also key considerations for certain questions. ML applications are more powerful when used on data that have been generated in a systematic manner, with minimal noise and good annotation. As we discuss below, many applications are not particularly effective because data are combined from multiple sources with variable data quality. There are ongoing efforts to develop open annotated data in specific areas of drug discovery, such as target validation [16]. These aim to generate good quality positive and negative annotations in areas that are important to drug discovery and development to foster application of ML.

Applications in drug discovery
Target identification and validation. The pre-eminent approach in drug discovery is to develop drugs (small molecules, peptides, antibodies or newer modalities including short RNAs or cell therapies) that will alter the disease state by modulating the activity of a molecular target. Notwithstanding a recent resurgence in phenotypic screens, initiating a drug development programme requires identification of a target with a plausible therapeutic hypothesis: that modulation of the target will result in modulation of the disease state. Selecting this target on the basis of the available evidence is referred to as target identification and prioritization. Having made this preliminary choice, the next step is to validate the role of the chosen target in disease using physiologically relevant ex vivo and in vivo models (target validation). Although the ultimate validation of the target will only come later, through clinical trials, early target validation is crucial to focus efforts on potentially successful projects.

Modern biology is increasingly rich in data. This includes human genetic information in large populations, transcriptomic, proteomic and metabolomic profiling of healthy individuals and those with specific diseases and high-content imaging of clinical material. The ability to capture these large data sets and to re-use them via public databases presents new opportunities for early target identification and validation. However, these multi-dimensional data sets require appropriate analytical methods to yield statistically valid models that can make predictions for target identification, and this is where ML can be exploited. The range of experiments that can contribute to target identification and validation is wide, but if these experiments are data-driven, ML is increasingly being applied.

The first step in target identification is establishing a causal association between the target and the disease. Establishing causality requires demonstration that
modulation of a target affects disease from either naturally occurring (genetic) variation or carefully designed experimental intervention. However, ML can be used to analyse large data sets with information on the function of a putative target to make predictions about potential causality, driven, for instance, by the properties of known true targets. ML methods have been applied in this way across several aspects of the target identification field. Costa et al. [17] built a decision tree-based meta-classifier trained on network topology of protein–protein, metabolic and transcriptional interactions, as well as tissue expression and subcellular localization, to predict genes associated with morbidity that are also druggable. By inspecting the decision tree, they identified regulation by multiple transcription factors (TFs), centrality in metabolic pathways and extracellular location as key parameters. In other studies, ML models have focused on specific diseases or therapeutic areas. Jeon et al. [6] built a support vector machine (SVM) classifier using various genomic data sets to classify proteins into drug targets and non-drug targets for breast, pancreatic and ovarian cancers. Key classification features were gene essentiality, mRNA expression, DNA copy number, mutation occurrence and protein–protein interaction network topology. In all, 122 global cancer targets were identified, 69 of which overlap with 116 known cancer targets. In addition, 266, 462 and 355 targets were identified as specific to breast, pancreatic and ovarian cancers, respectively. Two predicted targets were validated with peptide inhibitors that had strong anti-proliferative effects in cell culture models. Further, inhibitors for 137 predicted pancreatic cancer targets were almost twice as likely to show strong inhibition of cell viability as other compounds. Ament et al. [18] built a model based on mouse TF binding sites and transcriptome profiling data to characterize transcriptional changes underlying Huntington disease. They reconstructed a genome-scale model of target genes for 718 TFs in the mouse striatum using a regression model and LASSO regularization. Overall, 13 of 48 identified TF modules were differentially expressed in striatal tissue in human disease and provided potential starting points for Huntington disease therapies. Molecular targets for tissue-specific anti-ageing therapies have been identified by Mamoshina et al. [1]. They compared gene expression signatures from young and old muscle. The comparison of several supervised ML methods revealed SVMs with linear kernel and deep feature selection to be best suited to the identification of ageing biomarkers. In each of these examples, ML generated a set of predictions of targets that have properties that suggest they are likely to bind drugs, or be involved in disease, but further validation is essential to generate a therapeutic hypothesis.

Support vector machine (SVM) classifier. A method that performs classification tasks by constructing separating lines to distinguish between objects with different class memberships in a multi-dimensional space.

The literature is the primary source of knowledge on target association with disease. Automated processing of the literature unlocks information from unstructured text that would otherwise be inaccessible. Recent advances in natural language processing (NLP), an ML approach applied to text mining, have enabled more effective data mining to identify relevant papers. BeFree [19] applies NLP kernel methods to identify drug–disease, gene–disease and target–drug associations in Medline abstracts. This supervised learning approach relies on the manually annotated European Union adverse drug reactions (EU-ADR) database corpus of relationships and a semi-automatically annotated corpus based on the Genetic Association Database. DigSee [20] identifies genes and diseases in Medline abstracts, uses NLP to extract biological events between these entities and ranks the evidence sentences with a Bayesian classifier.

One area with great scope for ML is in understanding basic aspects of biology to identify therapeutic opportunities through alternate modalities or novel targets. Understanding genetic variation in splicing signals is one example. DL splicing models are now able to accurately predict alternate splicing signals [21]. The latest integrative splicing models [22] combine CLIP–seq assay data of splicing factor binding in vivo with RNA sequencing experiments in which these splicing factors have been knocked down or overexpressed. Combining splicing code models with predictions of de novo and complex splicing variations has allowed identification of splicing variants specific to Alzheimer disease [23]. Recent applications of similar approaches identified an escape mechanism from CART-19 immunotherapy [24], rare genetic variants leading to deafness [25] and splicing variants associated with autism [26].

CLIP–seq. Ultraviolet crosslinking immunoprecipitation (CLIP) followed by RNA sequencing to identify all RNA species bound by a protein of interest. This method can be used to map RNA protein binding sites or RNA modification sites on a genome-wide scale.

ML can also predict cancer-specific drug effects. Iorio et al. [27] screened 990 cancer cell lines against 265 anticancer drugs and investigated how genome-wide gene expression, DNA methylation, gene copy number and somatic mutation data affect drug response. They used ANOVA, logic models and ML algorithms (elastic net regression and random forests) to identify molecular features that predict drug response. The most predictive data type across cancer types was gene expression, whereas the most predictive cancer-specific models included genomic features (driver mutations or copy number alterations) and were even better if they included DNA methylation data. Tsherniak et al. [28] used data from RNA interference (RNAi) screens of 501 cancer cell lines to find molecular markers that predict cancer dependencies for 769 genes. They developed a nonlinear regression model based on conditional inference trees to generate predictive models based on gene expression, gene copy number and somatic gene mutations. McMillan et al. [29] screened 222 chemicals against >100 heavily annotated cell models of diverse and characteristic somatic lung cancer lesions. They applied regularized ML (elastic net) and probability-based metrics (scanning Kolmogorov–Smirnov) to identify 171 chemical–genetic associations that revealed targetable mechanistic vulnerabilities in a range of oncotypes without effective therapies. These approaches suggest that there are opportunities for tumour-intrinsic precision medicine.

Another important question for drug developers is how likely it is that a drug can be made for any given target. For small-molecule drugs, this entails identifying targets that have features that suggest these proteins can bind small molecules [30]. Different target attributes can be used to generate these druggability models. Nayal and Honig [31] trained a random forest classifier on physicochemical, structural and geometric attributes of 99 drug-binding
and 1,187 non-drug-binding cavities from a set of 99 proteins. Size and shape of the surface cavities were the most important features. Several studies derived various physicochemical properties from protein sequences of known drug and non-drug targets and applied SVMs32,33 or biased SVMs with stacked autoencoders, a DL model34, to predict druggable targets. Druggable proteins have also been found to occupy specific regions of protein–protein interaction networks and tend to be highly connected6,17,35. Again, these examples of ML approaches generated sets of targets that are predicted as likely to bind drugs, hence reducing the potential search space, but these targets require further validation.

The holy grail for target identification or validation is the early prediction of future clinical trial success for a target-based drug discovery programme. Various non-ML analyses point to possible predictors of success5,36,37. Using ML, Rouillard et al.38 assessed omics data for a set of 332 targets that succeeded or failed phase III clinical trials by multivariate feature selection. They found gene expression data were particularly predictive of successful targets, characterized by low mean RNA expression and high variance across tissues. This study confirmed previous findings that ideal targets exhibit disease-specific expression in affected tissues39. Ferrero et al.7 trained a range of ML classifiers using target–disease associations from the open targets platform16 to predict de novo potential therapeutic targets. Assessment of feature importance identified the existence of an animal model, gene expression and genetic data as key data types for therapeutic target prediction independent of the indication. However, this approach is limited by the sparse nature of the data and the lack of information about reasons for failure of initiated programmes. More fundamentally, owing to the length of time between initiating a successful drug discovery programme and bringing the drug to market, successful programmes reflect earlier paradigms for drug development. The drivers of successful small-molecule programmes are unlikely to be the same today, as newer modalities, such as biologics (including antibodies), are available. The increasing focus on precision medicine introduces additional constraints. It is essential for future prediction approaches that extensive data on successful and failed drug discovery programmes are available with metadata in the public domain.

Small-molecule design and optimization. The discovery of drug candidates that can block or activate the target protein of interest involves extensive virtual and experimental high-throughput screening of large compound libraries. Candidate structures are then further refined and modified to improve target specificity and selectivity, along with optimized pharmacodynamic, pharmacokinetic and toxicological properties. Importantly, though, the lack of sufficient high-quality data for new chemistry such as proteolysis-targeting chimeras (PROTACs) and macrocycles can limit the impact of ML on such chemistry.

Much work has been done to apply DL methods, such as multi-task neural networks, to ligand-based virtual screening. Given a lead compound, compounds that have a similar chemical structure can be identified computationally. This has typically been performed using classic statistical methods, but multi-task DNNs are proving to be more effective40. DNNs can significantly boost predictive power when inferring the properties and activities of small molecules41. The one-shot learning technique can be used to substantially reduce the amount of data required to make meaningful predictions about the readout of a molecule in a new experimental setup. Combining ML with Markov state models, this technique was used to identify the previously unknown mechanism of opiate binding to the µ-opioid receptor, revealing an allosteric site that is involved in its activation42. The benefits of multi-task models over single-task models are, however, highly data set-dependent. To help benchmark ML algorithms, Pande et al. compiled a large benchmarking data set, MoleculeNet43, which has been used for the comparison of different ML algorithms. MoleculeNet contains data on the properties of over 700,000 compounds. All data sets have been curated and integrated into the open-source DeepChem package (see Related links), which also includes other tools.

DNNs and modern tree search algorithms can also be used to plan efficient routes of chemical synthesis. To plan the synthesis of a target molecule, the molecule is formally decomposed using reversed reactions (retrosynthesis). This procedure results in a sequence of reactions that can then be executed in the laboratory in the forward direction to synthesize the target. A major challenge is to systematically apply synthetic chemistry knowledge to this process. The manual incorporation of transformation rules is prohibitive as the knowledge of chemistry grows exponentially, and the scope and limitations of many reactions are not completely understood. To automatically extract the rules, Segler et al.44 used the Reaxys database (~11 million reactions and ~300,000 rules) and performed a Monte Carlo tree search (MCTS) to score the tree nodes in conjunction with DNNs to steer the search in the most promising directions. In quantitative analyses, this method outperforms the gold standard, best first search, with two different implementations (heuristic method and neural). Furthermore, MCTS is 30 times faster than traditional computer-aided search methods for almost two-thirds of the molecules examined. Qualitative tests were also performed in a double-blind study. Organic chemists were asked to choose between literature-based and predicted synthesis routes without knowing how the route was obtained. Here, for the first time, chemists considered the quality of the predicted routes to be, on average, as good as routes taken from the literature.

Another valuable application of DL is molecular de novo design through reinforcement learning. Researchers at AstraZeneca45 made use of RNNs for expansion of the chemical space by tuning a sequence-based generative model to design compounds with almost optimal values for solubility, pharmacokinetic properties, bioactivity and other parameters. Kadurin et al.46 also developed similar models using deep GANs to perform molecular feature extraction on very large data sets. However, it must be noted that reinforcement learning might not help in identifying new and unprecedented synthetic routes47.

Heuristic method
A function that calculates the approximate cost of a problem (or ranks alternatives).
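
The ligand-based virtual screening step described in this section (finding library compounds that are structurally similar to a lead compound) can be illustrated with a small sketch. The example below is not taken from any of the cited studies: it assumes each compound has already been reduced to a set of "on" fingerprint bit positions, uses the widely used Tanimoto coefficient as the similarity measure, and the compound names and bit sets are hypothetical toy data.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(lead, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(lead, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data: each integer is the index of a bit that is set.
lead = frozenset({1, 4, 7, 9})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9, 12}),  # shares 4 of 5 distinct bits
    "cmpd_B": frozenset({1, 4, 20, 21}),    # shares 2 of 6 distinct bits
    "cmpd_C": frozenset({2, 3, 5}),         # shares no bits with the lead
}
hits = screen(lead, library, threshold=0.5)
```

With this toy library only `cmpd_A` passes the 0.5 threshold. In practice the bit sets would come from a cheminformatics toolkit, and the threshold is a tunable assumption rather than a fixed rule.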


Chemical fingerprint
A concept used in chemical informatics to compare molecules with each other. The structure of a molecule is encoded in a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule.

Simplified molecular input line entry system (SMILES)
A line notation for entering and representing molecules and reactions; for example, carbon dioxide is represented as O=C=O.

Community problem-solving competitions can be useful to advance method development in a particular area. Researchers at Merck Sharp & Dohme sponsored a Kaggle competition for the prediction of other relevant absorption, distribution, metabolism and excretion (ADME) parameters as well as some biochemical targets. The winning team used DNNs, which, in 13 out of 15 assay systems, performed slightly better than a standard random forest41. Some of their key learnings were that the optimization of the hyperparameters can improve DNNs, feature selection is not necessary, multi-task models perform better than single-task models and overfitting can be prevented by using dropout. Ramsundar et al.40 also observed that multi-task DNNs perform better than single-task DNNs. A comparison between single-task and multi-task DNNs and a comparison between different ML methods (random forest, SVM, naive Bayes and logistic regression) were pursued by Lenselink et al.48 using one standardized data set obtained from ChEMBL49. Here, the DNN model performed best, and a multi-task DNN was also found to be better than a single-task DNN. Multi-task DNNs have also been shown to be better for predictions of lead optimization and lead identification, as they can synthesize information from many distinct biological sources50 owing to the presence of multiple nodes in the output layer.

Feature selection before model building can improve ML models, as shown in a study by Kramer and Gütlein51. They were also able to detect improvements in random forest models against other ML methods such as SVMs and naive Bayes, with faster performance and fewer features used while training models. In their view, one major benefit from filtering out chemical fingerprint bits is the improvement in model interpretability. If the fingerprint is not filtered, the interpretability is hindered owing to an effect called 'bit collisions'. The crucial impact of filtering fingerprints was also independently shown by Landrum et al.8.

Hochreiter et al.52 also found that DNN-based models significantly outperformed all competing methods and that the predictive performance of DL, using a data set of all ChEMBL assays and target prediction based on a simplified molecular input line entry system (SMILES) input, is in many cases comparable to that of tests performed in wet laboratories. The Hochreiter group also showed that DNNs outperformed all other ML methods (k-nearest neighbour, naive Bayes, random forest and SVMs) and statistics-based methods (similarity ensemble approach53) for target prediction54. The same group won the majority of the challenges in the Tox21 Data Challenge 2014 (ref.55).

An unresolved challenge in the field of small-molecule design is how to best represent the chemical structure. A plethora of representations exist, from simple circular fingerprints such as the extended-connectivity fingerprint (ECFP) to sophisticated symmetry functions (Fig. 3). It is still not clear which structure representation works best for which small-molecule design problem. Therefore, it will be interesting to see if the rise in ML studies in the field of cheminformatics will give more guidance about the best choice for structure representation.

Predictive biomarkers. ML-based biomarker discovery and drug sensitivity predictive models are demonstrated approaches to help improve clinical success rates, to better understand the mechanism of action of a drug and to identify the right drug for the right patients56–58. Late-stage clinical trials take many years and millions of dollars to conduct, so it will be most beneficial to build, validate and apply predictive models earlier, using preclinical and/or early-stage clinical trial data. A translational biomarker can be predicted using ML approaches on preclinical data sets. After being validated using independent data sets (either preclinical or clinical), the model and its corresponding biomarker can be applied to stratify patients, identify potential indications and suggest the mechanisms of action of a drug (Fig. 4).

Although there are thousands of papers on biomarkers and predictive models in the literature, few of them have been used in clinical trials. Various factors contribute to this gap, including data quality, model selection, access to data and software, model reproducibility and the design of assays suitable for a clinical setting. To address some of the model-related issues, several community efforts have evaluated ML approaches to develop both classification and regression models. Several years ago, the US Food and Drug Administration (FDA) organized the MicroArray Quality Control II (MAQC II) initiative to evaluate various ML methods for predicting clinical end points from baseline gene expression data59. In the project, 36 independent teams analysed 6 microarray data sets to generate predictive models to classify a sample with 1 of 13 clinical end points. General observations included the importance of the data quality control processes, the need for skilled scientists (some teams perform consistently better than other teams using the same ML methods) and the importance of selecting appropriate modelling approaches for clinical end points. For instance, a poor prediction of overall survival for patients with multiple myeloma could be partly due to applying an arbitrary survival cut-off of 24 months. Both gene expression and overall survival in multiple myeloma are continuous variables, and therefore, a regression-based prediction model is appropriate. Indeed, using a univariate Cox regression approach, a gene expression signature that significantly predicts a high-risk subgroup of patients was identified60. This signature was confirmed in several independent studies and from different regression-based approaches61–64, highlighting the advantage of a regression approach without predefined class membership.

The National Cancer Institute (NCI)-DREAM challenge was another community effort to evaluate regression methods for building drug sensitivity predictive models (defined as regression questions)65. Each participating team used their best modelling approaches and optimized their parameter sets on the same training data sets (35 breast cancer cell lines treated with 31 drugs) then tested the performance of their models on the same blinded testing data sets (18 breast cancer cell lines treated with the same 31 drugs). Six types of baseline profiling data were available for generating predictive models — RNA microarray, single nucleotide polymorphism (SNP) array, RNA sequencing, reverse phase


protein array, exome sequencing and DNA methylation status — to which 44 participating teams applied various regression approaches such as kernel method, nonlinear regression (regression trees), sparse linear regression, partial least squares regression, principal component regression or ensemble methods. Consistent with the MAQC II results, some teams consistently outperformed other teams using the same approaches. The differential performance was likely reflective of the technical details used for quality control, data reduction, feature selection, splitting strategy and fine-tuning ML parameters, as well as potential incorporation of biological knowledge such as gene function information or clinical data into the construction of the predictive models. In addition, some drugs were easier to build predictive models for than others for all teams and methods. The NCI-DREAM challenge data sets and results continue to be used as validation data sets for method development and evaluation, for example, on new random forest ensemble frameworks66, group factor analyses67 and other approaches68,69.

Several successful case studies have now been published in which ML-generated predictive models and their corresponding biomarkers have played a critical role in drug discovery and development. Li et al.56 conducted a case study using standard-of-care drugs in which they first built models for drug sensitivity to erlotinib and sorafenib (one model for each drug) using cancer cell line screen data. They then applied the models to stratify patients from the BATTLE clinical trial70, who were treated with one of the two drugs, and demonstrated that the models were predictive and drug-specific. The model-derived biomarker genes were shown to be reflective of the mechanism of

[Figure 3 panels: ECFP; Coulomb matrix; Grid featurizer; Symmetry function (radial symmetry function computed from Cartesian coordinates; axis: radial distance per Å; atoms C1–C6 and N7); Graph convolution; Weave.]

Fig. 3 | The challenges of compound structure representation in machine learning models. Chemical structures and
their features can be represented in many ways, and the most suitable representation depends on the required
application. Extended-​connectivity fingerprints (ECFPs) contain information about topological characteristics of the
molecule, which enables this information to be applied to tasks such as similarity searching and activity prediction.
A Coulomb matrix encodes information about the nuclear charges of a molecule and their coordinates. The grid featurizer
method incorporates structural features of both the ligand and the target protein as well as the intermolecular forces that
contribute to binding affinity. The symmetry function is another common encoding of atomic coordinate information, which
focuses on the distances between atom pairs and on the angles formed within triplets of atoms. The graph convolution
method computes an initial feature vector and a neighbour list for each atom that summarizes the local chemical
environment of an atom, including atom types, hybridization types and valence structures. Weave featurization calculates
a feature vector for each pair of atoms in the molecule, including bond properties (if directly connected), graph distance
and ring info, forming a feature matrix. Reproduced by permission of the Royal Society of Chemistry, Wu, Z. et al.
MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018), ref.43.
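
To make one of these representations concrete, the following sketch builds a Coulomb matrix from nuclear charges and coordinates. It uses the standard definition (diagonal entries 0.5·Z_i^2.4, off-diagonal entries Z_i·Z_j / |R_i − R_j|); it is an illustrative stand-alone implementation, not the MoleculeNet or DeepChem code, and the two-atom toy molecule and its coordinates (assumed to be in consistent length units) are hypothetical.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the Coulomb matrix for nuclear charges Z_i at 3D positions R_i.

    Diagonal:     0.5 * Z_i ** 2.4
    Off-diagonal: Z_i * Z_j / |R_i - R_j|
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical toy molecule: two atoms with Z = 1 placed 2.0 length units apart.
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])
```

The resulting matrix is symmetric by construction. Because its rows and columns depend on atom ordering, a sorting or padding step is usually applied before the matrix is passed to a model; that step is omitted here.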


[Figure 4 schematic. Two stages: Drug discovery (preclinical) and Drug development (clinical trials). Inputs: molecular profiling, imaging, IHC, etc.; disease category, drug response, etc. Steps: machine learning (SVM, EN, RF, etc.) to build drug sensitivity predictive models and identify biomarkers; drug sensitivity predictive model and corresponding biomarker validated by independent testing data set(s) and preclinical or early-stage clinical trials; apply the model to patients and globally normalized internal or external data; patient stratification, MOA and disease indication selection. Inset plot: progression-free survival (proportion, 0.0–1.0) against months from start of therapy (0–12).]

Fig. 4 | Utilizing predictive biomarkers to support drug discovery and development. A drug sensitivity predictive
model (yellow box) can be generated using machine learning approaches on preclinical data. The model could then be
tested using data from early-​stage clinical patient samples. Once validated, the model could be used for patient
stratification and/or disease indication selection to support the clinical development of a drug, as well as to infer its
mechanism of action. EN, elastic net; IHC, immunohistochemistry; MOA, mechanism of action; RF, random forest;
SVM, support vector machine.
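
The build, validate and apply sequence in Fig. 4 can be sketched in a few lines. This is a deliberately simple stand-in, not the method of any cited study: a nearest-centroid classifier replaces the SVM, elastic net or random forest models named in the figure, and the expression values, sample labels and patient identifiers are hypothetical.

```python
from statistics import mean


def fit(profiles, labels):
    """Nearest-centroid model: one mean expression profile per response class."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict(model, profile):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, profile))
    return min(model, key=lambda label: sq_dist(model[label]))


# Step 1: build the model on preclinical data (hypothetical cell lines, two biomarker genes).
train_x = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [0.5, 5.0]]
train_y = ["sensitive", "sensitive", "resistant", "resistant"]
model = fit(train_x, train_y)

# Step 2: validate on an independent data set before any clinical use.
valid_x = [[5.5, 0.5], [1.0, 4.5]]
valid_y = ["sensitive", "resistant"]
correct = sum(predict(model, x) == y for x, y in zip(valid_x, valid_y))
accuracy = correct / len(valid_y)

# Step 3: apply the validated model to stratify (hypothetical) patients.
patients = {"pt_01": [4.8, 1.2], "pt_02": [0.8, 4.1]}
strata = {pid: predict(model, profile) for pid, profile in patients.items()}
```

The sketch assumes the patient profiles are on the same scale as the training data; the figure's "globally normalized" step would have to be carried out before the model is applied.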

action of each drug, and when combined with globally normalized public domain data from various cancer types, the model predicted sensitivities of cancer types to each drug that were consistent with their FDA-approved indications. This study shows that using ML approaches to identify key features that contribute to drug sensitivity across various cancer types in a tissue-agnostic manner could be useful for drug development (in comparison with cancer type-based clinical trials followed by label expansions). In 2017, the FDA approved the programmed cell death 1 (PD1) inhibitor pembrolizumab for cancers with a specific genetic biomarker. This is the first FDA approval based on a cross-indication genetic biomarker rather than a cancer type71, highlighting the need for more mechanism-based biomarker discovery.

Recently, there has been much progress on ML-based predictive biomarkers in indications other than oncology using various types of input data. Tasaki et al.72 applied ML approaches to multi-omics data to better understand drug responses for patients with rheumatoid arthritis. Pare et al.73 developed a novel ML framework based on gradient boosted regression trees to build polygenic risk scores for predicting complex traits. Tested on the UK Biobank data set, their SNP-based models were able to explain 46.9% and 32.7% of overall polygenic variance for height and BMI, respectively. In addition, Khera et al.74 developed genome-wide polygenic scores to identify individuals at high risk of coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease and breast cancer.

The rapid evolution of single-cell RNA sequencing technologies has been used for gene clustering and cell-specific biomarker discovery. Single-cell RNA sequencing techniques have been used to identify novel cell types, distinguish cell states, trace development lineages and integrate expression profiles with spatial resolution of cells. However, an unsolved challenge is the reduction in the gene expression measurements from tens of thousands of cells to low-dimension space, typically two or three variables. Ding et al.75 developed a probabilistic generative model, scvis, to reduce the high-dimensional space to the low-dimensional structures in single-cell gene expression data with uncertainty estimates. This tool was then used to analyse four single-cell RNA sequencing data sets and produced 2D representations of the multi-dimensional single-cell RNA sequencing data that could be interpreted to robustly identify cell types. In addition, Rashid et al.76 have used variational autoencoders (VAEs) to transform single-cell RNA sequencing data to a latent encoded feature space that more efficiently differentiates between the hidden tumour subpopulations. Analysis of the encoded feature space revealed subpopulations of cells and the evolutionary relationship between them. The method was completely unsupervised and required minimal pre-processing of the data. Additionally, the method is tolerant of gene expression dropout in single-cell RNA sequencing data sets. Wang and Gu77 proposed deep variational autoencoder for single-cell RNA sequencing data (VASC), a deep multi-layer generative model, for the unsupervised dimension reduction and visualization


of this data. Tested on 20 data sets, VASC is superior and has broader data set compatibility than several state-of-the-art dimension-reduction methods such as ZIFA78 and SIMLR79.

One exciting recent development in ML is the rapid rise of feature selection for biomarker discovery. For example, researchers applied unsupervised DL models to extract meaningful representations of gene modules or sample clusters80. Way and Greene81 introduced a VAE model trained on The Cancer Genome Atlas (TCGA) pan-cancer RNA sequencing data and identified specific patterns in the VAE encoded features. Beck et al.82 conducted image analysis and data integration with gene expression and proteomics data to improve the identification of lung squamous cell carcinoma. Nirschl et al.83 showed that a CNN model could better predict the likelihood of cardiac failure from endomyocardial biopsy samples (AUC = 0.97) than two trained cardiac pathologists could (AUC = 0.73 and 0.75).

In all these examples, for ML-generated predictive biomarkers to be more successful, there are several key issues that still need to be addressed. At least some of these issues concern the interpretability of the classifier, considered by at least some end-users to be critical for clinical adoption. One of the other key issues is the need to validate these approaches in the context of multi-site, multi-institutional data sets to demonstrate the generalizability of the approach. The research community is actively addressing these issues and making rapid progress, including the application of objective approaches and measures for model training and parameter optimization84, model interpretation and extraction of biological insights85, and model reproducibility86.

Computational pathology. Pathology is a descriptive field, as a pathologist interprets what is seen on a glass slide by visual inspection. Analysis of these glass slides provides a vast amount of information, such as the type of cell present in the tissue and their spatial context. The interplay between tumour and immune cells within the tumour microenvironment is increasingly important in the study of immuno-oncology and is not captured by other technologies.

Pharmaceutical companies need to understand how drug treatments affect particular tissues and cells and need to test thousands of compounds before selecting a candidate for a clinical trial. Furthermore, as the number of clinical trials grows, discovering new biomarkers will be increasingly important to identify patients who will respond to a particular therapy. Increased use of computational pathology that may allow for the discovery of novel biomarkers and generate them in a more precise, reproducible and high-throughput manner will ultimately cut down drug development time and allow patients faster access to beneficial therapies.

Before DL, algorithms for tissue image analysis were often biologically inspired in collaboration with pathologists and required computer scientists to handcraft descriptive features for a computer to classify a certain type of tissue or cell. These studies were aimed at identifying morphological descriptors in widely used haematoxylin and eosin (H&E)-stained images. Nuclear morphometry was among the earliest implementations of computational pathology, demonstrating the ability to determine associations between computer-generated features and prognosis87. Beck et al.88 looked at cells in the context of their spatial locations within the surrounding tumour stroma and showed associations between stromal features and survival in breast cancer. Lee et al.89 have also demonstrated that computational analysis of tumour-adjacent benign tissue in prostate cancer can reveal information that is typically ignored by pathologists but is associated with progression-free survival. More recently, Lu et al. showed that features that describe nuclear shape and nuclear orientation were strongly associated with survival in both oral cancers90 and early-stage oestrogen receptor-positive breast cancers91. In many cases, the availability of immunohistochemical stains, which use antibodies to target specific proteins in an image and mark specific cell and tissue types, circumvents the need for cell and tissue detection by morphology and thus enables the generation of sophisticated data without the use of DL tools. However, in the case of immuno-oncology, ML allows for high-throughput generation of features that describe spatial relationships for thousands of cells, an infeasible task for pathologists. Improvements in individual cell and tissue detection via DL methods allow for very precise measurements of the tumour microenvironment, so heterogeneous features that describe spatial relationships between cells and tissue structures can now be measured at scale (Fig. 5).

In a study by Mani et al.92, several markers for lymphocytes were utilized to understand the heterogeneity of these populations in breast cancer. Giraldo et al.93 examined cell–cell interactions and showed that, using cell densities and the relative location of PD1+ and CD8+ cells, they could identify patients with Merkel cell carcinoma who would respond to pembrolizumab. The trade-off for these types of experiment is that they use a lot of tissue, typically requiring additional slides for each stain; however, hundreds or thousands of features can be examined, and the number of possible cell–cell interactions increases with each stain used. In such a case, a combination of feature selection and ML methods is used to determine combinations that may be predictive of therapeutic response.

The application of CNNs to pathology images works well because there is a large number of viable pixels that can be used for training from a single biopsy or resection. Given enough well-curated exemplars, a DL algorithm can be designed to learn features automatically for a wide variety of classification tasks94. For example, a multi-scale convolutional neural network (M-CNN) was used in a supervised learning approach for phenotyping high-content cellular images9 in a single step as opposed to several, independent customized steps. Using solely pixel intensity values from the images to convert those images into phenotypes, the approach resulted in overall more accurate classification of the effects of a compound treatment at multiple concentrations. Many image analysis challenges have successfully used DL methods to identify areas within cancer tumours95–98, tubules99,


[Figure 5 schematic. Central outcome: higher order tasks (e.g. grading, prognosis and prediction). Surrounding tasks: carcinoma localization; nuclei segmentation; epithelium segmentation; tubule segmentation; lymphocyte detection; mitosis detection; lymphoma typing. Task-specific feature labels: area, density, texture, morphology, structure, organization, regularity, cell-specific markers, number per region of interest, tumour infiltration. Legend: outcome; task-specific features; segmentation; detection; classification.]

Fig. 5 | Computational pathology tasks for machine learning applications. Deep learning frameworks can replace
traditional handcrafted features in several basic pathology image-​recognition tasks (such as segmentation of nuclei,
epithelia or tubules, lymphocyte detection, mitosis detection or classification of tumours) using image segmentation
(yellow background), detection of specific features (blue background) or detection of a set of features used for
classification (green background). Recognition is based on the task-​specific features shown in the pink regions and can
lead to more accurate prognosis or prediction of disease.
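
As a hedged illustration of how task-specific features such as nuclear count, area and density can be derived once a segmentation exists, the sketch below labels connected regions in a binary mask and summarizes them. It assumes the segmentation mask has already been produced (for example, by a DL model) and uses a plain 4-connected flood fill; the mask is hypothetical toy data, and real pipelines would typically rely on an image analysis library instead.

```python
from collections import deque


def label_components(mask):
    """Group foreground pixels (value 1) into 4-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:  # breadth-first flood fill from the seed pixel
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def nuclei_features(mask):
    """Summarize a segmentation mask as count, per-object areas and density."""
    components = label_components(mask)
    total_pixels = len(mask) * len(mask[0])
    return {
        "count": len(components),
        "areas": [len(pixels) for pixels in components],  # area in pixels
        "density": len(components) / total_pixels,        # objects per pixel
    }


# Hypothetical toy mask containing three separate "nuclei".
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
features = nuclei_features(mask)
```

Features of this kind can then be passed to a downstream classifier for the higher order tasks (grading, prognosis or prediction) shown in the figure.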

mitotic activity100 and lymphocytes101,102 in breast and lung cancer.

Beyond pathology images, DL can also facilitate the integration of other modalities of information. DL can also be used to accelerate magnetic resonance imaging (MRI) data acquisition103 or reduce the radiation dose required for computed tomography (CT) imaging104. With improved imaging quality including temporal and spatial resolution and a high signal to noise ratio, the performance of image analysis may correspondingly improve in applications such as image quantification, abnormal tissue detection, patient stratification and disease diagnosis or prediction. Another recent study105 demonstrated the ability to use an inception DL framework to predict the presence of certain mutated genes from H&E-stained images of lung tumours.

However, although DL continues to excel in many specific image analysis tasks, in practice, a combination of DL and traditional image analysis algorithms is applied in most problem sets. This is done for several reasons. First, while DL has shown its ability to match or outperform humans in very specific problems (such as the detection of glomeruli), it is still not a great general-purpose image analysis tool. Development times remain long owing to this lack of flexibility. There is also an overall scarcity of expert labels available for a specific classification task, as these are expensive to generate. Approaches to mitigate this include using immunohistochemistry staining to provide additional information to pathologists for samples where annotations are challenging106 as well as efforts to increase the availability of well-curated expert annotations for broad-use cases (cancer cells versus normal cells), which is an ongoing community task.

Another challenge is the issue of transparency. DL methods are known for their black-box approach. The underlying rationale behind a decision for classification tasks is unclear. For drug development, it is important to understand mechanisms, and having an interpretable output can be useful for finding not only new potential

drug targets but also new potential biomarkers to predict therapeutic response. The generation of many more handcrafted features is needed for increased trust in interpretability.

A further challenge is the large sample size needed in clinical trials to apply DL directly to infer therapeutic response. DL typically requires tens of thousands if not hundreds of thousands of examples to learn from, and clinical trials typically do not produce enough examples. In certain cases, it may be possible to combine data across clinical trials, but biases may exist that can make the results more difficult to interpret.

Examples of successful integration of DL and traditional image analysis workflows include work by Saltz et al.101 and Corredor et al.102, in which CNNs were used to detect lymphocytes in H&E-stained tissue and subsequent graph-based features were extracted to predict disease response. This will likely be a common role for DL in the near future, as its superior ability to detect cells and tissue can replace traditional segmentation and nuclear detection algorithms, and subsequent interpretable features can be applied to give spatial context to these features.

Outlook

ML approaches and recent developments in DL provide many opportunities to increase efficiency across the drug discovery and development pipeline. As such, we expect to see increasing numbers of applications for well-defined problems across the industry in the coming years. With available data becoming ‘bigger’, at least in the sense of more thoroughly covering the relevant variability of the whole data space, and as computers become increasingly more powerful, ML algorithms are going to systematically generate improved outputs, and new, interesting applications are expected to follow. This has been clearly exemplified in the previous sections, in which we have described some ML applications for target identification and validation, drug design and development, biomarker identification and pathology for disease diagnosis and therapy prognosis in the clinic.

These methods are also being applied within the health-care setting, which, when combined with drug discovery, could lead to significant advances in personalized medicine107. ML has also been applied to electronic health records108 and real-world evidence in order to improve clinical trial results and optimize the process of clinical trial eligibility assessment. For example, a recent study demonstrated that DNNs are a highly competitive approach for automatically extracting useful information from electronic medical records for disease diagnoses and classification109. Some studies have shown that ML models in electronic health records can outperform conventional models in predicting prognosis110. ML can also be applied to data now coming from sensors and wearables to understand disease and develop treatments, especially in the neurosciences111. Gkotsis et al.112 applied DL approaches to characterize mental health conditions on unstructured social media data, which is a difficult task for traditional ML approaches.

As shown in Fig. 1, ML approaches are beginning to be commonly used in the various steps of the discovery and development pipeline by pharmaceutical companies. This pervasive implementation of ML methods has a few but important known issues. A typical issue with deep-trained neural networks is the lack of interpretability, that is, the difficulties in obtaining a suitable explanation from the trained neural network on how it arrives at the result. If the system is used to diagnose a disease such as melanoma, for instance, on the basis of medical images, this lack of interpretability may hinder scientists, regulatory agencies, doctors and patients, even in situations in which neural networks perform better than human experts. Would a patient trust the ML diagnosis more than that of a human expert? Although much less dramatic, a similar situation may occur in drug design. Would a pharmaceutical company trust a neural network for choosing a small molecule for inclusion in their portfolio and investment to progress to the clinic, without a clear explanation for why the neural network has selected this molecule? In addition, there may be patent application issues with inventorship if compounds have been designed by computer algorithms. In any case, ML results have to be considered as only hypotheses or interesting starting points that are then further developed in studies by researchers. Complementary experiments that validate the ML result will help to build trust in approaches and outputs, but regulatory agencies have yet to clarify their view on the lack of interpretability for the clinical use of ML. However, even beyond the issue of trust, the lack of interpretability of the approaches makes it more difficult to troubleshoot these approaches when they unexpectedly fail on new unseen data sets.

Another important issue for neural networks is repeatability, which arises because ML outputs are highly dependent on the initial values or weights of the network parameters or even the order in which training examples are presented to the network, as all of them are typically chosen at random. Would the network always select the same disease target using the same expression data as the input? Would the structure of the drug proposed by the ML method always be the same? This lack of repeatability is particularly problematic for biomarker identification, as seen in situations where different tools generated different prognosis biomarkers for breast cancer on the basis of molecular expression signatures113. The fact that different ML methods can yield different results will add uncertainty to the adoption of these methods at scale. Some solutions to the problems of both interpretability and repeatability have been proposed. These usually centre on the use of a more complex or more time-consuming algorithm or averaging results from several network models, but this might be seen as adding only one more result to a range of existing results.

Another important point to consider is the availability of high-quality, accurate and curated data in large quantities to train and develop ML models. The requirements for the amounts and accuracy desired are dependent on the complexity of the data type and the question to be resolved. Thus, it can be expensive to generate these data sets. Pre-competitive consortia of pharmaceutical companies and academic institutions that use appropriate data standards and have the
necessary operational and open data frameworks may be part of the solution to meet these data demands.

Many of the data types that are used during drug discovery are far from comprehensive. For example, the knowledge of all folds and structures of proteins is not complete, and coverage of the data space is similarly incomplete. Thus, applications in which these structures are predicted, even if much progress has been made, are not yet as good as in other areas. The same applies for the prediction of reactions involved in the synthesis of small molecules for which the entire chemistry space is unknown.

Data curation is key to the provision of reusable and trustworthy data and can be expensive in terms of the time and skills required. Biological curation (the extraction of biological information from the scientific literature and its integration into a database) lies between an art and a science114, requiring a combination of computational skills with in-depth biological and domain expertise. Collaborative efforts to develop shared data resources and metadata (labels) may be ways by which high-quality data in the public domain can be made more available. This also includes metadata from both successful and failed drug discovery programmes that can enable prediction approaches and determination of factors that can reduce attrition in drug development. Much more pre-competitive collaboration is also needed to aggregate and generate large data resources of corporate bioactive data sets of investigational compounds as well as historic clinical trial data.

Another limitation in the application of ML models is in their use to predict alternative paradigms. Because the entire premise of ML relies on the use of training data to generate suitable models, ML models can only predict within the known framework of the training data. In medicinal chemistry, for example, the design of compounds with alternative mechanisms of action, such as macrocycles, protein–protein interaction inhibitors or PROTACs, can probably only be performed with traditional methods.

As well as data and models, the training of researchers that understand pharmaceutical science as well as computer science, computational statistics and statistical ML and are proficient in utilizing these methods needs to be accelerated. Competitions like the DREAM Challenges (see Related links), which have shown that team composition is a factor in performance, can also be useful to attract talent and advance methodology development. However, applications will need to be successful in the clinical setting in order to motivate further investment from large pharmaceutical and technology companies.

ML algorithms, including DL methods, have enabled the utilization of AI in the industry setting and in day to day life. The impact of ML methods in all areas of drug discovery and health care is already being felt, especially in the analysis of omics and imaging data. ML algorithms are also successful in speech recognition, NLP, computer vision and other applications. For example, Internet-enabled smart assistants are now commonplace and can transmit health-related information in the form of speech and images or videos. ML approaches applied to data collected from such an amalgamation of Internet-enabled technologies, coupled with biological data, have the potential to dramatically improve the predictive power of such algorithms and aid medical decision making about the therapeutic benefits, clinical biomarkers and side effects of therapies.

Published online xx xx xxxx
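
The repeatability concern raised in the Outlook (dependence on random initial weights and on the order in which training examples are presented, with averaging over several network models as one proposed remedy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the studies cited here: the synthetic data set, the toy one-hidden-layer network and the hypothetical helper names `make_data`, `train_tiny_net` and `ensemble_predict`.

```python
import numpy as np


def make_data(n=40, d=5, seed=123):
    """Synthetic stand-in for an expression matrix with a binary label (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_tiny_net(X, y, seed, hidden=8, epochs=50, lr=0.1):
    """Train a toy one-hidden-layer network with plain SGD.

    The seed controls both the initial weights and the order in which
    training examples are presented, the two sources of randomness
    named in the text.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):          # random presentation order
            h = np.tanh(X[i] @ W1 + b1)
            p = sigmoid(h @ w2 + b2)
            g = p - y[i]                      # gradient of the logistic loss w.r.t. the logit
            grad_h = g * w2 * (1.0 - h ** 2)  # back-propagate through tanh (uses w2 before its update)
            w2 -= lr * g * h
            b2 -= lr * g
            W1 -= lr * np.outer(X[i], grad_h)
            b1 -= lr * grad_h

    def predict(X_new):
        return sigmoid(np.tanh(X_new @ W1 + b1) @ w2 + b2)

    return predict


def ensemble_predict(models, X_new):
    """Average the predicted probabilities of several independently trained models."""
    return np.mean([m(X_new) for m in models], axis=0)


X, y = make_data()
X_new = make_data(n=10, seed=456)[0]

# Same data, same architecture, different seeds.
models = [train_tiny_net(X, y, seed=s) for s in range(5)]
per_seed = np.array([m(X_new) for m in models])
spread = per_seed.max(axis=0) - per_seed.min(axis=0)
averaged = ensemble_predict(models, X_new)

print("largest disagreement between seeds:", spread.max())
print("ensemble-averaged predictions:", np.round(averaged, 3))
```

In this sketch, re-running with a fixed seed reproduces a model exactly, whereas changing only the seed changes the predicted probabilities for the same inputs. The averaged prediction always lies between the most extreme per-seed predictions, which is consistent with the caveat in the text that averaging yields one more result within the range of existing results rather than removing the underlying variability.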

1. Mamoshina, P. et al. Machine learning on human muscle transcriptomic data for biomarker discovery and tissue-specific drug target identification. Front. Genet. 9, 242 (2018).
2. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436 (2015).
3. Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250 (2018).
    This article is the first effort to highlight the recent applications of DL in drug discovery research and is an introduction to some popular DL architectures.
4. Hinton, G. Deep learning — a technology with the potential to transform health care. JAMA 320, 1101–1102 (2018).
5. Wong, C. H., Siah, K. W. & Lo, A. W. Estimation of clinical trial success rates and related parameters. Biostatistics https://doi.org/10.1093/biostatistics/kxx069 (2018).
6. Jeon, J. et al. A systematic approach to identify novel cancer drug targets using machine learning, inhibitor design and high-throughput screening. Genome Med. 6, 57 (2014).
7. Ferrero, E., Dunham, I. & Sanseau, P. In silico prediction of novel therapeutic targets using gene-disease association data. J. Transl Med. 15, 182 (2017).
8. Riniker, S., Wang, Y., Jenkins, J. & Landrum, G. Using information from historical high-throughput screens to predict active compounds. J. Chem. Inf. Model. 54, 1880–1891 (2014).
9. Godinez, W. J., Hossain, I., Lazic, S. E., Davies, J. W. & Zhang, X. A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics 33, 2010–2019 (2017).
10. Olsen, T. et al. Diagnostic performance of deep learning algorithms applied to three common diagnoses in dermatopathology. J. Pathol. Inform. 9, 32–32 (2018).
11. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
12. Jiao, Y. & Pufeng, D. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant. Biol. 4, 320 (2016).
13. Czodrowski, P. Count on kappa. J. Comput. Aided Mol. Des. 28, 1049–1055 (2014).
14. Rifaioglu, A. S. et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief. Bioinform. https://doi.org/10.1093/bib/bby061 (2018).
15. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504 (2006).
16. Koscielny, G. et al. Open targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).
17. Costa, P. R., Acencio, M. L. & Lemke, N. A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 11, S9–S9 (2010).
18. Ament, S. A. et al. Transcriptional regulatory networks underlying gene expression changes in Huntington’s disease. Mol. Systems Biol. 14, e7435 (2018).
19. Bravo, A., Pinero, J., Queralt-Rosinach, N., Rautschka, M. & Furlong, L. I. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 16, 55 (2015).
20. Kim, J., Kim, J.-j. & Lee, H. An analysis of disease-gene relationship from Medline abstracts by DigSee. Sci. Rep. 7, 40154 (2017).
21. Leung, M. K. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).
22. Jha, A., Gazzara, M. R. & Barash, Y. Integrative deep models for alternative splicing. Bioinformatics 33, i274–i282 (2017).
23. Vaquero-Garcia, J. et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 5, e11752 (2016).
24. Sotillo, E. et al. Convergence of acquired mutations and alternative splicing of CD19 enables resistance to CART-19 immunotherapy. Cancer Discov. 5, 1282–1295 (2015).
25. Rohacek, A. M. et al. ESRP1 mutations cause hearing loss due to defects in alternative splicing that disrupt cochlear development. Dev. Cell 43, 318–331 (2017).
26. Xiong, H. Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
    This article describes a computational model based on DL that predicts splicing regulation for any mRNA sequence and has been applied to more than half a million human mRNA splicing sequence variants. Thousands of known disease-causing mutations are identified as well as new disease-linked genes.
27. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
    This paper applies ML to data from somatic mutations, copy number alterations, DNA methylation and gene expression from 1,000 cancer cell lines to model drug response of the cell lines and demonstrates the importance of genomic features for prediction.
28. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
29. McMillan, E. A. et al. Chemistry-first approach for nomination of personalized treatment in lung cancer. Cell 173, 864–878 (2018).
30. Al-Lazikani, B. et al. in Bioinformatics — From Genomes to Therapies Ch. 36 (Wiley-VCH, 2008).
31. Nayal, M. & Honig, B. On the nature of cavities on protein surfaces: application to the identification of drug-binding sites. Proteins 63, 892–906 (2006).
    This article describes a classifier to identify drug-binding cavities on the basis of physicochemical, structural and geometric attributes of proteins.
32. Li, Q. & Lai, L. Prediction of potential drug targets based on simple sequence properties. BMC Bioinformatics 8, 353 (2007).
33. Bakheet, T. M. & Doig, A. J. Properties and identification of human protein drug targets. Bioinformatics 25, 451–457 (2009).
34. Wang, Q., Feng, Y., Huang, J., Wang, T. & Cheng, G. A novel framework for the identification of drug target proteins: combining stacked auto-encoders with a biased support vector machine. PLOS ONE 12, e0176486 (2017).
35. Kandoi, G., Acencio, M. L. & Lemke, N. Prediction of druggable proteins using machine learning and systems biology: a mini-review. Front. Physiol. 6, 366–366 (2015).
36. Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
37. Morgan, P. et al. Impact of a five-dimensional framework on R&D productivity at AstraZeneca. Nat. Rev. Drug Discov. 17, 167–181 (2018).
38. Rouillard, A. D., Hurle, M. R. & Agarwal, P. Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets. PLOS Comput. Biol. 14, e1006142 (2018).
39. Kumar, V., Sanseau, P., Simola, D. F., Hurle, M. R. & Agarwal, P. Systematic analysis of drug targets confirms expression in disease-relevant tissues. Sci. Rep. 6, 36205 (2016).
40. Ramsundar, B. et al. Is multitask deep learning practical for pharma? J. Chem. Inf. Model. 57, 2068–2076 (2017).
41. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
42. Barati Farimani, A., Feinberg, E. & Pande, V. Binding pathway of opiates to μ-opioid receptors revealed by machine learning. Biophys. J. 114, 62a–63a (2018).
43. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
44. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604 (2018).
    This seminal paper describes a very thorough approach to retrosynthetic analysis. The authors show that their method can compete with retrosynthesis done by experienced chemists who are experts in this field.
45. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).
46. Kadurin, A., Nikolenko, S., Khrabrov, K., Aliper, A. & Zhavoronkov, A. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol. Pharm. 14, 3098–3104 (2017).
47. Smith, J. S., Roitberg, A. E. & Isayev, O. Transforming computational drug discovery with machine learning and AI. ACS Med. Chem. Lett. 9, 1065–1069 (2018).
48. Lenselink, E. B. et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminform. 9, 45 (2017).
49. Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
50. Ramsundar, B. et al. Massively multitask networks for drug discovery. Preprint at arXiv https://arxiv.org/abs/1502.02072 (2015).
51. Gutlein, M. & Kramer, S. Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability. J. Cheminform. 8, 60 (2016).
52. Mayr, A. et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem. Sci. 9, 5441–5451 (2018).
    This research paper describes the methodology being used by the winners of almost all categories of the Tox21 Challenge.
53. Keiser, M. J. et al. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 25, 197 (2007).
54. Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. & Klambauer, G. Fréchet ChemNet Distance: a metric for generative models for molecules in drug discovery. J. Chem. Inf. Model. 58, 1736–1741 (2018).
55. Unterthiner, T., Mayr, A., Klambauer, G. & Hochreiter, S. Toxicity prediction using deep learning. Preprint at arXiv https://arxiv.org/abs/1503.01445 (2015).
56. Li, B. et al. Development of a drug-response modeling framework to identify cell line derived translational biomarkers that can predict treatment outcome to erlotinib or sorafenib. PLOS ONE 10, e0130700 (2015).
    In this paper, a translational predictive biomarker is used to demonstrate that predictive models can be generated from preclinical training data sets and then be applied to clinical patient samples to stratify patients, infer the mechanism of action of a drug and select appropriate disease indications.
57. van Gool, A. J. et al. Bridging the translational innovation gap through good biomarker practice. Nat. Rev. Drug Discov. 16, 587–588 (2017).
58. Kraus, V. B. Biomarkers as drug development tools: discovery, validation, qualification and use. Nat. Rev. Rheumatol. 14, 354–362 (2018).
59. Shi, L. et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat. Biotechnol. 28, 827–838 (2010).
60. Zhan, F. et al. The molecular classification of multiple myeloma. Blood 108, 2020–2028 (2006).
61. Shaughnessy, J. D. Jr. et al. A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood 109, 2276–2284 (2007).
62. Zhan, F., Barlogie, B., Mulligan, G., Shaughnessy, J. D. Jr & Bryant, B. High-risk myeloma: a gene expression based risk-stratification model for newly diagnosed multiple myeloma treated with high-dose therapy is predictive of outcome in relapsed disease treated with single-agent bortezomib or high-dose dexamethasone. Blood 111, 968–969 (2008).
63. Decaux, O. et al. Prediction of survival in multiple myeloma based on gene expression profiles reveals cell cycle and chromosomal instability signatures in high-risk patients and hyperdiploid signatures in low-risk patients: a study of the Intergroupe Francophone du Myelome. J. Clin. Oncol. 26, 4798–4805 (2008).
64. Mulligan, G. et al. Gene expression profiling and correlation with outcome in clinical trials of the proteasome inhibitor bortezomib. Blood 109, 3177–3188 (2007).
65. Costello, J. C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 (2014).
    This paper is an effort to collect and objectively evaluate various ML approaches by teams around the world on multi-omics data sets and various compounds. The data sets and results are continuously used as benchmarks for new method developments and validation.
66. Rahman, R., Otridge, J. & Pal, R. IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics 33, 1407–1410 (2017).
67. Bunte, K., Leppäaho, E., Saarinen, I. & Kaski, S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 32, 2457–2463 (2016).
68. Huang, C., Mezencev, R., McDonald, J. F. & Vannberg, F. Open source machine-learning algorithms for the prediction of optimal cancer drug therapies. PLOS ONE 12, e0186906 (2017).
69. Hejase, H. A. & Chan, C. Improving drug sensitivity prediction using different types of data. CPT Pharmacometrics Syst. Pharmacol. 4, e2 (2015).
70. Kim, E. S. et al. The BATTLE trial: personalizing therapy for lung cancer. Cancer Discov. 1, 44–53 (2011).
71. Boyiadzis, M. M. et al. Significance and implications of FDA approval of pembrolizumab for biomarker-defined disease. J. Immunother. Cancer 6, 35 (2018).
72. Tasaki, S. et al. Multi-omics monitoring of drug response in rheumatoid arthritis in pursuit of molecular remission. Nat. Commun. 9, 2755 (2018).
    This work identifies molecular signatures that are resistant to drug treatments and illustrates a multi-omics approach to understanding drug response.
73. Paré, G., Mao, S. & Deng, W. Q. A machine-learning heuristic to improve gene score prediction of polygenic traits. Sci. Rep. 7, 12665 (2017).
74. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
75. Ding, J., Condon, A. & Shah, S. P. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 9, 2002 (2018).
76. Rashid, S., Shah, S., Bar-Joseph, Z. & Pandya, R. Project Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/183863v4 (2018).
77. Wang, D. & Gu, J. VASC: dimension reduction and visualization of single-cell RNA-seq data by deep variational autoencoder. Genomics Proteomics Bioinformatics 16, 320–331 (2017).
78. Pierson, E. & Yau, C. ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16, 241 (2015).
79. Wang, B., Zhu, J., Pierson, E., Ramazzotti, D. & Batzoglou, S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat. Methods 14, 414 (2017).
80. Tan, J., Hammond, J. H., Hogan, D. A. & Greene, C. A.-O. ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 1, e00025–15 (2016).
81. Way, G. P. & Greene, C. S. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac. Symp. Biocomput. 23, 80–91 (2018).
82. Casanova, R. et al. Morphoproteomic characterization of lung squamous cell carcinoma fragmentation, a histological marker of increased tumor invasiveness. Cancer Res. 77, 2585–2593 (2017).
83. Nirschl, J. J. et al. A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue. PLOS ONE 13, e0192726 (2018).
84. Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).
85. Finnegan, A. & Song, J. S. Maximum entropy methods for extracting the learned features of deep neural networks. PLOS Comput. Biol. 13, e1005836 (2017).
86. Hutson, M. Artificial intelligence faces reproducibility crisis. Science 359, 725–726 (2018).
87. Veltri, R. W., Partin, A. W. & Miller, M. C. Quantitative nuclear grade (QNG): a new image analysis-based biomarker of clinically relevant nuclear structure alterations. J. Cell. Biochem. Suppl. 35, S151–S157 (2000).
88. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl Med. 3, 108ra113 (2011).
89. Lee, G. et al. Nuclear shape and architecture in benign fields predict biochemical recurrence in prostate cancer patients following radical prostatectomy: preliminary findings. Eur. Urol. Focus 3, 457–466 (2017).
90. Lu, C. et al. An oral cavity squamous cell carcinoma quantitative histomorphometric-based image classifier of nuclear morphology can risk stratify patients for disease-specific survival. Mod. Pathol. 30, 1655–1665 (2017).
91. Lu, C. et al. Nuclear shape and orientation features from H&E images predict survival in early-stage estrogen receptor-positive breast cancers. Lab. Invest. 98, 1438–1448 (2018).
92. Mani, N. L. et al. Quantitative assessment of the spatial heterogeneity of tumor-infiltrating lymphocytes in breast cancer. Breast Cancer Res. 18, 78 (2016).
93. Giraldo, N. A. et al. The differential association of PD-1, PD-L1, and CD8 + cells with response to pembrolizumab and presence of Merkel cell polyomavirus (MCPyV) in patients with Merkel cell carcinoma (MCC). Cancer Res. 77, 662 (2017).
94. Janowczyk, A. & Madabhushi, A. Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases. J. Pathol. Informat. 7, 29 (2016).
    This article is the first comprehensive review of DL in the context of digital pathology images. The paper also systematically explains and presents approaches for training and validating DL classifiers for a number of image-based problems in digital pathology, including cell detection, segmentation and tissue classification.
95. Sharma, H., Zerbe, N., Klempert, I., Hellwich, O. & Hufnagl, P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput. Med. Imaging Graph. 61, 2–13 (2017).
96. Korbar, B. et al. Deep learning for classification of colorectal polyps on whole-slide images. J. Pathol. Informat. 8, 30 (2017).
97. Bychkov, D. et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 8, 3395 (2018).
98. Cruz-Roa, A. et al. Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent. Sci. Rep. 7, 46450 (2017).
    This is one of the first papers to apply DL to identify regions of breast cancer on digital pathology images and shows that the algorithmic approach outperforms breast cancer pathologists. It is one of the first studies to have a large data set of cases (>600) with independent training and validation sets.
99. Romo-Bucheli, D., Janowczyk, A., Gilmore, H., Romero, E. & Madabhushi, A. Automated tubule nuclei quantification and correlation with oncotype DX risk categories in ER + breast cancer whole slide images. Sci. Rep. 6, 32706 (2016).
    This article applies DL to identify the presence and location of tubules in breast pathology images and subsequently demonstrates that the number of detected tubules correlates with the risk assessments of breast cancer via a genomic test. It is one of the first papers to show how DL can be used to establish genotype–phenotype associations.
100. Romo-Bucheli, D., Janowczyk, A., Gilmore, H., Romero, E. & Madabhushi, A. A deep learning based strategy for identifying and associating mitotic activity with gene expression derived risk categories in estrogen receptor positive breast cancers. Cytometry A 91, 566–573 (2017).
101. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193 (2018).
    [...] quantification of immune cells from H&E slides and the identification of sub-categories of immune infiltrate as related to therapeutic outcome.
102. Corredor, G. et al. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin. Cancer Res. 25, 1526–1534 (2018).
    In this paper, the spatial arrangement, and not just the density, of tumour-infiltrating lymphocytes in early-stage lung cancer pathology images is shown to be prognostic of recurrence. A comprehensive comparison is provided, showing that computer-extracted features of spatial arrangement of tumour-infiltrating lymphocytes are more prognostic than manual (pathologist) enumeration of tumour-infiltrating lymphocyte density.
103. Cohen, O., Zhu, B. & Rosen, M. S. MR fingerprinting Deep RecOnstruction NEtwork (DRONE). Magn. Reson. Med. 80, 885–894 (2018).
104. Chen, H. et al. Low-dose CT with a residual encoder-decoder convolutional neural network (RED-CNN). Preprint at arXiv https://arxiv.org/abs/1702.00288 (2017).
105. Coudray, N. et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
    This paper uses DL frameworks to predict mutations from H&E images, which has implications for identifying key mechanistic insights from standard whole-slide imaging as well as for patient stratification.
106. Turkki, R., Linder, N., Kovanen, P. E., Pellinen, T. & Lundin, J. Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples. J. Pathol. Inform. 7, 38 (2016).
107. Norgeot, B., Glicksberg, B. S. & Butte, A. J. A call
109. Yang, Z. et al. Clinical assistant diagnosis for electronic medical record based on convolutional neural network. Sci. Rep. 8, 6329 (2018).
110. Steele, A. J., Denaxas, S. C., Shah, A. D., Hemingway, H. & Luscombe, N. M. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLOS ONE 13, e0202344 (2018).
111. Mohr, D. C., Zhang, M. & Schueller, S. M. Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annu. Rev. Clin. Psychol. 13, 23–47 (2017).
112. Gkotsis, G. et al. Characterisation of mental health conditions in social media using Informed Deep Learning. Sci. Rep. 7, 45141 (2017).
113. Koscielny, S. Why most gene expression signatures of tumors have not been useful in the clinic. Sci. Transl Med. 2, 14ps12 (2010).
114. Odell, S. G., Lazo, G. R., Woodhouse, M. R., Hane, D. L. & Sen, T. Z. The art of curation at a biological database: principles and application. Curr. Plant Biol. 11–12, 2–11 (2017).

Acknowledgements
The authors thank E. Birney and E. Papa for helpful comments, M. Segler for contributing to the small-molecule optimization subsection and A. Janowczyk for providing the pathology images in Figure 4.

Competing interests
The authors declare no competing interests.

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links
This large-​scale study utilizes DL to identify for deep-​learning healthcare. Nat. Med. 25, 14–15 DeepChem: https://www.deepchem.io/
lymphocytes across all images and relate spatial (2019). DReAM Challenges: http://dreamchallenges.org/
characteristics of lymphocytes to molecular 108. Esteva, A. et al. A guide to deep learning in healthcare. TensorFlow: https://www.tensorflow.org/
assessments. This article is key to the automatic Nat. Med. 25, 24–29 (2019).

Nature Reviews | Drug Discovery
