Bioinformatics, 34, 2018, i821–i829
doi: 10.1093/bioinformatics/bty593
ECCB 2018

DeepDTA: deep drug–target binding affinity prediction

Hakime Öztürk1, Arzucan Özgür1,* and Elif Ozkirimli2,*

1Department of Computer Engineering and 2Department of Chemical Engineering, Bogazici University, Istanbul 34342, Turkey
*To whom correspondence should be addressed.

Abstract

Motivation: The identification of novel drug–target (DT) interactions is a substantial part of the drug discovery process. Most of the computational methods that have been proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein–ligand interactions assume a continuum of binding strength values, also called binding affinity, and predicting this value still remains a challenge. The increase in the affinity data available in DT knowledge-bases allows the use of advanced learning techniques such as deep learning architectures in the prediction of binding affinities. In this study, we propose a deep-learning based model that uses only sequence information of both targets and drugs to predict DT interaction binding affinities. The few studies that focus on DT binding affinity prediction use either 3D structures of protein–ligand complexes or 2D features of compounds. One novel approach used in this work is the modeling of protein sequences and compound 1D representations with convolutional neural networks (CNNs).

Results: The results show that the proposed deep learning based model that uses the 1D representations of targets and drugs is an effective approach for drug–target binding affinity prediction. The model in which high-level representations of a drug and a target are constructed via CNNs achieved the best Concordance Index (CI) performance in one of our larger benchmark datasets, outperforming the KronRLS algorithm and SimBoost, a state-of-the-art method for DT binding affinity prediction.

Availability and implementation: https://github.com/hkmztrk/DeepDTA
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

The successful identification of drug–target interactions (DTI) is a critical step in drug discovery. As the field of drug discovery expands with the discovery of new drugs, repurposing of existing drugs and identification of novel interacting partners for approved drugs is also gaining interest (Oprea and Mestres, 2012). Until recently, DTI prediction was approached as a binary classification problem (Bleakley and Yamanishi, 2009; Cao et al., 2014, 2012; Cobanoglu et al., 2013; Gönen, 2012; Öztürk et al., 2016; Yamanishi et al., 2008; van Laarhoven et al., 2011), neglecting an important piece of information about protein–ligand interactions, namely the binding affinity values. Binding affinity provides information on the strength of the interaction between a drug–target (DT) pair, and it is usually expressed in measures such as the dissociation constant (Kd), the inhibition constant (Ki) or the half maximal inhibitory concentration (IC50). IC50 depends on the concentration of the target and ligand (Cer et al., 2009), and low IC50 values signal strong binding. Similarly, low Ki values indicate high binding affinity. Kd and Ki values are usually represented in terms of pKd or pKi, the negative logarithm of the dissociation or inhibition constants.

In binary classification based DTI prediction studies, construction of the datasets constitutes a major step, since the designation of the negative (not-binding) samples directly affects the performance of the model. Over the last decade, most DTI studies utilized the four major datasets of Yamanishi et al. (2008), in which DT pairs with no known binding information are treated as negative (not-binding) samples. Recently, DTI studies that rely on databases with binding affinity information have been providing more realistic binary datasets created with a chosen binding affinity threshold value (Wan and Zeng, 2016). Formulating the DT prediction task as a binding affinity prediction problem enables the creation of more realistic datasets, where the binding affinity scores are directly used.

Furthermore, a regression-based model brings the added advantage of predicting an approximate value for the strength of the interaction between the drug and target, which in turn would be significantly beneficial for limiting the large compound search-space in drug discovery studies.

Prediction of protein–ligand binding affinities has been the focus of protein–ligand scoring, which is frequently used after virtual screening and docking campaigns in order to predict the putative strengths of the proposed ligands to the target (Ragoza et al., 2017). Non-parametric machine learning methods such as the Random Forest (RF) algorithm have been used as a successful alternative to scoring functions that depend on multiple parameters (Ballester and Mitchell, 2010; Li et al., 2015; Shar et al., 2016). However, Gabel et al. (2014) showed that RF-score failed in virtual screening and docking tests, speculating that using features such as the co-occurrence of atom-pairs over-simplified the description of the protein–ligand complex and led to the loss of information that the raw interaction complex could provide. Around the same time this study was published, deep learning started to become a popular architecture, powered by the increase in data and high-capacity computing machines, challenging other machine learning methods.

Inspired by the remarkable success rate in image processing (Ciregan et al., 2012; Donahue et al., 2014; Simonyan and Zisserman, 2015) and speech recognition (Dahl et al., 2012; Graves et al., 2013; Hinton et al., 2012), deep learning methods are now being intensively used in many other research fields, including bioinformatics, such as in genomics studies (Leung et al., 2014; Xiong et al., 2015) and quantitative structure–activity relationship (QSAR) studies in drug discovery (Ma et al., 2015). The major advantage of deep learning architectures is that they enable better representations of the raw data through non-linear transformations in each layer (LeCun et al., 2015) and thus facilitate learning the hidden patterns in the data.

A few studies employing Deep Neural Networks (DNN) have already been performed for DTI binary class prediction using different input models for proteins and drugs (Chan et al., 2016; Tian et al., 2015; Hamanaka et al., 2016), in addition to some studies that employ stacked auto-encoders (Wang et al., 2017) and deep-belief networks (Wen et al., 2017). Similarly, stacked auto-encoder based models with Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) were applied to represent chemical and genomic structures in real-valued vector forms (Gómez-Bombarelli et al., 2018; Jastrzkeski et al., 2016). Deep learning approaches have also been applied to protein–ligand interaction scoring, in which a common application has been the use of CNNs that learn from the 3D structures of protein–ligand complexes (Gomes et al., 2017; Ragoza et al., 2017; Wallach et al., 2015). However, this approach is limited to known protein–ligand complex structures, with only 25 000 ligands reported in the PDB (Rose et al., 2016).

Pahikkala et al. (2014) employed the Kronecker Regularized Least Squares (KronRLS) algorithm, which utilizes only 2D compound similarity-based representations of the drugs and Smith–Waterman similarity representations of the targets. Recently, the SimBoost method was proposed to predict binding affinity scores with a gradient boosting machine, using feature engineering to represent DTIs (He et al., 2017). It utilized similarity-based information of DT pairs as well as features extracted from network-based interactions between the pairs. Both studies used traditional machine learning algorithms and utilized 2D representations of the compounds in order to obtain similarity information.

In this study, we propose an approach to predict the binding affinities of protein–ligand interactions with deep learning models using only sequences (1D representations) of proteins and ligands. To this end, the sequences of the proteins and SMILES (Simplified Molecular Input Line Entry System) representations of the compounds are used rather than external features or 3D structures of the binding complexes. We employ CNN blocks to learn representations from the raw protein sequences and SMILES strings and combine these representations to feed into a fully connected layer block that we call DeepDTA. We use the Davis kinase binding affinity dataset (Davis et al., 2011) and the KIBA large-scale kinase inhibitor bioactivity data (He et al., 2017; Tang et al., 2014) to evaluate the performance of our model and compare our results with the KronRLS (Pahikkala et al., 2014) and SimBoost (He et al., 2017) algorithms. Our new model, which uses two separate CNN-based blocks to represent proteins and drugs, performs as well as the KronRLS and SimBoost algorithms on the Davis dataset, and it performs significantly better than both on the KIBA dataset (P-value < 0.0001). With our proposed model, we also obtain the lowest Mean Squared Error (MSE) value on both datasets.

2 Materials and methods

2.1 Datasets
We evaluated our proposed model on two different datasets, the Davis kinase dataset (Davis et al., 2011) and the KIBA dataset (Tang et al., 2014), which were previously used as benchmark datasets for binding affinity prediction evaluation (He et al., 2017; Pahikkala et al., 2014).

The Davis dataset contains selectivity assays of the kinase protein family and the relevant inhibitors with their respective dissociation constant (Kd) values. It comprises interactions of 442 proteins and 68 ligands. The KIBA dataset, on the other hand, originated from an approach called KIBA, in which kinase inhibitor bioactivities from different sources such as Ki, Kd and IC50 were combined (Tang et al., 2014). KIBA scores were constructed to optimize the consistency between Ki, Kd and IC50 by utilizing the statistical information they contained. The KIBA dataset originally comprised 467 targets and 52 498 drugs. He et al. (2017) filtered it to contain only drugs and targets with at least 10 interactions, yielding a total of 229 unique proteins and 2111 unique drugs. Table 1 summarizes these datasets in the forms that we used in our experiments.

Table 1. Summary of the datasets

            Proteins    Compounds    Interactions
Davis (Kd)  442         68           30 056
KIBA        229         2111         118 254

While Pahikkala et al. (2014) used the Kd values of the Davis dataset directly as the binding affinity values, we used the values transformed into log space, pKd, similar to He et al. (2017), as explained in Equation (1):

    pK_d = -\log_{10}\left(\frac{K_d}{10^9}\right)    (1)
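As a minimal illustration of Equation (1), the transform can be written in a few lines of Python (NumPy and the function name are our own choices here, not part of the published code):

```python
import numpy as np

def kd_to_pkd(kd_nm):
    """Equation (1): convert Kd given in nM to pKd.

    Kd is rescaled from nM to M (division by 1e9) before taking
    the negative base-10 logarithm.
    """
    return -np.log10(np.asarray(kd_nm, dtype=float) / 1e9)

# The 10 000 nM cut-off of the Davis assays maps to pKd = 5,
# which is exactly where the peak in Figure 1A sits.
print(kd_to_pkd(10000.0))  # 5.0
```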
Fig. 1. Summary of the Davis (left panel) and KIBA (right panel) datasets. (A) Distribution of binding affinity values. (B) Distribution of the lengths of the SMILES strings. (C) Distribution of the lengths of the protein sequences.

Figure 1A (left panel) illustrates the distribution of the binding affinity values in pKd form. The peak at pKd value 5 (10 000 nM) constitutes more than half of the dataset (20 931 out of 30 056). These values correspond to the negative pairs that either have very weak binding affinities (Kd > 10 000 nM) or are not observed in the primary screen (Pahikkala et al., 2014). As such, they are true negatives.

The distribution of the KIBA scores is depicted in the right panel of Figure 1A. He et al. (2017) pre-processed the KIBA scores as follows: (i) for each KIBA score, its negative was taken, (ii) the minimum value among the negatives was chosen and (iii) the absolute value of the minimum was added to all negative scores, thus constructing the final form of the KIBA scores.
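The three-step pre-processing of He et al. (2017) can be sketched as follows (a paraphrase of the recipe above, not their published code; the function name is ours):

```python
import numpy as np

def transform_kiba(scores):
    """KIBA score pre-processing, steps (i)-(iii) above."""
    neg = -np.asarray(scores, dtype=float)  # (i) take the negative of each score
    return neg + abs(neg.min())             # (ii)-(iii) shift all values by |min|
```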
The compound SMILES strings of the Davis dataset were extracted from the PubChem compound database based on their PubChem CIDs (Bolton et al., 2008). For KIBA, the ChEMBL IDs were first converted into PubChem CIDs, and the corresponding CIDs were then used to extract the SMILES strings. Figure 1B illustrates the distribution of the lengths of the SMILES strings of the compounds in the Davis (left) and KIBA (right) datasets. For the compounds of the Davis dataset, the maximum length of a SMILES is 103, while the average length is 64. For the compounds of KIBA, the maximum length of a SMILES is 590, while the average length is 58.

The protein sequences of the Davis dataset were extracted from the UniProt protein database based on gene names/RefSeq accession numbers (Apweiler et al., 2004). Similarly, the UniProt IDs of the targets in the KIBA dataset were used to collect the protein sequences. Figure 1C (left panel) shows the lengths of the sequences of the proteins in the Davis dataset. The maximum length of a protein sequence is 2549 and the average length is 788 characters. Figure 1C (right panel) depicts the distribution of protein sequence length in KIBA targets. The maximum length of a protein sequence is 4128 and the average length is 728 characters.

We should also note that the Smith–Waterman (S–W) similarity among proteins of the KIBA dataset is at most 60% for 99% of the protein pairs. The target similarity is at most 60% for 92% of the protein pairs in the Davis dataset. These statistics indicate that both datasets are non-redundant.

2.2 Input representation

We used integer/label encoding, which represents each input category with an integer. We scanned approximately 2 M SMILES sequences collected from PubChem and compiled 64 labels (unique letters). For protein sequences, we scanned 550 K protein sequences from UniProt and extracted 25 categories (unique letters). Here we represent each label with a corresponding integer (e.g. 'C': 1, 'H': 2, 'N': 3, etc.). The label encoding for the example SMILES 'CN=C=O' is given below:

    [C N = C = O] = [1 3 63 1 63 5]

Protein sequences are encoded in a similar way using label encodings. Both SMILES and protein sequences have varying lengths. Hence, in order to create an effective representation form, we decided on fixed maximum lengths of 85 for SMILES and 1200 for protein sequences for Davis. To represent the components of KIBA, we chose a maximum length of 100 characters for SMILES and 1000 for protein sequences. We chose these maximum lengths based on the distributions illustrated in Figure 1B and C, so that the maximum lengths cover at least 80% of the proteins and 90% of the compounds in the datasets. Sequences longer than the maximum length are truncated, whereas shorter sequences are 0-padded.
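A sketch of this encoding in Python; the toy alphabet below is illustrative (the actual work compiles 64 SMILES labels and 25 protein labels), and the helper name is ours:

```python
import numpy as np

# Toy character alphabet for illustration; integer 0 is reserved for padding.
CHAR_TO_INT = {ch: i for i, ch in enumerate("CHNO=()[]#123456", start=1)}

def label_encode(seq, max_len):
    """Integer/label-encode a string, truncating at max_len and 0-padding."""
    codes = [CHAR_TO_INT.get(ch, 0) for ch in seq[:max_len]]
    return np.array(codes + [0] * (max_len - len(codes)), dtype=np.int32)

# Methyl isocyanate SMILES padded to the Davis maximum of 85 characters.
encoded = label_encode("CN=C=O", max_len=85)
print(encoded[:8])  # [1 3 5 1 5 4 0 0] under this toy alphabet
```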

2.3 Proposed model
In this study, we treated protein–ligand interaction prediction as a regression problem, aiming to predict the binding affinity scores. As the prediction model, we adopted a popular deep learning architecture, the Convolutional Neural Network (CNN). A CNN is an architecture that contains one or more convolutional layers, often followed by a pooling layer. A pooling layer down-samples the output of the previous layer and provides a way of generalizing the features learned by the filters. On top of the convolutional and pooling layers, the model is completed with one or more fully connected (FC) layers. The most powerful feature of CNN models is their ability to capture local dependencies with the help of filters. Therefore, the number and size of the filters in a CNN directly affect the type of features the model learns from the input. It is often reported that as the number of filters increases, the model becomes better at recognizing patterns (Kang et al., 2014).

We propose a CNN-based prediction model that comprises two separate CNN blocks, which aim to learn representations from SMILES strings and protein sequences, respectively. For each CNN block, we used three consecutive 1D-convolutional layers with an increasing number of filters: the second layer had double and the third layer had triple the number of filters in the first one. The convolutional layers were followed by a max-pooling layer. The final features of the max-pooling layers were concatenated and fed into three FC layers, a block which we named DeepDTA. We used 1024 nodes in the first two FC layers, each followed by a dropout layer with rate 0.1. Dropout is a regularization technique used to avoid over-fitting by setting the activations of some of the neurons to 0 (Srivastava et al., 2014). The third layer consisted of 512 nodes and was followed by the output layer. The proposed model that combines two CNN blocks is illustrated in Figure 2.

Fig. 2. DeepDTA model with two CNN blocks to learn from compound SMILES and protein sequences.

As the activation function, we used the Rectified Linear Unit (ReLU) (Nair and Hinton, 2010), g(x) = \max(0, x), which has been widely used in deep learning studies (LeCun et al., 2015). A learning model tries to minimize the difference between the expected (real) value and the prediction during training. Since we work on a regression task, we used mean squared error (MSE) as the loss function, in which P is the prediction vector, Y corresponds to the vector of actual outputs and n indicates the number of samples:

    MSE = \frac{1}{n}\sum_{i=1}^{n}(P_i - Y_i)^2    (2)

The learning was completed over 100 epochs, and a mini-batch size of 256 was used to update the weights of the network. Adam was used as the optimization algorithm to train the networks (Kingma and Ba, 2015) with the default learning rate of 0.001. We used Keras' Embedding layer to represent characters with 128-dimensional dense vectors. The input for the Davis dataset consisted of (85, 128) and (1200, 128) dimensional matrices for the compounds and proteins, respectively. We represented the KIBA dataset with a (100, 128) dimensional matrix for the compounds and a (1000, 128) dimensional matrix for the proteins.
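The architecture and training settings above translate into the following Keras sketch. This is our reconstruction from the description, not the authors' released code; the kernel lengths are illustrative picks from the search ranges in Table 2:

```python
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          Dense, Dropout, concatenate)
from keras.models import Model

def build_deepdta(smi_len=85, seq_len=1200, smi_vocab=64, seq_vocab=25,
                  n_filters=32, smi_kernel=4, seq_kernel=8):
    """Sketch of DeepDTA with Davis-sized inputs (assumed hyper-parameters)."""
    # Compound branch: embedding + three 1D convolutions (32, 64, 96 filters).
    smi_in = Input(shape=(smi_len,))
    x = Embedding(smi_vocab + 1, 128)(smi_in)   # 128-dim dense characters
    for mult in (1, 2, 3):
        x = Conv1D(n_filters * mult, smi_kernel, activation='relu')(x)
    x = GlobalMaxPooling1D()(x)                 # max-pooling over positions

    # Protein branch: same structure, with its own filter length.
    seq_in = Input(shape=(seq_len,))
    y = Embedding(seq_vocab + 1, 128)(seq_in)
    for mult in (1, 2, 3):
        y = Conv1D(n_filters * mult, seq_kernel, activation='relu')(y)
    y = GlobalMaxPooling1D()(y)

    # Concatenate and feed the FC block: 1024, 1024 (dropout 0.1), then 512.
    z = concatenate([x, y])
    z = Dropout(0.1)(Dense(1024, activation='relu')(z))
    z = Dropout(0.1)(Dense(1024, activation='relu')(z))
    z = Dense(512, activation='relu')(z)
    out = Dense(1)(z)                           # predicted binding affinity

    model = Model(inputs=[smi_in, seq_in], outputs=out)
    # Adam with its default learning rate (0.001) and the MSE loss of Eq. (2).
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
```

Training would then follow the stated recipe, e.g. `model.fit([smiles_batch, protein_batch], affinities, epochs=100, batch_size=256)`.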
3 Experiments and results
Here, we propose a novel drug–target binding affinity prediction method based only on sequence information of compounds and proteins. We utilized the Concordance Index (CI) to measure the performance of the proposed model and compared it with the current state-of-the-art methods that we chose as our baselines, namely a Kronecker Regularized Least Squares (KronRLS) based approach (Pahikkala et al., 2014) and SimBoost (He et al., 2017). We provide more information about these baseline methodologies, our model and experimental setup, as well as our results, in the following subsections.

3.1 Baselines
3.1.1 Kron-RLS
KronRLS aims to minimize the following function, where f is the prediction function (Pahikkala et al., 2014):

    J(f) = \sum_{i=1}^{m}(y_i - f(x_i))^2 + \lambda\|f\|_k^2    (3)

where \|f\|_k^2 is the norm of f, which is related to the kernel function k, and \lambda > 0 is a regularization hyper-parameter defined by the user. A minimizer for Equation (3) can be defined as follows (Kimeldorf and Wahba, 1971):

    f(x) = \sum_{i=1}^{m} a_i k(x, x_i)    (4)

where k is the kernel function. In order to represent compounds, Pahikkala et al. utilized a similarity matrix computed using the PubChem structure clustering server (PubChem Sim, http://pubchem.ncbi.nlm.nih.gov), a tool that uses single-linkage clustering and 2D properties of the compounds to measure their similarity. As for proteins, the Smith–Waterman algorithm was used to construct a protein similarity matrix (Smith and Waterman, 1981).
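For intuition, the minimizer of Equation (3) has a closed form: stacking Equation (4) into matrix notation gives coefficients a = (K + \lambda I)^{-1} y. A toy NumPy sketch of this idea follows; it is our own illustration, and a real KronRLS implementation exploits the Kronecker structure for efficiency rather than forming the full pair kernel:

```python
import numpy as np

def rls_fit(K, y, lam=1.0):
    """Solve (K + lam*I) a = y, the closed-form minimizer of Equation (3);
    predictions then follow Equation (4): f(x) = sum_i a_i k(x, x_i)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

rng = np.random.default_rng(0)
Kd = rng.random((4, 4)); Kd = Kd @ Kd.T   # toy compound kernel (PSD)
Kt = rng.random((3, 3)); Kt = Kt @ Kt.T   # toy protein kernel (PSD)

# The Kronecker product of the two kernels scores every drug-target pair.
K_pair = np.kron(Kd, Kt)                  # (4*3) x (4*3) pair kernel
alpha = rls_fit(K_pair, rng.random(12), lam=0.5)
```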

3.1.2 SimBoost
SimBoost is a gradient boosting machine based method that depends on features constructed from drugs, targets and drug–target pairs (He et al., 2017). The methodology uses feature engineering to build three types of features: (i) object-based features that utilize occurrence statistics and pairwise similarity information of drugs and targets; (ii) network-based features such as neighbor statistics, network metrics (betweenness, closeness, etc.) and PageRank scores, which are collected from the respective drug–drug and target–target networks (in a drug–drug network, drugs are represented as nodes and connected to each other if the similarity of the two drugs is above a user-defined threshold; the target–target network is constructed in a similar way); and (iii) network-based features collected from a heterogeneous network (drug–target network) in which a node can be either a drug or a target, and drug nodes and target nodes are connected to each other via binding affinity values. In addition to the network metrics, neighbor statistics and PageRank scores, latent vectors from matrix factorization are also included for this type of network.

These features are fed into a supervised learning method named gradient boosting regression trees (Chen and Guestrin, 2016; Chen and He, 2015), derived from the gradient boosting machine model (Friedman, 2001). With gradient boosting regression trees, for a given drug–target pair dt_i, the binding affinity score \hat{y}_i is predicted as follows (He et al., 2017):

    \hat{y}_i = h(dt_i) = \sum_{m=1}^{M} f_m(dt_i), \quad f_m \in F    (5)

in which M denotes the number of regression trees and F represents the space of all possible trees. A regularized objective function to learn the set of trees f_m is described in the following form (He et al., 2017):

    R(h) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{m} \Omega(f_m)    (6)

where l is the loss function that measures the difference between the actual binding affinity value y_i and the predicted value \hat{y}_i, while \Omega is the regularization term that controls the complexity of the model. The details are described in (Chen and Guestrin, 2016; Chen and He, 2015; He et al., 2017). Similar to Pahikkala et al. (2014), He et al. (2017) also used the PubChem clustering server for drug similarity and Smith–Waterman for protein similarity computation.
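Under the hood this is the familiar boosted-ensemble prediction of Equation (5) trained against the regularized objective of Equation (6). A minimal sketch with XGBoost (the library behind Chen and Guestrin, 2016); the random feature matrix is a stand-in for SimBoost's engineered features, not their actual pipeline:

```python
import numpy as np
from xgboost import XGBRegressor  # gradient boosted regression trees

# Hypothetical engineered feature matrix: one row per drug-target pair
# (similarity statistics, network metrics, latent factors, ...).
rng = np.random.default_rng(0)
X = rng.random((500, 40))          # 500 pairs, 40 engineered features
y = rng.random(500) * 5 + 5        # stand-in affinity scores

# Each boosting round adds one regression tree f_m as in Equation (5);
# XGBoost minimizes a regularized objective of the form of Equation (6).
booster = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1)
booster.fit(X, y)
y_hat = booster.predict(X[:5])
```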
3.2 Evaluation metrics
To evaluate the performance of a model that outputs continuous values, the Concordance Index (CI) was used (Gönen and Heller, 2005):

    CI = \frac{1}{Z}\sum_{d_i > d_j} h(b_i - b_j)    (7)

where b_i is the prediction value for the larger affinity d_i, b_j is the prediction value for the smaller affinity d_j, Z is a normalization constant and h(x) is the step function (Pahikkala et al., 2014):

    h(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0.5, & \text{if } x = 0 \\ 0, & \text{if } x < 0 \end{cases}    (8)

The metric measures whether the predicted binding affinity values of two random drug–target pairs were predicted in the same order as their true values. We used the paired t-test for the statistical significance tests, with a 95% confidence interval. We also used MSE, which was explained in Section 2.3, as an evaluation metric.
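A direct, O(n^2) reading of Equations (7) and (8) in Python; this is an illustrative implementation, not the evaluation script used in the paper:

```python
import numpy as np

def concordance_index(y_true, y_pred):
    """Equations (7)-(8): over all pairs with different true affinities,
    credit 1 for correctly ordered predictions and 0.5 for ties."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num, Z = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:          # the pair (d_i, d_j)
                Z += 1
                diff = y_pred[i] - y_pred[j]   # b_i - b_j
                num += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return num / Z

print(concordance_index([1, 2, 3, 4], [1, 3, 2, 4]))  # 5/6 ~ 0.83
```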
3.3 Experiment setup
We evaluated the performance of the proposed model on the benchmark datasets (Davis et al., 2011; Tang et al., 2014) similarly to He et al. (2017), who used nested cross-validation to decide on the best parameters for each test set. In order to learn a generalized model, we randomly divided our dataset into six equal parts, of which one part was selected as the independent test set. The remaining parts of the dataset were used to determine the hyper-parameters via 5-fold cross validation. Figure 3 illustrates the partitioning of the dataset. The same setting with the same train and test folds was used for KronRLS (Pahikkala et al., 2014) and SimBoost (He et al., 2017) for a fair comparison.

Fig. 3. Experiment setup.

We decided on three hyper-parameters for our model, namely the number of filters (the same for proteins and compounds), the filter length for compounds, and the filter length for proteins. We opted to experiment with different filter lengths for compounds and proteins instead of a common length, due to the fact that they have different alphabets. The hyper-parameter combination that provided the best average CI score over the validation set was selected as the best combination for modeling the test set. We first experimented with hyper-parameters chosen from a wide range and then fine-tuned the model. For example, to determine the number of filters we performed a search over [16, 32, 64, 128, 512]. We then narrowed the search range around the best performing parameter (e.g. if 16 was chosen as the best parameter, the range was updated to [4, 8, 16, 20], etc.). As explained in the Proposed Model subsection, the second convolution layer was set to contain twice the number of filters of the first layer, and the third to contain three times that number. 32 filters gave the best results over the cross-validation experiments. Therefore, in the final model, each CNN block consisted of three 1D convolutions with 32, 64 and 96 filters. For all test results reported in Table 3, we used the same structure summarized in Table 2, except for the lengths of the fine-tuned filters used for the compound CNN block and the protein CNN block.
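The coarse-to-fine search described above can be sketched as a plain grid loop; `cv_ci` is a hypothetical helper that trains DeepDTA on the five cross-validation folds and returns the mean validation CI:

```python
from itertools import product

# Illustrative coarse grid mirroring the ranges in the text and Table 2.
grid = {
    'n_filters': [16, 32, 64, 128, 512],
    'smi_kernel': [4, 6, 8],      # filter length for compounds
    'seq_kernel': [4, 8, 12],     # filter length for proteins
}

def grid_search(cv_ci, grid):
    """Return the combination with the best mean validation CI."""
    best_params, best_ci = None, -1.0
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        ci = cv_ci(params)        # hypothetical 5-fold CV evaluation
        if ci > best_ci:
            best_params, best_ci = params, ci
    return best_params, best_ci
```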
Table 2. Parameter settings for the CNN-based DeepDTA model

Parameters                   Range
Number of filters            32*1; 32*2; 32*3
Filter length (compounds)    [4, 6, 8]
Filter length (proteins)     [4, 8, 12]
Epoch                        100
Hidden neurons               1024; 1024; 512
Batch size                   256
Dropout                      0.1
Optimizer                    Adam
Learning rate (lr)           0.001

In order to provide a more robust performance measure, we evaluated the performance over the independent test set that was initially left out (the blue part in Figure 3). We utilized the same five training sets that we used in 5-fold cross validation to train the model with the learned parameters in Table 2 (note that the validation sets were not used, yielding only four green parts for each training set). The final CI score was reported as the average of these five results. Keras (Chollet et al., 2015) with a Tensorflow (Abadi et al., 2016) back-end was used as the development framework. Our experiments were run on OpenSuse 13.2 [3.50 GHz Intel(R) Xeon(R) and GeForce GTX 1070 (8 GB)]. The work was accelerated by running on GPU with cuDNN (Chetlur et al., 2014). We provide our source code as well as the train and test folds of the datasets (https://github.com/hkmztrk/DeepDTA/).
3.4 Results
In this study, we propose a deep-learning model that uses two CNN blocks to learn representations for drugs and targets based on their sequences. As baselines for comparison, we used the KronRLS algorithm and the SimBoost method, which take similarity matrices of proteins and compounds as input. The S–W and PubChem Sim algorithms were used to compute the pairwise similarities for the proteins and ligands, respectively. We then used these S–W and PubChem Sim similarity scores as inputs to the FC part of our model (DeepDTA) to evaluate the model. Finally, we used three alternative combinations to learn the hidden patterns of the data and used this information as input to our DeepDTA model. The combinations were: (i) learning only the compound representation with a CNN block and using S–W similarity as the protein representation; (ii) learning only the protein sequence representation with a CNN block and using PubChem Sim to describe compounds; and (iii) learning both the protein and compound representations with CNN blocks. We call the last combination used with DeepDTA the combined model.

Table 3. The average CI and MSE scores of the test set trained on five different training sets for the Davis dataset

                                    Proteins    Compounds      CI (std)        MSE
KronRLS (Pahikkala et al., 2014)    S–W         Pubchem Sim    0.871 (0.0008)  0.379
SimBoost (He et al., 2017)          S–W         Pubchem Sim    0.872 (0.002)   0.282
DeepDTA                             S–W         Pubchem Sim    0.790 (0.009)   0.608
DeepDTA                             CNN         Pubchem Sim    0.835 (0.005)   0.419
DeepDTA                             S–W         CNN            0.886 (0.008)   0.420
DeepDTA                             CNN         CNN            0.878 (0.004)   0.261

Note: The standard deviations are given in parentheses.

Table 4. The average CI and MSE scores of the test set trained on five different training sets for the KIBA dataset

                                    Proteins    Compounds      CI (std)        MSE
KronRLS (Pahikkala et al., 2014)    S–W         Pubchem Sim    0.782 (0.0009)  0.411
SimBoost (He et al., 2017)          S–W         Pubchem Sim    0.836 (0.001)   0.222
DeepDTA                             S–W         Pubchem Sim    0.710 (0.002)   0.502
DeepDTA                             CNN         Pubchem Sim    0.718 (0.004)   0.571
DeepDTA                             S–W         CNN            0.854 (0.001)   0.204
DeepDTA                             CNN         CNN            0.863 (0.002)   0.194

Note: The standard deviations are given in parentheses.

Tables 3 and 4 report the average MSE and CI scores over the independent test set for the five models trained with the same parameters (shown in Table 2) using the five different training sets for the Davis and KIBA datasets.

On the Davis dataset, the SimBoost and KronRLS methods perform similarly, while the CI value for SimBoost is higher than that for KronRLS on the larger KIBA dataset. When the similarity measures S–W, for proteins, and PubChem Sim, for compounds, are used with the fully connected part of the neural network (DeepDTA), the CI drops to 0.79 for the Davis dataset and to 0.71 for the KIBA dataset, and the MSE increases to above 0.5. These results suggest that the use of a feed-forward neural network with predefined features is not sufficient to describe drug–target interactions and to predict drug–target affinities. Therefore, we used CNN layers to learn representations of drugs and proteins to capture hidden patterns in the datasets.

We first used a CNN to learn representations of proteins and used the predefined PubChem Sim scores for the ligands. Using this combination did not improve the results, suggesting that the CNN architecture alone is not effective enough to learn from amino acid sequences.

Then we used the CNN block to learn compound representations from SMILES and used the predefined S–W scores for the proteins. This combination outperformed the baselines on the KIBA dataset with statistical significance (P-value of 0.0001 for both SimBoost and KronRLS), and on the Davis dataset (P-value of around 0.03 for both SimBoost and KronRLS). These results suggest that the CNN is able to capture more information than PubChem Sim in the compound representation task.

Motivated by this result, we tested the combined CNN model, in which both protein and compound representations are learned from CNN layers. This method performed as well as the baseline methods, with a CI score of 0.878 on the Davis dataset, and achieved the best CI score (0.863) on the KIBA dataset with statistical significance over both baselines (P-value of 0.0001 for both). The MSE values of this model were also notably lower than those of the baseline models on both datasets. Even though learning protein representations with a CNN alone was not effective, the combination of the two CNN blocks for proteins and ligands provided a strong model.
In an effort to provide a better assessment of our model, we also measured the performance of DeepDTA with two CNN modules and of the two baseline methods with two additional metrics. The r_m^2 index can be used to evaluate the external predictive performance of QSAR models, where an r_m^2 value > 0.5 on the test set indicates an acceptable model. The metric is described in Equation (9), where r^2 and r_0^2 are the squared correlation coefficients with and without intercept, respectively. The details of the formulation are explained in (Pratim Roy et al., 2009; Roy et al., 2013).

    r_m^2 = r^2\left(1 - \sqrt{r^2 - r_0^2}\right)    (9)

The Area Under the Precision–Recall curve (AUPR) score is adopted by many studies that perform binary prediction. In order to measure AUPR-based performance, we converted the quantitative datasets into binary datasets by selecting binding affinity thresholds. For the Davis dataset we used a pKd value of 7 as the threshold (pKd ≥ 7 binds), similar to He et al. (2017). For the KIBA dataset we used the suggested threshold KIBA value of 12.1 (He et al., 2017; Tang et al., 2014).

Table 5. The average r_m^2 and AUPR scores of the test set trained on five different training sets for the Davis dataset

                                    Proteins    Compounds      r_m^2 (std)     AUPR (std)
KronRLS (Pahikkala et al., 2014)    S–W         Pubchem Sim    0.407 (0.005)   0.661 (0.010)
SimBoost (He et al., 2017)          S–W         Pubchem Sim    0.644 (0.006)   0.709 (0.008)
DeepDTA                             CNN         CNN            0.630 (0.017)   0.714 (0.010)

Note: The standard deviations are given in parentheses.

Table 6. The average r_m^2 and AUPR scores of the test set trained on five different training sets for the KIBA dataset

                                    Proteins    Compounds      r_m^2 (std)     AUPR (std)
KronRLS (Pahikkala et al., 2014)    S–W         Pubchem Sim    0.342 (0.001)   0.635 (0.004)
SimBoost (He et al., 2017)          S–W         Pubchem Sim    0.629 (0.007)   0.760 (0.003)
DeepDTA                             CNN         CNN            0.673 (0.009)   0.788 (0.004)

Note: The standard deviations are given in parentheses.

Tables 5 and 6 depict the performance of DeepDTA with two CNN modules and the two baseline methods on the Davis and KIBA datasets, respectively.
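Both metrics are straightforward to compute. A sketch under our own conventions: the regression-through-origin form of r_0^2 follows the Roy et al. formulation as we read it, and scikit-learn's average precision stands in for AUPR:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def rm_squared(y_true, y_pred):
    """r_m^2 of Equation (9); r0^2 uses a least-squares fit through the origin."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2            # with intercept
    k = np.sum(y_true * y_pred) / np.sum(y_pred ** 2)      # slope through origin
    r02 = 1 - np.sum((y_true - k * y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # abs() guards against tiny negative differences from numerical noise.
    return r2 * (1 - np.sqrt(abs(r2 - r02)))

def aupr(y_true, y_pred, threshold):
    """Binarize true affinities at the threshold (pKd >= 7 for Davis,
    KIBA score >= 12.1 for KIBA) and score predictions by AUPR."""
    return average_precision_score(np.asarray(y_true) >= threshold, y_pred)
```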
The results suggest that both SimBoost and DeepDTA are acceptable models for affinity prediction in terms of r_m^2 value, and that DeepDTA performs significantly better than SimBoost on the KIBA dataset in terms of both r_m^2 (P-value of 0.0001) and AUPR (P-value of 0.0003).

Figure 4 illustrates the predicted against measured (actual) binding affinity values for the Davis and KIBA datasets. A perfect model is expected to produce a p = y line, where the predictions (p) are equal to the measured (y) values. We observe that, especially for the KIBA dataset, the density is high around the p = y line.

Fig. 4. Predictions from the DeepDTA model with two CNN blocks against measured (real) binding affinity values for the Davis (pKd) and KIBA (KIBA score) datasets.

We also provide plots for two sample targets from the KIBA dataset with predictions against actual values in Supplementary Figures S1 and S2.

4 Conclusion
We propose a deep-learning based approach to predict drug–target binding affinity using only the sequences of proteins and drugs. We use Convolutional Neural Networks (CNNs) to learn representations from the raw sequence data of proteins and drugs, and fully connected layers (DeepDTA) for the affinity prediction task. We compare the performance of the proposed model with two recent studies that employed the KronRLS regression algorithm (Pahikkala et al., 2014) and the SimBoost method (He et al., 2017) as our baselines. We perform our experiments on the Davis kinase–drug dataset and the KIBA dataset.

Our results showed that the use of predefined features with DeepDTA is not sufficient to describe protein–ligand interactions. However, when two CNN blocks that learn representations of proteins and drugs based on raw sequence data are used in conjunction with DeepDTA, the performance increases significantly compared to both baseline methodologies for both the KIBA and Davis datasets. Furthermore, the model that uses a CNN to learn compound representations from SMILES together with S–W similarities of proteins also achieves better performance than the baselines.

We observed that the model that uses a CNN block to learn proteins and 2D compound similarity to represent compounds performed poorly compared to the other methods that employ CNNs. This might be an indication that amino acids require a structure that can handle their ordered relationships, which the CNN architecture failed to capture successfully. Long Short-Term Memory (LSTM), which is a special type of Recurrent Neural Network (RNN), could be a more suitable approach to learn from protein sequences, since the architecture has memory blocks that allow effective learning from a long sequence. The LSTM architecture has been successfully
employed in tasks such as detecting homology (Hochreiter et al., 2007), constructive peptide design (Muller et al., 2018) and function prediction (Liu, 2017) that utilize amino acid sequences. As future work, we also aim to utilize a recent ligand-based protein representation method proposed by our team that uses the SMILES sequences of the interacting ligands to describe proteins (Öztürk et al., 2018).

The results indicated that deep-learning based methodologies performed notably better than the baseline methods, with statistical significance, when the dataset grows in size: the KIBA dataset is four times larger than the Davis dataset, and the improvement over the baseline was significantly higher for the KIBA dataset (from a CI score of 0.836 to 0.863) compared to the Davis dataset (from a CI score of 0.872 to 0.878). The increase in data enables the deep learning architectures to capture the hidden information better.

The major contribution of this study is the presentation of a novel deep learning-based model for drug–target affinity prediction that uses only character representations of proteins and drugs. By simply using raw sequence information for both drugs and targets, we were able to achieve similar or better performance than the baseline methods, which depend on multiple different tools and algorithms to extract features.

A large percentage of proteins remains untargeted, either due to bias in the drug discovery field toward a select group of proteins or due to their undruggability, and this untapped pool of proteins has gained interest with protein deorphanizing efforts (Edwards et al., 2011; Fedorov et al., 2010; O'Meara et al., 2016). As future work, we will focus on building an effective representation for protein sequences. The methodology can then be extended to predict the affinity of known compounds/targets to novel targets/drugs, as well as the affinity of novel drug–target pairs.

Acknowledgements
TUBITAK-BIDEB 2211-E Scholarship Program (to H.O.) and the BAGEP Award of the Science Academy (to A.O.) are gratefully acknowledged. We thank Ethem Alpaydın, Attila Gürsoy and Pınar Yolum for the helpful discussions.

Funding
This work was supported by Bogazici University Research Fund (BAP) Grant Number 12304.

Conflict of Interest: none declared.

References
Abadi,M. et al. (2016) Tensorflow: a system for large-scale machine learning. In: OSDI, Vol. 16, pp. 265–283.
Apweiler,R. et al. (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res., 32(Suppl. 1), D115–D119.
Ballester,P.J. and Mitchell,J.B. (2010) A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics, 26, 1169–1175.
Bleakley,K. and Yamanishi,Y. (2009) Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics, 25, 2397–2403.
Bolton,E.E. et al. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem., 4, 217–241.
Cao,D.-S. et al. (2012) Large-scale prediction of drug–target interactions using protein sequences and drug topological structures. Anal. Chim. Acta, 752, 1–10.
Cao,D.-S. et al. (2014) Computational prediction of drug–target interactions using chemical, biological, and network features. Mol. Inform., 33, 669–681.
Cer,R.Z. et al. (2009) IC50-to-Ki: a web-based tool for converting IC50 to Ki values for inhibitors of enzyme activity and ligand binding. Nucleic Acids Res., 37, W441–W445.
Chan,K.C. et al. (2016) Large-scale prediction of drug–target interactions from deep representations. In: 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada. IEEE, pp. 1236–1243.
Chen,T. and Guestrin,C. (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA. ACM, pp. 785–794.
Chen,T. and He,T. (2015) Higgs boson discovery with boosted trees. In: NIPS 2014 Workshop on High-Energy Physics and Machine Learning, Montreal, Canada, pp. 69–80.
Chetlur,S. et al. (2014) cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759.
Chollet,F. et al. (2015) Keras. https://github.com/fchollet/keras.
Ciregan,D. et al. (2012) Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island. IEEE, pp. 3642–3649.
Cobanoglu,M.C. et al. (2013) Predicting drug–target interactions using probabilistic matrix factorization. J. Chem. Inf. Model., 53, 3399–3409.
Dahl,G.E. et al. (2012) Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 30–42.
Davis,M.I. et al. (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol., 29, 1046–1051.
Donahue,J. et al. (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML, Beijing, China, pp. 647–655.
Edwards,A.M. et al. (2011) Too many roads not taken. Nature, 470, 163.
Fedorov,O. et al. (2010) The (un)targeted cancer kinome. Nat. Chem. Biol., 6, 166.
Friedman,J.H. (2001) Greedy function approximation: a gradient boosting machine. Ann. Stat., 29, 1189–1232.
Gabel,J. et al. (2014) Beware of machine learning-based scoring functions on the danger of developing black boxes. J. Chem. Inf. Model., 54, 2807–2815.
Gomes,J. et al. (2017) Atomic convolutional networks for predicting protein–ligand binding affinity. arXiv preprint arXiv:1703.10603.
Gómez-Bombarelli,R. et al. (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci., 4, 268–276.
Gönen,M. (2012) Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics, 28, 2304–2310.
Gönen,M. and Heller,G. (2005) Concordance probability and discriminatory power in proportional hazards regression. Biometrika, 92, 965–970.
Graves,A. et al. (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada. IEEE, pp. 6645–6649.
Hamanaka,M. et al. (2016) CGBVS-DNN: prediction of compound–protein interactions based on deep learning. Mol. Inform., 36. doi: 10.1002/minf.201600045.
He,T. et al. (2017) SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines. J. Cheminform., 9, 24.
Hinton,G. et al. (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag., 29, 82–97.
Hochreiter,S. et al. (2007) Fast model-based protein homology detection without alignment. Bioinformatics, 23, 1728–1736.
Jastrzkeski,S. et al. (2016) Learning to SMILE(S). arXiv preprint arXiv:1602.06289.
Kang,L. et al. (2014) Convolutional neural networks for no-reference image quality assessment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, pp. 1733–1740.
Kimeldorf,G. and Wahba,G. (1971) Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
Kingma,D. and Ba,J. (2015) Adam: a method for stochastic optimization. In: 3rd International Conference for Learning Representations, San Diego.
LeCun,Y. et al. (2015) Deep learning. Nature, 521, 436–444.
Leung,M.K. et al. (2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129.
Li,H. et al. (2015) Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules, 20, 10947–10962.
Liu,X. (2017) Deep recurrent neural network for protein function prediction from sequence. arXiv preprint arXiv:1701.08318.
Ma,J. et al. (2015) Deep neural nets as a method for quantitative structure–activity relationships. J. Chem. Inf. Model., 55, 263–274.
Muller,A.T. et al. (2018) Recurrent neural network model for constructive peptide design. J. Chem. Inf. Model., 58, 472–479.
Nair,V. and Hinton,G.E. (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, pp. 807–814.
O'Meara,M.J. et al. (2016) Ligand similarity complements sequence, physical interaction, and co-expression for gene function prediction. PLoS One, 11, e0160098.
Oprea,T. and Mestres,J. (2012) Drug repurposing: far beyond new targets for old drugs. AAPS J., 14, 759–763.
Öztürk,H. et al. (2016) A comparative study of SMILES-based compound similarity functions for drug–target interaction prediction. BMC Bioinformatics, 17, 128.
Öztürk,H. et al. (2018) A novel methodology on distributed representations of proteins using their interacting ligands. Bioinformatics, 34, i295–i303.
Pahikkala,T. et al. (2014) Toward more realistic drug–target interaction predictions. Brief. Bioinformatics, 16, 325–327.
Pratim Roy,P. et al. (2009) On two novel parameters for validation of predictive QSAR models. Molecules, 14, 1660–1701.
Ragoza,M. et al. (2017) Protein–ligand scoring with convolutional neural networks. J. Chem. Inf. Model., 57, 942–957.
Rose,P.W. et al. (2016) The RCSB Protein Data Bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res., 45, D271–D281.
Roy,K. et al. (2013) Some case studies on application of 'rm2' metrics for judging quality of quantitative structure–activity relationship predictions: emphasis on scaling of response data. J. Comput. Chem., 34, 1071–1082.
Shar,P.A. et al. (2016) Pred-binding: large-scale protein–ligand binding affinity prediction. J. Enzyme Inhib. Med. Chem., 31, 1443–1450.
Simonyan,K. and Zisserman,A. (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR), San Diego, May 7–9, 2015.
Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.
Srivastava,N. et al. (2014) Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.
Tang,J. et al. (2014) Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J. Chem. Inf. Model., 54, 735–743.
Tian,K. et al. (2015) Boosting compound–protein interaction prediction by deep learning. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA. IEEE, pp. 29–34.
van Laarhoven,T. et al. (2011) Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics, 27, 3036–3043.
Wallach,I. et al. (2015) AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv preprint arXiv:1510.02855.
Wan,F. and Zeng,J. (2016) Deep learning with feature embedding for compound–protein interaction prediction. bioRxiv. doi: 10.1101/086033.
Wang,L. et al. (2017) A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network. J. Comput. Biol., 25, 361–373.
Wen,M. et al. (2017) Deep-learning-based drug–target interaction prediction. J. Proteome Res., 16, 1401–1409.
Xiong,H.Y. et al. (2015) The human splicing code reveals new insights into the genetic determinants of disease. Science, 347, 1254806.
Yamanishi,Y. et al. (2008) Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24, i232–i240.
