
Bioinformatics, 36(16), 2020, 4406–4414
doi: 10.1093/bioinformatics/btaa524
Advance Access Publication Date: 19 May 2020
Original Paper

Structural bioinformatics

TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments

Lifan Chen (1,2), Xiaoqin Tan (1,2), Dingyan Wang (1,2), Feisheng Zhong (1,2), Xiaohong Liu (1,3), Tianbiao Yang (1,2), Xiaomin Luo (1), Kaixian Chen (1,3), Hualiang Jiang (1,3,*) and Mingyue Zheng (1,*)

(1) Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; (2) University of Chinese Academy of Sciences, Beijing 100049, China; (3) Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China

*To whom correspondence should be addressed.
Associate Editor: Arne Elofsson
Received on February 26, 2020; revised on April 13, 2020; editorial decision on May 12, 2020; accepted on May 14, 2020

Abstract

Motivation: Identifying compound–protein interactions (CPIs) is a crucial task in drug discovery and chemogenomics studies, and proteins without three-dimensional structure account for a large part of potential biological targets, which requires developing methods that use only protein sequence information to predict CPI. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias and splitting datasets inappropriately, resulting in overestimation of their prediction performance.

Results: To address these issues, we here constructed new datasets specific for CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may provide chemical biology studies with useful guidance for further ligand structural optimization.

Availability and implementation: https://github.com/lifanchen-simm/transformerCPI

Contact: [email protected] or [email protected]

1 Introduction

Identifying compound–protein interactions (CPIs) plays an important role in discovering hit compounds (Vamathevan et al., 2019). Conventional methods, such as structure-based and ligand-based virtual screening, have been studied for decades and have achieved great success in drug discovery. However, some cases are not suitable for conventional screening methods, for example when the protein three-dimensional (3D) structure is unknown or the set of known ligands is too small. Therefore, Bredel and Jacoby (2004) introduced a novel perspective called chemogenomics to predict CPIs without protein 3D structures. A variety of machine learning based algorithms have been proposed since then, which consider compound information and protein information at the same time in a unified model (Bleakley and Yamanishi, 2009; Cheng et al., 2012; Gonen, 2012; Jacob and Vert, 2008; van Laarhoven et al., 2011; Wang et al., 2011; Wang and Zeng, 2013; Yamanishi et al., 2008).

With the rapid development of deep learning, many types of end-to-end frameworks have been utilized in CPI research. In comparison with traditional machine learning algorithms, end-to-end learning integrates representation learning and model training in a unified architecture, and no descriptors need to be defined and calculated before modeling. Although deep neural networks have been used in several CPI models, some current methods take predefined molecular fingerprints and protein descriptors as input features, which are fixed during the training process and contain less information than end-to-end learning (Hamanaka et al., 2017; Tian et al., 2016; Wan and Zeng, 2016).

Regarding the CPI problem as a binary classification task, compounds can be considered as 1D sequences or molecular graphs (i.e. traditionally called 2D structures), and protein sequences can be regarded as 1D sequences. DeepDTA (Ozturk et al., 2018) used convolutional neural networks (CNNs) to extract low-dimensional real-valued features of compounds and proteins, then concatenated the two feature vectors and passed them through fully connected layers to calculate the final output. WideDTA (Öztürk et al., 2019) and Conv-DTI (Lee et al., 2019) followed a similar idea, and WideDTA also utilized two extra features, ligand max common structures and protein motifs and domains, to improve model performance. From another perspective, which regards the compound structure as a molecular graph, CPI–GNN (Tsubaki et al., 2019) and GraphDTA (Nguyen et al., 2019) used graph neural networks (GNNs) (Scarselli et al., 2009) and graph convolutional networks (GCNs) (Kipf and Welling, 2016) instead of CNNs to learn the representation of compounds. In addition, recurrent neural networks were used to extract feature vectors of compounds and proteins in DeepAffinity (Karimi et al., 2019), and Gao et al. (2018) and Zheng et al. (2020) also treated compounds and proteins as sequential information.

Due to their high relevance to chemical biology and pharmaceutical chemistry, many novel models based on deep learning or machine learning have been developed, showing satisfactory performance on various datasets. However, much less effort has been devoted to evaluating their generalization ability in external tests or practical applications. Since deep learning is a data-driven technique, it is critical to understand what a model really learns and to avoid the influence of unexpected factors. Recently, Google researchers put forward three pitfalls to avoid in machine learning (Riley, 2019): splitting data inappropriately, hidden variables and mistaking the objective. Inspired by these warnings from the AI industry, we asked whether chemogenomics-based CPI modeling faces similar problems, and we summarize three specific issues below.

1.1 Using inappropriate datasets
Data are the core foundation of deep learning models: what a model learns mainly depends on the datasets it is fed, and inappropriate datasets make the model easily deviate from the goal. In chemogenomics-based CPI modeling, the general goal is to predict interactions between different proteins and different compounds based on a form of abstract representation of protein and ligand features. Therefore, interaction information is the key ingredient that the model should learn from the datasets. Considering chemogenomics-based CPI modeling as a binary classification task, a properly designed dataset should mainly consist of instances in which a specific ligand interacts with protein A but does not interact with protein B, which forces the model to learn protein information, or preferably the interaction features, to distinguish these instances. Such a properly designed dataset cannot be separated by information other than interaction features, such as ligand patterns.

Previous chemogenomics-based CPI prediction models were built on inappropriate datasets, such as the DUD-E dataset (Mysinger et al., 2012) and the Human dataset (Liu et al., 2015; Tsubaki et al., 2019), where the DUD-E dataset was collected with the intention of training structure-based virtual screening. Moreover, most ligands in DUD-E, MUV, Human and BindingDB occur in only one class, and negative samples were generated by algorithms that may introduce undetectable noise (Liu et al., 2015; Mysinger et al., 2012). These datasets can be separated by ligand information alone and cannot guarantee that models learn protein information or interaction features.

1.2 Hidden ligand bias
Deep learning systems are usually referred to as black-box models; thus it is difficult to interpret what exactly a model learns and on what basis it makes a prediction. Obtaining a better performance on the validation set and the test set usually marks the end of a study, and fewer efforts have been devoted to further investigating whether the model learns in the expected manner. The hidden ligand bias issue has been reported in the DUD-E and MUV datasets (Sieg et al., 2019), raising extensive concerns in the field of drug design. Structure-based virtual screening, 3D-CNN-based models (Chen et al., 2019) and other models trained on the DUD-E dataset (Sieg et al., 2019) have been shown to make predictions mainly based on ligand patterns rather than interaction features, leading to a mismatch between theoretical modeling and practical application. We wondered whether chemogenomics-based CPI modeling faces a similar problem, and thus revisited a typical previous model, CPI–GNN, trained on the Human dataset as an example to study the potential effects of hidden ligand bias.

Figure 1A shows the weight distribution plot of the CPI–GNN model trained on the Human dataset. The weights of the CNN blocks used to extract protein features are significantly concentrated at zero, which indicates that little protein information is considered when making predictions. In contrast, the weight distribution of the GNN blocks used to extract compound features is wide and flat. We therefore argue that ligand information plays an overwhelming role compared with protein information. Further training with ligand-only information and its comparison with the original model are shown in Figure 1B, where the dataset was randomly split 10 times and the two models were evaluated on 10 different trials. The P-value of a two-sample t-test for the difference of the AUC distributions is greater than 0.05, suggesting that using ligand information alone may achieve performance competitive with the original CPI–GNN model using both ligand and protein information. Thus, the CPI–GNN model mainly learns how to classify different ligands rather than different CPI pairs, which increases the risk that a ligand is always predicted to interact, or not to interact, with different proteins. These results highlight the possibility that ligand patterns can mislead the model.

Fig. 1. Common pitfalls analysis. (A) Weight distribution plot of the CPI–GNN model. The blue line depicts the weight distribution of the CNN blocks used to extract protein features; the orange line depicts the weight distribution of the GNN blocks used to extract compound features. (B) Violin plots of AUROC for two CPI–GNN models, one utilizing both protein and ligand information, and the other using ligand information only. The white dots are the average AUROCs. The upper and lower end points of the black segments are the first and the third quartile, respectively. Each model was evaluated on 10 different trials, and the P-value of the t-test (shown above the violin) was 0.067.

1.3 Splitting datasets inappropriately
The risk of hidden ligand bias is difficult to eliminate, but it can be reduced. Usually, machine learning researchers split data into training and test sets at random. However, using conventional classification measurements on a randomly split test set, we cannot tell whether the model learns true interaction features or other unexpected hidden variables, which may produce precise models that answer the wrong questions (Riley, 2019). Thus, test sets should be designed according to the real goal of modeling and its application scenario.
To address these pitfalls, we proposed a novel transformer neural network named TransformerCPI, constructed new datasets specific for CPI modeling, and introduced a more rigorous label reversal experiment to evaluate whether a data-driven model falls into these common pitfalls of AI. As a result, TransformerCPI achieved the best performance on three public datasets and two label reversal datasets. Moreover, we further studied the interpretability of TransformerCPI to uncover its underlying prediction mechanism by mapping attention weights back to protein sequences and compound molecules, and the results also confirmed that the self-attention mechanism of TransformerCPI is useful in capturing the desired interaction features. We hope that these findings may draw attention to improving the generalization and interpretation capability of CPI modeling.
2 Materials and methods

2.1 Model architecture of TransformerCPI
The model we propose is based on the transformer architecture (Vaswani et al., 2017), which was originally devised for neural machine translation tasks. The transformer is an autoregressive encoder–decoder model that uses a combination of multiheaded attention layers and position-wise feed-forward layers to solve sequence-to-sequence (seq2seq) tasks. Recently, the transformer architecture has achieved great success in language representation learning, and many novel and powerful pretraining models have been built on top of it, such as BERT (Devlin et al., 2019), GPT-2, Transformer-XL (Dai et al., 2019) and XLNet (Yang et al., 2019). The transformer has also been applied to chemical reaction prediction (Schwaller et al., 2019); however, it is still confined to seq2seq tasks. Inspired by its great ability to capture features between two sequences, we modified the transformer architecture to predict CPIs, regarding compounds and proteins as two kinds of sequences. An overview of the proposed TransformerCPI is shown in Figure 2, where we retained the decoder of the transformer and modified its encoder and final linear layers.

To convert protein sequences into a sequential representation, we first split each protein sequence into an overlapping 3-gram amino acid sequence, and then translated all words into real-valued embeddings with the pretraining approach word2vec (Mikolov et al., 2013a,b). Word2vec is an unsupervised technique to learn high-quality distributed vector representations that describe sophisticated syntactic and semantic word relationships. It comprises two pretraining techniques, Skip-Gram and Continuous Bag-of-Words (CBOW): Skip-Gram predicts the surrounding context from a given word, while CBOW predicts a word from its context. Integrating Skip-Gram and CBOW, word2vec can map words to low-dimensional real-valued vectors, where words with similar semantics map to vectors that are close to each other. There have been several works applying word2vec to represent protein sequences (Kimothi et al., 2016; Kobeissy et al., 2015; Mazzaferro, 2017; Yang et al., 2018), in which amino acid subsequences of constant length k (k-mers) were treated as words and the whole amino acid sequence was regarded as a document. We followed these works to preprocess protein sequences, included all human protein sequences in UniProt as the corpus to pretrain the word2vec model, and set the embedding dimension to 100. After training the word2vec model for 30 epochs on this corpus, protein sequences can be converted into sequences of real-valued 100-dimensional vectors.
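As a concrete illustration, the following is a minimal sketch of this preprocessing step using the Gensim Word2Vec API (the paper reports Gensim 3.4.0, 100-dimensional vectors and 30 training epochs); the toy corpus and training flags here are assumptions, not the authors' exact script.

```python
from gensim.models import Word2Vec

def to_trigrams(seq):
    """Split a protein sequence into overlapping 3-gram 'words'."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

# Hypothetical corpus: in the paper, all human protein sequences from UniProt.
corpus = ["MTEYKLVVVGAGGVGKSALTIQLIQNHF", "MKTAYIAKQRQISFVKSHFSRQLEERLG"]
sentences = [to_trigrams(s) for s in corpus]

# 'size' and 'iter' are the Gensim 3.x argument names (vector_size/epochs in 4.x).
model = Word2Vec(sentences, size=100, window=5, min_count=1, sg=1, iter=30)

# A protein becomes a sequence of 100-dimensional vectors, one per 3-gram.
protein_matrix = [model.wv[w] for w in to_trigrams(corpus[0])]
```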
Sequential feature vectors of proteins were then passed to the encoder to learn more abstract representations. Of note, here we replaced the original self-attention layers in the encoder with a relatively simple structure. Considering that the conventional transformer architecture usually requires a large training corpus and easily overfits on small or modestly sized datasets (Qiu et al., 2020), we instead used a gated convolutional network (Dauphin et al., 2016) with Conv1D and gated linear units, because it showed better performance on our designed datasets. The input to the gated convolutional network is a sequence of protein feature vectors, and we compute the hidden layers h_0, ..., h_L as in Equation (1):

    h_l(X) = (X W_1 + s) ⊗ σ(X W_2 + t),    (1)

where X ∈ R^{n×m1} is the input of layer h_l; W_1 ∈ R^{k×m1×m2}, s ∈ R^{m2}, W_2 ∈ R^{k×m1×m2} and t ∈ R^{m2} are learned parameters; L is the number of hidden layers; n is the sequence length; m1 and m2 are the dimensions of the input and hidden features, respectively; k is the patch size; σ is the sigmoid function; and ⊗ is the element-wise product between matrices (Dauphin et al., 2016). The output of the gated convolutional network is the final representation of the protein sequence, as shown in Figure 2. In our implementation, L is 3, m1 is 64, m2 is 128 and k is 7. The output of the encoder is the protein sequence p_1, p_2, ..., p_b, where b is the length of the protein sequence.
Each atom of a compound was initially represented as a feature vector of size 34 using the RDKit Python package; the atom features are summarized in Table 1. We then used GCNs to learn the representation of each atom by integrating the features of its neighboring atoms.

Table 1. List of compound atom features

Atom type: C, N, O, F, P, S, Cl, Br, I, other (one hot)
Degree of atom: 0, 1, 2, 3, 4, 5, 6 (one hot)
Formal charge: 0 or 1
Number of radical electrons: 0 or 1
Hybridization type: sp, sp2, sp3, sp3d, sp3d2, other (one hot)
Aromatic: 0 or 1
Number of hydrogen atoms attached: 0, 1, 2, 3, 4 (one hot)
Chirality: 0 (False) or 1 (True)
Configuration: R, S (one hot)
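The following is a minimal sketch of this 34-dimensional featurization with RDKit; the one-hot helper and the exact RDKit property calls are assumptions that mirror Table 1, not necessarily the authors' code.

```python
from rdkit import Chem

def one_hot(value, choices):
    return [int(value == c) for c in choices]

def atom_features(atom):
    return (
        one_hot(atom.GetSymbol(), ['C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'Br', 'I', 'other'])
        + one_hot(atom.GetDegree(), [0, 1, 2, 3, 4, 5, 6])
        + [int(atom.GetFormalCharge() != 0)]
        + [int(atom.GetNumRadicalElectrons() != 0)]
        + one_hot(str(atom.GetHybridization()), ['SP', 'SP2', 'SP3', 'SP3D', 'SP3D2', 'other'])
        + [int(atom.GetIsAromatic())]
        + one_hot(atom.GetTotalNumHs(), [0, 1, 2, 3, 4])
        + [int(atom.HasProp('_ChiralityPossible'))]
        + one_hot(atom.GetPropsAsDict().get('_CIPCode', ''), ['R', 'S'])
    )  # 10 + 7 + 1 + 1 + 6 + 1 + 5 + 1 + 2 = 34 dimensions

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin, as an example
features = [atom_features(a) for a in mol.GetAtoms()]
```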
The GCN was originally devised to solve the problem of semisupervised node classification, and it can be transferred to the molecular representation problem. We here denote the graph of a compound molecule as G = (V, E), where V ∈ R^{a×f} is the set of a atoms in the molecule, each represented as an f-dimensional feature vector, and E is the set of covalent bonds in the molecule, represented as an adjacency matrix A ∈ R^{a×a}. The propagation rule is shown in Equation (2):

    H^{(l+1)} = f(H^{(l)}, A) = σ(D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W_3^{(l)}),    (2)

where Ã = A + I and I is the identity matrix; H^{(l)} ∈ R^{a×f} is the output of the l-th layer; W_3^{(l)} ∈ R^{f×f} is the weight matrix of the l-th neural network layer; D̃ ∈ R^{a×a} is the diagonal node degree matrix of Ã; and σ(·) is a nonlinear activation function. In our implementation, we chose f to be 34 and used a single GCN layer. After processing by the GCN layer, the atom sequence c_1, c_2, ..., c_a is obtained, where a is the number of atoms.
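A minimal sketch of this propagation rule in plain PyTorch tensor operations follows, with f = 34 as in Table 1; the toy graph and the ReLU activation are assumptions for illustration.

```python
import torch

def gcn_layer(H, A, W):
    """One GCN step: sigma(D~^{-1/2} (A + I) D~^{-1/2} H W), Equation (2)."""
    A_tilde = A + torch.eye(A.size(0))        # add self-loops
    d = A_tilde.sum(dim=1)                    # node degrees of A~
    D_inv_sqrt = torch.diag(d.pow(-0.5))      # D~^{-1/2}
    return torch.relu(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

a, f = 20, 34                                    # 20 atoms, 34 features each
H = torch.randn(a, f)                            # initial atom features (Table 1)
A = torch.zeros(a, a)                            # adjacency matrix from covalent bonds
W = torch.nn.Parameter(0.1 * torch.randn(f, f))  # learnable f x f weight matrix
atoms = gcn_layer(H, A, W)                       # atom sequence c_1..c_a, shape (20, 34)
```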
Once the protein sequence representation and the atom representation were obtained, we had converted proteins and compounds into two sequences, which fit the transformer architecture. Interaction features are learned through the decoder of the transformer, which consists of self-attention layers and feed-forward layers. In our work, the protein sequence is the input of the encoder, while the atom sequence is the input of the decoder, and the output of the decoder is the interaction sequence, which contains the interaction features and has the same length as the atom sequence.

Given that the order of atom feature vectors has no effect on CPI modeling, we removed positional embeddings in TransformerCPI.

The key technique in the decoder is the multiheaded self-attention layer, which consists of several scaled dot-product attention layers that extract interaction information between the encoder and the decoder. The attention layer takes three inputs — the keys K, the values V and the queries Q — and calculates the attention as follows:

    attention(Q, K, V) = softmax(Q K^T / √d_k) V,    (3)

where d_k is a scaling factor depending on the layer size. This mechanism allows the decoder to focus dynamically on crucial parts of the encoder output, which directly captures the interaction features of the two given sequences. In addition, the original transformer was designed to solve sequence prediction tasks and uses a mask operation to hide the downstream context of each word in the decoder. We therefore modified the mask operation of the decoder to ensure that our model has access to the whole sequence, which is one of the most crucial modifications for transferring the transformer architecture from an autoregressive task to a classification task (Fig. 2).
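A minimal sketch of Equation (3) follows; in TransformerCPI the queries come from the atom sequence on the decoder side and the keys and values from the encoded protein sequence, so the score matrix can be read as atom–residue interaction weights. Shapes and sizes are illustrative.

```python
import math
import torch

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, Equation (3)."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (batch, a, b) atom-protein scores
    return torch.softmax(scores, dim=-1) @ V

atoms = torch.randn(1, 20, 64)     # a = 20 atom vectors (queries)
protein = torch.randn(1, 500, 64)  # b = 500 protein positions (keys and values)
interaction = attention(atoms, protein, protein)  # -> (1, 20, 64)
```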
After extracting interaction features in the decoder, a set of interaction vectors x_1, x_2, ..., x_a is obtained, and the squared modulus of each vector is computed as follows:

    x'_i = ||x_i||_2^2,    (4)

where i = 1, 2, 3, ..., a. The weight of each vector is then calculated by the softmax function:

    a_i = exp(x'_i) / Σ_{j=1}^{a} exp(x'_j).    (5)

The final interaction feature vector is calculated as the weighted sum of the interaction vectors with these attention weights:

    y_interaction = Σ_{i=1}^{a} a_i x_i.    (6)

At last, the final interaction feature vector y_interaction is fed to the following fully connected layers, and the probability ŷ that a compound interacts with a protein is returned. As for a conventional binary classification task, we used the binary cross entropy loss to train the TransformerCPI model:

    Loss = -(y log ŷ + (1 - y) log(1 - ŷ)).    (7)
pounds and 852 unique proteins; the C.elegans dataset contains
4000 positive interactions between 1434 unique compounds and
2504 unique proteins and the training, valid and test sets are ran-
domly split (Tsubaki et al., 2019). BindingDB dataset contains
39 747 positive examples and 31 218 negative examples from a pub-
lic database (Gilson et al., 2016). The training, valid and test sets of
BindingDB are well-designed, and the test set includes CPI pairs
where ligands or proteins are not observed in training set.
Therefore, BindingDB dataset can assess models’ generalization abil-
ity to unknown ligands and proteins.

2.2.2 Label reversal datasets


To construct datasets specifically for chemogenomics-based CPI
modeling, we followed two rules: (i) collecting CPI data from

Table 2. Hyperparameters of TransformerCPI

Name Value

Number of encoder layers 3


Number of decoder layers 3
Dimension of atom representation 64
Number of attention heads 8
FFN inner hidden size 512
Hidden size 64
Patch size 7
Learning rate 1e–4
Weight decay 1e–4
Batch size 8
Dropout 0.2
Fig. 2. Computational graph of TransformerCPI
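A minimal sketch of this training configuration follows. RAdam is available as torch.optim.RAdam in newer PyTorch releases, and the Lookahead wrapper requires a third-party implementation (omitted here); `model` and `loader` are assumed to exist.

```python
import torch

optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4, weight_decay=1e-4)
criterion = torch.nn.BCELoss()
accum_steps = 8  # gradients accumulated over eight batches of size 8

optimizer.zero_grad()
for step, (compound, protein, label) in enumerate(loader):
    y_hat = model(compound, protein)
    loss = criterion(y_hat, label) / accum_steps  # scale so accumulation averages
    loss.backward()                               # gradients add up across batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                          # one update per eight mini-batches
        optimizer.zero_grad()
```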
2.2 Datasets

2.2.1 Public datasets
We compared our model on three previous benchmark datasets: the Human dataset, the Caenorhabditis elegans dataset (Tsubaki et al., 2019) and the BindingDB dataset (Gao et al., 2018). The Human and C.elegans datasets include positive CPI pairs from DrugBank 4.1 (Wishart et al., 2008) and Matador (Gunther et al., 2007), and highly credible negative CPI samples obtained using a systematic screening framework (Liu et al., 2015). In detail, the Human dataset contains 3369 positive interactions between 1052 unique compounds and 852 unique proteins; the C.elegans dataset contains 4000 positive interactions between 1434 unique compounds and 2504 unique proteins; and their training, validation and test sets are randomly split (Tsubaki et al., 2019). The BindingDB dataset contains 39 747 positive examples and 31 218 negative examples from a public database (Gilson et al., 2016). The training, validation and test sets of BindingDB are well designed, and the test set includes CPI pairs whose ligands or proteins are not observed in the training set. Therefore, the BindingDB dataset can assess a model's generalization ability to unknown ligands and proteins.

2.2.2 Label reversal datasets
To construct datasets specifically for chemogenomics-based CPI modeling, we followed two rules: (i) collect CPI data from an experimentally validated database; and (ii) each ligand should exist in both classes. Many previous studies generated negative samples by random cross combination of CPI pairs or by using similarity-based approaches, which may introduce unexpected noise and unnoticed bias. Here, we compiled negative data that have been experimentally validated.

First, we constructed a GPCR dataset from the GLASS database (Chan et al., 2015). The GLASS database provides a great number of experimentally validated GPCR–ligand associations (Chan et al., 2015), which satisfies our first rule. The GLASS database uses IC50, Ki and EC50 as binding affinity values, which were transformed into their negative logarithms, pIC50, pKi and pEC50. Following early works (Liu et al., 2007; Wan et al., 2019), a threshold of 6.0 was set to divide the original dataset into a positive set and a negative set. Then, we selected the protein–compound pairs that follow our second rule to construct the final GPCR dataset, which comprises 5359 ligands, 356 proteins and 15 343 CPIs among them.

Second, we constructed a Kinase dataset based on the KIBA dataset (Tang et al., 2014). The KIBA score was developed to combine various bioactivity types, including IC50, Ki and Kd, and to remove inconsistency between different bioactivity types, which greatly reduced bias in the dataset (Tang et al., 2014). The KIBA dataset contains 467 targets and 52 498 ligands collected from ChEMBL and STITCH (Szklarczyk et al., 2016), which ensures that the data in KIBA are experimentally validated. Given that the majority of ligands occur only once, we followed SimBoost (He et al., 2017) and filtered the original KIBA dataset to keep only compounds and proteins with at least 10 interactions, giving a total of 229 proteins and 2111 compounds. Then, we used the suggested threshold KIBA value of 12.1 (He et al., 2017; Tang et al., 2014) to divide the dataset into a positive set and a negative set, and selected the protein–compound pairs whose compounds occur in both the positive set and the negative set, yielding a total of 1644 compounds, 229 proteins and 111 237 CPIs. Table 3 summarizes the GPCR dataset and the Kinase dataset we constructed.

Table 3. Summary of the datasets

Dataset   Proteins   Compounds   Interactions   Positive   Negative
GPCR      356        5359        15 343         7989       7354
Kinase    229        1644        111 237        23 190     88 047

As mentioned before, hidden ligand bias may cause a data-driven model to learn unexpected statistical clues or patterns in the data rather than the desired CPI information. To confirm that a model actually learns interaction features, and to accurately assess the impact of hidden variables, we proposed a more rigorous label reversal experiment. Its schematic illustration is shown in Figure 3A: a ligand in the training set appears in only one class of samples (either positive or negative CPI pairs), while the same ligand appears only in the opposite class in the test set. In this way, the model is forced to utilize protein information to understand interaction modes and make opposite predictions for the chosen ligands. If a model only memorizes ligand patterns, it is unlikely to make correct predictions, because the ligands it memorizes carry the opposite labels in the test set. Therefore, the label reversal experiment is specifically designed to evaluate chemogenomics-based CPI models and is capable of indicating how much influence hidden ligand bias has exerted.

For the GPCR set and the Kinase set, we randomly selected 500 and 300 ligands, respectively, and pooled all the negative CPI samples involving these ligands into the test set. Likewise, we selected another 500 and 300 ligands, respectively, and pooled all their associated positive samples into the test set. Under this experimental design, we finally established a GPCR test set with 1537 interactions and a Kinase test set with 19 685 interactions. The remaining data were used to determine the hyperparameters, and the best model was selected for evaluation in the label reversal experiments.
was selected to evaluate on label reversal experiments. ðiÞ
where Npos is the number of ith ligand’s positive interactions while
ðiÞ
Nneg is the number of ith ligand’s negative interactions, and L is the
total number of ligands. The distributions of GPCR set and Kinase
2.2.3 Data distribution of label reversal datasets set are shown in Figure 3B.
Before training the model, we studied the data distribution of GPCR
set and Kinase set. Since each ligand may occur in multiple positive
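A minimal sketch of this statistic, assuming the same DataFrame layout as above (every ligand occurs in both classes by construction, so no zero denominators arise):

```python
import numpy as np

counts = df.groupby(['compound', 'label']).size().unstack(fill_value=0)
log_ratio = np.log10(counts[1] / counts[0])  # log10(N_pos / N_neg) per ligand
```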
Fig. 3. (A) Schematic illustration of the label reversal experiment, where a ligand in the training set appears only in one class of samples (either positive or negative interaction CPI pairs) and appears only in the opposite class of samples in the test set. (B) Data distribution of the two datasets. The red line represents where the number of positive interactions equals the number of negative interactions. (C) Results of TransformerCPI, CPI–GNN, GraphDTA, GCN and TransformerCPI-ablation on the GPCR valid set, Kinase valid set, GPCR test set and Kinase test set.

3 Results and discussion

3.1 Performance on public datasets
Many machine learning methods, such as K nearest neighbors (KNN), random forest (RF), L2-logistic regression (L2) and support vector machines (SVM), as well as the newly reported sequence-based models CPI–GNN (Tsubaki et al., 2019) and DrugVQA (Zheng et al., 2020), have been evaluated on these datasets.

GraphDTA (Nguyen et al., 2019) was originally designed for a regression task; here, we tailored its last layer to the binary classification task. It should be noted that models relying on 3D structural information of proteins are not compared here, owing to the absence of such information for these two datasets. We followed the same training and evaluation strategies as CPI–GNN (Tsubaki et al., 2019), repeated with three different random seeds following DrugVQA (Zheng et al., 2020), to evaluate TransformerCPI. The area under the receiver operating characteristic curve (AUC), precision and recall of each model are shown in Tables 4 and 5. Since the implementations of KNN, RF, L2 and SVM are not described in the literature (Tsubaki et al., 2019), these models are not compared on the BindingDB dataset; the area under the precision–recall curve (PRC) and AUC of each model are shown in Table 6. TransformerCPI outperformed the other models on all three public datasets.

Table 4. Comparison results of the proposed model and baselines on the Human dataset

Method               AUC            Precision      Recall
KNN                  0.860          0.927          0.798
RF                   0.940          0.897          0.861
L2                   0.911          0.913          0.867
SVM                  0.910          0.966          0.969
GraphDTA             0.960 ± 0.005  0.882 ± 0.040  0.912 ± 0.040
GCN                  0.956 ± 0.004  0.862 ± 0.006  0.928 ± 0.010
CPI–GNN              0.970          0.918          0.923
DrugVQA (VQA-seq)*   0.964 ± 0.005  0.897 ± 0.004  0.948 ± 0.003
TransformerCPI       0.973 ± 0.002  0.916 ± 0.006  0.925 ± 0.006

*DrugVQA uses protein structural information as input; its alternative version VQA-seq, which uses only protein sequence information, is listed here for a fair comparison.

Table 5. Comparison results of the proposed model and baselines on the C.elegans dataset

Method           AUC            Precision      Recall
KNN              0.858          0.801          0.827
RF               0.902          0.821          0.844
L2               0.892          0.890          0.877
SVM              0.894          0.785          0.818
GraphDTA         0.974 ± 0.004  0.927 ± 0.015  0.912 ± 0.023
GCN              0.975 ± 0.004  0.921 ± 0.008  0.927 ± 0.006
CPI–GNN          0.978          0.938          0.929
TransformerCPI   0.988 ± 0.002  0.952 ± 0.006  0.953 ± 0.005

Table 6. Comparison results of the proposed model and baselines on the BindingDB dataset

Method           AUC     PRC
GraphDTA         0.929   0.917
GCN              0.927   0.913
CPI–GNN          0.603   0.543
TransformerCPI   0.951   0.949
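For reference, the two reported metrics can be computed with scikit-learn as in the short sketch below; y_true and y_score are assumed arrays of binary labels and predicted probabilities, and average_precision_score is used as the usual estimator of the area under the precision–recall curve.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

auc = roc_auc_score(y_true, y_score)            # area under the ROC curve
prc = average_precision_score(y_true, y_score)  # area under the precision-recall curve
```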

3.2 Performance on label reversal datasets
We chose CPI–GNN, GraphDTA and GCN as references and compared the performance of TransformerCPI with these models in terms of AUC and PRC; Figure 3C summarizes the results. To conduct a fair comparison, each model was thoroughly fine-tuned on the same validation sets. As shown in Figure 3C, all of the models achieved similar performance on both the GPCR validation set and the Kinase validation set; however, big performance gaps between the models were observed on the test sets. Although the models perform similarly on randomly split validation sets, the knowledge they have learned differs greatly, which is exposed by the more rigorous label reversal experiment. On the GPCR set, TransformerCPI outperformed CPI–GNN, GraphDTA and GCN in terms of both AUC and PRC, showing improved power to capture interaction features between compounds and proteins. Compared with the other models, CPI–GNN failed to enrich positive samples ahead of negative samples in the label reversal experiments, so we argue that ligand patterns of the GPCR dataset may bring in a non-negligible influence as ligand bias in CPI–GNN. On the Kinase set, TransformerCPI outperforms CPI–GNN, GraphDTA and GCN in terms of AUC and PRC, and the AUCs of the reference models are all smaller than 0.5, so we argue that ligand patterns of the Kinase dataset may have brought a non-negligible influence into all the reference models. Moreover, GraphDTA and GCN achieved good performance on the GPCR dataset, close to that of TransformerCPI, but performed much worse on the Kinase set. In comparison, TransformerCPI achieved the best performance on both datasets, revealing its robustness and generalization ability. Overall, these results suggest that the proposed TransformerCPI possesses the capability of learning interactions between proteins and ligands, that the label reversal experiments can effectively assess the impact of hidden ligand bias on models, and, more importantly, that the proposed modeling scheme is useful in reducing common risks of chemogenomics-based CPI tasks.

3.3 The system dependency of models
When comparing the results between the GPCR set and the Kinase set, it is also of note that TransformerCPI, GraphDTA and GCN perform much better on the GPCR set than on the Kinase set. We argue that there might be two potential reasons for this difference. The first is that the data distributions of the GPCR set and the Kinase set are different, resulting in a performance gap between the two datasets. The second is that the sequence features of GPCRs are relatively easier for TransformerCPI to learn.

As shown in Figure 3B, the peak of the log_ratio distribution of the GPCR set is located at zero, which means that most ligands in the GPCR set have an equal quantity of interaction pairs and noninteraction pairs. In contrast, the peak of the log_ratio distribution of the Kinase set is significantly shifted to -1, indicating that most ligands in the Kinase set possess almost ten times more noninteraction pairs than interaction pairs. The highly unbalanced distribution of positive and negative pairs may therefore introduce severe ligand bias into the dataset, which might increase the risk that a data-driven model memorizes ligand patterns, causing inferior prediction performance on the Kinase set.

Another potential reason is that the CPI-associated sequence features of GPCRs are easier to learn than those of kinases. Although the GPCR family shares massive alpha-helix regions and seven-transmembrane structures, the binding locations and binding pockets are more diverse across the family, which makes it relatively easy for models to learn CPI-associated sequence features that distinguish interaction pairs from noninteraction pairs. In contrast, the kinase family shares a more conserved ATP binding pocket with fewer differing residues. It is challenging for models to distinguish interaction and non-interaction pairs, since the model has to learn to detect and understand minor changes in protein sequences. Furthermore, the system dependency of TransformerCPI also tells us that there is still room for improvement in chemogenomics-based CPI prediction, especially in the representation of protein sequences.

3.4 Model ablation study
Previous chemogenomics-based CPI models extract ligand and protein features separately and independently, and then concatenate the two feature vectors as input features. To validate the role of the transformer encoder–decoder architecture, we next evaluated a TransformerCPI-ablation model, which replaces the transformer decoder with conventional vector concatenation, on the same label reversal experiment.

As shown in Figure 3C, this ablation procedure significantly compromised the performance of TransformerCPI on both the GPCR set and the Kinase set, demonstrating that the self-attention mechanism together with the encoder–decoder architecture indeed plays a key role in extracting CPI features between the two types of sequences.

3.5 Model interpretation
Although deep learning is known as a black-box algorithm, it is essential to understand how a model makes a prediction and whether the model can provide suggestions or guidance for optimization. Thanks to the transformer architecture and self-attention mechanism, TransformerCPI provides easy access to the mechanism behind the model through the attention weights of protein sequences and compound atoms.
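Where the model exposes them, such attention maps can be pulled out with a standard PyTorch forward hook, as in the hedged sketch below; the module path model.decoder.layers[-1].attention is hypothetical and depends on how the attention layers are named in the actual implementation, as does the assumption that the hooked module returns its weights as the second output.

```python
import torch

maps = []
def grab_attention(module, inputs, output):
    maps.append(output[1].detach())  # assumes a (context, attention_weights) output

handle = model.decoder.layers[-1].attention.register_forward_hook(grab_attention)
with torch.no_grad():
    model(compound, protein)
handle.remove()

atom_weights = maps[0].mean(dim=1)  # average over heads -> (batch, atoms, residues)
```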
As illustrated in Figure 4A, the attention weights were mapped to compound atoms to reveal the knowledge TransformerCPI has learned. TransformerCPI pays attention to different atoms when facing different compound–protein pairs and correctly classifies the compound–protein pairs into two classes, interaction and noninteraction.

Here, TransformerCPI generates different ligand features corresponding to different proteins, which is consistent with the fact that the binding modes of a ligand differ when it interacts with different proteins. Therefore, it is difficult for TransformerCPI to memorize ligand patterns, because the ligand features change with regard to different proteins. This result also explains why TransformerCPI shows better performance in the label reversal experiments. Dynamic feature extraction based on a specific protein context helps the model extract the key information of the interaction, while also reducing the probability of hidden ligand bias. Moreover, the decoder of TransformerCPI integrates the features of protein sequences and compound atoms dynamically to form direct interaction features, which is similar to a language translation task and agrees well with the binding process of ligands to proteins.

To further verify the meaning of the attention weights of atoms, we selected the compound phenothiazine to illustrate the interpretation of TransformerCPI. Phenothiazine is a classic antipsychotic drug targeting the dopamine receptor, and its structure–activity relationship (SAR) has been thoroughly explored. As illustrated in Figure 4B, the atoms of phenothiazine highlighted by attention weights agree well with its SAR, which confirms that our model is capable of catching true interaction features and finding the key atoms interacting with proteins. Such information is of great value in assisting medicinal chemists to speculate on the potential SAR of a target molecule and may offer useful guidance for further structural optimization.

Fig. 4. Attention weights of atoms in different compounds. The atoms with high attention scores extracted from TransformerCPI are highlighted in red. (A) A certain ligand shows different attention score distributions when interacting with the histamine H1 receptor and the 5-HT2C receptor, respectively. (B) The SAR of phenothiazine and its attention scores.

After interpreting the atom-level attention mechanism, we also studied the attention weights of the protein sequences to see which parts of the protein sequences became the focus of attention. As a result, TransformerCPI may roughly speculate whether the binding site of a ligand on a GPCR is in the extracellular region or in the transmembrane region, and it can detect the ATP-binding pocket of the kinase family. We selected the histamine H1 receptor, the 5-HT1B receptor and mitogen-activated protein kinase 8 (MAPK8), with their corresponding actives, as examples.

As shown in Figure 5, TransformerCPI successfully localized the binding site of the ligand to the histamine H1 receptor in the transmembrane region and the binding site of the ligand to the 5-HT1B receptor in the extracellular region, and it detected the ATP-binding pocket of MAPK8, which further verifies that TransformerCPI has learned biological knowledge and gained structural insights. These results suggest that TransformerCPI can speculate whether a new compound binds to the transmembrane region or the extracellular region of a GPCR target, which is useful in drug design, especially when the 3D structure of the GPCR target is unknown. Meanwhile, we may also notice that the highlighted regions are rather extensive and do not correspond to the exact binding site residues. To address this issue, more high-quality data with precise annotation need to be incorporated, and new sequence-based deep representation learning may also help in better encoding and decoding the structural information. For example, a representation scheme recently proposed by Alley et al. (2019) has demonstrated improved efficiency for studying protein sequences.

Overall, there is still a long way for chemogenomics-based CPI modeling to go, and we hope that this work can draw attention to its problems and provide useful guidance for further study. In addition, experiment design plays an important role in deep learning, and more efforts should be directed toward evaluating what a deep learning model has really learned. In this respect, not only new deep learning approaches but also new validation strategies and experiment designs should be emphasized in the future development of deep learning.

Fig. 5. Attention weights of protein sequences. The regions in proteins with high attention weights extracted from TransformerCPI are highlighted in purple. (A) Attention weights of the histamine H1 receptor (PDB: 3RZE). (B) Attention weights of the 5-HT1B receptor (PDB: 4IAQ). (C) Attention weights of MAPK8 (PDB: 1UKI).

4 Conclusion
In this work, a transformer architecture with a self-attention mechanism was modified to address the sequence-based CPI classification task, resulting in a model named TransformerCPI that shows high performance on three benchmark datasets. Intriguingly, when we compared it with previously reported CPI models and conventional machine learning based control models, we noticed that most of these models yielded impressive results on those benchmark tests. Given the challenging nature of CPI prediction, we argue that these models might face potential pitfalls of deep learning. To address these potential risks, we constructed new datasets specific for the chemogenomics-based CPI task and designed more rigorous label reversal experiments as new measurements for chemogenomics-based CPI modeling. Compared with other models, TransformerCPI achieved significantly improved performance on the new experiments, suggesting that it can learn the desired interaction features and decrease the risk of hidden ligand bias. Finally, the model interpretation capability was studied by mapping attention weights to protein sequences and compound atoms, which could help us determine whether a prediction is reliable and physically meaningful. Overall, TransformerCPI provides access to model interpretation and offers chemical biology studies useful guidance for further ligand structural optimization.

Funding
This work was supported by the National Natural Science Foundation of China (81773634 to M.Z.), the National Science & Technology Major Project 'Key New Drug Creation and Manufacturing Program', China (Number: 2018ZX09711002 to H.J.) and 'Personalized Medicines—Molecular Signature-based Drug Discovery and Development', Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12050201 to M.Z.).

Conflict of Interest: none declared.

References
Alley,E.C. et al. (2019) Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods, 16, 1315–1322.
Bleakley,K. and Yamanishi,Y. (2009) Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics, 25, 2397–2403.
Bredel,M. and Jacoby,E. (2004) Chemogenomics: an emerging strategy for rapid target and drug discovery. Nat. Rev. Genet., 5, 262–275.
Chan,W.K.B. et al. (2015) GLASS: a comprehensive database for experimentally validated GPCR–ligand associations. Bioinformatics, 31, 3035–3042.
Chen,L. et al. (2019) Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS One, 14, e0220113.
Cheng,F. et al. (2012) Prediction of chemical–protein interactions: multitarget-QSAR versus computational chemogenomic methods. Mol. Biosyst., 8, 2373–2384.
Dai,Z. et al. (2019) Transformer-XL: attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 2978–2988.
Dauphin,Y. et al. (2016) Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, pp. 933–941.
Devlin,J. et al. (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Minneapolis, Minnesota, pp. 4171–4186.
Gao,K. et al. (2018) Interpretable drug target prediction using deep neural representation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 3371–3377.
Gilson,M.K. et al. (2016) BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res., 44, D1045–D1053.
Gonen,M. (2012) Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics, 28, 2304–2310.
Gunther,S. et al. (2007) SuperTarget and Matador: resources for exploring drug–target relationships. Nucleic Acids Res., 36, D919–D922.
Hamanaka,M. et al. (2017) CGBVS-DNN: prediction of compound–protein interactions based on deep learning. Mol. Inform., 36, 1–2.
He,T. et al. (2017) SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines. J. Cheminform., 9, 24.
Jacob,L. and Vert,J.P. (2008) Protein–ligand interaction prediction: an improved chemogenomics approach. Bioinformatics, 24, 2149–2156.
Karimi,M. et al. (2019) DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics, 35, 3329–3338.
Kimothi,D. et al. (2016) Distributed representations for biological sequence analysis. arXiv:1608.05949.
Kipf,T. and Welling,M. (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907.
Kobeissy,F.H. et al. (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One, 10, e0141287.
Lee,I. et al. (2019) DeepConv-DTI: prediction of drug–target interactions via deep learning with convolution on protein sequences. PLoS Comput. Biol., 15, e1007129.
Liu,H. et al. (2015) Improving compound–protein interaction prediction by building up highly credible negative samples. Bioinformatics, 31, i221–i229.
Liu,L. et al. (2019) On the variance of the adaptive learning rate and beyond. arXiv:1908.03265.
Liu,T. et al. (2007) BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res., 35, D198–D201.
Mazzaferro,C. (2017) Predicting protein binding affinity with word embeddings and recurrent neural networks. bioRxiv, doi: 10.1101/128223.
Mikolov,T. et al. (2013a) Efficient estimation of word representations in vector space. arXiv:1301.3781.
Mikolov,T. et al. (2013b) Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst., 26, 3111–3119.
Mysinger,M.M. et al. (2012) Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem., 55, 6582–6594.
Nguyen,T. et al. (2019) GraphDTA: prediction of drug–target binding affinity using graph convolutional networks. bioRxiv, doi: 10.1101/684662.
Ozturk,H. et al. (2018) DeepDTA: deep drug–target binding affinity prediction. Bioinformatics, 34, i821–i829.
Öztürk,H. et al. (2019) WideDTA: prediction of drug–target binding affinity. arXiv:1902.04166.
Qiu,X. et al. (2020) Pre-trained models for natural language processing: a survey. arXiv:2003.08271.
Riley,P. (2019) Three pitfalls to avoid in machine learning. Nature, 572, 27–29.
Scarselli,F. et al. (2009) The graph neural network model. IEEE Trans. Neural Netw., 20, 61–80.
Schwaller,P. et al. (2019) Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci., 5, 1572–1583.
Sieg,J. et al. (2019) In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model., 59, 947–961.
Szklarczyk,D. et al. (2016) STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res., 44, D380–D384.
Tang,J. et al. (2014) Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J. Chem. Inf. Model., 54, 735–743.
Tian,K. et al. (2016) Boosting compound–protein interaction prediction by deep learning. Methods, 110, 64–72.

Tsubaki,M. et al. (2019) Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics, 35, 309–318.
Vamathevan,J. et al. (2019) Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov., 18, 463–477.
van Laarhoven,T. et al. (2011) Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics, 27, 3036–3043.
Vaswani,A. et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, California, USA, pp. 6000–6010.
Wan,F. and Zeng,J. (2016) Deep learning with feature embedding for compound–protein interaction prediction. bioRxiv, doi: 10.1101/086033.
Wan,F. et al. (2019) DeepCPI: a deep learning-based framework for large-scale in silico drug screening. Genomics Proteomics Bioinf., 17, 478–495.
Wang,F. et al. (2011) Computational screening for active compounds targeting protein sequences: methodology and experimental validation. J. Chem. Inf. Model., 51, 2821–2828.
Wang,Y. and Zeng,J. (2013) Predicting drug–target interactions using restricted Boltzmann machines. Bioinformatics, 29, i126–i134.
Wishart,D.S. et al. (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res., 36, D901–D906.
Yamanishi,Y. et al. (2008) Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24, i232–i240.
Yang,K.K. et al. (2018) Learned protein embeddings for machine learning. Bioinformatics, 34, 2642–2648.
Yang,Z. et al. (2019) XLNet: generalized autoregressive pretraining for language understanding. In Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Italy, pp. 2978–2988.
Zhang,M. et al. (2019) Lookahead optimizer: k steps forward, 1 step back. arXiv:1907.08610.
Zheng,S. et al. (2020) Predicting drug–protein interaction using quasi-visual question answering system. Nat. Mach. Intell., 2, 134–140.
