protein-protein-interaction-prediction-via-structure-based-2teh5onh
Abstract
Protein-protein interactions (PPIs) play an essential role in life activities. Many
machine learning algorithms based on protein sequence information have been
developed to predict PPIs. However, these models have difficulty dealing with various
sequence lengths and suffer from low generalization and prediction accuracy. In this
study, we proposed a novel end-to-end deep learning framework, RSPPI, combining
Residual Neural Network (ResNet) and Spatial Pyramid Pooling (SPP), to predict PPIs
based on the physicochemical properties of the protein sequence and spatial structural
information. In the RSPPI model, ResNet was employed to extract the structural and
physicochemical information from the protein 3D structure and primary sequence; the
SPP layer was used to transform feature maps to a single vector and avoid the fixed-
length requirement. The RSPPI model possessed excellent cross-species performance
and outperformed several state-of-the-art methods based either on protein sequence or
gene ontology in most evaluation metrics. The RSPPI model provides a novel strategic
direction to develop an AI PPI prediction algorithm.
Keywords: protein-protein interactions, deep learning, Residual Neural Network,
Spatial Pyramid Pooling, cross-species prediction
Introduction
Proteins serve as the basic building blocks playing specific roles in living cells, such as
cell adhesion, signal transduction, post-translational modification, etc. [1, 2]. Generally,
a biological process is accomplished by the cooperation of multiple proteins through
transient protein-protein interactions (PPIs) [3]. High-throughput experimental
methods have been widely used to determine PPIs, such as yeast two-hybrid screens
[4], protein chips [5], and surface plasmon resonance [6]. However, experimental
methods remain expensive, labor-intensive, and time-consuming. Therefore,
computational methods have emerged in PPI studies. With the advance of machine
learning and the accumulation of tremendous knowledge of protein and PPI information,
bioRxiv preprint doi: https://doi.org/10.1101/2023.05.27.542552; this version posted May 30, 2023. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is
made available under aCC-BY-NC-ND 4.0 International license.
large numbers of computational methods have been developed for predicting PPIs
based on various data types, such as protein sequence [7] and protein secondary
structure [8, 9].
The primary structures of the majority of proteins have been sequenced and stored in
the UniProt database [10]. Thus, there is a longstanding interest in using sequence-
based methods to model and predict protein interactions [11-13]. Several sequence-
based methods have been developed to predict PPIs. For instance, Hashemifar et al. [14]
proposed a deep learning model, DPPI, using a Convolutional Neural Network (CNN)
and random projection modules to predict PPIs. DPPI achieved high prediction
accuracy and could predict homodimeric interactions. Yang et al. [15] proposed a model
based on a CNN architecture and protein evolutionary profiles. Their model showed
superior performance and outperformed several other human-virus PPI prediction
methods. However, models based on the protein sequence generalized
poorly [15, 16].
In addition, proteins have a higher chance to interact when localized in the same cellular
component, or when sharing a common biological process or molecular function.
Accordingly, several methods predict PPIs from the gene ontology (GO) annotations
and semantic similarity of proteins [17]. Armean et al. [18] combined GO annotations
with an SVM for PPI prediction; other models employed in PPI prediction tasks
include Bayesian classifiers [19] and random forests [20].
The PPI is mediated by non-covalent interactions, including electrostatic interaction,
hydrophobic interaction, and hydrogen bonding, etc. In addition, the PPI is regulated
by the fitness of the protein surfaces. Therefore, the structure of the protein and the
physicochemical property of the surface amino acids together play an important role in
PPI. However, the sequence-based and GO-based methods are incapable of
incorporating protein spatial structure information. Thus, structure-based models are
needed to learn PPI features more effectively. Currently, there are a few structure-based
models for predicting PPIs. For example, Struct2Net [21] threads protein pair
sequences to the protein complex in the Protein Data Bank (PDB) and searches for the
potential match, and then predicts the PPI [22]. Cai et al. proposed a support vector
machine (SVM) model to predict PPIs by analyzing the protein secondary structure [23].
Regrettably, these models based on protein structures required the corresponding
homologous templates in PDB or ignored the global structure and physicochemical
properties of proteins.
Herein, we designed a feature-extracting strategy, which converts 1D physicochemical
properties and 3D structure information to 2D feature maps. The 2D feature maps were
then fed to a newly proposed PPI deep learning framework, RSPPI. The RSPPI
incorporated three ResNet blocks [24], an SPP layer [25], and a fully connected layer.
The ResNet was used to extract the structure and physicochemical information; the SPP
layer [25] was used to handle the problem of variable length of protein sequences; the
fully connected layer was used to fuse the pairwise protein feature vectors and give the
prediction of the interaction. Our model bridged sequence and structure information
and transformed these features into 2D maps, which could be trained with ResNet. As
RSPPI is based on structure and physicochemical features, the model does not rely on
the homologous template of the PDB database. Moreover, thanks to the SPP layer, our
model does not need to preprocess the protein sequence into a fixed length, which may
potentially introduce artifacts. These advantages of the model underlie the strong
performance and cross-species predictive capability of the RSPPI model.
Overview
We introduce an end-to-end deep learning framework, RSPPI, for identifying PPIs. The
PPI prediction task is a binary classification problem. The amino acid physicochemical
properties (including hydrophobicity and charge) and protein spatial structure
(represented by a distance map) were used as input to our RSPPI model. The model
includes two parts: the data-preprocessing module and the prediction module (Fig. 1).
In the data-preprocessing module, the 2D feature matrices were generated from the
physicochemical properties and spatial structure information of the protein. The 2D
distance map, hydrophobic map, and charge map can be considered as multiple
channels of an image with protein features. Inspired by image recognition
algorithms, we designed and trained a deep learning network. In the prediction module,
the ResNet combined with an SPP layer was employed to learn the PPI features. Then,
the pairwise protein representations were concatenated and fed into the designed fully
connected layer to predict the interaction probability of the protein pairs.
atom distance between the ith and jth residue, and cutoff distance d0 is set to 4 Å. Any
value greater than the cutoff was set to 0 [29]. Therefore, the values of the adjacency
matrix were in the range of 0 to 1.
The physicochemical properties of amino acids affect inter-protein interactions, such as
their hydrophobicity and electrostatic charge. Hydrophobicity is a measure of how
strongly the side chains of the amino acids in an aqueous environment are excluded by
the water molecules. The hydrophobic amino acids of a protein tend to form
hydrophobic interactions to reduce the interface with water or other polar molecules.
The hydrophobicity is usually characterized by the hydrophobic moments (HM),
ranging from -7.5 for Arg to 3.1 for Ile. A larger HM value indicates that the amino acid
possesses a higher hydrophobicity. As the hydrophobic interaction depends on the
hydrophobicity of a pair of amino acids, the hydropathy compatibility index (HCI) was
used to evaluate the hydrophobic interaction of amino acid pairs [30]. The HCIs of any
two residues were calculated as:
$$\mathrm{HCI} = 20 - \left| \mathrm{HM}(i) - \mathrm{HM}(j) \right| \times \frac{19}{10.6} \qquad (1)$$ [30]
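As a minimal sketch of Eq. (1), the function below reads the equation as 20 − |HM(i) − HM(j)| × 19/10.6, i.e. with an absolute difference scaled by the HM range of 10.6 (from -7.5 for Arg to 3.1 for Ile). The absolute value is our reading of the formula, and the function name `hci` is hypothetical. Only the two HM values quoted in the text are used; a full 20-residue HM table would have to be taken from [30].

```python
# Hydropathy compatibility index, reading Eq. (1) as
# HCI = 20 - |HM(i) - HM(j)| * 19 / 10.6 (absolute value assumed).

HM_ARG = -7.5  # least hydrophobic residue quoted in the text
HM_ILE = 3.1   # most hydrophobic residue quoted in the text


def hci(hm_i: float, hm_j: float) -> float:
    """Return the HCI of two residues given their HM values."""
    return 20.0 - abs(hm_i - hm_j) * 19.0 / 10.6


# Identical residues give the maximum index; the two extremes give the minimum.
print(hci(HM_ILE, HM_ILE))
print(hci(HM_ARG, HM_ILE))
```

Under this reading the index runs from 1 (most dissimilar hydropathy) to 20 (identical hydropathy), and it is symmetric in the two residues.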
The electrostatic charges of amino acids define the electrostatic interactions. The
residues with opposite charges attract each other, while residues with the same charge
repel each other. A simple characterization of electrostatic properties is the isoelectric
point (pI). The net charge of an amino acid varies with the pH of the environment owing
to the gain or loss of protons. The pI is the pH at which the net charge of the amino acid is
0. For instance, the negatively charged amino acids, Asp and Glu, have pIs of 2.7 and
3.2, respectively, and the positively charged amino acids, His, Lys, and Arg, have pIs
of 7.5, 9.7, and 10.7, respectively. The rest of the amino acids have pIs of ~6, so they
are considered to carry no net charge under physiological conditions. Similar to the HCI
for hydrophobicity, the charge compatibility index (CCI) was used to evaluate the
electrostatic interaction of amino acid pairs [30]. The CCIs of any two residues were
calculated as:
$$\mathrm{CCI} = 11 - \left[ \mathrm{pI}(i) - 7 \right] \times \left[ \mathrm{pI}(j) - 7 \right] \times \frac{19}{33.8} \qquad (2)$$ [30]
With the HCI and CCI, two more physicochemical feature maps of the protein surface
shell could be constructed. Similar to the distance map, 2D hydrophobicity and
electrostatic matrices of the protein surface shell were built by assigning the
corresponding HCI and CCI values to the elements of the matrices.
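The construction of the charge map can be sketched as follows, reading Eq. (2) as 11 − [pI(i) − 7] × [pI(j) − 7] × 19/33.8. The pI values are those quoted in the text; treating every other residue as exactly 6.0 is an assumption based on the "~6" statement. The names `cci` and `cci_map` are hypothetical, and the selection of surface residues is skipped here, so the map is built over whatever residue string is passed in.

```python
# Charge compatibility map from one-letter residue codes.

PI = {"D": 2.7, "E": 3.2, "H": 7.5, "K": 9.7, "R": 10.7}  # values from the text
NEUTRAL_PI = 6.0  # assumption: the text only says "~6" for the other residues


def cci(pi_i: float, pi_j: float) -> float:
    """Charge compatibility index of two residues, following Eq. (2)."""
    return 11.0 - (pi_i - 7.0) * (pi_j - 7.0) * 19.0 / 33.8


def cci_map(residues: str) -> list[list[float]]:
    """Return the L x L matrix whose (i, j) element is the CCI of residues i and j."""
    pis = [PI.get(aa, NEUTRAL_PI) for aa in residues]
    return [[cci(a, b) for b in pis] for a in pis]


charge_map = cci_map("DKA")
for row in charge_map:
    print([round(value, 2) for value in row])
```

Oppositely charged pairs such as Asp and Lys receive a high index, like-charged pairs a low one, and neutral pairs sit near the middle of the scale. The hydrophobicity map can be built the same way by swapping in the HCI.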
to the output F'(x). In contrast, thanks to the padding operation, the number of output
channels and the size of the feature maps remained unchanged in the identity ResNet
block. Therefore, the original x is added to the output F(x) at the end of the identity
ResNet block. A kernel size of 3 × 3 was used in all the convolution operations. The
filter number of the convolution operations in each ResNet layer was set to 16, 32, and
64, respectively. The ELU activation function was used as the activation layer because
the ELU activation function is more effective than the standard ReLU in the ResNet
[32]. Consequently, the feature maps of output changed from 3 × L × L to 64 × (L/8) ×
(L/8), where L is the protein sequence length.
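Because the 64 × (L/8) × (L/8) output still depends on the sequence length L, a pooling step is needed before a fully connected layer can be applied. The sketch below illustrates spatial pyramid pooling [25] on a single channel in plain Python: the map is max-pooled into n × n grids for several n and the results are concatenated, so the output length is the same for any input size. The pyramid levels (1, 2, 3, 4) and the use of max pooling are illustrative assumptions, not the stated RSPPI configuration. These levels give 30 bins per channel, which would be consistent with 64 × 30 × 2 = 3840, but the actual levels are not given in this excerpt.

```python
import math


def adaptive_max_pool(fmap: list[list[float]], n: int) -> list[float]:
    """Max-pool a 2D map into an n x n grid, whatever its height and width."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(n):
        r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)  # row bin [r0, r1)
        for j in range(n):
            c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)  # column bin
            pooled.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spp(fmap: list[list[float]], levels=(1, 2, 3, 4)) -> list[float]:
    """Concatenate the pooled grids of every pyramid level into one vector."""
    vector = []
    for n in levels:
        vector.extend(adaptive_max_pool(fmap, n))
    return vector


small = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
large = [[float(r * 12 + c) for c in range(12)] for r in range(12)]
print(len(spp(small)), len(spp(large)))  # same length for both map sizes
```

In a multi-channel setting the same pooling is applied to each channel and the per-channel vectors are concatenated.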
We compute the difference and product of Z1 and Z2 to fuse the two protein features:
$$\mathrm{mul} = Z_1 \odot Z_2 \qquad (5)$$
where diff indicates the element-wise difference and mul indicates the element-wise
product. Then, the vector representations of diff and mul were concatenated to represent
the pairwise protein feature, and the concatenated vector with 3840 dimensions was fed
into a fully connected layer. The fully connected layer stacks three linear layers,
each followed by a ReLU activation function, and an output layer followed
by the Sigmoid activation function. The lengths of the linear layers and output layer
were set to 3840, 960, 480, and 1 (Fig 5). BCELoss was used as a loss function to train
where N is the total number of protein–protein samples in the training dataset. yi and
xi were, respectively, the true target and the predicted score of the ith sample.
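For reference, PyTorch's BCELoss with its default mean reduction computes the mean binary cross-entropy, −(1/N) Σ [yi log(xi) + (1 − yi) log(1 − xi)], over the N samples. The plain-Python sketch below reproduces that quantity with the symbols defined above. The function name `bce_loss` and the small `eps` clamp are assumptions for illustration (PyTorch guards the logarithm differently, by clamping its output).

```python
import math


def bce_loss(targets: list[float], scores: list[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between true targets y_i and predicted scores x_i."""
    total = 0.0
    for y, x in zip(targets, scores):
        x = min(max(x, eps), 1.0 - eps)  # keep log() finite for scores of 0 or 1
        total += y * math.log(x) + (1.0 - y) * math.log(1.0 - x)
    return -total / len(targets)


# One interacting pair scored 0.9 and one non-interacting pair scored 0.1.
print(bce_loss([1.0, 0.0], [0.9, 0.1]))
```

Confident correct predictions drive the loss toward 0, while confident wrong predictions are penalized heavily.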
Dataset preparation
Because the model is trained to predict PPIs, the dimers stored in the Protein Data Bank (PDB)
were collected to build the training dataset. The species with fewer structures were
screened out. Thus, the dimer complexes from six species, including Homo sapiens,
Mus musculus, E. coli, Bacteria, Eukaryota, and S. cerevisiae were kept in the datasets
for training. The dimer datasets were then further screened by using sequence alignment.
All the redundant proteins with over 50% sequence identity were removed from the
dataset. In addition, the small proteins with lengths less than 50 amino acids were
removed as well. The filtered protein dimer complexes were used as positive samples.
Negative samples were generated by randomly pairing the chains. To maintain the
balance of the training data, the sizes of the negative datasets were set to the same as
the positive datasets. With the aforementioned filtering criteria and random pairing
strategy, six single-species datasets and a mixed-species (including all six species)
dataset were constructed (Table 1).
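The random-pairing strategy for negative samples can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the function name `sample_negatives` and the chain identifiers are hypothetical, and rejecting random pairs that coincide with a known positive pair is our assumption, since the text does not say how such collisions were handled.

```python
import random


def sample_negatives(positive_pairs, seed=0):
    """Randomly pair chains into as many negative pairs as there are positives."""
    rng = random.Random(seed)
    positives = {frozenset(pair) for pair in positive_pairs}
    chains = sorted({chain for pair in positive_pairs for chain in pair})
    target = len(positives)  # balanced dataset: one negative per positive

    # Guard against an endless loop when too few non-positive pairs exist.
    available = len(chains) * (len(chains) - 1) // 2
    available -= sum(1 for p in positives if len(p) == 2)
    if available < target:
        raise ValueError("not enough distinct chain pairs to sample negatives")

    negatives = set()
    while len(negatives) < target:
        pair = frozenset(rng.sample(chains, 2))  # two distinct chains
        if pair not in positives:                # assumed collision filter
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


positives = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2")]
print(sample_negatives(positives))
```

Random pairs are not guaranteed to be true non-interactors, which is a general caveat of this kind of negative sampling.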
Model implementation
The training was carried out by using PyTorch in Python. The RSPPI model was trained
for 200 epochs using the Adam optimizer. The learning rate was initially set to 0.001
and decayed exponentially by a factor (gamma) of 0.98 every epoch. The batch size was set to
1 during training. To avoid overfitting, the dropout technique was used, and the weight
decay was set to 0.03 during training.
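The learning-rate schedule described above amounts to lr(epoch) = 0.001 × 0.98^epoch. A minimal plain-Python sketch of that rule follows; the function name `learning_rate` is hypothetical. In PyTorch, `torch.optim.lr_scheduler.ExponentialLR` implements the same decay rule, although the text does not state which scheduler class was used.

```python
def learning_rate(epoch: int, initial_lr: float = 1e-3, gamma: float = 0.98) -> float:
    """Learning rate in effect during the given epoch (epoch 0 is the first)."""
    return initial_lr * gamma ** epoch


# Learning rate at a few points of the 200-epoch run.
for epoch in (0, 50, 100, 199):
    print(epoch, learning_rate(epoch))
```

With these settings the learning rate falls by roughly a factor of 50 over the 200 epochs, so late epochs make only fine adjustments to the weights.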
Evaluation standard
The performance of the models was evaluated by classification accuracy, precision,
sensitivity, F1-score (F1), and Matthews correlation coefficient (MCC). These
parameters are defined below:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (8)$$
$$\mathrm{Sensitivity\ (recall)} = \frac{TP}{TP + FN} \qquad (9)$$
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (10)$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (11)$$
where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false
negative, respectively. A Receiver Operating Characteristic (ROC) curve indicates the
performance of a classification model at all classification thresholds. The Area Under
the ROC Curve (AUC) was also used to evaluate the model. In total, the above six
evaluation metrics were used to evaluate the RSPPI model.
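The threshold-based metrics of Eqs. (7) to (11) can be computed directly from the four confusion-matrix counts, as in the sketch below. MCC is taken in its standard form, with the square root of the four-term product in the denominator. The function name `classification_metrics`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, sensitivity, F1, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, used only to exercise the formulas.
for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it requires the ranked prediction scores rather than counts at a single threshold.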
Results
The ablation study demonstrated the importance of the spatial structural
information and the SPP layer. The RSPPI model was then applied to the cross-species dataset and
six single-species datasets. The trained model showed generally high precision both
within the same species and across species.
AUC and MCC. The RSPPI outperformed KNN_LD, SVM_LD, PPI_MetaGO, and
GO2ppi_RF by 15.7%, 2.4%, 2.1%, and 2.0% in AUC, respectively, and outperformed
PPI_MetaGO and GO2ppi_RF by 5.9% and 5.1% in MCC, respectively. In the other
metrics, RSPPI was one of the top two models, slightly behind SVM_LD in accuracy,
sensitivity, and F1, and slightly behind KNN_LD in precision. Although SVM_LD
ranked first in three metrics, its precision was the lowest among all
models. Similarly, KNN_LD, the top model in precision, exhibited the
worst performance in all the other metrics. Overall, the performance of our model is
good and well-balanced.
Moreover, KNN_LD, SVM_LD, and three additional models, PCA_EELM [12],
SVM_ACC [13], and SVM_PIRB [23] have been trained on S. cerevisiae. To further
compare our model to these models, we summarized the evaluation metrics of our
RSPPI model and the aforementioned five state-of-the-art models trained on S.
cerevisiae side-by-side (Fig. 11). Among these models, KNN_LD, SVM_LD,
PCA_EELM [12], and SVM_ACC [13] were based on the primary sequence of proteins,
while SVM_PIRB [23] was based on the secondary structure. On the S. cerevisiae
dataset, RSPPI led in both accuracy and precision, with a particularly large margin
in precision over all the other models, and trailed only in sensitivity and AUC.
In accuracy, RSPPI outperformed KNN_LD, SVM_LD, PCA_EELM, SVM_ACC,
and SVM_PIRB by 4.0%, 1.6%, 3.2%, 0.9%, and 2.2%, respectively. By integrating
physicochemical features and global spatial structure information, RSPPI achieved the
best performance in most evaluation metrics.
Conclusion
In this paper, we propose RSPPI, a novel end-to-end deep learning method to predict
PPIs. RSPPI learned from comprehensive feature profiles of proteins based on their
physicochemical properties and spatial structure information. Systematic evaluations
showed that RSPPI achieved good performance and high precision in both intraspecies
and cross-species PPI prediction. The excellent performance on cross-species PPI
prediction suggests that RSPPI has the potential to predict PPIs in less-studied species based
on current knowledge. As the structure and physicochemical compatibility are the basis
of non-covalent PPIs, our RSPPI model based on spatial structure and physicochemical
information showed better performance in most evaluation metrics, compared with the
sequence-based, GO-based, and secondary structure-based methods. In addition, by
utilizing the Spatial Pyramid Pooling layer, RSPPI avoided the fixed-length
requirement for the fully connected layer, thus reducing the artifact from padding or
cropping. The RSPPI model provides a new strategy for PPI prediction by combining
the spatial and physicochemical features of the protein.
References
1. Parsons JT, Horwitz AR, Schwartz MA. Cell adhesion: integrating cytoskeletal
dynamics and cellular tension, Nat Rev Mol Cell Biol 2010;11:633-643.
2. Huttlin EL, Bruckner RJ, Paulo JA et al. Architecture of the human interactome
defines protein communities and disease networks, Nature 2017;545:505-509.
3. Perkins JR, Diboun I, Dessailly BH et al. Transient protein-protein interactions:
structural, functional, and network properties, Structure 2010;18:1233-1243.
4. Ito T, Chiba T, Ozawa R et al. A comprehensive two-hybrid analysis to explore the
yeast protein interactome, Proceedings of the National Academy of Sciences
2001;98(8):4569-4574.
5. Tong AHY, Drees B, Nardelli G et al. A Combined Experimental and
Computational Strategy to Define Protein Interaction Networks for Peptide
Recognition Modules, Science 2002;295(5553):321-324.
6. Williams C, Addona TA. The integration of SPR biosensors with mass
spectrometry: possible applications for proteome analysis, Trends in Biotechnology
2000;18(2):45-48.
7. Yang R, Zhang C, Gao R et al. An ensemble method with hybrid features to identify
extracellular matrix proteins, PLoS One 2015;10:e0117804.
8. Wang J, Li Y, Liu X et al. High-accuracy prediction of protein structural classes
using PseAA structural properties and secondary structural patterns, Biochimie
2014;101:104-112.
9. Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural
class: approached with alternating word frequency and normalized Lempel-Ziv
complexity, J Theor Biol 2014;341:71-77.
10. UniProt C. UniProt: the universal protein knowledgebase in 2021, Nucleic Acids
Res 2021;49:D480-D489.
11. Zhou YZ, Gao Y, Zheng YY. Prediction of Protein-Protein Interactions Using Local
Description of Amino Acid Sequence, Advances in Computer Science and Education
Applications. Springer, Berlin, Heidelberg 2011:254-262.
12. You Z-H, Lei Y-K, Zhu L et al. Prediction of protein-protein interactions from
amino acid sequences with ensemble extreme learning machines and principal
component analysis, BMC Bioinformatics 2013;14(8):1-11.
13. Guo Y, Yu L, Wen Z et al. Using support vector machine combined with auto
covariance to predict protein-protein interactions from protein sequences, Nucleic
Acids Res 2008;36:3025-3030.
14. Hashemifar S, Neyshabur B, Khan AA et al. Predicting protein-protein interactions
through sequence-based deep learning, Bioinformatics 2018;34:i802-i810.
15. Wuchty S, Zhang Z, Yang X. Multi-scale Convolutional Neural Networks for the
Prediction of Human-virus Protein Interactions. Proceedings of the 13th International
Conference on Agents and Artificial Intelligence. 2021, 41-48.
16. Yang L, Han Y, Zhang H et al. Prediction of Protein-Protein Interactions with Local
Weight-Sharing Mechanism in Deep Learning, Biomed Res Int 2020;2020:5072520.
17. Jain S, Bader GD. An improved method for scoring protein-protein interactions
using semantic similarity within the gene ontology, BMC Bioinformatics 2010;11(1).
18. Armean IM, Lilley KS, Trotter MWB et al. Co-complex protein membership
evaluation using Maximum Entropy on GO ontology and InterPro annotation,
Bioinformatics 2018;34:1884-1892.
19. Patil A, Nakamura H. Filtering high-throughput protein-protein interaction data
using a combination of genomic features, BMC Bioinformatics 2005;6:100.
20. Maetschke SR, Simonsen M, Davis MJ et al. Gene Ontology-driven inference of
protein-protein interactions using inducers, Bioinformatics 2012;28:69-75.
21. Singh R, Park D, Xu J et al. Struct2Net: a web service to predict protein-protein
interactions using a structure-based approach, Nucleic Acids Res 2010;38:W508-515.
22. Sussman JL, Lin D, Jiang J et al. Protein Data Bank (PDB): Database of Three-
Dimensional Structural Information of Biological Macromolecules, Acta
Crystallographica Section D: Biological Crystallography 1998;54(6):1078-1084.
23. Cai L, Pei Z, Qin S et al. Prediction of Protein-Protein Interactions in
Saccharomyces cerevisiae Based on Protein Secondary Structure. 2012 International
Conference on Biomedical Engineering and Biotechnology. 2012, 413-416.
24. He K, Zhang X, Ren S et al. Deep Residual Learning for Image Recognition,
Proceedings of the IEEE conference on computer vision and pattern recognition
2016:770-778.
25. He K, Zhang X, Ren S et al. Spatial Pyramid Pooling in Deep Convolutional
Networks for Visual Recognition, IEEE Trans Pattern Anal Mach Intell 2015;37:1904-
1916.
26. Tien MZ, Meyer AG, Sydykova DK et al. Maximum allowed solvent accessibilites
of residues in proteins, PLoS One 2013;8:e80635.
27. Cock PJ, Antao T, Chang JT et al. Biopython: freely available Python tools for
computational molecular biology and bioinformatics, Bioinformatics 2009;25:1422-
1423.
28. Huang Y, Wuchty S, Zhou Y et al. SGPPI: structure-aware prediction of protein-
protein interactions in rigorous conditions with graph convolutional network, Brief
Bioinform 2023;24.
29. Chen S, Sun Z, Lin L et al. To Improve Protein Sequence Profile Prediction through
Image Captioning on Pairwise Residue Distance Map, J Chem Inf Model 2020;60:391-
399.
30. Biro JC. Amino acid size, charge, hydropathy indices and matrices for protein
structure analysis, Theor Biol Med Model 2006;3:15.
31. Ulyanov D, Vedaldi A, Lempitsky V. Instance Normalization: The Missing
Ingredient for Fast Stylization, arXiv preprint 2016;arXiv:1607.08022.
32. Shah A, Shinde S, Kadam E et al. Deep Residual Networks with Exponential Linear
Unit, Proceedings of the third international symposium on computer vision and the
internet 2016:59-65.
33. Yang L, Xia JF, Gui J. Prediction of protein-protein interactions from protein
sequence using local descriptors, Protein Pept Lett 2010;17:1085-1090.
34. Guo Y, Li M, Pu X et al. PRED_PPI: a server for predicting protein-protein
interactions based on sequence data with probability assignment, BMC Research Notes
2010;3(1):1-7.
35. Chen KH, Wang TF, Hu YJ. Protein-protein interaction prediction using a hybrid
feature representation and a stacked generalization scheme, BMC Bioinformatics
2019;20:308.
Figure 1. The overall architecture of the RSPPI model. All protein structures were
pretreated by selecting the amino acids on the surface and then fed to the feature
extraction module.
Figure 2. The framework of the convolution (A) and identity (B) ResNet blocks.
Figure 3. The framework of the ResNet Layer. Conv indicates the convolution kernel.
Figure 7. The performance comparison between models with and without the SPP layer.
A. The performance comparison on the Mus musculus dataset. B. The performance
comparison on the E. coli dataset.
Figure 10. The performance comparison with other methods on the E. coli dataset,
including Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the
ROC Curve (E), and Matthews correlation coefficient (F).
Figure 11. The performance comparison with other methods on the S. cerevisiae dataset,
including Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the
ROC Curve (E), and Matthews correlation coefficient (F).