bioRxiv preprint doi: https://doi.org/10.1101/2023.05.27.542552; this version posted May 30, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Protein-protein interaction prediction via structure-based deep learning
Yucong Liu1, Zhenhai Li1†
1Shanghai Key Laboratory of Mechanics in Energy Engineering, Shanghai Institute of
Applied Mathematics and Mechanics, School of Mechanics and Engineering Science,
Shanghai University, Shanghai 200072, China
†Corresponding author: [email protected]

Abstract
Protein-protein interactions (PPIs) play an essential role in biological processes. Many
machine learning algorithms based on protein sequence information have been
developed to predict PPIs. However, these models have difficulty handling variable
sequence lengths and suffer from poor generalization and limited prediction accuracy.
In this study, we propose a novel end-to-end deep learning framework, RSPPI, which
combines a Residual Neural Network (ResNet) and Spatial Pyramid Pooling (SPP) to
predict PPIs based on the physicochemical properties of protein sequences and spatial
structural information. In the RSPPI model, ResNet is employed to extract structural and
physicochemical information from the protein 3D structure and primary sequence, and
the SPP layer transforms the feature maps into a single vector, avoiding the fixed-length
requirement. The RSPPI model showed excellent cross-species performance and
outperformed several state-of-the-art methods based on either protein sequence or gene
ontology in most evaluation metrics. The RSPPI model provides a new direction for
developing AI-based PPI prediction algorithms.
Keywords: protein-protein interactions, deep learning, Residual Neural Network,
Spatial Pyramid Pooling, cross-species prediction

Introduction
Proteins are the basic building blocks of living cells and play specific roles in processes
such as cell adhesion, signal transduction, and post-translational modification [1, 2].
Generally, a biological process is accomplished by the cooperation of multiple proteins
through transient protein-protein interactions (PPIs) [3]. High-throughput experimental
methods, such as yeast two-hybrid screens [4], protein chips [5], and surface plasmon
resonance [6], have been widely used to determine PPIs. However, experimental
methods remain expensive, labor-intensive, and time-consuming, so computational
methods have emerged in PPI studies. With advances in machine learning and the
accumulation of vast amounts of protein and PPI data,
a large number of computational methods have been developed to predict PPIs from
various data types, such as protein sequence [7] and protein secondary structure [8, 9].
The primary structures of most proteins have been sequenced and stored in the UniProt
database [10]. Thus, there is longstanding interest in using sequence-based methods to
model and predict protein interactions [11-13], and several such methods have been
developed. For instance, Hashemifar et al. [14] proposed a deep learning model, DPPI,
that uses a Convolutional Neural Network (CNN) and random projection modules to
predict PPIs. DPPI achieved high prediction accuracy and could predict homodimeric
interactions. Yang et al. [15] proposed a model based on a CNN architecture and protein
evolutionary profiles, which outperformed several other human-virus PPI prediction
methods. However, sequence-based models have shown limited generalizability [15, 16].
In addition, proteins are more likely to interact when they localize to the same cellular
component or share a common biological process or molecular function. Accordingly,
several methods predict PPIs from the gene ontology (GO) annotations and semantic
similarity of proteins [17]. Armean et al. [18] combined GO annotations with an SVM
for PPI prediction, and other models, including Bayesian classifiers [19] and random
forests [20], have also been employed for PPI prediction.
PPIs are mediated by non-covalent interactions, including electrostatic interactions,
hydrophobic interactions, and hydrogen bonds. In addition, PPIs are governed by how
well the protein surfaces fit together. Therefore, the protein structure and the
physicochemical properties of surface amino acids together play an important role in
PPIs. However, sequence-based and GO-based methods cannot incorporate protein
spatial structure information, so structure-based models are needed to better learn PPI
features. Currently, there are a few structure-based models for predicting PPIs. For
example, Struct2Net [21] threads the sequences of a protein pair onto protein complexes
in the Protein Data Bank (PDB) [22], searches for potential matches, and then predicts
the PPI. Cai et al. proposed a support vector machine (SVM) model that predicts PPIs
by analyzing protein secondary structure [23]. Unfortunately, these structure-based
models either require homologous templates in the PDB or ignore the global structure
and physicochemical properties of proteins.
Herein, we designed a feature-extraction strategy that converts 1D physicochemical
properties and 3D structure information into 2D feature maps. The 2D feature maps were
then fed into a newly proposed deep learning framework for PPI prediction, RSPPI,
which incorporates three ResNet layers [24], an SPP layer [25], and a fully connected
layer. The ResNet layers extract the structural and physicochemical information; the
SPP layer handles the variable length of protein sequences; and the fully connected
layer fuses the pairwise protein feature vectors and predicts the interaction. By
transforming sequence and structure information into 2D maps, our model bridges the
two and allows them to be learned with a ResNet. As RSPPI is based on structural and
physicochemical features, the model does not rely on
homologous templates from the PDB. Moreover, thanks to the SPP layer, our model does
not need to preprocess protein sequences to a fixed length, a step that may introduce
artifacts. These advantages contribute to the strong overall performance and the
cross-species prediction capability of the RSPPI model.

Materials and methods

Overview
We introduce an end-to-end deep learning framework, RSPPI, for identifying PPIs,
treating PPI prediction as a binary classification problem. The physicochemical
properties of amino acids (hydrophobicity and charge) and the spatial structure of the
protein (represented by a distance map) are used as inputs to the RSPPI model. The
model consists of two parts: a data-preprocessing module and a prediction module
(Fig. 1). In the data-preprocessing module, 2D feature matrices are generated from the
physicochemical properties and spatial structure information of the protein. The 2D
distance map, hydrophobicity map, and charge map can be regarded as the channels of
an image of protein features. Inspired by image recognition algorithms, we designed and
trained a deep learning network on these images. In the prediction module, a ResNet
combined with an SPP layer learns the PPI features. The representations of the two
proteins in a pair are then fused, concatenated, and fed into a fully connected network
that predicts the interaction probability of the protein pair.

Feature extraction module

The amino acids located on the surface of a protein contribute prominently to protein
interactions. Surface amino acids were therefore selected by calculating the relative
solvent accessibility (RSA) [26] of each residue with the DSSP module of Biopython
[27]; amino acids with an RSA exceeding 0.2 were defined as surface amino acids [28].
A shell of the protein was then built from the selected surface amino acids and used in
the subsequent data preprocessing (the inset in Fig. 1). A two-dimensional (2D) distance
matrix of the protein shell, which represents the spatial structure information of the
protein, was obtained by calculating the minimum heavy-atom distance between
residues. Because amino acids in close contact are more likely to form specific
interactions, residues that are close in space were given more weight by converting the
distance map into an adjacency matrix. Any distance less than or equal to a chosen
cutoff of 14 Å was replaced by

$$S_{ij} = \frac{2d_0}{d_0 + \max(d_0, d_{ij})}$$

where d_ij is the heavy-atom distance between the ith and jth residues and the distance
threshold d_0 is set to 4 Å. Any distance greater than the cutoff was set to 0 [29].
Therefore, the values of the adjacency matrix lie in the range of 0 to 1, reaching 1 for
residue pairs no farther apart than d_0.
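The shell and adjacency-map steps can be sketched with NumPy as below. This is a minimal illustration rather than the authors' implementation: it assumes the RSA values have already been computed (for example with Biopython's DSSP wrapper) and that each residue's heavy-atom coordinates are supplied as an array; the function names are hypothetical.

```python
import numpy as np

RSA_THRESHOLD = 0.2  # residues with RSA above this are treated as surface residues
CUTOFF = 14.0        # adjacency cutoff, in angstroms
D0 = 4.0             # d_0, in angstroms


def surface_indices(rsa):
    """Indices of residues whose relative solvent accessibility exceeds 0.2."""
    return [i for i, value in enumerate(rsa) if value > RSA_THRESHOLD]


def min_heavy_atom_distances(residue_atoms):
    """Minimum heavy-atom distance between every pair of residues.

    residue_atoms: list of (n_atoms, 3) arrays, one per residue of the shell.
    """
    n = len(residue_atoms)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # All atom-atom displacement vectors between residue i and residue j.
            diff = residue_atoms[i][:, None, :] - residue_atoms[j][None, :, :]
            dist[i, j] = dist[j, i] = np.sqrt((diff ** 2).sum(axis=-1)).min()
    return dist


def adjacency_map(dist, d0=D0, cutoff=CUTOFF):
    """S_ij = 2*d0 / (d0 + max(d0, d_ij)) for d_ij <= cutoff, and 0 otherwise."""
    dist = np.asarray(dist, dtype=float)
    s = 2.0 * d0 / (d0 + np.maximum(d0, dist))
    return np.where(dist <= cutoff, s, 0.0)
```

For example, with toy RSA values [0.45, 0.05, 0.30], surface_indices keeps residues 0 and 2, and adjacency_map(min_heavy_atom_distances(shell)) returns a 2 × 2 matrix whose diagonal entries are 1.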
The physicochemical properties of amino acids, such as hydrophobicity and electrostatic
charge, affect inter-protein interactions. Hydrophobicity is a measure of how
strongly the side chains of amino acids are excluded by water molecules in an aqueous
environment. The hydrophobic amino acids of a protein tend to form hydrophobic
interactions that reduce their contact with water or other polar molecules.
Hydrophobicity is usually characterized by the hydrophobic moment (HM), which ranges
from -7.5 for Arg to 3.1 for Ile; a larger HM value indicates higher hydrophobicity.
Because the hydrophobic interaction depends on the hydrophobicity of both amino acids
in a pair, the hydropathy compatibility index (HCI) was used to evaluate the hydrophobic
interaction of amino acid pairs [30]. The HCI of any two residues was calculated as [30]:

$$\mathrm{HCI} = 20 - \left| HM(i) - HM(j) \right| \times \frac{19}{10.6} \tag{1}$$

where HM(i) and HM(j) are the HM values of residues i and j. Because HM spans 10.6
units (from -7.5 to 3.1), the factor 19/10.6 scales HCI to the range 1 to 20, with 20 for
residues of equal hydrophobicity.

The electrostatic charges of amino acids determine their electrostatic interactions:
residues with opposite charges attract each other, while residues with the same charge
repel each other. A simple characterization of electrostatic properties is the isoelectric
point (pI). The net charge of an amino acid varies with the pH of the environment owing
to the gain or loss of protons, and the pI is the pH at which the net charge of the amino
acid is 0. For instance, the negatively charged amino acids Asp and Glu have pIs of 2.7
and 3.2, respectively, and the positively charged amino acids His, Lys, and Arg have pIs
of 7.5, 9.7, and 10.7, respectively. The remaining amino acids have pIs of ~6, so they
are considered to carry no net charge under physiological conditions. Similar to the HCI
for hydrophobicity, the charge compatibility index (CCI) was used to evaluate the
electrostatic interaction of amino acid pairs [30]. The CCI of any two residues was
calculated as [30]:

$$\mathrm{CCI} = 11 - \left( pI(i) - 7 \right)\left( pI(j) - 7 \right) \times \frac{19}{33.8} \tag{2}$$

where pI(i) and pI(j) are the pIs of residues i and j. Oppositely charged pairs therefore
receive high CCI values, and like-charged pairs receive low values.

Using HCI and CCI, two additional physicochemical feature maps of the protein surface
shell were constructed. Analogous to the distance map, the 2D hydrophobicity and
electrostatic matrices of the protein surface shell were built by setting each element
(i, j) to the HCI or CCI value of residues i and j.
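The HCI and CCI maps, and their stacking with the adjacency map into a three-channel input, can be sketched as follows. Assumptions: the full HM scale from [30] is not reproduced here, so callers pass per-residue HM values; the pI table uses only the values quoted above, with every other residue approximated as 6.0; the channel order, the absence of any rescaling of HCI and CCI, and the function names are illustrative choices, not details taken from the paper.

```python
import numpy as np

# pI values quoted in the text; all other residues approximated by the text's "~6".
CHARGED_PI = {"D": 2.7, "E": 3.2, "H": 7.5, "K": 9.7, "R": 10.7}
DEFAULT_PI = 6.0


def hci(hm_i, hm_j):
    """Hydropathy compatibility index, Eq. (1)."""
    return 20.0 - abs(hm_i - hm_j) * 19.0 / 10.6


def cci(pi_i, pi_j):
    """Charge compatibility index, Eq. (2)."""
    return 11.0 - (pi_i - 7.0) * (pi_j - 7.0) * 19.0 / 33.8


def hci_map(hm_values):
    """L x L matrix whose (i, j) entry is HCI(i, j)."""
    hm = np.asarray(hm_values, dtype=float)
    return 20.0 - np.abs(hm[:, None] - hm[None, :]) * 19.0 / 10.6


def cci_map(sequence):
    """L x L matrix whose (i, j) entry is CCI(i, j), from one-letter residue codes."""
    shifted = np.array([CHARGED_PI.get(aa, DEFAULT_PI) for aa in sequence]) - 7.0
    return 11.0 - shifted[:, None] * shifted[None, :] * 19.0 / 33.8


def feature_stack(adjacency, hm_values, sequence):
    """Stack adjacency, HCI and CCI maps into a (3, L, L) array (channel order assumed)."""
    return np.stack([np.asarray(adjacency, dtype=float), hci_map(hm_values), cci_map(sequence)])
```

The vectorized maps use broadcasting, so each one is a single array expression rather than a double loop over residue pairs.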

Prediction module - ResNet layer design

Our network consists of three ResNet layers with the same architecture (Fig. 3). Each
layer contains two ResNet blocks: a 1 × 1 Conv ResNet block and an identity ResNet
block (Fig. 2). In both blocks, the feature maps first undergo convolution, then instance
normalization [31], and finally activation by an exponential linear unit (ELU) [32]; this
sequence is repeated once more within each block. In the 1 × 1 Conv ResNet block, the
spatial size of the feature maps is halved by the 3 × 3 convolution (which requires a
stride of 2), while the number of output channels is doubled. Therefore, the original x
must be transformed to x' by an additional 1 × 1 convolution before being added
to the output F'(x). In the identity ResNet block, by contrast, padding keeps the number
of channels and the size of the feature maps unchanged, so the original x is added
directly to the output F(x) at the end of the block. A kernel size of 3 × 3 was used in all
convolutions other than the 1 × 1 shortcut convolution. The numbers of filters in the
three ResNet layers were set to 16, 32, and 64, respectively. ELU was used as the
activation function because it has been reported to be more effective than the standard
ReLU in ResNets [32]. Consequently, the output feature maps change from 3 × L × L to
64 × (L/8) × (L/8), where L is the protein sequence length.
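A minimal PyTorch sketch of one reading of this design follows. It is not the authors' code (their implementation is linked under Availability of data and codes). Assumptions: the first 3 × 3 convolution of the 1 × 1 Conv ResNet block uses stride 2 to halve the maps, the 1 × 1 shortcut uses the same stride so the shapes match, and no activation is applied after the residual addition. Class and function names are hypothetical.

```python
import torch
import torch.nn as nn


def conv_in_elu(c_in, c_out, stride=1):
    """3x3 convolution -> instance normalization -> ELU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.InstanceNorm2d(c_out),
        nn.ELU(),
    )


class ConvResBlock(nn.Module):
    """Down-sampling block: halves H and W; a 1x1 conv maps x to x' on the shortcut."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(conv_in_elu(c_in, c_out, stride=2), conv_in_elu(c_out, c_out))
        self.shortcut = nn.Conv2d(c_in, c_out, kernel_size=1, stride=2)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)  # F'(x) + x'


class IdentityResBlock(nn.Module):
    """Shape-preserving block: padding keeps H, W and the channel count unchanged."""

    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_in_elu(c, c), conv_in_elu(c, c))

    def forward(self, x):
        return self.body(x) + x  # F(x) + x


def resnet_encoder(channels=(3, 16, 32, 64)):
    """Three ResNet layers, each a ConvResBlock followed by an IdentityResBlock."""
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers += [ConvResBlock(c_in, c_out), IdentityResBlock(c_out)]
    return nn.Sequential(*layers)
```

With this sketch, a 3 × 64 × 64 input yields 64 × 8 × 8 feature maps; when L is not divisible by 8, the stride-2 convolutions round the size up (for example, 50 becomes 25, 13, and then 7).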

Prediction module - Spatial pyramid pooling layer design

A fully connected layer requires a fixed-size input. Therefore, conventional CNNs that
end in a fully connected layer require protein sequences of a fixed length for PPI
prediction, and long proteins are commonly truncated while short proteins are
zero-padded to a uniform length. However, both truncation and zero-padding may
introduce artifacts into the protein features and affect recognition. To bypass the
fixed-length requirement, a four-layer SPP network was placed after the ResNet layers.
This SPP network performs four pooling operations on the ResNet output feature maps,
using pooling grids of 4 × 4, 3 × 3, 2 × 2, and 1 × 1, respectively, which yield vectors
of length 16, 9, 4, and 1 from each feature map. Because the ResNet outputs 64 feature
maps, the four pooling operations extract 256 vectors (Fig. 4). Concatenating all 256
vectors gives a single vector of length 64 × (16 + 9 + 4 + 1) = 1920, regardless of the
size of the input protein.
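The SPP step can be sketched in PyTorch with adaptive pooling, which divides any H × W map into a fixed grid. The paper does not state the pooling type; max pooling, as commonly used in SPP networks, is an assumption here, as is the class name.

```python
import torch
import torch.nn as nn


class SpatialPyramidPooling(nn.Module):
    """Pool every feature map onto 4x4, 3x3, 2x2 and 1x1 grids and concatenate."""

    def __init__(self, grids=(4, 3, 2, 1)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(g) for g in grids])

    def forward(self, x):
        # x: (batch, channels, H, W) with any H and W.
        # Each grid g yields channels * g * g values, so 64 channels give
        # 64 * (16 + 9 + 4 + 1) = 1920 values in total.
        return torch.cat([pool(x).flatten(start_dim=1) for pool in self.pools], dim=1)
```

Because adaptive pooling derives its window size from the input size, 8 × 8 and 13 × 13 feature maps both produce a 1920-dimensional vector.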

Prediction module - Feature fusion layer design

To predict the interaction between two proteins, a 1920-dimensional vector is extracted
from the SPP network for each protein of the pair, denoted Z_1 and Z_2. The two protein
features are fused by computing their difference and product:

$$\mathrm{diff} = Z_1 - Z_2 \tag{4}$$

$$\mathrm{mul} = Z_1 \odot Z_2 \tag{5}$$

where diff is the element-wise difference and mul is the element-wise product. The diff
and mul vectors are then concatenated into a 3840-dimensional pairwise feature, which
is fed into a fully connected network. This network stacks three linear layers, each
followed by a ReLU activation, and an output layer followed by a Sigmoid activation.
The output sizes of the three linear layers and the output layer were set to 3840, 960,
480, and 1, respectively (Fig. 5). BCELoss was used as the loss function to train
the model, which is defined as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log x_i + (1 - y_i)\log(1 - x_i) \right] \tag{6}$$

where N is the total number of protein-protein samples in the training dataset, and y_i
and x_i are the true target and the predicted score of the ith sample, respectively.
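The fusion layer, the fully connected network, and the loss in Eq. (6) can be sketched in PyTorch as below. Assumptions: the first linear layer maps the 3840-dimensional input to 3840 units (one reading of the listed sizes), dropout is omitted because its placement is not specified, and the names are hypothetical. The short demo at the end compares nn.BCELoss with Eq. (6) written out by hand.

```python
import torch
import torch.nn as nn


class FusionHead(nn.Module):
    """Fuse two 1920-d protein vectors and output an interaction probability."""

    def __init__(self, dim=1920):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 3840), nn.ReLU(),
            nn.Linear(3840, 960), nn.ReLU(),
            nn.Linear(960, 480), nn.ReLU(),
            nn.Linear(480, 1), nn.Sigmoid(),  # output layer
        )

    def forward(self, z1, z2):
        diff = z1 - z2  # Eq. (4): element-wise difference
        mul = z1 * z2   # Eq. (5): element-wise product
        return self.mlp(torch.cat([diff, mul], dim=1))  # 3840-d pairwise feature


def bce_by_hand(x, y):
    """Eq. (6) written out explicitly; x = predicted scores, y = true labels."""
    return -(y * torch.log(x) + (1 - y) * torch.log(1 - x)).mean()


if __name__ == "__main__":
    head = FusionHead()
    z1, z2 = torch.randn(4, 1920), torch.randn(4, 1920)
    y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
    x = head(z1, z2)
    print(nn.BCELoss()(x, y).item(), bce_by_hand(x, y).item())  # the two values agree
```

Note that diff = Z_1 - Z_2 depends on the order of the two proteins, while mul does not.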

Dataset preparation
Because the goal is to learn PPIs, dimers deposited in the Protein Data Bank (PDB) were
collected to build the training dataset. Species with few structures were excluded,
leaving dimer complexes from six species: Homo sapiens, Mus musculus, E. coli,
Bacteria, Eukaryota, and S. cerevisiae. The dimer datasets were then further filtered by
sequence alignment, and redundant proteins sharing more than 50% sequence identity
were removed. Proteins shorter than 50 amino acids were removed as well. The filtered
protein dimer complexes were used as positive samples, and negative samples were
generated by randomly pairing chains. To keep the training data balanced, each negative
dataset was the same size as the corresponding positive dataset. With these filtering
criteria and the random-pairing strategy, six single-species datasets and a mixed-species
dataset (including all six species) were constructed (Table 1).
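The length filter and the random-pairing step can be sketched as below. Assumptions: chains are identified by hypothetical string IDs, a random pair is rejected if it is a duplicate or one of the known positive dimers (the paper does not say whether such pairs were excluded), and the 50% sequence-identity redundancy filter is not shown. Function names are hypothetical.

```python
import random


def keep_long_chains(sequences, min_length=50):
    """Drop chains shorter than min_length residues. sequences: {chain_id: sequence}."""
    return {cid: seq for cid, seq in sequences.items() if len(seq) >= min_length}


def random_negative_pairs(positive_pairs, seed=0):
    """Randomly pair chains to create as many negatives as there are positives."""
    rng = random.Random(seed)
    chains = sorted({c for pair in positive_pairs for c in pair})
    known = {frozenset(pair) for pair in positive_pairs}
    # Number of distinct two-chain pairs that are not known positives.
    n_possible = len(chains) * (len(chains) - 1) // 2 - sum(len(p) == 2 for p in known)
    if n_possible < len(positive_pairs):
        raise ValueError("not enough chains to draw a balanced negative set")
    negatives = set()
    while len(negatives) < len(positive_pairs):
        a, b = rng.sample(chains, 2)  # two distinct chains
        pair = frozenset((a, b))
        if pair not in known:  # skip known interacting pairs
            negatives.add(pair)  # the set silently discards duplicates
    return sorted(tuple(sorted(pair)) for pair in negatives)
```

Seeding a private random.Random instance keeps the negative set reproducible without touching the global random state.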

Model implementation
The model was implemented and trained in PyTorch. RSPPI was trained for 200 epochs
with the Adam optimizer. The initial learning rate was 0.001 and decayed exponentially
by a factor (gamma) of 0.98 every epoch. The batch size was 1. To avoid overfitting,
dropout was used and the weight decay was set to 0.03.
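The training configuration can be sketched with PyTorch's Adam optimizer and ExponentialLR scheduler. Assumptions: weight decay is applied through Adam's weight_decay argument (an L2 penalty; the paper does not say whether decoupled AdamW-style decay was used), the scheduler is stepped once per epoch, and the model is any module that takes the two proteins' feature tensors and returns a probability. Function names are hypothetical.

```python
import torch
import torch.nn as nn


def make_optimizer(model, lr=1e-3, weight_decay=0.03, gamma=0.98):
    """Adam with weight decay, plus a learning rate multiplied by gamma every epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    return optimizer, scheduler


def train(model, samples, epochs=200):
    """samples: list of (features_a, features_b, label) tuples, i.e. batch size 1.

    With a batch size of 1, samples do not need a common feature-map size.
    """
    optimizer, scheduler = make_optimizer(model)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        model.train()
        for features_a, features_b, label in samples:
            optimizer.zero_grad()
            loss = loss_fn(model(features_a, features_b), label)
            loss.backward()
            optimizer.step()
        scheduler.step()  # lr <- lr * gamma once per epoch
    return model
```

After 200 epochs the learning rate has decayed to 0.001 × 0.98^200, roughly 1.8e-5.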

Evaluation standard
The performance of the models was evaluated by classification accuracy, precision,
sensitivity, F1-score (F1), and Matthews correlation coefficient (MCC), defined as
follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Sensitivity\ (recall)} = \frac{TP}{TP + FN} \tag{9}$$

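$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{10}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{11}$$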
where TP, TN, FP, and FN stand for true positives, true negatives, false positives, and
false negatives, respectively. A receiver operating characteristic (ROC) curve plots the
true positive rate against the false positive rate at all classification thresholds, and the
area under the ROC curve (AUC) was also used to evaluate the model. In total, these six
metrics were used to evaluate the RSPPI model.
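Eqs. (7) to (11) and the AUC can be computed in plain Python as sketched below. Assumptions: labels are 1 for interacting and 0 for non-interacting pairs, predictions are thresholded at 0.5, a zero denominator yields 0.0, and the AUC is computed through its rank interpretation (the probability that a random positive outscores a random negative, with ties counted as one half), which is equivalent to the area under the ROC curve. Function names are hypothetical.

```python
import math


def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, TN, FP, FN for binary labels (1 = interacting) at a score threshold."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (7)-(11); a zero denominator yields 0.0 (a convention assumed here)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "sensitivity": recall,
            "f1": f1, "mcc": mcc}


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a worked example, TP = 8, TN = 9, FP = 1, and FN = 2 give an accuracy of 17/20 = 0.85 and an MCC of 70/√9900 ≈ 0.70. The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a sketch but slow for large test sets.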

Results
The ablation study demonstrated the importance of the spatial structural information and
the SPP layer. RSPPI was then applied to the cross-species dataset and six single-species
datasets. The trained model showed generally high precision both within and across
species.

Importance of the spatial structural information

To study the contribution of spatial structural and physicochemical information, we
compared models trained with three input feature sets: physicochemical features
(hydrophobicity and charge) only, the structural feature only, and all features. On the
Mus musculus dataset, the model trained on the structural feature performed better than
that trained on the physicochemical features, and the model trained on all features
achieved the best performance (Fig. 6). Adding the structural feature clearly improved
the evaluation metrics, particularly the precision of PPI prediction.

Importance of the spatial pyramid pooling layer

To avoid padding or cropping proteins, we equipped the network with an SPP layer. To
investigate the effect of this layer, we compared our model with a fixed-length model
trained on the E. coli and Mus musculus datasets. The fixed-length model used a similar
architecture, with six ResNet layers at the beginning and a fully connected layer at the
end. All proteins fed to this model were preprocessed to a fixed length of 800 amino
acids by zero-padding short proteins and cropping long ones. The batch size was set to
32 for fixed-length training, and the other parameters were the same as in the training
with the SPP layer. RSPPI clearly outperformed the fixed-length model in predicting
PPIs on both the E. coli and Mus musculus datasets (Fig. 7).

Evaluation of RSPPI on the intraspecies dataset

To test the performance of RSPPI on intraspecies data, 10-fold cross-validation was
carried out on the Homo sapiens, Mus musculus, E. coli, S. cerevisiae, Bacteria,
Eukaryota, and mixed-species datasets, respectively, and the mean evaluation metrics
were calculated over the ten folds (Fig. 8). For all species, RSPPI exhibited high
precision and acceptable accuracy, sensitivity, F1, and AUC. Across the single-species
datasets, RSPPI achieved an average accuracy of ~91%, precision of ~99%, sensitivity
of ~87%, F1 of ~90%, AUC of ~93%, and MCC of ~82%. The model performed best on
the Bacteria dataset, with accuracy, sensitivity, and F1 all above 93%, an AUC above
97%, and a precision above 99%. On the mixed-species dataset, RSPPI achieved an
accuracy of 93%, a precision of 97%, an F1 of 92%, and an AUC of 92%. According to
these metrics, the RSPPI model performed well on all the tested species.
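The 10-fold protocol can be sketched as follows. Assumptions: a plain shuffled split over protein pairs (the paper does not state whether folds were stratified by label), a fixed seed for reproducibility, and hypothetical function names; the per-fold metric dictionaries are averaged key by key to give the reported means.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint, near-equal folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test


def mean_metrics(per_fold):
    """Average a list of per-fold metric dictionaries key by key."""
    return {key: sum(d[key] for d in per_fold) / len(per_fold) for key in per_fold[0]}
```

Every sample appears in exactly one test fold, so each pair is evaluated once by a model that never saw it during training.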

The evaluation of the cross-species performance of RSPPI

To explore the generalization of RSPPI in predicting PPIs, cross-species validation was
carried out, in which the model was trained on the Homo sapiens dataset and tested on
the other five species (Fig. 9). Notably, our model performed comparably in the
cross-species validation and in the intraspecies evaluation, whereas most
state-of-the-art PPI models fail to provide acceptable predictions on cross-species
datasets [16]. The model trained on Homo sapiens achieved an average prediction
accuracy above 90% across the other five datasets, while the average precision on the
intraspecies datasets was ~98%. Among all the datasets, Eukaryota showed a relatively
low accuracy of ~87%, while Bacteria showed the highest accuracy, above 95%.
Although the model trained on the Homo sapiens dataset learned the features of human
PPIs, it predicted the PPIs of the other species with performance similar to that in the
intraspecies evaluation. This suggests that RSPPI, by combining physicochemical
properties and spatial structure information, is able to predict PPIs across species.

Comparison with existing PPI prediction methods

To compare RSPPI with other models, we list side by side the evaluation metrics of our
model and five state-of-the-art methods trained on the E. coli dataset: KNN_LD [33],
SVM_LD [11], PRED_PPI [34], PPI_MetaGO [35], and GO2ppi_RF [20] (Fig. 10).
KNN_LD, SVM_LD, and PRED_PPI are based on the primary sequence of proteins,
while PPI_MetaGO and GO2ppi_RF are based on gene ontology (GO) information.
AUC provides an aggregate measure of performance across all possible classification
thresholds, with a higher value indicating better classification ability. MCC measures
the correlation between the actual and predicted classes, and a value close to 1 indicates
highly accurate prediction. On the E. coli dataset, RSPPI is superior to all of these
models in
AUC and MCC. RSPPI outperformed KNN_LD, SVM_LD, PPI_MetaGO, and GO2ppi_RF
by 15.7%, 2.4%, 2.1%, and 2.0% in AUC, respectively, and outperformed PPI_MetaGO
and GO2ppi_RF by 5.9% and 5.1% in MCC, respectively. In the other metrics, RSPPI
ranked among the top two models, slightly behind SVM_LD in accuracy, sensitivity, and
F1, and slightly behind KNN_LD in precision. Although SVM_LD ranked first in three
metrics, its precision was the lowest of all models. Similarly, KNN_LD, the top model in
precision, performed worst in all the other metrics. Overall, our model performs well and
is well balanced across metrics.
Moreover, KNN_LD, SVM_LD, and three additional models, PCA_EELM [12],
SVM_ACC [13], and SVM_PIRB [23], have been trained on S. cerevisiae. To further
compare our model with these models, we summarize side by side the evaluation metrics
of RSPPI and these five state-of-the-art models trained on S. cerevisiae (Fig. 11).
Among these models, KNN_LD, SVM_LD, PCA_EELM [12], and SVM_ACC [13] are
based on the primary sequence of proteins, while SVM_PIRB [23] is based on the
secondary structure. On the S. cerevisiae dataset, RSPPI led in both accuracy and
precision, with a particularly large advantage in precision over all the other models, and
trailed only in sensitivity and AUC. In accuracy, RSPPI outperformed KNN_LD,
SVM_LD, PCA_EELM, SVM_ACC, and SVM_PIRB by 4.0%, 1.6%, 3.2%, 0.9%, and
2.2%, respectively. By integrating physicochemical features and global spatial structure
information, RSPPI achieved the best performance in most evaluation metrics.

Conclusion
In this paper, we propose RSPPI, a novel end-to-end deep learning method for predicting
PPIs. RSPPI learns from comprehensive feature profiles of proteins based on their
physicochemical properties and spatial structure information. Systematic evaluations
showed that RSPPI achieved good performance and high precision in both intraspecies
and cross-species PPI prediction. Its strong cross-species performance suggests that
RSPPI has the potential to predict PPIs in less-studied species based on current
knowledge. Because structural and physicochemical compatibility underlie non-covalent
PPIs, RSPPI, which is built on spatial structure and physicochemical information,
performed better in most evaluation metrics than sequence-based, GO-based, and
secondary-structure-based methods. In addition, by using the Spatial Pyramid Pooling
layer, RSPPI avoids the fixed-length requirement of the fully connected layer, thus
reducing artifacts from padding or cropping. The RSPPI model provides a new strategy
for PPI prediction by combining the spatial and physicochemical features of proteins.

Availability of data and codes

The code, datasets, and trained model from this study are available in the GitHub
repository: https://github.com/Mechnobiology/RSPPI.

References
1. Parsons JT, Horwitz AR, Schwartz MA. Cell adhesion: integrating cytoskeletal
dynamics and cellular tension, Nat Rev Mol Cell Biol 2010;11:633-643.
2. Huttlin EL, Bruckner RJ, Paulo JA et al. Architecture of the human interactome
defines protein communities and disease networks, Nature 2017;545:505-509.
3. Perkins JR, Diboun I, Dessailly BH et al. Transient protein-protein interactions:
structural, functional, and network properties, Structure 2010;18:1233-1243.
4. Ito T, Chiba T, Ozawa R et al. A comprehensive two-hybrid analysis to explore the
yeast protein interactome, Proceedings of the National Academy of Sciences
2001;98(8):4569-4574.
5. Tong AHY, Drees B, Nardelli G et al. A Combined Experimental and
Computational Strategy to Define Protein Interaction Networks for Peptide
Recognition Modules, Science 2002;295(5553):321-324.
6. Williams C, Addona TA. The integration of SPR biosensors with mass
spectrometry: possible applications for proteome analysis, Trends in Biotechnology
2000;18(2):45-48.
7. Yang R, Zhang C, Gao R et al. An ensemble method with hybrid features to identify
extracellular matrix proteins, PLoS One 2015;10:e0117804.
8. Wang J, Li Y, Liu X et al. High-accuracy prediction of protein structural classes
using PseAA structural properties and secondary structural patterns, Biochimie
2014;101:104-112.
9. Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural
class: approached with alternating word frequency and normalized Lempel-Ziv
complexity, J Theor Biol 2014;341:71-77.
10. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021, Nucleic
Acids Res 2021;49:D480-D489.
11. Zhou YZ, Gao Y, Zheng YY. Prediction of Protein-Protein Interactions Using Local
Description of Amino Acid Sequence, Advances in Computer Science and Education
Applications. Springer, Berlin, Heidelberg 2011:254-262.
12. You Z-H, Lei Y-K, Zhu L et al. Prediction of protein-protein interactions from
amino acid sequences with ensemble extreme learning machines and principal
component analysis, BMC Bioinformatics 2013;14(8):1-11.
13. Guo Y, Yu L, Wen Z et al. Using support vector machine combined with auto
covariance to predict protein-protein interactions from protein sequences, Nucleic
Acids Res 2008;36:3025-3030.
14. Hashemifar S, Neyshabur B, Khan AA et al. Predicting protein-protein interactions
through sequence-based deep learning, Bioinformatics 2018;34:i802-i810.
15. Wuchty S, Zhang Z, Yang X. Multi-scale Convolutional Neural Networks for the
Prediction of Human-virus Protein Interactions. Proceedings of the 13th International
Conference on Agents and Artificial Intelligence. 2021, 41-48.
16. Yang L, Han Y, Zhang H et al. Prediction of Protein-Protein Interactions with Local
Weight-Sharing Mechanism in Deep Learning, Biomed Res Int 2020;2020:5072520.
17. Jain S, Bader GD. An improved method for scoring protein-protein interactions
using semantic similarity within the gene ontology, BMC Bioinformatics 2010;11(1).
18. Armean IM, Lilley KS, Trotter MWB et al. Co-complex protein membership
evaluation using Maximum Entropy on GO ontology and InterPro annotation,
Bioinformatics 2018;34:1884-1892.
19. Patil A, Nakamura H. Filtering high-throughput protein-protein interaction data
using a combination of genomic features, BMC Bioinformatics 2005;6:100.
20. Maetschke SR, Simonsen M, Davis MJ et al. Gene Ontology-driven inference of
protein-protein interactions using inducers, Bioinformatics 2012;28:69-75.
21. Singh R, Park D, Xu J et al. Struct2Net: a web service to predict protein-protein
interactions using a structure-based approach, Nucleic Acids Res 2010;38:W508-515.
22. Sussman JL, Lin D, Jiang J et al. Protein Data Bank (PDB): Database of Three-
Dimensional Structural Information of Biological Macromolecules, Acta
Crystallographica Section D: Biological Crystallography 1998;54(6):1078-1084.
23. Cai L, Pei Z, Qin S et al. Prediction of Protein-Protein Interactions in
Saccharomyces cerevisiae Based on Protein Secondary Structure. 2012 International
Conference on Biomedical Engineering and Biotechnology. 2012, 413-416.
24. He K, Zhang X, Ren S et al. Deep Residual Learning for Image Recognition,
Proceedings of the IEEE conference on computer vision and pattern recognition
2016:770-778.
25. He K, Zhang X, Ren S et al. Spatial Pyramid Pooling in Deep Convolutional
Networks for Visual Recognition, IEEE Trans Pattern Anal Mach Intell 2015;37:1904-
1916.
26. Tien MZ, Meyer AG, Sydykova DK et al. Maximum allowed solvent accessibilites
of residues in proteins, PLoS One 2013;8:e80635.
27. Cock PJ, Antao T, Chang JT et al. Biopython: freely available Python tools for
computational molecular biology and bioinformatics, Bioinformatics 2009;25:1422-
1423.
28. Huang Y, Wuchty S, Zhou Y et al. SGPPI: structure-aware prediction of protein-
protein interactions in rigorous conditions with graph convolutional network, Brief
Bioinform 2023;24.
29. Chen S, Sun Z, Lin L et al. To Improve Protein Sequence Profile Prediction through
Image Captioning on Pairwise Residue Distance Map, J Chem Inf Model 2020;60:391-
399.
30. Biro JC. Amino acid size, charge, hydropathy indices and matrices for protein
structure analysis, Theor Biol Med Model 2006;3:15.
31. Ulyanov D, Vedaldi A, Lempitsky V. Instance Normalization: The Missing
Ingredient for Fast Stylization, arXiv preprint 2016;arXiv:1607.08022.
32. Shah A, Shinde S, Kadam E et al. Deep Residual Networks with Exponential Linear
Unit, Proceedings of the third international symposium on computer vision and the
internet 2016:59-65.
33. Yang L, Xia JF, Gui J. Prediction of protein-protein interactions from protein
sequence using local descriptors, Protein Pept Lett 2010;17:1085-1090.
34. Guo Y, Li M, Pu X et al. PRED_PPI: a server for predicting protein-protein
interactions based on sequence data with probability assignment, BMC Research Notes
2010;3(1):1-7.
35. Chen KH, Wang TF, Hu YJ. Protein-protein interaction prediction using a hybrid
feature representation and a stacked generalization scheme, BMC Bioinformatics
2019;20:308.
Figure 1. The overall architecture of the RSPPI model. All protein structures were
preprocessed by selecting their surface residues, which were then fed to the feature
extraction module.
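The caption does not state how surface residues are identified. One common approach, given here only as a minimal sketch, is to compute each residue's relative solvent accessibility (RSA), i.e. its absolute solvent-accessible surface area divided by the theoretical maximum for that residue type reported by Tien et al. (ref. 26), and keep residues above a cutoff. The 0.25 cutoff and the function names `relative_accessibility` and `select_surface_residues` are hypothetical, not taken from RSPPI; the per-residue SASA values are assumed to come from an external calculation (for example a Shrake-Rupley computation such as the one provided in Biopython, ref. 27).

```python
from typing import Dict, List, Sequence, Tuple

# Theoretical maximum solvent accessibility (A^2) per residue type,
# as reported by Tien et al. (2013), ref. 26.
MAX_ASA: Dict[str, float] = {
    "ALA": 129.0, "ARG": 274.0, "ASN": 195.0, "ASP": 193.0, "CYS": 167.0,
    "GLN": 225.0, "GLU": 223.0, "GLY": 104.0, "HIS": 224.0, "ILE": 197.0,
    "LEU": 201.0, "LYS": 236.0, "MET": 224.0, "PHE": 240.0, "PRO": 159.0,
    "SER": 155.0, "THR": 172.0, "TRP": 285.0, "TYR": 263.0, "VAL": 174.0,
}


def relative_accessibility(resname: str, sasa: float) -> float:
    """Return RSA = SASA / max ASA for the residue type, capped at 1.0."""
    return min(sasa / MAX_ASA[resname.upper()], 1.0)


def select_surface_residues(
    residues: Sequence[Tuple[str, float]], cutoff: float = 0.25
) -> List[int]:
    """Return chain-order indices of residues with RSA >= cutoff.

    `residues` holds (three-letter residue name, absolute SASA in A^2) pairs.
    The default cutoff is a hypothetical choice. Residue types missing from
    MAX_ASA (ligands, waters, non-standard residues) are skipped.
    """
    surface = []
    for i, (name, sasa) in enumerate(residues):
        if name.upper() not in MAX_ASA:
            continue
        if relative_accessibility(name, sasa) >= cutoff:
            surface.append(i)
    return surface


if __name__ == "__main__":
    chain = [("ALA", 80.0), ("LEU", 10.0), ("LYS", 150.0), ("GLY", 20.0)]
    print(select_surface_residues(chain))  # prints [0, 2]
```

Normalizing by residue-specific maxima keeps large residues (Trp, Arg) from being labeled "surface" merely because of their size.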
Figure 2. The framework of the convolution (A) and identity (B) ResNet blocks.
Figure 3. The framework of the ResNet Layer. Conv indicates the convolution kernel.
Figure 4. The architecture of the Spatial Pyramid Pooling Layer.
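Spatial pyramid pooling (He et al., ref. 25) turns a feature map of arbitrary spatial size into a vector of fixed length, which is what allows variable-length proteins to feed a fixed-size fully connected layer. The NumPy sketch below shows the general technique with max pooling; the (C, H, W) layout and the pyramid levels (1, 2, 4) are assumptions for illustration, not the configuration used in RSPPI.

```python
import math
from typing import Sequence

import numpy as np


def spatial_pyramid_pool(
    feature_map: np.ndarray, levels: Sequence[int] = (1, 2, 4)
) -> np.ndarray:
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid of bins.
    Bin edges use floor/ceil, so every bin is non-empty and neighbouring bins
    may overlap by one row or column when H or W is not divisible by n.
    The output length is C * sum(n * n for n in levels), independent of H, W.
    """
    if feature_map.ndim != 3:
        raise ValueError("expected a (C, H, W) array")
    _, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
                # One maximum per channel over this spatial bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)
```

For example, maps of shape (8, 30, 30) and (8, 57, 13) both yield 8 * (1 + 4 + 16) = 168 values.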

Figure 5. The architecture of the fully connected layer.

Figure 6. Performance obtained with three kinds of embedding features, measured by
Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the ROC Curve
(E), and Matthews correlation coefficient (F).
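The six panels in the performance figures use standard binary-classification metrics. For reference, a minimal sketch of how they are usually computed: the first five from confusion-matrix counts (TP, FP, TN, FN) and the AUC from predicted scores. The function names are hypothetical and this is not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (0.0 when undefined)."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count one half (Mann-Whitney formulation); O(P * N) for clarity.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

scikit-learn offers equivalent functions (for example `matthews_corrcoef` and `roc_auc_score` in `sklearn.metrics`) for production use.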
Figure 7. The performance comparison between models with and without the SPP layer.
A. The performance comparison on the Mus musculus dataset. B. The performance
comparison on the E. coli dataset.
Figure 8. Performance of the RSPPI model on the intraspecies datasets, measured by
Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the
ROC Curve (E), and Matthews correlation coefficient (F).
Figure 9. Performance of RSPPI on the cross-species datasets, measured by
Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the ROC
Curve (E), and Matthews correlation coefficient (F).
Figure 10. Performance comparison with other methods on the E. coli dataset,
measured by Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the
ROC Curve (E), and Matthews correlation coefficient (F).
Figure 11. Performance comparison with other methods on the S. cerevisiae dataset,
measured by Accuracy (A), Precision (B), Sensitivity (C), F1-score (D), Area Under the
ROC Curve (E), and Matthews correlation coefficient (F).
Table 1. Statistics of the multi-species datasets.

Dataset          Positive   Negative   Total
Homo sapiens         2690       2690    5380
E. coli               369        369     738
Bacteria             4975       4975    9950
Eukaryota            3420       3420    6840
Mus musculus          575        575    1150
S. cerevisiae         338        338     676
Mixed-species       10604      10604   21208
