Innovative Computing Review (ICR)

Volume 2 Issue 1, Spring 2022


ISSN(P): 2791-0024 ISSN(E): 2791-0032
Homepage: https://journals.umt.edu.pk/index.php/UMT-AIR


Title: Protein Structure Prediction with AlphaFold2, How it Works, Limitations and Solution for Less number of Homotypic and Large number of Heterotypic Contacts

Author(s): Muhammad Noman Khalid¹, Hassan Kaleem²

Affiliation(s):
¹Allama Iqbal Medical College Lahore
²SQL Consultancy LTD 9 Frances Street, Crewe, England

DOI: https://doi.org/10.32350.icr.21.02

History: Received: April 10, 2022, Revised: May 25, 2022, Accepted: June 13, 2022

Citation: M. N. Khalid and H. Kaleem, “Protein Structure Prediction with AlphaFold2, How it Works, Limitations and Solution for Less number of Homotypic and Large number of Heterotypic Contacts,” UMT Artif. Intell. Rev., vol. 2, no. 1, pp. 00-00, 2022, doi: https://doi.org/10.32350.icr.21.02

Copyright: © The Authors


Licensing: This article is open access and is distributed under the terms of Creative Commons Attribution 4.0 International License
Conflict of Interest: Author(s) declared no conflict of interest

A publication of
School of Systems and Technology
University of Management and Technology, Lahore, Pakistan
Protein Structure Prediction with AlphaFold2, How it Works, Limitations and Solution for Less number of Homotypic and Large number of Heterotypic Contacts
Muhammad Noman Khalid*,¹, Hassan Kaleem²
¹Allama Iqbal Medical College Lahore, Pakistan
²SQL Consultancy LTD 9 Frances Street, Crewe, England
*Corresponding author: [email protected]

Abstract: Knowing the protein structure helps us to investigate human diseases related to abnormal or impaired protein folding. This research proposes a way to identify an imbalance of homotypic and heterotypic contacts at the sequence level. There are two methods of protein structure prediction: template-based and Ab-initio models. A template-based model matches the given sequence with the original sequence, whereas an Ab-initio model calculates the weight of the given sequence and identifies whether it is balanced or not. If the sequence is not in balance, it can be flagged at an early stage by calculating its weight. In this research, future directions are provided to researchers on how to achieve maximum accuracy in protein structure prediction.

Index Terms: Ab-initio modeling, AlphaFold2, heterotypic, homotypic, limitations, imbalance, protein structure prediction

I. Introduction

Knowledge and prediction of protein structures help us to understand how proteins work, that is, how these molecules help human beings in their daily life. The word ‘protein’ comes from the Greek word ‘proteios’, which means ‘of highest importance’. Individual proteins are categorized based on their functions, that is, the tasks they perform. Protein structure recognition is used by the immune system, which is in charge of our body's defense, and knowing the protein structure helps us to understand human diseases related to abnormal or impaired protein folding. As an example of a functional category, structural proteins help to determine cell shape and integrity. These proteins also play a vital role in mitosis and meiosis during cell reproduction, and in the immune system of our body (through the structural recognition of immunoglobulins). Thus, knowing and modifying these protein structures can revolutionize the medical field.

Linus Pauling [1] first predicted the spiral structure of proteins in 1936. Afterwards, with the help of technological advancements in biology, scientists identified four levels of protein structure: primary, secondary, tertiary, and quaternary. Primary structure comprises the sequence of amino acids in the polypeptide chain. Secondary structure is the local spatial arrangement of the polypeptide's backbone, that is, its main chain. Tertiary structure is the three-dimensional structure of the entire polypeptide chain. Lastly, quaternary structure comprises the three-dimensional arrangement of the subunits in a multi-subunit protein.

Two methods are used to determine protein structure experimentally: X-Ray Crystallography and Proton Nuclear Magnetic Resonance (PNMR). These methods help us to visualize the different levels of protein structure. However, the main problem is that determining a protein structure remains very expensive. According to the X-Ray Crystallography Facility (XRCF), the average cost of these tests is around $450 per sample [2].

Ab-initio models [3] were developed to predict the secondary structure of proteins. Two models were developed, namely PaleAle 4.0 with an accuracy of 80.0% and Porter 4.0 with an accuracy of 82.2%. A Bidirectional Recurrent Neural Network (BRNN) [4] was used to predict the secondary structure. DeepCNF [5] was developed based on machine learning to predict the protein structure with an accuracy of 82.3%. DeepCNF is also used to predict the intrinsically disordered regions (IDRs) of proteins. With AUC-maximized training [6], the model achieved an accuracy of 84.5%. Spider 3 [7] was developed to predict the secondary structure of proteins by using Long Short-Term Memory (LSTM) [8] and Bidirectional Recurrent Neural Networks (BRNNs) [9], and it achieved an accuracy of 83.9%. MUFOLD-SS [10] was developed to predict the secondary structure of proteins. It achieved an accuracy of 88.20% in easy cases and 83.37% in hard cases, where easy cases are those with a hit value (e-value) ≤ 0.5 and hard cases are those with a hit value (e-value) > 0.5. The Ab-initio model Porter 5.0 [11] was developed as an updated version of Porter 4.0 and achieved an accuracy of 84.19% in protein structure prediction. SPOT-1D [12] predicted the protein structure with an accuracy of 86.18%; it uses a Deep Neural Network (DNN) architecture based on recurrent and convolutional methods. NetSurfP-2.0 [13] was developed to predict the secondary structure of proteins from their primary sequence. All these models were developed and used to predict the one-dimensional secondary structure of proteins. In the current study, the accuracy of these models is based on three (3) class labels.

The tertiary and quaternary 3D structures were found to be problematic. The challenge is how to visualize the tertiary structure and then combine all the visualized forms to create the quaternary structure. Several methods have been developed to predict the structure of proteins. DeepMind [14] developed a protein structure prediction model known as AlphaFold 1 [15] for CASP 13 [16]. AlphaFold 1 uses a convolutional neural network architecture to predict protein structure, while AlphaFold 2 [17], developed for CASP 14 [18], uses the transformer. The transformer adopts the self-attention mechanism, which takes sequential input data. Still, its prediction accuracy is very low for proteins with a small number of homotypic and a large number of heterotypic contacts. This research provides a solution for predicting imbalanced homotypic and heterotypic contacts.

II. Related Work

Many models have been developed to predict protein structure, as covered in these reviews [19] [20] [21] [22] [23]. Neural network architectures have been applied to prediction [19] [22] [23] and have improved the structure prediction of proteins [15] [24] [25] [26]. These approaches follow improvements in computer vision systems [27]. They attempt to fold the tertiary structure of proteins to make the quaternary structure [28] [29] [30], which ultimately creates the 3D structure of proteins. A few models have been developed to predict the protein structure directly [31] [32] [33] [34]. However, these approaches fail to match the previous structure prediction pipelines [35]. Still, the success of transformers, which are self-attention-based models for language processing [36] and, more recently, for computer vision [37] [38], has led researchers to adopt self-attention-based approaches [39] [40] [41].

III. Research Methodology

The goal of this research is to provide a review of existing problems and their solutions for protein structure prediction. We followed a survey methodology designed by various researchers. The research objectives of this paper are as follows:

1. Workings of AlphaFold2
2. Limitations of AlphaFold2
3. Solutions for a small number of homotypic and a large number of heterotypic contacts
4. Feature extraction of protein

A. CASP (Critical Assessment of Protein Structure Prediction)

CASP is a large-scale community experiment conducted every two years since 1994. The sequences of proteins whose structures are being determined experimentally are passed on to the predictors of protein structure. When predictions are made, neither the predictors nor the organizers and assessors know the experimental structures. These target structures are then solved by X-Ray Crystallography and PNMR, and the entries are afterwards held by the PDB (Protein Data Bank).

IV. Results

A. How AlphaFold 2 Works

AlphaFold2 [17] achieved a median score of 92.4 GDT (Global Distance Test). This indicates that, even for the hardest protein targets, it can predict protein structure with an accuracy comparable to the width of an atom. The model, assessed at CASP 14, was trained on about 170,000 known protein structures. However, for a problem like protein structure prediction, this is a very small number. The developers therefore also used a much larger dataset of protein sequences with unknown structures, and the model learned to extract information from these unlabeled data. Learning from unlabeled data in this way (unsupervised learning) has enabled many AI breakthroughs. For example, GPT-3 [54] (Generative Pre-trained Transformer) was trained on a huge amount of data collected from the web: given a slice of a sentence, it had to predict which words were likely to come next. In another example, a slice of an image was given to a model, and the model was asked to predict the remaining part of the image.

B. Limitations of AlphaFold 2

AlphaFold2 [17] uses a transformer, a deep learning model based on the self-attention mechanism. However, this model slows down as the sequence length of the protein increases [55]. Another limitation highlighted by the AlphaFold2 [17] team is that its prediction is much weaker for proteins that have a small number of homotypic contacts.
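The slowdown described in Subsection B follows from how standard self-attention works [36]: every position in the sequence is scored against every other position, so the attention-weight matrix for a chain of n residues has n × n entries. The following is a minimal, single-head sketch in plain Python, assuming random projection weights and toy 4-dimensional residue embeddings. The function names are illustrative only, and AlphaFold2 itself uses more elaborate attention variants rather than this exact form.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention (hypothetical helper).

    x holds one embedding row per residue (n x d). Returns the attended
    outputs (n x d_v) and the n x n attention weights, whose size (and the
    work needed to fill it) grows quadratically with sequence length."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))
    scores = matmul(q, transpose(k))  # n x n: one score per residue pair
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    random.seed(0)
    d = 4

    def rand(rows: int, cols: int) -> Matrix:
        return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    wq, wk, wv = rand(d, d), rand(d, d), rand(d, d)
    for n in (10, 20, 40):
        _, w = self_attention(rand(n, d), wq, wk, wv)
        print(f"{n} residues -> {len(w) * len(w[0])} attention weights")
```

Doubling the chain length quadruples the number of attention weights (100, 400 and 1,600 in the loop above), which is the quadratic growth behind the slowdown on long sequences.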
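Limitation B is phrased in terms of how many homotypic and heterotypic contacts a protein makes. One way to count such contacts, assumed here rather than taken from the AlphaFold2 papers, is to treat two residues on different chains as being in contact when their representative atoms (for example C-beta) lie within 8 Å, and to call the contact homotypic when both chains have the same sequence and heterotypic otherwise. The sketch below uses a hypothetical `count_contacts` helper, ignores intra-chain contacts, and compares every residue pair by brute force.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def count_contacts(chains: Dict[str, Tuple[str, List[Coord]]],
                   cutoff: float = 8.0) -> Dict[str, int]:
    """Count inter-chain residue contacts, split by chain-sequence identity.

    chains maps a chain ID to (sequence, coordinates), with one representative
    coordinate per residue in angstroms. Residue pairs on two different chains
    closer than `cutoff` count as contacts: homotypic if both chains share the
    same sequence, heterotypic otherwise. Intra-chain contacts are ignored."""
    counts = {"homotypic": 0, "heterotypic": 0}
    for (seq_a, xyz_a), (seq_b, xyz_b) in combinations(chains.values(), 2):
        kind = "homotypic" if seq_a == seq_b else "heterotypic"
        counts[kind] += sum(1 for p in xyz_a for q in xyz_b if math.dist(p, q) < cutoff)
    return counts


if __name__ == "__main__":
    # Toy complex: chains A and B are identical copies, chain C differs.
    complex_ = {
        "A": ("GA", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]),
        "B": ("GA", [(0.0, 5.0, 0.0), (3.8, 5.0, 0.0)]),
        "C": ("WW", [(0.0, -6.0, 0.0), (30.0, 0.0, 0.0)]),
    }
    counts = count_contacts(complex_)
    print(counts)  # {'homotypic': 4, 'heterotypic': 2}
    ratio = counts["homotypic"] / max(counts["heterotypic"], 1)
    print(f"homotypic/heterotypic ratio: {ratio:.2f}")
```

A low homotypic-to-heterotypic ratio is the situation the limitation above describes; real structures would be read from PDB files, and a spatial index would replace the quadratic pair loop.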

Table I
Version_Human   Year of Publishing   No. of Entries   No. of Sequences   Ref
CASP1           1994                 229              1                  [42]
CASP2           1996                 212              2                  [43]
CASP3           1998                 235              2                  [44]
CASP4           2000                 203              1                  [45]
CASP5           2002                 191              3                  [46]
CASP6           2004                 217              2                  [47]
CASP7           2006                 217              1                  [48]
CASP8           2008                 246              1                  [49]
CASP9           2010                 229              3                  [50]
CASP10          2012                 221              3                  [51]
CASP11          2014                 117              1                  [52]
CASP12          2016                 140              3                  [53]
CASP13          2018                 150              1                  [16]
CASP14          2020                 190              2                  [18]
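CASP rounds such as those listed in Table I rank predicted models mainly by the Global Distance Test, the metric behind the 92.4 GDT median quoted in Subsection IV.A. Its common form, GDT_TS, averages the percentages of C-alpha atoms that lie within 1, 2, 4 and 8 Å of their experimental positions. The sketch below simplifies under a stated assumption: the model is taken to be already superimposed on the reference structure, whereas the official calculation searches over many superpositions to maximize each percentage. The `gdt_ts` name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts(model: Sequence[Coord], reference: Sequence[Coord],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    model and reference hold C-alpha coordinates (angstroms) for the same
    residues in the same order; the model is ASSUMED to be superimposed on the
    reference already, so no superposition search is performed here."""
    if len(model) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    distances = [math.dist(m, r) for m, r in zip(model, reference)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    # Per-residue errors of 0.5, 1.5, 3.0 and 9.0 angstroms.
    model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
    print(gdt_ts(model, reference))  # 56.25
```

A perfect model scores 100; in the toy example the 9 Å error on the last residue keeps it out of every cutoff, which caps the score at 56.25.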


Multiple experiments were conducted for proteins with a large number of heterotypic contacts [56] on a recent PDB dataset [57]. Homotypic contacts are defined by the attachment of one cell to another cell, where both cells have to be identical. In heterotypic contacts, by contrast, the physically interacting proteins have different primary structures. Protein 3D structure prediction proceeds in a sequence of steps. First, the primary structure is predicted based on the number of amino acids in the polypeptide chain. Then, in the secondary structure, the sequence from the primary structure is classified into different parts, namely alpha helix, beta strand, and random coil. Afterwards, in the tertiary structure, the alpha helix, beta strand, and random coil are visualized separately. Finally, in the quaternary structure, all of these visualized forms are folded together to create the 3D structure of the protein.

C. How to Calculate Homotypic and Heterotypic Contacts

In advanced metastatic tumors, due to the lack of tumor suppressor genes, cells of different types (heterotypic) grow in between normal cell types. For example, if the normal alignment consists of epithelial cells among epithelial cell types, then in advanced metastatic tumors there will be some other type of cells among the epithelial cells, such as connective tissue cells [58].

In a template-based model, for example, suppose we have a protein whose original sequence is:
Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys
The sequence obtained from the user is:
Gly-Trp-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys-Gly-Ala-Pro-Leu-Val-Met-Val-Pro-Ala-Cys

So, we have both the original sequence and the original weight of this sequence. The user sequence is matched against the original sequence. After matching, it was found that the alanine at position 2 is replaced with tryptophan in the user sequence. Tryptophan is the heaviest of the essential amino acids, so this substitution changes the weight of the user sequence, making the imbalance of homotypic and heterotypic contacts in the user sequence easily identifiable.

This problem can be solved by using machine learning algorithms. Ten machine learning algorithms were developed and implemented [59], each with K-Fold cross-validation testing. A feature extraction method for proteins [60] was created on a live server. This app requires the protein sequence in FASTA format and automatically creates a CSV file to be used by machine learning algorithms. The problem is that creating a dataset containing the original sequences of all proteins is not possible in general. In the case of human proteins, however, all the genes have been identified, so this approach can resolve the problem for them.

The second method is the Ab-initio model, which is a computational method of quantum chemistry. In this model, the weight of the protein sequence is calculated to identify whether it is in balance or not. This was revealed when the user sequence was matched with the protein's real sequence [61]. The weight calculation showed that the user sequence has a higher weight because it has tryptophan in the amino acid chain, while the real sequence has alanine at that position. Since tryptophan is the heaviest of all, the result can be assessed by matching the user sequence with the real sequence. After calculation, it was found that the weight of the original amino acid sequence is lower than that of the user's amino acid sequence, because the weight of alanine is lower than that of tryptophan, which is present in the user's sequence. So, the weight of amino acid chains can be calculated by matching them with their original sequences, and the errors can be shown numerically.

D. Feature Extraction for Protein

For the extraction of protein features, the first step is to compute the matrix formation of the input protein query. The protein sequence length is used to build the following:

1. Position Relative Incidence Matrix (PRIM)
2. Reverse Position Relative Incidence Matrix (RPRIM)
3. Accumulative Absolute Position Incidence Vector (AAPIV)
4. Reverse Accumulative Absolute Position Incidence Vector (RAAPIV)
5. Frequency Vector (FV)
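The exact matrices computed by the live server [60] are not specified here, so the following is only a sketch under assumed definitions from the common position-incidence formulation: FV counts each of the 20 standard residues, AAPIV sums the 1-based positions at which each residue occurs, and RAAPIV does the same on the reversed sequence. PRIM and RPRIM are omitted. The helpers `read_fasta`, `feature_row` and `to_csv` are hypothetical names that mirror the FASTA-in, CSV-out flow described in Subsection C.

```python
import csv
import io
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def frequency_vector(seq: str) -> List[int]:
    """FV: how often each standard residue occurs."""
    return [seq.count(aa) for aa in AMINO_ACIDS]


def aapiv(seq: str) -> List[int]:
    """AAPIV (assumed definition): sum of the 1-based positions of each residue."""
    sums = dict.fromkeys(AMINO_ACIDS, 0)
    for pos, aa in enumerate(seq, start=1):
        if aa in sums:  # skip non-standard letters such as X
            sums[aa] += pos
    return [sums[aa] for aa in AMINO_ACIDS]


def raapiv(seq: str) -> List[int]:
    """RAAPIV (assumed definition): AAPIV of the reversed sequence."""
    return aapiv(seq[::-1])


def feature_row(seq: str) -> List[int]:
    """60 values per sequence: FV + AAPIV + RAAPIV."""
    seq = seq.upper()
    return frequency_vector(seq) + aapiv(seq) + raapiv(seq)


def to_csv(records: Dict[str, str]) -> str:
    """One header row plus one feature row per FASTA record."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id"] + [f"{vec}_{aa}" for vec in ("FV", "AAPIV", "RAAPIV")
                              for aa in AMINO_ACIDS])
    for rec_id, seq in records.items():
        writer.writerow([rec_id] + feature_row(seq))
    return out.getvalue()


if __name__ == "__main__":
    fasta = ">unit one repeat of the example sequence\nGAPLV\nMVPAC\n"
    print(to_csv(read_fasta(fasta)))
```

For the 10-residue unit GAPLVMVPAC of the example in Subsection C, alanine occurs at positions 2 and 9, so its FV entry is 2, its AAPIV entry is 11, and its RAAPIV entry is also 11 (positions 2 and 9 of the reversed chain).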
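The weight comparison in Subsection C can be reproduced numerically. The sketch below assumes rounded textbook average masses for the free amino acids (g/mol), subtracts one water molecule (18.02 g/mol) per peptide bond, and compares two equal-length sequences position by position. It performs no gapped alignment and does not itself model homotypic or heterotypic contacts; the helper names `parse`, `chain_weight` and `compare` are hypothetical.

```python
from typing import Dict, List, Tuple

# Average masses of the free amino acids in g/mol (rounded textbook values).
AA_MASS: Dict[str, float] = {
    "Ala": 89.09, "Arg": 174.20, "Asn": 132.12, "Asp": 133.10, "Cys": 121.16,
    "Gln": 146.15, "Glu": 147.13, "Gly": 75.07, "His": 155.16, "Ile": 131.17,
    "Leu": 131.17, "Lys": 146.19, "Met": 149.21, "Phe": 165.19, "Pro": 115.13,
    "Ser": 105.09, "Thr": 119.12, "Trp": 204.23, "Tyr": 181.19, "Val": 117.15,
}
WATER = 18.02  # one water molecule is lost for every peptide bond formed


def parse(seq: str) -> List[str]:
    """Split a hyphenated three-letter sequence such as 'Gly-Ala-Pro'."""
    return [res.strip() for res in seq.split("-") if res.strip()]


def chain_weight(residues: List[str]) -> float:
    """Approximate average mass of a linear peptide in g/mol."""
    return sum(AA_MASS[r] for r in residues) - WATER * (len(residues) - 1)


def compare(original: str, user: str) -> Tuple[List[Tuple[int, str, str]], float]:
    """Match two equal-length sequences position by position.

    Returns the substitutions as (1-based position, original, user) and the
    weight difference user - original in g/mol."""
    orig, usr = parse(original), parse(user)
    if len(orig) != len(usr):
        raise ValueError("this sketch handles equal-length sequences only (no gaps)")
    subs = [(i, a, b) for i, (a, b) in enumerate(zip(orig, usr), start=1) if a != b]
    return subs, chain_weight(usr) - chain_weight(orig)


if __name__ == "__main__":
    unit = ["Gly", "Ala", "Pro", "Leu", "Val", "Met", "Val", "Pro", "Ala", "Cys"]
    original_res = unit * 5                     # the 50-residue original sequence
    user_res = list(original_res)
    user_res[1] = "Trp"                         # position 2: Ala -> Trp
    subs, delta = compare("-".join(original_res), "-".join(user_res))
    print(subs)                                 # [(2, 'Ala', 'Trp')]
    print(f"user sequence is {delta:.2f} g/mol heavier")  # user sequence is 115.14 g/mol heavier
```

The 115.14 g/mol difference is exactly the tryptophan minus alanine mass, which is the numeric error signal the text describes. Sequences of different lengths would first need a proper gapped alignment, which this sketch deliberately rejects.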

A live server has been created for the feature extraction of proteins [60], based on Chou’s 5-step rule [62]. The server accepts only FASTA format.

V. Discussion and Future Research Work

The current paper briefly discussed protein structure prediction in view of the previous research conducted in this field. The primary structure was predicted based on the number of amino acids in the polypeptide chain. Secondary structure prediction included the identification of alpha helix, beta strand, and random coil. Tertiary structure included the visualization of these classes, and in the quaternary structure, these visualized forms were merged together to create the final 3D structure of the target protein. AlphaFold2 at CASP 14 achieved a median score of about 92 GDT. However, it was also found that the AlphaFold2 algorithm has limitations: if there are a small number of homotypic and a large number of heterotypic contacts, its prediction GDT is very low. This issue can be addressed by homology-based modeling and Ab-initio modeling. For homology-based modeling, a protein feature extraction technique was developed on a live server, and 10 machine learning algorithms were created and tested via K-Fold cross-validation. However, it was found that the best method to predict protein structure is Ab-initio. A solution with one real-time example was proposed regarding how to calculate the weight of protein sequences and identify those that are not in balance. On the basis of this technique, protein folding and repair techniques can also be applied to achieve the maximum GDT in protein structure prediction. Since Ab-initio is based on calculating rather than predicting structure (based on CASP), it is the best method to predict protein structure.

References

[1] “On the Structure of Native, Denatured, and Coagulated Proteins.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076802/ (accessed Jan. 09, 2022).
[2] “Fees,” X-Ray Crystallography Facility (XRCF). http://xrcf.caltech.edu/xrcf/fees (accessed Jan. 09, 2022).
[3] C. Mirabello and G. Pollastri, “Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility,” Bioinformatics, vol. 29, no. 16, pp. 2056–2058, Aug. 2013, doi: 10.1093/bioinformatics/btt344.
[4] P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri, “Exploiting the past and the future in protein secondary structure prediction,” Bioinformatics, vol. 15, no. 11, pp. 937–946, Nov. 1999, doi: 10.1093/bioinformatics/15.11.937.
[5] S. Wang, J. Ma, and J. Xu, “AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields,” Bioinformatics, vol. 32, no. 17, pp. i672–i679, Sep. 2016, doi: 10.1093/bioinformatics/btw446.
[6] “AUC: a misleading measure of the performance of predictive distribution models - Lobo - 2008 - Global Ecology and Biogeography - Wiley Online Library.” https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1466-8238.2007.00358.x (accessed Jan. 19, 2022).
[7] R. Heffernan, Y. Yang, K. Paliwal, and Y. Zhou, “Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility,” Bioinformatics, vol. 33, no. 18, pp. 2842–2849, Sep. 2017, doi: 10.1093/bioinformatics/btx218.
[8] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[9] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, Nov. 1997, doi: 10.1109/78.650093.
[10] C. Fang, Y. Shang, and D. Xu, “MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction,” Proteins Struct. Funct. Bioinforma., vol. 86, no. 5, pp. 592–598, 2018, doi: 10.1002/prot.25487.
[11] M. Torrisi, M. Kaleel, and G. Pollastri, “Porter 5: fast, state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes,” Oct. 2018, doi: 10.1101/289033.
[12] J. Hanson, “Protein Structure Prediction by Recurrent and Convolutional Deep Neural Network Architectures,” Nov. 2018, doi: 10.25904/1912/3830.
[13] M. S. Klausen et al., “NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning,” Proteins Struct. Funct. Bioinforma., vol. 87, no. 6, pp. 520–527, 2019, doi: 10.1002/prot.25674.
[14] “DeepMind - What if solving one problem could unlock solutions to thousands more?,” DeepMind. https://deepmind.com/ (accessed Feb. 01, 2022).
[15] A. W. Senior et al., “Improved protein structure prediction using potentials from deep learning,” Nature, vol. 577, no. 7792, Art. no. 7792, Jan. 2020, doi: 10.1038/s41586-019-1923-7.
[16] “CASP 13.” [Online]. Available: https://www.uniprot.org/uniprot/O75601
[17] “Highly accurate protein structure prediction with AlphaFold | Nature.” https://www.nature.com/articles/s41586-021-03819-2 (accessed Jan. 09, 2022).
[18] “CASP 14.” [Online]. Available: https://www.uniprot.org/uniprot/P31944
[19] R. Pearce and Y. Zhang, “Deep learning techniques have significantly impacted protein structure prediction and protein design,” Curr. Opin. Struct. Biol., vol. 68, pp. 194–207, Jun. 2021, doi: 10.1016/j.sbi.2021.01.007.
[20] “Advances in protein structure prediction and design | Nature Reviews Molecular Cell Biology.” https://www.nature.com/articles/s41580-019-0163-x (accessed Feb. 02, 2022).
[21] D. S. Marks, T. A. Hopf, and C. Sander, “Protein structure prediction from sequence variation,” Nat. Biotechnol., vol. 30, no. 11, Art. no. 11, Nov. 2012, doi: 10.1038/nbt.2419.
[22] N. Qian and T. J. Sejnowski, “Predicting the secondary structure of globular proteins using neural network models,” J. Mol. Biol., vol. 202, no. 4, pp. 865–884, Aug. 1988, doi: 10.1016/0022-2836(88)90564-5.
[23] “Prediction of contact maps with neural networks and correlated mutations | Protein Engineering, Design and Selection | Oxford Academic.” https://academic.oup.com/peds/article/14/11/835/1608425?login=true (accessed Feb. 02, 2022).
[24] S. Wang, S. Sun, Z. Li, R. Zhang, and J. Xu, “Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model,” PLOS Comput. Biol., vol. 13, no. 1, p. e1005324, Jan. 2017, doi: 10.1371/journal.pcbi.1005324.
[25] J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov, and D. Baker, “Improved protein structure prediction using predicted interresidue orientations,” Proc. Natl. Acad. Sci., vol. 117, no. 3, pp. 1496–1503, Jan. 2020, doi: 10.1073/pnas.1914677117.
[26] Y. Li et al., “Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks,” PLOS Comput. Biol., vol. 17, no. 3, p. e1008865, Mar. 2021, doi: 10.1371/journal.pcbi.1008865.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016, pp. 770–778. Accessed: Feb. 02, 2022. [Online]. Available: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html
[28] “Identification of direct residue contacts in protein–protein interaction by message passing | PNAS.” https://www.pnas.org/content/106/1/67.short (accessed Feb. 02, 2022).
[29] D. S. Marks et al., “Protein 3D Structure Computed from Evolutionary Sequence Variation,” PLOS ONE, vol. 6, no. 12, p. e28766, Dec. 2011, doi: 10.1371/journal.pone.0028766.
[30] “PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments | Bioinformatics | Oxford Academic.” https://academic.oup.com/bioinformatics/article/28/2/184/198108?login=true (accessed Feb. 02, 2022).
[31] “End-to-End Differentiable Learning of Protein Structure - ScienceDirect.” https://www.sciencedirect.com/science/article/pii/S2405471219300766 (accessed Feb. 02, 2022).
[32] “Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13) - Senior - 2019 - Proteins: Structure, Function, and Bioinformatics - Wiley Online Library.” https://onlinelibrary.wiley.com/doi/full/10.1002/prot.25834 (accessed Feb. 02, 2022).
[33] J. Ingraham, A. Riesselman, C. Sander, and D. Marks, “Learning Protein Structure with a Differentiable Simulator,” presented at the International Conference on Learning Representations, Sep. 2018. Accessed: Feb. 02, 2022. [Online]. Available: https://openreview.net/forum?id=Byg3y3C9Km
[34] J. Li, “Universal Transforming Geometric Network,” ArXiv190800723 Cs Q-Bio, Aug. 2019, Accessed: Feb. 02, 2022. [Online]. Available: http://arxiv.org/abs/1908.00723
[35] J. Xu, M. McPartlon, and J. Li, “Improved protein structure prediction by deep learning irrespective of co-evolution information,” Nat. Mach. Intell., vol. 3, no. 7, Art. no. 7, Jul. 2021, doi: 10.1038/s42256-021-00348-5.
[36] A. Vaswani et al., “Attention is All you Need,” in Advances in Neural Information Processing Systems, 2017, vol. 30. Accessed: Feb. 02, 2022. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[37] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, “CCNet: Criss-Cross Attention for Semantic Segmentation,” 2019, pp. 603–612. Accessed: Feb. 02, 2022. [Online]. Available: https://openaccess.thecvf.com/content_ICCV_2019/html/Huang_CCNet_Criss-Cross_Attention_for_Semantic_Segmentation_ICCV_2019_paper.html
[38] “Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | SpringerLink.” https://link.springer.com/chapter/10.1007/978-3-030-58548-8_7 (accessed Feb. 02, 2022).
[39] E. C. Alley, G. Khimulya, S. Biswas, M. AlQuraishi, and G. M. Church, “Unified rational protein engineering with sequence-based deep representation learning,” Nat. Methods, vol. 16, no. 12, Art. no. 12, Dec. 2019, doi: 10.1038/s41592-019-0598-1.
[40] M. Heinzinger et al., “Modeling aspects of the language of life through transfer-learning protein sequences,” BMC Bioinformatics, vol. 20, no. 1, p. 723, Dec. 2019, doi: 10.1186/s12859-019-3220-8.
[41] “Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | PNAS.” https://www.pnas.org/content/118/15/e2016239118.short (accessed Feb. 02, 2022).
[42] “CASP 1.” [Online]. Available: https://www.uniprot.org/uniprot/P31944
[43] “CASP 2.” [Online]. Available: https://www.uniprot.org/uniprot/P31944
[44] “CASP 3.” [Online]. Available: https://www.uniprot.org/uniprot/P31944
[45] “CASP 4.” [Online]. Available: https://www.uniprot.org/uniprot/P31944
[46] “CASP 5.” [Online]. Available: https://www.uniprot.org/uniprot/P51878
[47] “CASP 6.” [Online]. Available: https://www.uniprot.org/uniprot/P55212
[48] “CASP 7.” [Online]. Available: https://www.uniprot.org/uniprot/P55210
[49] “CASP 8.” [Online]. Available: https://www.uniprot.org/uniprot/Q14790
[50] “CASP 9.” [Online]. Available: https://www.uniprot.org/uniprot/P55211
[51] “CASP 10.” [Online]. Available: https://www.uniprot.org/uniprot/Q92851
[52] “CASP 11.” [Online]. Available: https://www.uniprot.org/uniprot/Q91XW7
[53] “CASP 12.” [Online]. Available: https://www.uniprot.org/uniprot/Q6UXS9
[54] L. Floridi and M. Chiriatti, “GPT-3: Its Nature, Scope, Limits, and Consequences,” Minds Mach., vol. 30, no. 4, pp. 681–694, Dec. 2020, doi: 10.1007/s11023-020-09548-1.
[55] S. Gao et al., “Limitations of Transformers on Clinical Text Classification,” IEEE J. Biomed. Health Inform., vol. 25, no. 9, pp. 3596–3607, Sep. 2021, doi: 10.1109/JBHI.2021.3062322.
[56] K. Tunyasuvunakool et al., “Highly accurate protein structure prediction for the human proteome,” Nature, vol. 596, no. 7873, Art. no. 7873, Aug. 2021, doi: 10.1038/s41586-021-03828-1.
[57] wwPDB consortium, “Protein Data Bank: the single global archive for 3D macromolecular structure data,” Nucleic Acids Res., vol. 47, no. D1, pp. D520–D528, Jan. 2019, doi: 10.1093/nar/gky949.
[58] D. Rusciano, D. R. Welch, and M. M. Burger, Eds., “Homotypic and heterotypic cell adhesion in metastasis,” in Laboratory Techniques in Biochemistry and Molecular Biology, vol. 29, Elsevier, 2000, pp. 9–64, doi: 10.1016/S0075-7535(00)29003-7.
[59] RaoHassanKaleem, RaoHassanKaleem/Diebetes-Detection-using-Machine-Learning-Algorithms. 2022. Accessed: Feb. 14, 2022. [Online]. Available: https://github.com/RaoHassanKaleem/Diebetes-Detection-using-Machine-Learning-Algorithms
[60] “Feature Extraction App - Proteins · Streamlit.” https://share.streamlit.io/raohassankaleem/fetprotextract/main/app.py (accessed Apr. 13, 2022).
[61] “Aminoacids-peptides-primary-structure_0.pdf.” Accessed: Feb. 15, 2022. [Online]. Available: https://biochimia.usmf.md/sites/default/files/inline-files/Aminoacids-peptides-primary-structure_0.pdf
[62] S. J. Malebary, M. S. ur Rehman, and Y. D. Khan, “iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou’s 5-step rule,” PLOS ONE, vol. 14, no. 11, p. e0223993, Nov. 2019, doi: 10.1371/journal.pone.0223993.
