
Protein Structure Prediction

Uploaded by

Vignesh Vignesh

SRI RAMACHANDRA

INSTITUTE OF HIGHER EDUCATION AND RESEARCH


(Deemed to be University)

Protein structure prediction

Dr. UDHAYA LAVINYA B.


ASST. PROFESSOR,
DEPT. OF BMS, SRIHER (DU)
Introduction
• Proteins differ from one another primarily in their sequence of amino acids
• These differences give rise to different spatial shapes and structures, and therefore to
different biological functions in cells
• It is much easier to obtain protein sequences than to obtain their structures
• The UniProt/TrEMBL database currently contains more than 85 million protein
sequences
• On the structure side, X-ray crystallography and NMR spectroscopy are currently the
two major experimental techniques for protein structure determination
• Both are time- and labor-intensive, and each has its own technical limitations for
different protein targets
• As of April 2017, the PDB held ~120,000 protein structures, which still
corresponds to < 0.2% of the protein sequences in UniProt
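The coverage figure above follows from a simple ratio. A minimal sketch, using only the rounded counts quoted on this slide (the variable names are illustrative):

```python
# Rounded counts quoted above (April 2017); these are not exact database totals.
pdb_structures = 120_000
uniprot_sequences = 85_000_000

# Share of known sequences that have an experimentally determined structure.
coverage_percent = 100 * pdb_structures / uniprot_sequences
print(f"Structures cover about {coverage_percent:.2f}% of sequences")  # about 0.14%
```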
Secondary structure prediction
• Protein secondary structure refers to the local conformation of a protein’s polypeptide backbone
• There are two regular secondary structure states, α-helix (H) and β-strand (E), and one irregular secondary
structure type, the coil region (C)
• Kabsch and Sander developed a secondary structure assignment method, the Dictionary of Secondary
Structure of Proteins (DSSP)
• It automatically assigns secondary structure into eight states (H, E, B, T, S, L, G, and I) according to hydrogen-
bonding patterns
• These eight states are often further simplified into three states of helix, sheet and coil
• The most widely used convention is that G, H and I are designated as helix; B and E as sheet; and all other
states are designated as coil
• Most commonly, the secondary structure prediction problem is formulated as follows: given a protein
sequence with amino acids, predict whether each amino acid is in the α-helix (H), β-strand (E), or coil region
(C)
• Protein secondary structure prediction is usually evaluated by Q3 accuracy, the percentage of residues
whose three-state secondary structure is predicted correctly
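The eight-to-three-state reduction and the Q3 measure described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the example strings are made up, and the mapping follows the convention stated on this slide.

```python
# Three-state reduction described above: G, H, I -> helix (H); B, E -> strand (E);
# every other DSSP state -> coil (C).
HELIX_STATES = {"G", "H", "I"}
STRAND_STATES = {"B", "E"}


def reduce_to_three_states(dssp_states: str) -> str:
    """Map a string of eight-state DSSP codes to the three states H/E/C."""
    reduced = []
    for state in dssp_states:
        if state in HELIX_STATES:
            reduced.append("H")
        elif state in STRAND_STATES:
            reduced.append("E")
        else:
            reduced.append("C")
    return "".join(reduced)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example strings, for illustration only.
observed = reduce_to_three_states("LHHHGGTEEBSL")
predicted = "CHHHHHCEECCC"
q3 = q3_accuracy(predicted, observed)
print(observed, round(q3, 1))
```

In this toy case 11 of the 12 residues agree, so Q3 is about 91.7%.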
Secondary structure prediction
• Many statistical approaches and machine learning approaches have been developed to predict
secondary structure.
• One of the first approaches for predicting protein secondary structure uses a combination of statistical
and heuristic rules.
• The GOR method formalizes the secondary structure prediction problem within an information-
theoretic framework.
• The position-specific scoring matrix (PSSM) built with PSI-BLAST reflects evolutionary information and has
brought the most significant improvements in protein secondary structure prediction
• Many machine learning methods have been developed to predict protein secondary structure
• They exhibit good performance by exploiting evolutionary information, as well as statistical information
about amino acid subsequences
• For example, many neural network (NN) methods, hidden Markov models (HMM), support vector
machines (SVM) and K-nearest neighbors have had substantial success, and Q3 accuracy has reached
about 80%.
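To make the idea of a position-specific scoring matrix concrete, here is a deliberately simplified sketch. It is not how PSI-BLAST builds its profiles: it assumes a gap-free toy alignment, a uniform background frequency and a fixed pseudocount, and the function name `simple_pssm` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def simple_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score.

    Assumes equal-length, gap-free sequences and a uniform background
    frequency (1/20); both are simplifications for illustration.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


# Hypothetical toy alignment of four homologous fragments.
alignment = ["ACDK", "ACEK", "ACDR", "GCDK"]
profile = simple_pssm(alignment)
print(round(profile[1]["C"], 2), round(profile[1]["W"], 2))
```

A residue conserved in every sequence (C in the second column) gets a positive score, while a residue never observed there (W) gets a negative one. This per-position signal is the evolutionary information that sequence-only methods lack.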
Secondary structure prediction
• The prediction accuracy has been continuously improved over the years,
especially by
• using hybrid or ensemble methods and
• incorporating evolutionary information in the form of profiles extracted from alignments of
multiple homologous sequences
• The highest Q3 accuracy without relying on structure templates is now at 82–84%
• DeepCNF is a deep learning extension of conditional neural fields (CNF), which
integrates conditional random fields and shallow neural networks.
• The overall performance of DeepCNF is significantly better than that of other state-of-
the-art methods, breaking the long-standing ~80% accuracy barrier.
• Recently SPIDER3 improved the prediction of protein secondary structure by
capturing non-local interactions using long short-term memory bidirectional
recurrent neural networks.
Frequently used tools
• PSRSM
• Protein Secondary Structure Prediction based on Data Partition and Semi-Random Subspace Method
• This method partitions the training dataset based on protein sequence length and employs a semi-random subspace technique to
train multiple classifiers. It combines predictions using a majority vote rule, achieving high accuracy across various datasets.
• Reported Q3 accuracy ranges from 85% to 86.38% on different datasets, outperforming many existing methods
• PSSpred
• Neural network-based tool that utilizes multiple sequence alignments gathered through PSI-BLAST.
• It trains separate neural networks for secondary structure prediction using amino acid frequency data.
• The final prediction is a combination of results from seven different neural network predictors
• JPred
• This server uses multiple neural networks trained on PSI-BLAST and HMMER profiles to predict both secondary structure and
solvent accessibility.
• Input Formats: Accepts sequences in various formats, including FASTA, and allows batch submissions for multiple sequences
• PSIPRED
• It employs two feed-forward neural networks to analyze PSI-BLAST profiles for secondary structure prediction
• It remains one of the most reliable tools in the field
• RaptorX-SS8
• Utilizes conditional neural fields to predict both three-state and eight-state secondary structures from protein sequences
• It is recognized for its effectiveness in structure prediction tasks
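Several of the tools above combine the outputs of multiple classifiers; PSRSM, for instance, is described as using a majority vote rule. A minimal sketch of a per-residue majority vote follows. The function name, the tie-breaking rule and the example predictions are assumptions for illustration and are not taken from any of these tools.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length three-state predictions residue by residue.

    Ties are broken in favour of the earliest classifier's label; this
    tie-breaking rule is an assumption, not taken from PSRSM.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label (in classifier order) that reaches the top count.
        consensus.append(next(label for label in labels if counts[label] == best))
    return "".join(consensus)


# Made-up outputs from three hypothetical classifiers.
votes = ["HHHCCEEE", "HHCCCEEC", "CHHCEEEE"]
print(majority_vote(votes))  # HHHCCEEE
```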
Tertiary structure prediction
• Three-dimensional arrangement of all the
atoms in a single polypeptide chain
• Crucial for the protein's functionality
• Formed through
• various interactions among the side
chains (R groups) of the amino acids
that make up the protein and
• interactions between these side chains
and the backbone of the polypeptide
Anfinsen’s dogma
• A protein’s native three-dimensional structure is determined by its amino acid sequence
Methods
• Similar sequences from the same evolutionary family often adopt similar protein structures
• This forms the foundation of homology modeling
• It is the most accurate way to predict a protein structure, using a homologous structure in the PDB as the template
• With the rapid growth of PDB database, an increasing proportion of target proteins can be predicted via
homology modeling
• When no structure with obvious sequence similarity to the target protein can be found in the PDB, it is still
possible to find proteins with structural similarity to the target protein
• The method used to identify such template structures in the PDB is called threading or fold recognition
• It matches the target sequence to homologous and distantly homologous structures using a scoring algorithm
and takes the best matches as structural templates
• The basic premise of threading is that protein structure is highly conserved in evolution and the
number of unique structural folds in nature is limited
• Both homology modeling (based on sequence comparison) and threading methods (based on fold-
recognition) can be called template-based structure prediction methods
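The sequence-comparison step behind template selection can be illustrated with a small sketch. It assumes the pairwise alignments have already been computed by some other tool and ranks candidate templates by percent identity only. Real homology modeling and threading programs use richer scoring, and all names and sequences here are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def pick_template(alignments):
    """Pick the best template; alignments maps id -> (aligned target, aligned template)."""
    scores = {
        template_id: percent_identity(target, template)
        for template_id, (target, template) in alignments.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical pre-computed pairwise alignments; ids and sequences are made up.
candidates = {
    "templateA": ("MKTAYIAK", "MKTAYLAK"),
    "templateB": ("MKTAYIAK", "MRT-YLGK"),
}
print(pick_template(candidates))  # ('templateA', 87.5)
```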
Frequently used tools
FALCON2
• Integrates template-based modeling (ProALIGN) and ab initio prediction (ProFOLD).
• FALCON2 simultaneously utilizes both approaches to enhance prediction accuracy. ProALIGN aligns the target protein with known
templates, while ProFOLD uses a neural network to estimate inter-residue distances.
• The server includes quality assessment tools to select the best candidate structures from predictions, demonstrating improved
accuracy through the integration of methods.
AlphaFold
• Deep learning-based approach.
• Developed by DeepMind, AlphaFold has achieved remarkable success in predicting protein structures by utilizing attention
mechanisms to model the relationships between amino acids.
• It has set new benchmarks in structure prediction, particularly in the CASP competitions, showcasing its ability to predict complex
structures with high accuracy.
I-TASSER
• Threading and fragment assembly.
• I-TASSER predicts protein structures by threading target sequences through known structures and assembling fragments based on
these templates.
• It is widely used for generating structural models when experimental data is lacking.
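FALCON2's ProFOLD component is described above as estimating inter-residue distances. To show what that quantity is, the sketch below computes a pairwise distance matrix from known coordinates. It does not predict anything, and the coordinates and function name are made up for illustration.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (in angstroms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Made-up C-alpha coordinates for four residues, one (x, y, z) tuple per residue.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 6.0, 0.0)]
dists = distance_matrix(ca_coords)
for row in dists:
    print(" ".join(f"{d:5.1f}" for d in row))
```

A distance-prediction method outputs an estimate of such a matrix from sequence alone, which can then guide the construction of a three-dimensional model.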
Frequently used tools
Phyre2
• Template-based modeling.
• Phyre2 predicts protein structures by aligning sequences with known structures and generating models based on
these alignments.
• It provides a user-friendly interface for researchers to input sequences and receive structural predictions.
MODELLER
• Homology modeling.
• MODELLER builds models based on homologous proteins with known structures, allowing users to create accurate
models for target proteins.
• Offers extensive options for model refinement and evaluation.
RaptorX
• Remote homology detection and threading.
• RaptorX combines template-based methods with ab initio approaches to predict protein structures effectively.
• It provides detailed structural predictions along with confidence scores.