Ab Initio Protein Structure Prediction
STRUCTURE PREDICTION
BY UTSAV KS
BIOINFORMATICS
PROTEIN STRUCTURE PREDICTION
• Protein structure prediction (PSP) is the prediction of the three-dimensional
structure of a protein from its amino acid sequence, i.e. the prediction of its
tertiary structure from its primary structure.
AB INITIO MODELLING
• Ab initio modelling conducts a conformational search under the guidance
of a designed energy function.
• This procedure usually generates a number of possible conformations
(structure decoys), and final models are selected from them.
Successful ab initio modelling depends on three factors:
• An accurate energy function, under which the native structure of a protein corresponds to
the most thermodynamically stable state compared to all possible decoy structures.
• An efficient search method that can quickly identify the low-energy states through
conformational search.
• Selection of native-like models from a pool of decoy structures.
ENERGY FUNCTIONS
• Energy functions are classified into two groups:
• Physics-based energy functions
• Knowledge-based energy functions
Physics-Based Energy Functions
• “In a strictly-defined physics-based ab initio method, interactions between
atoms should be based on quantum mechanics and the Coulomb potential
with only a few fundamental parameters such as the electron charge and
the Planck constant; all atoms should be described by their atom types
where only the number of electrons is relevant.”
• (Hagler et al. 1974; Weiner et al. 1984)
In practice, a compromise force field with a large number of selected atom types is used.
Within each atom type, the atoms are similar enough in their chemical and physical properties
to share parameters, which are calculated from crystal packing or quantum mechanical theory.
Well-known examples of such all-atom physics-based force fields include:
• AMBER
• CHARMM
• OPLS
• GROMOS96
These potentials contain terms associated with bond lengths, angles, torsion angles, van der
Waals interactions, and electrostatic interactions.
The major difference between them lies in the selection of atom types and the interaction
parameters.
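The terms listed above can be sketched in Python as follows. This is a minimal illustration only: the functional forms (harmonic bonds and angles, a cosine torsion, Lennard-Jones for van der Waals, Coulomb for electrostatics) are common textbook choices, the function names are hypothetical, and any parameter values passed in are placeholders rather than values from AMBER, CHARMM, OPLS, or GROMOS96.

```python
import math

# Approximate conversion factor so that charges in units of e and distances
# in Angstrom give energies in kcal/mol (assumed value, not tied to one force field).
COULOMB_CONSTANT = 332.06


def bond_energy(r, r0, k):
    """Harmonic bond-length term: k * (r - r0)^2."""
    return k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic bond-angle term (angles in radians)."""
    return k * (theta - theta0) ** 2


def torsion_energy(phi, barrier, periodicity, phase):
    """Periodic torsion-angle term (angles in radians)."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term, a common model for van der Waals interactions."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def coulomb(r, q1, q2):
    """Electrostatic term between two point charges."""
    return COULOMB_CONSTANT * q1 * q2 / r
```

The total energy of a conformation would be the sum of these terms over all bonds, angles, torsions, and non-bonded atom pairs. The differences between force fields mentioned above would then appear as different atom-type tables supplying `r0`, `k`, `epsilon`, `sigma`, and the charges.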
Knowledge-Based Energy Function
• Refers to the empirical energy terms derived from the statistics of the solved
structures deposited in the PDB.
• Can be divided into two types:
• Generic, sequence-independent terms such as hydrogen bonding and
the local backbone stiffness of a polypeptide chain.
• Amino acid or protein sequence-dependent terms, e.g. pairwise residue
contact potential, distance-dependent atomic contact potential, and secondary
structure propensities.
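As a rough illustration of how a pairwise residue contact potential can be derived from structure statistics, the sketch below applies the inverse Boltzmann relation, E = -kT ln(observed / expected). This is one common way to build such a term, assumed here for illustration; the function name, the reference state (random pairing by residue frequency), and the toy counts in the usage note are all hypothetical.

```python
import math
from collections import Counter


def contact_potential(observed_contacts, kT=1.0):
    """Turn contact counts into energies with the inverse Boltzmann relation.

    observed_contacts maps a sorted residue-type pair, e.g. ("K", "L"),
    to the number of times that pair is seen in contact. Pairs with a
    zero count must be left out, since log(0) is undefined.
    """
    total = sum(observed_contacts.values())

    # How often each residue type takes part in any contact.
    type_counts = Counter()
    for (a, b), n in observed_contacts.items():
        type_counts[a] += n
        type_counts[b] += n
    type_total = sum(type_counts.values())

    potential = {}
    for (a, b), n in observed_contacts.items():
        fa = type_counts[a] / type_total
        fb = type_counts[b] / type_total
        # Expected pair frequency if contact partners were chosen at random.
        expected = fa * fb if a == b else 2.0 * fa * fb
        observed = n / total
        potential[(a, b)] = -kT * math.log(observed / expected)
    return potential
```

With invented counts such as `{("L", "L"): 60, ("K", "L"): 20, ("K", "K"): 20}`, the leucine-leucine pair is seen more often than random pairing predicts and gets a negative (favourable) energy, while the lysine-leucine pair is seen less often and gets a positive one.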
Conformational Search Methods
• Successful ab initio modelling of protein structures depends on the availability of a
powerful conformational search method that can efficiently find the global minimum
energy structure for a given energy function with a complicated energy landscape.
• Types:
• Monte Carlo Simulations
• Molecular Dynamics
• Genetic Algorithm
• Mathematical Optimization
Monte Carlo Simulations
• Its core idea is to use random samples of parameters or inputs to explore
the behavior of a complex system or process.
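One widely used way to apply random sampling to energy minimization is the Metropolis criterion, sketched below on a one-dimensional toy landscape. The slides do not name a specific Monte Carlo scheme, so the Metropolis rule, the function names, the toy energy function, and all parameter values here are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D landscape: a parabola with sinusoidal bumps."""
    return 0.1 * x * x + math.sin(3.0 * x)


def metropolis_search(energy, x0, steps=20000, step_size=0.5, kT=0.5, seed=0):
    """Random-walk search that accepts uphill moves with Boltzmann probability."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        # Propose a random perturbation of the current state.
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / kT) so the walk can escape local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = metropolis_search(toy_energy, x0=5.0)
```

In a protein setting the state would be a conformation (for example a set of torsion angles) rather than a single number, and the proposal step would perturb part of the chain.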
Molecular Dynamics
• MD simulation solves Newton's equations of motion at each step of atom
movement, which is probably the most faithful method of depicting atomistically
what is occurring in proteins.
• The method is therefore most often used to study protein folding pathways.
• The long simulation time is one of the major issues of this method, since the
incremental time scale is usually on the order of femtoseconds (10⁻¹⁵ s) while the
fastest folding time of a small protein (less than 100 residues) is in the millisecond
range in nature.
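The sketch below shows what "solving Newton's equations at each step" can look like for a single particle, using the velocity Verlet integrator, and then works out the step count implied by the time scales above. The choice of integrator, the function names, and the one-dimensional setup are assumptions for illustration; production MD codes handle many atoms, constraints, and thermostats.

```python
def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate Newton's equation m * a = F(x) for one particle in 1-D."""
    a = force(x) / mass
    for _ in range(n_steps):
        # Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the force at the new position.
        a_new = force(x) / mass
        # Advance the velocity using the average of old and new acceleration.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


def steps_required(folding_time_s, timestep_s):
    """Number of integration steps needed to cover a given simulated time."""
    return round(folding_time_s / timestep_s)


# A 1 ms folding event at a 1 fs time step needs about 10**12 steps.
n_steps = steps_required(1e-3, 1e-15)
```

Each of those steps requires a full force evaluation over all atoms, which is why simulation length is a major limitation of the method.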
Genetic Algorithm
• The genetic algorithm is a method for solving problems that is based on natural
selection, the process that drives biological evolution.
• The genetic algorithm repeatedly modifies a population of individual solutions.
• At each step, the genetic algorithm selects individuals at random from the
current population to be parents and uses them to produce the children for the
next generation.
• Over successive generations, the population “evolves” toward an optimal
solution.
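The loop described above can be sketched as follows for a generic energy function over a list of real-valued "genes". This is a minimal illustration, not a protein-specific implementation: the function name, the selection rule (parents drawn at random from the better half of the population), one-point crossover, Gaussian mutation, and all default parameters are assumptions.

```python
import random


def genetic_minimize(energy, n_genes, pop_size=40, generations=100,
                     mutation_scale=0.3, seed=0):
    """Evolve a population of candidate solutions toward low energy."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        population.sort(key=energy)
        history.append(energy(population[0]))
        # Keep the better half unchanged and use it as the parent pool.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # One-point crossover followed by Gaussian mutation.
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]
            child = [g + rng.gauss(0.0, mutation_scale) for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=energy), history


def sphere(genes):
    """Toy energy function with its minimum at the origin."""
    return sum(g * g for g in genes)


best, history = genetic_minimize(sphere, n_genes=3)
```

Because the parents survive into the next generation, the best energy in `history` never increases from one generation to the next.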
Mathematical Optimization
• Mathematical optimization is the selection of a best element (with regard
to some criteria) from some set of available alternatives.
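As a small example of this idea, the sketch below uses plain gradient descent to find the minimizer of a smooth function. Gradient descent is only one of many optimization techniques and is chosen here purely for illustration; the function names, the quadratic objective, and the step size are hypothetical.

```python
def gradient_descent(gradient, x0, learning_rate=0.1, n_steps=200):
    """Repeatedly step against the gradient to reduce the objective."""
    x = list(x0)
    for _ in range(n_steps):
        g = gradient(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def quadratic_gradient(x):
    """Gradient of the toy objective f(x, y) = (x - 1)^2 + 2 * (y + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


# Converges toward the minimizer of the toy objective at (1, -2).
minimum = gradient_descent(quadratic_gradient, [0.0, 0.0])
```

On a rugged protein energy landscape, a local method like this would only reach the nearest minimum, which is why it is usually combined with a broader search.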
Model Selection
• The selection of protein models has emerged as a new field called
Model Quality Assessment Programs (MQAP).
• Model selection approaches can be classified into two types:
• energy-based
• free-energy-based
Physics-Based Energy Function
• Selects the decoy with the lowest energy.
Knowledge-Based Energy Function
• Sippl developed a pairwise residue-distance-based potential in 1990 using the
statistics of known PDB structures (Sippl 1990); its newest version is
PROSA II (Sippl 1993; Wiederstein and Sippl 2007).
• A variety of knowledge-based potentials have been proposed, including atomic
interaction potentials, solvation potentials, hydrogen bond potentials, and
torsion angle potentials.
Sequence-Structure Compatibility Function
• Best models are selected not purely based on energy functions.
• They are selected based on the compatibility of target sequences to model
structures.
• The earliest and still successful example is that of Luthy et al. (1992), who
used threading scores to evaluate structures.
• Colovos and Yeates (1993) later used a quadratic error function to describe the
non-covalently bonded interactions among CC, CN, CO, NN, NO and OO atom pairs,
where near-native structures have fewer errors than other decoys.
Clustering of Decoy Structures
• Cluster analysis or clustering is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some sense or
another) to each other than to those in other groups (clusters).
• The cluster-centre conformation of the largest cluster is considered closer to native
structures than the majority of decoys.
• In the work by Shortle et al. (1998), for all 12 cases tested, the cluster-centre
conformation of the largest cluster was closer to native structures than the majority
of decoys. Cluster-centre structures were ranked in the top 1-5% closest to their
native structures.
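A simple way to pick the cluster-centre conformation of the largest cluster is to count, for each decoy, how many other decoys lie within an RMSD cutoff, and take the decoy with the most neighbours. The sketch below does this for toy coordinates. It is an illustrative scheme, not the procedure of Shortle et al.; the function names and the cutoff are hypothetical, and the decoys are assumed to be already superimposed, so no structural alignment is performed.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def largest_cluster_centre(decoys, cutoff):
    """Return (centre index, member indices) for the most populated cluster."""
    neighbours = []
    for di in decoys:
        # Every decoy within the cutoff counts as a member (including itself).
        members = [j for j, dj in enumerate(decoys) if rmsd(di, dj) <= cutoff]
        neighbours.append(members)
    centre = max(range(len(decoys)), key=lambda i: len(neighbours[i]))
    return centre, neighbours[centre]


# Toy three-atom decoys: three similar conformations and one outlier.
decoys = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)],
    [(0.0, -0.1, 0.0), (1.0, -0.1, 0.0), (2.0, -0.1, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0), (2.0, 5.0, 0.0)],
]
centre, members = largest_cluster_centre(decoys, cutoff=0.15)
```

Here the first decoy has the most neighbours within the cutoff, so it is chosen as the centre of the largest cluster, and the outlier is left on its own.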