Protein structure prediction and modeling
MODELLING
Homology modelling
INTRODUCTION:
Homology modeling, also known as comparative modeling, is a technique for
constructing an atomic-resolution model of a "target" protein of unknown
structure from:
1. Its amino acid sequence and
2. An experimental 3D structure of a related homologous protein (the
"template").
The three-dimensional structure of the target protein is predicted from its amino
acid sequence by aligning it to one or more homologous (template) proteins for
which an X-ray or NMR structure is available.
If similarity between the target sequence and the template sequence is
detected, structural similarity can be assumed.
In general, at least about 30% sequence identity is required to generate a useful model.
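The 30% rule of thumb can be checked on an existing pairwise alignment with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, gaps are assumed to be written as "-", and identity is counted over aligned (non-gap) columns, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are not counted (an assumed convention)
        aligned += 1
        if x == y:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def is_usable_template(aligned_target, aligned_template, threshold=30.0):
    """Apply the rule-of-thumb identity threshold from the text."""
    return percent_identity(aligned_target, aligned_template) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACDEFG", "ACDQFG"))
    print(is_usable_template("ACDEFG", "WYHQKR"))
```

The threshold is a parameter because, as the next slide notes, the identity needed also depends on the length of the alignment.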
Sequence similarity & structural similarity
As long as the length of two sequences and the percentage of identical residues fall in the
region marked as “safe” the two sequences are practically guaranteed to adopt a similar
structure.
Homology modeling concept
Structure prediction by homology modelling
An example
To determine the structure of sequence A (150 amino acids long), first compare sequence A
to all the sequences of known structures stored in the PDB (using, for example, BLAST).
Suppose the search finds a sequence B (300 amino acids long) containing a region of 150
amino acids that matches sequence A with 50% identical residues.
As this match (alignment) clearly falls in the safe zone (50% identity), we can simply take
the known structure of sequence B (the template), cut out the fragment corresponding to the
aligned region, mutate those amino acids that differ between sequences A and B, and
finally arrive at our model for structure A. Structure A is called the target and is of course
not known at the time of modeling.
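The "mutate those amino acids that differ" step can be sketched as a walk over the aligned region that records every position where the template residue must be replaced. This is a minimal illustration, not a modelling program: `plan_mutations` and its parameters are hypothetical names, gaps are assumed to be "-", and `template_start` is the template residue number at which the aligned region begins.

```python
def plan_mutations(aligned_target, aligned_template, template_start=1):
    """List (template_position, template_residue, target_residue) substitutions."""
    mutations = []
    pos = template_start - 1  # residue number of the last template residue seen
    for target_res, template_res in zip(aligned_target, aligned_template):
        if template_res != "-":
            pos += 1
        if target_res == "-" or template_res == "-":
            continue  # insertions and deletions are handled by loop modelling
        if target_res != template_res:
            mutations.append((pos, template_res, target_res))
    return mutations


if __name__ == "__main__":
    # Aligned region starting at template residue 101 (made-up example).
    for pos, old, new in plan_mutations("ACDEF", "ACNEF", template_start=101):
        print(f"mutate template residue {old}{pos} to {new}")
```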
HISTORY
The first homology modelling studies were done using wire and plastic models of bonds and
atoms as early as the 1960s. The models were constructed by taking the coordinates of a
known protein structure and modifying them by hand for those amino acids that did not
match the structure.
In 1969 David Phillips, Browne and co-workers published the first paper on
homology modelling. They modelled α-lactalbumin based on the structure of hen
egg-white lysozyme. The sequence identity between these two proteins was 39%.
Steps of homology modelling
1. Template recognition and initial alignment
2. Alignment correction
3. Backbone generation
4. Loop modeling
5. Side-chain modeling
6. Model optimization
7. Model validation
(Flowchart: Protein sequence, Database searches, Sequence alignment, Good homologue?,
Secondary structure prediction, Three-dimensional structure, Check model)
1.Template recognition and initial alignment
Template recognition and selection involves searching the PDB for homologous
proteins with determined structures. The search can be performed using simple
sequence alignment programs such as BLAST or FASTA, because when the percentage
identity between the target sequence and a possible template is high enough to fall
in the safe zone, the match can be detected with these programs.
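BLAST and FASTA are fast heuristics for local sequence alignment. As a rough illustration of what such a search ranks, the sketch below scores a target against a handful of candidate templates with a plain Smith-Waterman local alignment. It is not BLAST: the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions standing in for a substitution matrix, and the template names and sequences are made up.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_templates(target, templates):
    """Return (score, name) pairs sorted from best to worst candidate."""
    scored = [(local_alignment_score(target, seq), name)
              for name, seq in templates.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    candidates = {"tplA": "GGMKTAYIAKGG", "tplB": "GGMKTGG"}  # hypothetical
    print(rank_templates("MKTAYIAK", candidates))
```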
2.Alignment correction
For example, aligning the sequence LTLTLTLT directly with YAYAYAYAY is nearly
impossible; only a third sequence, TYTYTYTYT, that aligns easily to both of
them can solve the issue.
Alignment 2 is correct, because it leads to a small gap, compared to a huge hole
associated with alignment 1.
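The three-sequence example can be verified with a small sketch that slides one sequence along the other and counts the best number of matching positions without gaps. The function name is hypothetical and the ungapped comparison is a simplification of a real alignment.

```python
def best_ungapped_matches(a, b):
    """Largest number of identical positions over all ungapped offsets of b against a."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, ch in enumerate(b)
            if 0 <= i + offset < len(a) and a[i + offset] == ch
        )
        best = max(best, matches)
    return best


if __name__ == "__main__":
    print(best_ungapped_matches("LTLTLTLT", "YAYAYAYAY"))  # no shared residues
    print(best_ungapped_matches("LTLTLTLT", "TYTYTYTYT"))  # shares the T positions
    print(best_ungapped_matches("YAYAYAYAY", "TYTYTYTYT"))  # shares the Y positions
```

The direct comparison finds no matches, while the third sequence matches each of the other two at every second position, so it can anchor both in one multiple alignment.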
3.Backbone generation
When the alignment is correct, the backbone of the target can be created.
The coordinates of the template-backbone are copied to the target.
When the residues are identical, the side-chain coordinates are also copied.
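The copying rule above can be sketched as follows. The data layout is an assumption made for illustration: each template residue is a dict mapping atom names to (x, y, z) coordinates, the alignment uses "-" for gaps, and `build_backbone` is a hypothetical name.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def build_backbone(aligned_target, aligned_template, template_residues):
    """Return a list of (target_residue, atoms) pairs for the model.

    atoms is None where the target has an insertion (left for loop modelling).
    """
    model = []
    tpl_index = 0
    for target_res, template_res in zip(aligned_target, aligned_template):
        atoms = None
        if template_res != "-":
            atoms = template_residues[tpl_index]
            tpl_index += 1
        if target_res == "-":
            continue  # deletion in the target: this template residue is dropped
        if atoms is None:
            model.append((target_res, None))
        elif target_res == template_res:
            model.append((target_res, dict(atoms)))  # identical: copy side chain too
        else:
            backbone = {name: xyz for name, xyz in atoms.items()
                        if name in BACKBONE_ATOMS}
            model.append((target_res, backbone))  # different: backbone only
    return model
```

For a target "AGK" aligned to a template "AS-", the first residue keeps all template atoms, the second keeps only N, CA, C and O, and the third has no coordinates yet.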
4.Loop modelling
After the sequence alignment, there are often regions created by insertions and
deletions that lead to gaps in alignment. These gaps are modeled by loop
modeling, which is less accurate. Currently, two main techniques are used to
approach the problem:
The database searching method - this involves finding loops from known protein
structures and superimposing them onto the two stem regions (main chains mostly)
of the target protein. Some specialized programs like FREAD and CODA can be
used.
The ab initio method - this generates many random loops and searches for one
that has reasonably low energy and φ and ψ angles in the allowable regions in the
Ramachandran plot.
The red loop is modeled with the green
residues as anchor residues. The insertion of
2 residues results in a longer loop.
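The ab initio idea described above (generate many random loops, keep those with allowed φ and ψ angles, pick the lowest energy) can be sketched as a rejection-sampling loop. Everything specific here is an assumption for illustration: the rectangular "allowed" boxes are a crude stand-in for real Ramachandran contours, `toy_energy` is a placeholder for a real energy function, and a real method would also enforce closure of the loop onto its anchor residues.

```python
import random

# Crude rectangular approximations of allowed regions, in degrees:
# (phi_min, phi_max, psi_min, psi_max). Illustrative values only.
ALLOWED_BOXES = [
    (-160.0, -45.0, -70.0, -15.0),   # roughly the right-handed helical region
    (-180.0, -45.0, 90.0, 180.0),    # roughly the beta-strand region
]


def is_allowed(phi, psi):
    """True if (phi, psi) falls inside one of the illustrative boxes."""
    return any(lo_phi <= phi <= hi_phi and lo_psi <= psi <= hi_psi
               for lo_phi, hi_phi, lo_psi, hi_psi in ALLOWED_BOXES)


def toy_energy(loop):
    """Placeholder energy: squared distance of each residue from (-60, -45)."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in loop)


def best_random_loop(n_residues, energy_fn, n_trials=5000, seed=0):
    """Sample random loops, reject disallowed ones, return the lowest-energy loop."""
    rng = random.Random(seed)
    best_loop, best_energy = None, float("inf")
    for _ in range(n_trials):
        loop = [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
                for _ in range(n_residues)]
        if not all(is_allowed(phi, psi) for phi, psi in loop):
            continue
        energy = energy_fn(loop)
        if energy < best_energy:
            best_loop, best_energy = loop, energy
    return best_loop, best_energy


if __name__ == "__main__":
    loop, energy = best_random_loop(2, toy_energy)
    print(loop, energy)
```

Rejection sampling becomes inefficient as loops get longer, which is one reason loop modelling is less accurate than the rest of the model.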
5.Side-chain modeling
This is important in evaluating protein–ligand interactions at active sites and
protein–protein interactions at the contact interface.
A side chain can be built by searching every possible conformation for every torsion
angle of the side chain to select the one that has the lowest interaction energy with
neighboring atoms.
A rotamer library can also be used, which has all the favorable side chain torsion
angles extracted from known protein crystal structures.
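Both approaches above amount to picking the lowest-energy combination of torsion angles from a set of candidates; they differ only in how large that set is. In this sketch the energy function is passed in by the caller, the 30 degree grid and the three staggered rotamer values are illustrative choices, and all names are hypothetical.

```python
import itertools


def best_side_chain_conformation(chi_choices, energy_fn):
    """Exhaustively search candidate torsion angles, one list per chi angle."""
    best = min(itertools.product(*chi_choices), key=energy_fn)
    return best, energy_fn(best)


# Exhaustive search: every 30 degrees for each torsion (illustrative step size).
GRID = tuple(range(-180, 180, 30))
# Rotamer-style search: only the staggered conformations (illustrative values).
ROTAMERS = (-60, 60, 180)


if __name__ == "__main__":
    # Placeholder energy with its minimum at chi1 = -60, chi2 = 180.
    def toy_energy(chis):
        return (chis[0] + 60) ** 2 + (chis[1] - 180) ** 2

    print(best_side_chain_conformation([GRID, GRID], toy_energy))
    print(best_side_chain_conformation([ROTAMERS, ROTAMERS], toy_energy))
```

For two torsions the grid search evaluates 144 combinations and the rotamer search only 9, which is why rotamer libraries are attractive for longer side chains.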
6.Model optimization
An energy minimization procedure is applied to the entire model, adjusting the
relative positions of the atoms so that the overall conformation of the molecule has
the lowest possible potential energy. The goal is to relieve steric collisions without
altering the overall structure.
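How minimization relieves a steric collision can be seen in a one-dimensional toy case: two atoms that start too close and are moved by steepest descent on a Lennard-Jones potential. This is a sketch under stated assumptions (reduced units with epsilon = sigma = 1, a fixed step size, hypothetical function names), not how a production force field is minimized.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_distance(r0, step=0.001, n_steps=5000):
    """Steepest descent: repeatedly move against the gradient."""
    r = r0
    for _ in range(n_steps):
        r -= step * lj_gradient(r)
    return r


if __name__ == "__main__":
    start = 0.95  # atoms overlapping slightly (a steric clash)
    end = minimize_distance(start)
    print(start, lj_energy(start))
    print(end, lj_energy(end))
```

The distance relaxes to the nearest minimum of the potential, at 2^(1/6) times sigma, which is a local search and does not guarantee the global minimum of a full protein model.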
Optimization can also be done by molecular dynamics simulation, which moves the
atoms toward a global minimum by applying various simulation conditions
(heating, cooling, inclusion of water molecules), thus having a better chance of
finding the true structure.
Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded
Interaction Energy
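The slide names the four terms without giving their form. One common molecular mechanics form is written out below as an illustration; the specific functional forms (harmonic stretching and bending, a cosine torsion series, Lennard-Jones plus Coulomb non-bonded terms) and all symbols are assumptions, since individual force fields differ in detail.

```latex
E = \sum_{\text{bonds}} k_b \,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left[ 4\varepsilon_{ij}
      \left( \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
           - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right)
      + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right]
```

Here the first three sums are the stretching, bending and torsion energies, and the last sum is the non-bonded interaction energy over atom pairs.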
7.Model validation
Disadvantages
Homology models cannot predict the conformations of insertions or deletions, or
side chain positions, with a high level of accuracy.
Homology models are generally not reliable enough for the ligand docking studies
needed in drug design and development. They may, however, be helpful for this
purpose if the sequence identity with the template is greater than 70%.
Ramachandran plot
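A Ramachandran plot places each residue at its backbone torsion angles: φ is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). A minimal sketch of the dihedral calculation from four atom coordinates follows; the function names are hypothetical and the coordinates in the usage example are made up.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Four made-up points arranged at a right angle about the x axis.
    print(dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0),
                   (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)))
```

Computing φ and ψ for every residue of a model and checking how many fall in the allowed regions is one simple validation measure.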