Protein Structure, Databases and Structural Alignment

Saramita De Chakravarti
Research Scientist, II (i)
Chembiotech Research Laboratories
1

2
Basics of protein structure

3
Why Protein Structure?
• Proteins are fundamental components of all living
cells, performing a variety of biological tasks.
• Each protein has a particular 3D structure that
determines its function.
• Protein structure is more conserved than protein
sequence, and more closely related to function.

4
Protein Structure
• Protein core (hydrophobic core): usually conserved.
• Protein loops (surface loops): variable regions.

5
Supersecondary structures
Assembly of secondary structures which are
shared by many structures.
Beta hairpin
Beta-alpha-beta unit
Helix hairpin

6
Hemoglobin (1bab)
Fold: General structure composed of
sets of Supersecondary structures

7
How Many Folds Are There?
http://scop.berkeley.edu/count.html

8
Structure – Sequence Relationships
• Two conserved sequences → similar structures.
• Two similar structures → conserved sequences?
Not necessarily: there are cases of proteins with the same
structure but no clear sequence similarity.

9
Principles of Protein Structure
•Today's proteins reflect millions of years of
evolution.
•3D structure is better conserved than sequence
during evolution.
•Similarities among sequences or among
structures may reveal information about shared
biological functions of a protein family.

10
The Levinthal paradox
Assume a protein is composed of 100 amino acids (AAs) and that
each AA can take up 10 different conformations.
Altogether we get $10^{100}$ (i.e. a googol) conformations.
If each conformation were sampled in the shortest
possible time (the time of a molecular vibration, ~$10^{-13}$ s),
it would take an astronomical amount of time (~$10^{79}$
years) to sample all possible conformations in order
to find the native state.
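
The estimate can be checked with a few lines of arithmetic. The sketch below uses only the assumptions stated on the slide (100 residues, 10 conformations each, one conformation per molecular vibration); the function and constant names are illustrative. With these inputs the exhaustive search comes out at roughly 3 × 10^79 years.

```python
# Assumptions taken from the slide: 100 residues, 10 conformations each,
# one conformation sampled per molecular vibration (~1e-13 s).
RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 10
SECONDS_PER_SAMPLE = 1e-13
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s


def levinthal_years(residues, states, seconds_per_sample):
    """Years needed to visit every conformation exactly once."""
    total_conformations = states ** residues  # exact integer, 10**100 here
    total_seconds = total_conformations * seconds_per_sample
    return total_seconds / SECONDS_PER_YEAR


years = levinthal_years(RESIDUES, CONFORMATIONS_PER_RESIDUE, SECONDS_PER_SAMPLE)
print(f"{years:.1e} years")
```

For comparison, the age of the universe is on the order of 10^10 years, which is why an exhaustive search cannot be how proteins fold.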

11
The Levinthal paradox
Luckily, nature does not have to search through numbers
of this size: the correct conformation of a protein
is reached within seconds.

12
How is the 3D Structure Determined?
Experimental methods (best approach):
• X-ray crystallography.
• NMR.
• Others (e.g., neutron diffraction).

13
In-silico methods
Ab-initio structure prediction given only the
sequence as input - not always successful.

14
A note on ab-initio predictions: The
current state is that “failure can no
longer be guaranteed”…

15
A note on ab-initio secondary structure
prediction: Success ~70%.

16
In-silico methods
Threading = sequence-structure alignment. The
idea is to search existing databases of 3D structures
for a known structure that fits the query sequence,
using sequence similarity plus information on the
structures to find the best predicted structure.

17
Comments
• X-ray crystallography is the most widely
used method.
• Quaternary structure of large assemblies
(ribosomes, virus particles, etc.) can be
determined by electron microscopy
(cryo-EM).

18
Protein Databases

19
PDB: Protein Data Bank
• Holds 3D models of biological macromolecules
(protein, RNA, DNA).
• All data are available to the public.
• Obtained by X-Ray crystallography (84%) or NMR
spectroscopy (16%).
• Submitted by biologists and biochemists from
around the world.

20
PDB: Protein Data Bank
•Founded in 1971 by Brookhaven National
Laboratory, New York.
•Transferred to the Research Collaboratory
for Structural Bioinformatics (RCSB) in 1998.
•Currently it holds > 49,426 released
structures (a second count on the slide reads 61,695).

21
PDB - model
• A model defines the 3D positions of atoms in
one or more molecules.
• There are models of proteins, protein
complexes, proteins and DNA, protein
segments, etc …
• The models also include the positions of ligand
molecules, solvent molecules, metal ions, etc.

22
PDB – Protein Data Bank
http://www.pdb.org/pdb/home/home.do

23
The PDB file – text format

24
The PDB file – text format
Fields of a coordinate record, as labelled on the slide:
• ATOM: usually protein or DNA
• HETATM: usually ligand, ion, water
• Atom number
• Atom identity
• Residue identity
• Chain
• Residue number
• X Y Z: the coordinates for each atom in the structure
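
To make the fields above concrete, here is a minimal sketch of reading them from ATOM/HETATM lines. It assumes the standard fixed-width column layout of the PDB text format; the two sample lines carry made-up values, and the names `AtomRecord`, `parse_atom_line` and `parse_pdb` are illustrative, not part of any library.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom number
    name: str      # atom identity, e.g. "CA"
    res_name: str  # residue identity, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one fixed-width ATOM/HETATM line (0-based Python slices)."""
    return AtomRecord(
        record=line[0:6].strip(),     # columns 1-6
        serial=int(line[6:11]),       # columns 7-11
        name=line[12:16].strip(),     # columns 13-16
        res_name=line[17:20].strip(), # columns 18-20
        chain=line[21],               # column 22
        res_seq=int(line[22:26]),     # columns 23-26
        x=float(line[30:38]),         # columns 31-38
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )


def parse_pdb(text: str) -> list:
    """Keep only coordinate records; all other line types are skipped."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


# Hypothetical sample lines, not taken from a real PDB entry.
SAMPLE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    "HETATM    2  O   HOH A 201      10.000  -5.500   0.250\n"
)
atoms = parse_pdb(SAMPLE)
for atom in atoms:
    print(atom.record, atom.name, atom.res_name, atom.chain, atom.res_seq,
          atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can touch each other with no space in between.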

26
Why structural alignment?
• Structural similarity can point to remote
evolutionary relationship
• Shared structural motifs among proteins
suggest similar biological function
• Getting insight into sequence-structure
mapping (e.g., which parts of the protein
structure are conserved among related
organisms).

27
As in any alignment problem, we can
search for GLOBAL ALIGNMENT or for
LOCAL ALIGNMENT

28
Human Myoglobin (pdb:2mm1)
Human Hemoglobin alpha-chain (pdb:1jebA)
Sequence identity: 27%
Structural identity: 90%

29
What is the best transformation that
superimposes the unicorn on the lion?

30
Solution:
Regard the shapes as sets of points
and try to “match”
these sets using a transformation

31
This is not a good result….

33
Kinds of transformations:
• Rotation
• Translation
• Scaling
and more….
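
As a small illustration of these transformations acting on a point set, the sketch below applies a rotation (about the z axis only, for brevity), a translation and a scaling to 3-D points. The function names and sample points are illustrative assumptions.

```python
import math


def translate(points, offset):
    """Shift every point by the same (dx, dy, dz) vector."""
    dx, dy, dz = offset
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def scale(points, factor):
    """Multiply every coordinate by the same factor."""
    return [(factor * x, factor * y, factor * z) for x, y, z in points]


def rotate_z(points, angle_rad):
    """Rotate every point about the z axis by angle_rad (counter-clockwise)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


shape = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
moved = translate(rotate_z(shape, math.pi / 2), (5.0, 0.0, 0.0))
scaled = scale(shape, 2.0)

print(math.dist(shape[0], shape[1]))    # distance in the original shape
print(math.dist(moved[0], moved[1]))    # unchanged by rotation + translation
print(math.dist(scaled[0], scaled[1]))  # doubled by scaling
```

Rotation and translation keep all distances between points unchanged, while scaling does not. Bond lengths in a protein are fixed, so only the first two are useful when superimposing structures.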

37
We represent a protein as a geometric
object in space.
The object consists of points represented
by coordinates (x, y, z).
(Figure: residues Thr, Lys, Met, Gly, Glu and Ala drawn as points.)

38
The aim:
Given two proteins, find the transformation that produces
the best superimposition of one protein
onto the other.

39
Correspondence is Unknown
Given two configurations of points in
three-dimensional space:

40
Find those rotations and translations of one of the point
sets that produce “large” superimpositions of
corresponding 3-D points.

42
Simple case – two closely related proteins with the
same number of amino acids.
Question:
How do we assess the
quality of the
transformation?

43
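Scoring the Alignment
Two point sets: $A = \{a_i\},\ i = 1, \dots, n$
$B = \{b_j\},\ j = 1, \dots, m$
• Pairwise correspondence:
$(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
(1) Bottleneck: $\max_i \lVert a_{k_i} - b_{t_i} \rVert$
(2) RMSD (Root Mean Square Distance):
$\sqrt{\sum_{i=1}^{N} \lVert a_{k_i} - b_{t_i} \rVert^2 / N}$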
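
Both scores can be computed directly once a correspondence is fixed. The following is a minimal sketch in which the correspondence is a list of index pairs (k, t) into A and B; the point sets and the pairing are made-up toy values, and `math.dist` requires Python 3.8 or later.

```python
import math


def bottleneck(A, B, pairs):
    """Largest distance between any pair of matched points."""
    return max(math.dist(A[k], B[t]) for k, t in pairs)


def rmsd(A, B, pairs):
    """Root mean square distance over the matched points."""
    total = sum(math.dist(A[k], B[t]) ** 2 for k, t in pairs)
    return math.sqrt(total / len(pairs))


# Toy point sets with n = 3 and m = 4; B[2] is left unmatched.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (5.0, 5.0, 5.0), (2.0, 2.0, 0.0)]
pairs = [(0, 0), (1, 1), (2, 3)]  # matched distances: 0, 1 and 2

print(bottleneck(A, B, pairs))
print(rmsd(A, B, pairs))
```

The bottleneck score reacts only to the single worst pair, whereas RMSD averages over all pairs, so one badly placed residue affects the two scores very differently.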

44
RMSD – Root Mean Square Deviation
Given two sets of 3-D points:
$P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$;
$\mathrm{rmsd}(P, Q) = \sqrt{\sum_i \lvert p_i - q_i \rvert^2 / n}$
Find a 3-D transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i \lvert T(p_i) - q_i \rvert^2 / n}$
Find the highest number of atoms aligned with the lowest RMSD.
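
The slides do not say how the minimizing transformation is found. One standard option, used here purely as an illustration, is Horn's quaternion formulation: after moving both centroids to the origin, the best rotation corresponds to the largest eigenvalue of a 4×4 matrix built from the paired coordinates. The sketch below assumes the correspondence is already known (point i of P pairs with point i of Q) and finds that eigenvalue by simple power iteration; the function names, the start vector and the iteration count are illustrative choices.

```python
import math


def centered(points):
    """Translate a point set so that its centroid sits at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]


def rmsd(P, Q):
    """Plain RMSD between paired points, with no superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(P, Q)) / len(P))


def min_rmsd(P, Q, iterations=500):
    """Smallest RMSD over all rotations and translations of P (paired points)."""
    P, Q = centered(P), centered(Q)  # the optimal translation aligns centroids
    n = len(P)
    # S[a][b] = sum over i of P_i[a] * Q_i[b]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its top eigenvalue is the best overlap.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift the spectrum so power iteration converges to the largest eigenvalue.
    shift = max(sum(abs(v) for v in row) for row in N)
    v = [1.0, 0.3, 0.2, 0.1]  # arbitrary, deliberately asymmetric start vector
    for _ in range(iterations):
        w = [sum(N[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input: all points coincide
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector estimates the top eigenvalue.
    lam = sum(v[i] * sum(N[i][j] * v[j] for j in range(4)) for i in range(4))
    g = sum(x * x for p in P for x in p) + sum(x * x for q in Q for x in q)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# Q is P rotated by 90 degrees about z and then shifted, so it is congruent to P.
Q = [(-y + 5.0, x - 1.0, z + 2.0) for x, y, z in P]

print(rmsd(P, Q))      # large: the two sets are not superimposed
print(min_rmsd(P, Q))  # close to zero: a rigid motion maps P onto Q
```

Only proper rotations are considered, so a structure is never matched to its mirror image. Finding the correspondence itself, and therefore "the highest number of atoms aligned", is a separate and harder search that this sketch does not attempt.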

45
Pitfalls of RMSD
• all atoms are treated equally
(residues on the surface have a higher degree of
freedom than those in the core)
• best alignment does not always mean minimal
RMSD
• does not take into account the attributes of the
amino acids

46
Flexible alignment vs. Rigid
alignment
Rigid alignment
Flexible alignment

47
Some more issues

48
Does the fact that so many proteins contain alpha
helices indicate that they are all evolutionarily
related?
No. Alpha helices reflect physical constraints,
as do beta sheets.
For structures, it is sometimes difficult
to separate convergent evolution from
evolutionary relatedness.

49
Structural genomics: solve or predict the 3D structures of
all proteins of a given organism (X-ray, NMR,
and homology modelling).
Unlike in traditional structural biology, the 3D structure is
often solved before anything is known about
the protein in question. A new challenge
has emerged: predicting a protein’s function from
its 3D structure.

50
CASP: a competition for predicting 3D
structures.
Instead of rushing to publish a newly solved 3D
structure, the AA sequence is published first and
each group is invited to submit its
predictions.

51
CAPRI: same as CASP, but for docking.

52
Homology modeling: predicting the
structure from a closely related known
structure.
This can be important, for example, to
predict how a mutation influences the
structure.
