Progress and Challenges in Protein Structure Prediction - Zhang 2008

Available online at www.sciencedirect.com

Progress and challenges in protein structure prediction

Yang Zhang

Depending on whether similar structures are found in the PDB library, protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool for detecting structural analogs, methodological development in this area has reached a steady state. Encouraging progress has been observed in structure refinement, which aims to draw template structures closer to the native structure; this progress has been driven mainly by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been seen in folding small proteins to atomic resolution. However, predicting structures for proteins larger than 150 residues remains a challenge, with bottlenecks in both the force field and the conformational search.

Address
Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, United States

Corresponding author: Zhang, Yang ([email protected])

Current Opinion in Structural Biology 2008, 18:342–348

This review comes from a themed issue on Sequence and Topology
Edited by Nick Grishin and Sarah Teichmann

Available online 22nd April 2008

0959-440X/$ – see front matter
© 2008 Elsevier Ltd. All rights reserved.

DOI 10.1016/j.sbi.2008.02.004

Introduction
In recent years, despite many debates, structural genomics has probably been one of the most noteworthy efforts in protein structure determination. It aims to obtain 3D models of all proteins by an optimized combination of experimental structure solution and computer-based structure prediction [1,2]. Two factors will dictate the success of structural genomics: experimental structure determination of optimally selected proteins and efficient computer modeling algorithms. Based on about 40 000 structures in the PDB library (many of them redundant) [3], 4 million models/fold assignments can be obtained by a simple combination of PSI-BLAST searches and comparative modeling [4]. The development of more sophisticated and automated computer modeling approaches will dramatically enlarge the scope of modelable proteins in the structural genomics project.

The crucial problems in the field of protein structure prediction are as follows. First, for sequences with similar structures in the PDB (especially those only weakly or distantly homologous to the target), how can the correct templates be identified, and how can the template structures be refined closer to the native? Second, for sequences without appropriate templates, how can models of correct topology be built from scratch? Progress along these directions was assessed in the recent CASP7 experiment [5] under the categories of template-based modeling (TBM) and free modeling (FM). Here, I review the new progress and challenges in these directions.

Template-based modeling
The canonical TBM procedure consists of four steps: first, finding known structures (templates) related to the sequence to be modeled (target); second, aligning the target sequence to the template structure; third, building structural frameworks by copying the aligned regions or by satisfying the spatial restraints from templates; and fourth, constructing the unaligned loop regions and adding side-chain atoms. In practice, the first two steps are done in a single procedure called threading (or fold recognition) [6,7], because the correct selection of templates relies on an accurate alignment. Similarly, the last two steps are performed simultaneously, since the atoms of the core and loop regions interact closely.

The existence of similar structures in the PDB is a necessary precondition for successful TBM. An important question is how complete the current PDB structure library is. Figure 1 shows the distribution of the best templates found by structural alignment [8] for 1413 representative single-domain proteins between 80 and 200 residues. Remarkably, even when homologous templates with sequence identity >20% are excluded, every target protein has at least one structural analog in the PDB with a Cα root-mean-squared deviation (rmsd) to the target of <6 Å covering >70% of the target. The average rmsd and coverage are 2.96 Å and 86%, respectively. Zhang and Skolnick [9] recently showed that high-quality full-length models could be built for all the protein targets, with an average rmsd of 2.25 Å, when the best templates in the PDB are used. These data demonstrate that the structural universe of the current PDB library is essentially complete for solving the protein structure prediction problem, at least for single-domain proteins. However, most target–template pairs at this level of sequence identity (15%) are difficult to identify by threading. In fact, after excluding templates with sequence identity >30%, current threading techniques could assign only two-thirds of the proteins to templates of correct topology, with some alignment errors (average rmsd ~4 Å) [10]. Thus, the role of the structural genomics initiative is to bridge the target–template gap for the remaining one-third of proteins, as well as to improve the alignment accuracy for the other two-thirds by providing evolutionarily closer template proteins.

Figure 1

Structural superimposition results of 1413 representative single-domain proteins on their analogs in the PDB library. The structural analogs are searched by a sequence-independent structural-alignment tool, TM-align [8], and ranked by TM-score (a structural similarity measure balancing rmsd and coverage) [51]. All structural analogs with a sequence identity >20% to the target are excluded. If the analog with the highest TM-score has a coverage below 70%, the first structural analog with coverage >70% is presented. As a result, all the structural analogs have an rmsd <6 Å; 80% have an rmsd <4 Å with >75% of the target covered.

Template structure identification
Since its invention in the early 1990s [6,7], threading has become one of the most active areas in protein structure prediction. Numerous algorithms have been developed during the past 15 years to identify structure templates from the PDB, using techniques including sequence profile–profile alignments (PPAs) [10–13], structural profile alignments [14], hidden Markov models (HMMs) [15,16], machine learning [17,18], and others.

The sequence PPA is probably the most widely used and robust threading approach. Instead of matching the single sequences of target and template, PPA aligns a target multiple sequence alignment (MSA) with a template MSA. The alignment score between a target position and a template position is usually calculated as the product of the amino-acid frequency at that position of the target MSA and the log-odds of the same amino acid in the template MSA (the profile) [19], summed over the 20 amino acids. There are alternative ways of calculating PPA scores [20]. Profile-alignment-based methods demonstrated advantages in several recent blind tests [21,22,23]. In LiveBench-8 [21], for example, all top four servers (BASD/MASP/MBAS, SFST/STMP, FFAS03, and ORF2/ORFS) were based on sequence PPA. In CAFASP [22] and the recent CASP Server Section [23], several sequence-profile-based methods were ranked at the top of the single-threading servers. Wu and Zhang [24] recently showed that the accuracy of sequence PPAs can be further improved by about 5–6% by incorporating a variety of additional structural information.

In CASP7, HHsearch [16], an HMM–HMM alignment method, stood out as the best single-threading server. HMM–HMM alignment and PPA share the same principle: both perform a pair-wise alignment of the target MSA with the template MSA. Instead of representing the MSAs by sequence profiles, however, HHsearch uses profile HMMs, which generate sequences with probabilities given by the product of amino-acid emission and insertion/deletion probabilities. HHsearch aligns the target and template HMMs by maximizing the probability that the two models co-emit the same amino-acid sequence. In this way, the amino-acid frequencies and the insertions and deletions of both HMMs are matched up together in an optimal way [16].

Although the average performance differs among algorithms, no single-threading program outperforms all other methods on every target. This naturally leads to the prevalence of so-called meta-servers [25,26,27], which collect and combine results from a set of different threading programs. There are two ways to generate predictions in meta-servers. One is to build a hybrid model by cutting and pasting selected structural fragments from multiple templates [27]. The combined model has, on average, larger coverage and better topology than the best single template. One drawback is that the hybrid models often have nonphysical local clashes between atoms. The second way is to select the best model using a variety of scoring functions or machine-learning techniques, which has emerged as a new research topic called Model Quality Assessment Programs (MQAPs) [28]. Despite considerable efforts in developing various MQAP scores, the most robust score turns out to be the one based on structural consensus [29]; that is, the best models are those simultaneously hit by various threading algorithms. The rationale behind the consensus approach is simple: there are more ways for a threading program to select a wrong template than a right one. Therefore, the chance that multiple threading programs make a common but wrong selection is much lower than the chance that they make a common and correct selection.
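To make the profile–profile scoring described above concrete, here is a minimal Python sketch; it is not the implementation of any server discussed in this review. It assumes the target is given as a per-position amino-acid frequency matrix and the template as a per-position log-odds profile (plain lists, columns in the same amino-acid order), scores a pair of positions by the frequency-weighted sum of log-odds, and aligns the two profiles with a global Needleman–Wunsch recursion using a linear gap penalty. The function names and the gap value of -1.0 are illustrative assumptions; practical threading programs use affine gaps, local or semi-global alignment, and additional structural terms.

```python
from typing import List, Tuple

def column_score(freq: List[float], logodds: List[float]) -> float:
    """PPA score of one target column against one template column:
    sum over amino acids of target frequency times template log-odds."""
    return sum(f * l for f, l in zip(freq, logodds))

def align_profiles(target: List[List[float]],
                   template: List[List[float]],
                   gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch) alignment of two profiles with a linear gap
    penalty (assumption). Returns the best score and the aligned
    (target index, template index) pairs."""
    n, m = len(target), len(template)
    # score[i][j] = best score for aligning target[:i] with template[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(target[i - 1], template[j - 1]),
                score[i - 1][j] + gap,  # target position aligned to a gap
                score[i][j - 1] + gap,  # template position aligned to a gap
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + column_score(target[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

The aligned pairs returned by such a recursion define which template residues are copied in the framework-building step of TBM. This is why, in threading, template selection and alignment are effectively one procedure.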

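The co-emission idea behind HHsearch can be sketched at the level of a single pair of match states. Assume q and t are the amino-acid emission distributions of a target and a template match state and f holds background amino-acid frequencies. A log-sum-of-odds column score, log Σ_a q(a) t(a) / f(a), then rewards column pairs that are likely to emit the same residue. This is a simplified sketch: the full HHsearch score also includes transition (insertion/deletion) terms and further corrections, which are omitted here, and the function name is hypothetical.

```python
import math
from typing import Sequence

def coemission_score(q: Sequence[float], t: Sequence[float],
                     background: Sequence[float]) -> float:
    """Log-sum-of-odds score for one pair of match states (sketch).

    q, t: amino-acid emission probabilities of a target and a template
    match state, in the same amino-acid order.
    background: background frequencies f(a) in that order.
    Returns log(sum_a q(a) * t(a) / f(a)); transition terms are ignored.
    """
    total = sum(qa * ta / fa for qa, ta, fa in zip(q, t, background))
    # Column pairs that can never emit the same residue get minus infinity.
    return math.log(total) if total > 0 else float("-inf")
```

A positive value means that the two columns are more likely to emit the same residue than expected from background frequencies. In a complete aligner, such column scores would enter a dynamic-programming recursion together with the transition probabilities of both HMMs.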

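The Cα rmsd is the main yardstick throughout this review, so a short sketch of how it is computed may help. The rmsd between a model and a reference is normally taken after the optimal rigid-body superposition, which the Kabsch algorithm obtains from a singular value decomposition. The sketch below assumes that the two coordinate sets are already in residue-to-residue correspondence (for example, over the aligned regions of a target–template pair) and uses NumPy. The function name is illustrative.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """C-alpha rmsd between two (N, 3) coordinate arrays after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm).
    Assumption: row i of P and row i of Q are the same residue."""
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / P.shape[0]))
```

Because rmsd is computed only over the residues supplied, a value such as "<6 Å covering >70% of the target" always needs its coverage stated alongside it. Otherwise a low rmsd over a small fragment could not be distinguished from a good full-length model.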

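TM-score, which is used in Figure 1 to rank structural analogs, balances distance and coverage by normalizing over the target length. The sketch below computes the formula for per-residue Cα distances of the aligned pairs under one fixed superposition, which it assumes are already known. The actual TM-score and TM-align programs additionally search for the superposition (and, in TM-align, the alignment) that maximizes the score, so this function returns only a lower bound. The distance scale d0 = 1.24(L - 15)^(1/3) - 1.8 is the standard TM-score definition. The 0.5 Å floor for very short chains and the function name are assumptions of this sketch.

```python
from typing import Sequence

def tm_score_from_distances(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (sketch).

    distances: C-alpha distances (Angstrom) between aligned model/target pairs.
    target_length: number of residues in the target (normalization length).
    The real TM-score maximizes this quantity over superpositions, so the
    value returned here is a lower bound on it.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Standard length-dependent distance scale of TM-score.
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumption: floor for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Unaligned target residues contribute nothing to the sum, which is how TM-score penalizes low coverage. A model in which every target residue lies exactly d0 from its counterpart scores 0.5.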

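The structural-consensus idea behind MQAP-style model selection can be illustrated with a small sketch in the spirit of 3D-Jury and Pcons, although it does not reproduce either program's scoring. The sketch assumes a matrix of pairwise structural similarities (for example, TM-scores) between candidate models produced by different threading programs. Each model is scored by its average similarity to all the other models, and the model with the highest score is selected. The function names and the plain averaging are assumptions made for illustration.

```python
from typing import List, Sequence

def consensus_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Average similarity of each model to all other models.
    similarity[i][j]: structural similarity (e.g. TM-score) of models i and j."""
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models")
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def select_consensus_model(similarity: Sequence[Sequence[float]]) -> int:
    """Index of the model most similar, on average, to the others."""
    scores = consensus_scores(similarity)
    return max(range(len(scores)), key=scores.__getitem__)
```

Suppose two of three programs return nearly the same fold (similarity 0.8) and the third returns an outlier. The outlier then receives the lowest consensus score. This reflects the argument above that independent programs rarely agree on the same wrong template.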
Table 1

Top 10 servers in CASP7 as ranked by the cumulative GDT-TS score

Server | Number of targets | GDT-TS score | Server type | URL
Zhang-Server | 124 | 76.04 | Threading, refinement, and free modeling | http://zhang.bioinformatics.ku.edu/I-TASSER
HHpred2 | 124 | 71.94 | HMM–HMM alignment (single-threading server) | http://toolkit.tuebingen.mpg.de/hhpred
Pmodeller6 | 124 | 71.69 | Meta-threading server | http://pcons.net
CIRCLE | 124 | 71.09 | Meta-threading server | http://www.pharm.kitasato-u.ac.jp/fams/fams.html
ROBETTA | 123 | 70.87 | Threading, refinement, and free modeling | http://robetta.org/submit.jsp
MetaTasser | 124 | 70.77 | Threading, refinement, and free modeling | http://cssb.biology.gatech.edu/skolnick/webservice/MetaTASSER
RAPTOR-ACE | 124 | 69.70 | Meta-threading server | http://ttic.uchicago.edu/jinbo/RAPTOR_form.htm
SP3 | 124 | 69.38 | Profile–profile alignment (single-threading server) | http://sparks.informatics.iupui.edu/hzhou/anonymous-fold-sp3.html
beautshot | 124 | 69.26 | Meta-threading server | http://inub.cse.buffalo.edu/form.html
UNI-EID-expm | 121 | 69.13 | Profile–profile alignment (single-threading server) | server not available

Multiple servers from the same lab are represented by the highest-ranked one.
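Table 1 ranks servers by GDT-TS. The sketch below shows what that score measures. It assumes one fixed model-to-target superposition and known per-residue Cα distances, and averages the fraction of target residues lying within 1, 2, 4, and 8 Å. The official evaluation searches over superpositions separately for each cutoff, so this version gives only a lower bound. The function name, the 0–1 scale, and the inclusive (≤) cutoff test are assumptions of the sketch.

```python
from typing import Sequence

def gdt_ts_from_distances(distances: Sequence[float], target_length: int) -> float:
    """GDT-TS on a 0-1 scale for ONE fixed superposition (sketch).

    distances: C-alpha distances (Angstrom) of the modeled residues to their
    target positions; target residues without a model counterpart are simply
    absent and count as misses.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of target residues within each cutoff (inclusive test assumed).
    fractions = [sum(d <= c for d in distances) / target_length for c in cutoffs]
    return sum(fractions) / len(cutoffs)
```

Multiplying by 100 gives the percentage form in which GDT-TS is often reported. Because the loose 4 and 8 Å cutoffs are included, GDT-TS rewards correct topology even when atomic detail is imperfect.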

The meta-server predictors dominated the server predictions in previous experiments (e.g. CAFASP4 [28], LiveBench-8 [21], and CASP6 [30]). In the recent CASP7 experiment [23], however, Zhang-Server (an automated server based on profile–profile threading and I-TASSER structure refinement [31]) clearly outperformed the others, including the meta-servers that use it as an input [29]. A list of the top 10 automated servers in the CASP7 experiment is shown in Table 1. On the one hand, these data highlight the challenge for MQAP methods in correctly ranking and selecting the best models; on the other hand, the success of the composite threading-plus-refinement servers (such as Zhang-Server, ROBETTA, and MetaTasser) demonstrates the advantage of structure refinement in TBM prediction.

Template structure refinement
The goal of protein structure refinement is to draw the templates closer to the native structure, which has proven to be an extremely nontrivial problem. Until only a few years ago, most TBM procedures either kept the templates unchanged or drove them away from the native structures [32,33].

Early efforts on template structure refinement focused on molecular dynamics (MD)-based atomic simulations, which attempt to refine low-resolution models by running classic software such as AMBER and CHARMM. Except for some isolated instances, however, no systematic improvement was achieved [34]. The failure of MD-based structure refinement seems to contradict the reported successes of MD potentials in discriminating the native structure from structural decoys. Wroblewska and Skolnick [35] recently showed that the AMBER plus GB potential could discriminate the native structure only from roughly minimized TASSER structure decoys [36]. After a 2-ns MD simulation, none of the native structures had the lowest energy among the decoys, and the energy–rmsd correlation vanished. A noteworthy observation was recently made by Summa and Levitt [37], who tested different molecular mechanics (MM) potentials (AMBER99, OPLS-AA, GROMOS96, and ENCAD) on the refinement of 75 proteins by in vacuo energy minimization. The authors found that a knowledge-based atomic contact potential derived from PDB statistics outperformed all the traditional MM potentials, moving almost all the test proteins closer to the native state, while the MM potentials, except for AMBER99, essentially drove the decoys away from the native. The in vacuo simulation without solvation may be part of the reason for the failure of the MM potentials. Nevertheless, this observation demonstrates the potential of hybrid knowledge-based and physics-based potentials in protein structure refinement.

Encouraging template refinements have recently been achieved by combining hybrid potentials with spatial restraints from threading templates [9,38,39]. Misura et al. [38] first built low-resolution models with ROSETTA [40] using a fragment library enriched by the query–template alignment; Cβ-contact restraints were used to guide the assembly procedure. The low-resolution models were then refined by a physics-based atomic potential. As a result, in 22 of 39 test cases, at least 1 of the 10 lowest-energy models was closer to the native structure than the template.

A more comprehensive test of a template refinement procedure based on TASSER simulations, combined with consensus spatial restraints from multiple templates, was reported by Zhang and Skolnick [9,36]. For 1489 test cases, TASSER reduced the rmsd of the templates in the majority of cases, with an average rmsd reduction from 6.7 to 4.4 Å over the threading-aligned regions. Even starting from the best templates identified by structural alignment, TASSER refined the models from 2.5 to 1.88 Å in the aligned regions. Here, TASSER built the structures using a reduced model (specified by Cα
and side-chain center of mass) with a purely knowledge-based force field. One major contribution to the refinement is the use of multiple threading templates, for which the consensus spatial restraints are more accurate than those from any individual template. A second is that the composite knowledge-based energy terms have been extensively optimized using large-scale structure decoys [41], which helps coordinate the complicated correlations between different interaction terms.

The progress of threading template refinement was assessed in the recent CASP7 experiment, where the assessors compared the predicted models with the best structural template (a 'virtual predictor group') and commented that 'The best group in this respect (24, Zhang) managed to achieve a higher GDT-TS score than the virtual group in more than half the assessment units and a higher GDT-HA score in approximately one-third of cases' [42]. This comparison may not entirely reflect the template refinement ability of the algorithms, because the predictors actually start from threading templates rather than from the best structural alignments, and the latter require information about the native structure, which was not available when the predictions were made. Conversely, a global GDT score comparison may favor full-length models, because the template alignment is shorter than the models. In a direct comparison of the rmsd over the same aligned regions, we find that the first I-TASSER model is closer to the native than the best initial template in 86 of 105 TBM cases, while in the remaining cases it is worse than (13 cases) or equal to (6 cases) the template. Over the same aligned regions, the average rmsd is 4.9 Å for the templates and 3.8 Å for the models [31].

Free modeling
When structural analogs do not exist in the PDB library or cannot be identified by threading (which is more often the case, as shown by Figure 1), the structure prediction has to be generated from scratch. This type of prediction has been termed 'ab initio' or 'de novo' modeling, terms that may easily be understood as modeling 'from first principles'. In CASP7, it is called 'free modeling', which I think more appropriately reflects the status of the field, since the most efficient methods in this category still use hybrid approaches including both knowledge-based and physics-based potentials. Evolutionary information is often used to generate sparse spatial restraints or to identify local structural building blocks.

The best-known idea for free modeling is probably the one pioneered by Bowie and Eisenberg, who assembled new tertiary structures from small fragments (mainly 9-mers) cut from other PDB proteins [43]. On the basis of a similar idea, Baker and coworkers developed ROSETTA [40], which has worked extremely well for free modeling in the CASP experiments and made the fragment assembly approach popular in the field. In the new developments of ROSETTA [44,45], the authors first assemble structures in a reduced knowledge-based model, with conformations specified by the heavy backbone atoms and Cβ atoms. In the second stage, Monte Carlo simulations with an all-atom physics-based potential are performed to refine the details of the low-resolution models. An exciting achievement was demonstrated in CASP6 by generating a model for T0281 (70 residues) that was 1.6 Å from the crystal structure. In CASP7, ROSETTA built a model for T0283 (112 residues) with rmsd = 1.8 Å over 92 residues (Figure 2, left panel). Despite this significant success, the computational cost of the procedure (150 CPU days for a small protein of <100 residues) is still too high for routine use.

Another successful free modeling approach, TASSER [36] by Zhang and Skolnick, constructs 3D models using a purely knowledge-based approach. Continuous fragments of various sizes are excised from threading alignments and used to reassemble protein structures in an on-and-off lattice system. A newer version, I-TASSER, was recently developed by Wu et al. [46]; it refines the TASSER cluster centroids by iterative Monte Carlo simulations. Although the procedure uses structural fragments and spatial restraints from threading templates, it often constructs models of correct topology even when the topologies of the individual templates are incorrect. In CASP7, among 19 FM and FM/TBM targets, I-TASSER built models of correct topology (3–5 Å) for 7 cases with sequences up to 155 residues long. Figure 2 (right panel) shows one example, T0382 (123 residues), where all the initial templates have a wrong topology (>9 Å) but the final model is 3.6 Å from the X-ray structure.

Significant efforts have been made in purely physics-based protein folding and structure prediction. The first milestone of successful ab initio protein folding is probably the 1997 work of Duan and Kollman, who folded the villin headpiece (a 36-mer) by MD simulations in explicit solvent for two months on parallel supercomputers, reaching models as close as 4.5 Å to the native [47]. With the help of worldwide-distributed computers, this small protein was recently folded by Pande and coworkers [48] to 1.7 Å with a total simulation time of 300 μs, or approximately 1000 CPU years. To reduce the computing cost, Scheraga and coworkers [49] developed a reduced physics-based model called UNRES, which represents protein conformations by the Cα atom, the side-chain center, and a virtual peptide group. The low-energy UNRES models are then converted to all-atom representations based on ECEPP/3. In CASP6, a structural genomics target, TM0487 (T0230, 102 residues), was folded to within 7.3 Å by this approach. Using ASTRO-FOLD with ECEPP/3 optimization, Floudas and coworkers [50] recently constructed a model of 5.2 Å for a four-helical bundle protein of 102 residues in a double-blind prediction.
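The point made above about TASSER-style refinement, that consensus spatial restraints from multiple templates are more reliable than those from any single template, can be sketched as a simple vote. Assume each template has been mapped onto target residue numbering through its threading alignment and supplies Cα coordinates for the residues it covers. A residue pair is kept as a contact restraint only if it is in contact in at least a given fraction of the templates that cover both residues. The 8 Å Cα cutoff, the vote threshold, and all names here are illustrative assumptions; TASSER's actual restraints are defined differently.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]

def template_contacts(aligned: Dict[int, Coord], cutoff: float = 8.0) -> Set[Tuple[int, int]]:
    """Contacts (i, j), i < j in target numbering, whose template-derived
    C-alpha coordinates lie within `cutoff` Angstrom (assumed cutoff)."""
    contacts = set()
    for i, j in combinations(sorted(aligned), 2):
        dist2 = sum((a - b) ** 2 for a, b in zip(aligned[i], aligned[j]))
        if dist2 <= cutoff ** 2:
            contacts.add((i, j))
    return contacts

def consensus_contacts(templates: List[Dict[int, Coord]],
                       min_fraction: float = 0.5,
                       cutoff: float = 8.0) -> Set[Tuple[int, int]]:
    """Keep a contact if it occurs in at least `min_fraction` of the
    templates that cover both residues (hypothetical voting rule)."""
    hits: Counter = Counter()
    covered: Counter = Counter()
    for aligned in templates:
        # Count every residue pair this template covers, contact or not.
        for pair in combinations(sorted(aligned), 2):
            covered[pair] += 1
        hits.update(template_contacts(aligned, cutoff))
    return {pair for pair, n in hits.items() if n / covered[pair] >= min_fraction}
```

Raising `min_fraction` trades coverage for reliability: fewer restraints survive, but those that do are supported by more templates. This is the same trade-off a consensus-based refinement procedure has to balance.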

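Fragment assembly, as pioneered by Bowie and Eisenberg and popularized by ROSETTA, and the Monte Carlo refinement used in TASSER and I-TASSER both rely on Metropolis sampling. The toy sketch below shows only that skeleton. A conformation is a hypothetical list of per-residue values standing in for backbone torsions, each move replaces one window with a fragment from a library, and moves are accepted by the Metropolis criterion under a caller-supplied energy function. It is not the move set, representation, or force field of any of these programs.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

def fragment_assembly(start: Sequence[float],
                      fragments: Dict[int, List[List[float]]],
                      energy: Callable[[Sequence[float]], float],
                      steps: int = 1000,
                      temperature: float = 1.0,
                      seed: int = 0) -> List[float]:
    """Metropolis Monte Carlo over fragment insertions (toy sketch).

    start: initial per-residue values (stand-in for backbone torsions).
    fragments: window start index -> candidate fragments for that window.
    Returns the lowest-energy conformation visited.
    """
    for pos, frags in fragments.items():
        if any(pos + len(f) > len(start) for f in frags):
            raise ValueError(f"fragment at position {pos} overruns the chain")
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    positions = list(fragments)
    for _ in range(steps):
        pos = rng.choice(positions)
        frag = rng.choice(fragments[pos])
        trial = current[:pos] + list(frag) + current[pos + len(frag):]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / T).
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best
```

The temperature controls how often uphill moves are accepted. If it is too low, the search freezes in the nearest minimum. This connects to the sampling bottleneck discussed in the Conclusions: with a rugged or golf-hole-like landscape, no practical number of such moves reliably finds the native basin.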



Figure 2

Representative examples of free modeling in CASP7 generated by two different approaches. T0283 (left panel) is a TBM target (from Bacillus halodurans) of 112 residues, but the model was generated by all-atom ROSETTA (a hybrid knowledge-based and physics-based approach) [45] using free modeling; it has a TM-score of 0.74 and an rmsd of 1.8 Å over the first 92 residues (the overall rmsd is 13.8 Å, mainly because of the misorientation of the C-terminus). T0382 (right panel) is an FM/TBM target (from Rhodopseudomonas palustris CGA009) of 123 residues; the model was generated by I-TASSER (a purely knowledge-based approach) [31], with a TM-score of 0.66 and an rmsd of 3.6 Å. Blue and red represent the model and the crystal structure, respectively.

Conclusions
Since a detailed physicochemical description of protein folding principles does not yet exist, the protein structure prediction problem is largely defined by the evolutionary or structural distance between the target and the solved proteins in the PDB library. For proteins with close templates, full-length models can be constructed by copying the template framework. Recent studies show that, using the best possible template structures in the PDB, state-of-the-art modeling algorithms could build high-quality full-length models for almost all single-domain proteins with an average rmsd of 2.3 Å; this suggests that the current PDB structure universe may be approaching completeness for solving the protein structure prediction problem [9]. However, most target–template pairs are evolutionarily too distant to be detected with current threading approaches.

The development of efficient threading algorithms to detect weakly or distantly homologous templates has been a central theme in the field and may persist as a principal direction, as the gap between threading and the best structural alignment is obvious and tempting. However, progress in reducing this gap has been slow or incremental since the invention of PPA techniques. No single-threading method outperforms all others on every target; this results in the prevalence of meta-servers and MQAPs, which generate predictions by collecting and selecting models from a set of other threading programs. In contrast, template structure refinement has enjoyed promising progress. In the recent CASP7 experiment [23], automated threading-plus-refinement servers outperformed both the threading-only and the MQAP-based meta-servers by a margin. Nevertheless, template refinement mainly occurs at the topology level. The demand for atomic-level structural refinement, which can generate models useful for drug screening and biochemical function inference, is keener than ever, especially as more and more template structures become available through structural genomics and traditional structural biology.

Free modeling is certainly the 'Holy Grail' of protein structure prediction, because its success would mark the eventual solution to the problem. Although purely physics-based ab initio simulation has the advantage of revealing the pathway of protein folding, the best current free modeling results come from methods that combine knowledge-based and physics-based approaches. Although there are consistent successes in building correct topology (3–6 Å) for small proteins, the more exciting high-resolution free modeling (<2 Å) is rarer and computationally expensive. There is evidence that current atomic potentials have their lowest energy near the native state and that the bottleneck of high-resolution folding seems to be insufficient conformational sampling [44]. However,
a golf-hole-like energy landscape without a middle-range funnel is unlikely to be the one adopted in nature, which may be a deeper reason for the failure of the conformational search. Thus, the bottleneck for free modeling comes from the lack of both funnel-like force fields and efficient conformational space search, especially for larger proteins.

Acknowledgements
The project is supported in part by KU Start-up Fund 06194, the Alfred P. Sloan Foundation, and Grant Number R01GM083107 of the National Institute of General Medical Sciences.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Burley SK, Almo SC, Bonanno JB, Capel M, Chance MR, Gaasterland T, Lin D, Sali A, Studier FW, Swaminathan S: Structural genomics: beyond the human genome project. Nat Genet 1999, 23:151-157.
2. Chandonia JM, Brenner SE: The impact of structural genomics: expectations and outcomes. Science 2006, 311:347-351.
The authors review and assess the gains and losses of the structural genomics project over the past five years in contrast with traditional structural biology.
3. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28:235-242.
4. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D et al.: MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2006, 34:D291-D295.
MODBASE is a database of 3D models built by the MODELLER pipeline for all protein sequences in SwissProt, based on available structural templates in the PDB library.
5. Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A: Critical assessment of methods of protein structure prediction (CASP) — round VII. Proteins 2007, 69(Suppl 8):3-9.
6. Bowie JU, Lüthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253:164-170.
7. Jones DT, Taylor WR, Thornton JM: A new approach to protein fold recognition. Nature 1992, 358:86-89.
8. Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005, 33:2302-2309.
9. Zhang Y, Skolnick J: The protein structure prediction problem could be solved using the current PDB library. Proc Natl Acad Sci U S A 2005, 102:1029-1034.
Using the best available templates, TASSER could build high-quality models for all single-domain proteins. This shows that the current structure set in the PDB is essentially complete for the protein structure prediction problem, though most of the templates are not detectable by current threading approaches.
10. Skolnick J, Kihara D, Zhang Y: Development and large scale benchmark testing of the PROSPECTOR 3.0 threading algorithm. Proteins 2004, 56:502-518.
11. Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A: FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Res 2005, 33:W284-W288.
12. Zhou H, Zhou Y: Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 2005, 58:321-328.
13. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L: ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 2003, 31:3804-3807.
14. Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence–structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310:243-257.
15. Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics 1998, 14:846-856.
16. Söding J: Protein homology detection by HMM–HMM comparison. Bioinformatics 2005, 21:951-960.
Sequence–HMM alignment is extended to pair-wise profile HMM–HMM alignment for remote homology detection. HHsearch was one of the best single-threading servers in CASP7.
17. Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287:797-815.
18. Cheng J, Baldi P: A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006, 22:1456-1463.
19. Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A 1987, 84:4355-4358.
20. Sadreyev R, Grishin N: COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol 2003, 326:317-336.
21. Rychlewski L, Fischer D: LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 2005, 14:240-245.
22. Fischer D, Rychlewski L, Dunbrack RL Jr, Ortiz AR, Elofsson A: CAFASP3: the third critical assessment of fully automated structure prediction methods. Proteins 2003, 53(Suppl 6):503-516.
23. Battey JN, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T: Automated server predictions in CASP7. Proteins 2007, 69(Suppl 8):68-82.
This is the official assessment paper for the structure prediction servers in CASP7; it is especially helpful for users who want to find appropriate servers for generating their own structure predictions.
24. Wu ST, Zhang Y: MUSTER: improving protein sequence profile–profile alignments by using multiple sources of structure information. Proteins 2008, doi: 10.1002/prot.21945.
25. Ginalski K, Elofsson A, Fischer D, Rychlewski L: 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003, 19:1015-1018.
26. Wu ST, Zhang Y: LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res 2007, 35:3375-3382.
LOMETS is a new meta-server with all individual threading programs installed locally, which ensures quick collection and selection of multiple threading results.
27. Fischer D: 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins 2003, 51:434-441.
28. Fischer D: Servers for protein structure prediction. Curr Opin Struct Biol 2006, 16:178-182.
29. Wallner B, Elofsson A: Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2007, 69(Suppl 8):184-193.
The Pcons server generates structure predictions by ranking and selecting models generated by other programs. It shows that structural consensus is the most robust score for protein model selection.
30. Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A: Critical assessment of methods of protein structure prediction (CASP) — round 6. Proteins 2005, 61(Suppl 7):3-7.
31. Zhang Y: Template-based modeling and free modeling by I-TASSER in CASP7. Proteins 2007, 69(Suppl 8):108-117.


Template structures can be refined significantly closer to the native by purely knowledge-based I-TASSER modeling. I-TASSER also generated the correct topology for 7 of 19 free modeling targets in CASP7.
32. Tress M, Ezkurdia I, Grana O, Lopez G, Valencia A: Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 2005, 61(Suppl 7):27-45.
33. Tramontano A, Morea V: Assessment of homology-based predictions in CASP5. Proteins 2003, 53(Suppl 6):352-368.
34. Lee MR, Tsai J, Baker D, Kollman PA: Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 2001, 313:417-430.
35. Wroblewska L, Skolnick J: Can a physics-based, all-atom potential find a protein's native structure among misfolded structures? I. Large scale AMBER benchmarking. J Comput Chem 2007, 28:2059-2066.
The AMBER plus GB solvation potential can discriminate the native structure from roughly minimized structural decoys. After a longer MD simulation, however, the energy–rmsd correlation vanishes. This finding partially explains the discrepancy between the discrimination ability of physics-based potentials and some of their unsuccessful folding/refinement results.
36. Zhang Y, Skolnick J: Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci U S A 2004, 101:7594-7599.
37. Summa CM, Levitt M: Near-native structure refinement using in vacuo energy minimization. Proc Natl Acad Sci U S A 2007, 104:3177-3182.
The in vacuo energy minimization experiments show that a knowledge-based atomic contact potential derived from PDB statistics outperforms all traditional molecular mechanics potentials in driving protein structure decoys toward the native state.
38. Misura KM, Chivian D, Rohl CA, Kim DE, Baker D: Physically realistic homology models built with ROSETTA can be more accurate than their templates. Proc Natl Acad Sci U S A 2006, 103:5361-5366.
The hybrid approach of ROSETTA structure assembly combined with atomic refinement guided by spatial restraints is shown to draw 22 of 39 template models closer to the native structure.
39. Chen J, Brooks CL III: Can molecular dynamics simulations provide high-resolution refinement of protein structure? Proteins 2007, 67:922-930.
CHARMM22/GBSW with spatial restraints is able to refine four of five CASP6 CM targets with up to 1 Å rmsd reduction, representing new progress in MD-based structure refinement.
40. Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997, 268:209-225.
41. Zhang Y, Kolinski A, Skolnick J: TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J 2003, 85:1145-1164.
42. Kopp J, Bordoli L, Battey JN, Kiefer F, Schwede T: Assessment of CASP7 predictions for template-based modeling targets. Proteins 2007, 69(Suppl 8):38-56.
The paper assesses the template-based modeling category, which includes 108 of the 123 targets/domains in CASP7. Progress in template refinement is highlighted.
43. Bowie JU, Eisenberg D: An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc Natl Acad Sci U S A 1994, 91:4436-4440.
44. Bradley P, Misura KM, Baker D: Toward high-resolution de novo structure prediction for small proteins. Science 2005, 309:1868-1871.
This is the first work to report successful high-resolution modeling cases by free modeling. It shows that atomic potentials have their lowest energy near the native state and that the bottleneck for high-resolution free modeling is insufficient conformational search.
45. Das R, Qian B, Raman S, Vernon R, Thompson J, Bradley P, Khare S, Tyka MD, Bhat D, Chivian D et al.: Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 2007, 69(Suppl 8):118-128.
The paper summarizes the recent progress of ROSETTA using distributed computing resources and the performance of ROSETTA@home on the CASP7 targets.
46. Wu ST, Skolnick J, Zhang Y: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol 2007, 5:17.
By iterative TASSER assembly, I-TASSER is able to generate medium-resolution to high-resolution models for small proteins without using homologous templates. The computing cost is significantly lower than that of the corresponding atomic-level structure predictions.
47. Duan Y, Kollman PA: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282:740-744.
48. Zagrovic B, Snow CD, Shirts MR, Pande VS: Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol 2002, 323:927-937.
49. Oldziej S, Czaplewski C, Liwo A, Chinchio M, Nanias M, Vila JA, Khalili M, Arnautova YA, Jagielska A, Makowski M et al.: Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests. Proc Natl Acad Sci U S A 2005, 102:7547-7552.
Using a reduced physics-based approach, UNRES is able to generate correct topologies for proteins of up to 102 residues.
50. Klepeis JL, Wei Y, Hecht MH, Floudas CA: Ab initio prediction of the three-dimensional structure of a de novo designed protein: a double-blind case study. Proteins 2005, 58:560-570.
51. Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57:702-710.
