Sanchez Bioinformatics 1999

MODBASE is a database of comparative protein structure models generated automatically from protein sequences. The database contains over 17,000 models of proteins from various organisms. Models are evaluated for accuracy based on overlap with known protein structures to identify reliable models for predicting protein structure and function.

Vol. 15 no. 12 1999
BIOINFORMATICS APPLICATIONS NOTE Pages 1060–1061

MODBASE: A database of comparative protein structure models
Roberto Sánchez and Andrej Šali
Laboratories of Molecular Biophysics, The Pels Family Center for Biochemistry and
Structural Biology, The Rockefeller University, 1230 York Ave, New York, NY 10021,
USA

Received on March 16, 1999; revised on June 9, 1999; accepted on June 24, 1999

© Oxford University Press 1999

Abstract

Summary: MODBASE is a database of evaluated and annotated comparative protein structure models. The database also includes fold assignments and alignments on which the models were based.
Availability: MODBASE is accessible on the Web at http://guitar.rockefeller.edu/modbase. Models for yeast proteins are also accessible through links from the SACCH3D database at http://genome-www.stanford.edu/Sacch3D.
Contact: [email protected]; http://guitar.rockefeller.edu/

Native three-dimensional structure (3D) of a protein is valuable in testing, understanding, and modifying protein function. While 3D structures of only a tiny fraction of known protein sequences (Benson et al., 1999) have been defined experimentally (Abola et al., 1987), comparative modeling can frequently provide a useful 3D model of a protein (Johnson et al., 1994; Sánchez and Šali, 1997b). Despite the usefulness of comparative modeling, it is still not a common sequence analysis tool for the biologist, partly due to the lack of easy access to reliable and evaluated models. The SWISS-MODEL (Guex et al., 1999) database of comparative models attempts to resolve this problem, as does the MODBASE database described in this paper.

MODBASE is a database of annotated comparative protein structure models. The models consist of coordinates for all non-hydrogen atoms in the modeled part of a protein. Models are generated entirely automatically in a four step procedure (Sánchez and Šali, 1998, 1999): (i) fold assignment, (ii) sequence–structure alignment, (iii) model building, and (iv) model evaluation. This procedure can be applied to thousands of protein sequences, including complete genomes and large protein sequence databases. In the fold assignment step, each sequence from a genome is compared with a non-redundant set of proteins of known 3D structure (Abola et al., 1987). This is achieved by an iterative sequence similarity search by program PSI-BLAST (Altschul et al., 1997). In the second step, the matching parts of a given protein sequence and a related known protein structure are aligned by the ALIGN2D command of MODELLER (Sánchez and Šali, in preparation). This procedure places gaps in the structurally reasonable context. In the third step, all the pairwise sequence–structure alignments are used individually to build 3D models for the matched parts of the protein sequences by the program MODELLER (Šali and Blundell, 1993; Sánchez and Šali, 1997a). The fourth step, evaluation of models, is discussed in the following section.

It is essential for assessing the value of 3D protein models to estimate their overall accuracy (Lüthy et al., 1992; Sippl, 1993; Sánchez and Šali, 1997b). In the fold assignment step of the pipeline, a relatively permissive cutoff is used for selecting known protein structures for model building. This results in a smaller number of missed hits, but it also increases the number of false fold assignments and the number of mistakes in alignments. The fold assignment errors begin to appear when relatively dissimilar template–target sequences are matched (i.e. <30% sequence identity). In addition, even if the fold is assigned correctly, errors in the alignment may still result in a bad model. The alignment errors can be significant when the sequence identity drops below 35%. A reliable model is obtained only if both the correct fold assignment and an approximately correct alignment are made. The overall accuracy of a model is measured by an overlap between the model and the actual structure. The overlap is defined as the fraction of residues whose Cα atoms are within 3.5 Å of each other in the globally superposed pair of structures. Models that overlap with the correct structures in more than 30% of their residues are defined here as 'good' models. Such models are likely to have a correct fold, which is frequently sufficient for coarse prediction of protein function (Orengo et al., 1994). A method for calculating the probability of whether a given model is good, pG, was developed (Sánchez and Šali, 1998) and is used to evaluate all the models in MODBASE.

The database currently contains models for segments of more than 17,000 proteins in Saccharomyces cerevisiae, Mycoplasma genitalium, Caenorhabditis elegans, Escherichia coli, Methanobacterium thermoautotrophicum, Synechocystis sp., Pyrococcus horikoshii, Methanococcus jannaschii, Haemophilus influenzae, Aquifex aeolicus, Mycoplasma pneumoniae and Sulfolobus solfataricus (Table 1).

Table 1. Contents of MODBASE

Organism                   Proteins with   Models (b)   % of organism   % of organism
                           models (a)                   proteins with   residues
                                                        models          modeled
Saccharomyces cerevisiae   2587            4484         42              20
Mycoplasma genitalium      216             280          45              29
Caenorhabditis elegans     7900            13523        39              22
Escherichia coli           1625            2560         38              27
Methanobacterium thermo.   663             1125         21              19
Synechocystis sp.          1000            1670         38              25
Pyrococcus horikoshii      611             946          30              24
Methanococcus jannaschii   630             987          36              28
Haemophilus influenzae     670             1217         40              30
Aquifex aeolicus           665             1063         44              31
Mycoplasma pneumoniae      244             297          18              16
Sulfolobus solfataricus    301             579          30              26

(a) The number of proteins that have at least one segment modeled reliably. Whether or not a model is reliable is predicted as described briefly in the text, and in more detail in Sánchez and Šali (1998).
(b) The number of models calculated for the genome. This number is larger than the number of proteins modeled because many proteins have independently calculated models for the same domain in the protein, as well as independently calculated models for different domains in the same protein.

The database is searchable by protein names, keywords, template structure, organism, model reliability, model size, target–template sequence identity, and alignment significance. It is also possible to search for sequence similarities to the model sequences using BLAST (Altschul et al., 1997). Searching produces a table of models satisfying all search criteria. The table lists the modeled regions of the target proteins, the templates used to construct the models, target–template similarities, and model reliabilities. For each model, it also includes links to a more detailed description of the model, a summary of all models for a given protein, and the PDB database (Abola et al., 1987) for a detailed description of the template structure used in modeling. The model description page contains a schematic representation of the target–template alignment and links to the template fold entries in the CATH database (Orengo et al., 1999). In addition, it links to the model coordinates in the PDB format, the target–template alignment used to derive the model, and display of the model by the 3D visualization program RASMOL (Sayle and Milner-White, 1995).

In the future, MODBASE will grow to reflect (i) the growth of the sequence databases, (ii) the growth of the database of known protein structures, and (iii) improvements in the software for calculating the models. It is expected that the SWISS-PROT+TrEMBL protein sequence databases (Bairoch and Apweiler, 1999) and various EST databases will be processed by the end of 1999.

Acknowledgments

We are grateful to Dr. Steve A. Chervitz for making links from SGD to MODBASE and Paul de Bakker for help in implementing the WWW interface. RS is a Howard Hughes Medical Institute predoctoral fellow. AŠ is a Sinsheimer Scholar and an Alfred P. Sloan Research Fellow. The project has also been aided by grants from NIH (GM 54762) and NSF (BIR-9601845).

References

Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T. and Weng,J. (1987) Protein data bank. In Allen,F.H., Bergerhoff,G. and Sievers,R. (eds), Crystallographic Databases - Information, Content, Software Systems, Scientific Applications. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester, pp. 107–132.
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J.Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Bairoch,A. and Apweiler,R. (1999) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res., 27, 49–54.
Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) GenBank. Nucleic Acids Res., 27, 12–17.
Guex,N., Diemand,A. and Peitsch,M.C. (1999) Protein modelling for all. Trends Biochem. Sci., 24, 364–367.
Johnson,M.S., Srinivasan,N., Sowdhamini,R. and Blundell,T.L. (1994) Knowledge-based protein modelling. CRC Crit. Rev. Biochem. Mol. Biol., 29, 1–68.
Lüthy,R., Bowie,J.U. and Eisenberg,D. (1992) Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Protein superfamilies and domain super-folds. Nature, 372, 631–634.
Orengo,C.A., Pearl,F.M.G., Bray,J.E., Todd,A.E., Martin,A.C., Conte,L.L. and Thornton,J.M. (1999) The CATH database provides insights into protein structure/function relationship. Nucleic Acids Res., 27, 275–279.
Šali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
Sánchez,R. and Šali,A. (1997a) Evaluation of comparative protein structure modeling by MODELLER-3. Proteins, Suppl. 1, 50–58.
Sánchez,R. and Šali,A. (1997b) Advances in comparative protein-structure modeling. Curr. Opin. Struct. Biol., 7, 206–214.
Sánchez,R. and Šali,A. (1998) Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl Acad. Sci. USA, 95, 13597–13602.
Sánchez,R. and Šali,A. (1999) Comparative protein structure modeling in genomics. J. Comp. Phys., 151, 388–401.
Sayle,R. and Milner-White,E.J. (1995) RasMol: Biomolecular graphics for all. Trends Biochem. Sci., 20, 374.
Sippl,M.J. (1993) Recognition of errors in three-dimensional structures of proteins. Proteins, 17, 355–362.
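
Illustrative sketch of the overlap measure

The overlap used in the text to judge model accuracy (the fraction of residues whose Cα atoms lie within 3.5 Å of each other after global superposition, with more than 30% overlap defining a 'good' model) can be written down in a few lines. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names ca_overlap and is_good_model are hypothetical, the two coordinate lists are assumed to be already globally superposed and paired residue by residue, and treating a distance of exactly 3.5 Å as 'within' is a choice made here. It does not compute pG, which is described in Sánchez and Šali (1998).

```python
import math

CUTOFF_ANGSTROM = 3.5  # distance threshold quoted in the text
GOOD_OVERLAP = 0.30    # 'good' models overlap in more than 30% of residues


def ca_overlap(model_ca, native_ca, cutoff=CUTOFF_ANGSTROM):
    """Fraction of paired C-alpha atoms lying within `cutoff` of each other.

    Assumption: both arguments are lists of (x, y, z) tuples in Angstroms,
    already globally superposed and paired by index (residue i with residue i).
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("coordinate lists must be paired residue by residue")
    if not model_ca:
        return 0.0
    # Count residue pairs whose C-alpha atoms are no further apart than cutoff.
    close = sum(
        1 for m, n in zip(model_ca, native_ca) if math.dist(m, n) <= cutoff
    )
    return close / len(model_ca)


def is_good_model(overlap, threshold=GOOD_OVERLAP):
    """True when the overlap exceeds the 30% threshold used in the text."""
    return overlap > threshold


# Toy coordinates (made up for illustration): C-alpha deviations of
# 0.5, 1.0, 6.0 and 9.0 Angstroms, so two of four residues overlap.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
native = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 6.0, 0.0), (11.4, 0.0, 9.0)]
overlap = ca_overlap(model, native)
print(overlap, is_good_model(overlap))
```

In practice the superposition and the residue pairing would come from a structure comparison step that is outside the scope of this sketch; only the counting and thresholding are shown.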
