Sanchez Bioinformatics 1999
12 1999
BIOINFORMATICS APPLICATIONS NOTE Pages 1060–1061
Received on March 16, 1999; revised on June 9, 1999; accepted on June 24, 1999
© Oxford University Press 1999
ModBase: A database of comparative protein structure models
cherichia coli, Methanobacterium thermoautotrophicum, Synechocystis sp., Pyrococcus horikoshii, Methanococcus jannaschii, Haemophilus influenzae, Aquifex aeolicus, Mycoplasma pneumoniae and Sulfolobus solfataricus (Table 1).

Table 1. Contents of MODBASE

Organism                    Proteins with   Models (b)   % of organism    % of organism
                            models (a)                   proteins with    residues
                                                         models           modeled
Saccharomyces cerevisiae    2587            4484         42               20
Mycoplasma genitalium       216             280          45               29
Caenorhabditis elegans      7900            13523        39               22
Escherichia coli            1625            2560         38               27
Methanobacterium thermo.    663             1125         21               19
Synechocystis sp.           1000            1670         38               25
Pyrococcus horikoshii       611             946          30               24
Methanococcus jannaschii    630             987          36               28
Haemophilus influenzae      670             1217         40               30
Aquifex aeolicus            665             1063         44               31
Mycoplasma pneumoniae       244             297          18               16
Sulfolobus solfataricus     301             579          30               26

(a) The number of proteins that have at least one segment modeled reliably. Whether or not a model is reliable is predicted as described briefly in the text, and in more detail in Sánchez and Šali (1998).
(b) The number of models calculated for the genome. This number is larger than the number of proteins modeled because many proteins have independently calculated models for the same domain in the protein, as well as independently calculated models for different domains in the same protein.

The database is searchable by protein names, keywords, template structure, organism, model reliability, model size, target–template sequence identity, and alignment significance. It is also possible to search for sequence similarities to the model sequences using BLAST (Altschul et al., 1997). Searching produces a table of models satisfying all search criteria. The table lists the modeled regions of the target proteins, the templates used to construct the models, target–template similarities, and model reliabilities. For each model, it also includes links to a more detailed description of the model, a summary of all models for a given protein, and the PDB database (Abola et al., 1987) for a detailed description of the template structure used in modeling. The model description page contains a schematic representation of the target–template alignment and links to the template fold entries in the CATH database (Orengo et al., 1999). In addition, it links to the model coordinates in the PDB format, the target–template alignment used to derive the model, and a display of the model by the 3D visualization program RASMOL (Sayle and Milner-White, 1995).

In the future, MODBASE will grow to reflect (i) the growth of the sequence databases, (ii) the growth of the database of known protein structures, and (iii) improvements in the software for calculating the models. It is expected that the SWISS-PROT+TrEMBL protein sequence databases (Bairoch and Apweiler, 1999) and various EST databases will be processed by the end of 1999.

Acknowledgments

We are grateful to Dr. Steve A. Chervitz for making links from SGD to MODBASE and Paul de Bakker for help in implementing the WWW interface. RS is a Howard Hughes Medical Institute predoctoral fellow. AŠ is a Sinsheimer Scholar and an Alfred P. Sloan Research Fellow. The project has also been aided by grants from NIH (GM 54762) and NSF (BIR-9601845).

References

Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T. and Weng,J. (1987) Protein data bank. In Allen,F.H., Bergerhoff,G. and Sievers,R. (eds), Crystallographic Databases—Information, Content, Software Systems, Scientific Applications. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester, pp. 107–132.
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J.Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Bairoch,A. and Apweiler,R. (1999) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res., 27, 49–54.
Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) GenBank. Nucleic Acids Res., 27, 12–17.
Guex,N., Diemand,A. and Peitsch,M.C. (1999) Protein modelling for all. Trends Biochem. Sci., 24, 364–367.
Johnson,M.S., Srinivasan,N., Sowdhamini,R. and Blundell,T.L. (1994) Knowledge-based protein modelling. CRC Crit. Rev. Biochem. Mol. Biol., 29, 1–68.
Lüthy,R., Bowie,J.U. and Eisenberg,D. (1992) Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Protein superfamilies and domain super-folds. Nature, 372, 631–634.
Orengo,C.A., Pearl,F.M.G., Bray,J.E., Todd,A.E., Martin,A.C., Conte,L.L. and Thornton,J.M. (1999) The CATH database provides insights into protein structure/function relationship. Nucleic Acids Res., 27, 275–279.
Šali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
Sánchez,R. and Šali,A. (1997a) Evaluation of comparative protein structure modeling by MODELLER-3. Proteins, Suppl. 1, 50–58.
Sánchez,R. and Šali,A. (1997b) Advances in comparative protein-structure modeling. Curr. Opin. Struct. Biol., 7, 206–214.
Sánchez,R. and Šali,A. (1998) Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl Acad. Sci. USA, 95, 13597–13602.
Sánchez,R. and Šali,A. (1999) Comparative protein structure modeling in genomics. J. Comp. Phys., 151, 388–401.
Sayle,R. and Milner-White,E.J. (1995) RasMol: Biomolecular graphics for all. Trends Biochem. Sci., 20, 374.
Sippl,M.J. (1993) Recognition of errors in three-dimensional structures of proteins. Proteins, 17, 355–362.