Databases - Final
• Private database:
• Private companies sequence genomes of
commercially or scientifically interesting organisms
• Data is not available to the public free of charge
• Academics normally cannot afford the access fees, so these
databases are mainly used by the pharmaceutical and biotech
industries
• Sometimes, databases that have been public turn private
– For example, on 1 June 2002 the formerly public genome
database of Saccharomyces cerevisiae (yeast) and
Caenorhabditis elegans, two of the most widely studied
eukaryotic model organisms, changed from a free to a
chargeable service
– The database had been taken over by Incyte Genomics in 2000,
which charged US$2000 per lab per year
Nucleotide sequence databases
• The International Nucleotide Sequence Database (INSD) consists of
the following databases:
– GenBank (National Center for Biotechnology Information; NCBI;
USA)
– EMBL (European Molecular Biology Laboratory; Europe)
– DDBJ (DNA Data Bank of Japan; Japan)
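As a minimal sketch of how a record in one of these databases might be retrieved programmatically, the snippet below builds a request URL for NCBI's E-utilities `efetch` endpoint, which serves GenBank nucleotide records. The helper names `build_efetch_url` and `fetch_record` are hypothetical, and the accession `U49845` is used only as an example; the download step needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",      # nucleotide collection, which includes GenBank
        "id": accession,      # accession number of the record
        "rettype": rettype,   # "fasta" for sequence only, "gb" for the flat file
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "fasta") -> str:
    """Download the record as text. Requires network access."""
    with urlopen(build_efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URL is printed here, so the example runs offline.
    print(build_efetch_url("U49845"))
```

Because the three INSD databases hold the same nucleotide records, the same accession can normally also be looked up through EMBL or DDBJ.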
PIR (Protein Information Resource):
•It was the very first sequence database, set up at the National
Biomedical Research Foundation (Georgetown University,
Washington DC, USA)
•In 1988 the PIR joined with two other groups: the Munich
Information Center for Protein Sequences (MIPS) in Germany
and the Japan International Protein Information Database
(Tsukuba)
•The PIR maintains several protein databases:
• PIR-PSD: protein sequences
• iProClass: classification of proteins according to structure
and function
• ASDB: annotation and similarity database
• PIR-NREF: a database of sequences and annotations of
proteins of known structure deposited in the PDB
• RESID: a database of covalent structure modifications
(e.g. S-S bridges)
SwissProt:
PDB (Protein Data Bank):
•Primary database
•It is an American database started in 1971 by the late Walter
Hamilton at Brookhaven National Laboratories at Long Island, New
York
•It is now managed by the Research Collaboratory for Structural
Bioinformatics (RCSB)
•The RCSB is based at Rutgers University in New Jersey, the San
Diego Supercomputer Center in California, and the National
Institute of Standards and Technology in Maryland
•It contains 3-D structures of proteins, nucleic acids and some
carbohydrates
•Most of the data in the PDB is generated by X-ray crystallography
and NMR
•Comprises:
– Protein Data Bank in Europe (PDBe)
– Protein Data Bank in Japan (PDBj)
– Research Collaboratory for Structural Bioinformatics (RCSB)
• Secondary databases:
• SCOP - Structural Classification of Proteins
• CATH - classification of protein structures downloaded
from the PDB
• PDBsum - a pictorial database that provides an at-a-glance
overview of the contents of each 3D structure
deposited in the Protein Data Bank
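Each structure in the PDB is identified by a short entry ID, and the secondary databases above are keyed on the same IDs. The following is a rough sketch, assuming the classic four-character ID format (a digit followed by three letters or digits) and the RCSB file download address `files.rcsb.org/download`. The function name `pdb_file_url` is hypothetical, and `4HHB` and `1CRN` are used only as example entries.

```python
import re

# RCSB file download service (address assumed for this sketch).
RCSB_DOWNLOAD = "https://files.rcsb.org/download"

# Classic PDB IDs: one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_file_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Return the download URL for one PDB entry (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # IDs are case-insensitive; upper case is used here for consistency.
    return f"{RCSB_DOWNLOAD}/{pdb_id.upper()}.{fmt}"


if __name__ == "__main__":
    print(pdb_file_url("4hhb"))         # legacy PDB format
    print(pdb_file_url("1CRN", "cif"))  # mmCIF format
```

Validating the ID before building the URL keeps malformed input from turning into a confusing download error later.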
Pathway Databases
• These are databases that describe biochemical
pathways, reactions, and enzymes
• For the modeling and simulation of a biopathway, it is
useful to select suitable information from public biopathway
databases, such as the Kyoto Encyclopedia of Genes and
Genomes (KEGG) and BioCyc
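Selecting information from a pathway database usually starts with listing what is available. The sketch below assumes the KEGG REST service at `rest.kegg.jp`, whose `list` operation returns one tab-separated line per entry (identifier, then description). The helper names `parse_kegg_list` and `list_pathways` are hypothetical, and the sample line is for illustration only.

```python
from urllib.request import urlopen

# Base address of the KEGG REST service (assumed for this sketch).
KEGG_REST = "https://rest.kegg.jp"


def parse_kegg_list(text: str) -> dict[str, str]:
    """Turn tab-separated 'identifier<TAB>description' lines into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        identifier, _, description = line.partition("\t")
        entries[identifier.strip()] = description.strip()
    return entries


def list_pathways(organism: str = "hsa") -> dict[str, str]:
    """List the pathways for one organism code. Requires network access."""
    with urlopen(f"{KEGG_REST}/list/pathway/{organism}") as response:
        return parse_kegg_list(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Parsed from a sample line so the example runs offline.
    sample = "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    print(parse_kegg_list(sample))
```

The resulting dictionary can then be filtered by keyword to pick out the pathways relevant to the model being built.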
KEGG: