Databases

NCBI, located in Bethesda, Maryland, houses several biomedical databases, including GenBank for DNA sequences and PubMed for the biomedical literature. It was directed until 2017 by David Lipman. EMBL maintains its nucleotide sequence database in collaboration with DDBJ and GenBank. Entrez is NCBI's retrieval system, which integrates data from various databases through cross-referencing.

Uploaded by

Nandni Jha
Copyright
© All Rights Reserved

NCBI

• NCBI stands for National Center for
Biotechnology Information.
• It is part of the United States National Library of
Medicine (NLM), a branch of the National
Institutes of Health.
• The NCBI is located in Bethesda, Maryland, and was
founded in 1988.
• The NCBI houses a series of databases relevant
to biotechnology and biomedicine.
• Major databases include GenBank for DNA
sequences and PubMed, a bibliographic
database for the biomedical literature.
• Other databases include the NCBI
Epigenomics database. All these databases are
available online through the Entrez search
engine.
• NCBI was directed until 2017 by David Lipman, one of the
original authors of the BLAST sequence
alignment program.
EMBL
• The EMBL Nucleotide Sequence Database
(http://www.ebi.ac.uk/embl/), maintained at the European
Bioinformatics Institute (EBI), incorporates, organizes and
distributes nucleotide sequences from public sources.
• The database is a part of an international collaboration with DDBJ
(Japan) and GenBank (USA).
• Data are exchanged between the collaborating databases on a daily
basis.
• The web-based tool, Webin, is the preferred system for individual
submission of nucleotide sequences, including annotation and
alignment data.
• Automatic submission procedures are used for submission of data
from large-scale genome sequencing centers and from the
European Patent Office.
• The latest data collection can be accessed via FTP, email
and WWW interfaces.
• The EBI's Sequence Retrieval System (SRS) integrates and
links the main nucleotide and protein databases as well as
many other specialist molecular biology databases.
• For sequence similarity searching, a variety of tools (e.g.
FASTA and BLAST) are available that allow external users
to compare their own sequences against the data in the
EMBL Nucleotide Sequence Database, the complete
genomic component subsection of the database, the WGS
data sets and other databases.
• All available resources can be accessed via the EBI home
page at http://www.ebi.ac.uk.
[Screenshots: EMBL-ENA home page; search result; text file format; FASTA sequence; EMBL-EBI home page; total hits; results]
DDBJ
• DNA Data Bank of Japan is a biological database that collects DNA
sequences.
• It is located at the National Institute of Genetics (NIG) in Shizuoka
Prefecture, Japan.
• It is also a member of the International Nucleotide Sequence
Database Collaboration (INSDC).
• It exchanges its data with the European Molecular Biology Laboratory
at the European Bioinformatics Institute and with GenBank at the
National Center for Biotechnology Information on a daily basis.
• Thus these three databanks contain essentially the same data at any
given time.
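The effect of the daily exchange can be pictured as a set union. The following is a minimal sketch, assuming each databank is modelled as a set of accession strings; the accessions and the function name `daily_exchange` are invented placeholders, not real records or an INSDC interface.

```python
def daily_exchange(databanks):
    """Give every databank the union of all records (toy model of the daily sync)."""
    merged = set().union(*databanks.values())
    return {name: set(merged) for name in databanks}


# Placeholder accessions, invented for illustration only.
before = {
    "DDBJ": {"ACC_D1"},
    "EMBL": {"ACC_E1", "ACC_E2"},
    "GenBank": {"ACC_G1"},
}
after = daily_exchange(before)
print(all(records == after["DDBJ"] for records in after.values()))  # True
```

After one exchange, a record submitted to any one databank is visible in all three.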
• DDBJ began data bank activities in 1986 at NIG and remains the
only nucleotide sequence data bank in Asia.
• Although DDBJ mainly receives its data from Japanese
researchers, it can accept data from contributors from any other
country.
• DDBJ is primarily funded by the Japanese Ministry of Education,
Culture, Sports, Science and Technology.
• DDBJ has an international advisory committee which consists of
nine members, three each from Europe, the US and Japan.
• This committee advises DDBJ about its maintenance,
management and future plans once a year.
• Apart from this DDBJ also has an international collaborative
committee which advises on various technical issues related to
international collaboration and consists of working level
participants.
[Screenshots: DDBJ home page; search and analysis; DDBJ flat file; nucleotide FASTA sequence; amino acid FASTA sequence; ARSA results]
Entrez
• The NCBI developed and maintains Entrez, a biological
database retrieval system.
• It is a gateway that allows text-based searches for a
wide variety of data, including annotated genetic
sequence information, structural information, as well as
citations and abstracts, full papers, and taxonomic data.
• The key feature of Entrez is its ability to integrate
information, which comes from cross-referencing
between NCBI databases based on pre-existing and
logical relationships between individual entries.
• This is highly convenient: users do not have to
visit multiple databases located in disparate
places.
• For example, in a nucleotide sequence page,
one may find cross-referencing links to the
translated protein sequence, genome mapping
data, or to the related PubMed literature
information, and to protein structures if
available.
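Cross-referencing of this kind can be sketched as records that carry links to entries in other databases. This is only an illustration under stated assumptions: the identifiers, the `RECORDS` table and the `linked_records` helper are invented and do not reflect how Entrez stores its data.

```python
# Toy records keyed by (database, identifier); all identifiers are invented.
RECORDS = {
    ("nucleotide", "NUC_1"): {"links": [("protein", "PROT_1"), ("pubmed", "PM_1")]},
    ("protein", "PROT_1"): {"links": [("structure", "STR_1")]},
    ("pubmed", "PM_1"): {"links": []},
    ("structure", "STR_1"): {"links": []},
}


def linked_records(database, identifier, target_db):
    """Return identifiers in target_db directly cross-referenced from one entry."""
    entry = RECORDS[(database, identifier)]
    return [ident for db, ident in entry["links"] if db == target_db]


print(linked_records("nucleotide", "NUC_1", "protein"))  # ['PROT_1']
print(linked_records("protein", "PROT_1", "structure"))  # ['STR_1']
```

Starting from one nucleotide entry, the user can follow links to the protein, the literature and the structure without opening each database separately.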
• Effective use of Entrez requires an understanding of the main
features of the search engine.
• There are several options common to all NCBI databases that
help to narrow the search.
• One option is “Limits,” which helps to restrict the search to a
subset of a particular database.
• It can also be set to restrict a search to a particular field of a
database (e.g., the field for author or publication date) or a particular
type of data (e.g., chloroplast DNA/RNA).
• The search can also be limited to a particular search field (e.g.,
gene name or accession number).
• The “History” option provides a record of the previous searches
so that the user can review, revise, or combine the results of
earlier searches.
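The idea behind the "History" option can be sketched as numbered result sets that are recombined with set operations. This is a toy model, not Entrez's implementation; the class `SearchHistory` and the record names are hypothetical.

```python
class SearchHistory:
    """Toy search history: numbered result sets that can be recombined."""

    def __init__(self):
        self._results = []

    def record(self, hits):
        """Store one search's hits and return its history number (1-based)."""
        self._results.append(set(hits))
        return len(self._results)

    def combine(self, first, operator, second):
        """Combine two earlier searches with AND, OR, or NOT and record the result."""
        a, b = self._results[first - 1], self._results[second - 1]
        if operator == "AND":
            hits = a & b
        elif operator == "OR":
            hits = a | b
        elif operator == "NOT":
            hits = a - b
        else:
            raise ValueError(f"unknown operator: {operator}")
        return self.record(hits)

    def hits(self, number):
        """Return the hits of one stored search in sorted order."""
        return sorted(self._results[number - 1])


history = SearchHistory()
s1 = history.record({"rec1", "rec2", "rec3"})  # e.g. hits for a gene name
s2 = history.record({"rec2", "rec3", "rec4"})  # e.g. hits for an organism
s3 = history.combine(s1, "AND", s2)
print(history.hits(s3))  # ['rec2', 'rec3']
```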
• One of the databases accessible from Entrez is a
biomedical literature database known as PubMed,
which contains abstracts and in some cases the full text
articles from nearly 4,000 journals.
• An important feature of PubMed is the retrieval of
information based on medical subject headings (MeSH)
terms.
• The MeSH system consists of a collection of more than
20,000 controlled and standardized vocabulary terms
used for indexing articles.
• In other words, it is a thesaurus that helps convert
search keywords into standardized terms to describe a
concept.
• By doing so, it allows “smart” searches in
which a group of accepted synonyms is
employed so that the user not only gets exact
matches, but also related matches on the
same topic that otherwise might have been
missed.
• Another way to broaden the retrieval is by
using the “Related Articles” option.
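The thesaurus idea can be sketched with a small lookup table. This is a minimal illustration, assuming a four-entry table; "Myocardial Infarction" and "Neoplasms" are real MeSH headings, but the table, its abbreviated synonym lists and the helper `expand_keyword` are invented for this example and are not how PubMed performs the mapping.

```python
# Tiny illustrative thesaurus: free-text keyword -> standardized heading.
# The synonym lists are heavily abbreviated.
THESAURUS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarct": "Myocardial Infarction",
    "cancer": "Neoplasms",
    "tumor": "Neoplasms",
}


def expand_keyword(keyword):
    """Return the standardized heading plus every synonym that maps to it."""
    heading = THESAURUS.get(keyword.lower())
    if heading is None:
        return [keyword]  # no mapping known: search the keyword as typed
    synonyms = sorted(k for k, v in THESAURUS.items() if v == heading)
    return [heading] + synonyms


print(expand_keyword("Heart Attack"))
# ['Myocardial Infarction', 'heart attack', 'myocardial infarct']
```

Searching with the expanded list retrieves articles indexed under the standardized heading even when the user typed a synonym.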
• For a complex search, a user can use the Boolean operators (AND, OR,
NOT) or a combination of the Limits and Preview/Index features.
• Alternatively, field tags can be used to improve the efficiency of
obtaining the search results.
• The tags are identifiers for each field and are placed in brackets.
For example, [AU] limits the search to author names, and [TA] to
journal titles.
• PubMed uses a list of tags for literature searches. The search
terms can be specified by the tags which are joined by Boolean
operators.
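A tagged query of this kind can be assembled programmatically. The sketch below assumes PubMed's [AU] (author) and [TI] (title) tags and the base URL of NCBI's E-utilities ESearch service, neither of which is given in the slides; the helper names `tagged` and `build_query` are hypothetical. The URL is only built, not sent.

```python
from urllib.parse import urlencode

# Assumed base URL of NCBI's E-utilities ESearch service (not given in the text).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, tag):
    """Attach a field tag in brackets, e.g. tagged('Smith J', 'AU') -> 'Smith J[AU]'."""
    return f"{term}[{tag}]"


def build_query(*clauses, operator="AND"):
    """Join tagged clauses with one Boolean operator."""
    return f" {operator} ".join(clauses)


query = build_query(tagged("Lipman DJ", "AU"), tagged("BLAST", "TI"))
print(query)  # Lipman DJ[AU] AND BLAST[TI]

# Build (but do not send) a request URL for the PubMed database.
url = ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query, "retmax": 5})
print(url)
```

The same query string can be typed directly into the PubMed search box.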
• Another unique database accessible from Entrez is Online
Mendelian Inheritance in Man (OMIM), which is a non-sequence-based
database of human disease genes and human genetic
disorders.
• Each entry in OMIM contains summary information about a
particular disease as well as genes related to the disease. The text
contains numerous hyperlinks to literature citations, primary
sequence records, as well as chromosome loci of the disease genes.
• The database can serve as an excellent starting point to study genes
related to a disease.
• NCBI also maintains a taxonomy database that contains the names
and taxonomic positions of over 100,000 organisms with at least
one nucleotide or protein sequence represented in the GenBank
database.
• The taxonomy database has a hierarchical classification scheme. The
top-level divisions below the root are Archaea, Eubacteria, and Eukaryota.
• The database allows the taxonomic tree for a particular organism to
be displayed. The tree is based on molecular phylogenetic data,
namely, the small subunit ribosomal RNA data.
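A hierarchical scheme like this can be sketched with child-to-parent links. The lineage below is heavily abbreviated and the `PARENT` table and `lineage` helper are assumptions made for illustration; they are not the structure of the NCBI taxonomy database.

```python
# Child -> parent links; an abbreviated lineage chosen for illustration.
PARENT = {
    "Homo sapiens": "Primates",
    "Primates": "Mammalia",
    "Mammalia": "Eukaryota",
    "Escherichia coli": "Eubacteria",
    "Eukaryota": "root",
    "Eubacteria": "root",
    "Archaea": "root",
}


def lineage(organism):
    """Walk parent links from an organism up to the root."""
    path = [organism]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(" > ".join(lineage("Homo sapiens")))
# root > Eukaryota > Mammalia > Primates > Homo sapiens
```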
SRS
• Sequence Retrieval System (SRS; available at
http://srs6.ebi.ac.uk/) is a retrieval system maintained
by the EBI, which is comparable to NCBI Entrez.
• It is not as integrated as Entrez, but it allows the user to
query multiple databases simultaneously, which makes it another
good example of database integration.
• It also offers direct access to certain sequence analysis
applications such as sequence similarity searching and
Clustal sequence alignment.
• Queries can be launched using “Quick Text Search”
with only one query box in which to enter information.
• There are also more elaborate submission forms, the
“Standard Query Form” and the “Extended Query
Form.”
• The standard form allows four criteria (fields) to be
used, which are linked by Boolean operators.
• The extended form allows many more diversified
criteria and fields to be used.
• The search results contain the query sequence and
sequence annotation as well as links to literature,
metabolic pathways, and other biological databases.
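The standard query form, with its field criteria linked by Boolean operators, can be sketched as a filter over records. This is a toy model under stated assumptions: the records, the field names and the `matches` helper are invented and are not SRS field names or SRS query syntax.

```python
def matches(record, criteria):
    """Evaluate (field, text, operator) criteria left to right against one record.

    The operator of the first criterion is ignored; each later operator joins
    that criterion to the running result.
    """
    result = None
    for field, text, operator in criteria:
        hit = text.lower() in record.get(field, "").lower()
        if result is None:
            result = hit
        elif operator == "AND":
            result = result and hit
        elif operator == "OR":
            result = result or hit
        else:
            raise ValueError(f"unknown operator: {operator}")
    return bool(result)


# Invented records; field names are placeholders.
records = [
    {"id": "R1", "organism": "Homo sapiens", "description": "insulin precursor"},
    {"id": "R2", "organism": "Mus musculus", "description": "insulin receptor"},
    {"id": "R3", "organism": "Homo sapiens", "description": "hemoglobin beta"},
]
criteria = [
    ("description", "insulin", None),
    ("organism", "sapiens", "AND"),
]
print([r["id"] for r in records if matches(r, criteria)])  # ['R1']
```

Up to four such criteria would correspond to the four fields of the standard form; the extended form simply allows more of them.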
