Additional Note PDF
As an interdisciplinary field,
bioinformatics combines computer
science, statistics, mathematics
and engineering to study
biological data and processes.
P. Paulsharma Chakravarthy
The Central Dogma & Biological Data
Original DNA Sequences
(Genomes)
Protein Sequences
-Inferred
-Direct sequencing
Protein structures
-Experiments
-Models (homologues)
Literature information
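The "inferred" protein sequences above are typically obtained by conceptually translating a DNA sequence. The following is a minimal sketch of that step, assuming the standard genetic code and reading frame 1 only; the function name `translate` and the table layout are illustrative choices, not part of the original notes.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1 until the first stop codon.

    Codons containing characters other than A, C, G, T become "X".
    Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Real annotation pipelines also consider the other reading frames, the reverse strand, and introns; this sketch only shows the core idea of inference from sequence.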
Types of Biological Databases
https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified/what-database/relational-databases/primary-and-secondary-databases
What can be discovered about a gene by
a database search?
A little or a lot, depending on the gene
• Evolutionary information - homologous genes, taxonomic distributions,
allele frequencies, synteny, etc.
NCBI
https://www.ncbi.nlm.nih.gov/
Try to search “gene expression
potato diseases” and see the
number of hits!
Try to key in “crop diseases” or
“insect pests” in the Title words and
see the number of hits!
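The same hit counts can also be obtained programmatically. The sketch below uses the NCBI E-utilities `esearch` endpoint; it assumes the searches are run against PubMed (the notes do not say which database) and that "Title words" corresponds to the `[Title]` field tag. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", title_only=False):
    """Build an esearch URL that asks only for the hit count."""
    if title_only:
        term = f"{term}[Title]"  # assumption: restrict the search to title words
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_hit_count(payload):
    """Extract the total number of hits from an esearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_hits(term, **kwargs):
    """Query NCBI (needs network access) and return the number of hits."""
    with urlopen(build_esearch_url(term, **kwargs), timeout=30) as response:
        return parse_hit_count(response.read().decode("utf-8"))
```

For example, `count_hits("gene expression potato diseases")` or `count_hits("crop diseases", title_only=True)`. The numbers change as the database grows, and NCBI asks users without an API key to stay at or below three requests per second.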
NCBI Bookshelf
https://www.ncbi.nlm.nih.gov/books
Try to search
“crop diseases”
One database of particular importance to biologists is
GenBank®, which encompasses all publicly available
protein and nucleotide sequences
Let’s work on this…
• Go to https://www.ncbi.nlm.nih.gov/genbank/
• Identify loci (genes) associated with the sequence.
→ Input – Pi-b
• For each particular “hit”, we can look at that sequence and
its alignment in more detail.
• See similar sequences, and the organisms in which they are
found.
• But there’s much more that can be found on these genes,
even just inside NCBI…
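The Pi-b lookup above can be sketched in code as well. This is a minimal example under stated assumptions: it queries the nucleotide database (`nuccore`) through NCBI E-utilities `esearch` and `efetch`, uses the plain term "Pi-b" as typed in the exercise, and returns records in FASTA format. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def search_url(term, retmax=5):
    """URL that returns the first `retmax` nucleotide record IDs for a term."""
    params = {"db": "nuccore", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def fetch_url(ids):
    """URL that returns the given nucleotide records as FASTA text."""
    params = {"db": "nuccore", "id": ",".join(ids),
              "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_records(term, retmax=5):
    """Search, then fetch matching records (needs network access)."""
    with urlopen(search_url(term, retmax), timeout=30) as response:
        ids = json.load(response)["esearchresult"]["idlist"]
    if not ids:
        return []
    with urlopen(fetch_url(ids), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

For example, `fetch_records("Pi-b")` would return up to five (header, sequence) pairs, and each header names the organism and record, which mirrors looking at each "hit" in the browser. Results depend on the live database.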
PRIMARY VS. DERIVATIVE SEQUENCE DATABASES
[Figure: flow of sequence data into primary and derivative databases]
GenBank (primary): sequences from labs and sequencing centers; updated ONLY by submitters
RefSeq (derivative): curators and genome assembly; updated continually by NCBI
UniGene (derivative): algorithms; updated continually by NCBI
Similar to NCBI…
https://www.uniprot.org
https://www.youtube.com/user/NCBINLM
Some take home messages
✓ There are a lot of molecular biology databases, containing a lot of
valuable information
✓ Not even the best databases have everything (or the best of
everything)