Coursera BioinfoMethods-I Lecture01
Course material developed by Ryan Austin, David Guttman, Laura Hug, Momoko Price, and Nicholas Provart
Course produced by Jamie Waese, Rohan Patel, William Heikoop, and Nicholas Provart
Bioinformatic Methods I
Topic
NCBI/Blast I
Blast II/Comparative Genomics
Multiple Sequence Alignments
Phylogenetics
Selection Analysis
NGS Analysis / Metagenomics
What is bioinformatics?
Bioinformatics:
- is the development and application of computational tools for managing all kinds of biological data
- involves the use of computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins, as well as metabolites
- is generally limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products
- is sometimes called computational molecular biology
This field has developed over the past decade or so to help manage the huge increase in data generated by genome sequencing projects, high-throughput technologies, etc.
Why bioinformatics?
>gi|27500381:c623297-542205 Homo sapiens chromosome 17 genomic contig
AAAACTGCGACTGCGCGGCGTGAGCTCGCTGAGACTTCCTGGACGGGGGACAGGCTGTGGGGTTTCTCAG
ATAACTGGGCCCCTGCGCTCAGGAGGCCTTCACCCTCTGCTCTGGGTAAAGGTAGTAGAGTCCCGGGAAA
GGGACAGGGGGCCCAAGTGATGCTCTGGGGTACTGGCGTGGGAGAGTGGATTTCCGAAGCTGACAGATGG
GTATTCTTTGACGGGGGGTAGGGGCGGAACCTGAGAGGCGTAAGGCGTTGTGAACCCTGGGGAGGGGGGC
AGTTTGTAGGTCGCGAGGGAAGCGCTGAGGATCAGGAAGGGGGCACTGAGTGTCCGTGGGGGAATCCTCG
TGATAGGAACTGGAATATGCCTTGAGGGGGACACTATGTCTTTAAAAACGTCGGCTGGTCATGAGGTCAG
GAGTTCCAGACCAGCCTGACCAACGTGGTGAAACTCCGTCTCTACTAAAAATACAAAAATTAGCCGGGCG
TGGTGCCGCTCCAGCTACTCAGGAGGCTGAGGCAGGAGAATCGCTAGAACCCGGGAGGCGGAGGTTGCAG
TGAGCCGAGATCGCGCCATTGCACTCCAGCCTGGGCGACAGAGCGAGACTGTCTCAAAACAAAACAAAAC
AAAACAAAACAAAAAACACCGGCTGGTATGTATGAGAGGATGGGACCTTGTGGAAGAAGAGGTGCCAGGA
ATATGTCTGGGAAGGGGAGGAGACAGGATTTTGTGGGAGGGAGAACTTAAGAACTGGATCCATTTGCGCC
ATTGAGAAAGCGCAAGAGGGAAGTAGAGGAGCGTCAGTAGTAACAGATGCTGCCGGCAGGGATGTGCTTG
AGGAGGATCCAGAGATGAGAGCAGGTCACTGGGAAAGGTTAGGGGCGGGGAGGCCTTGATTGGTGTTGGT
TTGGTCGTTGTTGATTTTGGTTTTATGCAAGAAAAAGAAAACAACCAGAAACATTGGAGAAAGCTAAGGC
TACCACCACCTACCCGGTCAGTCACTCCTCTGTAGCTTTCTCTTTCTTGGAGAAAGGAAAAGACCCAAGG
GGTTGGCAGCAATATGTGAAAAAATTCAGAATTTATGTTGTCTAATTACAAAAAGCAACTTCTAGAATCT
TTAAAAATAAAGGACGTTGTCATTAGTTCTTTGGTTTGTATTATTCTAAAACCTTCCAAATCTTAAATTT
ACTTTATTTTAAAATGATAAAATGAAGTTGTCATTTTATAAACCTTTTAAAAAGATATATATATATGTTT
TTCTAATGTGTTAAAGTTCATTGGAACAGAAAGAAATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACA
AAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGGTAAGTCAGCACAAGAGTGTATTAA
TTTGGGATTCCTATGATTATCTCCTATGCAAATGAACAGAATTGACCTTACATACTAGGGAAGAAAAGAC
ATGTCTAGTAAGATTAGGCTATTGTAATTGCTGATTTCCTTAACTGAAGAACTTTAAAAATATAGAAAAT
GATTCCTTGTTCTCCATCCACTCTGCCTCTCCCACTCCTCTCCTTTTCAACACAAATCCTGTGGTCCGGG
AAAGACAGGGACTCTGTCTTGATTGGTTCTGCACTGGGGCAGGAATCTAGTTTAGATTAACTGGCATTTT
GGCTTTTCTTCCAGCTCTAAAACAAGCTCCATCACTTGAAATGGCAAAATAAAATCATGGATGAGGCCGA
GGGCGGTGGCTTATGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGTGGTAGGATCACGAGGTCAGGAGA
TCGAGACCATCCTGGCCAACATGGTGAAACCCCCTCTCCACTAAAAATACAAAAATTAGCTGGGCGTAGT
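Raw sequence like the FASTA record above is hard to read by eye, which is the point of the slide. As a minimal sketch (the function names `parse_fasta` and `gc_fraction` are illustrative, not part of the course material), a few lines of Python can already pull out the header and summarize the sequence. Only the first two sequence lines of the record are reused here.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq)
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


example = """>gi|27500381:c623297-542205 Homo sapiens chromosome 17 genomic contig
AAAACTGCGACTGCGCGGCGTGAGCTCGCTGAGACTTCCTGGACGGGGGACAGGCTGTGGGGTTTCTCAG
ATAACTGGGCCCCTGCGCTCAGGAGGCCTTCACCCTCTGCTCTGGGTAAAGGTAGTAGAGTCCCGGGAAA
"""

records = parse_fasta(example)
header, seq = records[0]
print(header.split()[0], len(seq), round(gc_fraction(seq), 2))
```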
Biological Databases
Outline
Why databases?
What is a database?
Data structures: Flat File and Relational
Accession numbers and identifiers
A practical example of utility: GQuery/Entrez
Why databases?
Genome and genomic sequences
Gene sequences, mutations
Gene regulation
Gene expression (where and when)
Intron splice variants
Protein sequence, post-translational modifications
Protein tertiary structure (3D)
Protein networks
Protein localization
Enzyme Kinetics
Metabolites, metabolic networks
Diseases
Literature
What is a database?
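How can data be stored...
First_name | Last_name | Institution | Department | Address
Nancy | Dengler | University of Toronto | Botany | 25 Willocks St, Toronto, ON. M5S 3B2
Peter | Lewis | Uni. Toronto | Dept. of Biochemistry | 1 Kings College Circle, Toronto, ON. M5S 1A8
John | Coleman | University of Toronto | Department of Botany | 25 Willcocks St, Toronto, ON. M5S 3B2
John | Coleman | York University | Dept. of Biology | 4700 Keele St, Toronto, ON. M3J 1P3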
Relational Databases
Nancy|Dengler|Botany|University of Toronto|25 Willocks St, Toronto, ON. M5S 3B2
Peter|Lewis|Dept. of Biochemistry|Uni. Toronto|1 Kings College Circle, Toronto, ON. M5S 1A8
John|Coleman|Department of Botany|University of Toronto|25 Willcocks St, Toronto, ON. M5S 3B2
John|Coleman|Dept. of Biology|York University|4700 Keele St, Toronto, ON. M3J 1P3
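The pipe-delimited records above are a flat file: one record per line, every field repeated in full. A minimal sketch of reading them follows; the field names in `FIELDS` are an assumption inferred from the order of values in the records. It also shows the weakness of this layout, since the same institution spelled two ways looks like two institutions.

```python
flat_file = """Nancy|Dengler|Botany|University of Toronto|25 Willocks St, Toronto, ON. M5S 3B2
Peter|Lewis|Dept. of Biochemistry|Uni. Toronto|1 Kings College Circle, Toronto, ON. M5S 1A8
John|Coleman|Department of Botany|University of Toronto|25 Willcocks St, Toronto, ON. M5S 3B2
John|Coleman|Dept. of Biology|York University|4700 Keele St, Toronto, ON. M3J 1P3"""

# Assumed field order, inferred from the records themselves.
FIELDS = ("first_name", "last_name", "department", "institution", "address")


def parse_flat_file(text):
    """Turn each non-empty line into a dict keyed by FIELDS."""
    return [
        dict(zip(FIELDS, line.split("|")))
        for line in text.splitlines()
        if line.strip()
    ]


records = parse_flat_file(flat_file)

# Distinct spellings are treated as distinct institutions.
institutions = sorted({r["institution"] for r in records})
print(institutions)

# An exact-match search therefore misses the record entered as "Uni. Toronto".
exact = [r for r in records if r["institution"] == "University of Toronto"]
print(len(exact), "of", len(records), "records match exactly")
```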
Table 'Professors'
Professor_id (primary key) | First_name | Last_name | Contact_id (foreign key)
1 | Nancy | Dengler | 1
2 | Peter | Lewis | 2
3 | John | Coleman | 1
4 | John | Coleman | 3

Table 'Contacts'
Contact_id (primary key) | Institution | Department | Address
1 | University of Toronto | Dept. of Botany | 25 Willocks St, Toronto, ON. M5S 3B2
2 | Uni. Toronto | Dept. of Biochemistry | 1 Kings College Circle, Toronto, ON. M5S 1A8
3 | York University | Dept. of Biology | 4700 Keele St, Toronto, ON. M3J 1P3
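The two tables above can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the course's own implementation: the column types and the in-memory database are choices made for the example. The `Contact_id` foreign key in 'Professors' points at the primary key of 'Contacts', so shared contact details are stored once and recovered with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE Contacts (
    Contact_id  INTEGER PRIMARY KEY,
    Institution TEXT,
    Department  TEXT,
    Address     TEXT
);
CREATE TABLE Professors (
    Professor_id INTEGER PRIMARY KEY,
    First_name   TEXT,
    Last_name    TEXT,
    Contact_id   INTEGER REFERENCES Contacts(Contact_id)
);
""")

conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?, ?)",
    [
        (1, "University of Toronto", "Dept. of Botany",
         "25 Willocks St, Toronto, ON. M5S 3B2"),
        (2, "Uni. Toronto", "Dept. of Biochemistry",
         "1 Kings College Circle, Toronto, ON. M5S 1A8"),
        (3, "York University", "Dept. of Biology",
         "4700 Keele St, Toronto, ON. M3J 1P3"),
    ],
)
conn.executemany(
    "INSERT INTO Professors VALUES (?, ?, ?, ?)",
    [
        (1, "Nancy", "Dengler", 1),
        (2, "Peter", "Lewis", 2),
        (3, "John", "Coleman", 1),
        (4, "John", "Coleman", 3),
    ],
)

# Follow the foreign key to list everyone who shares contact record 1.
rows = conn.execute("""
    SELECT p.First_name, p.Last_name, c.Institution
    FROM Professors AS p
    JOIN Contacts AS c ON c.Contact_id = p.Contact_id
    WHERE c.Contact_id = 1
    ORDER BY p.Professor_id
""").fetchall()
print(rows)
```

Because the address lives in a single 'Contacts' row, correcting it there updates it for every professor who references that row.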
Identifier
Accession code (or number)
The GenBank flat file format (GBFF) is one of the most commonly used formats
for nucleotide sequences. It contains all of the information associated with
the sequence, as well as the sequence itself.
The GBFF has three parts: the header, the features, and the sequence itself.
LOCUS       HUMADH6A01     409 bp     DNA             linear   PRI 17-OCT-2000
            (identifier)   (length)   (source type)
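The LOCUS line packs several fields into one whitespace-separated row. A minimal sketch of splitting it into named fields follows. It assumes the eight-token nucleotide layout shown in this record (name, length, "bp", molecule type, topology, division, date); the function name `parse_locus` and the dictionary keys are illustrative, and other layouts are rejected rather than guessed at.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named fields (8-token nucleotide layout assumed)."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, _, molecule, topology, division, date = tokens
    return {
        "name": name,            # locus identifier
        "length": int(length),   # sequence length in base pairs
        "molecule": molecule,    # source type, e.g. DNA or mRNA
        "topology": topology,    # linear or circular
        "division": division,    # GenBank division code, e.g. PRI
        "date": date,            # kept as text, e.g. 17-OCT-2000
    }


locus = parse_locus(
    "LOCUS       HUMADH6A01     409 bp     DNA     linear   PRI 17-OCT-2000"
)
print(locus["name"], locus["length"], locus["molecule"])
```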
SOURCE      Homo sapiens.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 409)
  AUTHORS   Yasunami,M., Chen,C.S. and Yoshida,A.
  TITLE     A human alcohol dehydrogenase gene (ADH6) encoding an additional
            class of isozyme
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 88 (17), 7610-7614 (1991)
  MEDLINE   91352038
  PUBMED    1881901
FEATURES             Location/Qualifiers
     source          1..409
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /sex="male"
                     /tissue_type="liver"
     misc_signal     34..48
     exon            287..396
                     /gene="ADH6"
FEATURES             Location/Qualifiers
     CDS             160..>2301
                     /codon_start=1
                     /product="EGF-receptor"
                     /protein_id="CAA42219.1"
                     /db_xref="GI:50804"
                     /db_xref="MGD:95294"
                     /db_xref="SWISS-PROT:Q01279"
                     /translation="MRPSGTARTTLLVLLTALCAAGGALEEKKVCQGTSNRLTQLGTF
                     EDHFLSLQRMYNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLE
                     NLQIIRGNALYENTYALAILSNYGTNRTGLRELPMRNLQEILIGAVRFSNNPILCNMD
                     TIQWRDIVQNVFMSNMSMDLQSHPSSCPKCDPSCPNGSCWGGGEENCQKLTKIICAQQ
                     CSHRCRGRSPSDCCHNQCAAGCTGPRESDCLVCQKFQDEATCKDTCPPLMLYNPTTYQ
                     MDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGPDYYEVEEDGIRKCKKCDGPCR
                     KVCNGIGIGEFKDTLSINATNIKHFKYCTAISGDLHILPVAFKGDSFTRTPPLDPREL
                     EILKTVKEITGFLLIQAWPDNWTDLHAFENLEIIRGRTKQHGQFSLAVVGLNITSLGL
                     RSLKEISDGDVIISGNRNLCYANTINWKKLFGTPNQKTKIMNNRAEKDCKAVNHVCNP
                     LCSSEGCWGPEPRDCVSCQNVSRGRECVEKWNILEGEPREFVENSECIQCHPECLPQA
                     MNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGIMGENNTLVWKYADANNVCHLCHANC
                     TYGCAGPGLQGCEVWPSGPKIPSIATGIVGGLLFIVVVALGIGLFMRRRHIVRKRTLR
                     RLLQERELVEPLTPSGEAPNQAHLRILKETEF"
     sig_peptide     160..231
     mat_peptide     232..>2301
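Feature locations such as `160..231` and `160..>2301` give 1-based, inclusive coordinates on the sequence; a `<` or `>` marks a feature that extends beyond the sequenced region. A minimal sketch of reading these simple spans follows. The names `parse_location` and `span_length` are illustrative, and compound locations (for example `join(...)` or `complement(...)`) are deliberately not handled.

```python
import re

# Simple spans only: optional "<" before the start, optional ">" before the end.
LOCATION_RE = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_location(text):
    """Parse a simple GenBank span such as '160..>2301'."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    start_flag, start, end_flag, end = match.groups()
    return {
        "start": int(start),                # 1-based, inclusive
        "end": int(end),                    # 1-based, inclusive
        "partial_start": start_flag == "<",
        "partial_end": end_flag == ">",
    }


def span_length(text):
    """Number of bases covered by the stated coordinates."""
    loc = parse_location(text)
    return loc["end"] - loc["start"] + 1


cds = parse_location("160..>2301")
print(cds)
print(span_length("160..231"))  # length of the sig_peptide span in bases
```

For a partial feature the computed length covers only the sequenced portion, since the true end lies outside the record.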
[Sequence section of the record: BASE COUNT line (77 g, 124 t) and the ORIGIN block with the 409 bp sequence in 10-base groups]
Definitions
Sample Problem
Identify the SNPs that potentially cause early-onset breast cancer, and design
oligos to PCR-amplify them from samples of human genomic DNA for sequencing. Use the
OMIM function of GQuery/Entrez. OMIM provides links to everything that is
known about a given disease across the various databases at NCBI.
http://www.ncbi.nih.gov/Database/datamodel/index.html
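The slides work through this problem in the GQuery/Entrez web interface. As a hedged aside, the same kind of search can be expressed against NCBI's E-utilities `esearch` endpoint, which the slides do not mention and which is brought in here as an assumption. The sketch below only builds the request URL and makes no network call; the search term and the helper name `build_esearch_url` are illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; not part of the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database without sending it."""
    params = {
        "db": db,          # Entrez database name, e.g. "omim"
        "term": term,      # free-text query
        "retmax": retmax,  # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query for the sample problem.
url = build_esearch_url("omim", "breast cancer early onset")
print(url)
```

Fetching the URL would return a list of matching OMIM record IDs, the programmatic counterpart of the hit list shown in the web interface.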
Steve Rozen and Helen J. Skaletsky (2000), in: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386
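The oligo-design step of the sample problem is done with the primer-design software described in the reference above. As a rough illustration only, and not that software's method, the sketch below applies the Wallace rule of thumb for short oligos, Tm = 2(A+T) + 4(G+C) in degrees Celsius, and computes a reverse complement. The primer sequence and the function names are hypothetical.

```python
def wallace_tm(primer):
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical 20-mer, used only to exercise the functions.
forward = "AGCTGACCTGAAGGCTCTTA"
print(wallace_tm(forward))
print(reverse_complement(forward))
```

The rule ignores salt concentration and sequence context, so it is only a first check before a proper thermodynamic calculation.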