Pairwise Sequence Alignment and Database Searching: Applied Bioinformatics
[Figure: two aligned sequences illustrating a match, a mismatch, and a gap.]
1. The sequences may share a common origin - a common ancestor sequence. If the similarity is sufficiently convincing or if we have additional evidence for an evolutionary relationship, then we say that the sequences are homologous.
2. The sequences may have the same or related structure and function.
Dot Matrix
Global Alignment
Local Alignment
Visualization: Dot matrix
[Figure: dot matrix plot. Axis residues as extracted: A G A T T C G C A G T T C C G T A C G]
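As a rough illustration of how a dot matrix is built, the sketch below marks every position where a window of one sequence matches a window of the other. The function names, the `window` and `min_matches` parameters, and the two example sequences are assumptions made for illustration; they are not taken from the slides.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return rows of booleans: cell (i, j) is True when the windows starting
    at seq_a[i] and seq_b[j] share at least min_matches identical positions."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        rows.append(row)
    return rows


def render(seq_a, seq_b, matrix):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    width = len(matrix[0]) if matrix else 0
    lines = ["  " + " ".join(seq_b[:width])]
    for base, row in zip(seq_a, matrix):
        lines.append(base + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    a = "AGATTCG"  # hypothetical example sequences
    b = "AGTTCCG"
    print(render(a, b, dot_matrix(a, b)))
```

Diagonal runs of `*` correspond to stretches of similarity. Raising `window` and `min_matches` filters out isolated chance matches at the cost of missing short similarities.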
A high-quality alignment?

For DNA sequences:
- Long runs of identity
- Few gaps in the aligned regions
- An overall high degree of identity (>80%)

For protein sequences:
- Includes most of each sequence
- A significant proportion of identities throughout the alignment
- Multiple examples of conservative substitutions
- Relatively few gaps
- 50% identity is very good
The alternative pathways that could form the maximum match are illustrated. The maximum match terminates at the largest number in the first row or first column, 8 in this case.
[Figure: maximum-match matrix for the two example sequences; the cell values are not recoverable from the source.]
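A minimal sketch of global alignment by dynamic programming may help make the maximum-match idea concrete. It assumes the common forward-filled formulation, in which the final score ends up in the last cell rather than in the first row or column as in the figure described above. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are illustrative assumptions, not the values used in the figure.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align two residues
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the last cell to recover one optimal pathway.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))
```

When several pathways give the same score, the traceback above returns only one of them; the order of the `if` branches decides which.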
Database Searching
A sequence by itself is not informative. Comparing it with other sequences can reveal the important biological information, e.g. the function of unknown genes, the structure of query sequences, or duplicated genes.
Similarity scores: allow substitutions between residues with similar characteristics (e.g. BLOSUM62, PAM250).
Two programs that greatly facilitated similarity searching were developed: FASTA (Pearson and Lipman 1988) and BLAST (Altschul et al. 1990). Many other programs have since been derived from them.
The Basic Local Alignment Search Tool (BLAST) was developed as a new way to perform sequence similarity searches. At its core it is a string pattern search.
BLAST:
- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds the best local alignments
- Provides statistical significance
- Supports all combinations of query and database (DNA/protein):
  - DNA vs DNA (BLASTN)
  - DNA translation vs Protein (BLASTX)
  - Protein vs Protein (BLASTP)
  - Protein vs DNA translation (TBLASTN)
  - DNA translation vs DNA translation (TBLASTX)
- Available as www, standalone, and network clients
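For comparison with the heuristic, here is a small sketch of the exact Smith-Waterman local alignment score that BLAST approximates. This is the textbook dynamic-programming recurrence, not BLAST's own code, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the alignment local rather than global.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical sequences that share only the core "ACG".
    print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

The full recurrence visits every cell of the matrix, which is too slow for large databases; that cost is the motivation for a word-based heuristic.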
Word Size = 11
GTACTGGACAT = 28
table of words
ACTGGACATGG CTGGACATGGA TGGACATGGAC
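The table of words can be sketched as a simple lookup from each overlapping word to its position in the query. The query string below is an assumption, constructed so that it contains the words shown on the slide; the function names are hypothetical.

```python
def make_words(query, word_size=11):
    """List every overlapping word of length word_size in the query."""
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


def index_words(query, word_size=11):
    """Map each word to the list of query positions where it starts."""
    table = {}
    for pos, word in enumerate(make_words(query, word_size)):
        table.setdefault(word, []).append(pos)
    return table


if __name__ == "__main__":
    query = "GTACTGGACATGGAC"  # assumed query containing the slide's words
    for word, positions in index_words(query).items():
        print(word, positions)
```

A database scan can then look up each database word in this table in constant time and try to extend an alignment only around the hits.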
Online BLAST Search
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
[Screenshot: NCBI BLAST home page.]
BLAST finds regions of similarity between biological sequences.

Basic BLAST: choose a BLAST program to run.
- nucleotide blast: search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
- protein blast: search protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast
- Search protein database using a translated nucleotide query
- Search translated nucleotide database using a protein query
- Search translated nucleotide database using a translated nucleotide query

BLAST Assembled RefSeq Genomes: choose a species genome to search (Human, Mouse, Rat, Arabidopsis thaliana, Oryza sativa, Bos taurus, Pan troglodytes, Gallus gallus, Apis mellifera), or list all genomic BLAST databases.

Specialized BLAST: choose a type of specialized search (or database name in parentheses).
- Make specific primers with Primer-BLAST
- Search trace archives
- Find conserved domains in your sequence (cds)
- Find sequences with similar conserved domain architecture (cdart)
- Search sequences that have gene expression profiles (GEO)
- Search immunoglobulins (IgBLAST)
- Search using SNP flanks
Output: Alignments
>gi|127552|sp|P23367|MUTL_ECOLI DNA mismatch repair protein mutL
          Length = 615
 Score = 42.0 bits (97), Expect = 3e-04
 Identities = 26/59 (44%), Positives = 33/59 (55%), Gaps = 9/59 (15%)
[Alignment: Query starting at position 9 aligned to Sbjct positions 280-338; the residue rows are not recoverable from the source. Annotations on the middle line: identical match (letter), positive score or conservative substitution (+), negative score (blank).]
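The summary line of the alignment can be reproduced with a few lines of arithmetic. The sketch below assumes the usual reading of the middle line (a letter for an identity, `+` for a positive-scoring substitution, a blank otherwise); the function names and the short midline string are hypothetical.

```python
def percent(count, aligned_length):
    """Integer percentage, truncated; this reproduces the figures shown."""
    return count * 100 // aligned_length


def summarize_midline(midline):
    """Count identities and positives on the middle line of an alignment."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")  # identities also count as positives
    return identities, positives, len(midline)


if __name__ == "__main__":
    # Counts taken from the MUTL_ECOLI hit above.
    for label, count in (("Identities", 26), ("Positives", 33), ("Gaps", 9)):
        print(f"{label} = {count}/59 ({percent(count, 59)}%)")
    # Hypothetical five-column midline: three identities, one '+', one blank.
    print(summarize_midline("AC+ G"))
```

Gaps cannot be counted from the middle line alone; they appear as `-` characters in the Query and Sbjct rows.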
Perform a BLAST search with the following sequence. Which gene does it come from? Does it lie in the coding region?
GGCCGTGCCT GGGGATCCAA GTTCCCCTCT CTCCACCTGT GCTCACCTCT CCTCCGTCCC CAACCCTGCA CAGGCAAGAT CGTGGACGCC GTGATTCAGG AGCACCAGCC CTCCGTGCTG CTGGAGCTGG GGGCCTACTG TGGCTACTCA GCTGTGCGCA TGGCCCGCCT GCTGTCACCA GGGGCGAGGC TCATCACCAT CGAGATCAAC CCCGACTGTG CCGCCATCAC CCAGCGGATG GTGGATTTCG CTGGC
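Before pasting the sequence into the search form, it can be convenient to strip the spacing and wrap it as FASTA. The helper below is a small sketch; the function name `to_fasta`, the header text, and the allowed alphabet (`ACGTN`) are assumptions for illustration.

```python
def to_fasta(raw, header="query", width=60):
    """Remove whitespace from a nucleotide sequence and format it as FASTA."""
    seq = "".join(raw.split()).upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines)


if __name__ == "__main__":
    # The exercise sequence, copied in its 10-base blocks.
    raw = """
    GGCCGTGCCT GGGGATCCAA GTTCCCCTCT CTCCACCTGT GCTCACCTCT CCTCCGTCCC
    CAACCCTGCA CAGGCAAGAT CGTGGACGCC GTGATTCAGG AGCACCAGCC CTCCGTGCTG
    CTGGAGCTGG GGGCCTACTG TGGCTACTCA GCTGTGCGCA TGGCCCGCCT GCTGTCACCA
    GGGGCGAGGC TCATCACCAT CGAGATCAAC CCCGACTGTG CCGCCATCAC CCAGCGGATG
    GTGGATTTCG CTGGC
    """
    print(to_fasta(raw, header="exercise_sequence"))
```

The printed record can be pasted into the nucleotide blast query box as is.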