Pairwise Sequence Alignment and Database Searching
Applied Bioinformatics

Uploaded by: Ved Classes
Copyright: © Attribution Non-Commercial (BY-NC)

[Figure: two short DNA sequences aligned, with a match, a mismatch, and a gap labeled]

Why make sequence alignments?

1. The sequences may share a common origin, that is, a common ancestor sequence. If the similarity is sufficiently convincing, or if we have additional evidence for an evolutionary relationship, then we say that the sequences are homologous.

2. The sequences may have the same or related structure and function.

3. Differences in the alignment may be linked to functional changes or diseases.

Approaches in Pairwise Sequence Alignment

1. Dot Matrix

2. Global Alignment

3. Local Alignment

Visualization: Dot matrix

[Figure: dot matrix example; sequence shown along the axis: A G A T T C G C A G T T C C G T A C G]

Dot matrix: Staphylococcus epidermidis RP62A and ATCC12228

Alignment: Staphylococcus epidermidis RP62A and ATCC12228
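A dot matrix can be sketched in a few lines. The example below is a minimal illustration, not the tool used to draw the slides' figures: the function names, the `window` parameter, and the two short test sequences are assumptions made for this sketch.

```python
def dot_matrix(seq_a, seq_b, window=1):
    """Return a grid of booleans.

    grid[i][j] is True when the window of seq_a starting at i is
    identical to the window of seq_b starting at j.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(seq_a, seq_b, grid):
    """Draw the grid as text: '*' for a hit, '.' otherwise."""
    if not grid or not grid[0]:
        return ""
    lines = ["  " + " ".join(seq_b[:len(grid[0])])]
    for i, row in enumerate(grid):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(seq_a[i] + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical short sequences, chosen only for illustration.
    a = "AGATTCGCAG"
    b = "AGTTCCGCAG"
    print(render(a, b, dot_matrix(a, b)))
```

Similar regions show up as diagonal runs of hits. Raising `window` above 1 removes much of the background noise, at the cost of missing short matches.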

A high-quality alignment?

For DNA sequences
    Long runs of identity
    Few gaps in the aligned regions
    An overall high degree of identity (>80%)

For protein sequences
    Includes most of each sequence
    A significant proportion of identities throughout the alignment
    Multiple examples of conservative substitutions
    Relatively few gaps
    50% identity is very good
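The identity and gap criteria above can be checked numerically once two sequences are aligned. The sketch below assumes the alignment is given as two equal-length strings with `-` as the gap character, and that percentages are taken over the full alignment length (the convention BLAST reports use); the function name and example strings are hypothetical.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Count identities and gap columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    length = len(aligned_a)
    columns = list(zip(aligned_a, aligned_b))
    # An identity is a column with the same residue on both rows.
    identities = sum(1 for x, y in columns if x == y and x != gap)
    # A gap column has a gap character on either row.
    gaps = sum(1 for x, y in columns if x == gap or y == gap)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
        "percent_gaps": 100.0 * gaps / length if length else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical aligned pair, for illustration only.
    stats = alignment_stats("ACGT-ACGT", "ACGTTACCT")
    print(stats)
```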

Global Alignment: Needleman and Wunsch Algorithm (1970)

The alternative pathways that could form the maximum match are illustrated. The maximum match terminates at the largest number in the first row or first column, 8 in this case.
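Global alignment by dynamic programming can be sketched as follows. This is the now-common textbook formulation, in which the score is read from the last cell of the matrix and the alignment is traced back toward the origin; it is not a reproduction of the array layout described in the caption above. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the last cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

When several paths give the same score, the traceback here prefers the diagonal move, so it returns only one of the alternative pathways.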

Local Alignment: Smith-Waterman Algorithm (1981)

[Figure: Smith-Waterman scoring matrix for two short RNA sequences, with the traceback path of the best local alignment marked. The numeric entries are not recoverable from this copy.]

Match: 1.0    Mismatch: -1/3    Gap of length k: W_k = 1.0 + (1/3)*k
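The scoring scheme above can be plugged into a direct implementation of the local alignment recurrence. The sketch below is illustrative only: it computes the matrix and the best local score, checks every gap length explicitly (so it is slow on long sequences), and uses hypothetical function and parameter names. Each cell is the maximum of zero, the diagonal move, and a gap of any length k penalized by W_k.

```python
def smith_waterman_score(a, b, match=1.0, mismatch=-1.0 / 3.0,
                         gap_open=1.0, gap_extend=1.0 / 3.0):
    """Local alignment score with gap penalty W_k = gap_open + gap_extend * k.

    Returns (best_score, (i, j), matrix), where (i, j) is the 1-based
    cell in which the best local alignment ends.
    """
    def w(k):
        return gap_open + gap_extend * k

    n, m = len(a), len(b)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Zero lets a local alignment start fresh at any cell.
            candidates = [0.0, h[i - 1][j - 1] + s]
            # Gap of length k in b (skip k residues of a).
            candidates.extend(h[i - k][j] - w(k) for k in range(1, i + 1))
            # Gap of length l in a (skip l residues of b).
            candidates.extend(h[i][j - l] - w(l) for l in range(1, j + 1))
            h[i][j] = max(candidates)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos, h


if __name__ == "__main__":
    # Hypothetical sequences sharing the local segment "ACG".
    score, end, _ = smith_waterman_score("TTACGTT", "GGACGGG")
    print(score, end)
```

The alignment itself is recovered by tracing back from the best cell until a zero is reached, which is what distinguishes the local method from the global one.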

Smith-Waterman Algorithm vs Needleman-Wunsch Algorithm

Database Searching

Similarity searches in sequence databases have become a mainstay of bioinformatics.

A sequence by itself is not information. Comparison can help find the important biological information, e.g. function of unknown genes, structure of query sequences, duplicated genes.

Similarity scores: allow substitutions between residues with similar characteristics (e.g. BLOSUM62, PAM250)
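To show how a substitution matrix rewards conservative substitutions, the sketch below scores an ungapped pair of peptides with a small excerpt of BLOSUM62. Only five residues (L, I, V, D, E) are included, which is an assumption made to keep the example short; a real search uses the full 20 x 20 matrix. The function names and the example peptides are hypothetical.

```python
# Excerpt of BLOSUM62 covering L, I, V, D, E only (symmetric matrix,
# so each unordered pair is stored once).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("L", "V"): 1, ("I", "V"): 3, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
    ("V", "D"): -3, ("V", "E"): -2,
}


def pair_score(x, y):
    """Look up a residue pair in either order."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


def score_ungapped(a, b):
    """Score two equal-length peptides column by column (no gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    scores = [pair_score(x, y) for x, y in zip(a, b)]
    return {
        "score": sum(scores),
        "identities": sum(1 for x, y in zip(a, b) if x == y),
        # "Positives" are columns with a score above zero; identities count too.
        "positives": sum(1 for s in scores if s > 0),
    }


if __name__ == "__main__":
    print(score_ungapped("LIVDE", "IIVEE"))
```

Swapping leucine for isoleucine or aspartate for glutamate still scores above zero, while swapping a hydrophobic residue for an acidic one is penalized.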

Two programs greatly facilitated similarity searching: FASTA (Pearson and Lipman 1988) and BLAST (Altschul et al. 1990). Many later programs have been developed from them.

Sequence databases, e.g. NCBI.

Basic Local Alignment Search Tool (BLAST)

Basic Local Alignment Search Tool (BLAST) was developed as a new way to perform sequence similarity searches. At its core it is a string pattern search.

What BLAST Tells You

BLAST reports surprising alignments
    Different than chance

Assumptions
    Random sequences
    Constant composition

Conclusions
    Surprising similarities imply evolutionary homology
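How "surprising" an alignment is can be made quantitative. The relations below are the standard Karlin-Altschul statistics used for BLAST E-values; they are not stated in the slides and are added here as background. In them, S is the raw alignment score, S' the bit score, m and n the lengths of the query and the database, and K and lambda are constants that depend on the scoring system and on the assumed residue composition.

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}
```

E is the number of alignments with a score of at least S expected by chance under the random-sequence assumption, so a smaller E means a more surprising alignment.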

Basic Local Alignment Search Tool (BLAST)

Widely used similarity search tool
Heuristic approach based on the Smith-Waterman algorithm
Finds best local alignments
Provides statistical significance
All combinations (DNA/protein) of query and database
    DNA vs DNA (BLASTN)
    DNA translation vs protein (BLASTX)
    Protein vs protein (BLASTP)
    Protein vs DNA translation (TBLASTN)
    DNA translation vs DNA translation (TBLASTX)
WWW, standalone, and network clients

Nucleotide word size

Minimum word size = 7
blastn default = 11
megablast default = 28

Make a lookup table of words (word size = 11)

GTACTGGACAT
TACTGGACATG
ACTGGACATGG
CTGGACATGGA
TGGACATGGAC
GGACATGGACC
GACATGGACCC
ACATGGACCCT
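The lookup table step can be sketched as below. The query string is reconstructed from the overlapping words listed above; the subject string, the function names, and the plain dictionary layout are assumptions for illustration and do not reflect how BLAST stores its table internally.

```python
from collections import defaultdict


def build_word_table(query, word_size=11):
    """Map every word of length word_size in the query to its start positions."""
    table = defaultdict(list)
    for pos in range(len(query) - word_size + 1):
        table[query[pos:pos + word_size]].append(pos)
    return dict(table)


def find_seed_hits(table, subject, word_size=11):
    """Scan the subject and return (query_pos, subject_pos) for exact word hits."""
    hits = []
    for s_pos in range(len(subject) - word_size + 1):
        word = subject[s_pos:s_pos + word_size]
        for q_pos in table.get(word, []):
            hits.append((q_pos, s_pos))
    return hits


if __name__ == "__main__":
    query = "GTACTGGACATGGACCCT"            # rebuilt from the word list above
    subject = "TTTTCTGGACATGGACTTTT"        # hypothetical database sequence
    table = build_word_table(query)
    for q_pos, s_pos in find_seed_hits(table, subject):
        print(q_pos, s_pos, query[q_pos:q_pos + 11])
```

Hits that fall on the same diagonal (constant difference between query and subject position) are the seeds that a BLAST-style search then tries to extend into a longer local alignment.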

Online BLAST Search

http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs

[Screenshot: NCBI BLAST home page. "BLAST finds regions of similarity between biological sequences."]

BLAST Assembled RefSeq Genomes
Choose a species genome to search, or list all genomic BLAST databases
    Human, Mouse, Rat, Arabidopsis thaliana, Oryza sativa, Bos taurus, Pan troglodytes, Gallus gallus, Apis mellifera

Basic BLAST
Choose a BLAST program to run.
    nucleotide blast: Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
    protein blast: Search protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast
    blastx: Search protein database using a translated nucleotide query
    tblastn: Search translated nucleotide database using a protein query
    tblastx: Search translated nucleotide database using a translated nucleotide query

Specialized BLAST
Choose a type of specialized search (or database name in parentheses)
    Make specific primers with Primer-BLAST
    Search trace archives
    Find conserved domains in your sequence (cds)
    Find sequences with similar conserved domain architecture (cdart)
    Search sequences that have gene expression profiles (GEO)
    Search immunoglobulins (IgBLAST)
    Search using SNP flanks
    Screen sequence for vector contamination (VecScreen)

Output: Alignments

>gi|127552|sp|P23367|MUTL_ECOLI DNA mismatch repair protein mutL
Length = 615
Score = 42.0 bits (97), Expect = 3e-04
Identities = 26/59 (44%), Positives = 33/59 (55%), Gaps = 9/59 (15%)

[Figure: Query (positions 9 to 58) aligned to Sbjct (positions 280 to 338), annotated to show an identical match, a positive score for a conservative substitution (marked "+"), a negative score, and a gap. The residue-level alignment text is not recoverable from this copy.]

From NCBI training tutorial

Perform a BLAST search of the following sequence. Which gene is it from? Is it in the coding region?

Translate it into an amino acid sequence and perform a BLASTP search.

GGCCGTGCCT GGGGATCCAA GTTCCCCTCT CTCCACCTGT GCTCACCTCT CCTCCGTCCC CAACCCTGCA CAGGCAAGAT CGTGGACGCC GTGATTCAGG AGCACCAGCC CTCCGTGCTG CTGGAGCTGG GGGCCTACTG TGGCTACTCA GCTGTGCGCA TGGCCCGCCT GCTGTCACCA GGGGCGAGGC TCATCACCAT CGAGATCAAC CCCGACTGTG CCGCCATCAC CCAGCGGATG GTGGATTTCG CTGGC
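For the translation step of the exercise, a minimal sketch follows. It uses the standard genetic code and shows the three forward reading frames, since the correct frame of the sequence is not given; the function names are hypothetical, and only the first 30 nucleotides of the exercise sequence are used in the demonstration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate DNA in the given forward frame (0, 1, or 2).

    Whitespace is ignored, '*' marks a stop codon, and any codon
    containing a non-ACGT character is translated as 'X'.
    """
    dna = "".join(dna.split()).upper()
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    # First 30 nucleotides of the exercise sequence.
    fragment = "GGCCGTGCCT GGGGATCCAA GTTCCCCTCT"
    for frame, protein in enumerate(forward_frames(fragment)):
        print(frame, protein)
```

A full answer would also translate the reverse complement, giving six frames in total; a frame without early stop codons is the more plausible coding frame.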
