Sequence Comparison Homology and Similarity

The document discusses sequence comparison and alignment. It defines homology as sequences being evolutionarily related through common ancestry, while similarity refers to sequences that look alike without implying ancestry. High similarity can provide evidence for inferring homology. Optimal alignments maximize matches and minimize gaps/mismatches based on assigned scores. Global alignments align full sequences while local alignments find best-matching regions.

Uploaded by

Anthony Liang

Sequence Comparison
BINF3010/9010

Homology and similarity

Homology
Sequences are homologous if they are evolutionarily related, i.e. they share a common ancestor through evolution

Similarity
Looking alike
Not an evolutionary concept

Homology and similarity

Homology is not a quantity
Two sequences are either homologous or not homologous
e.g., it is incorrect to refer to two sequences as being 50% homologous

Similarity can be quantified
e.g., two sequences can be 50% similar, 80% similar etc

Computational methods recognise and measure similarity
High similarity is supporting evidence to infer homology

Types of homology
Orthologs: Genes/proteins descended from a common ancestor through speciation
Paralogs: Genes/proteins related to each other due to a gene duplication event

Evolution through mutations

SPAMEGGANDSPAM
    substitutions
SPATEGGANDSPAM
    insertions, deletions
1 SPLATEGGANDSPAM
2 SPAGANDSPAM

Visualising the process

Dotmatrix plots (dotplots)
Alignments

Dotmatrix plot

[Figure: dotplot of sequence 1 (SPLATEGGANDSPAM, horizontal axis) against sequence 2 (SPAGANDSPAM, vertical axis)]

Dotmatrix plot: Principle

[Figure: dotplots of AAGTTCAGTAGGCATTTAAGCG (horizontal axis) against AGTACCGTTCC (vertical axis), one panel per setting: Word size = 1; Word size = 2; Word size = 3; Word size = 3 with Threshold = 2]

[Figure: four dotplot panels: Window = 30, Stringency = 9; Window = 20, Stringency = 9; Window = 30, Stringency = 14; Window = 20, Stringency = 13]
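
The word size and threshold settings in the dotplot panels can be illustrated with a short sketch. This is only one plausible reading of the slides: it assumes a dot is placed at (i, j) when at least `threshold` of the `word` positions starting at i and j match, and that the threshold defaults to the word size. The function names `dotplot` and `render` are made up for this example.

```python
def dotplot(seq_x, seq_y, word=1, threshold=None):
    """Return (i, j) start positions where words of seq_x and seq_y match."""
    if threshold is None:
        threshold = word  # assumption: by default every position in the word must match
    dots = []
    for i in range(len(seq_x) - word + 1):
        for j in range(len(seq_y) - word + 1):
            matches = sum(a == b for a, b in zip(seq_x[i:i + word], seq_y[j:j + word]))
            if matches >= threshold:
                dots.append((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: seq_x across the top, seq_y down the side."""
    marked = set(dots)
    lines = ["  " + seq_x]
    for j, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in marked else " " for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


X = "AAGTTCAGTAGGCATTTAAGCG"  # horizontal sequence from the slide
Y = "AGTACCGTTCC"             # vertical sequence from the slide
print(render(X, Y, dotplot(X, Y, word=3)))
```

Raising the word size removes isolated chance matches, while lowering the threshold below the word size lets imperfect diagonals show through again.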

Dotmatrix plot: repeats

Repeat detection

[Figure: dotplot of TFIIIA vs TFIIIA]

Sequence alignment

[Figure: dotplot of sequence 1 (SPLATEGGANDSPAM) against sequence 2 (SPAGANDSPAM)]

1 SPLATEGGANDSPAM
2 SPAGANDSPAM

1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM

Global vs Local Alignment

Global: align the whole of the two sequences together

 1 ....AUAUCUUUAAUUUAAUGGUAAAAUAUUAGAAUACGAAUCUAAUUAU 46
       ||||  || |   ||    ||    || || |    |  | || ||
 1 UGGUAUAUAGUUUAAACAAAACGAAUGAUUUCGACUCAUUAAAUUAUGAU 50

47 AUAGGUUCAAAUCCUAUAAGAUAUUCCA 74
   |    |    | |    |
51 AAUCAUAUUUACCAACCA.......... 68

Local: align only the region of best similarity

44 UAUAUAGGUUCAA 56
   ||||||| || ||
 4 UAUAUAGUUUAAA 16

Which alignment is correct?

1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM
2 insertion/deletions

1 SPLATEGGANDSPAM
  ||     ||||||||
2 SPA----GANDSPAM
1 indel, 1 substitution

1 SPLATEGGANDSPAM
  ||     ||||||||
2 SP----AGANDSPAM
1 indel, 1 substitution

1 SPLATEGGANDSPAM
     |   ||||||||
2 -SPA---GANDSPAM
2 indels, 2 substitutions

Which alignment is optimal?

Select a scoring system for alignments
Assign values to matches, mismatches and gaps

For example: Match: +2  Mismatch: -1  Gap: 5 (a penalty, subtracted once per gap)

Sum up the values over the whole alignment

Alignment score = (match scores + mismatch scores) - gap penalties

1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM

$S = (11 \times 2) + (0 \times -1) - (2 \times 5) = 12$

1 SPLATEGGANDSPAM
  ||x    ||||||||
2 SPA----GANDSPAM

$S = (10 \times 2) + (1 \times -1) - (1 \times 5) = 14$

1 SPLATEGGANDSPAM
  ||    x||||||||
2 SP----AGANDSPAM

$S = (10 \times 2) + (1 \times -1) - (1 \times 5) = 14$

1 SPLATEGGANDSPAM
   xx|   ||||||||
2 -SPA---GANDSPAM

$S = (9 \times 2) + (2 \times -1) - (2 \times 5) = 6$

The optimal alignment is the one with the highest score
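
The four scores can be checked with a small sketch. It assumes the scoring used in the worked sums: +2 per match, -1 per mismatch, and a penalty of 5 charged once per run of gap characters rather than per gapped position. The function name `score_alignment` is illustrative only.

```python
def score_alignment(row1, row2, match=2, mismatch=-1, gap=5):
    """Score two aligned rows; each run of '-' counts as a single gap event."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    score = 0
    in_gap = False
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            if not in_gap:
                score -= gap  # charge the penalty once, when the gap opens
            in_gap = True
        else:
            in_gap = False
            score += match if a == b else mismatch
    return score


SEQ1 = "SPLATEGGANDSPAM"
for row2 in ("SP-A---GANDSPAM", "SPA----GANDSPAM", "SP----AGANDSPAM", "-SPA---GANDSPAM"):
    print(row2, score_alignment(SEQ1, row2))
```

Two of the candidate alignments tie at the top score, so a scoring system does not always single out one optimal alignment.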

Algorithms

Global alignment
    Needleman-Wunsch
    Sellers

Local alignment
    Smith-Waterman

Note that the optimal alignment is not necessarily the correct biological alignment. However, it is usually impossible to know the correct evolutionary alignment
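
As a rough illustration of how a global alignment algorithm finds the optimal score, here is a minimal Needleman-Wunsch sketch. It is an assumption-laden simplification: the gap cost is linear (charged for every gapped position, unlike the per-gap penalty in the worked scores), only the score is returned (no traceback), and the function name and default values are chosen for this example.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=5):
    """Optimal global alignment score with a linear, per-position gap cost."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [-gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [-gap * i]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] - gap        # gap in b
            left = curr[j - 1] - gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("SPLATEGGANDSPAM", "SPAGANDSPAM"))
```

Smith-Waterman local alignment follows the same recurrence but additionally allows each cell to restart at zero and reports the highest cell anywhere in the table.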

Structure alignment

                   10        20        30        40        50        60
           ....*....|....*....|....*....|....*....|....*....|....*....|
4HHB_A   1 ~VLSPADKTNVKAAWGKVgaHAGEYGAEALERMFLSFPTTKTYFPHFDls~~~~~~hGSA  53
2HHB_B   1 vHLTPEEKSAVTALWGKV~~NVDEVGGEALGRLLVVYPWTQRFFESFGdlstpdavmGNP  58

                   70        80        90       100       110       120
           ....*....|....*....|....*....|....*....|....*....|....*....|
4HHB_A  54 QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL 113
2HHB_B  59 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF 118

                  130       140
           ....*....|....*....|....*...
4HHB_A 114 PAEFTPAVHASLDKFLASVSTVLTSKYR 141
2HHB_B 119 GKEFTPPVQAAYQKVVAGVANALAHKYH 146

Scoring systems

Matches and mismatches
    Substitution mutations
Gaps
    Insertions and deletions

DNA sequence alignment

768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
    ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
 87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
    |  | |              |  |||||| |   ||||  | || |   |
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
    ||| | ||| || || |||   |   |||||||||  ||   |||||| |
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216

DNA scoring matrix used in EMBOSS
Section of EMBOSS data file EDNAFULL

      A    T    G    C
A     5   -4   -4   -4
T    -4    5   -4   -4
G    -4   -4    5   -4
C    -4   -4   -4    5

Protein Sequence Alignment

TPKRREAEDLQVGQVLGGPLQLLE...SLQKRGIVEQCCT
 ||:|: |:      |:|||::|:      |||||||||
YPKKRDMEQ......LSGPLDMLQQEYQKMKRGIVEQCCH

|        Identical
:        Similar
(blank)  Different

Protein Comparison: Scoring Matrix

          A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
Ala A   0.8
Cys C   0.0  1.8
Asp D  -0.4 -0.6  1.2
Glu E  -0.2 -0.8  0.4  1.0
Phe F  -0.4 -0.4 -0.6 -0.6  1.2
Gly G   0.0 -0.6 -0.2 -0.4 -0.6  1.2
His H  -0.4 -0.6 -0.2  0.0 -0.2 -0.4  1.6
Ile I  -0.2 -0.2 -0.6 -0.6  0.0 -0.8 -0.6  0.8
Lys K  -0.2 -0.6 -0.2  0.2 -0.6 -0.4 -0.2 -0.6  1.0
Leu L  -0.2 -0.2 -0.8 -0.6  0.0 -0.8 -0.6  0.4 -0.4  0.8
Met M  -0.2 -0.2 -0.6 -0.4  0.0 -0.6 -0.4  0.2 -0.2  0.4  1.0
Asn N  -0.4 -0.6  0.2  0.0 -0.6  0.0  0.2 -0.6  0.0 -0.6 -0.4  1.2
Pro P  -0.2 -0.6 -0.2 -0.2 -0.8 -0.4 -0.4 -0.6 -0.2 -0.6 -0.4 -0.4  1.4
Gln Q  -0.2 -0.6  0.0  0.4 -0.6 -0.4  0.0 -0.6  0.2 -0.4  0.0  0.0 -0.2  1.0
Arg R  -0.2 -0.6 -0.4  0.0 -0.6 -0.4  0.0 -0.6  0.4 -0.4 -0.2  0.0 -0.4  0.2  1.0
Ser S   0.2 -0.2  0.0  0.0 -0.4  0.0 -0.2 -0.4  0.0 -0.4 -0.2  0.2 -0.2  0.0 -0.2  0.8
Thr T   0.0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.2 -0.2 -0.2 -0.2  0.0 -0.2 -0.2 -0.2  0.2  1.0
Val V   0.0 -0.2 -0.6 -0.4 -0.2 -0.6 -0.6  0.6 -0.4  0.2  0.2 -0.6 -0.4 -0.4 -0.6 -0.4  0.0  0.8
Trp W  -0.6 -0.4 -0.8 -0.6  0.2 -0.4 -0.4 -0.6 -0.6 -0.4 -0.2 -0.8 -0.8 -0.4 -0.6 -0.6 -0.4 -0.6  2.2
Tyr Y  -0.4 -0.4 -0.6 -0.4  0.6 -0.6  0.4 -0.2 -0.4 -0.2 -0.2 -0.4 -0.6 -0.2 -0.4 -0.4 -0.4 -0.2  0.4  1.4

First principles amino acid substitution matrices

Identity matrix
    Perfect match: positive score
    Any mismatch: negative score

Genetic score matrix
    Based on the average number of nucleotide changes needed to mutate one amino acid into another
    e.g. K (AAA, AAG) to N (AAC, AAU) has a higher score than K (AAA, AAG) to D (GAU, GAC)

Chemical properties matrices
    e.g. K (basic) to R (basic) has a higher score than K (basic) to F (aromatic) or K to E (acidic)

Identity matrix example

 D   E   Q   H   V   F   W
+1  -1  -1  -1  -1  -1  -1   D
    +1  -1  -1  -1  -1  -1   E
        +1  -1  -1  -1  -1   Q
            +1  -1  -1  -1   H
                +1  -1  -1   V
                    +1  -1   F
                        +1   W

Data-based matrices
    Calculated from amino acid frequencies in known homologous sequences
    PAM family of matrices
    BLOSUM family of matrices
    Perform better than first principle matrices (which are still useful for some specialised applications)

BLOSUM62 Matrix

BLOSUM matrices
BLOSUM 62

BLOSUM matrices
Henikoff and Henikoff, 1992
Blocks Substitution Matrix
Based on the BLOCKS database
Currently, most widely used matrix family
Most commonly used matrices: BLOSUM62 and BLOSUM55

BLOCKS database
BLOCKS are ungapped multiple sequence alignments based on the SWISS-PROT database and the PROSITE protein family database
All the sequences from SWISS-PROT belonging to a PROSITE family are aligned together, to create local ungapped alignments characteristic of the protein family

BLOCK example

ID   Mn_catalase; BLOCK
AC   IPB007760A; distance from previous block=(3,160)
DE   Manganese containing catalase
BL   HIL; width=14; seqs=49; 99.5%=727; strength=1034
CTJC_BACSU|Q45538  (  67) HLEMIATMVYKLTK  12
GS80_BACSU|P80878  (  69) HVEMIATMIARLLE  14
YDHU_BACSU|O05513  (   4) HGNLITDLLDNLLL  25
O69145             (  70) HMEIVAETINLLNG  64
Q9KDZ2             ( 136) SGNLIFDLLHNYFL  34
Q9KAU6             (  69) HVEMLATMIARLLD  16
Q9I1T0             (  68) HLEIIGSIVGMLNK  20
Q97JE8             (  68) HLEIVGSIVRQLSR  50
MCAT_CLOAB|Q97FE0  ( 124) TGDIVADLLSNIAS  73
Q8Z7E1             (  68) HLEIIGSLVGMLNK  17
Q8YY54             (  69) HIEMLATMIAHLLD  27
Q8YSJ5             (  68) HLEMVGKLIEAHTK  36
Q9KWV1             (  68) HLEIIGSLVGMLNK  17
Q8XDQ1             (  68) HLEIIGSLVGMLNK  17
YJQC_BACSU|O34423  (  69) HVEMLATMISRLLD  19
Q8R929             (  68) HLEIIATLVFKLLK  22
Q8PG91             (  68) HLEIIGSIIAMLNK  19
Q8P4M4             (  68) HLEIIGSIIAMLNK  19
Q8EQM8             (  18) SGNLLADFRANLTA  35

From BLOCKS to BLOSUM

1. Count the number of amino acid pairs observed in each column of each block and calculate the observed frequency of each pair
2. Calculate the expected frequency of each pair (based on the frequency of individual amino acids)
3. Calculate the log ratio (typically log2)

1. Count number of observed pairs and calculate frequencies

DADA
AAAE
AAEE
AADA
AAEE
AADE

There are $4 \times \binom{6}{2} = 60$ aligned pairs of amino acids in the block

Aligned pair (xy)    Proportion of times observed (o_xy)
A to A               26/60
A to D               8/60
A to E               10/60
D to D               3/60
D to E               6/60
E to E               7/60

General case for step 1

For each pair of amino acids x and y,
$n_{xy}$ = number of times x and y are in the same column of a block
$o_{xy}$ = observed proportion of aligned pair xy

$$o_{xy} = \frac{n_{xy}}{\sum_{u \le v} n_{uv}}$$

2. Calculate the expected frequency of each pair

DADA
AAAE
AAEE
AADA
AAEE
AADE

Amino acid (x)    Proportion in block (p_x)
A                 14/24
D                 4/24
E                 6/24

Amino acid pair (xy)    Expected proportion (e_xy)
A to A                  $(14/24)^2 = 196/576$
A to D                  $2 (14/24)(4/24) = 112/576$
A to E                  $2 (14/24)(6/24) = 168/576$
D to D                  $(4/24)^2 = 16/576$
D to E                  $2 (4/24)(6/24) = 48/576$
E to E                  $(6/24)^2 = 36/576$

General case for step 2

Expected proportion of amino acid pair xy in a random block of the same amino acid composition:

$$e_{xy} = \begin{cases} 2 p_x p_y & \text{if } x \ne y \\ p_x p_y & \text{if } x = y \end{cases}$$

3. Calculate the log ratio

$$\text{Matrix entry} = 2 \log_2\left(\frac{o_{xy}}{e_{xy}}\right) \quad \text{(rounded to nearest integer)}$$
Aligned pair (xy)    o_xy     e_xy       2log2(o_xy/e_xy)
A to A               26/60    196/576     0.70
A to D               8/60     112/576    -1.09
A to E               10/60    168/576    -1.61
D to D               3/60     16/576      1.70
D to E               6/60     48/576      0.53
E to E               7/60     36/576      1.80
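
The three steps can be reproduced on the example block with a short sketch. It follows the formulas given above; the function names are invented for this illustration, and pairs that never occur in the block are simply left out because their log ratio is undefined.

```python
from collections import Counter
from itertools import combinations
from math import log2

BLOCK = ["DADA", "AAAE", "AAEE", "AADA", "AAEE", "AADE"]


def pair_counts(block):
    """Step 1: count unordered residue pairs within each column (n_xy)."""
    counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            counts[tuple(sorted((x, y)))] += 1
    return counts


def blosum_from_block(block):
    """Steps 1-3: observed/expected pair proportions, then 2*log2 rounded."""
    n = pair_counts(block)
    total_pairs = sum(n.values())
    residues = Counter("".join(block))
    total_residues = sum(residues.values())
    p = {aa: count / total_residues for aa, count in residues.items()}
    matrix = {}
    for (x, y), n_xy in n.items():
        o_xy = n_xy / total_pairs                                # observed proportion
        e_xy = p[x] * p[y] if x == y else 2 * p[x] * p[y]        # expected proportion
        matrix[(x, y)] = round(2 * log2(o_xy / e_xy))            # half-bit score
    return matrix


print(dict(pair_counts(BLOCK)))
print(blosum_from_block(BLOCK))
```

A real BLOSUM matrix pools the counts over all blocks in the database and clusters similar sequences first, which this single-block sketch does not attempt.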

Final matrix

      A    D    E
A     1   -1   -2
D    -1    2    1
E    -2    1    2

The 2log2 transformation means that the matrix is in half-bits

BLOSUM family

Problem: counting every amino acid in the block can lead to an over-representation of amino acid changes found in closely related sequences
Solution: cluster sequences closer than a set % identity, and average their contribution so that the whole cluster counts as one sequence
This gives rise to a family of matrices, depending on the % identity threshold

VSLHL
ELTRS
EWTRS
EISRS
ELCRT

[Figure labels marking groups of sequences: 80% identical, 60% identical]

                                                      n_EE    n_VE
No clustering (BLOSUM100)                               6       4
Clustering sequences with 80% identity (BLOSUM80)       3       3
Clustering sequences with 60% identity (BLOSUM60)       2       2

PAM matrices
PAM120

PAM matrices
PAM - Point (Percent) Accepted Mutation
Schwartz and Dayhoff, 1978
Also known as MDM78 (mutation data matrix) or Dayhoff matrix
Empirical matrix based on evolutionary model
Based on small number of families of closely related proteins (>85% identity) so that sequences can be aligned unambiguously by hand
Since the changes observed between these sequences did not affect the function of the protein, these are accepted mutations

1. Align the sequences by hand
2. Order the sequences using parsimony

hbb_ornan  LSELHCDKLH VDPENFNRLG NVLIVVLARH FSKDFSPEVQ AAWQKLVSGV
hbb_tacac  LSELHCDKLH VDPENFNRLG NVLVVVLARH FSKEFTPEAQ AAWQKLVSGV
hbe_ponpy  LSELHCDKLH VDPENFKLLG NVMVIILATH FGKEFTPEVQ AAWQKLVSAV
hbb_speci  LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_speto  LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_equhe  LSELHCDKLH VDPENFRLLG NVLVVVLARH FGKDFTPELQ ASYQKVVAGV

3. Count the number of times each amino acid changes to each other one, e.g. F changing to L

[Figure: parsimony tree for one alignment column with F and L at its nodes, showing 1 F<->L change ($N_{FL} = 1$)]

4. Calculate probability for each amino acid mutating to each other amino acid

For each pair of amino acids i and j, the frequency of change $f_{ij}$ is:

$$f_{ij} = \frac{N_{ij}}{\sum_k N_{ik}}$$

For $i \ne j$, the probability of change $p_{ij}$ is:

$$p_{ij} = c f_{ij} \quad \text{and} \quad p_{ii} = 1 - \sum_{j \ne i} c f_{ij}$$

where c is a positive scaling constant chosen so that each $p_{ii} > 0$.

Probability matrix
The resulting probability matrix allows modelling the evolution of protein sequences as a Markov process, that is, the probability of any amino acid mutating to another one is dependent only on that amino acid

      A      C      D      E
A    pAA    pAC    pAD    pAE
C           pCC    pCD    pCE
D                  pDD    pDE
E                         pEE

PAM 1

The constant c is chosen so that the expected number of amino acid changes after one round of applying the probabilities is 1 in 100 amino acids

Expected proportion of mutated amino acids:

$$\sum_i \sum_{j \ne i} p_i p_{ij} = c \sum_i \sum_{j \ne i} p_i f_{ij} = 0.01$$

The resulting probability matrix is the PAM1 probability matrix, giving the probability that an amino acid will mutate to another over an amount of evolutionary time such that 1% of amino acids mutate

5. PAM N
Because the probability matrix is Markov, it is possible to calculate probability matrices for longer evolutionary times by multiplying the matrix by itself n times

e.g. PAM2 probability matrix:

$$\begin{pmatrix} p_{AA} & p_{AC} & p_{AD} & \cdots \\ p_{CA} & p_{CC} & p_{CD} & \cdots \\ p_{DA} & p_{DC} & p_{DD} & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{pmatrix} \times \begin{pmatrix} p_{AA} & p_{AC} & p_{AD} & \cdots \\ p_{CA} & p_{CC} & p_{CD} & \cdots \\ p_{DA} & p_{DC} & p_{DD} & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{pmatrix}$$

PAM N
e.g. a PAM250 matrix represents a 250% level of evolutionary change
e.g. PAM120, PAM80, PAM60 matrices could be used for aligning sequences which are approximately 40%, 50% and 60% similar, respectively
PAM250 has been shown preferable for distantly related proteins of 14-27% similarity
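
The repeated multiplication can be sketched in a few lines. The 3-state matrix below is a made-up toy (not real PAM1 data) in which each row keeps 99% of its probability on the diagonal; the function names are likewise hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """Return pam1 multiplied by itself n times (the PAM n probability matrix)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Hypothetical 3-letter alphabet; every row sums to 1 with 1% off the diagonal.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.003, 0.990, 0.007],
    [0.005, 0.005, 0.990],
]

pam2 = pam_n(TOY_PAM1, 2)
pam250 = pam_n(TOY_PAM1, 250)
print([round(v, 4) for v in pam250[0]])
```

Even after 250 steps the diagonal of the toy matrix stays well above zero, which mirrors why sequences at PAM250 distance still share some identical positions.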

6. PAM log odds matrices

Rather than use probabilities, it is more convenient to use log odds matrices
If $p_{ij}$ is an entry in the PAMN probability matrix, the corresponding entry in the PAMN log odds matrix is:

$$C \log\left(\frac{p_{ij}}{q_i q_j}\right)$$

where C is a positive constant and $q_i$ and $q_j$ are the respective observed frequencies of amino acids i and j in the sequences
Interpreted as the ratio of the probability that the substitution represents an authentic evolutionary change to the probability that it occurred due to random events of no biological significance.

Detecting evolutionary relationships

[Figure: evolutionary tree on a time axis (300 million years, 200 million years, 100 million years, Today) with branches labelled PAM100 and PAM200]

PAM matrices - summary

Family of substitution matrices corresponding to different levels of evolutionary time
Based on sound evolutionary principles
Distances for long periods of evolutionary history extrapolated from shorter times (assumption!)
Based on a relatively small dataset (mainly globular proteins)

BLOSUM vs PAM
PAM
Built from an evolutionary model based on closely related proteins
Extrapolation from closely related sequences
Built from a small number of complete sequences
BLOSUM
Built directly from blocks of aligned protein segments covering a wide range of evolutionary time
No extrapolation
Built from a large number of sequence segments

BLOSUM vs PAM (cont.)

PAM
PAMn matrices with low n are better suited to closely related sequences
Uses phylogenetic tree to avoid over-representing closely related sequences

Commonly used as log odds matrix
BLOSUM
BLOSUMn matrices with low n are better suited to highly divergent sequences
Uses clustering of related sequences and direct counting of amino acid changes
Commonly used as log odds matrix

BLOSUM vs PAM: Counting Changes

BLOSUM
[Figure: aligned pairs AA, AB, BB]
direct counts
A-B count = 4

PAM
[Figure: tree with nodes AA, BB, AB, AB]
counts from an evolutionary model
A-B count = 2

Gap penalties I
Rationale:
Gaps arise through insertion/deletion events, which do not happen one residue at a time.

Gap creation penalty:
    Penalty for creating a new gap
    Typically, relatively high to prevent too many gaps in the alignment
Gap extension (length) penalty:
    Penalty for extending an existing gap
    Typically, relatively small so that a small difference in gap length will not affect the penalty for this gap, but not so small that it results in very long gaps.
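
The two-part penalty can be written as a small function. This is a sketch under one common convention, assumed here rather than taken from the slides: the creation penalty is charged once and the extension penalty is charged for every residue in the gap, including the first. The default values 5 and 0.1 are borrowed from the hemoglobin example in this deck; the function names are illustrative.

```python
def affine_gap_penalty(length, creation=5.0, extension=0.1):
    """Penalty for one gap: creation cost once plus extension cost per residue."""
    if length <= 0:
        return 0.0
    return creation + extension * length


def linear_gap_penalty(length, per_residue=5.0):
    """For comparison: the same cost for every gapped residue."""
    return per_residue * max(length, 0)


for gap_length in (1, 3, 6):
    print(gap_length, affine_gap_penalty(gap_length), linear_gap_penalty(gap_length))
```

With these values one gap of six residues costs far less than two gaps of three residues, which is the behaviour the rationale asks for.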

Gap penalties II
Alignment of human α and β hemoglobin chains

Gap (creation) penalty = 1, Gap extension (length) penalty = 0.1

  1 V.LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLSH.....GSA
    | |.|.:|..|.| ||||  :.:| |:|||:|::: :| |. :|. | |||      |.:
  1 VHLTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP

 54 QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL
    .||:||||| :|:.:::||:|::...:..||:||..||:||| ||:||::.|:..|| |:
 59 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF

114 PAEFTPAVHASLDKFLASVSTVLTSKYR 141
    . ||||:|:|..:|.:|:|...|. ||:
119 GKEFTPPVQAAYQKVVAGVANALAHKYH 146

Gap penalties III
Alignment of human α and β hemoglobin chains

Gap (creation) penalty = 5, Gap extension (length) penalty = 0.1

  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF......DLSHGSAQV
    |.|.:|..|.| ||||  :.:| |:|||:|::: :| |. :|. |      |   |.:.|
  3 LTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV

 56 KGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPA
    |:||||| :|:.:::||:|::...:..||:||..||:||| ||:||::.|:..|| |:.
 61 KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK

116 EFTPAVHASLDKFLASVSTVLTSKYR 141
    ||||:|:|..:|.:|:|...|. ||:
121 EFTPPVQAAYQKVVAGVANALAHKYH 146

The twilight zone

[Figure: the twilight zone; regions labelled True positives and False negatives]

Rost, B. Protein Eng. 1999 12:85-94; doi:10.1093/protein/12.2.85

Measuring alignment quality

Alignment score
    Relative to random alignment?
Percentage identity
Percentage similarity
Evolutionary distance
    In its simplest form, 1 - %identity
    Several methods available to correct for multiple substitutions

Something to think about

Why do we add the scores together?
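
These measures are easy to compute from an aligned pair. The sketch below makes two assumptions that are not stated in the slides: percentage identity is taken over the full alignment length (gapped columns count as non-identical), and the Jukes-Cantor formula for DNA is used as one example of a correction for multiple substitutions. All function names are illustrative.

```python
from math import log


def percent_identity(row1, row2):
    """Identical columns as a percentage of the alignment length."""
    if len(row1) != len(row2) or not row1:
        raise ValueError("rows must be non-empty and of equal length")
    identical = sum(a == b and a != "-" for a, b in zip(row1, row2))
    return 100.0 * identical / len(row1)


def p_distance(row1, row2):
    """Simplest evolutionary distance: 1 - identity, as a proportion."""
    return 1.0 - percent_identity(row1, row2) / 100.0


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a DNA p-distance below 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


print(round(percent_identity("SPLATEGGANDSPAM", "SP-A---GANDSPAM"), 1))
print(round(jukes_cantor(0.1), 4))
```

The corrected distance is always at least as large as the raw p-distance, since some observed differences hide more than one substitution at the same site.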
