Sequence Alignment Presentation
The sequences are padded with gaps (dashes) so that, wherever possible,
columns contain identical characters from the sequences involved:
tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
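The match line between the two padded sequences can be derived mechanically. A minimal sketch, assuming both sequences are already gap-padded to the same length; the function name `match_line` is a hypothetical helper, not part of the slides:

```python
def match_line(a: str, b: str, gap: str = "-") -> str:
    """Return '|' under each column where both sequences hold the same
    non-gap character, and a space everywhere else."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if x == y and x != gap else " " for x, y in zip(a, b))


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"

print(top)
print(match_line(top, bottom))
print(bottom)
```

Counting the `|` characters in the result gives the number of identical columns.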
Uses of sequence alignment
* prediction of function
* database searching
* gene finding
* sequence divergence
Causes for sequence (dis)similarity
A global alignment between two sequences is an alignment in which all the
characters in both sequences participate in the alignment.
Global alignments are useful mostly for finding closely related sequences.
The global best fit between two sequences
Example: the sequences s = VIVALASVEGAS and
t = VIVADAVIS align like:
A(s,t) =
V I V A L A S V E G A S
| | | |   |   |       |
V I V A D A - V - - I S
The dashes (-) mark indels (insertions or deletions).
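A global alignment such as A(s,t) can be computed with dynamic programming. The sketch below is a basic Needleman-Wunsch style aligner; the scoring values (match +1, mismatch -1, gap -1) and the name `global_align` are assumptions made for illustration, not values taken from the slides. When several alignments tie for the best score, the one returned may differ from the layout shown above.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-1):
    """Globally align s and t; return (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # pair s[i-1] with t[j-1]
                              score[i - 1][j] + gap,      # gap in t
                              score[i][j - 1] + gap)      # gap in s
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(s[i - 1]); b.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = global_align("VIVALASVEGAS", "VIVADAVIS")
print(score)
print(top)
print(bottom)
```

Every character of both sequences appears in the output, which is what makes the alignment global.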
Local alignment methods find related regions within sequences; a local
alignment can consist of a subset of the characters within each sequence.
Global alignment:
LGPSSKQTGKGS-SRIWDN
LN-ITKSAGKGAIMRLGDA
Local alignment:
-------TGKG--------
-------AGKG--------
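Local alignment can be sketched with the Smith-Waterman variant of dynamic programming, in which cell scores are never allowed to drop below zero and the traceback starts at the highest-scoring cell. The scoring values (match +1, mismatch -1, gap -1) and the name `local_align` are assumptions for illustration; the region reported depends on the scoring scheme, so it need not coincide exactly with the one shown above.

```python
def local_align(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, segment_of_s, segment_of_t) for one best local alignment."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere instead of going negative.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    a, b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


s = "LGPSSKQTGKGSSRIWDN"
t = "LNITKSAGKGAIMRLGDA"
score, seg_s, seg_t = local_align(s, t)
print(score, seg_s, seg_t)
```

Unlike the global case, characters outside the best-scoring region are simply left out of the result.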
Methods of pairwise alignment
* Dot matrix analysis
* The dynamic programming (DP) algorithm
* Word methods
Dot matrix analysis
A dot matrix analysis is a method for comparing two
sequences to look for possible alignments (Gibbs and McIntyre,
1970).
Dot plots are two-dimensional graphs showing a comparison
of two sequences. The two axes X and Y of the graph represent
the two sequences being compared. Wherever a base or
residue on one axis matches a base or residue on the
other axis, the cell is marked with a dot. Any region of similarity is
revealed by a diagonal row of dots. Isolated dots not on a
diagonal represent random matches.
Assume that we have to compare the following sequences:
Sequence 1: AGCTAGGA
Sequence 2: CACTAGGC
Insert a dot in each matching cell and then scan the resulting
graph for a series of dots that form a diagonal.
  A G C T A G G A
C     •
A •       •     •
C     •
T       •
A •       •     •
G   •       • •
G   •       • •
C     •
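The dot matrix for these two sequences can be built and scanned programmatically. A minimal sketch: `dot_matrix` and `longest_diagonal` are hypothetical helper names, Sequence 1 is placed on the X axis and Sequence 2 on the Y axis, and no window or threshold filtering is applied.

```python
def dot_matrix(seq_x, seq_y):
    """matrix[y][x] is True where seq_x[x] == seq_y[y] (a 'dot')."""
    return [[a == b for a in seq_x] for b in seq_y]


def longest_diagonal(seq_x, seq_y):
    """Return (length, x_start, y_start) of the longest unbroken diagonal run."""
    best = (0, 0, 0)
    # run[y][x] = length of the diagonal run of dots ending at cell (x-1, y-1)
    run = [[0] * (len(seq_x) + 1) for _ in range(len(seq_y) + 1)]
    for y in range(1, len(seq_y) + 1):
        for x in range(1, len(seq_x) + 1):
            if seq_x[x - 1] == seq_y[y - 1]:
                run[y][x] = run[y - 1][x - 1] + 1
                if run[y][x] > best[0]:
                    best = (run[y][x], x - run[y][x], y - run[y][x])
    return best


seq1 = "AGCTAGGA"  # X axis
seq2 = "CACTAGGC"  # Y axis

print("  " + " ".join(seq1))
for residue, row in zip(seq2, dot_matrix(seq1, seq2)):
    print(residue + " " + " ".join("*" if hit else "." for hit in row))

length, x0, y0 = longest_diagonal(seq1, seq2)
print(length, seq1[x0:x0 + length])
```

For this pair the longest diagonal is the shared stretch CTAGG; the remaining dots are isolated matches.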
If both penalties are set to 0, the alignment algorithm always finds an alignment
with the maximum number of matches.
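The effect of zero penalties can be checked directly. The sketch below assumes that "both penalties" means the mismatch penalty and the gap penalty of a dynamic programming aligner (the source does not say which two are meant) and that a match scores +1; `best_score` is a hypothetical helper. With both penalties at 0, the optimal score is just the largest number of matching columns any alignment can have.

```python
def best_score(s, t, match=1, mismatch=0, gap=0):
    """Optimal global alignment score, keeping only one DP row in memory."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,  # align s[i-1] with t[j-1]
                           prev[j] + gap,      # gap in t
                           cur[j - 1] + gap))  # gap in s
        prev = cur
    return prev[-1]


s, t = "VIVALASVEGAS", "VIVADAVIS"
print(best_score(s, t))                       # penalties 0: counts matches only
print(best_score(s, t, mismatch=-1, gap=-1))  # non-zero penalties lower the score
```

With non-zero penalties the aligner has to trade matches against the cost of the mismatches and gaps needed to reach them.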
Sequence alignment is used to compare two DNA or protein sequences and, from
their similarity, to predict whether they have similar functions.
Insertions and deletions are collectively called indels.
Conservative substitution: a substitution that does not affect the protein's
properties or function.
A substitution score matrix gives the scores for amino acid substitutions.
When calculating alignment scores, identical amino acids are given a greater value
than substitutions, and among substitutions, conservative substitutions are given
a greater value than non-conservative substitutions.
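The ordering described above (identity > conservative substitution > non-conservative substitution) can be illustrated with a toy scoring table. All numbers in `TOY_SCORES` and the gap penalty are hypothetical values chosen only to respect that ordering; they are not taken from PAM, BLOSUM, or any other published matrix.

```python
# Hypothetical toy scores; keys are residue pairs in sorted order.
TOY_SCORES = {
    ("D", "D"): 5, ("I", "I"): 5, ("L", "L"): 5,  # identical residues
    ("I", "L"): 2,                                 # conservative (both hydrophobic)
    ("D", "I"): -3, ("D", "L"): -3,                # non-conservative
}


def score_alignment(a, b, gap_penalty=-4):
    """Sum the column scores of two gap-padded, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap_penalty
        else:
            # Sort the pair so (L, I) and (I, L) look up the same entry.
            total += TOY_SCORES[tuple(sorted((x, y)))]
    return total


print(score_alignment("LID", "LID"))  # all identical
print(score_alignment("LID", "ILD"))  # two conservative substitutions
print(score_alignment("LID", "DID"))  # one non-conservative substitution
```

A real analysis would replace the toy table with a full 20 x 20 matrix.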
Two widely used substitution matrices are PAM and BLOSUM.
PAM: based on global alignments of closely related proteins.
BLOSUM: based on local alignments.
A PAM (point accepted mutation) is the substitution of one amino acid of a protein
by another that is 'accepted' by evolution. This implies that, within a given species,
the mutation has not only arisen but has, over time, spread to essentially the entire
species. One PAM (PAM1) is a unit of evolutionary divergence in which 1% of the
amino acids have been changed (i.e. one point mutation per 100 residues).
The PAM matrices are based on mutation rates estimated from closely related proteins
and are dominated by the amino acid mutations caused by single-base changes.