Scoring Matrices 06
Shifra Ben-Dor
Irit Orr
Scoring matrices
Sequence alignment and database searching
programs compare sequences to each other
as a series of characters.
Every comparison algorithm (program)
relies on a scoring scheme to do so.
Scoring matrices are used to assign a score
to each comparison of a pair of characters.
The scores in the matrix are integer values.
In most cases a positive score is given to
identical or similar character pairs, and a
negative or zero score to dissimilar
character pairs.
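As a minimal sketch of how a program might use such a matrix, the alignment score can be taken as the sum of the pair scores over all aligned positions. The function name `score_alignment` and the toy matrix (+1 for identical, -1 for different characters) are assumptions made for illustration, and gaps are not handled.

```python
def score_alignment(seq_a, seq_b, matrix):
    """Sum the matrix score of each aligned character pair (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


# Hypothetical toy matrix: +1 for identical pairs, -1 for dissimilar pairs.
alphabet = "ACGT"
toy_matrix = {(a, b): (1 if a == b else -1) for a in alphabet for b in alphabet}

# Three identical positions and one mismatch (G aligned with C).
total = score_alignment("ACGT", "ACCT", toy_matrix)
print(total)
```

Storing the matrix as a dictionary keyed by character pairs keeps the lookup independent of the alphabet, so the same function would work with a protein matrix.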
Different types of matrices
Identity scoring - the simplest scoring
scheme, where character pairs are classified as
identical (score 1) or non-identical (score
0). This scoring scheme is rarely used.
DNA scoring - classifies changes as
transitions (purine to purine, or pyrimidine
to pyrimidine) or transversions (purine to
pyrimidine, or the reverse). This matrix
scores identical bases 3, transitions 2, and
transversions 0.
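The two schemes above can be sketched as small lookup tables. The scores (1/0 for identity; 3/2/0 for DNA) are the ones given here; the function and variable names are assumptions chosen for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def identity_score(a, b):
    """Identity scoring: 1 for identical characters, 0 otherwise."""
    return 1 if a == b else 0


def dna_score(a, b):
    """DNA scoring: identical 3, transition 2, transversion 0."""
    if a == b:
        return 3
    # A transition stays within one chemical class (A<->G or C<->T).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 2 if same_class else 0


identity_matrix = {(a, b): identity_score(a, b) for a in BASES for b in BASES}
dna_matrix = {(a, b): dna_score(a, b) for a in BASES for b in BASES}

# Print the DNA matrix row by row, in the order A, C, G, T.
for a in BASES:
    print(a, " ".join(str(dna_matrix[(a, b)]) for b in BASES))
```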
Chemical similarity scoring (for proteins) -
this matrix gives higher scores to pairs of
amino acids with similar chemical properties
(e.g., size, shape, or charge).
Observed matrices for proteins - the type
most commonly used by alignment programs. These
matrices are constructed by analyzing the
substitution frequencies seen in the
alignments of known families of proteins.
Observed Scoring Matrices
Every possible identity and substitution is
assigned a score.
This score is based on the observed
frequencies of such occurrences in
alignments of evolutionarily related proteins.
This score also reflects the frequency
with which a particular amino acid occurs
in nature, as some amino acids are more
abundant than others.
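No formula is given here, but one common way to combine observed pair frequencies with amino acid abundance is a log-odds score: the logarithm of the observed pair frequency divided by the frequency expected by chance from the individual abundances. The sketch below assumes that approach. The aligned pairs are hypothetical toy data, and the scale factor of 2 and the rounding to integers are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical aligned residue pairs taken from alignment columns (toy data).
aligned_pairs = (
    [("L", "L")] * 4 + [("L", "I")] * 2 + [("W", "W")] + [("I", "I")]
)


def log_odds_scores(pairs, scale=2.0):
    """Integer log-odds score for each residue pair seen in `pairs`."""
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in pairs:
        # Treat (a, b) and (b, a) as the same unordered pair.
        pair_counts[tuple(sorted((a, b)))] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1

    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / n_pairs
        p_a = residue_counts[a] / n_residues
        p_b = residue_counts[b] / n_residues
        # Chance frequency of the unordered pair from background abundance.
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = round(scale * math.log2(observed / expected))
    return scores


scores = log_odds_scores(aligned_pairs)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

In this toy data the rare residue W receives a higher identity score than the abundant residue L, which illustrates how abundance enters the score: matching a rare amino acid is less likely to happen by chance.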
Observed Scoring Matrices