L-8 Global Alignment

The document discusses methods of sequence alignment, focusing on global and local alignment techniques. It explains the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment, along with scoring matrices and gap penalties used in these methods. The process involves initializing a dynamic programming matrix, filling it with scores, and tracing back to find optimal alignments between sequences.

Uploaded by

roopalmishra98

Methods of Sequence Alignment

Global Alignment
Global alignment is best suited to closely related sequences of
roughly the same length.
The alignment runs from the beginning to the end of both
sequences to find the best possible overall alignment.

How is it done?
Needleman-Wunsch algorithm (an algorithm is a formula or set of
steps for solving a problem)
Developed by Saul B. Needleman and Christian D. Wunsch in
1970
A dynamic programming algorithm for sequence alignment
Local Alignment
Sequences that are suspected to share some similarity, or even
largely dissimilar sequences, can be compared with the local
alignment method.
It finds local regions with a high level of similarity.

How is it done?
Smith-Waterman algorithm
Scoring matrices
Both the Needleman-Wunsch and Smith-Waterman algorithms rely
on a scoring system.
For nucleotide sequence alignment, the scoring matrices are
relatively simple, since the mutation frequency is assumed to be
equal for all bases.
A positive (higher) value is assigned to a match and a
negative (lower) value is assigned to a mismatch.
These assumption-based scores are used to fill the scoring
matrix.
For protein sequences, the most widely used predefined matrices
are PAM and BLOSUM.
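The simple nucleotide scheme described above can be sketched as a lookup table. This is a minimal illustration only: the values +1 and -1 and the names `build_nucleotide_matrix` and `BASES` are assumptions chosen for the example, not fixed constants of either algorithm.

```python
# Illustrative match/mismatch scores (assumed values, not part of any standard).
MATCH = 1
MISMATCH = -1
BASES = "ACGT"


def build_nucleotide_matrix(match=MATCH, mismatch=MISMATCH):
    """Return a dict mapping every (base, base) pair to its score."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in BASES
        for b in BASES
    }


matrix = build_nucleotide_matrix()
print(matrix[("A", "A")])  # score for a match
print(matrix[("A", "G")])  # score for a mismatch
```

Protein matrices such as PAM and BLOSUM play the same role, but each pair of residues has its own predefined score instead of a single match/mismatch value.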
• Gap score or gap penalty: dynamic programming algorithms
use gap penalties to keep alignments biologically meaningful.
• The gap penalty is subtracted for each gap that is introduced.
• There are different gap penalties, such as gap open and gap
extension. The gap score defines the penalty given to an alignment
when there is an insertion or deletion.
• During evolution, a single event can insert or delete several
consecutive residues, producing a run of continuous gaps. A linear
gap penalty, which charges every gap position equally, is not
appropriate for such alignments.
• Gap open and gap extension penalties were therefore introduced
for runs of continuous gaps (five or more).
• The open penalty is always applied at the start of the gap. Each
following gap position is given a gap extension penalty, which is
smaller than the open penalty. Typical values are –12 for gap
opening and –4 for gap extension.
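The difference between a linear and an open/extension (affine) gap penalty can be sketched as follows. The function names are hypothetical, and the formula assumes the convention described above: the first gap position is charged the open penalty and every following position the extension penalty. Some tools instead charge open plus extension for the first position, so treat this as one possible convention.

```python
def linear_gap_cost(length, gap=-2):
    """Linear penalty: every gap position costs the same."""
    return gap * length


def affine_gap_cost(length, gap_open=-12, gap_extend=-4):
    """Affine penalty: open penalty for the first gap position,
    extension penalty for each of the remaining positions."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# A run of five continuous gaps under both schemes.
print(linear_gap_cost(5, gap=-12))  # 5 * -12
print(affine_gap_cost(5))           # -12 + 4 * -4
```

With the typical values quoted above, one long gap is penalised far less under the affine scheme than under a linear scheme that charges –12 per position.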
The dynamic programming matrix is built in three steps.

1. Initialization of the matrix with the possible scores.
2. Matrix filling with maximum scores.
3. Trace back through the residues to obtain the alignment.
Initialization Step
This example assumes that there is a gap penalty. Without a gap
penalty, the first row and first column of the matrix can simply be
filled with 0. When a gap score is used, each cell of the first row
and first column is obtained by adding the gap score to the previous
cell of that row or column.
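As a rough sketch of the initialization step, the snippet below builds the matrix for the sequences ATGCAG and ATGAG with a gap score of -2, values taken from this document's examples. The function name `init_matrix` and the row/column orientation are assumptions made for illustration.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Create a (len(seq1)+1) x (len(seq2)+1) matrix and fill the
    first row and first column with cumulative gap scores."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        # Each cell is the previous cell in the row plus the gap score.
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        # Each cell is the previous cell in the column plus the gap score.
        score[i][0] = score[i - 1][0] + gap
    return score


score = init_matrix("ATGCAG", "ATGAG")
print(score[0])                    # first row
print([row[0] for row in score])   # first column
```

Passing `gap=0` reproduces the case without a gap penalty, where the first row and column are all zeros.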
Matrix Fill Step
• To find the maximum score of each cell, the scores of its three
neighbouring cells (diagonal, left and above) must already be
known.
• This gives three candidate values. The maximum of the three is
written into the cell at the ith row and jth column:

1. Value of the box to the left + Gap
2. Value of the box above + Gap
3. Value of the diagonal box + Match/Mismatch value
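The rule for a single cell can be sketched as below. The helper `fill_cell` and the pointer labels are hypothetical names used for illustration; the default scores (match 1, mismatch -1, gap -2) follow this document's example values. All pointers that reach the maximum are returned, because ties are what later give rise to more than one possible alignment.

```python
def fill_cell(diag, up, left, a, b, match=1, mismatch=-1, gap=-2):
    """Score one cell from its three neighbours.

    diag, up, left: scores already stored in the neighbouring cells.
    a, b: the two residues compared at this cell.
    Returns the cell score and the list of back pointers that achieve it.
    """
    candidates = {
        "diag": diag + (match if a == b else mismatch),
        "up": up + gap,
        "left": left + gap,
    }
    best = max(candidates.values())
    pointers = [name for name, value in candidates.items() if value == best]
    return best, pointers


# First inner cell of a matrix initialised with gap = -2, comparing A with A.
print(fill_cell(0, -2, -2, "A", "A"))
```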
Matrix filling with back pointers
Trace back Step
• In the example above, the score in the bottom right-hand
corner is -1.
• An important point is that two or more alignments may be
possible between the two example sequences.
• The current cell, with value -1, has an immediate predecessor
located diagonally, where the maximum score was obtained; its
value is 0. If two or more cells point back to the current cell,
there can be two or more possible alignments.
• Continuing the trace back in this way eventually leads to the
0th row, 0th column.
Seq1: ATGCG
Seq2: ATGCA

Match = 1
Mismatch = -1
Gap = -2
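The three steps can be put together in a short sketch of global alignment using the sequences and scores listed above. This is a minimal illustration, not a reference implementation: the function name is an assumption, a linear gap penalty is used, and when several pointers tie the trace back simply prefers the diagonal, then the cell above, then the cell to the left, so only one of the possible alignments is returned.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.
    Returns (score, aligned_seq1, aligned_seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    # Step 1: initialization of the first column and first row.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: matrix fill, taking the maximum of the three candidates.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # diagonal
                score[i - 1][j] + gap,      # above
                score[i][j - 1] + gap,      # left
            )

    # Step 3: trace back from the bottom right corner to cell (0, 0).
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


best, top, bottom = needleman_wunsch("ATGCG", "ATGCA")
print(best)
print(top)
print(bottom)
```

Collecting every tied pointer during the fill step, instead of choosing one at trace back time, would allow all equally scoring alignments to be enumerated.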
Trace Back

In local alignment, the trace back begins from the cell that has
the highest value. Following the back pointers, find its
predecessor, then move to the next predecessor, and continue until
a score of 0 is reached.
Two pointers may lead out of one cell. In that case both paths
(alignments) can be considered, and the best one is the path with
the maximum score.
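A trace back that starts at the highest-scoring cell and stops at 0 is the procedure used in Smith-Waterman local alignment, and it can be sketched as follows. The function name and the two test sequences are hypothetical, chosen only so that a single clear local match exists; scores reuse the match 1, mismatch -1, gap -2 values from this document. When pointers tie, this sketch prefers the diagonal, then the cell above, then the cell to the left.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.
    Returns (score, aligned_seq1, aligned_seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # First row and column stay 0; no cell is allowed to drop below 0.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # diagonal
                score[i - 1][j] + gap,      # above
                score[i][j - 1] + gap,      # left
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a 0 is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


# Hypothetical sequences sharing the local region ATGC.
print(smith_waterman("GGATGCA", "TTATGCC"))
```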
ATGCAG
ATGAG

           A    T    G    A    G
      0   -2   -4   -6   -8  -10
A    -2
T    -4
G    -6
C    -8
A   -10
G   -12