Lecture 5
Needleman–Wunsch Algorithm
• The Needleman–Wunsch algorithm is used in bioinformatics
to align protein or nucleotide sequences.
• It was one of the first applications of
dynamic programming to compare biological sequences.
• The algorithm was developed by Saul B. Needleman and
Christian D. Wunsch and published in 1970.
• The algorithm divides a large problem (aligning the full
sequences) into a series of smaller problems (aligning their
prefixes), and it uses the solutions to the smaller problems
to find an optimal solution to the larger problem.
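A minimal Python sketch of this idea, assuming the +1 / −1 / −1 scoring system adopted later in this lecture. The hypothetical function `nw_score` defines f(i, j) as the best score for aligning the first i letters of one sequence with the first j letters of the other, computes each value from three smaller prefix problems, and caches the results so every subproblem is solved only once.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 1, -1, -1  # scoring system assumed from this lecture


def nw_score(a: str, b: str) -> int:
    """Best global alignment score of a and b, built from prefix subproblems."""

    @lru_cache(maxsize=None)
    def f(i: int, j: int) -> int:
        # f(i, j): best score for aligning the prefixes a[:i] and b[:j]
        if i == 0:
            return j * GAP  # b[:j] can only be aligned against gaps
        if j == 0:
            return i * GAP  # a[:i] can only be aligned against gaps
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        return max(
            f(i - 1, j - 1) + pair,  # last letters aligned to each other
            f(i - 1, j) + GAP,       # a[i-1] aligned to a gap
            f(i, j - 1) + GAP,       # b[j-1] aligned to a gap
        )

    return f(len(a), len(b))


print(nw_score("GATTACA", "GCATGCU"))  # 0
```

The cache is what keeps the work to one computation per (i, j) pair; filling the score matrix step by step, as the slides do, is the same computation carried out bottom-up.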
• It is also sometimes referred to as the
optimal matching algorithm or the global alignment
technique.
• The Needleman–Wunsch algorithm is still widely
used for optimal global alignment, particularly when
the quality of the global alignment is of the utmost
importance.
• Conceptually, every possible alignment receives a
score, and the purpose of the algorithm is to find all
alignments with the highest score. Dynamic programming
achieves this without enumerating the alignments one by one.
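To make "score every possible alignment" concrete, here is a brute-force Python sketch for tiny inputs only. The function names are hypothetical, and the +1 / −1 / −1 scores are assumed from the scoring system used later in this lecture. It enumerates every global alignment, scores each one, and keeps all alignments that tie for the top score. The number of alignments grows exponentially with sequence length, which is why the algorithm uses dynamic programming instead of this enumeration.

```python
def all_alignments(a: str, b: str):
    """Yield every global alignment of a and b as a pair of gapped strings."""
    if not a and not b:
        yield "", ""
        return
    if a and b:  # first column pairs a[0] with b[0]
        for x, y in all_alignments(a[1:], b[1:]):
            yield a[0] + x, b[0] + y
    if a:  # first column pairs a[0] with a gap
        for x, y in all_alignments(a[1:], b):
            yield a[0] + x, "-" + y
    if b:  # first column pairs a gap with b[0]
        for x, y in all_alignments(a, b[1:]):
            yield "-" + x, b[0] + y


def alignment_score(x: str, y: str, match=1, mismatch=-1, gap=-1) -> int:
    """Sum the per-column scores of an alignment ('-' is a gap)."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        elif p == q:
            total += match
        else:
            total += mismatch
    return total


def best_alignments(a: str, b: str):
    """Score every alignment, then keep all of those with the top score."""
    scored = [(alignment_score(x, y), x, y) for x, y in all_alignments(a, b)]
    best = max(s for s, _, _ in scored)
    return best, [(x, y) for s, x, y in scored if s == best]


print(best_alignments("A", "AA"))  # (0, [('A-', 'AA'), ('-A', 'AA')])
```

For `"A"` and `"AA"` two different alignments share the top score of 0, which is why the goal is stated as finding *all* highest-scoring alignments.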
• This algorithm can be used for any two strings.
• This guide uses two short nucleotide sequences
as examples:
• Sequence (1): GCATGCU
• Sequence (2): GATTACA
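Each cell f(i, j) of the scoring matrix holds the best score for aligning the first i letters of Sequence (2) (rows) with the first j letters of Sequence (1) (columns). It is computed with the recurrence:

$$
f(i,j) = \max\begin{cases}
f(i-1,\,j-1) + S(\text{match or mismatch}) \\
f(i-1,\,j) + S(\text{gap}) \\
f(i,\,j-1) + S(\text{gap})
\end{cases}
$$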
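A sketch of this recurrence for a single cell, written in Python under the lecture's scoring (+1 / −1 / −1). `fill_cell`, `seq_row` (Sequence (2)) and `seq_col` (Sequence (1)) are hypothetical names. Besides the value, it reports which of the three terms reached the maximum; when several do, there is more than one best way to reach that cell, which is how multiple optimal alignments arise.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # scoring system assumed from this lecture


def fill_cell(f, i, j, seq_row, seq_col):
    """Apply the recurrence to cell (i, j) of matrix f.

    Returns the cell value and the moves that achieve it:
    'diag' (letters aligned), 'up' (row letter vs gap),
    'left' (column letter vs gap).
    """
    pair = MATCH if seq_row[i - 1] == seq_col[j - 1] else MISMATCH
    candidates = {
        "diag": f[i - 1][j - 1] + pair,  # f(i-1, j-1) + S(match or mismatch)
        "up": f[i - 1][j] + GAP,         # f(i-1, j) + S(gap)
        "left": f[i][j - 1] + GAP,       # f(i, j-1) + S(gap)
    }
    best = max(candidates.values())
    return best, [move for move, v in candidates.items() if v == best]


# Top-left 2x2 corner of an initialized matrix: f(0,0)=0, f(0,1)=-1, f(1,0)=-1
f = [[0, -1], [-1, None]]
print(fill_cell(f, 1, 1, "G", "G"))  # (1, ['diag'])
```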
Next, decide how to score each individual pair of letters.
The letters may match, mismatch, or be matched to a
gap, i.e. an insertion or deletion (indel):
Match: The two letters at the current index are the
same.
Mismatch: The two letters at the current index are
different.
Indel (INsertion or DELetion): One letter is aligned
to a gap (-) in the other string.
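The three cases can be sketched in Python as a hypothetical helper `column_kind` that classifies one alignment column, with `-` standing for a gap. The alignment used below is one possible alignment of the example sequences, chosen only for illustration.

```python
def column_kind(x: str, y: str) -> str:
    """Classify one alignment column as match, mismatch, or indel ('-' is a gap)."""
    if x == "-" and y == "-":
        raise ValueError("a column cannot pair two gaps")
    if x == "-" or y == "-":
        return "indel"  # one letter aligned to a gap
    return "match" if x == y else "mismatch"


top, bottom = "GCATG-CU", "G-ATTACA"  # an illustrative alignment
print([column_kind(x, y) for x, y in zip(top, bottom)])
# ['match', 'indel', 'match', 'match', 'mismatch', 'indel', 'match', 'mismatch']
```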
• Each of these scenarios is assigned a score, and
the sum of the scores of all the pairings is the
score of the whole alignment candidate.
• Different scoring systems exist; for now,
the following system will be used:
• Match: +1
• Mismatch or Gap: −1
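A minimal Python sketch of this scoring system (the names `SCORES` and `score_alignment` are hypothetical). It sums the per-column scores of an alignment written as two equal-length strings, with `-` for a gap. The gapped alignment is chosen only for illustration.

```python
SCORES = {"match": 1, "mismatch": -1, "indel": -1}  # Match: +1; Mismatch or Gap: -1


def score_alignment(top: str, bottom: str) -> int:
    """Sum the scores of all columns of an alignment; '-' denotes a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += SCORES["indel"]
        elif x == y:
            total += SCORES["match"]
        else:
            total += SCORES["mismatch"]
    return total


print(score_alignment("GCATGCU", "GATTACA"))    # -1 (no gaps)
print(score_alignment("GCATG-CU", "G-ATTACA"))  # 0
```

With these scores the gapped candidate (4 matches, 2 mismatches, 2 gaps) beats the ungapped one (3 matches, 4 mismatches).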
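Initialization (Match: +1; Mismatch: -1; Gap: -1). Columns follow Sequence (1) = GCATGCU and rows follow Sequence (2) = GATTACA; the extra row and column (-) stand for the empty prefix. The first row and column hold the scores of aligning a prefix entirely against gaps, so they decrease by 1 per letter:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1
A   -2
T   -3
T   -4
A   -5
C   -6
A   -7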
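The initialization step can be sketched in Python as follows (`init_matrix` is a hypothetical name; rows follow Sequence (2) and columns Sequence (1), as in the table). Interior cells start as placeholder zeros that the recurrence later overwrites.

```python
GAP = -1  # gap score from this lecture's scoring system


def init_matrix(seq_row: str, seq_col: str, gap: int = GAP):
    """Create the (m+1) x (n+1) score matrix with its first row and column filled."""
    m, n = len(seq_row), len(seq_col)
    f = [[0] * (n + 1) for _ in range(m + 1)]  # interior cells are placeholders
    for j in range(1, n + 1):
        f[0][j] = j * gap  # seq_col[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        f[i][0] = i * gap  # seq_row[:i] aligned entirely against gaps
    return f


f = init_matrix("GATTACA", "GCATGCU")
print(f[0])                   # [0, -1, -2, -3, -4, -5, -6, -7]
print([row[0] for row in f])  # [0, -1, -2, -3, -4, -5, -6, -7]
```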
Filling the matrix starts with the top-left interior cell (row G, column G), marked ?:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1    ?
A   -2
T   -3
T   -4
A   -5
C   -6
A   -7
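Worked computation for the cell marked ? (row G against column G, a match), applying the recurrence to its three neighbours in the table:

```latex
f(1,1) = \max\begin{cases}
f(0,0) + S(\text{match}) = 0 + 1 = 1 \\
f(0,1) + S(\text{gap}) = -1 - 1 = -2 \\
f(1,0) + S(\text{gap}) = -1 - 1 = -2
\end{cases} = 1
```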
With f(1,1) = 1 filled in, the next cell (row G, column C) is marked ?:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1    1    ?
A   -2
T   -3
T   -4
A   -5
C   -6
A   -7
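Worked computation for the cell marked ? (row G against column C, a mismatch), reading the neighbours from the table:

```latex
f(1,2) = \max\begin{cases}
f(0,1) + S(\text{mismatch}) = -1 - 1 = -2 \\
f(0,2) + S(\text{gap}) = -2 - 1 = -3 \\
f(1,1) + S(\text{gap}) = 1 - 1 = 0
\end{cases} = 0
```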
Next, the cell in row A, column G is marked ?:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1    1    0
A   -2    ?
T   -3
T   -4
A   -5
C   -6
A   -7
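Worked computation for the cell marked ? (row A against column G, a mismatch):

```latex
f(2,1) = \max\begin{cases}
f(1,0) + S(\text{mismatch}) = -1 - 1 = -2 \\
f(1,1) + S(\text{gap}) = 1 - 1 = 0 \\
f(2,0) + S(\text{gap}) = -2 - 1 = -3
\end{cases} = 0
```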
Next, the cell in row A, column C is marked ?:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1    1    0
A   -2    0    ?
T   -3
T   -4
A   -5
C   -6
A   -7
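Worked computation for the cell marked ? (row A against column C, a mismatch):

```latex
f(2,2) = \max\begin{cases}
f(1,1) + S(\text{mismatch}) = 1 - 1 = 0 \\
f(1,2) + S(\text{gap}) = 0 - 1 = -1 \\
f(2,1) + S(\text{gap}) = 0 - 1 = -1
\end{cases} = 0
```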
The filled region keeps growing; the cells marked ? (row G, column A and row T, column G) are computed next:

     -    G    C    A    T    G    C    U
-    0   -1   -2   -3   -4   -5   -6   -7
G   -1    1    0    ?
A   -2    0    0
T   -3    ?
T   -4
A   -5
C   -6
A   -7