Edit Distance
The minimum edit distance between two strings is the minimum number
of editing operations (insertion, deletion, substitution) needed
to transform one string into the other.
Minimum Edit Distance
Two strings and their alignment: Given two sequences, an alignment is a
correspondence between substrings of the two sequences.
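• We’ll use dynamic programming to compute D[n,m] bottom up, combining solutions to
subproblems. Here D[i,j] is the edit distance between the first i characters of the source
string (length n) and the first j characters of the target string (length m), so D[n,m] is
the distance between the two full strings.
The value of D[i,j] is computed by taking the minimum of the three possible paths through
the matrix which arrive there:
    D[i,j] = min( D[i-1,j]   + del-cost(source[i]),
                  D[i,j-1]   + ins-cost(target[j]),
                  D[i-1,j-1] + sub-cost(source[i], target[j]) )
If we assume the version of Levenshtein distance in which insertions and deletions
each have a cost of 1 (ins-cost(·) = del-cost(·) = 1), and substitutions have a cost of 2
(except that substituting a letter for itself has zero cost), the computation for D[i,j] becomes:
    D[i,j] = min( D[i-1,j]   + 1,
                  D[i,j-1]   + 1,
                  D[i-1,j-1] + (2 if source[i] ≠ target[j], else 0) )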
Min Edit Distance Algorithm
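A minimal Python sketch of the table-filling procedure, assuming the Levenshtein costs given above (insertion and deletion cost 1, substitution cost 2, identical letters cost 0). The function name min_edit_distance_table and its parameter names are illustrative, not taken from the slides.

```python
def min_edit_distance_table(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return the full table D, where D[i][j] is the edit distance between
    source[:i] and target[:j]. The costs are assumed defaults."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: delete the first i characters of the source.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    # First row: insert the first j characters of the target.
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost

    # Each remaining cell takes the cheapest of the three paths into it.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            D[i][j] = min(
                D[i - 1][j] + del_cost,                        # deletion
                D[i][j - 1] + ins_cost,                        # insertion
                D[i - 1][j - 1] + (0 if same else sub_cost),   # substitution or match
            )
    return D


D = min_edit_distance_table("intention", "execution")
print(D[-1][-1])  # 8
```

Python strings are 0-indexed, so source[i - 1] plays the role of the i-th character in the recurrence. The bottom-right cell D[n][m] holds the distance between the full strings.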
Dynamic Programming for Minimum Edit Distance
The Edit Distance Table
N   9
O   8
I   7
T   6
N   5
E   4
T   3
N   2
I   1
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N
The Edit Distance Table
N   9   8   9  10  11  12  11  10   9   8
O   8   7   8   9  10  11  10   9   8   9
I   7   6   7   8   9  10   9   8   9  10
T   6   5   6   7   8   9   8   9  10  11
N   5   4   5   6   7   8   9  10  11  10
E   4   3   4   5   6   7   8   9  10   9
T   3   4   5   6   7   8   7   8   9   8
N   2   3   4   5   6   7   8   7   8   7
I   1   2   3   4   5   6   7   6   7   8
#   0   1   2   3   4   5   6   7   8   9
    #   E   X   E   C   U   T   I   O   N
Minimum Edit Distance
Backtrace for Computing Alignments
Computing alignments
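One way to recover an alignment is to record, for every cell, which of the three paths produced its minimum, then follow those pointers back from D[n,m] to D[0,0]. The sketch below assumes the same costs as before (insertion and deletion 1, substitution 2). The function name align, the pointer labels, and the use of "*" as a gap symbol are illustrative choices, and the inputs are assumed not to contain "*".

```python
def align(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return (distance, aligned_source, aligned_target), using "*" for gaps."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # one backpointer per cell

    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
        ptr[i][0] = "del"
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
        ptr[0][j] = "ins"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            candidates = [
                (D[i - 1][j - 1] + (0 if same else sub_cost), "sub"),
                (D[i - 1][j] + del_cost, "del"),
                (D[i][j - 1] + ins_cost, "ins"),
            ]
            # On ties, min keeps the first candidate, so the diagonal is preferred.
            D[i][j], ptr[i][j] = min(candidates, key=lambda c: c[0])

    # Backtrace: each step decreases i, j, or both, so at most n + m steps.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "sub":      # match or substitution: consume one char from each
            top.append(source[i - 1])
            bottom.append(target[j - 1])
            i, j = i - 1, j - 1
        elif move == "del":    # deletion: source char aligned to a gap
            top.append(source[i - 1])
            bottom.append("*")
            i -= 1
        else:                  # insertion: gap aligned to a target char
            top.append("*")
            bottom.append(target[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


cost, top, bottom = align("intention", "execution")
print(cost)
print(top)
print(bottom)
```

Only one pointer is kept per cell, so this returns a single minimum-cost alignment; when several paths tie, other equally cheap alignments exist and are not reported.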
Time: O(nm)
Space: O(nm)
Backtrace: O(n+m)