02 Computing Minimum Edit Distance 5-54

Uploaded by

idhitappu

How are we going to compute minimum edit
distance? The standard algorithm is
dynamic programming. Dynamic programming
is a tabular method of computation. What
we're going to do is compute D, the
distance between two strings X and Y, X
of length N and Y of length M, by
combining solutions to sub-problems.
Combining solutions to sub-problems is
the intuition behind all dynamic
programming algorithms. The intuition is
very simple. For small prefixes, of
length I of string X and length J of
string Y, we'll compute the distance
between those strings, and we'll compute
the distances for larger strings based on
those previously computed smaller values.
In other words, we're going to compute
D(I, J), the distance between the prefix
of string X of length I and the prefix of
string Y of length J, for all I and J,
and we'll end up with D(N, M), the
distance between the full strings. So
let's look at the actual equation. Here's
the equation defining minimum edit
distance. I've given you Levenshtein
distance, which here is the distance when
there is a cost of one for insertions,
one for deletions, and two for
substitutions. Let's look at the
initialization condition first. Take the
first I characters of the X string. The
distance between those and the null
string in Y is the cost of deleting each
of those characters, so D(I, 0) is I, the
length of that prefix.
We're deleting each character. And
similarly for inserting all the
characters into Y to create the string Y:
the distance between the null string in X
and the first J characters of Y is just
J, the insertion cost of those
characters. And then we have the
recurrence relation. Walking through
string X and walking through string Y,
the distance in any particular cell of
our matrix is the cheapest way of getting
to that cell from three previous cells.
Either we come from the prefix of X
that's one shorter, so we're deleting one
more character of X to make the prefix of
Y. Or we come from the prefix of Y that's
one shorter, so we're inserting one more
character into Y to make it longer. Or we
come from X of length I minus one and Y
of length J minus one, and we're adding a
new character to both. If it's the same
in both strings we have a cost of zero.
If it's different we have the
substitution cost of two. And then at the
end, the distance between the two strings
is simply D(N, M), the upper right
corner of the matrix. So here's our table.
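
Written out, the definition just described
might look like the following. This is a
sketch reconstructed from the spoken
description, not a copy of the slide. The
notation X(i) for the i-th character of X
is an assumption, and the layout assumes
the amsmath package.

```latex
\begin{align*}
D(i, 0) &= i, \qquad D(0, j) = j \\
D(i, j) &= \min
  \begin{cases}
    D(i-1, j) + 1 & \text{(deletion)} \\
    D(i, j-1) + 1 & \text{(insertion)} \\
    D(i-1, j-1) +
      \begin{cases}
        2 & \text{if } X(i) \neq Y(j) \\
        0 & \text{if } X(i) = Y(j)
      \end{cases} & \text{(substitution)}
  \end{cases} \\
\text{distance}(X, Y) &= D(N, M)
\end{align*}
```
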
And we can fill in each element of the
table using this equation, which tells us
the deletion cost, the insertion cost,
and the substitution cost. So let's do
that. I've put the equation up here in
the corner. First, what's the distance
between the null prefix of intention and
the null prefix of execution? Obviously
zero. The distance from the string I to
the null string is the cost of deleting
an I. That's one. So now let's try to
compute the cost of converting I into E.
Intuitively we expect a cost of two:
either a deletion and an insertion, or a
single substitution. Let's see if that
works out. The element in this cell is
the minimum of three values. It's this
distance plus one, this distance plus
one, or this distance plus either two if
I and E are different, or zero if they
are the same. Well, they're different, so
it's the minimum of one plus one, which
is two, one plus one, which is two, or
zero plus two, which is two. So we have
two, and we write two in this cell.
Similarly, if we want the distance
between IN and E, it's the minimum of
three values. The distance from IN to
nothing plus one for inserting the E, so
two plus one, or three. Or the distance
between I and E plus the cost of deleting
that N, so two plus one, or three. Or the
distance from I to nothing plus the N to
E substitution, so one plus two, or
three. So we have three here. Now we have
a two and we have a three.
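
The two cells just worked out by hand can
be checked with a few lines of Python.
This is only a sketch: the helper name
next_cell and the variable names are
hypothetical, and the costs (one for
insertion, one for deletion, two for
substitution) are the ones stated in the
lecture.

```python
def next_cell(deletion_from, insertion_from, diagonal, x_char, y_char):
    """Minimum over the three ways of reaching a cell (hypothetical helper).

    deletion_from  is D(i-1, j): one fewer character of X.
    insertion_from is D(i, j-1): one fewer character of Y.
    diagonal       is D(i-1, j-1): one fewer character of each.
    """
    substitution = 0 if x_char == y_char else 2
    return min(deletion_from + 1, insertion_from + 1, diagonal + substitution)


# Initialization values: D(i, 0) = i and D(0, j) = j.
d_empty_empty = 0  # "" to ""
d_i_empty = 1      # "I" to ""
d_in_empty = 2     # "IN" to ""
d_empty_e = 1      # "" to "E"

# D("I", "E"): all three candidates come out as two.
d_i_e = next_cell(d_empty_e, d_i_empty, d_empty_empty, "I", "E")

# D("IN", "E"): all three candidates come out as three.
d_in_e = next_cell(d_i_e, d_in_empty, d_i_empty, "N", "E")

print(d_i_e, d_in_e)  # prints: 2 3
```
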
If we continue in this manner, in each
case looking at the previous cells and
using this equation over here, we end up
with the following complete table. Every
cell in this table tells you an edit
distance between two prefixes. Take this
cell: it's the cost of editing the string
INTE and turning it into the string EXE.
That means that this value here in the
upper right corner is the edit distance
between intention and execution, the cost
of turning intention into execution. We
see the value eight, which we said
earlier was the Levenshtein distance. So
we have Levenshtein distance equals
eight. That's our algorithm for computing
minimum edit distance.
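
Putting the initialization and the
recurrence together, the whole table fill
can be sketched in Python roughly as
follows. The function name
min_edit_distance and its parameter names
are assumptions made for illustration; the
default costs are the ones given in the
lecture. Note that the slide draws the
table with D(N, M) in the upper right
corner, whereas in this list-of-lists
layout the same value sits in the last
row and last column.

```python
def min_edit_distance(x, y, ins_cost=1, del_cost=1, sub_cost=2):
    """Fill the dynamic programming table and return D(N, M)."""
    n, m = len(x), len(y)

    # d[i][j] holds the distance between the first i characters of x
    # and the first j characters of y.
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # Initialization: delete every character of x, or insert every
    # character of y.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost

    # Recurrence: each cell is the cheapest of three previous cells.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if x[i - 1] == y[j - 1] else sub_cost
            d[i][j] = min(
                d[i - 1][j] + del_cost,          # deletion
                d[i][j - 1] + ins_cost,          # insertion
                d[i - 1][j - 1] + substitution,  # substitution or match
            )

    return d[n][m]


print(min_edit_distance("intention", "execution"))  # prints: 8
```

Keeping the three costs as parameters is a
design choice of this sketch, so that
other cost schemes (for example a
substitution cost of one) can be tried
without changing the loop.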
