Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 2
How are we going to compute interim at a
distance? The standard algorithm is with
dynamic programing. Down programming is a tabular method of computation. And what we're gonna do, is we're gonna compute D, the distance between two strings X and Y, X of length N, Y of length M, by combining solutions to sub-problems. And combining solutions to sub-problems, is the intuition of all dynamic programming algorithms. The intuition is, is very simple. We're gonna for small prefixes of length I of string X, and J of string Y, we'll compute the difference between those strings and we'll compute our larger Distances for larger strings based on those previously computed smaller values. In other words, we're going to compute the distance I-j between prefixes of string X of length I and prefixes of string. Y of length J for all I and J, and we'll end up in the end with the di, the distance. So let's look at the actual equation. Here's the equation for defining the minimum at a distance. And I've given you levenshtein distance which is the distance when there is a cost of one for insertion, one for deletions, and two for substitutions. So let's look at the initialization condition. So the, we first say that. Any characters in X, so this is the X string for it, for the I character is an X string. The distance between those and the null string in Y is the cost of deleting each of those characters. So the cost of those is, is the length of the string. We're deleting each character. And similarly for inserting all the characters into Y. To create the string y. The distance between the null string x and the string of y is just a length of y. The insertion cost of y. And then, we'll do the recurrence relation. So, walking through string x and walking through string y. We'll have that the distance in any particular cell of our matrix is going to be the minimum way of getting to that cell from three previous cells. If we go from the string I, that's one shorter so we're deleting one more thing in i. To make a J. Or we're inserting one thing into J to make it longer. Or we're substituting between the previous string I of length I, X of length I minus one and Y of length J minus one. We're adding in a new character. If it's the same in both strings we have a cost of zero. If it's different we have the substitution cost of two. And then at the end, the distance between the two strings is simply the It's simply the, the DFNM, the upper right corner of the matrix. So here's our table. And we can fill in each element of the table from using this equation that tells us the deletion cost, the insertion cost and the substitution cost. So let's do that. I put the equation up here in the corner. So we want to know what's the distance between the null string of intention and the null string of execution, obviously zero. The null string I to the string nothing is still the cost of deleting an i. That's one. So now let's try to compute whats the cost of converting IN to E. While intuitively we expect it's going to be a deletion and a substitution. So let's see if that works out. Alright, so the, Element in this cell is the minimum of three values. It's this distance plus one, this distance plus one, or this distance plus either two if I and zero are different, or zero if they are the same. Well, they're different, so it's the minimum of I+1, which is two. One+1 which is two, or zero + two which is two, so we have two. So we're gonna write two in this [inaudible]. Similarly, if we wanna know the distance between I, N and E, it's the minimum distance, of I-N to nothing, plus one. So two+1 or three. Or, the different distance between I and E plus the cost of adding in that N or three. Or the cost of having just an I and adding in that n to e substitution which is two or three. So again, we have three here. We have a two and we have a three. And if we continue along this manner, again, in each case looking at the previous cells and using this equation over here, we'll slowly end up with. So, [sound]. And if we continue along this, in this manner, we're gonna end up with the following complete table. So. Every cell in this table, let's take this cell, tells you the cost of sub, of ca, of at a distance, of editing the string INTE and turning it into the string EXE and that means that this value here in the upper right corner is the cost, the at a distance, between intention and execution, the cost of turning intention into execution and we see the value eight, which we earlier said was the, the levenstein distance. So we had levenstein distance equals eight. That's our algorithm for computing [inaudible] at a