The Levenshtein algorithm calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another. It works by building a matrix with one word along the rows and the other along the columns. Cells are filled by comparing letters: if they are equal, the cell takes the value of its upper-left diagonal neighbor; otherwise it takes the minimum of the three neighboring cells (left, top, diagonal) plus one. The distance is read from the bottom-right cell, and the edits themselves are recovered by backtracking from that cell. It is useful for applications such as spell checkers and speech recognition.


LEVENSHTEIN ALGORITHM

By Hrishitva Patel, Goutham Ravichandran

Problem Statement

When we search for or spell a word, we may not know its exact spelling. We then fix the
mistake by adding, deleting, or replacing a letter, relying on memory and cognitive skills.
To do the same thing automatically and efficiently in a program, we need a set of logical
steps that finds a reasonable correction for any pair of words. For example, when we enter a
misspelled word in Google Search, Google recommends the closest matching word. We intend to
build this behavior using the Levenshtein distance, which is used in applications such as
spell checkers, correction systems, speech recognition, spam filtering, and plagiarism
detection. Efficiency matters because users trigger about 70,000 search requests per second,
so the algorithm must perform well under that load. DNA comparison needs the same kind of
algorithm to detect differences between two DNA sequences. The main objective is therefore
to detect the differences between two words and to find which operations must be applied to
one word to make it identical to the other.

Algorithm Description
The Levenshtein algorithm computes the Levenshtein distance, also called the Levenshtein edit
distance: a count of the differences between two words. This value can also serve as a
threshold that states how much difference is tolerated. The Levenshtein distance between two
words is the smallest number of single-character modifications (insertions, deletions, or
substitutions) required to transform one word into the other. It is named after Vladimir
Levenshtein, a Soviet mathematician who studied this distance in 1965.
Dynamic programming solves a problem by first solving smaller, similar subproblems and then
reusing their results. Accordingly, we divide the problem into two steps: first map out the
differences between the words, then derive the solution from that map.

The Levenshtein process consists of two parts. The first forms the matrix by cross-checking
the letters of the two words and assigning each cell a value according to the logic of the
algorithm. The second uses backtracking to report which operations must be performed to fix
the word optimally.
Figure 1: Levenshtein Matrix
In the first step, as shown in Figure 1, we create a matrix with one word along the rows and
the other along the columns; the words need not have the same length. We first number the
letters of each word in order from beginning to end, incrementing the value by one after
every letter, so the cells next to the letters hold values in ascending order. In Figure 1,
the word “RELEVANT” has the values 1 to 8 next to its letters because it has 8 letters, and
the same holds for the other word, “ELEPHANT”. One extra cell in the top-left corner is
assigned 0 because no letter corresponds to it in either the row or the column. The
comparison starts at this corner and proceeds to the bottom-right cell, the last element on
the diagonal. Each cell is handled in turn using the row letter and the column letter that
correspond to it. Every cell after the first requires a rule, described next, to determine
its value.
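
The border initialization described above can be sketched in a few lines of Python. This is an illustration only, not the authors' program; the function name init_matrix and the choice of which word labels the rows are assumptions made for the example.

```python
def init_matrix(row_word: str, col_word: str) -> list[list[int]]:
    """Build the (len(row_word)+1) x (len(col_word)+1) grid with only its borders set."""
    rows, cols = len(row_word) + 1, len(col_word) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        matrix[i][0] = i  # first column counts the letters of row_word: 0, 1, 2, ...
    for j in range(cols):
        matrix[0][j] = j  # first row counts the letters of col_word: 0, 1, 2, ...
    return matrix


grid = init_matrix("ELEPHANT", "RELEVANT")
print(grid[0])                   # first row of the grid
print([row[0] for row in grid])  # first column of the grid
```

The inner cells are left at 0 here only as placeholders; they are filled by the comparison rule in the next step.
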
Figure 2: Levenshtein Matrix

For every cell, if the compared letters are equal, we copy the value of the upper-left
diagonal cell directly. Otherwise, we add one to each of the three neighboring values (left,
top, and upper-left diagonal) and take the smallest result as the value of the current cell.
This rule is applied to all empty cells in order, from the first to the last, as shown in
Figure 2.

Figure 3: Choosing a minimum algorithm

The Levenshtein algorithm calculates the lowest number of edit operations needed to turn one
string into another. The most common way to calculate it is dynamic programming. A matrix is
initialized in which each cell holds the Levenshtein distance between a prefix of one word
and a prefix of the other. The matrix is filled from the top-left to the bottom-right corner.
Each horizontal or vertical move corresponds to an insertion or a deletion; which one applies
is decided in the second step. The cost of each operation is usually set to 1. A diagonal
move costs zero or one, depending on whether the two characters in the row and column match.
Each cell always minimizes the cost locally, as shown in the algorithm in Figure 3.
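
As a rough illustration of this first step, the following Python sketch fills the whole matrix with unit costs. It is not the authors' implementation; the name levenshtein_matrix and the row/column assignment are assumptions. It uses the standard recurrence, in which a diagonal move costs 0 on a match and 1 otherwise, which yields the same values as the rule described above.

```python
def levenshtein_matrix(a: str, b: str) -> list[list[int]]:
    """Return the filled matrix; d[i][j] is the distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions turn a[:i] into the empty string
    for j in range(cols):
        d[0][j] = j  # j insertions turn the empty string into b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # move from the cell above
                d[i][j - 1] + 1,         # move from the cell to the left
                d[i - 1][j - 1] + cost,  # diagonal move: match or substitution
            )
    return d


matrix = levenshtein_matrix("ELEPHANT", "RELEVANT")
print(matrix[-1][-1])  # bottom-right cell holds the distance: 3
```
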

The second step decides which operations must be executed to make the two strings identical.
It requires the completed Levenshtein matrix from the procedure above. We start from the
last element of the matrix, the bottom-right cell of the grid. For example, if the words
have lengths n and m, this first selected cell is the cell at position (m, n) in the matrix.
Figure 4: Operation decision logic
From this first cell, we inspect the three cells around the current cell (upper, left, and
upper-left diagonal); the one with the minimum value becomes the target cell, and our cursor
moves there. Before moving, we decide which operation the move represents and, if it is a
replacement, which letters are exchanged. If the minimum value is in the left cell, the
operation is a deletion of the current letter. If the minimum value is in the upper cell,
the operation is an insertion. If the diagonal cell has the least value, a replacement is
added to the list of required operations. If the two letters for the current cell are
exactly the same, we move diagonally without recording any operation.

Figure 5: Backtracking

As Figure 5 shows, backtracking continues until the first element is reached. During
backtracking, a counter records the number of operations; it is incremented by 1 after every
operation. Generally, the target word is placed along the rows of the completed matrix. For
deletions and insertions, the letter involved is taken from the word on the corresponding
side, using the current index.
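
The backtracking step can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function name edit_operations and the wording of the operation strings are invented for the example. It follows the layout described above (target word on the rows, source word on the columns), so a move to the left is a deletion and a move upward is an insertion. When several neighbors tie, it prefers the diagonal, which is one arbitrary but valid choice.

```python
def edit_operations(source: str, target: str) -> list[str]:
    """List one optimal sequence of edits that turns source into target."""
    rows, cols = len(target) + 1, len(source) + 1  # target on rows, source on columns
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if target[i - 1] == source[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    ops = []
    i, j = rows - 1, cols - 1  # start at the bottom-right cell
    while i > 0 or j > 0:
        if i > 0 and j > 0 and target[i - 1] == source[j - 1]:
            i, j = i - 1, j - 1  # same letters: diagonal move, nothing recorded
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(f"replace {source[j - 1]} with {target[i - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(f"delete {source[j - 1]}")  # move left
            j -= 1
        else:
            ops.append(f"insert {target[i - 1]}")  # move up
            i -= 1
    ops.reverse()  # operations were collected from the end of the words
    return ops


print(edit_operations("cat", "cut"))  # ['replace a with u']
print(len(edit_operations("ELEPHANT", "RELEVANT")))  # 3 operations
```

The length of the returned list plays the role of the operation counter mentioned above.
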

RESULTS AND DISCUSSIONS

The Levenshtein algorithm is effective across many areas of string comparison, including DNA
alignment research and search-engine recommendations, and more recently link prediction,
cryptography, and image recognition in machine learning. In addition, tests have compared the
performance of approximate subgraph matching with the string edit distance approach; the
edit distance approach outperformed approximate subgraph matching in both computational cost
and accuracy [1]. Changing the order of the letters in a string also changes the result,
because all values must be recalculated.
The Levenshtein distance is the most widely used member of the edit distance family of
metrics. These sibling metrics differ in the set of elementary operations they allow; for
example, the Hamming distance allows only substitutions. There are several ways to compute
the Levenshtein distance besides the one discussed above. One of them is a straightforward
recursive algorithm.

Figure 6: Pseudocode for Levenshtein recursion

The function x.substr(n) returns the substring of x beginning at element n. This approach is
inefficient because it computes the distance for the same substrings many times.
Another version is iterative and uses a full matrix. From the properties of the Levenshtein
distance, we can construct a matrix of dimension (|a| + 1) x (|b| + 1) that holds the value
lev_{a,b}(i, j) at position (i, j). The first row and the first column of this matrix are
set to the values 0..|b| and 0..|a|, respectively. We can then use a dynamic programming
approach to fill in the matrix and obtain the final, bottom-right element as the resulting
distance.
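
For illustration, a naive recursive version might look like the Python sketch below. It is not a transcription of the pseudocode in Figure 6; the name lev_recursive is an assumption, and the slice a[1:] stands in for x.substr(1).

```python
def lev_recursive(a: str, b: str) -> int:
    """Naive recursion: exponential time, shown only to expose the repeated work."""
    if not a:
        return len(b)  # only insertions remain
    if not b:
        return len(a)  # only deletions remain
    if a[0] == b[0]:
        return lev_recursive(a[1:], b[1:])  # first letters match, no cost
    return 1 + min(
        lev_recursive(a[1:], b),      # delete the first letter of a
        lev_recursive(a, b[1:]),      # insert the first letter of b
        lev_recursive(a[1:], b[1:]),  # substitute the first letter
    )


print(lev_recursive("flaw", "lawn"))  # 2
```

The same pair of suffixes is reached along many different call paths, which is why this version is only practical for short words.
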

Figure 7: Full matrix version of Levenshtein

When we implement this version, the number of required operations appears directly in the
last cell of the matrix, as in the example below.

Figure 8: Table
The result of this comparison is 5, with every operation counted as 1 in the full matrix as
well. The bottom-right element of this matrix matches the five operations we observed
previously.
Another version is iterative and uses two rows. If we want only the final value, we can
easily modify the full-matrix implementation to avoid allocating the entire matrix. To move
forward, we need only two rows: the one we are currently updating and the previous one.
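
A minimal Python sketch of the two-row idea follows. The function name lev_two_rows is an assumption, and the swap at the top is an optional extra that keeps the rows as short as possible; it relies on the distance being symmetric.

```python
def lev_two_rows(a: str, b: str) -> int:
    """Distance only, keeping just the previous row and the row being filled."""
    if len(a) < len(b):
        a, b = b, a  # optional: let the shorter word set the row length
    previous = list(range(len(b) + 1))  # row 0 of the full matrix
    for i, char_a in enumerate(a, start=1):
        current = [i] + [0] * len(b)  # first column of row i is i
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current[j] = min(
                previous[j] + 1,         # cell above
                current[j - 1] + 1,      # cell to the left
                previous[j - 1] + cost,  # diagonal cell
            )
        previous = current  # the filled row becomes the previous row
    return previous[-1]


print(lev_two_rows("elephant", "relevant"))  # 3
```
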

Figure 9: Two rows version

This optimization makes it impossible to determine which edits were made. Hirschberg’s
algorithm solves this problem by combining dynamic programming with divide and conquer.

Furthermore, to calculate the value at a specific row position we need only three values:
the one to the left, the one directly above, and the one on the upper-left diagonal.
Figure 10: Lev Distance approach
Thus, our function can be modified to allocate one row and two variables instead of two rows.
This modification reduces the memory requirements of the application even further.
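
One way to realize the single-row variant is sketched below in Python. The names lev_single_row, diagonal, and above are assumptions made for this example; the two extra variables hold the values that would otherwise be lost when the row is overwritten in place.

```python
def lev_single_row(a: str, b: str) -> int:
    """Distance only, using one row plus two scalar variables."""
    row = list(range(len(b) + 1))  # starts as row 0 of the full matrix
    for i, char_a in enumerate(a, start=1):
        diagonal = row[0]  # value of the previous row at column 0
        row[0] = i         # first column of the current row
        for j, char_b in enumerate(b, start=1):
            above = row[j]  # previous-row value, about to be overwritten
            cost = 0 if char_a == char_b else 1
            row[j] = min(above + 1, row[j - 1] + 1, diagonal + cost)
            diagonal = above  # becomes the diagonal for column j + 1
    return row[-1]


print(lev_single_row("elephant", "relevant"))  # 3
```
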

In terms of complexity, the lengths of the words are the main parameters. The time complexity
of all the iterative algorithms presented above is O(|a| x |b|). The space complexity of the
full-matrix implementation is also O(|a| x |b|), which usually makes it impractical for long
strings. Both the two-row and single-row implementations need only linear space,
O(max(|a|, |b|)). Swapping source and target so that the shorter string determines the row
length reduces this further to O(min(|a|, |b|)). It has been shown that the Levenshtein
distance cannot be calculated in subquadratic time unless the strong exponential time
hypothesis is false. Fortunately, this result concerns only the worst case.

For strings that are expected to be close, an upper bound on the distance can be combined
with the methods above. Suppose we have long strings and care only about matching similar
ones, such as misspelled names. A complete Levenshtein computation would traverse the full
matrix, including the high values in the top-right and bottom-left corners that we do not
need. This suggests an optimization based on a threshold: all distances above a certain
bound are simply reported as out of range. As a result, for a bounded distance we need to
compute only the values in the diagonal stripe of width 2K + 1, where K is the distance
threshold. In other words, if the Levenshtein distance exceeds the bound, the implementation
reports failure instead of a value.
This method has a time complexity of O(K x min(|a|, |b|)), which allows us to compare long
but similar strings in a reasonable amount of time.

We can also skip the calculation entirely if the length difference between the strings
exceeds the threshold, because the distance is at least that length difference.
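
The bounded computation can be sketched in Python as shown below. This is an illustration under stated assumptions, not the program used for the results: the name lev_bounded, the use of None for "out of range", and the sentinel value k + 1 are all choices made for this example, and k is assumed to be non-negative. For simplicity the sketch still allocates a full-length row per iteration, so only the inner loop is restricted to the stripe; a tighter version would store just the 2K + 1 cells of the stripe.

```python
from typing import Optional


def lev_bounded(a: str, b: str, k: int) -> Optional[int]:
    """Return the distance if it is at most k, otherwise None (out of range)."""
    if abs(len(a) - len(b)) > k:
        return None  # the length difference is a lower bound on the distance
    big = k + 1  # sentinel meaning "greater than k"
    previous = [j if j <= k else big for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [big] * (len(b) + 1)  # cells outside the stripe stay at big
        if i <= k:
            current[0] = i
        low = max(1, i - k)        # leftmost column inside the stripe
        high = min(len(b), i + k)  # rightmost column inside the stripe
        for j in range(low, high + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + cost,
                big,  # cap the value so everything above k looks the same
            )
        previous = current
    return previous[-1] if previous[-1] <= k else None


print(lev_bounded("elephant", "relevant", 3))  # 3
print(lev_bounded("elephant", "relevant", 2))  # None
```
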

PROGRAM RESULT
We entered the words elephant and relevant; the program ran and gave the correct edit
distance of 3 whenever K was 3 or more, where K is the maximum number of allowed changes.

Figure 11: Program Execution in C compiler

Thus, we can conclude that the code works correctly for this input and simulates the
computation of Levenshtein edit distances. Developed further, it could be useful in various
other fields as well.
REFERENCES
[1] Putra, Made & Supriana, Iping. (2015). Structural Off-line Handwriting Character
Recognition Using Approximate Subgraph Matching and Levenshtein Distance. Procedia
Computer Science. 59. 340-349. 10.1016/j.procs.2015.07.529.

[2] https://www.geeksforgeeks.org/backtracking-introduction/

[3] https://www.baeldung.com/cs/levenshtein-distance-computation

[4] https://dev.to/trekhleb/dynamic-programming-vs-divide-and-conquer-218i

[5] https://www.researchgate.net/figure/An-example-of-Algorithm-2-for-input-string-T-CATGACTG-and-pattern-P-TACTG_fig5_320319792

[6] https://en.wikipedia.org/wiki/Levenshtein_distance

[7] https://dl.acm.org/cms/attachment/1b5b1be7-69a4-4d4d-ba1b-664cd797c9ce/www19-313-fig3.jpg

[8] https://www.techiedelight.com/levenshtein-distance-edit-distance-problem/

[9] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms.

[10] https://bdebo.medium.com/edit-distance-643a4bcfaa09

[11] https://afteracademy.com/blog/edit-distance-problem
