Shridhar Shah
Department of Computer Science and Technology
Institute of Technology, Nirma University
Ahmedabad, India
14bce111@nirmauni.ac.in

• A BK tree (Burkhard-Keller tree) is a data structure used for spell checking based on the concept of edit distance (Levenshtein distance).
• BK trees are also used for approximate string matching.
• The autocorrect features found in many software products can be implemented on top of this data structure.
INTRODUCTION

• For instance, when checking the word "ruk", the candidate corrections are {"truck", "buck", "duck", ...}. A spelling mistake can be corrected by deleting a character from the word, adding a new character to it, or replacing one of its characters with an appropriate one. We therefore use the edit distance to measure how closely the misspelled word matches each word in our dictionary.
EXAMPLE
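The three edit operations named above can be turned into a distance function. The following is a minimal sketch of the standard dynamic-programming Levenshtein distance; the function name `levenshtein` is an illustrative choice, not something taken from the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b; insert, delete and replace each cost 1."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace ca, or keep it
        prev = curr
    return prev[-1]


for candidate in ("truck", "buck", "duck"):
    print(candidate, levenshtein("ruk", candidate))  # each candidate is 2 edits away
```

All three candidates from the example are two edits away from "ruk", which is why a spell checker would offer them together.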

• CREATE
• SEARCH (SPELL-CHECK)
OPERATIONS

• Suppose the dictionary is {"BALL", "WALL", "TAIL"}.
• Each node of the BK tree holds one dictionary word, so the tree has exactly as many nodes as the dictionary has words.
• For this dictionary, n = 3 (three nodes). Each edge is labeled with the edit distance (Levenshtein distance d) between the two words it connects. The first word becomes the root; each following word is placed according to its Levenshtein distance d from the root:
• LevenshteinDistance(BALL, WALL) -> 1
• LevenshteinDistance(BALL, TAIL) -> 2
• So d is 1 between BALL and WALL, and d is 2 between BALL and TAIL.
• In the resulting tree, each node has at most one child for any given edit distance. If a new word "MALL" is added, it cannot become a child of the root, because the root already has a child with d = 1 ("WALL"). It is instead added under the node "WALL", since its distance from "WALL" is also d = 1.
CREATE
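The insertion rule described on this slide can be sketched as follows. This is a minimal illustration under the assumption that each node keeps its children in a dictionary keyed by edge distance; `BKNode` and `insert` are hypothetical names.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKNode:
    def __init__(self, word):
        self.word = word
        self.children = {}  # edge distance -> child BKNode


def insert(root, word):
    """Add word below root, keeping at most one child per edge distance."""
    node = root
    while True:
        d = levenshtein(word, node.word)
        if d == 0:
            return  # the word is already stored
        child = node.children.get(d)
        if child is None:
            node.children[d] = BKNode(word)  # free slot: attach here
            return
        node = child  # slot taken: descend and repeat from that child


root = BKNode("BALL")
for w in ("WALL", "TAIL", "MALL"):
    insert(root, w)

print(root.children[1].word)              # WALL
print(root.children[2].word)              # TAIL
print(root.children[1].children[1].word)  # MALL
```

"MALL" ends up under "WALL" because the root's d = 1 slot is already occupied, matching the description above.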


SEARCH
FINAL TREE

• The search for the nearest correct word, or for an approximate string match, is illustrated on an example BK tree built from the given dictionary.
SEARCH

• The simplest way to look for a word is to start at the root and descend through its children, keeping track of the smallest edit distance found, until the leaves are reached.
• To find corrections for a misspelled word, we first define a tolerance value. The tolerance value T is the highest edit distance allowed between the misspelled word and a correct word in our dictionary.
• Because the BK tree is organized by edit distance, the search only has to visit children whose edge distance lies in the range [d-T, d+T], where d is the distance from the misspelled word to the current node. This pruning is valid because the Levenshtein distance satisfies the triangle inequality.
• Suppose the misspelled word is "oop" and T is 2. We now see how the expected correct words are collected for this misspelled word.
SEARCH
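Before the walkthrough, the range rule can be sketched in isolation. The helper below is a hypothetical illustration: `edges_to_visit` and the sample edge labels are assumptions made for this sketch, not part of the slides.

```python
def edges_to_visit(children, d, tolerance):
    """Return the child edge labels that lie inside [d - tolerance, d + tolerance]."""
    low, high = d - tolerance, d + tolerance
    return [edge for edge in sorted(children) if low <= edge <= high]


# Hypothetical node whose children hang off edges labeled 1, 2, 4 and 6.
children = {1: "child_a", 2: "child_b", 4: "child_c", 6: "child_d"}

print(edges_to_visit(children, d=3, tolerance=2))  # [1, 2, 4]
print(edges_to_visit(children, d=1, tolerance=2))  # [1, 2]
```

With d = 3 and T = 2 the range is [1, 5], so the subtree behind edge 6 is skipped without computing any further distances.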

• Step 1: Start by computing the value d at the root node: d("oop", "help") = 3. Then iterate over the root's children whose edge distance lies in the range [d-T, d+T], i.e. [1, 5].
• Step 2: Begin with the child that has the highest edge distance, i.e. the node "loop" with d = 4, and compute its distance from the misspelled word: d("oop", "loop") = 1. Here d = 1, i.e. d <= T, so "loop" is added to the list of expected correct words, and we process its children whose edge distance lies in the range [d-T, d+T], i.e. [1, 3], since d-T = -1 is below the smallest possible edge distance of 1.
• Step 3: We are now at the node "troop". Compute its distance from the misspelled word: d("oop", "troop") = 2. Again d <= T, so "troop" is also added to the list of expected correct words.
• The same procedure is repeated for every word within the range [d-T, d+T], starting from the root and continuing down to the bottom-most leaf nodes.
• At the end, we are left with just 2 expected words for the misspelled word "oop", i.e. {"loop", "troop"}.
SEARCH
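The steps above can be sketched as a complete tolerance search. This is a minimal illustration, not the presenter's implementation: the class name `BKTree`, the tuple-based node layout and the query "BAL" are assumptions, and the dictionary is the one from the CREATE slide because the tree used in the walkthrough is not reproduced in the text.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKTree:
    def __init__(self, words):
        self.root = None  # each node is a pair: (word, {edge distance: child node})
        for w in words:
            self.add(w)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # duplicate word
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def search(self, query, tolerance):
        """Return every stored word within `tolerance` edits of `query`."""
        matches = []
        stack = [self.root] if self.root else []
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= tolerance:
                matches.append(word)
            # Only edges labeled within [d - T, d + T] can lead to a match.
            for edge, child in children.items():
                if d - tolerance <= edge <= d + tolerance:
                    stack.append(child)
        return matches


tree = BKTree(["BALL", "WALL", "TAIL", "MALL"])
print(sorted(tree.search("BAL", 1)))  # ['BALL']
print(sorted(tree.search("BAL", 2)))  # ['BALL', 'MALL', 'TAIL', 'WALL']
```

An explicit stack is used instead of recursion so that the order of visits is easy to follow; either form visits the same nodes.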

• The basic way to find the nearest match is to take every word in the dictionary and compare its edit distance d from the misspelled word with the tolerance value T. This takes a large amount of time, O(N1*M*N2), where N1 is the number of words in the dictionary, N2 is the length of the incorrect word and M is the mean length of the correct words.
• Using a BK tree reduces this cost in the following manner. Assume the tolerance limit T is 2. The depth of the BK tree is then approximately log N, where N is the number of elements. At every level we visit about 2 nodes of the BK tree and evaluate the edit distance at each of them. The time complexity is therefore estimated as O(N1*N2*log N), where N1 is now the mean length of the strings in our dictionary (not the word count used above) and N2 is the length of the incorrect word.
ANALYSIS
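One way to see the saving is to count edit-distance evaluations for both approaches. The sketch below does that on a small, made-up word list; `WORDS`, `build`, `linear_scan` and `bk_search` are hypothetical names, and the counts it prints depend entirely on this toy data.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Made-up dictionary for illustration only.
WORDS = ["help", "hell", "hello", "helps", "shell", "helper", "loop", "troop",
         "ball", "wall", "tail", "mall", "tall", "bell", "tool", "pool"]


def build(words):
    """Build a BK tree of (word, {edge distance: child}) pairs."""
    root = (words[0], {})
    for w in words[1:]:
        node = root
        while True:
            dist = levenshtein(w, node[0])
            if dist == 0:
                break
            if dist not in node[1]:
                node[1][dist] = (w, {})
                break
            node = node[1][dist]
    return root


def linear_scan(words, query, tol):
    """Brute force: one distance evaluation per dictionary word."""
    return sorted(w for w in words if levenshtein(query, w) <= tol), len(words)


def bk_search(root, query, tol):
    """BK-tree search that also counts distance evaluations."""
    found, evaluations, stack = [], 0, [root]
    while stack:
        word, children = stack.pop()
        dist = levenshtein(query, word)
        evaluations += 1
        if dist <= tol:
            found.append(word)
        stack.extend(c for e, c in children.items()
                     if dist - tol <= e <= dist + tol)
    return sorted(found), evaluations


root = build(WORDS)
for tol in (0, 1, 2):
    scan_hits, scan_cost = linear_scan(WORDS, "oop", tol)
    bk_hits, bk_cost = bk_search(root, "oop", tol)
    print(tol, bk_hits == scan_hits, bk_cost, "of", scan_cost, "evaluations")
```

Both methods return the same matches; the BK tree never evaluates more distances than the scan, and on larger dictionaries with small tolerances it typically evaluates far fewer.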

• The tree is N-ary and irregular (but generally well-balanced).
• Tests show that searching with a distance of 1 queries no more than 5-8% of the tree, and searching with two errors queries no more than 17-25% of the tree, a substantial improvement over checking every node!
• Note that exact searching can also be performed fairly efficiently by simply setting the tolerance T to 0.
ANALYSIS
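With a tolerance of 0 the range [d-T, d+T] collapses to the single edge d, so an exact lookup follows one path from the root. A minimal sketch of that special case, assuming the same (word, children) node layout as a plain tuple; `build` and `contains` are hypothetical names.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build(words):
    """Build a BK tree of (word, {edge distance: child}) pairs."""
    root = (words[0], {})
    for w in words[1:]:
        node = root
        while True:
            dist = levenshtein(w, node[0])
            if dist == 0:
                break
            if dist not in node[1]:
                node[1][dist] = (w, {})
                break
            node = node[1][dist]
    return root


def contains(root, word):
    """Exact search: with T = 0 only the edge labeled d can hold the word."""
    node = root
    while node is not None:
        dist = levenshtein(word, node[0])
        if dist == 0:
            return True
        node = node[1].get(dist)  # follow the single matching edge, if any
    return False


root = build(["BALL", "WALL", "TAIL", "MALL"])
print(contains(root, "MALL"))  # True
print(contains(root, "FALL"))  # False
```

The lookup computes one distance per level, so its cost is bounded by the depth of the tree rather than by the number of stored words.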

• BK trees are usually used in spell-checking applications such as dictionaries and text editors, where they help correct words that were typed with a wrong spelling. A spell checker is relatively simple and has three main parts: first, it checks whether the word exists in the dictionary; second, it finds the possible fixes for a misspelled word; and last, it orders the suggestions using some sort of heuristic.
• Done naively, finding the fixes takes linear time, because every word in the dictionary is scanned and its edit distance calculated. A BK tree is a very useful data structure for building a dictionary of similar words, and it can also be used to guess the intended word, such as "cat" when "cta" was typed. It is built from the dictionary words: the first word acts as the root node, and subsequent words are attached according to their Levenshtein distance.
• BK trees are also used in string-matching applications and in software where autocorrecting words is a required feature. They are widely applicable in the search engines of many websites, where they correct spelling for inexperienced users.
• They can serve as a basis for future search engines and correction software.
APPLICATION
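The three parts of a spell checker listed above can be sketched as one small function. This is an illustrative assumption rather than a prescribed design: `suggest`, its parameters and the word set are made up, the ranking heuristic is simply "smallest edit distance first", and a linear scan stands in for the BK-tree search to keep the example short.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, tolerance=2, limit=3):
    """Hypothetical three-step spell check: lookup, candidates, ranking."""
    # Step 1: a word that is already in the dictionary needs no correction.
    if word in dictionary:
        return [word]
    # Step 2: collect candidate fixes within the tolerance. A linear scan
    # stands in here for the BK-tree search.
    candidates = [(levenshtein(word, w), w) for w in dictionary]
    candidates = [c for c in candidates if c[0] <= tolerance]
    # Step 3: rank by edit distance, breaking ties alphabetically.
    return [w for _, w in sorted(candidates)[:limit]]


words = {"cat", "car", "dog", "cart", "coat"}
print(suggest("cat", words))
print(suggest("cta", words))
```

Plain Levenshtein distance counts the swapped letters in "cta" as two edits, so "cat" only appears among the suggestions when the tolerance is at least 2.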


THANKS

Spell Checker and string matching Using BK tree

  • 1. Shridhar Shah Department of Computer Science and Technology Institute of technology, Nirma University Ahmedabad, India [email protected]
  • 2.   BK Tree or Burkhard Keller Tree is a data structure that is used to perform spell check based on Edit Distance (Levenshtein distance) concept.  BK trees are also used for approximate string matching.  Various auto correct feature in many soft wares can be implemented based on this data structure. INTRODUCTION
  • 3.   For instance if we are checking a word “ruk” we will have {“truck”,”buck”,”duck”,……}. Therefore, spelling mistake can be corrected by deleting a character from the word or adding a new character in the word or by replacing the character in the word by some appropriate one. Therefore, we will be using the edit distance as a measure for correctness and matching of the misspelled word from the words in our dictionary. EXAMPLE
  • 5.   Suppose we have the dictionary data as {“BALL”,”WALL”,”TAIL”}  The nodes in the BK-Tree will show the elements in our dictionary and there will be exactly the same number of elements as the number of words in our dictionary given.  Here for given dictionary, it is n=3(three nodes).The edges between nodes show the edit distance(Levenshtein Distance d).The first element is the root, then we take the Levenshtein Distance d from the root and add the next elements on the tree like this:  LevenshteinDistance(BALL, WALL) -> 1  LevenshteinDistance(BALL, TAIL) -> 2  The value of d between BALL and WALL is 1 and d is 2 for BALL and TAIL.  Here the tree is created and each node will have only one child with same edit distance. If a new word "MALL" is added then it cannot be added as a child to the root as it has already had child with d=1.It is added to the node "WALL" as it has d=1. CREATE
  • 9. SEARCH: To find the nearest correct word, or the closest string match, we search an example BK tree built from a given dictionary. [Figure: example BK tree]
  • 10. SEARCH: The simple way to look up a word is to start at the root and descend through the tree, keeping track of the minimum edit distance, until the leaves are reached. To find corrections for a misspelled word we first define a tolerance value. The tolerance T is the largest edit distance we accept between the misspelled word and a correct word in the dictionary. Because the BK tree is organised by edit distance, the search only needs to follow children whose edge label lies in the range [d-T, d+T], where d is the distance between the misspelled word and the current node. This pruning is safe because Levenshtein distance satisfies the triangle inequality. Suppose the misspelled word is "oop" and T is 2. The next slide shows how the expected correct words are collected for it.
  • 11. SEARCH: Step 1: Start by computing d at the root node: d("oop", "help") = 3. We then iterate over the root's children whose edge label lies in the range [d-T, d+T], i.e. [1, 5]. Step 2: Begin with the child on the highest edge, the node "loop", whose edge the slide labels d = 4 (the Levenshtein distance between "help" and "loop" computes to 3, which also lies in [1, 5]). Compute its distance from the misspelled word: d("oop", "loop") = 1. Here d = 1, i.e. d <= T, so "loop" is added to the list of expected correct words, and we go on to process its children whose edge label lies in [d-T, d+T], i.e. [1, 3] once negative values are dropped. Step 3: We are now at the node "troop". Compute its distance from the misspelled word: d("oop", "troop") = 2. Again d <= T, so "troop" is also added to the list. The same procedure is repeated for every node reachable through edges in the range [d-T, d+T], from the root down to the deepest leaf. At the end we are left with just two expected words for the misspelled word "oop": {"loop", "troop"}.
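The search steps above can be sketched as below. This is an illustration under stated assumptions rather than the deck's implementation: only "help", "loop" and "troop" appear in the slide text, so the remaining dictionary words are hypothetical fillers, and the names `BKNode`, `insert` and `search` are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BKNode:
    def __init__(self, word: str):
        self.word = word
        self.children = {}  # edit distance -> BKNode


def insert(root: BKNode, word: str) -> None:
    node = root
    while True:
        d = levenshtein(node.word, word)
        if d == 0:
            return
        if d not in node.children:
            node.children[d] = BKNode(word)
            return
        node = node.children[d]


def search(root: BKNode, query: str, tolerance: int) -> list:
    """Return (distance, word) pairs with distance <= tolerance."""
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        d = levenshtein(node.word, query)
        if d <= tolerance:
            matches.append((d, node.word))
        # Only subtrees on edges in [d - T, d + T] can contain a match.
        for edge, child in node.children.items():
            if d - tolerance <= edge <= d + tolerance:
                stack.append(child)
    return sorted(matches)


# Hypothetical dictionary; "help" is inserted first so it becomes the root.
words = ["help", "hell", "hello", "loop", "troop", "shell", "helps"]
root = BKNode(words[0])
for w in words[1:]:
    insert(root, w)

print(search(root, "oop", 2))
```

With this dictionary the query "oop" and T = 2 yields "loop" at distance 1 and "troop" at distance 2, matching the result stated on the slide. An explicit stack is used instead of recursion so that deep trees cannot hit Python's recursion limit.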
  • 12. ANALYSIS: The basic way to find the nearest match is to take every word in the dictionary and compare its edit distance d with the tolerance value T. This takes a large amount of time, O(N1*M*N2), where N1 is the number of words, N2 is the length of the misspelled word and M is the mean length of the correctly spelled words. A BK tree reduces this cost as follows, assuming a tolerance T of 2. The depth of the BK tree is approximately log N, where N is the number of elements. If about 2 nodes are visited at every level, each requiring one edit-distance evaluation, the time complexity becomes O(N1*N2*log N), where N1 now denotes the mean length of the strings in the dictionary and N2 is the length of the misspelled word.
  • 13. ANALYSIS: The tree is N-ary and irregular (but generally well-balanced). Tests show that searching with a distance of 1 queries no more than 5-8% of the tree, and searching with two errors queries no more than 17-25% of the tree, a substantial improvement over checking every node. Note that exact searching can also be performed fairly efficiently by simply setting the tolerance T to 0.
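One way to observe the pruning discussed in the analysis is to count how many nodes a query visits and compare that with a full scan. The sketch below does this on a small hypothetical word list, so it will not reproduce the percentages quoted above; it only shows how such a measurement could be made. All names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build(words):
    """Build a BK tree as nested (word, {distance: subtree}) tuples."""
    it = iter(words)
    root = (next(it), {})
    for w in it:
        node = root
        while True:
            d = levenshtein(node[0], w)
            if d == 0:
                break  # duplicate word
            if d not in node[1]:
                node[1][d] = (w, {})
                break
            node = node[1][d]
    return root


def search(root, query, tol):
    """Return (sorted matches, number of nodes visited)."""
    hits, visited, stack = [], 0, [root]
    while stack:
        word, children = stack.pop()
        visited += 1  # one edit-distance evaluation per visited node
        d = levenshtein(word, query)
        if d <= tol:
            hits.append(word)
        stack.extend(c for e, c in children.items() if d - tol <= e <= d + tol)
    return sorted(hits), visited


# Hypothetical dictionary, for illustration only.
words = ["book", "books", "cake", "boo", "cape", "boon",
         "cook", "cart", "ball", "wall", "tail", "mall"]
tree = build(words)

for tol in (0, 1, 2):
    hits, visited = search(tree, "bool", tol)
    print(f"T={tol}: visited {visited} of {len(words)} nodes, matches={hits}")
```

A linear scan always evaluates the distance for every word, whereas the tree search evaluates it only for the nodes it visits; with T = 0 the same routine performs an exact lookup.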
  • 14. APPLICATION: BK trees are typically used in spell-checking applications such as dictionaries and text editors, where they help correct a misspelled word. Spell checking is relatively simple and has three main parts: first, check whether the word exists in the dictionary; second, find possible fixes for the misspelled word; third, order the suggestions using some heuristic. A naive approach takes linear time, scanning every word in the dictionary and calculating its edit distance. A BK tree is a very effective data structure for building a dictionary of similar words, and it can be used to guess the intended word, for example "cat" when we typed "cta". It is built from the dictionary words: the first word acts as the root node, and subsequent words are attached according to their Levenshtein distance. BK trees are also used in string-matching applications and in software where correction features are a prerequisite for auto-correcting words. They are widely applicable in website search engines to correct spelling for naïve users. They can also serve as a basis for future search engines and correction software.