Rabin-Karp Algorithm

The Rabin-Karp string matching algorithm represents strings as numbers to enable faster comparisons. It computes hash values for the pattern and substrings of the text, only comparing strings character-by-character if the hashes match. Hashes may collide, resulting in spurious hits that require validation through direct comparison. The algorithm runs in linear time on average but quadratic time in the worst case.

Uploaded by

krishna16kkumawat

Rabin-Karp String Matching Algorithm

 Comparing numbers is easier and cheaper than comparing strings. The Rabin-Karp algorithm
represents strings as numbers.

 Suppose p denotes the numeric value of the pattern P[1…m] of length m, and $t_s$ denotes the
value of the m-length substring T[(s + 1) … (s + m)] for s = 0, 1, 2, …, n – m.

 We can compute p in O(m) time, and once $t_0$ is known all the remaining values $t_s$ can be computed in O(n – m + 1) total time.

 The Rabin-Karp algorithm is based on hashing. It first computes the hash values of p
and $t_s$.

 If the hash values are the same, i.e. if hash(p) = hash($t_s$), we compare the underlying strings
character by character, as in the naïve method. If the hash values differ, there is no need to compare the actual strings.

 On a hash match, the actual characters of both strings are compared by brute force.
If the strings are equal, it is called a hit. Otherwise, it is called a spurious hit.

 For example, suppose the hash value of the string T = ABCDE is 38 and the hash of the string P =
ABCDX is 71. The hash values differ, so the strings cannot be the same. The brute-force
approach needs five character comparisons to discover this, whereas Rabin-Karp needs only one comparison of hash values.

 However, equal hash values do not guarantee that the strings match. Two different strings can
have the same hash value. That is why we need to compare them character by character on
a hash hit.

 Given the pattern P[1…m], we can derive its numeric value p in base d in O(m) time as follows (Horner's rule):

$$p = P[m] + d\,(P[m-1] + d\,(P[m-2] + \cdots + d\,(P[2] + d\,P[1]) \cdots))$$
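
The nested formula above can be evaluated with a single left-to-right pass. The following is a minimal sketch, assuming the pattern is given as a list of digit values; the function name `numeric_value` is hypothetical and not part of the original notes.

```python
def numeric_value(digits, d=10):
    """Evaluate a sequence of digits as a base-d number using Horner's rule.

    `digits` is ordered from most significant (P[1]) to least significant (P[m]).
    """
    value = 0
    for digit in digits:
        # Shift the accumulated value one position left, then add the next digit.
        value = d * value + digit
    return value


print(numeric_value([1, 5, 6]))  # 156
print(numeric_value([4, 3, 1]))  # 431
```

The loop performs one multiplication and one addition per digit, which is where the O(m) bound comes from.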

 Similarly, we can derive the numeric value of the first substring $t_0$ of length m from the text T[1…n] in
O(m) time. Each of the remaining values $t_i$, i = 1, 2, 3, …, n – m, can then be derived in constant time.

 Given $t_s$, we can compute $t_{s+1}$ as

$$t_{s+1} = d\,(t_s - d^{m-1}\,T[s+1]) + T[s+m+1]$$

 Assume that T = [4, 3, 1, 5, 6, 7, 5, 9, 3] and P = [1, 5, 6]. The length of P is 3, so m = 3.

For the given pattern P, the value is p = 156, and $t_0 = 431$. With d = 10,

$$\begin{aligned}
t_1 &= 10\,(431 - 10^2\,T[1]) + T[4] \\
    &= 10\,(431 - 400) + 5 \\
    &= 315
\end{aligned}$$
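
The update rule can be applied repeatedly to list every window value of the text. This is a small sketch under the assumption that the text is a list of digits and that no modulus is taken yet; `window_values` is an illustrative name, and indices are 0-based, so `text[s]` plays the role of T[s + 1].

```python
def window_values(text, m, d=10):
    """Yield t_0, t_1, ..., t_{n-m} for a list of digits (no modulo applied)."""
    n = len(text)
    if m == 0 or m > n:
        return
    high = d ** (m - 1)  # weight of the leading digit of a window

    # t_0 is built from scratch in O(m) time.
    t = 0
    for digit in text[:m]:
        t = d * t + digit
    yield t

    # Every later value is derived from its predecessor in constant time:
    # drop the leading digit text[s], shift left, append text[s + m].
    for s in range(n - m):
        t = d * (t - high * text[s]) + text[s + m]
        yield t


T = [4, 3, 1, 5, 6, 7, 5, 9, 3]
print(list(window_values(T, 3)))
# [431, 315, 156, 567, 675, 759, 593]
```

The third value, 156, equals p, which is the occurrence of P = [1, 5, 6] in T.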

 The values of p and $t_s$ may be too large to process conveniently. We can reduce them by working
modulo a suitable number q. Typically, q is a prime number.

 The mod function has some useful mathematical properties.

 [(a mod k) + (b mod k)] mod k = (a + b) mod k

 (a mod k) mod k = a mod k

 Computing the hash value of every m-character substring of the text from scratch may turn out to be
time-consuming. However, a bit of mathematics makes it easier.

 Suppose $t_s$ represents the decimal value of the substring T[(s + 1) … (s + m)]. If the hash of $t_s$ is known, the hash value for $t_{s+1}$ can be derived directly as follows:

$$\text{Hash for } t_{s+1} = (d\,(t_s - T[s+1]\,h) + T[s+m+1]) \bmod q,$$

where $h = d^{m-1} \bmod q$.
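
As an illustration of this recurrence, the sketch below keeps every intermediate value reduced modulo q. It assumes a list of digits as input and uses q = 13 purely as an example modulus; `rolling_hashes` is a hypothetical helper name.

```python
def rolling_hashes(text, m, q, d=10):
    """Return the hashes of all m-digit windows of `text`, each reduced mod q."""
    n = len(text)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # h = d^(m-1) mod q, computed once

    # Hash of the first window, reducing after every step to keep numbers small.
    t = 0
    for digit in text[:m]:
        t = (d * t + digit) % q
    hashes = [t]

    for s in range(n - m):
        # Python's % always returns a value in [0, q), even when the
        # intermediate difference is negative.
        t = (d * (t - text[s] * h) + text[s + m]) % q
        hashes.append(t)
    return hashes


T = [4, 3, 1, 5, 6, 7, 5, 9, 3]
print(rolling_hashes(T, 3, 13))
# [2, 3, 0, 8, 12, 5, 8]
```

Each entry equals the corresponding window value modulo 13 (for instance 431 mod 13 = 2), but no number larger than roughly d·q is ever formed.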

 Two facts:

$t_s \equiv p \pmod q$ does not mean $t_s = p$

$t_s \not\equiv p \pmod q$ means $t_s \ne p$

 If p = 45365, $t_s$ = 64371 and q = 13,

45365 mod 13 = 8

64371 mod 13 = 8

 From the above example, it is clear that two different strings can have the same value modulo q. Every
shift whose hash differs from that of the pattern can be rejected immediately as invalid, and every
shift whose hash matches must be validated by direct comparison to rule out spurious hits.

 Spurious hits become less frequent as q grows and more frequent as q shrinks.
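
The effect of q can be observed by counting spurious hits for a few moduli. The sketch below is only an illustration: it hashes each window directly with `int(...) % q` rather than with the rolling update, borrows the text and pattern from the worked example in these notes, and the function name and the particular moduli tried are assumptions.

```python
def spurious_hit_count(text, pattern, q):
    """Count windows whose hash equals the pattern's hash but whose digits differ."""
    m = len(pattern)
    p_hash = int(pattern) % q
    count = 0
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if int(window) % q == p_hash and window != pattern:
            count += 1
    return count


text, pattern = "2359023141526739921", "31415"
for q in (3, 13, 100003):
    print(q, spurious_hit_count(text, pattern, q))
```

With q = 100003 the count is necessarily 0, because every 5-digit window value is smaller than q and so distinct windows cannot collide.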

Complexity Analysis:

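The Rabin-Karp algorithm is usually analysed as a randomized algorithm, with the prime q chosen at random. In most cases it runs in linear
time, i.e. in O(n). However, the worst case of the Rabin-Karp algorithm is as bad as the naïve algorithm, i.e. O(mn), because
every shift whose hash matches must be verified character by character. This is rare in practice.

The worst case arises when most shifts produce a hash match: either the text contains many genuine occurrences of the pattern, or the prime number used for hashing is so small that spurious hits are frequent.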
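
One input that forces the O(mn) behaviour is a text made of a single repeated digit, searched for a pattern of the same digit: every shift is a hash match and needs m character comparisons. The sketch below counts those verification comparisons. It is a simplified illustration that hashes windows directly instead of rolling, and `verification_comparisons` is a hypothetical name.

```python
def verification_comparisons(text, pattern, q):
    """Count character comparisons spent verifying shifts whose hash matches.

    Hashes are computed directly with int(...) % q for brevity; a real
    matcher would use the rolling update instead.
    """
    m = len(pattern)
    p_hash = int(pattern) % q
    comparisons = 0
    for s in range(len(text) - m + 1):
        if int(text[s:s + m]) % q != p_hash:
            continue  # hash mismatch: shift rejected without reading characters
        for i in range(m):
            comparisons += 1
            if text[s + i] != pattern[i]:
                break  # spurious hit detected, stop early
    return comparisons


print(verification_comparisons("1" * 20, "1" * 5, 13))               # 80
print(verification_comparisons("2359023141526739921", "31415", 13))  # 6
```

In the first call all 16 shifts are genuine matches, so the cost is 16 × 5 = 80 comparisons, i.e. (n – m + 1)·m. In the second call only two shifts pass the hash test, which is the typical case.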

Example: Explain spurious hits in Rabin-Karp string matching algorithm with an example. Working
modulo q = 13, how many spurious hits does the Rabin-Karp matcher encounter in the text T =
2359023141526739921 when looking for the pattern P = 31415 ?

Solution:

Given pattern P = 31415, and prime number q = 13

P mod q = 31415 mod 13 = 7

Let us find the hash values of the windows of the given text:

T[1…5] = 23590 mod 13 = 8

T[2…6] = 35902 mod 13 = 9

…

T[(n – 4) … n] = 39921 mod 13 = 11

 The hash value of the pattern P is 7, and two windows of T have this hash value: 31415 and 67399
(67399 mod 13 = 7). Each of them is either an actual match or a spurious hit. Here 31415 is an actual
match and 67399 is a spurious hit, so the matcher encounters one spurious hit.

 If the hash of the pattern and the hash of a substring of the text are the same, there are two possibilities:
(i) Hit: the pattern and the substring are the same

(ii) Spurious hit: the hash values are the same but the pattern and the substring differ
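
The distinction between hits and spurious hits can be checked mechanically. What follows is a minimal sketch of a Rabin-Karp matcher for strings of decimal digits, not a definitive implementation; the function name `rabin_karp` and its return format are assumptions, and shifts are reported 0-based.

```python
def rabin_karp(text, pattern, q, d=10):
    """Return (valid_shifts, spurious_shifts) for strings of decimal digits."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)  # d^(m-1) mod q

    # Hash the pattern and the first window with Horner's rule mod q.
    p = t = 0
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q

    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:
            # Hash match: verify by direct comparison.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:
            # Roll the hash: remove text[s], append text[s + m].
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


valid, spurious = rabin_karp("2359023141526739921", "31415", 13)
print(valid, spurious)  # [6] [12]
```

Shift 6 is the window 31415 (a hit) and shift 12 is the window 67399 (a spurious hit), so the matcher meets exactly one spurious hit for q = 13.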

Here, m = length of pattern = 5.

Consider $t_s = 23590$. The next value $t_{s+1}$ is derived as

$$\begin{aligned}
t_{s+1} &= 10\,t_s - 10^{m}\,T[s+1] + T[s+m+1] \\
        &= (10 \times 23590) - 10^5 \times 2 + 2 \\
        &= 235900 - 200000 + 2 = 35902
\end{aligned}$$

$$\begin{aligned}
t_{s+2} &= 10\,t_{s+1} - 10^{m}\,T[s+2] + T[s+m+2] \\
        &= (10 \times 35902) - 10^5 \times 3 + 3 \\
        &= 359020 - 300000 + 3 = 59023
\end{aligned}$$

 In the same way, each subsequent value $t_{s+i}$ can be computed incrementally.

 The Rabin-Karp algorithm compares hash values rather than directly comparing the actual strings.
If the hash value of the pattern P and the hash value of a substring of T are the same, the actual
strings are compared by brute force. Like $t_{s+1}$, the hash
value can also be derived incrementally, as shown below.

Calculation for the hash of 14152

If $t_s$ is known, the hash value for $t_{s+1}$ can be derived directly as follows.

$$\begin{aligned}
\text{Hash for } t_{s+1} &= (d\,(t_s - T[s+1]\,h) + T[s+m+1]) \bmod q \\
&= (10\,(31415 - (3 \cdot 10^4 \bmod 13)) + 2) \bmod 13 \\
&= (10\,(31415 - 9) + 2) \bmod 13 \\
&= 314062 \bmod 13 = 8
\end{aligned}$$
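
The same result is obtained when the previous window is represented only by its hash, 7, instead of its full value 31415, which is the point of working modulo q. A short check, with variable names chosen here purely for illustration:

```python
d, q, m = 10, 13, 5

h = pow(d, m - 1, q)   # 10^4 mod 13
prev_hash = 31415 % q  # hash of the window 31415

# Remove the leading digit 3, shift left, append the new digit 2.
next_hash = (d * (prev_hash - 3 * h) + 2) % q

print(h, prev_hash, next_hash)  # 3 7 8
print(14152 % q)                # 8
```

The intermediate value here is 10·(7 – 9) + 2 = –18, and –18 mod 13 = 8, which agrees with hashing 14152 directly.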
