BNP Unit-5 Lecture 19

The document discusses string matching algorithms, focusing on the Naive String Matching and Rabin-Karp algorithms. It explains the process of finding a substring within a larger string and provides details on the complexity and implementation of these algorithms. Additionally, it highlights the performance of the Rabin-Karp algorithm in terms of preprocessing and average-case running time.

Uploaded by

aniketpsingh2004

Design & Analysis of Algorithms

(KCS-503)
Unit-5
String matching
Course Outline:-
⮚ String Matching
⮚ Naive String Matching
⮚ Rabin-Karp-String Matching
String Matching

A string matching algorithm is also called a "string searching algorithm." This is a vital
class of string algorithms, described as "a method to find a place where one or
several strings are found within a larger string."
• Given a text array T[1..n] of n characters and a pattern array P[1..m] of m
characters, the problem is to find an integer s, called a valid shift, where 0 ≤ s ≤ n-m
and T[s+1..s+m] = P[1..m].
• In other words, we want to find whether P occurs in T, i.e., whether P is a substring of T.
The characters of P and T are drawn from some finite alphabet such as {0, 1} or
{A, B, ..., Z, a, b, ..., z}. Given a string T[1..n], its substrings are written T[i..j] for some
1 ≤ i ≤ j ≤ n: the string formed by the characters of T from index i to index j,
inclusive. This means that a string is a substring of itself (take i = 1 and j = n).
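
The definition of a valid shift can be sketched in Python as follows. This is only an illustration: Python strings are 0-based, so the 1-based window T[s+1..s+m] is assumed to correspond to the slice text[s:s+m], and the helper name is_valid_shift is hypothetical, not part of the lecture.

```python
def is_valid_shift(text: str, pattern: str, s: int) -> bool:
    """Return True if pattern occurs in text with shift s.

    The lecture's 1-based window T[s+1..s+m] corresponds to the
    0-based Python slice text[s:s+m].
    """
    n, m = len(text), len(pattern)
    # A shift is only meaningful when the whole pattern fits: 0 <= s <= n - m.
    return 0 <= s <= n - m and text[s:s + m] == pattern


if __name__ == "__main__":
    # The pattern "abaa" lines up with the text starting at 0-based index 3.
    print(is_valid_shift("abcabaabcabac", "abaa", 3))
    print(is_valid_shift("abcabaabcabac", "abaa", 0))
```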
Algorithms used for String Matching

Several methods are used for string matching:

• The Naive String Matching Algorithm
• The Rabin-Karp-Algorithm
• Finite Automata
• The Knuth-Morris-Pratt Algorithm
The Naive String Matching Algorithm
The naïve approach tests every possible placement of the pattern P[1..m] relative
to the text T[1..n]. We try each shift s = 0, 1, ..., n-m in turn and, for each shift s,
compare T[s+1..s+m] to P[1..m].
The naïve algorithm finds all valid shifts using a loop that checks the condition
P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
1. n ← length[T]
2. m ← length[P]
3. for s ← 0 to n - m
4.     do if P[1..m] = T[s+1..s+m]
5.         then print "Pattern occurs with shift" s
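
A minimal Python sketch of the naive matcher is given here for illustration. It assumes 0-based strings (so T[s+1..s+m] becomes text[s:s+m]) and returns the valid shifts as a list instead of printing them; the function name naive_string_matcher is illustrative.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible shifts in turn.
    # If the pattern is longer than the text, the range is empty.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    for s in naive_string_matcher("abcabaabcabac", "abaa"):
        print("Pattern occurs with shift", s)
```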
Analysis: The for loop on lines 3 to 5 executes n-m+1 times (we need at least
m characters at the end), and each iteration performs up to m character
comparisons. So the total running time is O((n-m+1)m).

Show the comparisons the naive string matcher makes for the
pattern P = 0001 in the text T = 000010001010001.
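
One way to explore this exercise is to instrument the naive matcher so that it counts character comparisons. The sketch below assumes that the test on line 4 compares characters left to right and stops at the first mismatch; the pseudocode does not fix this detail, and the name count_comparisons is illustrative.

```python
def count_comparisons(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (valid shifts, number of character comparisons made)."""
    n, m = len(text), len(pattern)
    shifts, comparisons = [], 0
    for s in range(n - m + 1):
        j = 0
        # Compare left to right; stop at the first mismatch (an assumption).
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:  # all m characters matched
            shifts.append(s)
    return shifts, comparisons


if __name__ == "__main__":
    shifts, total = count_comparisons("000010001010001", "0001")
    print("valid shifts:", shifts)
    print("character comparisons:", total)
```

For a text and pattern consisting of a single repeated character, every shift needs all m comparisons, which is the (n-m+1)m worst case from the analysis.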
The Rabin-Karp-Algorithm
Rabin and Karp have proposed a string-matching algorithm that performs well in
practice and that also generalizes to other algorithms for related problems, such as
two-dimensional pattern matching. The Rabin-Karp algorithm uses Θ(m)
preprocessing time, and its worst-case running time is Θ((n - m +1)m). Based on
certain assumptions, however, its average-case running time is better.
Given a pattern P[1..m], we let p denote its corresponding decimal value, treating
each character as a decimal digit. In a similar manner, given a text T[1..n], we let
t_s denote the decimal value of the length-m substring T[s+1..s+m], for
s = 0, 1, ..., n-m. Certainly, t_s = p if and only if T[s+1..s+m] = P[1..m]; thus, s is a
valid shift if and only if t_s = p. If we could compute p in time Θ(m) and all the t_s
values in a total of Θ(n-m+1) time, then we could determine all valid shifts s in
time Θ(m) + Θ(n-m+1) = Θ(n) by comparing p with each of the t_s values.
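
The idea can be sketched in Python for a text of decimal digits. The constant-time update used here, t_(s+1) = 10(t_s - 10^(m-1) T[s+1]) + T[s+m+1], is the standard decimal recurrence and is an addition to what the slide states. Python integers are unbounded, so no modulus is needed in this sketch; the function name decimal_window_values is illustrative.

```python
def decimal_window_values(text: str, m: int) -> list[int]:
    """Return t_s for every shift s, for a text made of decimal digits."""
    n = len(text)
    if m <= 0 or m > n:
        return []
    high = 10 ** (m - 1)      # value of a "1" in the high-order position
    t = int(text[:m])         # t_0, computed directly in Theta(m) time
    values = [t]
    for s in range(n - m):
        # Drop the high-order digit, shift left one place, append the next digit.
        t = 10 * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


if __name__ == "__main__":
    text, pattern = "3141592653589793", "26"
    p = int(pattern)
    values = decimal_window_values(text, len(pattern))
    print([s for s, t in enumerate(values) if t == p])
```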
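The values p and t_s may be too large to work with conveniently, so we compute them
modulo a suitable modulus q. In general, with a d-ary alphabet {0, 1, ..., d-1}, we
choose q so that dq fits within a computer word and adjust the constant-time
recurrence for the next window value, t_(s+1) = d(t_s - d^(m-1) T[s+1]) + T[s+m+1],
to work modulo q, so that it becomes
t_(s+1) = (d(t_s - T[s+1]h) + T[s+m+1]) mod q,
where h ≡ d^(m-1) (mod q) is the value of the digit "1" in the high-order position of an
m-digit text window.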
Any shift s for which t_s ≡ p (mod q) must be tested further to see if s is really valid or
we just have a spurious hit. This testing can be done by explicitly checking the
condition P[1..m] = T[s+1..s+m]. If q is large enough, then we can hope that
spurious hits occur infrequently enough that the cost of the extra checking is low.
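
As a rough illustration of the recurrence modulo q, the sketch below computes t_s mod q for every shift of a digit sequence and can be checked against direct computation of each window. The function name rolling_hashes and the sample values (d = 10, q = 11) are assumptions made for the example.

```python
def rolling_hashes(digits: list[int], m: int, d: int, q: int) -> list[int]:
    """Return t_s mod q for every shift s, using the rolling recurrence."""
    n = len(digits)
    if m < 1 or m > n:
        return []
    h = pow(d, m - 1, q)          # h = d^(m-1) mod q
    t = 0
    for i in range(m):            # Horner's rule for the first window
        t = (d * t + digits[i]) % q
    hashes = [t]
    for s in range(n - m):
        # Python's % returns a non-negative result for q > 0,
        # so the subtraction cannot leave t negative.
        t = (d * (t - digits[s] * h) + digits[s + m]) % q
        hashes.append(t)
    return hashes


if __name__ == "__main__":
    text = "31415926"
    digits = [int(c) for c in text]
    rolled = rolling_hashes(digits, 2, 10, 11)
    direct = [int(text[s:s + 2]) % 11 for s in range(len(text) - 1)]
    print(rolled == direct)
```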
RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length[T]
2. m ← length[P]
3. h ← d^(m-1) mod q
4. p ← 0
5. t_0 ← 0
6. for i ← 1 to m
7.     do p ← (d·p + P[i]) mod q
8.        t_0 ← (d·t_0 + T[i]) mod q
9. for s ← 0 to n - m
10.     do if p = t_s
11.         then if P[1..m] = T[s+1..s+m]
12.             then print "Pattern occurs with shift" s
13.        if s < n - m
14.         then t_(s+1) ← (d(t_s - T[s+1]h) + T[s+m+1]) mod q
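
The pseudocode above might be translated into Python roughly as follows. This is a sketch under stated assumptions: strings are 0-based, characters are mapped to numbers with ord, and the default radix d = 256 and modulus q = 1_000_000_007 are illustrative choices that the lecture does not specify.

```python
def rabin_karp_matcher(text: str, pattern: str,
                       d: int = 256, q: int = 1_000_000_007) -> list[int]:
    """Return all valid shifts of pattern in text (0-based)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                     # line 3: d^(m-1) mod q
    p = t = 0
    for i in range(m):                       # lines 6-8: preprocessing
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):               # lines 9-14: matching
        # A hash match is only a candidate; verify it character by character.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: remove text[s], append text[s + m].
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


if __name__ == "__main__":
    for s in rabin_karp_matcher("abcabaabcabac", "abaa"):
        print("Pattern occurs with shift", s)
```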
The running time of RABIN-KARP-MATCHER is Θ((n-m+1)m) in the worst case,
since (like the naive string-matching algorithm) the Rabin-Karp algorithm
explicitly verifies every valid shift. If P = a^m and T = a^n, then the
verifications take time Θ((n-m+1)m), since each of the n-m+1 possible
shifts is valid. (Note also that the computation of d^(m-1) mod q on line 3
and the loop on lines 6-8 take time O(m) = O((n-m+1)m).)
Numerical
Working modulo q = 11, how many spurious hits
does the Rabin-Karp matcher encounter in the
text T = 3141592653589793 when looking for the
pattern P = 26?
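
A hand calculation for this exercise can be checked with a short script. The sketch below assumes a text and pattern of decimal digits (d = 10) and separates hash matches into valid shifts and spurious hits; the function name count_hits is illustrative.

```python
def count_hits(text: str, pattern: str, q: int,
               d: int = 10) -> tuple[list[int], list[int]]:
    """Return (valid shifts, spurious-hit shifts) for digit strings."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:
            # Hash values agree; only an explicit comparison tells
            # a real match from a spurious hit.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


if __name__ == "__main__":
    valid, spurious = count_hits("3141592653589793", "26", q=11)
    print("valid shifts:", valid)
    print("spurious hits at shifts:", spurious)
```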
The End

B N Pandey 7/5/2020
