
4th_Sem_DAA_Module_4

The document discusses various string matching algorithms, including the Naive pattern searching, Rabin-Karp, and Knuth-Morris-Pratt (KMP) algorithms, detailing their methodologies and complexities. It also briefly covers problems like the N-Queen problem, Hamiltonian Circuit problem, and Subset Sum problem, explaining their approaches and solutions. Each algorithm and problem is illustrated with examples to clarify their workings and applications.

Uploaded by

Subhransu Behera
Copyright
© All Rights Reserved

Module – IV :

String Matching Algorithms :


What is string matching?
Ans: String matching means finding all occurrences of a pattern in a given text (or body of text).

• Naive pattern searching is the simplest of the pattern searching algorithms. It aligns the pattern at every position of the main string and compares the characters one by one.

• The naive algorithm is an exact string matching algorithm, meaning that it finds one or all exact occurrences of a pattern in a text.
• This algorithm is helpful for smaller texts. It needs no pre-processing phase: the pattern is found in a single left-to-right pass over the possible positions, and no extra space is required.

• The naive approach tests all possible placements of the pattern P[1…m] relative to the text T[1…n]. It tries each shift s = 0, 1, …, n-m in turn and, for each shift s, compares T[s+1…s+m] with P[1…m]. It returns all the valid shifts found.

NAIVE-STRING-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n - m
4.     do if P [1…m] = T [s + 1…s + m]
5.         then print "Pattern occurs with shift" s

Analysis: The for loop on lines 3 to 5 executes n-m+1 times (the last shift must still leave m characters of text), and each iteration performs up to m character comparisons. So the total complexity is O((n-m+1)·m), which is O(nm) in the worst case.

• The test on line 4 determines whether the current shift is valid; it involves an implicit loop that checks corresponding character positions until all m positions match or a mismatch is found.

• Line 5 prints out each valid shift s.

Working of Naive String Matching

The naive-string-matching procedure can be interpreted graphically as sliding a “template” containing the pattern over the text, noting for which shifts all of the characters on the template equal the corresponding characters in the text.

Example 1:

Example 2:

Input: txt[] = "THIS IS STRING MATCHING ALGORITHM"
pat[] = "STRING"
Output: Pattern found at position 8 (counting positions from 0)

Example 3:
Input:
Main String: “ABAAABCDBBABCDDEBCABC”
pattern: “ABC”
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18
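
The NAIVE-STRING-MATCHER procedure can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `naive_search` is an assumption made here, and shifts are reported from 0, which is the convention Example 3 appears to use.

```python
def naive_search(text, pattern):
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # try shifts 0 .. n-m
        j = 0
        # implicit loop of line 4: compare until a mismatch or m matches
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                        # all m characters matched
            shifts.append(s)
    return shifts


print(naive_search("ABAAABCDBBABCDDEBCABC", "ABC"))   # [4, 10, 18]
```

The inner `while` loop stops at the first mismatch, so a shift costs between 1 and m comparisons.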

What is the best case?
→ The best case occurs when the first character of the pattern is not present in the text at all, so every shift is rejected after a single comparison.

txt[] = "BBACCAADDEE";
pat[] = "HBB";

The number of comparisons in the best case is O(n).

What is the worst case?
→ The worst case of naive pattern searching occurs in the following scenarios.
1) When all characters of the text and the pattern are the same.

txt[] = "DDDDDDDDDDDD";
pat[] = "DDDDD";

2) The worst case also occurs when only the last character is different.

txt[] = "VVVVVVVVVVVVK";
pat[] = "VVVK";

The number of comparisons in the worst case is O(m*(n-m+1)).
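
To make the best-case and worst-case counts concrete, the sketch below counts character comparisons made by the naive method on the strings used above. The helper name `count_comparisons` is hypothetical and exists only for this illustration.

```python
def count_comparisons(text, pattern):
    """Count the character comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                     # this shift is rejected
    return comparisons


# Best case: one comparison per shift, n-m+1 in total.
print(count_comparisons("BBACCAADDEE", "HBB"))        # 9

# Worst cases: every shift costs m comparisons, m*(n-m+1) in total.
for txt, pat in [("DDDDDDDDDDDD", "DDDDD"), ("VVVVVVVVVVVVK", "VVVK")]:
    expected = len(pat) * (len(txt) - len(pat) + 1)
    print(count_comparisons(txt, pat) == expected)    # True
```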

Problem with Naive Algorithm

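Suppose T = cabababcd and P = ababc.

• Whenever a mismatch occurs after several characters have matched, the naive algorithm shifts the pattern by just one position and starts comparing again from the first character of the pattern, so text characters that were already matched are examined again. Here, at shift 1 the first four characters (abab) match before the mismatch on the fifth, yet the search resumes at shift 2 and re-reads those characters.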

Rabin Karp Algorithm :


The Rabin-Karp algorithm is a pattern-matching algorithm that uses hashing to compare the pattern with the text. Here, hashing refers to mapping a larger input value to a smaller output value, called the hash value. Comparing hash values avoids most unnecessary character comparisons. As a result, the Rabin-Karp algorithm runs in O(n + m) time on average, where n is the length of the text and m is the length of the pattern; in the worst case, when many hash values collide, it takes O(nm) time.

• How does Rabin Karp Algorithm work?

• The Rabin-Karp algorithm slides a window over the text one position at a time but, instead of comparing all characters at every position, it computes the hash value of the pattern and compares it with the hash value of each substring of the text that has the same length as the pattern.
• If the hash values match, the pattern and the substring may be equal, and we verify this by comparing them character by character. If the hash values do not match, we skip the substring and move on to the next one. The next section explains how to calculate the hash values.
• Calculating hash value in Rabin Karp Algorithm
The steps to calculate hash values are as follows –

• Step 1: Assign modulus and a base value

• Suppose we have a text Txt = "DAACABCDBA" and a pattern Ptrn = "CAB". We first assign numerical values to the characters of the text based on their ranking: the leftmost character has rank 1 and the rightmost has rank 10. (Ranking by position keeps this illustration simple; a practical implementation gives each distinct character a fixed value, such as its character code, so that equal substrings always hash equally.) We also use base b = 10 (the number of characters in the text) and modulus 11 for our hash function. Reducing the hash modulo a fixed number keeps the values small and avoids overflow, and choosing a prime modulus reduces the number of collisions.

Step 2: Calculate hash value of Pattern


The equation to calculate the hash value of the pattern is as follows −

hash value(Ptrn) = (Σ r * b^(l-i-1)) mod 11

where, r: ranking of the character at index i
l: length of Pattern
i: index of the character within the pattern, counted from 0

Therefore, the hash value of Ptrn is −

h(Ptrn) = ((4 * 10^2) + (5 * 10^1) + (6 * 10^0)) mod 11
= 456 mod 11
= 5

Step 3: Calculate hash value of first Text window


Start calculating the hash value of each window of the text by sliding over it. We start with the first substring as shown below −

h(DAA) = ((1 * 10^2) + (2 * 10^1) + (3 * 10^0)) mod 11
= 123 mod 11
= 2

Now, compare the hash values of the pattern and the substring. If they match, check whether the characters also match. If they do, we have found a match; otherwise, move on to the next window.
In the above example, the hash values did not match (2 ≠ 5). Hence, we move on to the next character.

Step 4: Updating the hash value


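Now we remove the leftmost character of the window and bring in the next character of the text. Rather than recomputing the hash from scratch, we update it from the previous value: subtract the contribution of the outgoing character, multiply by the base, add the rank of the incoming character, and reduce modulo 11. For the second window this gives h(AAC) = ((123 - 1 * 10^2) * 10 + 4) mod 11 = 234 mod 11 = 3. We repeat this update until a match is found or the text is exhausted.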
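
The four steps can be put together as in the sketch below. It is an illustration under stated assumptions, not the notes' exact scheme: characters are valued by their character code (`ord`) instead of by position, and the defaults `base=256` and `modulus=101` (a prime) as well as the function name `rabin_karp` are choices made here.

```python
def rabin_karp(text, pattern, base=256, modulus=101):
    """Return all 0-based positions where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)      # weight of the leading character
    p_hash = w_hash = 0
    for i in range(m):                    # Steps 2 and 3: initial hashes
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        w_hash = (w_hash * base + ord(text[i])) % modulus
    matches = []
    for s in range(n - m + 1):
        # equal hashes only suggest a match, so verify the characters
        if p_hash == w_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:                     # Step 4: roll the window by one
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % modulus
    return matches


print(rabin_karp("DAACABCDBA", "CAB"))    # [3]
```

The explicit character comparison after a hash match is what keeps the result correct when two different substrings happen to share a hash value.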

Knuth Morris Pratt String Matching Algorithm :


The KMP algorithm solves the pattern matching problem, which is the task of finding all the occurrences of a given pattern in a text. It is very useful when it comes to finding multiple occurrences of a pattern. For instance, if the text is "aabbaaccaabbaadde" and the pattern is "aabbaa", then the pattern occurs twice in the text, at indices 0 and 8.

The naive solution to this problem is to compare the pattern with every possible
substring of the text, starting from the leftmost position and moving rightwards. This
takes O(n*m) time, where 'n' is the length of the text and 'm' is the length of the
pattern.

When we work with long text documents, the brute force and naive approaches may
result in redundant comparisons. To avoid such redundancy, Knuth, Morris, and Pratt
developed a linear sequence-matching algorithm named the KMP pattern matching
algorithm. It is also referred to as Knuth Morris Pratt pattern matching algorithm.

How does KMP Algorithm work?
The KMP algorithm searches from left to right. It uses a prefix function to avoid unnecessary comparisons while searching for the pattern. For each position of the pattern, this function stores the length of the longest proper prefix of the pattern that is also a suffix of the part ending at that position; this is known as the LPS value. The following steps are involved in the KMP algorithm −

• Define a prefix function.

• Slide the pattern over the text for comparison.

• If all the characters match, we have found a match.

• If not, use the prefix function to skip the unnecessary comparisons. If the LPS value of the character just before the mismatched character is 0, restart the comparison from index 0 of the pattern against the text character where the mismatch occurred. If that LPS value is greater than 0, resume the comparison from the pattern index equal to that LPS value.

The KMP algorithm takes O(n + m) time and O(m) space. It is faster than the naive solution because it skips the redundant comparisons: the position in the text never moves backwards, so the search performs at most about 2n character comparisons.

Let's understand the input-output scenario of a pattern matching problem with an example −

Input:
main String: "AAAABCAAAABCBAAAABC"
pattern: "AAABC"

Output:
Pattern found at position: 1
Pattern found at position: 7
Pattern found at position: 14
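
A compact sketch of the KMP steps is given below. The function names `build_lps` and `kmp_search` are assumptions of this example, and positions are reported from 0 as in the scenario above.

```python
def build_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0                            # length of the current border
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]      # fall back to a shorter border
        else:
            i += 1                        # lps[i] stays 0
    return lps


def kmp_search(text, pattern):
    """Return all 0-based positions where pattern occurs in text."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0                                 # characters of pattern matched so far
    for i, ch in enumerate(text):         # i never moves backwards
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]                # reuse the LPS value before the mismatch
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]                # keep going for further matches
    return matches


print(build_lps("AAABC"))                                  # [0, 1, 2, 0, 0]
print(kmp_search("AAAABCAAAABCBAAAABC", "AAABC"))          # [1, 7, 14]
```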

What is N Queen Problem?
In the N-Queen problem, we are given an NxN chessboard and have to place N queens on the board in such a way that no two queens attack each other. A queen attacks another queen if both lie in the same row, the same column, or the same diagonal. The most popular approach for solving the N Queen puzzle is backtracking.

Input Output Scenario


Suppose the given chessboard is of size 4x4 and we have to arrange exactly 4 queens on it. The solution arrangement is shown in the figure below −

The final solution matrix will be −

0 0 1 0
1 0 0 0
0 0 0 1
0 1 0 0

Backtracking Approach to solve N Queens Problem


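In the naive method of solving the n queen problem, the algorithm generates all possible arrangements of the queens and then checks them one by one. If a generated arrangement satisfies the constraints of the problem, it prints that arrangement as a solution. Backtracking improves on this by abandoning a partial arrangement as soon as two queens attack each other.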

Follow the steps below to solve the n queen problem using the backtracking approach −
• Place the first queen in the top-left cell of the chessboard.

• After placing a queen, mark its position as part of the solution and then recursively check whether this placement leads to a solution.

• If placing the queen does not lead to a solution, remove it (backtrack) and try the other cells. Repeat until all cells have been tried.
• If placing the queen leads to a solution, return TRUE.

• If all queens are placed, return TRUE.

• If all rows are tried and no solution is found, return FALSE.
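
The steps above can be sketched as follows. This version is a common variation that places exactly one queen per row and tries the columns from left to right; the name `solve_n_queens` and the row-by-row ordering are assumptions of this sketch, so it may return a different valid arrangement from the matrix shown earlier.

```python
def solve_n_queens(n):
    """Return one solution as an n x n matrix of 0/1, or None if none exists."""
    cols = [-1] * n                       # cols[r] = column of the queen in row r

    def is_safe(row, col):
        for r in range(row):              # check the queens already placed
            c = cols[r]
            if c == col or abs(c - col) == row - r:   # same column or diagonal
                return False
        return True

    def place(row):
        if row == n:                      # all queens are placed
            return True
        for col in range(n):
            if is_safe(row, col):
                cols[row] = col
                if place(row + 1):
                    return True
                cols[row] = -1            # backtrack and try the next column
        return False

    if not place(0):
        return None
    return [[1 if cols[r] == c else 0 for c in range(n)] for r in range(n)]


for line in solve_n_queens(4):
    print(*line)
```

For n = 4 this prints the rows 0 1 0 0, 0 0 0 1, 1 0 0 0 and 0 0 1 0, which is the mirror image of the solution matrix given in the notes. For n = 2 or n = 3 the function returns None because no valid arrangement exists.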

Hamiltonian Circuit Problem :
• A Hamiltonian cycle is a cycle that contains all the vertices of a graph. If a graph has a Hamiltonian cycle, then the graph is said to be Hamiltonian.

• A Hamiltonian cycle, also called a Hamiltonian circuit, Hamilton cycle, or Hamilton circuit, is a graph cycle (i.e., closed loop) through a graph that visits each node exactly once. A graph possessing a Hamiltonian cycle is said to be a Hamiltonian graph.

• The Hamiltonian cycle problem is a special case of the travelling salesman problem, obtained by setting the distance between two cities to one if they are adjacent and two otherwise, and verifying that the total distance travelled is equal to n, the number of vertices (if so, the route is a Hamiltonian circuit; if there is no Hamiltonian circuit then the shortest route will be longer).

Example:-

• Solution: First, we start our search with vertex 'a'. This vertex 'a' becomes the root of our implicit tree.

• Next, we choose vertex 'b' adjacent to 'a', as it comes first in lexicographical order (b, c, d).

• Next, we select 'c' adjacent to 'b'.

• Next, we select 'd' adjacent to 'c'.

• Next, we select 'e' adjacent to 'd'.

• Next, we select vertex 'f' adjacent to 'e'. The vertices adjacent to 'f' are d and e, but they have already been visited. Thus we reach a dead end, so we backtrack one step and remove vertex 'f' from the partial solution.

• After backtracking, the vertices adjacent to 'e' are b, c, d, and f, of which 'f' has already been checked and b, c, d have already been visited. So we backtrack one more step. Now the vertices adjacent to 'd' are e and f, of which 'e' has already been checked; the vertices adjacent to 'f' are d and e, and going on to 'e' from 'f' again leads to a dead end. So we backtrack one more step.

• Now, adjacent to 'c' is 'e', adjacent to 'e' is 'f', adjacent to 'f' is 'd', and adjacent to 'd' is 'a'. Here we get the Hamiltonian cycle, as every vertex other than the start vertex 'a' is visited exactly once: (a - b - c - e - f - d - a).
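
The walkthrough can be reproduced with the backtracking sketch below. Because the figure of the graph is not available here, the edge set is an assumption reconstructed from the adjacencies named in the walkthrough (a-b, a-c, a-d, b-c, b-e, c-d, c-e, d-e, d-f, e-f); the function name `hamiltonian_cycle` is likewise chosen for this example.

```python
def hamiltonian_cycle(graph, start):
    """Return a Hamiltonian cycle as a list of vertices, or None."""
    path = [start]
    visited = {start}

    def extend():
        current = path[-1]
        if len(path) == len(graph):       # every vertex is on the path
            return start in graph[current]    # can we close the cycle?
        for nxt in sorted(graph[current]):    # lexicographical order
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if extend():
                    return True
                path.pop()                # dead end: backtrack one step
                visited.remove(nxt)
        return False

    return path + [start] if extend() else None


# Assumed adjacency lists, reconstructed from the walkthrough.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "e"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c", "e", "f"},
    "e": {"b", "c", "d", "f"},
    "f": {"d", "e"},
}
print(" - ".join(hamiltonian_cycle(graph, "a")))   # a - b - c - e - f - d - a
```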

Subset Sum Problem :
The subset sum problem is to find a subset of elements, selected from a given set, whose sum adds up to a given number K. We assume that the set contains non-negative values.

In computer science, the subset sum problem is an important decision problem in complexity theory and cryptography. There are several equivalent formulations of the problem.

The Subset-Sum Problem is to find a subset s' of the given set S = (S1, S2, S3, ..., Sn), where the elements of S are n positive integers, such that s' ⊆ S and the sum of the elements of s' is equal to some positive integer X.

The Subset-Sum Problem can be solved by using the backtracking approach. Here the implicit tree is a binary tree. The root of the tree represents the state in which no decision has yet been taken on any input. We assume that the elements of the given set are arranged in increasing order:

S1 ≤ S2 ≤ S3 ≤ ... ≤ Sn

The left child of the root node indicates that we include S1 from the set S, and the right child of the root indicates that we exclude S1. Each node stores the total of the partial solution elements. If at any stage the sum equals X, then the search is successful and terminates.
A dead end in the tree appears only when either of the following two inequalities holds, where s' denotes the sum stored in the current node and S(i+1) is the next element to be considered:

The sum of s' is too large, i.e.

s' + S(i+1) > X

The sum of s' is too small, i.e.

s' + S(i+1) + S(i+2) + ... + Sn < X

Example: Given a set S = (3, 4, 5, 6) and X =9. Obtain the subset sum using
Backtracking approach.

Solution:

Initially S = (3, 4, 5, 6) and X =9.

S'= (∅)

The implicit binary tree for the subset sum problem is shown as fig:

The number inside a node is the sum of the partial solution elements at a particular level.

Thus, if the sum of our partial solution elements is equal to the positive integer X, the search terminates at that point, or it continues if all the possible solutions need to be obtained. For this instance, the subsets that sum to 9 are (3, 6) and (4, 5).
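
A minimal sketch of this backtracking search is shown below, assuming that all elements are positive integers. The function name `subset_sums` is hypothetical, and this version collects every solution instead of stopping at the first one.

```python
def subset_sums(values, target):
    """Return every subset of values (positive integers) that sums to target."""
    values = sorted(values)               # S1 <= S2 <= ... <= Sn
    solutions = []
    chosen = []

    def explore(i, total, remaining):
        # total: sum of the partial solution; remaining: sum of values[i:]
        if total == target:
            solutions.append(list(chosen))
            return
        if i == len(values):
            return
        if total + values[i] > target:    # dead end: sum would be too large
            return
        if total + remaining < target:    # dead end: sum would stay too small
            return
        chosen.append(values[i])          # left child: include values[i]
        explore(i + 1, total + values[i], remaining - values[i])
        chosen.pop()                      # right child: exclude values[i]
        explore(i + 1, total, remaining - values[i])

    explore(0, 0, sum(values))
    return solutions


print(subset_sums([3, 4, 5, 6], 9))       # [[3, 6], [4, 5]]
```

Sorting the input is what makes the "too large" test safe: if the next element already overshoots the target, every later element does too.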
