Unit V

The document discusses data structures focusing on hashing, which allows for constant average time operations for insertions, deletions, and searches using hash tables. It explains collision resolution methods such as Open Hashing (Separate Chaining) and Closed Hashing (Open Addressing), along with techniques like Linear Probing, Quadratic Probing, and Double Hashing. Additionally, it covers pattern matching algorithms, including Brute Force, Knuth-Morris-Pratt, and Boyer-Moore, emphasizing their efficiency in string searching.


Data Structures

Hashing

The implementation of hash tables is frequently called hashing. Hashing is a technique for performing insertions, deletions, and searches in constant average time.

The ideal hash table data structure is merely an array of some fixed size, containing the keys. Typically a key is a string with an associated value. We will refer to the table size as Hsize; the table runs from 0 to Hsize - 1.
Each key is mapped to some number in the range 0 to Hsize - 1 and placed in the appropriate cell. The mapping is called a hash function, which should be simple to compute and should ideally map any two distinct keys to different cells.

A hash function is used to distribute the keys evenly among the cells. Observe the following diagram:

If the input keys are integers, then the position to be mapped is calculated as follows:

Hash(Key) = Key mod Hsize


For example: Hash(X) = X mod 10 (for the following table).

0
1
2
3 Kiran 25000
4 Phil 31250
5
6 Dave 27500
7 Mary 28200
8
9
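For integer keys, the mapping described above can be sketched in a few lines of Python (a minimal illustration; `hash_key` is a hypothetical helper name, not from the text):

```python
def hash_key(key, hsize=10):
    """Map an integer key into a cell in the range 0..hsize-1."""
    return key % hsize

print(hash_key(89))     # 9
print(hash_key(25000))  # 0
```

For string keys such as the names in the table, some function must first convert the string to an integer before taking the remainder.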

This is the basic idea of hashing. The remaining problem is deciding what to do when two keys hash to the same value; this is known as a collision. The main problem is to resolve such collisions, and there are several methods for dealing with them. The main approaches we discuss are:
• Open Hashing (Separate Chaining)
• Closed Hashing (Open Addressing)

Open Hashing (Separate Chaining): In this approach, we keep a list of all elements that hash to the same value. An open hash table follows:

Consider the set of values: 0, 1, 4, 25, 16, 9, 81, 64, 36, 49. With Hash(x) = x mod 10, each cell holds the chain of keys that hash to it:

0: 0
1: 1 -> 81
2:
3:
4: 4 -> 64
5: 25
6: 16 -> 36
7:
8:
9: 9 -> 49

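The chained table above can be reproduced with a short sketch, assuming integer keys and Hash(x) = x mod 10 as in the text (the class name is hypothetical):

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table for integer keys."""

    def __init__(self, hsize=10):
        self.hsize = hsize
        self.buckets = [[] for _ in range(hsize)]  # one chain per cell

    def insert(self, key):
        bucket = self.buckets[key % self.hsize]
        if key not in bucket:   # keep each key at most once
            bucket.append(key)

    def find(self, key):
        return key in self.buckets[key % self.hsize]

table = ChainedHashTable()
for k in [0, 1, 4, 25, 16, 9, 81, 64, 36, 49]:
    table.insert(k)
print(table.buckets[1])   # [1, 81]: both keys hash to cell 1
```

Each chain grows as needed, so insertions never fail, at the cost of one list per cell.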
Closed Hashing (Open Addressing): Open hashing has the disadvantage of requiring pointers. This tends to slow the algorithm down a bit because of the time required to allocate new cells, and it essentially requires the implementation of a second data structure (the lists). Closed hashing, also known as open addressing, is an alternative technique for resolving collisions.

In this approach, if a collision occurs, alternate cells are tried until an empty cell is found. More formally, cells H0(x), H1(x), H2(x), ... are tried in succession, where

Hi(x) = (Hash(x) + f(i)) mod Hsize, with f(0) = 0,

and f is the collision resolution function.

There are three common collision-resolving techniques. They are:

1. Linear Probing
2. Quadratic Probing
3. Double Hashing

Linear Probing: In linear probing, f is a linear function of i, typically f(i) = i. This amounts to trying cells sequentially (with wraparound) in search of an empty cell. The following table illustrates this.

Apply the same formula, Hi(x) = (Hash(x) + f(i)) mod Hsize


Hash (x) = x mod 10

The table is a closed hash table with linear probing:

The keys to be inserted are: {89, 18, 49, 58, 69}

H0(x) = ((89 mod 10) + f(0)) mod 10 [since i = 0], and similarly for the other keys.
Here f(0) = 0, since f(i) = i. When the first collision occurs,

H1(x) = ((89 mod 10) + f(1)) mod 10, since i = 1. For the next collision, i = 2, and so on.

0 49
1 58
2 69
3
4
5
6
7
8 18
9 89
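The linear-probing insertions above can be sketched as follows (the function name is illustrative, and the sketch assumes the table never fills up):

```python
def insert_linear(keys, hsize=10):
    """Closed hashing with linear probing, f(i) = i.
    (x + i) % hsize equals ((x mod hsize) + f(i)) mod hsize
    for non-negative x."""
    table = [None] * hsize
    for x in keys:
        i = 0
        while table[(x + i) % hsize] is not None:
            i += 1      # try the next cell, wrapping around
        table[(x + i) % hsize] = x
    return table

print(insert_linear([89, 18, 49, 58, 69]))
# [49, 58, 69, None, None, None, None, None, 18, 89]
```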

Quadratic Probing: This method eliminates the primary clustering problem of linear probing. Here the collision function is f(i) = i². Based on this, the table is as follows:

0 49
1
2 58
3 69
4
5
6
7
8 18
9 89
Note: If quadratic probing is used and the table size is prime, then a new element can always be inserted as long as the table is at least half empty.
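The quadratic-probing table can be reproduced with the same sketch, changing only the collision function to f(i) = i² (again assuming the table does not fill up):

```python
def insert_quadratic(keys, hsize=10):
    """Closed hashing with quadratic probing, f(i) = i**2."""
    table = [None] * hsize
    for x in keys:
        i = 0
        while table[(x + i * i) % hsize] is not None:
            i += 1      # next probe offset: 0, 1, 4, 9, ...
        table[(x + i * i) % hsize] = x
    return table

print(insert_quadratic([89, 18, 49, 58, 69]))
# [49, None, 58, 69, None, None, None, None, 18, 89]
```

For instance, 58 hashes to cell 8 (taken by 18), cell 9 is taken by 89, and cell (8 + 4) mod 10 = 2 is free, matching the table above.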

Double Hashing: The last collision resolution method is double hashing. For double hashing, one popular choice is f(i) = i · hash2(x), where hash2(x) = R - (x mod R). Here R is a prime number smaller than Hsize.

For example, for the above table of size 10, the nearest smaller prime is 7, i.e., R = 7. The first collision occurs when 49 is to be inserted: hash2(49) = 7 - (49 mod 7) = 7 - 0 = 7, so '49' is placed 7 cells beyond the position of 89, at cell (9 + 7) mod 10 = 6. Similarly, hash2(58) = 7 - (58 mod 7) = 7 - 2 = 5, so '58' is placed 5 cells beyond the position of 18, at cell (8 + 5) mod 10 = 3. The other keys are placed the same way. Observe the resultant table given below:

0 69
1
2
3 58
4
5
6 49
7
8 18
9 89
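The double-hashing placements can be sketched the same way, with the probe step computed from the second hash function (R = 7 as in the example):

```python
def insert_double(keys, hsize=10, r=7):
    """Closed hashing with double hashing:
    f(i) = i * hash2(x), hash2(x) = R - (x mod R)."""
    table = [None] * hsize
    for x in keys:
        step = r - (x % r)          # hash2(x), never zero
        i = 0
        while table[(x + i * step) % hsize] is not None:
            i += 1
        table[(x + i * step) % hsize] = x
    return table

print(insert_double([89, 18, 49, 58, 69]))
# [69, None, None, 58, None, None, 49, None, 18, 89]
```

Because hash2 never evaluates to zero, every key probes the table with a nonzero step.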

Rehashing: If the table gets too full, the running time of the operations will start to degrade, and insertions might fail for closed hashing with quadratic resolution. This can also happen if there are too many removals intermixed with insertions. A solution, then, is to build another table that is about twice as big as the original hash table, with a corresponding new hash function, and insert every element of the old table into it.
For example, if the table size is 7, then its double is 14, and the nearest prime is 17; hence after rehashing the table size should be 17. Rehashing can be done when the table becomes half full or when an insertion fails.
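The size calculation in the example can be sketched as follows (helper names are illustrative; trial division is perfectly adequate at these sizes):

```python
def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while True:
        if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            return n
        n += 1

def rehash_size(old_size):
    """New table size: the first prime at least twice the old size."""
    return next_prime(2 * old_size)

print(rehash_size(7))   # 17
```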

Pattern Matching Algorithms:

String matching is essential for computer users. While editing text, a user may want to search for a pattern or replace one pattern with another; one operation underlying these tasks is string searching, or pattern matching. The larger the text to be searched, the more important the efficiency of the searching algorithm. Of course, such searching applies not only to text but also to molecular biology, where people extract patterns of interest from DNA sequences. There are several pattern matching algorithms available. The following are the essential techniques:

• Brute Force or straightforward algorithm
• Knuth-Morris-Pratt algorithm
• Boyer-Moore algorithm

Brute Force or straightforward algorithm: This is a simple approach to string or pattern matching. The comparison starts at the first characters of the text T and the pattern P. If they match, comparison proceeds to the next characters; on a mismatch, the pattern is shifted one position to the right and comparison restarts. The process continues until all characters of the pattern match in the text or the end of the text is reached.
Example: Text: abbabbabb   Pattern: bab

i)   abbabbabb
     bab          // mismatch (shift 0)

ii)  abbabbabb
      bab         // mismatch (shift 1)

iii) abbabbabb
       bab        // match (shift 2)

The algorithm is given in the pseudo code:

Brute force (pattern P, text T)
{
    m = length of P;
    n = length of T;
    i = 0;
    while (i <= n - m)
    {
        j = 0;
        while (j < m and T[i] == P[j])
            { i++; j++; }
        if (j == m)
            return match at (i - m);
        i = i - j + 1;
    }
    return "no match";
}
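The pseudocode corresponds to the following runnable sketch (this version uses Python's slice comparison in place of the explicit inner loop; the function name is illustrative):

```python
def brute_force(pattern, text):
    """Try every shift of the pattern against the text, left to right.
    Returns the index of the first match, or -1 if there is none."""
    m, n = len(pattern), len(text)
    for shift in range(n - m + 1):
        if text[shift:shift + m] == pattern:
            return shift
    return -1

print(brute_force("bab", "abbabbabb"))   # 2
```

The result matches the worked example above: the pattern is found on the third attempt, at shift 2.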

Boyer Moore Algorithm:

This algorithm searches for a pattern in quite a different way compared with the KMP and brute-force algorithms. The BM algorithm is ideal for searching for words in documents. The main idea of the BM algorithm is to improve the running time of the brute-force algorithm by adding two potentially time-saving heuristics:

Looking-Glass heuristic: when testing a possible placement of P against T, begin the comparisons from the end of
P and move backward to the front of P.

Character-Jump heuristic: During the testing of a possible placement of P against T, a mismatch of a text character T[i] = c with the corresponding pattern character P[j] is handled as follows:

If c is not contained in the pattern P, shift the pattern completely past the position of the mismatch in T; otherwise, shift P so that the last occurrence of c in the pattern lines up with T[i].

Consider an example:

Text : If you wish to understand others you must

Pattern: must
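The two heuristics above can be sketched as follows (this implements only the bad-character rule described in the text; production implementations add a good-suffix rule as well):

```python
def boyer_moore(pattern, text):
    """Boyer-Moore search with the looking-glass and character-jump
    heuristics. Returns the index of the first match, or -1."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    # last[c] = index of the last occurrence of c in the pattern
    last = {c: j for j, c in enumerate(pattern)}
    shift = 0
    while shift <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1                      # compare right to left
        if j < 0:
            return shift                # full match found
        c = text[shift + j]
        # jump past the mismatch, or align the last occurrence of c;
        # shift by at least 1 when that occurrence lies to the right
        shift += max(1, j - last.get(c, -1))
    return -1

text = "If you wish to understand others you must"
print(boyer_moore("must", text))   # 37
```

On this example the first comparison pits the final 't' of the pattern against 'y' in the text; since 'y' does not occur in "must", the pattern jumps four positions at once, which is why BM often skips large parts of the text.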
