Ch7 Hashing
Abhijit Sarkar
Asst. Professor
Dept. of CSE (Cyber Security)
Haldia Institute of Technology
Hashing is a technique in which items are placed into a structure based on a key-to-address transformation.
We use hashing because:
i) It allows searching and retrieval of data in constant time on average, i.e. O(1).
ii) It increases speed, eases data transfer, improves retrieval, optimizes searching of data and
reduces overhead.
Hash Table:-
A hash table is a data structure that combines a random-access data structure, such as an array, with a
mapping function called a hash function. Data is stored in this array at the specific index
generated by the hash function.
Hash Function:-
It is a mapping between a set of input values and a set of integers, known as hash values. It is
usually denoted by H.
In the division-remainder method, the key value is divided by an appropriate number, generally a prime
number, and the remainder of the division is used as the address for the record.
H(K) = K Mod M
K    : Key
M    : Size of the hash table (usually a prime number)
H(K) : Hash function
Example:- Keys: 89, 18, 58, 7, 49, 9    Table Size (M): 7
Hash Table
Key % M    Index
89 % 7     5
18 % 7     4
58 % 7     2
 7 % 7     0
49 % 7     0
 9 % 7     2
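The table above can be reproduced with a short Python sketch. The function name division_hash and the variable names are chosen here for illustration only; they do not come from the notes.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division-remainder hash: H(K) = K Mod M."""
    return key % table_size


keys = [89, 18, 58, 7, 49, 9]
M = 7  # table size from the example

# Map every key to the index produced by H(K) = K Mod M.
indices = {key: division_hash(key, M) for key in keys}

for key, index in indices.items():
    print(f"{key} % {M} = {index}")
```

Note that 7 and 49 both map to index 0, and 58 and 9 both map to index 2.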
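Hash Function: H(K) = K^2, or some digits taken from the computed K^2 (this approach is often called the mid-square method).
For example, for the key 7 we get K^2 = 49. We can take the first digit, 4, to indicate that the element 7 will be placed at index 4.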
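As a minimal sketch of this idea, the function below squares the key and keeps the first digit of the result, exactly as in the example with key 7. The name square_first_digit is hypothetical, and taking the first digit is only one possible choice; other variants take digits from the middle of K^2.

```python
def square_first_digit(key: int) -> int:
    """Square the key and use the first digit of K^2 as the index.

    Assumption: the leading digit is used, as in the example with key 7.
    This only yields indices 0..9, so it suits a table of up to 10 slots.
    """
    squared = key * key
    return int(str(squared)[0])


index_for_7 = square_first_digit(7)  # 7^2 = 49, first digit is 4
print(index_for_7)
```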
A collision between two keys, say K and K', occurs when both have to be stored in the hash
table and both hash to the same address in the table.
Consider the keys from the earlier example, hashed with the division-remainder method.
Keys: 89, 18, 49, 58, 9, 7    Hash table size = 7
The element 58 is already stored at position 2. But 9 also hashes to position 2, which is
occupied by 58. When this kind of situation occurs, we say that a collision has occurred.
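The situation can be sketched in Python by hashing the keys in the given order and recording every key whose home position is already taken. The helper name find_collisions is an assumption made for this illustration.

```python
def find_collisions(keys, table_size):
    """Return (key, index, occupant) for every key whose slot is already taken."""
    occupied = {}      # index -> first key that hashed there
    collisions = []
    for key in keys:
        index = key % table_size  # division-remainder method
        if index in occupied:
            collisions.append((key, index, occupied[index]))
        else:
            occupied[index] = key
    return collisions


collisions = find_collisions([89, 18, 49, 58, 9, 7], 7)
for key, index, occupant in collisions:
    print(f"{key} collides with {occupant} at index {index}")
```

With these keys the sketch reports two collisions: 9 with 58 at index 2, and 7 with 49 at index 0.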
Collisions can be resolved in two ways:
1) Open Addressing
In case of a collision, other positions of the hash table are checked (a probe sequence) until an
empty position is found.
Types of Open Addressing:-
A) Linear Probing
B) Quadratic Probing
C) Double Hashing
2) Chaining
In case of a collision, an external data structure (a linked list) attached to the hash table slot is used to hold the colliding items.
Linear Probing:-
If a data element hashes to a location in the hash table that is already occupied, then the table is
searched sequentially from that location until an open location is found.
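A minimal sketch of linear probing in Python follows. It assumes the division-remainder hash from the earlier example, uses None to mark an empty cell, and wraps around to the start of the table when the end is reached; the function name is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing and return the index used."""
    size = len(table)
    home = key % size                  # home position from H(K) = K Mod M
    for step in range(size):
        index = (home + step) % size   # check home, home+1, home+2, ... with wrap-around
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
placed = {key: linear_probe_insert(table, key) for key in [89, 18, 49, 58, 9, 7]}
print(placed)
print(table)
```

Here 9 finds position 2 occupied by 58 and moves to position 3, and 7 finds position 0 occupied by 49 and moves to position 1.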
Quadratic Probing:-
It is similar to linear probing, except that here the gap between successive positions in the probe sequence
increases quadratically.
Let D be the address at which the collision occurs. Then the next free location is searched for at
D + 1^2
D + 2^2
D + 3^2
...
D + i^2
Each address is taken modulo the table size M, so that the probe sequence wraps around the table.
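The probe sequence D + i^2 can be sketched in Python as follows. The function names are chosen for illustration, the home address D is assumed to come from the division-remainder hash, and every address is reduced modulo the table size.

```python
def quadratic_probe_sequence(home, table_size, count):
    """First `count` addresses of the sequence (D + i^2) Mod M, starting at i = 0."""
    return [(home + i * i) % table_size for i in range(count)]


def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing and return the index used."""
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    # Quadratic probing does not always visit every cell of the table.
    raise RuntimeError("no free cell found along the probe sequence")


sequence = quadratic_probe_sequence(2, 7, 4)   # D = 2, M = 7
table = [None] * 7
placed = {key: quadratic_probe_insert(table, key) for key in [89, 18, 49, 58, 9, 7]}
print(sequence)
print(placed)
```

For D = 2 and M = 7 the first addresses are 2, 3, 6 and 4, so the gaps grow as 1, 4, 9 instead of staying at 1 as in linear probing.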
Double Hashing:-
With this method, a collision is resolved by searching the table for an empty place at intervals given by a
different hash function, H(K) = (K + C) Mod M,
where a second hash function is used to compute C.
Let's consider the collision from the same example.
58 and 9 collide, as both elements hash to index 2 (hash table size: 7).
By double hashing, we compute another hash function to adjust the position of 9.
C = K Mod 5 = 9 Mod 5 = 4
Applying double hashing,
H(K) = (K + C) Mod 7 = (9 + 4) Mod 7 = 6
Element 9 can be placed in position 6. The second hash function must be chosen based on the following
characteristics:-
a) It must not be the same as the primary hash function.
b) It must never output 0. Otherwise the step would be zero: every probe would land on the same cell, and
the algorithm would go into an endless loop.
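The worked example can be sketched in Python as below. Two points are assumptions that go beyond the notes: the step C is applied repeatedly, as (K + i*C) Mod M, when the first adjusted position is also occupied, and a result of 0 from K Mod 5 is replaced by 1 so that characteristic b) holds. All function names are hypothetical.

```python
def second_hash(key):
    """Second hash from the example: C = K Mod 5.

    Assumption: fall back to 1 when K Mod 5 is 0, because the step must never be 0.
    """
    step = key % 5
    return step if step != 0 else 1


def double_hash_insert(table, key):
    """Insert key by probing (K + i*C) Mod M for i = 0, 1, 2, ..."""
    size = len(table)
    step = second_hash(key)
    for i in range(size):
        index = (key + i * step) % size   # i = 0 gives the primary hash K Mod M
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free cell found along the probe sequence")


table = [None] * 7
placed = {key: double_hash_insert(table, key) for key in [89, 18, 49, 58, 9, 7]}
print(placed)
print(table)
```

As in the example, 9 moves from the occupied position 2 to position 6. Because the table size 7 is prime, any step from 1 to 6 eventually visits every cell.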
Chaining
In this method, a linked list is created at each index in the hash table.
Items that hash to the same index are simply added to the linked list at that index.
There is no need to search for empty cells in the primary hash table array.
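Suppose we have the elements {15, 47, 23, 34, 85, 97, 65, 89, 70} and our hash function is h(x) = x mod 7.
Hash values will be:-
15 mod 7 = 1, 47 mod 7 = 5, 23 mod 7 = 2, 34 mod 7 = 6, 85 mod 7 = 1, 97 mod 7 = 6, 65 mod 7 = 2, 89 mod 7 = 5, 70 mod 7 = 0
Hashing with Chaining will be like:-
Index 0: 70
Index 1: 15 -> 85
Index 2: 23 -> 65
Index 3: (empty)
Index 4: (empty)
Index 5: 47 -> 89
Index 6: 34 -> 97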
Here,
Max Chain Length: 2
Min Chain length: 0
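The chaining example can be checked with a short Python sketch. As a simplifying assumption, each chain is a Python list rather than a linked list, and new items are appended at the end of the chain; the function name is chosen for illustration.

```python
def build_chained_table(keys, table_size):
    """Build a hash table in which every index holds a chain of keys."""
    table = [[] for _ in range(table_size)]   # one empty chain per index
    for key in keys:
        table[key % table_size].append(key)   # h(x) = x mod 7 in the example
    return table


chains = build_chained_table([15, 47, 23, 34, 85, 97, 65, 89, 70], 7)
for index, chain in enumerate(chains):
    print(index, chain)

max_chain_length = max(len(chain) for chain in chains)
min_chain_length = min(len(chain) for chain in chains)
print(max_chain_length, min_chain_length)
```

Indexes 3 and 4 stay empty, which gives the minimum chain length of 0, and no index receives more than two elements.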
Clustering in a hash table occurs when runs of consecutive filled positions become longer.
It means that many adjacent positions are occupied by elements, and we have to probe for a longer time to reach an
empty cell.
As a hash table becomes loaded with elements, the clusters grow larger.
This means that it is very slow to access cells at the end of a cluster.
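The growth of a cluster can be observed with a small Python sketch. It assumes linear probing with the division-remainder hash and the keys of the earlier example, and it measures the longest run of filled cells after each insertion. The helper names are hypothetical, and runs that wrap around the end of the table are not joined, to keep the sketch short.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing (None marks an empty cell)."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


def longest_cluster(table):
    """Length of the longest run of consecutive filled cells (no wrap-around)."""
    longest = current = 0
    for cell in table:
        current = current + 1 if cell is not None else 0
        longest = max(longest, current)
    return longest


table = [None] * 7
growth = []
for key in [89, 18, 49, 58, 9, 7]:
    linear_probe_insert(table, key)
    growth.append(longest_cluster(table))
print(growth)
```

The longest cluster jumps from 2 to 4 when 9 is inserted and to 6 when 7 is inserted, because each displaced key fills the gap between two existing runs.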