Hashing
Components of Hashing
Hashing has three main components:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to
the hash function, the technique that determines an index or location for storing an
item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of
an element in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special
function called a hash function. It stores the data in an associative manner in an
array, where each data value is stored at its own index.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store them in a
table.
Our main objective here is to search or update the values stored in the table quickly, in
O(1) time, and we are not concerned about the ordering of the strings in the table. So
each string in the set can act as a key, and the string itself will act as the value. But
how do we decide where to store the value corresponding to each key?
Step 1: A hash function (some mathematical formula) is used to calculate the hash
value, which acts as the index in the data structure where the value will be stored.
Step 2: So, let’s assign “a” = 1, “b” = 2, and so on for all alphabetical characters.
Step 3: The numerical value of each string is then the sum of the values of its
characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function used here is the sum of the characters in the key mod the table size, so we
can compute the location of a string in the array as sum(string) mod 7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a
simple hash function and rapidly find the value that is stored in that location.
Therefore, hashing seems like a great way to store (key, value) pairs in a table.
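The walkthrough above can be sketched in Python. This is a minimal sketch, assuming keys contain only lowercase ASCII letters and that no two keys land in the same slot (true for these three strings); the names `letter_sum`, `hash_index`, `build_table` and `lookup` are illustrative, not taken from any library.

```python
from typing import List, Optional

TABLE_SIZE = 7


def letter_sum(key: str) -> int:
    """Sum of alphabet positions: 'a' = 1, 'b' = 2, ..., 'z' = 26 (lowercase only)."""
    return sum(ord(ch) - ord("a") + 1 for ch in key)


def hash_index(key: str, size: int = TABLE_SIZE) -> int:
    """Hash function from the walkthrough: letter sum mod table size."""
    return letter_sum(key) % size


def build_table(keys: List[str], size: int = TABLE_SIZE) -> List[Optional[str]]:
    """Store each key in the slot its hash selects.

    No collision handling: this sketch raises if two keys share a slot.
    """
    table: List[Optional[str]] = [None] * size
    for key in keys:
        idx = hash_index(key, size)
        if table[idx] is not None:
            raise ValueError(f"collision at slot {idx}: {table[idx]!r} vs {key!r}")
        table[idx] = key
    return table


def lookup(table: List[Optional[str]], key: str) -> bool:
    """Check a single slot instead of scanning the whole table."""
    return table[hash_index(key, len(table))] == key


table = build_table(["ab", "cd", "efg"])
print(table)  # ['cd', None, None, 'ab', 'efg', None, None]
print(lookup(table, "efg"), lookup(table, "xyz"))  # True False
```

The lookup touches exactly one slot, which is where the O(1) behaviour comes from; once two keys hash to the same slot, some collision-handling strategy is needed.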
1. Division Method:
This is the simplest method for generating a hash value. The hash function divides the
key k by M and uses the remainder as the hash value.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
It is best to choose M to be a prime number, as that helps distribute the keys more
uniformly. The hash function depends only on the remainder of the division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
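A minimal Python sketch of the division method that reproduces the two examples above; `division_hash` is a hypothetical helper name.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M, where M is the table size."""
    if m <= 0:
        raise ValueError("table size M must be positive")
    # Python's % returns a value in [0, m) for positive m, even when k < 0.
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```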
Pros:
1. This method can be used with any value of M.
2. The division method is very fast since it requires only a single division operation.
Cons:
1. Consecutive keys map to consecutive hash values in the hash table, which can lead
to poor performance.
2. Extra care is sometimes needed when choosing the value of M.
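Both cons can be observed directly. This sketch uses a hypothetical `division_hash` helper and arbitrarily chosen sample keys: consecutive keys land in consecutive slots, and keys that share a factor with M crowd into a single slot when M = 10, while the prime M = 11 spreads the same keys over every slot.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M."""
    return k % m


# Consecutive keys fill consecutive slots.
consecutive = [division_hash(k, 11) for k in range(100, 106)]
print(consecutive)  # [1, 2, 3, 4, 5, 6]

# Keys that are all multiples of 10: 10, 20, ..., 200.
keys = [10 * i for i in range(1, 21)]

slots_m10 = {division_hash(k, 10) for k in keys}  # M shares the factor 10 with every key
slots_m11 = {division_hash(k, 11) for k in keys}  # prime M shares no factor with 10

print(sorted(slots_m10))  # [0]: all 20 keys collide in one slot
print(len(slots_m11))     # 11: every slot is used
```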
2. Mid Square Method:
The mid-square method is a very good hashing method. It involves two steps to
compute the hash value:
1. Square the value of the key k, i.e. compute k².
2. Extract the middle r digits as the hash value.
Formula:
h(k) = middle r digits of (k x k)
Here,
k is the key value.
The value of r can be decided based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. Then r = 2, because two digits are
required to map the key to a memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60, the middle two digits of 3600.
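A minimal sketch of the mid-square method working on decimal digits. The text does not say which digits count as the "middle" when the number of digits in the square minus r is odd; this sketch assumes the left-leaning choice and zero-pads squares shorter than r digits. `mid_square_hash` is a hypothetical name.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square the key and keep the middle r decimal digits."""
    if r <= 0:
        raise ValueError("r must be positive")
    square = str(k * k).zfill(r)      # pad so there are always at least r digits
    start = (len(square) - r) // 2    # assumption: left-leaning middle when uneven
    return int(square[start:start + r])


print(mid_square_hash(60, 2))   # 60   (3600 -> "60")
print(mid_square_hash(123, 2))  # 51   (15129 -> "51")
```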
Pros:
1. The performance of this method is good because most or all digits of the key value
contribute to the result: every digit of the key affects the middle digits of the
squared value.
2. The result is not dominated by the distribution of the top digit or bottom digit of the
original key value.
Cons:
1. The size of the key is one of the limitations of this method: if the key is large, its
square has roughly twice as many digits.
2. Collisions can still occur, although we can try to reduce them.
3. Multiplication Method:
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value with A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in
step 4.
Formula:
h(k) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
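A minimal floating-point sketch of the multiplication method, reproducing the example above; `multiplication_hash` is a hypothetical name.

```python
import math


def multiplication_hash(k: int, m: int, a: float) -> int:
    """Multiplication method: h(k) = floor(M * (kA mod 1)) with 0 < A < 1."""
    if not 0 < a < 1:
        raise ValueError("A must satisfy 0 < A < 1")
    frac = (k * a) % 1.0        # fractional part of kA
    return math.floor(m * frac)


print(multiplication_hash(12345, 100, 0.357840))  # 53
```

Because `k * a` is computed as a double, very large keys lose digits of the fractional part, so this version is only reliable for moderately sized keys.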
Pros:
The advantage of the multiplication method is that it works with any value of A between
0 and 1, although some values tend to give better results than others.
Cons:
The multiplication method is best suited to a table size that is a power of two, because
the whole process of computing the index from the key can then be done very quickly.
For other table sizes this speed advantage is lost.
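One common way to get the power-of-two speedup is a fixed-point variant of the same method: store A as the integer floor(A · 2^w), multiply, keep the low w bits (the scaled fractional part of kA), and take their top p bits for a table of size M = 2^p. This sketch assumes w = 32 and A ≈ (√5 − 1)/2, a constant suggested by Knuth; both choices, and the names below, are assumptions rather than values from the text above. Python integers do not overflow, so the 32-bit wraparound is done explicitly with a mask.

```python
W = 32                  # assumed word size in bits
A_FIXED = 2654435769    # floor(((sqrt(5) - 1) / 2) * 2**32)
MASK = (1 << W) - 1     # keeps the low W bits


def multiplicative_hash_pow2(k: int, p: int) -> int:
    """Multiplication method for a table of size M = 2**p, integer ops only."""
    if k < 0:
        raise ValueError("key must be non-negative")
    if not 0 < p <= W:
        raise ValueError("p must satisfy 0 < p <= W")
    # Low W bits of k * A_FIXED = frac(k * A) scaled by 2**W,
    # where A is approximated as A_FIXED / 2**W.
    scaled_frac = (k * A_FIXED) & MASK
    # Top p bits = floor(2**p * frac(k * A)) = floor(M * frac(k * A)).
    return scaled_frac >> (W - p)


print(multiplicative_hash_pow2(12345, 7))  # an index in [0, 128)
```

The final step is a single shift instead of a floating-point multiply and floor, which is why a power-of-two M makes the computation fast.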
Collision in Hashing