
Hashing

Hashing refers to generating a fixed-size output from variable-size input using hash functions; the output determines an item's storage location. Hashing enables constant-time storage and retrieval on average by mapping keys to array indices via hash functions. Good hash functions distribute keys uniformly and minimize collisions.
Copyright © All Rights Reserved

Hashing

Hashing refers to the process of generating a fixed-size output from an input of
variable size using the mathematical formulas known as hash functions. This technique
determines an index or location for the storage of an item in a data structure.

Need for Hash data structure


Every day, the data on the internet is increasing multifold, and it is always a struggle to
store this data efficiently. In day-to-day programming, the amount of data might not be
that big, but it still needs to be stored, accessed, and processed easily and efficiently. A
very common data structure used for this purpose is the Array data structure.
Now the question arises: if the Array already existed, why was a new data structure
needed? The answer lies in the word “efficiency”. Although adding an element to an Array
takes O(1) time, searching it takes O(n) time in general, or O(log n) if the array is kept
sorted (at the cost of slower insertion). This may seem acceptable, but for a large data
set it can cause a lot of problems, which makes the Array data structure inefficient for
lookups.
So we are looking for a data structure that can store data and search it in constant
time, i.e., in O(1) time. This is where the Hash data structure comes into play.
With a Hash data structure, it is possible to store data in constant time and retrieve
it in constant time as well, on average (the worst case is slower when many keys collide).

Components of Hashing
There are three major components of hashing:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the
hash function, which determines an index or location for storing an item in a data
structure.
2. Hash Function: The hash function receives the input key and returns the index of
an element in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special
function called a hash function. It stores the data in an associative manner in an
array, where each data value is placed at the index computed from its key.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a
table.
Our main objective here is to search or update the values stored in the table quickly, in
O(1) time, and we are not concerned about the ordering of strings in the table. So each
string in the given set acts as a key, and the string itself also acts as the value. But
how do we store the value corresponding to each key?
Step 1: Hash functions (mathematical formulas) are used to calculate the hash value,
which acts as the index in the data structure where the value will be stored.
Step 2: Let’s assign a numerical value to every letter of the alphabet:
“a” = 1, “b” = 2, … , “z” = 26.
Step 3: The numerical value of each string is the sum of the values of its characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function that is used here is the sum of the characters in key mod Table size. We
can compute the location of the string in the array by taking the sum (string) mod
7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
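The five steps above can be sketched in Python as follows. This is a minimal illustration that assumes lowercase ASCII keys and ignores collisions; the helper names `letter_sum` and `sum_mod_hash` and the list-based `table` are illustrative choices, not part of the original text.

```python
TABLE_SIZE = 7  # table size used in the worked example


def letter_sum(key: str) -> int:
    """Sum of letter values, with 'a' = 1, 'b' = 2, ..., 'z' = 26 (Steps 2-3)."""
    return sum(ord(ch) - ord("a") + 1 for ch in key)


def sum_mod_hash(key: str, table_size: int = TABLE_SIZE) -> int:
    """Hash index = (sum of letter values) mod table size (Step 4)."""
    return letter_sum(key) % table_size


# Step 5: store each string at its hash index (no collision handling here).
table = [None] * TABLE_SIZE
for key in ["ab", "cd", "efg"]:
    table[sum_mod_hash(key)] = key

print({key: sum_mod_hash(key) for key in ["ab", "cd", "efg"]})
# {'ab': 3, 'cd': 0, 'efg': 4}

# A lookup recomputes the index from the key: one hash plus one array access.
print(table[sum_mod_hash("efg")])  # efg
```

Because the lookup only recomputes the hash and reads one slot, its cost does not depend on how many strings are stored, which is the point of the technique.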

Mapping key with indices of array

The above technique enables us to calculate the location of a given string by using a
simple hash function and rapidly find the value that is stored in that location. Therefore
the idea of hashing seems like a great way to store (key, value) pairs of the data in a
table.

What is a Hash function?


A hash function creates a mapping between a key and the location where its value is
stored; it is a mathematical formula that converts the key into a number. The result of
the hash function is referred to as a hash value or hash. The hash value is a
representation of the original string of characters, but it is usually smaller than the
original.
For example, consider an array as a Map where the key is the index and the value is the
value at that index. For an array A, if index i is treated as the key, we can find the
value by simply looking at A[i].

Types of Hash functions:


There are many hash functions that use numeric or alphanumeric keys. This article
focuses on discussing different hash functions:
1. Division Method.
2. Mid Square Method.
3. Folding Method.
4. Multiplication Method

1. Division Method:
This is the simplest method to generate a hash value. The hash function divides the
key k by M and uses the remainder as the hash value.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
M is best chosen to be a prime number, as this tends to distribute the keys more
uniformly. The hash value depends only on the remainder of the division.

Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
=0
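The division method is a one-liner in code. The sketch below reproduces the two worked examples; the function name `division_hash` is a hypothetical helper, and the final line is an added illustration of why M needs care: when M is a power of two, k mod M keeps only the lowest bits of k.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod M, where M is the table size."""
    if m <= 0:
        raise ValueError("table size M must be positive")
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0

# With M = 16 = 2**4, only the lowest 4 bits of k matter, so keys that differ
# only in their higher bits land in the same slot.
print(division_hash(0b0001_0110, 16), division_hash(0b1111_0110, 16))  # 6 6
```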
Pros:
1. This method works with any value of M.
2. The division method is very fast since it requires only a single division operation.
Cons:
1. This method can lead to poor performance because consecutive keys map to consecutive
hash values in the hash table.
2. The value of M must be chosen with care; for example, if M is a power of 2, h(k) is just
the lowest bits of k, so the remaining bits of the key are ignored.
2. Mid Square Method:
The mid-square method is a very good hashing method. It involves two steps to
compute the hash value:
1. Square the value of the key k, i.e., compute k².
2. Extract the middle r digits of the result as the hash value.

Formula:
h(k) = middle r digits of (k x k)
Here,
k is the key value.
The value of r can be decided based on the size of the table.

Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits are
required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60
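A sketch of the mid-square method, working on the decimal digits of k². The text does not say which digits count as the "middle" when the split is uneven, so this version assumes the extra digit goes to the right-hand side, and it assumes a square shorter than r digits is used whole; `mid_square_hash` is a hypothetical name.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square k and take the middle r decimal digits of k*k."""
    squared = str(k * k)
    if len(squared) < r:
        # Assumption: too few digits to have a middle, so use the whole square.
        return int(squared)
    # Assumption: when the leftover digits split unevenly, drop one fewer on the left.
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))    # 3600 -> middle two digits "60" -> 60
print(mid_square_hash(1234, 2))  # 1522756 -> middle two digits "22" -> 22
```

With r = 2 the result always lies in 0..99, matching the 100-slot table in the example.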
Pros:
1. The performance of this method is good as most or all digits of the key value
contribute to the result. This is because all digits in the key contribute to generating
the middle digits of the squared result.
2. The result is not dominated by the distribution of the top digit or bottom digit of the
original key value.
Cons:
1. The size of the key is one of the limitations of this method: if the key is large, its
square has about twice as many digits, which may not fit in a fixed-size integer.
2. Another disadvantage is that collisions can still occur, although we can try to reduce
them.

3. Digit Folding Method:


This method involves two steps:
1. Divide the key value k into a number of parts, i.e., k1, k2, k3, …, kn, where each part
has the same number of digits, except for the last part, which can have fewer digits than
the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the final carry, if
any.
Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(k) = s
Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
Note:
The number of digits in each part depends on the size of the hash table. For example, if
the size of the hash table is 100, then each part must have two digits, except for the last
part, which can have fewer digits.
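The folding steps can be sketched as follows. The text says to "ignore the last carry"; this sketch assumes that means keeping only the last `part_digits` digits of the sum, i.e. s mod 10^d for parts of d digits. The function name `fold_hash` is hypothetical.

```python
def fold_hash(k: int, part_digits: int) -> int:
    """Digit folding: split k's digits into parts of `part_digits` digits
    (the last part may be shorter), add the parts, and drop any final carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    total = sum(parts)
    # Assumption: dropping the carry keeps only the last `part_digits` digits.
    return total % (10 ** part_digits)


print(fold_hash(12345, 2))  # 12 + 34 + 5 = 51
print(fold_hash(99999, 2))  # 99 + 99 + 9 = 207 -> carry dropped -> 7
```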

4. Multiplication Method
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value k by A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step
4.

Formula:
h(k) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
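The formula translates directly into floating-point code. This is a minimal sketch using the example's values; `multiplication_hash` is a hypothetical name, and because `k * a` is computed in floating point, the fractional part can carry tiny rounding errors for very large keys.

```python
import math


def multiplication_hash(k: int, m: int, a: float) -> int:
    """Multiplication method: h(k) = floor(M * (k*A mod 1)), with 0 < A < 1."""
    if not 0 < a < 1:
        raise ValueError("A must satisfy 0 < A < 1")
    frac = (k * a) % 1.0  # fractional part of kA
    return math.floor(m * frac)


print(multiplication_hash(12345, 100, 0.357840))  # 53
```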
Pros:
The advantage of the multiplication method is that A can be any value between 0 and 1,
although some values tend to give better results than others.
Cons:
The multiplication method is generally most efficient when the table size is a power of
two; in that case, the whole process of computing the index from the key can be done very
quickly with integer operations.
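When M = 2^p, a common fixed-point formulation avoids floating point: approximate A by S / 2^w for a w-bit word, compute k·S mod 2^w (the fractional part of kA in fixed point), and keep its top p bits. The sketch below assumes w = 32 and S = 2654435769 = floor(A·2^32) for A = (√5 − 1)/2 ≈ 0.618, a value Knuth suggested; the names `W`, `S`, and `multiplicative_hash_pow2` are illustrative.

```python
W = 32          # word size in bits (assumption for this sketch)
S = 2654435769  # floor(A * 2**32) for A = (sqrt(5) - 1) / 2


def multiplicative_hash_pow2(k: int, p: int) -> int:
    """Multiplication method for a table of size M = 2**p using only integer ops.

    With A approximated by S / 2**W, (k * S) mod 2**W is the fractional part of
    k*A as a W-bit fixed-point number, and its top p bits equal
    floor(M * (k*A mod 1)).
    """
    if not 0 < p <= W:
        raise ValueError("p must be between 1 and W")
    frac_bits = (k * S) & ((1 << W) - 1)  # k*S mod 2**W
    return frac_bits >> (W - p)           # top p bits: index in a table of size 2**p


print(multiplicative_hash_pow2(123456, 14))  # 67
```

Multiplying by M = 2^p becomes a right shift, and "mod 1" becomes a bit mask, which is why power-of-two table sizes make this method fast.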

Properties of a Good hash function


A hash function that maps every item into its own unique slot is known as a perfect
hash function. We can construct a perfect hash function if we know the items and the
collection will never change but the problem is that there is no systematic way to
construct a perfect hash function given an arbitrary collection of items. Fortunately, we
will still gain performance efficiency even if the hash function isn’t perfect. We can
achieve a perfect hash function by increasing the size of the hash table so that every
possible value can be accommodated. As a result, each item will have a unique slot.
Although this approach is feasible for a small number of items, it is not practical when
the number of possibilities is large.
So, in practice, we construct our own hash function to approximate this ideal, and there
are some things we must be careful about while doing so.
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each
key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by the size of
the table).
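The uniform-distribution and load-factor properties can be checked empirically. This sketch uses hypothetical keys (multiples of 10) and the division method, and compares a table size of 10, which shares a factor with every key, against the prime 11; `load_factor` and `count_collisions` are illustrative helpers.

```python
from collections import Counter


def load_factor(num_items: int, table_size: int) -> float:
    """Load factor = number of items in the table / size of the table."""
    return num_items / table_size


def count_collisions(keys, table_size: int) -> int:
    """Number of keys that land in a slot already used by another key
    under the division method h(k) = k mod table_size."""
    buckets = Counter(k % table_size for k in keys)
    return sum(count - 1 for count in buckets.values())


keys = list(range(10, 101, 10))  # 10, 20, ..., 100 (hypothetical sample keys)
for m in (10, 11):
    print(m, round(load_factor(len(keys), m), 2), count_collisions(keys, m))
# 10 1.0 9   -> every key lands in slot 0
# 11 0.91 0  -> the keys spread over 10 distinct slots
```

The load factor is nearly the same in both cases, yet the distribution differs completely, which is why uniformity and a low load factor are listed as separate properties.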

Complexity of calculating hash value using the hash function


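Time complexity: O(n), where n is the length of the key (for example, the number of
characters summed by the string hash function above)
Space complexity: O(1)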

Problem with Hashing


If we consider the above example, the hash function we used is the sum of the letters,
but if we examine the hash function closely, the problem becomes easy to see: the hash
function generates the same hash value for different strings.
For example, “ab” and “ba” both have the same hash value (3), and “cd” and “be” also
generate the same hash value (7). This is known as a collision, and it creates problems in
searching, insertion, deletion, and updating of values.
What is collision?
The hashing process generates a small number for a big key, so there is a possibility
that two keys could produce the same value. The situation where a newly inserted key
maps to an already occupied slot is called a collision, and it must be handled using a
collision handling technique.
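Collisions like the “ab”/“ba” case can be found by grouping keys by their hash value. This sketch reuses the letter-sum hash from the “How does Hashing work?” example; `letter_sum` and `find_collisions` are hypothetical helper names, and only the full hash value (before taking mod table size) is compared.

```python
from collections import defaultdict


def letter_sum(key: str) -> int:
    """Sum of letter values, with 'a' = 1, 'b' = 2, ..., 'z' = 26."""
    return sum(ord(ch) - ord("a") + 1 for ch in key)


def find_collisions(keys, hash_fn):
    """Group keys by hash value and keep only groups with more than one key."""
    groups = defaultdict(list)
    for key in keys:
        groups[hash_fn(key)].append(key)
    return {h: ks for h, ks in groups.items() if len(ks) > 1}


print(find_collisions(["ab", "ba", "cd", "be", "efg"], letter_sum))
# {3: ['ab', 'ba'], 7: ['cd', 'be']}
```

Any hash that ignores character order, as a plain sum does, makes every anagram collide, so order-sensitive hash functions are generally preferred for strings.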

Collision in Hashing
