What is Hashing
Hashing is a technique for mapping keys and values into a hash table by
using a hash function. It is done for faster access to elements. The
efficiency of the mapping depends on the efficiency of the hash function used.
Let a hash function H(x) map the value x to the index x % 10 in an array. For
example, if the list of values is [11, 12, 13, 14, 15], the values will be stored
at positions {1, 2, 3, 4, 5} of the array (the hash table), respectively.
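The mapping above can be sketched in a few lines of Python. The function name h and the table size of 10 are taken from the example; everything else is an illustrative assumption.

```python
def h(x, table_size=10):
    """Map a value to a slot index using x % table_size."""
    return x % table_size

values = [11, 12, 13, 14, 15]
table = [None] * 10          # the hash table, initially empty

for v in values:
    table[h(v)] = v          # store each value at its hashed index

print([h(v) for v in values])  # [1, 2, 3, 4, 5]
print(table)                   # [None, 11, 12, 13, 14, 15, None, None, None, None]
```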
Hashing Components:
1) Hash Table: A hash table is a generalization of an array. It stores a
collection of data in such a way that the items are easy to find later when
required. This makes searching for an element very efficient.
2) Hash Function: A hash function transforms a given key into a specific
slot index. Ideally, it maps every possible key to a unique slot index. If
every key is mapped to a unique slot index, the hash function is known as a
perfect hash function. It is very difficult to create a perfect hash
function.
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position equally likely
for each key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by the
size of the table).
For example, for phone numbers a bad hash function is to take the first three
digits. A better function is to consider the last three digits. Please note that
this may not be the best hash function; there may be better ways.
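A small sketch of why the first three digits work badly: numbers from the same area usually share a prefix, so they all collide. The phone numbers below are made up for illustration, and the function names first_three and last_three are hypothetical.

```python
# Made-up phone numbers that share the same leading digits.
phones = ["9845012345", "9845067890", "9845011122", "9845099001"]

def first_three(phone):
    """Bad hash: the shared prefix sends every number to the same slot."""
    return int(phone[:3])

def last_three(phone):
    """Better hash: the trailing digits vary much more between numbers."""
    return int(phone[-3:])

print(len({first_three(p) for p in phones}))  # 1 distinct hash value
print(len({last_three(p) for p in phones}))   # 4 distinct hash values
```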
3) Collision Handling: Since a hash function maps a big key to a small
number, there is a possibility that two keys result in the same value. The
situation where a newly inserted key maps to an already occupied slot in the
hash table is called a collision, and it must be handled using some collision
handling technique.
Many hash functions use alphanumeric or numeric keys. The main hash
functions are -
Division Method.
Mid Square Method.
Folding Method.
Let's examine these methods in more detail.
1. Division Method
The division method is the simplest and easiest method used to generate a
hash value. In this hash function, the value of k is divided by M and the
remainder is used as the hash value:
h(K) = k mod M
Example -
k = 1320
M = 11
h(1320) = 1320 mod 11
= 0
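The division method is a one-line function. The name division_hash is an assumption used only for this sketch.

```python
def division_hash(k, m):
    """Division method: the remainder of k divided by m is the hash value."""
    return k % m

print(division_hash(1320, 11))  # 0, since 1320 = 11 * 120
print(division_hash(1325, 11))  # 5
```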
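2. Mid Square Method
The steps involved in computing this hash method include the following -
Square the value of k, i.e. compute k x k.
Extract the middle r digits of the result as the hash value.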
Advantages -
Since most or all of the key value's digits contribute to the outcome, this
strategy performs well. The middle digits of the squared result are
produced by a process in which all of the essential digits participate.
The top or bottom digits of the original key value do not predominate in
the outcome.
Disadvantages -
One of this method's constraints is the size of the key; if the key is large,
its square will have twice as many digits.
Chance of repeated collisions.
Example -
Let's take a hash table with 200 memory locations and r = 2, as decided by
the size of the mapping in the table.
k = 50
Therefore,
k x k = 50 x 50
= 2500
Thus, taking the middle r = 2 digits of 2500,
h(50) = 50
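A minimal sketch of the mid square method in Python. The function name mid_square_hash is hypothetical. When the square does not have a perfectly centered group of r digits, this sketch leans to the left; that tie-breaking rule is an assumption, not something the notes specify.

```python
def mid_square_hash(k, r):
    """Square k and return the middle r digits of the result."""
    squared = str(k * k).zfill(r)        # pad so at least r digits exist
    start = (len(squared) - r) // 2      # left edge of the middle window
    return int(squared[start:start + r])

print(mid_square_hash(50, 2))  # 50, the middle two digits of 2500
print(mid_square_hash(60, 2))  # 60, the middle two digits of 3600
```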
3. Folding Method
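The key k is divided into parts k1, k2, ..., kn, each with the same number of
digits except possibly the last one, and the parts are added together:
s = k1 + k2 + k3 + k4 + ... + kn
h(K) = s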
Advantages -
Breaks up the key value into precise equal-sized segments for an easy
hash value
Independent of distribution in a hash table
Disadvantages -
Sometimes inefficient if there are too many collisions
Example -
k = 54321
k1 = 54 ; k2 = 32 ; k3 = 1
Therefore,
s = k1 + k2 + k3
= 54 + 32+ 1
= 87
Thus,
h (k) = 87
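The folding example can be reproduced with a short Python sketch. The function name folding_hash and the parameter part_len (digits per part, 2 in the example) are assumptions for illustration.

```python
def folding_hash(k, part_len=2):
    """Split the decimal digits of k into parts of part_len digits and add them."""
    digits = str(k)
    parts = [int(digits[i:i + part_len])
             for i in range(0, len(digits), part_len)]
    return sum(parts)

print(folding_hash(54321))  # 87, from 54 + 32 + 1
```

In practice the sum is often reduced modulo the table size so that it fits the table. The notes define h(K) = s, so the sketch returns s unchanged.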
COLLISIONS
A collision occurs when a record being inserted hashes to an address (i.e. relative position) that already
contains a different record, i.e. when two key values hash to the same position.
Collision Resolution
1. Separate chaining
2. Open Addressing
3. Double Hashing
Separate chaining
Separate chaining is an open hashing technique. A pointer field is added to each record location. When an overflow
occurs, this pointer is set to point to overflow blocks, forming a linked list.
In this method, the table can never overflow, since the linked lists are only extended upon the arrival of new keys.
Insertion
To perform the insertion of an element, traverse down the appropriate list to check whether the element is
already in place.
If the element turns out to be a new one, it is inserted either at the front of the list or at the end of the list.
Advantages:
1) Simple to implement.
2) The hash table never fills up; we can always add more elements to a chain.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
Disadvantages:
1) Cache performance of chaining is not good, as keys are stored using linked lists. Open addressing provides
better cache performance, since everything is stored in the same table.
3) If the chain becomes long, then search time can become O(n) in worst case.
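Let us consider the simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
Key 700 goes to slot 0 and key 76 to slot 6. Keys 50, 85 and 92 all hash to slot 1 and form one chain, and keys 73
and 101 both hash to slot 3 and form another.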
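A minimal sketch of separate chaining for these keys. The class name ChainedHashTable is hypothetical, and Python lists stand in for the linked lists described above; new keys are appended at the end of the chain.

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per slot

    def _index(self, key):
        return key % self.size                    # "key mod 7" when size is 7

    def insert(self, key):
        chain = self.buckets[self._index(key)]
        if key not in chain:      # traverse the chain to check for the key
            chain.append(key)     # a new key goes at the end of the chain

    def search(self, key):
        return key in self.buckets[self._index(key)]

table = ChainedHashTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key)

for slot, chain in enumerate(table.buckets):
    print(slot, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```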
OPEN ADDRESSING:
Like separate chaining, open addressing is a method for handling collisions. In open addressing, all elements are stored
in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys
(note that we can increase the table size by copying the old data if needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until the slot's key becomes equal to k or an empty slot is reached.
Delete(k): The delete operation is interesting. If we simply delete a key, then a later search may fail, because it would
stop at the emptied slot before reaching keys stored further along the probe sequence. So slots of deleted keys are
marked specially as "deleted".
Insert can insert an item in a deleted slot, but the search doesn't stop at a deleted slot.
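The three operations can be sketched as follows. The class name OpenAddressingTable, the DELETED marker object, and the probe step of 1 are assumptions made for this sketch; the insert does not check for duplicate keys.

```python
EMPTY = None
DELETED = object()   # special marker for a slot whose key was deleted

class OpenAddressingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield each slot index once, starting at key % size, in steps of 1."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        for idx in self._probe(key):
            # Both empty and deleted slots may receive a new key.
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key
                return idx
        raise OverflowError("hash table is full")

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:
                return None                  # a truly empty slot ends the search
            if self.slots[idx] is not DELETED and self.slots[idx] == key:
                return idx                   # deleted slots are skipped over
        return None

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED        # mark the slot, do not empty it
        return idx

t = OpenAddressingTable(7)
t.insert(50)            # lands in slot 1
t.insert(85)            # slot 1 is taken, lands in slot 2
t.delete(50)            # slot 1 becomes DELETED
print(t.search(85))     # 2, the search passes over the deleted slot
print(t.insert(92))     # 1, the deleted slot is reused
```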
LINEAR PROBING:
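In linear probing, we linearly probe for the next slot. The typical gap between two probes is 1, as in the example
below.
Let hash(x) be the slot index computed using the hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
Let us consider the simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
Keys 50, 700 and 76 go directly to slots 1, 0 and 6. Key 85 collides at slot 1 and moves to slot 2. Key 92 collides at
slots 1 and 2 and lands in slot 3. Key 73 finds slot 3 occupied and takes slot 4. Key 101 finds slots 3 and 4 occupied
and takes slot 5.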
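The insertion sequence for these keys can be traced with a short Python sketch. The function name linear_probe_insert is hypothetical, and the table size of 7 is assumed from the “key mod 7” hash function.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing with a gap of 1; return the slot used."""
    size = len(table)
    home = key % size
    for i in range(size):
        idx = (home + i) % size      # hash(x), hash(x) + 1, hash(x) + 2, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, key)

print(table)  # [700, 50, 85, 92, 73, 101, 76]
```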
QUADRATIC PROBING
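Quadratic probing is similar to linear probing; the only difference is the interval between successive probes or entry
slots. Here, when the slot at the hashed index for an entry record is already occupied, you keep probing until you
find an unoccupied slot. The interval between slots is computed by adding the successive values of an arbitrary
polynomial to the original hashed index. With hash(x) and S as above and the common choice of i*i as the polynomial,
the probe sequence will be as follows:
hash(x) % S
(hash(x) + 1*1) % S
(hash(x) + 2*2) % S
(hash(x) + 3*3) % S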
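A minimal sketch of quadratic probing, assuming the common polynomial i*i, so that the i-th probe is (hash(x) + i*i) % S. The function name quadratic_probe_insert is hypothetical, and the keys and the “key mod 7” hash function are reused from the linear probing example for comparison.

```python
def quadratic_probe_insert(table, key):
    """Insert key, probing at offsets 0, 1, 4, 9, ... from the home slot."""
    size = len(table)
    home = key % size
    for i in range(size):
        idx = (home + i * i) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    # Unlike linear probing, the sequence may miss free slots.
    raise OverflowError("no free slot found along the probe sequence")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    quadratic_probe_insert(table, key)

print(table)  # [700, 50, 85, 73, 101, 92, 76]
```

Here 92 collides at slots 1 and 2 and then jumps to slot 1 + 4 = 5, which is where the result differs from linear probing.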
DOUBLE HASHING
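Double hashing is similar to linear probing; the only difference is the interval between successive probes. Here, the
interval between probes is computed by a second hash function.
Let us say that the hashed index for an entry record is an index that is computed by one hashing function and the slot
at that index is already occupied. You must start traversing in a specific probing sequence to look for an unoccupied
slot.
The double hashing probe sequence, in its usual form with hash1 and hash2 as the two hash functions and S as the
table size, is:
(hash1(x) + i * hash2(x)) % S, for i = 0, 1, 2, ...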
Example:
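A minimal sketch of double hashing in Python. The first hash function is “key mod 7” as in the earlier examples. The second hash function, 5 - (key mod 5), is an assumption chosen for illustration: it never returns 0, and with a prime table size of 7 every step size from 1 to 5 visits all slots. The function name double_hash_insert is hypothetical.

```python
def double_hash_insert(table, key):
    """Insert key; the probe step comes from a second hash function."""
    size = len(table)
    home = key % size            # first hash: the slot to try first
    step = 5 - (key % 5)         # second hash (assumed): always in 1..5
    for i in range(size):
        idx = (home + i * step) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    double_hash_insert(table, key)

print(table)  # [700, 50, 101, 92, 85, 73, 76]
```

For instance, 85 has home slot 1 and step 5, so it tries slots 1, 6 and then 4, where it is stored.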
REHASHING
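If the table is close to full, the search time grows, and the number of probes per search may approach the table
size. When the load factor exceeds a certain value (e.g. greater than 0.5), we do rehashing: build a second table
twice as large as the original and reinsert every key from the original table into it. Rehashing is an expensive
operation, since every key must be hashed again.
However, once done, the new hash table will have good performance.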
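Rehashing can be sketched on top of a simple linear probing table. The function names probe_insert, rehash and insert, and the max_load threshold of 0.5, are assumptions for this sketch. Real implementations often round the new size up to a prime instead of doubling exactly.

```python
def probe_insert(table, key):
    """Insert key into table using linear probing."""
    size = len(table)
    for i in range(size):
        idx = (key % size + i) % size
        if table[idx] is None:
            table[idx] = key
            return
    raise OverflowError("hash table is full")

def rehash(table):
    """Build a table twice as large and reinsert every key from the old one."""
    bigger = [None] * (2 * len(table))
    for key in table:
        if key is not None:
            probe_insert(bigger, key)   # the slot is recomputed for the new size
    return bigger

def insert(table, key, max_load=0.5):
    """Insert key, then rehash if the load factor exceeds max_load."""
    probe_insert(table, key)
    used = sum(slot is not None for slot in table)
    if used / len(table) > max_load:
        table = rehash(table)
    return table

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    table = insert(table, key)

print(len(table))  # 14, the table doubled once, after the fourth key
```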