
What is Hashing

Hashing is a technique that maps keys to values in a hash table using a hash function for efficient data access. Key components include the hash table, hash function, and collision handling techniques, with various methods for generating hash values such as the division, mid-square, and folding methods. Collision resolution strategies include separate chaining and open addressing, with techniques like linear probing, quadratic probing, and double hashing to manage collisions effectively.

Uploaded by

deogirimemo
Copyright © All Rights Reserved

What is Hashing?

Hashing is the technique of mapping keys and their values into a hash table by using a hash function. It is done to allow faster access to elements. The efficiency of the mapping depends on the efficiency of the hash function used.
For example, let a hash function H(x) map the value x to index x % 10 in an array. If the list of values is [11, 12, 13, 14, 15], the values will be stored at positions {1, 2, 3, 4, 5} of the array (hash table), respectively.
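The mapping in this example can be sketched in Python as follows. The table size of 10 and the use of `None` for empty slots are illustrative choices, and collisions are not handled in this minimal sketch.

```python
def H(x: int) -> int:
    """The example hash function from the text: index = x % 10."""
    return x % 10


TABLE_SIZE = 10
table = [None] * TABLE_SIZE  # the hash table as a plain array of slots

for value in [11, 12, 13, 14, 15]:
    table[H(value)] = value  # store each value at its hashed index

print(table)
# [None, 11, 12, 13, 14, 15, None, None, None, None]
```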

Hashing Components:
1) Hash Table: A hash table is a generalization of an array. It stores a collection of data in such a way that items are easy to find later when required, which makes searching for an element very efficient.
2) Hash Function: A hash function transforms a given key into a specific slot index. Its main job is to map every possible key to a slot index, ideally a unique one. If every key is mapped to a unique slot index, the hash function is called a perfect hash function. It is very difficult to create a perfect hash function.
A good hash function should have the following properties:
1. It is efficiently computable.
2. It distributes the keys uniformly (each table position is equally likely for each key).
3. It minimizes collisions.
4. The table should have a low load factor (the number of items in the table divided by the size of the table).
For example, for phone numbers, taking the first three digits is a bad hash function, because many numbers share the same leading digits (such as an area code). Taking the last three digits is a better function. Note that this may not be the best hash function; there may be better ways.
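To see why the first three digits make a poor hash, here is a small sketch that hashes three made-up phone numbers sharing the prefix 555 (hypothetical data, chosen only for illustration) with both functions.

```python
def first_three(phone: str) -> int:
    """Bad hash: the leading digits are often shared (e.g., an area code)."""
    return int(phone[:3])


def last_three(phone: str) -> int:
    """Better hash: the trailing digits vary more from number to number."""
    return int(phone[-3:])


# Hypothetical numbers that share the same first three digits
phones = ["5551234567", "5559876543", "5550001111"]

print([first_three(p) for p in phones])  # [555, 555, 555] -> all collide
print([last_three(p) for p in phones])   # [567, 543, 111] -> all distinct
```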
3) Collision Handling: Since a hash function maps a big key to a small number, two keys may produce the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called a collision, and it must be handled using some collision handling technique.
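A quick way to see collisions is to hash a few keys with the earlier example function H(x) = x % 10 and group them by slot. The keys below are hypothetical, picked so that some of them collide.

```python
def H(x: int) -> int:
    return x % 10  # same hash function as in the earlier example


keys = [11, 21, 35, 45, 47]
slots = {}  # slot index -> list of keys that hashed to it
for k in keys:
    slots.setdefault(H(k), []).append(k)

# A slot holding more than one key is a collision
collisions = {slot: ks for slot, ks in slots.items() if len(ks) > 1}
print(collisions)  # {1: [11, 21], 5: [35, 45]}
```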

Types of Hash functions

Hash functions can work on numeric or alphanumeric keys. The main hash functions covered here are:

• Division Method.
• Mid Square Method.
• Folding Method.
Let's examine these methods in more detail.

1. Division Method

The division method is the simplest and easiest method used to generate a hash value. In this hash function, the key k is divided by M and the remainder is used as the hash value.

Formula - h(k) = k mod M

(where k = key value and M = the size of the hash table)

Advantages -

• This method can be used with any value of M.
• The division approach is extremely quick because it requires only one operation.
Disadvantages -

• It may lead to poor performance, as consecutive keys are mapped to consecutive hash values in the hash table.
• In some situations the value of M must be chosen with particular care. For example, if M is a power of 2, say 2^p, then k mod M is just the lowest p bits of k, so the higher bits of the key are ignored.
Example -

• k = 1320
• M = 11
• h(1320) = 1320 mod 11 = 0
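The division method is a one-line function. This sketch reproduces the worked example and also shows the disadvantage mentioned above, using a hypothetical run of consecutive keys 100 to 104.

```python
def division_hash(k: int, M: int) -> int:
    """Division method: h(k) = k mod M, where M is the table size."""
    return k % M


print(division_hash(1320, 11))  # 0, matching the worked example

# The disadvantage in practice: consecutive keys land in consecutive slots
print([division_hash(k, 11) for k in range(100, 105)])  # [1, 2, 3, 4, 5]
```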

2. Mid Square Method

Computing this hash involves the following steps:

1. Square the value of k (that is, compute k × k).
2. Extract the middle r digits of the result as the hash value.
Formula - h(k) = middle r digits of (k × k)

(where k = key value and r = the number of digits to extract)

Advantages -

• This strategy performs well because most or all of the digits of the key contribute to the result: all of the significant digits take part in producing the middle digits of the square.
• The top or bottom digits of the original key do not dominate the outcome.
Disadvantages -

• The size of the key is a constraint: if the key is large, its square has up to twice as many digits.
• There is a chance of repeated collisions.
Example -

Consider a hash table with 200 memory locations and r = 2, chosen according to the size of the table.

• k = 50
• Therefore, k × k = 50 × 50 = 2500
• Thus, h(50) = 50 (the middle two digits of 2500)
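A minimal sketch of the mid-square method follows. The text does not say which digits to take when the square's length and r leave an uneven split; this sketch assumes the extra digit is left on the right (start = (length - r) // 2), which is a labeled assumption rather than part of the method as described.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Mid-square method: square k and take the middle r digits.

    Assumption: for an uneven split, the extra digit stays on the right.
    """
    squared = str(k * k)
    if len(squared) <= r:
        return int(squared)  # square too short to trim; use it whole
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])


print(mid_square_hash(50, 2))  # 2500 -> middle two digits "50" -> 50
```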

3. Folding Method

There are two steps in this method -

1. Divide the key k into a specific number of parts k1, k2, k3, ..., kn, each having the same number of digits except the last part, which may have fewer digits than the others.
2. Add the parts together. The final carry, if any, is disregarded to obtain the hash value.
Formula - k = k1, k2, k3, k4, ..., kn

s = k1 + k2 + k3 + k4 + ... + kn

h(k) = s

(where s = the sum of the parts of key k, ignoring the final carry)

Advantages -

• It breaks the key into equal-sized segments, which makes the hash value easy to compute.
• It is independent of the distribution in a hash table.
Disadvantages -
• It is sometimes inefficient if there are too many collisions.
Example -

• k = 54321
• k1 = 54; k2 = 32; k3 = 1
• Therefore, s = k1 + k2 + k3 = 54 + 32 + 1 = 87
• Thus, h(k) = 87
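The folding steps can be sketched as below, assuming two-digit parts as in the example and interpreting "disregarding the final carry" as keeping only the lowest two digits of the sum. Both the part width and that interpretation are assumptions of this sketch.

```python
def fold_hash(k: int, part_digits: int = 2) -> int:
    """Folding method: split k into parts of part_digits digits (the last
    part may be shorter), add the parts, and discard any final carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    s = sum(parts)
    return s % (10 ** part_digits)  # drop the carry beyond part_digits digits


print(fold_hash(54321))   # parts 54, 32, 1 -> 87
print(fold_hash(999999))  # 99 + 99 + 99 = 297 -> carry dropped -> 97
```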
COLLISIONS

A collision occurs when a record being inserted hashes to an address (i.e., relative position) that already contains a different record, that is, when two key values hash to the same position.

Collision Resolution

Collision resolution is the process of finding another position for the colliding record.

Some collision resolution techniques:

1. Separate chaining

2. Open addressing

3. Double hashing (a form of open addressing)

Separate chaining

Separate chaining is an open hashing technique. A pointer field is added to each record location. When an overflow occurs, this pointer is set to point to overflow blocks, forming a linked list.

In this method, the table can never overflow, since the linked lists are simply extended when new keys arrive.
Insertion

• To insert an element, traverse the appropriate list to check whether the element is already present.

• If the element is new, it is inserted either at the front or at the end of the list. If it is a duplicate, an extra field (such as a count) is usually kept and updated instead of adding the element again.

Advantages:

1) Simple to implement.

2) The hash table never fills up; we can always add more elements to a chain.

3) Less sensitive to the hash function or the load factor.

4) It is mostly used when it is unknown how many keys may be inserted or deleted, and how frequently.

Disadvantages:

1) The cache performance of chaining is poor because keys are stored in linked lists. Open addressing provides better cache performance, as everything is stored in the same table.

2) Wastage of space (some parts of the hash table are never used).

3) If a chain becomes long, search time can become O(n) in the worst case.

4) Extra space is used for the links.


Example:

Consider the simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
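A minimal sketch of separate chaining for this example follows. Python lists stand in for the linked lists, new keys are appended at the end of the chain (one of the two options above), and the class name and `[key, count]` entry layout are hypothetical choices for illustration.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a chain of [key, count] entries."""

    def __init__(self, size: int = 7) -> None:
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per slot

    def _hash(self, key: int) -> int:
        return key % self.size  # "key mod 7" when size == 7

    def insert(self, key: int) -> None:
        chain = self.slots[self._hash(key)]
        for entry in chain:  # traverse the chain to look for the key
            if entry[0] == key:
                entry[1] += 1  # duplicate: update the extra count field
                return
        chain.append([key, 1])  # new key: insert at the end of the chain

    def search(self, key: int) -> bool:
        return any(entry[0] == key for entry in self.slots[self._hash(key)])


table = ChainedHashTable(7)
for k in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(k)

for i, chain in enumerate(table.slots):
    print(i, [key for key, _ in chain])
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```

Slots 2, 4 and 5 stay empty while slot 1 holds three keys, which illustrates both the unused-space and the long-chain disadvantages listed above.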

OPEN ADDRESSING:

Like separate chaining, open addressing is a method for handling collisions. In open addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (note that we can increase the table size by copying the old data if needed).

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.

Search(k): Keep probing until the slot's key becomes equal to k or an empty slot is reached.

Delete(k): The delete operation needs care. If we simply empty the slot of a deleted key, a later search for a key placed further along the same probe sequence may stop at that empty slot and fail. So the slots of deleted keys are specially marked as “deleted”.

An insert can place an item in a deleted slot, but a search does not stop at a deleted slot.
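The Insert, Search and Delete rules above can be sketched as follows. This sketch assumes linear probing as the probe sequence and uses a unique `DELETED` sentinel object as the special mark; the class and names are hypothetical, for illustration only.

```python
from typing import Optional

EMPTY = None          # a slot that has never held a key
DELETED = object()    # special marker for a slot whose key was deleted


class OpenAddressingTable:
    """Open addressing; linear probing is used here as the probe sequence."""

    def __init__(self, size: int = 7) -> None:
        self.size = size
        self.slots: list = [EMPTY] * size

    def _probe(self, key: int):
        """Yield slot indices in probe order: h, h + 1, h + 2, ... (mod size)."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key: int) -> None:
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key  # insert may reuse a deleted slot
                return
        raise RuntimeError("table is full")

    def search(self, key: int) -> Optional[int]:
        """Return the slot index holding key, or None if it is absent."""
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return idx   # deleted slots are skipped, not treated as empty
        return None

    def delete(self, key: int) -> None:
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED  # mark the slot instead of emptying it


table = OpenAddressingTable(7)
for k in [50, 85, 92]:  # all three hash to slot 1, so they fill slots 1, 2, 3
    table.insert(k)
table.delete(85)         # slot 2 is marked DELETED
print(table.search(92))  # 3: the search walks past the deleted slot
```

If `delete` set slot 2 back to `EMPTY` instead, the search for 92 would stop at slot 2 and wrongly report the key as missing.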
LINEAR PROBING:

In linear probing, we probe linearly for the next slot. The typical gap between two probes is 1, which is also the gap used in the example that follows.

Let hash(x) be the slot index computed using the hash function and S be the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1) % S.

If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S.

If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S, and so on.

Consider the simple hash function “key mod 7” and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
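The insertions for this example can be traced with a short sketch, assuming a table of size S = 7 so that hash(x) = x mod 7. The function name is hypothetical.

```python
def linear_probing_table(keys: list[int], size: int) -> list:
    """Insert keys with hash(x) = x % size, trying (hash(x) + 1) % size,
    (hash(x) + 2) % size, and so on until a free slot is found."""
    table = [None] * size
    for key in keys:
        h = key % size
        for i in range(size):
            idx = (h + i) % size
            if table[idx] is None:
                table[idx] = key
                break
        else:
            raise RuntimeError("table is full")
    return table


print(linear_probing_table([50, 700, 76, 85, 92, 73, 101], 7))
# [700, 50, 85, 92, 73, 101, 76]
```

Keys 85 and 92 both hash to slot 1 and spill into slots 2 and 3; then 73 and 101 (which hash to 3) are pushed further along to slots 4 and 5. This clustering of occupied slots is typical of linear probing.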

QUADRATIC PROBING

Quadratic probing is similar to linear probing; the only difference is the interval between successive probes (entry slots). Here, when the slot at the hashed index for an entry record is already occupied, you must keep probing until you find an unoccupied slot. The interval between probes is computed by adding successive values of a quadratic polynomial to the original hashed index.

Let us assume that the hashed index for an entry is index and that the slot at index is occupied. The probe sequence will be as follows:

index % hashTableSize

(index + 1) % hashTableSize

(index + 4) % hashTableSize

(index + 9) % hashTableSize

and so on; in general, the i-th probe is (index + i²) % hashTableSize.

Quadratic Probing Example
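As an illustration, the sketch below applies quadratic probing to the same keys used in the linear probing example (50, 700, 76, 85, 92, 73, 101) with a table of size 7. These keys and the table size are assumptions of this sketch, reused from the earlier example. Because quadratic probing does not always visit every slot, the loop gives up after `size` attempts.

```python
def quadratic_probing_table(keys: list[int], size: int) -> list:
    """Insert keys using probes (h + i*i) % size for i = 0, 1, 2, ..."""
    table = [None] * size
    for key in keys:
        h = key % size
        for i in range(size):
            idx = (h + i * i) % size
            if table[idx] is None:
                table[idx] = key
                break
        else:
            # quadratic probing can miss free slots, so give up after size tries
            raise RuntimeError(f"no free slot found for {key}")
    return table


print(quadratic_probing_table([50, 700, 76, 85, 92, 73, 101], 7))
# [700, 50, 85, 73, 101, 92, 76]
```

Here 92 (hash 1) skips slots 1 and 2 and lands in slot 1 + 4 = 5, instead of crowding next to 85 as it does under linear probing.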

DOUBLE HASHING

Double hashing is similar to linear probing; the only difference is the interval between successive probes. Here, two hash functions are used: the first gives the starting index, and the second determines the interval between probes.

Let us say that the hashed index for an entry record is computed by the first hash function and that the slot at that index is already occupied. You must then probe in a specific sequence to look for an unoccupied slot.
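The double hashing function is:

f(i) = i · hash2(x)

where hash2(x) = R - (x mod R), and R is a prime smaller than the table size.

The probing sequence will be (hash1(x) + f(i)) % size, for i = 0, 1, 2, and so on, where hash1(x) is the first hash function.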

Example:
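As an illustration, the sketch below uses a table of size 7, hash1(x) = x mod 7, R = 5 (a prime smaller than 7), and the same keys as the earlier examples. The table size, R, and the keys are assumptions chosen for this sketch.

```python
def double_hashing_table(keys: list[int], size: int, R: int) -> list:
    """Probe (hash1(x) + i * hash2(x)) % size for i = 0, 1, 2, ...,
    with hash1(x) = x % size and hash2(x) = R - (x % R)."""
    table = [None] * size
    for key in keys:
        h1 = key % size
        h2 = R - (key % R)  # always between 1 and R, never 0
        for i in range(size):
            idx = (h1 + i * h2) % size
            if table[idx] is None:
                table[idx] = key
                break
        else:
            raise RuntimeError(f"no free slot found for {key}")
    return table


print(double_hashing_table([50, 700, 76, 85, 92, 73, 101], size=7, R=5))
# [700, 50, 101, 92, 85, 73, 76]
```

Since hash2(x) is never 0 and the table size 7 is prime, every step size is coprime with 7, so each key's probe sequence can reach all slots. For example, 101 (hash1 = 3, hash2 = 4) tries slots 3, 0, 4, 1, 5 before finding slot 2 free.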

REHASHING

If the table is close to full, the number of probes needed by a search grows and may approach the table size. When the load factor exceeds a certain value (e.g., 0.5), we rehash: build a second table twice as large as the original and reinsert all the keys of the original table into it, using the hash function for the new size.

Rehashing is an expensive operation, with running time O(N), since every key must be hashed again.

However, once it is done, the new hash table will have good performance.
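A minimal sketch of this rule, assuming linear probing, a threshold of 0.5, and a new table exactly twice the old size as the text describes (a common refinement, not shown here, rounds the new size up to a prime). The class name is hypothetical, and deletion is omitted to keep the sketch short.

```python
class RehashingTable:
    """Linear-probing table that rehashes into a table twice as large
    once the load factor exceeds max_load."""

    def __init__(self, size: int = 7, max_load: float = 0.5) -> None:
        self.size = size
        self.max_load = max_load
        self.count = 0
        self.slots: list = [None] * size

    def _place(self, slots: list, key: int) -> None:
        size = len(slots)
        h = key % size  # hash function depends on the current table size
        for i in range(size):
            idx = (h + i) % size
            if slots[idx] is None:
                slots[idx] = key
                return
        raise RuntimeError("table is full")

    def insert(self, key: int) -> None:
        self._place(self.slots, key)
        self.count += 1
        if self.count / self.size > self.max_load:
            self._rehash()

    def _rehash(self) -> None:
        old = self.slots
        self.size *= 2  # build a second table twice as large
        self.slots = [None] * self.size
        for key in old:  # O(N): every old key is hashed again
            if key is not None:
                self._place(self.slots, key)


t = RehashingTable(size=7)
for k in [50, 700, 76, 85]:
    t.insert(k)
print(t.size)  # 14: the 4th insert pushed the load factor to 4/7 > 0.5
```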
