
Study_Material_on_Hashing

The document discusses hashing as a method to improve search efficiency in data structures, particularly in hash tables. It covers key concepts such as load factor, hash functions, collision resolution techniques (open addressing and chaining), and specific hashing methods like division, mid square, and folding. It also addresses issues like clustering and the importance of choosing appropriate hash functions to ensure uniform distribution of keys.

Uploaded by

parthiv2003ghosh
Copyright
© © All Rights Reserved

Data Structure and Algorithms

PCC301
Module 4:

Hashing
• The problem at hand is to speed up searching. Consider the problem of searching an array for a given
value.
• If the array is not sorted, the search might require examining every element of the array, which is O(n).
• If the array is sorted, we can use binary search and thereby reduce the worst-case runtime
complexity to O(log n).
• We could search even faster if we knew in advance the index at which that value is located in the array.
• With a magic function that computes this index from the value itself, our search may be reduced to just
one probe, giving us a constant runtime of O(1) on average (under reasonable assumptions) and O(n) in the
worst case.
Hashing is an improvement over a Direct Access Table.
Load Factor

A critical statistic for a hash table is the load factor, that is, the number of entries divided by the number of
buckets:

• Load factor = n/M

where:
n = number of entries/elements
M = number of buckets/table slots
As the load factor grows larger, the hash table becomes slower, and it may even fail to work (depending on the
collision resolution method used): with open addressing, for example, no further key can be inserted once every
slot is occupied.
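
As a small illustration of the formula above, the following sketch computes the load factor. The function name load_factor and the sample numbers are assumptions chosen for illustration only.

```python
def load_factor(n_entries: int, n_buckets: int) -> float:
    """Return n / M for a table holding n entries in M buckets."""
    if n_buckets <= 0:
        raise ValueError("a hash table needs at least one bucket")
    return n_entries / n_buckets


print(load_factor(7, 10))   # 0.7
print(load_factor(15, 10))  # 1.5
```

A value above 1 means there are more entries than buckets, which can only happen when several entries share one bucket, as in chaining.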

Hash Functions
• A hash function is a function which, when given a key, generates an address in the table.
• Problem of finding a hash function: There are ‘n’ keys such that all key values are between “a” and
“b”. It is required to find a hash function f(key) that transforms a key into an address in the range 0 to
(M-1), where M>n.
• An ideal hash function should distribute the keys uniformly over the range 0 to (M-1).
• Three popular hashing methods are the Division method, the Mid Square method and the Folding method.

Division Method
• The key is divided by M and the remainder is taken to be the address.
• H(k) = k mod M
• Produces addresses exactly in the range 0 to (M-1).
• M should be chosen carefully. Some choices are not satisfactory.
• Examples:
• Keys are decimal integers. M is chosen to be a power of 10, say 100. Then all keys having identical last 2
digits will hash into the same address.
• M is chosen to be an even integer. Then all even keys will hash into even addresses and all odd keys into
odd addresses.
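
The effect of the choice of M can be checked with a short sketch. The function name division_hash and the sample keys are illustrative assumptions; the prime value 97 is also an assumption, used here because a prime M is a commonly recommended choice, not because the notes prescribe it.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: the address is the remainder of key divided by M."""
    return key % m


keys = [1234, 5634, 9934, 42, 77]

# M = 100 keeps only the last two decimal digits, so 1234, 5634 and 9934 collide.
print([division_hash(k, 100) for k in keys])  # [34, 34, 34, 42, 77]

# An even M preserves parity: even keys get even addresses, odd keys get odd ones.
print([division_hash(k, 10) % 2 == k % 2 for k in keys])  # [True, True, True, True, True]

# With the prime M = 97 (an assumed choice) the same keys spread out.
print([division_hash(k, 97) for k in keys])  # [70, 8, 40, 42, 77]
```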

Mid Square method


• The key is squared and a portion of the squared value is selected from the middle, which is chosen as the
address.
• Say, a q-digit address is to be generated from a p-digit key.
• Example:
• Let p=4, q=3, key=3271.
• Square of 3271 is 10699441.
• 3 digits are extracted from the middle. Either 699 or 994 may be selected as the address, as long as the
same digit positions are used for every key.
• Flexible method and can be modified if needed.
• When the key is large, a selected part of the key may be squared instead.
• Usually found to produce addresses that are uniformly distributed over the range of the hashing
function.
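
The worked example can be reproduced with a minimal sketch. The function name mid_square_hash, the offset parameter and the zero-padding of short squares are assumptions made for this illustration; the notes only fix the idea of taking q digits from the middle of the square.

```python
from typing import Optional


def mid_square_hash(key: int, q: int, offset: Optional[int] = None) -> int:
    """Square the key and return q digits taken from the middle of the square."""
    squared = str(key * key).zfill(q)  # pad so that at least q digits exist
    if offset is None:
        # Default window: centred, leaning left when exact centring is impossible.
        offset = (len(squared) - q) // 2
    return int(squared[offset:offset + q])


print(mid_square_hash(3271, 3))            # 699
print(mid_square_hash(3271, 3, offset=3))  # 994
```

Both windows are valid middle portions of 10699441; what matters is that the same offset is used for every key.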

Folding Method
• From a p-digit key, a q-digit address is to be generated.
• The digits of the key are partitioned into groups of q digits, starting from the right. The groups are added
and the rightmost q digits of the sum are selected as the address.
• Example:
• p=8, q=3, key=39427829.
• When partitioned, there are 3 groups: 39/427/829
• Adding 39, 427 and 829, we get 1295. Selecting the last 3 digits, the desired address is 295.
• Flexible and can be modified as required.
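
The partition-and-add steps above can be sketched as follows. The function name folding_hash is an assumption for illustration, and the sketch handles non-negative integer keys only.

```python
def folding_hash(key: int, q: int) -> int:
    """Folding method: add q-digit groups (cut from the right), keep the last q digits."""
    digits = str(key)
    total = 0
    # Walk from the right end towards the left in steps of q digits.
    for end in range(len(digits), 0, -q):
        start = max(0, end - q)           # the leftmost group may be shorter
        total += int(digits[start:end])
    return total % 10 ** q                # rightmost q digits of the sum


print(folding_hash(39427829, 3))  # 295  (829 + 427 + 39 = 1295)
```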

Collision/Conflict
• Different keys which are transformed to the same address are referred to as synonyms.
• Since a hash function maps a large set of possible keys to a small range of addresses, it is possible that
two keys result in the same value.
• The situation where a newly inserted key maps to an already occupied slot in the hash table is called a
collision.
• Example:
• Suppose M=3, key1=9, key2=27.
• Using the division method results in the same address for both keys:
h(key1)=9 mod 3=0 and h(key2)=27 mod 3=0
• Hence, a collision occurs.
• A collision needs to be resolved using a collision resolution method/technique.
• There are two types of collision resolution techniques: open addressing and chaining.
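
Synonyms can be found by grouping keys by their address. The sketch below does this for the division method; the function name group_synonyms and the extra keys 10 and 5 are assumptions added for illustration.

```python
from collections import defaultdict


def group_synonyms(keys, m):
    """Return {address: keys} for every address that more than one key maps to."""
    groups = defaultdict(list)
    for key in keys:
        groups[key % m].append(key)   # division method: h(key) = key mod M
    return {addr: ks for addr, ks in groups.items() if len(ks) > 1}


print(group_synonyms([9, 27, 10, 5], 3))  # {0: [9, 27]}
```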
Open Addressing
• All entry records are stored in the bucket array itself.
• When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and
proceeding in some probe sequence, until an unoccupied slot is found.
• When searching for an entry, the buckets are scanned in the same sequence, until either the target record
is found, or an unused array slot is found, which indicates that there is no such key in the table.
• Well-known probe sequences include:
• Linear probing: in which the interval between probes is fixed (usually 1).
• Quadratic probing: in which the interval between probes is increased by adding the successive
outputs of a quadratic polynomial to the starting value given by the original hash computation.
• Double hashing: in which the interval between probes is computed by another hash function.
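
Insertion and search under open addressing can be sketched as below. This is a minimal illustration, not a full implementation: the names probe_insert, probe_search, linear and quadratic are assumptions, the division method is assumed for the home slot, the quadratic polynomial is assumed to be i*i, and deletion is left out.

```python
def linear(i):
    return i          # probe offsets 0, 1, 2, 3, ...


def quadratic(i):
    return i * i      # probe offsets 0, 1, 4, 9, ...


def probe_insert(table, key, step=linear):
    """Store key in the first free slot of its probe sequence; return the slot or None."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + step(i)) % m
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    return None       # no free slot found within M probes


def probe_search(table, key, step=linear):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + step(i)) % m
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


lin = [None] * 7
print([probe_insert(lin, k) for k in (10, 17, 24)])              # [3, 4, 5]
quad = [None] * 7
print([probe_insert(quad, k, quadratic) for k in (10, 17, 24)])  # [3, 4, 0]
print(probe_search(lin, 24), probe_search(lin, 31))              # 5 None
```

All three keys hash to slot 3 in a table of 7 slots; the two probe sequences differ only in where the colliding keys end up.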

Double Hashing
• Uses the idea of applying a second hash function to the key when a collision occurs.
• The i-th probe examines the slot:
(hash1(key) + i*hash2(key)) % M
• The first hash function is typically hash1(key) = key % M.
• A popular second hash function is hash2(key) = PRIME – (key % PRIME), where PRIME is a prime
smaller than M.
• The value of i = 0, 1, . . ., M – 1. So we start from i = 0 and increase it until we find a free slot.
• A good second hash function:
– Must never evaluate to zero
– Must make sure that all cells can be probed.
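
The probe formula can be tried out with a short sketch. The function names and the values M = 7 and PRIME = 5 are assumptions chosen for illustration. Since key % PRIME lies between 0 and PRIME – 1, hash2 lies between 1 and PRIME and is never zero; choosing M itself to be prime (an assumption here) ensures every slot can be reached.

```python
def double_hash_slot(key, i, m, prime):
    """Slot examined on the i-th probe: (hash1 + i * hash2) % M."""
    hash1 = key % m
    hash2 = prime - (key % prime)   # always in 1..prime, so never zero
    return (hash1 + i * hash2) % m


def double_hash_insert(table, key, prime):
    """Insert key using double hashing; return the slot used, or None if none is free."""
    m = len(table)
    for i in range(m):
        slot = double_hash_slot(key, i, m, prime)
        if table[slot] is None:
            table[slot] = key
            return slot
    return None


table = [None] * 7
print([double_hash_insert(table, k, prime=5) for k in (10, 17, 24)])  # [3, 6, 4]
```

All three keys have hash1 = 3, but their hash2 values differ (5, 3 and 1), so the colliding keys jump to different slots instead of queuing up next to each other.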
Clustering
• Linear probing suffers from primary clustering.
• A collision at address “i” indicates that more than one key is mapped to “i”.
• Linear probing does not spread these keys across the hash table: each one is placed in the next free slot
after “i”.
• The keys therefore cluster around that slot, which increases the search and insertion time.
• One way to reduce primary clustering is to use quadratic probing, where the locations j, (j+1), (j+4), (j+9) etc.
are searched.
• Quadratic probing does not ensure that all slots in the hash table are examined.
• It is therefore possible that a key cannot be inserted even when the hash table is not full.
• Quadratic probing gives rise to secondary clustering: keys that hash to the same address follow the same
probe sequence.
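
The claim that quadratic probing may fail on a table that is not full can be checked with a small experiment. The function names, the table size M = 7, the limit of M probes per insertion and the keys (all multiples of 7, so all synonyms of address 0) are assumptions made for this sketch.

```python
def reachable_slots(home, m):
    """Slots that the offsets 0, 1, 4, 9, ... can reach from home within M probes."""
    return sorted({(home + i * i) % m for i in range(m)})


def quadratic_insert(table, key):
    """Insert with quadratic probing; return the slot used, or None on failure."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    return None


table = [None] * 7
placed = [quadratic_insert(table, k) for k in (0, 7, 14, 21, 28)]
print(reachable_slots(0, 7))  # [0, 1, 2, 4]
print(placed)                 # [0, 1, 4, 2, None]
print(table.count(None))      # 3
```

Only four of the seven slots are reachable from address 0, so the fifth synonym cannot be placed although three slots are still free. All five keys also follow the identical probe sequence, which is the secondary clustering mentioned above.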
Chaining/Closed Addressing
• In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket
array is a pointer to a linked list that contains the key-value pairs that hashed to the same location.
• Lookup requires scanning the list for an entry with the given key.
• Insertion requires adding a new entry record to either end of the list belonging to the hashed slot.
• Deletion requires searching the list and removing the element.
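
A minimal sketch of separate chaining is given below. The class name ChainedHashTable and its method names are assumptions for illustration; Python lists stand in for the linked lists described above, integer keys with the division method are assumed, and new entries are appended at the tail of the chain.

```python
class ChainedHashTable:
    """Separate chaining: every bucket holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.buckets = [[] for _ in range(m)]

    def _chain(self, key):
        return self.buckets[key % len(self.buckets)]   # division method

    def insert(self, key, value):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:                  # key already present: overwrite its value
                chain[idx] = (key, value)
                return
        chain.append((key, value))        # otherwise add at the tail of the chain

    def lookup(self, key):
        for k, v in self._chain(key):     # scan the chain for the given key
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:
                del chain[idx]
                return True
        return False


t = ChainedHashTable(3)
t.insert(9, "nine")
t.insert(27, "twenty-seven")               # collides with 9 at address 0
print(t.buckets[0])                        # [(9, 'nine'), (27, 'twenty-seven')]
print(t.lookup(27), t.delete(9), t.lookup(9))  # twenty-seven True None
```

Because colliding entries simply share a chain, the table keeps working when the load factor exceeds 1, at the cost of longer chains to scan.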
