Unit-II DS Dictionaries and Hash Tables

Uploaded by

Mohammed Afzal
Unit-II: Dictionaries and Hash Tables

Dictionaries: linear list representation, skip list representation, operations - insertion, deletion and searching.
Hash Table Representation: hash functions, collision resolution-separate chaining, open addressing linear
probing, quadratic probing, double hashing, re-hashing, extendible hashing.

Dictionary Data Structure


o A dictionary is a data structure that stores data in the form of key-value pairs.
o Every element in a dictionary must have a key, and a value is associated with that key.
o Other names for the dictionary data structure are associative array, map, and symbol table.
o A dictionary (associative array) is a general-purpose data structure used to store a group of objects.
o Many popular languages provide the dictionary as a built-in data type. Languages that do not usually
supply it through their software libraries.
o The association between a key and its value is known as the mapping. Each key in the dictionary maps to
exactly one value.
o The key is the attribute used to locate a value in memory. Keys are always unique within a dictionary
(no two entries share a key), whereas values may or may not be unique. Because keys are unique, the
value for a specific key can be obtained unambiguously.
o Dictionaries are available as built-in data types in languages such as Python and C#. The C language
has no such built-in type.
o In computer science, a dictionary is an abstract data type (ADT) that represents an ordered or unordered
list of key-value pair elements, where keys are used to search for (locate) the elements in the list. In a
dictionary ADT, the data to be stored is divided into two parts: a key and a value.

Mr. Mohammed Afzal, Asst. Professor in CSE (AI&ML) Dept.


Email: [email protected], Mob: +91-8179700193
Characteristics of Dictionary
• Key-Value Pairs: Dictionaries store data as key-value pairs where each key is unique and maps to
exactly one value.
• Direct Access: The primary feature of dictionaries is to provide fast access to elements not by their
position, as in lists or arrays, but by their keys.
• Dynamic Size: Like many abstract data types, dictionaries typically allow for dynamic resizing. New
key-value pairs can be added, and existing ones can be removed.
• Ordering: Some dictionaries maintain the order of elements, such as ordered maps or sorted dictionaries.
Others, like hash tables, do not maintain any particular order.
• Key Uniqueness: Each key in a dictionary must be unique, though different keys can map to the same
value.
Fundamental Operations of Dictionary
• Insert: Add a new key-value pair to the dictionary.
• Delete: Remove a key-value pair from the dictionary.
• Update: Change the value associated with a given key.
• Search: Retrieve the value associated with a particular key.
• Keys: Return a collection of all the keys in the dictionary.
• Values: Return a collection of all the values in the dictionary.
Types of Dictionaries:
There are two major variations of dictionaries:
1. Ordered dictionary.
• In an ordered dictionary, the relative order of the entries is determined by comparing their keys.
• The order should depend only on the keys.
2. Unordered dictionary.
• In an unordered dictionary, no order relation is assumed on keys.
• Only the equality operation can be performed on the keys.
Implementation of Dictionary
A dictionary can be implemented in several ways, such as:
• Fixed-length arrays
• Linked lists
• Hashing
• Trees (BSTs, balanced BSTs, splay trees, tries, etc.)

Linear list representation

o A linear list representation of a dictionary involves using a simple list structure (like an array or a linked
list) to store key-value pairs. Each entry in the list consists of two parts: a unique key and an associated
value.
o Let's understand the implementation step by step:

Step – 1: Empty Dictionary (head == NULL):
Start with an empty list. This list will hold all the key-value pairs. In the linked-list implementation, an
empty dictionary is represented by head == NULL.
Step – 2: Insertion:
To insert a new key-value pair into the dictionary:
Check for the Key's Existence: Iterate through the list to check if the given key already exists.
Update or Insert:
• If the key exists, update the associated value.
• If the key does not exist, append a new key-value pair to the list.
Algorithm for Insertion:
function insert(key, value):
    for each pair in dictionary:
        if pair.key == key:
            pair.value = value
            return
    dictionary.append((key, value))
Step – 3: Deletion:
To delete a key-value pair from the dictionary:
• Search for the Key: Iterate through the list to find the key.
• Remove the Pair: Once the key is found, remove the key-value pair from the list.
Algorithm for Deletion:
function delete(key):
    for each pair in dictionary:
        if pair.key == key:
            remove pair from dictionary
            return
Step – 4: Update:
Updating a value for a specific key follows the same procedure as insertion. If the key exists, its
value is updated; otherwise, the key-value pair is added.
Step – 5: Search:
To find a value associated with a given key:
• Iterate Through the List: Go through each key-value pair in the list.
• Compare Keys: Check if the current key matches the search key.
• Return Value if Found: If a match is found, return the associated value.
Algorithm for Search:
function search(key):
    for each pair in dictionary:
        if pair.key == key:
            return pair.value
    return None // Or indicate that the key was not found
Step – 6: Traversal:
To traverse the dictionary and access each key-value pair:
• Iterate Through the List: Go through each key-value pair in the list.
• Process Each Pair: Perform the desired operation (like printing) on each pair.
Algorithm for Traversal:
function traverse():
    for each pair in dictionary:
        process(pair.key, pair.value)
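The linear list can also be held in an array instead of a linked list. The following is a minimal sketch of that variant, not the author's program. It assumes integer keys and values and a fixed capacity; the names struct dict, dict_find, dict_insert, dict_search, dict_delete and DICT_CAPACITY are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DICT_CAPACITY 16   /* assumed fixed capacity for this sketch */

struct pair { int key; int value; };

struct dict {
    struct pair items[DICT_CAPACITY];
    size_t count;              /* number of pairs currently stored */
};

/* Linear search: returns the position of key, or -1 if it is absent. */
static int dict_find(const struct dict *d, int key)
{
    for (size_t i = 0; i < d->count; i++)
        if (d->items[i].key == key)
            return (int)i;
    return -1;
}

/* Insert or update. Returns false only when the array is full. */
static bool dict_insert(struct dict *d, int key, int value)
{
    assert(d != NULL);
    int pos = dict_find(d, key);
    if (pos >= 0) {                    /* key exists: update its value */
        d->items[pos].value = value;
        return true;
    }
    if (d->count == DICT_CAPACITY)
        return false;
    d->items[d->count].key = key;      /* key absent: append a new pair */
    d->items[d->count].value = value;
    d->count++;
    return true;
}

/* Search: stores the value in *out and returns true when key is present. */
static bool dict_search(const struct dict *d, int key, int *out)
{
    int pos = dict_find(d, key);
    if (pos < 0)
        return false;
    *out = d->items[pos].value;
    return true;
}

/* Delete: overwrite the removed pair with the last pair (order is not kept). */
static bool dict_delete(struct dict *d, int key)
{
    int pos = dict_find(d, key);
    if (pos < 0)
        return false;
    d->items[pos] = d->items[d->count - 1];
    d->count--;
    return true;
}
```

Every operation scans the array from the start, so insertion, search and deletion all take time proportional to the number of stored pairs, just as in the linked-list version.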

Program for creating an unordered dictionary using a linked list


//Dictionary Implementation Using Linked List
#include<stdio.h>
#include<stdlib.h>
#include<stdbool.h>

//node Declaration
struct node {
int key; // Assuming keys are integers
int value; // Assuming values are integers
struct node* next;
};

//Head Declaration
struct node* head=NULL;

//Function Declaration
bool isEmpty();
void insertPair(int,int);
void deletePair(int);
void displayPairs();
void searchPair(int);

//Driver Code

// Portable stand-in for the non-standard getch(): discard the rest of the
// current input line, then wait for the user to press Enter.
static void waitForEnter(void)
{
    int c;
    while ((c = getchar()) != '\n' && c != EOF)
        ;
    printf("\npress Enter to continue...");
    while ((c = getchar()) != '\n' && c != EOF)
        ;
}

int main(void)
{
    int choice = 0, key, value;   // choice is initialised before the loop tests it
    while(choice!=5)
    {
        displayPairs();
        printf("\n******Dictionary Implementation Using Linked List******");
        printf("\n-------------------------------------------------------------------------");
        printf("\n1.Insert");
        printf("\n2.Delete");
        printf("\n3.Display");
        printf("\n4.Search");
        printf("\n5.Exit");
        printf("\nEnter your choice: ");
        if(scanf("%d",&choice)!=1)
        {
            if(feof(stdin))
                break;      // input ended: leave the menu loop
            choice=0;       // non-numeric input falls through to "Invalid Option"
        }
        switch(choice)
        {
        case 1: printf("\nEnter key and value(pair): ");
                if(scanf("%d %d", &key, &value)==2)
                    insertPair(key, value);
                waitForEnter();
                break;
        case 2: if(isEmpty())
                    printf("\nNo pairs in Dictionaries...:(");
                else
                {
                    printf("\nEnter key to delete: ");
                    if(scanf("%d", &key)==1)
                        deletePair(key);
                }
                waitForEnter();
                break;
        case 3: displayPairs();
                waitForEnter();
                break;
        case 4: if(isEmpty())
                    printf("\nNo pairs in Dictionaries...:(");
                else
                {
                    printf("\nEnter key to search: ");
                    if(scanf("%d", &key)==1)
                        searchPair(key);
                }
                waitForEnter();
                break;
        case 5: printf("\nApplication is Exiting...:(");
                exit(0);
        default: printf("\nInvalid Option...:(");
                waitForEnter();
        }
    }
    return 0;
}
bool isEmpty()
{
if(head==NULL)
return true;
else
return false;
}

//Insertion Function
void insertPair(int key, int value)
{
    struct node* newNode;
    struct node* temp1 = head;
    struct node* temp2 = NULL;

    // If the key already exists, update its value; no new node is needed
    while(temp1!=NULL)
    {
        if(temp1->key==key)
        {
            temp1->value=value;
            printf("\nkey=%d updated successfully with value=%d ...:)", key, value);
            return;
        }
        temp2 = temp1;
        temp1 = temp1->next;
    }

    // Key not found: allocate a node and append it at the end of the list
    newNode = (struct node*)malloc(sizeof(struct node));
    if(newNode==NULL)
    {
        printf("\nMemory allocation failed...:(");
        return;
    }
    newNode->key = key;
    newNode->value = value;
    newNode->next = NULL;
    if(temp2==NULL)
        head=newNode;           // list was empty
    else
        temp2->next = newNode;  // temp2 is the last node
    printf("\n<key=%d value=%d> pair Appended Successfully...:)", key, value);
}

//Deletion Function
void deletePair(int key)
{
struct node* temp1=head;
struct node* temp2;

// If head node holds the key


if (temp1 != NULL && temp1->key == key)
{
head = head->next;
free(temp1);
printf("\npair with key=%d successfully deleted...:)",key);
return;
}

// Search for the key


while (temp1 != NULL && temp1->key!=key)
{
temp2 = temp1;
temp1 = temp1->next;
}

// If key was not present


if (temp1 == NULL)
{
printf("\nDeletion not possible, Key not found...:(");
return;
}
temp2->next = temp1->next;
free(temp1);
printf("\npair with key=%d successfully deleted...:)",key);
}

//Search Function
void searchPair(int key)
{
struct node* temp = head;
while (temp != NULL)
{
if (temp->key == key)
{
printf("key=%d is associated with value=%d", key, temp->value);
return;
}
temp = temp->next;
}
printf("\nPair with key=%d is not found in Dictionary...:(", key);
}

//Display function - Traversal


void displayPairs()
{
struct node* temp = head;
if(isEmpty())
printf("\nNo pairs in Dictionaries...:(");
else
{
printf("\nDictionary pairs are:");
while(temp!=NULL)
{
printf("\n<key = %d, value= %d>", temp->key, temp->value);
temp = temp->next;
}
}
}

SKIP LIST REPRESENTATION
Skip List
o A skip list is a probabilistic data structure. It stores a sorted list of elements using linked lists and
allows those elements to be searched efficiently.
o In a single step it can skip over several elements of the list, which is why it is known as a skip list.
o The skip list is an extended version of the linked list. It allows the user to search, remove, and insert
elements quickly (O(log n) expected time per operation).
o It consists of a base list containing all the elements, plus a hierarchy of links that skip over
subsequent elements.

Skip List Structure


o In its simplest form, a skip list is built from two kinds of layers: the lowest layer and the top layers
(express layers).
o The lowest layer of the skip list is an ordinary sorted linked list, and the top layers of the skip list are
like an "express line" where elements are skipped.
Base Layer: The bottom layer is a standard linear linked list that contains all the elements in the set,
sorted by key.
Express Layers: Above the base layer, one or more "express lane" layers are maintained. Each layer
is a linked list that skips over some elements of the layer below it. The topmost layer has the fewest
elements, and typically each layer skips about twice as many elements (or some other factor) as
the layer directly beneath it.

Working of the Skip list


o Let's take an example to understand the working of the skip list. In this example, we have 14 nodes, such
that these nodes are divided into two layers, as shown in the diagram.

o The lower layer is a common line that links all nodes, and the top layer is an express line that links only
the main nodes, as you can see in the diagram.
o Suppose you want to find 47 in this example. You start the search from the first node of the express
line and keep moving along the express line until you reach a node that is equal to or greater than 47.
o You can see in the example that 47 does not exist in the express line, so you stop at the last node that is
less than 47, which is 40. Now, you drop down to the normal line at 40 and continue searching for 47
there, as shown in the diagram.

Skip List Basic Operations


There are the following types of operations in the skip list.
o Insertion operation: It adds a new node at its sorted position in the list.
o Deletion operation: It removes the node that holds a given key.
o Search operation: It looks for the node that holds a given key.

Algorithm of the insertion operation


Insertion (L, Key, value)
    local update [0...Max_Level + 1]
    a = L → header
    for i = L → level down to 0 do
        while a → forward[i] → key < Key do
            a = a → forward[i]
        update[i] = a
    a = a → forward[0]
    lvl = random_Level()
    if lvl > L → level then
        for i = L → level + 1 to lvl do
            update[i] = L → header
        L → level = lvl
    a = makeNode(lvl, Key, value)
    for i = 0 to lvl do
        a → forward[i] = update[i] → forward[i]
        update[i] → forward[i] = a

Algorithm of deletion operation


Deletion (L, Key)
    local update [0... Max_Level + 1]
    a = L → header
    for i = L → level down to 0 do
        while a → forward[i] → key < Key do
            a = a → forward[i]
        update[i] = a
    a = a → forward[0]
    if a → key = Key then
        for i = 0 to L → level do
            if update[i] → forward[i] ≠ a then break
            update[i] → forward[i] = a → forward[i]
        free(a)
        while L → level > 0 and L → header → forward[L → level] = NIL do
            L → level = L → level - 1

Algorithm of searching operation


Searching (L, SKey)
    a = L → header
    loop invariant: a → key < SKey
    for i = L → level down to 0 do
        while a → forward[i] → key < SKey do
            a = a → forward[i]
    a = a → forward[0]
    if a → key = SKey then return a → value
    else return failure

Example 1: Create a skip list by inserting the following keys in an empty skip list.
1. 6 with level 1.
2. 29 with level 1.
3. 22 with level 4.
4. 9 with level 3.
5. 17 with level 1.
6. 4 with level 2.
Answer:
Step 1: Insert 6 with level 1

Step 2: Insert 29 with level 1

Step 3: Insert 22 with level 4

Step 4: Insert 9 with level 3

Step 5: Insert 17 with level 1

Step 6: Insert 4 with level 2

Example 2: Consider the above example and search for key 17.

Answer:
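The two examples can be reproduced with a small C sketch of the insertion, search and deletion algorithms. This is an illustration under stated assumptions, not a definitive implementation: it stores keys only (no values), the level of each node is passed in explicitly so that the levels of Example 1 can be used instead of random ones, and all names (struct sl_node, struct skiplist, sl_init, sl_locate, sl_insert, sl_search, sl_delete, MAX_LEVEL) are hypothetical. Levels are counted from 1, so a node of level k uses forward[0] to forward[k-1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LEVEL 4            /* highest level used in Example 1 */

struct sl_node {
    int key;
    struct sl_node *forward[MAX_LEVEL];  /* forward[i]: next node on level i+1 */
};

struct skiplist {
    struct sl_node header;     /* sentinel node; its key is never compared */
    int level;                 /* number of levels currently in use (>= 1) */
};

static void sl_init(struct skiplist *l)
{
    for (int i = 0; i < MAX_LEVEL; i++)
        l->header.forward[i] = NULL;
    l->level = 1;
}

/* Walk from the top level down. update[i] receives the last node on level i
   whose key is smaller than key. Returns the candidate node on the bottom level. */
static struct sl_node *sl_locate(struct skiplist *l, int key,
                                 struct sl_node *update[])
{
    struct sl_node *a = &l->header;
    for (int i = l->level - 1; i >= 0; i--) {
        while (a->forward[i] != NULL && a->forward[i]->key < key)
            a = a->forward[i];
        update[i] = a;
    }
    return a->forward[0];
}

/* Insert key with the given level. Returns false if key is already present. */
static bool sl_insert(struct skiplist *l, int key, int level)
{
    assert(level >= 1 && level <= MAX_LEVEL);
    struct sl_node *update[MAX_LEVEL];
    struct sl_node *a = sl_locate(l, key, update);
    if (a != NULL && a->key == key)
        return false;
    if (level > l->level) {            /* new levels start at the header */
        for (int i = l->level; i < level; i++)
            update[i] = &l->header;
        l->level = level;
    }
    a = malloc(sizeof *a);
    if (a == NULL)
        return false;
    a->key = key;
    for (int i = 0; i < level; i++) {  /* splice the node into each level */
        a->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = a;
    }
    return true;
}

static bool sl_search(struct skiplist *l, int key)
{
    struct sl_node *update[MAX_LEVEL];
    struct sl_node *a = sl_locate(l, key, update);
    return a != NULL && a->key == key;
}

static bool sl_delete(struct skiplist *l, int key)
{
    struct sl_node *update[MAX_LEVEL];
    struct sl_node *a = sl_locate(l, key, update);
    if (a == NULL || a->key != key)
        return false;
    for (int i = 0; i < l->level; i++) {
        if (update[i]->forward[i] != a)
            break;                     /* node is not linked above this level */
        update[i]->forward[i] = a->forward[i];
    }
    free(a);
    while (l->level > 1 && l->header.forward[l->level - 1] == NULL)
        l->level--;                    /* drop levels that became empty */
    return true;
}
```

After calling sl_insert with (6, 1), (29, 1), (22, 4), (9, 3), (17, 1) and (4, 2), the bottom level holds 4, 6, 9, 17, 22, 29. A search for 17 then starts on level 4, where the first node (22) is too large, drops to level 3, moves to 9, drops through level 2 to level 1 because 22 is again too large, and finds 17 as the next node after 9.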

Advantages of the Skip list
1. If you want to insert a new node in the skip list, then it will insert the node very fast because there are no
rotations in the skip list.
2. The skip list is simple to implement as compared to the hash table and the binary search tree.
3. It is very simple to find a node in the list because it stores the nodes in sorted form.
4. The skip list algorithm can be modified very easily in a more specific structure, such as indexable skip
lists, trees, or priority queues.
5. The skip list is a robust and reliable list.
Disadvantages of the Skip list
1. It requires more memory than the balanced tree.
2. Reverse searching is not allowed.
3. In the worst case (an unlucky sequence of random levels), a search in the skip list degrades to O(n), which is
no better than a plain linked list.
Applications of the Skip list
1. It is used in distributed applications, and it represents the pointers and system in the distributed
applications.
2. It is used to implement a dynamic elastic concurrent queue with low lock contention.
3. It is also used with the QMap template class.
4. The indexing of the skip list is used in running median problems.
5. The skip list is used for delta-encoded posting lists in the Lucene search library.
Representation of Dictionary Using Skip List:
o A Skip List is an efficient, probabilistic data structure that enables fast search, insertion, and deletion
operations.
o Here's a step-by-step explanation of how a Skip List is used to represent a dictionary:

Step 1: Understanding the Structure


A Skip List is composed of several layers of linked lists, where each higher layer provides a "shortcut"
through the lower layers. The bottom layer (Layer 0) contains all the elements, and each successive layer
contains a subset of these elements, chosen randomly.

Step 2: Layered Links


Each node in the Skip List contains pointers to the next node in the same layer and down to the same node in
the layer immediately below it.

Step 3: Initialization

When initializing a Skip List:
• Create a "head" node with pointers set to NULL for each layer.
• Set a maximum level for the list, which determines how many layers the list can have.
• Optionally, set a "tail" node with maximum possible key value to mark the end of each level.
Step 4: Insertion
To insert a key-value pair:
• Find the Position: Start from the topmost layer and move forward until you find a node with a greater
key or reach the end of the layer. Then, move down one layer and continue. Record the nodes where
you move down a level.
• Choose a Random Level: For the new node, randomly decide the number of layers (level) it will
participate in (usually done with a coin flip algorithm).
• Rearrange Pointers: For each level up to the chosen level, update the pointers of the recorded nodes
to include the new node.

Step 5: Search
To search for a key:
• Start from the topmost layer of the head node.
• Move forward in the current layer until the next node has a greater key or you reach the end.
• When the next key is greater (or the layer ends), move down one layer.
• Repeat until you reach the bottom layer. If the key found there matches, return the value; otherwise, the
key is not in the list.
Step 6: Deletion
To delete a key-value pair:
• Perform a search for the key, keeping track of the predecessors at each level.
• If the key is found, update the pointers of these predecessor nodes to skip the node being deleted.
• Remove the node and deallocate its memory.
Step 7: Random Level Generation
The level for a new node is typically determined using a random process. A common method is a coin flip
algorithm: a random level is generated, and as long as a coin flip results in heads (or a random value meets a
certain condition), you increase the level.
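A minimal sketch of the coin flip in C is shown below. It assumes a fair coin simulated with the standard rand() function and a caller-supplied cap on the level; the name random_level and its max_level parameter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Coin-flip level generator: start at level 1 and keep promoting the node
   while the simulated coin comes up heads, never exceeding max_level. */
static int random_level(int max_level)
{
    assert(max_level >= 1);
    int level = 1;
    while (level < max_level && rand() > RAND_MAX / 2)
        level++;
    return level;
}
```

With a fair coin, about half of the nodes stay at level 1, about a quarter reach level 2, and so on, which is what keeps the upper layers sparse.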
Step 8: Traversal
To traverse the Skip List, simply follow the bottom layer from the head node to the end, processing or printing
the values.

INTRODUCTION - HASH TABLE
o A hash table is a widely used data structure that stores data in an associative manner.
o In a hash table, data is stored in an array format, where each data value has a unique key associated with it. The
efficiency of a hash table comes from its ability to convert these keys into indexes of an array through a process called
hashing.

Key Concepts of hash table


o Hash Function: The heart of a hash table is the hash function. This function takes a key and computes an index (an
integer) which determines where the data associated with that key should be stored in the table. A good hash function
distributes data uniformly across the table to minimize collisions (situations where different keys hash to the same
index).
o Bucket: Each position (slot) of the hash table is called a bucket. The hash function H(key) maps every dictionary
entry to one of these buckets, and several entries may map to the same bucket.
o Collision Resolution: Since a hash function may map multiple keys to the same index, collision resolution strategies
are crucial. Common methods include chaining (where each array element in the hash table stores a linked list of
entries that hash to the same index) and open addressing (where you probe for the next available slot using techniques
like linear probing, quadratic probing, or double hashing).
o Load Factor: The load factor of a hash table is the number of elements divided by the number of slots available in the
array. It's a measure of how full the hash table is. A higher load factor increases the likelihood of collisions, leading to
a decrease in performance. This often necessitates resizing the hash table.
o Dynamic Resizing: To maintain efficient operations, hash tables may need to be resized as elements are added or
removed. This involves creating a new, larger (or smaller) table, and rehashing all the existing elements into it.
Operations
• Insertion: Add a new key-value pair to the table.
• Deletion: Remove a key-value pair from the table.
• Search: Retrieve the value associated with a given key.
Advantages of hash table
• Efficient Data Retrieval: Offers near-constant time complexity for insertion, deletion, and search operations in
the average case.
• Direct Access: Data can be directly indexed and is not dependent on the number of elements in the table.
Disadvantages of hash table:
• Collision Management: Requires additional structures or algorithms to handle collisions effectively.
• Hash Function Design: Creating an effective hash function can be challenging for certain types of keys.
• Memory Overhead: Some implementations may consume more memory, especially to handle collisions and
resizing.

Hash functions
o A hash function is a mathematical function that converts a given input value into a resulting hash value. A hash
function can be used to generate a checksum for data or to index data in a database. Hashing is a one-way
mapping and is not a form of encryption.
o In data structures, a hash function is used to calculate the hash value of a key, which is then used to store and retrieve
the corresponding data.

Definition of hash function


o A hash function is a function that takes an input (or 'key') and returns a fixed-size string of bytes. The output is
typically a "hash code" or "hash value."
Purpose of hash function
o The primary purpose of a hash function in data structures like hash tables is to distribute keys evenly across an array,
minimizing the likelihood of collision (where two keys hash to the same index).
Properties of Good Hash Functions
• Deterministic: The same input should always produce the same hash value.
• Fast Computation: It should be quick to compute the hash value for any given input.
• Uniform Distribution: It should uniformly distribute the data across the entire set of possible hash values to
minimize collisions.
• Minimize Collisions: While collisions are inevitable, a good hash function should minimize them. Different keys
should hash to different values as much as possible.
• Avalanche Effect: A small change to the input should produce a significantly different hash value.
Types of Hash Functions
• Division Method
o Takes a key and divides it by a prime number, then uses the remainder as the hash value. This method is simple
but effective for certain types of keys.
• Multiplication Method
o Multiplies the key by a constant, extracts a certain portion of the resulting number, and uses it as the hash value.
• Folding Method
o Splits the key into parts, adds them together, and then uses the sum or a portion of it as the hash value.
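The three methods can be sketched in C as follows. This is an illustration under assumptions, not a prescribed implementation: keys are unsigned integers, m is the table size, the multiplication method uses the constant A = (sqrt(5) - 1) / 2 (one commonly suggested value), and the folding method splits the decimal key into two-digit parts. The function names are hypothetical.

```c
#include <assert.h>

/* Division method: h(k) = k mod m. */
static unsigned hash_division(unsigned key, unsigned m)
{
    assert(m > 0);
    return key % m;
}

/* Multiplication method: h(k) = floor(m * frac(k * A)) with 0 < A < 1. */
static unsigned hash_multiplication(unsigned key, unsigned m)
{
    const double A = 0.6180339887498949;   /* assumed constant (sqrt(5)-1)/2 */
    double product = (double)key * A;
    double fraction = product - (double)(unsigned long long)product;
    return (unsigned)((double)m * fraction);
}

/* Folding method: add the two-digit parts of the key, then reduce mod m. */
static unsigned hash_folding(unsigned key, unsigned m)
{
    assert(m > 0);
    unsigned sum = 0;
    while (key > 0) {
        sum += key % 100;   /* take the lowest two decimal digits */
        key /= 100;
    }
    return sum % m;
}
```

For example, hash_division(24, 10) yields 4, and hash_folding(123456, 100) adds 12 + 34 + 56 = 102 and returns 2.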
Challenges in Designing Hash Functions
• Handling Collisions: When two keys hash to the same index, a strategy is needed to resolve this (e.g., chaining, open
addressing).
• Choosing the Right Hash Function: The choice depends on the type of keys and the expected key distribution. No
single hash function is perfect for all scenarios.
• Avoiding Clustering: Poor hash functions can lead to clustering, where many keys hash to the same or nearby
indices, leading to performance degradation.
• Security Concerns: In cryptographic applications, hash functions need additional properties like resistance to
pre-image attacks, which are beyond the scope of typical data structure usage.
Hashing Example:
Keys: 24, 52, 91, 67, 48, 83
Consider Hash Function: k mod 10
Insert Operation:
Consider Key: 24
Hash value : 24 mod 10
: 4

Consider Key: 52
Hash value : 52 mod 10
: 2

Consider Key: 91
Hash value : 91 mod 10
: 1

Consider Key: 67
Hash value : 67 mod 10
: 7

Consider Key: 48
Hash value : 48 mod 10
: 8

Consider Key: 83
Hash value : 83 mod 10
: 3

Search Operation:
Use the same hash function k mod 10 for searching
Search for Key: 67
67 mod 10 = 7 index
Go to index 7 and find the element.

Deletion Operation:
Use the same hash function k mod 10 for deletion
Delete key: 52
52 mod 10 = 2 index
Go to index 2 and delete the key
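The example above can be sketched in C with a ten-slot array. This is a minimal sketch that assumes non-negative integer keys and no collision handling (an insert into an occupied slot simply fails). The names table, table_init, hash, table_insert, table_search and table_delete, and the EMPTY marker, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 10
#define EMPTY (-1)          /* assumed marker for an unused slot */

static int table[TABLE_SIZE];

static void table_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Hash function from the example: k mod 10. */
static int hash(int key)
{
    assert(key >= 0);
    return key % TABLE_SIZE;
}

/* Insert: store the key at its hashed index; fail if the slot is taken. */
static bool table_insert(int key)
{
    int i = hash(key);
    if (table[i] != EMPTY)
        return false;       /* collision: not handled in this sketch */
    table[i] = key;
    return true;
}

/* Search: returns the index holding key, or -1 if key is not stored. */
static int table_search(int key)
{
    int i = hash(key);
    return table[i] == key ? i : -1;
}

/* Delete: clear the slot only if it really holds the requested key. */
static bool table_delete(int key)
{
    int i = table_search(key);
    if (i < 0)
        return false;
    table[i] = EMPTY;
    return true;
}
```

Note that table_delete compares the stored key before clearing the slot, so a key that merely hashes to an occupied index is not removed by mistake.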
Collision Resolution
o In hash tables, collision resolution is a method used to handle situations where two or more keys hash to the
same index.
Collision in Hashing
Consider the keys: 24, 19, 32, 44, 58
Hash function: k mod n (n value 6)
Consider Key: 24
Hash value : 24 mod 6
: 0

Consider Key: 19
Hash value : 19 mod 6
: 1

Consider Key: 32
Hash value : 32 mod 6
: 2

Consider Key: 44
Hash value : 44 mod 6
: 2 Collision

o There are several techniques for collision resolution, each with its advantages and disadvantages. The most
commonly used methods are separate chaining and open addressing (linear probing, quadratic probing, and
double hashing).

SEPARATE CHAINING
o Separate chaining is a widely used method to resolve collisions in hash tables. When two or more
elements hash to the same location, they are stored in a singly linked list, like a chain.
o Because the colliding elements are stored outside the table itself, in extra memory, this method is also
known as open hashing.
How Separate Chaining Works
S. No.   Operation               Description
1.       Hash Table Structure    The hash table is an array of a fixed size. Each element of this array is a head pointer to a linked list.
2.       Hash Function           The hash function takes a key and computes an index in the array.
3.       Insertion               o Apply the hash function to the key to get the index.
                                 o Insert the key-value pair at the head of the linked list at this index.
4.       Searching               o Compute the index for the key using the hash function.
                                 o Search through the linked list at that index for the desired key.
5.       Deletion                o Find the index using the hash function.
                                 o Search the linked list for the key and remove the corresponding node.

Algorithm for Implementing Separate Chaining


Separate Chaining - Insert
function INSERT (hash_table, key, hash_value)
    if hash_table[hash_value] = null then
        hash_table[hash_value] = new LinkedList(key)
    else
        hash_table[hash_value].headInsert(key)

Separate Chaining - Insert Without Duplicates


function INSERT_NO_DUPLICATES (hash_table, key, hash_value)
    if SEARCH (hash_table, key, hash_value) != -1 then
        return
    if hash_table[hash_value] = null then
        hash_table[hash_value] = new LinkedList(key)
    else
        hash_table[hash_value].headInsert(key)

Separate Chaining - Search


function SEARCH (hash_table, key, hash_value)
    list ← hash_table[hash_value]
    for i ← 0 to list.length - 1 do
        if list.get(i) = key then
            return i
    return -1

Consider the following example:


Whenever there is a collision, we create a linked list from the collision index to store additional values.
Example:
Consider the keys : 24, 19, 32, 44, 56
Hash function : k mod n (n value 6)
Consider Key: 24
Hash value : 24 mod 6
: 0

Consider Key: 19
Hash value : 19 mod 6
: 1

Consider Key: 32
Hash value : 32 mod 6
: 2

Consider Key: 44
Hash value : 44 mod 6
: 2 Collision

Consider Key: 56
Hash value : 56 mod 6
: 2 Collision

This technique is called open hashing because, in addition to the space available in the hash table itself, extra
linked-list space outside the table is used.
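The example can be sketched in C as shown below. This is a minimal sketch under assumptions, not a definitive implementation: it stores non-negative integer keys only, uses six buckets with k mod 6 as in the example, inserts at the head of each chain, and rejects duplicate keys. The names struct chain_node, buckets, bucket_of, chain_search, chain_insert and chain_delete are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_BUCKETS 6

struct chain_node {
    int key;
    struct chain_node *next;
};

/* One head pointer per bucket; static storage starts out as all NULL. */
static struct chain_node *buckets[NUM_BUCKETS];

/* Hash function from the example: k mod 6. */
static int bucket_of(int key)
{
    assert(key >= 0);
    return key % NUM_BUCKETS;
}

/* Search: walk the chain of the key's bucket. */
static bool chain_search(int key)
{
    for (struct chain_node *n = buckets[bucket_of(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return true;
    return false;
}

/* Insert at the head of the chain; duplicates are rejected. */
static bool chain_insert(int key)
{
    if (chain_search(key))
        return false;
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    int b = bucket_of(key);
    n->key = key;
    n->next = buckets[b];
    buckets[b] = n;
    return true;
}

/* Delete: unlink the node from its chain and free it. */
static bool chain_delete(int key)
{
    struct chain_node **link = &buckets[bucket_of(key)];
    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    if (*link == NULL)
        return false;
    struct chain_node *victim = *link;
    *link = victim->next;
    free(victim);
    return true;
}
```

After inserting 24, 19, 32, 44 and 56 in that order, bucket 2 holds the chain 56, 44, 32, because each colliding key is placed at the head of the list.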

Advantages of Separate Chaining
• Simplicity: It is straightforward to implement.
• Less Sensitive to Hash Function: It works relatively well even with a poor hash function.
• Performance: The performance gracefully degrades as the load factor increases (the ratio of the number of
elements to the number of buckets).
Disadvantages of Separate Chaining
• Memory Overhead: Every element requires extra memory for the pointers in the linked list.
• Cache Performance: Poor cache performance compared to open addressing due to the use of linked
lists.
• Variable Length: The length of the chains can vary, leading to potentially poor performance if one chain
becomes significantly longer than others.

OPEN ADDRESSING
o Open addressing is a collision resolution technique used in hash tables. In open addressing, all elements are
stored directly in the hash table itself.
o When a collision occurs (i.e., two items hash to the same slot), the method seeks to find another slot to
accommodate one of the items using a probing sequence.
o It includes several sub-methods:
a) Linear probing
Description: When a collision occurs, linear probing searches for the next available slot linearly in the table.
How Linear Probing Works
1. Hash Function: Like any hash table, linear probing starts with a hash function that computes an initial
index for a given key.
2. Insertion:
o Compute the hash for the key to find the initial index.
o If the slot at the computed index is empty, insert the item there.
o If the slot is occupied, check the next slot (i.e., move linearly forward) until an empty slot is found.
o If the end of the table is reached, wrap around to the beginning.
Algorithm for insert
function INSERT (hash_table, table_length, key, hash_value)
    index ← hash_value
    first_scan ← true
    while isEmpty(hash_table, index) = false do
        index ← (index + 1) mod table_length
        if index = hash_value and first_scan = false then
            return false
        first_scan ← false
    hash_table[index] ← key
    return true

3. Searching:
o Compute the hash for the key to find the initial index.
o If the slot is empty, the key is not in the table.
o If the slot contains the key, return the item.
o If the slot contains a different key, move linearly forward until the key is found or an empty slot is
encountered.
Algorithm for searching
function SEARCH (hash_table, table_length, key, hash_value)
    index ← hash_value
    first_scan ← true
    while hash_table[index] != key do
        index ← (index + 1) mod table_length
        if index = hash_value and first_scan = false then
            return -1
        first_scan ← false
    return index

4. Deletion:
o Find the item using the search operation.
o Remove the item and mark the slot as deleted (a special marker different from empty and occupied).
o Note: Deletion complicates the search and insert operations because the search must continue past
deleted items, and insert can use slots marked as deleted.
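
The insertion, searching and deletion steps above can be sketched in Python as follows. This is only a minimal illustration, not a definitive implementation: the class name LinearProbingTable, the sentinels EMPTY and DELETED, the restriction to integer keys and the hash function k mod size are all assumptions made for the example.

```python
EMPTY = None          # slot that has never been used
DELETED = object()    # tombstone: slot was used earlier and is free again


class LinearProbingTable:
    """Open-addressing table for integer keys with h(k) = k mod size."""

    def __init__(self, size=10):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield h(k), h(k)+1, ... wrapping around the table once."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, key):
        """Return the index of key, or -1 if it is absent."""
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:            # a never-used slot ends the search
                return -1
            if slot is not DELETED and slot == key:
                return index             # tombstones are skipped, never matched
        return -1

    def insert(self, key):
        """Insert key; return False only if no free slot exists."""
        if self.search(key) != -1:
            return True                  # already present
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key  # tombstones may be reused
                return True
        return False

    def delete(self, key):
        """Replace key with a tombstone; return False if it was absent."""
        index = self.search(key)
        if index == -1:
            return False
        self.slots[index] = DELETED
        return True


table = LinearProbingTable(10)
for k in (43, 23, 33):       # illustrative keys; all three hash to slot 3
    table.insert(k)
table.delete(23)             # slot 4 becomes a tombstone
print(table.search(33))      # 5: the search continues past the tombstone
```

Marking slot 4 as DELETED rather than EMPTY is what keeps key 33 reachable; resetting it to EMPTY would end the search for 33 one slot too early.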

Consider the following Example – 1:
h(k) = k mod 10
h(k, i) = (h(k) + i) mod 10 (used when a collision occurs; h(k) is the hash value and i is the
collision/probe number)
keys: 43, 135, 72, 23, 99, 19, 82
Note: with the hash function k mod n, the hash table has n slots indexed from 0 to n-1, because the
remainder of a division by n always lies between 0 and n-1.

Consider Key: 43
Hash value : 43 mod 10 (k mod 10)
: 3

Consider Key: 135


Hash value : 135 mod 10 (k mod 10)
: 5

Consider Key: 72
Hash value : 72 mod 10 (k mod 10)
: 2

Consider Key: 23
Hash value : 23 mod 10 (k mod 10)
: 3 Collision
: (3+1) mod 10 (h(k, i) when i=1)
: 4 index

Consider Key: 99
Hash value : 99 mod 10 (k mod 10)
: 9 index

Consider Key: 19
Hash value : 19 mod 10 (k mod 10)
: 9 Collision
: (9+1) mod 10 (h(k, i) when i=1)
: 0 index

Consider Key: 82
Hash value : 82 mod 10 (k mod 10)
: 2 Collision
: (2+1) mod 10 (h(k, i) when i=1)
: 3 Collision
: (2+2) mod 10 (h(k, i) when i=2)
: 4 Collision
: (2+3) mod 10 (h(k, i) when i=3)
: 5 Collision
: (2+4) mod 10 (h(k, i) when i=4)
: 6 index

Consider the following Example – 2:
The keys 1, 3, 12, 4, 25, 6, 18, 20, 8 are inserted into an empty hash table of length 10 using open addressing
with hash function h(k) = k^2 mod 10 and linear probing. Construct the hash table and find the maximum probe value.
Solution:
h(k) = k^2 mod 10
h(k, i) = (h(k) + i) mod 10 (used when a collision occurs; h(k) is the hash value and i is the
collision/probe number)
Keys: 1, 3, 12, 4, 25, 6, 18, 20, 8
Consider Key: 1
Hash value : 1^2 mod 10 (k^2 mod 10)
: 1 index

Consider Key: 3
Hash value : 3^2 mod 10 (k^2 mod 10)
: 9 index

Consider Key: 12
Hash value : 12^2 mod 10 (k^2 mod 10)
: 4 index

Consider Key: 4
Hash value : 4^2 mod 10 (k^2 mod 10)
: 6 index

Consider Key: 25
Hash value : 25^2 mod 10 (k^2 mod 10)
: 5 index

Consider Key: 6
Hash value : 6^2 mod 10 (k^2 mod 10)
: 6 Collision
: (6+1) mod 10 (h(k, i) when i=1)
: 7 index

Consider Key: 18
Hash value : 18^2 mod 10 (k^2 mod 10)
: 4 Collision
: (4+1) mod 10 (h(k, i) when i=1)
: 5 Collision
: (4+2) mod 10 (h(k, i) when i=2)
: 6 Collision
: (4+3) mod 10 (h(k, i) when i=3)
: 7 Collision
: (4+4) mod 10 (h(k, i) when i=4)
: 8 index

Consider Key: 20
Hash value : 20^2 mod 10 (k^2 mod 10)
: 0 index

Consider Key: 8
Hash value : 8^2 mod 10 (k^2 mod 10)
: 4 Collision
: (4+1) mod 10 (h(k, i) when i=1)
: 5 Collision
: (4+2) mod 10 (h(k, i) when i=2)
: 6 Collision
: (4+3) mod 10 (h(k, i) when i=3)
: 7 Collision
: (4+4) mod 10 (h(k, i) when i=4)
: 8 Collision
: (4+5) mod 10 (h(k, i) when i=5)
: 9 Collision
: (4+6) mod 10 (h(k, i) when i=6)
: 0 Collision
: (4+7) mod 10 (h(k, i) when i=7)
: 1 Collision
: (4+8) mod 10 (h(k, i) when i=8)
: 2 index

The maximum probe number is i = 8, reached while inserting key 8.
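
Example 2 can be checked with a few lines of Python. This is a rough sketch rather than a reference solution: the function name linear_probe_insert and the use of None for an empty slot are assumptions made for illustration.

```python
def linear_probe_insert(table, key, h):
    """Insert key with linear probing; return (index, probe number i used)."""
    size = len(table)
    home = h(key)
    for i in range(size):
        index = (home + i) % size
        if table[index] is None:      # None marks an empty slot
            table[index] = key
            return index, i
    raise RuntimeError("hash table is full")


table = [None] * 10
keys = [1, 3, 12, 4, 25, 6, 18, 20, 8]
probes = {}
for k in keys:
    _, probes[k] = linear_probe_insert(table, k, lambda x: (x * x) % 10)

print(table)                  # [20, 1, 8, None, 12, 25, 4, 6, 18, 3]
print(max(probes.values()))   # 8, needed for key 8
```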

Advantages of Linear Probing
 Memory Efficiency: All data is stored in the hash table itself, so no extra space is needed for pointers
as in separate chaining.
 Cache-Friendly: Linear memory access pattern is good for cache performance.

Disadvantages of Linear Probing


 Clustering: One of the significant drawbacks of linear probing is clustering. Clusters of occupied slots
tend to grow, increasing the average search time.
 Load Factor: As the table fills up, performance degrades. It is essential to keep the load
factor (ratio of items to table size) relatively low.
 Deletion Complexity: Deleted slots must be marked specially and complicate the search process.

QUADRATIC PROBING
Quadratic probing is another method of open addressing used in hash tables to resolve collisions. Unlike linear
probing, where the interval between probes is fixed, quadratic probing uses a quadratic function to calculate the
interval between probes. This approach helps to reduce the clustering problem seen in linear probing.
How Quadratic Probing Works
1. Insertion using Quadratic Probing:
o Use the hash function hash(key) to calculate the initial index for the given key.
o Check if the slot at the calculated initial index is empty.
o If it's empty, place the key-value pair in that slot, and the insertion is complete.
o If the slot at the initial index is occupied (collision occurred), quadratic probing comes into play.
o Initialize a variable i to 1 (for the first iteration). Calculate the next probe index using the quadratic
probing formula:
o next_index = (initial_index + i^2) % table_size
o Check if the slot at the calculated next_index is empty.
o If it's empty, place the key-value pair in that slot, and the insertion is complete.
o If the slot is occupied, repeat the probing process by incrementing i and recalculating
the next_index until an empty slot is found or table_size probes have failed.
Algorithm for insert
function insert(key, value):
    initial_index = hash(key)
    index = initial_index
    i = 1
    while table[index] is not empty:
        if i > table_size:
            return "No free slot reachable"
        index = (initial_index + i^2) % table_size
        i = i + 1
    table[index] = (key, value)

2. Search using Quadratic Probing:


o Use the hash function hash(key) to calculate the initial index for the given key.
o Check if the slot at the calculated initial index contains the key you're searching for.
o If it does, return the associated value or indicate that the key is present.
o If the slot at the initial index does not contain the key (collision occurred), use quadratic probing.
o Initialize a variable i to 1 (for the first iteration). Calculate the next probe index using the quadratic
probing formula:
o next_index = (initial_index + i^2) % table_size
o Check if the slot at the next_index contains the key you're searching for.
o If it does, return the associated value or indicate that the key is present.
o If it doesn't, increment i and repeat the probing process until an empty slot is found or the key is
located.
o If the entire probing process completes without finding the key (i.e., an empty slot is encountered),
indicate that the key is not found in the table.
Algorithm for search
function search(key):
    initial_index = hash(key)
    index = initial_index
    i = 1
    while table[index] is not empty and i <= table_size:
        if table[index].key == key:
            return table[index].value
        index = (initial_index + i^2) % table_size
        i = i + 1
    return "Key not found"
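
The insert and search procedures can be sketched in Python as below. This is an illustrative sketch under assumptions: the class name QuadraticProbingTable is hypothetical, keys are integers hashed with k mod size, and the probe loop is capped at table-size probes because i^2 mod size repeats after that many steps.

```python
class QuadraticProbingTable:
    """Open-addressing table storing (key, value) pairs, integer keys only."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size          # None marks an empty slot

    def _probe(self, key):
        home = key % self.size
        for i in range(self.size):          # i = 0 is the initial index
            yield (home + i * i) % self.size

    def insert(self, key, value):
        """Return True on success, False if no free slot is reachable."""
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return True
        return False

    def search(self, key):
        """Return the stored value, or None if the key is absent."""
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:                # empty slot: key cannot be further on
                return None
            if slot[0] == key:
                return slot[1]
        return None


t = QuadraticProbingTable(11)               # illustrative prime table size
t.insert(5, "a")                            # home slot 5
t.insert(16, "b")                           # 5 is taken, lands in (5 + 1) % 11 = 6
t.insert(27, "c")                           # 5 and 6 taken, lands in (5 + 4) % 11 = 9
print(t.search(27))                         # c
```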

Consider the following example:


h(k) = k mod 10
h(k, i) = (h(k) + i^2) mod 10
Keys: 42, 16, 91, 33, 18, 27, 36, 62
Consider Key: 42
Hash value : 42 mod 10 (k mod 10)
: 2 index

Consider Key: 16
Hash value : 16 mod 10 (k mod 10)
: 6 index

Consider Key: 91
Hash value : 91 mod 10 (k mod 10)
: 1 index

Consider Key: 33
Hash value : 33 mod 10 (k mod 10)
: 3 index

Consider Key: 18
Hash value : 18 mod 10 (k mod 10)
: 8 index

Consider Key: 27
Hash value : 27 mod 10 (k mod 10)
: 7 index

Consider Key: 36
Hash value : 36 mod 10 (k mod 10)
: 6 Collision
: (6+1^2) mod 10 (h(k, i) when i=1)
: 7 Collision
: (6+2^2) mod 10 (h(k, i) when i=2)
: 0 index

Consider Key: 62
Hash value : 62 mod 10 (k mod 10)
: 2 Collision
: (2+1^2) mod 10 (h(k, i) when i=1)
: 3 Collision
: (2+2^2) mod 10 (h(k, i) when i=2)
: 6 Collision
: (2+3^2) mod 10 (h(k, i) when i=3)
: 1 Collision
: (2+4^2) mod 10 (h(k, i) when i=4)
: 8 Collision
: (2+5^2) mod 10 (h(k, i) when i=5)
: 7 Collision
: (2+6^2) mod 10 (h(k, i) when i=6)
: 8 Collision
Note: There is no guarantee of finding an empty slot. The probe sequence for key 62 only ever visits slots
1, 2, 3, 6, 7 and 8, all of which are occupied, so the empty slots 4, 5 and 9 are never reached.
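
Why key 62 never finds a home can be checked by listing every slot its probe sequence can reach. The helper name reachable_slots is an assumption for this sketch; the occupied dictionary simply records the placements worked out in the example.

```python
def reachable_slots(home, size):
    """Slots visited by (home + i^2) mod size for i = 0, 1, 2, ..."""
    # i^2 mod size repeats with period size, so size probes cover everything
    return {(home + i * i) % size for i in range(size)}


# Table contents after inserting 42, 16, 91, 33, 18, 27 and 36
occupied = {0: 36, 1: 91, 2: 42, 3: 33, 6: 16, 7: 27, 8: 18}
empty = set(range(10)) - set(occupied)

print(sorted(empty))                           # [4, 5, 9]
print(sorted(reachable_slots(62 % 10, 10)))    # [1, 2, 3, 6, 7, 8]
print(empty & reachable_slots(62 % 10, 10))    # set(): no empty slot is reachable
```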

Advantages of Quadratic Probing


 Reduces Clustering: Compared to linear probing, quadratic probing significantly reduces primary
clustering, as the probe sequence spreads out more quickly.
 Better Cache Performance: It can offer better cache performance than more randomized probing methods,
although not as good as linear probing.
 Utilizes Hash Table Efficiently: Quadratic probing tends to utilize the hash table more efficiently than
linear probing before the performance degrades due to clustering.
Disadvantages of Quadratic Probing
 Secondary Clustering: Although it solves primary clustering, quadratic probing can suffer from secondary
clustering where different keys that hash to the same initial index follow the same probe sequence.
 Complexity in Calculation: The probe sequence calculation is more complex than in linear probing.
 Table Size Constraints: Quadratic probing works best when the table size is a prime number. Even then
the probe sequence reaches only about half of the slots, so an empty slot is guaranteed to be found only
while the table is at most half full.
 Load Factor Sensitivity: Like other open addressing methods, as the load factor increases, performance
tends to degrade due to an increase in collisions.
DOUBLE HASHING
o Double hashing is a technique used in hash tables to resolve collisions through open addressing. Unlike
linear or quadratic probing, double hashing uses a second hash function to calculate the probe sequence.
o This approach significantly reduces the clustering issues seen in other probing methods.
Definition of Double Hashing
o Double hashing employs two hash functions. When a collision occurs (i.e., two keys hash to the same
index), the second hash function is used to determine the interval between probes.
How Double Hashing Works
1. Initial Hash Function: The first hash function computes an initial index. Let's denote this as h1(k).
2. Second Hash Function: A second, independent hash function computes another hash value. Let's denote
this as h2(k). It's crucial that h2(k) never evaluates to zero to ensure that the probe sequence advances.
3. Collision Resolution:
o Upon collision, the next index to probe is calculated using the formula:
(h1(k) + i * h2(k)) % table_size, where i is the ith probe (starting at 0).
o This process repeats, incrementing i, until an empty slot is found or the table is deemed full.

Consider the following examples:


h1(k) = k mod 11 (use a prime number as the divisor, i.e. as the table size).
h2(k) = 8-(k mod 8) (the divisor of h2 should be less than the divisor of h1)
h’(k) = (h1(k) + ih2(k)) mod 11
keys: 20, 34, 45, 70, 56
Note: the extra hash function will help to uniformly distribute the keys across the hash table. This is an
important aspect of double hashing.

Consider Key: 20
Hash value : 20 mod 11 (h1(k) = k mod 11)
: 9 index

Consider Key: 34
Hash value : 34 mod 11 (h1(k) = k mod 11)
: 1 index

Consider Key: 45
Hash value h1(k) : 45 mod 11 (h1(k) = k mod 11)
: 1 Collision
h2(k) : 8 – (45 mod 8) (h2(k) = 8 – (k mod 8))
: 3
h’(k) : (1+(1*3)) mod 11 (h’(k) = (h1(k) + ih2(k)) mod 11)
: 4 index

Consider Key: 70
Hash value h1(k) : 70 mod 11 (h1(k) = k mod 11)
: 4 Collision
h2(k) : 8 – (70 mod 8) (h2(k) = 8 – (k mod 8))
: 2
h’(k) : (4+(1*2)) mod 11 (h’(k) = (h1(k) + ih2(k)) mod 11)
: 6 index

Consider Key: 56
Hash value h1(k) : 56 mod 11 (h1(k) = k mod 11)
: 1 Collision
h2(k) : 8 – (56 mod 8) (h2(k) = 8 – (k mod 8))
: 8
h’(k) : (1+(1*8)) mod 11 when i = 1 (h’(k) = (h1(k) + ih2(k)) mod 11)
: 9 Collision
h’(k) : (1+(2*8)) mod 11 when i = 2 (h’(k) = (h1(k) + ih2(k)) mod 11)
: 6 Collision
h’(k) : (1+(3*8)) mod 11 when i = 3 (h’(k) = (h1(k) + ih2(k)) mod 11)
: 3 index
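
The worked example can be reproduced with a short Python sketch. The function name insert, the constant TABLE_SIZE and the use of None for empty slots are assumptions for illustration; h1 and h2 are the two hash functions given above.

```python
TABLE_SIZE = 11


def h1(k):
    return k % TABLE_SIZE


def h2(k):
    return 8 - (k % 8)          # always between 1 and 8, so never zero


def insert(table, key):
    """Place key at (h1 + i*h2) mod TABLE_SIZE for the first free i."""
    for i in range(TABLE_SIZE):
        index = (h1(key) + i * h2(key)) % TABLE_SIZE
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")


table = [None] * TABLE_SIZE
placed = {k: insert(table, k) for k in (20, 34, 45, 70, 56)}
print(placed)   # {20: 9, 34: 1, 45: 4, 70: 6, 56: 3}
```

Keys 34, 45 and 56 all start at index 1, but their different h2 values (6, 3 and 8) send them along different probe sequences, which is how double hashing avoids secondary clustering.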

Advantages of Double Hashing
 Reduces Clustering: Double hashing significantly reduces both primary and secondary clustering.
 Efficient Table Utilization: When the table size is prime and h2(k) lies between 1 and table_size - 1,
the probe sequence visits every slot, so an empty slot is found whenever one exists.
 Improved Performance: Offers better average-case performance for searching compared to linear and
quadratic probing.
Disadvantages of Double Hashing
 Complexity: Requires the computation of two hash functions, which might be more computationally
intensive.
 Design of Hash Functions: It's crucial to design both hash functions carefully to ensure they are
independent and that the second hash function never produces zero.
 Sensitivity to Hash Function Quality: The performance of double hashing is highly reliant on the
quality of both hash functions.

REHASHING
o Rehashing is a concept primarily used in computer science and data structures, specifically in the context of
hash tables or hash maps.
o Hash tables are data structures that allow efficient storage and retrieval of key-value pairs. They work by
using a hash function to map keys to specific locations (buckets) in an array, where the associated values are
stored.
o Rehashing is the process of resizing a hash table and redistributing its elements when the current size of the
table no longer efficiently accommodates the number of elements it contains.
o The primary goal of rehashing is to maintain a low load factor, which is the ratio of the number of stored
elements to the total number of buckets in the hash table.
o A low load factor helps ensure that the hash table remains efficient in terms of time complexity for insertion,
retrieval, and deletion operations.
Here's how rehashing typically works:
1. Check the Load Factor: Periodically or after each insertion operation, the hash table checks its load
factor. If the load factor exceeds a predefined threshold (often around 0.7 or 0.8), it indicates that the
table is becoming crowded, and rehashing is needed.
2. Create a New Hash Table: A new, larger hash table (usually with double the number of buckets) is
created. The number of buckets is increased to reduce the load factor and make the table more efficient.
3. Rehashing Process: Each element in the old hash table is rehashed, meaning their keys are mapped to
new bucket positions in the larger table using the updated hash function. This process redistributes the
key-value pairs among the new buckets.
4. Transfer Elements: The key-value pairs are transferred from the old table to the new table based on
their new hash values. This involves copying the data from the old table to the appropriate locations in
the new table.
5. Update References: Any references or pointers to the old hash table are updated to point to the new hash
table.
6. Dispose of the Old Table: Once all elements have been transferred, the old hash table can be deallocated
or discarded.
Consider the following example:
Keys: 13, 15, 26, 6, 23, 24, 7
h(k) = k mod 7
Presumed threshold value for α is 0.85

Consider Key: 13
Hash value : 13 mod 7 (h(k) = k mod 7)
: 6 index
Load Factor α : 1/7 = 0.14

Consider Key: 15
Hash value : 15 mod 7 (h(k) = k mod 7)
: 1 index
Load Factor α : 2/7 = 0.29

Consider Key: 26
Hash value : 26 mod 7 (h(k) = k mod 7)
: 5 index
Load Factor α : 3/7 = 0.43

Consider Key: 6
Hash value : 6 mod 7 (h(k) = k mod 7)
: 6 Collision
Use Linear Probing: (h(k) + i) mod 7
: (6+1) mod 7
: 0 index
Load Factor α : 4/7 = 0.57
Consider Key: 23
Hash value : 23 mod 7 (h(k) = k mod 7)
: 2 index
Load Factor α : 5/7 = 0.71

Consider Key: 24
Hash value : 24 mod 7 (h(k) = k mod 7)
: 3 index
Load Factor α : 6/7 = 0.86

At this point α exceeds the threshold value: about 86% of the hash table is full, so further insertions are
likely to collide many times before an empty slot is found. This is the right time to enlarge the hash table and
perform rehashing.
Double the hash table size to get 14, then take the nearest prime number greater than 14, which is 17. Make 17
the size of the new hash table and perform rehashing.
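
The rule "double the size, then move up to the next prime" can be written as a small helper. The function names is_prime and next_table_size are assumptions for this sketch, and simple trial division is used only because table sizes here are small.

```python
def is_prime(n):
    """Trial division; adequate for small table sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_table_size(old_size):
    """Double old_size, then step up until a prime is reached."""
    candidate = 2 * old_size
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(next_table_size(7))    # 17
```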

Performing Rehashing
Keys: 13, 15, 26, 6, 23, 24, 7
h(k) = k mod 17

Consider Key: 13
Hash value : 13 mod 17 (h(k) = k mod 17)
: 13 index
Load Factor α : 1/17 = 0.06

Consider Key: 15
Hash value : 15 mod 17 (h(k) = k mod 17)
: 15 index
Load Factor α : 2/17 = 0.12

Consider Key: 26
Hash value : 26 mod 17 (h(k) = k mod 17)
: 9 index
Load Factor α : 3/17 = 0.18

Consider Key: 6
Hash value : 6 mod 17 (h(k) = k mod 17)
: 6 index
Load Factor α : 4/17 = 0.24

Consider Key: 23
Hash value : 23 mod 17 (h(k) = k mod 17)
: 6 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (6+1) mod 17
: 7 index
Load Factor α : 5/17 = 0.29

Consider Key: 24
Hash value : 24 mod 17 (h(k) = k mod 17)
: 7 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (7+1) mod 17
: 8 index
Load Factor α : 6/17 = 0.35

Consider Key: 7
Hash value : 7 mod 17 (h(k) = k mod 17)
: 7 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (7+1) mod 17
: 8 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=2
: (7+2) mod 17
: 9 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=3
: (7+3) mod 17
: 10 index
Load Factor α : 7/17 = 0.41
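
The whole rehashing example can be traced with the following Python sketch. It is an illustration under assumptions, not the only way to implement rehashing: the class name RehashingTable and its helper methods are hypothetical, keys are integers, collisions use linear probing as in the example, and the table is resized immediately after the insertion that pushes α past the threshold.

```python
class RehashingTable:
    """Linear-probing table that grows when the load factor passes a threshold."""

    def __init__(self, size=7, threshold=0.85):
        self.slots = [None] * size
        self.count = 0
        self.threshold = threshold

    def _place(self, slots, key):
        size = len(slots)
        index = key % size                   # h(k) = k mod table size
        while slots[index] is not None:      # linear probing
            index = (index + 1) % size
        slots[index] = key

    def _next_prime(self, n):
        while any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
            n += 1
        return n

    def _rehash(self):
        new_slots = [None] * self._next_prime(2 * len(self.slots))
        for key in self.slots:               # re-insert every stored key
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots               # the old table is discarded

    def insert(self, key):
        self._place(self.slots, key)
        self.count += 1
        if self.count / len(self.slots) > self.threshold:
            self._rehash()


table = RehashingTable(size=7, threshold=0.85)
for k in (13, 15, 26, 6, 23, 24, 7):
    table.insert(k)

layout = {i: k for i, k in enumerate(table.slots) if k is not None}
print(len(table.slots))   # 17
print(layout)             # {6: 6, 7: 23, 8: 24, 9: 26, 10: 7, 13: 13, 15: 15}
```

The sketch re-inserts keys in slot order of the old table rather than in the original insertion order; for these keys both orders give the same final layout as the worked example.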

Advantages:
o Doubling the hash table size lowers the load factor and reduces the number of collisions, which improves
the performance of the hash table.
Disadvantages:
o Extra memory is used, which makes the technique memory inefficient, and every stored key must be
re-inserted each time the table is resized.

EXTENDIBLE HASHING
o Extendible hashing is a dynamic hashing technique used in computer science and database systems to
efficiently organize and search data.
o Extendible Hashing is a dynamic hashing method in which directories and buckets are used to hash data.
o It is a highly flexible method in which the hash function also changes dynamically.

The main features in Extendible Hashing technique are:


 Directories: The directories store addresses of the buckets in pointers. An id is assigned to each directory
which may change each time when Directory Expansion takes place.
 Buckets: The buckets are used to hash the actual data.

Basic Structure of Extendible Hashing:

Frequently used terms in Extendible Hashing:


 Directories: These containers store pointers to buckets. Each directory is given a unique id which may
change each time when expansion takes place. The hash function returns this directory id which is used
to navigate to the appropriate bucket.
Number of Directories = 2^Global Depth.
 Buckets: They store the hashed keys. Directories point to buckets. A bucket may have more than one
directory pointing to it if its local depth is less than the global depth.
 Global Depth: It is associated with the directories. It denotes the number of bits that the hash function
uses to categorize the keys.
Global Depth = Number of bits in directory id.
 Local Depth: It plays the same role as the Global Depth, except that it is associated with a bucket and
not with the directories. When an overflow occurs, the local depth is compared with the global depth to decide
which action to perform. Local Depth is always less than or equal to the Global Depth.
 Bucket Splitting: When the number of elements in a bucket exceeds the bucket size, the bucket is
split into two.
 Directory Expansion: Directory expansion takes place when a bucket overflows and the local depth of the
overflowing bucket is equal to the global depth.

Basic Working of Extendible Hashing:
Step 1 – Analyze Data Elements: Data elements may exist in
various forms Ex: Integer, String, Float, etc., Currently, let us
consider data elements of type integer. Ex: 49.
Step 2 – Convert into binary format: Convert the data element
in Binary form. For string elements, consider the ASCII
equivalent integer of the starting character and then convert the
integer into binary form. Since we have 49 as our data element, its
binary form is 110001.
Step 3 – Check Global Depth of the directory: Suppose the
global depth of the Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’
number of LSBs in the binary number and match it to the
directory id.
Ex: The binary obtained is: 110001 and the global-depth is 3. So,
the hash function will return 3 LSBs of 110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the
directory with directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and
check if the bucket overflows. If an overflow is encountered, go
to Step – 7 followed by Step – 8, otherwise, go to Step – 9.
Step 7 – Tackling Over Flow Condition during Data
Insertion: Many times, while inserting data in the buckets, it
might happen that the Bucket overflows. In such cases, we need to
follow an appropriate procedure to avoid mishandling of data.
First, compare the local depth of the overflowing bucket with the
global depth. Then choose one of the cases below.
 Case – 1: If the local depth of the overflowing Bucket is
equal to the global depth, then Directory Expansion, as
well as Bucket Split, needs to be performed. Then
increment the global depth and the local depth value by 1.
And, assign appropriate pointers.

Directory expansion will double the number of directories
present in the hash structure.
 Case – 2: In case the local depth is less than the global
depth, then only Bucket Split takes place. Then increment
only the local depth value by 1. And, assign appropriate
pointers.
Step 8 – Rehashing of Split Bucket Elements: The elements
present in the overflowing bucket that is split are rehashed w.r.t
the new local depth of that bucket (in Case 1 this equals the new
global depth of the directory).
Step 9 – The element is successfully hashed.
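
Steps 2 to 4 (binary conversion and selecting the directory id) come down to a bit mask. The function name directory_id below is an assumption for this sketch.

```python
def directory_id(key, global_depth):
    """Return the global_depth least significant bits of key as a bit string."""
    mask = (1 << global_depth) - 1          # e.g. depth 3 gives binary 111
    return format(key & mask, f"0{global_depth}b")


print(format(49, "b"))        # 110001
print(directory_id(49, 3))    # 001
```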

Consider Example on Extendible Hashing:


keys: 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs.
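Solution: First, calculate the binary forms of each of the given keys.
16 = 10000, 4 = 00100, 6 = 00110, 22 = 10110, 24 = 11000, 10 = 01010,
31 = 11111, 7 = 00111, 9 = 01001, 20 = 10100, 26 = 11010

Initially, the global depth and the local depth are both 1.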


Thus, the hashing frame looks like this:

Inserting 16:
The binary format of 16 is 10000 and global-depth is
1. The hash function returns 1 LSB of 10000 which is
0. Hence, 16 is mapped to the directory with id=0.

Inserting 4 and 6:
Both 4(100) and 6(110) have 0 in their LSB. Hence,
they are hashed as follows:

Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0 is
already full. Hence, overflow occurs.

As directed by Step 7 - Case 1:


Since Local Depth = Global Depth, the bucket splits
and directory expansion takes place. Also, rehashing
of numbers present in the overflowing bucket takes
place after the split. And, since the global depth is
incremented by 1, now, the global depth is 2. Hence,
16, 4, 6, 22 are now rehashed w.r.t 2 LSBs.
[16(10000), 4(100), 6(110), 22(10110)]

Note: The bucket that did not overflow has remained untouched. But, since the number of directories has
doubled, we now have 2 directories, 01 and 11, pointing to the same bucket. This is because the local depth of
the bucket has remained 1. Any bucket having a local depth less than the global depth is pointed to by
more than one directory.
Inserting 24 and 10: 24(11000) and 10(1010) can be
hashed based on directories with id 00 and 10. Here,
we encounter no overflow condition.

Inserting 31, 7, 9: All of these elements [31(11111), 7(111), 9(1001)] have either 01 or 11 in their LSBs.
Hence, they are mapped to the bucket pointed to by 01 and 11. We do not encounter any overflow
condition here.

Inserting 20: Insertion of data element 20 (10100) will again cause the overflow problem.
20 maps to the bucket pointed to by 00, which is already full. As directed by Step 7-Case 1, since the local
depth of the bucket = global-depth, directory expansion (doubling) takes place along with bucket splitting.
Elements present in the overflowing bucket are rehashed with the new global depth. Now, the new hash table
looks like this:

Inserting 26: Global depth is 3. Hence, the 3 LSBs of 26 (11010) are considered. Therefore 26 maps to
the bucket pointed to by directory 010.
The bucket overflows and, as directed by Step 7-Case 2, since the local depth of the bucket < global
depth (2 < 3), the directories are not doubled; only the bucket is split and its elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained.
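
The complete example can be traced with the following Python sketch of extendible hashing. It is a minimal illustration, not a production design: the class names Bucket and ExtendibleHash are hypothetical, keys are assumed to be distinct integers, and the hash function returns the global-depth LSBs as in the steps above.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_size=3):
        self.bucket_size = bucket_size
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]     # directory ids 0 and 1

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)  # global_depth LSBs

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.bucket_size:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Case 1: double the directory; new ids alias the existing buckets
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Both cases: split the overflowing bucket on one more bit
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        pending = bucket.keys + [key]
        bucket.keys = []
        for k in pending:
            self.insert(k)   # rehash; splits again if all keys still collide


table = ExtendibleHash(bucket_size=3)
for k in (16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26):
    table.insert(k)

print(table.global_depth)    # 3
for i, b in enumerate(table.directory):
    print(format(i, "03b"), b.local_depth, b.keys)
```

In the final structure the bucket holding 31, 7 and 9 still has local depth 1, so the four directory ids ending in 1 (001, 011, 101, 111) all point to it.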

Key Observations:
1) A Bucket will have more than one pointer pointing to it if its local depth is less than the global depth.
2) When overflow condition occurs in a bucket, all the entries in the bucket are rehashed with a new local
depth.
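3) If the local depth of the overflowing bucket is equal to the global depth, only then are the directories
doubled and the global depth incremented by 1.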
4) The size of a bucket cannot be changed after the data insertion process begins.

Advantages:
1) Data retrieval is less expensive (in terms of computing).
2) No problem of Data-loss since the storage capacity increases dynamically.
3) With dynamic changes in hashing function, associated old values are rehashed w.r.t the new hash
function.

Limitations Of Extendible Hashing:


1) The directory size may increase significantly if several records are hashed on the same directory while
keeping the record distribution non-uniform.
2) Size of every bucket is fixed.
3) Memory is wasted in pointers when the global depth and local depth difference becomes drastic.
4) This method is complicated to code.
