Unit-II DS Dictionaries and Hash Tables
Mr. Mohammed Afzal, Asst. Professor in CSE (AI&ML) Dept.
Email: [email protected], Mob: +91-8179700193
Dictionaries: linear list representation, skip list representation, operations - insertion, deletion and searching.
Hash Table Representation: hash functions, collision resolution - separate chaining, open addressing (linear
probing, quadratic probing, double hashing), re-hashing, extendible hashing.
o A linear list representation of a dictionary involves using a simple list structure (like an array or a linked
list) to store key-value pairs. Each entry in the list consists of two parts: a unique key and an associated
value.
o Let's understand the implementation step by step:
//Header Files
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
//node Declaration
struct node {
int key; // Assuming keys are integers
int value; // Assuming values are integers
struct node* next;
};
//Head Declaration
struct node* head=NULL;
//Function Declaration
bool isEmpty(void);
void insertPair(int,int);
void deletePair(int);
void displayPairs(void);
void searchPair(int);
//Pause Function (portable replacement for the non-standard getch() from <conio.h>)
void pressAnyKey(void)
{
int c;
printf("\npress Enter to continue...");
while((c=getchar())!='\n' && c!=EOF); // discard the rest of the current input line
getchar(); // wait for Enter
}
//Driver Code
int main(void)
{
int choice=0, key, value; // choice is initialised so the first loop test is well defined
while(choice!=5)
{
system("cls"); // clears the screen on Windows; use "clear" on Linux/macOS
printf("\n******Dictionary Implementation Using Linked List******");
printf("\n-------------------------------------------------------------------------");
printf("\n1.Insert");
printf("\n2.Delete");
printf("\n3.Display");
printf("\n4.Search");
printf("\n5.Exit");
printf("\nEnter your choice: ");
scanf("%d",&choice);
switch(choice)
{
case 1: printf("\nEnter key and value(pair): ");
scanf("%d %d", &key, &value);
insertPair(key, value);
pressAnyKey();
break;
case 2: if(isEmpty())
printf("\nNo pairs in Dictionaries...:(");
else
{
printf("\nEnter key to delete: ");
scanf("%d", &key);
deletePair(key);
}
pressAnyKey();
break;
case 3: displayPairs();
pressAnyKey();
break;
case 4: if(isEmpty())
printf("\nNo pairs in Dictionaries...:(");
else
{
printf("\nEnter key to search: ");
scanf("%d", &key);
searchPair(key);
}
pressAnyKey();
break;
case 5: printf("\nApplication is Exiting...:(\n");
exit(0);
default: printf("\nInvalid Option...:(");
pressAnyKey();
}
}
return 0;
}
bool isEmpty()
{
if(head==NULL)
return true;
else
return false;
}
//Insertion Function
void insertPair(int key, int value)
{
struct node* newNode;
struct node* temp1;
struct node* temp2 = NULL;
newNode = (struct node*)malloc(sizeof(struct node));
if(newNode==NULL)
{
printf("\nMemory allocation failed...:(");
return;
}
newNode->key = key;
newNode->value = value;
newNode->next = NULL;
if(head==NULL)
head=newNode;
else
{
temp1=head;
while(temp1!=NULL)
{
if(temp1->key==key) // key already present: update its value
{
temp1->value=value;
free(newNode); // the new node is not needed, so release it
printf("\nkey=%d updated successfully with value=%d ...:)", key, value);
return;
}
temp2 = temp1;
temp1 = temp1->next;
}
temp2->next = newNode; // append at the end of the list
}
printf("\n<key=%d value=%d> pair Appended Successfully...:)", key, value);
}
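//Deletion Function
void deletePair(int key)
{
struct node* temp1=head;
struct node* temp2=NULL;
while(temp1!=NULL && temp1->key!=key) // find the node and its predecessor
{
temp2=temp1;
temp1=temp1->next;
}
if(temp1==NULL)
{
printf("\nPair with key=%d is not found in Dictionary...:(", key);
return;
}
if(temp2==NULL) // deleting the first node
head=temp1->next;
else
temp2->next=temp1->next;
printf("\n<key=%d value=%d> pair Deleted Successfully...:)", key, temp1->value);
free(temp1);
}
//Display Function
void displayPairs(void)
{
struct node* temp=head;
if(temp==NULL)
{
printf("\nNo pairs in Dictionaries...:(");
return;
}
printf("\nDictionary: ");
while(temp!=NULL)
{
printf("<%d, %d> ", temp->key, temp->value);
temp=temp->next;
}
}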
//Search Function
void searchPair(int key)
{
struct node* temp = head;
while (temp != NULL)
{
if (temp->key == key)
{
printf("key=%d is associated with value=%d", key, temp->value);
return;
}
temp = temp->next;
}
printf("\nPair with key=%d is not found in Dictionary...:(", key);
}
a = a → forward[0]
lvl = random_Level()
if lvl > L → level then
    for i = L → level + 1 to lvl do
        update[i] = L → header
    L → level = lvl
Example 2: Consider the above example and search for key 17.
Answer:
Step 3: Initialization
Step 5: Search
To search for a key:
Start from the topmost layer of the head node.
Move forward in the current layer as long as the next node's key is smaller than the search key.
When the next node's key is greater than or equal to the search key, or the layer ends, move down one layer.
Repeat until you reach the bottom layer. If the next node on the bottom layer holds the search key, return its
value; otherwise, the key is not in the list.
Step 6: Deletion
To delete a key-value pair:
Perform a search for the key, keeping track of the predecessors at each level.
If the key is found, update the pointers of these predecessor nodes to skip the node being deleted.
Remove the node and deallocate its memory.
Step 7: Random Level Generation
The level for a new node is typically determined using a random process. A common method is a coin flip
algorithm: start at level 1 and, as long as a coin flip results in heads (or a random value meets a certain
condition), increase the level by one, usually up to a fixed maximum level.
Step 8: Traversal
To traverse the Skip List, simply follow the bottom layer from the head node to the end, processing or printing
the values.
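The search, deletion, random level and traversal steps above, together with the insertion pseudocode fragment earlier, can be sketched in C as below. This is a minimal sketch, not the author's implementation, under stated assumptions: integer keys and values, a hypothetical fixed cap MAX_LEVEL of 8, the coin flip simulated with rand() % 2, and hypothetical function names (sl_init, sl_search, sl_insert, sl_delete, sl_print). Call srand() first if a different level pattern per run is wanted.

```c
/* Illustrative skip list sketch; all names here are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEVEL 8  /* assumed cap on the number of levels */

typedef struct SkipNode {
    int key;
    int value;
    struct SkipNode *forward[MAX_LEVEL];  /* forward[i]: next node on level i */
} SkipNode;

typedef struct {
    SkipNode header;  /* sentinel head node; its key is never compared */
    int level;        /* number of levels currently in use (at least 1) */
} SkipList;

void sl_init(SkipList *list) {
    for (int i = 0; i < MAX_LEVEL; i++)
        list->header.forward[i] = NULL;
    list->level = 1;
}

/* Step 7: coin-flip level generation, capped at MAX_LEVEL. */
int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && rand() % 2 == 0)  /* "heads": one level higher */
        lvl++;
    return lvl;
}

/* Top-down walk: update[i] becomes the last node on level i whose key is < key. */
static SkipNode *find_predecessors(SkipList *list, int key, SkipNode *update[]) {
    SkipNode *x = &list->header;
    for (int i = list->level - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->key < key)
            x = x->forward[i];  /* move right while the next key is smaller */
        update[i] = x;          /* then drop down one level */
    }
    return x->forward[0];       /* candidate node on the bottom level */
}

/* Step 5: search; returns the node holding key, or NULL. */
SkipNode *sl_search(SkipList *list, int key) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_predecessors(list, key, update);
    return (x != NULL && x->key == key) ? x : NULL;
}

/* Insertion, following the update[] / random level pseudocode; returns 0 on allocation failure. */
int sl_insert(SkipList *list, int key, int value) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_predecessors(list, key, update);
    if (x != NULL && x->key == key) {  /* key present: just update the value */
        x->value = value;
        return 1;
    }
    int lvl = random_level();
    if (lvl > list->level) {           /* new top levels start from the header */
        for (int i = list->level; i < lvl; i++)
            update[i] = &list->header;
        list->level = lvl;
    }
    SkipNode *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->key = key;
    node->value = value;
    for (int i = 0; i < lvl; i++) {    /* splice the node into levels 0..lvl-1 */
        node->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = node;
    }
    return 1;
}

/* Step 6: deletion; returns 1 if the key was found and removed. */
int sl_delete(SkipList *list, int key) {
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = find_predecessors(list, key, update);
    if (x == NULL || x->key != key)
        return 0;
    for (int i = 0; i < list->level && update[i]->forward[i] == x; i++)
        update[i]->forward[i] = x->forward[i];  /* bypass x on level i */
    free(x);
    while (list->level > 1 && list->header.forward[list->level - 1] == NULL)
        list->level--;                          /* drop levels that became empty */
    return 1;
}

/* Step 8: traversal along the bottom level (keys come out in ascending order). */
void sl_print(const SkipList *list) {
    for (const SkipNode *x = list->header.forward[0]; x != NULL; x = x->forward[0])
        printf("<%d, %d> ", x->key, x->value);
    printf("\n");
}
```

Because levels are chosen at random, the shape of the upper layers varies from run to run, but sl_print always lists the pairs in ascending key order.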
Consider Key: 52
Hash value : 52 mod 10
: 2
Consider Key: 91
Hash value : 91 mod 10
: 1
Consider Key: 48
Hash value : 48 mod 10
: 8
Consider Key: 83
Hash value : 83 mod 10
: 3
Search Operation:
Use the same hash function k mod 10 for searching
Search for Key: 67
67 mod 10 = 7 index
Go to index 7 and find the element.
Deletion Operation:
Use the same hash function k mod 10 for deletion
Delete key: 82
82 mod 10 = 2 index
Go to index 2 and delete the key.
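The three operations above can be sketched in C as follows. This is a minimal illustration, not the author's code: it assumes non-negative integer keys, a 10-slot array, h(k) = k mod 10, a hypothetical EMPTY marker of -1, and hypothetical function names. Collisions are not resolved here; insert_key simply reports failure when the slot is already taken, which is the problem addressed under Collision Resolution.

```c
/* Illustrative sketch; all names here are hypothetical. */
#include <stdbool.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumed marker for a free slot; keys are non-negative */

int table[TABLE_SIZE];

int hash(int key) { return key % TABLE_SIZE; }   /* h(k) = k mod 10 */

void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insertion: store the key at index h(k); fails if that slot is taken (a collision). */
bool insert_key(int key) {
    int idx = hash(key);
    if (table[idx] != EMPTY)
        return false;
    table[idx] = key;
    return true;
}

/* Search: go to index h(k) and compare. */
bool search_key(int key) {
    return table[hash(key)] == key;
}

/* Deletion: go to index h(k) and clear the slot if it holds the key. */
bool delete_key(int key) {
    int idx = hash(key);
    if (table[idx] != key)
        return false;
    table[idx] = EMPTY;
    return true;
}
```

For example, after inserting 52, 91, 48 and 83 the keys sit at indices 2, 1, 8 and 3, and a later insert_key(62) fails because index 2 is already occupied.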
Collision Resolution
o In hash tables, collision resolution is a method used to handle situations where two or more keys hash to the
same index.
Collision in Hashing
Consider the keys: 24, 19, 32, 44, 58
Hash function: k mod n (n = 6)
Consider Key: 24
Hash value : 24 mod 6
: 0
Consider Key: 19
Hash value : 19 mod 6
: 1
Consider Key: 32
Hash value : 32 mod 6
: 2
Consider Key: 44
Hash value : 44 mod 6
: 2 Collision
SEPARATE CHAINING
o Separate chaining is a widely used method to resolve collisions in hash tables. When two or more
elements hash to the same location, they are stored in a singly linked list, like a chain, attached to that
location.
o Since this method stores the colliding elements outside the table, in extra linked-list memory, it is also known
as open hashing.
How Separate Chaining Works
1. Hash Table Structure: The hash table is an array of a fixed size. Each element of this array is a head
pointer to a linked list.
2. Hash Function: The hash function takes a key and computes an index in the array.
3. Insertion:
o Apply the hash function to the key to get the index.
o Insert the key-value pair at the head of the linked list at this index.
4. Searching:
o Compute the index for the key using the hash function.
o Search through the linked list at that index for the desired key.
5. Deletion:
o Find the index using the hash function.
o Search the linked list for the key and remove the corresponding node.
Consider Key: 32
Hash value : 32 mod 6
: 2
Consider Key: 44
Hash value : 44 mod 6
: 2 Collision
Consider Key: 56
Hash value : 56 mod 6
: 2 Collision
This technique is called open hashing because, in addition to the slots of the hash table itself, extra linked-list
space outside the table is used to store the colliding keys.
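The operations listed above can be sketched in C as follows. This is a minimal illustration under assumptions: integer keys only (no separate values), the table size 6 and hash k mod 6 from the example, hypothetical function names, and no duplicate check on insert.

```c
/* Illustrative separate chaining sketch; all names here are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define SIZE 6   /* table size n = 6, as in the example */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

Node *buckets[SIZE];   /* each entry is the head pointer of a chain (NULL = empty) */

int hash_index(int key) { return key % SIZE; }   /* h(k) = k mod 6 */

/* Insertion at the head of the chain for h(k). */
bool chain_insert(int key) {
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return false;
    int idx = hash_index(key);
    node->key = key;
    node->next = buckets[idx];
    buckets[idx] = node;
    return true;
}

/* Searching: walk only the chain at index h(k). */
bool chain_search(int key) {
    for (Node *p = buckets[hash_index(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return true;
    return false;
}

/* Deletion: unlink and free the matching node. */
bool chain_delete(int key) {
    Node **link = &buckets[hash_index(key)];   /* the pointer that points at the current node */
    while (*link != NULL) {
        if ((*link)->key == key) {
            Node *victim = *link;
            *link = victim->next;
            free(victim);
            return true;
        }
        link = &(*link)->next;
    }
    return false;
}

void chain_print(void) {
    for (int i = 0; i < SIZE; i++) {
        printf("%d:", i);
        for (Node *p = buckets[i]; p != NULL; p = p->next)
            printf(" -> %d", p->key);
        printf("\n");
    }
}
```

Inserting 32, 44 and 56 puts all three keys in chain 2; because each insertion goes at the head, the chain reads 56 -> 44 -> 32.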
OPEN ADDRESSING
o Open addressing is a collision resolution technique used in hash tables. In open addressing, all elements are
stored directly in the hash table itself.
o When a collision occurs (i.e., two items hash to the same slot), the method seeks to find another slot to
accommodate one of the items using a probing sequence.
o It includes several sub-methods:
a) Linear probing
Description: When a collision occurs, linear probing searches for the next available slot linearly in the table.
How Linear Probing Works
1. Hash Function: Like any hash table, linear probing starts with a hash function that computes an initial
index for a given key.
2. Insertion:
o Compute the hash for the key to find the initial index.
o If the slot at the computed index is empty, insert the item there.
o If the slot is occupied, check the next slot (i.e., move linearly forward) until an empty slot is found.
o If the end of the table is reached, wrap around to the beginning.
Algorithm for insert
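function INSERT(hash_table, table_length, key, hash_value)
    index ← hash_value
    first_scan ← true
    while isEmpty(hash_table, index) = false do
        index ← (index + 1) mod table_length
        if index = hash_value and first_scan = false then
            return -1 // no empty slot: the table is full
        first_scan ← false
    hash_table[index] ← key
    return index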
3. Searching:
o Compute the hash for the key to find the initial index.
o If the slot contains a different key, move linearly forward until the key is found or an empty slot is
encountered.
Algorithm for searching
function SEARCH(hash_table, table_length, key, hash_value)
    index ← hash_value
    first_scan ← true
    while hash_table[index] ≠ key do
        if isEmpty(hash_table, index) = true then
            return -1 // an empty slot means the key is absent
        index ← (index + 1) mod table_length
        if index = hash_value and first_scan = false then
            return -1 // every slot has been checked
        first_scan ← false
    return index
4. Deletion:
o Find the item using the search operation.
o Remove the item and mark the slot as deleted (a special marker different from empty and occupied).
o Note: Deletion complicates the search and insert operations because the search must continue past
deleted items, and insert can use slots marked as deleted.
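The insertion, search and deletion rules above can be sketched in C. This is a minimal illustration under assumptions: non-negative integer keys, a 10-slot table with h(k) = k mod 10, hypothetical function names, and two hypothetical sentinel values, EMPTY (-1) for never-used slots and DELETED (-2) for the "deleted" marker. Search stops at EMPTY but steps over DELETED, while insert may reuse a DELETED slot.

```c
/* Illustrative linear probing sketch; all names here are hypothetical. */
#include <stdbool.h>

#define TABLE_SIZE 10
#define EMPTY   (-1)   /* assumed marker: slot never used */
#define DELETED (-2)   /* assumed marker: slot emptied by a deletion */

int table[TABLE_SIZE];

void lp_init(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Returns the slot index of key, or -1 if it is absent. */
int lp_search(int key) {
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;   /* h(k, i) = (h(k) + i) mod 10 */
        if (table[idx] == EMPTY)
            return -1;                       /* an empty slot ends the search */
        if (table[idx] == key)
            return idx;                      /* DELETED slots are stepped over */
    }
    return -1;                               /* every slot was probed */
}

/* Returns the slot used, or -1 if the table is full. */
int lp_insert(int key) {
    int existing = lp_search(key);
    if (existing != -1)
        return existing;                     /* key already stored */
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY || table[idx] == DELETED) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}

/* Marks the slot DELETED so later probe sequences are not cut short. */
bool lp_delete(int key) {
    int idx = lp_search(key);
    if (idx == -1)
        return false;
    table[idx] = DELETED;
    return true;
}
```

Inserting 43, 72, 99 and 19 into an empty table places them at indices 3, 2, 9 and 0, matching the worked example that follows.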
Consider Key: 43
Hash value : 43 mod 10 (k mod 10)
: 3
Consider Key: 72
Hash value : 72 mod 10 (k mod 10)
: 2
Consider Key: 99
Hash value : 99 mod 10 (k mod 10)
: 9 index
Consider Key: 19
Hash value : 19 mod 10 (k mod 10)
: 9 Collision
: (9+1) mod 10 (h(k, i) when i=1)
: 0 index
Consider Key: 82
Hash value : 82 mod 10 (k mod 10)
: 2 Collision
: (2+1) mod 10 (h(k, i) when i=1)
: 3 Collision
: (2+2) mod 10 (h(k, i) when i=2)
: 4 Collision
: (2+3) mod 10 (h(k, i) when i=3)
: 5 Collision
: (2+4) mod 10 (h(k, i) when i=4)
: 6 index
Example with the hash function h(k) = k² mod 10 and linear probing:
Consider Key: 3
Hash value : 3² mod 10 (k² mod 10)
: 9 index
Consider Key: 4
Hash value : 4² mod 10 (k² mod 10)
: 6 index
Consider Key: 25
Hash value : 25² mod 10 (k² mod 10)
: 5 index
Consider Key: 6
Hash value : 6² mod 10 (k² mod 10)
: 6 Collision
: (6+1) mod 10 (h(k, i) when i=1)
: 7 index
Consider Key: 8
Hash value : 8² mod 10 (k² mod 10)
: 4 Collision
: (4+1) mod 10 (h(k, i) when i=1)
: 5 Collision
: (4+2) mod 10 (h(k, i) when i=2)
: 6 Collision
: (4+3) mod 10 (h(k, i) when i=3)
: 7 Collision
: (4+4) mod 10 (h(k, i) when i=4)
: 8 Collision
: (4+5) mod 10 (h(k, i) when i=5)
: 9 Collision
: (4+6) mod 10 (h(k, i) when i=6)
: 0 Collision
: (4+7) mod 10 (h(k, i) when i=7)
: 1 Collision
: (4+8) mod 10 (h(k, i) when i=8)
: 2 index
QUADRATIC PROBING
Quadratic probing is another method of open addressing used in hash tables to resolve collisions. Unlike linear
probing, where the interval between probes is fixed, quadratic probing uses a quadratic function to calculate the
interval between probes. This approach helps to reduce the primary clustering (long runs of occupied slots)
seen in linear probing.
How Quadratic Probing Works
1. Insertion using Quadratic Probing:
o Use the hash function hash(key) to calculate the initial index for the given key.
o Check if the slot at the calculated initial index is empty.
o If it's empty, place the key-value pair in that slot, and the insertion is complete.
o If the slot at the initial index is occupied (collision occurred), quadratic probing comes into play.
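o Initialize a variable i to 1 (for the first iteration). Calculate the next probe index using the quadratic
probing formula:
o next_index = (initial_index + i²) % table_size
o If the slot at next_index is also occupied, increment i and repeat. Because the sequence can revisit the same
slots, the search for a free slot should stop after at most table_size probes.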
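A minimal C sketch of quadratic probing with the formula above, assuming non-negative integer keys, a 10-slot table, h(k) = k mod 10, a hypothetical EMPTY marker of -1, and hypothetical function names. Because (i + table_size)² leaves the same remainder as i² modulo table_size, trying i = 0 to table_size - 1 covers every slot the sequence can ever reach, so the loop can safely stop there.

```c
/* Illustrative quadratic probing sketch; all names here are hypothetical. */
#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumed marker for a free slot; keys are non-negative */

int table[TABLE_SIZE];

void qp_init(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Returns the slot used, or -1 if no free slot is reachable. */
int qp_insert(int key) {
    int initial_index = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {   /* i = 0 is the initial index itself */
        int next_index = (initial_index + i * i) % TABLE_SIZE;
        if (table[next_index] == EMPTY) {
            table[next_index] = key;
            return next_index;
        }
    }
    return -1;   /* i*i mod TABLE_SIZE repeats after TABLE_SIZE steps */
}

/* Follows the same probe sequence; an empty slot means the key is absent. */
int qp_search(int key) {
    int initial_index = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (initial_index + i * i) % TABLE_SIZE;
        if (table[idx] == EMPTY)
            return -1;
        if (table[idx] == key)
            return idx;
    }
    return -1;
}
```

When slots 1, 2, 3, 6, 7 and 8 are occupied, qp_insert(62) returns -1 even though slots 0, 4, 5 and 9 are free. A common remedy is to use a prime table size and keep the load factor below 0.5, in which case quadratic probing is guaranteed to find an empty slot.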
Consider Key: 16
Hash value : 16 mod 10 (k mod 10)
: 6 index
Consider Key: 33
Hash value : 33 mod 10 (k mod 10)
: 3 index
Consider Key: 18
Hash value : 18 mod 10 (k mod 10)
: 8 index
Consider Key: 27
Hash value : 27 mod 10 (k mod 10)
: 7 index
Consider Key: 62
Hash value : 62 mod 10 (k mod 10)
: 2 Collision
: (2+1²) mod 10 (h(k, i) when i=1)
: 3 Collision
: (2+2²) mod 10 (h(k, i) when i=2)
: 6 Collision
: (2+3²) mod 10 (h(k, i) when i=3)
: 1 Collision
: (2+4²) mod 10 (h(k, i) when i=4)
: 8 Collision
: (2+5²) mod 10 (h(k, i) when i=5)
: 7 Collision
: (2+6²) mod 10 (h(k, i) when i=6)
: 8 Collision
No guarantee of finding the empty slot: since i² mod 10 only takes the values 0, 1, 4, 9, 6 and 5, the probe
sequence for key 62 only ever visits slots 2, 3, 6, 1, 8 and 7, so slots 0, 4, 5 and 9 are never tried.
DOUBLE HASHING
o Double hashing is an open addressing method that uses a second hash function to decide the step between
probes: h’(k) = (h1(k) + i*h2(k)) mod table_size, where i is the probe number.
o Example: table size 11, h1(k) = k mod 11 and h2(k) = 8 - (k mod 8).
Consider Key: 20
Hash value : 20 mod 11 (h1(k) = k mod 11)
: 9 index
Consider Key: 45
Hash value h1(k) : 45 mod 11 (h1(k) = k mod 11)
: 1 Collision
h2(k) : 8 - (45 mod 8) (h2(k) = 8 - (k mod 8))
: 3
h’(k) : (1+(1*3)) mod 11 (h’(k) = (h1(k) + i*h2(k)) mod 11)
: 4 index
Consider Key: 70
Hash value h1(k) : 70 mod 11 (h1(k) = k mod 11)
: 4 Collision
h2(k) : 8 - (70 mod 8) (h2(k) = 8 - (k mod 8))
: 2
h’(k) : (4+(1*2)) mod 11 (h’(k) = (h1(k) + i*h2(k)) mod 11)
: 6 index
Consider Key: 56
Hash value h1(k) : 56 mod 11 (h1(k) = k mod 11)
: 1 Collision
h2(k) : 8 - (56 mod 8) (h2(k) = 8 - (k mod 8))
: 8
h’(k) : (1+(1*8)) mod 11 when i = 1 (h’(k) = (h1(k) + i*h2(k)) mod 11)
: 9 Collision
h’(k) : (1+(2*8)) mod 11 when i = 2 (h’(k) = (h1(k) + i*h2(k)) mod 11)
: 6 Collision
h’(k) : (1+(3*8)) mod 11 when i = 3 (h’(k) = (h1(k) + i*h2(k)) mod 11)
: 3 index
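The double hashing steps above can be sketched in C as follows, assuming non-negative integer keys, an 11-slot table, the h1 and h2 from the example, a hypothetical EMPTY marker of -1, and hypothetical function names. Since h2(k) = 8 - (k mod 8) is always between 1 and 8 and 11 is prime, every step size is coprime to 11, so the probe sequence visits all 11 slots before repeating.

```c
/* Illustrative double hashing sketch; all names here are hypothetical. */
#define TABLE_SIZE 11
#define EMPTY (-1)   /* assumed marker for a free slot; keys are non-negative */

int table[TABLE_SIZE];

void dh_init(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

int h1(int key) { return key % 11; }       /* h1(k) = k mod 11 */
int h2(int key) { return 8 - (key % 8); }  /* h2(k) = 8 - (k mod 8), always 1..8 */

/* Returns the slot used, or -1 if the table is full. */
int dh_insert(int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (h1(key) + i * h2(key)) % TABLE_SIZE;  /* h'(k) for probe number i */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}

/* Follows the same probe sequence; an empty slot means the key is absent. */
int dh_search(int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (h1(key) + i * h2(key)) % TABLE_SIZE;
        if (table[idx] == EMPTY)
            return -1;
        if (table[idx] == key)
            return idx;
    }
    return -1;
}
```

The worked example shows slot 1 already occupied when 45 arrives. If that slot is filled first (key 12 is used here purely as a stand-in, since 12 mod 11 = 1), inserting 20, 45, 70 and 56 yields indices 9, 4, 6 and 3, as in the example.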
REHASHING
o Rehashing is a technique used with hash tables (also called hash maps).
o Hash tables are data structures that allow efficient storage and retrieval of key-value pairs. They work by
using a hash function to map keys to specific locations (buckets) in an array, where the associated values are
stored.
o Rehashing is the process of resizing a hash table and redistributing its elements when the current size of the
table no longer efficiently accommodates the number of elements it contains.
o The primary goal of rehashing is to maintain a low load factor, which is the ratio of the number of stored
elements to the total number of buckets in the hash table.
o A low load factor helps ensure that the hash table remains efficient in terms of time complexity for insertion,
retrieval, and deletion operations.
Here's how rehashing typically works:
1. Check the Load Factor: Periodically or after each insertion operation, the hash table checks its load
factor. If the load factor exceeds a predefined threshold (often around 0.7 or 0.8), it indicates that the
table is becoming crowded, and rehashing is needed.
2. Create a New Hash Table: A new, larger hash table (usually with double the number of buckets) is
created. The number of buckets is increased to reduce the load factor and make the table more efficient.
3. Rehashing Process: Each element in the old hash table is rehashed: its key is mapped to a new bucket
position in the larger table using the updated hash function (for example, k mod new table size). This process
redistributes the key-value pairs among the new buckets.
4. Transfer Elements: The key-value pairs are transferred from the old table to the new table based on
their new hash values. This involves copying the data from the old table to the appropriate locations in
the new table.
5. Update References: Any references or pointers to the old hash table are updated to point to the new hash
table.
6. Dispose of the Old Table: Once all elements have been transferred, the old hash table can be deallocated
or discarded.
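The steps above can be sketched in C for a linear-probing table. This is an illustration under assumptions, not a definitive implementation: non-negative integer keys, a hypothetical EMPTY marker of -1, h(k) = k mod size, the load factor checked after each insertion, hypothetical function names, and the new size chosen as the smallest prime at least twice the old size (the choice made in the example that follows).

```c
/* Illustrative rehashing sketch; all names here are hypothetical. */
#include <stdbool.h>
#include <stdlib.h>

#define EMPTY (-1)   /* assumed marker for a free slot; keys are non-negative */

typedef struct {
    int *slots;
    int size;    /* number of buckets */
    int count;   /* number of stored keys */
} HashTable;

static bool is_prime(int n) {
    if (n < 2)
        return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)
            return false;
    return true;
}

static int next_prime(int n) {   /* smallest prime >= n */
    while (!is_prime(n))
        n++;
    return n;
}

bool ht_init(HashTable *t, int size) {
    t->slots = malloc((size_t)size * sizeof *t->slots);
    if (t->slots == NULL)
        return false;
    for (int i = 0; i < size; i++)
        t->slots[i] = EMPTY;
    t->size = size;
    t->count = 0;
    return true;
}

/* Linear probing placement with h(k) = k mod size; caller ensures a free slot exists. */
static void place(HashTable *t, int key) {
    int idx = key % t->size;
    while (t->slots[idx] != EMPTY)
        idx = (idx + 1) % t->size;
    t->slots[idx] = key;
    t->count++;
}

/* Steps 2-6: allocate a larger table, reinsert every key, free the old array. */
bool rehash(HashTable *t) {
    HashTable bigger;
    if (!ht_init(&bigger, next_prime(2 * t->size)))
        return false;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY)
            place(&bigger, t->slots[i]);   /* new index uses the new size */
    free(t->slots);
    *t = bigger;   /* the caller's table now refers to the new array */
    return true;
}

/* Step 1: insert, then rehash if the load factor count/size exceeds threshold. */
bool ht_insert(HashTable *t, int key, double threshold) {
    if (t->count == t->size)
        return false;   /* table full */
    place(t, key);
    if ((double)t->count / t->size > threshold)
        return rehash(t);
    return true;
}
```

Starting from a 7-slot table with threshold 0.85, inserting 13, 15, 26, 6, 23 and 24 triggers a rehash after 24 (α = 6/7, about 0.86); the new table has 17 slots, and the keys land at indices 13, 15, 9, 6, 7 and 8.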
Consider the following example:
Keys: 13, 15, 26, 6, 23, 24, 7
h(k) = k mod 7
Presumed threshold value for α is 0.85
Consider Key: 13
Hash value : 13 mod 7 (h(k) = k mod 7)
: 6 index
Load Factor α : 1/7 = 0.14
Consider Key: 15
Hash value : 15 mod 7 (h(k) = k mod 7)
: 1 index
Load Factor α : 2/7 = 0.29
Consider Key: 26
Hash value : 26 mod 7 (h(k) = k mod 7)
: 5 index
Load Factor α : 3/7 = 0.43
Consider Key: 6
Hash value : 6 mod 7 (h(k) = k mod 7)
: 6 Collision
Use Linear Probing: (h(k) + i) mod 7
: (6+1) mod 7
: 0 index
Load Factor α : 4/7 = 0.57
Consider Key: 23
Hash value : 23 mod 7 (h(k) = k mod 7)
: 2 index
Load Factor α : 5/7 = 0.71
Consider Key: 24
Hash value : 24 mod 7 (h(k) = k mod 7)
: 3 index
Load Factor α : 6/7 = 0.86
Here, after key 24 is inserted, α = 0.86 crosses the threshold value of 0.85: about 86% of the hash table is full,
and further insertions would need many probes (collisions) before finding a free slot. This is the right time to
enlarge the hash table and perform rehashing.
Doubling the hash table size gives 14; the nearest prime number greater than 14 is 17, so the new hash table
has 17 slots, and all keys are rehashed into it.
Performing Rehashing
Keys: 13, 15, 26, 6, 23, 24, 7
h(k) = k mod 17
Consider Key: 13
Hash value : 13 mod 17 (h(k) = k mod 17)
: 13 index
Load Factor α : 1/17 = 0.06
Consider Key: 15
Hash value : 15 mod 17 (h(k) = k mod 17)
: 15 index
Load Factor α : 2/17 = 0.12
Consider Key: 26
Hash value : 26 mod 17 (h(k) = k mod 17)
: 9 index
Load Factor α : 3/17 = 0.18
Consider Key: 6
Hash value : 6 mod 17 (h(k) = k mod 17)
: 6 index
Load Factor α : 4/17 = 0.24
Consider Key: 23
Hash value : 23 mod 17 (h(k) = k mod 17)
: 6 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (6+1) mod 17
: 7 index
Load Factor α : 5/17 = 0.29
Consider Key: 24
Hash value : 24 mod 17 (h(k) = k mod 17)
: 7 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (7+1) mod 17
: 8 index
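Load Factor α : 6/17 = 0.35
Consider Key: 7
Hash value : 7 mod 17 (h(k) = k mod 17)
: 7 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=1
: (7+1) mod 17
: 8 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=2
: (7+2) mod 17
: 9 Collision
Use Linear Probing: (h(k) + i) mod 17 where i=3
: (7+3) mod 17
: 10 index
Load Factor α : 7/17 = 0.41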
Advantages:
o Increasing the table size (here, doubling it) lowers the load factor, which reduces the number of collisions and
so improves the performance of the hash table.
Disadvantages:
o Extra memory is used, which makes the technique less memory efficient, and every key must be rehashed into
the new table, which takes time.
EXTENDIBLE HASHING
o Extendible hashing is a dynamic hashing technique used in computer science and database systems to
efficiently organize and search data.
o Extendible hashing is a dynamic hashing method in which directories and buckets are used to hash data.
o It is a highly flexible method in which the hash function itself changes dynamically: the number of hash-value
bits used to index the directory (the global depth) grows as the directory doubles.
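To make the bit-based directory lookup concrete, here is a small C helper. It is a sketch that assumes the hash value of a key is the key itself (as in the 4(100) and 6(110) example) and that the directory is indexed by the global-depth least significant bits; dir_index is a hypothetical name, and bucket splitting and directory doubling are not shown.

```c
/* Illustrative helper; dir_index is a hypothetical name.
   Assumes the hash value of a key is the key itself. */
unsigned dir_index(unsigned key, unsigned global_depth) {
    unsigned mask = (1u << global_depth) - 1u;  /* e.g. depth 2 -> binary 11 */
    return key & mask;                          /* keep only the low global_depth bits */
}
```

With global depth 1, dir_index(4, 1) and dir_index(6, 1) are both 0, so 4 and 6 go to the same bucket; with global depth 2 they separate into directory entries 00 and 10.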
Inserting 4 and 6:
Both 4(100) and 6(110) have 0 in their LSB. Hence,
they are hashed as follows:
Note: The bucket that did not overflow has remained untouched. But, since the number of directories has
doubled, we now have 2 directories 01 and 11 pointing to the same bucket. This is because the local-depth of
Advantages:
1) Data retrieval is less expensive (in terms of computing).
2) No problem of Data-loss since the storage capacity increases dynamically.
3) With dynamic changes in hashing function, associated old values are rehashed w.r.t the new hash
function.