BCS304 DS Module 5 Notes
Module 5
Priority Queues: Single and double ended Priority Queues, Leftist Trees
Text Book: Chapter 8: 8.1 to 8.3, Chapter 9: 9.1, 9.2, Chapter 10: 10.1
Need for Hashing
• Every day, the data on the internet is increasing multifold and it is always a struggle to store this
data efficiently.
• In day-to-day programming, this amount of data might not be that big, but still, it needs to be
stored, accessed, and processed easily and efficiently.
• A very common data structure that is used for such a purpose is the Array data structure.
Components of Hashing
• There are three major components of hashing (refer to the figure below):
o Key: A key can be anything, such as a string or an integer, which is fed as input to the hash
function that determines the index or location for storage of an item in a data structure.
o Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. The index is known as the hash index.
o Hash Table: A hash table is a data structure that maps keys to values using a special function
called a hash function. It stores the data in an associative manner in an array where each
data value has its own unique index.
Hash Table
• A hash table is one of the most important data structures; it uses a special function, known as a
hash function, that maps a given value with a key so that elements can be accessed faster.
• A Hash table is a data structure that stores some information, and the information has basically
two main components, i.e., key and value.
• The hash table can be implemented with the help of an associative array. The efficiency of mapping
depends upon the efficiency of the hash function used for mapping.
• For example, suppose the key is "john" and the value is a phone number. When we pass the key
to the hash function shown below:
Hash(key) = index;
When we pass the key to the hash function, it gives the index.
Hash(john) = 3;
The above example stores "john" at index 3.
• Step 1: Suppose we want to store the strings "ab", "cd", and "efg" in a hash table.
• Step 2: Assign a numerical value to each alphabetical character:
"a" = 1,
"b" = 2, .. etc., for all alphabetical characters.
• Step 3: Therefore, the numerical value of each string is obtained by summing the values of all its
characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
• Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function that is
used here is the sum of the characters in key mod Table size. We can compute the location of the
string in the array by taking the sum(string) mod 7.
• Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
• The above technique enables us to calculate the location of a given string by using a simple hash
function and rapidly find the value that is stored in that location.
• Therefore the idea of hashing seems like a great way to store (key, value) pairs of the data in a
table.
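• The character-sum scheme above can be written as a short C function; the following is a minimal
sketch (the function name string_hash and the fixed table size 7 are illustrative, not from the text):

#include <stdio.h>

/* Sum of the character values ("a" = 1, "b" = 2, ...) modulo table size 7 */
int string_hash(const char *s) {
    int sum = 0, i;
    for (i = 0; s[i] != '\0'; i++)
        sum += s[i] - 'a' + 1;
    return sum % 7;
}

int main(void) {
    printf("%d\n", string_hash("ab"));   /* 3 mod 7 = 3 */
    printf("%d\n", string_hash("cd"));   /* 7 mod 7 = 0 */
    printf("%d\n", string_hash("efg"));  /* 18 mod 7 = 4 */
    return 0;
}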
4. Efficiently computable.
5. Should uniformly distribute the keys (each table position is equally likely for every key).
6. Should have a low load factor (number of items in the table divided by the size of the table).
Division Method
▪ The hash function is:
h(x) = x % m;
where m is the size of the hash table.
▪ For example, if the key value is 6 and the size of the hash table is 10. When we apply the hash
function to key 6 then the index would be:
h(6) = 6%10 = 6
The index is 6 at which the value is stored.
▪ Example: Calculate the hash values of keys 1234 and 5642.
Solution Setting M = 97, hash values can be calculated as:
h(1234) = 1234 % 97 = 70
h(5642) = 5642 % 97 = 16
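▪ The division method is a one-line function in C; a minimal sketch (the names are illustrative) that
reproduces the two hash values above:

#include <stdio.h>

/* Division method: h(k) = k % m */
int h(int k, int m) {
    return k % m;
}

int main(void) {
    printf("h(1234) = %d\n", h(1234, 97));  /* 70 */
    printf("h(5642) = %d\n", h(5642, 97));  /* 16 */
    return 0;
}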
Multiplication Method
▪ The steps involved in the multiplication method are as follows:
Step 1: Choose a constant A such that 0 < A < 1.
Step 2: Multiply the key k by A.
Step 3: Extract the fractional part of kA.
Step 4: Multiply the result of Step 3 by the size of hash table (m).
▪ Hence, the hash function can be given as:
h(k) = floor(m (kA mod 1))
▪ Example: Given a hash table of size 1000, map the key 12345 to an appropriate location in the
hash table.
Solution We will use A = 0.618033, m = 1000, and k = 12345
h(12345) = floor(1000 (12345 × 0.618033 mod 1))
h(12345) = floor(1000 (7629.617385 mod 1))
h(12345) = floor(1000 (0.617385))
h(12345) = floor(617.385)
h(12345) = 617
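▪ A minimal C sketch of the multiplication method (the function name hash_mult is illustrative);
fmod extracts the fractional part of kA:

#include <stdio.h>
#include <math.h>

/* Multiplication method: h(k) = floor(m * (k*A mod 1)) */
int hash_mult(long k, int m, double A) {
    return (int)floor(m * fmod(k * A, 1.0));
}

int main(void) {
    printf("%d\n", hash_mult(12345, 1000, 0.618033));  /* 617 */
    return 0;
}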
Mid-Square Method
▪ The mid-square method is a good hash function which works in two steps:
Step 1: Square the value of the key. That is, find k².
Step 2: Extract the middle r digits of the result obtained in Step 1.
▪ The algorithm works well because most or all digits of the key value contribute to the result.
▪ This is because all the digits in the original key value contribute to produce the middle digits of
the squared value.
▪ Therefore, the result is not dominated by the distribution of the bottom digit or the top digit of the
original key value.
▪ In the mid-square method, the same r digits must be chosen from all the keys. Therefore, the
hash function can be given as:
h(k) = s
where s is obtained by selecting r digits from k².
▪ Example: Calculate the hash value for keys 1234 and 5642 using the mid-square method.
The hash table has 100 memory locations.
Solution: Note that the hash table has 100 memory locations whose indices vary from 0 to 99.
This means that only two digits are needed to map the key to a location in the hash table, so r = 2.
When k = 1234, k² = 1522756, h(1234) = 27
When k = 5642, k² = 31832164, h(5642) = 21
Observe that the 3rd and 4th digits starting from the right are chosen.
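▪ A minimal C sketch of the mid-square method for this example (r = 2, picking the 3rd and 4th
digits from the right with integer division and modulo; the function name is illustrative):

#include <stdio.h>

/* Mid-square method: square the key, then extract the middle r = 2 digits */
int hash_midsquare(long k) {
    long sq = k * k;                 /* Step 1: k squared */
    return (int)((sq / 100) % 100);  /* Step 2: 3rd and 4th digits from the right */
}

int main(void) {
    printf("%d\n", hash_midsquare(1234));  /* 1522756  -> 27 */
    printf("%d\n", hash_midsquare(5642));  /* 31832164 -> 21 */
    return 0;
}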
Folding Method
▪ The folding method works in the following two steps:
▪ Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn, where
each part has the same number of digits except the last part, which may have fewer digits than the
other parts.
▪ Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash value is
produced by ignoring the last carry, if any.
▪ Note that the number of digits in each part of the key will vary depending upon the size of the hash
table. For example, if the hash table has a size of 1000, then there are 1000 locations in the hash
table. To address these 1000 locations, we need at least three digits; therefore, each part of the key
must have three digits except the last part, which may have fewer digits.
▪ Example: Given a hash table of 100 locations, calculate the hash value using folding method for
keys 5678, 321, and 34567.
▪ Solution: Since there are 100 memory locations to address, we will break the key into parts where
each part (except the last) will contain two digits. The hash values are obtained as follows:
key 5678: parts 56 and 78, sum = 134, hash value = 34 (the carry is ignored)
key 321: parts 32 and 1, sum = 33, hash value = 33
key 34567: parts 34, 56, and 7, sum = 97, hash value = 97
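▪ A minimal C sketch of the folding method for a table of 100 locations (the key is passed as a digit
string so the parts can be taken from the left; the function name is illustrative):

#include <stdio.h>
#include <string.h>

/* Split the key (from the left) into 2-digit parts, the last part may have
   fewer digits; sum the parts and keep only the last two digits of the sum
   (i.e., ignore the final carry). */
int hash_fold(const char *key) {
    int sum = 0, part, i = 0, len = (int)strlen(key);
    while (i < len) {
        part = key[i] - '0';
        if (i + 1 < len)
            part = part * 10 + (key[i + 1] - '0');
        sum += part;
        i += 2;
    }
    return sum % 100;
}

int main(void) {
    printf("%d\n", hash_fold("5678"));   /* 56 + 78 = 134 -> 34 */
    printf("%d\n", hash_fold("321"));    /* 32 + 1 = 33 */
    printf("%d\n", hash_fold("34567"));  /* 34 + 56 + 7 = 97 */
    return 0;
}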
5.3. Collisions
▪ The hashing process generates a small number for a big key, so there is a possibility that two keys
could produce the same value.
▪ The situation where a newly inserted key maps to an already occupied slot in the hash table is
called a collision, and it must be handled using some collision handling technique.
Separate Chaining
o In separate chaining, each bucket (slot) of the hash table points to a linked list of all the keys that
hash to it.
o Step 1: Draw an empty hash table with 5 buckets (0 to 4) for the hash function "key mod 5".
o Step 2: Now insert all the keys in the hash table one by one. The first key to be inserted is 12,
which is mapped to bucket number 2, calculated by using the hash function 12%5=2.
o Step 3: Now the next key is 22. It will map to bucket number 2 because 22%5=2. But bucket 2
is already occupied by key 12, so the collision is handled by creating a linked list at bucket 2.
o Step 4: The next key is 15. It will map to bucket number 0 because 15%5=0, and bucket 0 is
empty, so 15 is placed there.
o Step 5: Now the next key is 25. Its bucket number will be 25%5=0. But bucket 0 is already
occupied by key 15. So the separate chaining method will again handle the collision by creating a
linked list at bucket 0.
o Hence, in this way, the separate chaining method is used as the collision resolution technique; a
small C sketch of this scheme follows.
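o A minimal C sketch of separate chaining (the names insert_chain and bucket are illustrative; the
keys are those of the example above):

#include <stdio.h>
#include <stdlib.h>
#define SIZE 5

struct node {
    int key;
    struct node *next;
};

struct node *bucket[SIZE];   /* each bucket heads a linked list (chain) */

/* hash with key % SIZE and insert the key at the front of its chain */
void insert_chain(int key) {
    struct node *p = malloc(sizeof(struct node));
    p->key = key;
    p->next = bucket[key % SIZE];
    bucket[key % SIZE] = p;
}

int main(void) {
    int i, keys[] = {12, 22, 15, 25};
    struct node *p;
    for (i = 0; i < 4; i++)
        insert_chain(keys[i]);
    for (i = 0; i < SIZE; i++) {             /* print every chain */
        printf("bucket %d:", i);
        for (p = bucket[i]; p != NULL; p = p->next)
            printf(" %d", p->key);
        printf("\n");
    }
    return 0;
}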
Open Addressing
o In open addressing, all elements are stored in the hash table itself. Each table entry contains
either a record or NIL. When searching for an element, we examine the table slots one by one
until the desired element is found or it is clear that the element is not in the table.
a. Linear Probing
o In linear probing, the hash table is searched sequentially, starting from the original hash location.
If the location that we get is already occupied, then we check the next location.
o Algorithm:
I. Calculate the hash key. i.e. key = data % size
II. Check, if hashTable[key] is empty
o store the value directly by hashTable[key] = data
III. If the hash index already has some value then
o check for next index using key = (key+1) % size
IV. Check if the next index hashTable[key] is available; if so, store the value there.
Otherwise, try the next index.
V. Repeat the above process till we find a free space.
Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys to be
inserted: 50, 70, 76, 85, 93.
• Step 1: First draw the empty hash table, which will have a possible range of hash values from 0 to
4 according to the hash function provided.
• Step 2: Now insert all the keys in the hash table one by one. The first key is 50. It will map to
slot number 0 because 50%5=0. So insert it into slot number 0.
(Figure: the hash table after Step 1 and after Step 2)
• Step 3: The next key is 70. It will map to slot number 0 because 70%5=0 but 50 is already at slot
number 0 so, search for the next empty slot and insert it.
• Step 4: The next key is 76. It will map to slot number 1 because 76%5=1 but 70 is already at slot
number 1 so, search for the next empty slot and insert it.
• Step 5: The next key is 85. It will map to slot number 0 because 85%5=0, but slots 0, 1, and 2 are
already occupied, so search for the next empty slot and insert 85 at slot number 3.
• Step 6: The next key is 93. It will map to slot number 3 because 93%5=3, but 85 is already at slot
number 3, so insert 93 into the next empty slot, slot number 4.
b) Quadratic Probing
• Quadratic probing is an open addressing scheme in computer programming for resolving hash
collisions in hash tables. Quadratic probing operates by taking the original hash index and adding
successive values of an arbitrary quadratic polynomial until an open slot is found.
• An example sequence using quadratic probing is:
H + 1², H + 2², H + 3², H + 4², ..., H + k²
• In this method, we look for the i²-th probe (slot) in the i-th iteration, where i = 0, 1, . . ., n – 1. We
always start from the original hash location. If the location is occupied, then we check the other
slots.
• Let hash(x) be the slot index computed using the hash function and n be the size of the hash
table.
If the slot hash(x) % n is full, then we try (hash(x) + 1²) % n.
If (hash(x) + 1²) % n is also full, then we try (hash(x) + 2²) % n.
If (hash(x) + 2²) % n is also full, then we try (hash(x) + 3²) % n.
This process will be repeated for all the values of i until an empty slot is found
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision resolution
strategy to be f(i) = i². Insert the keys 22, 30, and 50.
Step 1: Create a table of size 7.
Step 2: Insert 22 and 30. Hash(22) = 22 % 7 = 1; slot 1 is empty, so place 22 in slot 1.
Hash(30) = 30 % 7 = 2; slot 2 is empty, so place 30 in slot 2.
Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for slot 1+1², i.e. 1+1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1+2², i.e. 1+4 = 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.
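• A minimal C sketch of quadratic-probing insertion that reproduces this example (the names are
illustrative; the probe gives up after SIZE attempts):

#include <stdio.h>
#define SIZE 7
#define EMPTY -1

/* try slots (x % SIZE + i*i) % SIZE for i = 0, 1, 2, ... */
int insert_quadratic(int table[], int x) {
    int i, pos;
    for (i = 0; i < SIZE; i++) {
        pos = (x % SIZE + i * i) % SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = x;
            return pos;
        }
    }
    return -1;                       /* no empty slot found */
}

int main(void) {
    int table[SIZE], i, keys[] = {22, 30, 50};
    for (i = 0; i < SIZE; i++)
        table[i] = EMPTY;
    for (i = 0; i < 3; i++)
        printf("%d -> slot %d\n", keys[i], insert_quadratic(table, keys[i]));
    return 0;                        /* 22 -> 1, 30 -> 2, 50 -> 5 */
}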
c) Double Hashing
• Double hashing is a collision resolution technique in open addressed hash tables. Double
hashing makes use of two hash functions.
• The first hash function is h1(k), which takes the key and gives a location on the hash table. If
that location is empty, we can simply place our key there.
• But in case the location is occupied (a collision), we use a secondary hash function h2(k) in
combination with the first hash function h1(k) to find a new location on the hash table.
• This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
i is a non-negative integer that indicates a collision number,
k = element/key which is being hashed
n = hash table size.
• Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function
is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
• Step 1: Insert 27
27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.
Step 2: Insert 43
43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.
Step 3: Insert 692
692 % 7 = 6, but location 6 is already occupied by 27, which is a collision.
Resolve it using double hashing:
hnew = [h1(692) + 1 * h2(692)] % 7 = [6 + (1 + 692 % 5)] % 7 = [6 + 3] % 7 = 2.
Location 2 is empty, so insert 692 into slot 2.
Step 4: Insert 72
72 % 7 = 2, but location 2 is already occupied (by 692), and this is a collision.
So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Now, as 5 is an empty slot, we can insert 72 into slot 5.
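• A minimal C sketch of double-hashing insertion with the two hash functions of this example (the
names are illustrative):

#include <stdio.h>
#define SIZE 7
#define EMPTY -1

int h1(int k) { return k % 7; }
int h2(int k) { return 1 + (k % 5); }

/* probe slots (h1(k) + i*h2(k)) % SIZE for i = 0, 1, 2, ... */
int insert_double(int table[], int k) {
    int i, pos;
    for (i = 0; i < SIZE; i++) {
        pos = (h1(k) + i * h2(k)) % SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = k;
            return pos;
        }
    }
    return -1;
}

int main(void) {
    int table[SIZE], i, keys[] = {27, 43, 692, 72};
    for (i = 0; i < SIZE; i++)
        table[i] = EMPTY;
    for (i = 0; i < 4; i++)
        printf("%d -> slot %d\n", keys[i], insert_double(table, keys[i]));
    return 0;                  /* 27 -> 6, 43 -> 1, 692 -> 2, 72 -> 5 */
}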
Load Factor
• The load factor helps us in determining the efficiency of the hash function, i.e., it tells whether the
hash function we are using distributes the keys uniformly or not in the hash table.
Load Factor = Total elements in hash table / Size of hash table
What is Rehashing?
• As the name suggests, rehashing means hashing again. Basically, when the load factor increases
beyond its predefined value (the default value of the load factor is 0.75), the complexity of the
hash table operations increases.
• So to overcome this, the size of the array is increased (doubled) and all the values are hashed again
and stored in the new double-sized array to maintain a low load factor and low complexity.
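• A minimal C sketch of rehashing (assuming a table of int keys where -1 marks an empty slot, the
division method as hash function, and linear probing; the function name rehash is illustrative):

#include <stdlib.h>

/* allocate a table of double the size and hash every key again */
int *rehash(int *old, int old_size, int *new_size) {
    int i, pos, *tab;
    *new_size = 2 * old_size;
    tab = malloc(*new_size * sizeof(int));
    for (i = 0; i < *new_size; i++)
        tab[i] = -1;                      /* all slots start empty */
    for (i = 0; i < old_size; i++) {
        if (old[i] == -1)
            continue;
        pos = old[i] % *new_size;         /* hash again with the new size */
        while (tab[pos] != -1)            /* resolve collisions by linear probing */
            pos = (pos + 1) % *new_size;
        tab[pos] = old[i];
    }
    free(old);
    return tab;
}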
o Search − A record can be obtained using a hash function by locating the address of the
bucket where the data is stored.
o Update − It supports updating a record once it is traced in the data bucket.
• Extendible hashing: it is a dynamic hashing method, where directories and buckets are used to
hash the data.
• It is a flexible method in which the hash function also experiences dynamic change.
• Directories: The directories store the addresses (pointers) of the buckets. An ID is assigned to each
directory, and it may change each time the directory is expanded.
• Bucket: The buckets are used to hash the actual data into the hash table.
• Create a directory with 2 initial slots and check the last bit of the binary representation of each
key. The values 16, 4, 6, arriving in sequence, all have last bit 0, so they are added to bucket 0.
• 22 also has last bit 0, so the directory is doubled to 4 slots with 2-bit binary IDs; now check the
last 2 bits of each key's binary representation and add the keys to the buckets in order (as shown).
Since buckets 01 and 11 do not overflow, each is retained as a single bucket. (After insertion of
16, 4, 6, 22, 24, 10, 31, 7, 9.)
• Now, if we insert 20, the 00 bucket already holds 3 keys (its capacity), so split it further and
continue the insertion based on the binary values.
• All the keys are now inserted into the hash table. The buckets whose IDs end in 1 (LSB = 1) are
not split, as their size has not exceeded the capacity.
Comparison of static and dynamic hashing:
o Result: In static hashing, the resulting data bucket is of fixed length. In dynamic hashing, the
resulting data bucket is of variable length.
o Bucket Overflow: In static hashing, the challenge of bucket overflow can arise often, depending
upon memory size. In dynamic hashing, bucket overflow occurs very late or does not occur at all.
2) Design and develop a program in C that uses the hash function H: K -> L as H(K) = K mod
m (remainder method) and implements the hashing technique to map a given key K to the
address space L. Resolve the collision (if any) using linear probing.
#include <stdio.h>
#include <stdlib.h>
#define MAX 10

int create(int);
void linear_prob(int[], int, int);
void display(int[]);

int main() {
    int a[MAX], num, key, i;
    int ans = 1;
    printf(" collision handling by linear probing : \n");
    for (i = 0; i < MAX; i++)        /* -1 marks an empty slot */
        a[i] = -1;
    do {
        printf("\n Enter the data ");
        scanf("%4d", &num);
        key = create(num);
        linear_prob(a, key, num);
        printf("\n Do you wish to continue ? (1/0) ");
        scanf("%d", &ans);
    } while (ans);
    display(a);
    return 0;
}

/* hash function H(K) = K mod m (remainder method) */
int create(int num) {
    return num % MAX;
}

/* place num at its hash address; on collision, probe the following
   slots linearly, wrapping around to the beginning of the table */
void linear_prob(int a[], int key, int num) {
    int flag = 0, i, count = 0;
    for (i = 0; i < MAX; i++)        /* count filled slots to detect a full table */
        if (a[i] != -1)
            count++;
    if (count == MAX) {
        printf("\n Hash table is full");
        display(a);
        exit(1);
    }
    if (a[key] == -1) {              /* home slot is empty */
        a[key] = num;
    } else {                         /* collision: probe forward, then wrap */
        for (i = key + 1; i < MAX; i++) {
            if (a[i] == -1) {
                a[i] = num;
                flag = 1;
                break;
            }
        }
        i = 0;
        while ((i < key) && (flag == 0)) {
            if (a[i] == -1) {
                a[i] = num;
                flag = 1;
                break;
            }
            i++;
        }
    }
}

void display(int a[]) {
    int i;
    printf("\n The hash table is:\n");
    for (i = 0; i < MAX; i++)
        printf(" %d\t%d\n", i, a[i]);
}
5.5. Priority Queues: Single and double ended Priority Queues, Leftist Trees
Example
• Consider the elements 7, 2, 45, 32, and 12 to be inserted into a priority queue.
• The element with the least value has the highest priority. Thus, you should maintain the lowest
element at the front node.
• The above illustration shows how the priority is maintained during insertion in a queue. But if you
carry out the N comparisons for each insertion, the time complexity will become O(N²).
• For instance, consider inserting a node with element 45. Here, element 45 is compared with each
element inside the queue, so this insertion costs O(N). The representation of the linked queue
below displays how element 45 is inserted in a priority queue.
• Insertion (Enqueue): Add a new element to the priority queue at the position determined by its
priority.
• Deletion (Dequeue): Remove and return the element with the highest priority from the priority
queue.
• Peek: Retrieve the element with the highest priority without removing it from the priority queue.
• Size: Get the number of elements currently stored in the priority queue.
• Clear: Remove all elements from the priority queue, making it empty.
• Update Priority: Change the priority of an existing element in the priority queue.
Insert
• The new element is placed at the first empty space, filling the tree from top to bottom and left to
right, and is then heapified.
Delete
• The maximum/minimum element is the root node which will be deleted based on max/min heap.
(refer above right diagram)
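• The insert and delete operations described above can be sketched in C using a 1-based array
min-heap (a minimal sketch; the array name heap and the function names are illustrative):

#include <stdio.h>
#define CAP 100

int heap[CAP + 1], n = 0;        /* 1-based array min-heap */

/* place the new element at the next empty space (top to bottom, left to
   right) and bubble it up while it is smaller than its parent */
void insert(int x) {
    int i = ++n, t;
    heap[i] = x;
    while (i > 1 && heap[i / 2] > heap[i]) {
        t = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = t;
        i /= 2;
    }
}

/* remove the root (minimum), move the last node to the root,
   and heapify down by swapping with the smaller child */
int delete_min(void) {
    int min = heap[1], i = 1, c, t;
    heap[1] = heap[n--];
    while (2 * i <= n) {
        c = 2 * i;
        if (c + 1 <= n && heap[c + 1] < heap[c])
            c++;                 /* pick the smaller child */
        if (heap[i] <= heap[c])
            break;
        t = heap[i]; heap[i] = heap[c]; heap[c] = t;
        i = c;
    }
    return min;
}

int main(void) {
    int i, keys[] = {7, 2, 45, 32, 12};
    for (i = 0; i < 5; i++)
        insert(keys[i]);
    for (i = 0; i < 5; i++)
        printf("%d ", delete_min());   /* prints 2 7 12 32 45 */
    return 0;
}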
• Example 2: 32, 15, 20, 30, 12, 25, 16
• Common Operations
o Return an element with minimum/Maximum priority
o Insert an element at arbitrary priority
o Delete an element with maximum / minimum priority
• Example: Consider elements to insert: (5, 10), (2, 20), (8, 30), (1,40), (7, 50)
Insert (5, 10)
o Start with an empty priority queue.
o The first element (5, 10) is inserted as the root node since the priority queue is initially
empty.
• Delete min
o The minimum element (1, 40) is deleted from the priority queue. The root node is replaced
with the last node of the heap, and then the last node is deleted.
o After deleting the minimum element, check the child nodes of (7, 50) and move the node with
the smaller priority to the root.
o After adjusting, check the heap property again for node (7, 50). Node (7, 50) has 2 child
nodes; apply the min-heap property and move the node with the minimum priority up.
Insertion
• Insertion is same as insertion in single ended priority queue
• Since the priority of (60,6) is greater than the priority of the root (10,5), it becomes the right
child of the root.
• After insertion, the priority queue contains six elements: (50,2), (20,3), (10,5), (40,4), (30,8), and
(60,6).
o In a max heap, the operation involves checking whether the value of the node is less than
that of its children, and if so, swapping the node with the larger of its children.
• Repeat step 2 for each of the non-leaf nodes, working your way up the heap. When you reach the
root of the heap, the entire heap should now be a max heap.
• Consider the tree (Fig. 1) and convert it to a max heap.
• Example 3: 3, 4, 8, 11, 13
• Merging two binary heaps has complexity O(n log n), which is higher than the O(log n) merge of
leftist trees.
shortest(x) = 0, if x is an external node
shortest(x) = 1 + min{shortest(left_child(x)), shortest(right_child(x))}, otherwise
The number outside each internal node x of above figure is the value of shortest(x)
• Example 2
Proof: (a) From the definition of shortest(x) it follows that there are no external nodes on the first
shortest(x) levels of the leftist tree. Hence, the leftist tree has at least
∑ (from i = 1 to shortest(x)) 2^(i−1) = 2^shortest(x) − 1 internal nodes.
• Leftist trees are represented with nodes that have the fields left-child, right-child, shortest, and
data.
typedef struct {
    int key;
    /* other fields */
} element;

struct leftist {
    struct leftist *left_child;
    element data;
    struct leftist *right_child;
    int shortest;
};
Definition:
• A min-leftist tree (max-leftist tree) is a leftist tree in which the key value in each node is smaller
(larger) than the key values in its children (if any). In other words, a min (max) leftist tree is a
leftist tree that is also a min (max) tree.
• Figure below depicts two min-leftist trees. The number inside a node x is the key of the element
in x and the number outside x is shortest (x). The operations insert, delete min(delete max), and
combine can be performed in logarithmic time using a min (max) leftist tree.
• Examples of Leftist trees computing s(x)
• Merge
• Example 2:
o Step 1: Consider 2 Leftist trees
o Step 2: To apply merge for the above 2 leftist trees, find the minimum root in both leftist trees.
The minimum root is 1, so pass the right subtree of min root 1 along with the first tree (apply a
recursive call). This process repeats until one of the leftist trees has no nodes, i.e., reaches NULL.
o Step 3: To apply merge for the above 2 leftist trees, find the minimum root in both leftist trees.
The minimum root is 3, so pass the right subtree of min root 3 along with the first tree.
o Step 4: To apply merge for the above 2 leftist trees, find the minimum root in both leftist trees.
The minimum root is 7, so pass the right subtree of min root 7 along with the first tree. Here, the
right subtree is NULL.
o Step 5: Since the right leftist tree is NULL (the base condition is reached), return the left tree as
the result to the previous step (Step 4). Attach the result obtained and apply the merge step.
o To apply the merge step, find the shortest(x) value for both trees:
For all x, shortest(left(x)) >= shortest(right(x))
o Since shortest(left(x)) >= shortest(right(x)), add it as the right child of the minimum root.
After adding, the result is shown in (c).
o Pass the result to Step 2 (return of the recursive call).
o Step 6: Consider the smallest root in Step 2 and merge the result obtained in Step 5 with it.
o Merge with the left subtree of Step 2 (the remaining part, i.e., the left part, since the right
subtree of Step 2 is already processed).
o To apply the merge step, find the shortest(x) value for both trees:
For all x, shortest(left(x)) >= shortest(right(x))
o Compare the shortest value of the root's left subtree (i.e., 4) with that of the root of the second
leftist tree. Since the criterion is satisfied, add the second tree (root 7) as the right child of root 3.
o Transfer the resultant leftist tree to Step 1, ignoring the transferred tree in Step 1.
o Step 7: Consider the smallest root in Step 1 and merge the result obtained in Step 6 with it.
o Merge with the left subtree of Step 1 (the remaining part, i.e., the left part, since the right
subtree of Step 1 is already processed).
o To apply the merge step, find the shortest(x) value for both trees:
For all x, shortest(left(x)) >= shortest(right(x))
o Compare the shortest value of the root's left subtree (i.e., of root 1) with that of the root of the
second leftist tree. Since the criterion is not satisfied, swap the left subtree of root 1 and the
second leftist tree to obtain the final tree.
else {
    RetHeap = Merge(root1, root2->right);   /* preserve the heap order property */
    FinalRoot = root2;
}
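• The complete merge can be sketched in C as follows (a minimal sketch following the steps above:
recurse down the right spine of the tree with the smaller root, then restore the leftist property by
swapping children where needed; the struct is simplified to hold an int key, and shortest(NULL) is
taken as 0):

#include <stdlib.h>

struct leftist {
    struct leftist *left_child;
    int key;                          /* data reduced to a single key */
    struct leftist *right_child;
    int shortest;
};

int s(struct leftist *t) { return t ? t->shortest : 0; }

struct leftist *merge(struct leftist *a, struct leftist *b) {
    struct leftist *t;
    if (a == NULL) return b;          /* base case: one tree is empty */
    if (b == NULL) return a;
    if (a->key > b->key) {            /* make a the tree with the min root */
        t = a; a = b; b = t;
    }
    a->right_child = merge(a->right_child, b);   /* recurse on the right spine */
    if (s(a->left_child) < s(a->right_child)) {  /* restore the leftist property */
        t = a->left_child;
        a->left_child = a->right_child;
        a->right_child = t;
    }
    a->shortest = s(a->right_child) + 1;
    return a;
}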
Insert Operation
• Consider the leftist tree
• Step 2: Find the minimum root and pass its right subtree along with the 2nd tree for further
merging.
• Step 3: The smallest root in both leftist trees is 6; consider the right subtree of 6 along with the
remaining leftist tree.
• Step 4: Since the right leftist tree (root2) is empty, return the left subtree as the result to the
previous step. The remaining tree and the result of Step 3 are shown.
• Find the S value to merge these two leftist trees. The S value of the left child of the smaller root
(root 6) is not greater than that of the root of the right leftist tree, so swap; the result is shown.
• Step 5: Pass the resultant tree to Step 1 to merge. Find the shortest value for both leftist trees. The
smaller root is 5; check the shortest value of the left child of root 5 against the result obtained from
Step 4, i.e., the shortest value of root 6. Since S(8) >= S(6), add root 6 as the right child of 5.
• Time Complexity:
o For insertion: creating the one-node tree is O(1) and merge is O(log n), so insert is O(log n); see
the sketch below.
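• Insertion can be sketched by reusing merge() and the struct from the sketch above (the name
insert_leftist is illustrative): build a one-node tree in O(1) and merge it in O(log n).

/* insert = merge with a single-node leftist tree */
struct leftist *insert_leftist(struct leftist *root, int key) {
    struct leftist *node = malloc(sizeof(struct leftist));
    node->left_child = node->right_child = NULL;
    node->key = key;
    node->shortest = 1;
    return merge(root, node);
}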
Initialize Heap
6 2 9 8 3 4 11 18 7 24 1 5
Delete Operation
• Delete the root element. We will get 2 leftist trees; apply merge for these leftist trees (see the
sketch below).
• Complexity will be:
o For deletion O(1) and merge O(log n) = O(log n)
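• Deletion can likewise be sketched by reusing merge() from the sketch above (the name
delete_min_leftist is illustrative): remove the root in O(1) and merge its two subtrees in O(log n).

/* delete min: remove the root and merge its two subtrees */
struct leftist *delete_min_leftist(struct leftist *root, int *min_out) {
    struct leftist *t;
    if (root == NULL)
        return NULL;
    *min_out = root->key;             /* the minimum element */
    t = merge(root->left_child, root->right_child);
    free(root);
    return t;
}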
• Consider an example
Optimal Binary Search Tree
Example
• 10, 20, 30 are the keys, and the following are the binary search trees that can be made out from
these keys.
• Among the above possible trees, tree (iii) has the lowest cost compared to all the others. Hence,
the tree is called the optimal binary search tree.
Example: refer to the Module 5 Optimal Binary Search Tree document uploaded separately.