DSA
Internal and external sorting are two broad categories of sorting techniques, distinguished primarily by where the data resides during sorting and by the size of the datasets they are designed to handle.
Key Differences

• Data Location: Internal sorting keeps the entire dataset in main memory (RAM); external sorting handles datasets that exceed main memory and are kept in secondary storage (disk).
• Memory Usage: Internal sorting uses only RAM; external sorting uses RAM (for buffers/chunks) together with disk storage.
• Suitable Dataset: Internal sorting suits small to medium datasets; external sorting suits large datasets that cannot fit into memory.
• I/O Operations: Internal sorting needs minimal I/O, mostly the initial read and final write; external sorting reads and writes data to/from disk repeatedly.
• Speed: Internal sorting is generally faster due to direct access to data in memory; external sorting is slower due to the overhead of disk access.
• Algorithm Examples: Internal — Quick Sort, Merge Sort, Heap Sort, Bubble Sort, Insertion Sort; External — External Merge Sort, Polyphase Merge Sort, Replacement Selection, External Radix Sort.
• Complexity: Internal sorting has a simpler implementation with less overhead; external sorting is more complex due to chunk management and merging.

A typical external sort works in two phases: sorting chunks of the file that fit in RAM and writing each sorted chunk (run) back to disk, then merging all sorted chunks into a final sorted file.

In conclusion:
Internal sorting is ideal for small datasets that fit in memory, offering speed and simplicity. External sorting is necessary for very large datasets, trading off speed for the ability to handle massive data volumes using disk storage.
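To make the two phases concrete, here is a minimal external merge sort sketch (illustrative assumptions: integers stored one per line in text files, file names like "run0.txt", and a run size chosen to fit in RAM; a production version would use binary I/O and tuned buffers):

cpp

#include <algorithm>
#include <fstream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

// Phase 1: read chunks that fit in RAM, sort each, write it out as a "run".
vector<string> createRuns(const string& input, size_t runSize) {
    ifstream in(input);
    vector<string> runs;
    vector<int> chunk;
    int x;
    auto flush = [&]() {
        sort(chunk.begin(), chunk.end());
        string name = "run" + to_string(runs.size()) + ".txt";
        ofstream out(name);
        for (int v : chunk) out << v << "\n";
        runs.push_back(name);
        chunk.clear();
    };
    while (in >> x) {
        chunk.push_back(x);
        if (chunk.size() == runSize) flush();
    }
    if (!chunk.empty()) flush();
    return runs;
}

// Phase 2: k-way merge of all runs using a min-heap of (value, run index).
void mergeRuns(const vector<string>& runs, const string& output) {
    vector<ifstream> files(runs.size());
    priority_queue<pair<int, size_t>, vector<pair<int, size_t>>, greater<>> pq;
    for (size_t i = 0; i < runs.size(); ++i) {
        files[i].open(runs[i]);
        int v;
        if (files[i] >> v) pq.push({v, i});
    }
    ofstream out(output);
    while (!pq.empty()) {
        auto [v, i] = pq.top(); pq.pop();
        out << v << "\n";
        int next;
        if (files[i] >> next) pq.push({next, i});  // refill from the same run
    }
}

Calling createRuns("input.txt", 1000000) and then mergeRuns(runs, "sorted.txt") sorts a file far larger than RAM while holding only one chunk (plus one value per run) in memory at a time.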
Applications of Sorting and Searching in Computer Science (with C++ Examples)

Sorting and searching are foundational operations in computer science, enabling efficient data management, retrieval, and analysis across a wide range of applications.

Applications of Sorting

• Data Organization: Sorting arranges data for easier access and management, such as alphabetizing names in a contact list or organizing files on a computer.
• Efficient Searching: Many search algorithms (like binary search) require sorted data to function efficiently, reducing search time from linear to logarithmic.
• Data Analysis: Sorting helps in identifying trends, patterns, and outliers, which is crucial in fields like statistics, finance, and scientific research.
• Database Management: Databases use sorting to optimize query performance, create indexes, and enable rapid data retrieval.
• User Experience: Sorting improves usability in applications such as e-commerce (product listings), social media feeds, music playlists, and email management.
• Canonicalization and Output: Sorted data is easier to read and compare, useful in reporting and data export.

Application Area | Sorting Role | Searching Role
Databases | Index creation, query optimization | Record lookup, key-based retrieval
Data Analysis | Trend/pattern/outlier identification | Finding specific values or records

Binary Search Example (std::binary_search)

cpp

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> data = {1, 2, 5, 5, 6, 9};
    int target = 5;

    // Data must be sorted for binary_search
    bool found = binary_search(data.begin(), data.end(), target);

    if (found) cout << "Element found!" << endl;
    else cout << "Element not found." << endl;

    return 0;
}
Merge Sort and Heap Sort are generally considered the best in terms of worst-case time complexity, both achieving O(n log n) performance regardless of the input data. However, each has its own trade-offs:

Merge Sort:

o Time Complexity: Always O(n log n), regardless of the input.
o Space Complexity: Requires O(n) extra space for merging.
o Stability: Stable (preserves the order of equal elements).
o Use Case: Preferred when stability is required, such as in sorting records by multiple fields.

Heap Sort:

o Time Complexity: Always O(n log n).
o Space Complexity: O(1) auxiliary space (in-place).
o Use Case: Suitable for memory-constrained environments where stability is not necessary.

Quick Sort is often the fastest in practice due to low overhead and cache efficiency, but its worst-case complexity is O(n^2), which can be problematic for certain input patterns. With good pivot selection (like randomized or median-of-three), the average case is O(n log n), making it a popular choice for general-purpose sorting in C++ (e.g., std::sort typically uses introsort, a Quick Sort variant that falls back to Heap Sort).

Why Merge Sort is Often Considered Best (Theoretical Perspective)

• Merge Sort is the best sorting algorithm based on complexity for large datasets because it guarantees O(n log n) time in all scenarios and is stable, making it suitable for many practical applications.
• Heap Sort is also optimal in terms of time and space but is not stable.
• For small datasets or when average performance is prioritized, Quick Sort is often used, but its worst-case performance can be a drawback.

In summary:
Choose Merge Sort when you need guaranteed performance and stability, Heap Sort for in-place sorting without stability, and Quick Sort for practical speed on average, with caution for worst-case scenarios.
Quick Sort: Explanation, Usage, Complexity, and C++ Example

What is Quick Sort?

Quick Sort is a divide-and-conquer sorting algorithm: it picks a pivot element, partitions the array so that smaller elements come before the pivot and larger elements after it, and then sorts the two partitions recursively.

• Efficiency:
Quick Sort is generally faster in practice than other O(n log n) algorithms like Merge Sort and Heap Sort due to its in-place sorting and cache efficiency.
• Best/Average Case: O(n log n), achieved when the pivot divides the array into nearly equal halves.
• Worst Case: O(n^2), which occurs when the pivot is always the smallest or largest element, leading to unbalanced partitions (e.g., already sorted data with a naive pivot choice).
• Space Complexity: O(log n); only the recursion stack is used, no additional arrays are required.

Summary Table

Case | Value | Notes
Worst | O(n^2) | Poor pivot choices (e.g., sorted data)
Space | O(log n) | Recursion stack only
Practical Use | Very fast | Widely used

Conclusion
Quick Sort is a powerful, efficient, and widely-used sorting algorithm in C++. Its divide-and-conquer approach, in-place sorting, and fast average-case performance make it a top choice for many real-world applications, despite its worst-case scenario. Using randomized or median-of-three pivot selection can help avoid the worst case and ensure robust performance.
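C++ Implementation Example — a minimal sketch using the Lomuto partition scheme (the last element is taken as pivot here; randomized pivot selection can be swapped in to avoid the worst case):

cpp

#include <iostream>
#include <vector>
using namespace std;

// Lomuto partition: places the pivot (last element) in its final position
// and returns that index.
int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            ++i;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int p = partition(arr, low, high);
        quickSort(arr, low, p - 1);   // sort elements before the pivot
        quickSort(arr, p + 1, high);  // sort elements after the pivot
    }
}

int main() {
    vector<int> arr = {10, 7, 8, 9, 1, 5};
    quickSort(arr, 0, (int)arr.size() - 1);
    for (int x : arr) cout << x << " ";  // 1 5 7 8 9 10
    cout << endl;
    return 0;
}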
Radix Sort is a non-comparative sorting algorithm that sorts numbers by processing individual digits. It works from the least significant digit (LSD) to the most significant digit (MSD), using a stable sub-sorting algorithm (commonly Counting Sort) at each digit position. It is especially efficient for sorting integers and can outperform comparison-based algorithms when the number of digits is small relative to the number of elements.

text

RadixSort(array, n)
    maxNum = find maximum number in array
    for place = 1; maxNum/place > 0; place *= 10
        use Counting Sort to sort the array by the digit at 'place'

C++ Implementation

cpp

#include <iostream>
#include <vector>
using namespace std;

// Function to get the largest element from an array
int getMax(int arr[], int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

// Using counting sort to sort elements based on significant places
void countSort(int arr[], int n, int place) {
    const int max = 10;
    vector<int> output(n);
    int count[max] = {0};

    // Count occurrences of each digit at 'place'
    for (int i = 0; i < n; i++)
        count[(arr[i] / place) % 10]++;

    // Turn counts into ending positions (prefix sums)
    for (int i = 1; i < max; i++)
        count[i] += count[i - 1];

    // Build the output array (traverse backwards to keep the sort stable)
    for (int i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / place) % 10] - 1] = arr[i];
        count[(arr[i] / place) % 10]--;
    }

    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

// Main function to implement radix sort
void radixSort(int arr[], int n) {
    int max = getMax(arr, n);
    for (int place = 1; max / place > 0; place *= 10)
        countSort(arr, n, place);
}

int main() {
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);
    radixSort(arr, n);
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
    return 0;
}

Example Run
o Given array: {170, 45, 75, 90, 802, 24, 2, 66}
o Sorted array: {2, 24, 45, 66, 75, 90, 170, 802}

• Time Complexity:
o Best, Average, and Worst: O(d·(n+k)), where d is the number of digits in the maximum number, n is the number of elements, and k is the base (10 for decimal numbers).
• Space Complexity: O(n + k) for the output and count arrays.

Summary
• Radix Sort is efficient for sorting integers, especially when the number of digits is not significantly larger than the number of elements.
• It processes digits from least significant to most significant, using a stable sort at each step.
• It is non-comparative, stable, and has linear time complexity relative to the number of digits and elements.
• It is particularly useful for sorting large lists of numbers where comparison-based sorts (like Quick Sort or Merge Sort) are bound by their O(n log n) comparison complexity.
Merge Sort Algorithm: Explanation, Example, and C++ Code

Merge Sort is a divide-and-conquer algorithm:

1. Divide:
Split the array into two halves around the midpoint.
2. Conquer:
Continue dividing until each subarray contains a single element.
3. Recursive Calls:
Recursively apply merge sort to the left and right halves.
4. Merge:
Merge the two sorted halves into a single sorted array.

Example Dry Run

Consider the array: [38, 27, 43, 3, 9, 82, 10]

• Step 1: Divide → [38, 27, 43, 3] and [9, 82, 10]
• Step 2: Further divide → [38, 27], [43, 3], [9, 82], [10], and then single elements
• Step 3: Merge pairs in sorted order → [27, 38], [3, 43], [9, 82], [10]
• Step 4: Continue merging → [3, 27, 38, 43] and [9, 10, 82] → [3, 9, 10, 27, 38, 43, 82]

C++ Code

cpp

#include <iostream>
#include <vector>
using namespace std;

// Merge two sorted halves arr[left..mid] and arr[mid+1..right]
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    vector<int> L(arr + left, arr + mid + 1);
    vector<int> R(arr + mid + 1, arr + right + 1);

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j])
            arr[k++] = L[i++];
        else
            arr[k++] = R[j++];
    }
    while (i < n1)
        arr[k++] = L[i++];
    while (j < n2)
        arr[k++] = R[j++];
}

// Merge Sort function
void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;

        // Sort first and second halves
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);

        // Merge the sorted halves
        merge(arr, left, mid, right);
    }
}

// Utility function to print the array
void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = sizeof(arr) / sizeof(arr[0]);
    mergeSort(arr, 0, n - 1);
    printArray(arr, n);  // 3 9 10 27 38 43 82
    return 0;
}

Summary
• Merge Sort divides the array into halves, sorts each half recursively, and merges them.
• It is efficient, stable, and guarantees O(n log n) time complexity.
• The merge step is the key operation, combining two sorted arrays into one sorted array.

In conclusion:
Merge Sort is a robust, efficient, and widely-used sorting algorithm in C++, ideal for large datasets and applications requiring stable sorting.
Bubble Sort, Insertion Sort, and Selection Sort: Explanation, Example, Complexity, and C++ Code

1. Bubble Sort

Explanation:
Bubble Sort compares adjacent elements in the array and swaps them if they are in the wrong order. This process is repeated for all elements until the array is sorted. After each pass, the largest unsorted element "bubbles up" to its correct position at the end of the array.

Example:
Given array: 5, 3, 8, 4, 2
• Pass 1: (5,3)→swap → 3,5,8,4,2; (5,8)→ok; (8,4)→swap → 3,5,4,8,2; (8,2)→swap → 3,5,4,2,8
• Pass 2: (3,5)→ok; (5,4)→swap → 3,4,5,2,8; (5,2)→swap → 3,4,2,5,8
• Further passes continue until the array is 2,3,4,5,8.

Complexity:
• Best: O(n) (already sorted, with an early-exit check)
• Average/Worst: O(n²)
• Space: O(1) (in-place)

2. Insertion Sort

Explanation:
Insertion Sort builds the sorted array one element at a time. It picks the next element and inserts it into its correct position among the previously sorted elements.

Complexity:
• Best: O(n) (nearly sorted data)
• Average/Worst: O(n²)
• Space: O(1) (in-place)

3. Selection Sort

Explanation:
Selection Sort repeatedly finds the minimum element in the unsorted part of the array and swaps it into place at the front.

Example:
Given array: 5, 3, 8, 4, 2
• Step 1: Find min (2), swap with 5 → 2,3,8,4,5
• Step 2: Min is 3 (already in place) → 2,3,8,4,5
• Step 3: Min is 4, swap with 8 → 2,3,4,8,5
• Step 4: Min is 5, swap with 8 → 2,3,4,5,8

Complexity:
• Best/Average/Worst: O(n²)
• Space: O(1) (in-place)

Summary Table

Algorithm | Best Case | Average Case | Worst Case | Space | Stable | Method
Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Adjacent swap
Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Insert element
Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No | Select min

Conclusion:
• Bubble Sort is simple but inefficient for large datasets.
• Insertion Sort is efficient for small or nearly sorted arrays.
• Selection Sort is conceptually simple but generally outperformed by insertion sort.
All three are mainly used for educational purposes and small datasets; reference implementations appear at the end of this section.

Linear Search

Explanation:
Linear search scans the array from the first element to the last, comparing each element with the target until it is found or the array is exhausted.

cpp

int linearSearch(int arr[], int n, int target) {
    for (int i = 0; i < n; i++)
        if (arr[i] == target)
            return i;   // found: return the index
    return -1;          // not found
}

Example:
Array: {4, 6, 1, 2, 5, 3}
Target: 4
Output: Element found at index: 0 (first position)

Complexity:
• Best case: O(1) (first element)
• Worst/Average case: O(n)

Advantages:
• Simple and easy to implement.
• Works on both sorted and unsorted arrays.
• No extra memory required.

Limitations:
• Slow for large datasets, since in the worst case every element must be examined.
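Reference implementations of the three sorting algorithms discussed above (minimal sketches; the early-exit flag in bubble sort is an optional optimization):

cpp

#include <iostream>
using namespace std;

void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;                 // early exit when already sorted
        for (int j = 0; j < n - 1 - i; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;
    }
}

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {      // shift larger elements right
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;                     // insert into the gap
    }
}

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;                       // find the minimum of arr[i..n-1]
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[minIdx]) minIdx = j;
        swap(arr[i], arr[minIdx]);
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);                    // or bubbleSort / selectionSort
    for (int i = 0; i < n; i++) cout << arr[i] << " ";  // 2 3 4 5 8
    cout << endl;
    return 0;
}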
Binary Search

• Binary Search repeatedly divides the search interval in half, requiring sorted data and offering much better performance for large datasets.

Aspect | Linear Search | Binary Search
Use Case | Small/unsorted data | Large/sorted data

Advantages:
• Time complexity grows logarithmically: O(log n) comparisons even for very large arrays.

C++ Example:

cpp

#include <iostream>
using namespace std;

int binarySearch(int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == target)
            return mid;
        else if (arr[mid] < target)
            low = mid + 1;    // search the right half
        else
            high = mid - 1;   // search the left half
    }
    return -1;
}

int main() {
    int arr[] = {1, 3, 5, 7, 9, 11};  // must be sorted
    int n = sizeof(arr) / sizeof(arr[0]);
    int target = 7;
    int result = binarySearch(arr, n, target);
    if (result != -1)
        cout << "Element found at index: " << result << endl;
    else
        cout << "Element not found." << endl;
    return 0;
}
Binary Search Tree (BST)

A Binary Search Tree (BST) is a binary tree data structure in which each node has at most two children, and for every node:
• All values in the left subtree are less than the node's value.
• All values in the right subtree are greater than the node's value.

Insertion
1. Start at the root.
2. Compare the new value with the current node's value.
3. Move to the left child if the value is smaller, or to the right child if it is larger.
4. Repeat steps 2-3 until you reach a null pointer (empty spot).
5. Insert the new value as a leaf node at this position.

Deletion
• A leaf node is removed directly; a node with one child is replaced by that child.
• If the node has two children, find its in-order successor (the smallest value in the right subtree), copy its value into the node, then delete the successor node (which will have at most one child).

Complete Example Program

cpp

#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left, *right;
    Node(int val) : data(val), left(NULL), right(NULL) {}
};

Node* insert(Node* root, int key) {
    if (root == NULL) return new Node(key);
    if (key < root->data) root->left = insert(root->left, key);
    else if (key > root->data) root->right = insert(root->right, key);
    return root;
}

Node* deleteNode(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data) root->left = deleteNode(root->left, key);
    else if (key > root->data) root->right = deleteNode(root->right, key);
    else {
        if (root->left == NULL) {          // no child, or right child only
            Node* temp = root->right;
            delete root;
            return temp;
        } else if (root->right == NULL) {  // left child only
            Node* temp = root->left;
            delete root;
            return temp;
        }
        Node* temp = root->right;          // two children: in-order successor
        while (temp->left != NULL)
            temp = temp->left;
        root->data = temp->data;
        root->right = deleteNode(root->right, temp->data);
    }
    return root;
}

void inorder(Node* root) {
    if (root == NULL) return;
    inorder(root->left);
    cout << root->data << " ";
    inorder(root->right);
}

int main() {
    Node* root = NULL;
    int keys[] = {50, 30, 70, 20, 40, 60, 80};
    for (int k : keys) root = insert(root, k);

    cout << "Inorder traversal: ";
    inorder(root);
    cout << endl;

    root = deleteNode(root, 50);
    cout << "After deleting 50: ";
    inorder(root);
    cout << endl;
    return 0;
}

Summary Table

Operation | Steps | Time Complexity (avg/worst)
Insertion | Traverse, compare, insert | O(log n) / O(n)
Deletion | Locate, replace with successor if needed | O(log n) / O(n)

• BSTs are widely used due to their efficient average-case performance and clear structure for ordered data.
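Search follows the same comparison logic as insertion; a minimal sketch (shown for completeness, matching the Node struct above):

cpp

// Returns the node containing key, or NULL if the key is absent.
Node* search(Node* root, int key) {
    if (root == NULL || root->data == key) return root;
    if (key < root->data) return search(root->left, key);
    return search(root->right, key);
}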
Threads and Threaded Binary Trees

A thread in computer science generally refers to a lightweight process or a sequence of executable instructions within a program that can run independently and concurrently with other threads. However, in the context of trees (specifically, binary trees), a thread has a different meaning: it refers to a special pointer used to make tree traversal more efficient, particularly for in-order traversal, by replacing some NULL pointers with pointers to in-order predecessor or successor nodes.

Advantages and Disadvantages of Threads

General Multithreading (Software Threads)

Advantages:
• Improved Performance and Concurrency: Threads allow multiple operations to run in parallel, making better use of CPU resources and improving program responsiveness.
• Resource Sharing: Threads within the same process share memory and resources, enabling efficient communication.
• Better Responsiveness: Useful for interactive applications, as one thread can handle user input while others perform background tasks.
• Simplified Modeling: Natural fit for tasks that can be performed concurrently, such as handling multiple clients in a server.

Disadvantages:
• Synchronization complexity: shared data must be protected, and bugs such as race conditions and deadlocks are hard to reproduce and debug.

Threads in Trees (Threaded Binary Trees)

• Left Thread: If a node's left child is NULL, its left pointer is used to point to its in-order predecessor.
• Right Thread: If a node's right child is NULL, its right pointer is used to point to its in-order successor.
• Single Threaded: Only one of the above (usually the right thread) is implemented.
• Double Threaded: Both left and right NULL pointers are replaced with threads (to predecessor and successor, respectively).

A boolean flag in each node indicates whether a pointer is a traditional child link or a thread.

Disadvantages of threaded trees:
• Complex Implementation: Insertion and deletion operations become more complex due to the need to maintain thread pointers.
• Limited Use Cases: Mainly beneficial for traversal; not as widely used as standard binary trees.
• Overhead: Additional logic is required to distinguish between child pointers and thread pointers.

Example

Consider a BST:

text

    20
   /  \
  10   30
         \
          40

In-order Traversal: 10, 20, 30, 40

• In a threaded BST:
o The right pointer of 10 (which would be NULL) points to 20 (its in-order successor).

This setup allows you to traverse the tree in-order by simply following child pointers and threads, without recursion or a stack.

Summary Table

Feature | Standard BST | Threaded BST
Null pointers | Many | Replaced by threads
In-order traversal | Needs recursion/stack | No recursion/stack needed
Memory use | Less efficient (wasted NULL pointers) | More efficient (no wasted pointers)
Traversal speed | Slower (extra space) | Faster (constant space)
C++ Example: Node Structure and In-Order Traversal

cpp

#include <iostream>
using namespace std;

class Node {
public:
    int key;
    Node *left, *right;
    bool leftThread, rightThread;
    Node(int val) : key(val), left(nullptr), right(nullptr), leftThread(true),
                    rightThread(true) {}
};

// In-Order Traversal Without Recursion or Stack
void inorder(Node* root) {
    Node* cur = root;
    // Descend to the leftmost node
    while (cur != nullptr && !cur->leftThread)
        cur = cur->left;
    while (cur != nullptr) {
        cout << cur->key << " ";
        if (cur->rightThread) {
            cur = cur->right;              // follow the thread to the successor
        } else {
            cur = cur->right;              // real child: go to its leftmost node
            while (cur != nullptr && !cur->leftThread)
                cur = cur->left;
        }
    }
}
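Insertion (Right Threaded Example) — the listing below is a minimal double-threaded insertion sketch consistent with the Node class above (duplicates are ignored; illustrative, not the only way to maintain threads):

cpp

// Insert into a double-threaded BST (both left and right threads maintained).
Node* insert(Node* root, int key) {
    Node* node = new Node(key);
    if (!root) return node;   // first node: both pointers remain (null) threads

    Node* cur = root;
    while (true) {
        if (key < cur->key) {
            if (!cur->leftThread) { cur = cur->left; }
            else {
                node->left = cur->left;    // inherit the predecessor thread
                node->right = cur;         // in-order successor is the parent
                cur->left = node;
                cur->leftThread = false;   // left pointer is now a real child
                break;
            }
        } else if (key > cur->key) {
            if (!cur->rightThread) { cur = cur->right; }
            else {
                node->right = cur->right;  // inherit the successor thread
                node->left = cur;          // in-order predecessor is the parent
                cur->right = node;
                cur->rightThread = false;  // right pointer is now a real child
                break;
            }
        } else {
            delete node;                   // duplicate key: ignore
            break;
        }
    }
    return root;
}

For the example tree, inserting 20, 10, 30, 40 in that order and then calling inorder(root) prints 10 20 30 40.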
What is an AVL Tree?

An AVL tree is a self-balancing binary search tree (BST), named after its inventors Adelson-Velsky and Landis. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. If at any time they differ by more than one, rebalancing is done to restore this property. This ensures that the tree remains approximately balanced, guaranteeing O(log n) time complexity for search, insertion, and deletion operations.

Balance Factor:
For any node,

Balance Factor = Height of Left Subtree − Height of Right Subtree

The balance factor must be -1, 0, or +1 for all nodes in an AVL tree.

How to Insert a Node into an AVL Tree

Insertion Steps

1. Standard BST Insertion:
Insert the new node as you would in a standard BST.
2. Update Heights:
Update the height of each ancestor node.
3. Check Balance and Rotate:
Compute the balance factor of each ancestor and rotate where needed:
o If balance > 1 and key < left child key: Right rotation (LL)
o If balance < -1 and key > right child key: Left rotation (RR)
o If balance > 1 and key > left child key: Left-Right rotation (LR)
o If balance < -1 and key < right child key: Right-Left rotation (RL)

C++ Code for AVL Tree Insertion

cpp

#include <iostream>
#include <algorithm>
using namespace std;

struct Node {
    int key, height;
    Node *left, *right;
    Node(int k) : key(k), height(1), left(nullptr), right(nullptr) {}
};

int height(Node* n) {
    return n ? n->height : 0;
}

int getBalance(Node* n) {
    return n ? height(n->left) - height(n->right) : 0;
}

Node* rightRotate(Node* y) {
    Node* x = y->left;
    Node* T2 = x->right;
    x->right = y;
    y->left = T2;
    y->height = max(height(y->left), height(y->right)) + 1;
    x->height = max(height(x->left), height(x->right)) + 1;
    return x;
}

Node* leftRotate(Node* x) {
    Node* y = x->right;
    Node* T2 = y->left;
    y->left = x;
    x->right = T2;
    x->height = max(height(x->left), height(x->right)) + 1;
    y->height = max(height(y->left), height(y->right)) + 1;
    return y;
}

Node* insert(Node* node, int key) {
    // 1. Perform normal BST insertion
    if (!node) return new Node(key);
    if (key < node->key)
        node->left = insert(node->left, key);
    else if (key > node->key)
        node->right = insert(node->right, key);
    else // Duplicate keys not allowed
        return node;

    // 2. Update the height of this ancestor node
    node->height = max(height(node->left), height(node->right)) + 1;

    // 3. Get the balance factor
    int balance = getBalance(node);

    // 4. Balance the node if needed
    // Left Left Case
    if (balance > 1 && key < node->left->key)
        return rightRotate(node);
    // Right Right Case
    if (balance < -1 && key > node->right->key)
        return leftRotate(node);
    // Left Right Case
    if (balance > 1 && key > node->left->key) {
        node->left = leftRotate(node->left);
        return rightRotate(node);
    }
    // Right Left Case
    if (balance < -1 && key < node->right->key) {
        node->right = rightRotate(node->right);
        return leftRotate(node);
    }
    return node;
}
void inorder(Node* root) {
    if (!root) return;
    inorder(root->left);
    cout << root->key << " ";
    inorder(root->right);
}

int main() {
    Node* root = nullptr;
    root = insert(root, 10);
    root = insert(root, 20);
    root = insert(root, 30);   // triggers an RR rotation at the root
    cout << "Inorder traversal of the AVL tree: ";
    inorder(root);
    cout << endl;
    return 0;
}

Summary

• An AVL tree is a self-balancing BST where the height difference (balance factor) between left and right subtrees is at most 1 for every node.
• This guarantees efficient operations with O(log n) time complexity.
Search and Deletion Operations in AVL Tree (C++)

Overview:
An AVL tree is a self-balancing binary search tree where the difference in heights (balance factor) between left and right subtrees is at most one for every node.

Search

Algorithm:
• Compare the target key with the current node's key.
• Move left if the target is smaller, right if it is larger.
• Repeat until the node is found or a null pointer is reached (not found).

cpp

Node* search(Node* root, int key) {
    if (!root || root->key == key)
        return root;
    if (key < root->key)
        return search(root->left, key);
    return search(root->right, key);
}

Usage:
Call search(root, key). Returns a pointer to the node if found, or nullptr if not found.

Deletion

Algorithm:
1. Standard BST Deletion:
o A leaf or single-child node is removed directly (the child, if any, replaces it).
o If the node has two children, find its in-order successor (smallest in right subtree), copy its value, and delete the successor.
2. Update Heights:
After deletion, update the height of each ancestor node.
3. Balance the Tree:
Check the balance factor for each ancestor. If unbalanced (balance factor > 1 or < -1), perform appropriate rotations:
▪ Left Left (LL): Right rotation.
▪ Right Right (RR): Left rotation.
▪ Left Right (LR): Left rotation on left child, then right rotation.
▪ Right Left (RL): Right rotation on right child, then left rotation.

cpp

Node* deleteNode(Node* root, int key) {
    if (!root) return root;
    if (key < root->key)
        root->left = deleteNode(root->left, key);
    else if (key > root->key)
        root->right = deleteNode(root->right, key);
    else {
        // Node with one child or no child
        if (!root->left || !root->right) {
            Node* temp = root->left ? root->left : root->right;
            delete root;
            return temp;
        }
        // Two children: copy the in-order successor, then delete it
        Node* current = root->right;
        while (current->left)
            current = current->left;
        root->key = current->key;
        root->right = deleteNode(root->right, current->key);
    }

    // Update height
    root->height = max(height(root->left), height(root->right)) + 1;

    int balance = getBalance(root);

    // Left Left
    if (balance > 1 && getBalance(root->left) >= 0)
        return rightRotate(root);
    // Left Right
    if (balance > 1 && getBalance(root->left) < 0) {
        root->left = leftRotate(root->left);
        return rightRotate(root);
    }
    // Right Right
    if (balance < -1 && getBalance(root->right) <= 0)
        return leftRotate(root);
    // Right Left
    if (balance < -1 && getBalance(root->right) > 0) {
        root->right = rightRotate(root->right);
        return leftRotate(root);
    }
    return root;
}

Usage:
Call deleteNode(root, key) to delete a node and maintain AVL balance.

Summary Table

Operation | Steps | Rotations | Time Complexity
Search | Compare & traverse left/right | None | O(log n)
Deletion | BST delete, update heights, balance | LL, RR, LR, RL | O(log n)

Conclusion

• Search in an AVL tree is identical to a standard BST, always O(log n) due to balancing.
• Deletion involves standard BST deletion followed by updating heights and rebalancing using rotations to maintain the AVL property.
• Both operations are efficient and guarantee logarithmic time due to the strict balancing of AVL trees.
6. Short Notes on B-Tree

A B-Tree is a self-balancing search tree in which each node can contain multiple keys and can have more than two children. It is widely used in databases and file systems to efficiently manage large blocks of data that cannot fit entirely in memory.

Key Properties of B-Tree:

• Order: If the order of the B-tree is n, each node can have at most n children and n-1 keys.
• Balanced: All leaves are at the same depth (height).
• Node Capacity: Each node (except the root) must have at least ⌈n/2⌉ children.
• Root: The root must have at least 2 children if it is not a leaf.
• Key Order: Keys in each node are stored in increasing order.
• Suitable for systems that read and write large blocks of data.

C++ Node Structure

cpp

#include <iostream>
using namespace std;

class BTreeNode {
public:
    int *keys;        // array of keys
    int t;            // Minimum degree
    BTreeNode **C;    // array of child pointers
    int n;            // current number of keys
    bool leaf;        // true if this node is a leaf

    BTreeNode(int t1, bool leaf1);
    void traverse();
    BTreeNode *search(int k);
};

BTreeNode::BTreeNode(int t1, bool leaf1) {
    t = t1;
    leaf = leaf1;
    keys = new int[2 * t - 1];
    C = new BTreeNode*[2 * t];
    n = 0;
}

// Insert, search, and traversal methods would be implemented here

Conclusion:
B-Trees provide an efficient, balanced structure for organizing and accessing large datasets, especially when disk I/O is a concern. Their ability to maintain balance and allow multiple keys per node makes them ideal for database and filesystem implementations.
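As an illustration of the placeholder above, the search method could look like this (a minimal sketch following the standard minimum-degree formulation; not a full B-Tree implementation):

cpp

// Search key k in the subtree rooted at this node.
BTreeNode* BTreeNode::search(int k) {
    int i = 0;
    while (i < n && k > keys[i])   // find the first key >= k
        i++;
    if (i < n && keys[i] == k)     // found in this node
        return this;
    if (leaf)                      // reached a leaf without finding k
        return nullptr;
    return C[i]->search(k);        // descend into the appropriate child
}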
7. General Tree and Conversion to Binary Tree

General Tree

A General Tree is a hierarchical data structure where each node can have any number of children, making it highly flexible for representing complex relationships (e.g., file systems, organizational charts).

Characteristics:
• No restriction on the number of children per node.
• Nodes can have zero or more children.
• Used for representing hierarchical data with variable branching.

C++ Example Structure:

cpp

class GenTreeNode {
public:
    int data;
    vector<GenTreeNode*> children;   // any number of children
    GenTreeNode(int val) : data(val) {}
};

Conversion to Binary Tree

Purpose:
To represent a general tree using a binary tree structure, which simplifies storage and traversal using standard binary tree algorithms.
Conversion Method (Left-Child, Right-Sibling):

1. Left-Child:
For each node, keep its first (leftmost) child as the left child in the binary tree.
2. Right-Sibling:
For each node, link its immediate right sibling as the right child in the binary tree.
3. Remove Other Children:
All other children (other than the first) are linked as a chain through the right child pointers.

Result:
Each node in the binary tree has at most two children:
• The left child points to its first child in the general tree.
• The right child points to its next sibling.

Example Illustration:

text

A
├── B
├── C
└── D

After conversion:

text

    A
   /
  B
   \
    C
     \
      D

C++ Example:

cpp

// Binary Tree Node (left = first child, right = next sibling)
class BinTreeNode {
public:
    int data;
    BinTreeNode *left, *right;
    BinTreeNode(int val) : data(val), left(nullptr), right(nullptr) {}
};

// Conversion function
BinTreeNode* convertToBinary(GenTreeNode* root) {
    if (!root) return nullptr;
    BinTreeNode* bRoot = new BinTreeNode(root->data);
    // Convert the first child, then chain the remaining
    // children through the right (sibling) pointers.
    BinTreeNode* prev = nullptr;
    for (GenTreeNode* child : root->children) {
        BinTreeNode* b = convertToBinary(child);
        if (!prev) bRoot->left = b;
        else prev->right = b;
        prev = b;
    }
    return bRoot;
}

Summary: General Tree vs. Binary Tree

• B-Trees are balanced, multi-way search trees ideal for large-scale storage and retrieval.
• General Trees allow any number of children per node; they can be systematically converted to binary trees using the left-child, right-sibling method for easier processing and storage.

Huffman Algorithm: Explanation and C++ Example
How the Algorithm Works

1. Count the frequency of each character in the input.
2. Create a leaf node for each character and insert all leaves into a min-heap keyed by frequency.
3. While there is more than one node in the heap:
o Remove the two nodes with the lowest frequency.
o Create a new internal node with these two nodes as children and frequency equal to the sum of their frequencies.
o Insert the new node back into the min-heap.
4. The remaining node is the root of the Huffman Tree.
5. Traverse the tree to assign codes: left edge as '0', right edge as '1'.

Example

Suppose the input is:
A:5, B:9, C:12, D:13, E:16, F:45

• The most frequent character (F) gets the shortest code.
• The least frequent (A) gets the longest code.

C++ Implementation

cpp

#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
#include <string>
using namespace std;

struct Node {
    char character;
    int frequency;
    Node *left, *right;
    Node(char c, int f) : character(c), frequency(f), left(nullptr), right(nullptr) {}
};

// Order nodes by ascending frequency in the priority queue
struct Compare {
    bool operator()(Node* a, Node* b) { return a->frequency > b->frequency; }
};

Node* buildHuffmanTree(const unordered_map<char, int>& freq) {
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (auto& p : freq)
        pq.push(new Node(p.first, p.second));
    while (pq.size() > 1) {
        Node* left = pq.top(); pq.pop();
        Node* right = pq.top(); pq.pop();
        Node *newNode = new Node('\0', left->frequency + right->frequency);
        newNode->left = left;
        newNode->right = right;
        pq.push(newNode);
    }
    return pq.top();
}

// Generate Huffman Codes
void generateCodes(Node* root, const string& str, unordered_map<char, string>& huffmanCode) {
    if (!root) return;
    if (!root->left && !root->right) {
        huffmanCode[root->character] = str;
    }
    generateCodes(root->left, str + "0", huffmanCode);
    generateCodes(root->right, str + "1", huffmanCode);
}

int main() {
    unordered_map<char, int> freq = {{'A', 5}, {'B', 9}, {'C', 12}, {'D', 13}, {'E', 16}, {'F', 45}};
    Node* root = buildHuffmanTree(freq);
    unordered_map<char, string> huffmanCode;
    generateCodes(root, "", huffmanCode);
    cout << "Huffman Codes:\n";
    for (auto& pair : huffmanCode)
        cout << pair.first << ": " << pair.second << endl;
    return 0;
}

Sample Output (one valid assignment; tie-breaking may vary):

text

Huffman Codes:
F: 0
C: 100
D: 101
A: 1100
B: 1101
E: 111

• A (least frequent) has the longest code: 1100
• Each code is a unique prefix, so the encoding is unambiguous.

Key Points

• Time Complexity: O(n log n), where n is the number of unique characters.
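Decoding works by walking the tree bit by bit; a minimal sketch (assumes the bit string was produced with the codes from the tree built above):

cpp

// Decode a bit string by walking the Huffman tree from the root:
// '0' goes left, '1' goes right; reaching a leaf emits its character.
string decode(Node* root, const string& bits) {
    string out;
    Node* cur = root;
    for (char b : bits) {
        cur = (b == '0') ? cur->left : cur->right;
        if (!cur->left && !cur->right) {  // reached a leaf
            out += cur->character;
            cur = root;
        }
    }
    return out;
}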
M-way Search Tree: Definition, Operations, and C++ Implementation

What is an M-way Search Tree?

An m-way search tree (or multi-way search tree) is a generalization of the binary search tree (BST) where each node can have up to m children and contains up to m-1 keys. The keys within each node are kept in sorted order, and the children pointers partition the key space so that:

• All keys in the first child are less than the first key,
• Keys in the ith child are between the (i-1)th and ith key,
• All keys in the last child are greater than the last key.

This structure reduces the height of the tree, making search, insertion, and deletion more efficient, especially for large datasets.

Structure of an M-way Search Tree Node (C++ Example)

cpp

const int M = 4; // Example: 4-way search tree

struct Node {
    int count;            // Number of keys in the node
    int keys[M - 1];      // Keys in sorted order
    Node* children[M];    // Child pointers
    Node() : count(0) {
        for (int i = 0; i < M; ++i) children[i] = nullptr;
    }
};

Searching in an M-way Search Tree

Algorithm:
1. At each node, compare the target value with the keys in the node.
2. If the value matches a key, return success.
3. Otherwise, follow the child pointer between the keys that bracket the target, and repeat until the key is found or a null pointer is reached.
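A search routine following these steps (a minimal sketch matching the Node struct above):

cpp

// Returns true if key exists in the m-way search tree.
bool search(Node* root, int key) {
    while (root) {
        int i = 0;
        while (i < root->count && key > root->keys[i])
            i++;                          // find the first key >= target
        if (i < root->count && root->keys[i] == key)
            return true;                  // match in this node
        root = root->children[i];         // descend between the bracketing keys
    }
    return false;
}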
Insertion in an M-way Search Tree

Algorithm:
1. Search for the correct leaf node where the new key should be inserted.
2. If the node has fewer than m-1 keys, insert the key at the correct position.
3. If the node is full, split the node:
o Promote the median key to the parent.
o Split the node into two nodes, distributing keys and children.
o If the parent is also full, split recursively up to the root, possibly creating a new root.

C++ Code (Simplified, without splitting for brevity):

cpp

void insert(Node* &root, int key) {
    if (!root) {
        root = new Node();
        root->keys[0] = key;
        root->count = 1;
        return;
    }
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (root->count < M - 1) {
        // Shift larger keys right and insert in sorted position
        for (int j = root->count; j > i; --j)
            root->keys[j] = root->keys[j - 1];
        root->keys[i] = key;
        root->count++;
    } else {
        // Node full: descend into the appropriate child
        insert(root->children[i], key);
    }
}

Note: A full implementation would include node splitting when a node is full.

Deletion in an M-way Search Tree

Algorithm:
1. Search for the key to be deleted.
2. If the key is in a leaf node, remove it directly.
3. If the key is in an internal node:
o Replace it with either its in-order predecessor (largest in left subtree) or successor (smallest in right subtree), and then delete that value from the child node.
4. If a node falls below the minimum number of keys, borrow a key from a sibling or merge nodes as needed to maintain tree properties.

C++ Code (Simplified):

cpp

void deleteKey(Node* &root, int key) {
    if (!root) return;
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && root->keys[i] == key) {
        if (!root->children[i + 1]) {
            // Key found in a leaf: shift the remaining keys left
            for (int j = i; j < root->count - 1; ++j)
                root->keys[j] = root->keys[j + 1];
            root->count--;
        } else {
            // Internal node: find successor and replace
            Node* succ = root->children[i + 1];
            while (succ->children[0])
                succ = succ->children[0];
            root->keys[i] = succ->keys[0];
            deleteKey(root->children[i + 1], succ->keys[0]);
        }
    } else {
        deleteKey(root->children[i], key);
    }
    // Rebalancing logic would go here (not shown for brevity)
}

Note: A full implementation would handle rebalancing after deletion.

Conclusion

• An m-way search tree is a generalization of BSTs where each node can have up to m children and m-1 keys.
• Searching involves comparing keys and following the correct child pointer.
• Insertion adds keys to leaf nodes or splits nodes as needed.
• Deletion removes keys and may require rebalancing.
• These trees are foundational for efficient large-scale data storage, such as in database indices and file systems.
Huffman Coding: Description, Importance in Data Structures, and C++ Example

What is Huffman Coding?

Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters, with shorter codes for more frequent characters and longer codes for less frequent ones. It is a greedy algorithm that builds an optimal prefix code - meaning no code is a prefix of another - ensuring unambiguous decoding. Huffman coding is widely used in file compression (ZIP, GZIP), image and audio compression (JPEG, MP3), and network data transmission.

How Huffman Coding Works

1. Frequency Calculation:
Count the frequency of each character in the input data.
2. Build the Huffman Tree:
o Create a leaf node for each character, storing its frequency, and insert all leaves into a priority queue (min-heap).
o While more than one node remains:
▪ Remove the two nodes with the lowest frequencies.
▪ Create a new internal node with these two as children; its frequency is the sum of their frequencies.
▪ Insert the new node back into the queue.
o The remaining node is the root of the Huffman tree.
3. Code Assignment:
o Traverse the tree from root to leaves.
o Assign '0' for a left edge and '1' for a right edge.
o The code for each character is the sequence of 0s and 1s along the path from root to that character.
4. Encoding and Decoding:
o Replace each character in the original data with its code to compress.
o For decompression, traverse the Huffman tree according to the bit sequence until a leaf is reached, then output the corresponding character.

Importance in Data Structures

Huffman coding ties together several core data structures: a binary tree (the Huffman tree) stores the code hierarchy, while a priority queue (min-heap) drives the greedy construction. A complete implementation is shown in the previous Huffman section; for frequencies {A:5, B:9, C:12, D:13, E:16, F:45} it produces codes such as:

text

F: 0
D: 101
A: 1100
B: 1101

Summary Table: Huffman Coding

Feature | Description
Type | Lossless compression, greedy algorithm
Data Structure Used | Binary tree (Huffman tree), priority queue (min-heap)
Time Complexity | O(n log n) for n unique characters

Conclusion
Huffman coding is a foundational algorithm in data structures for efficient, lossless data compression. By using a binary tree and priority queue, it generates optimal, prefix-free codes, enabling significant savings in storage and bandwidth. Its practical importance is evident in many modern compression standards and systems.
Operations on Graphs: Concepts and C++ Code

Graphs are fundamental data structures in computer science, consisting of a set of vertices (nodes) and edges (connections). The most common operations on graphs include:

• Graph Representation
• Insertion and Deletion of Vertices/Edges
• Traversal (BFS & DFS)
• Searching (Path Finding)
• Cycle Detection
• Shortest Path Algorithms

Below is a detailed explanation of these operations, accompanied by C++ code samples.

1. Graph Representation

Graphs can be represented in several ways:
• Adjacency List: Efficient for sparse graphs.
• Adjacency Matrix: Efficient for dense graphs and constant-time edge lookups.

cpp

#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>
using namespace std;

class Graph {
    int V;
    vector<vector<int>> adj;   // adjacency list
public:
    Graph(int V) : V(V), adj(V) {}

    void addEdge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u); // For undirected graph
    }

    const vector<int>& neighbors(int u) const { return adj[u]; }
    int size() const { return V; }
};

2. Insertion and Deletion

Add Edge (member function):

cpp

void addEdge(int u, int v) {
    adj[u].push_back(v);
    adj[v].push_back(u); // For undirected graph
}

Delete Edge (member function):

cpp

void removeEdge(int u, int v) {
    adj[u].erase(remove(adj[u].begin(), adj[u].end(), v), adj[u].end());
    adj[v].erase(remove(adj[v].begin(), adj[v].end(), u), adj[v].end());
}

Delete Vertex:
Remove all edges associated with the vertex and its adjacency list.

3. Graph Traversal (BFS & DFS)

Breadth-First Search (BFS)

BFS visits nodes level by level, using a queue.

cpp

void BFS(const Graph& g, int start) {
    vector<bool> visited(g.size(), false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        cout << u << " ";
        for (int v : g.neighbors(u)) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
}

Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking, using recursion or a stack.

cpp

void DFSUtil(const Graph& g, int u, vector<bool>& visited) {
    visited[u] = true;
    cout << u << " ";
    for (int v : g.neighbors(u))
        if (!visited[v])
            DFSUtil(g, v, visited);
}

void DFS(const Graph& g, int start) {
    vector<bool> visited(g.size(), false);
    DFSUtil(g, start, visited);
}

4. Searching (Path Finding)

You can use BFS or DFS to determine if a path exists between two nodes.

cpp

bool hasPath(const Graph& g, int src, int dest) {
    vector<bool> visited(g.size(), false);
    queue<int> q;
    q.push(src);
    visited[src] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (u == dest) return true;
        for (int v : g.neighbors(u)) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return false;
}

5. Cycle Detection

Cycle detection in an undirected graph can be done using DFS by checking for back edges (a visited neighbor that is not the parent).

cpp

bool isCyclicUtil(const Graph& g, int v, vector<bool>& visited, int parent) {
    visited[v] = true;
    for (int u : g.neighbors(v)) {
        if (!visited[u]) {
            if (isCyclicUtil(g, u, visited, v))
                return true;
        } else if (u != parent) {
            return true;   // back edge found
        }
    }
    return false;
}

6. Shortest Path (Unweighted Graphs)

BFS can be used to find the shortest path in an unweighted graph.

cpp

vector<int> shortestPath(const Graph& g, int src) {
    vector<int> dist(g.size(), -1);
    queue<int> q;
    q.push(src);
    dist[src] = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : g.neighbors(u)) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}

Summary Table

Operation | Description | C++ Structure/Algorithm
Representation | Adjacency List/Matrix/Edge List | vector<vector<int>>, etc.
Traversal | Visit every vertex | BFS (queue), DFS (recursion/stack)
Path Finding | Check reachability | BFS/DFS
Cycle Detection | Find back edges | DFS with parent tracking
Shortest Path | Minimum edge count | BFS distance array
Warshall's Algorithm

Purpose:
Warshall's algorithm computes the transitive closure of a directed graph: it determines whether a path exists between every pair of vertices, regardless of path length. It does not compute shortest paths or path weights, only reachability.

Algorithm Steps:

1. Represent the graph as an adjacency matrix A, where A[i][j] = 1 if there is an edge from vertex i to j, else 0.
2. For each vertex k from 1 to n:
o For each pair of vertices (i, j):
▪ If A[i][j] = 1, or (A[i][k] = 1 and A[k][j] = 1), then set A[i][j] = 1.
3. After all iterations, A[i][j] = 1 if there is a path from i to j.

Example:
For a graph with edges 0 → 1 and 1 → 2, the closure becomes:

text

[ [0, 1, 1],
  [0, 0, 1],
  [0, 0, 0] ]

Now A[0][2] = 1, indicating a path from 0 to 2 via 1.
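A direct implementation of these steps (a minimal sketch; the input is assumed to be a 0/1 adjacency matrix):

cpp

#include <vector>
using namespace std;

// In-place transitive closure of a 0/1 adjacency matrix.
void warshall(vector<vector<int>>& A) {
    int n = A.size();
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (A[i][k] && A[k][j])
                    A[i][j] = 1;   // i reaches j through k

}

Calling warshall on [[0,1,0],[0,0,1],[0,0,0]] produces the closure matrix shown above.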
Floyd-Warshall Algorithm

Purpose:
The Floyd-Warshall algorithm finds the shortest paths between all pairs of vertices in a weighted graph (directed or undirected, with positive or negative edge weights but no negative cycles).

Algorithm Steps:

1. Initialization:
Create a distance matrix dist where dist[i][j] is the weight of the edge from i to j, or infinity if no edge exists. Set dist[i][i] = 0 for all i.
2. For each intermediate vertex k, and for each pair (i, j), update:
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

C++ Code Example: Floyd-Warshall Algorithm

cpp

#include <iostream>
#include <vector>
using namespace std;

const int INF = 1e9;

void floydWarshall(vector<vector<int>>& dist, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] < INF && dist[k][j] < INF)
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
}

int main() {
    int n = 4;
    vector<vector<int>> dist = {
        {0, 3, INF, 7},
        {8, 0, 2, INF},
        {5, INF, 0, 1},
        {2, INF, INF, 0}
    };
    floydWarshall(dist, n);
    cout << "Shortest distances between every pair of vertices:\n";
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (dist[i][j] == INF) cout << "INF ";
            else cout << dist[i][j] << " ";
        }
        cout << "\n";
    }
    return 0;
}

After applying Floyd-Warshall, the matrix holds the shortest path distances between all pairs.

• Floyd-Warshall computes all-pairs shortest paths using dynamic programming and an adjacency matrix representation; Warshall's algorithm is its boolean (reachability-only) special case.
Dijkstra's Algorithm for Shortest Path

Dijkstra's algorithm computes the shortest path from a single source to all other vertices in a weighted graph with non-negative edge weights.

1. Initialization:
o Set the distance to the source vertex as 0 and all other vertices as infinity.
2. Iteration:
o Select the unvisited vertex with the smallest distance (call it u).
o For each neighbor v of u, calculate the distance from the source to v through u. If this distance is less than the current stored distance for v, update it.
o Mark u as visited.
3. Termination:
o When all vertices are visited, the algorithm ends. The distance array now contains the shortest distances from the source to every vertex.

C++ Code Example (priority-queue based):

cpp

#include <iostream>
#include <vector>
#include <queue>
#include <climits>
using namespace std;

typedef pair<int, int> pii;   // (distance, vertex) or (neighbor, weight)

void dijkstra(const vector<vector<pii>>& adj, int n, int src) {
    vector<int> dist(n, INT_MAX);
    priority_queue<pii, vector<pii>, greater<pii>> pq;
    dist[src] = 0;
    pq.push({0, src});

    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        // If this distance is not up-to-date, skip
        if (d > dist[u]) continue;
        for (auto [v, weight] : adj[u]) {
            if (dist[u] + weight < dist[v]) {
                dist[v] = dist[u] + weight;
                pq.push({dist[v], v});
            }
        }
    }

    cout << "Vertex\tDistance from Source\n";
    for (int i = 0; i < n; ++i)
        cout << char('A' + i) << "\t" << dist[i] << endl;
}

int main() {
    int n = 6; // Number of vertices (A-F)
    vector<vector<pii>> adj(n);   // each entry is {neighbor, weight}
    adj[0].push_back({1, 4}); // A-B
    adj[0].push_back({2, 2}); // A-C
    adj[1].push_back({2, 1}); // B-C
    // ... (remaining edges of the example graph)
    dijkstra(adj, n, 0);
    return 0;
}

Example Output (for the full example graph):

text

Vertex  Distance from Source
...
F       12

(Distances may vary based on graph representation and edge direction.)

Explanation

• The algorithm starts at the source (A), visiting the nearest unvisited vertex at each step and updating the shortest known distances to its neighbors.
• It uses a priority queue to always process the vertex with the smallest tentative distance next.
• Once a vertex is marked visited, its shortest distance is finalized and never updated again.
• Time Complexity: O((V + E) log V) with a min-heap priority queue.

Applications

• Network routing protocols
• Game AI pathfinding

In summary:
Dijkstra's algorithm efficiently computes the shortest path from a single source to all other nodes in a weighted graph with non-negative edges, using a greedy strategy and a priority queue for optimal performance.
Topological Sorting in C++

Topological sorting is a linear ordering of the vertices of a Directed Acyclic Graph (DAG) such that for every directed edge u → v, vertex u comes before v in the ordering. This is widely used in scheduling tasks, resolving symbol dependencies in compilers, and determining the order of compilation in build systems.

Approach 1: DFS

• Run DFS from every unvisited vertex, pushing each vertex onto a stack after all of its descendants are processed.
• At the end, pop vertices from the stack to get the topological order.

C++ Code Example: DFS Approach

cpp

#include <iostream>
#include <list>
#include <stack>
using namespace std;

class Graph {
    int V;
    list<int> *adj;
    void topologicalSortUtil(int v, bool visited[], stack<int> &Stack);
public:
    Graph(int V);
    void addEdge(int v, int w);
    void topologicalSort();
};

Graph::Graph(int V) {
    this->V = V;
    adj = new list<int>[V];
}

void Graph::addEdge(int v, int w) {
    adj[v].push_back(w);
}

void Graph::topologicalSortUtil(int v, bool visited[], stack<int> &Stack) {
    visited[v] = true;
    for (int u : adj[v])
        if (!visited[u])
            topologicalSortUtil(u, visited, Stack);
    Stack.push(v);   // push only after all descendants are done
}

void Graph::topologicalSort() {
    stack<int> Stack;
    bool *visited = new bool[V];
    for (int i = 0; i < V; i++)
        visited[i] = false;
    for (int i = 0; i < V; i++)
        if (!visited[i])
            topologicalSortUtil(i, visited, Stack);
    while (!Stack.empty()) {
        cout << Stack.top() << " ";
        Stack.pop();
    }
    cout << endl;
    delete[] visited;
}

int main() {
    Graph g(6);
    g.addEdge(5, 2);
    g.addEdge(5, 0);
    g.addEdge(4, 0);
    g.addEdge(4, 1);
    g.addEdge(2, 3);
    g.addEdge(3, 1);
    cout << "Topological Sort of the given graph:\n";
    g.topologicalSort();
    return 0;
}

Output:
5 4 2 3 1 0 (One possible valid ordering)

Approach 2: Kahn's Algorithm (BFS/Indegree)

• Compute the indegree (number of incoming edges) for each vertex.
• Repeatedly remove vertices with indegree 0, decrementing the indegree of their neighbors.
• If all vertices are processed, the ordering is valid. Otherwise, the graph has a cycle.

C++ Code Example: Kahn's Algorithm

cpp

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

void topologicalSort(int V, vector<vector<int>> &adj) {
    vector<int> indegree(V, 0);
    for (int u = 0; u < V; ++u)
        for (int v : adj[u])
            indegree[v]++;

    queue<int> q;
    for (int i = 0; i < V; i++)
        if (indegree[i] == 0)
            q.push(i);

    vector<int> topo;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        topo.push_back(u);
        for (int v : adj[u]) {
            indegree[v]--;
            if (indegree[v] == 0)
                q.push(v);
        }
    }

    for (int v : topo) cout << v << " ";
    cout << endl;
}

int main() {
    int V = 6;
    vector<vector<int>> adj(V);
    adj[5] = {2, 0};
    adj[4] = {0, 1};
    adj[2] = {3};
    adj[3] = {1};
    cout << "Topological Sort using Kahn's Algorithm:\n";
    topologicalSort(V, adj);
    return 0;
}

Output:
4 5 2 0 3 1 (One possible valid ordering)

Applications of Topological Sort

• Task scheduling (e.g., build systems, course prerequisites)
• Resolving symbol dependencies in compilers
• Determining the order of compilation

• Both DFS and Kahn's Algorithm run in O(V + E) time, where V = number of vertices, E = number of edges.

In summary:
Topological sorting provides a way to order tasks in a DAG so that all dependencies are respected. It is implemented efficiently in C++ using either DFS (with a stack) or Kahn's Algorithm (using indegrees and a queue).
Traversal of Graph in Detail (with C++ Code)

Graph traversal is the process of visiting all the vertices (and possibly edges) in a graph in a systematic way. Traversal is fundamental for exploring graph structures, finding paths, detecting cycles, and solving many real-world problems.

The two most common graph traversal techniques are:
• Breadth-First Search (BFS)
• Depth-First Search (DFS)

1. Breadth-First Search (BFS)

Concept:
BFS explores the graph level by level. Starting from a source vertex, it visits all its neighbors before moving to the next level of neighbors. BFS uses a queue to keep track of vertices to visit next.

Applications:
• Finding the shortest path in unweighted graphs
• Level-order traversal

C++ Implementation:

cpp

#include <iostream>
#include <vector>
#include <queue>
#include <map>
#include <set>
using namespace std;

// Graph class using adjacency list
class Graph {
    map<int, set<int>> adjList;
public:
    void addEdge(int u, int v) {
        adjList[u].insert(v);
        adjList[v].insert(u); // For undirected graph
    }
    const map<int, set<int>>& getAdjList() const { return adjList; }
};

vector<int> BFS(const Graph& graph, int start) {
    vector<int> result;
    set<int> visited;
    queue<int> q;
    q.push(start);
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        if (visited.find(node) == visited.end()) {
            visited.insert(node);
            result.push_back(node);
            for (int neighbor : graph.getAdjList().at(node)) {
                if (visited.find(neighbor) == visited.end()) {
                    q.push(neighbor);
                }
            }
        }
    }
    return result;
}

2. Depth-First Search (DFS)

Concept:
DFS explores as deep as possible along each branch before backtracking. It uses a stack (often implemented via recursion) to keep track of the path.

Applications:
• Detecting cycles
• Topological sorting
• Connected components

C++ Implementation:

cpp

#include <iostream>
#include <list>
#include <vector>
using namespace std;

class Graph {
    int V;
    list<int> *adj;
    void DFSUtil(int v, vector<bool>& visited) {
        visited[v] = true;
        cout << v << " ";
        for (int neighbor : adj[v]) {
            if (!visited[neighbor])
                DFSUtil(neighbor, visited);
        }
    }
public:
    Graph(int V) {
        this->V = V;
        adj = new list<int>[V];
    }
    void addEdge(int v, int w) {
        adj[v].push_back(w);
        adj[w].push_back(v); // undirected
    }
    void DFS(int v) {
        vector<bool> visited(V, false);
        DFSUtil(v, visited);
    }
};

int main() {
    Graph g(6);
    g.addEdge(0, 1);
    g.addEdge(0, 2);
    g.addEdge(1, 3);
    g.addEdge(1, 4);
    g.addEdge(2, 5);
    cout << "DFS Traversal: ";
    g.DFS(0);
    cout << endl;
    return 0;
}

Output:
DFS Traversal: 0 1 3 4 2 5
This shows nodes visited by going as deep as possible before backtracking.

Summary Table

Traversal | Data Structure | Order Visited | Applications
BFS | Queue | Level by level | Shortest path, connectivity, search
DFS | Stack/Recursion | Deep before backtrack | Cycle detection, topological sort

• BFS uses a queue to explore level by level, ideal for shortest paths in unweighted graphs.
• DFS uses a stack (or recursion) to explore as deep as possible, useful for cycle detection and topological sorting.
• Both can be implemented efficiently in C++ using standard data structures.
Difference Between DFS and BFS (with C++ Code)

Below is a point-wise comparison between Depth-First Search (DFS) and Breadth-First Search (BFS), including their principles, implementation, applications, and C++ code examples.

1. Definition and Traversal Order

• BFS (Breadth-First Search): Explores all nodes at the present depth level before moving on to nodes at the next depth level (layer by layer).
• DFS (Depth-First Search): Explores as far as possible along each branch before backtracking (goes deep before wide).

2. Data Structure Used

• BFS: Uses a Queue (FIFO principle) to keep track of the next vertex to visit.
• DFS: Uses a Stack (LIFO principle) or recursion to keep track of the path.

3. Implementation Principle

• BFS: First-In-First-Out (FIFO).
• DFS: Last-In-First-Out (LIFO).

4. Complexity

• Time Complexity: Both BFS and DFS have O(V + E) time complexity for adjacency list representation, where V = vertices, E = edges.
• Space Complexity:
o BFS: Higher, as it stores all nodes at the current level in the queue.
o DFS: Lower, as it only stores nodes along the current path in the stack or recursion call stack.

5. Path Finding and Optimality

• BFS: Guarantees the shortest path in unweighted graphs.
• DFS: Does not guarantee shortest paths; it may find a longer path first.

6. Typical Applications

• BFS: Network broadcasting, social network friend suggestions, bipartite graph checking.
• DFS: Suitable for solutions that may be far from the source or that require exploring all possibilities (deep search), such as backtracking problems.

7. Backtracking and Cycles

• DFS: Naturally supports backtracking; can get trapped in cycles if visited nodes are not tracked.
• BFS: Visits siblings before children; not commonly used for cycle detection.

BFS Implementation (C++):

cpp

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

void bfs(vector<vector<int>>& adj, int start) {
    int n = adj.size();
    vector<bool> visited(n, false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int node = q.front(); q.pop();
        cout << node << " ";
        for (int neighbor : adj[node]) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                q.push(neighbor);
            }
        }
    }
}

DFS Implementation (C++):

cpp

void dfsUtil(vector<vector<int>>& adj, int node, vector<bool>& visited) {
    visited[node] = true;
    cout << node << " ";
    for (int neighbor : adj[node])
        if (!visited[neighbor])
            dfsUtil(adj, neighbor, visited);
}

void dfs(vector<vector<int>>& adj, int start) {
    vector<bool> visited(adj.size(), false);
    dfsUtil(adj, start, visited);
}

Summary Table

Parameter | BFS | DFS
Traversal Order | Level by level | Deep along a branch, then backtrack
Data Structure | Queue (FIFO) | Stack (LIFO) or Recursion
Space Complexity | Higher (stores all nodes at a level) | Lower (stores only current path)
Shortest Path | Guaranteed (unweighted) | Not guaranteed

• BFS is optimal for shortest path and level-wise traversal, uses more memory, and is implemented with a queue.
• DFS is suited for deep exploration, uses less memory, enables backtracking, and is implemented with a stack or recursion.
• Both are fundamental for graph and tree algorithms, each with distinct strengths and applications.
Rules for a Good Hash Function

A good hash function is critical for the efficient performance of a hash table. The key rules and principles are:

• Uniform Distribution: The hash function should distribute keys as evenly as possible across the table to minimize collisions and avoid clustering.
• Minimize Collisions: Different keys should rarely hash to the same index. Fewer collisions mean faster lookups and insertions.
• Efficiency: The function should be simple and fast to compute.
• Deterministic: The same input must always produce the same output.
• Scalability: Should perform well as the table size or the number of keys grows.
• Avoid Patterns: The function should not produce patterns that could cause clustering.
• Use of Established Algorithms: Prefer well-tested hash functions over custom ones for critical applications.

Division Method Example

The division method computes h(k) = k mod m, where k is the key and m is the table size. The table size m should preferably be a prime number, not a power of 2, to help distribute keys more uniformly.

Given values: 32, 49, 97, 101, 102, 155, 183
Table size (m): 7
Hash function: h(k) = k mod 7

Key | k mod 7 | Hash Index
32 | 32 % 7 = 4 | 4
49 | 49 % 7 = 0 | 0
97 | 97 % 7 = 6 | 6
101 | 101 % 7 = 3 | 3
102 | 102 % 7 = 4 | 4 (collision with 32)
155 | 155 % 7 = 1 | 1
183 | 183 % 7 = 1 | 1 (collision with 155)

Index | Keys Stored (Chain)
0 | 49
1 | 155 → 183
2 | (empty)
3 | 101
4 | 32 → 102
5 | (empty)
6 | 97

C++ Code Example: Hash Table with Chaining (Division Method)

cpp

#include <iostream>
#include <list>
using namespace std;

class HashTable {
    int size;
    list<int> *table; // Array of linked lists
public:
    HashTable(int s) : size(s) {
        table = new list<int>[size];
    }
    void insert(int key) {
        int index = key % size;
        table[index].push_back(key);
    }
    void display() {
        for (int i = 0; i < size; ++i) {
            cout << i << ": ";
            for (int val : table[i])
                cout << val << " -> ";
            cout << "NULL\n";
        }
    }
    ~HashTable() {
        delete[] table;
    }
};

int main() {
    int keys[] = {32, 49, 97, 101, 102, 155, 183};
    HashTable ht(7);
    for (int key : keys)
        ht.insert(key);
    cout << "Hash Table using Division Method and Chaining:\n";
    ht.display();
    return 0;
}

Output:

text

Hash Table using Division Method and Chaining:
0: 49 -> NULL
1: 155 -> 183 -> NULL
2: NULL
3: 101 -> NULL
4: 32 -> 102 -> NULL
5: NULL
6: 97 -> NULL

• Good hash functions ensure uniform distribution, minimize collisions, and are efficient to compute.
• Using the division method with table size 7, the given values are distributed as shown, with collisions handled by chaining.
• The C++ code demonstrates insertion and display of the hash table using chaining for collision resolution.
13. What do you mean by hashing? Explain various hashing functions with suitable examples.

What is Hashing?

Hashing is the process of transforming input data (called a key) into a fixed-size value (called a hash value, hash code, or digest) using a mathematical function called a hash function. The hash value is typically used as an index in a hash table for efficient data storage and retrieval. Hashing is a one-way process: it is extremely difficult to reconstruct the original data from its hash value.

• Input Key: The data to be hashed (e.g., a number, string, file).
• Hash Table: The data structure that stores the hash values and associated data.

Use cases:
• Fast data lookup in hash tables
• Password storage
• Digital signatures

Common Hashing Functions

1. Division (Modulo) Method
h(k) = k mod m. Fast, simple, and general purpose; see the worked example in the previous section.

2. Multiplication Method
h(k) = ⌊m × (k × A mod 1)⌋ for a constant 0 < A < 1 (A ≈ 0.618 is a common choice).
• Example: m = 10, A = 0.618, key k = 112:
h(112) = ⌊10 × (112 × 0.618 mod 1)⌋ = ⌊10 × 0.216⌋ = 2

cpp

#include <cmath>

int hashFunc(int key, int tableSize) {
    double A = 0.6180339887;
    return int(tableSize * fmod(key * A, 1));
}

3. Folding Method
The key is split into parts (e.g., groups of digits), the parts are added together, and the sum (reduced mod the table size) is used as the index. Suited to large numeric keys.

4. Mid-Square Method
The key is squared and the middle digits of the result are taken as the hash index; this gives a uniform spread for certain key types.

cpp

int midSquare(int key) {
    int squared = key * key;
    int mid = (squared / 10) % 100; // extract middle two digits
    return mid;
}

5. Cryptographic Hash Functions
• Description: Used in security (e.g., SHA-256, MD5); produce fixed-length, unique, and irreversible digests.
• Example: Hashing a password before storing it.

Summary Tables

Hashing Function | Example Use
Division (Modulo) | Fast, simple, general purpose
Multiplication | More uniform distribution
Folding | Large numeric keys
Mid-Square | Uniform for certain key types
Cryptographic (SHA, MD) | Security, data integrity

Collision Resolution | Description
Chaining | Linked lists at each index
Linear Probing | Next available slot
Quadratic Probing | Probing with quadratic step
Double Hashing | Second hash for step size

In summary:
• Hashing is a method to map data to a fixed-size value using a hash function for efficient storage and retrieval.
• Hash functions include division, multiplication, folding, mid-square, and cryptographic hashes.
• Collisions can be resolved by chaining (linked lists) or open addressing (probing).
• C++ code examples illustrate both hash function usage and collision resolution.
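To round out the method list above, a folding-method sketch (assumes 2-digit groups and a final reduction mod the table size; illustrative):

cpp

// Fold a numeric key by summing its 2-digit groups, then reduce mod tableSize.
int foldingHash(long key, int tableSize) {
    int sum = 0;
    while (key > 0) {
        sum += key % 100;   // take the lowest 2-digit group
        key /= 100;
    }
    return sum % tableSize;
}
// Example: key 123456 -> 56 + 34 + 12 = 102; with tableSize 10 -> index 2.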
Basics of Files and File Structure

Files are managed by the operating system, and can be of various types — text files, binary files, executable files, etc. Depending on the file organization method, data may be accessed sequentially, directly, or through indexing.

A file typically has the following structure:

1. Header:
o This is the metadata section that stores important information about the file, such as file type, size, format, record length, and creation/modification dates.
o For example, a file may have a header stating that it stores 100 records of 50 bytes each.
2. Records:
o A record is a logical unit of data within the file.
o For instance, a student record may contain fields like name, roll number, and marks.
3. Fields:
o A field is a single item of data within a record.
o For example, in a record {101, "Raj", 85}, the field "Raj" represents the name.
4. Data Section:
o This holds the actual content of the file, meaning the records are stored in this part.
5. End-of-File (EOF) Marker:
o A special marker that indicates the end of the file to prevent reading beyond it.
o In text files, it's often represented as a special character like EOF or -1.

Q2. Describe Various Kinds of Operations Required to Maintain Files. (15 Marks)

Introduction:
File operations are fundamental for managing data stored in external storage. These operations include creating, opening, reading, writing, updating, and deleting files. File handling ensures data permanence, structured access, and security.

Types of File Operations:

1. Create:
o This operation creates a new file in the system.
o A unique filename and location are assigned, and space is allocated.
2. Open:
o Before any operation (read/write), a file must be opened using a specific mode (read, write, append, binary, etc.).
3. Read:
o Used to retrieve data from a file.
4. Write:
o Used to store data in a file; in write mode, existing content is replaced.
5. Append:
o Adds new data at the end of the file without modifying existing content.
6. Update (Modify):
o Changes existing data in a file, such as correcting a field in a record.
7. Delete:
o Removes data or the file itself from the storage system.
o Logical deletion marks a record as deleted; physical deletion removes it permanently.
8. Close:
o Releases the file and flushes any buffered data once operations are complete.

C++ Example: Writing and Reading a File

cpp

#include <iostream>
#include <fstream>
using namespace std;

int main() {
    // Writing to a file
    ofstream fout("example.txt");
    fout << "This is file handling in C++.";
    fout.close();

    // Reading from the file
    ifstream fin("example.txt");
    string content;
    while (getline(fin, content)) {
        cout << content << endl;
    }
    fin.close();
    return 0;
}
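Opening the same file with ios::app demonstrates the Append operation (a small sketch building on the example above):

cpp

#include <fstream>
using namespace std;

int main() {
    // Append mode: new data is written after the existing content
    ofstream fout("example.txt", ios::app);
    fout << "\nAppended line.";
    fout.close();
    return 0;
}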
Q9. What is Indexed Sequential File? Explain Techniques for Handling Overflow. (15 Marks)

An indexed sequential file stores records sequentially on a key while also maintaining an index for direct access.

Structure:
1. Main File (Data File): Contains the actual records in sorted order.
2. Index File: Contains key-to-address mappings.
3. Overflow Area: Used when new records cannot be inserted in sequence.

Advantages:
• Efficient for both sequential and random access.

Techniques for Handling Overflow:
1. Overflow Area:
o New records that do not fit in sequence are placed in a separate overflow area.
2. Chaining:
o Overflow records are linked to the main record using pointers or addresses.
o Efficient but increases pointer overhead.
3. Reorganization:
o Periodically, the main file and overflow areas are merged and sorted again to reduce lookup time and fragmentation.

Diagram:

text

+-------------------+      +---------------+
|    Index File     | -->  | Key: Address  |
+-------------------+      +---------------+

+-------------------+      +------------------+
|   Main Data File  |      |  Overflow Area   |
+-------------------+      +------------------+
| 1001 | John  |--+   ->   | 1006 | Sarah     |
| 1002 | Alice |           | 1010 | Rohan     |
+-------------------+      +------------------+

Q10. Differentiate Between Multi-list and Inverted List File Organization. (15 Marks)

Feature | Multi-list File Organization | Inverted List File Organization
Structure | Multiple linked lists for each key/field | Central index for each field/attribute

Explanation:
• In a multi-list, records are connected in multiple linked lists, with each list representing a relationship.
• In an inverted list, each attribute has its own index which points to all records having that attribute value. Common in information retrieval systems.

Q11. Describe Fixed and Variable Length Record With Example. (15 Marks)

Fixed-Length Record:
• Every record occupies the same number of bytes, so record boundaries and addresses can be computed directly.

Example:

cpp

struct Employee {
    int id;          // 4 bytes
    char name[20];   // 20 bytes
    float salary;    // 4 bytes
};
// Total = 28 bytes

Variable-Length Record:
• Record sizes may vary due to varying field lengths (e.g., comments, addresses).
• Efficient in space but more complex to access and maintain.

Example:
Email messages, social media posts, or chat logs where the length of content varies significantly.

Difference Table:

Aspect | Fixed-Length Record | Variable-Length Record
Size | Same for every record | Varies per record
Access | Fast, direct (offset computable) | Slower, needs length info or delimiters
Space | May waste space (padding) | Space-efficient

Primary and Secondary Keys

Primary Key:
A field (or combination of fields) that uniquely identifies each record.
Example:
• Employee ID
• Aadhaar Number

Secondary Key:
A field used to search or group records; it need not be unique.
Example:
• Student Name
• Department Name
• City

Differences Table:

Aspect | Primary Key | Secondary Key
Uniqueness | Must be unique | Can be duplicate
Main Purpose | Record identification | Search/filtering