
Difference Between Internal and External Sorting

Internal and external sorting are two broad categories of sorting techniques, distinguished primarily by where the data resides during sorting and by the size of the data they are designed to handle.

Key Differences

Aspect | Internal Sorting | External Sorting
Data Location | Entire dataset fits in main memory (RAM) | Dataset exceeds main memory capacity; stored in secondary storage (disk)
Memory Usage | Uses only RAM for sorting | Uses both RAM (for buffers/chunks) and disk storage
Suitable Dataset | Small to medium datasets | Large datasets that cannot fit into memory
I/O Operations | Minimal; mostly limited to the initial read and final write | Frequent; data is read from and written to disk repeatedly
Speed | Generally faster due to direct access to data in memory | Slower due to the overhead of disk access
Algorithm Examples | Quick Sort, Merge Sort, Heap Sort, Bubble Sort, Insertion Sort | External Merge Sort, Polyphase Merge Sort, Replacement Selection, External Radix Sort
Complexity | Simpler implementation, less overhead | More complex due to chunk management and merging
Application | Sorting arrays, lists, or tables in memory (e.g., in-memory databases) | Sorting large files, databases, or datasets stored on disk
Efficiency | Highly efficient for small/medium datasets | Efficient for large datasets, but slower than internal sorting for small datasets

Explanation

• Internal Sorting is used when the entire dataset to be sorted fits into main memory (RAM). All sorting operations are performed in memory, making these algorithms fast and simple to implement. Examples include Quick Sort, Merge Sort, and Heap Sort.

• External Sorting is necessary when the dataset is too large to fit into RAM. Data is divided into manageable chunks, each sorted in memory, then merged using external storage (like a hard disk). This approach minimizes random access and disk I/O, which are much slower than RAM access. Examples include External Merge Sort and Polyphase Merge Sort.

Trade-offs and Use Cases

• Internal sorting is preferred for small to medium datasets due to its speed and simplicity.

• External sorting is essential for very large datasets (e.g., big data, large logs, database tables) where only a portion can be loaded into memory at a time. It is slower but necessary to handle data beyond RAM capacity.

C++ Example Code

Internal Sorting (Quick Sort Example)

cpp

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            ++i;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 5, 6};
    quickSort(arr, 0, arr.size() - 1);
    for (int i : arr) cout << i << " ";
    return 0;
}

External Sorting (Simplified External Merge Sort Example)

External sorting is more complex and typically involves file I/O. The following is a simplified illustration of the process:

cpp

// Pseudocode for clarity, not a full implementation
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

void sortChunk(const string& inputFile, const string& outputFile, int chunkSize) {
    ifstream in(inputFile);
    ofstream out(outputFile);
    vector<int> buffer(chunkSize);
    // Read up to chunkSize ints at a time (the last chunk may be partial)
    while (in.read((char*)buffer.data(), chunkSize * sizeof(int)) || in.gcount() > 0) {
        int readCount = in.gcount() / sizeof(int);
        sort(buffer.begin(), buffer.begin() + readCount);
        out.write((char*)buffer.data(), readCount * sizeof(int));
    }
    // Merge sorted chunks (not shown here)
}

A full external sort would involve:

• Splitting data into chunks that fit in RAM

• Sorting each chunk in memory (using an internal sort)

• Writing sorted chunks to disk

• Merging all sorted chunks into a final sorted file (a merge sketch follows below)
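The merge phase omitted above is usually a k-way merge. The following is a minimal sketch added for illustration (not from the original notes): it assumes the chunks are already-sorted text files of integers, one value per line, and the function and file names are hypothetical.

cpp

#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Merge k sorted text files (one int per line) into one sorted output file.
void mergeChunks(const vector<string>& chunkFiles, const string& outputFile) {
    vector<ifstream> in(chunkFiles.size());
    // (value, file index) min-heap: always pop the smallest current value
    priority_queue<pair<int, size_t>, vector<pair<int, size_t>>, greater<>> pq;
    for (size_t i = 0; i < chunkFiles.size(); ++i) {
        in[i].open(chunkFiles[i]);
        int v;
        if (in[i] >> v) pq.push({v, i}); // prime the heap with each file's first value
    }
    ofstream out(outputFile);
    while (!pq.empty()) {
        auto [v, i] = pq.top();
        pq.pop();
        out << v << '\n';
        int next;
        if (in[i] >> next) pq.push({next, i}); // refill from the same chunk
    }
}

Each chunk contributes only one value to the heap at a time, so memory use stays proportional to the number of chunks rather than to the total data size.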

Summary Table

Internal Sorting | External Sorting
Works entirely in RAM | Uses both RAM and disk
Fast, simple, minimal I/O | Slower, complex, frequent disk I/O
Suitable for small/medium datasets | Essential for large datasets
Algorithms: Quick Sort, Heap Sort, etc. | Algorithms: External Merge Sort, Polyphase

In conclusion:
Internal sorting is ideal for small datasets that fit in memory, offering speed and simplicity. External sorting is necessary for very large datasets, trading off speed for the ability to handle massive data volumes using disk storage.
Applications of Sorting and Searching in Computer Science (with C++ Examples)

Sorting and searching are foundational operations in computer science, enabling efficient data management, retrieval, and analysis across a wide range of applications.

Applications of Sorting

• Data Organization: Sorting arranges data for easier access and management, such as alphabetizing names in a contact list or organizing files on a computer.

• Efficient Searching: Many search algorithms (like binary search) require sorted data to function efficiently, reducing search time from linear to logarithmic.

• Data Analysis: Sorting helps in identifying trends, patterns, and outliers, which is crucial in fields like statistics, finance, and scientific research.

• Database Management: Databases use sorting to optimize query performance, create indexes, and enable rapid data retrieval.

• User Experience: Sorting improves usability in applications such as e-commerce (product listings), social media feeds, music playlists, and email management.

• Canonicalization and Output: Sorted data is easier to read and compare, useful in reporting and data export.

• Other Applications: Sorting is also used in GPS navigation, weather forecasting, stock market analysis, medical diagnosis, and more.

Applications of Searching

• Information Retrieval: Search engines and information systems use searching algorithms to quickly locate relevant documents or data entries.

• Database Queries: Searching is fundamental for finding records in databases, especially when combined with indexing and sorting.

• Pattern Matching: Searching algorithms are used in text editors, DNA sequence analysis, and plagiarism detection.

• Real-Time Systems: Searching is used in systems that require immediate data lookup, such as routing tables and inventory systems.

• Security: Searching is applied in password verification and intrusion detection systems.

C++ Code Examples

Sorting Example (std::sort for integers)

cpp

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> data = {5, 2, 9, 1, 5, 6};
    sort(data.begin(), data.end()); // Sorts in ascending order
    for (int num : data) {
        cout << num << " ";
    }
    return 0;
}

Binary Search Example (std::binary_search)

cpp

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> data = {1, 2, 5, 5, 6, 9};
    int target = 5;
    // Data must be sorted for binary_search
    bool found = binary_search(data.begin(), data.end(), target);
    if (found) cout << "Element found!" << endl;
    else cout << "Element not found." << endl;
    return 0;
}

Application Area | Sorting Role | Searching Role
Databases | Index creation, query optimization | Record lookup, key-based retrieval
Data Analysis | Trend/pattern/outlier identification | Finding specific values or records
User Interfaces | Organizing lists, feeds, and content | Quick item lookup (e.g., contacts, products)
Information Retrieval | Ranking and ordering results | Query matching and relevance determination
Scientific Research | Data preparation and analysis | Locating data points or experiments

In summary:
Sorting and searching are essential in computer science for efficient data organization, retrieval, and analysis. They underpin many real-world applications, from databases and search engines to user-facing software, and are implemented using standard algorithms and libraries in C++.
Best Sorting Algorithm on the Basis of Complexity

When evaluating sorting algorithms in terms of complexity, the primary focus is on time complexity (how fast the algorithm runs) and space complexity (how much extra memory it uses). The most efficient algorithms for large datasets are those with a worst-case time complexity of O(n log n), as this is the theoretical lower bound for comparison-based sorting.

Comparison of Popular Sorting Algorithms

Algorithm | Best Case | Average Case | Worst Case | Space Complexity | Stable?
Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes
Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No
Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes
Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes
Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No
Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No

Which Algorithm is Best?

Merge Sort and Heap Sort are generally considered the best in terms of worst-case time complexity, both achieving O(n log n) performance regardless of input data. However, each has its own trade-offs:

• Merge Sort:

o Time Complexity: Always O(n log n), regardless of the input.

o Space Complexity: Requires O(n) extra space for merging.

o Stability: Stable (preserves the order of equal elements).

o Use Case: Preferred when stability is required, such as in sorting records by multiple fields.

• Heap Sort:

o Time Complexity: Always O(n log n).

o Space Complexity: In-place, needs only O(1) extra space.

o Stability: Not stable.

o Use Case: Suitable for memory-constrained environments where stability is not necessary. (A sketch of Heap Sort follows below.)
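Heap Sort is discussed here but not illustrated anywhere in these notes, so the following minimal sketch is added for illustration (it is not from the original text): an in-place heap sort on an int array using a manual sift-down.

cpp

#include <algorithm>
#include <iostream>
using namespace std;

// Sift the value at index i down so the subtree rooted at i is a max-heap.
void heapify(int arr[], int n, int i) {
    int largest = i;
    int l = 2 * i + 1, r = 2 * i + 2;
    if (l < n && arr[l] > arr[largest]) largest = l;
    if (r < n && arr[r] > arr[largest]) largest = r;
    if (largest != i) {
        swap(arr[i], arr[largest]);
        heapify(arr, n, largest); // continue sifting down
    }
}

void heapSort(int arr[], int n) {
    // Build a max-heap bottom-up: O(n)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    // Repeatedly move the current maximum to the end and shrink the heap: O(n log n)
    for (int i = n - 1; i > 0; i--) {
        swap(arr[0], arr[i]);
        heapify(arr, i, 0);
    }
}

int main() {
    int arr[] = {5, 2, 9, 1, 5, 6};
    heapSort(arr, 6);
    for (int v : arr) cout << v << " ";
    return 0;
}

Note that the extraction step swaps equal elements past each other, which is why Heap Sort is not stable even though it is in-place.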
Quick Sort is often the fastest in practice due to low overhead and cache efficiency, but its worst-case complexity is O(n²), which can be problematic for certain input patterns. With good pivot selection (like randomized or median-of-three), the average case is O(n log n), making it a popular choice for general-purpose sorting in C++ (e.g., std::sort uses a variant of Quick Sort).

Why Merge Sort is Often Considered Best (Theoretical Perspective)

• Consistent Performance: Merge Sort guarantees O(n log n) time in all cases (best, average, and worst), making it reliable for large datasets.

• Stability: It is stable, which is important for many real-world applications (e.g., sorting records by multiple keys).

• Divide and Conquer: Efficient for external sorting (sorting data that does not fit in memory).

Drawback: The main limitation is its O(n) space requirement, which can be significant for very large datasets.

C++ Example: Merge Sort

cpp

#include <iostream>
#include <vector>
using namespace std;

void merge(vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    vector<int> L(n1), R(n2);

    for (int i = 0; i < n1; ++i)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; ++j)
        R[j] = arr[mid + 1 + j];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j])
            arr[k++] = L[i++];
        else
            arr[k++] = R[j++];
    }
    while (i < n1)
        arr[k++] = L[i++];
    while (j < n2)
        arr[k++] = R[j++];
}

void mergeSort(vector<int>& arr, int low, int high) {
    if (low < high) {
        int mid = low + (high - low) / 2;
        mergeSort(arr, low, mid);
        mergeSort(arr, mid + 1, high);
        merge(arr, low, mid, high);
    }
}

int main() {
    vector<int> arr = {5, 2, 9, 1, 5, 6};
    mergeSort(arr, 0, arr.size() - 1);
    for (int num : arr)
        cout << num << " ";
    return 0;
}

Conclusion

• Merge Sort is the best sorting algorithm based on complexity for large datasets because it guarantees O(n log n) time in all scenarios and is stable, making it suitable for many practical applications.

• Heap Sort is also optimal in terms of time and space but is not stable.

• For small datasets or when average performance is prioritized, Quick Sort is often used, but its worst-case performance can be a drawback.

In summary:
Choose Merge Sort when you need guaranteed performance and stability, Heap Sort for in-place sorting without stability, and Quick Sort for practical speed on average, with caution for worst-case scenarios.
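To make the stability trade-off concrete, here is a small example added for illustration (not from the original notes) using std::stable_sort, which is typically implemented as a merge-sort variant: records already ordered by name keep their name order among equal grades.

cpp

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct Student {
    string name;
    int grade;
};

int main() {
    // Already ordered by name
    vector<Student> v = {{"Amir", 2}, {"Bela", 1}, {"Chen", 2}, {"Dana", 1}};
    // Stable sort by grade: students with equal grades keep their name order
    stable_sort(v.begin(), v.end(),
                [](const Student& a, const Student& b) { return a.grade < b.grade; });
    for (const auto& s : v)
        cout << s.name << " (" << s.grade << ") ";
    // Output: Bela (1) Dana (1) Amir (2) Chen (2)
    return 0;
}

An unstable sort (e.g., Heap Sort) would be free to emit Dana before Bela, which matters when sorting by several keys in succession.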
Quick Sort: Explanation, Usage, Complexity, and C++ Example

What is Quick Sort?

Quick Sort is a highly efficient, comparison-based sorting algorithm that follows the divide and conquer paradigm. It works by selecting a pivot element from the array, partitioning the other elements into two sub-arrays (those less than the pivot and those greater), and recursively applying the same process to the sub-arrays.

How Quick Sort Works

1. Choose a Pivot:
Select a pivot element from the array. Common strategies include picking the first, last, a random, or the median element as the pivot.

2. Partitioning:
Rearrange the array so that elements less than the pivot are on its left and elements greater than the pivot are on its right. The pivot is now in its correct sorted position.

3. Recursion:
Recursively apply the above steps to the sub-arrays to the left and right of the pivot.

4. Base Case:
If the sub-array has zero or one element, it is already sorted.

Why Quick Sort is Used and Useful

• Efficiency:
Quick Sort is generally faster in practice than other O(n log n) algorithms like Merge Sort and Heap Sort due to its in-place sorting and cache efficiency.

• In-Place Sorting:
It requires only a small amount of additional storage space (O(log n) due to the recursion stack).

• Versatility:
Widely used in commercial software, system libraries, and for large datasets where average-case performance is critical.

• Customization:
Pivot selection strategies and partitioning schemes can be tailored for specific data characteristics.

• Limitations:
Quick Sort is not stable (does not preserve the order of equal elements) and its worst-case time complexity is O(n²), though this is rare with good pivot selection.

Complexity Analysis

Case | Time Complexity | Space Complexity | Notes
Best Case | O(n log n) | O(log n) | Balanced partitioning
Average Case | O(n log n) | O(log n) | Most practical scenarios
Worst Case | O(n²) | O(log n) | Poor pivot choices (e.g., sorted data)

• Best/Average Case:
Achieved when the pivot divides the array into nearly equal halves.

• Worst Case:
Occurs when the pivot is always the smallest or largest element, leading to unbalanced partitions (e.g., already sorted data).

• Space Complexity:
Only the recursion stack is used; no additional arrays are required.

C++ Implementation Example

cpp

#include <iostream>
#include <utility> // swap
using namespace std;

// Partition function
int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // Choose last element as pivot
    int i = (low - 1);
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i + 1], arr[high]);
    return (i + 1);
}

// QuickSort function
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high); // Partitioning index
        quickSort(arr, low, pi - 1);  // Sort left subarray
        quickSort(arr, pi + 1, high); // Sort right subarray
    }
}

// Utility function to print array
void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {9, 4, 8, 3, 7, 1, 6, 2, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    cout << "Sorted array: ";
    printArray(arr, n);
    return 0;
}

This code selects the last element as the pivot and sorts the array in-place.

Example Run

Input:
9, 4, 8, 3, 7, 1, 6, 2, 5

Output:
1 2 3 4 5 6 7 8 9

Summary Table

Feature | Quick Sort
Approach | Divide and conquer
In-place | Yes
Stability | No
Best/Average | O(n log n)
Worst | O(n²)
Space | O(log n)
Practical Use | Very fast, widely used

Conclusion
Quick Sort is a powerful, efficient, and widely-used sorting algorithm in C++. Its divide-and-conquer approach, in-place sorting, and fast average-case performance make it a top choice for many real-world applications, despite its worst-case scenario. Using randomized or median-of-three pivot selection can help avoid the worst case and ensure robust performance.
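As a concrete illustration of the randomized pivot selection mentioned above, here is a short sketch (added for illustration, not from the original notes) that extends the partition() function from the listing above; the name randomizedPartition is hypothetical.

cpp

#include <cstdlib> // rand
// Assumes the partition() function defined in the listing above.
int randomizedPartition(int arr[], int low, int high) {
    int r = low + rand() % (high - low + 1); // pick a random index in [low, high]
    swap(arr[r], arr[high]);                 // move it into the pivot slot
    return partition(arr, low, high);        // then partition as usual
}

Calling randomizedPartition from quickSort makes the O(n²) behavior on already-sorted input vanishingly unlikely, because no fixed input pattern can consistently produce bad pivots.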

Radix Sort Algorithm and Example

Radix Sort Algorithm

Radix Sort is a non-comparative sorting algorithm that sorts numbers by processing individual digits. It works from the least significant digit (LSD) to the most significant digit (MSD), using a stable sub-sorting algorithm (commonly Counting Sort) at each digit position. It is especially efficient for sorting integers and can outperform comparison-based algorithms when the number of digits is small relative to the number of elements.

Algorithm Steps

1. Find the Maximum Number:
Determine the maximum number in the array to know the number of digits to process.

2. Sort by Each Digit Place:
For each digit place (unit, tens, hundreds, etc.):

o Use a stable sort (like Counting Sort) to sort the array based on the current digit.

3. Repeat for All Digits:
Continue the process for all digit places, from least significant to most significant.

Pseudocode

text

RadixSort(array, n)
    maxNum = find maximum number in array
    for place = 1; maxNum/place > 0; place *= 10
        CountingSort(array, n, place)

Example

Consider the array:
{170, 45, 75, 90, 802, 24, 2, 66}

Step 1: Find the maximum number (802, which has 3 digits).

Step 2: Sort by each digit place using Counting Sort:

• Pass 1 (Unit place):

o Sorted array: {170, 90, 802, 2, 24, 45, 75, 66}

• Pass 2 (Tens place):

o Sorted array: {802, 2, 24, 45, 66, 170, 75, 90}

• Pass 3 (Hundreds place):

o Sorted array: {2, 24, 45, 66, 75, 90, 170, 802}

Final Sorted Array:
{2, 24, 45, 66, 75, 90, 170, 802}

C++ Implementation

cpp

#include <iostream>
#include <vector>
using namespace std;

// Function to get the largest element from an array
int getMax(int arr[], int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

// Using counting sort to sort elements based on significant places
void countSort(int arr[], int n, int place) {
    const int max = 10;
    vector<int> output(n);
    int count[max] = {0};

    // Count occurrences
    for (int i = 0; i < n; i++)
        count[(arr[i] / place) % 10]++;

    // Cumulative count
    for (int i = 1; i < max; i++)
        count[i] += count[i - 1];

    // Build the output array (traversing backwards keeps the sort stable)
    for (int i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / place) % 10] - 1] = arr[i];
        count[(arr[i] / place) % 10]--;
    }

    // Copy to original array
    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

// Main radix sort function
void radixSort(int arr[], int n) {
    int max = getMax(arr, n);
    // Apply counting sort to sort elements based on place value
    for (int place = 1; max / place > 0; place *= 10)
        countSort(arr, n, place);
}

// Display array
void display(int arr[], int n) {
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << "Before sorting: ";
    display(arr, n);
    radixSort(arr, n);
    cout << "After sorting: ";
    display(arr, n);
    return 0;
}

Output:

text

Before sorting: 170 45 75 90 802 24 2 66
After sorting: 2 24 45 66 75 90 170 802

Complexity

• Time Complexity:

o Best, Average, and Worst: O(d · (n + k)), where d is the number of digits in the maximum number and k is the base (10 for decimal numbers). For the example above, d = 3, n = 8, and k = 10, so roughly 3 × (8 + 10) = 54 basic counting-sort operations.

• Space Complexity:

o O(n + k), due to the output array and counting array.

Summary

• Radix Sort is efficient for sorting integers, especially when the range of digits is not significantly larger than the number of elements.

• It processes digits from least significant to most significant, using a stable sort at each step.

• It is non-comparative, stable, and has linear time complexity relative to the number of digits and elements.

Radix Sort is particularly useful for sorting large lists of numbers where comparison-based sorts (like Quick Sort or Merge Sort) are less efficient due to their O(n log n) complexity.
Merge Sort Algorithm: Explanation, Example, and C++ Code

Algorithm Overview

Merge Sort is a classic divide-and-conquer sorting algorithm. It works by recursively dividing the array into two halves, sorting each half, and then merging the sorted halves back together. This process continues until the entire array is sorted.

Key Steps

1. Divide:
Split the array into two halves until each subarray contains only one element (which is trivially sorted).

2. Conquer (Sort):
Recursively sort each subarray.

3. Combine (Merge):
Merge the sorted subarrays to produce new sorted subarrays until there is only one sorted array left.

Merge Sort Algorithm Steps

1. Base Case:
If the array has one or zero elements, it is already sorted.

2. Divide the Array:
Find the middle index and divide the array into two halves.

3. Recursive Calls:
Recursively apply merge sort to the left and right halves.

4. Merge:
Merge the two sorted halves into a single sorted array.

Example Dry Run

Consider the array: [38, 27, 43, 3, 9, 82, 10]

• Step 1: Divide
[38, 27, 43, 3, 9, 82, 10] → [38, 27, 43, 3] and [9, 82, 10]

• Step 2: Further Divide
[38, 27, 43, 3] → [38, 27] and [43, 3]; [9, 82, 10] → [9, 82] and [10]

• Step 3: Merge
Merge [38] and [27] → [27, 38]; merge [43] and [3] → [3, 43]; merge [27, 38] and [3, 43] → [3, 27, 38, 43]; merge [9] and [82] → [9, 82]; merge [9, 82] and [10] → [9, 10, 82]; finally merge [3, 27, 38, 43] and [9, 10, 82] → [3, 9, 10, 27, 38, 43, 82]

C++ Implementation

cpp

#include <iostream>
#include <vector>
using namespace std;

// Merge two sorted subarrays into a single sorted array
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;

    // Create temp arrays
    vector<int> L(n1), R(n2);
    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[mid + 1 + j];

    // Merge the temp arrays back into arr[left..right]
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy any remaining elements of L[]
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy any remaining elements of R[]
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

// Merge Sort function
void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        // Sort first and second halves
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        // Merge the sorted halves
        merge(arr, left, mid, right);
    }
}

// Utility function to print the array
void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

int main() {
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << "Original array: ";
    printArray(arr, n);

    mergeSort(arr, 0, n - 1);
    cout << "Sorted array: ";
    printArray(arr, n);
    return 0;
}

Complexity Analysis

• Time Complexity:

o Best, Average, Worst: O(n log n) for all cases, as the array is always split into halves and merged.

• Space Complexity:

o O(n) due to the temporary arrays used for merging.

Advantages of Merge Sort

• Consistent O(n log n) performance regardless of input order.

• Stable sort (preserves the order of equal elements).

• Well-suited for sorting linked lists and large datasets, and for external sorting (see the linked-list sketch below).
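To substantiate the linked-list claim, here is an illustrative sketch (not from the original notes): the merge step needs only sequential access, so on a singly linked list it can relink nodes with no temporary arrays at all. The struct and function names are hypothetical.

cpp

struct ListNode {
    int val;
    ListNode* next;
};

// Merge two sorted singly linked lists by relinking nodes; O(1) extra space.
ListNode* mergeLists(ListNode* a, ListNode* b) {
    ListNode dummy{0, nullptr};
    ListNode* tail = &dummy;
    while (a && b) {
        // <= keeps the merge stable
        if (a->val <= b->val) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b; // append whichever list still has nodes
    return dummy.next;
}

A full list merge sort would split the list in half (e.g., with slow/fast pointers), recurse on each half, and reuse this merge.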

Summary

• Merge Sort divides the array into halves, sorts each half recursively, and merges them.

• It is efficient, stable, and guarantees O(n log n) time complexity.

• The merge step is the key operation, combining two sorted arrays into one sorted array.

In conclusion:
Merge Sort is a robust, efficient, and widely-used sorting algorithm in C++, ideal for large datasets and applications requiring stable sorting.
Bubble Sort, Insertion Sort, and Selection Sort: Explanation, Example, Complexity, and C++ Code

1. Bubble Sort

Explanation:
Bubble Sort compares adjacent elements in the array and swaps them if they are in the wrong order. This process is repeated for all elements until the array is sorted. After each pass, the largest unsorted element "bubbles up" to its correct position at the end of the array.

Example:
Given array: 5, 3, 8, 4, 2

• Pass 1: (5,3)→swap → 3,5,8,4,2; (5,8)→ok; (8,4)→swap → 3,5,4,8,2; (8,2)→swap → 3,5,4,2,8

• Pass 2: (3,5)→ok; (5,4)→swap → 3,4,5,2,8; (5,2)→swap → 3,4,2,5,8

• Pass 3: (3,4)→ok; (4,2)→swap → 3,2,4,5,8

• Pass 4: (3,2)→swap → 2,3,4,5,8

C++ Code:

cpp

#include <iostream>
using namespace std;

void bubbleSort(int arr[], int n) {
    for (int pass = n - 1; pass >= 0; pass--) {
        for (int i = 0; i < pass; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
            }
        }
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    cout << "Bubble Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}

Complexity:

• Best: O(n) (if already sorted and an early-exit "no swaps" check is added; the plain version above always performs O(n²) comparisons)

• Average/Worst: O(n²)

• Space: O(1) (in-place)

2. Insertion Sort

Explanation:
Insertion Sort builds the sorted array one element at a time. It picks the next element and inserts it into its correct position among the previously sorted elements.

Example:
Given array: 5, 3, 8, 4, 2

• Step 1: 3 is inserted before 5 → 3,5,8,4,2

• Step 2: 8 is in correct place → 3,5,8,4,2

• Step 3: 4 is inserted between 3 and 5 → 3,4,5,8,2

• Step 4: 2 is inserted at the start → 2,3,4,5,8

C++ Code:

cpp

#include <iostream>
using namespace std;

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    cout << "Insertion Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}

Complexity:

• Best: O(n) (if already sorted)

• Average/Worst: O(n²)

• Space: O(1) (in-place)

3. Selection Sort

Explanation:
Selection Sort repeatedly finds the minimum element in the unsorted part and swaps it with the first unsorted element. This process continues, moving the boundary between the sorted and unsorted parts.

Example:
Given array: 5, 3, 8, 4, 2

• Step 1: Find min (2), swap with 5 → 2,3,8,4,5

• Step 2: Min is 3 (already in place) → 2,3,8,4,5

• Step 3: Min is 4, swap with 8 → 2,3,4,8,5

• Step 4: Min is 5, swap with 8 → 2,3,4,5,8

C++ Code:

cpp

#include <iostream>
using namespace std;

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIdx])
                minIdx = j;
        }
        int temp = arr[i];
        arr[i] = arr[minIdx];
        arr[minIdx] = temp;
    }
}

int main() {
    int arr[] = {5, 3, 8, 4, 2};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    cout << "Selection Sorted: ";
    for (int i = 0; i < n; i++) cout << arr[i] << " ";
    return 0;
}

Complexity:

• Best/Average/Worst: O(n²)

• Space: O(1) (in-place)

Summary Table

Algorithm | Best Case | Average Case | Worst Case | Space | Stable | Method
Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Adjacent swap
Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Insert element
Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No | Select min

Conclusion:

• Bubble Sort is simple but inefficient for large datasets.

• Insertion Sort is efficient for small or nearly sorted arrays.

• Selection Sort is conceptually simple but generally outperformed by insertion sort.

All three are mainly used for educational purposes and small datasets.


Linear Search and Binary Search: Algorithm, Example, Advantages & Limitations (C++)

Linear Search

Algorithm Steps:

1. Start from the first element of the array.

2. Compare each element with the target value.

3. If a match is found, return its position (index).

4. If the end of the array is reached without a match, return -1.

C++ Code Example:

cpp

#include <iostream>
using namespace std;

int linearSearch(int arr[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == target)
            return i; // Return index if found
    }
    return -1; // Not found
}

int main() {
    int arr[] = {4, 6, 1, 2, 5, 3};
    int n = sizeof(arr) / sizeof(arr[0]);
    int target = 4;
    int result = linearSearch(arr, n, target);
    if (result != -1)
        cout << "Element found at index: " << result << endl;
    else
        cout << "Element not found." << endl;
    return 0;
}

Example:
Array: {4, 6, 1, 2, 5, 3}
Target: 4
Output: Element found at index: 0 (first position)

Complexity:

• Best case: O(1) (first element)

• Worst/Average case: O(n)

Advantages:

• Simple and easy to implement.

• Works on both sorted and unsorted arrays.

• No extra memory required.

Limitations:

• Inefficient for large datasets.

• Time complexity grows linearly with input size.

Binary Search

Algorithm Steps:

1. Ensure the array is sorted.

2. Set two pointers: low (start) and high (end).

3. While low ≤ high:

o Calculate mid = low + (high - low) / 2.

o If arr[mid] equals the target, return mid.

o If arr[mid] < target, set low = mid + 1.

o If arr[mid] > target, set high = mid - 1.

4. If not found, return -1.

C++ Code Example:

cpp

#include <iostream>
using namespace std;

int binarySearch(int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (arr[mid] == target)
            return mid;
        else if (arr[mid] < target)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}

int main() {
    int arr[] = {2, 3, 5, 7, 11, 13, 17};
    int n = sizeof(arr) / sizeof(arr[0]);
    int target = 7;
    int result = binarySearch(arr, n, target);
    if (result != -1)
        cout << "Element found at index: " << result << endl;
    else
        cout << "Element not found." << endl;
    return 0;
}

Example:
Array: {2, 3, 5, 7, 11, 13, 17}
Target: 7
Output: Element found at index: 3

Complexity:

• Best case: O(1) (middle element)

• Worst/Average case: O(log n)

Advantages:

• Much faster than linear search for large, sorted datasets.

• Time complexity grows logarithmically.

Limitations:

• Requires the array to be sorted.

• More complex to implement.

• Not suitable for linked lists or unsorted data.

Comparison Table

Feature | Linear Search | Binary Search
Array Sorted? | Not required | Required
Time Complexity | O(n) | O(log n)
Data Structure | Any | Array (random access)
Simplicity | Very simple | More complex
Use Case | Small/unsorted data | Large/sorted data

Summary:

• Linear Search checks each element sequentially and is simple but slow for large arrays.

• Binary Search repeatedly divides the search interval in half, requiring sorted data and offering much better performance for large datasets.

Radix Sort Algorithm in C++

Algorithm Steps

Radix Sort is a non-comparative sorting algorithm that sorts integers by processing individual digits. The sorting is performed from the least significant digit (LSD) to the most significant digit (MSD), using a stable subroutine such as Counting Sort at each digit position.

Step-by-Step Algorithm

1. Find the Maximum Number:
Determine the maximum value in the array to know the number of digits to process.

2. Sort by Each Digit Place:
For each digit place (units, tens, hundreds, etc.):

o Use Counting Sort to sort the array based on the current digit.

3. Repeat:
Repeat the process for all digit places until the most significant digit.

C++ Implementation

cpp

#include <iostream>
#include <vector>
using namespace std;

// Function to get the largest element from an array
int getMax(int array[], int n) {
    int max = array[0];
    for (int i = 1; i < n; i++)
        if (array[i] > max)
            max = array[i];
    return max;
}

// Using counting sort to sort the elements based on significant places
void countSort(int array[], int size, int place) {
    const int max = 10;
    vector<int> output(size);
    int count[max] = {0};

    // Calculate count of elements
    for (int i = 0; i < size; i++)
        count[(array[i] / place) % 10]++;

    // Calculate cumulative count
    for (int i = 1; i < max; i++)
        count[i] += count[i - 1];

    // Place the elements in sorted order
    for (int i = size - 1; i >= 0; i--) {
        output[count[(array[i] / place) % 10] - 1] = array[i];
        count[(array[i] / place) % 10]--;
    }

    // Copy the output array to the original array
    for (int i = 0; i < size; i++)
        array[i] = output[i];
}

// Main function to implement radix sort
void radixSort(int array[], int size) {
    int max = getMax(array, size);
    // Apply counting sort to sort elements based on place value
    for (int place = 1; max / place > 0; place *= 10)
        countSort(array, size, place);
}

// Utility function to print the array
void display(int array[], int size) {
    for (int i = 0; i < size; i++)
        cout << array[i] << " ";
    cout << endl;
}

int main() {
    int array[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(array) / sizeof(array[0]);
    cout << "Before sorting: ";
    display(array, n);
    radixSort(array, n);
    cout << "After sorting: ";
    display(array, n);
    return 0;
}

How the Algorithm Works (Example)

Given array: {170, 45, 75, 90, 802, 24, 2, 66}

• Pass 1 (Units place):
Sorted by unit digit: {170, 90, 802, 2, 24, 45, 75, 66}

• Pass 2 (Tens place):
Sorted by tens digit: {802, 2, 24, 45, 66, 170, 75, 90}

• Pass 3 (Hundreds place):
Sorted by hundreds digit: {2, 24, 45, 66, 75, 90, 170, 802}

Final sorted array:
2 24 45 66 75 90 170 802

Time Complexity

• Best, Average, Worst: O(d · (n + k)), where d is the number of digits, n is the number of elements, and k is the range of the digit (10 for decimal numbers).

• Space Complexity: O(n + k)

Summary

• Radix Sort processes each digit of the numbers, using Counting Sort as a stable subroutine.

• It is efficient for sorting integers, especially when the number of digits is less than the number of elements.

• It avoids direct element comparisons, making it faster than comparison-based sorts in such cases.

A Binary Search Tree (BST) is a binary tree data structure in which each node has at most two children, and for every node:

• All values in the left subtree are less than the node's value.

• All values in the right subtree are greater than the node's value.

• Both left and right subtrees are themselves BSTs.

This arrangement enables efficient searching, insertion, and deletion operations, making BSTs fundamental in many applications such as dynamic sets, lookup tables, and priority queues.

Insertion in Binary Search Tree

Algorithm

1. Start at the root node.

2. Compare the value to insert with the current node's value.

3. If the value is less, move to the left child; if greater, move to the right child.

4. Repeat steps 2-3 until you reach a null pointer (empty spot).

5. Insert the new value as a leaf node at this position.

Example

Insert 15 into the BST:

text

    20
   /  \
  10    30

• 15 < 20 → go left.

• 15 > 10 → go right.

• Right child of 10 is null, insert 15 here.

C++ Code

cpp

#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left, *right;
    Node(int val) : data(val), left(NULL), right(NULL) {}
};

Node* insert(Node* root, int key) {
    if (root == NULL) return new Node(key);
    if (key < root->data)
        root->left = insert(root->left, key);
    else if (key > root->data)
        root->right = insert(root->right, key);
    return root;
}

Deletion in Binary Search Tree

Algorithm

To delete a node with value key from a BST:

1. Start at the root.

2. Search for the node to delete:

o If key < node's value, go left.

o If key > node's value, go right.

o If key == node's value, node found.

3. Handle three cases:

o No children (leaf): Remove the node.

o One child: Replace the node with its child.

o Two children:

▪ Find the node's inorder successor (smallest value in right subtree).

▪ Replace the node's value with the successor's value.

▪ Delete the successor node.

Example

Delete 10 from:

text

    20
   /  \
  10    30
          \
           40

• 10 has no children: remove the node.

• If 10 had one child, replace 10 with its child.

• If 10 had two children, replace 10 with its inorder successor.

C++ Code

cpp

Node* findMin(Node* node) {
    while (node->left != NULL)
        node = node->left;
    return node;
}

Node* deleteNode(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data)
        root->left = deleteNode(root->left, key);
    else if (key > root->data)
        root->right = deleteNode(root->right, key);
    else {
        // Node with only one child or no child
        if (root->left == NULL) {
            Node* temp = root->right;
            delete root;
            return temp;
        }
        else if (root->right == NULL) {
            Node* temp = root->left;
            delete root;
            return temp;
        }
        // Node with two children
        Node* temp = findMin(root->right);
        root->data = temp->data;
        root->right = deleteNode(root->right, temp->data);
    }
    return root;
}

Detailed Steps and Explanations

Insertion Steps

• Begin at the root.

• Traverse left or right based on comparison.

• Insert at the first null position found.

• This maintains the BST property.

Deletion Steps

• Locate the node to delete.

• If the node is a leaf, simply remove it.

• If the node has one child, link its parent to its child.

• If the node has two children:

o Find the inorder successor (leftmost node in the right subtree).

o Replace the node's value with the successor's value.

o Delete the successor node (which will have at most one child).

Complete Example Program

cpp

#include <iostream>
using namespace std;

struct Node {
    int data;
    Node *left, *right;
    Node(int val) : data(val), left(NULL), right(NULL) {}
};

Node* insert(Node* root, int key) {
    if (root == NULL) return new Node(key);
    if (key < root->data)
        root->left = insert(root->left, key);
    else if (key > root->data)
        root->right = insert(root->right, key);
    return root;
}

Node* findMin(Node* node) {
    while (node->left != NULL)
        node = node->left;
    return node;
}

Node* deleteNode(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data)
        root->left = deleteNode(root->left, key);
    else if (key > root->data)
        root->right = deleteNode(root->right, key);
    else {
        if (root->left == NULL) {
            Node* temp = root->right;
            delete root;
            return temp;
        }
        else if (root->right == NULL) {
            Node* temp = root->left;
            delete root;
            return temp;
        }
        Node* temp = findMin(root->right);
        root->data = temp->data;
        root->right = deleteNode(root->right, temp->data);
    }
    return root;
}

void inorder(Node* root) {
    if (root != NULL) {
        inorder(root->left);
        cout << root->data << " ";
        inorder(root->right);
    }
}

int main() {
    Node* root = NULL;
    root = insert(root, 50);
    root = insert(root, 30);
    root = insert(root, 20);
    root = insert(root, 40);
    root = insert(root, 70);
    root = insert(root, 60);
    root = insert(root, 80);

    cout << "Inorder traversal: ";
    inorder(root);
    cout << endl;

    root = deleteNode(root, 20);
    cout << "After deleting 20: ";
    inorder(root);
    cout << endl;

    root = deleteNode(root, 30);
    cout << "After deleting 30: ";
    inorder(root);
    cout << endl;

    root = deleteNode(root, 50);
    cout << "After deleting 50: ";
    inorder(root);
    cout << endl;

    return 0;
}

Summary Table

Operation | Steps | Time Complexity (avg/worst)
Insertion | Traverse, compare, insert | O(log n) / O(n)
Deletion | Search, handle the three cases | O(log n) / O(n)

Conclusion

• A Binary Search Tree efficiently supports dynamic set operations such as search, insert, and delete.

• Insertion places new nodes as leaves, maintaining the BST property.

• Deletion handles three cases: leaf, one child, two children, with special handling for the latter using the inorder successor.

• BSTs are widely used due to their efficient average-case performance and clear structure for ordered data.

A thread in computer science generally refers to a lightweight process or a sequence of executable instructions within a program that can run independently and concurrently with other threads. However, in the context of trees (specifically, binary trees), a thread has a different meaning: it refers to a special pointer used to make tree traversal more efficient, particularly for in-order traversal, by replacing some NULL pointers with pointers to in-order predecessor or successor nodes.

Advantages and Disadvantages of Threads

General Multithreading (Software Threads)

Advantages:

• Improved Performance and Concurrency: Threads allow multiple operations to run in parallel, making better use of CPU resources and improving program responsiveness.

• Resource Sharing: Threads within the same process share memory and resources, enabling efficient communication.

• Better Responsiveness: Useful for interactive applications, as one thread can handle user input while others perform background tasks.

• Simplified Modeling: Natural fit for tasks that can be performed concurrently, such as handling multiple clients in a server.

Disadvantages:

• Complexity: Multithreaded programs are harder to design, debug, and maintain due to risks like deadlocks and race conditions.

• Synchronization Overhead: Managing access to shared resources requires careful synchronization, which can degrade performance.

• Resource Consumption: Each thread uses system resources; too many threads can exhaust memory or CPU time.

• Potential for Bugs: Issues like race conditions and deadlocks are difficult to detect and fix.
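The summary table at the end of this section mentions std::thread from <thread>; the following minimal sketch is added for illustration (not from the original notes). Two software threads increment a shared counter, with a mutex providing the synchronization discussed above.

cpp

#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

int counter = 0;
mutex m;

void work(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        lock_guard<mutex> lock(m); // synchronize access to the shared counter
        ++counter;
    }
}

int main() {
    thread t1(work, 100000);
    thread t2(work, 100000);
    t1.join();
    t2.join();
    // Without the mutex, the final value would be unpredictable (a race condition)
    cout << "counter = " << counter << endl; // 200000
    return 0;
}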

Threads in Trees (Threaded Binary Trees)

Advantages:

• Faster Traversal: Threaded binary trees allow in-order traversal without recursion or a stack, saving both time and space.

• Efficient Use of Memory: Utilizes otherwise unused NULL pointers to store threads (pointers to the in-order predecessor or successor).

• Simplifies Traversal Algorithms: Makes traversal algorithms more straightforward and efficient.

Disadvantages:

• Complex Implementation: Insertion and deletion operations become more complex due to the need to maintain thread pointers.

• Limited Use Cases: Mainly beneficial for traversal; not as widely used as standard binary trees.

• Overhead: Additional logic is required to distinguish between child pointers and thread pointers.

How Threads are Implemented in Trees

A threaded binary tree modifies the standard binary tree structure:

• In a normal binary tree, many left or right child pointers are NULL (especially in leaves).

• In a threaded binary tree, these NULL pointers are replaced with threads pointing to the node's in-order predecessor or successor, facilitating traversal.

Types of Threaded Binary Trees

• Single Threaded: Only left or right NULL pointers are replaced with threads (usually right).

• Double Threaded: Both left and right NULL pointers are replaced with threads (to predecessor and successor, respectively).

Example: Threaded Binary Tree Insertion and Traversal

Node Structure in C++

cpp

class Node {
public:
    int key;
    Node *left, *right;
    bool leftThread, rightThread;
    Node(int val) : key(val), left(nullptr), right(nullptr), leftThread(true), rightThread(true) {}
};

Insertion (Right Threaded Example)

cpp

Node* insert(Node* root, int key) {
    Node* ptr = root;
    Node* parent = nullptr;

    while (ptr != nullptr) {
        if (key == ptr->key) {
            // Duplicate keys not allowed
            return root;
        }
        parent = ptr;
        if (key < ptr->key) {
            if (!ptr->leftThread)
                ptr = ptr->left;
            else
                break;
        } else {
            if (!ptr->rightThread)
                ptr = ptr->right;
            else
                break;
        }
    }

    Node* newNode = new Node(key);
    if (parent == nullptr) {
        root = newNode;
        newNode->left = newNode->right = nullptr;
        newNode->leftThread = newNode->rightThread = true;
    } else if (key < parent->key) {
        newNode->left = parent->left;
        newNode->right = parent;
        parent->leftThread = false;
        parent->left = newNode;
    } else {
        newNode->left = parent;
        newNode->right = parent->right;
        parent->rightThread = false;
        parent->right = newNode;
    }
    return root;
}

In-Order Traversal Without Recursion or Stack

cpp

void inorder(Node* root) {
    Node* cur = root;
    while (cur != nullptr && !cur->leftThread)
        cur = cur->left;

    while (cur != nullptr) {
        std::cout << cur->key << " ";
        if (cur->rightThread)
            cur = cur->right;
        else {
            cur = cur->right;
            while (cur != nullptr && !cur->leftThread)
                cur = cur->left;
        }
    }
}

Summary Table

Aspect | Software Threads (Multithreading) | Threads in Trees (Threaded Binary Trees)
Purpose | Parallel/concurrent execution | Efficient tree traversal
Advantages | Performance, responsiveness, resource sharing | Fast traversal, no stack/recursion needed
Disadvantages | Complexity, bugs, synchronization | Complex insert/delete, less common
C++ Example | std::thread from <thread> | Special pointers in tree nodes

Conclusion

• Threads in general enable concurrency in programs, but in tree data structures, threading refers to using otherwise unused pointers to facilitate efficient traversal.

• Threaded binary trees are a specialized structure that optimizes in-order traversal, eliminating the need for recursion or a stack, at the cost of more complex insertion and deletion logic.

• C++ implementations involve marking whether a left/right pointer is a thread and updating these pointers during insertion and traversal.
} else {
What Are Threads in a BST?
newNode->left = parent;
In a standard binary search tree, each node has pointers to its left and right
newNode->right = parent->right; children. If a child does not exist, the corresponding pointer is NULL. In a
threaded binary tree, these NULL pointers are replaced with threads-special
parent->rightThread = false; pointers to a node’s in-order predecessor or successor. This modification
enables fast and efficient in-order traversal without using recursion or a
parent->right = newNode;
stack12457.
}
How Do Threads Work?
return root;
• Right Thread: If a node’s right child is NULL, its right pointer is used to
} point to its in-order successor.

In-Order Traversal Without Recursion or Stack • Left Thread: If a node’s left child is NULL, its left pointer is used to
point to its in-order predecessor.
cpp

void inorder(Node* root) { • Single Threaded: Only one of the above (usually right) is
implemented.
Node* cur = root;
• Double Threaded: Both left and right threads are implemented27.
while (cur != nullptr && !cur->leftThread)
A boolean flag in each node indicates whether the pointer is a traditional child
cur = cur->left; link or a thread2.

Why Use Threads in a BST?

while (cur != nullptr) {


• Efficient Traversal:
std::cout << cur->key << " "; In a normal BST, in-order traversal requires recursion or an explicit
stack to keep track of nodes. With threads, traversal can be performed
if (cur->rightThread) in linear time and constant space, moving directly to the next node in
sequence1257.
cur = cur->right;
• Memory Utilization:
else { Utilizes otherwise unused NULL pointers, making the structure more
memory-efficient5.
cur = cur->right;

while (cur != nullptr && !cur->leftThread) • No Extra Space for Stack:


Eliminates the need for extra memory during traversal, which is
cur = cur->left; especially beneficial for large trees27.

} Example

} Consider a BST:

} text

Summary Table 20

/ \

10 30
\ Feature Standard BST Threaded BST

40
Null pointers Many Replaced by threads
In-order Traversal: 10, 20, 30, 40
In-order traversal Needs recursion/stack No recursion/stack needed
• In a threaded BST:
Memory use Less efficient More efficient (no wasted pointers)
o The right pointer of 10 (which would be NULL) points to 20
(its in-order successor). Traversal speed Slower (extra space) Faster (constant space)

o The left pointer of 30 (which would be NULL) points to 20 Conclusion


(its in-order predecessor).
Threads in a binary search tree transform otherwise unused NULL pointers into
o The right pointer of 30 points to 40 (its child), but the right direct links to in-order predecessor or successor nodes. This enables efficient,
pointer of 40 (which would be NULL) points to the root or stackless, and recursive-free in-order traversal, making the tree more memory
next in the traversal.

This setup allows you to traverse the tree in-order by simply following child
pointers and threads, without recursion or stack257. What is an AVL Tree?

C++ Example: Node Structure and In-Order Traversal An AVL tree is a self-balancing binary search tree (BST), named after its inventors
Adelson-Velsky and Landis1235. In an AVL tree, the heights of the two child
cpp subtrees of any node differ by at most one. If at any time they differ by more than
one, rebalancing is done to restore this property. This ensures that the tree
#include <iostream> remains approximately balanced, guaranteeing O(log⁡n)O(\log n)O(logn) time
complexity for search, insertion, and deletion operations2357.
using namespace std;
Balance Factor:
For any node,
class Node {
Balance Factor=Height of Left Subtree−Height of Right Subtree\text{Balance
public: Factor} = \text{Height of Left Subtree} - \text{Height of Right
Subtree}Balance Factor=Height of Left Subtree−Height of Right Subtree
int key;
The balance factor must be -1, 0, or +1 for all nodes in an AVL tree5.
Node *left, *right;
How to Insert a Node into an AVL Tree
bool leftThread, rightThread;
Insertion Steps
Node(int val) : key(val), left(nullptr), right(nullptr), leftThread(true),
rightThread(true) {} 1. Standard BST Insertion:
Insert the new node as you would in a standard BST3689.
};
2. Update Heights:
Update the height of each ancestor node.

// Find the leftmost node 3. Check Balance Factor:


For each ancestor, check the balance factor.
Node* leftmost(Node* node) {
4. Rebalance if Needed:
while (node && !node->leftThread) If the balance factor becomes less than -1 or greater than +1, perform
rotations to restore balance3689.
node = node->left;
Rotations
return node;
There are four cases where the tree can become unbalanced after insertion:
}

• Left Left (LL) Case:


New node inserted into the left subtree of the left child.
// In-order traversal using threads Solution: Right rotation.

void inorder(Node* root) { • Right Right (RR) Case:


New node inserted into the right subtree of the right child.
Node* cur = leftmost(root);
Solution: Left rotation.
while (cur) {
• Left Right (LR) Case:
cout << cur->key << " "; New node inserted into the right subtree of the left child.
Solution: Left rotation on left child, then right rotation on current node.
if (cur->rightThread)
• Right Left (RL) Case:
cur = cur->right;
New node inserted into the left subtree of the right child.
else Solution: Right rotation on right child, then left rotation on current
node.
cur = leftmost(cur->right);
Algorithm for Insertion in an AVL Tree
}
1. Insert the new node using standard BST logic.
}
2. Update the height of the current node.
This example assumes the tree is already threaded.
3. Calculate the balance factor.
Summary Table
4. If the node becomes unbalanced:
o If balance > 1 and key < left child key: Right rotation (LL) Node* insert(Node* node, int key) {

o If balance < -1 and key > right child key: Left rotation (RR) // 1. Perform normal BST insertion

o If balance > 1 and key > left child key: Left-Right rotation if (!node) return new Node(key);
(LR)
if (key < node->key)
o If balance < -1 and key < right child key: Right-Left rotation
node->left = insert(node->left, key);
(RL)689.
else if (key > node->key)
C++ Code for AVL Tree Insertion
node->right = insert(node->right, key);
cpp
else // Duplicate keys not allowed
#include <iostream>
return node;
#include <algorithm>

using namespace std;


// 2. Update height

node->height = 1 + max(height(node->left), height(node->right));


struct Node {

int key, height;


// 3. Get balance factor
Node *left, *right;
int balance = getBalance(node);
Node(int val) : key(val), height(1), left(nullptr), right(nullptr) {}

};
// 4. Balance the node if needed

int height(Node* n) {
// Left Left Case
return n ? n->height : 0;
if (balance > 1 && key < node->left->key)
}
return rightRotate(node);

int getBalance(Node* n) {
// Right Right Case
return n ? height(n->left) - height(n->right) : 0;
if (balance < -1 && key > node->right->key)
}
return leftRotate(node);

Node* rightRotate(Node* y) {
// Left Right Case
Node* x = y->left;
if (balance > 1 && key > node->left->key) {
Node* T2 = x->right;
node->left = leftRotate(node->left);
x->right = y;
return rightRotate(node);
y->left = T2;
}
y->height = max(height(y->left), height(y->right)) + 1;

x->height = max(height(x->left), height(x->right)) + 1;


// Right Left Case
return x;
if (balance < -1 && key < node->right->key) {
}
node->right = rightRotate(node->right);

return leftRotate(node);
Node* leftRotate(Node* x) {
}
Node* y = x->right;

Node* T2 = y->left;
return node;
y->left = x;
}
x->right = T2;

x->height = max(height(x->left), height(x->right)) + 1;


// Utility function for in-order traversal
y->height = max(height(y->left), height(y->right)) + 1;
void inorder(Node* root) {
return y;
if (root) {
}
inorder(root->left);
cout << root->key << " "; This ensures efficient operations-search, insertion, and deletion-all in
O(log⁡n)O(\log n)O(logn) time347.
inorder(root->right);
1. Search Operation in AVL Tree
}
Algorithm:
}
• Start at the root.

int main() { • Compare the target key with the current node's key:

Node* root = nullptr; o If equal, the node is found.

root = insert(root, 10); o If less, move to the left child.

root = insert(root, 20); o If greater, move to the right child.

root = insert(root, 30); • Repeat until the node is found or a null pointer is reached (not found).

root = insert(root, 40); C++ Example:

root = insert(root, 50); cpp

root = insert(root, 25); struct Node {

int key, height;

cout << "Inorder traversal of the AVL tree: "; Node *left, *right;

inorder(root); Node(int val) : key(val), height(1), left(nullptr), right(nullptr) {}

cout << endl; };

return 0;

} Node* search(Node* root, int key) {

Summary if (root == nullptr || root->key == key)

• An AVL tree is a self-balancing BST where the height difference return root;
(balance factor) between left and right subtrees is at most 1 for every
node1235. if (key < root->key)

return search(root->left, key);


• Insertion is like BST insertion, followed by updating heights and
rebalancing the tree using rotations if necessary35689. else

• This guarantees efficient operations with O(log⁡n)O(\log n)O(logn) return search(root->right, key);
time complexity.
}
References:
1235689 Usage:
Call search(root, key). Returns pointer to the node if found, or nullptr if not
Citations: found146.

1. https://www.w3schools.com/dsa/dsa_data_avltrees.php 2. Deletion Operation in AVL Tree

2. https://herovired.com/learning-hub/blogs/avl-tree/ Algorithm:

3. https://en.wikipedia.org/wiki/AVL_tree 1. Standard BST Deletion:

4. https://blog.heycoach.in/properties-of-avl-trees/ o Find the node to delete.

5. https://www.programiz.com/dsa/avl-tree o If the node has one or zero children, remove it directly.

6. https://www.scholarhat.com/tutorial/datastructures/avl-tree-in-data- o If the node has two children, find its in-order successor
structures (smallest in right subtree), copy its value, and delete the
successor.
7. https://www.wscubetech.com/resources/dsa/avl-tree
2. Update Heights:
8. https://www.tutorialspoint.com/data_structures_algorithms/avl_tree_
After deletion, update the height of each ancestor node.
algorithm.htm
3. Balance the Tree:
9. https://ebooks.inflibnet.ac.in/csp01/chapter/insertion-and-deletion-
Check the balance factor for each ancestor:
avl-trees/
o If unbalanced (balance factor > 1 or < -1), perform
appropriate rotations:
Answer from Perplexity: https://www.perplexity.ai/search/summarize-what-are-
the-differe-61ecHbEzRQWNf4W.Ywftlg?login-source=signupButton&login- ▪ Left Left (LL): Right rotation.
new=false&utm_source=copy_output
▪ Right Right (RR): Left rotation.
Search and Deletion Operations in AVL Tree (C++)
▪ Left Right (LR): Left rotation on left child, then
Overview: right rotation.

An AVL tree is a self-balancing binary search tree where the difference in heights ▪ Right Left (RL): Right rotation on right child, then
(balance factor) between left and right subtrees is at most one for every node. left rotation246.
C++ Example:

cpp

int height(Node* n) {
    return n ? n->height : 0;
}

int getBalance(Node* n) {
    return n ? height(n->left) - height(n->right) : 0;
}

Node* rightRotate(Node* y) {
    Node* x = y->left;
    Node* T2 = x->right;
    x->right = y;
    y->left = T2;
    y->height = std::max(height(y->left), height(y->right)) + 1;
    x->height = std::max(height(x->left), height(x->right)) + 1;
    return x;
}

Node* leftRotate(Node* x) {
    Node* y = x->right;
    Node* T2 = y->left;
    y->left = x;
    x->right = T2;
    x->height = std::max(height(x->left), height(x->right)) + 1;
    y->height = std::max(height(y->left), height(y->right)) + 1;
    return y;
}

Node* minValueNode(Node* node) {
    Node* current = node;
    while (current->left != nullptr)
        current = current->left;
    return current;
}

Node* deleteNode(Node* root, int key) {
    // Standard BST delete
    if (root == nullptr) return root;
    if (key < root->key)
        root->left = deleteNode(root->left, key);
    else if (key > root->key)
        root->right = deleteNode(root->right, key);
    else {
        // Node with one child or no child
        if ((root->left == nullptr) || (root->right == nullptr)) {
            Node* temp = root->left ? root->left : root->right;
            if (temp == nullptr) {
                temp = root;
                root = nullptr;
            } else
                *root = *temp;
            delete temp;
        } else {
            // Node with two children
            Node* temp = minValueNode(root->right);
            root->key = temp->key;
            root->right = deleteNode(root->right, temp->key);
        }
    }
    if (root == nullptr) return root;

    // Update height
    root->height = 1 + std::max(height(root->left), height(root->right));

    // Balance the node
    int balance = getBalance(root);

    // Left Left
    if (balance > 1 && getBalance(root->left) >= 0)
        return rightRotate(root);

    // Left Right
    if (balance > 1 && getBalance(root->left) < 0) {
        root->left = leftRotate(root->left);
        return rightRotate(root);
    }

    // Right Right
    if (balance < -1 && getBalance(root->right) <= 0)
        return leftRotate(root);

    // Right Left
    if (balance < -1 && getBalance(root->right) > 0) {
        root->right = rightRotate(root->right);
        return leftRotate(root);
    }

    return root;
}
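To tie the pieces together, here is a minimal, hedged usage sketch (not part of the original answer). It assumes the insert() function from the earlier AVL insertion example is also available to build the tree:

cpp

// Hedged driver sketch: exercises search() and deleteNode() from above.
// Assumes insert() from the earlier AVL insertion example is in scope.
int main() {
    Node* root = nullptr;
    for (int k : {10, 20, 30, 40, 50, 25})
        root = insert(root, k);

    root = deleteNode(root, 40); // delete and rebalance
    cout << (search(root, 40) ? "found" : "not found") << endl; // "not found"
    cout << (search(root, 25) ? "found" : "not found") << endl; // "found"
    return 0;
}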

Usage:
Call deleteNode(root, key) to delete a node and maintain AVL balance [1][2][4][6].

Summary Table

Operation | Steps                               | Rotations      | Time Complexity
Search    | Compare & traverse left/right       | None           | O(log n)
Deletion  | BST delete, update heights, balance | LL, RR, LR, RL | O(log n)

Conclusion

• Search in an AVL tree is identical to a standard BST, always O(log n) due to balancing [3][4][7].

• Deletion involves standard BST deletion followed by updating heights and rebalancing using rotations to maintain the AVL property [2][4][6].

• Both operations are efficient and guarantee logarithmic time due to the strict balancing of AVL trees.

Citations:

1. https://github.com/KhaledAshrafH/AVL-Tree
2. https://www.tutorialspoint.com/cplusplus-program-to-implement-avl-tree
3. https://www.programiz.com/dsa/avl-tree
4. https://github.com/KadirEmreOto/AVL-Tree
5. https://runestone.academy/ns/books/published/cppds/Trees/AVLTreeImplementation.html
6. https://www.youtube.com/watch?v=2ScmZ0_dxJc
7. https://www.w3schools.com/dsa/dsa_data_avltrees.php
8. https://www.javaguides.net/2023/08/cpp-program-to-implement-avl-tree.html
B-Trees provide an efficient, balanced structure for organizing and accessing
6. Short Notes on B-Tree

A B-Tree is a self-balancing search tree in which each node can contain multiple keys and can have more than two children. It is widely used in databases and file systems to efficiently manage large blocks of data that cannot fit entirely in memory.

Key Properties of B-Tree:

• Order: If the order of the B-tree is n, each node can have at most n children and n-1 keys.

• Balanced: All leaves are at the same depth (height).

• Node Capacity: Each node (except the root) must have at least ⌈n/2⌉ children.

• Root: The root must have at least 2 children if it is not a leaf.

• Key Order: Keys in each node are stored in increasing order.

• Efficient Operations: Search, insertion, and deletion are all performed in O(log n) time.

Applications:

• Used in database indexing and file systems due to efficient disk access.

• Suitable for systems that read and write large blocks of data.

C++ Example Structure:

cpp

#include <iostream>
using namespace std;

class BTreeNode {
public:
    int *keys;       // Keys stored in this node
    int t;           // Minimum degree
    BTreeNode **C;   // Child pointers
    int n;           // Current number of keys
    bool leaf;       // True if this node is a leaf

    BTreeNode(int t1, bool leaf1);
    void traverse();
    BTreeNode *search(int k);
};

BTreeNode::BTreeNode(int t1, bool leaf1) {
    t = t1;
    leaf = leaf1;
    keys = new int[2 * t - 1];
    C = new BTreeNode *[2 * t];
    n = 0;
}

// Insert, search, and traversal methods would be implemented here

Conclusion:
B-Trees provide an efficient, balanced structure for organizing and accessing large datasets, especially when disk I/O is a concern. Their ability to maintain balance and allow multiple keys per node makes them ideal for database and filesystem implementations.
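The class above only declares search(). As a hedged sketch (not from the original document), the standard top-down B-Tree search could be filled in like this:

cpp

// Sketch of BTreeNode::search, assuming the members declared above.
BTreeNode* BTreeNode::search(int k) {
    int i = 0;
    while (i < n && k > keys[i])
        i++;
    if (i < n && keys[i] == k)
        return this;        // key found in this node
    if (leaf)
        return nullptr;     // reached a leaf without finding k
    return C[i]->search(k); // descend into the appropriate child
}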
7. General Tree and Conversion to Binary Tree

General Tree

A General Tree is a hierarchical data structure where each node can have any number of children, making it highly flexible for representing complex relationships (e.g., file systems, organizational charts).

Characteristics:

• No restriction on the number of children per node.

• Nodes can have zero or more children.

• Used for representing hierarchical data with variable branching.

C++ Example Structure:

cpp

#include <iostream>
#include <vector>
using namespace std;

class GenTreeNode {
public:
    int data;
    vector<GenTreeNode*> children;
    GenTreeNode(int val) : data(val) {}
};

Conversion of General Tree to Binary Tree

Purpose:
To represent a general tree using a binary tree structure, which simplifies storage and traversal using standard binary tree algorithms.

Steps for Conversion:

1. Left-Child:
For each node, keep its first (leftmost) child as the left child in the binary tree.

2. Right-Sibling:
For each node, link its immediate right sibling as the right child in the binary tree.

3. Remove Other Children:
All other children (other than the first) are linked as a chain through the right child pointers.

Result:
Each node in the binary tree has at most two children:

• The left child points to its first child in the general tree.

• The right child points to its next sibling.

C++ Example:

cpp

// General Tree Node
class GenTreeNode {
public:
    int data;
    vector<GenTreeNode*> children;
    GenTreeNode(int val) : data(val) {}
};

// Binary Tree Node (Left-Child Right-Sibling)
class BinTreeNode {
public:
    int data;
    BinTreeNode *left, *right;
    BinTreeNode(int val) : data(val), left(nullptr), right(nullptr) {}
};

// Conversion function
BinTreeNode* convertToBinary(GenTreeNode* root) {
    if (!root) return nullptr;
    BinTreeNode* bRoot = new BinTreeNode(root->data);
    if (!root->children.empty())
        bRoot->left = convertToBinary(root->children[0]);
    BinTreeNode* curr = bRoot->left;
    for (size_t i = 1; i < root->children.size(); ++i) {
        curr->right = convertToBinary(root->children[i]);
        curr = curr->right;
    }
    return bRoot;
}

Example Illustration:
Suppose a general tree node A has children B, C, D:

text

A
├── B
├── C
└── D

After conversion:

text

A
/
B
\
 C
  \
   D

• B is the left child of A.

• C is the right child of B.

• D is the right child of C.

Summary Table: General Tree vs. Binary Tree

Aspect     | General Tree                            | Binary Tree
Children   | Any number                              | At most two (left and right)
Structure  | Flexible, unordered                     | Strict, ordered (left/right)
Use Cases  | File systems, org charts                | Search trees, expression trees
Conversion | Left-Child Right-Sibling representation | N/A

In summary:

• B-Trees are balanced, multi-way search trees ideal for large-scale storage and retrieval.

• General Trees allow any number of children per node; they can be systematically converted to binary trees using the left-child, right-sibling method for easier processing and storage.

Huffman Algorithm: Explanation and C++ Example

What is Huffman Coding?
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters, with shorter codes for more frequent characters and longer codes for less frequent ones. The main steps are:

1. Build a frequency table for all characters.

2. Construct a Huffman Tree using a priority queue (min-heap) based on frequencies.

3. Generate Huffman Codes by traversing the tree.

4. Encode the input using these codes.

This algorithm is widely used in compression formats like ZIP and JPEG.

Huffman Coding Algorithm Steps

1. Count the frequency of each character in the input.

2. Create a leaf node for each character and build a min-heap of all leaf nodes.

3. While there is more than one node in the heap:

   o Remove the two nodes with the lowest frequency.

   o Create a new internal node with these two nodes as children and frequency equal to the sum of their frequencies.

   o Insert the new node back into the min-heap.

4. The remaining node is the root of the Huffman Tree.

5. Traverse the tree to assign codes: left edge as '0', right edge as '1'.

Example

Suppose the input is:
A:5, B:9, C:12, D:13, E:16, F:45

• The most frequent character (F) gets the shortest code.

• The least frequent (A) gets the longest code.

C++ Implementation

cpp

#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
#include <string>
using namespace std;

// Node structure for the Huffman Tree
struct Node {
    char character;
    int frequency;
    Node *left, *right;
    Node(char c, int f) : character(c), frequency(f), left(nullptr), right(nullptr) {}
};

// Comparator for the priority queue (min-heap)
struct Compare {
    bool operator()(Node* l, Node* r) {
        return l->frequency > r->frequency;
    }
};

// Build the Huffman Tree
Node* buildHuffmanTree(const unordered_map<char, int>& freq) {
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (const auto& pair : freq) {
        pq.push(new Node(pair.first, pair.second));
    }
    while (pq.size() > 1) {
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();
        Node *newNode = new Node('\0', left->frequency + right->frequency);
        newNode->left = left;
        newNode->right = right;
        pq.push(newNode);
    }
    return pq.top();
}

// Generate Huffman Codes
void generateCodes(Node* root, const string& str, unordered_map<char, string>& huffmanCode) {
    if (!root) return;
    if (!root->left && !root->right) {
        huffmanCode[root->character] = str;
    }
    generateCodes(root->left, str + "0", huffmanCode);
    generateCodes(root->right, str + "1", huffmanCode);
}

int main() {
    unordered_map<char, int> freq = {{'A', 5}, {'B', 9}, {'C', 12}, {'D', 13}, {'E', 16}, {'F', 45}};
    Node* root = buildHuffmanTree(freq);

    unordered_map<char, string> huffmanCode;
    generateCodes(root, "", huffmanCode);

    cout << "Huffman Codes:\n";
    for (const auto& pair : huffmanCode) {
        cout << pair.first << ": " << pair.second << endl;
    }
    return 0;
}

Sample Output:

text

Huffman Codes:
F: 0
C: 100
D: 101
A: 1100
B: 1101
E: 111

Explanation of Output

• F (most frequent) has the shortest code: 0

• A (least frequent) has the longest code: 1100

• Each code is a unique prefix, so the encoding is unambiguous.

Key Points

• Time Complexity: O(n log n), where n is the number of unique characters [4].

• Space Complexity: O(n) for the tree and code table.

• Advantage: Produces optimal prefix codes for lossless compression.

In summary:
Huffman coding efficiently compresses data by assigning shorter codes to frequent characters. The algorithm builds a binary tree using a min-heap and generates codes by traversing the tree. The provided C++ code demonstrates the full process from frequency table to code generation [1][3][4][6].

Citations:

1. https://www.programiz.com/dsa/huffman-coding
2. https://gist.github.com/pwxcoo/72d7d3c5c3698371c21e486722f9b34b
3. https://www.w3schools.com/dsa/dsa_ref_huffman_coding.php
4. https://www.tutorialspoint.com/huffman-coding
5. https://github.com/cynricfu/huffman-coding
6. https://blog.heycoach.in/huffman-encoding-decoding-in-c/
7. https://www.studytonight.com/data-structures/huffman-coding
8. https://iamshnoo.github.io/huffman/index.html?amp=1
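The section above covers encoding only. As a hedged sketch (not part of the original answer), decoding walks the same tree bit by bit, emitting one character every time a leaf is reached; it assumes the Node struct and tree built by the code above:

cpp

// Decode a bit string by walking the Huffman tree built above.
string decode(Node* root, const string& bits) {
    string out;
    Node* curr = root;
    for (char b : bits) {
        curr = (b == '0') ? curr->left : curr->right;
        if (!curr->left && !curr->right) { // reached a leaf: one symbol decoded
            out += curr->character;
            curr = root;                   // restart from the root for the next symbol
        }
    }
    return out;
}

For example, with the sample codes above, decode(root, "1100") would return "A". (A degenerate single-character tree would need special handling.)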
M-way Search Tree: Definition, Operations, and C++ Implementation

What is an M-way Search Tree?

An m-way search tree (or multi-way search tree) is a generalization of the binary search tree (BST) where each node can have up to m children and contains up to m-1 keys. The keys within each node are kept in sorted order, and the child pointers partition the key space so that:

• All keys in the first child are less than the first key,

• Keys in the i-th child are between the (i-1)-th and i-th key,

• All keys in the last child are greater than the last key.

This structure reduces the height of the tree, making search, insertion, and deletion more efficient, especially for large datasets.

Structure of an M-way Search Tree Node (C++ Example)

cpp

const int M = 4; // Example: 4-way search tree

struct Node {
    int count;              // Number of keys in the node
    int keys[M];            // Array of keys (max M-1 keys)
    Node* children[M + 1];  // Array of child pointers (max M children)

    Node() : count(0) {
        for (int i = 0; i <= M; ++i) children[i] = nullptr;
    }
};

Searching in an M-way Search Tree

Algorithm:

1. At each node, compare the target value with the keys in the node.

2. If the value matches a key, return success.

3. Otherwise, determine the correct child pointer to follow (based on key intervals) and recurse.

4. If a null child is reached, the value is not in the tree.

C++ Code:

cpp

Node* search(Node* root, int key) {
    if (!root) return nullptr;
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i])
        return root; // Key found
    return search(root->children[i], key);
}

Insertion in an M-way Search Tree

Algorithm:

1. Search for the correct leaf node where the new key should be inserted.

2. If the node has fewer than m-1 keys, insert the key at the correct position.

3. If the node is full, split the node:

   o Promote the median key to the parent.

   o Split the node into two nodes, distributing keys and children.

   o If the parent is also full, split recursively up to the root, possibly creating a new root.

C++ Code (Simplified, without splitting for brevity):

cpp

void insert(Node* &root, int key) {
    if (!root) {
        root = new Node();
        root->keys[0] = key;
        root->count = 1;
        return;
    }
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i])
        return; // Duplicate key, do nothing
    if (!root->children[i]) {
        // Insert key in this node if space is available
        if (root->count < M - 1) {
            for (int j = root->count; j > i; --j)
                root->keys[j] = root->keys[j - 1];
            root->keys[i] = key;
            root->count++;
        } else {
            // Node splitting logic would go here (not shown for brevity)
        }
    } else {
        insert(root->children[i], key);
    }
}

Note: Full implementation would include node splitting when a node is full.
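As a hedged sketch of the splitting step the note refers to (not in the original document), a full node can be split around its median key; the caller would then insert medianKey, with newRight as its right child, into the parent node:

cpp

// Splits a full node around its median key (sketch only, using the Node
// struct above). medianKey and newRight are output parameters.
void splitNode(Node* full, int& medianKey, Node*& newRight) {
    int mid = full->count / 2;          // index of the median key
    medianKey = full->keys[mid];

    newRight = new Node();
    // Move keys (and their left children) after the median to the new node.
    for (int j = mid + 1; j < full->count; ++j) {
        newRight->keys[newRight->count] = full->keys[j];
        newRight->children[newRight->count] = full->children[j];
        newRight->count++;
    }
    newRight->children[newRight->count] = full->children[full->count];
    full->count = mid;                  // keys before the median stay in place
}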

Deletion in an M-way Search Tree

Algorithm:

1. Search for the key to be deleted.

2. If the key is in a leaf node, remove it directly.

3. If the key is in an internal node:

   o Replace it with either its in-order predecessor (largest in left subtree) or successor (smallest in right subtree), and then delete that value from the child node.

4. If a node falls below the minimum number of keys, borrow a key from a sibling or merge nodes as needed to maintain tree properties.

C++ Code (Simplified):

cpp

void deleteKey(Node* &root, int key) {
    if (!root) return;
    int i = 0;
    while (i < root->count && key > root->keys[i])
        i++;
    if (i < root->count && key == root->keys[i]) {
        // Key found
        if (!root->children[i]) {
            // Leaf node: remove key
            for (int j = i; j < root->count - 1; ++j)
                root->keys[j] = root->keys[j + 1];
            root->count--;
        } else {
            // Internal node: find successor and replace
            Node* succ = root->children[i + 1];
            while (succ->children[0])
                succ = succ->children[0];
            root->keys[i] = succ->keys[0];
            deleteKey(root->children[i + 1], succ->keys[0]);
        }
    } else if (root->children[i]) {
        deleteKey(root->children[i], key);
    }
    // Rebalancing logic would go here (not shown for brevity)
}

Note: Full implementation would handle rebalancing after deletion.

Summary Table

Operation | Steps                                                                                   | Complexity
Searching | Compare keys in node, follow correct child, repeat                                      | O(logₘ n)
Insertion | Find leaf, insert key or split node, propagate split if needed                          | O(logₘ n)
Deletion  | Remove key, replace with successor/predecessor if needed, rebalance if node underflows  | O(logₘ n)

Conclusion

• An m-way search tree is a generalization of BSTs where each node can have up to m children and m-1 keys.

• Searching involves comparing keys and following the correct child pointer.

• Insertion adds keys to leaf nodes or splits nodes as needed.

• Deletion removes keys and may require rebalancing.

• These trees are foundational for efficient large-scale data storage, such as in database indices and file systems.

Huffman Coding: Description, Importance in Data Structures, and C++ Example

What is Huffman Coding?

Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters, with shorter codes for more frequent characters and longer codes for less frequent ones. It is a greedy algorithm that builds an optimal prefix code (meaning no code is a prefix of another), ensuring unambiguous decoding. Huffman coding is widely used in file compression (ZIP, GZIP), image and audio compression (JPEG, MP3), and network data transmission.

How Huffman Coding Works

1. Frequency Calculation:
Count the frequency of each character in the input data.

2. Tree Construction:

   o Create a leaf node for each character, storing its frequency.

   o Insert all nodes into a priority queue (min-heap) based on frequency.

   o While more than one node remains:

      ▪ Remove the two nodes with the lowest frequencies.

      ▪ Create a new internal node with these two as children; its frequency is the sum of their frequencies.

      ▪ Insert the new node back into the queue.

   o The remaining node is the root of the Huffman tree.

3. Code Assignment:

   o Traverse the tree from root to leaves.

   o Assign '0' for a left edge and '1' for a right edge.

   o The code for each character is the sequence of 0s and 1s along the path from root to that character.

4. Encoding and Decoding:

   o Replace each character in the original data with its code to compress.

   o For decompression, traverse the Huffman tree according to the bit sequence until a leaf is reached, then output the corresponding character.

Importance in Data Structures

• Efficient Data Compression:
Huffman coding minimizes the total number of bits needed to represent data, reducing storage and transmission costs.

• Optimal Prefix Codes:
Ensures no code is a prefix of another, preventing ambiguity in decoding.

• Practical Applications:
Used in file formats (ZIP, JPEG, MP3), network protocols, and storage systems for efficient, lossless compression.

• Algorithmic Concepts:
Demonstrates the use of greedy algorithms, priority queues, and binary trees in real-world data structure problems.

C++ Example: Huffman Coding Implementation

cpp

#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
#include <string>
using namespace std;

// Node for Huffman Tree
struct Node {
    char ch;
    int freq;
    Node *left, *right;
    Node(char c, int f) : ch(c), freq(f), left(nullptr), right(nullptr) {}
};

// Comparator for priority queue
struct Compare {
    bool operator()(Node* a, Node* b) {
        return a->freq > b->freq;
    }
};

// Generate Huffman Codes
void generateCodes(Node* root, string code, unordered_map<char, string>& huffmanCode) {
    if (!root) return;
    if (!root->left && !root->right) huffmanCode[root->ch] = code;
    generateCodes(root->left, code + "0", huffmanCode);
    generateCodes(root->right, code + "1", huffmanCode);
}

int main() {
    // Example frequencies
    unordered_map<char, int> freq = {{'A', 5}, {'B', 9}, {'C', 12}, {'D', 13}, {'E', 16}, {'F', 45}};
    priority_queue<Node*, vector<Node*>, Compare> pq;
    for (auto& pair : freq)
        pq.push(new Node(pair.first, pair.second));

    // Build Huffman Tree
    while (pq.size() > 1) {
        Node *left = pq.top(); pq.pop();
        Node *right = pq.top(); pq.pop();
        Node *parent = new Node('\0', left->freq + right->freq);
        parent->left = left;
        parent->right = right;
        pq.push(parent);
    }
    Node* root = pq.top();

    // Generate codes
    unordered_map<char, string> huffmanCode;
    generateCodes(root, "", huffmanCode);

    // Output codes
    cout << "Huffman Codes:\n";
    for (auto& pair : huffmanCode)
        cout << pair.first << ": " << pair.second << endl;
    return 0;
}

Sample Output:

text

Huffman Codes:
F: 0
C: 100
D: 101
A: 1100
B: 1101
E: 111

Summary Table: Huffman Coding

Feature             | Description
Type                | Lossless compression, greedy algorithm
Data Structure Used | Binary tree (Huffman tree), priority queue (min-heap)
Output              | Variable-length, prefix-free codes
Efficiency          | Reduces storage/transmission size; optimal for given frequencies
Applications        | File compression (ZIP, JPEG), network data, storage, text/audio compression

Conclusion

Huffman coding is a foundational algorithm in data structures for efficient, lossless data compression. By using a binary tree and priority queue, it generates optimal, prefix-free codes, enabling significant savings in storage and bandwidth. Its practical importance is evident in many modern compression standards and systems.
Operations on Graphs: Concepts and C++ Code

Graphs are fundamental data structures in computer science, consisting of a set of vertices (nodes) and edges (connections). The most common operations on graphs include:

• Graph Representation

• Insertion and Deletion of Vertices/Edges

• Traversal (BFS & DFS)

• Searching (Path Finding)

• Cycle Detection

• Shortest Path Algorithms

Below is a detailed explanation of these operations, accompanied by C++ code samples.

1. Graph Representation

Graphs can be represented in several ways:

• Adjacency List: Efficient for sparse graphs.

• Adjacency Matrix: Efficient for dense graphs.

• Edge List: Simple list of all edges.

Adjacency List Example (C++):

cpp

#include <iostream>
#include <vector>
using namespace std;

class Graph {
    int V;
    vector<vector<int>> adj;
public:
    Graph(int V) : V(V), adj(V) {}
    void addEdge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u); // For undirected graph
    }
    const vector<int>& neighbors(int u) const { return adj[u]; }
    int size() const { return V; }
};

2. Insertion and Deletion

Insert Vertex:
Increase the vertex count and add a new adjacency list.

Insert Edge:

cpp

void addEdge(int u, int v) {
    adj[u].push_back(v);
    adj[v].push_back(u); // For undirected graph
}

Delete Edge:

cpp

// Note: std::remove requires #include <algorithm>
void removeEdge(int u, int v) {
    adj[u].erase(remove(adj[u].begin(), adj[u].end(), v), adj[u].end());
    adj[v].erase(remove(adj[v].begin(), adj[v].end(), u), adj[v].end());
}

Delete Vertex:
Remove all edges associated with the vertex and its adjacency list.
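A hedged sketch of that vertex deletion (not from the original answer), written as a member of the Graph class above; clearing u's list and erasing u from every other list keeps vertex indices stable, while fully reindexing vertices would take more work:

cpp

// Sketch only; std::remove requires <algorithm>.
void removeVertex(int u) {
    adj[u].clear(); // drop all edges leaving u
    for (int v = 0; v < V; ++v) {
        adj[v].erase(remove(adj[v].begin(), adj[v].end(), u), adj[v].end());
    }
}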
3. Graph Traversal (BFS & DFS)

Breadth-First Search (BFS)

BFS visits nodes level by level, using a queue.

cpp

#include <queue>
#include <vector>
#include <iostream>
using namespace std;

void BFS(const Graph& g, int start) {
    vector<bool> visited(g.size(), false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        cout << u << " ";
        for (int v : g.neighbors(u)) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
}

Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking, using recursion or a stack.

cpp

void DFSUtil(const Graph& g, int u, vector<bool>& visited) {
    visited[u] = true;
    cout << u << " ";
    for (int v : g.neighbors(u)) {
        if (!visited[v])
            DFSUtil(g, v, visited);
    }
}

void DFS(const Graph& g, int start) {
    vector<bool> visited(g.size(), false);
    DFSUtil(g, start, visited);
}

4. Searching (Path Finding)

You can use BFS or DFS to determine if a path exists between two nodes.

cpp

bool hasPath(const Graph& g, int src, int dest) {
    vector<bool> visited(g.size(), false);
    queue<int> q;
    q.push(src);
    visited[src] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (u == dest) return true;
        for (int v : g.neighbors(u)) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return false;
}

5. Cycle Detection

Cycle detection can be done using DFS by checking for back edges.

cpp

bool isCyclicUtil(const Graph& g, int v, vector<bool>& visited, int parent) {
    visited[v] = true;
    for (int u : g.neighbors(v)) {
        if (!visited[u]) {
            if (isCyclicUtil(g, u, visited, v))
                return true;
        } else if (u != parent) {
            return true;
        }
    }
    return false;
}

bool isCyclic(const Graph& g) {
    vector<bool> visited(g.size(), false);
    for (int u = 0; u < g.size(); ++u) {
        if (!visited[u]) {
            if (isCyclicUtil(g, u, visited, -1))
                return true;
        }
    }
    return false;
}

6. Shortest Path (Unweighted Graphs)

BFS can be used to find the shortest path in an unweighted graph.

cpp

vector<int> shortestPath(const Graph& g, int src) {
    vector<int> dist(g.size(), -1);
    queue<int> q;
    q.push(src);
    dist[src] = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : g.neighbors(u)) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}

Summary Table

Operation                 | Description                       | C++ Structure/Algorithm
Representation            | Adjacency List/Matrix/Edge List   | vector<vector<int>>, etc.
Insert/Delete Vertex/Edge | Add/remove nodes/edges            | addEdge, removeEdge
Traversal                 | Visit all nodes (BFS, DFS)        | Queue/Recursion
Search/Path Finding       | Check if path exists              | BFS/DFS
Cycle Detection           | Detect cycles                     | DFS with parent tracking
Shortest Path             | Find minimum steps between nodes  | BFS

Conclusion

Graphs support a variety of essential operations including representation, insertion, deletion, traversal (BFS/DFS), searching, cycle detection, and shortest path finding. These operations are foundational for solving complex problems in computer science, such as network analysis, pathfinding, and dependency resolution, and can be efficiently implemented in C++ using standard data structures and algorithms.
Warshall's Algorithm and Floyd-Warshall Algorithm for Shortest Path

Warshall's Algorithm

Purpose:
Warshall's algorithm is used to compute the transitive closure of a directed graph. That is, it determines whether a path exists between every pair of vertices, regardless of path length. It does not compute shortest paths or path weights, only reachability.

Algorithm Steps:

1. Represent the graph as an adjacency matrix A, where A[i][j] = 1 if there is an edge from vertex i to j, else 0.

2. For each vertex k from 1 to n:

   o For each pair of vertices (i, j):

      ▪ If A[i][j] = 1, or (A[i][k] = 1 and A[k][j] = 1), then set A[i][j] = 1.

3. After all iterations, A[i][j] = 1 if there is a path from i to j.

Example:
Given adjacency matrix for 3 vertices:

text

A = [ [0, 1, 0],
      [0, 0, 1],
      [0, 0, 0] ]

After applying Warshall's algorithm, the matrix becomes:

text

[ [0, 1, 1],
  [0, 0, 1],
  [0, 0, 0] ]

Now, A[0][2] = 1, indicating a path from 0 to 2 via 1.
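The document gives code only for Floyd-Warshall below; for completeness, here is a minimal sketch of Warshall's transitive closure in the same style, assuming a 0/1 adjacency matrix as in the example above:

cpp

#include <iostream>
#include <vector>
using namespace std;

// Transitive closure: reach[i][j] becomes 1 if any path i -> j exists.
void warshall(vector<vector<int>>& reach, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                reach[i][j] = reach[i][j] || (reach[i][k] && reach[k][j]);
}

int main() {
    vector<vector<int>> A = {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}};
    warshall(A, 3);
    for (auto& row : A) {          // prints the closed matrix; row 0 becomes 0 1 1
        for (int x : row) cout << x << " ";
        cout << "\n";
    }
    return 0;
}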
Floyd-Warshall Algorithm

Purpose:
The Floyd-Warshall algorithm finds the shortest paths between all pairs of vertices in a weighted graph (directed or undirected, with positive or negative edge weights but no negative cycles).

Algorithm Steps:

1. Initialization:
Create a distance matrix dist where dist[i][j] is the weight of the edge from i to j, or infinity if no edge exists. Set dist[i][i] = 0 for all i.

2. Main Loop:
For each vertex k (acting as an intermediate node):

   o For each pair of vertices (i, j):

      ▪ If dist[i][j] > dist[i][k] + dist[k][j], then update dist[i][j] = dist[i][k] + dist[k][j].

3. Result:
After all iterations, dist[i][j] contains the shortest distance from i to j.

Formula:

dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

Example:
Suppose you have the following weighted adjacency matrix for 4 vertices:

text

0 3 ∞ 7
8 0 2 ∞
5 ∞ 0 1
2 ∞ ∞ 0

After applying Floyd-Warshall, you get the shortest path distances between all pairs.

C++ Code Example: Floyd-Warshall Algorithm

cpp

#include <iostream>
#include <vector>
using namespace std;

const int INF = 1e9;

void floydWarshall(vector<vector<int>>& dist, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] < INF && dist[k][j] < INF)
                    dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
}

int main() {
    int n = 4;
    vector<vector<int>> dist = {
        {0, 3, INF, 7},
        {8, 0, 2, INF},
        {5, INF, 0, 1},
        {2, INF, INF, 0}
    };
    floydWarshall(dist, n);
    cout << "Shortest distances between every pair of vertices:\n";
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (dist[i][j] == INF)
                cout << "INF ";
            else
                cout << dist[i][j] << " ";
        }
        cout << endl;
    }
    return 0;
}

Summary Table

Algorithm      | Purpose                  | Input                      | Handles Weights | Output                     | Code Complexity
Warshall's     | Transitive closure       | Adjacency matrix (0/1)     | No              | Reachability matrix (0/1)  | Simple
Floyd-Warshall | All-pairs shortest path  | Weighted adjacency matrix  | Yes             | Shortest path lengths      | Moderate

Conclusion

• Warshall's algorithm determines if a path exists between every pair of vertices (reachability).

• The Floyd-Warshall algorithm computes the shortest path distances between all pairs of vertices in a weighted graph, using dynamic programming and an adjacency matrix representation.

• Both algorithms are fundamental for graph analysis and network optimization.
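Floyd-Warshall as shown reports only distances. As a hedged extension (not in the original), a successor matrix can be tracked alongside dist so that actual paths can be reconstructed afterwards; nxt[i][j] is assumed to be initialized to j wherever an edge i -> j exists:

cpp

// nxt[i][j] holds the vertex that follows i on the current best i -> j path.
void floydWarshallWithPaths(vector<vector<int>>& dist,
                            vector<vector<int>>& nxt, int n) {
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] < INF && dist[k][j] < INF &&
                    dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    nxt[i][j] = nxt[i][k]; // go via k: reuse first hop of i -> k
                }
}

A path from i to j can then be read off by repeatedly following nxt[i][j] until j is reached.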
Dijkstra's Algorithm for Shortest Path

Introduction

Dijkstra's algorithm is a classic greedy algorithm used to find the shortest path from a single source vertex to all other vertices in a weighted graph with non-negative edge weights. It is widely used in network routing, mapping, and many real-world shortest path problems.

Algorithm Steps

1. Initialization:

   o Set the distance to the source vertex as 0 and all other vertices as infinity.

   o Mark all vertices as unvisited.

   o Use a priority queue (min-heap) to efficiently select the next vertex with the smallest tentative distance.

2. Main Loop:

   o While there are unvisited vertices:

      ▪ Select the unvisited vertex with the smallest distance (call it u).

      ▪ For each neighbor v of u, calculate the distance from the source to v through u. If this distance is less than the current stored distance for v, update it.

      ▪ Mark u as visited.

3. Termination:

   o When all vertices are visited, the algorithm ends. The distance array now contains the shortest distances from the source to every vertex.

Example

Consider a graph with vertices A, B, C, D, E, F and the following weighted edges:

• A-B: 4, A-C: 2

• B-C: 1, B-D: 5

• C-D: 8, C-E: 10

• D-E: 2, D-F: 6

• E-F: 2

Suppose we want the shortest path from A to all other vertices.

C++ Code Implementation

cpp

#include <iostream>
#include <vector>
#include <queue>
#include <climits> // for INT_MAX
using namespace std;

typedef pair<int, int> pii; // (distance, vertex)

void dijkstra(int n, vector<vector<pii>>& adj, int src) {
    vector<int> dist(n, INT_MAX);
    priority_queue<pii, vector<pii>, greater<pii>> pq;
    dist[src] = 0;
    pq.push({0, src});

    while (!pq.empty()) {
        int u = pq.top().second;
        int d = pq.top().first;
        pq.pop();

        // If this distance is not up-to-date, skip
        if (d > dist[u]) continue;

        for (auto edge : adj[u]) {
            int v = edge.first;
            int weight = edge.second;
            if (dist[u] + weight < dist[v]) {
                dist[v] = dist[u] + weight;
                pq.push({dist[v], v});
            }
        }
    }

    cout << "Vertex\tDistance from Source\n";
    for (int i = 0; i < n; ++i)
        cout << char('A' + i) << "\t" << dist[i] << endl;
}

int main() {
    int n = 6; // Number of vertices (A-F)
    vector<vector<pii>> adj(n);
    // Edges: (u, v, weight)
    adj[0].push_back({1, 4});  // A-B
    adj[0].push_back({2, 2});  // A-C
    adj[1].push_back({2, 1});  // B-C
    adj[1].push_back({3, 5});  // B-D
    adj[2].push_back({3, 8});  // C-D
    adj[2].push_back({4, 10}); // C-E
    adj[3].push_back({4, 2});  // D-E
    adj[3].push_back({5, 6});  // D-F
    adj[4].push_back({5, 2});  // E-F

    dijkstra(n, adj, 0); // Source is A (index 0)
    return 0;
}

Output

text

Vertex Distance from Source
A      0
B      3
C      2
D      8
E      10
F      12

(Distances may vary based on graph representation and edge direction.)

Explanation

• The algorithm starts at the source (A), visiting the nearest unvisited vertex at each step and updating the shortest known distances to its neighbors.

• It uses a priority queue to always process the vertex with the smallest tentative distance next.

• Once a vertex is marked visited, its shortest distance is finalized and never updated again.

Key Points

• Greedy Approach: Always selects the nearest unvisited vertex.

• Time Complexity: O((V + E) log V) with a min-heap priority queue.

• Limitation: Works only with non-negative edge weights.

Applications

• GPS navigation and mapping

• Network routing protocols

• Game AI pathfinding

In summary:
Dijkstra's algorithm efficiently computes the shortest path from a single source to all other nodes in a weighted graph with non-negative edges, using a greedy strategy and a priority queue for optimal performance.

Topological Sorting in C++

Definition and Purpose

Topological sorting is a linear ordering of the vertices of a Directed Acyclic Graph (DAG) such that for every directed edge u → v, vertex u comes before v in the ordering. This is widely used in scheduling tasks, resolving symbol dependencies in compilers, and determining the order of compilation in build systems.

When is Topological Sort Possible?

• Only for DAGs (Directed Acyclic Graphs).

• If the graph contains a cycle, topological sorting is not possible.

Algorithms for Topological Sort

There are two main approaches:

1. Depth-First Search (DFS) Based Approach

• Visit each unvisited vertex.

• Recursively visit all its unvisited neighbors.

• After visiting all neighbors, push the vertex to a stack.

• At the end, pop vertices from the stack to get the topological order.

2. Kahn's Algorithm (BFS/Indegree Method)

• Compute the indegree (number of incoming edges) for each vertex.

• Enqueue all vertices with indegree 0.

• While the queue is not empty:

   o Remove a vertex from the queue and add it to the result.

   o For each neighbor, decrease its indegree by 1. If the indegree becomes 0, enqueue it.

• If all vertices are processed, the ordering is valid. Otherwise, the graph has a cycle.

C++ Code Example: DFS Approach

cpp

#include <iostream>
#include <list>
#include <stack>
using namespace std;

class Graph {
    int V;
    list<int> *adj;
    void topologicalSortUtil(int v, bool visited[], stack<int> &Stack);
public:
    Graph(int V);
    void addEdge(int v, int w);
    void topologicalSort();
};

Graph::Graph(int V) {
    this->V = V;
    adj = new list<int>[V];
}

void Graph::addEdge(int v, int w) {
    adj[v].push_back(w);
}

void Graph::topologicalSortUtil(int v, bool visited[], stack<int> &Stack) {
    visited[v] = true;
    for (auto i = adj[v].begin(); i != adj[v].end(); ++i)
        if (!visited[*i])
            topologicalSortUtil(*i, visited, Stack);
    Stack.push(v);
}

void Graph::topologicalSort() {
    stack<int> Stack;
    bool *visited = new bool[V];
    for (int i = 0; i < V; i++)
        visited[i] = false;
    for (int i = 0; i < V; i++)
        if (!visited[i])
            topologicalSortUtil(i, visited, Stack);
    while (!Stack.empty()) {
        cout << Stack.top() << " ";
        Stack.pop();
    }
    cout << endl;
}

int main() {
    Graph g(6);
    g.addEdge(5, 2);
    g.addEdge(5, 0);
    g.addEdge(4, 0);
    g.addEdge(4, 1);
    g.addEdge(2, 3);
    g.addEdge(3, 1);
    cout << "Topological Sort of the given graph:\n";
    g.topologicalSort();
    return 0;
}

Output:
5 4 2 3 1 0 (One possible valid ordering)

C++ Code Example: Kahn's Algorithm (BFS/Indegree)

cpp

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

void topologicalSort(int V, vector<vector<int>> &adj) {
    vector<int> indegree(V, 0);
    for (int i = 0; i < V; i++)
        for (int v : adj[i])
            indegree[v]++;

    queue<int> q;
    for (int i = 0; i < V; i++)
        if (indegree[i] == 0)
            q.push(i);

    vector<int> topo;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        topo.push_back(u);
        for (int v : adj[u]) {
            indegree[v]--;
            if (indegree[v] == 0)
                q.push(v);
        }
    }
    for (int v : topo) cout << v << " ";
    cout << endl;
}

int main() {
    int V = 6;
    vector<vector<int>> adj(V);
    adj[5] = {2, 0};
    adj[4] = {0, 1};
    adj[2] = {3};
    adj[3] = {1};
    cout << "Topological Sort using Kahn's Algorithm:\n";
    topologicalSort(V, adj);
    return 0;
}

Output:
4 5 2 0 3 1 (One possible valid ordering)

Applications of Topological Sort

• Task scheduling (e.g., build systems, course prerequisites)

• Resolving symbol dependencies in compilers

• Determining the order of compilation

• Data serialization, circuit design

Complexity

• Both DFS and Kahn's Algorithm run in O(V + E) time, where V = number of vertices and E = number of edges.

Summary Table

Method       | Approach  | Data Structure | Output Order      | Cycle Detection
DFS          | Recursive | Stack          | Reverse postorder | No (unless checked)
Kahn's (BFS) | Iterative | Queue          | As processed      | Yes (if not all nodes processed)

In summary:
Topological sorting provides a way to order tasks in a DAG so that all dependencies are respected. It is implemented efficiently in C++ using either DFS (with a stack) or Kahn's Algorithm (using indegrees and a queue).
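Since the table notes that Kahn's algorithm can detect cycles, here is a hedged sketch of that check (not in the original code), based on the Kahn implementation above:

cpp

// Kahn's algorithm doubles as a cycle detector: if fewer than V vertices are
// dequeued, some vertices never reached indegree 0, so a cycle must exist.
bool hasCycleKahn(int V, vector<vector<int>> &adj) {
    vector<int> indegree(V, 0);
    for (int i = 0; i < V; i++)
        for (int v : adj[i])
            indegree[v]++;
    queue<int> q;
    for (int i = 0; i < V; i++)
        if (indegree[i] == 0) q.push(i);
    int processed = 0;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        processed++;
        for (int v : adj[u])
            if (--indegree[v] == 0) q.push(v);
    }
    return processed != V; // true means a cycle exists
}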
Traversal of Graph in Detail (with C++ Code)

Graph traversal is the process of visiting all the vertices (and possibly edges) in a graph in a systematic way. Traversal is fundamental for exploring graph structures, finding paths, detecting cycles, and solving many real-world problems.

The two most common graph traversal techniques are:

• Breadth-First Search (BFS)

• Depth-First Search (DFS)

1. Breadth-First Search (BFS)

Concept:
BFS explores the graph level by level. Starting from a source vertex, it visits all its neighbors before moving to the next level of neighbors. BFS uses a queue to keep track of vertices to visit next.

Applications:

• Finding the shortest path in unweighted graphs

• Level-order traversal

• Connected components detection

C++ Implementation:

cpp

#include <iostream>
#include <queue>
#include <set>
#include <map>
#include <vector>
using namespace std;

// Graph class using adjacency list
class Graph {
public:
    void addEdge(int u, int v) {
        adjList[u].insert(v);
        adjList[v].insert(u); // For undirected graph
    }
    const map<int, set<int>>& getAdjList() const { return adjList; }
private:
    map<int, set<int>> adjList;
};

// BFS function
vector<int> bfs(const Graph& graph, int start) {
    set<int> visited;
    queue<int> q;
    vector<int> result;

    q.push(start);
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        if (visited.find(node) == visited.end()) {
            visited.insert(node);
            result.push_back(node);
            for (int neighbor : graph.getAdjList().at(node)) {
                if (visited.find(neighbor) == visited.end()) {
                    q.push(neighbor);
                }
            }
        }
    }
    return result;
}

int main() {
    Graph graph;
    graph.addEdge(0, 1);
    graph.addEdge(0, 2);
    graph.addEdge(1, 3);
    graph.addEdge(1, 4);
    graph.addEdge(2, 5);

    vector<int> traversal = bfs(graph, 0);
    cout << "BFS Traversal: ";
    for (int node : traversal) cout << node << " ";
    cout << endl;
    return 0;
}

Output:
BFS Traversal: 0 1 2 3 4 5
This shows nodes visited level by level starting from node 0.

2. Depth-First Search (DFS)

Concept:
DFS explores as deep as possible along each branch before backtracking. It uses a stack (often implemented via recursion) to keep track of the path.

Applications:

• Detecting cycles

• Topological sorting

• Connected components

C++ Implementation:

cpp

#include <iostream>
#include <list>
#include <vector>
using namespace std;

class Graph {
    int V;
    list<int> *adj;
    void DFSUtil(int v, vector<bool>& visited) {
        visited[v] = true;
        cout << v << " ";
        for (int neighbor : adj[v]) {
            if (!visited[neighbor])
                DFSUtil(neighbor, visited);
        }
    }
public:
    Graph(int V) {
        this->V = V;
        adj = new list<int>[V];
    }
    void addEdge(int v, int w) {
        adj[v].push_back(w);
    }
    void DFS(int v) {
        vector<bool> visited(V, false);
        DFSUtil(v, visited);
    }
};

int main() {
    Graph g(6);
    g.addEdge(0, 1);
    g.addEdge(0, 2);
    g.addEdge(1, 3);
    g.addEdge(1, 4);
    g.addEdge(2, 5);

    cout << "DFS Traversal: ";
    g.DFS(0);
    cout << endl;
    return 0;
}

Output:
DFS Traversal: 0 1 3 4 2 5
This shows nodes visited by going as deep as possible before backtracking.

Summary Table

Traversal | Data Structure  | Order Visited         | Applications
BFS       | Queue           | Level by level        | Shortest path, connectivity, search
DFS       | Stack/Recursion | Deep before backtrack | Cycle detection, topological sort

Conclusion

• BFS and DFS are the two fundamental graph traversal algorithms.

• BFS uses a queue to visit nodes level by level, ideal for shortest path and connectivity.

• DFS uses a stack (or recursion) to explore as deep as possible, useful for cycle detection and topological sorting.

• Both can be implemented efficiently in C++ using standard data structures.

Difference Between DFS and BFS (with C++ Code)

Below is a comprehensive point-wise comparison between Depth-First Search (DFS) and Breadth-First Search (BFS), including their principles, implementation, applications, and C++ code examples.

1. Definition and Traversal Order

• BFS (Breadth-First Search):

   o Explores all nodes at the present depth level before moving on to nodes at the next depth level (layer by layer).

• DFS (Depth-First Search):

   o Explores as far as possible along each branch before backtracking (goes deep before wide).

2. Data Structure Used

• BFS: Uses a Queue (FIFO principle) to keep track of the next vertex to visit.

• DFS: Uses a Stack (LIFO principle) or recursion to keep track of the path.

3. Implementation Principle

• BFS: First-In-First-Out (FIFO).

• DFS: Last-In-First-Out (LIFO).

4. Time and Space Complexity

• Time Complexity: Both BFS and DFS have O(V + E) time complexity for adjacency list representation, where V = vertices, E = edges.

• Space Complexity:

   o BFS: Higher, as it stores all nodes at the current level in the queue.

   o DFS: Lower, as it only stores nodes along the current path in the stack or recursion call stack.

5. Path Finding and Optimality

• BFS: Guarantees the shortest path in unweighted graphs.

• DFS: Does not guarantee the shortest path.

6. Applications

• BFS:

   o Finding the shortest path in unweighted graphs.

   o Network broadcasting, social network friend suggestions, bipartite graph checking.

• DFS:

   o Cycle detection, topological sorting, solving puzzles/mazes, connected components.

7. Suitability

• BFS: Suitable for searching vertices closer to the source (level-wise search).

• DFS: Suitable for solutions that may be far from the source or require exploring all possibilities (deep search).

8. Backtracking

• BFS: No backtracking.

• DFS: Uses backtracking to explore alternative paths.

9. Loop Trapping

• BFS: Less prone to getting trapped in infinite loops (with proper visited checks).

• DFS: Can get trapped in cycles if visited nodes are not tracked.

10. Order of Visiting Nodes

• BFS: Visits siblings before children.

• DFS: Visits children before siblings.

11. Tree Traversal

• BFS: Used for level-order traversal in trees.

• DFS: Used for pre-order, in-order, and post-order traversals in trees.

12. Implementation Complexity

• BFS: Straightforward with a queue.

• DFS: Can use recursion or an explicit stack.

13. Cycle Detection

• BFS: Not commonly used for cycle detection.

• DFS: Commonly used for cycle detection in graphs.

14. Example C++ Code

BFS Implementation (C++):

cpp

#include <iostream>
#include <queue>
#include <vector>
using namespace std;

void bfs(vector<vector<int>>& adj, int start) {
    int n = adj.size();
    vector<bool> visited(n, false);
    queue<int> q;
    q.push(start);
    visited[start] = true;
    while (!q.empty()) {
        int node = q.front(); q.pop();
        cout << node << " ";
        for (int neighbor : adj[node]) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                q.push(neighbor);
            }
        }
    }
}

DFS Implementation (C++):

cpp

#include <iostream>
#include <vector>
using namespace std;

void dfsUtil(vector<vector<int>>& adj, int node, vector<bool>& visited) {
    visited[node] = true;
    cout << node << " ";
    for (int neighbor : adj[node]) {
        if (!visited[neighbor])
            dfsUtil(adj, neighbor, visited);
    }
}

void dfs(vector<vector<int>>& adj, int start) {
    int n = adj.size();
    vector<bool> visited(n, false);
    dfsUtil(adj, start, visited);
}

15. Summary Table

Parameter        | BFS                                   | DFS
Traversal Order  | Level by level                        | Deep along a branch, then backtrack
Data Structure   | Queue (FIFO)                          | Stack (LIFO) or Recursion
Space Complexity | Higher (stores all nodes at a level)  | Lower (stores only current path)
Shortest Path    | Yes (in unweighted graphs)            | No
Backtracking     | No                                    | Yes
Applications     | Shortest path, bipartite, networking  | Cycle detection, topological sort
Tree Traversal   | Level-order                           | Pre, In, Post-order
Loop Trapping    | Less likely                           | Possible without visited check
Implementation   | Simple with queue                     | Simple with recursion/stack
Cycle Detection  | Not typical                           | Common

16. Conclusion

• BFS is optimal for shortest path and level-wise traversal, uses more memory, and is implemented with a queue.

• DFS is suited for deep exploration, uses less memory, enables backtracking, and is implemented with a stack or recursion.

• Both are fundamental for graph and tree algorithms, each with distinct strengths and applications.
Rules for Choosing a Good Hash Function

A good hash function is critical for the efficient performance of a hash table. The key rules and principles are:

• Uniform Distribution: The hash function should distribute keys as evenly as possible across the table to minimize collisions and avoid clustering.

• Minimize Collisions: Different keys should rarely hash to the same index. Fewer collisions mean faster lookups and insertions.

• Efficiency: The function should be simple and fast to compute.

• Deterministic: The same input must always produce the same output.

• Flexibility: Should work well for a wide range of possible inputs.

• Scalability: Should perform well as the table size or the number of keys grows.

• Avoid Patterns: The function should not produce patterns that could cause clustering.

• Use of Established Algorithms: Prefer well-tested hash functions over custom ones for critical applications.

Division Method of Hashing

The division method is a simple and widely used hash function:

h(k) = k mod m

where k is the key and m is the table size.

• Table size m should preferably be a prime number, not a power of 2, to help distribute keys more uniformly.

Hashing the Given Values with Table Size 7

Given values: 32, 49, 97, 101, 102, 155, 183
Table size (m): 7
Hash function: h(k) = k mod 7

Let's compute the hash values:

Key | k mod 7     | Hash Index
32  | 32 % 7 = 4  | 4
49  | 49 % 7 = 0  | 0
97  | 97 % 7 = 6  | 6
101 | 101 % 7 = 3 | 3
102 | 102 % 7 = 4 | 4
155 | 155 % 7 = 1 | 1
183 | 183 % 7 = 1 | 1

Chaining as Collision Resolution

• Chaining: Each table index points to a linked list (chain) of all keys that hash to that index.

• When a collision occurs (multiple keys hash to the same index), the new key is added to the end of the list at that index.

Resulting Hash Table:

Index | Keys Stored (Chain)
0     | 49
1     | 155 → 183
2     |
3     | 101
4     | 32 → 102
5     |
6     | 97

C++ Code Example: Hash Table with Chaining (Division Method)

cpp

#include <iostream>
#include <list>
using namespace std;

class HashTable {
    int size;
    list<int> *table; // Array of linked lists
public:
    HashTable(int s) : size(s) {
        table = new list<int>[size];
    }
    void insert(int key) {
        int index = key % size;
        table[index].push_back(key);
    }
    void display() {
        for (int i = 0; i < size; ++i) {
            cout << i << ": ";
            for (int val : table[i])
                cout << val << " -> ";
            cout << "NULL\n";
        }
    }
    ~HashTable() {
        delete[] table;
    }
};

int main() {
    int keys[] = {32, 49, 97, 101, 102, 155, 183};
    int n = sizeof(keys) / sizeof(keys[0]);
    HashTable ht(7);
    for (int i = 0; i < n; ++i)
        ht.insert(keys[i]);

    cout << "Hash Table using Division Method and Chaining:\n";
    ht.display();
    return 0;
}

Sample Output:

text

Hash Table using Division Method and Chaining:
0: 49 -> NULL
1: 155 -> 183 -> NULL
2: NULL
3: 101 -> NULL
4: 32 -> 102 -> NULL
5: NULL
6: 97 -> NULL

Summary Table

Rule for Good Hash Function | Explanation
Uniform Distribution        | Spread keys evenly across the table
Minimize Collisions         | Reduce the number of keys mapping to the same index
Efficiency                  | Fast computation
Deterministic               | Same input gives same output
Flexibility                 | Works for various key types
Scalability                 | Performs well as data grows
Avoid Patterns              | Prevent clustering
Use Established Algorithms  | Prefer proven hash functions

Conclusion

• Good hash functions ensure uniform distribution, minimize collisions, and are efficient to compute.

• Using the division method with table size 7, the given values are distributed as shown, with collisions handled by chaining.

• The provided C++ code demonstrates insertion and display of the hash table using chaining for collision resolution.

13. What do you mean by hashing? Explain various hashing functions with suitable examples.

What is Hashing?

Hashing is the process of transforming input data (called a key) into a fixed-size value (called a hash value, hash code, or digest) using a mathematical function called a hash function. The hash value is typically used as an index in a hash table for efficient data storage and retrieval. Hashing is a one-way process: it is extremely difficult to reconstruct the original data from its hash value.

Key components of hashing:

• Input Key: The data to be hashed (e.g., a number, string, file).

• Hash Function: The mathematical function that converts the input into a hash value.

• Hash Table: The data structure that stores the hash values and associated data.

Use cases:

• Fast data lookup in hash tables

• Password storage

• Digital signatures

• Data integrity verification

Common Hashing Functions

1. Division (Modulo) Method

• Formula: h(k) = k mod m

• Example: Table size m = 10, key k = 112: h(112) = 112 mod 10 = 2

• C++ Example:

cpp

int hashFunc(int key, int tableSize) {
    return key % tableSize;
}

2. Multiplication Method

• Formula: h(k) = ⌊m · (kA mod 1)⌋, where 0 < A < 1

• Example: m = 10, A = 0.618, key k = 112: h(112) = ⌊10 × (112 × 0.618 mod 1)⌋

• C++ Example:

cpp

#include <cmath> // for fmod

int hashFunc(int key, int tableSize) {
    double A = 0.6180339887;
    return int(tableSize * fmod(key * A, 1));
}

3. Folding Method

• Description: Split the key into parts, add them together, then take modulo table size.

• Example: Key = 123456, split into 123 and 456, sum = 579, then 579 mod m

• C++ Example:

cpp

int hashFunc(int key, int tableSize) {
    int part1 = key / 1000;
    int part2 = key % 1000;
    return (part1 + part2) % tableSize;
}

4. Mid-Square Method

• Description: Square the key, extract the middle digits, then take modulo table size.

• Example: Key = 123, 123² = 15129, middle digits = 151, then 151 mod m

• C++ Example:

cpp

int hashFunc(int key, int tableSize) {
    int squared = key * key;
    int mid = (squared / 10) % 100; // extract middle two digits
    return mid % tableSize;
}

5. Cryptographic Hash Functions

• Description: Used in security (e.g., SHA-256, MD5); produce fixed-length, unique, and irreversible digests.

• Example: Hashing a password before storing it.

• C++ Example: (using libraries, e.g., OpenSSL)

14. How can collision be resolved?

Collision:
A collision occurs when two different keys hash to the same index in the hash table.

Collision Resolution Techniques

1. Chaining

• Each table index points to a linked list of entries.

• All keys that hash to the same index are stored in the list.

• C++ Example:

cpp

#include <iostream>
#include <list>
using namespace std;

class HashTable {
    int size;
    list<int>* table;
public:
    HashTable(int s) : size(s) { table = new list<int>[size]; }
    void insert(int key) {
        int index = key % size;
        table[index].push_back(key);
    }
    void display() {
        for (int i = 0; i < size; ++i) {
            cout << i << ": ";
            for (int val : table[i]) cout << val << " -> ";
            cout << "NULL\n";
        }
    }
    ~HashTable() { delete[] table; }
};

int main() {
    HashTable ht(7);
    int keys[] = {32, 49, 97, 101, 102, 155, 183};
    for (int k : keys) ht.insert(k);
    ht.display();
    return 0;
}

Output:

text

0: 49 -> NULL
1: 155 -> 183 -> NULL
2: NULL
3: 101 -> NULL
4: 32 -> 102 -> NULL
5: NULL
6: 97 -> NULL

2. Open Addressing

• If a collision occurs, probe for the next available slot using a defined sequence.

• Linear Probing: Check the next slot (index + 1, index + 2, ...).

• Quadratic Probing: Check index + 1², index + 2², etc.

• Double Hashing: Use a second hash function to determine the step size (see the sketch at the end of this answer).

• C++ Example (Linear Probing):

cpp

const int TABLE_SIZE = 7;
int table[TABLE_SIZE] = {0};

void insert(int key) {
    int hash = key % TABLE_SIZE;
    while (table[hash] != 0) {
        hash = (hash + 1) % TABLE_SIZE;
    }
    table[hash] = key;
}

Summary Table

Hashing Function        | Example Use
Division (Modulo)       | Fast, simple, general purpose
Multiplication          | More uniform distribution
Folding                 | Large numeric keys
Mid-Square              | Uniform for certain key types
Cryptographic (SHA, MD) | Security, data integrity

Collision Resolution | Description
Chaining             | Linked lists at each index
Linear Probing       | Next available slot
Quadratic Probing    | Probing with quadratic step
Double Hashing       | Second hash for step size

In summary:

• Hashing is a method to map data to a fixed-size value using a hash function for efficient storage and retrieval.

• Hash functions include division, multiplication, folding, mid-square, and cryptographic hashes.

• Collisions can be resolved by chaining (linked lists) or open addressing (probing).

• The C++ code examples illustrate both hash function usage and collision resolution.
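The double hashing mentioned above has no example in the original; here is a hedged sketch in the same style as the linear-probing code, where the step size comes from a second hash function and is kept non-zero so the probe sequence always advances:

cpp

const int TABLE_SIZE2 = 7;
int table2[TABLE_SIZE2] = {0};

// Sketch only: with a prime table size, this probe sequence visits all slots.
void insertDoubleHash(int key) {
    int hash = key % TABLE_SIZE2;
    int step = 1 + (key % (TABLE_SIZE2 - 1)); // second hash: never 0
    while (table2[hash] != 0) {
        hash = (hash + step) % TABLE_SIZE2;
    }
    table2[hash] = key;
}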
Q1. Define File. Describe Constituents of a File. (15 Marks)

Definition of a File:

A file is a collection of logically related data stored in secondary memory, such as a hard drive or SSD, under a specific name (filename). It is the basic unit of storage used in every computer system for data permanence and accessibility. Files allow data to persist beyond program execution.

Files are managed by the operating system and can be of various types: text files, binary files, executable files, etc. Depending on the file organization method, data may be accessed sequentially, directly, or through indexing.

Constituents of a File:

1. Header:

o This is the metadata section that stores important information about the file, such as file type, size, format, record length, and creation/modification dates.

o For example, a file may have a header stating that it stores 100 records of 50 bytes each.

2. Records:

o A record is a collection of fields that represent a single unit of meaningful information.

o For instance, a student record may contain fields like name, roll number, and marks.

3. Fields:

o The smallest unit of data within a record.

o For example, in a record {101, "Raj", 85}, the field "Raj" represents the name.

4. Data Section:

o This holds the actual content of the file, meaning the records are stored in this part.

5. End-of-File (EOF) Marker:

o A special marker that indicates the end of the file to prevent reading beyond it.

o In text files, it's often represented as a special character like EOF or -1.
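To tie the record/field terminology together, here is a small illustrative sketch (the struct and its members are mine, not from the original): one struct plays the role of a record and its members are the fields, matching the {101, "Raj", 85} example above.

#include <iostream>
#include <cstring>
using namespace std;

// One record = a group of fields; the file's data section would hold many of these.
struct StudentRecord {
    int  roll;       // field 1
    char name[20];   // field 2
    int  marks;      // field 3
};

int main() {
    StudentRecord r{};
    r.roll = 101;
    strcpy(r.name, "Raj");
    r.marks = 85;
    cout << r.roll << " " << r.name << " " << r.marks << endl;
}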
Q2. Describe Various Kinds of Operations Required to Maintain Files. (15 Marks)

Introduction:

File operations are fundamental for managing data stored in external storage. These operations include creating, opening, reading, writing, updating, and deleting files. File handling ensures data permanence, structured access, and security.

Types of File Operations:

1. Create:

o This operation creates a new file in the system.

o A unique filename and location are assigned, and space is allocated.

2. Open:

o Before any operation (read/write), a file must be opened using a specific mode (read, write, append, binary, etc.).

3. Read:

o Used to retrieve data from a file.

o The data can be read sequentially or randomly depending on the file organization.

4. Write:

o Inserts new data into the file or overwrites existing data depending on the mode.

5. Append:

o Adds new data at the end of the file without modifying existing content.

6. Update (Modify):

o Changes or updates part of the data in the file, usually done by locating the specific record and overwriting it.

7. Delete:

o Removes data or the file itself from the storage system.

o Logical deletion marks a record as deleted; physical deletion removes it permanently.

8. Close:

o Finalizes the operation, ensuring the data is saved and resources are released.

C++ Example: Writing and Reading a File

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    // Writing to a file
    ofstream fout("example.txt");
    fout << "This is file handling in C++.";
    fout.close();

    // Reading from the file
    ifstream fin("example.txt");
    string content;
    while (getline(fin, content)) {
        cout << content << endl;
    }
    fin.close();
    return 0;
}
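The append operation listed above can be demonstrated with the same stream classes; a minimal sketch using the standard ios::app open mode (file name carried over from the example):

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    // ios::app positions every write at the end, leaving existing content intact
    ofstream fout("example.txt", ios::app);
    fout << "\nAppended line.";
    fout.close();

    ifstream fin("example.txt");
    string line;
    while (getline(fin, line)) cout << line << endl;  // both lines are printed
    return 0;
}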
Q9. What is Indexed Sequential File? Explain Techniques for Handling Overflow. (15 Marks)

Indexed Sequential File:

An indexed sequential file combines the advantages of sequential and direct access. Records are stored sequentially based on a key field, and an index is maintained to allow fast access to blocks or records.

This type of file organization is suitable for applications like employee databases, bank systems, and library management, where both sequential and random access are frequently needed.

Structure:

1. Main File (Data File): Contains the actual records in sorted order.

2. Index File: Contains key-to-address mappings.

3. Overflow Area: Used when new records cannot be inserted in sequence.

Advantages:

• Efficient for both sequential and random access.

• Supports sorted files and indexed lookups.

Overflow Handling Techniques:

1. Overflow Area (Separate):

o A separate space is reserved where overflow records are stored.

o Slows down access as multiple areas need to be scanned.

2. Linked Overflow:

o Overflow records are linked to the main record using pointers or addresses.

o Efficient but increases pointer overhead.

3. Reorganization:

o Periodically, the main file and overflow areas are merged and sorted again to reduce lookup time and fragmentation.

Diagram:

+-------------------+      +---------------+
|    Index File     | -->  | Key: Address  |
+-------------------+      +---------------+

+-------------------+      +-----------------+
|  Main Data File   |      |  Overflow Area  |
+-------------------+      +-----------------+
| 1001 | John  |--+   ->   | 1006 | Sarah   |
| 1002 | Alice |           | 1010 | Rohan   |
+-------------------+      +-----------------+
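The diagram can be mirrored in a few lines of code. This is an in-memory stand-in, not a real disk layout: the map plays the index file, the vector plays the main data file, and the names are illustrative.

#include <iostream>
#include <map>
#include <vector>
#include <string>
using namespace std;

struct Record { int key; string name; };

int main() {
    // Main data file: records kept in sorted key order
    vector<Record> dataFile = {{1001, "John"}, {1002, "Alice"}, {1006, "Sarah"}};

    // Index file: key -> position ("address") of the record in the data file
    map<int, size_t> index;
    for (size_t i = 0; i < dataFile.size(); ++i) index[dataFile[i].key] = i;

    // Direct access through the index, no sequential scan of the data
    auto it = index.find(1006);
    if (it != index.end()) cout << dataFile[it->second].name << endl;  // Sarah
}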
Q10. Differentiate Between Multi-list and Inverted List File Organization. (15 Marks)

Feature       Multi-list File Organization                 Inverted List File Organization
Structure     Multiple linked lists for each key/field     Central index for each field/attribute
Use Case      Many-to-many relationships                   Searching based on multiple
              (e.g., Students and Courses)                 non-primary keys
Access Type   Traversal through linked records             Quick access using inverted indexes
Complexity    Medium (depends on list connections)         High (requires maintaining multiple indexes)
Example       Student–Course Enrollment                    Search Engine indexing

Explanation:

• In a multi-list, records are connected in multiple linked lists, with each list representing a relationship.

• In an inverted list, each attribute has its own index which points to all records having that attribute value. Common in information retrieval systems.
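A short illustrative sketch of the inverted-list idea (in-memory only; the attribute values and record numbers are made up): each attribute value maps to the list of record numbers holding it, exactly the search-engine style lookup described above.

#include <iostream>
#include <map>
#include <vector>
#include <string>
using namespace std;

int main() {
    // Records 0..3 with a "city" attribute
    vector<string> city = {"Pune", "Delhi", "Pune", "Mumbai"};

    // Inverted list: attribute value -> all record numbers having that value
    map<string, vector<int>> inverted;
    for (int rec = 0; rec < (int)city.size(); ++rec)
        inverted[city[rec]].push_back(rec);

    for (int rec : inverted["Pune"]) cout << rec << " ";  // 0 2
    cout << endl;
}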

Q11. Describe Fixed and Variable Length Record With Example. (15 Marks)

Fixed-Length Record:

• Every record occupies the same amount of space.

• Easy to process, fast for retrieval.

• Wastes space if the data is smaller than the allocated size.

Example:

struct Employee {
    int empID;       // 4 bytes
    char name[20];   // 20 bytes
    float salary;    // 4 bytes
};
// Total = 28 bytes

Variable-Length Record:

• Record sizes may vary due to varying field lengths (e.g., comments, addresses).

• Efficient in space but more complex to access and maintain.

Example:

Email messages, social media posts, or chat logs where the length of content varies significantly.

Difference Table:

Feature           Fixed-Length Record   Variable-Length Record
Size              Constant              Varies per record
Access Speed      Fast                  Slower
Space Efficiency  May waste space       Space-optimized
Complexity        Simple                Requires delimiters/pointers
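The "Access Speed: Fast" entry for fixed-length records follows from simple address arithmetic: record n starts at byte n * sizeof(record), so it can be reached with a single seek instead of a scan. A hedged sketch of that idea (binary file, illustrative file name):

#include <fstream>
#include <iostream>
using namespace std;

struct Employee { int empID; char name[20]; float salary; };

int main() {
    Employee e{7, "Asha", 50000.0f};
    ofstream("emp.dat", ios::binary).write((char*)&e, sizeof e);

    // Record n starts at byte n * sizeof(Employee): direct seek, no scanning
    int n = 0;                              // record number to fetch
    Employee r;
    ifstream fin("emp.dat", ios::binary);
    fin.seekg(n * (long)sizeof(Employee));  // jump straight to record n
    fin.read((char*)&r, sizeof r);
    cout << r.empID << " " << r.name << " " << r.salary << endl;
}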

Q12. Describe Primary and Secondary Key With Example. (15 Marks)

Primary Key:

A primary key is an attribute that uniquely identifies a record in a file or table. It must be unique and non-null. Only one primary key is allowed per record structure.

Example:

• Student Roll Number

• Employee ID

• Aadhaar Number

Secondary Key:

A secondary key is any non-unique attribute used for searching, sorting, or filtering. It does not uniquely identify records but enhances flexibility in access.

Example:

• Student Name

• Department Name

• City

Differences Table:

Feature            Primary Key             Secondary Key
Uniqueness         Must be unique          Can be duplicate
Null Values        Not allowed             Allowed
Main Purpose       Record identification   Search/filtering
Number per Record  Only one primary key    Can be multiple
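The uniqueness rows map naturally onto container choice in code; an illustrative sketch (the containers are my choice, not part of the original answer): a map for the unique primary key, a multimap for a secondary key that can repeat.

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    // Primary key: roll number -> exactly one record
    map<int, string> byRoll = {{101, "Raj"}, {102, "Asha"}};

    // Secondary key: department -> possibly many records
    multimap<string, int> byDept = {{"CS", 101}, {"CS", 102}};

    cout << byRoll[101] << endl;            // unique lookup by primary key
    auto range = byDept.equal_range("CS");  // all records sharing the secondary key
    for (auto it = range.first; it != range.second; ++it)
        cout << it->second << " ";          // 101 102
    cout << endl;
}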
