UNIT V : Searching, Sorting and Hashing
By
Mr.S.Selvaraj
Asst. Professor (SRG) / CSE
Kongu Engineering College
Perundurai, Erode, Tamilnadu, India
Thanks to and resources from: Mark Allen Weiss, "Data Structures and Algorithm Analysis in C"; and Sumitabha Das, "Computer Fundamentals and C Programming", 1st Edition, McGraw Hill, 2018.
20CST32 – Data Structures
Syllabus – Unit Wise
List of Exercises
Text Book and Reference Book
Unit V : Contents
1. Searching
– Linear search
– Binary Search
2. Sorting:
– Internal sorting:
• Insertion sort
• Selection sort
• Bubble sort
• Shell sort
• Quick Sort
• Heap sort
• Bucket sort
– External sorting:
• Merge Sort
• Multiway Merge
• Polyphase Merge
3. Hashing:
– Hash Functions
– Separate Chaining
– Closed Hashing (Open Addressing)
• Linear Probing
• Quadratic Probing
• Double Hashing
– Rehashing
– Extendible Hashing.
5.1 _ Searching
Searching
• Searching is the process of finding a value in a list of values.
• In other words, it is the process of locating the position of a given value in a list of values.
Types of Search
• Linear Search
• Binary Search
• Interpolation search
• Sublist search
• Exponential search
• Jump search
• Fibonacci search, etc.
Linear Search
• The linear search algorithm finds a given element in a list of elements in O(n) time, where n is the total number of elements in the list.
• The search starts by comparing the search element with the first element in the list.
• If they match, the result is "element found"; otherwise the search element is compared with the next element in the list.
• This repeats until the search element has been compared with the last element in the list; if the last element also doesn't match, the result is "Element not found in the list".
• That means the search element is compared with the list element by element.
Linear Search - Algorithm
• Step 1 - Read the search element from the user.
• Step 2 - Compare the search element with the first element in the list.
• Step 3 - If both match, then display "Given element is found!!!" and terminate the function.
• Step 4 - If both do not match, then compare the search element with the next element in the list.
• Step 5 - Repeat steps 3 and 4 until the search element has been compared with the last element in the list.
• Step 6 - If the last element in the list also doesn't match, then display "Element is not found!!!" and terminate the function.
Linear Search - Example
Binary Search
• Binary search finds a given element in a list of elements in O(log n) time, where n is the total number of elements in the list.
• The binary search algorithm can be used only with a sorted list of elements.
• That means binary search is used only with a list of elements that are already arranged in order.
• Binary search cannot be used for a list of elements arranged in random order.
• The search starts by comparing the search element with the middle element in the list.
Binary Search - Algorithm
• Step 1 - Read the search element from the user.
• Step 2 - Find the middle element in the sorted list.
• Step 3 - Compare the search element with the middle element in
the sorted list.
• Step 4 - If both are matched, then display "Given element is
found!!!" and terminate the function.
• Step 5 - If both are not matched, then check whether the search
element is smaller or larger than the middle element.
• Step 6 - If the search element is smaller than the middle element, repeat steps 2, 3, 4 and 5 for the left sublist of the middle element.
• Step 7 - If the search element is larger than the middle element, repeat steps 2, 3, 4 and 5 for the right sublist of the middle element.
• Step 8 - Repeat the same process until we find the search element
in the list or until sublist contains only one element.
• Step 9 - If that element also doesn't match with the search
element, then display "Element is not found in the list!!!" and
terminate the function.
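• A C sketch of the algorithm above, assuming the list is already sorted in ascending order (the name binary_search is illustrative):

int binary_search( int list[], int n, int key )
{
    int low = 0, high = n - 1;
    while( low <= high )                    /* sublist list[low..high] still non-empty */
    {
        int mid = low + (high - low) / 2;   /* middle element of the current sublist */
        if( list[mid] == key )
            return mid;                     /* element found */
        else if( key < list[mid] )
            high = mid - 1;                 /* repeat on the left sublist */
        else
            low = mid + 1;                  /* repeat on the right sublist */
    }
    return -1;                              /* element is not found in the list */
}

• Each iteration halves the sublist, which is where the O(log n) bound comes from.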
Binary Search - Example
Binary Search - Program
Thank you
5.2 _ Sorting
Sorting
• In data structures, sorting is the arrangement of data in a preferred order.
• Sorted data can be searched through quickly and easily.
• The simplest example of sorting is a dictionary.
• Before the era of the Internet, when you wanted to look up a word in a dictionary, you would do so in alphabetical order. This made it easy.
Types of Sorting
Types of Sorting
• When all of the data fits in main memory, the sorting is called internal sorting.
• When the data that needs to be sorted cannot all be placed in memory at a time, the sorting is called external sorting.
• External sorting is used for massive amounts of data.
• Merge sort and its variations are typically used for external sorting.
• External storage such as a hard disk or CD holds the data during external sorting.
Internal Vs External Sorting
Insertion Sort
• Insertion sort is a simple sorting algorithm that works similar to the way you sort playing cards in your hands.
• The array is virtually split into a sorted and an unsorted part.
• Values from the unsorted part are picked and placed at the correct position in the sorted part.
• This is an in-place comparison-based sorting algorithm.
• Here, a sub-list is maintained which is always sorted. For example, the lower part of the array is maintained to be sorted.
• An element which is to be inserted into this sorted sub-list has to find its appropriate place and then be inserted there. Hence the name, insertion sort.
• The array is searched sequentially and unsorted items are moved and inserted into the sorted sub-list (in the same array).
• This algorithm is not suitable for large data sets, as its average and worst case complexity are O(n²), where n is the number of items.
Insertion Sort - Algorithm
• To sort an array of size n in ascending order:
– 1: Iterate from arr[1] to arr[n-1] over the array.
– 2: Compare the current element (key) to its predecessor.
– 3: If the key element is smaller than its predecessor, compare it to the elements before. Move the greater elements one position up to make space for the swapped element.
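• A C sketch of these three steps (names are illustrative):

void insertion_sort( int arr[], int n )
{
    int i, j, key;
    for( i = 1; i < n; i++ )                /* arr[0..i-1] is the sorted sub-list */
    {
        key = arr[i];                       /* current element to be inserted */
        j = i - 1;
        while( j >= 0 && arr[j] > key )     /* move greater elements one position up */
        {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;                   /* insert key at its appropriate place */
    }
}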
Insertion Sort – Example 1
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.
This process goes on until all the unsorted values are covered in a sorted sub-list.
Insertion Sort – Example 2
Selection Sort
• Selection sort is a simple sorting algorithm.
• This sorting algorithm is an in-place comparison-based algorithm in which the list is divided into two parts, the sorted part at the left end and the unsorted part at the right end.
• Initially, the sorted part is empty and the unsorted part is the entire list.
• The smallest element is selected from the unsorted array and swapped with the leftmost unsorted element, and that element becomes a part of the sorted array.
• This process continues, moving the unsorted array boundary one element to the right.
• This algorithm is not suitable for large data sets, as its average and worst case complexities are O(n²), where n is the number of items.
Selection Sort - Algorithm
• Step 1 − Set MIN to location 0
• Step 2 − Search the minimum element in the list
• Step 3 − Swap with value at location MIN
• Step 4 − Increment MIN to point to next element
• Step 5 − Repeat until list is sorted
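• A C sketch of these steps (names are illustrative):

void selection_sort( int arr[], int n )
{
    int i, j, min, tmp;
    for( i = 0; i < n - 1; i++ )            /* arr[0..i-1] is the sorted part */
    {
        min = i;                            /* Step 1: set MIN to the first unsorted location */
        for( j = i + 1; j < n; j++ )        /* Step 2: search the minimum element */
            if( arr[j] < arr[min] )
                min = j;
        tmp = arr[i];                       /* Step 3: swap with value at location MIN */
        arr[i] = arr[min];
        arr[min] = tmp;                     /* Steps 4-5: boundary moves right each pass */
    }
}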
Example
Bubble Sort
• Bubble sort is a simple sorting algorithm.
• This sorting algorithm is a comparison-based algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order (a C sketch follows the algorithm heading below).
• This algorithm is not suitable for large data sets, as its average and worst case complexity are O(n²), where n is the number of items.
Bubble Sort - Algorithm
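• A C sketch of the standard adjacent-swap formulation (names are illustrative):

void bubble_sort( int arr[], int n )
{
    int i, j, tmp;
    for( i = 0; i < n - 1; i++ )            /* after pass i, the largest i+1 values are in place */
        for( j = 0; j < n - 1 - i; j++ )
            if( arr[j] > arr[j + 1] )       /* adjacent pair out of order */
            {
                tmp = arr[j];               /* swap them */
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
}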
Bubble Sort - Example
Shell Sort
• Shell sort, named after its inventor, Donald Shell, was one of the first algorithms to break the quadratic time barrier.
• Shell sort is a highly efficient sorting algorithm and is based on the insertion sort algorithm.
• This algorithm avoids the large shifts insertion sort makes when a small value on the far right has to be moved to the far left.
• It first uses insertion sort on widely spread elements to sort them, and then sorts the less widely spaced elements (a C sketch follows the algorithm heading below).
• This spacing is termed the interval (or gap).
• This algorithm is quite efficient for medium-sized data sets. Its average and worst-case complexity depend on the gap sequence; the best known sequences give a worst case of about O(n log²n), where n is the number of items.
• It sorts in place, so the extra space required is O(1).
Shell Sort - Algorithm
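• A C sketch using Shell's original gap sequence n/2, n/4, ..., 1 (the example below uses a fixed interval of 4 instead; names are illustrative):

void shell_sort( int arr[], int n )
{
    int gap, i, j, key;
    for( gap = n / 2; gap > 0; gap /= 2 )   /* shrink the interval each pass */
        for( i = gap; i < n; i++ )          /* insertion sort on each gap sub-list */
        {
            key = arr[i];
            for( j = i; j >= gap && arr[j - gap] > key; j -= gap )
                arr[j] = arr[j - gap];      /* shift within the sub-list */
            arr[j] = key;
        }
}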
Shell Sort - Example
• In this example, we take the interval of 4.
• Make a virtual sub-list of all values located at the interval of 4
positions.
• Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}
Shell Sort - Example
• We compare values in each sub-list and swap
them (if necessary) in the original array.
• After this step, the new array should look like
this −
• Shell sort uses insertion sort to sort the array.
Quick Sort
• Quick sort is a fast sorting algorithm used to sort a list of elements.
• The quick sort algorithm was invented by C. A. R. Hoare.
• The quick sort algorithm attempts to separate the list of elements into two parts and then sort each part recursively.
• That means it uses a divide and conquer strategy.
• In quick sort, the partition of the list is performed using an element called the pivot.
• Here the pivot element is one of the elements in the list.
• The list is divided into two partitions such that
– all elements to the left of the pivot are smaller than the pivot and
– all elements to the right of the pivot are greater than or equal to the pivot.
• This algorithm is quite efficient for large-sized data sets: its average-case complexity is O(n log n), though its worst case is O(n²).
Quick Sort – Algorithm(Pivot)
• Step 1 - Consider the first element of the list as the pivot (i.e., the element at the first position in the list).
• Step 2 - Define two variables i and j. Set i and j to the first and last elements of the list respectively.
• Step 3 - Increment i until list[i] > pivot, then stop.
• Step 4 - Decrement j until list[j] < pivot, then stop.
• Step 5 - If i < j, then exchange list[i] and list[j].
• Step 6 - Repeat steps 3, 4 & 5 until i > j.
• Step 7 - Exchange the pivot element with the list[j] element.
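• A C sketch that follows these steps, with the first element as the pivot (names are illustrative; production implementations choose pivots more carefully):

void quick_sort( int list[], int low, int high )
{
    int pivot, i, j, tmp;
    if( low >= high )
        return;                             /* zero or one element: already sorted */
    pivot = list[low];                      /* Step 1: first element is the pivot */
    i = low;                                /* Step 2 */
    j = high;
    while( i < j )
    {
        while( i < high && list[i] <= pivot )
            i++;                            /* Step 3: stop when list[i] > pivot */
        while( list[j] > pivot )
            j--;                            /* Step 4: stop when list[j] <= pivot */
        if( i < j )                         /* Step 5: exchange list[i] and list[j] */
        {
            tmp = list[i]; list[i] = list[j]; list[j] = tmp;
        }
    }
    list[low] = list[j];                    /* Step 7: place the pivot at position j */
    list[j] = pivot;
    quick_sort( list, low, j - 1 );         /* sort the left partition recursively */
    quick_sort( list, j + 1, high );        /* sort the right partition recursively */
}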
Quick Sort – Example
Quick sort - Program
Heap sort
• Heap sort is one of the sorting algorithms used to arrange a list of elements in order.
• The heapsort algorithm uses a tree concept called the Heap Tree.
• In this sorting algorithm, we use
– a Max Heap to arrange the list of elements in descending order and
– a Min Heap to arrange the list of elements in ascending order.
Heap Sort - Algorithm
• Step 1 - Construct a Binary Tree with given list of Elements.
• Step 2 - Transform the Binary Tree into Min Heap.
• Step 3 - Delete the root element from Min Heap
using Heapify method.
• Step 4 - Put the deleted element into the Sorted list.
• Step 5 - Repeat the same until Min Heap becomes empty.
• Step 6 - Display the sorted list.
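• The steps above build a Min Heap and move deleted roots to a separate sorted list. A common in-place variant instead builds a Max Heap and repeatedly swaps the root to the end of the array, which also yields ascending order; a C sketch of that variant follows (names are illustrative):

void heapify( int arr[], int n, int i )     /* sift arr[i] down in a heap of size n */
{
    int largest = i, tmp;
    int l = 2 * i + 1, r = 2 * i + 2;       /* children in the array representation */
    if( l < n && arr[l] > arr[largest] ) largest = l;
    if( r < n && arr[r] > arr[largest] ) largest = r;
    if( largest != i )
    {
        tmp = arr[i]; arr[i] = arr[largest]; arr[largest] = tmp;
        heapify( arr, n, largest );         /* continue restoring the heap property */
    }
}

void heap_sort( int arr[], int n )
{
    int i, tmp;
    for( i = n / 2 - 1; i >= 0; i-- )       /* Steps 1-2: build a heap from the list */
        heapify( arr, n, i );
    for( i = n - 1; i > 0; i-- )            /* Steps 3-5: delete the root repeatedly */
    {
        tmp = arr[0]; arr[0] = arr[i]; arr[i] = tmp;  /* root joins the sorted tail */
        heapify( arr, i, 0 );               /* re-heapify the remaining elements */
    }
}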
Heap Sort - Example
Bucket Sort
• Bucket Sort is a sorting algorithm that divides
the unsorted array elements into several
groups called buckets.
• Each bucket is then sorted by using any of the
suitable sorting algorithms or recursively
applying the same bucket algorithm.
• Finally, the sorted buckets are combined to
form a final sorted array.
• A scatter-gather approach is used (described below).
Scatter Gather approach
• The process of bucket sort can be understood
as a scatter-gather approach.
• Here, elements are first scattered into buckets
then the elements in each bucket are sorted.
• Finally, the elements are gathered in order.
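• A C sketch of the scatter-gather approach, assuming the keys lie in the range 0..99 and no bucket overflows its fixed capacity (names and sizes are illustrative):

#define NBUCKETS 10
#define BUCKET_CAP 64

void bucket_sort( int arr[], int n )
{
    int buckets[NBUCKETS][BUCKET_CAP];
    int count[NBUCKETS] = { 0 };
    int i, j, b, k, key;
    for( i = 0; i < n; i++ )                /* scatter: drop each key into its bucket */
    {
        b = arr[i] * NBUCKETS / 100;
        buckets[b][count[b]++] = arr[i];
    }
    k = 0;
    for( b = 0; b < NBUCKETS; b++ )
    {
        for( i = 1; i < count[b]; i++ )     /* sort each bucket with insertion sort */
        {
            key = buckets[b][i];
            for( j = i - 1; j >= 0 && buckets[b][j] > key; j-- )
                buckets[b][j + 1] = buckets[b][j];
            buckets[b][j + 1] = key;
        }
        for( i = 0; i < count[b]; i++ )     /* gather the buckets in order */
            arr[k++] = buckets[b][i];
    }
}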
Bucket Sort - Example
Method 2
Example
Creation of Bucket
Insertion into bucket
Insertion
Sort each bucket using insertion sort
Sorted Array
Program – bucket sort
Bucket Sort – Time Complexity
Merge Sort
• Merge sort is a sorting technique based on the divide and conquer technique, with a worst-case time complexity of O(n log n).
• It is one of the most respected algorithms.
• Merge sort first divides the array into equal halves and then combines them in a sorted manner (a C sketch follows the algorithm heading below).
Merge Sort - Algorithm
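• A C sketch of merge sort; tmp is the O(n) temporary array mentioned under the drawbacks below (names are illustrative):

void merge( int arr[], int tmp[], int left, int mid, int right )
{
    int i = left, j = mid + 1, k = left;
    while( i <= mid && j <= right )           /* merge the two sorted halves */
        tmp[k++] = ( arr[i] <= arr[j] ) ? arr[i++] : arr[j++];
    while( i <= mid )   tmp[k++] = arr[i++];  /* copy any leftovers */
    while( j <= right ) tmp[k++] = arr[j++];
    for( i = left; i <= right; i++ )
        arr[i] = tmp[i];                      /* copy the merged run back */
}

void merge_sort( int arr[], int tmp[], int left, int right )
{
    if( left < right )
    {
        int mid = ( left + right ) / 2;
        merge_sort( arr, tmp, left, mid );    /* divide: sort each half */
        merge_sort( arr, tmp, mid + 1, right );
        merge( arr, tmp, left, mid, right );  /* combine in sorted order */
    }
}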
Merge Sort - Example
Merge Sort – Example (Contd.,)
Merge Sort – Example 2
Applications and Drawbacks of Merge Sort
• Applications:
– Merge sort is useful for sorting linked lists in O(n log n) time.
– Inversion count problem.
– Used in external sorting.
• Drawbacks:
– Slower than other sorting algorithms for smaller tasks.
– The merge sort algorithm requires additional memory space of O(n) for the temporary array.
– It goes through the whole process even if the array is already sorted.
2-way merge
4-way merge
Different ways of merge
2-way merge - example
Mutiway Merge Sort
• The basic external sorting algorithm uses the merge
routine from mergesort.
• Suppose we have four tapes, Ta1, Ta2, Tb1, Tb2, which
are two input and two output tapes.
• Depending on the point in the algorithm, the a and b
tapes are either input tapes or output tapes.
• Suppose the data is initially on Ta1. Suppose further
that the internal memory can hold (and sort) m records
at a time.
• A natural first step is to read m records at a time from
the input tape, sort the records internally, and then
write the sorted records alternately to Tb1 and Tb2.
• We will call each set of sorted records a run. When this
is done, we rewind all the tapes.
Multiway Merge - Example
• If m = 3, then after the runs are constructed,
the tapes will contain the data indicated in the
following figure.
Multiway Merge – Example (Contd.,)
• This algorithm will require ⌈log₂(n/m)⌉ passes, plus the initial run-constructing pass.
• For instance, if we have 10 million records of 128 bytes each, and four megabytes of internal memory, then the first pass will create about 320 runs. We would then need nine more passes to complete the sort.
• Our example requires ⌈log₂(13/3)⌉ = 3 more passes, which are shown in the following figure.
Multiway Merge – Example (Contd.,)
Polyphase Merge
• A polyphase merge sort is an algorithm which
decreases the number of runs at every iteration
of the main loop by merging runs into larger runs.
• It is used for external sorting.
• In this type of sort, the tapes being merged, and the tape to which the merged subfiles are written, vary continuously throughout the sort.
• In this technique, the concept of a pass through the records is not as clear-cut as in the straight or the natural merge.
Polyphase Merge
• The k-way merging strategy requires the use of 2k tapes.
• This could be prohibitive for some applications.
• It is possible to get by with only k+1 tapes.
• Suppose we have three tapes, T1, T2, T3, and an input file on T1 that will produce 34 runs.
• One option is to put 17 runs each on T2 and T3.
• We could then merge this result onto T1, obtaining one tape with 17 runs.
• The problem is that since all the runs are on one tape, we must now put some of these runs on T2 to perform another merge.
• The logical way to do this is to copy the first 8 runs from T1 onto T2 and then perform the merge.
• This has the effect of adding an extra half pass for every pass we do.
Polyphase Merge
• An alternative method is to split the original 34 runs unevenly.
• Suppose we put 21 runs on T2 and 13 runs on T3.
• We would then merge 13 runs onto T1, at which point T3 is empty.
• At this point we can rewind T1 and T3, and merge T1 (with 13 runs) and T2 (which has 8 runs) onto T3.
• We merge 8 runs, until T2 is empty; this leaves 5 runs on T1 and 8 runs on T3.
• We can then merge T1 and T3, and so on.
Polyphase Merge
• The original distribution of runs makes a lot of difference.
• For example, if 22 runs are placed on T2 with 12 on T3, then after the first merge we obtain 12 runs on T1 and 10 runs on T2.
• After another merge, there are 10 runs on T1 and 2 runs on T3.
• At this point the going gets slow, because we can only merge two sets of runs before T3 is exhausted.
• Then T1 has 8 runs and T2 has 2 runs.
• Again we can only merge two sets of runs, obtaining T1 with 6 runs and T3 with 2 runs.
• After three more passes, T2 has 2 runs and the other tapes are empty.
• We copy one run to another tape, and then we can finish the merge.
• It turns out that if the number of runs is a Fibonacci number F(n), then the best way to distribute them is to split them into the two Fibonacci numbers F(n-1) and F(n-2).
• Otherwise, it is necessary to pad the tape with dummy runs in order to bring the number of runs up to a Fibonacci number.
Asymptotic Notations
Time and Space Complexity
Thank you
5.3 _ Hashing
Hashing
• The implementation of hash tables is frequently called hashing.
• Hashing is a technique used for performing insertions, deletions and finds in constant average time.
• Tree operations that require any ordering information among the elements are not supported efficiently.
• Thus, operations such as find_min, find_max, and the printing of the entire table in sorted order in linear time are not supported.
• Here, the central data structure is the hash table. We will see:
– several methods of implementing the hash table;
– a comparison of these methods analytically;
– numerous applications of hashing;
– a comparison of hash tables with binary search trees.
Hashing
• The ideal hash table data structure is merely an array of some fixed size,
containing the keys.
• Typically, a key is a string with an associated value (for instance, salary
information).
• We will refer to the table size as H_SIZE, with the understanding that this
is part of a hash data structure and not merely some variable floating
around globally.
• The common convention is to have the table run from 0 to H_SIZE-1.
• Each key is mapped into some number in the range 0 to H_SIZE - 1 and
placed in the appropriate cell.
• The mapping is called a hash function, which ideally should be simple to
compute and should ensure that any two distinct keys get different cells.
• Since there are a finite number of cells and a virtually inexhaustible supply
of keys, this is clearly impossible, and thus we seek a hash function that
distributes the keys evenly among the cells.
Example
• In this example, john hashes to 3, phil hashes
to 4, dave hashes to 6, and mary hashes to 7.
Collision
• The only remaining problems
– deal with choosing a function,
– deciding what to do when two keys hash to the
same value (this is known as a collision), and
– deciding on the table size.
Hash Function
• If the input keys are integers, then simply returning key mod
H_SIZE is generally a reasonable strategy, unless key happens to
have some undesirable properties.
• In this case, the choice of hash function needs to be carefully
considered.
• For instance, if the table size is 10 and the keys all end in zero, then
the standard hash function is obviously a bad choice.
• For reasons we shall see later, and to avoid situations like the one
above, it is usually a good idea to ensure that the table size is
prime.
• When the input keys are random integers, then this function is not
only very simple to compute but also distributes the keys evenly.
• Usually, the keys are strings; in this case, the hash function needs to
be chosen carefully.
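• A simple illustrative hash function for string keys, folding each character into a running value and taking the result mod the table size (a sketch of one reasonable choice, not the only one):

unsigned int hash( const char *key, unsigned int table_size )
{
    unsigned int hash_val = 0;
    while( *key != '\0' )                   /* fold in one character at a time */
        hash_val = ( hash_val * 32 + *key++ ) % table_size;
    return hash_val;
}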
Open Hashing (Separate Chaining)
• The first strategy, commonly known as either
open hashing, or separate chaining, is to keep
a list of all elements that hash to the same
value.
• For convenience, our lists have headers.
• If space is tight, it might be preferable to avoid
their use.
Open Hashing – Find & Insert
• To perform a find, we use the hash function to determine which list
to traverse.
• We then traverse this list in the normal manner, returning the
position where the item is found.
• To perform an insert, we traverse down the appropriate list to
check whether the element is already in place.
– if duplicates are expected, an extra field is usually kept, and this field
would be incremented in the event of a match.
– If the element turns out to be new, it is inserted either at the front of
the list or at the end of the list, whichever is easiest.
• This is an issue most easily addressed while the code is being
written.
• Sometimes new elements are inserted at the front of the list, since
it is convenient and also because frequently it happens that
recently inserted elements are the most likely to be accessed in
the near future.
Open hashing – Type Declarations
typedef struct list_node *node_ptr;
struct list_node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;
/* LIST *the_lists will be an array of lists, allocated later */
/* The lists will use headers, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    LIST *the_lists;
};
typedef struct hash_tbl *HASH_TABLE;
Open hashing – Initialization
HASH_TABLE initialize_table( unsigned int table_size )
{
    HASH_TABLE H;
    int i;
    if( table_size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = (HASH_TABLE) malloc( sizeof (struct hash_tbl) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    H->table_size = next_prime( table_size );
    /* Allocate list pointers */
    H->the_lists = (LIST *) malloc( sizeof (LIST) * H->table_size );
    if( H->the_lists == NULL )
        fatal_error("Out of space!!!");
    /* Allocate list headers */
    for( i = 0; i < H->table_size; i++ )
    {
        H->the_lists[i] = (LIST) malloc( sizeof (struct list_node) );
        if( H->the_lists[i] == NULL )
            fatal_error("Out of space!!!");
        else
            H->the_lists[i]->next = NULL;
    }
    return H;
}
Open Hashing – Find Routine
position find( element_type key, HASH_TABLE H )
{
    position p;
    LIST L;
    L = H->the_lists[ hash( key, H->table_size ) ];
    p = L->next;
    while( (p != NULL) && (p->element != key) )   /* Probably need strcmp!! */
        p = p->next;
    return p;
}
Open Hashing – Insert Routine
void insert( element_type key, HASH_TABLE H )
{
    position pos, new_cell;
    LIST L;
    pos = find( key, H );
    if( pos == NULL )
    {
        new_cell = (position) malloc( sizeof(struct list_node) );
        if( new_cell == NULL )
            fatal_error("Out of space!!!");
        else
        {
            L = H->the_lists[ hash( key, H->table_size ) ];
            new_cell->next = L->next;
            new_cell->element = key; /* Probably need strcpy!! */
            L->next = new_cell;
        }
    }
}
Open Hashing - Example
• We assume for this section that the keys are the first 10
perfect squares and that the hashing function is simply
hash(x) = x mod 10. (The table size is not prime, but is used
here for simplicity.)
Load Factor
• We define the load factor, ∆ , of a hash table
to be the ratio of the number of elements in
the hash table to the table size.
• In the example above, ∆ = 1.0.
• The average length of a list is ∆.
• The effort required to perform a search is the
constant time required to evaluate the hash
function plus the time to traverse the list.
Load Factor
• In an unsuccessful search, the number of links to traverse is
∆ (excluding the final NULL link) on average.
• A successful search requires that about 1+(∆/2) links be
traversed, since there is a guarantee that one link must be
traversed (since the search is successful), and we also
expect to go halfway down a list to find our match.
• This analysis shows that the table size is not really
important, but the load factor is.
• The general rule for open hashing is to make the table size
about as large as the number of elements expected (in
other words, let ∆ ≈ 1).
• It is also a good idea, as mentioned before, to keep the
table size prime to ensure a good distribution.
Closed Hashing (Open Addressing)
• Open hashing has the disadvantage of requiring pointers.
• This tends to slow the algorithm down a bit because of the time required to allocate new cells, and it also essentially requires the implementation of a second data structure.
• Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists.
• In a closed hashing system, if a collision occurs, alternate cells are tried until an empty cell is found.
• More formally, cells h0(x), h1(x), h2(x), . . . are tried in succession, where hi(x) = (hash(x) + f(i)) mod H_SIZE, with f(0) = 0.
• The function f is the collision resolution strategy.
• Because all the data goes inside the table, a bigger table is needed for closed hashing than for open hashing.
• Generally, the load factor should be below 0.5 for closed hashing.
Collision Resolution Strategies
• We now look at three common collision
resolution strategies.
– Linear Probing
– Quadratic Probing
– Double Hashing
Linear Probing
• In linear probing, f is a linear function of i, typically f(i) = i.
• This amounts to trying cells sequentially (with wraparound) in search of an empty cell.
• The figure in the next slide shows the result of inserting keys {89, 18, 49, 58, 69} into a closed table using the same hash function as before and the collision resolution strategy f(i) = i.
• The first collision occurs when 49 is inserted; it is put in the next available spot, namely spot 0, which is open.
• 58 collides with 18, 89, and then 49 before an empty cell is found three away. The collision for 69 is handled in a similar manner.
• As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.
• Worse, even if the table is relatively empty, blocks of occupied cells start forming.
• This effect, known as primary clustering, means that any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster.
Linear Probing - Example
• Although we will not perform the calculations here, it can be shown that the expected number of probes using linear probing is roughly
– (1/2)(1 + 1/(1 - ∆)²) for insertions and unsuccessful searches, and
– (1/2)(1 + 1/(1 - ∆)) for successful searches.
Quadratic probing
• Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing.
• Quadratic probing is what you would expect: the collision function is quadratic.
• The popular choice is f(i) = i².
• The figure shows the resulting closed table with this collision function on the same input used in the linear probing example.
• When 49 collides with 89, the next position attempted is one cell away. This cell is empty, so 49 is placed there.
• Next, 58 collides at position 8. The cell one away is tried, but another collision occurs. A vacant cell is found at the next cell tried, which is 2² = 4 away. 58 is thus placed in cell 2.
• The same thing happens for 69.
• For linear probing it is a bad idea to let the hash table get nearly full, because performance degrades.
• For quadratic probing, the situation is even more drastic: there is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime.
• This is because at most half of the table can be used as alternate locations to resolve collisions.
• Indeed, it can be shown that if the table is half empty and the table size is prime, then we are always guaranteed to be able to insert a new element.
Quadratic Probing - Example
Closed hashing - Type declaration
enum kind_of_entry { legitimate, empty, deleted };
struct hash_entry
{
    element_type element;
    enum kind_of_entry info;
};
typedef unsigned int INDEX;
typedef INDEX position;
typedef struct hash_entry cell;
/* the_cells is an array of hash_entry cells, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *HASH_TABLE;
Closed hashing – Initialization
HASH_TABLE initialize_table( unsigned int table_size )
{
    HASH_TABLE H;
    int i;
    if( table_size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = (HASH_TABLE) malloc( sizeof ( struct hash_tbl ) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    H->table_size = next_prime( table_size );
    /* Allocate cells */
    H->the_cells = (cell *) malloc( sizeof ( cell ) * H->table_size );
    if( H->the_cells == NULL )
        fatal_error("Out of space!!!");
    for( i = 0; i < H->table_size; i++ )
        H->the_cells[i].info = empty;
    return H;
}
Closed hashing – Find Routine with Quadratic Probing
position find( element_type key, HASH_TABLE H )
{
    position i, current_pos;
    i = 0;
    current_pos = hash( key, H->table_size );
    /* Probably need strcmp! */
    while( (H->the_cells[current_pos].element != key) &&
           (H->the_cells[current_pos].info != empty) )
    {
        current_pos += 2*(++i) - 1;   /* next quadratic probe: i² = (i-1)² + 2i - 1 */
        if( current_pos >= H->table_size )
            current_pos -= H->table_size;
    }
    return current_pos;
}
Closed hashing – Insert Routine with Quadratic Probing
void insert( element_type key, HASH_TABLE H )
{
    position pos;
    pos = find( key, H );
    if( H->the_cells[pos].info != legitimate )
    {   /* ok to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key; /* Probably need strcpy!! */
    }
}
Quadratic probing
• Although quadratic probing eliminates primary
clustering, elements that hash to the same
position will probe the same alternate cells. This
is known as secondary clustering.
• Secondary clustering is a slight theoretical
blemish.
• Simulation results suggest that it generally causes
less than an extra probe per search.
• The double hashing technique eliminates this, but does so at the cost of extra multiplications and divisions.
Double Hashing
• For double hashing, one popular choice is f(i) = i * h2(x).
• This formula says that we apply a second hash function to x and probe at distances h2(x), 2h2(x), . . ., and so on.
• A poor choice of h2(x) would be disastrous.
• For instance, the obvious choice h2(x) = x mod 9 would not help if 99 were inserted into the input in the previous examples.
• Thus, the function must never evaluate to zero.
• It is also important to make sure all cells can be probed (this is not possible in the example below, because the table size is not prime).
• A function such as h2(x) = R - (x mod R), with R a prime smaller than H_SIZE, will work well.
• If we choose R = 7, then the figure shows the results of inserting the same keys as before.
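• A small C sketch of this second hash function and the probe sequence it induces (R = 7 is taken from the example; hash2 and probe are illustrative helper names):

#define R 7     /* a prime smaller than the table size */

unsigned int hash2( int x )
{
    return R - ( x % R );                   /* never evaluates to zero */
}

/* position tried on the i-th probe for key x, with hash(x) = x mod h_size */
unsigned int probe( int x, unsigned int i, unsigned int h_size )
{
    return ( x % h_size + i * hash2( x ) ) % h_size;
}

• For example, probe(49, 1, 10) = (9 + 7) mod 10 = 6, matching the example below.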
Double Hashing - Example
Double Hashing - Example
• The first collision occurs when 49 is inserted. h2(49) = 7
- 0 = 7, so 49 is inserted in position 6.
• h2(58) = 7 - 2 = 5, so 58 is inserted at location 3.
• Finally, 69 collides and is inserted at a distance h2(69) =
7 - 6 = 1 away. 69 is inserted at location 0.
• If we tried to insert 60 in position 0, we would have a
collision. Since h2(60) = 7 - 4 = 3, we would then try
positions 3, 6, 9, and then 2 until an empty spot is
found.
• It is generally possible to find some bad case, but there
are not too many here.
Double Hashing - Example
• As we have said before, the size of our sample hash table is not
prime.
• We have done this for convenience in computing the hash function,
but it is worth seeing why it is important to make sure the table size
is prime when double hashing is used.
• If we attempt to insert 23 into the table, it would collide with 58.
Since h2(23) = 7 - 2 = 5, and the table size is 10, we essentially have
only one alternate location, and it is already taken. Thus, if the table
size is not prime, it is possible to run out of alternate locations
prematurely.
• However, if double hashing is correctly implemented, simulations
imply that the expected number of probes is almost the same as for
a random collision resolution strategy.
• This makes double hashing theoretically interesting.
• Quadratic probing, however, does not require the use of a second
hash function and is thus likely to be simpler and faster in practice.
Rehashing
• If the table gets too full,
– the running time for the operations will start taking too long.
– inserts might fail for closed hashing with quadratic resolution.
• This can happen if there are too many deletions intermixed
with insertions.
• A solution, then, is
– to build another table that is about twice as big (with
associated new hash function) and
– scan down the entire original hash table,
– computing the new hash value for each (non-deleted) element
and
– inserting it in the new table.
Rehashing - Example
• As an example, suppose the elements 13, 15, 24, and 6
are inserted into a closed hash table of size 7.
• The hash function is h(x) = x mod 7.
• Suppose linear probing is used to resolve collisions.
• The resulting hash table appears in the following figure.
Rehashing - Example
• If 23 is inserted into the table:
Rehashing - Example
• If 23 is inserted into the table, the resulting
table will be over 70 percent full.
• Because the table is so full, a new table is
created.
• The size of this table is 17, because this is the first prime that is at least twice as large as the old table size.
• The new hash function is then h(x) = x mod 17.
• The old table is scanned, and elements 6, 15,
23, 24, and 13 are inserted into the new table.
• The resulting table appears as
Rehashing - Example
• This entire operation is called rehashing.
• This is obviously a very expensive operation – the running
time is O(n), since there are n elements to rehash and the
table size is roughly 2n, but it is actually not all that bad,
because it happens very infrequently.
• In particular, there must have been n/2 inserts prior to the
last rehash, so it essentially adds a constant cost to each
insertion. (This is why the new table is made twice as large
as the old table.)
• If this data structure is part of the program, the effect is not
noticeable.
• On the other hand, if the hashing is performed as part of an
interactive system, then the unfortunate user whose
insertion caused a rehash could see a slowdown.
Rehashing Implementation
• Rehashing can be implemented in several ways with quadratic
probing.
– One alternative is to rehash as soon as the table is half full.
– The other extreme is to rehash only when an insertion fails.
– A third, middle of the road, strategy is to rehash when the table
reaches a certain load factor.
• Since performance does degrade as the load factor increases, the
third strategy, implemented with a good cutoff, could be best.
• Rehashing frees the programmer from worrying about the table
size and is important because hash tables cannot be made
arbitrarily large in complex programs.
• The exercises ask you to investigate the use of rehashing in
conjunction with lazy deletion.
• Rehashing can be used in other data structures as well.
• For instance, if the queue data structure became full, we could declare a double-sized array and copy everything over, freeing the original.
Rehashing Implementation - Code
HASH_TABLE rehash( HASH_TABLE H )
{
    unsigned int i, old_size;
    cell *old_cells;
    old_cells = H->the_cells;
    old_size = H->table_size;
    /* Get a new, empty table */
    H = initialize_table( 2*old_size );
    /* Scan through old table, reinserting into new */
    for( i = 0; i < old_size; i++ )
        if( old_cells[i].info == legitimate )
            insert( old_cells[i].element, H );
    free( old_cells );
    return H;
}
Extendible Hashing
• We now deal with the case where the amount of data is too large to fit in main memory.
• The main consideration then is the number of disk accesses required to retrieve data.
• As before, we assume that at any point we have n records to store; the value of n changes over time. Furthermore, at most m records fit in one disk block.
• We will use m = 4 in this section.
• If either open hashing or closed hashing is used, the major problem is that collisions could cause several blocks to be examined during a find, even for a well-distributed hash table.
• Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed, which requires O(n) disk accesses.
• A clever alternative, known as extendible hashing, allows a find to be performed in two disk accesses. Insertions also require few disk accesses.
Extendible Hashing
• We recall from previous discussions that a B-tree has depth O(log_(m/2) n).
• As m increases, the depth of a B-tree decreases.
• We could in theory choose m to be so large that the depth of the B-
tree would be 1.
• Then any find after the first would take one disk access, since,
presumably, the root node could be stored in main memory.
• The problem with this strategy is that the branching factor is so
high that it would take considerable processing to determine which
leaf the data was in.
• If the time to perform this step could be reduced, then we would
have a practical scheme.
• This is exactly the strategy used by extendible hashing.
Extendible Hashing - Example
• Let us suppose, for the moment, that our data consists of several six-bit integers.
• The figure shows an extendible hashing scheme for this data.
• The root of the "tree" contains four pointers determined by the leading two bits of the data.
• Each leaf has up to m = 4 elements.
• It happens that in each leaf the first two bits are identical; this is indicated by the number in parentheses.
• To be more formal, D will represent the number of bits used by the root, which is sometimes known as the directory.
• The number of entries in the directory is thus 2^D.
• d_l is the number of leading bits that all the elements of some leaf l have in common.
• d_l will depend on the particular leaf, and d_l ≤ D.
Extendible Hashing - Example
• Suppose that we want to insert the key 100100.
• This would go into the third leaf, but as the third leaf is already full, there
is no room.
• We thus split this leaf into two leaves, which are now determined by the
first three bits.
• This requires increasing the directory size to 3.
• These changes are reflected in figure.
Extendible Hashing - Example
• Notice that all of the leaves not involved in the split are now
pointed to by two adjacent directory entries.
• Thus, although an entire directory is rewritten, none of the other
leaves are actually accessed.
• If the key 000000 is now inserted, then the first leaf is split, generating two leaves with d_l = 3.
• Since D = 3, the only change required in the directory is the updating of the 000 and 001 pointers. See the figure.
Extendible Hashing - Example
• This very simple strategy provides quick access times for insert and
find operations on large databases.
• There are a few important details we have not considered.
• First, it is possible that several directory splits will be required if the
elements in a leaf agree in more than D + 1 leading bits.
• For instance, starting at the original example, with D = 2, if 111010,
111011, and finally 111100 are inserted, the directory size must be
increased to 4 to distinguish between the five keys.
• This is an easy detail to take care of, but must not be forgotten.
• Second, there is the possibility of duplicate keys; if there are more
than m duplicates, then this algorithm does not work at all.
• In this case, some other arrangements need to be made.
Thank you
Deepfake Phishing: A New Frontier in Cyber Threats
RaviKumar256934
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Understand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panelUnderstand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panel
NaveenBotsa
 
PYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptxPYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptx
rmvigram
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)
vijimech408
 
ldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineersldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineers
PravalikaChidurala
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Automatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and BeyondAutomatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and Beyond
NU_I_TODALAB
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Domain1_Security_Principles --(My_Notes)
Domain1_Security_Principles --(My_Notes)Domain1_Security_Principles --(My_Notes)
Domain1_Security_Principles --(My_Notes)
efs14135
 
Python Functions, Modules and Packages
Python Functions, Modules and PackagesPython Functions, Modules and Packages
Python Functions, Modules and Packages
Dr. A. B. Shinde
 
Espresso PD Official MP_eng Version.pptx
Espresso PD Official MP_eng Version.pptxEspresso PD Official MP_eng Version.pptx
Espresso PD Official MP_eng Version.pptx
NingChacha1
 
HSE Induction for heat stress work .pptx
HSE Induction for heat stress work .pptxHSE Induction for heat stress work .pptx
HSE Induction for heat stress work .pptx
agraahmed
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
Compressive Strength Estimation of Mesh Embedded Masonry Prism Using Empirica...
Compressive Strength Estimation of Mesh Embedded Masonry Prism Using Empirica...Compressive Strength Estimation of Mesh Embedded Masonry Prism Using Empirica...
Compressive Strength Estimation of Mesh Embedded Masonry Prism Using Empirica...
Journal of Soft Computing in Civil Engineering
 
Zeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdfZeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdf
Saikumar174642
 
22PCOAM16 Unit 3 Session 23 Different ways to Combine Classifiers.pptx
22PCOAM16 Unit 3 Session 23  Different ways to Combine Classifiers.pptx22PCOAM16 Unit 3 Session 23  Different ways to Combine Classifiers.pptx
22PCOAM16 Unit 3 Session 23 Different ways to Combine Classifiers.pptx
Guru Nanak Technical Institutions
 
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Using the Artificial Neural Network to Predict the Axial Strength and Strain ...
Journal of Soft Computing in Civil Engineering
 
Urban Transport Infrastructure September 2023
Urban Transport Infrastructure September 2023Urban Transport Infrastructure September 2023
Urban Transport Infrastructure September 2023
Rajesh Prasad
 
Deepfake Phishing: A New Frontier in Cyber Threats
Deepfake Phishing: A New Frontier in Cyber ThreatsDeepfake Phishing: A New Frontier in Cyber Threats
Deepfake Phishing: A New Frontier in Cyber Threats
RaviKumar256934
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Understand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panelUnderstand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panel
NaveenBotsa
 
PYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptxPYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptx
rmvigram
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)
vijimech408
 
ldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineersldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineers
PravalikaChidurala
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Automatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and BeyondAutomatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and Beyond
NU_I_TODALAB
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Domain1_Security_Principles --(My_Notes)
Domain1_Security_Principles --(My_Notes)Domain1_Security_Principles --(My_Notes)
Domain1_Security_Principles --(My_Notes)
efs14135
 
Python Functions, Modules and Packages
Python Functions, Modules and PackagesPython Functions, Modules and Packages
Python Functions, Modules and Packages
Dr. A. B. Shinde
 
Espresso PD Official MP_eng Version.pptx
Espresso PD Official MP_eng Version.pptxEspresso PD Official MP_eng Version.pptx
Espresso PD Official MP_eng Version.pptx
NingChacha1
 
HSE Induction for heat stress work .pptx
HSE Induction for heat stress work .pptxHSE Induction for heat stress work .pptx
HSE Induction for heat stress work .pptx
agraahmed
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
Zeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdfZeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdf
Saikumar174642
 
22PCOAM16 Unit 3 Session 23 Different ways to Combine Classifiers.pptx
22PCOAM16 Unit 3 Session 23  Different ways to Combine Classifiers.pptx22PCOAM16 Unit 3 Session 23  Different ways to Combine Classifiers.pptx
22PCOAM16 Unit 3 Session 23 Different ways to Combine Classifiers.pptx
Guru Nanak Technical Institutions
 

Searching, Sorting and Hashing Techniques

  • 1. UNIT V : Searching, Sorting and Hashing By Mr.S.Selvaraj Asst. Professor (SRG) / CSE Kongu Engineering College Perundurai, Erode, Tamilnadu, India Thanks to and Resource from : Data Structures and Algorithm Analysis in C by Mark Allen Weiss & Sumitabha Das, “Computer Fundamentals and C Programming”, 1st Edition, McGraw Hill, 2018. 20CST32 – Data Structures
  • 2. Syllabus – Unit Wise
  • 3. List of Exercises
  • 4. Text Book and Reference Book
  • 5. Unit V : Contents 1. Searching – Linear search – Binary Search 2. Sorting: – Internal sorting: • Insertion sort • Selection sort • Bubble sort • Shell sort • Quick Sort • Heap sort • Bucket sort – External sorting: • Merge Sort • Multiway Merge • Polyphase Merge 3. Hashing: – Hash Functions – Separate Chaining – Closed Hashing (Open Addressing) • Linear Probing • Quadratic Probing • Double Hashing – Rehashing – Extendible Hashing.
  • 6. Searching • Search is the process of finding a value in a list of values. • In other words, searching is the process of locating a given value's position in a list of values.
  • 7. Types of Search • Linear Search • Binary Search • Interpolation search • Sublist search • Exponential search • Jump search • Fibonacci search, etc.
  • 8. Linear Search • The linear search algorithm finds a given element in a list of elements with O(n) time complexity, where n is the total number of elements in the list. • The search starts by comparing the search element with the first element in the list. • If they match, the result is "element found"; otherwise the search element is compared with the next element in the list. • This repeats until the search element has been compared with the last element in the list; if that last element also doesn't match, the result is "Element not found in the list". • That is, the search element is compared element by element against the list.
  • 9. Linear Search - Algorithm • Step 1 - Read the search element from the user. • Step 2 - Compare the search element with the first element in the list. • Step 3 - If both match, then display "Given element is found!!!" and terminate the function. • Step 4 - If they do not match, then compare the search element with the next element in the list. • Step 5 - Repeat steps 3 and 4 until the search element has been compared with the last element in the list. • Step 6 - If the last element in the list also doesn't match, then display "Element is not found!!!" and terminate the function.
  • 10.–13. Linear Search - Example (worked step by step in figures on the slides)
  • 14. Linear Search (figure on the slide)
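As a reference implementation, the steps above translate directly into a few lines of C. This is a sketch rather than the program pictured on the slides; the function and parameter names are illustrative. It returns the position (index) of the match, or -1 for "element not found":

int linear_search( int list[], int n, int key )
{
    int i;
    for( i = 0; i < n; i++ )    /* compare element by element */
        if( list[i] == key )
            return i;           /* matched: element found at position i */
    return -1;                  /* compared with the last element and failed */
}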
  • 15. Binary Search • Binary search finds a given element in a list of elements with O(log n) time complexity, where n is the total number of elements in the list. • The binary search algorithm can be used only with a sorted list of elements. • That means binary search is used only with a list of elements that are already arranged in order. • Binary search cannot be used for a list of elements arranged in random order. • The search starts by comparing the search element with the middle element in the list.
  • 16. Binary Search - Algorithm • Step 1 - Read the search element from the user. • Step 2 - Find the middle element in the sorted list. • Step 3 - Compare the search element with the middle element in the sorted list. • Step 4 - If both match, then display "Given element is found!!!" and terminate the function. • Step 5 - If they do not match, then check whether the search element is smaller or larger than the middle element. • Step 6 - If the search element is smaller than the middle element, repeat steps 2, 3, 4 and 5 for the left sublist of the middle element. • Step 7 - If the search element is larger than the middle element, repeat steps 2, 3, 4 and 5 for the right sublist of the middle element. • Step 8 - Repeat the same process until we find the search element in the list or until the sublist contains only one element. • Step 9 - If that element also doesn't match the search element, then display "Element is not found in the list!!!" and terminate the function.
  • 17.–18. Binary Search - Example (figures on the slides)
  • 19.–20. Binary Search - Program (figures on the slides)
  • 21. Binary Search (figure on the slide)
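A minimal iterative C sketch of the algorithm above (illustrative, not the program shown on the slides; it assumes list[] is already sorted in ascending order):

int binary_search( int list[], int n, int key )
{
    int low = 0, high = n - 1;
    while( low <= high )                      /* sublist is non-empty */
    {
        int mid = low + (high - low) / 2;     /* middle element of the sublist */
        if( list[mid] == key )
            return mid;                       /* element found */
        else if( key < list[mid] )
            high = mid - 1;                   /* repeat on the left sublist */
        else
            low = mid + 1;                    /* repeat on the right sublist */
    }
    return -1;                                /* element not found */
}

Each iteration halves the remaining sublist, which is where the O(log n) bound comes from.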
  • 22. Thank you
  • 25. Sorting • The arrangement of data in a preferred order is called sorting in the data structure. • Sorted data is easier to search through quickly and easily. • The simplest example of sorting is a dictionary. • Before the era of the Internet, when you wanted to look up a word in a dictionary, you would do so in alphabetical order. This made it easy.
  • 26. Types of Sorting (figure on the slide)
  • 27. Types of Sorting • When all data is placed in memory, the sorting is called internal sorting. • When all the data that needs to be sorted cannot be placed in memory at a time, the sorting is called external sorting. • External sorting is used for massive amounts of data. • Merge sort and its variations are typically used for external sorting. • External storage such as hard disks or CDs is used.
  • 28. Internal Vs External Sorting (figure on the slide)
  • 29. Insertion Sort • Insertion sort is a simple sorting algorithm that works similar to the way you sort playing cards in your hands. • The array is virtually split into a sorted and an unsorted part. • Values from the unsorted part are picked and placed at the correct position in the sorted part. • This is an in-place, comparison-based sorting algorithm. • Here, a sub-list is maintained which is always sorted; for example, the lower part of the array is maintained to be sorted. • An element which is to be 'insert'ed into this sorted sub-list has to find its appropriate place and then be inserted there. Hence the name, insertion sort. • The array is scanned sequentially and unsorted items are moved and inserted into the sorted sub-list (in the same array). • This algorithm is not suitable for large data sets, as its average and worst case complexity are O(n²), where n is the number of items.
  • 30. Insertion Sort - Algorithm • To sort an array of size n in ascending order: – 1: Iterate from arr[1] to arr[n-1] over the array. – 2: Compare the current element (key) to its predecessor. – 3: If the key element is smaller than its predecessor, compare it to the elements before. Move the greater elements one position up to make space for the key.
  • 31. Insertion Sort – Example 1 (figures on the slides)
  • 32. By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10. This process goes on until all the unsorted values are covered in a sorted sub-list.
  • 33. Insertion Sort – Example 2 (figure on the slide)
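A short C sketch of the algorithm above (names are illustrative; this is not the slides' program):

void insertion_sort( int arr[], int n )
{
    int i, j, key;
    for( i = 1; i < n; i++ )            /* arr[0..i-1] is the sorted sub-list */
    {
        key = arr[i];                   /* element to be inserted */
        for( j = i - 1; j >= 0 && arr[j] > key; j-- )
            arr[j + 1] = arr[j];        /* move greater elements one position up */
        arr[j + 1] = key;               /* insert key at its appropriate place */
    }
}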
  • 34. Selection Sort • Selection sort is a simple sorting algorithm. • This sorting algorithm is an in-place, comparison-based algorithm in which the list is divided into two parts: the sorted part at the left end and the unsorted part at the right end. • Initially, the sorted part is empty and the unsorted part is the entire list. • The smallest element is selected from the unsorted part and swapped with the leftmost unsorted element, and that element becomes part of the sorted array. • This process continues, moving the unsorted array boundary one element to the right. • This algorithm is not suitable for large data sets, as its average and worst case complexities are O(n²), where n is the number of items.
  • 35. Selection Sort - Algorithm • Step 1 − Set MIN to location 0 • Step 2 − Search for the minimum element in the list • Step 3 − Swap it with the value at location MIN • Step 4 − Increment MIN to point to the next element • Step 5 − Repeat until the list is sorted
  • 37. (Selection sort example; figure on the slide)
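The five steps above can be sketched in C as follows (illustrative names, not taken from the slides):

void selection_sort( int arr[], int n )
{
    int i, j, min, tmp;
    for( i = 0; i < n - 1; i++ )        /* boundary of the sorted part */
    {
        min = i;                        /* assume first unsorted element is minimum */
        for( j = i + 1; j < n; j++ )    /* search the unsorted part for the minimum */
            if( arr[j] < arr[min] )
                min = j;
        tmp = arr[i];                   /* swap the minimum into place */
        arr[i] = arr[min];
        arr[min] = tmp;
    }
}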
  • 38. Bubble Sort • Bubble sort is a simple sorting algorithm. • This sorting algorithm is a comparison-based algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order. • This algorithm is not suitable for large data sets, as its average and worst case complexity are O(n²), where n is the number of items.
  • 39. Bubble Sort - Algorithm (figure on the slide)
  • 40.–41. Bubble Sort - Example (figures on the slides)
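A C sketch of bubble sort (illustrative; the early-exit flag is a common optimization and may not appear on the slides):

void bubble_sort( int arr[], int n )
{
    int i, j, tmp, swapped;
    for( i = 0; i < n - 1; i++ )
    {
        swapped = 0;
        for( j = 0; j < n - 1 - i; j++ )    /* compare each adjacent pair */
            if( arr[j] > arr[j + 1] )       /* swap if out of order */
            {
                tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
                swapped = 1;
            }
        if( !swapped )                      /* no swaps: already sorted */
            break;
    }
}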
  • 42. Shell Sort • Shell sort, named after its inventor, Donald Shell, was one of the first algorithms to break the quadratic time barrier. • Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm. • It avoids the large shifts insertion sort performs when a small value sits at the far right and has to be moved to the far left. • It applies insertion sort first to widely spaced elements, then to less widely spaced elements. • This spacing is termed the interval (or gap). • The algorithm is quite efficient for medium-sized data sets; its average and worst-case complexity depend on the gap sequence used, and the best known worst case is O(n log² n), where n is the number of items. • It sorts in place, so the space complexity is O(n) in total and O(1) auxiliary.
  • 43. Shell Sort - Algorithm (figure on the slide)
  • 44. Shell Sort - Example • In this example, we take an interval of 4. • Make a virtual sub-list of all values located at an interval of 4 positions. • Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}.
  • 45. Shell Sort - Example • We compare values in each sub-list and swap them (if necessary) in the original array. • After this step, the new array should look like this − (figure on the slide)
  • 46. • Shell sort then uses insertion sort to finish sorting the array. (figures on the slides)
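A C sketch of shell sort using the simple gap sequence n/2, n/4, ..., 1 (the slides' example starts from an interval of 4; the halving sequence here is just one common choice):

void shell_sort( int arr[], int n )
{
    int gap, i, j, key;
    for( gap = n / 2; gap > 0; gap /= 2 )   /* shrink the interval each pass */
        for( i = gap; i < n; i++ )          /* insertion sort on each gapped sub-list */
        {
            key = arr[i];
            for( j = i - gap; j >= 0 && arr[j] > key; j -= gap )
                arr[j + gap] = arr[j];
            arr[j + gap] = key;
        }
}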
  • 47. Quick Sort • Quick sort is a fast sorting algorithm used to sort a list of elements. • The quick sort algorithm was invented by C. A. R. Hoare. • Quick sort separates the list of elements into two parts and then sorts each part recursively. • That means it uses a divide-and-conquer strategy. • In quick sort, the partition of the list is performed around an element called the pivot. • Here the pivot element is one of the elements in the list. • The list is divided into two partitions such that – all elements to the left of the pivot are smaller than the pivot, and – all elements to the right of the pivot are greater than or equal to the pivot. • This algorithm is quite efficient for large data sets: its average-case complexity is O(n log n), though its worst case is O(n²).
  • 48. Quick Sort – Algorithm (Pivot) • Step 1 - Take the first element of the list as the pivot (i.e., the element at the first position in the list). • Step 2 - Define two variables, i and j. Set i and j to the first and last elements of the list, respectively. • Step 3 - Increment i until list[i] > pivot, then stop. • Step 4 - Decrement j until list[j] < pivot, then stop. • Step 5 - If i < j, then exchange list[i] and list[j]. • Step 6 - Repeat steps 3, 4 & 5 until i > j. • Step 7 - Exchange the pivot element with list[j].
  • 49.–53. Quick Sort – Example (worked step by step in figures on the slides)
  • 54.–55. Quick sort - Program (figures on the slides)
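A C sketch of the first-element-pivot scheme described in the algorithm slide (a common variant: the i scan stops at elements greater than the pivot, the j scan at elements not greater; names are illustrative):

void quick_sort( int list[], int low, int high )
{
    int pivot, i, j, tmp;
    if( low >= high )
        return;
    pivot = list[low];                          /* first element as pivot */
    i = low;
    j = high;
    while( i < j )
    {
        while( i < high && list[i] <= pivot )   /* increment i until list[i] > pivot */
            i++;
        while( list[j] > pivot )                /* decrement j until list[j] <= pivot */
            j--;
        if( i < j )                             /* exchange the out-of-place pair */
        {
            tmp = list[i]; list[i] = list[j]; list[j] = tmp;
        }
    }
    tmp = list[low]; list[low] = list[j]; list[j] = tmp;  /* place pivot at list[j] */
    quick_sort( list, low, j - 1 );             /* sort each part recursively */
    quick_sort( list, j + 1, high );
}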
  • 56. Heap sort • Heap sort is one of the sorting algorithms used to arrange a list of elements in order. • The heapsort algorithm uses one of the tree concepts, the heap tree. • In this sorting algorithm, we use – a max heap to arrange the list elements in descending order, and – a min heap to arrange the list elements in ascending order.
  • 57. Heap Sort - Algorithm • Step 1 - Construct a binary tree with the given list of elements. • Step 2 - Transform the binary tree into a min heap. • Step 3 - Delete the root element from the min heap using the heapify method. • Step 4 - Put the deleted element into the sorted list. • Step 5 - Repeat the same until the min heap becomes empty. • Step 6 - Display the sorted list.
  • 58.–64. Heap Sort - Example (worked step by step in figures on the slides)
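The slides illustrate heap sort by repeatedly deleting from a min heap into a separate sorted list. An equivalent in-place C sketch, using a max heap so the array ends up in ascending order (illustrative, not the slides' method):

void heapify( int arr[], int n, int i )     /* sift arr[i] down in a max heap of size n */
{
    int largest = i, l = 2 * i + 1, r = 2 * i + 2, tmp;
    if( l < n && arr[l] > arr[largest] )
        largest = l;
    if( r < n && arr[r] > arr[largest] )
        largest = r;
    if( largest != i )
    {
        tmp = arr[i]; arr[i] = arr[largest]; arr[largest] = tmp;
        heapify( arr, n, largest );
    }
}

void heap_sort( int arr[], int n )
{
    int i, tmp;
    for( i = n / 2 - 1; i >= 0; i-- )       /* build the max heap */
        heapify( arr, n, i );
    for( i = n - 1; i > 0; i-- )            /* move the root (maximum) to the end */
    {
        tmp = arr[0]; arr[0] = arr[i]; arr[i] = tmp;
        heapify( arr, i, 0 );               /* restore the heap on the shrunken prefix */
    }
}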
  • 65. Bucket Sort • Bucket sort is a sorting algorithm that divides the unsorted array elements into several groups called buckets. • Each bucket is then sorted using any suitable sorting algorithm or by recursively applying the same bucket algorithm. • Finally, the sorted buckets are combined to form the final sorted array. • A scatter-gather approach is used.
  • 66. Scatter-Gather Approach • The process of bucket sort can be understood as a scatter-gather approach. • Here, elements are first scattered into buckets, then the elements in each bucket are sorted. • Finally, the elements are gathered in order.
  • 67. Bucket Sort - Example (figure on the slide)
  • 68. Method 2 (figure on the slide)
  • 70. Creation of Buckets (figure on the slide)
  • 71. Insertion into Buckets (figure on the slide)
  • 73. Sort each bucket using insertion sort (figure on the slide)
  • 75. Program – bucket sort (figure on the slide)
  • 76. Bucket Sort – Time Complexity (figure on the slide)
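A C sketch of the scatter-gather process (purely illustrative: it assumes keys lie in [0, 100), uses ten fixed-capacity buckets, and sorts each bucket with insertion sort as on slide 73):

#define NBUCKETS 10
#define BUCKET_CAP 64   /* assumed maximum bucket size for this sketch */

void bucket_sort( int arr[], int n )
{
    int bucket[NBUCKETS][BUCKET_CAP];
    int count[NBUCKETS] = {0};
    int b, i, j, k = 0, key;
    for( i = 0; i < n; i++ )                /* scatter: key 0..99 goes to bucket key/10 */
    {
        b = arr[i] / 10;
        bucket[b][count[b]++] = arr[i];
    }
    for( b = 0; b < NBUCKETS; b++ )
    {
        for( i = 1; i < count[b]; i++ )     /* insertion sort within the bucket */
        {
            key = bucket[b][i];
            for( j = i - 1; j >= 0 && bucket[b][j] > key; j-- )
                bucket[b][j + 1] = bucket[b][j];
            bucket[b][j + 1] = key;
        }
        for( i = 0; i < count[b]; i++ )     /* gather the buckets in order */
            arr[k++] = bucket[b][i];
    }
}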
  • 77. Merge Sort • Merge sort is a sorting technique based on the divide-and-conquer technique, with worst-case time complexity O(n log n). • It is one of the most respected algorithms. • Merge sort first divides the array into equal halves and then combines them in a sorted manner.
  • 78. Merge Sort - Algorithm (figure on the slide)
  • 79.–80. Merge Sort - Example (figures on the slides)
  • 81. Merge Sort – Example 2 (figure on the slide)
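A C sketch of the divide-and-combine structure (illustrative; the caller supplies a temporary array tmp of the same size as arr, which is the O(n) extra space noted under the drawbacks below):

void merge( int arr[], int tmp[], int left, int mid, int right )
{
    int i = left, j = mid + 1, k = left;
    while( i <= mid && j <= right )             /* combine the two sorted halves */
        tmp[k++] = ( arr[i] <= arr[j] ) ? arr[i++] : arr[j++];
    while( i <= mid )
        tmp[k++] = arr[i++];
    while( j <= right )
        tmp[k++] = arr[j++];
    for( k = left; k <= right; k++ )            /* copy the merged run back */
        arr[k] = tmp[k];
}

void merge_sort( int arr[], int tmp[], int left, int right )
{
    if( left < right )
    {
        int mid = ( left + right ) / 2;         /* divide into equal halves */
        merge_sort( arr, tmp, left, mid );
        merge_sort( arr, tmp, mid + 1, right );
        merge( arr, tmp, left, mid, right );    /* combine in sorted order */
    }
}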
  • 82. Applications and Drawbacks of Merge Sort • Applications: – Merge sort is useful for sorting linked lists in O(n log n) time. – The inversion count problem. – Used in external sorting. • Drawbacks: – Slower than other sort algorithms for smaller tasks. – The merge sort algorithm requires additional memory space of O(n) for the temporary array. – It goes through the whole process even if the array is already sorted.
  • 83. 2-way merge (figure on the slide)
  • 84. 4-way merge (figure on the slide)
  • 85. Different ways of merging (figure on the slide)
  • 86. 2-way merge - example (figure on the slide)
  • 87. Multiway Merge Sort • The basic external sorting algorithm uses the merge routine from mergesort. • Suppose we have four tapes, Ta1, Ta2, Tb1, Tb2, which are two input and two output tapes. • Depending on the point in the algorithm, the a and b tapes are either input tapes or output tapes. • Suppose the data is initially on Ta1, and that the internal memory can hold (and sort) m records at a time. • A natural first step is to read m records at a time from the input tape, sort the records internally, and then write the sorted records alternately to Tb1 and Tb2. • We call each set of sorted records a run. When this is done, we rewind all the tapes.
  • 88. Multiway Merge - Example • If m = 3, then after the runs are constructed, the tapes will contain the data indicated in the figure on the slide.
  • 89. Multiway Merge – Example (Contd.) • This algorithm requires ⌈log₂(n/m)⌉ passes, plus the initial run-constructing pass. • For instance, if we have 10 million records of 128 bytes each, and four megabytes of internal memory, then the first pass will create 320 runs. We would then need nine more passes to complete the sort. • Our example requires ⌈log₂(13/3)⌉ = 3 more passes, which are shown in the following figure.
  • 90. Multiway Merge – Example (Contd.) (figure on the slide)
  • 91. Polyphase Merge • A polyphase merge sort is an algorithm which decreases the number of runs at every iteration of the main loop by merging runs into larger runs. • It is used for external sorting. • In this type of sort, the tapes being merged, and the tape to which the merged subfiles are written, vary continuously throughout the sort. • In this technique, the concept of a pass through the records is not as clear-cut as in the straight or natural merge.
  • 92. (figure on the slide)
  • 93. Polyphase Merge • The k-way merging strategy requires the use of 2k tapes. • This could be prohibitive for some applications. • It is possible to get by with only k + 1 tapes. • Suppose we have three tapes, T1, T2, T3, and an input file on T1 that will produce 34 runs. • One option is to put 17 runs each on T2 and T3. • We could then merge this result onto T1, obtaining one tape with 17 runs. • The problem is that since all the runs are on one tape, we must now put some of these runs on T2 to perform another merge. • The logical way to do this is to copy the first 8 runs from T1 onto T2 and then perform the merge. • This has the effect of adding an extra half pass for every pass we do.
  • 94. Polyphase Merge • An alternative method is to split the original 34 runs unevenly. • Suppose we put 21 runs on T2 and 13 runs on T3. • We would then merge 13 runs onto T1 before T3 was empty. • At this point we can rewind T1 and T3, and merge T1, with 13 runs, and T2, which has 8 runs, onto T3. • We would then merge 8 runs until T2 was empty, which would leave 5 runs on T1 and 8 runs on T3. • We could then merge T1 and T3, and so on.
  • 95. Polyphase Merge • The original distribution of runs makes a lot of difference. • For example, if 22 runs are placed on T2 with 12 on T3, then after the first merge we obtain 12 runs on T1 and 10 runs on T2. • After another merge, there are 10 runs on T1 and 2 runs on T3. • At this point the going gets slow, because we can only merge two sets of runs before T3 is exhausted. • Then T1 has 8 runs and T2 has 2 runs. • Again, we can only merge two sets of runs, obtaining T1 with 6 runs and T3 with 2 runs. • After three more passes, T2 has 2 runs and the other tapes are empty. • We must copy one run to another tape, and then we can finish the merge. • It turns out that if the number of runs is a Fibonacci number Fn, then the best way to distribute them is to split them into the two Fibonacci numbers Fn−1 and Fn−2. • Otherwise, it is necessary to pad the tape with dummy runs in order to bring the number of runs up to a Fibonacci number.
  • 97. (figure on the slide)
  • 98. Time and Space Complexity (figure on the slide)
  • 99. (figure on the slide)
  • 100. Time and Space Complexity (figure on the slide)
  • 101. Time and Space Complexity (figure on the slide)
  • 102. Thank you
  • 105. Hashing • The implementation of hash tables is frequently called hashing. • Hashing is a technique used for performing insertions, deletions and finds in constant average time. • Tree operations that require any ordering information among the elements are not supported efficiently. • Thus, operations such as find_min, find_max, and the printing of the entire table in sorted order in linear time are not supported. • Here, the central data structure is the hash table. We will – see several methods of implementing the hash table, – compare these methods analytically, – show numerous applications of hashing, and – compare hash tables with binary search trees.
  • 106. Hashing • The ideal hash table data structure is merely an array of some fixed size, containing the keys. • Typically, a key is a string with an associated value (for instance, salary information). • We will refer to the table size as H_SIZE, with the understanding that this is part of a hash data structure and not merely some variable floating around globally. • The common convention is to have the table run from 0 to H_SIZE−1. • Each key is mapped into some number in the range 0 to H_SIZE−1 and placed in the appropriate cell. • The mapping is called a hash function, which ideally should be simple to compute and should ensure that any two distinct keys get different cells. • Since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells.
  • 107. Example • In this example, john hashes to 3, phil hashes to 4, dave hashes to 6, and mary hashes to 7. (figure on the slide)
  • 108. Collision • The only remaining problems are – choosing a hash function, – deciding what to do when two keys hash to the same value (this is known as a collision), and – deciding on the table size.
  • 109. Hash Function • If the input keys are integers, then simply returning key mod H_SIZE is generally a reasonable strategy, unless key happens to have some undesirable properties. • In this case, the choice of hash function needs to be carefully considered. • For instance, if the table size is 10 and the keys all end in zero, then the standard hash function is obviously a bad choice. • For reasons we shall see later, and to avoid situations like the one above, it is usually a good idea to ensure that the table size is prime. • When the input keys are random integers, then this function is not only very simple to compute but also distributes the keys evenly. • Usually, the keys are strings; in this case, the hash function needs to be chosen carefully.
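The routines that follow call a function hash( key, table_size ). One very simple choice for string keys, in the spirit of this discussion, is to add up the characters and take the result mod the table size (simple to compute, though it distributes long or similar keys poorly):

unsigned int hash( const char *key, unsigned int table_size )
{
    unsigned int hash_val = 0;
    while( *key != '\0' )                   /* sum the characters of the key */
        hash_val += (unsigned char) *key++;
    return hash_val % table_size;           /* map into 0 .. table_size-1 */
}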
  • 110. Open Hashing (Separate Chaining) • The first strategy, commonly known as either open hashing or separate chaining, is to keep a list of all elements that hash to the same value. • For convenience, our lists have headers. • If space is tight, it might be preferable to avoid their use.
  • 111. Open Hashing – Find & Insert • To perform a find, we use the hash function to determine which list to traverse. • We then traverse this list in the normal manner, returning the position where the item is found. • To perform an insert, we traverse down the appropriate list to check whether the element is already in place. – If duplicates are expected, an extra field is usually kept, and this field would be incremented in the event of a match. – If the element turns out to be new, it is inserted either at the front of the list or at the end of the list, whichever is easiest. • This is an issue most easily addressed while the code is being written. • Sometimes new elements are inserted at the front of the list, since it is convenient and also because it frequently happens that recently inserted elements are the most likely to be accessed in the near future.
  • 112. Open hashing – Type Declarations

typedef struct list_node *node_ptr;   /* pointer to a list node */

struct list_node
{
    element_type element;
    node_ptr next;
};
typedef node_ptr LIST;
typedef node_ptr position;

/* LIST *the_lists will be an array of lists, allocated later */
/* The lists will use headers, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    LIST *the_lists;
};
typedef struct hash_tbl *HASH_TABLE;
  • 113. Open hashing – Initialization

HASH_TABLE initialize_table( unsigned int table_size )
{
    HASH_TABLE H;
    int i;
/*1*/   if( table_size < MIN_TABLE_SIZE )
    {
/*2*/       error("Table size too small");
/*3*/       return NULL;
    }
    /* Allocate table */
/*4*/   H = (HASH_TABLE) malloc( sizeof (struct hash_tbl) );
/*5*/   if( H == NULL )
/*6*/       fatal_error("Out of space!!!");
/*7*/   H->table_size = next_prime( table_size );
    /* Allocate list pointers */
/*8*/   H->the_lists = (position *) malloc( sizeof (LIST) * H->table_size );
/*9*/   if( H->the_lists == NULL )
/*10*/      fatal_error("Out of space!!!");
    /* Allocate list headers */
/*11*/  for( i = 0; i < H->table_size; i++ )
    {
/*12*/      H->the_lists[i] = (LIST) malloc( sizeof (struct list_node) );
/*13*/      if( H->the_lists[i] == NULL )
/*14*/          fatal_error("Out of space!!!");
        else
/*15*/          H->the_lists[i]->next = NULL;
    }
/*16*/  return H;
}
  • 114. Open Hashing – Find Routine

position find( element_type key, HASH_TABLE H )
{
    position p;
    LIST L;
/*1*/   L = H->the_lists[ hash( key, H->table_size ) ];
/*2*/   p = L->next;
/*3*/   while( (p != NULL) && (p->element != key) )  /* Probably need strcmp!! */
/*4*/       p = p->next;
/*5*/   return p;
}
  • 115. Open Hashing – Insert Routine

void insert( element_type key, HASH_TABLE H )
{
    position pos, new_cell;
    LIST L;
/*1*/   pos = find( key, H );
/*2*/   if( pos == NULL )
    {
/*3*/       new_cell = (position) malloc( sizeof(struct list_node) );
/*4*/       if( new_cell == NULL )
/*5*/           fatal_error("Out of space!!!");
        else
        {
/*6*/           L = H->the_lists[ hash( key, H->table_size ) ];
/*7*/           new_cell->next = L->next;
/*8*/           new_cell->element = key;  /* Probably need strcpy!! */
/*9*/           L->next = new_cell;
        }
    }
}
  • 116. Open Hashing - Example • We assume for this section that the keys are the first 10 perfect squares and that the hashing function is simply hash(x) = x mod 10. (The table size is not prime, but is used here for simplicity.) (figure on the slide)
  • 117. Load Factor • We define the load factor, ∆, of a hash table to be the ratio of the number of elements in the hash table to the table size. • In the example above, ∆ = 1.0. • The average length of a list is ∆. • The effort required to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list.
  • 118. Load Factor • In an unsuccessful search, the number of links to traverse is ∆ (excluding the final NULL link) on average. • A successful search requires that about 1 + (∆/2) links be traversed, since there is a guarantee that one link must be traversed (since the search is successful), and we also expect to go halfway down a list to find our match. • This analysis shows that the table size is not really important, but the load factor is. • The general rule for open hashing is to make the table size about as large as the number of elements expected (in other words, let ∆ ≈ 1). • It is also a good idea, as mentioned before, to keep the table size prime to ensure a good distribution.
  • 119. Closed Hashing (Open Addressing) • Open hashing has the disadvantage of requiring pointers. • This tends to slow the algorithm down a bit because of the time required to allocate new cells, and it also essentially requires the implementation of a second data structure. • Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists. • In a closed hashing system, if a collision occurs, alternate cells are tried until an empty cell is found. • More formally, cells h0(x), h1(x), h2(x), . . . are tried in succession, where hi(x) = (hash(x) + f(i)) mod H_SIZE, with f(0) = 0. • The function f is the collision resolution strategy. • Because all the data goes inside the table, a bigger table is needed for closed hashing than for open hashing. • Generally, the load factor should be below 0.5 for closed hashing.
  • 120. Collision Resolution Strategies • We now look at three common collision resolution strategies: – Linear Probing – Quadratic Probing – Double Hashing
  • 121. Linear Probing • In linear probing, f is a linear function of i, typically f(i) = i. • This amounts to trying cells sequentially (with wraparound) in search of an empty cell. • The figure on the next slide shows the result of inserting keys {89, 18, 49, 58, 69} into a closed table using the same hash function as before and the collision resolution strategy f(i) = i. • The first collision occurs when 49 is inserted; it is put in the next available spot, namely spot 0, which is open. • 58 collides with 18, 89, and then 49 before an empty cell is found three away. The collision for 69 is handled in a similar manner. • As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large. • Worse, even if the table is relatively empty, blocks of occupied cells start forming. • This effect, known as primary clustering, means that any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster.
  • 122. Linear Probing - Example (figure on the slide)
  • 123. • Although we will not perform the calculations here, it can be shown that the expected number of probes using linear probing is roughly – (1/2)(1 + 1/(1 − ∆)²) for insertions and unsuccessful searches, and – (1/2)(1 + 1/(1 − ∆)) for successful searches.
  • 124. Quadratic Probing • Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing. • Quadratic probing is what you would expect: the collision function is quadratic. • The popular choice is f(i) = i². • The figure on the next slide shows the resulting closed table with this collision function on the same input used in the linear probing example. • When 49 collides with 89, the next position attempted is one cell away. This cell is empty, so 49 is placed there. • Next, 58 collides at position 8. Then the cell one away is tried, but another collision occurs. A vacant cell is found at the next cell tried, which is 2² = 4 away. 58 is thus placed in cell 2. • The same thing happens for 69. • For linear probing it is a bad idea to let the hash table get nearly full, because performance degrades. • For quadratic probing, the situation is even more drastic: there is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime. • This is because at most half of the table can be used as alternate locations to resolve collisions. • Indeed, it can be proved that if the table is half empty and the table size is prime, then we are always guaranteed to be able to insert a new element.
  • 125. Quadratic Probing - Example (figure on the slide)
  • 126. Closed hashing - Type declaration

enum kind_of_entry { legitimate, empty, deleted };

struct hash_entry
{
    element_type element;
    enum kind_of_entry info;
};

typedef unsigned int INDEX;   /* index type for positions in the table */
typedef INDEX position;
typedef struct hash_entry cell;

/* the_cells is an array of hash_entry cells, allocated later */
struct hash_tbl
{
    unsigned int table_size;
    cell *the_cells;
};
typedef struct hash_tbl *HASH_TABLE;
  • 127. Closed hashing – Initialization

HASH_TABLE initialize_table( unsigned int table_size )
{
    HASH_TABLE H;
    int i;
/*1*/   if( table_size < MIN_TABLE_SIZE )
    {
/*2*/       error("Table size too small");
/*3*/       return NULL;
    }
    /* Allocate table */
/*4*/   H = (HASH_TABLE) malloc( sizeof ( struct hash_tbl ) );
/*5*/   if( H == NULL )
/*6*/       fatal_error("Out of space!!!");
/*7*/   H->table_size = next_prime( table_size );
    /* Allocate cells */
/*8*/   H->the_cells = (cell *) malloc( sizeof ( cell ) * H->table_size );
/*9*/   if( H->the_cells == NULL )
/*10*/      fatal_error("Out of space!!!");
/*11*/  for( i = 0; i < H->table_size; i++ )
/*12*/      H->the_cells[i].info = empty;
/*13*/  return H;
}
  • 128. Closed hashing – Find Routine with Quadratic Probing

position find( element_type key, HASH_TABLE H )
{
    position i, current_pos;
/*1*/   i = 0;
/*2*/   current_pos = hash( key, H->table_size );
    /* Probably need strcmp! */
/*3*/   while( (H->the_cells[current_pos].element != key ) &&
           (H->the_cells[current_pos].info != empty ) )
    {
        /* f(i) = i*i, computed incrementally: f(i) = f(i-1) + 2i - 1 */
/*4*/       current_pos += 2*(++i) - 1;
/*5*/       if( current_pos >= H->table_size )
/*6*/           current_pos -= H->table_size;
    }
/*7*/   return current_pos;
}
  • 129. Closed hashing – Insert Routine with Quadratic Probing

void insert( element_type key, HASH_TABLE H )
{
    position pos;
    pos = find( key, H );
    if( H->the_cells[pos].info != legitimate )
    {   /* ok to insert here */
        H->the_cells[pos].info = legitimate;
        H->the_cells[pos].element = key;  /* Probably need strcpy!! */
    }
}
  • 130. Quadratic Probing • Although quadratic probing eliminates primary clustering, elements that hash to the same position will probe the same alternate cells. This is known as secondary clustering. • Secondary clustering is a slight theoretical blemish. • Simulation results suggest that it generally causes less than an extra probe per search. • The double hashing technique eliminates this, but does so at the cost of extra multiplications and divisions.
  • 131. Double Hashing • For double hashing, one popular choice is f(i) = i · h2(x). • This formula says that we apply a second hash function to x and probe at a distance h2(x), 2·h2(x), . . ., and so on. • A poor choice of h2(x) would be disastrous. • For instance, the obvious choice h2(x) = x mod 9 would not help if 99 were inserted into the input in the previous examples. • Thus, the function must never evaluate to zero. • It is also important to make sure all cells can be probed (this is not possible in the example below, because the table size is not prime). • A function such as h2(x) = R − (x mod R), with R a prime smaller than H_SIZE, will work well. • If we choose R = 7, then the figure shows the results of inserting the same keys as before.
  • 132. Double Hashing - Example (figure on the slide)
  • 133. Double Hashing - Example • The first collision occurs when 49 is inserted. h2(49) = 7 − 0 = 7, so 49 is inserted in position 6. • h2(58) = 7 − 2 = 5, so 58 is inserted at location 3. • Finally, 69 collides and is inserted at a distance h2(69) = 7 − 6 = 1 away, at location 0. • If we tried to insert 60 in position 0, we would have a collision. Since h2(60) = 7 − 4 = 3, we would then try positions 3, 6, 9, and then 2 until an empty spot is found. • It is generally possible to find some bad case, but there are not too many here.
  • 134. Double Hashing - Example • As we have said before, the size of our sample hash table is not prime. • We have done this for convenience in computing the hash function, but it is worth seeing why it is important to make sure the table size is prime when double hashing is used. • If we attempt to insert 23 into the table, it collides with 58. Since h2(23) = 7 − 2 = 5, and the table size is 10, we essentially have only one alternate location, and it is already taken. Thus, if the table size is not prime, it is possible to run out of alternate locations prematurely. • However, if double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy. • This makes double hashing theoretically interesting. • Quadratic probing, however, does not require the use of a second hash function and is thus likely to be simpler and faster in practice.
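The probe computation described above is small enough to sketch directly in C (R = 7 and the table size of 10 follow the example; in practice both should be prime):

#define R 7   /* a prime smaller than the table size */

unsigned int hash2( unsigned int x )        /* second hash function */
{
    return R - ( x % R );                   /* result is in 1..R, so never zero */
}

unsigned int probe( unsigned int x, unsigned int i, unsigned int table_size )
{
    /* i-th alternate cell: (hash(x) + i*h2(x)) mod table size */
    return ( x % table_size + i * hash2( x ) ) % table_size;
}

For example, probe(49, 1, 10) = (9 + 7) mod 10 = 6, matching the example above.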
  • 135. Rehashing • If the table gets too full, – the running time for the operations will start taking too long, and – inserts might fail for closed hashing with quadratic resolution. • This can happen if there are too many deletions intermixed with insertions. • A solution, then, is – to build another table that is about twice as big (with an associated new hash function), – scan down the entire original hash table, – compute the new hash value for each (non-deleted) element, and – insert it in the new table.
  • 136. Rehashing - Example • As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash table of size 7. • The hash function is h(x) = x mod 7. • Suppose linear probing is used to resolve collisions. • The resulting hash table appears in the figure on the slide.
  • 137. Rehashing - Example • If 23 is inserted into the table: (figure on the slide)
  • 138. Rehashing - Example • If 23 is inserted into the table, the resulting table will be over 70 percent full. • Because the table is so full, a new table is created. • The size of this table is 17, because this is the first prime that is twice as large as the old table size. • The new hash function is then h(x) = x mod 17. • The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into the new table. • The resulting table appears in the figure on the slide.
  • 139. Rehashing - Example • This entire operation is called rehashing. • This is obviously a very expensive operation; the running time is O(n), since there are n elements to rehash and the table size is roughly 2n, but it is actually not all that bad, because it happens very infrequently. • In particular, there must have been n/2 inserts prior to the last rehash, so it essentially adds a constant cost to each insertion. (This is why the new table is made twice as large as the old table.) • If this data structure is part of a program, the effect is not noticeable. • On the other hand, if the hashing is performed as part of an interactive system, then the unfortunate user whose insertion caused a rehash could see a slowdown.
  • 140. Rehashing Implementation • Rehashing can be implemented in several ways with quadratic probing. – One alternative is to rehash as soon as the table is half full. – The other extreme is to rehash only when an insertion fails. – A third, middle-of-the-road strategy is to rehash when the table reaches a certain load factor. • Since performance does degrade as the load factor increases, the third strategy, implemented with a good cutoff, could be best. • Rehashing frees the programmer from worrying about the table size and is important because hash tables cannot be made arbitrarily large in complex programs. • The exercises ask you to investigate the use of rehashing in conjunction with lazy deletion. • Rehashing can be used in other data structures as well. • For instance, if the queue data structure became full, we could declare a double-sized array and copy everything over, freeing the original.
  • 141. Rehashing Implementation - Code

HASH_TABLE rehash( HASH_TABLE H )
{
    unsigned int i, old_size;
    cell *old_cells;
/*1*/   old_cells = H->the_cells;
/*2*/   old_size = H->table_size;
    /* Get a new, empty table */
/*3*/   H = initialize_table( 2*old_size );
    /* Scan through old table, reinserting into new */
/*4*/   for( i = 0; i < old_size; i++ )
/*5*/       if( old_cells[i].info == legitimate )
/*6*/           insert( old_cells[i].element, H );
/*7*/   free( old_cells );
/*8*/   return H;
}
  • 142. Extendible Hashing • We now deal with the case where the amount of data is too large to fit in main memory. • The main consideration then is the number of disk accesses required to retrieve data. • As before, we assume that at any point we have n records to store; the value of n changes over time. Furthermore, at most m records fit in one disk block. • We will use m = 4 in this section. • If either open hashing or closed hashing is used, the major problem is that collisions could cause several blocks to be examined during a find, even for a well-distributed hash table. • Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed, which requires O(n) disk accesses. • A clever alternative, known as extendible hashing, allows a find to be performed in two disk accesses. Insertions also require few disk accesses.
  • 143. Extendible Hashing • We recall from previous discussions that a B-tree has depth O(log_{m/2} n). • As m increases, the depth of a B-tree decreases. • We could in theory choose m to be so large that the depth of the B-tree would be 1. • Then any find after the first would take one disk access, since, presumably, the root node could be stored in main memory. • The problem with this strategy is that the branching factor is so high that it would take considerable processing to determine which leaf the data was in. • If the time to perform this step could be reduced, then we would have a practical scheme. • This is exactly the strategy used by extendible hashing.
  • 144. Extendible Hashing - Example • Let us suppose, for the moment, that our data consists of several six-bit integers. • The figure on the slide shows an extendible hashing scheme for this data. • The root of the "tree" contains four pointers determined by the leading two bits of the data. • Each leaf has up to m = 4 elements. • It happens that in each leaf the first two bits are identical; this is indicated by the number in parentheses. • To be more formal, D will represent the number of bits used by the root, which is sometimes known as the directory. • The number of entries in the directory is thus 2^D. • dl is the number of leading bits that all the elements of some leaf l have in common. • dl will depend on the particular leaf, and dl ≤ D.
  • 145. Extendible Hashing - Example • Suppose that we want to insert the key 100100. • This would go into the third leaf, but as the third leaf is already full, there is no room. • We thus split this leaf into two leaves, which are now determined by the first three bits. • This requires increasing the directory size to 3. • These changes are reflected in the figure on the slide.
  • 146. Extendible Hashing - Example • Notice that all of the leaves not involved in the split are now pointed to by two adjacent directory entries. • Thus, although an entire directory is rewritten, none of the other leaves are actually accessed. • If the key 000000 is now inserted, then the first leaf is split, generating two leaves with dl = 3. • Since D = 3, the only change required in the directory is the updating of the 000 and 001 pointers. (See the figure on the slide.)
  • 147. Extendible Hashing - Example • This very simple strategy provides quick access times for insert and find operations on large databases. • There are a few important details we have not considered. • First, it is possible that several directory splits will be required if the elements in a leaf agree in more than D + 1 leading bits. • For instance, starting at the original example, with D = 2, if 111010, 111011, and finally 111100 are inserted, the directory size must be increased to 4 to distinguish between the five keys. • This is an easy detail to take care of, but it must not be forgotten. • Second, there is the possibility of duplicate keys; if there are more than m duplicates, then this algorithm does not work at all. • In this case, some other arrangements need to be made.
  • 148. Thank you