02 Sorting (Chap 13)
National University
The efficiency of data handling can often be substantially increased if the data are sorted. For example, it is practically impossible to find a name in a telephone directory if the entries are not sorted. To sort a set of items such as numbers or words, two properties must be considered:
- The number of comparisons required to arrange the data
- The number of data movements
Depending on the sorting algorithm, the exact number of comparisons or movements may not always be easy to determine. Therefore, the numbers of comparisons and movements are approximated with big-O notation. Some sorting algorithms do more movement of data than comparison of data. It is up to the programmer to decide which algorithm is more appropriate for a specific set of data. For example, if only small keys such as integers or characters are compared, then comparisons are relatively fast and inexpensive. But if complex and large objects must be compared, then comparison can be quite costly.
A.R. Hadaegh Dr. Ahmad R. Hadaegh National University Page 3
If, on the other hand, the data items being moved are large and movement happens relatively often, then movement, rather than comparison, stands out as the determining factor. Further, a simple method may be only 20% less efficient than a more elaborate algorithm. If sorting is used in a program only once in a while and only for small sets of data, then using a more complicated algorithm may not be desirable. However, if the data set is large, 20% can make a significant difference and should not be ignored. Let's look at different sorting algorithms now.
Insertion Sort
Start with the first two elements of the array, data[0] and data[1]. If they are out of order, an interchange takes place.
Next, data[2] is considered and placed into its proper position:
- If data[2] is smaller than data[0], it is placed before data[0] by shifting data[0] and data[1] down by one position.
- Otherwise, if data[2] is between data[0] and data[1], we only need to shift data[1] down and place data[2] in the second position.
- Otherwise, data[2] remains where it is in the array.
Next, data[3] is considered and the same process repeats, and so on.
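The steps above can be sketched as a C++ template in the same style as the lecture's BubbleSort and HeapSort. The name InsertionSort and the exact loop structure are assumptions, not the textbook's code: data[0..i-1] is kept sorted, data[i] is saved in tmp, and larger items are moved down one position until tmp's place is found.

```cpp
// Hypothetical InsertionSort: a sketch in the style of the lecture's templates.
template <class T>
void InsertionSort(T data[], int n) {
    for (int i = 1; i < n; i++) {
        T tmp = data[i];                  // the item being inserted
        int j = i;
        // data[0..i-1] is already sorted; move larger items down one position
        while (j > 0 && tmp < data[j - 1]) {
            data[j] = data[j - 1];
            --j;
        }
        data[j] = tmp;                    // put tmp in its proper position
    }
}
```

Called as InsertionSort(a, 5) on the array 5 2 3 8 1, it leaves 1 2 3 5 8.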
Example: insertion sort on the array 5 2 3 8 1
tmp = 2: moving 5 down gives 5 5 3 8 1; put tmp = 2 in the first position: 2 5 3 8 1
tmp = 3: moving 5 down gives 2 5 5 8 1; put tmp = 3 in the second position: 2 3 5 8 1
tmp = 8: no movement is needed: 2 3 5 8 1
tmp = 1: moving 8 down gives 2 3 5 8 8; moving 5 down gives 2 3 5 5 8; moving 3 down gives 2 3 3 5 8; moving 2 down gives 2 2 3 5 8; put tmp = 1 in the first position: 1 2 3 5 8
Advantage of insertion sort: if the data are already sorted, they remain sorted and essentially no data movement is necessary.
Disadvantage of insertion sort: an item that is already in its right place may have to be moved temporarily in one iteration and then moved back into its original place.
Complexity of insertion sort:
- Best case: this happens when the data are already sorted. It takes $O(n)$ to go through the elements.
- Worst case: this happens when the data are in reverse order; then i-1 movements are necessary for the ith item, so the total movement is $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2}$, which is $O(n^2)$.
- Average case: approximately half of the worst case, about $\frac{n(n-1)}{4}$ movements, which is still $O(n^2)$.
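The movement counts above can be checked with a small instrumented version. InsertionSortShifts is a hypothetical helper, not part of the lecture: it counts how many items are moved down one position while sorting, so for reversed input of n items it should report n(n-1)/2, and for sorted input 0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the "moving down" steps of an insertion sort.
// The vector is taken by value, so the caller's data are left unchanged.
long long InsertionSortShifts(std::vector<int> v) {
    long long shifts = 0;
    for (std::size_t i = 1; i < v.size(); i++) {
        int tmp = v[i];
        std::size_t j = i;
        while (j > 0 && tmp < v[j - 1]) {
            v[j] = v[j - 1];   // one movement
            --j;
            ++shifts;
        }
        v[j] = tmp;
    }
    return shifts;
}
```

For the running example 5 2 3 8 1 it counts 6 movements (5 moved down twice, then 8, 5, 3 and 2), and for the numbers 100 down to 1 it counts 100 * 99 / 2 = 4950.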
Selection Sort
Select the minimum in the array and swap it with the first element. Then select the second minimum and swap it with the second element, and so on until everything is sorted.
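A minimal sketch of selection sort in the same template style (SelectionSort is a hypothetical name): for each position i it finds the minimum of the unsorted part data[i..n-1] and swaps it into position i. Skipping the swap when the minimum is already in place is a design choice of this sketch.

```cpp
#include <utility>  // std::swap

// Hypothetical SelectionSort: a sketch in the style of the lecture's templates.
template <class T>
void SelectionSort(T data[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int least = i;                         // index of the minimum of data[i..n-1]
        for (int j = i + 1; j < n; j++)
            if (data[j] < data[least])
                least = j;
        if (least != i)
            std::swap(data[i], data[least]);   // put the (i+1)th minimum in position i
    }
}
```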
Example: selection sort on the array 5 2 3 8 1
The minimum is 1. Swap it with the first position: 1 2 3 8 5
The second minimum is 2. Swap it with the second position (it is already there): 1 2 3 8 5
The third minimum is 3. Swap it with the third position (it is already there): 1 2 3 8 5
The fourth minimum is 5. Swap it with the fourth position: 1 2 3 5 8
Bubble Sort
Start from the bottom of the array and move the required elements up (i.e., bubble the elements up). Two adjacent elements are interchanged if they are found to be out of order with respect to each other:
- First, data[n-1] and data[n-2] are compared and swapped if they are not in order.
- Then data[n-2] and data[n-3] are compared and swapped if they are not in order.
- And so on.
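#include <utility>   // std::swap

template <class T>
void BubbleSort(T data[], int n) {
    for (int i = 0; i < n - 1; i++)              // after pass i, data[0..i] is in final order
        for (int j = n - 1; j > i; --j)          // from the bottom up to position i+1
            if (data[j] < data[j - 1])           // adjacent items out of order
                std::swap(data[j], data[j - 1]);
}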
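Example: bubble sort on the array 5 2 3 8 1
Iteration 1: start from the last element up to the first element and bubble the smaller elements up. 8 and 1 swap, 3 and 1 swap, 2 and 1 swap, 5 and 1 swap: 1 5 2 3 8
Iteration 2: start from the last element up to the second element and bubble the smaller elements up. 3 and 8: no swap; 2 and 3: no swap; 5 and 2: swap: 1 2 5 3 8
Iteration 3: start from the last element up to the third element and bubble the smaller elements up. 3 and 8: no swap; 5 and 3: swap: 1 2 3 5 8
Iteration 4: start from the last element up to the fourth element and bubble the smaller elements up. 5 and 8: no swap: 1 2 3 5 8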
Comparing bubble sort with insertion and selection sorts, we can say that:
- For the average case, bubble sort makes approximately twice as many comparisons as insertion sort and the same number of moves.
- Bubble sort also, on average, makes as many comparisons as selection sort and n times more moves than selection sort.
Among these three sorts, insertion sort is generally the better algorithm because, if the array is already sorted, its running time is only $O(n)$, which is faster than the other two.
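To see these relationships on concrete data, the hypothetical instrumented versions below count key comparisons and data moves for the three sorts. They are a sketch: a move is counted as one shift for insertion sort and as one swap for selection and bubble sort, and a swap usually costs three assignments, so the move counts are only roughly comparable.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical counters (not from the lecture).
struct SortCounts {
    long long comparisons = 0;
    long long moves = 0;   // shifts for insertion sort, swaps for the other two
};

SortCounts CountInsertion(std::vector<int> v) {
    SortCounts c;
    for (std::size_t i = 1; i < v.size(); i++) {
        int tmp = v[i];
        std::size_t j = i;
        while (j > 0) {
            ++c.comparisons;
            if (!(tmp < v[j - 1])) break;      // tmp has found its place
            v[j] = v[j - 1];
            ++c.moves;
            --j;
        }
        v[j] = tmp;
    }
    return c;
}

SortCounts CountSelection(std::vector<int> v) {
    SortCounts c;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i + 1 < n; i++) {
        std::size_t least = i;
        for (std::size_t j = i + 1; j < n; j++) {
            ++c.comparisons;
            if (v[j] < v[least]) least = j;
        }
        if (least != i) { std::swap(v[i], v[least]); ++c.moves; }
    }
    return c;
}

SortCounts CountBubble(std::vector<int> v) {
    SortCounts c;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i + 1 < n; i++)
        for (std::size_t j = n - 1; j > i; --j) {
            ++c.comparisons;
            if (v[j] < v[j - 1]) { std::swap(v[j], v[j - 1]); ++c.moves; }
        }
    return c;
}
```

On the running example 5 2 3 8 1, insertion sort makes 8 comparisons and 6 shifts, selection sort 10 comparisons and 2 swaps, and bubble sort 10 comparisons and 6 swaps. On an already sorted array of 10 items, insertion sort needs only 9 comparisons, while the other two still make 45.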
Shell Sort
Shell sort works on the idea that it is easier and faster to sort many short lists than to sort one large list:
- Select an increment value k (the best choice of increments is not necessarily clear).
- Sort each subsequence consisting of every kth element, using some simple sorting technique.
- Decrease k and repeat the step above until k = 1.
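A sketch of the steps above, assuming insertion sort as the simple sorting technique and a caller-supplied list of decreasing increments ending in 1. ShellPass and ShellSort are hypothetical names. Because a gapped insertion sort never mixes items from different subsequences, one call to ShellPass(data, k) sorts all k subsequences of every kth element.

```cpp
#include <vector>

// Hypothetical ShellPass: insertion sort with gap k, which sorts each of the
// k subsequences data[s], data[s+k], data[s+2k], ... for s = 0..k-1.
void ShellPass(std::vector<int>& data, int k) {
    const int n = static_cast<int>(data.size());
    for (int i = k; i < n; i++) {
        int tmp = data[i];
        int j = i;
        while (j >= k && tmp < data[j - k]) {
            data[j] = data[j - k];   // move the larger item k positions down
            j -= k;
        }
        data[j] = tmp;
    }
}

// Hypothetical ShellSort: one pass per increment; the last increment must be 1.
void ShellSort(std::vector<int>& data, const std::vector<int>& increments) {
    for (int k : increments)
        ShellPass(data, k);
}
```

With increments {4, 2, 1} and the array 4 7 10 2 5 12 1 9 6 3 8 11, the pass for k = 4 gives 4 3 1 2 5 7 8 9 6 12 10 11, the pass for k = 2 gives 1 2 4 3 5 7 6 9 8 11 10 12, and the final pass with k = 1 is an ordinary insertion sort on an almost sorted array.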
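Example: Shell sort with increments k = 4, 2, 1
Initial array: 4 7 10 2 5 12 1 9 6 3 8 11
k = 4, after sorting the subsequences at positions 0, 4, 8 and 1, 5, 9: 4 3 10 2 5 7 1 9 6 12 8 11
k = 4, after sorting all four subsequences: 4 3 1 2 5 7 8 9 6 12 10 11
k = 2, after sorting both subsequences: 1 2 4 3 5 7 6 9 8 11 10 12
k = 1 (an ordinary insertion sort): 1 2 3 4 5 6 7 8 9 10 11 12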
Heap Sort
Heap sort uses a heap as described in the earlier lectures. As we said before, a heap is a binary tree with the following two properties:
- The value of each node is not less than the values stored in each of its children.
- The tree is perfectly balanced, and the leaves in the last level are all in the leftmost positions.
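Heap sort first turns the array into a heap, so the largest element is at the root. Then swap the root with the last element of the heap, shrink the heap by one, and restore the heap in the remaining elements. Repeat the process for all elements until you are done.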
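#include <utility>   // std::swap

// MoveDown(data, first, last) sifts data[first] down so that data[first..last]
// is a heap again; it is defined with the heap in the earlier lectures (not repeated here).
template <class T>
void HeapSort(T data[], int size) {
    for (int i = size / 2 - 1; i >= 0; i--)
        MoveDown(data, i, size - 1);        // creates the heap
    for (int i = size - 1; i >= 1; --i) {
        std::swap(data[0], data[i]);        // move the largest item to data[i]
        MoveDown(data, 0, i - 1);           // restores the heap in data[0..i-1]
    }
}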
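HeapSort relies on MoveDown from the heap lecture. The block below is a self-contained sketch with a hypothetical MoveDown that swaps data[first] with its larger child until the heap property holds in data[first..last]; the textbook's own MoveDown may differ in detail.

```cpp
#include <utility>  // std::swap

// Hypothetical MoveDown: restores the max-heap property in data[first..last],
// assuming only data[first] may be out of place.
template <class T>
void MoveDown(T data[], int first, int last) {
    int largest = 2 * first + 1;                  // left child
    while (largest <= last) {
        if (largest < last && data[largest] < data[largest + 1])
            largest++;                            // the right child is larger
        if (data[first] < data[largest]) {
            std::swap(data[first], data[largest]);
            first = largest;                      // continue one level down
            largest = 2 * first + 1;
        } else {
            break;                                // heap property restored
        }
    }
}

template <class T>
void HeapSort(T data[], int size) {
    for (int i = size / 2 - 1; i >= 0; i--)
        MoveDown(data, i, size - 1);              // creates the heap
    for (int i = size - 1; i >= 1; --i) {
        std::swap(data[0], data[i]);              // move the largest item to data[i]
        MoveDown(data, 0, i - 1);                 // restores the heap
    }
}
```

For the array 2 8 6 1 10 15 3 12 11, this MoveDown builds the heap 15 12 6 11 10 2 3 1 8 (in array order), and HeapSort then produces 1 2 3 6 8 10 11 12 15.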
Example: heap sort of the array 2 8 6 1 10 15 3 12 11
(Figures: the array is drawn as a binary tree and turned into a heap; then, repeatedly, the root is swapped with the last element of the heap and the heap is restored.)
Result: 1 2 3 6 8 10 11 12 15
Quick Sort
Quick sort is widely regarded as one of the fastest general-purpose sorting methods in practice; its average running time is $O(n \log n)$.
In this scheme:
- One of the elements in the array is chosen as the pivot.
- The array is then divided into two sub-arrays: the elements smaller than the pivot go into one sub-array, and the elements bigger than the pivot go into the other.
- The pivot goes between these two sub-arrays.
- Each sub-array is then partitioned the same way as the original array, and the process repeats recursively.
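The lecture does not say how the pivot is chosen or how the partition is carried out, so the sketch below assumes the Lomuto partition scheme with the last element as the pivot. Partition and QuickSort are hypothetical names. Items smaller than the pivot end up to its left and all others to its right, and each side is then sorted recursively.

```cpp
#include <utility>  // std::swap

// Hypothetical Partition (Lomuto scheme, an assumption): uses data[hi] as the
// pivot and returns the pivot's final index.
template <class T>
int Partition(T data[], int lo, int hi) {
    T pivot = data[hi];
    int i = lo;                                  // data[lo..i-1] < pivot
    for (int j = lo; j < hi; j++)
        if (data[j] < pivot)
            std::swap(data[i++], data[j]);
    std::swap(data[i], data[hi]);                // the pivot goes between the two sub-arrays
    return i;
}

// Hypothetical QuickSort: sorts data[lo..hi] in place.
template <class T>
void QuickSort(T data[], int lo, int hi) {
    if (lo >= hi) return;                        // 0 or 1 items: already sorted
    int p = Partition(data, lo, hi);
    QuickSort(data, lo, p - 1);                  // items smaller than the pivot
    QuickSort(data, p + 1, hi);                  // items not smaller than the pivot
}
```

Using the last element as the pivot keeps the sketch short, but it hits the $O(n^2)$ worst case on already sorted input; picking the median of three elements or a random element is a common alternative.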
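Example: select 65 as the pivot. Partitioning puts the elements smaller than 65 (0, 13, 26, 31, 43, 57) on one side and the elements bigger than 65 (75, 81, 92) on the other, with 65 between them. Sorting each part recursively gives 0 13 26 31 43 57 65 75 81 92.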
Radix Sort
Radix refers to the base of a number system. For example, the radix for decimal numbers is 10, for hexadecimal numbers it is 16, and for the English alphabet it is 26. Radix sort has been called bin sort in the past. The name comes from mechanical devices that were used to sort keypunched cards: cards were directed into bins, returned to the deck in a new order, and then redirected into bins again. For integer data, the repeated passes of a radix sort focus first on the ones place value, then on the tens place value, then on the hundreds place value, and so on. For character-based data, the focus is first on the right-most character, then on the second right-most character, and so on.
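The passes described above can be sketched with one std::queue per decimal digit (bin), in line with the queue-based implementation the lecture mentions at the end. RadixPass and RadixSort are hypothetical names, and the sketch assumes non-negative integers with at most `digits` decimal digits. Each pass is stable: values that land in the same bin keep their previous relative order, which is what makes the later passes correct.

```cpp
#include <queue>
#include <vector>

// Hypothetical RadixPass: distributes the values into 10 bins by the digit
// selected by `place` (1 = ones, 10 = tens, 100 = hundreds, ...), then
// collects the bins in order 0..9. Assumes non-negative values.
void RadixPass(std::vector<int>& data, int place) {
    std::vector<std::queue<int>> bins(10);
    for (int x : data)
        bins[(x / place) % 10].push(x);
    data.clear();
    for (auto& bin : bins)
        while (!bin.empty()) {
            data.push_back(bin.front());
            bin.pop();
        }
}

// Hypothetical RadixSort: one pass per digit, least significant digit first.
void RadixSort(std::vector<int>& data, int digits) {
    int place = 1;
    for (int d = 0; d < digits; d++, place *= 10)
        RadixPass(data, place);
}
```

Starting from 472 432 254 534 654 477 459 649 239 (the values already grouped by the ones place), RadixPass(v, 10) gives 432 534 239 649 254 654 459 472 477, and RadixPass(v, 100) then gives 239 254 432 459 472 477 534 649 654.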
Example 1: radix sort first arranges the values into 10 bins (0 to 9) based on the ones place value.
The sublists are collected and made into one list (in the order given): 472 432 254 534 654 477 459 649 239
Then radix sort arranges the values into 10 bins based on the tens place value.
The sublists are collected and made into one list (in the order given): 432 534 239 649 254 654 459 472 477
Radix sort then arranges the values into 10 bins based on the hundreds place value (done!).
The sublists are collected and the numbers are sorted: 239 254 432 459 472 477 534 649 654
Example 2 (values with fewer digits are written with leading zeros): radix sort first arranges the values into 10 bins (0 to 9) based on the ones place value.
The sublists are collected and made into one list (in the order given): 472 043 054 534 654 077 009 039
Then radix sort arranges the values into 10 bins based on the tens place value.
The sublists are collected and made into one list (in the order given): 009 534 039 043 054 654 472 077
Radix sort then arranges the values into 10 bins based on the hundreds place value (done!): bin 0 holds 009 039 043 054 077, bin 4 holds 472, bin 5 holds 534, bin 6 holds 654, and the other bins are empty.
The sublists are collected and the numbers are sorted: 009 039 043 054 077 472 534 654
Assume the data are: area Book Close Team New Place Print
To sort these words using radix sort, you need 26 buckets, one for each letter. You also need one more character to represent a space, which has the lowest value. Suppose that character is the question mark ?, used to represent a space. You can rewrite the data as follows:
area? Book? Close Team? New?? Place Print
Now all words have 5 characters and it is easy to compare them with each other. To do the sorting, start from the right-most character, place the data into the appropriate buckets, and collect them. Then place them into buckets based on the second right-most character and collect them again, and so on.
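A sketch of this character-based version, assuming words made only of the letters a to z, compared without regard to case (the data above mix upper and lower case). Instead of writing the padding character ? explicitly, the hypothetical Bucket function maps every position past the end of a word to bucket 0, the lowest value, and letters to buckets 1 to 26, for 27 buckets in total. RadixSortWords is also a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Bucket: 0 for the padding "space", 1..26 for a..z (case ignored).
// Assumes the word contains only letters.
int Bucket(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return 0;              // short words are padded
    return std::tolower(static_cast<unsigned char>(s[pos])) - 'a' + 1;
}

// Hypothetical RadixSortWords: one pass per character position, from the
// right-most position of the longest word down to the first character.
void RadixSortWords(std::vector<std::string>& words) {
    std::size_t width = 0;
    for (const auto& w : words) width = std::max(width, w.size());
    for (std::size_t pos = width; pos-- > 0;) {
        std::vector<std::vector<std::string>> buckets(27);
        for (const auto& w : words) buckets[Bucket(w, pos)].push_back(w);
        words.clear();                          // collect the buckets in order
        for (const auto& b : buckets)
            words.insert(words.end(), b.begin(), b.end());
    }
}
```

For area Book Close Team New Place Print it produces area Book Close New Place Print Team.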
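Radix sort makes one pass over the data for each digit (or character) of the key, so its running time grows linearly with the number of items n. The key size (for example, the maximum number of digits) is a factor, but the relationship is still linear: with at most 3 digits, 3n is still $O(n)$.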
Although theoretically $O(n)$ is an impressive running time for a sort, it does not account for the cost of the queue implementation. Further, if the radix r (the base) is a large number and a large amount of data has to be sorted, then radix sort requires r queues of at most n items each, and $r \cdot n$ is $O(rn)$, which can be substantially large depending on the size of r.
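As a rough worked count (a sketch, assuming d passes for keys of d digits or characters, n items, and radix r), each pass distributes the n items into the queues and then visits all r queues to collect them:

$$T = d\,(n + r) = O\big(d\,(n + r)\big)$$

For the 3-digit decimal examples, $d = 3$ and $r = 10$, so $T = 3(n + 10)$, which is $O(n)$. If each of the r queues is an array that reserves room for n items, the space is $r \cdot n$; for example, $r = 10$ and $n = 10^6$ already means $10^7$ queue slots, and the 27 buckets of the word example need $27n$.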