Quick Sort
Abstract
Sorting is one of the most researched problems in computer science, and many algorithms have been
proposed for it. For large inputs, QuickSort has proved to be among the fastest sorting algorithms. This paper
presents an intelligent QuickSort algorithm based on a dynamic pivot selection technique that enhances the
average case and eliminates the worst case behavior of the algorithm. The suggested dynamic pivot selection
technique is data-dependent, which increases the chances of splitting the array or list into parts of relatively
equal size and thereby reduces the number of recursive calls made by the QuickSort algorithm. Furthermore, the
modified algorithm converts the worst case into a best case behavior with Θ(n) execution time. The algorithm is
intelligent enough to recognize a sorted array or sub-array that does not require further processing.
Keywords: QuickSort, MQuickSort, Dynamic Pivot Selection, Divide & Conquer, Median-of-Three
Rule, Median-of-Five Rule.
1. Introduction
The sorting problem is one of the most researched problems in the field of Computer Science, since
many other algorithms require sorted lists as a prerequisite to reduce their execution time and enhance
performance. The QuickSort algorithm (Hoare, C. A. R., 1961) (Hoare R., 1962) is considered to be one of the
fastest sorting algorithms based on different studies (Sedgewick R., 1977) (Van Emden M. H., 1970)
(Knuth D.E., 2005). QuickSort follows the divide and conquer technique by recursively splitting each
array into two sub-arrays, on the premise that several smaller problems are easier to solve than a single
larger one (Dean C., 2006) (Ledley R., 1962). In QuickSort, a pivot is selected from the unsorted array and
used to split the array into two sub-arrays, on which the same algorithm is called recursively until the
sub-arrays have size one or zero. For an input of size n, the QuickSort algorithm has an average runtime
complexity of Θ(n log n) and a worst case of Θ(n²), which occurs, for example, when processing an already
sorted list while picking the largest element as the pivot.
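As a minimal sketch of the scheme just described, the following Python code selects the rightmost element as the pivot and recurses on the two sub-arrays. It is not part of the original paper: the function name `quicksort` and the simple one-directional partition loop are assumptions chosen for brevity.

```python
def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indexes lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                 # sub-array of size zero or one
        return
    pivot = a[hi]                # assumption: rightmost element is the pivot
    i = lo                       # next slot for an element smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]    # place the pivot between the two parts
    quicksort(a, lo, i - 1)      # left sub-array: elements smaller than the pivot
    quicksort(a, i + 1, hi)      # right sub-array: elements not smaller than the pivot
```

On an already sorted list the rightmost element is always the largest, so each call peels off only the pivot and the recursion depth grows linearly with n, which is the worst case mentioned above.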
The runtime of the QuickSort algorithm mainly depends on how the array and its successive
sub-arrays are split. If each split reduces the size of the array or sub-array by only a small constant
amount, the runtime will be:
T(n) = n + T(n - c), where c is a constant. Thereby,
T(n) = Θ(n²) (1)
However, if splitting consistently results in sub-arrays of almost equal size, the runtime complexity of QuickSort
Vol 19, No. 10;Oct 2012
will be:
T(n) = n + 2T(n/2), since each of the two halves is sorted recursively.
Thereby,
T(n) = Θ(n log n) (2)
Splitting the array into almost equal halves guarantees the best performance for the QuickSort algorithm
and reduces the number of recursive calls which will eventually reduce execution time.
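Both bounds can be checked by unrolling the recurrences. The derivation below is a sketch under stated assumptions: c ≥ 1 is a constant, T(1) and T(n mod c) are constants, n is a power of two in the balanced case, and both halves are sorted recursively, so the balanced recurrence reads T(n) = n + 2T(n/2).

```latex
\begin{align*}
\text{Unbalanced: } T(n) &= n + T(n-c) \\
  &= n + (n-c) + (n-2c) + \dots \\
  &= \sum_{i=0}^{\lfloor n/c \rfloor - 1} (n - ic) \;+\; T(n \bmod c)
   \;=\; \Theta(n^2) \quad \text{for constant } c. \\[6pt]
\text{Balanced: } T(n) &= n + 2\,T(n/2) \\
  &= n + 2\left(\tfrac{n}{2} + 2\,T(n/4)\right) = 2n + 4\,T(n/4) \\
  &= kn + 2^{k}\,T\!\left(n/2^{k}\right) \\
  &= n\log_2 n + n\,T(1) = \Theta(n \log n) \quad \text{at } k = \log_2 n.
\end{align*}
```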
Two main enhancements can be applied to the basic QuickSort algorithm. The first
stops the recursive calls to QuickSort when the size of the input becomes relatively small
and only a few elements are out of order. In such cases, it is better to use Insertion sort, which
runs in linear time on almost ordered sub-arrays (Bell D., 1958) (Box R. and Lacey S., 1991) (Kruse
R. and Ryba A., 1999). The second enhancement concerns the pivot selection technique, which
has proven to be the most decisive factor in dividing the array into sub-arrays. Several techniques have
been proposed to avoid the worst case scenario described in (1). The original
QuickSort algorithm uses the left-most or the right-most element as the pivot, which can easily cause the
worst case behavior when sorting a sorted or almost sorted list. Another technique is random pivot
selection, which reduces the probability of the worst case scenario. The Median-of-Three
splitting technique (Sedgewick R., 1978) picks as the pivot the median of the values stored at the first, last
and ((first + last) / 2) indexes. This technique reduces the chances of the worst case scenario
and increases the chances of the average case behavior of the algorithm. However, it does not guarantee
splitting the array into equal halves and may still bring about the worst case behavior of QuickSort, although
this is unlikely. Figure 1(a) shows the splitting of an eleven-element sample array. Switching the
values of the fifth and sixth elements of the array results in a more balanced splitting, as shown in
Figure 1(b). The performance of the algorithm is thus sensitive to slight modifications to the
contents of the array.
Figure 1. (a) Applying the QuickSort algorithm on an eleven-element array using the
Median-of-Three splitting technique. (b) Applying the algorithm on a slightly reordered array
with the same contents and size.
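The Median-of-Three rule can be sketched in a few lines of Python. The function name `median_of_three_index` is hypothetical, and returning an index rather than a value is a choice made here so that the caller can swap the pivot into place.

```python
def median_of_three_index(a, first, last):
    """Return the index among first, middle and last that holds the median value."""
    mid = (first + last) // 2
    # Order the three candidate indexes by the values they point to.
    candidates = sorted([first, mid, last], key=lambda i: a[i])
    return candidates[1]         # the middle candidate holds the median value
```

For an already sorted array the rule returns the middle index, so sorted input no longer triggers the worst case that a rightmost pivot would cause. It samples only three positions, however, which is why reordering a few elements can change the split.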
The Median-of-Five with random index selection technique (Janez B. et al., 2000) modifies
the previous technique by adding the values stored at two randomly picked indexes to the values stored
at the first, last and ((first + last) / 2) indexes. Although this technique may provide a more balanced
splitting than the previous ones, it can still suffer from the same drawbacks.
Another modification of the previous method, which aims to reduce the overhead associated with
random number generation, picks as the pivot the median of the five values stored at fixed indexes,
namely first, ((first + last) / 4), ((first + last) / 2), (3 * (first + last) / 4) and last (Mohammed, A. et al., 2004).
Although this technique avoids the overhead of random number generation by using five fixed
indexes, it still requires time to pick the median of five elements at each recursive call.
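The fixed-index Median-of-Five rule might be sketched in Python as follows. The function name `median_of_five_fixed_index` is hypothetical. One deliberate deviation is an assumption of this sketch: the quarter positions are computed relative to `first` (for example `first + (last - first) // 4`), because the formula as printed, ((first + last) / 4), can fall outside the sub-array whenever `first` is greater than zero.

```python
def median_of_five_fixed_index(a, first, last):
    """Return the index, among five fixed positions, that holds the median value."""
    span = last - first
    # Assumption: quarter points are offsets from `first`, so they stay in range.
    positions = [first,
                 first + span // 4,
                 first + span // 2,
                 first + (3 * span) // 4,
                 last]
    ordered = sorted(positions, key=lambda i: a[i])
    return ordered[2]            # third of five: the median
```

Sorting five candidates at every recursive call is the extra cost the text refers to. For very small sub-arrays some of the five positions coincide, which the sketch tolerates.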
Other techniques involving the Median-of-Seven and Median-of-Nine, either with or without random
index selection, were proposed in (Mohammed, A. et al., 2004). Although increasing the number of
elements may provide a more balanced split, the time needed to pick the pivot at each recursive call
increases as the number of elements increases.
The main drawback of the previous techniques is that the selection of the pivot is based on a fixed
number of sampled elements, which does not necessarily reflect the nature of the array. When using any of
these pivot selection techniques, the worst case behavior of the QuickSort algorithm may still occur.
This paper is organized as follows. Section 2 presents the modified QuickSort (MQuickSort) algorithm
using the dynamic pivot selection technique. Sections 3 and 4 analyze the behavior of
MQuickSort in comparison with the QuickSort algorithm using various existing pivot selection techniques.
Section 5 concludes the paper.
2. The MQuickSort Algorithm
This section proposes a pivot selection technique that uses the value of every element in the array to
split the array into relatively equal halves at each recursive call, which in turn reduces the number of
recursive calls and the overall execution time of QuickSort. Because the technique aims for
equal splitting of the array, it drives the worst case behavior toward O(n log n). The proposed
technique also detects an already sorted array or sub-array, which is done while comparing the elements
of the array to the pivot. If the array is already sorted, it is not processed any further, which
transforms the O(n²) worst case of the classical algorithm into the best case of the modified algorithm, i.e. O(n).
The proposed technique operates as follows. At first, the pivot value is chosen to be the value of the
rightmost element of the array. Each element value is compared with the pivot value, and two counters,
CountLess and CountLarger, count the number of elements with values smaller than the pivot and the number
of elements with values larger than the pivot, respectively. The sum of the values of
the elements smaller than the pivot and the sum of those larger than the pivot are stored in SumLess and
SumLarger, respectively. These variables are then used to calculate the next pivots for the recursive calls.
The integer average of the values smaller than the pivot (SumLess divided by CountLess) is passed as the
pivot value of the recursive call for the left sub-array. Likewise, the integer average of the values larger
than the pivot (SumLarger divided by CountLarger) is passed as the pivot value of the recursive call for the
right sub-array. This pivot selection technique helps in successively splitting the array into nearly equal
halves, which in turn improves the efficiency of the QuickSort algorithm.
A Boolean variable is used by the algorithm to recognize an already sorted array
or sub-array, which reduces the number of recursive calls. Along with the reduction in recursive calls, the
proposed technique converts the worst case scenario of the classical QuickSort algorithm into a best
case scenario with Θ(n) runtime. The modified algorithm MQuickSort is provided in Figure 2.
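Figure 2 is not reproduced in this text, so the following Python code is only a sketch reconstructed from the description above, not the authors' listing. It rests on several assumptions: the input is a list of integers, the counting and the sorted check happen in one read-only pass followed by a separate three-way partition, and elements equal to the pivot value stay between the two sub-arrays. The names `mquicksort`, `count_less`, `count_larger`, `sum_less` and `sum_larger` are chosen to mirror CountLess, CountLarger, SumLess and SumLarger.

```python
def mquicksort(a, lo=0, hi=None, pivot=None):
    """Sort the list of integers a in place between lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if pivot is None:
        pivot = a[hi]            # first call: value of the rightmost element

    # One read-only pass: gather counts and sums on each side of the pivot
    # and check whether the sub-array is already in order.
    count_less = count_larger = 0
    sum_less = sum_larger = 0
    is_sorted = True
    for k in range(lo, hi + 1):
        x = a[k]
        if x < pivot:
            count_less += 1
            sum_less += x
        elif x > pivot:
            count_larger += 1
            sum_larger += x
        if k > lo and a[k - 1] > x:
            is_sorted = False
    if is_sorted:
        return                   # sorted sub-array: no further processing

    # Three-way partition around the pivot value: [< pivot | == pivot | > pivot].
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1

    # Integer averages of each side become the pivots of the recursive calls.
    if count_less > 1:
        mquicksort(a, lo, lt - 1, sum_less // count_less)
    if count_larger > 1:
        mquicksort(a, gt + 1, hi, sum_larger // count_larger)
```

In this sketch the pivot passed to a recursive call is the floor of the mean of exactly the integers in that sub-array. Either some element equals the pivot or both sides are non-empty, so every recursive call receives a strictly smaller sub-array. A sorted input returns after the first pass, which corresponds to the Θ(n) case described above.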
The dynamic pivot selection technique does not depend on the positions of the values stored in the array.
Figure 3 displays the splitting of the arrays in Figure 1(a) and (b). The same splitting tree is
generated for both arrays, which indicates that the MQuickSort algorithm is not sensitive to the order of
elements in the array.
Figure 3. Applying the MQuickSort algorithm with the dynamic pivot selection technique on
the arrays of Figure 1(a) and 1(b) results in identical splitting trees.
The MQuickSort algorithm may be affected by the presence of a few elements that are very small and/or
very large compared to the other elements in the array, since such values skew the averages used as pivots.
The example shown in Figure 4 suggests that the
MQuickSort algorithm still performs better than the Median-of-Three rule based QuickSort algorithm in
terms of the height of the splitting tree. Both algorithms require the same number of recursive calls.
Figure 4. (a) Applying the QuickSort algorithm using the Median-of-Three splitting
technique on an eleven-element array with extreme values. (b) Applying the MQuickSort
algorithm on the same array as in (a).
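To illustrate how a single extreme value can skew an average-based pivot, the following Python sketch compares the split sizes for two small arrays. The helper name `split_sizes` and the sample arrays are hypothetical and are not taken from Figure 4. The pivot is computed as the integer average of all values, mirroring how the pivot of a recursive call is derived.

```python
def split_sizes(values):
    """Return (pivot, number of smaller values, number of larger values)."""
    pivot = sum(values) // len(values)          # integer average as the pivot
    less = sum(1 for v in values if v < pivot)
    larger = sum(1 for v in values if v > pivot)
    return pivot, less, larger

balanced = split_sizes([1, 2, 3, 4, 5, 6])      # no extreme values
skewed = split_sizes([1, 2, 3, 4, 5, 1000])     # one extreme value
```

With these illustrative inputs, [1, 2, 3, 4, 5, 6] splits into 2 and 3 elements around the pivot 3, whereas [1, 2, 3, 4, 5, 1000] yields the pivot 169 and splits into 5 and 1 elements, so the outlier is isolated in a single step and the remaining values are split afresh in the next call.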
3. Performance Analysis
The MQuickSort algorithm initially selects the value of the rightmost element of the array as the
pivot. In every iteration, while scanning the contents of the array to calculate the mean of the
elements less than the pivot and that of the elements larger than the pivot, it maintains a Boolean variable
that is TRUE if the array is sorted, in which case the recursive calls stop. Consequently, only
one call is required for a sorted array, with a Θ(n) runtime for the first and only
required iteration. The same holds for sorted sub-arrays, which helps in reducing the
execution time.
In the average case, where the contents of the array do not include extreme values, the MQuickSort algorithm
calculates the pivots for the next two recursive calls during the current iteration in order
to split the array into two approximately equal sub-arrays. The runtime of the average case
can be expressed as T(n) = n + T((n/2)+c) + T((n/2)-c), where c is a constant. The n component of T(n)
is the time needed to calculate the pivots for the next iteration, while T((n/2)+c) and T((n/2)-c) are the
times needed for recursively sorting the two approximately equal sub-arrays. Solving the recurrence
yields T(n) = Θ(n log n) for the average case behavior of the MQuickSort
algorithm.
The worst case scenario of the MQuickSort algorithm takes place when just a few elements of
the array have extreme values. The MQuickSort algorithm tends to isolate the extreme values in early
iterations and then proceeds with the remaining elements through recursive calls, which exhibit the
average case behavior with Θ(n log n) runtime.
4. Experimental Analysis
Scenario 1:
This scenario represents sorted arrays with sizes from 100 to 100,000 elements. Table 1 shows the
runtime in milliseconds for all five algorithms. The MQuickSort algorithm
with dynamic pivot selection consistently outperforms QuickSort using all the other pivot selection
techniques. The results also show that QuickSort using the Median-of-Three rule performs better than both
of the Median-of-Five pivot selection techniques, which indicates how much more time is consumed in
picking a median of five than in picking a median of three.
Scenario 2:
This scenario represents randomly generated arrays with values chosen from 1 through Array_Size/10.
The objective of this scenario is to compare the behavior of the five algorithms on arrays
with duplicate values. Table 2 shows the runtime of the algorithms. The MQuickSort
algorithm exhibits a clear superiority, especially for array sizes greater than or equal to 2000. The
performance of the QuickSort algorithm using the last element as a pivot tends to be comparable to its
performance when using the Median-of-Three technique. Likewise, the QuickSort algorithm performs almost
the same using the fixed and non-fixed Median-of-Five techniques.
Scenario 3:
This scenario represents arrays with no duplicate values. Table 3 clearly demonstrates the superiority
of the MQuickSort algorithm using the dynamic pivot selection technique, especially for array sizes
greater than or equal to 3000 elements.
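The three kinds of input used in these scenarios could be generated along the following lines. This is a hedged Python sketch, not the authors' test harness: the function name `scenario_inputs`, the fixed seed, and the value ranges chosen for Scenarios 1 and 3 are assumptions. Only the range 1 through Array_Size/10 for Scenario 2 comes from the text.

```python
import random

def scenario_inputs(size, seed=0):
    """Return (sorted, with_duplicates, no_duplicates) integer lists of length size."""
    rng = random.Random(seed)                   # assumption: seeded for repeatability
    # Scenario 1: an already sorted array.
    sorted_input = list(range(1, size + 1))
    # Scenario 2: random values from 1 through size/10, so duplicates are frequent.
    upper = max(1, size // 10)
    with_duplicates = [rng.randint(1, upper) for _ in range(size)]
    # Scenario 3: a random permutation, so no value repeats.
    no_duplicates = rng.sample(range(1, size + 1), size)
    return sorted_input, with_duplicates, no_duplicates
```

Each list can then be copied and passed to the sorting routines under comparison, timing each call with a clock such as `time.perf_counter`.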
5. Conclusion
This article proposed a modified QuickSort (MQuickSort) based on a dynamic pivot selection technique.
To determine the pivots for the next level of recursive calls, the integer average of the values larger
than the pivot is passed as the pivot value of the recursive call for the right sub-array. Likewise, the
integer average of the values less than the pivot is passed as the pivot value of the recursive call for the
left sub-array. This pivot selection technique helps in successively splitting the array into nearly equal
halves, which in turn improves the efficiency of the QuickSort algorithm. Using the dynamic pivot
selection technique, the worst case scenario of a typical QuickSort algorithm is turned into a best case
for the MQuickSort algorithm with Θ(n) runtime. The MQuickSort algorithm runs in Θ(n
log n) in its worst case, which is consistent with the experimental results.
Table 1 Comparison of execution time (in milliseconds) for five algorithms with respect to sorted
arrays with various sizes.
Array Size | QuickSort using last element as pivot | QuickSort using Median-of-Three | QuickSort using Median-of-Five | QuickSort using Median-of-Five – Fixed | MQuickSort
Table 2 Comparison of execution time (in milliseconds) for five algorithms with respect to arrays of
various sizes with duplicate values.
Table 3 Comparison of execution time (in milliseconds) for five algorithms with respect to arrays of
various sizes with no duplicate values.
Array Size | QuickSort using last element as pivot | QuickSort using Median-of-Three | QuickSort using Median-of-Five | QuickSort using Median-of-Five – Fixed | MQuickSort
References
Hoare, C. A. R. (1961). Partition: Algorithm 63, Quicksort: Algorithm 64, and Find: Algorithm 65.
Comm. ACM, 4(7), 321-322.
Hoare R. (1962). Quicksort. The Computer Journal, 4(1), 10-15.
Sedgewick R. (1977). QuickSort with Equal Keys. SIAM J. Comput., 6, 240-287.
Van Emden M. H. (1970). Algorithm 402: Increasing the efficiency of Quicksort. Communications of
the ACM, 563-567.
Knuth D.E. (2005). The Art of Computer Programming, Vol. 3: Sorting and Searching.
Addison-Wesley, Reading, Mass.
Dean C. (2006). A Simple Expected Running Time Analysis for Randomized Divide and Conquer
Algorithms. Computer Journal of Discrete Applied Mathematics, 154(1), 1-5.