Comparison of Various Sorting Algorithms - A Review
Abstract—
An algorithm is any well-defined procedure, or set of instructions, that takes some values as input,
processes them, and produces some values as output. Sorting involves rearranging information into either ascending or
descending order. Sorting is considered a fundamental operation in computer science, as it is used as an
intermediate step in many other operations. The goal of this paper is to review various sorting algorithms
and compare their performance factors.
Keywords—bubble sort, heap sort, insertion sort, merge sort, quick sort.
I. INTRODUCTION
Algorithms play a vital and key role in solving computational problems. Informally, an algorithm is a well-defined
computational procedure that takes input and produces output; it is a tool, or a sequence of steps, for solving a
computational problem. Algorithms predate computers, having been in use long before computers existed.
There are various methodologies and techniques on which different kinds of algorithms are
designed. Among these problem-solving algorithms, this paper considers the sorting algorithms. In sorting, it is
required to arrange a sequence of numbers into a given order, generally non-decreasing. In computer science, an
algorithm that puts the elements of a list into a certain order is known as a sorting algorithm. The orders most
commonly used are numerical order and lexicographical order. Sorted lists are required for other algorithms (such as
search and merge algorithms) to work correctly; sorting is also often useful for conforming to well-established patterns
or rules of data and for producing output that is easy to read and understand. The output must satisfy two conditions:
1. The output is in non-decreasing order (each element is no smaller than the previous element according to the desired
total order)
2. The output is a permutation or reordering of the input.
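These two conditions can be checked mechanically. The following Python sketch is illustrative (the helper name `is_valid_sort` is our own, not from the paper); it verifies both the non-decreasing-order condition and the permutation condition for a candidate output:

```python
from collections import Counter

def is_valid_sort(output, original):
    """Check the two conditions a sorting algorithm's output must satisfy."""
    # Condition 1: non-decreasing order (each element no smaller than the previous)
    non_decreasing = all(output[i] <= output[i + 1] for i in range(len(output) - 1))
    # Condition 2: the output is a permutation (reordering) of the input
    permutation = Counter(output) == Counter(original)
    return non_decreasing and permutation

print(is_valid_sort([1, 2, 2, 3], [3, 2, 1, 2]))  # True
print(is_valid_sort([1, 2, 3], [3, 3, 1]))        # False: not a permutation
```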
Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity
of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956 [1].
Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library
sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the
abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as
big O notation, divide and conquer algorithms, data structures, randomized algorithms, best, worst and average case
analysis, time-space tradeoffs, and lower bounds.
B. Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small and mostly-sorted lists, and is often
used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in
their correct position into a new sorted list. In arrays, the new list and the remaining elements can share the array's space,
but insertion is expensive, requiring all following elements to be shifted over by one. Shell sort is a variant of insertion
sort that is more efficient for larger lists.
Pros:
1) Auxiliary space used is O(1).
Cons:
1) General complexity is O(n²). Best case is O(n) when the list is already sorted.
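The in-array shifting described above can be sketched as follows (an illustrative Python version, not taken from the paper):

```python
def insertion_sort(a):
    """Sort list a in place: O(n^2) in general, O(n) when already sorted."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift all elements larger than key one position to the right
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```

On an already-sorted list the inner while loop never runs, which is where the O(n) best case comes from.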
C. Heap sort
Heap sort is a much more efficient version of selection sort. It also works by determining the largest (or smallest)
element of the list, placing that at the end (or beginning) of the list, then continuing with the rest of the list, but
accomplishes this task efficiently by using a data structure called a heap, a special type of binary tree. Once the data list
has been made into a heap, the root node is guaranteed to be the largest (or smallest) element. When it is removed and
placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root.
Using the heap, finding the next largest element takes O (log n) time, instead of O (n) for a linear scan as in simple
selection sort. This allows Heap sort to run in O (n log n) time, and this is also the worst case complexity.
Pros:
1) Time complexity of the algorithm is O(n log n).
2) Auxiliary space required by the algorithm is O(1).
3) Being in-place and non-recursive makes it a good choice for large data sets.
Cons:
1) Runs slower in practice than divide-and-conquer sorts (such as merge sort and quick sort) that share the same
O(n log n) time complexity, due to cache behavior and other factors.
2) Does not work on linked lists, since a linked list cannot be converted into a heap structure.
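The heap construction and repeated root extraction described above can be sketched as follows (an illustrative in-place Python version with O(1) auxiliary space; the function and helper names are our own):

```python
def heap_sort(a):
    """Sort list a in place in O(n log n) time using a binary max-heap."""
    n = len(a)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices up to `end`.
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1            # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    # Build the max-heap: after this, a[0] is the largest element
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n - 1)
    # Repeatedly move the root to the end and re-heapify the remainder
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return a
```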
D. Quick Sort
Quick Sort is a divide and conquer algorithm which relies on a partition operation: to partition an array an element
called a pivot is selected. All elements smaller than the pivots are moved before it and all greater elements are moved
after it. This can be done efficiently in linear time and in-place. The lesser and greater sub-lists are then recursively
sorted. Efficient implementations of quick sort (with in-place partitioning) are typically unstable sorts and somewhat
complex, but are among the fastest sorting algorithms in practice. Together with its modest O (log n) space usage, quick
sort is one of the most popular sorting algorithms and is available in many standard programming libraries. The most
complex issue in quick sort is choosing a good pivot element; consistently poor choices of pivots can result in drastically
slower O (n²) performance, if at each step the median is chosen as the pivot then the algorithm works in O (n log n).
Finding the median however, is an O (n) operation on unsorted lists and therefore exacts its own penalty with sorting.
Pros:
1) One advantage of parallel quick sort over other parallel sort algorithms is that no synchronization is required. A
new thread is started as soon as a sub list is available for it to work on and it does not communicate with other
threads. When all threads complete, the sort is done.
2) All comparisons are done with a single pivot value, which can be stored in a register.
3) The list is traversed sequentially, which produces very good locality of reference and cache behavior for
arrays.
(Sonal Beniwal, International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume-2, Issue-5, May 2013)
Cons:
1) Auxiliary space used in the average case for the recursive function calls is O(log n), which proves somewhat
costly, especially for large data sets.
2) Its worst case has a time complexity of O(n²), which can be very costly for large data sets.
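The partition operation described above can be sketched as follows (an illustrative in-place Python version using the Lomuto partition scheme with the last element as pivot; such a fixed pivot choice is exactly the kind that can trigger the O(n²) worst case on adversarial inputs):

```python
def quick_sort(a, lo=0, hi=None):
    """Sort list a in place by recursive in-place partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        # Lomuto partition: move elements smaller than the pivot to the front
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]   # place the pivot in its final position
        # Recursively sort the lesser and greater sub-lists
        quick_sort(a, lo, i - 1)
        quick_sort(a, i + 1, hi)
    return a
```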
E. Merge Sort
Merge sort takes advantage of the ease of merging already-sorted lists into a new sorted list. It starts by comparing
every two elements (i.e., 1 with 2, then 3 with 4, ...) and swapping them if the first should come after the second. It then
merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on, until at last two lists
are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists,
because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical
implementations, being used for the standard sort routine in the programming languages Perl [4] and Python (as
timsort [5]), and in Java (which also uses timsort as of JDK 7 [6]), among others. Merge sort has been used in Java at
least since 2000, in JDK 1.3 [7][8].
Pros:
1) Marginally faster than heap sort for larger sets.
2) Merge sort is often the best choice for sorting a linked list, because the slow random-access performance of a
linked list makes some other algorithms (such as quick sort) perform poorly and others (such as heap sort)
completely impossible.
Cons:
1) At least twice the memory requirements of the other sorts, because it is recursive. This is the biggest cause for
concern, as its space complexity is high: it requires about Θ(n) auxiliary space.
2) The number of function calls (2n−1) is much larger than for quick sort (n), which causes it to take marginally
more time to sort the input data.
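The repeated merging of sorted sub-lists can be sketched as follows (an illustrative top-down Python version; the Θ(n) auxiliary space noted in the cons shows up in the `merged` list built by each merge step):

```python
def merge_sort(a):
    """Return a new sorted list; worst-case O(n log n) time, Θ(n) extra space."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge the two already-sorted halves into one sorted list
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```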
F. Counting Sort
Counting sort is applicable when each input is known to belong to a particular set, S, of possibilities. The algorithm
runs in O (|S| + n) time and O(|S|) memory where n is the length of the input. It works by creating an integer array of size
|S| and using the ith bin to count the occurrences of the ith member of S in the input. Each input is then counted by
incrementing the value of its corresponding bin. Afterward, the counting array is looped through to arrange all of the
inputs in order. This sorting algorithm cannot often be used because S needs to be reasonably small for it to be efficient,
but the algorithm is extremely fast and demonstrates great asymptotic behavior as n increases. It also can be modified to
provide stable behavior.
Pros:
1) The algorithm has a time complexity of O(n + m), where n is the number of data items and m is the range of
the data, which implies that it is most efficient when m << n. In that case the time complexity turns out to be
linear.
2) This sort works optimally when the data are uniformly distributed.
Cons:
1) If the range m >> n, the complexity is no longer linear in n and the sort stops being useful. This is because the
gaps that are introduced, i.e. counters for elements that do not occur in the list, cause a higher space
complexity.
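The counting procedure above can be sketched as follows (an illustrative Python version, assuming the inputs are non-negative integers smaller than a known bound m, i.e. S = {0, 1, ..., m−1}):

```python
def counting_sort(a, m):
    """Sort non-negative integers < m in O(n + m) time and O(m) extra space."""
    counts = [0] * m              # one bin per possible value
    for x in a:
        counts[x] += 1            # count occurrences of each value
    out = []
    # Walk the bins in order, emitting each value as many times as it occurred
    for value, count in enumerate(counts):
        out.extend([value] * count)
    return out
```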
Table 1: Comparison of various sorting algorithms

Sort       Order (Average)   Worst Case   Memory            Stable   Data Type   Complexity
Bubble     n²                n²           nk+np             Yes      All         Very Low
Insertion  n²                n²           nk+np             Yes      All         Very Low
Quick      n log n           n²           nk+np+stack       No       All         High
Merge      n log n           n log n      nk+np+stack       Yes      All         Medium
Heap       n log n           n log n      nk+np             No       All         Very Low
Counting   n                 n            nk+np             Yes      All         Very Low
IV. CONCLUSIONS
Every sorting algorithm has some advantages and disadvantages. In the table above we have tried to show the strengths
and weaknesses of some sorting algorithms according to their order, memory used, stability, data type, and complexity.
In determining a good sorting algorithm, speed is the top consideration, but other factors include the handling of various
data types, consistency of performance, length and complexity of code, and the property of stability. This paper
discusses mainly comparison-based sorting algorithms. In the future, we propose to design a more efficient algorithm.
ACKNOWLEDGMENT
We would like to express our sincere gratitude to our guide, Ms. Sonal, who encouraged and guided us throughout this
paper. Apart from her support as a professor and a researcher, we were fortunate to have her support as a friend.
REFERENCES
[1] H. Demuth, "Electronic Data Sorting", PhD thesis, Stanford University, 1956.
[2] G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith, and M. Zagha, "A comparison of
sorting algorithms for the Connection Machine CM-2", in Proc. Symposium on Parallel Algorithms and
Architectures (Hilton Head, SC, July 1991).
[3] K. Thearling and S. Smith, "An improved supercomputing sorting benchmark", in Supercomputing '92 (1992),
pp. 14-19, IEEE Press.
[4] Perl sort documentation (http://perldoc.perl.org/functions/sort.html)
[5] Tim Peters's original description of timsort (http://svn.python.org/projects/python/trunk/Objects/listsort.txt)
[6] http://hg.openjdk.java.net/jdk7/tls/jdk/rev/bfd7abda8f79
[7] Merge sort in Java 1.3 (http://java.sun.com/j2se/1.3/docs/api/java/util/Arrays.html#sort(java.lang.Object[])),
Sun.
[8] Java 1.3 live since 2000.
[9] V. K. Bansal, R. Srivastava, and Pooja, "Indexed Array Algorithm for Sorting", International Conference on
Advances in Computing, Control and Telecommunication Technologies, 2009.
[10] http://ca.answers.yahoo.com/question/index?qid=20071219060751AAjLx4O