23 Quick Sort
Algorithms, FOURTH EDITION
ROBERT SEDGEWICK | KEVIN WAYNE
http://algs4.cs.princeton.edu
2.3 QUICKSORT
• quicksort
• selection
• duplicate keys
• system sorts
Two classic sorting algorithms
Critical components in the world's computational infrastructure.
Perl, C++ stable sort, Python stable sort, Firefox JavaScript, ...
Quicksort.
Home PC executes $10^8$ compares/second.
Supercomputer executes $10^{12}$ compares/second.
Lesson 1. Good algorithms are better than supercomputers.
Lesson 2. Great algorithms are better than good ones.
            insertion sort ($N^2$)              mergesort ($N \log N$)           quicksort ($N \log N$)
computer    thousand  million    billion     thousand  million   billion    thousand  million  billion
home        instant   2.8 hours  317 years   instant   1 second  18 min     instant   0.6 sec  12 min
super       instant   1 second   1 week      instant   instant   instant    instant   instant  instant
Quicksort: best-case analysis
Best case. Number of compares is ~ $N \lg N$.
[Figure: quicksort trace; row labels: initial values, random shuffle]
Quicksort: worst-case analysis
Worst case. Number of compares is ~ $\tfrac{1}{2} N^2$.
[Figure: quicksort trace; row labels: initial values, random shuffle]
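The gap between the two cases can be checked empirically. The following is a minimal sketch, not the slides' own code: it assumes a basic two-way partitioning quicksort on `int` keys, and the class name `QuickCompares` and the helper `countCompares` are hypothetical names introduced for illustration. It counts key compares on an already-sorted array, once without shuffling (the worst case for this partitioning scheme) and once after a random shuffle.

```java
import java.util.Random;

class QuickCompares {
    private static long compares;  // key compares made by the current sort

    private static boolean less(int x, int y) { compares++; return x < y; }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Partition a[lo..hi] around v = a[lo]; return the final index of v.
    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;  // scan right for an entry >= v
            while (less(v, a[--j])) if (j == lo) break;  // scan left for an entry <= v
            if (i >= j) break;                           // pointers crossed
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Sort a copy of the input and return the number of compares used.
    static long countCompares(int[] input, boolean shuffle, long seed) {
        int[] a = input.clone();
        if (shuffle) {
            Random rnd = new Random(seed);
            for (int i = a.length - 1; i > 0; i--) swap(a, i, rnd.nextInt(i + 1));
        }
        compares = 0;
        sort(a, 0, a.length - 1);
        return compares;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        System.out.println("no shuffle:   " + countCompares(sorted, false, 0L));
        System.out.println("with shuffle: " + countCompares(sorted, true, 42L));
        System.out.println("N^2 / 2:      " + (long) n * n / 2);
        System.out.println("2 N ln N:     " + Math.round(2.0 * n * Math.log(n)));
    }
}
```

Without the shuffle, each partition of sorted input peels off a single key, so the compare count grows quadratically; with the shuffle it should land near the $N \log N$ range.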
Proposition. The average number of compares $C_N$ to quicksort an array of
$N$ distinct keys is ~ $2N \ln N$ (and the number of exchanges is ~ $\tfrac{1}{3} N \ln N$).
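Pf. $C_N$ satisfies the recurrence $C_0 = C_1 = 0$ and for $N \ge 2$:

$$C_N = (N+1) + \frac{C_0 + C_{N-1}}{N} + \frac{C_1 + C_{N-2}}{N} + \cdots + \frac{C_{N-1} + C_0}{N}$$

Here $N+1$ is the partitioning cost, $1/N$ is the partitioning probability, and each numerator adds the costs of the left and right subarrays.

Multiply both sides by $N$ and collect terms:

$$N C_N = N(N+1) + 2\,(C_0 + C_1 + \cdots + C_{N-1})$$

Subtract the same equation for $N-1$:

$$N C_N - (N-1)\,C_{N-1} = 2N + 2\,C_{N-1}$$

Rearrange terms and divide by $N(N+1)$:

$$\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}$$

Apply this equation repeatedly, substituting the previous equation each time:

$$\begin{aligned}
\frac{C_N}{N+1} &= \frac{C_{N-1}}{N} + \frac{2}{N+1} \\
&= \frac{C_{N-2}}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\
&= \frac{C_{N-3}}{N-2} + \frac{2}{N-1} + \frac{2}{N} + \frac{2}{N+1} \\
&= \frac{2}{3} + \frac{2}{4} + \frac{2}{5} + \cdots + \frac{2}{N+1}
\end{aligned}$$

Approximate the sum by an integral:

$$C_N = 2(N+1)\left(\frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots + \frac{1}{N+1}\right) \sim 2(N+1)\int_{3}^{N+1} \frac{1}{x}\,dx$$

The integral is $\ln(N+1) - \ln 3$, so $C_N \sim 2N \ln N$.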
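As a numerical sanity check on the proposition, the recurrence can be evaluated directly and compared with $2N \ln N$. This is a minimal sketch under the stated recurrence ($C_0 = C_1 = 0$, and $C_N = (N+1) + \frac{2}{N}(C_0 + \cdots + C_{N-1})$ for $N \ge 2$); the class and method names are hypothetical.

```java
class QuicksortRecurrence {
    // Evaluate C_n from the recurrence, keeping a running sum of C_0 + ... + C_{k-1}.
    static double averageCompares(int n) {
        double sum = 0.0;  // C_0 + C_1 + ... + C_{k-1}
        double c = 0.0;    // C_k for the most recent k (C_0 = C_1 = 0)
        for (int k = 2; k <= n; k++) {
            c = (k + 1) + 2.0 * sum / k;
            sum += c;
        }
        return c;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 1000000; n *= 10) {
            double exact = averageCompares(n);
            double approx = 2.0 * n * Math.log(n);
            System.out.printf("N = %8d  C_N = %14.1f  2N ln N = %14.1f  ratio = %.4f%n",
                              n, exact, approx, exact / approx);
        }
    }
}
```

The ratio should creep toward 1 as N grows; it converges slowly because the terms dropped by the ~ notation are linear in N.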
Quicksort: summary of performance characteristics
Worst case. Number of compares is quadratic.
$N + (N - 1) + (N - 2) + \cdots + 1 \sim \tfrac{1}{2} N^2$.
Order statistics.
N upper bound?
is selection as hard as sorting?
is there a linear-time algorithm for each k?
Partition array so that:
Until one is discovered, use quick-select if you don't need a full sort.
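Quick-select can be sketched as repeated partitioning that narrows in on index k. The code below is an illustrative sketch rather than a definitive implementation: it assumes `int` keys, a zero-based rank k, and a standard two-way partition; the class name `QuickSelect` is hypothetical.

```java
import java.util.Random;

class QuickSelect {
    // Return the k-th smallest entry (k = 0 is the minimum). Rearranges a[].
    static int select(int[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        shuffle(a);                          // guards against worst-case inputs
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);    // a[j] is now in its final position
            if      (j < k) lo = j + 1;      // k-th smallest lies to the right
            else if (j > k) hi = j - 1;      // k-th smallest lies to the left
            else            return a[k];
        }
        return a[k];
    }

    // Partition a[lo..hi] around v = a[lo]; return the final index of v.
    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    private static void shuffle(int[] a) {
        Random rnd = new Random();
        for (int i = a.length - 1; i > 0; i--) swap(a, i, rnd.nextInt(i + 1));
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] a = { 7, 2, 9, 4, 1, 8 };
        System.out.println(select(a, 2));    // third smallest entry
    }
}
```

Only one side of each partition is examined further, which is why the expected running time is linear rather than $N \log N$.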
Theoretical context for selection
Time Bounds for Selection
by
Manuel Blum, Robert W. Floyd, Vaughan Pratt,
Ronald L. Rivest, and Robert E. Tarjan
Abstract
The number of comparisons required to select the i-th smallest of
n numbers is shown to be at most a linear function of n by analysis of
a new selection algorithm -- PICK. Specifically, no more than
5.4305 n comparisons are ever required. This bound is improved for
extreme values of i, and a new lower bound on the requisite number
of comparisons is also proved.
This work was supported by the National Science Foundation under grants
GJ-992 and GJ-33170X.
Duplicate keys
Often, purpose of sort is to bring items with equal keys together.
Huge array.
$$\lg\left(\frac{N!}{x_1!\,x_2!\cdots x_n!}\right) \sim -\sum_{i=1}^{n} x_i \lg\frac{x_i}{N}$$
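To get a feel for how duplicates change the cost of sorting, the entropy-style quantity $-\sum_{i=1}^{n} x_i \lg (x_i / N)$ can be evaluated for a few key distributions. This is a small sketch that assumes $x_i$ is the number of occurrences of the i-th distinct key and $N = \sum_i x_i$; the class and method names are hypothetical.

```java
class DuplicateKeyBound {
    // Evaluate -sum_i x_i * lg(x_i / N), where counts[i] = x_i and N = sum of counts.
    static double entropyBound(long[] counts) {
        long n = 0;
        for (long x : counts) n += x;
        double bound = 0.0;
        for (long x : counts) {
            if (x > 0) bound -= x * (Math.log((double) x / n) / Math.log(2.0));
        }
        return bound;
    }

    public static void main(String[] args) {
        long[] allDistinct = new long[1024];
        java.util.Arrays.fill(allDistinct, 1L);           // 1024 distinct keys
        long[] twoValues = { 512, 512 };                  // e.g. random 0s and 1s
        long[] oneValue  = { 1024 };                      // all keys equal
        System.out.println(entropyBound(allDistinct));    // equals N lg N when all keys differ
        System.out.println(entropyBound(twoValues));      // linear in N for a constant number of keys
        System.out.println(entropyBound(oneValue));
    }
}
```

With all keys distinct the expression reduces to $N \lg N$, and with a constant number of distinct keys it is linear in N.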
Sorting algorithms are essential in a broad variety of applications:
Data compression.
Computer graphics.
Computational biology.
Java system sort (Arrays.sort()). Uses tuned quicksort for primitive types; tuned mergesort for objects.
Q. Why use different algorithms for primitive and reference types?
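import java.util.Arrays;
import edu.princeton.cs.algs4.StdIn;   // StdIn and StdOut come from the algs4 library
import edu.princeton.cs.algs4.StdOut;

public class StringSort
{
    public static void main(String[] args)
    {
        String[] a = StdIn.readAllStrings();   // read all strings from standard input
        Arrays.sort(a);                        // system sort
        for (int i = 0; i < a.length; i++)
            StdOut.println(a[i]);
    }
}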
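One commonly cited reason for the split is stability: equal primitive values are indistinguishable, so an unstable sort is harmless there, whereas objects with equal keys may differ in other fields. The sketch below illustrates that point only; it is not the slides' answer to the question. It relies on the documented guarantee that `Arrays.sort` on object arrays is stable, and the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class SystemSortDemo {
    // Primitive sort: equal ints are interchangeable, so stability is moot.
    static int[] sortPrimitives(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Object sort by string length only: strings of equal length keep their input order.
    static String[] sortByLength(String[] a) {
        String[] copy = a.clone();
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        int[] nums = { 3, 1, 2, 1, 3 };
        String[] words = { "pear", "fig", "kiwi", "plum", "yam" };
        System.out.println(Arrays.toString(sortPrimitives(nums)));
        System.out.println(Arrays.toString(sortByLength(words)));
    }
}
```

In the second call, the three-letter words stay in their original relative order, as do the four-letter words.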
War story (C qsort function)
AT&T Bell Labs (1991). Allan Wilks and Rick Becker discovered that a
qsort() call that should have taken seconds was taking minutes.
At the time, almost all qsort() implementations were based on those in:
BSD Unix (1983): quadratic time to sort random arrays of 0s and 1s.
Why is qsort() so slow?
Basic algorithm = quicksort.
Partitioning item.
small arrays: middle entry
medium arrays: median of 3
large arrays: Tukey's ninther [next slide]
Now widely used. C, C++, Java 6, ...
Engineering a system sort
SOFTWARE: PRACTICE AND EXPERIENCE, VOL. 23(11), 1249-1265 (NOVEMBER 1993)
Engineering a Sort Function
JON L. BENTLEY
M. DOUGLAS McILROY
AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A.
SUMMARY
We recount the history of a new qsort function for a C library. Our function is clearer, faster and more
robust than existing sorts. It chooses partitioning elements by a new sampling scheme; it partitions by a
novel solution to Dijkstra's Dutch National Flag problem; and it swaps efficiently. Its behavior was
assessed with timing and debugging testbeds, and with a program to certify performance. The design
techniques apply in domains beyond sorting.
KEY WORDS Quicksort Sorting algorithms Performance tuning Algorithm design and implementation Testing
INTRODUCTION
C libraries have long included a qsort function to sort an array, usually implemented by
Hoare's Quicksort.[1] Because existing qsorts are flawed, we built a new one. This paper
summarizes its evolution.
Compared to existing library sorts, our new qsort is faster (typically about twice as
fast), clearer, and more robust under nonrandom inputs. It uses some standard Quicksort
tricks, abandons others, and introduces some new tricks of its own. Our approach to
building a qsort is relevant to engineering other algorithms.
The qsort on our home system, based on Scowen's Quickersort,[2] had served faithfully
since Lee McMahon wrote it almost two decades ago. Shipped with the landmark
Seventh Edition Unix System,[3] it became a model for other qsorts. Yet in the summer of
1991 our colleagues Allan Wilks and Rick Becker found that a qsort run that should have
taken a few minutes was chewing up hours of CPU time. Had they not interrupted it, it
would have gone on for weeks.[4] They found that it took $n^2$ comparisons to sort an
organ-pipe array of 2n integers: 1 2 3 .. n n .. 3 2 1.
Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983
would consume quadratic time on arrays that contain a few elements repeated many
times, in particular arrays of random zeros and ones.[5] In fact, among a dozen different
Unix libraries we found no qsort that could not easily be driven to quadratic behavior; all
were derived from the Seventh Edition or from the 1983 Berkeley function. The Seventh
0038-0644/93/111249-17$13.50    Received 21 August 1992
© 1993 by John Wiley & Sons, Ltd.    Revised 10 May 1993
Tukey's ninther
Tukey's ninther. Median of the median of 3 samples, each of 3 entries.
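The definition above can be sketched as follows. This is an illustrative sketch, not a definitive implementation: it assumes `int` keys and takes the three samples of three entries at evenly spaced positions in `a[lo..hi]`, one plausible choice of sample positions; the class name `Ninther` and its methods are hypothetical.

```java
class Ninther {
    // Return the index (i, j, or k) holding the median of a[i], a[j], a[k].
    static int median3(int[] a, int i, int j, int k) {
        return (a[i] < a[j])
             ? ((a[j] < a[k]) ? j : (a[i] < a[k]) ? k : i)
             : ((a[k] < a[j]) ? j : (a[k] < a[i]) ? k : i);
    }

    // Return the index of Tukey's ninther of a[lo..hi]:
    // the median of the medians of three samples of three entries each.
    static int ninther(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        int eps = n / 8;                 // spacing between sampled entries
        int mid = lo + n / 2;
        int m1 = median3(a, lo, lo + eps, lo + 2 * eps);
        int m2 = median3(a, mid - eps, mid, mid + eps);
        int m3 = median3(a, hi - 2 * eps, hi - eps, hi);
        return median3(a, m1, m2, m3);
    }

    public static void main(String[] args) {
        int[] a = { 5, 1, 9, 2, 8, 3, 7, 4, 6 };
        int p = ninther(a, 0, a.length - 1);
        System.out.println("partitioning item: a[" + p + "] = " + a[p]);
    }
}
```

The ninther examines only nine entries, so it is a cheap way to approximate the median as a partitioning item on large arrays.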
GPUsort.
System sort: Which algorithm to use?
Applications have diverse attributes.
Stable?
Parallel?
Deterministic?