Lecture 4
Sorting – Part B
(Chapter 7)
Sorting
• Insertion sort
– Design approach: incremental
– Sorts in place: Yes
– Best case: Θ(n)
– Worst case: Θ(n²)
• Bubble Sort
– Design approach: incremental
– Sorts in place: Yes
– Running time: Θ(n²)
• Selection sort
– Design approach: incremental
– Sorts in place: Yes
– Running time: Θ(n²)
• Merge Sort
– Design approach: divide and conquer
– Sorts in place: No
– Running time: Let’s see!!
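Divide-and-Conquer
• Divide the problem into a number of sub-problems
– Similar sub-problems of smaller size
• Conquer the sub-problems
– Solve the sub-problems recursively
• Combine the sub-problem solutions into a solution for the original problem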
Merge Sort Approach
• To sort an array A[p . . r]:
• Divide
– Divide the n-element sequence to be sorted into two
subsequences of n/2 elements each
• Conquer
– Sort the subsequences recursively using merge sort
– When the size of the sequences is 1 there is nothing
more to do
• Combine
– Merge the two sorted subsequences
Merge Sort
Alg.: MERGE-SORT(A, p, r)
  if p < r                             ▷ Check for base case
    then q ← ⌊(p + r)/2⌋               ▷ Divide
         MERGE-SORT(A, p, q)           ▷ Conquer
         MERGE-SORT(A, q + 1, r)       ▷ Conquer
         MERGE(A, p, q, r)             ▷ Combine
Example: A[1 . . 8] = 5 2 4 7 1 3 2 6, with p = 1, q = 4, r = 8
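A minimal Python sketch of the MERGE-SORT structure above, assuming 0-based inclusive indices and a plain two-pointer merge (the sentinel version of MERGE appears later in the lecture). The names `merge_sort` and `merge` are illustrative.

```python
def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive) back into a."""
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    k = p
    # Repeatedly take the smaller front element while both runs are non-empty.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
        k += 1
    # One run is exhausted; copy the remainder of the other one.
    a[k:r + 1] = left[i:] + right[j:]


def merge_sort(a, p, r):
    """Sort a[p..r] (inclusive bounds) in place."""
    if p < r:                    # check for base case
        q = (p + r) // 2         # divide: floor of the midpoint
        merge_sort(a, p, q)      # conquer the left half
        merge_sort(a, q + 1, r)  # conquer the right half
        merge(a, p, q, r)        # combine


data = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 2, 3, 4, 5, 6, 7]
```

Taking from the left run on ties (`<=`) keeps equal keys in their original order, so this version is stable.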
Example – n Power of 2
Divide (A[1 . . 8], q = 4):
5 2 4 7 1 3 2 6
5 2 4 7 | 1 3 2 6
5 2 | 4 7 | 1 3 | 2 6
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Example – n Power of 2
Conquer and Merge (read bottom-up):
1 2 2 3 4 5 6 7
2 4 5 7 | 1 2 3 6
2 5 | 4 7 | 1 3 | 2 6
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Example – n Not a Power of 2
Divide (A[1 . . 11], q = 6; then q = 3 and q = 9):
4 7 2 6 1 4 7 3 5 2 6
4 7 2 6 1 4 | 7 3 5 2 6
4 7 2 | 6 1 4 | 7 3 5 | 2 6
4 7 | 2 | 6 1 | 4 | 7 3 | 5 | 2 | 6
4 | 7 | 6 | 1 | 7 | 3   (A[1], A[2], A[4], A[5], A[7], A[8])
Example – n Not a Power of 2
Conquer and Merge (read bottom-up):
1 2 2 3 4 4 5 6 6 7 7
1 2 4 4 6 7 | 2 3 5 6 7
2 4 7 | 1 4 6 | 3 5 7 | 2 6
4 7 | 2 | 1 6 | 4 | 3 7 | 5 | 2 | 6
4 | 7 | 6 | 1 | 7 | 3   (A[1], A[2], A[4], A[5], A[7], A[8])
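The uneven pieces come from rounding q = ⌊(p + r)/2⌋ down. The sketch below uses a hypothetical helper, `split_points`, that records every (p, q, r) split MERGE-SORT would make on the 1-based range 1..11, in the order the recursive calls happen; it does not sort anything.

```python
def split_points(p, r, out):
    """Record (p, q, r) for every split MERGE-SORT would make on A[p..r]."""
    if p < r:
        q = (p + r) // 2            # same midpoint rule as MERGE-SORT
        out.append((p, q, r))
        split_points(p, q, out)     # left half first, as in the recursion
        split_points(q + 1, r, out)
    return out


splits = split_points(1, 11, [])
for p, q, r in splits:
    print(f"A[{p}..{r}] -> A[{p}..{q}] + A[{q + 1}..{r}]")
```

The first lines printed are `A[1..11] -> A[1..6] + A[7..11]`, `A[1..6] -> A[1..3] + A[4..6]` and `A[1..3] -> A[1..2] + A[3..3]`. The two-element ranges A[1..2], A[4..5] and A[7..8] are split at the deepest level, which is why indices 1, 2, 4, 5, 7, 8 appear in the bottom row. There are n − 1 = 10 splits in total.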
Merging
A[1 . . 8] = 2 4 5 7 1 2 3 6, with p = 1, q = 4, r = 8
• Idea for merging:
– Two piles of sorted cards: A1 = A[p . . q] and A2 = A[q+1 . . r]
• Choose the smaller of the two top cards
• Remove it and place it in the output pile
Example: MERGE(A, 9, 12, 16)
[Figures: step-by-step trace merging A[9 . . 12] and A[13 . . 16] (p = 9, q = 12, r = 16), ending with "Done!"]
Merge - Pseudocode
Alg.: MERGE(A, p, q, r)
  Compute n1 ← q − p + 1 and n2 ← r − q
  Copy the first n1 elements into L[1 . . n1 + 1] and the next n2 elements into R[1 . . n2 + 1]
  L[n1 + 1] ← ∞; R[n2 + 1] ← ∞
  i ← 1; j ← 1
  for k ← p to r
    do if L[ i ] ≤ R[ j ]
         then A[k] ← L[ i ]
              i ← i + 1
         else A[k] ← R[ j ]
              j ← j + 1
Example: A[p . . r] = 2 4 5 7 1 2 3 6 (p = 1, q = 4, r = 8), so L = 2 4 5 7 ∞ and R = 1 2 3 6 ∞
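A Python transcription of the MERGE pseudocode, assuming 0-based inclusive indices and `math.inf` as the ∞ sentinel, so it only suits numeric keys. Because an exhausted pile always shows ∞ on top, the loop needs no separate "is one pile empty?" test and runs exactly n1 + n2 times.

```python
import math


def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] (inclusive) using infinity sentinels."""
    n1 = q - p + 1
    n2 = r - q
    left = a[p:p + n1] + [math.inf]           # L[1..n1+1], with L[n1+1] = inf
    right = a[q + 1:q + 1 + n2] + [math.inf]  # R[1..n2+1], with R[n2+1] = inf
    i = j = 0
    for k in range(p, r + 1):                 # k = p .. r
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


a = [2, 4, 5, 7, 1, 2, 3, 6]
merge(a, 0, 3, 7)
print(a)  # [1, 2, 2, 3, 4, 5, 6, 7]
```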
Running Time of Merge
• Initialization (copying into temporary arrays):
– Θ(n1 + n2) = Θ(n)
• Adding the elements to the final array (the last for loop):
– n iterations, each taking constant time: Θ(n)
• Total time for Merge:
– Θ(n)
Analyzing Divide-and-Conquer Algorithms
• The recurrence is based on the three steps of the paradigm:
– T(n): running time on a problem of size n
– Divide the problem into a subproblems, each of size n/b: takes D(n)
– Conquer (solve) the subproblems: aT(n/b)
– Combine the solutions: C(n)
$$T(n) = \begin{cases} \Theta(1) & \text{if } n \le c \\ aT(n/b) + D(n) + C(n) & \text{otherwise} \end{cases}$$
where c is a small constant (the base-case size).
MERGE-SORT Running Time
• Divide:
– compute q as the average of p and r: D(n) = Θ(1)
• Conquer:
– recursively solve 2 subproblems, each of size n/2: 2T(n/2)
• Combine:
– MERGE on an n-element subarray takes Θ(n) time: C(n) = Θ(n)
$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}$$
Solve the Recurrence
$$T(n) = \begin{cases} c & \text{if } n = 1 \\ 2T(n/2) + cn & \text{if } n > 1 \end{cases}$$
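A worked solution by repeated substitution, assuming n is an exact power of 2 so that every n/2^i is an integer. Each substitution adds another cn, and after i = lg n substitutions only T(1) = c terms remain.

```latex
\begin{aligned}
T(n) &= 2T(n/2) + cn \\
     &= 2\bigl(2T(n/4) + cn/2\bigr) + cn = 4T(n/4) + 2cn \\
     &= 2^i\,T(n/2^i) + i\,cn \\
     &= n\,T(1) + cn\lg n \qquad (\text{take } i = \lg n) \\
     &= cn\lg n + cn = \Theta(n\lg n)
\end{aligned}
```

In recursion-tree terms: lg n levels of merging, each costing cn in total, plus n leaves costing c each.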
Merge Sort - Discussion
• Running time insensitive to the input
• Advantages:
– Guaranteed to run in Θ(n lg n)
• Disadvantage:
– Requires extra space proportional to N (the temporary arrays L and R)
Sorting Files with Huge Records and
Small Keys
• Insertion sort or bubble sort?
• Selection sort?
Sorting Challenge 2
Problem: Sort a huge randomly-ordered file of
small records
Application: Process transaction records for a
phone company
Sorting Huge, Randomly-Ordered Files
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, quadratic time for randomly-ordered keys
• Insertion sort?
– NO, quadratic time for randomly-ordered keys
• Mergesort?
– YES, it is designed for this problem
Sorting Challenge 3
Problem: sort a file that is already almost in
order
Applications:
– Re-sort a huge database after a few changes
– Double-check that someone else sorted a file
Which sorting method to use?
A. Mergesort, guaranteed to run in time N lg N
B. Selection sort
C. Bubble sort
D. A custom algorithm for almost in-order files
E. Insertion sort
Sorting Files That are Almost in Order
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, bad for some definitions of “almost in order”
– Ex: B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
• Insertion sort?
– YES, takes linear time for most definitions of “almost
in order”
• Mergesort or custom method?
– Probably not: insertion sort simpler and faster
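A small experiment on the slide's B C … Z A example, assuming the common bubble-sort variant that sweeps left to right and stops after a pass with no exchanges (other variants, such as sweeping right to left, behave differently on this input). Both illustrative functions count key comparisons.

```python
def insertion_sort(keys):
    """Return (sorted list, number of key comparisons)."""
    a = list(keys)
    comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1
            if a[i] <= key:       # found the insertion point
                break
            a[i + 1] = a[i]       # shift the larger element one slot right
            i -= 1
        a[i + 1] = key
    return a, comparisons


def bubble_sort(keys):
    """Left-to-right bubble sort with early exit; returns (sorted list, comparisons)."""
    a = list(keys)
    comparisons = 0
    for end in range(len(a) - 1, 0, -1):  # a[end+1:] is already in final position
        swapped = False
        for k in range(end):
            comparisons += 1
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swapped = True
        if not swapped:                   # a pass with no exchanges: sorted
            break
    return a, comparisons


almost = list("BCDEFGHIJKLMNOPQRSTUVWXYZA")
print(insertion_sort(almost)[1])  # 49
print(bubble_sort(almost)[1])     # 325
```

With n = 26 keys, insertion sort needs 2n − 3 = 49 comparisons because only the final A has to travel, while bubble sort moves A just one position per pass and needs n(n − 1)/2 = 325.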
Quicksort
A[p…q] ≤ A[q+1…r]
• Sort an array A[p…r]
• Divide
– Partition the array A into 2 subarrays A[p..q] and A[q+1..r], such
that each element of A[p..q] is smaller than or equal to each
element in A[q+1..r]
– Need to find index q to partition the array
• Conquer
– Recursively sort A[p..q] and A[q+1..r] using Quicksort
• Combine
– Trivial: the arrays are sorted in place
– No additional work is required to combine them
– The entire array is now sorted
QUICKSORT
Alg.: QUICKSORT(A, p, r)
  if p < r
    then q ← PARTITION(A, p, r)
         QUICKSORT(A, p, q)
         QUICKSORT(A, q + 1, r)
Recurrence:
T(n) = T(q) + T(n − q) + f(n), where f(n) is the running time of PARTITION()
Partitioning the Array
• Choosing PARTITION()
– There are different ways to do this
– Each has its own advantages/disadvantages
Example (pivot x = A[p] = 5, A[1 . . 8]):
5 3 2 6 4 1 3 7   i = 1, j = 7: exchange A[1] ↔ A[7]
3 3 2 6 4 1 5 7   i = 4, j = 6: exchange A[4] ↔ A[6]
3 3 2 1 4 6 5 7   j = 5, i = 6: i ≥ j, so q = j = 5
A[p…q] = 3 3 2 1 4 and A[q+1…r] = 6 5 7
Partitioning the Array
Alg.: PARTITION(A, p, r)
  x ← A[p]
  i ← p − 1
  j ← r + 1
  while TRUE
    do repeat j ← j − 1
         until A[j] ≤ x
       repeat i ← i + 1
         until A[i] ≥ x
       if i < j
         then exchange A[i] ↔ A[j]
         else return j
• Example: A[p . . r] = 5 3 2 6 4 1 3 7, so x = 5
• On return, j = q and A[p…q] ≤ A[q+1…r]
• Each element is visited once! Running time: Θ(n), where n = r − p + 1
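A Python transcription of PARTITION and QUICKSORT, assuming 0-based inclusive indices; each `repeat … until` loop becomes one step followed by a `while` that keeps stepping. The function names are illustrative.

```python
def partition(a, p, r):
    """Hoare-style PARTITION with pivot x = a[p]; returns q with a[p..q] <= a[q+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1                # repeat j <- j - 1 ...
        while a[j] > x:       # ... until a[j] <= x
            j -= 1
        i += 1                # repeat i <- i + 1 ...
        while a[i] < x:       # ... until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, p, r):
    """Sort a[p..r] (inclusive bounds) in place."""
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)
        quicksort(a, q + 1, r)


a = [5, 3, 2, 6, 4, 1, 3, 7]
q = partition(a, 0, len(a) - 1)
print(q, a)  # 4 [3, 3, 2, 1, 4, 6, 5, 7]
```

Here q = 4 is 0-based, i.e. q = 5 in the slide's numbering. The left recursive call covers a[p..q] rather than a[p..q−1] because this partitioning scheme does not place the pivot in its final position.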
Recurrence
Alg.: QUICKSORT(A, p, r)
  if p < r
    then q ← PARTITION(A, p, r)
         QUICKSORT(A, p, q)
         QUICKSORT(A, q + 1, r)
Recurrence:
T(n) = T(q) + T(n − q) + n
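Worst Case Partitioning
• Worst-case partitioning
– One region has one element and the other has n – 1 elements
– Maximally unbalanced
• Recurrence: q = 1
T(n) = T(1) + T(n − 1) + n,  T(1) = Θ(1)
Unrolling down to T(1), each of the n levels contributes one T(1) term:
$$T(n) = n\,T(1) + \sum_{k=2}^{n} k = n\,T(1) + \left(\sum_{k=1}^{n} k\right) - 1 = \Theta(n) + \Theta(n^2) = \Theta(n^2)$$
[Figure: recursion tree in which each node of size m splits into 1 and m − 1; the level costs are n, n − 1, n − 2, …, 2, for a total of Θ(n²)]
When does the worst case happen?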
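One input that triggers this worst case for the PARTITION given earlier (pivot x = A[p]) is an array that is already sorted: the pivot is always the minimum, so every call splits off a single element. The sketch re-declares the same Hoare-style partition (0-based) so it runs on its own and records the sizes of the two regions each call produces; `quicksort_splits` is an illustrative name.

```python
def partition(a, p, r):
    """Hoare-style PARTITION with pivot x = a[p]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort_splits(a, p, r, splits):
    """Quicksort a[p..r], recording (left size, right size) for every partition."""
    if p < r:
        q = partition(a, p, r)
        splits.append((q - p + 1, r - q))
        quicksort_splits(a, p, q, splits)
        quicksort_splits(a, q + 1, r, splits)
    return splits


n = 8
splits = quicksort_splits(list(range(1, n + 1)), 0, n - 1, [])
work = sum(left + right for left, right in splits)  # total size passed to PARTITION
print(splits)  # [(1, 7), (1, 6), (1, 5), (1, 4), (1, 3), (1, 2), (1, 1)]
print(work)    # 35
```

The partitioning work is 8 + 7 + … + 2 = 35 = (1 + 2 + … + 8) − 1. Because the recursion goes about n levels deep on sorted input, large sorted arrays would also hit Python's default recursion limit (1000).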
Best Case Partitioning
• Best-case partitioning
– Partitioning produces two regions of size n/2
• Recurrence: q = n/2
T(n) = 2T(n/2) + Θ(n)
T(n) = Θ(n lg n) (Master theorem)
Case Between Worst and Best
How does partition affect performance?
Performance of Quicksort
• Average case
– All permutations of the input numbers are equally likely
– On a random input array, we will have a mix of well-balanced and unbalanced splits
– Good and bad splits are randomly distributed throughout the tree
• Example splits (figure):
– Bad split then good split: n → 1 and n − 1, then n − 1 → (n − 1)/2 and (n − 1)/2; combined partitioning cost: n + (n − 1) = 2n − 1 = Θ(n)
– Nearly balanced split: n → (n − 1)/2 + 1 and (n − 1)/2; partitioning cost: n = Θ(n)