Sorting Arrays
1 Introduction
We begin this lecture by discussing how to compare running times of functions in an abstract, mathematical way. The same underlying mathematics can be used for other purposes, like comparing memory consumption or the amount of parallelism permitted by an algorithm. We then use this to take a first look at sorting algorithms, of which there are many. In this lecture we focus on selection sort because of its simplicity.
In terms of our learning goals, we will work on:
2 Big-O Notation
In the design and analysis of algorithms, we try to make such comparisons of running times mathematically precise by deriving so-called asymptotic complexity measures for algorithms. In addition to wanting mathematical precision, there are two fundamental principles that guide our mathematical analysis.
1. We want an analysis that focuses on the behavior of algorithms on large inputs. First, we observe that the problems we care about are ones that get harder as our inputs get bigger, so our definition of Big-O captures the idea that we only care about the behavior of an algorithm on large inputs, that is, when it takes a long time. It is when the inputs are large that differences between algorithms become really pronounced.

Second, there is another mathematical concept, Big-Θ, which you can read about on your own and which is frequently the concept that we actually want to talk about in this class. But computer scientists definitely tend to think and talk and communicate in terms of Big-O notation. We teach Big-O in part to help you communicate with other computer scientists!
2. We want an analysis that is enduring. One consequence of this is that we want our analysis to be the same even given computers that work very differently from the ones we use – in particular, ones that are much faster than the ones we use.
The only way to handle this is to say that we don’t care about constant
factors in the mathematical analysis of how long it takes our program
to run. In practice, constant factors can make a big difference, but
they are influenced by so many factors (compiler, runtime system,
machine model, available memory, etc.) that at the abstract, mathe-
matical level a precise analysis is neither appropriate nor feasible.
Let’s see how these two fundamental principles guide us in the comparison
between functions that measure the running time of an algorithm.
Let’s say we have functions f and g that measure the number of oper-
ations of an algorithm as a function of the size of the input. For example
f (n) = 3 ∗ n measures the number of comparisons performed in linear
search for an array of size n, and g(n) = 3 ∗ log(n) measures the number of
comparisons performed in binary search for an array of size n.
The simplest form of comparison would be
g ≤0 f if for every n ≥ 0, g(n) ≤ f (n).
However, this violates principle (1) because we compare the values of g and f on all possible inputs n.
We can refine this by saying that eventually, g will always be smaller or
equal to f . We express “eventually” by requiring that there be a number n0
such that g(n) ≤ f (n) for all n that are greater than n0 .
g ≤1 f if there is some n0 such that for every n ≥ n0 it is the case
that g(n) ≤ f (n).
This now incorporates the first principle (we only care about the func-
tion on large inputs), but constant factors still matter. For example, accord-
ing to the last definition we have 3 ∗ n ≤1 5 ∗ n but 5 ∗ n ≰1 3 ∗ n. But if
constant factors don’t matter, then the two should be equivalent. We can
repair this by allowing the right-hand side to be multiplied by an arbitrary
constant.
g ≤2 f if there is a constant c > 0 and some n0 such that for
every n ≥ n0 we have g(n) ≤ c ∗ f (n).
This definition is now appropriate.
The less-or-equal symbol ≤ is already overloaded with many meanings,
so we write instead:
g ∈ O(f ) if there is a constant c > 0 and some n0 such that for
every n ≥ n0 we have g(n) ≤ c ∗ f (n).
This notation derives from the view of O(f ) as a set of functions, namely
those that eventually are smaller than a constant times f.¹ Just to be ex-
plicit, we also write out the definition of O(f ) as a set of functions:
O(f ) = {g | there are c > 0 and n0 s.t. for all n ≥ n0 , g(n) ≤ c ∗ f (n)}
With this definition we can check that O(f (n)) = O(c ∗ f (n)).
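As a small worked example of the definition, consider g(n) = 3 ∗ n + 10. Then g ∈ O(n): we can pick c = 4 and n0 = 10, because for every n ≥ 10 we have 3 ∗ n + 10 ≤ 3 ∗ n + n = 4 ∗ n. The equation O(f (n)) = O(c ∗ f (n)) itself follows directly from the definition, since any constant that witnesses membership on one side can be multiplied or divided by c to witness it on the other.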
When we characterize the running time of a function using big-O nota-
tion we refer to it as the asymptotic complexity of the function. Here, asymp-
totic refers to the fundamental principles listed above: we only care about
the function in the long run, and we ignore constant factors. Usually, we
use an analysis of the worst case among the inputs of a given size. Trying
to do average case analysis is much harder, because it depends on the distri-
bution of inputs. Since we often don’t know the distribution of inputs it is
much less clear whether an average case analysis may apply in a particular
use of an algorithm.
The asymptotic worst-case time complexity of linear search is O(n),
which we also refer to as linear time. The worst-case asymptotic time com-
plexity of binary search is O(log(n)), which we also refer to as logarithmic
time. Constant time is usually described as O(1), expressing that the running
time is independent of the size of the input.
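To get a sense of the difference, for n = 1,000,000 a linear-time algorithm performs on the order of a million operations, while a logarithmic-time algorithm performs on the order of log2(1,000,000) ≈ 20 operations (ignoring constant factors in both cases).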
Some brief fundamental facts about big-O: for any polynomial, only the highest power of n matters, because it eventually comes to dominate the other terms.
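For example, 3 ∗ n² + 5 ∗ n + 7 ∈ O(n²): taking c = 15 and n0 = 1, we have 3 ∗ n² + 5 ∗ n + 7 ≤ 3 ∗ n² + 5 ∗ n² + 7 ∗ n² = 15 ∗ n² for every n ≥ 1.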
¹In textbooks and research papers you may sometimes see this written as g = O(f), but that is questionable, as it compares a function with a set of functions.
3 Sorting Algorithms
We have seen in the last lecture that having a sorted array can make it easier to search. This suggests that it may be important to be able to take an unsorted array and rearrange it so that it is sorted!
There are many different algorithms for sorting: bucket sort, bubble
sort, insertion sort, selection sort, heap sort, etc. This is testimony to the
importance and complexity of the problem, despite its apparent simplic-
ity. In this lecture we discuss selection sort, which is one of the simplest
algorithms.
4 Selection Sort
Selection sort is based on the idea that on each iteration we select the small-
est element of the part of the array that has not yet been sorted and move it
to the end of the sorted part at the beginning of the array.
Let’s play this through for two steps on an example array. Initially, we
consider the whole array (from i = 0 to the end). We write this as A[0..n),
that is the segment of the array starting at 0 up to n, where n is excluded.
index:   0   1   2   3   4   5   6   7   8   9  10
A:      12  87  21   3   2  78  97  16  89  21
        (i = 0, n = 10)
We now find the minimal element of the array segment under consid-
eration (2) and move it to the front of the array. What do we do with the
element that is there? We move it to the place where 2 was (namely at
A[4]). In other words, we swap the first element with the minimal element.
Swapping is a useful operation when sorting an array in place by modifying
it, because the result of a correct sort must be a permutation of the input.
If swapping is our only operation we are immediately guaranteed that the
result is a permutation of the input.
index:   0   1   2   3   4   5   6   7   8   9  10
A:       2  87  21   3  12  78  97  16  89  21
        (i = 1, n = 10)
Now 2 is in the right place, and we find the smallest element in the
remaining array segment and move it to the beginning of the segment (i =
1).
index:   0   1   2   3   4   5   6   7   8   9  10
A:       2   3  21  87  12  78  97  16  89  21
        (i = 2, n = 10)
Let’s pause and see if we can write down properties of the variables and
array segments that allow us to write the code correctly. First we observe
rather straightforwardly that
0 ≤ i ≤ n
where i = n after the last iteration and i = 0 before the first iteration. Next
we observe that the elements to the left of i are already sorted.
A[0..i) sorted
These two invariants are true initially and suffice to imply the postcondition. However, on their own they are not enough to prove the correctness of selection sort, because we cannot show that these two invariants are preserved by every iteration of the loop. We also need to know that all elements to the left of i are less than or equal to all elements to the right of i. We abbreviate this:
A[0..i) ≤ A[i..n)
saying that every element in the left segment is smaller than or equal to
every element in the right segment.
index:   0   1   2   3   4   5   6   7   8   9  10
A:       2   3  21  87  12  78  97  16  89  21
        (i = 2, n = 10)
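In the example above, with i = 2, the segment A[0..2) contains 2 and 3: it is sorted, and each of its elements is less than or equal to every element of the remaining segment A[2..10).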
In the next iteration we pick the minimal element among A[i..n), which
would be 12 = A[4]. We now swap it to position i = 2 and increment i. We write here i′ = i + 1 in order to distinguish the old value of i from the new one, as we do in proofs of preservation of the loop invariant.
index:   0   1   2   3   4   5   6   7   8   9  10
A:       2   3  12  87  21  78  97  16  89  21
        (i = 2, i′ = i + 1 = 3, n = 10)
We encourage you to now write the function, using the following aux-
iliary and contract functions:
Please write it and then compare it to our version on the next page.
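Here is one way the function and its supporting declarations might look. The specification functions is_sorted, le_seg, and le_segs below, the declarations of swap and find_min, and the exact form of the contracts are one reasonable choice of names and interfaces rather than the only possible one; implementations of swap and find_min follow in Section 6.

/* Specification ("contract") functions, used only inside contracts. */

bool is_sorted(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
{
  // true if and only if A[lo..hi) is sorted in ascending order
  for (int i = lo; i < hi-1; i++)
    if (A[i] > A[i+1]) return false;
  return true;
}

bool le_seg(int x, int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
{
  // true if and only if x <= every element of A[lo..hi)
  for (int i = lo; i < hi; i++)
    if (!(x <= A[i])) return false;
  return true;
}

bool le_segs(int[] A, int lo1, int hi1, int lo2, int hi2)
//@requires 0 <= lo1 && lo1 <= hi1 && hi1 <= \length(A);
//@requires 0 <= lo2 && lo2 <= hi2 && hi2 <= \length(A);
{
  // true if and only if every element of A[lo1..hi1)
  // is <= every element of A[lo2..hi2)
  for (int i = lo1; i < hi1; i++)
    if (!le_seg(A[i], A, lo2, hi2)) return false;
  return true;
}

/* Auxiliary functions (implementations in Section 6). */

void swap(int[] A, int i, int j)
//@requires 0 <= i && i < \length(A);
//@requires 0 <= j && j < \length(A);
  ;

int find_min(int[] A, int lo, int hi)
//@requires 0 <= lo && lo < hi && hi <= \length(A);
//@ensures lo <= \result && \result < hi;
//@ensures le_seg(A[\result], A, lo, hi);
  ;

/* Selection sort. */

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  // Loop invariants: A[lo..i) is sorted, and every element of
  // A[lo..i) is <= every element of A[i..hi).
  for (int i = lo; i < hi; i++)
  //@loop_invariant lo <= i && i <= hi;
  //@loop_invariant is_sorted(A, lo, i);
  //@loop_invariant le_segs(A, lo, i, i, hi);
  {
    int m = find_min(A, i, hi);
    // find_min guarantees that A[m] is <= every element of A[i..hi)
    //@assert le_seg(A[m], A, i, hi);
    swap(A, i, m);
  }
}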
At this point, let us verify that the loop invariants are initially satisfied. Before the first iteration we have i = lo, so lo ≤ i ≤ hi holds (the precondition gives lo ≤ hi), the segment A[lo..i) is empty and therefore trivially sorted, and A[lo..i) ≤ A[i..hi) holds vacuously.
We should also verify the assertion we added in the loop body. It ex-
presses that A[m] is less than or equal to any element in the segment A[i..hi), abbreviated mathematically as A[m] ≤ A[i..hi). This should be implied by
the postcondition of the find_min function.
How can we prove the postcondition (@ensures) of the sorting func-
tion? By the loop invariant lo ≤ i ≤ hi and the negation of the loop condition (which gives i ≥ hi on exit) we know i = hi. The second loop invariant then states that A[lo..hi) is sorted, which is exactly the postcondition.
6 Auxiliary Functions
Besides the specification functions in contracts, we also used two auxiliary
functions: swap and find_min.
Here is the implementation of swap.
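A straightforward version exchanges the two elements through a temporary variable; the contracts here, requiring both indices to be in bounds, are a natural choice.

void swap(int[] A, int i, int j)
//@requires 0 <= i && i < \length(A);
//@requires 0 <= j && j < \length(A);
{
  int tmp = A[i];  // hold A[i] so it is not overwritten
  A[i] = A[j];
  A[j] = tmp;
}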
For find_min, we recommend you follow the method used for selection
sort: follow the algorithm for a couple of steps on a generic example, write
down the invariants in general terms, and then synthesize the simple code
and invariants from the result. What we have is below, for completeness.
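One way to write it, with a postcondition strong enough to justify the assertion in the sorting loop and loop invariants chosen to match (again one reasonable choice rather than the only one), is:

int find_min(int[] A, int lo, int hi)
//@requires 0 <= lo && lo < hi && hi <= \length(A);
//@ensures lo <= \result && \result < hi;
//@ensures le_seg(A[\result], A, lo, hi);
{
  int min = lo;
  // Loop invariant: A[min] is <= every element of A[lo..i) examined so far.
  for (int i = lo+1; i < hi; i++)
  //@loop_invariant lo+1 <= i && i <= hi;
  //@loop_invariant lo <= min && min < i;
  //@loop_invariant le_seg(A[min], A, lo, i);
  {
    if (A[i] < A[min]) min = i;
  }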
  return min;
}
Counting operations, find_min scans segments of length n, n − 1, . . . , 1 over the course of the algorithm, for a total of about n + (n − 1) + · · · + 1 = n(n + 1)/2 operations, so the running time is in

O(n(n + 1)/2) = O(n²/2 + n/2) = O(n²)
The last equation follows since for a polynomial, as we remarked earlier,
only the degree matters.
We summarize this by saying that the worst-case running time of selec-
tion sort is quadratic. In this algorithm there isn’t a significant difference
between average case and worst case analysis: the number of iterations is
exactly the same, and we only save one or two assignments per iteration in
the loop body of the find_min function if the array is already sorted.
8 Empirical Validation
If the running time is really O(n²) and not asymptotically faster, we predict the following: for large inputs, its running time should be essentially cn² for some constant c. If we double the size of the input to 2n, then the running time should roughly become c(2n)² = 4(cn²), which means the function
should take approximately 4 times as many seconds as before.
We try this with the function sort_time(n, r) which generates a ran-
dom array of size n and then sorts it r times. You can find the C0 code
as sort-time.c0 in this lecture’s code directory. We run this code several
times, with different parameters.
n      Time (seconds)   Ratio
1000        0.700
2000        2.700        3.85
4000       10.790        4.00
8000       42.796        3.97
We see that especially for the larger numbers, the ratio is almost exactly 4
when doubling the size of the input. Our conjecture of quadratic asymp-
totic running time has been experimentally confirmed.
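For reference, a harness with the behavior described above, generating a random array of size n and sorting it r times, might look roughly as follows. This is only a sketch: the actual sort-time.c0 may be organized differently, and how the elapsed time is measured is not shown here.

#use <rand>

int[] random_array(int n, rand_t gen)
//@requires n >= 0;
//@ensures \length(\result) == n;
{
  int[] A = alloc_array(int, n);
  for (int i = 0; i < n; i++)
    A[i] = rand(gen);              // pseudo-random array contents
  return A;
}

void sort_time(int n, int r)
//@requires n >= 0 && r >= 0;
{
  rand_t gen = init_rand(0x2A);    // arbitrary fixed seed
  for (int k = 0; k < r; k++) {
    int[] A = random_array(n, gen);  // fresh random input of size n
    sort(A, 0, n);                   // selection sort from this lecture
  }
}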